How do cancerous cells differ from healthy cells? A new machine learning algorithm called “ikarus” knows the answer, as a team led by bioinformatician Altuna Akalin from the MDC now reports in the journal Genome Biology. The program has found a characteristic gene signature.
When it comes to identifying patterns in mountains of data, human beings are no match for artificial intelligence (AI). Machine learning in particular, a subfield of AI, is often used to find regularities in data sets, be it for stock market analysis, image and speech recognition or the classification of cells. In order to reliably distinguish cancer cells from healthy cells, a team led by Dr. Altuna Akalin, head of the “Bioinformatics and Omics Data Science” technology platform at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), has now developed a machine learning program called “ikarus”. The program found a pattern in tumor cells that is shared across cancer types and consists of a characteristic combination of genes. The algorithm also detected types of genes in the pattern that had not previously been clearly linked to cancer, the research group writes in the journal Genome Biology.
Machine learning basically means that an algorithm independently learns to answer certain questions using training data. Its strategy is to look for patterns in the data that will help it solve the problem. After the training phase, the system can generalize what it has learned and thus assess unknown data. “A major challenge was to obtain suitable training datasets in which specialists had already made a precise classification of the cells into ‘healthy’ and ‘cancer’,” explains Jan Dohmen, the first author of the study.
A surprisingly good hit rate
In addition, data sets from single-cell sequencing are often noisy. This means that the information regarding the molecular properties of the individual cells is not entirely accurate, for example because a different number of genes is detected in each cell or because the samples are not always processed in the same way. Dohmen, his colleague Dr. Vedran Franke, the co-leader of the study, and the rest of the team finally trained the algorithm with data from lung and colon cancer cells before applying it to data sets from other tumor types.
In the training phase, ikarus had to find a list of characteristic genes that the program could then use to classify the cells: “We tried out different approaches and refined them,” says Dohmen. It was a time-consuming job, as all three researchers say. “The decisive factor was that ikarus ultimately used two lists: one for cancer genes and one for genes from other cells,” explains Franke. After the learning period, the algorithm was also able to reliably differentiate between healthy and cancerous cells in other types of cancer, for example in tissue samples from liver cancer or neuroblastoma. Its hit rate was usually off by only a few percent. This also surprised the research group: “We did not expect that there would be a common signature that defines tumor cells from different types of cancer so precisely,” says Akalin. “However, we cannot yet say that the method works for all types of cancer,” adds Dohmen. To ensure that ikarus can reliably help with cancer diagnosis, the researchers want to test it on other types of tumors.
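The two-list idea can be illustrated with a toy sketch. This is not the ikarus implementation: the gene names, the expression values and the simple "compare the two mean scores" rule are all assumptions made for illustration, and the published method may score and combine its gene lists quite differently.

```python
from statistics import mean

# Hypothetical gene lists, invented for this example only.
TUMOR_GENES = ["GENE_T1", "GENE_T2", "GENE_T3"]
NORMAL_GENES = ["GENE_N1", "GENE_N2", "GENE_N3"]


def list_score(cell, genes):
    """Mean expression of the listed genes; genes not detected in the cell count as 0."""
    return mean(cell.get(gene, 0.0) for gene in genes)


def classify_cell(cell):
    """Label a cell 'cancer' if its tumor-list score exceeds its normal-list score."""
    tumor_score = list_score(cell, TUMOR_GENES)
    normal_score = list_score(cell, NORMAL_GENES)
    return "cancer" if tumor_score > normal_score else "healthy"


# Made-up expression profiles: gene name -> expression value.
cells = {
    "cell_a": {"GENE_T1": 5.0, "GENE_T2": 3.0, "GENE_N1": 1.0},
    "cell_b": {"GENE_T1": 0.5, "GENE_N1": 4.0, "GENE_N2": 2.0, "GENE_N3": 3.0},
}

labels = {name: classify_cell(expression) for name, expression in cells.items()}
print(labels)
```

Scoring each cell against both lists, rather than against a cancer list alone, gives the classifier a reference point within the same cell, which may make it less sensitive to how many genes happen to be detected in that cell.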
AI as a fully automatic diagnostic aid
The classification of “healthy” versus “cancer” is by no means the end of the project. In initial tests, ikarus has already been able to show that the method can also differentiate between other cell types or certain subtypes of tumor cells. “We want to generalize the approach,” says Akalin, “that is, develop it further in such a way that it can distinguish all possible cell types in a biopsy.”
In the clinic, pathologists usually only look at tissue samples from tumors under the microscope and thus identify the different cell types. It’s tedious and takes a lot of time. With ikarus, this step might eventually run fully automatically. In addition, the data can also reveal something about the immediate vicinity of the tumor, says Akalin. This in turn might help doctors to select the best therapy, because the composition of the cancer tissue and its microenvironment often indicates whether a certain treatment or drug will work or not. AI may also help to develop new drugs: “With ikarus, we can identify genes that are potential drivers of cancer,” says Akalin. Novel active substances could then be directed at these molecular targets. (Genome Biology, 2022; doi: 10.1186/s13059-022-02683-1)
Source: Max Delbrück Center for Molecular Medicine in the Helmholtz Association
June 14, 2022
– Kay Sanders