New machine learning algorithm finds a gene signature characteristic of tumors


How do cancer cells differ from healthy cells? A new machine learning algorithm called “ikarus” knows the answer, reports a team led by MDC bioinformatician Altuna Akalin in the journal Genome Biology. The AI program has found a gene signature that is characteristic of tumors.

When it comes to recognizing patterns in mountains of data, humans are no match for artificial intelligence (AI). In particular, a subfield of AI called machine learning is often used to find regularities in data sets, whether for stock market analysis, image and speech recognition or the classification of cells. In order to reliably distinguish cancer cells from healthy cells, a team led by Dr. Altuna Akalin, head of the Bioinformatics and Omics Data Science Platform at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), has now developed a machine learning program called “ikarus”. The program found a pattern in tumor cells that is common to different types of cancer and consists of a characteristic combination of genes. According to the team’s article in the journal Genome Biology, the algorithm also discovered types of genes in the pattern that had never before been clearly linked to cancer.

Machine learning essentially means that an algorithm uses training data to learn how to answer certain questions independently. To do this, it looks for patterns in the data that help it solve problems. After the training phase, the system can generalize from what it has learned in order to evaluate unknown data.
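The train-then-generalize idea described above can be illustrated with a deliberately tiny example. This is a minimal sketch using a nearest-centroid classifier on invented toy numbers; the function names, labels and data are assumptions made for illustration and have nothing to do with the ikarus code itself.

```python
from statistics import mean


def train_centroids(samples, labels):
    """Learn one mean vector (centroid) per label from labeled training data."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        # Average each measurement (column) across the rows of this label.
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign an unseen sample to the label with the closest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))


# Toy training data (invented): two measurements per example.
train_x = [[1.0, 0.2], [0.9, 0.1], [0.1, 1.0], [0.2, 0.8]]
train_y = ["A", "A", "B", "B"]
model = train_centroids(train_x, train_y)

# Evaluate on a data point the model did not see during training.
unseen = [0.8, 0.3]
prediction = predict(model, unseen)
print(prediction)
```

The training step only ever sees the labeled examples; the prediction step applies what was learned to a new data point.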

“It was a big challenge to get suitable training data where experts had already clearly differentiated between ‘healthy’ and ‘cancerous’ cells.”

Jan Dohmen, first author of the paper

A surprisingly high success rate

In addition, single-cell sequencing datasets are often noisy. This means that the information they contain about the molecular properties of individual cells is not very precise, perhaps because a different number of genes is detected in each cell or because the samples are not always prepared in the same way. Dohmen and his colleague Dr. Vedran Franke, co-leader of the study, report that they sifted through countless publications and contacted a number of research groups in order to obtain adequate data sets. The team eventually used data from lung and colon cancer cells to train the algorithm before applying it to datasets from other tumor types.

In the training phase, ikarus had to find a list of characteristic genes, which it then used to categorize the cells. “We tried out and refined various approaches,” says Dohmen. It was a time-consuming job, as all three scientists report. “The key was that ikarus ended up using two lists: one for cancer genes and one for genes from other cells,” explains Franke. After the learning phase, the algorithm was also able to reliably distinguish between healthy and tumor cells in other types of cancer, for example in tissue samples from liver cancer or neuroblastoma patients. The success rate tended to be extraordinarily high, surprising even the research group. “We did not expect there to be a common signature that defines the tumor cells of different types of cancer so precisely,” says Akalin. “But we can’t yet say whether the method works for all types of cancer,” adds Dohmen. In order to make ikarus a reliable tool for cancer diagnosis, the researchers now want to test it on other tumor types.
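The two-list idea can be sketched in a few lines. The following is a simplified illustration, not the ikarus implementation: the gene names are hypothetical placeholders, the expression values are invented, and comparing mean expression per list is an assumed scoring rule chosen only because it is easy to follow.

```python
def signature_score(expression, gene_list):
    """Average expression of the listed genes in one cell (missing genes count as 0)."""
    if not gene_list:
        return 0.0
    return sum(expression.get(gene, 0.0) for gene in gene_list) / len(gene_list)


def classify_cell(expression, tumor_genes, normal_genes):
    """Label a cell by whichever of the two gene lists scores higher."""
    tumor = signature_score(expression, tumor_genes)
    normal = signature_score(expression, normal_genes)
    return "tumor" if tumor > normal else "normal"


# Hypothetical placeholder gene lists (not the published signature).
TUMOR_GENES = ["GENE_T1", "GENE_T2", "GENE_T3"]
NORMAL_GENES = ["GENE_N1", "GENE_N2"]

# Invented per-cell expression values, keyed by gene name.
cells = {
    "cell_1": {"GENE_T1": 5.0, "GENE_T2": 3.0, "GENE_N1": 1.0},
    "cell_2": {"GENE_T3": 0.5, "GENE_N1": 4.0, "GENE_N2": 2.0},
}

labels = {
    name: classify_cell(expr, TUMOR_GENES, NORMAL_GENES)
    for name, expr in cells.items()
}
print(labels)
```

Scoring each cell against both lists, rather than against a tumor list alone, gives the classifier a reference point: a cell is called a tumor cell only when its tumor score exceeds its score for the other list.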

AI as a fully automated diagnostic tool

The project aims to go well beyond the classification of “healthy” versus “cancerous” cells. In initial tests, ikarus has already shown that the method can also distinguish other types (and certain subtypes) of cells from tumor cells. “We want to make the approach more comprehensive,” says Akalin, “and further develop it so that it can distinguish between all possible cell types in a biopsy.”

In hospitals, pathologists usually examine tissue samples from tumors only under the microscope to identify the different cell types. It is tedious, time-consuming work. With ikarus, this step could one day become a fully automated process. In addition, according to Akalin, conclusions can be drawn from the data about the immediate vicinity of the tumor. That could help doctors choose the best therapy, because the nature of the cancerous tissue and its microenvironment often indicates whether or not a particular treatment or medication will be effective. AI can also be useful in the development of new drugs. “With ikarus, we can identify genes that are potential drivers of cancer,” says Akalin. Novel therapeutic agents could then be used to target these molecular structures.

Collaboration from home

What is notable about the publication is that it was produced entirely during the COVID pandemic. None of those involved were sitting at their usual desks at the Berlin Institute for Medical Systems Biology (BIMSB), which is part of the MDC. Instead, they worked from home and communicated with each other only digitally. From Franke’s point of view, “the project therefore shows that a digital structure can be created to facilitate scientific work under these conditions.”


Max Delbrück Center for Molecular Medicine in the Helmholtz Association

Journal reference:

Dohmen, J. et al. (2022) Identifying tumor cells at the single-cell level using machine learning. Genome Biology.

