Photo: angellodeco/shutterstock
September 24, 2024
With the help of modern testing methods, huge amounts of information can now be obtained from blood samples. Evaluating this wealth of data and drawing accurate conclusions from it, for example to diagnose diseases, is correspondingly complex.
Joint project between FAU and the biotech company BioVariance
To this end, researchers at the Friedrich-Alexander University Erlangen-Nuremberg (FAU) want to develop new artificial intelligence (AI) methods together with the biotech company BioVariance. These methods will be trained with both actual measurement data and artificially generated synthetic data sets so that they can then detect abnormalities that occur frequently in certain diseases. The project, BioSamp, is being funded by the Free State of Bavaria with around one million euros; a third of this will go to FAU.
Up to now, doctors have often been able to base their diagnoses on only a few dozen criteria. So-called omics analyses have the potential to change that. They can yield tens of thousands of measurements from less than a drop of blood: Which proteins does the sample contain, and in what quantities? Which fat-like compounds and metabolic products? Which genes are currently being read in the person from whom the blood came?
“In principle, everything that occurs in the blood is measured,” explains Prof. Dr. Daniel Tenbrinck, Professor of Data Science at FAU. “This huge amount of data has the potential to tell us a lot about the health of patients – not only which disease they are suffering from, but possibly even which variant they are affected by. Or whether they are still completely healthy but have an increased risk of a heart attack or diabetes, so that the disorder can be prevented through prophylactic measures.”
Search for the needle in the haystack
Researchers around the world are therefore searching for abnormalities in omics data that are associated with certain diseases. Due to the abundance of data, this task is similar to the proverbial search for a needle in a haystack. Machine learning methods are therefore increasingly being used to help with this. “The artificial intelligence is trained with a large amount of omics data from patients and the diseases diagnosed in them,” explains Tenbrinck. “This allows the algorithm to learn to recognize telltale traces in new measured values and interpret them accordingly.”
To train the AI, omics data from thousands of affected people would really be needed. However, obtaining this data is as time-consuming as it is expensive. Together with BioVariance, Tenbrinck therefore wants to pursue a different strategy, known among specialists as “synthetic data generation”. “We use statistical methods to analyze no more than 100 omics data sets and look for patterns and regularities in them,” he says. “We then use these to produce new data sets that cannot be statistically distinguished from the data from actual blood analyses.”
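The basic idea of learning statistical regularities from a few measured samples and then drawing new ones can be illustrated with a deliberately simple sketch. It is not the BioSamp method, which the article does not describe in detail. It assumes, purely for illustration, that each measured feature follows its own normal distribution and ignores correlations between features; the toy values and the names `fit_feature_stats` and `generate_synthetic` are hypothetical.

```python
import random
import statistics


def fit_feature_stats(samples):
    """Estimate mean and standard deviation for each feature (column)."""
    columns = list(zip(*samples))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def generate_synthetic(stats, n, seed=0):
    """Draw n synthetic samples from independent normal distributions.

    Assumption: features are independent and normally distributed,
    which real omics data generally are not.
    """
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in stats] for _ in range(n)]


# Hypothetical toy data: 5 "measured" samples with 3 features each
real_samples = [
    [1.0, 10.0, 0.50],
    [1.2, 11.0, 0.40],
    [0.9, 9.5, 0.60],
    [1.1, 10.5, 0.55],
    [1.0, 10.2, 0.45],
]

stats = fit_feature_stats(real_samples)
synthetic_samples = generate_synthetic(stats, n=1000)
```

A realistic generator would also have to reproduce the dependencies between thousands of features, which is what makes the task a research problem in its own right.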
The AI can then be trained using this synthetically generated data. What sounds like sleight of hand has in fact proven itself many times in practice. “Synthetic data generation is therefore currently a very active area of research in our field,” says Tenbrinck. For example, facial recognition software is now often trained on portraits that have first been geometrically distorted or overlaid with image noise. This makes the algorithm much more robust: it is no longer so easily fooled by an unfavorable camera angle or by poor lighting conditions.
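The image example can be sketched in a few lines. The following is a minimal illustration, not code from any particular facial recognition system: it treats an image as a small grid of grayscale values between 0 and 1, uses a horizontal flip as a stand-in for a geometric distortion, and adds Gaussian pixel noise. The function names and the noise level `sigma` are assumptions chosen for the example.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right (a simple geometric transformation)."""
    return [row[::-1] for row in image]


def add_noise(image, sigma, rng):
    """Add Gaussian noise to every pixel and clip the result to [0, 1]."""
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


def augment(image, n_copies, sigma=0.05, seed=0):
    """Create n_copies noisy variants, each flipped with probability 0.5."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_copies):
        base = flip_horizontal(image) if rng.random() < 0.5 else image
        variants.append(add_noise(base, sigma, rng))
    return variants


# Hypothetical 2x3 grayscale "portrait"
portrait = [
    [0.1, 0.5, 0.9],
    [0.2, 0.6, 1.0],
]
augmented = augment(portrait, n_copies=4)
```

Each variant keeps the original label, so a single portrait contributes several slightly different training examples.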
Such methods can even be trained with completely new, artificially generated images. “But to do this, you have to make sure that the synthetic faces look realistic,” says Tenbrinck. If they all have only one eye, for example, the recognition performance of software trained on them will probably even deteriorate. “We are investigating how we can generate synthetic omics data that is so realistic that it actually makes the AI’s diagnoses more robust and accurate,” emphasizes the scientist. “An important point here is that medical experts look at the artificial data sets and assess how plausible they are.” Figuratively speaking, the one-eyed faces would be sorted out straight away.
In focus: Long Covid and depression
The partners in the BioSamp project initially want to use this approach to advance the diagnosis of two conditions: severe depression and chronic fatigue syndrome, a common symptom of Long Covid. “Both are disorders that cause a great deal of suffering,” emphasizes Tenbrinck. “BioVariance is already conducting research into depression that we can build on.” One aim is to identify these disorders more reliably and possibly classify them into different variants. In the case of depression, for example, some sufferers respond better to certain treatment strategies and medications than others.
“But we also want to help identify exactly what is going wrong in the body in these diseases, what causes them,” explains Tenbrinck. For example, the AI could come across a certain gene in the omics data that is particularly active in people with depression. “Then you can look at what research knows about the function of this gene and draw conclusions about how the disease develops,” says the scientist. “Our findings can therefore potentially help not only to improve the diagnosis of diseases, but also their treatment and prevention. That’s what I find so fascinating about this topic.”
More information
Prof. Dr. Daniel Tenbrinck
Professorship in the field of Data Science
Tel.: 09131/85-67233