Scientists use math to increase the accuracy of data analysis results for biomedical research

Kyoto, Japan - Since scientists first mapped the complete human genome, attention has turned to the question of how cells use this master copy of genetic instructions. It is known that when genes are switched on, parts of the DNA sequence in the cell nucleus are copied into shorter chain-like molecules called RNA, which go on to provide the molecules essential for cell-specific survival and functions.

Understanding the patterns of RNAs in a cell can show which genes are active and allow researchers to infer what the cell is doing. RNA sequencing, a technology that measures RNA with massively parallel DNA sequencers, has become a standard technique over the last decade. More recently, rapid technological advances have made it possible to sequence RNA at the single-cell level from thousands of cells in parallel, accelerating progress in biomedical science. But quantifying RNA from such a small amount of material poses great technical challenges. Even with state-of-the-art equipment, single-cell RNA sequencing data contain significant detection errors, including the so-called “dropout effect”, in which a gene that is active in a cell is recorded as having no RNA at all. Moreover, even small errors accumulated across a large number of genes can quickly add up, so that any useful information is lost in the noise.
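The dropout effect can be illustrated with a short simulation. The Python sketch below is not the researchers' code; it simply assumes that each RNA molecule in a cell is detected independently with a fixed probability, and the capture rate of 10% and the molecule counts are hypothetical values chosen for illustration.

```python
import numpy as np

def dropout_fraction(true_count, capture_rate=0.1, n_cells=10_000, seed=0):
    """Fraction of simulated cells in which an expressed gene is read as zero.

    Assumption (not from the study): every RNA molecule is captured
    independently with probability `capture_rate`.
    """
    rng = np.random.default_rng(seed)
    # Each cell truly contains `true_count` molecules of the gene;
    # the sequencer observes only a random subset of them.
    observed = rng.binomial(true_count, capture_rate, size=n_cells)
    return float(np.mean(observed == 0))

low = dropout_fraction(true_count=5)     # weakly expressed gene
high = dropout_fraction(true_count=100)  # strongly expressed gene
print(f"zero reads, weakly expressed gene:   {low:.1%}")
print(f"zero reads, strongly expressed gene: {high:.1%}")
```

Under these assumptions, the weakly expressed gene is missed in more than half of the cells even though it is active in every one of them, while the strongly expressed gene is almost never missed.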

Now, a team from the Kyoto University Institute for the Advanced Study of Human Biology (WPI-ASHBi) has developed a new mathematical method that removes this noise and so allows clear signals to be extracted from single-cell RNA sequencing data. The method reduces random sampling noise in the data, enabling an accurate and comprehensive understanding of a cell’s activity. The research was recently published in the journal Life Science Alliance.

The paper’s lead author, Yusuke Imoto of ASHBi, explains, “Each gene represents a different dimension in RNA sequencing data, which means that tens of thousands of dimensions need to be collected across multiple cells and analyzed. Even the slightest noise in one dimension can have a major impact on downstream data analyses, so that potentially important signals are lost. That’s why we call it the ‘curse of dimensionality.’”
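A small numerical experiment shows why noise spread over many dimensions is so damaging. The sketch below is an illustration under stated assumptions rather than the RECODE method: two hypothetical cells differ in only 20 genes, every gene carries independent Gaussian measurement noise, and the function name and all numbers are invented for the example.

```python
import numpy as np

def noise_to_signal_ratio(n_genes, n_informative=20, signal=1.0,
                          noise_sd=0.3, seed=0):
    """Excess squared distance caused by noise, relative to the true one.

    Assumptions (illustrative only): two cells differ by `signal` in the
    first `n_informative` genes, and every gene in both cells receives
    independent Gaussian noise with standard deviation `noise_sd`.
    """
    rng = np.random.default_rng(seed)
    cell_a = np.zeros(n_genes)
    cell_b = np.zeros(n_genes)
    cell_b[:n_informative] = signal
    true_sq_dist = np.sum((cell_a - cell_b) ** 2)

    noisy_a = cell_a + rng.normal(0.0, noise_sd, n_genes)
    noisy_b = cell_b + rng.normal(0.0, noise_sd, n_genes)
    noisy_sq_dist = np.sum((noisy_a - noisy_b) ** 2)

    return float((noisy_sq_dist - true_sq_dist) / true_sq_dist)

for n in (100, 1_000, 20_000):
    print(n, "genes -> noise/signal:", round(noise_to_signal_ratio(n), 1))
```

The true difference between the two cells stays the same in every run, but the noise contribution to their distance grows roughly in proportion to the number of genes measured. At tens of thousands of genes it dwarfs the real signal, which is the situation a noise reduction method has to undo.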

To break the curse of dimensionality, the Kyoto team developed a new noise reduction method, RECODE (which stands for “resolution of the curse of dimensionality”), to remove random sampling noise from single-cell RNA sequencing data. RECODE applies high-dimensional statistical theories to obtain accurate results, even for genes expressed at very low levels.

First, the team tested their method on data from a widely studied cell population, human peripheral blood cells. They confirmed that RECODE resolves the curse of dimensionality, revealing expression patterns for individual genes that are close to their expected values.

Next, in a comparison with other state-of-the-art analysis methods, RECODE outperformed the competition by giving much more faithful representations of gene activation. RECODE is also easier to use than other methods, because it does not rely on parameters or on machine learning to make the calculations work.

Finally, the team tested RECODE on a complex dataset from mouse embryo cells containing many different cell types with unique gene expression patterns. While other methods muddied the results, RECODE clearly resolved gene expression levels, even for rare cell types.

Imoto concludes: “Analysis of single-cell RNA sequencing data remains technically challenging and is a developing technique, but our RECODE algorithm is a step towards being able to reveal the true behaviors of single-cell structures. With our contribution, single-cell RNA sequencing data analysis might become a powerful research tool with massive implications in many biological fields.”

Another leading author, Tomonori Nakamura, a biologist from ASHBi and Kyoto University’s Hakubi Center for Advanced Study, adds, “By unlocking the true power of single-cell RNA sequencing, RECODE will allow researchers to discover unidentified rare cell types, leading to the development and establishment of new fields of basic science research as well as clinical applications research and drug discovery.”

RECODE calculation programs (Python/R code, office application) are available on GitHub (https://github.com/yusuke-imoto-lab/RECODE).

Source of the story:

Materials provided by Kyoto University. Note: Content may be edited for style and length.
