
Researchers from NVIDIA and Harvard Develop Deep Learning Tool for Epigenomics
This new deep neural network tool, AtacWorks, can speed up genome analysis
Last year, NVIDIA released AtacWorks, a deep convolutional neural network toolkit for epigenomics. Today, researchers from NVIDIA and Harvard University's Department of Stem Cell and Regenerative Biology revealed how AtacWorks helped them reduce the cost and time needed for rare-cell and single-cell experiments.
In a study published in the journal Nature Communications, the researchers report that AtacWorks can analyze a whole genome in just half an hour, rather than the multiple hours required by traditional methods.
According to VentureBeat, nearly every cell in a mammalian body carries a complete copy of the organism's DNA, with billions of base pairs crammed into the nucleus. But an individual cell uses only the subset of genetic components it needs to function, and cell types such as liver, blood, or skin cells rely on different genes. The DNA regions that determine a cell's function are relatively accessible, while the rest is packed tightly around proteins.
AtacWorks works with ATAC-seq, a popular method for finding the parts of the human genome that are accessible in cells. ATAC-seq, pioneered by Harvard professor Jason Buenrostro, one of the paper's co-authors, gauges the intensity of a signal at every position on the genome. AtacWorks' deep neural network denoises low-coverage or low-quality ATAC-seq data and identifies accessible chromatin regions in it, having learned a mapping between noisy ATAC-seq data and corresponding higher-coverage or higher-quality data. The toolkit is available from NVIDIA's NGC hub of GPU-optimized software.
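To make the idea of "denoising a signal and identifying peaks" concrete, the sketch below treats ATAC-seq data as a coverage track (one value per genomic position), smooths it, and then calls peaks as runs of positions above a threshold. This is a hypothetical illustration only: it swaps AtacWorks' trained neural network for a fixed moving-average convolution, and the function names `smooth` and `call_peaks`, the window size, and the threshold are all assumptions made for this example.

```python
from typing import List, Tuple


def smooth(coverage: List[float], window: int = 5) -> List[float]:
    """Moving-average (box-kernel) convolution over a coverage track.

    Hypothetical stand-in for a learned denoiser: each position becomes the
    mean of the values within window // 2 positions on either side, with the
    window truncated at the edges of the track.
    """
    half = window // 2
    out = []
    for i in range(len(coverage)):
        lo = max(0, i - half)
        hi = min(len(coverage), i + half + 1)
        out.append(sum(coverage[lo:hi]) / (hi - lo))
    return out


def call_peaks(signal: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals where signal >= threshold."""
    peaks = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i  # entering a peak
        elif value < threshold and start is not None:
            peaks.append((start, i))  # leaving a peak
            start = None
    if start is not None:
        peaks.append((start, len(signal)))  # peak runs to the end of the track
    return peaks


# Toy noisy track: one broad accessible region around positions 7-11,
# plus an isolated spike at position 2.
noisy = [0, 1, 6, 0, 1, 0, 1, 3, 5, 7, 6, 5, 2, 0, 1, 0, 0, 1, 0, 0]

print(call_peaks(noisy, threshold=3.0))                   # [(2, 3), (7, 12)]
print(call_peaks(smooth(noisy, window=5), threshold=3.0))  # [(7, 12)]
```

Thresholding the raw track also flags the isolated spike at position 2, while the smoothed track keeps only the broad region. A convolutional model such as AtacWorks learns its filters from paired noisy and clean data instead of relying on a fixed kernel like this one.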
ATAC-seq generally requires tens of thousands of cells to produce a clean signal, the paper's co-authors note. They add that applying AtacWorks yields results of the same quality from just tens of cells.
AtacWorks was trained on labeled pairs of matching ATAC-seq datasets, one high-quality and one noisy. Given a down-sampled copy of the data, the model learned to predict an accurate, high-quality version of the signal and to identify peaks in it.
Chromatin is a complex of DNA and proteins whose primary function is packaging long DNA molecules into more compact structures. Using AtacWorks, the researchers found that they could spot accessible chromatin in a noisy dataset of 1 million reads nearly as well as traditional methods could with a clean dataset of 50 million reads.
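The training setup, pairing a clean dataset with a down-sampled noisy copy, can be sketched as follows. This is a minimal illustration under stated assumptions: a clean experiment is represented as per-position read counts, and the noisy copy is simulated by keeping each read independently with a fixed probability (binomial thinning). The thinning approach and the helper names `downsample` and `make_training_pair` are hypothetical and are not taken from the AtacWorks code; the 2% keep-rate simply mirrors the 1 million versus 50 million reads comparison.

```python
import random
from typing import List, Tuple


def downsample(clean: List[int], fraction: float, seed: int = 0) -> List[int]:
    """Binomially thin per-position read counts (hypothetical helper).

    Each read at each position is kept independently with probability
    `fraction`, mimicking a sequencing run with fewer total reads.
    """
    rng = random.Random(seed)  # seeded so the noisy copy is reproducible
    return [sum(rng.random() < fraction for _ in range(count)) for count in clean]


def make_training_pair(
    clean: List[int], fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (noisy_input, clean_target) for supervised denoising."""
    return downsample(clean, fraction, seed), clean


# Keeping 1 million out of 50 million reads means keeping 2% of them.
fraction = 1_000_000 / 50_000_000
print(fraction)  # 0.02

# Toy clean track with one strongly accessible region in the middle.
clean = [0, 2, 10, 400, 900, 650, 120, 5, 0, 1]
noisy, target = make_training_pair(clean, fraction, seed=42)
print(sum(target), sum(noisy))  # total reads before and after thinning
```

A model trained on many such (noisy, clean) pairs is asked to reconstruct the clean track from the thinned one, which is the supervised mapping the study describes.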
During their study, the NVIDIA and Harvard team applied AtacWorks to a dataset of the stem cells that produce red and white blood cells. According to the NVIDIA blog, using a sample of just 50 cells, they identified distinct regions of DNA associated with cells that develop into white blood cells, as well as separate sequences that correlate with red blood cells. Running on NVIDIA Tensor Core GPUs, the model completed inference on a whole genome in under 30 minutes, a process that normally takes 15 hours on a system with 32 CPU cores. This points to faster, more efficient, and cheaper data analysis.
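The runtime comparison above reduces to simple arithmetic. Because the GPU figure is an upper bound ("under 30 minutes"), the resulting ratio is a lower bound on the speed-up. The helper name `speedup` is hypothetical and used only for illustration.

```python
def speedup(baseline_hours: float, accelerated_minutes: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    baseline_minutes = baseline_hours * 60  # convert hours to minutes
    return baseline_minutes / accelerated_minutes


# About 15 hours on a 32-core CPU system vs. at most 30 minutes on Tensor Core GPUs.
print(speedup(15, 30))  # 30.0
```

In other words, the GPU run is at least roughly 30 times faster than the quoted CPU baseline.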
The researchers believe the model could deliver insights into a range of diseases, including cardiovascular disease, Alzheimer's disease, diabetes, and neurological disorders. It could also aid drug discovery by giving researchers a better understanding of disease mechanisms.