Researchers at Nationwide Children’s Hospital have created a novel computational program that drastically cuts the time it takes to analyze a person’s genome for disease-causing variations from weeks to just a few hours.
The human genome is the complete set of genetic information for humans; it is encoded as DNA sequences within the 23 pairs of chromosomes in cell nuclei and in a small DNA molecule found within individual mitochondria.
“It took around 13 years and $3 billion to sequence the first human genome,” says Peter White, PhD, principal investigator and director of the Biomedical Genomics Core at Nationwide Children's and the study's senior author. “Now, even the smallest research groups can complete genomic sequencing in a matter of days. However, once you've generated all that data, that's the point where many groups hit a wall. After a genome is sequenced, scientists are left with billions of data points to analyze before any truly useful information can be gleaned for use in research and clinical settings.”
The group named its program “Churchill.” Using complex computational techniques, it has been shown to analyze a whole genome sample in as little as 90 minutes.
“Churchill fully automates the analytical process required to take raw sequence data through a series of complex and computationally intensive processes, ultimately producing a list of genetic variants ready for clinical interpretation and tertiary analysis,” Dr. White explains. “Each step in the process was optimized to significantly reduce analysis time, without sacrificing data integrity, resulting in an analysis method that is 100 percent reproducible.”
The National Institute of Standards and Technology used a series of benchmarks to validate Churchill’s output. Compared with other genome analysis programs, Dr. White’s software showed the highest sensitivity (99.7%), the highest accuracy (99.9%), and the highest overall diagnostic effectiveness (99.66%).
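For readers unfamiliar with these metrics, the following is a minimal sketch of how sensitivity and accuracy are conventionally computed from counts of correct and incorrect variant calls. The counts are hypothetical and chosen only for illustration; they are not Churchill's benchmark data. The article does not define "diagnostic effectiveness," so it is left out.

```python
# Standard definitions of two benchmark metrics for variant calling.
# All counts used with these functions here are hypothetical.

def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real variants that the pipeline detected."""
    return true_pos / (true_pos + false_neg)


def accuracy(true_pos: int, true_neg: int, false_pos: int, false_neg: int) -> float:
    """Fraction of all calls (variant or non-variant) that were correct."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


if __name__ == "__main__":
    # Hypothetical benchmark: 1,000 real variants, 9,007 non-variant sites.
    tp, fn = 997, 3        # real variants found / missed
    tn, fp = 9000, 7       # non-variant sites correctly ignored / wrongly called
    print(f"sensitivity: {sensitivity(tp, fn):.4f}")
    print(f"accuracy:    {accuracy(tp, tn, fp, fn):.4f}")
```

A pipeline can score well on one metric and poorly on the other, which is why benchmarks of this kind usually report several figures side by side.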
“At Nationwide Children's we have a strategic goal to introduce genomic medicine into multiple domains of pediatric research and healthcare. Rapid diagnosis of monogenic disease can be critical in newborns, so our initial focus was to create an analysis pipeline that was extremely fast, but didn't sacrifice clinical diagnostic standards of reproducibility and accuracy,” says Dr. White. “Having achieved that, we discovered that a secondary benefit of Churchill was that it could be adapted for population-scale genomic analysis.”
To test that scalability, Dr. White’s team examined the program’s use of computational resources during data analysis and found that Churchill was both highly efficient (more than 90% resource utilization) and able to scale across many servers. By comparison, alternative programs limit analysis to a single server and have resource utilization as low as 30%.
Such high efficiency, coupled with the ability to scale, allows researchers to perform population-scale genomic analysis. Dr. White and his team had the chance to do so when the group received an award from the Amazon Web Services (AWS) in Education Research Grants program. Specifically, they were able to analyze phase 1 of the raw data generated by the 1000 Genomes Project, an international collaboration to produce an extensive public catalogue of human genetic variation representing multiple populations from around the world. Using cloud-computing resources from AWS, Churchill analyzed the Project’s 1,088 whole genome samples in just seven days and identified millions of new genetic variants.
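To put the 1000 Genomes figures in perspective, the short calculation below converts "1,088 samples in seven days" into an average throughput. It is a back-of-the-envelope sketch that assumes the seven days were continuous wall-clock time; the article gives no details about the number of servers used or how the work was scheduled.

```python
# Average throughput implied by the figures quoted in the article.
# Assumption: the seven days were continuous wall-clock time.

SAMPLES = 1088   # whole genome samples analyzed (from the article)
DAYS = 7         # total elapsed time in days (from the article)


def genomes_per_day(samples: int, days: float) -> float:
    """Average number of genomes completed per day."""
    return samples / days


def minutes_per_genome(samples: int, days: float) -> float:
    """Average wall-clock minutes elapsed per completed genome."""
    return days * 24 * 60 / samples


if __name__ == "__main__":
    print(f"{genomes_per_day(SAMPLES, DAYS):.1f} genomes per day")
    print(f"{minutes_per_genome(SAMPLES, DAYS):.1f} minutes per genome on average")
```

That works out to roughly 155 genomes per day, a rate that is only reachable when many samples are processed in parallel across servers.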
“Given that several population-scale genomic studies are underway, we believe that Churchill may be an optimal approach to tackle the data analysis challenges these studies are presenting,” says Dr. White.
Churchill’s algorithm has since been licensed to GenomeNext LLC, which has built on the technology to develop a secure, automated software-as-a-service platform. The platform allows users to upload raw whole genome, exome, or targeted panel sequence data and run an analysis that not only identifies genetic variants but also delivers the results as fully annotated datasets, with options for filtering and for identifying pathogenic variants.
An article describing the ultra-fast, highly scalable Churchill program was published in the latest issue of Genome Biology.