Supernova Award Category
The Problem
Autism Speaks has worked for 15 years to assemble one of the largest open-access collections of DNA samples from families affected by autism, which is estimated to affect 1 in 68 children in the US. Autism Speaks created the Autism Genetic Resource Exchange (AGRE), the world’s largest private collection of autism-related DNA samples, collected from 12,000 individuals and their families. In the beginning, Autism Speaks shared genomic information by shipping hard drives around the world, which made sharing and using the information slow. Downloading even one individual’s whole genome was the equivalent of downloading 100 feature films. To make the data more available to researchers, Autism Speaks launched the MSSNG database in collaboration with the University of Toronto’s Hospital for Sick Children’s Centre for Applied Genomics. The database lets the autism community power research projects immediately by providing access to genomic data from thousands of individuals, together with new analysis tools.
By the time MSSNG reaches its milestone of 10,000 genomes, the database will have grown to petabyte scale. Data at that volume creates serious infrastructure challenges. Autism Speaks needed a way to store and analyze massive data sets while giving autism researchers around the world remote access to this unprecedented resource.
The Solution
Autism Speaks needed a solution that could store and manage massive amounts of data while giving researchers around the world fast access to it. The organization selected Google Cloud Platform to store its data and enable real-time, collaborative access among researchers. In particular, Autism Speaks is using Google Genomics to store the bulk of its data. Google Genomics will allow scientists to access the data via the Genomics API, explore it interactively using Google BigQuery, and perform custom analysis using Google Compute Engine.
The Results
Working through Google Genomics, Autism Speaks and the AGRE team have access to the same technologies that power Google Search and Maps. Using these technologies, the organization is creating solutions for securely storing, processing, exploring and sharing complex biological datasets.
With Google Cloud Platform, researchers can now spend less time moving data around and more time analyzing data and collaborating with colleagues. By cutting down on administrative effort and waiting time, the platform will enable Autism Speaks and its research community to make discoveries and drive innovation faster than before.
The insight and expertise the Google team has already brought to the table have been unmatched. Our work with them has been a game-changer for MSSNG. Together, we have the capability to accelerate breakthroughs in understanding the causes and subtypes of autism in ways that can advance diagnosis and treatment as never before. MSSNG has already completed the sequencing of 1,000 cases, and close to 2,000 additional samples are nearing completion.
Metrics
With Google Cloud Platform, the MSSNG project has been able to share one of the largest collections of whole genome sequences ever created with a diverse set of researchers and clinicians. Currently, 3,540 genome sequences have been uploaded and 1,715 of them have been made available to the research community. To date, 54 investigators from 21 institutions across 5 countries have used Google Cloud Platform to access the MSSNG genomics data. Sharing the more than one million gibibytes of data those genomes represent this broadly would not have been practical with traditional methods.
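As a rough check on these figures, the short Python sketch below works out what share of the uploaded genomes is available to researchers and the average data volume per genome. It uses only the numbers quoted above. The variable and function names are illustrative, and the sketch assumes that the "one million gibibytes" refers to the 3,540 uploaded genomes and treats that figure as a lower bound.

```python
# Figures quoted in the Metrics paragraph.
GENOMES_UPLOADED = 3540
GENOMES_AVAILABLE = 1715
# "Over one million gibibytes": treated here as a lower bound.
# Assumption: this total covers the uploaded genomes.
TOTAL_GIB = 1_000_000


def share_available(available: int, uploaded: int) -> float:
    """Fraction of uploaded genomes that researchers can already access."""
    return available / uploaded


def avg_gib_per_genome(total_gib: float, genomes: int) -> float:
    """Average data volume per genome, in gibibytes."""
    return total_gib / genomes


share = share_available(GENOMES_AVAILABLE, GENOMES_UPLOADED)
average = avg_gib_per_genome(TOTAL_GIB, GENOMES_UPLOADED)

print(f"Available to researchers: {share:.1%}")
print(f"Average size per genome: at least {average:.0f} GiB")
```

Under these assumptions, a little under half of the uploaded genomes are available to the research community, and each genome accounts for a few hundred gibibytes on average.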
The Technology
Autism Speaks selected Google Cloud Platform to store its data and enable real-time, collaborative access among researchers around the world. The organization is in the process of uploading 100 terabytes of data to Google Cloud Storage, and from there, will import it into Google Genomics. Google Genomics will allow scientists to access the data via the Genomics API, explore it interactively using Google BigQuery, and perform custom analysis using Google Compute Engine.
Disruptive Factor
AGRE allows qualified researchers to access the sequencing data using any modern web browser. With access time to genomic data significantly reduced, data from more than 12,000 individuals can be used more readily to expedite autism research. The efficiency gained with Google Cloud Platform is helping the search for a cure for autism around the world.
The flexibility of Cloud Platform also enables MSSNG to provide multiple points of entry to the massive dataset. For well-trained bioinformaticians who want to ask complex questions of the data, the Google Genomics API allows command-line querying. For clinicians or genetic counselors who may only want to ask very specific questions, MSSNG has built a web-based portal on top of Google Cloud Platform that allows easy access to the data. These individuals can now quickly find the answers they are looking for and bring that information back to the patient.
Shining Moment
AGRE allows qualified researchers to access the sequencing data using any modern web browser. With access time to genomic data significantly reduced, data from more than 12,000 individuals can be used more readily to expedite autism research. Researchers can now spend less time moving data around and more time analyzing data and collaborating with colleagues. The efficiency gained with Google Cloud Platform is helping the search for a cure for autism around the world.
