Scientists are using cloud services from Google and other companies to manage and share the massive amounts of data generated by sequencing human genomes. The hope is that the added power and speed of cloud services will accelerate research and create more opportunities for advances in medicine and other fields.
For example, a thousand genomes of people who fall within the autism spectrum were uploaded to Google’s servers Monday as part of a new project sponsored by the nonprofit Autism Speaks.
The upload is the first batch of 10,000 genomes that will eventually be stored in the company’s cloud and shared among autism specialists. The funders of the project, known as Mssng, hope researchers will use the trove of genomes to enable earlier diagnosis, develop targeted treatments or even find a cure for the disorder, which is thought to have at least some basis in genetics.
Mssng is the latest in a series of genome-based projects that use cloud storage technology to advance science. Through these initiatives, cloud storage technology has proven to be a boon for both companies and scientists. Worldwide, cloud services make up a $45.7 billion industry, according to analysts at International Data Corp. Google has welcomed researchers by creating the Google Genomics platform to allow for easy upload, storage and sharing of genomic data and the Compute Engine and BigQuery tools for quick analysis.
Cloud storage has been used in other genomic research. One such project, called Charge, is run by Baylor College of Medicine in conjunction with Amazon and a company called DNAnexus. The project analyzed 3,751 genomes to study heart disease and aging. In a similar venture, IBM and the New York Genome Center recently entered into a partnership to apply Watson’s smart computing software to genomic research.
Mssng’s first upload of genomes to Google was also part of a study published Monday in Nature Medicine that analyzed the genomes of siblings for clues about how genetics influences the disorder’s development. The results showed that a brother and sister who both have autism do not necessarily share the same autism-linked genes from their parents, according to a statement provided by Autism Speaks. Researchers already knew that parents of a child with autism are more likely to have a second child who is affected, and that even identical twins do not always exhibit the same type of autism.
Mssng’s developers hope the new data, paired with cloud-based analytical tools, will help researchers learn more about the heritability of autism, which affects one in 68 children in the U.S., according to a blog post that Robert Ring, chief science officer at Autism Speaks, wrote when the project was announced.
Google may have other interests in furthering genomic study. Two years ago it started a health care company called Calico that is partnering with drug company AbbVie to build a $1.5 billion research facility in the San Francisco Bay Area to seek ways to extend longevity, according to the San Jose Mercury News. The team could look for solutions that engineer or repair parts of the genome. Last year, the company joined Merck & Co. and Amgen as members of the Global Alliance for Genomics and Health, a health consortium focused on developing medicine based on genomics.
Genomic data has been highly prized since DNA sequencing techniques made it possible in the late 1970s. In 2003, the Human Genome Project gave geneticists a map of all the genetic variety that exists within humankind. Sequencing a genome in the early days, though, could cost as much as $100,000, according to Nature. Costs have since come down to about $5,000 per genome, setting off an explosion of data within the field. In the past, server capacity and analytical tools have limited how much of this data was used or shared.
“In the beginning, we shared genomic information by shipping hard drives around the world,” Ring wrote in his blog post. “Downloading even one individual’s whole genome in a conventional manner can take hours, the equivalent of downloading a hundred feature films.”
The new open-access database through Google should allow researchers who haven’t yet sequenced a genome to skip that step and move straight to analysis. It should also let those who have already sequenced genomes stop worrying about the technical details and focus on their science. All entries are stripped of identifying information such as names before they are uploaded to the database.
“Researchers will spend less time moving data around and more time analyzing data and collaborating with colleagues,” Ring wrote. “We hope this will enable us to make discoveries and drive innovation faster than ever.”