A DNA double helix is seen in an undated artist's illustration released by the National Human Genome Research Institute, May 15, 2012. Reuters/National Human Genome Research Institute

Researchers associated with the Exome Aggregation Consortium (ExAC) have catalogued and released the gene sequencing data of over 60,000 unrelated individuals. The compendium of genetic information, which is the largest catalogue so far of variation in human protein-coding regions (the exome), is now publicly available.

“The scale and diversity of the ExAC resource is invaluable,” Daniel MacArthur, a geneticist at the Broad Institute in Cambridge, Massachusetts, and co-author of a study that analyzed the data, said in a statement released Wednesday. “It gives us the ability to discover extremely rare variants and offers an unparalleled window into the roots of rare genetic diseases.”

Although the exome makes up only about 1 percent of the human genome, variations in these protein-coding regions are believed to cause rare genetic diseases such as muscular dystrophy and cystic fibrosis. While the dataset is still not large enough to uncover every variant in these genes, the researchers have already made some interesting findings.

Their work has uncovered a class of genes that harbor less variation than expected. This suggests that variants in these genes are likely to cause disease, and are rare or absent in the population because they are extremely detrimental to human health. The researchers also found evidence of a phenomenon known as “mutational recurrence,” in which the same mutation arises multiple times independently among the samples.

“For instance, among synonymous (non-protein-altering) variants, a class of variation expected to have undergone minimal selection, 43 percent of validated de novo events identified in external data sets of 1,756 parent-offspring trios are also observed independently in our data set, indicating a separate origin for the same variant within the demographic history of the two samples,” the researchers wrote in the study.

Analysis of the data also revealed more than 100 erroneous connections that had previously been made between genetic mutations and diseases, a finding that reduces the number of false positives in databases widely used by clinical labs.

Over the next year, the ExAC project aims to expand to roughly 120,000 exome sequences and 20,000 genome sequences.

“Resources such as ExAC exemplify the benefits that can be achieved for families coping with rare genetic diseases, as a result of the mass altruism of many research participants who allow their data to be aggregated and shared,” Matthew Hurles, a researcher at the U.K.’s Wellcome Trust Sanger Institute, said in the statement. “In our own research, using the ExAC resource has allowed us to apply novel statistical methods to identify several new severe developmental disorders.”