One of the largest family trees ever assembled has been created by a renowned “genome hacker.”
Yaniv Erlich presented what may be the largest family tree, comprising 13 million people and stretching back to the 15th century, at the American Society of Human Genetics annual meeting in Boston on Oct. 24, Nature reports.
Erlich, a computational biologist, created the massive tree by using more than 43 million public profiles, such as birth and death certificates, from the genealogy website geni.com. His team assembled numerous family trees, but one drawn from a few thousand individuals contained 13 million people.
Erlich explains the details of the family tree on his website.
“Geni.com allows genealogists to enter their family trees into the website and to create profiles of family members with basic demographic information such as sex, birth date, marital status, and location,” Erlich writes about the project, known as FamiLinx. “We used graph algorithms to clean the data and organize the pedigrees into fast accessible formats. We also employed natural language processing to tokenize birth, residence, death, and burial locations of individuals and converted this information into quantitative longitude and latitude.”
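The website does not say which graph algorithms the team used. As a rough illustration only, one common way to split a large set of linked profiles into separate family trees is to find the connected components of the kinship graph. The sketch below assumes the profiles are available as parent-child pairs of anonymized IDs; the function name `family_trees` and the sample IDs are hypothetical and are not taken from FamiLinx.

```python
from collections import defaultdict, deque

def family_trees(parent_child_pairs):
    """Group profile IDs into connected components, one per family tree."""
    # Build an undirected adjacency list: a kinship link connects both people.
    neighbors = defaultdict(set)
    for parent, child in parent_child_pairs:
        neighbors[parent].add(child)
        neighbors[child].add(parent)

    seen = set()
    trees = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            person = queue.popleft()
            for relative in neighbors[person]:
                if relative not in seen:
                    seen.add(relative)
                    component.add(relative)
                    queue.append(relative)
        trees.append(component)
    # Return the largest tree first.
    return sorted(trees, key=len, reverse=True)

# Hypothetical anonymized profile IDs, not real data.
links = [(1, 2), (1, 3), (3, 4), (10, 11)]
trees = family_trees(links)
```

Here `trees[0]` holds the largest group of connected profiles, which in Erlich's data would correspond to the 13-million-person tree.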
The project is described as “crowd-sourced genealogy,” and individuals belonging to the FamiLinx family trees remain anonymous. “We removed any explicit identifiers from data such as first names, surnames, exact date of birth or date of death,” according to the website.
The family trees may be able to provide information on human demographics, population expansions and possibly medical information, Nancy Cox, a human geneticist at the University of Chicago, who was not involved in the study, told Nature.
“We’ve really only begun to scratch the surface of what these kinds of pedigrees can tell us,” she said.
Although information drawn from public, self-reported sources may be unreliable, Erlich’s tree may be the start of a new way to study genealogical data.
“It’s an incredibly powerful approach,” Kári Stefánsson, owner of Reykjavik-based genetics company deCODE, said. “People are becoming more willing to contribute data and medical records. It’s an exciting possibility.”