Wikipedia page views could, in the future, become an important tool in predicting disease outbreaks, according to the findings of a new study published in the journal PLOS Computational Biology. The research, carried out by a group of data scientists from the Los Alamos National Laboratory in New Mexico, argued that Wikipedia traffic data could also be used to estimate the current rates of disease outbreaks across the world.
The team of scientists tracked the progress of seven diseases across 11 countries between 2010 and 2013, using the language of each Wikipedia edition as an approximate proxy for readers' locations, and compared page views on Wikipedia articles about those diseases with data obtained from health ministries. Based on this comparison, the researchers found that, in eight out of 14 cases, there was a clear increase in page views nearly a month before an official declaration of an outbreak.
Using this technique, they were able to predict influenza outbreaks in the U.S., Poland, Japan and Thailand, the spread of dengue in Brazil, and a spike in the number of tuberculosis cases in Thailand.
The research was based on the premise that people tend to search online for the symptoms of a disease they suspect they have before they are officially diagnosed. The researchers claimed that Wikipedia is the best candidate on which to build an Internet-based model for predicting outbreaks because data on Wikipedia page views are publicly available.
“Using simple statistical techniques, our proof-of-concept experiments suggest that these data are effective for predicting the present, as well as forecasting up to the 28-day limit of our tests. Our results also suggest that these models can be used even in places with no official data upon which to build models,” the researchers wrote in the paper, adding that the new method could overcome “key gaps in existing traditional and internet-based techniques.”
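To give a rough sense of what a "simple statistical technique" of this kind might look like, the Python sketch below searches for the lead time, up to the 28-day limit mentioned in the paper, at which daily page views correlate best with later official case counts. It is an illustration only and not the researchers' actual model: the function names `pearson` and `best_lead`, and the synthetic `views` and `cases` series, are assumptions made up for this example.

```python
from math import exp, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def best_lead(page_views, case_counts, max_lead=28):
    """Return (lead_days, r) where page_views[t] best correlates with case_counts[t + lead_days]."""
    best = (0, pearson(page_views, case_counts))
    for lead in range(1, max_lead + 1):
        if len(page_views) - lead < 3:
            break  # too few overlapping points to compare
        r = pearson(page_views[:-lead], case_counts[lead:])
        if r > best[1]:
            best = (lead, r)
    return best


# Synthetic data (hypothetical): a bump in page views that peaks on day 40,
# and an identical bump in reported cases built to peak 7 days later.
views = [exp(-((t - 40) / 6) ** 2) for t in range(100)]
cases = [exp(-((t - 47) / 6) ** 2) for t in range(100)]

lead, r = best_lead(views, cases)
print(f"best lead: {lead} days (r = {r:.3f})")
```

On real data the two series would be noisy and differently scaled, so a high correlation at some lead would only suggest, not establish, that page views anticipate official counts.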
Traditional disease surveillance techniques involve collecting data from laboratory tests and tracking the number of visits to health care facilities. The researchers claimed that while these techniques are accurate, they are also slow and expensive.
However, the Wikipedia-based model was not successful in predicting the spread of slow-progressing diseases like HIV/AIDS, according to the paper. Several scientists also questioned how far the model could be applied in areas with poor Internet penetration, or to poorly understood diseases.
“I'm not sure how much Wikipedia is used in Africa,” Heidi Larson, an anthropologist from the London School of Hygiene and Tropical Medicine, told the BBC. “For issues like Ebola, I don't think people at the beginning of the outbreak in West Africa would have (been searching for it), because they wouldn't have had it (Ebola) before.”