Social media may provide a fast and cheap way for human behavior analysts to gather data, but researchers at McGill University in Montreal and Carnegie Mellon University in Pittsburgh say many of the studies generated from that data are flawed. The researchers note that thousands of research papers each year are based on such data, on topics ranging from predicting the next summer blockbuster to anticipating blips in the stock market.

"Many of these papers are used to inform and justify decisions and investments among the public and in industry and government," Derek Ruths, an assistant professor in McGill's School of Computer Science, said in a press release. In an article published in the journal Science, Ruths and Jürgen Pfeffer, of Carnegie Mellon's Institute for Software Research, recommend that researchers start correcting their datasets for bias.

"People want to say something about what's happening in the world and social media is a quick way to tap into that," Pfeffer said. Following the Boston Marathon bombing in 2013, for instance, Pfeffer collected 25 million related tweets in just two weeks. "You get the behavior of millions of people -- for free," Pfeffer said in a separate release.

Ruths and Pfeffer note that each platform draws a different slice of the population. Pinterest, for example, is used largely by women ages 25 to 34, while Instagram skews toward people ages 18 to 29, Black and Latino users, women and urban residents. The design of the platforms themselves can also shape how people behave, they said, noting that it is harder to find dislikes than likes on Facebook. In addition, spammers and bots pose as normal users on social media and end up incorporated into predictions. Researchers also often cannot determine how social media platforms filter the data streams they provide.
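
One common way to correct for the kind of demographic skew described above is post-stratification, a standard survey technique that reweights each user so the sample's group shares match those of the target population. The sketch below is only an illustration of that general idea, not the method Ruths and Pfeffer prescribe; the function names, the group labels and all of the numbers are hypothetical.

```python
from collections import Counter


def poststratify_weights(sample_groups, population_shares):
    """Return one weight per sampled user so that each group's weighted
    share matches its share in the target population.

    Assumes every group in the sample also appears in population_shares.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / n
        weights.append(population_shares[group] / sample_share)
    return weights


def weighted_mean(values, weights):
    """Mean of values, with each value counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: 8 of 10 sampled users are women,
# but the population of interest is assumed to be split 50/50.
groups = ["women"] * 8 + ["men"] * 2
positive_mention = [1] * 6 + [0] * 2 + [0] * 2  # 1 = user posted approvingly
population = {"women": 0.5, "men": 0.5}

weights = poststratify_weights(groups, population)
raw_rate = sum(positive_mention) / len(positive_mention)
adjusted_rate = weighted_mean(positive_mention, weights)

print(f"raw: {raw_rate:.3f}, adjusted: {adjusted_rate:.3f}")
```

In this made-up sample the raw approval rate overstates the adjusted one, because the overrepresented group is also the more enthusiastic one. Reweighting of this kind only addresses skew in attributes the researcher can actually observe; it does nothing about bots, platform design effects or undisclosed filtering.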

"The old adage of behavioral research still applies," Pfeffer said. "Know your data."