The Black Hole: How the Web devours history

Academics, family researchers and even baseball history nuts have noticed recently how some important archives of older newspapers from around the world have vanished off the Web.

The problems have surfaced since PaperofRecord.com, a collection of more than 20 million newspaper pages from papers ranging from the Toronto Star to Mexican village periodicals to newspapers as far away as Perth, Australia, merged into Google News Archive.

The problem, researchers discovered, was that Google has had trouble reformatting the newspaper images and gaining rights to display some of the older publications. It has, at least temporarily, removed some of the archives from public view.

There is an idealized view of the Web that sees it as a storehouse of human knowledge, and in the sense of the breadth of what I can find with a random Google search, this is true.

But for all its openness, the Web has proven to be a leaky vessel for historical preservation, with much of its treasure trove lost in a maze of altered Web pages, broken links and deleted sites.

The head of the British Library recently warned in The Observer newspaper that if this digital memory loss is not fixed, we are in danger of creating a black hole for future historians and writers.

The archives of The Sporting News, founded in 1886 and nicknamed the Bible of Baseball, are among those that have fallen victim in the transition of PaperofRecord.com to Google ownership. Some older Mexican newspapers are also offline, academics complain.

Preserving history on the Web is a struggle even for Google, whose stated mission is to organize the world's information and make it universally accessible and useful.

"We're doing our best to find a solution to include as much of the acquired content as possible," a Google spokesman says of the newspaper archive transition.

But as more and more of our collective memory is hosted online, the danger grows that we lose the content and context of events that happened just days ago, let alone weeks, months or decades back.

Try retracing the links to old scandals or unflattering images on the Web, say, to Enron or Parmalat or other fallen corporate names. Most of them are gone, despite the best efforts of sites like Wikipedia or Smoking Gun or the combined energies of the blogosphere to ferret out and preserve such history.

Where is the global sense of outrage that followed the looting of Iraq's National Museum as U.S. troops stood by in the turmoil following the ouster of Saddam Hussein in 2003? While it is hard to measure, I think it's a safe bet that the world suffers the loss of a museum full of artifacts every day by depending upon the Web to host our precious cultural memories.

That's not to neglect the enormous value of the Web as a temporal medium for sharing information. The latest celebrated example of this is how independent analyst Alex Dalmady used financial data from the Stanford Group's own website to uncover the unlikely financial returns promised by the bank.

His Web detective work is the exception that proves the rule. It was all information hiding in plain sight, and Dalmady simply had the courage to say the emperor had no clothes.

"One does not have to be a detective, or even a financial expert, to spot financial institutions that may prove insolvent, or worse, with the passage of time," Dalmady crowed in a report he wrote. "As the saying goes, if it looks like a duck…"

Examples like Dalmady's are, sadly, the exception.

The World Wide Web, as it has evolved over the years, has become almost purpose-built for obscuring or deleting uncomfortable facts. That wasn't the intention of Web inventor Tim Berners-Lee, whose vision was that every address would point to a discrete page of data. Instead, Web designers have found it convenient to create dynamic Web addresses that may make it impossible to find information the next time you return to a site.

Even Dalmady's work in January is already hard to reproduce. The Stanford International Bank Ltd site informs visitors the company has been put into receivership and provides no links to its past business.

The recent privacy backlash by Facebook users began when the management of the world's most popular social networking site attempted to address the issue of who owns the history of conversations that occur between Facebook friends if one of the parties leaves the site.

Changes made last month to Facebook user guidelines implied that the company owned the rights to users' personal data, including messages and photos, even after they shut down their accounts. The company has since backpedaled and assured its 175 million members that, indeed, users control the data they create on the site.

Susan Feldman, an expert on Web search with research firm IDC in Framingham, Massachusetts, says the problem of the disappearing Web is very real and also partly a mirage. The limitations of current search technology, which depends on users choosing the right keywords to find what they are looking for, are part of the problem.

Help is on the way from improved search tools such as text analytics and concept clustering technology that will help users find more of the information they may think is lost on the Web.

But until the Web's important information archives are secured in modern libraries and improved search tools are widely available, the sense that we are losing our collective digital heritage will only grow.

Enjoy the Web's many benefits while they are still on your screen. Keep copies of anything you want to remember, or risk losing it, perhaps as early as the next time you refresh your browser. We live in a time when the capacity to record and capture our lives has never been greater.

But using the Web to preserve those memories makes it more and more likely that future generations will consider the early years of the Web to be lost decades.
