A miscalculation during routine maintenance caused the massive crash of Google Inc.'s popular Gmail service, the company explained late Tuesday. The outage left the majority of the service's 150 million users without access to their email accounts.
Ben Treynor, the company's senior vice president for engineering and site reliability, apologized for the error, explaining that it happened when some servers were taken offline, which overloaded others.
“Right up front, I’d like to apologize to all of you — today’s outage was a Big Deal, and we’re treating it as such,” Treynor wrote in a Google blog post.
“This morning (Pacific Time) we took a small fraction of Gmail’s servers offline to perform routine upgrades. This isn’t in itself a problem — we do this all the time, and Gmail’s Web interface runs in many locations and just sends traffic to other locations when one is offline,” he wrote.
“However, as we now know, we had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct Web queries to the appropriate Gmail server for response.”
Google gave a similar explanation in May after a widespread service outage left 14 percent of Google users across the globe without access to many of the search company's services.