While trying to improve Google Maps, the Google Street View team created a new computer program that's so good at recognizing text in photographs, it can fool one of the most secure and best-known tests used for distinguishing humans from bots online.
“Recognizing multi-digit numbers in photographs captured at street level is an important component of modern-day mapmaking,” Google Inc. (NASDAQ: GOOG) said in a research report. “More broadly, recognizing numbers in photographs is a problem of interest to the optical character recognition community.”
Google’s algorithm was able to identify and transcribe addresses contained in several tens of millions of Street View photographs with better than 90 percent accuracy. When Google applied the same program to CAPTCHA puzzles, the deliberately hard-to-read strings of distorted text that users have to decipher and retype to verify that they are human, the algorithm was correct about 99.8 percent of the time.
“Our evaluations on both tasks ... indicate that at specific operating thresholds, the performance of the proposed system is comparable to, and in some cases exceeds, that of human operators,” Google researchers wrote in a report published Monday.
Google’s goal is to have a machine that can automatically recognize addresses captured in images taken by special car-mounted Google Street View cameras and automatically match them with the geo-location of a building.
As Google’s Street View cars drive around the globe and capture panoramic images for Google Maps, they also capture photographs of hundreds of millions of multi-digit addresses. Reading those numbers automatically from the pixels in the photographs is what lets each address be tied to the precise geo-location of its building.
Plenty of computer programs have been designed to recognize text in controlled settings (e.g., apps that can take pictures of W-2s and automatically fill out a tax return form). The challenge is recognizing text in the wild, where hundreds of factors come into play, from font size and color to environmental influences like lighting and shadow.
Google's Street View also has to contend with limited image resolution, the motion of the car, blur and the angle from which its cameras capture the numbers.
“Due to these complexities, traditional approaches to solve this problem typically separate out the localization, segmentation and recognition steps,” Google said in the report. Instead, the team tried an approach that merged all of these steps into a single machine-learning system, known as a neural network, which works directly on the pixels of an image and outputs the number it sees.
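For readers curious what "merging the steps" can look like in practice, here is a minimal sketch of one common way to read a number out of such a network. It is an illustration under stated assumptions, not Google's code: it assumes the network has already produced a probability for each possible number length and, for each digit position, a probability for each digit 0 through 9. The names `decode_number` and `peaked` are hypothetical.

```python
import math


def decode_number(length_probs, digit_probs):
    """Pick the most likely digit string from hypothetical network outputs.

    length_probs[n]   : probability that the number has n digits
    digit_probs[i][d] : probability that position i holds digit d
    Returns (text, confidence), where confidence is the joint probability.
    """
    best_text, best_logp = "", -math.inf
    for n, p_len in enumerate(length_probs):
        # Skip impossible lengths and lengths with no digit outputs available.
        if p_len <= 0 or n > len(digit_probs):
            continue
        logp = math.log(p_len)
        digits = []
        for i in range(n):
            # Most likely digit at this position.
            d = max(range(10), key=lambda k: digit_probs[i][k])
            if digit_probs[i][d] <= 0:
                logp = -math.inf
                break
            logp += math.log(digit_probs[i][d])
            digits.append(str(d))
        if logp > best_logp:
            best_text, best_logp = "".join(digits), logp
    confidence = math.exp(best_logp) if best_logp > -math.inf else 0.0
    return best_text, confidence


def peaked(digit, p=0.9):
    """Toy distribution (assumption): mass p on one digit, the rest spread evenly."""
    return [p if k == digit else (1 - p) / 9 for k in range(10)]


if __name__ == "__main__":
    # Made-up outputs: the network is fairly sure the number has two digits.
    text, conf = decode_number(
        [0.0, 0.05, 0.9, 0.05],
        [peaked(4), peaked(2), peaked(7)],
    )
    print(text, round(conf, 3))  # prints: 42 0.729
```

The confidence value is what makes an "operating threshold" possible: a system like this could accept only answers above a chosen confidence and hand the rest to human operators.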
“Turns out that this new algorithm can also be used to read CAPTCHA puzzles,” Vinay Shet, the product manager of Google’s own CAPTCHA generator, reCAPTCHA, said in a blog post. “This shows that the act of typing in the answer to a distorted image should not be the only factor when it comes to determining a human versus a machine.”
Beyond making Google Maps more accurate, Google said, its findings will also help improve security systems like reCAPTCHA. It’s likely the technology could also find its way into Google’s mobile products like the Android operating system and Google Glass.