Microsoft Unveils Incredible Real-Time Speech And Language Translation Technology In China [VIDEO]

 @redletterdave
on November 09 2012 3:15 PM

While Microsoft is enjoying renewed success with the wide release of its Windows 8 interface here in the U.S., the Redmond, Wash.-based software giant just made a big splash in China at its 21st Century Computing event held on Oct. 25. The footage from that event, which Microsoft officially released on Thursday night with an accompanying blog post, shows a promising new advancement in language translation technology.

After explaining how far voice technology has come, Rick Rashid, Microsoft's chief research officer, demonstrated to a packed audience at Microsoft Research Asia's event in Tianjin, China, a brand-new technology that sci-fi writers could only dream of: near-instant speech translation, in which the highly accurate final translation is delivered in what resembles the speaker's own voice.

"For the last 60 years, computer scientists have been working to build systems that can understand what a person says when they talk," Rashid wrote on Microsoft's blog following the event to better describe his demonstration. "In the beginning, the approach used could best be described as simple pattern matching. The computer would examine the waveforms produced by human speech and try to match them to waveforms that were known to be associated with particular words. While this approach sometimes worked, it was extremely fragile. Everyone’s voice is different and even the same person can say the same word in different ways. As a result these early systems were not really usable for practical applications."

During his demonstration, Rashid mentioned the breakthrough technique known as "hidden Markov modeling," which was developed by Carnegie Mellon University researchers in the late 1970s, and said this method has been the basis for many of the speech recognition systems in use today, from U.S. banks to Apple's Siri to the Xbox Kinect.

However, while speech systems have continually gotten better, they've been relatively inefficient and arbitrary. According to Rashid, "even the best speech systems still had word error rates of 20 to 25 percent."
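
For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn a system's output into the correct transcript, divided by the number of words in that transcript. The Python sketch below illustrates that standard definition only. The function name and the sample sentences are made up for illustration, and nothing here is taken from Microsoft's system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not Microsoft's scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four gives an error rate of 25 percent.
print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```

By this measure, an error rate of 20 to 25 percent means roughly one word in every four or five is wrong.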

These error rates are tough to improve. But a breakthrough made just a little over two years ago, a joint accomplishment by Microsoft Research and the University of Toronto, produced a new technique known as "Deep Neural Networks," which is patterned after the behavior of the human brain. With it, researchers have been able to make speech recognition, and thus translation, significantly more discriminative and accurate.

"During my October 25 presentation in China, I had the opportunity to showcase the latest results of this work. We have been able to reduce the word error rate for speech by over 30% compared to previous methods. This means that rather than having one word in 4 or 5 incorrect, now the error rate is one word in 7 or 8," Rashid wrote on Microsoft's blog. "While still far from perfect, this is the most dramatic change in accuracy since the introduction of hidden Markov modeling in 1979, and as we add more data to the training we believe that we will get even better results."

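Rashid's figures can be checked with a little arithmetic. The short Python sketch below, whose function name is purely illustrative, computes the relative drop in error rate implied by moving from one wrong word in 4 or 5 to one wrong word in 7 or 8, assuming the low and high ends of each range are paired with each other.

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which the error rate fell, relative to the old rate."""
    return (old_rate - new_rate) / old_rate


# Pair the ends of the quoted ranges: 1 in 4 -> 1 in 7, and 1 in 5 -> 1 in 8.
# The pairing is an assumption; the quote does not say which ends correspond.
for old_n, new_n in [(4, 7), (5, 8)]:
    drop = relative_reduction(1 / old_n, 1 / new_n)
    print(f"1 in {old_n} -> 1 in {new_n}: {drop:.1%} fewer errors")
```

Under that pairing the reductions come out to about 43 percent and 37.5 percent, both consistent with the "over 30%" Rashid cites.
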
With improved listening skills, Microsoft was able to build better voice recognition software and then feed its output into a language translation engine -- Bing Translator, anyone? -- which breaks down the text, finds the properly translated equivalent in the new language, and, the hardest part, reorders the words to make them appropriate for the intended listener -- "an important step for correct translation between languages," Rashid noted.

After Rashid spoke several full sentences and ran them through the translator during his demonstration, the speakers in the auditorium pumped out the translated words in Mandarin Chinese, in a voice that had the same tone and vocal qualities as Rashid's own. Upon hearing the translation for the first time, the crowd in Tianjin erupted in several rounds of enormous applause. The translated voice may have been delivered by a machine, but it sounded just like the original speaker. Rashid effectively became his own personal translator.

"The results are still not perfect, and there is still much work to be done, but the technology is very promising, and we hope that in a few years we will have systems that can completely break down language barriers," Rashid wrote on Microsoft's blog. "In other words, we may not have to wait until the 22nd century for a usable equivalent of Star Trek’s universal translator, and we can also hope that as barriers to understanding language are removed, barriers to understanding each other might also be removed. The cheers from the crowd of 2000 mostly Chinese students, and the commentary that’s grown on China’s social media forums ever since, suggests a growing community of budding computer scientists who feel the same way."

