July 17, 2002
July 15, 2002
Multilingual Machines
"Gist" translations produced by commercially available software are only 70 to 80 percent accurate. Will statistical-analysis techniques improve that performance?
By Charles Choi



The Allies emerged from World War II victorious--only to face the Cold War immediately afterwards. Code breakers in Britain and the United States, buoyed by their computer-aided triumphs in that war, sought new breakthroughs by turning the processing power of machines not on codes, but on languages. The mathematical techniques that cracked secret Axis communiqués, went the logic, could prove invaluable in gleaning intelligence from countless reams of Russian science and news text.

It's more than 50 years later, and no foolproof Star Trek-style universal translator technology has yet materialized. The time is nevertheless ripe for such automated translation. The $5-billion-plus worldwide translation-services market is overburdened already, and demand is expected to grow to $7.6 billion by 2006 as the Internet becomes more pervasive.

In one of the latest efforts to crack the codes of language with machines, developers of a prototype translation technology hope to challenge the industry with a radically different technique. It essentially throws books in a blender to see how the comparable phrases in different languages stick back together again. Known as EliMT after its inventor Eli Abir of Fluent Machines in New York City, the statistical technique may prove key not only to making machine translation, or MT, more accurate, but also to quickly rendering translations for languages that are currently neglected by the corporate world.

"The EliMT method is clearly the most promising and theoretically important MT development in the past several years, and probably since the advent of MT itself," claims machine translation expert Jaime Carbonell of Carnegie Mellon University.

The Trouble With Translations

Free machine-translation services, such as those industry leader Systran provides through AltaVista's Babelfish or Google, allow for so-called gist translation: the output conveys the basic idea of a text, but with an error rate of 20 to 30 percent. For commercial applications, the extra time needed to polish out the inaccuracies in gist translations can prove costly. A professional human translator is paid some $20 per hour, and many are so busy that by the time they are available to take on a job, it may be too late for the result to be useful in the cutthroat realm of international finance.

Most commercial MT systems work much as a person at a library might when trying to translate a foreign language. First the systems analyze the unfamiliar text. Then they refer to the appropriate bilingual dictionaries and grammar guides. In a way, these "rule-based" schemes resemble how someone would read a coded text once that person knew the rules of the code.

However, scientists working under that assumption in the 1950s quickly realized that natural languages are far more complex than artificial codes. This is due in large part to the problem of how a word's meaning varies with context. The word "cool" used in reference to temperature, for instance, means something different when used by Fonzie. One apocryphal tale dating back to crude, early machine-translation attempts had the idiom "The spirit is willing but the flesh is weak" translated from English to Russian and back again, only to yield "The vodka is good but the meat is rotten."

While rule-based MT has improved substantially since then, it's not foolproof. It can take a team years to develop and debug the algorithms to translate between any two languages, and every language pair is a whole new endeavor--an English-to-Chinese system won't necessarily help translate Chinese to English or English to Swahili. Because some 20 to 30 languages are economically important, global finance requires roughly 400 to 800 language pairs, counting each direction separately. So far on Babelfish, only 19 language pairs are available, and other rule-based products do not offer many more options.
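The pair count can be checked with a quick calculation. The sketch below assumes each translation direction counts as its own pair (English-to-Chinese and Chinese-to-English are separate systems), so n languages yield n × (n − 1) directed pairs. The function name is illustrative and does not come from any MT product.

```python
def directed_language_pairs(num_languages: int) -> int:
    """Count ordered (source, target) pairs among num_languages languages.

    Assumes each direction needs its own system, so A-to-B and B-to-A
    are counted separately.
    """
    return num_languages * (num_languages - 1)


for n in (20, 30):
    print(n, "languages ->", directed_language_pairs(n), "directed pairs")
```

For 20 and 30 languages this gives 380 and 870 directed pairs, in line with the rough 400-to-800 figure.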

Statistics and Words

The EliMT technique follows a different strategy. Imagine a group of people going into a library, looking up the novel Crime and Punishment in the original Russian and then borrowing every English translation of Dostoevsky's work. If they compared how each sentence was translated, they could find statistically that certain phrases were often interpreted the same way. They could then stitch together a translation for a new sentence by recycling pieces of old translations, taking two halves of a sentence from different books. "Instead of translating from word to word, you're translating from sentence fragments to sentence fragments," says Steve Klein, Fluent Machines' chairman and CEO.
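The library analogy can be illustrated with a toy sketch. It is not the EliMT algorithm, whose details are not given here. It only counts how often each source fragment was rendered a given way in existing translations, then stitches a new sentence from the most common renderings. The French-English sample data, the function names, and the assumption that sentences arrive already split into aligned fragments are all hypothetical simplifications.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each entry pairs a source sentence with one
# published translation, both pre-split into aligned fragments. A real
# system would have to discover this alignment itself.
ALIGNED_TRANSLATIONS = [
    (["le jeune homme", "sortit de sa chambre"], ["the young man", "left his room"]),
    (["le jeune homme", "sortit de sa chambre"], ["the young man", "went out of his room"]),
    (["le jeune homme", "sortit de sa chambre"], ["the youth", "left his room"]),
    (["la vieille femme", "ouvrit la porte"], ["the old woman", "opened the door"]),
    (["la vieille femme", "ouvrit la porte"], ["the old woman", "opened the door"]),
]


def count_fragment_translations(aligned_corpus):
    """Tally how often each source fragment was rendered as each target fragment."""
    table = defaultdict(Counter)
    for source_fragments, target_fragments in aligned_corpus:
        for src, tgt in zip(source_fragments, target_fragments):
            table[src][tgt] += 1
    return table


def stitch_translation(source_fragments, table):
    """Translate fragment by fragment, using the most frequent known rendering.

    Fragments never seen in the corpus are left untranslated in brackets.
    """
    pieces = []
    for src in source_fragments:
        if src in table:
            pieces.append(table[src].most_common(1)[0][0])
        else:
            pieces.append(f"[{src}]")
    return " ".join(pieces)


table = count_fragment_translations(ALIGNED_TRANSLATIONS)
# A new sentence whose two halves come from different "books".
print(stitch_translation(["le jeune homme", "ouvrit la porte"], table))
```

The new sentence never appears in the corpus, yet each half has been translated before, so the sketch can reuse those pieces. Choosing the single most frequent rendering ignores context, which a practical system would need to handle.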



© 1996-2002 Scientific American, Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.