Computers Learn the Fine Art of Translation

Daniel Gildea

Just a few decades ago, anyone seeking a translation of a text written in a foreign language had to find a capable person to translate it.

Now, computer scientists and linguists have created automated translation programs that can roughly translate back and forth in many of the world’s major languages. And while those programs are still imperfect, they are steadily improving, thanks to the continuing work of researchers like associate professor of computer science Dan Gildea and his colleagues at Rochester.

Gildea works in the field known as machine translation, an area of natural language processing. Machine translation is a big challenge for computers because it requires not only knowledge of the languages being translated but also an understanding of idioms, double entendres, and often even pop culture.

Gildea generates algorithms that can translate from one language to another. These algorithms can be applied to any language, but he and his team have been concentrating on translating from Chinese into English.

China’s growing economic power and the increasing number of Chinese Internet users—currently about 450 million, or one and a half times the U.S. population—ensure a growing demand for such translations.

Translating from Chinese into English has some intrinsic challenges—whether it’s a machine or a human doing the translation. For example, verbs don’t have tenses in Chinese. So to understand whether the English translation of a verb should appear in past, present, future, or conditional, it is not enough to simply look at the verb in Chinese. The computer or the human doing the translation needs to find a word somewhere else in the sentence—such as “today,” “later,” or “yesterday”—that provides that information.
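A toy sketch can make this concrete. The snippet below (illustrative only, not Gildea's actual system) scans a glossed sentence for temporal cue words like "yesterday" or "tomorrow" to decide which English tense a tenseless verb should take; the cue-word table is a hypothetical example.

```python
# Hypothetical table mapping time expressions to English tenses.
TIME_CUES = {
    "yesterday": "past",
    "today": "present",
    "tomorrow": "future",
    "later": "future",
}

def guess_tense(words):
    """Return the tense suggested by a time expression in the sentence."""
    for w in words:
        if w in TIME_CUES:
            return TIME_CUES[w]
    return "present"  # fall back to a default when no cue is present

# The glossed sentence "I yesterday go store" carries its time
# information in "yesterday," not in the verb itself.
print(guess_tense(["I", "yesterday", "go", "store"]))  # past
```

Real systems, of course, must handle cues that are far subtler than a single adverb, which is part of what makes the problem hard.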

None of this is straightforward for a computer. The real challenge lies in applying powerful statistical techniques to create the algorithms a computer uses to translate. These algorithms are effectively the logic in the computer's "brain," and they must learn how to translate.

Just as approaches to teaching a foreign language have changed over time, Gildea explains, so have the models used for machine translation.

For example, when teaching a foreign language these days, teachers no longer require students to become proficient in the grammar before speaking to them in the foreign language. Similarly, recent approaches to machine translation do not require the computer to have “memorized” all grammar rules of the language in advance.

Instead, the computer learns by analyzing the same text in two languages. This is the model Gildea and his team use; it is also the approach taken by hundreds of other researchers and by programs like Google Translate.

The researchers repeat this process many, many times so the computer can start to recognize certain words, sentences, and grammatical constructions. Vast amounts of data in the form of translated news and websites that exist on the Internet are the perfect training material for these computer translators. The more texts that are fed to the machine, the more likely it is that similar constructions will be present in different texts for the computer to compare.
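The statistical core of this idea can be sketched in a few lines. The toy example below (an assumption for illustration, far simpler than real alignment models) counts how often word pairs co-occur across parallel sentence pairs; the correct translation pair accumulates counts across many sentences, which is why more training text helps.

```python
from collections import Counter

# A tiny invented parallel corpus of English/Chinese sentence pairs.
parallel = [
    (["dog", "runs"], ["狗", "跑"]),
    (["cat", "runs"], ["猫", "跑"]),
    (["dog", "sleeps"], ["狗", "睡"]),
]

# Count every cross-language word pairing within each sentence pair.
cooc = Counter()
for src, tgt in parallel:
    for s in src:
        for t in tgt:
            cooc[(s, t)] += 1

# "dog"/"狗" co-occur in two sentence pairs; a wrong pairing like
# "dog"/"跑" appears in only one, so the true pair stands out.
print(cooc[("dog", "狗")])  # 2
print(cooc[("dog", "跑")])  # 1
```

Production systems refine such raw counts iteratively (the classic IBM alignment models work this way), but the principle is the same: repetition across many texts separates signal from coincidence.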

It is the role of Gildea and other researchers to develop algorithms that extract information from these texts by observing patterns. The patterns are stored in hierarchical structures called "semantic trees," ranging from general sentence structures down to specific word endings. For example, a phrase that contains "taller than" or "quicker than" will always require an object, such as "him," "her," "Peter," or "the dog." The next time a translation into English requires this phrasing, the algorithm will know it needs to find the appropriate word to fill in the gap.
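A minimal sketch of such a pattern, in the spirit of the article's "taller than" example: the learned rule fixes the comparative construction and leaves an open slot for the object. (The template notation here is a hypothetical simplification; real systems learn many thousands of such rules automatically.)

```python
def apply_rule(template, obj):
    """Fill the open slot of a learned phrase template."""
    return template.replace("[X]", obj)

# A rule with a gap: the construction is fixed, the object varies.
rule = "taller than [X]"
for obj in ["him", "her", "Peter", "the dog"]:
    print(apply_rule(rule, obj))  # e.g. "taller than him"
```

The same slot-filling idea scales up: more general rules nest smaller ones inside them, which is what gives the structure its hierarchical, tree-like shape.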
