Barker Lecture

Posters and Telescopes: an Introduction to Translation

Translation is difficult, even for people. To begin with, you have to know two languages intimately. And even if you speak two or more languages fluently, it is not a trivial matter to produce a good translation. When people start talking about the possibility of a computer replacing a human translator, someone will often bring up a sentence similar to the following:

Time flies like an arrow.

The person who brought it up will typically conclude by asserting that this sentence is an obvious example of a sentence that a computer could not translate. As a matter of fact, a computer could handle this sentence if it were programmed to handle just this sentence. The problem is getting a computer to deal adequately with sentences it has not been specifically programmed to handle. I will partially analyze this sentence and then give other superficially similar sentences that cannot all be translated in a parallel fashion.

This sample sentence about time flying is a figure of speech that combines a metaphor and a simile. Time does not really fly in the literal sense of a bird flying, but here we metaphorically say it does. Another example of a metaphor would be when we say that a ship ploughs the water, since there is no real plough and no dirt involved. The simile in this expression is the comparison of the metaphorical flight of time with the flight path of an arrow.

Now consider the following sentence, which is a rather dumb-sounding figure of speech modeled on the first one:

Fruit flies like a baseball.

Not all fruit, when thrown, would fly through the air like a baseball, except perhaps an apple, orange, or peach. But wait a minute. Suppose you substitute 'peach' for 'baseball' in the second sentence. All of a sudden, there is a new meaning. This time everything is literal. The 'fruit flies' are pesky little insects you can see crawling around on a juicy peach, having a feast. The 'peach' version of the sentence would be translated very differently from the 'arrow' version or the 'baseball' version.

The point of these sentences for human versus computer translation is that a human translator would know to handle the variation "Fruit flies like a peach" very differently from the baseball version, while a computer would probably not even notice the difference and therefore could never replace a human translator. Why wouldn't a computer notice the difference? We will explore differences between humans and computers throughout this paper.

These sentences do show how words can shift in their usage. The word 'flies' shifts from signifying an action to signifying an insect, and in most languages it cannot be translated the same way in both usages. But we do not need anything nearly so exotic as these sentences in order to show that translation is full of pitfalls. Let me give you an example of a human translation of a simple poster, a translation that did not turn out very well.

This summer I attended a conference in Luxembourg and noticed in the train station a poster announcing a coming event. The announcement was in French with an English translation. I will refer to this announcement as the poster example. The English translation of the date and time of the event read as follows:

Saturday the 24 June 1995 to 17 o'clock

Obviously, there are a number of problems in this translation. In English we say "the 24th of June, 1995" or "June 24, 1995," rather than "the 24 June 1995." Also, we say "5 o'clock" or "5 p.m.," because in the United States we divide the day into two 12-hour periods, rather than one 24-hour period, except in the military. In England, the use of a 24-hour clock is more common but even there one would not say "17 o'clock." Perhaps the most puzzling error in this translation is the use of the word 'to'. At first glance, one would assume that the word 'at' was intended, so that the translation becomes, after all our changes:

Saturday, the 24th of June at 5 p.m.

However, an examination of the French shows that this is incorrect. The French original used the word vers, which can mean either 'in the direction of' (as in a movement toward an object or to the left) or 'at an approximate time' (as in a promise to drop off a package around noon). Clearly, the second reading of vers is more likely here. Whoever translated the French probably used a French-to-English dictionary and just picked the first translation listed under the word vers, without thinking about whether it would work in this context. In the case of this poster, the translator did not have a sufficient knowledge of both languages, and the translation turned out not only awkward but just plain wrong.

This example of bad human translation is interesting because it was most likely done by a human yet in a manner similar to the way computers translate. (By the way, the conference I was attending in Luxembourg, where I saw the poster, was the fifth world summit on computer translation, which is usually called Machine Translation, hence the conference title: Machine Translation Summit V.)

Computers do not really think about what they are doing. They just mechanically pick a translation for each word of the source text, that is, the text being translated, without understanding what they are translating and without considering the context. An examination of the source text for our poster example will illustrate this.

French source text: le samedi 24 juin 1995 vers 17h00

Poster translation: Saturday the 24 June 1995 to 17 o'clock

Better translation: Saturday, the 24th of June, 1995, around 5 p.m.
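The mechanical procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any real translation system: the tiny dictionary TOY_DICTIONARY and the function word_for_word are hypothetical, and the order of the glosses listed under vers is assumed for the sake of the example.

```python
# Hypothetical toy dictionary: each French word maps to candidate English
# glosses, in the order a bilingual dictionary might happen to list them.
TOY_DICTIONARY = {
    "le": ["the"],
    "samedi": ["Saturday"],
    "juin": ["June"],
    "vers": ["to", "around"],   # assumed ordering: 'to' is listed first
    "17h00": ["17 o'clock"],
}


def word_for_word(source: str) -> str:
    """Replace each word with its first-listed gloss.

    Words not found in the dictionary (here, the numbers) pass through
    unchanged. No context is consulted at any point.
    """
    return " ".join(TOY_DICTIONARY.get(word, [word])[0] for word in source.split())


print(word_for_word("le samedi 24 juin 1995 vers 17h00"))
# prints: the Saturday 24 June 1995 to 17 o'clock
```

Apart from the position of the article, the output is the same as the translation on the poster.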

To give credit where it is due, the translator apparently knew enough about English dates to reposition the translation of the French article le to the other side of 'Saturday.' Other than the re-ordering of the article, the translation on the poster could be obtained using a simple word-for-word substitution technique by either a person or a computer looking up words in a dictionary. No real knowledge of either language would be required. Thus, people can easily translate like computers, that is, mechanically, usually with rather disappointing results. However, the opposite is not true. Computers cannot, in general, translate like people, at least not like people who know both languages and are skilled translators. I have analyzed a poor quality human translation and provided an improved human translation. We will now look at a real-life example of machine translation. I will refer to it as the telescope example.

Last year I was at another conference on machine translation, this one being held at Cranfield University in England. There were several major companies in the exhibit area demonstrating their commercial machine translation systems. On the way, I had picked up a French magazine similar to the American magazine Air and Space, and at the conference I fed a sentence from the magazine into one of the machine translation systems. Below is the French sentence that went in, followed by the English translation that came out of the computer.

French source sentence: L'atmosphère de la Terre rend un peu myopes mêmes les meilleurs de leur téléscopes.

English machine translation: The atmosphere of the Earth returns a little myopes same the best ones of their telescopes.

Even without knowing French, one can see that the English translation is basically the result of a word-for-word substitution. In the poster example, the translation was awkward and somewhat misleading. This translation is perhaps even worse: it is practically incomprehensible. The context of the source text is an article from a French magazine discussing the problem of turbulence in the atmosphere. The magazine is addressed to a general audience rather than to professional astronomers. One possible human translation would be the following:

The earth's atmosphere makes even the best of their telescopes a little "near-sighted" (in the sense that distant objects are slightly blurred).

There are obviously a number of problems in the machine translation. These problems stem from the ambiguity of word meanings. For example, the French verb rend can be translated as 'return' or 'make,' depending on the context. The French word même can be translated as 'same' or 'even,' again depending on the context. In both cases, the computer mechanically chose a translation and in both cases the poor thing got it dead wrong. It is hard to tell whether the computer couldn't find the word myopes in its dictionary and just passed it through unchanged or whether it found it and translated it inappropriately for the audience. A 'myope' is a technical term in English for someone who is myopic, that is, near-sighted. However, this is the wrong level of language to use in a publication intended for a general audience. Computers have no sense of audience; they just blindly follow rules. Another machine translation system, when given the same French sentence, did better in some ways but made other mistakes:

The atmosphere of the Earth renders a same myopic bit best of their telescopes.
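The ambiguity behind these errors can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a model of either system: the table CANDIDATES and the function gloss_combinations are hypothetical, and only the two ambiguous words discussed above are given more than one gloss.

```python
from itertools import product

# Hypothetical candidate glosses for the two ambiguous French words
# discussed in the text. The order of the glosses is an assumption.
CANDIDATES = {
    "rend": ["returns", "makes"],
    "même": ["same", "even"],
}


def gloss_combinations(words):
    """Return every way of choosing exactly one gloss per word.

    Words without an entry in CANDIDATES are treated as having a single
    option: themselves.
    """
    options = [CANDIDATES.get(word, [word]) for word in words]
    return [dict(zip(words, choice)) for choice in product(*options)]


combos = gloss_combinations(["rend", "même"])
for combo in combos:
    print(combo)
```

Two words with two glosses each already yield four combinations, and nothing in the lookup itself indicates which one fits the context. A system that always takes the first-listed gloss ends up with 'returns' and 'same', the pair seen in the first machine translation above.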

Professional human translators seldom, if ever, make errors like the ones we have seen in the poster example and the telescope example. Nevertheless, humans with nothing but a dictionary in hand can choose to stoop to the level of computers. In contrast, computers have not risen to the level of professional human translators. Why not? Why can't a computer translate more like a person?

It is interesting to observe how various persons who have not worked on machine translation react to the title question of this paper. Some believe that there is no fundamental difference between humans and machines. They assume that the quality of machine translation will someday rival the quality of human translation in all respects. They point out that computers can do arithmetic much faster and more accurately than people. Then they remind us that math is harder than language for many students. Furthermore, they take it as obvious that the human brain is ultimately a type of computer. From this basis, they conclude that it is just a matter of time until we have a new kind of computer that will function like the brain, only faster and better, and will surpass the capabilities of humans in the area of language processing. Others take a contrary position. They believe that humans and computers are so entirely different in the way they work inside that computers will never approach the capabilities of human translators. Still others are puzzled by the question. They were under the impression that the problem of machine translation was solved years ago.

The fact of the matter is that machine translation is a problem that is far from solved. Experts in the field agree that computers do not yet translate like people. On some texts, particularly highly technical texts treating a very narrow topic in a rather dry and monotonous style, computers sometimes do quite well. (In the annex to this paper, I give a sentence of English and its computer-generated translation that was offered by a vendor as part of a showcase example of machine translation.) But with other texts, particularly with texts that are more general and more interesting to humans, computers are very likely to produce atrocious results. Professional human translators, on the other hand, can produce good translations of many kinds of text. People can handle a range of text types; computers cannot. Where the experts disagree is on the question of why computers are so limited in their ability to translate. I will present an answer to this controversial question, but only at the end of this paper. I will build up to it in the following stages:

First, I will give a few more examples of why translation is difficult.

Secondly, I will very briefly describe the mainstream approach to characterizing human language and point out how it fails to address the difficulties presented in the first stage.

Thirdly, I will discuss a key factor that is missing in current theories of human language, a factor that I believe will be needed in computers for them to be able to translate more like people.
