Translation, Theory and Technology



Copyright 2001
Translation Research Group
Send comments to comments@ttt.org
Last updated: April 12, 2001

Machine Translation

Ambiguity

What makes machine translation so difficult? Part of the problem is that language is highly ambiguous when looked at as individual words. For example, consider the word "cut" without knowing what sentence the word came from. It could have been any of the following sentences:

- He told me to cut off a piece of cheese.
- The child cut out a bad spot from the apple.
- My son cut out early from school again.
- The old man cut in line without knowing it.
- The cut became infected because it was not bandaged.
- Cut it out! You're driving me crazy.
- His cut of the profit was too small to pay the rent.
- Why can't you cut me some slack?
- I wish you could be serious, and not cut up all the time.
- She was unwilling to take a cut in pay.
- His receiver made the cut much sooner than the quarterback expected.
- Hardly anyone made the cut for the basketball team.
- If you give me a cut like that, I'll have your barber's license revoked.
- Lousy driver! Look before you cut me off like that!
- The cut of a diamond is a major determiner of its value.

If a computer (or a human) is only allowed to see the word "cut" and the rest of the sentence is covered up [Figure 2: 19k GIF], it is impossible to know which meaning of "cut" is intended [Figure 3: 24k GIF]. This may not matter if everything stays in English, but when the sentence is translated into another language, it is unlikely that the various meanings of "cut" will all be translated the same way. We call this property of languages "asymmetry".

We will illustrate an asymmetry between English and French with the word "bank." The principal translation of the French word banque (a financial institution) is the English word "bank." If banque and "bank" were symmetrical, then "bank" would always translate back into French as banque. However, this is not the case. "Bank" can also translate into French as rive, when it refers to the edge of a river. [Figure 4: 66k GIF] Now you may object that this is unfair because the meaning of "bank" was allowed to shift. But a computer does not deal with meaning; it deals with sequences of letters. Both meanings, the financial institution and the edge of a river, are spelled with the same four letters in English, even though they are different words in French. Thus English and French are asymmetrical.
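The asymmetry can be made concrete with a toy bilingual lexicon. The following is a minimal sketch, not a description of any real translation system: the two dictionaries and the function names `round_trip` and `is_symmetric` are invented for illustration, and the lexicon holds only the word pairs discussed above.

```python
# Toy lexicons holding only the word pairs discussed above
# (an illustrative assumption, not real dictionary data).
FR_TO_EN = {"banque": ["bank"], "rive": ["bank"]}
EN_TO_FR = {"bank": ["banque", "rive"]}


def round_trip(french_word):
    """Translate a French word into English and back, returning every
    French candidate that a purely letter-based lookup would produce."""
    candidates = []
    for english in FR_TO_EN.get(french_word, []):
        for french in EN_TO_FR.get(english, []):
            if french not in candidates:
                candidates.append(french)
    return candidates


def is_symmetric(french_word):
    """A word is symmetric here only if the round trip yields itself alone."""
    return round_trip(french_word) == [french_word]


print(round_trip("banque"))    # ['banque', 'rive']
print(is_symmetric("banque"))  # False
```

Because the lookup sees only the letters b-a-n-k, it has no basis for choosing between the two French candidates.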

Early researchers in machine translation (in the late 1940s and early 1950s) were already aware of the problem of asymmetry between languages, but they seriously underestimated the difficulty of overcoming it. They assumed that by giving the computer access to a few words of context on either side of the word in question, the computer could figure out which meaning was intended and then translate it properly. By about 1960, some researchers had realized that even if the entire sentence is available, it is still not always obvious how to translate without using knowledge about the real world. A classic sentence that illustrates this difficulty uses the word "pen," which can refer either to a writing instrument or to an enclosure in which a child is placed to play so that it will not crawl off into another room. The ambiguity must be resolved or the word "pen" will probably be translated incorrectly.

- The pen was in the box.

This sentence will typically be interpreted by a human as referring to a writing instrument inside a cardboard box [Figure 5: 52k GIF], such as a gift box for a nice fountain pen or gold-plated ballpoint pen, rather than a play pen in a big box. However, look what happens if the sentence is rearranged as follows:

- The box was in the pen.

This sentence will typically be interpreted by a human as referring to a normal-size cardboard box inside a child's play pen [Figure 6: 81k GIF] rather than as a tiny box inside a writing instrument. A human uses knowledge about typical and relative sizes of objects in the real world to interpret sentences. For a human, this process is nearly effortless and usually unconscious. For a computer that does not have access to real-world knowledge, this process is impossible.
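The role of relative size can be sketched in a few lines. This is an illustrative toy only: the size table, its numbers, the sense labels, and the function name `resolve_pen` are all invented for this example, and the other object is assumed to be an ordinary cardboard box. It picks the first sense of "pen" for which the contained object is smaller than its container.

```python
# Rough typical sizes in centimetres (invented values for illustration only).
TYPICAL_SIZE_CM = {
    "writing instrument": 15,
    "play pen": 100,
    "cardboard box": 40,
}

PEN_SENSES = ["writing instrument", "play pen"]


def resolve_pen(contained, container):
    """Choose the sense of 'pen' for "The <contained> was in the <container>".

    Exactly one of the two arguments is the ambiguous word 'pen'; the other
    is assumed to be an ordinary cardboard box.
    """
    box = TYPICAL_SIZE_CM["cardboard box"]
    for sense in PEN_SENSES:
        pen = TYPICAL_SIZE_CM[sense]
        inner, outer = (pen, box) if contained == "pen" else (box, pen)
        if inner < outer:  # a thing fits only inside something larger
            return sense
    return None  # no sense is plausible under these assumptions


print(resolve_pen("pen", "box"))  # writing instrument
print(resolve_pen("box", "pen"))  # play pen
```

Both sentences contain exactly the same words, so the choice here comes entirely from the size table, which stands in for a very small piece of real-world knowledge.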

Humans also take the situation into account. Returning to the sentence about the pen in the box, there are texts, such as a description of a family with small children moving their belongings to another apartment, in which a human would interpret the pen as the child's play pen being put into a large box to protect it while it is moved to a new location. And there are texts, such as a spy story about ultra-miniature boxes of top secret information, in which the sentence about the box in the pen would be interpreted as referring to a writing instrument containing a tiny box. The words in these sentences do not change, yet the interpretation changes. Here even real-world knowledge is insufficient. Some sense of the flow of discourse and of the current situation is needed.
