Translation, Theory and Technology



Copyright 2001
Translation Research Group
Send comments to comments@ttt.org
Last updated:
April 12, 2001

Machine Translation

The Translation Tripod

A translation project can be thought of as sitting on a tripod whose three legs are the source text, the specifications, and the terminology. If any of the three legs is removed, the project falls down. [Figure 1: 16k GIF]
  1. Source text

    Obviously, no translation can be done without a source text (i.e., the document to be translated). But for machine translation, an additional basic requirement is that the source text be available in machine-readable form. That is, it must come on diskette or cartridge or tape or by modem and end up as a text file on your disk. A fax of the source text is not considered to be in machine-readable form, even if it is in a computer file. A fax in a computer file is only a graphical image of the text, and the computer does not know which dots compose the letter a or the letter b. Conversion of a source text on paper or in a graphical image file to machine-readable form using imaged character recognition (ICR) is not usually accurate enough to be used without human editing, and human editing is expensive, adding an unacceptable cost component to the total cost of machine translation. Thus, for machine translation to be appropriate, it is usually necessary to obtain the word processing or desktop publishing file from the organization that created the source text. But this is only one of many requirements.

  2. Specifications

    All translation projects have specifications. The problem is that they are seldom written down. Specifications tell how the source text is to be translated. One specification that is always given is what language to translate into. But that is insufficient. Should the format of the target text (i.e., the translation) be the same as that of the source text or different? Who is the intended audience for the target text? Does the level of language need to be adjusted? In technical translation, perhaps the most important specification is what equivalents to use for technical terms. Are there other target texts with which this translation should be consistent? What is the purpose of the translation? If the purpose is just to get a general idea of the content of the source text, then the specifications would include "indicative translation only." An indicative translation is usually for the benefit of one person rather than for publication and need not be a high-quality translation. Thus, publication-quality translations are high-quality translations (and are usually the result of human translation), while indicative translations are low-quality translations (and are usually the result of machine translation). These two types of translation are not normally in competition with each other, since a requester of translation will typically want one type or the other for a given document and a given set of specifications. Sometimes, the two types are complementary, such as when an indicative translation is used to decide whether or not to request a high-quality translation of a particular document. In this environment, an indicative translation may be requested for a number of documents, and, using the indicative translations, the requester may select one or two documents for publication-quality translation.

    As previously mentioned, indicative translations are usually done using machine translation and high-quality translations are usually done using human translation. This fact reveals a basic difference between humans and computers. Humans, with proper study and practice, are good at producing high-quality translations but typically can only translate a few hundred words an hour to approximately a thousand words an hour, depending on such factors as the difficulty of the source text. Even with very familiar material, human translators are limited by how fast they can type or dictate their translations. Computers are good at producing low-quality translations very quickly. Some machine translation systems can translate tens of thousands of words an hour. But as they are "trained" by adding to their dictionaries and grammars, they reach a plateau where the quality of the output does not improve. By upgrading to a more powerful computer, the speed of translation improves but not the quality. By upgrading to a "more powerful" human translator, the quality of translation improves but not necessarily the speed. Here we have a classic case of a trade-off. You can have high speed or high quality but not both.

    Indicative translation (high speed, low cost, but low quality) represents a new and growing market but does not substantially overlap with the existing market for publication-quality translation. The existing market, variously estimated at 10 to 20 billion US dollars worldwide per year, is primarily for high-quality technical translation. If, on the one hand, your specifications include low-quality (barely understandable) translation, then machine translation is for you, and you can stop reading right here. If, on the more likely hand, your specifications include high-quality translation, then it is not obvious that machine translation is appropriate for your current translation job. Here quality would be measured by whether the target text is grammatical, accurate, understandable, readable, and usable. Usability can be measured by selecting tasks, such as maintenance operations, which can be accomplished by a source-language reader with the help of the source text and seeing whether those same tasks can be performed by a target-language reader with the help of the target text. Such measurements are notoriously expensive, but a skilled reviewer can accurately predict usability simply by studying the source and target texts. Grammaticality, understandability, and readability, which are progressively more stringent requirements, can be measured by a target-language monolingual person. But accuracy requires the assistance of a skilled bilingual person who examines both the source and target texts.

  3. Terminology

    The treatment of terminology could have been included solely under specifications. But terminology is so important that the actual terminological database (also called a "termbase") supplied with a source text has been listed as a third essential component of a translation job. The aspect of terminology that does fit under specifications is the requirement that the translation job use a certain termbase in order to achieve the desired consistency. Let me explain what I mean by consistency. Translation requesters typically want the terminology in their translated documents to mesh closely with terminology in related documents. For example, a software company will want all revisions of a software manual to use the same terms as the original, to avoid confusing readers. Translation requesters should track all terminology relevant to a given document and deliver that terminology to the translation provider along with specifications and source text. The specification component of the job tells which termbase to use and tells what to do if, as is all too common, a source-text term is missing from the termbase. The terminology component of the job contains the termbase itself.
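
    The division of labor between the termbase and the specifications can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the termbase is modeled as a plain dictionary, and the names `TERMBASE`, `lookup_term`, `translate_terms`, and `on_missing`, along with the sample English-French entries and the three missing-term policies, are hypothetical and do not come from any real termbase format.

```python
# Hypothetical termbase: source-language term -> approved target-language term.
# The entries are invented for illustration only.
TERMBASE = {
    "hard disk": "disque dur",
    "file": "fichier",
}


def lookup_term(term, termbase, on_missing="flag"):
    """Return the approved equivalent for a source term.

    on_missing stands in for the specification's rule for terms that are
    absent from the termbase (an assumed, simplified set of policies):
      "flag" -> mark the term so that someone can review it later
      "keep" -> leave the source term untranslated
      any other value -> stop by raising KeyError
    """
    key = term.lower()
    if key in termbase:
        return termbase[key]
    if on_missing == "flag":
        return f"[[MISSING: {term}]]"
    if on_missing == "keep":
        return term
    raise KeyError(f"term not in termbase: {term}")


def translate_terms(terms, termbase, on_missing="flag"):
    """Look up every term with the same policy so the output stays consistent."""
    return [lookup_term(t, termbase, on_missing) for t in terms]
```

    Keeping the policy as a parameter rather than hard-coding it mirrors the point above: the termbase holds the terms, while the specifications say what to do when a term is not there.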

Now we can define an appropriate translation job (for a human or for a computer) as one that sits on a stable tripod. It must include a source text (in machine-readable form if for machine translation); it must include well-defined specifications; and it must include any specified termbase. In addition, we can define an appropriate translation as a translation that combines the source text and the termbase in a way that matches the specifications. Note that I said "appropriate" translation, not "good" translation. A poor (low-quality) translation may be appropriate if the specifications include a requirement for a fast, indicative translation.
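
The tripod test can be written down as a small check. The sketch below is one possible reading of the definition, not an established procedure: the `TranslationJob` class, its field names, the `termbase_required` key, and the `missing_legs` function are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationJob:
    """Hypothetical container for the three legs of the tripod."""
    source_text: str = ""
    machine_readable: bool = False  # e.g. False for a fax image of the text
    specifications: dict = field(default_factory=dict)
    termbase: dict = field(default_factory=dict)


def missing_legs(job, for_machine=False):
    """Return the legs that are missing; an empty list means the tripod stands."""
    problems = []
    if not job.source_text:
        problems.append("source text")
    elif for_machine and not job.machine_readable:
        # Machine translation also needs the text in machine-readable form.
        problems.append("machine-readable source text")
    if not job.specifications:
        problems.append("specifications")
    # A termbase is required only when the specifications call for one.
    if job.specifications.get("termbase_required") and not job.termbase:
        problems.append("termbase")
    return problems


# Usage: a faxed manual with written specifications but no termbase supplied.
faxed_job = TranslationJob(
    source_text="Installation manual",
    machine_readable=False,
    specifications={"target_language": "fr", "termbase_required": True},
)
print(missing_legs(faxed_job, for_machine=True))
```

Note that the check says nothing about quality. In keeping with the distinction between "appropriate" and "good," a job whose specifications ask only for a fast, indicative translation passes as long as all three legs are present.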

