ISO 12620 data categories
Copyright © 2000
Translation Research Group
Send comments to firstname.lastname@example.org
Last updated: January 27, 2001
Introduction to ISO 12200 (negotiated MARTIF)
MARTIF is a format to facilitate
the interchange of terminological data among terminology management systems.
This format is the result of several years of intense international collaboration
among terminologists and database experts from various organizations,
including academic institutions, the Text Encoding Initiative (TEI), and
the Localisation Industry Standards Association (LISA). MARTIF, also known
as ISO 12200, is associated with ISO Technical Committee 37.
The stated purpose of MARTIF is to establish:
...a universally applicable format...for the negotiated interchange of structured terminological data among various applications, system environments, and hardware platforms. It is primarily designed for use with terminological data that can be stored, read, retrieved, and manipulated by a computer. (ISO DIS 12200.2:1)
To achieve the platform independence and flexibility that are central to its purpose, MARTIF embeds terminological data in a data category structure formally defined in SGML (Standard Generalized Markup Language) (ISO 8879:1986), a declarative metalanguage for describing markup. The data category structure for MARTIF is found in an SGML construct called a DTD (Document Type Definition). A DTD enumerates the logical pieces of data that are permissible in a document of a given type and how those data items may be combined. DTDs can be written for many document types. One familiar example is the DTD that defines HTML (HyperText Markup Language), the document type used on the World Wide Web.
Before examining the MARTIF DTD, which is fairly complex, consider how a DTD could be used to define the structure of a recipe such as we might find in an ordinary cookbook.
If you are already familiar with DTDs, please skip to the section headed "The MARTIF DTD".
A simple DTD for a recipe might
specify that each Recipe must consist of a RecipeTitle, followed by an
IngredientList (which consists of one or more IngredientItems), followed
by an InstructionStatement. The formal SGML expression of this logical
structure in a DTD might look like this (Figure 1):
Figure 1: A simple recipe DTD
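The figure itself is not reproduced here. As a minimal sketch, the structure just described could be declared as shown below. The (#PCDATA) content models for the leaf elements are an assumption, since the description does not say what those elements contain.

```sgml
<!-- Sketch of a recipe DTD; leaf content models are assumed to be plain text -->
<!ELEMENT Recipe               - - (RecipeTitle, IngredientList, InstructionStatement)>
<!ELEMENT RecipeTitle          - - (#PCDATA)>
<!ELEMENT IngredientList       - - (IngredientItem+)>
<!ELEMENT IngredientItem       - - (#PCDATA)>
<!ELEMENT InstructionStatement - - (#PCDATA)>
```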
In this example, each part
of a recipe is introduced by the ELEMENT key word. The name of the element
is then given, followed by a declaration of how that element must be delimited
in the running text of the document. For all the recipe elements, this
delimiter specification consists of two hyphens, which means that both
the beginning and end of each element must be explicitly tagged. Document
elements can be defined to allow an implicit start or end (e.g., the IngredientList
might be allowed to end implicitly when the InstructionStatement starts),
in which case either the first or the second hyphen would be replaced
by an 'o' (for "omissible"). To tag the start of an element,
the element's name appears in the running text of the document, enclosed
in angle brackets (e.g., <Recipe>); an end tag adds a slash before the name (e.g., </Recipe>).
Following the specification of delimiters for each element, a content model details the internal structure of each element. Content may be any combination of other defined elements or atomic data types (such as #PCDATA, which is parsed character data), joined by operators such as the comma [,] (forces the specified sequence), the plus [+] (1 or more), the star [*] (0 or more), the vertical bar [|] (logical OR), and the question mark [?] (0 or 1). Other SGML operators are used, but are not vital to our discussion here.
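To make the delimiter and operator notation concrete, here is a hedged variant of the recipe declarations. Yield, Note, and Tip are hypothetical elements invented for this sketch, and omitting end tags assumes that tag omission is enabled (OMITTAG YES) in the SGML declaration in use.

```sgml
<!-- Hypothetical variant of the recipe DTD. Yield, Note, and Tip are
     invented here purely to illustrate the operators. -->
<!ELEMENT Recipe               - - (RecipeTitle, Yield?, IngredientList,
                                    InstructionStatement, (Note | Tip)*)>
<!ELEMENT RecipeTitle          - - (#PCDATA)>
<!ELEMENT Yield                - - (#PCDATA)>
<!-- "- O": the start tag is required, the end tag may be omitted -->
<!ELEMENT IngredientList       - O (IngredientItem+)>
<!ELEMENT IngredientItem       - O (#PCDATA)>
<!ELEMENT InstructionStatement - - (#PCDATA)>
<!ELEMENT Note                 - - (#PCDATA)>
<!ELEMENT Tip                  - - (#PCDATA)>
```

Read left to right, the Recipe content model requires a title, allows at most one Yield, requires the ingredient list and the instructions in that order, and then permits any number of Note or Tip elements in any mix.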
Once defined, a DTD serves as a kind of blueprint for the construction of its named document type. We might use the sample recipe DTD to construct the following SGML document from a recipe on a Campbell's Soup can (Figure 2):
Figure 2: A recipe represented as an SGML document
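The original figure is not reproduced here. A document instance that follows the recipe structure described earlier might look like the sketch below. The recipe text is placeholder content rather than the recipe from the original figure, and the file name recipe.dtd is hypothetical.

```sgml
<!DOCTYPE Recipe SYSTEM "recipe.dtd">
<Recipe>
  <RecipeTitle>Placeholder Casserole</RecipeTitle>
  <IngredientList>
    <IngredientItem>1 can condensed soup</IngredientItem>
    <IngredientItem>2 cups cooked rice</IngredientItem>
  </IngredientList>
  <InstructionStatement>Combine the ingredients and bake until hot.</InstructionStatement>
</Recipe>
```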
One important advantage of using SGML to define a document type is that associated documents can be compared to the DTD by SGML parser programs to validate the internal structure of the documents. In this way many kinds of integrity and format constraints can be strictly enforced, which allows a high degree of standardization within a document family.
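As an illustration of what validation catches, consider the following placeholder instance. It assumes a hypothetical recipe.dtd that requires RecipeTitle, IngredientList, and InstructionStatement in that order, with explicit start and end tags. A validating parser would be expected to report an error, because the mandatory IngredientList is absent.

```sgml
<!DOCTYPE Recipe SYSTEM "recipe.dtd">
<Recipe>
  <RecipeTitle>Placeholder Casserole</RecipeTitle>
  <!-- IngredientList is missing, so the required sequence is violated -->
  <InstructionStatement>Combine the ingredients and bake until hot.</InstructionStatement>
</Recipe>
```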
In practice, most DTDs are
considerably more complex than this one, but the example gives a general
idea of how structure is represented. For a more thorough discussion of
DTD syntax, consult standard reference works such as ISO 8879:1986 (the
formal definition of SGML), The SGML Handbook (Goldfarb 1990), and Practical
SGML (van Herwijnen 1991).
The MARTIF DTD
MARTIF is an SGML document type with an abstract structure that is specified in a DTD similar to the one shown above.
Figure 3 shows the core section of the MARTIF DTD.
Figure 3: Core of the MARTIF DTD
This DTD shows several new features that our earlier recipe example did not have. The first line declares an ENTITY, which is essentially a nickname or shorthand that can be used for convenience later in the DTD. In this case the name of the ENTITY is AuxInfo, and it expands into the combination of elements shown at its right. Each element is also associated with an ATTLIST, which enumerates the attributes that the element can have. For example, a tig (term information group) element can have two attributes: id (a unique identifier), which may remain implicit, and lang (language of the element), which must be explicit or inherited from an enclosing element. Some of the ATTLISTs reference an ENTITY defined separately called a.global. When expanded, a.global adds an implicit lang and an implicit id to the ATTLIST of an element. Other entities are referenced in several places (e.g., nText in the content definition for termNote), but for the purposes of our discussion these can be considered to expand to plain text.
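The mechanics of ENTITY and ATTLIST declarations can be sketched on the recipe example. This fragment is an illustration only and is not copied from the MARTIF DTD. The name a.global and its two implied attributes come from the description above, while the attribute types (ID, CDATA), the Annotation entity, and the Note and Tip elements are assumptions made for this sketch.

```sgml
<!-- Illustration only: NOT the MARTIF declarations. Attribute types and the
     Annotation entity are assumptions made for this sketch. -->
<!ENTITY % a.global   'id    ID     #IMPLIED
                       lang  CDATA  #IMPLIED'>
<!ENTITY % Annotation '(Note | Tip)*'>

<!-- %Annotation; expands in place to (Note | Tip)* -->
<!ELEMENT Recipe - - (RecipeTitle, IngredientList, InstructionStatement, %Annotation;)>
<!-- %a.global; expands to the optional id and lang attributes -->
<!ATTLIST Recipe %a.global;>

<!ELEMENT RecipeTitle - - (#PCDATA)>
<!-- id may be left out; lang must be given explicitly on this element -->
<!ATTLIST RecipeTitle id    ID     #IMPLIED
                      lang  CDATA  #REQUIRED>
```

Declaring shared attributes once in a parameter entity keeps the individual ATTLIST declarations short, and a change to the entity propagates to every element that references it.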
Figure 4 contains a graphical representation of the information in the MARTIF DTD. It can be roughly approximated in text as follows. A MARTIF instance consists of:
Figure 4: A graphical representation of the MARTIF DTD
Strengths of MARTIF
Importantly, MARTIF allows the representation of virtually all of the data categories present in actual terminology management systems around the world (including categories from previous interchange formats, such as MicroMATER and NTRF). That is, data can be transferred from a native environment to MARTIF with almost no information loss. The data categories used by MARTIF are taken from ISO DIS 12620.
The scope of MARTIF for interchange
MARTIF's designers purposely chose flexibility over uniformity. The advantage of this flexibility is that MARTIF can be implemented without requiring major structural changes to existing terminology management systems, while still providing a somewhat predictable channel to achieve its goal of negotiated interchange.
However, some of MARTIF's flexibility comes from allowing multiple representations for the same data, leaving certain pieces of data system-defined, or forgoing the enforcement of logical integrity constraints. It is anticipated that various optional restrictions on and subsets of MARTIF will be defined for specific environments.