Copyright © 2000
Translation Research Group
Send comments to comments@ttt.org
Last updated: January 27, 2001

The CLS Framework: Negotiated Sharing


Introduction to ISO 12200 (negotiated MARTIF)

MARTIF is a format that facilitates the interchange of terminological data among terminology management systems. The format is the result of several years of intense international collaboration among terminologists and database experts from various organizations, including academic institutions, the Text Encoding Initiative (TEI), and the Localisation Industry Standards Association (LISA). MARTIF, also known as ISO 12200, is associated with ISO Technical Committee 37.

The identified purpose of MARTIF is to establish:

...a universally applicable format...for the negotiated interchange of structured terminological data among various applications, system environments, and hardware platforms. It is primarily designed for use with terminological data that can be stored, read, retrieved, and manipulated by a computer. (ISO DIS 12200.2:1)

To achieve the platform independence and flexibility that are central to its purpose, MARTIF embeds terminological data in a data category structure formally defined in SGML (Standard Generalized Markup Language) (ISO 8879:1986), a declarative language for describing the structure of documents. The data category structure for MARTIF is specified in an SGML construct called a DTD (Document Type Definition). A DTD enumerates the logical pieces of data that are permissible in a document of a given type and how those data items may be combined. DTDs can be written for many document types. One familiar example is the DTD that defines HTML (HyperText Markup Language), the document type used on the World Wide Web.

Before examining the MARTIF DTD, which is fairly complex, consider how a DTD could be used to define the structure of a recipe such as we might find in an ordinary cookbook.

If you are already familiar with DTDs, please skip to the section headed "The MARTIF DTD."

A simple DTD for a recipe might specify that each Recipe must consist of a RecipeTitle, followed by an IngredientList (which consists of one or more IngredientItems), followed by an InstructionStatement. The formal SGML expression of this logical structure in a DTD might look like this (Figure 1):

Figure 1: A simple recipe DTD

<!ELEMENT Recipe                - - (RecipeTitle, IngredientList,
                                     InstructionStatement)        >
<!ELEMENT RecipeTitle           - - (#CDATA)+                     >
<!ELEMENT IngredientList        - - (IngredientItem)+             >
<!ELEMENT IngredientItem        - - (#CDATA)+                     >
<!ELEMENT InstructionStatement  - - (#CDATA)+                     >

In this example, each part of a recipe is introduced by the ELEMENT keyword. The name of the element is then given, followed by a declaration of how that element must be delimited in the running text of the document. For all the recipe elements, this delimiter specification consists of two hyphens, which means that both the beginning and the end of each element must be explicitly tagged. Document elements can be defined to allow an implicit start or end (e.g., the IngredientList might be allowed to end implicitly when the InstructionStatement starts), in which case either the first or the second hyphen would be replaced by an 'o' (for "omittable"). To tag the start of an element, the element's name appears in the running text of the document, enclosed in angle brackets (e.g., <Recipe>); to tag the end of the element, the same mechanism is used, except that a forward slash precedes the name of the element (e.g., </Recipe>). The content of any given element consists of everything between its start and end tags.

Following the specification of delimiters for each element, a content model details the internal structure of each element. Content may be any combination of other defined elements or atomic data types (such as #CDATA, which is unparsed text), joined by operators such as the comma [,] (forces the specified sequence), the plus [+] (1 or more), the star [*] (0 or more), the vertical bar [|] (logical OR), and the question mark [?] (0 or 1). Other SGML operators are used, but are not vital to our discussion here.
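The operators listed above behave much like their counterparts in regular expressions, which suggests one way to check a sequence of child elements against a content model. The following Python sketch is only an illustration under simplifying assumptions, not how an SGML parser works internally: it handles just the five operators named above, and the function names model_to_regex and matches are invented for this example.

```python
import re

def model_to_regex(model: str) -> re.Pattern:
    """Translate a simplified content model into a regex over child names.

    Each child element name is matched as the name plus one trailing space,
    so a sequence of children becomes one string such as 'A B B '.
    """
    tokens = re.findall(r"#?[A-Za-z][\w.-]*|[()|+*?,]", model)
    parts = []
    for tok in tokens:
        if tok == ",":
            # Sequence: regex concatenation already means "followed by".
            continue
        if tok in "()|+*?":
            # Grouping, OR, and the occurrence operators carry over unchanged.
            parts.append(tok)
        else:
            # An element name (or a keyword such as #CDATA) plus its separator.
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def matches(model: str, children: list[str]) -> bool:
    """Return True if the list of child names satisfies the content model."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None

recipe_model = "(RecipeTitle, IngredientList, InstructionStatement)"
print(matches(recipe_model, ["RecipeTitle", "IngredientList", "InstructionStatement"]))
print(matches(recipe_model, ["IngredientList", "RecipeTitle", "InstructionStatement"]))
print(matches("(IngredientItem)+", []))
```

The first call reports a match because the children appear in the order the commas demand; the other two fail, one on order and one because the plus operator requires at least one IngredientItem.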

Once defined, a DTD serves as a kind of blueprint for the construction of its named document type. We might use the sample recipe DTD to construct the following SGML document from a recipe on a Campbell's Soup can (Figure 2):

Figure 2: A recipe represented as an SGML document

specifies DTD used       <!DOCTYPE Recipe SYSTEM "recipe.dtd">
beginning of Recipe      <Recipe>
RecipeTitle              <RecipeTitle>Mexicali Dip</RecipeTitle>
IngredientList,          <IngredientList>
consisting of multiple   <IngredientItem>1 can Bean with Bacon Soup</IngredientItem>
IngredientItems          <IngredientItem>1/2 cup salsa</IngredientItem>
                         </IngredientList>
InstructionStatement     <InstructionStatement>In saucepan, combine soup, cheese, and
                         salsa. Over medium heat, heat through, stirring often. Serve
                         with tortilla chips for dipping</InstructionStatement>
end of Recipe            </Recipe>

One important advantage of using SGML to define a document type is that associated documents can be compared to the DTD by SGML parser programs to validate the internal structure of the documents. In this way many kinds of integrity and format constraints can be strictly enforced, which allows a high degree of standardization within a document family.
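As a rough illustration of this kind of validation, the Python sketch below checks a recipe document against hand-written rules that mirror the recipe DTD. Several things here are assumptions made for the example: the RULES table is a manual stand-in for the DTD rather than something read from it, the validate function is hypothetical, and Python's XML parser is used in place of a true SGML parser, which works only because the sample document tags every start and end explicitly. The instruction text is shortened.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for the recipe DTD: each element name maps to a regex over its
# child element names (each name followed by one space).
# None means "text only, no child elements".
RULES = {
    "Recipe": r"RecipeTitle IngredientList InstructionStatement ",
    "IngredientList": r"(IngredientItem )+",
    "RecipeTitle": None,
    "IngredientItem": None,
    "InstructionStatement": None,
}

def validate(element: ET.Element) -> list[str]:
    """Return the structural errors found in element and its descendants."""
    if element.tag not in RULES:
        return [f"undeclared element: {element.tag}"]
    errors = []
    children = "".join(child.tag + " " for child in element)
    rule = RULES[element.tag]
    if rule is None:
        if children:
            errors.append(f"{element.tag}: child elements not allowed")
    elif re.fullmatch(rule, children) is None:
        errors.append(f"{element.tag}: unexpected content [{children.strip()}]")
    for child in element:
        errors.extend(validate(child))
    return errors

GOOD = """<Recipe>
<RecipeTitle>Mexicali Dip</RecipeTitle>
<IngredientList>
<IngredientItem>1 can Bean with Bacon Soup</IngredientItem>
<IngredientItem>1/2 cup salsa</IngredientItem>
</IngredientList>
<InstructionStatement>In saucepan, combine soup, cheese, and salsa.</InstructionStatement>
</Recipe>"""

# Missing InstructionStatement, and an IngredientList with no items.
BAD = ("<Recipe><RecipeTitle>Mexicali Dip</RecipeTitle>"
       "<IngredientList></IngredientList></Recipe>")

print(validate(ET.fromstring(GOOD)))
for problem in validate(ET.fromstring(BAD)):
    print(problem)
```

The well-formed recipe yields an empty error list, while the second document is rejected twice: once at the Recipe level and once for the empty IngredientList.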

In practice, most DTDs are considerably more complex than this one, but the example gives a general idea of how structure is represented. For a more thorough discussion of DTD syntax, consult standard reference works such as ISO 8879:1986 (the formal definition of SGML), The SGML Handbook (Goldfarb 1990), and Practical SGML (van Herwijnen 1991).

THE MARTIF DTD

MARTIF is an SGML document type with an abstract structure that is specified in a DTD similar to the one shown above.

Figure 3 shows the core section of the MARTIF DTD.

Figure 3: Core of the MARTIF DTD

<!ENTITY % AuxInfo          'descrip | descripGrp | admin |
                             adminGrp | ptr | ref | note' >

<!ELEMENT body          - - (termEntry+) >

<!ELEMENT termEntry     - - ((%AuxInfo;)*, (tig | ntig)+) >
<!ATTLIST termEntry         %a.global;
                            type CDATA #IMPLIED >

<!ELEMENT tig           - - (term, (termNote)*,
                             (descrip | admin | ptr | ref | note)*) >
<!ATTLIST tig               id ID #IMPLIED
                            lang CDATA #REQUIRED >

<!ELEMENT ntig          - - (termGrp, (%AuxInfo;)*) >
<!ATTLIST ntig              id ID #IMPLIED
                            lang CDATA #REQUIRED >

<!ELEMENT termGrp       - - (term, (termNote | termNoteGrp | ptr |
                             ref | note)*) >
<!ATTLIST termGrp           %a.global;
                            type CDATA #IMPLIED >

<!ELEMENT termNoteGrp   - - (termNote, (ptr | ref | note)*) >
<!ATTLIST termNoteGrp       %a.global; >

<!ELEMENT descripGrp    - - (descrip, (ptr | ref | note)*) >
<!ATTLIST descripGrp        %a.global; >

<!ELEMENT adminGrp      - - (admin, (ptr | ref | note)*) >
<!ATTLIST adminGrp          %a.global; >

<!ELEMENT term          - - (%bText;) >
<!ATTLIST term              %a.global;
                            type CDATA #IMPLIED >

<!ELEMENT termNote      - - (%nText;) >
<!ATTLIST termNote          %a.global;
                            type CDATA #IMPLIED >

<!ELEMENT descrip       - - (%dText;) >
<!ATTLIST descrip           %a.global;
                            type CDATA #IMPLIED >

<!ELEMENT admin         - - (%bText;) >
<!ATTLIST admin             %a.global;
                            type CDATA #IMPLIED >

This DTD shows several new features that our earlier recipe example did not have. The first line declares an ENTITY, which is essentially a nickname or shorthand that can be used for convenience later in the DTD. In this case the name of the ENTITY is AuxInfo, and it expands into the combination of elements shown at its right. Each element is also associated with an ATTLIST, which enumerates the attributes that the element can have. For example, a tig (term information group) element can have two attributes: id (a unique identifier), which may remain implicit, and lang (language of the element), which must be explicit or inherited from an enclosing element. Some of the ATTLISTs reference an ENTITY defined separately called a.global. When expanded, a.global adds an implicit lang and an implicit id to the ATTLIST of an element. Other entities are referenced in several places (e.g., nText in the content definition for termNote), but for the purposes of our discussion these can be considered to expand to plain text.
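The way an ENTITY reference expands can be pictured as plain text substitution. The Python sketch below is a simplified illustration, not a description of how SGML parsers are implemented: the expand function and the ENTITIES table are invented for this example, only AuxInfo is defined because it is the only entity whose replacement text appears in Figure 3, and the substitution is done in a single pass.

```python
import re

# Parameter entities taken from Figure 3. Only AuxInfo is defined in the
# excerpt; the full DTD would define the others (e.g., a.global) elsewhere.
ENTITIES = {
    "AuxInfo": "descrip | descripGrp | admin | adminGrp | ptr | ref | note",
}

def expand(text: str, entities: dict[str, str]) -> str:
    """Replace each %name; reference with its replacement text (one pass).

    References to entities missing from the table are left untouched.
    """
    def substitute(match: re.Match) -> str:
        return entities.get(match.group(1), match.group(0))
    return re.sub(r"%([A-Za-z][\w.-]*);", substitute, text)

# Content model of termEntry as written in the DTD, then as the parser sees it.
print(expand("((%AuxInfo;)*, (tig | ntig)+)", ENTITIES))
# prints: ((descrip | descripGrp | admin | adminGrp | ptr | ref | note)*, (tig | ntig)+)
```

Because the same entity is referenced by both termEntry and ntig, a change to the AuxInfo definition would propagate to both content models, which is the convenience the shorthand is meant to provide.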

Figure 4 contains a graphical representation of the information in the MARTIF DTD. It can be roughly approximated in text as follows. A MARTIF instance consists of:

Figure 4: A graphical representation of the MARTIF DTD


Strengths of MARTIF

Importantly, MARTIF allows the representation of virtually all of the data categories present in actual terminology management systems around the world (including categories from previous interchange formats, such as MicroMATER and NTRF). That is, data can be transferred from a native environment to MARTIF with almost no information loss. The data categories used by MARTIF are taken from ISO DIS 12620.

The scope of MARTIF for interchange

MARTIF's designers purposely chose flexibility over uniformity. The advantage of this flexibility is that MARTIF can be implemented without requiring major structural changes to existing terminology management systems, while still providing a somewhat predictable channel to achieve its goal of negotiated interchange.

However, some of MARTIF's flexibility comes from allowing multiple representations for the same data, from leaving certain pieces of data system-defined, or from forgoing the enforcement of logical integrity constraints. It is anticipated that various optional restrictions on and subsets of MARTIF will be defined for specific environments.
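Judging only from the core DTD in Figure 3, one place where two representations of the same data appear possible is the choice between tig and ntig: a term and its note can sit directly inside a tig, or be wrapped in a termGrp inside an ntig. The Python sketch below builds both forms and shows that a receiving system would have to handle either. The helper functions, the sample term, and the attribute values ("en", "partOfSpeech") are illustrative assumptions, and Python's XML library stands in for SGML tooling.

```python
import xml.etree.ElementTree as ET

def make_tig(term: str, lang: str, note_type: str, note: str) -> ET.Element:
    """Flat form: tig contains term and termNote directly."""
    tig = ET.Element("tig", lang=lang)
    ET.SubElement(tig, "term").text = term
    ET.SubElement(tig, "termNote", type=note_type).text = note
    return tig

def make_ntig(term: str, lang: str, note_type: str, note: str) -> ET.Element:
    """Nested form: ntig wraps term and termNote in a termGrp."""
    ntig = ET.Element("ntig", lang=lang)
    group = ET.SubElement(ntig, "termGrp")
    ET.SubElement(group, "term").text = term
    ET.SubElement(group, "termNote", type=note_type).text = note
    return ntig

def term_data(group: ET.Element):
    """Extract (lang, term, [(note type, note text)]) from either form."""
    holder = group if group.tag == "tig" else group.find("termGrp")
    notes = [(n.get("type"), n.text) for n in holder.findall("termNote")]
    return group.get("lang"), holder.findtext("term"), notes

args = ("saucepan", "en", "partOfSpeech", "noun")
flat, nested = make_tig(*args), make_ntig(*args)
print(ET.tostring(flat, encoding="unicode"))
print(ET.tostring(nested, encoding="unicode"))
print(term_data(flat) == term_data(nested))
```

The two serialized elements differ in structure, yet term_data recovers identical information from both, which is why partners in a negotiated interchange need to agree on which form they will exchange.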

 
