MARTIF - Putting Complexity in Perspective

5. Broader Interchange Environments

Contacts and discussions with colleagues working with lexicographical and NLP applications (e.g., thesauri and MT-based lexical resources) have elicited comments to the effect that MARTIF is an inappropriate interchange vehicle, i.e., that it is not powerful enough for their purposes. MARTIF does not address the interchange needs of these environments because it was designed specifically for the interchange of concept-oriented terminological data, where each entry treats a concept and all the terms associated with that concept. Lexicographical entries, on the other hand, are word-oriented rather than concept-oriented: they treat a word and all its meanings. Links between the two environments would have to join a meaning node in a lexicographical entry with an individual terminology information group (i.e., a specific term) in a terminological entry. The complexity of these links would increase with the number of languages included in the databases, the number of subject fields covered, and the degree of polysemy inherent in those subject fields.
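The structural difference between the two entry types, and the kind of link that would have to join them, can be illustrated with a minimal Python sketch. All class, field, and function names here are hypothetical and chosen only for illustration; they are not MARTIF element names or part of any existing format. The sketch also assumes that the assignment of senses to concepts is supplied by hand, since nothing in the data itself determines it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """One term in one language (stands in for a terminology information group)."""
    term_id: str
    language: str
    text: str


@dataclass
class ConceptEntry:
    """Concept-oriented entry: one concept and all the terms that designate it."""
    concept_id: str
    subject_field: str
    terms: List[Term] = field(default_factory=list)


@dataclass
class Sense:
    """One meaning node of a word."""
    sense_id: str
    gloss: str


@dataclass
class LexicalEntry:
    """Word-oriented entry: one word and all its meanings."""
    headword: str
    language: str
    senses: List[Sense] = field(default_factory=list)


@dataclass(frozen=True)
class Link:
    """Joins a meaning node to a specific term in a terminological entry."""
    sense_id: str
    term_id: str


def link_senses_to_terms(entry: LexicalEntry,
                         concepts: List[ConceptEntry],
                         sense_to_concept: Dict[str, str]) -> List[Link]:
    """Build links for every sense that has been assigned to a concept.

    sense_to_concept is an assumed, manually supplied mapping from
    sense_id to concept_id.
    """
    by_id = {c.concept_id: c for c in concepts}
    links = []
    for sense in entry.senses:
        concept = by_id.get(sense_to_concept.get(sense.sense_id))
        if concept is None:
            continue  # sense has no counterpart in the termbase
        for term in concept.terms:
            # Link only to the term that matches the headword and its language.
            if term.language == entry.language and term.text == entry.headword:
                links.append(Link(sense.sense_id, term.term_id))
    return links


# Invented example data: a polysemous word spanning two subject fields.
mouse = LexicalEntry("mouse", "en", [
    Sense("s1", "small rodent"),
    Sense("s2", "hand-held pointing device"),
])
concepts = [
    ConceptEntry("c1", "zoology",
                 [Term("t1", "en", "mouse"), Term("t2", "de", "Maus")]),
    ConceptEntry("c2", "computing",
                 [Term("t3", "en", "mouse"), Term("t4", "de", "Maus")]),
]
links = link_senses_to_terms(mouse, concepts, {"s1": "c1", "s2": "c2"})
for link in links:
    print(link.sense_id, "->", link.term_id)
```

Even in this toy case, a single word needs one link per sense and subject field. Adding further languages or subject fields multiplies the number of links required, which is the growth in complexity described above.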

Positive experience testing the existing MARTIF format and the definition of the blind interchange levels (see above) leads to the conclusion that it does not make sense to expand the existing MARTIF format itself to accommodate these essentially different applications. Even though certain data categories are used in common in these different environments, they are frequently interpreted and used differently as a result of structural variation and the divergent objectives of the two theoretical approaches. As a result, different kinds of systems employ different data modeling conventions.

In order to coordinate data exchange between these two environments, it would be highly desirable to pursue parallel development between MARTIF and interchange formats designed for use in specific related areas and to provide linking mechanisms among these formats. In fact, the TEI work group had originally hoped to achieve this kind of linkage as a result of the work done by the lexicography group in that project. Unfortunately, the counterpart TEI lexicography group failed to resolve internal differences in their own discipline and returned two conflicting DTD fragments to the TEI central committee, at which point efforts to coordinate DTDs between terminology and lexicography were regrettably abandoned.

The first prerequisite for a renewed attempt to coordinate terminological and lexicographical interchange formats is that a comparable lexicography group develop a format that is based on the general SGML approach and that reflects the level of sophistication that MARTIF has reached over the years that it has been under development. Once this requirement has been met, it will be possible to design an integrated framework within which the exchange of information among lexicographical, terminological, and other approaches to linguistic information processing could take place. Initial steps have been taken to design such a uniform framework, and cooperation has begun with the EU-supported OTELO project, as well as with the MARCLIF (Machine-readable Conceptual and Lexicographical Interchange Format) project being conducted by the International Association for Machine Translation (IAMT).

Some critics have questioned the idea of using SGML as a language for expressing terminological data structures unless the SGML DTD is accompanied by a conceptual data model. Although MARTIF was originally developed without such a model, there is a commercial endeavor [CMR-TermSoft RELTEF™] to develop a relational database that parallels MARTIF, and this relational database is designed according to a conceptual data model consisting of an entity-relationship diagram. This model is designed to address issues arising in the environment of the MARCLIF framework.

