Translation, Theory and Technology Homepage

Applications of the CLS Framework: Sharing

Terminological data may be shared for many different reasons, and the requirements for sharing vary with the type of sharing. The least demanding type is making terminological data available for viewing by a human. Even for this task, a table of correspondence should be established between the data categories of the term base and the data categories of ISO 12620.
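A table of correspondence can be as simple as a lookup from local field names to standard data category names. The following is a minimal sketch in Python, not part of the CLS Framework itself. The local field names (Term, POS, Def, Domain, Note2) are hypothetical, and the names on the right are only illustrative of ISO 12620-style data categories; a real table would be checked against the standard.

```python
# Hypothetical local field names on the left; illustrative ISO 12620-style
# data category names on the right. Neither list is taken from a real termbase.
CORRESPONDENCE = {
    "Term": "term",
    "POS": "part of speech",
    "Def": "definition",
    "Domain": "subject field",
}


def relabel(entry, table):
    """Rename the fields of one entry according to the correspondence table.

    Returns a pair: the relabelled entry, and the list of local fields
    that have no correspondence and therefore need a human decision.
    """
    relabelled = {}
    unmapped = []
    for field, value in entry.items():
        if field in table:
            relabelled[table[field]] = value
        else:
            unmapped.append(field)
    return relabelled, unmapped


entry = {"Term": "hard disk", "POS": "noun", "Note2": "internal"}
relabelled, unmapped = relabel(entry, CORRESPONDENCE)
print(relabelled)
print(unmapped)
```

Reporting the unmapped fields separately, rather than dropping them silently, makes it visible where the table of correspondence is still incomplete.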

Introduction

Since it will never be the case that every user group in the language industries will accept the same monolithic format for data sharing, an interchange standard needs to include a convenient method of specifying subsets of the standard, one that captures what the subsets have in common. There are several reasons for this. One is that industry is very practical and does not want to program for unused possibilities.

For example, take the localization industry, as represented by LISA (Localization Industry Standards Association), which includes most of the major requesters and suppliers of translation in the high-tech world. At one of their recent conferences, held in Madrid in August 1998, the question of terminology interchange was raised at the meeting of OSCAR (their body for creating data exchange standards). It was clear that the companies represented do not want an overly complex interchange format that includes many elements not used in their current termbases. It was also clear that the idea of a simple format that is a subset of a more extensive framework appealed to them, so that they do not "paint themselves into a corner".

Negotiated versus blind interchange

An important distinction for the purposes of terminological sharing is between "blind" and "negotiated" formats. Essentially, a blind format is one in which the results of negotiating various details are expressed in a formalism; in other words, it is defined by formally expressing the results of negotiation. The following discussion explains the distinction further in terms of negotiated versus blind interchange of terminological data.

Negotiated interchange involves a negotiation between two or more partners who agree on the use of some framework for interchange. The partners negotiate among themselves until they reach an agreement on the details of applying the interchange framework to their particular interchange needs so that they can write conversion routines between their terminological databases (termbases). In "blind" interchange, one or more parties define a formal specification of an intermediate format so that anyone who accepts the formal specification can export to it and import from it even if they were not part of the original definition of the intermediate format. A blind interchange format can also be used to disseminate terminological data to various parties. Clearly, there could be a relationship between negotiated interchange and blind interchange. Suppose that several blind interchange formats are defined within one framework for negotiated interchange. Then those blind interchange formats could be considered to be "pre-negotiated" formats defined using some formalism. Thus, when associated with a framework for negotiated interchange, "blind interchange" could be called "formally specified, pre-negotiated interchange".

The hallmark of any useable intermediate format, including a blind format, is that the structure and content are sufficiently predictable to allow automatic processing of the data without knowing who created a particular intermediate file and without knowing what termbase the data came from.
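To illustrate what "sufficiently predictable" can mean in practice, here is a minimal sketch in Python of a formal specification for one blind format together with a check that any party could run on a record without knowing where it came from. The specification layout (allowed, required, picklists) and the category names are assumptions made for this example; they are not the formalism used by Blind MARTIF.

```python
# Hypothetical formal specification of one blind format: the permitted data
# categories, the ones that must be present, and any closed lists of values.
BLIND_SPEC = {
    "allowed": {"term", "part of speech", "definition", "subject field"},
    "required": {"term"},
    "picklists": {"part of speech": {"noun", "verb", "adjective", "adverb"}},
}


def validate(record, spec):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Required categories that the record does not contain.
    for category in sorted(spec["required"] - set(record)):
        problems.append(f"missing required category: {category}")
    for category, value in record.items():
        if category not in spec["allowed"]:
            problems.append(f"category not in specification: {category}")
        elif category in spec["picklists"] and value not in spec["picklists"][category]:
            problems.append(f"value not permitted for {category}: {value}")
    return problems


good = {"term": "hard disk", "part of speech": "noun"}
bad = {"part of speech": "nom", "owner": "x"}
print(validate(good, BLIND_SPEC))
print(validate(bad, BLIND_SPEC))
```

Because the check depends only on the specification, an importer that was not party to the original negotiation can still decide automatically whether a file is safe to process.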

Preparing data for sharing

The following steps are illustrated with an example using sample data from Oracle Corporation in a separate file.

Sharing terminological data involves the following three subtasks:

identifying the data categories used;
identifying the structure of the terminological data (see Representation for information on this aspect);
mapping the data to an agreed-upon intermediate format. This standard selects data categories from ISO 12620 and uses a structural description based on ISO 12200. It describes a formalism for defining "families" of intermediate formats.
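The three subtasks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample entries, the field names, and the two-level entry/term structure are hypothetical, and the output is a plain dictionary standing in for an intermediate format, not MARTIF markup.

```python
from collections import Counter

# Hypothetical termbase export: each entry carries entry-level fields and a
# list of terms, each with its own term-level fields.
entries = [
    {"Domain": "hardware", "terms": [{"Term": "hard disk", "POS": "noun"}]},
    {"Domain": "software", "terms": [{"Term": "install", "POS": "verb"},
                                     {"Term": "set up"}]},
]

# Hypothetical table of correspondence to illustrative category names.
TABLE = {"Domain": "subject field", "Term": "term", "POS": "part of speech"}


def inventory(entries):
    """Subtasks 1 and 2: count which fields occur, and at which level."""
    counts = Counter()
    for entry in entries:
        for field in entry:
            if field != "terms":
                counts[("entry", field)] += 1
        for term in entry["terms"]:
            for field in term:
                counts[("term", field)] += 1
    return counts


def to_intermediate(entry, table):
    """Subtask 3: rename fields per the table while keeping the structure."""
    return {
        "entry": {table[f]: v for f, v in entry.items() if f != "terms"},
        "terms": [{table[f]: v for f, v in term.items()}
                  for term in entry["terms"]],
    }


counts = inventory(entries)
converted = [to_intermediate(e, TABLE) for e in entries]
print(counts)
print(converted[0])
```

Running the inventory first shows which data categories are actually in use and where they attach, which is the information needed before a mapping can be agreed on.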

Sections containing further information on negotiated interchange using MARTIF and blind interchange using Blind MARTIF are also available.
