Notes on Features and Tags

TEI EDW5

C. M. Sperberg-McQueen

Lou Burnard

November 1989

rev. September 1991

Abstract

This paper defines a number of basic concepts for the design of markup languages, notably tag, tag name, and feature; some basic characteristics of markup-language syntaxes are also described. Two concepts are suggested for the classification of textual features: binding, which describes a feature's being restricted to specific contexts (as the title page of a document is restricted to appearance within the front matter), and structure, which describes a feature's possible decomposition into subordinate parts. Finally, the paper proposes a structure for a database (or SGML document) containing information about tags or other markup constructs.


1. Introduction

A marked-up document is one in which specific textual features are identified by tags or other mechanisms. TEI working committees are charged with proposing lists of such textual features and the tags used to specify them. When comparing such lists it will be essential to group similar features, independent of the way in which they are identified, both to avoid redundancy and to make possible a tighter focus on related but distinct areas of concern. Use of a simple taxonomy of features will allow one both to assess the likelihood that two tags, purportedly for the same textual feature, also have the same semantics, and to focus on differences among tags grouped together by the taxonomy.
This paper therefore proposes a simple model for classifying textual features and their associated tags based on two structural characteristics of all textual features: their content and their location. It begins by describing the various possible relationships among tags, features, and syntaxes, continues with a theory of textual features, and concludes with a design for a database to hold information about specific tags in existing or new markup languages.

2. Definitions

We begin with the distinction between the tag and the feature it denotes. The Formex tag pic, for example, and the ISO “starter set” tag artwork are both used to signal an occurrence of the same textual feature, namely that at this position in the document a graphic document element is to be located.[1] A page formatter might leave space on the page for the artwork; a galley formatter might print a marginal note calling attention to the expected artwork; a text database might embed a special symbol indicating that the original document had material not included in the database. The Formex tag id and the starter set tag docnum similarly mark the same textual feature.
Note that a tag must refer to a feature, but a feature need not be tagged. The same feature can be denoted by many tags. Thus in the ISO starter set, the tag h1 is used to identify occurrences of the feature chapter. In other tag sets the same feature might be tagged with chapter, Kapitel, chapitre, or caput. We use the term feature precisely in order to stress what would be common among all these tags and ignore what varies (the name). Users of the TEI Guidelines may wish to abbreviate tag names or render them into another language. The TEI must provide mechanisms for such substitutions which allow the tag names to be changed without altering either the definition of the features themselves or the syntax which governs their occurrences.
More formally, we define
feature
a characteristic of some segment of text or of some location in a text, considered independently of any name used to denote it. In SGML terms, a segment of text possessing the feature may be tagged as a document element, though the presence of the feature may also be indicated by other SGML constructs.
tag
a component of some markup language used to identify an occurrence of a specific textual feature. A tag has both a tag name and a textual feature associated with it.
tag name
the string of characters used to identify a tag, considered as a string of characters and independent both of any particular instance of the tag used to mark the feature and of the feature itself. In SGML terms, the tag name is the generic identifier of an element.
We also identify sets (that is, unordered collections) of features, tags and names as follows:
feature set
a set of features, considered independently of the names used to denote them
tag set
a set of tags, i.e. features together with the names used to denote them in any one markup scheme, considered independently of any syntax defining their allowable interrelationships
name set
a set of tag names (e.g. all those related to a given tag in different markup schemes, or all those used in a given DTD)
To define the allowable relationships among tags and features, document markup languages define syntaxes for the tags of the language. The formal syntax is necessarily defined in terms of the tag names of the language; the semantics of the tags (their links to specific features) are not usually specified formally. If the distinction between tag and tag name is rigorously maintained, then, formal syntaxes like DTDs must be viewed strictly speaking as governing only tag names, not tags. If we wish to speak of rules governing tags rather than tag names, we must include both the DTD and the semantic specifications of the markup scheme. We make this distinction in what follows, without wishing to insist that this degree of rigor is always necessary. We define:
feature syntax
a set of rules specifying the allowable combinations of feature occurrences within a given type of text, considered independently of the tag names used to identify them
tag syntax
a set of rules defining the allowable sequences of tags and textual data in a markup language
DTD (document type definition)
a tag syntax defined according to the rules of SGML
From any document type definition, one may derive a tag set, which is precisely all and only the tags used in the DTD.[2] The tag syntax of a DTD is expressed by the content models it contains. If we modify a DTD by changing the syntax rules expressed in its content models, the tag set with its associated name set and feature set remain unchanged. The tag set, specifically, is what remains constant when the syntax is changed without introducing new tags or deleting old ones.
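The claim that the name set survives changes to the content models can be checked mechanically. The following Python sketch (the function `name_set` and its regular expression are hypothetical, not part of any TEI tool) collects the generic identifiers declared in a DTD fragment, folding them to upper case as the SGML reference concrete syntax does. It assumes declarations without parameter entities, comments, or marked sections, and it approximates the tag set by the declared names only.

```python
import re

# Captures the declared element name, or the name group, that follows "<!ELEMENT".
# Simplifying assumption: no parameter entities, comments, or marked sections.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\([^)]*\)|[^\s(]+)", re.IGNORECASE)


def name_set(dtd_text: str) -> frozenset:
    """Return the generic identifiers declared in dtd_text, folded to upper case."""
    names = set()
    for match in ELEMENT_DECL.finditer(dtd_text):
        declared = match.group(1)
        # A name group such as "(added, updated)" declares several elements at once.
        for name in re.split(r"[\s(),|&]+", declared):
            if name:
                names.add(name.upper())
    return frozenset(names)


# Declarations from section 5.2, and the same elements with every model set to ANY.
strict = """
<!ELEMENT equivalent    - -  (language, tagname)  >
<!ELEMENT (added,
          updated)      - -  (date, by)           >
"""
loose = """
<!ELEMENT equivalent    - -  ANY  >
<!ELEMENT (added, updated) - -  ANY  >
"""
print(sorted(name_set(strict)))  # ['ADDED', 'EQUIVALENT', 'UPDATED']
```

Both fragments yield the same name set, although their tag syntaxes differ.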
We may change a DTD by translating all the tag names into another language (as is done with the tag names for the ISO starter set in ISO TR 9573, for example). The tag syntax remains the same, the features described remain the same; the DTD changes only because the names used to define the syntax change. The term feature syntax is coined to denote what does not change if one translates all the tag names into another vocabulary.

3. Formal Relationships among Features, Tags, and Syntaxes

Each tag corresponds to exactly one feature and one tag name, although each feature may have many tag names and each name may be used in different markup languages for many different features.[3] Each feature, tag, and tag name can appear in many feature sets, tag sets, or name sets respectively. Each set, of course, can contain many individual features, tags, or names.
A given tag syntax corresponds to exactly one name set (viz., the set of tag names used to define the tag syntax); in conjunction with the semantic rules of a language, a tag syntax also corresponds to exactly one tag set. Similarly a given feature syntax corresponds to exactly one feature set. Many tag syntaxes may however be defined for any given name set or tag set, just as many feature syntaxes may be postulated for any given feature set.
These relationships are summarized in the following diagram, in which the lines indicate meaningful relationships and the arrowheads their degree. A single-headed arrow indicates that just one item of the type indicated is involved in the relationship; a double-headed arrow indicates that more than one item of the type indicated may be involved.
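These cardinalities can be modelled directly. In the Python sketch below, the hypothetical class `Tag` carries exactly one tag name and one feature, while indexing a collection of tags shows a feature with many names and a name with many features. The Formex and ISO starter set tags are those cited in section 2; the entries labelled “hypothetical tag set” are invented solely to illustrate a name reused for a different feature. Feature labels are plain strings standing in for features, since a feature is by definition independent of any name.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A tag: exactly one tag name and one feature, within one markup language."""
    name: str      # tag name (SGML generic identifier)
    feature: str   # label standing in for the feature; any unique key would do
    language: str  # markup language that defines the tag


def names_for_feature(tags):
    """Feature -> set of tag names denoting it (a feature may have many names)."""
    index = defaultdict(set)
    for tag in tags:
        index[tag.feature].add(tag.name)
    return dict(index)


def features_for_name(tags):
    """Tag name -> set of features it denotes across languages (a name may have many)."""
    index = defaultdict(set)
    for tag in tags:
        index[tag.name].add(tag.feature)
    return dict(index)


tags = [
    Tag("pic", "graphic to be located here", "Formex"),
    Tag("artwork", "graphic to be located here", "ISO starter set"),
    Tag("id", "document number", "Formex"),
    Tag("docnum", "document number", "ISO starter set"),
    Tag("h1", "chapter", "ISO starter set"),
    Tag("chapter", "chapter", "hypothetical tag set"),
    Tag("h1", "level-one heading", "hypothetical tag set"),  # hypothetical reuse of a name
]
print(sorted(names_for_feature(tags)["document number"]))  # ['docnum', 'id']
```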
Among the syntaxes corresponding to any given set of names or features, several classes may usefully be distinguished. At one extreme are syntaxes using the full power of SGML DTDs to restrict the legal sequences of tags and data in a document. We call this the class of fully restrictive syntaxes. At the other extreme is a syntax which places no restrictions on the combination of tags in a document except that elements must nest properly and that every start tag must be matched by an explicit end tag. We call this a Waterloo syntax. In SGML terms, a Waterloo syntax is derivable from any more restrictive DTD by
  • replacing the “omitted tag minimization” for each element with “- -” to prohibit omission of any tags, and
  • replacing the “declared content” or “content model” for each element with the keyword “ANY”, which allows any combination of data and properly nested tags to occur within the element.[4]
Each element declaration, that is, must look like this:
    <!ELEMENT tagname - - ANY>
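The two replacements can be carried out mechanically on an existing DTD. A minimal Python sketch (the function `to_waterloo` is hypothetical) rewrites each ELEMENT declaration into the form above, keeping the element name or name group and discarding the old minimization and the declared content or content model, together with any exceptions that follow it. It assumes declarations without comments or parameter entities; ATTLIST and ENTITY declarations pass through untouched.

```python
import re

# "<!ELEMENT", the element name or name group, optional omitted-tag minimization
# (e.g. "- -" or "- o"), then declared content or a content model up to ">".
DECL = re.compile(
    r"<!ELEMENT\s+(?P<names>\([^)]*\)|[^\s(]+)"
    r"(?:\s+[-oO]\s+[-oO])?"
    r"\s+(?P<content>.*?)\s*>",
    re.IGNORECASE | re.DOTALL,
)


def to_waterloo(dtd_text: str) -> str:
    """Rewrite every ELEMENT declaration as '<!ELEMENT name - - ANY>'.

    The old minimization and content (including any exceptions) are discarded;
    other declarations (ATTLIST, ENTITY) are left unchanged.
    """
    return DECL.sub(lambda m: "<!ELEMENT " + m.group("names") + " - - ANY>", dtd_text)


print(to_waterloo("<!ELEMENT example  - o  EMPTY >\n<!ELEMENT equivalent - - (language, tagname) >"))
```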
Intermediate classes of syntaxes may be defined, including e.g.
  • the class of syntaxes which define element contents using only the declared content “EMPTY”, the content model “(#PCDATA)”, or the content model “ANY”, or
  • those which do not use inclusion or exclusion exceptions.
Syntaxes which specify rules of greater expressive power than those provided by SGML may also be imagined. No further discussion of these intermediate syntaxes is provided here.

4. Classification of Textual Features

Tags are most usefully grouped according to the similarity of the features they denote. One approach might be to base a classification on semantic or functional properties of the features: features concerned with presentation, language, register, application area, etc. Elaborating such a taxonomy would, however, be at once too complex and too controversial for our present purposes. We propose instead to classify features simply in terms of their position and use within a feature syntax, specifically the places where a feature may itself occur (its appearance in other features' content models) and the ways in which other features may nest within it (in SGML terms, its “content model”). We call these two characteristics the binding and the structure of the feature.

4.1. Binding

A feature may either be bound to some specific location(s) in a document or else it may float and appear anywhere within running text in the document. In the former case the feature is bound; in the latter it is unbound.[5] Some features may have one fixed location at which they are required, while retaining the ability to float and appear additionally elsewhere in the document. The feature date, for example, which may be required in the front matter of a document to show the document's date of composition, may also appear in free text wherever dates are used. Features of this third type, both bound and unbound, are termed bindable. Each may be resolved, if desired, into two distinct features, one bound and one unbound.
[Alternative terms: Anchored, floating, tethered. Moored, floating, anchored. Sedentary, migrant, and semi-nomadic. Bound, unbound, bindable. Bound, floating, semi-bound. Anchored, floating, floatable. Bound, free, and indentured.]
For SGML-based markup languages, this distinction may be made formally on the basis of content models in the DTD as follows:
  • if a tag name appears in content models only in an or group, and group, or sequence group with other tag names or with #PCDATA, then the feature denoted by that tag name is bound (in that feature syntax).
  • if a tag name appears in content models only in an alternation with the keyword #PCDATA, then the feature denoted by that tag name is unbound (in that feature syntax).
  • if a tag name appears in both types of content models (i.e. both within and outside of alternations including #PCDATA), then the feature denoted by that tag name is bindable (in that feature syntax).
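The three rules can be applied mechanically. The Python sketch below (the names `occurrences` and `classify_binding` are hypothetical) follows the criterion stated in the third rule: an occurrence of a tag name is unbound when its immediately enclosing group is an or group that also contains #PCDATA, and bound otherwise; a name with both kinds of occurrence is bindable. It assumes content models without inclusion or exclusion exceptions and without parameter entities. Names that appear in no content model (such as the document element) receive no classification.

```python
import re
from collections import defaultdict

# Tokens of a content model: #PCDATA, tag names, group delimiters, connectors.
# Occurrence indicators (? * +) are simply skipped: they do not affect binding.
TOKEN = re.compile(r"#PCDATA|[A-Za-z][A-Za-z0-9.-]*|[(),|&]", re.IGNORECASE)


def occurrences(model: str):
    """Yield (TAG NAME, "bound" or "unbound") for each tag name in one content model."""
    stack = []  # open groups, each [connector, tag names, contains #PCDATA]
    for token in TOKEN.findall(model):
        if token == "(":
            stack.append([None, [], False])
        elif token == ")":
            connector, names, has_pcdata = stack.pop()
            kind = "unbound" if connector == "|" and has_pcdata else "bound"
            for name in names:
                yield name, kind
        elif token in {",", "|", "&"}:
            stack[-1][0] = token
        elif token.upper() == "#PCDATA":
            stack[-1][2] = True
        elif stack:  # declared content such as EMPTY lies outside any group
            stack[-1][1].append(token.upper())


def classify_binding(content_models: dict) -> dict:
    """Map each tag name used in the content models to bound, unbound, or bindable."""
    kinds = defaultdict(set)
    for model in content_models.values():
        for name, kind in occurrences(model):
            kinds[name].add(kind)
    return {name: "bindable" if len(k) == 2 else next(iter(k)) for name, k in kinds.items()}


# Hypothetical DTD fragment: element name -> content model.
models = {
    "front": "(titlepage, date)",
    "titlepage": "(title, author)",
    "p": "(#PCDATA | date | hp)*",
}
print(classify_binding(models))
```

Here date is bindable because it occurs both in the front matter sequence and in the mixed content of p, matching the date example in the text.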
Classification on a purely formal basis may be inconsistent from one markup language to another. A tag date, marking calendar dates, for example, may be required (and thus bound) in a document's front matter, where it indicates the date of copyright or publication. Dates, of course, are not restricted a priori to such use: they may appear freely in running text. If such running-text dates can also be tagged date, then date will be a bindable feature; if such occurrences of dates in running text are not accommodated, then date must be classed as a bound feature. Such contradictions reveal underlying differences in the semantics of the tags and isolate a problem to be resolved in developing any new syntax.
Examples of bound features (taken from the features of Formex and the ISO starter set) are front matter, body, back matter, cell of a figure, glossary, document number, and table of contents. Examples of unbound features are numbered list (and all other list types), footnote, highlighted phrase, and reference to a figure (and all other reference types). Examples of bindable features are address (bound when, in the front matter, it applies to the author's address; unbound when, in free text, it marks other addresses), date, and title.

4.2. Internal Structure

The feature itself may be internally structured or unstructured. As a special case, we may note features whose contents are unstructured but restricted in value; these may be called typed features, by analogy with typed variables in programming languages. Formally, these correspond respectively to element-only content models (structured features), content models containing an alternation including #PCDATA (unstructured features), and content models containing only #PCDATA whose values could, however, be constrained by enumeration or some other data-type definition mechanism (typed features).[6]
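Only part of this classification can be read off a DTD: SGML cannot say that #PCDATA is restricted to, for instance, a code list, so the Python sketch below takes that fact as a separate, hypothetical `value_domain` flag supplied from the markup language's documentation. Everything else is decided from the declared content or content model alone, treating EMPTY as typed (the special case of empty elements) and ANY as unstructured. The function name and the example content models are illustrative assumptions.

```python
import re


def classify_structure(declared: str, value_domain: bool = False) -> str:
    """Classify a feature as structured, unstructured, or typed.

    declared: the element's declared content (EMPTY, CDATA, RCDATA, ANY)
        or content model, e.g. "(#PCDATA | hp)*".
    value_domain: hypothetical flag, True when the markup language's
        documentation restricts the character data to a set of values
        (a code list, a date format); a DTD cannot express this itself.
    """
    model = re.sub(r"\s+", "", declared).upper()
    if model == "EMPTY":
        return "typed"  # empty elements: a special case of typed features
    if model in {"(#PCDATA)", "CDATA", "RCDATA"}:
        return "typed" if value_domain else "unstructured"
    if model != "ANY" and "#PCDATA" not in model:
        return "structured"  # element-only content
    return "unstructured"  # mixed content or ANY: free text with nested features


print(classify_structure("(month, day, year)"))            # structured
print(classify_structure("(#PCDATA | hp | date)*"))        # unstructured
print(classify_structure("(#PCDATA)", value_domain=True))  # typed
```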
Examples of structured features (from the list of examples given earlier) are front matter (comprises title page, table of contents, etc., usually in a strict order), body (comprises a hierarchy of chapters, sections, etc.), glossary (comprises a structured list of glossary entries), numbered list and all other list types (comprise sequences of list entries), date (comprises month, day of month, and year), and address (comprises multiple address lines, or in some definitions comprises street address, city, etc.). Examples of unstructured features are cell of a figure, title, footnote, and highlighted phrase, any of which can contain free text with other elements intermingled. Examples of typed features are document number, table of contents, and reference to a figure and all other reference types (which, like table of contents, represent a special case of enumerable data types: empty elements).

5. Design for a TAGS Database

5.1. Column Descriptions

The salient information about any tag in any markup language may be stored in a database with the following fields (columns) in each record (row):
tagname (internal identifier)
the tag name as used by the markup language, e.g. `tbl'
full name
the tag name expanded to a natural-language word or phrase, e.g. `table'
equivalent tags
tag names used in other languages for the same feature [7]
feature
A description in prose of the use and meaning of the tag (or: a description of the feature marked by the tag)
binding
Is the feature bound, unbound, or bindable? Only these three terms should be used. (Or: yes, no, half)
structure
Is the feature internally structured, unstructured, or typed? Only these three terms (or: yes, no, typed) should be used. If typed or structured, the constraints on the contents of the element should be given in the data description column.
data description (content)
what sort of content may occurrences of this feature have? If the feature is typed or otherwise constrained, give the domain of legal values for occurrences of the feature (e.g. “a code taken from ISO list of language abbreviations”). If the feature is structured, specify what features may occur within occurrences of this feature. If the feature is untyped, unstructured, and unconstrained, specify either “free text” (if other features may nest within this one) or “character data” (if nested features are not allowed).
usage
Is it optional or required?
arity
Can it be repeated?
[If it can, consider whether repetitions are ordered, and if so, whether an additional feature (the ordering) is required.]
parent tags
Where can this tag appear? “Free” for unbound features, a list of legal parent tags for bound or bindable features. E.g. for author in the ISO starter set, “in titlepage”.
default content
What will be supplied if this element is omitted or included without content? E.g. in some implementations date will default to “the current date (when document is processed)”.
example
A short example, in context, of the use of the tag
Some additional fields may prove useful for database maintenance:
language
which markup language uses this tag?
source of information
short version of manual title and page number
date added to database
when was the record describing this tag created?
added by
who created this record?
date last updated
when was this record last modified?
updated by
who modified it?
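A record of this design can be sketched as a Python dataclass. The class and field names are hypothetical renderings of the columns above; the sketch enforces only the rule that binding and structure each take one of just three terms. The example values are taken from the ACCOMP entry in appendix C, with its “yes”/“no” answers translated into the three-term vocabulary and the language column filled in as “Formex”.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

BINDINGS = {"bound", "unbound", "bindable"}
STRUCTURES = {"structured", "unstructured", "typed"}


@dataclass
class TagRecord:
    """One row (record) of a TAGS database; field names are illustrative."""
    tagname: str
    full_name: str
    feature: str                  # prose description of the feature
    binding: str                  # one of BINDINGS
    structure: str                # one of STRUCTURES
    data_description: str
    usage: str                    # optional or required
    arity: str                    # can it be repeated?
    parent_tags: list             # ["free"] for unbound features
    default_content: str
    example: str
    language: str                 # maintenance fields follow
    source: str
    date_added: date
    added_by: str
    equivalent_tags: dict = field(default_factory=dict)  # language -> tag name
    date_updated: Optional[date] = None
    updated_by: Optional[str] = None

    def __post_init__(self):
        # "Only these three terms should be used" for binding and for structure.
        if self.binding not in BINDINGS:
            raise ValueError(f"invalid binding: {self.binding!r}")
        if self.structure not in STRUCTURES:
            raise ValueError(f"invalid structure: {self.structure!r}")


accomp = TagRecord(
    tagname="ACCOMP",
    full_name="accompanying material",
    feature="describes any material that accompanies the document",
    binding="bound",
    structure="unstructured",
    data_description="Free format",
    usage="optional",
    arity="not repeatable",
    parent_tags=["PHYS"],
    default_content="none",
    example="<ACCOMP>folding map and three diskettes</ACCOMP>",
    language="Formex",
    source="p.120 Formex Manual",
    date_added=date(1989, 11, 13),
    added_by="VAM",
    date_updated=date(1989, 11, 30),
    updated_by="MSM",
)
```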

5.2. SGML Tags for Tag Descriptions

These pieces of information can, of course, also be expressed in an SGML syntax using the following element declarations:

<!ELEMENT tagdef        - -  (tagname,
                             fullname,
                             equivalent+,
                             description,
                             binding,
                             structure,
                             datadesc,
                             usage,
                             arity,
                             parents,
                             default,
                             example+,
                             language,
                             source,
                             added,
                             updated)                           >
<!ELEMENT equivalent    - -  (language, tagname)                >
<!ELEMENT (added,
          updated)      - -  (date, by)                         >
 
<!ELEMENT (tagname,
          fullname,
          description,
          binding,
          structure,
          datadesc,
          usage,
          arity,
          parents,
          default,
          language,
          source,
          date,
          by)           - o  (#PCDATA)                          >
<!ELEMENT example       - o  EMPTY                              >
<!ATTLIST example
          file               ENTITY              #IMPLIED       >
 

The expectation is that the file attribute of the example element would refer to a CDATA entity (for example, an external file) containing an example of the use of the tag in question. Alternatively, the tag could be defined thus:

<!ELEMENT example       - o  (#PCDATA)                          >

with the expectation that the element would contain:
  • a reference to a CDATA entity
  • a CDATA marked section
  • an example in which the SGML delimiters have been rendered by entity references, so that SGML does not recognize them as delimiters.
  • an example in a concrete syntax other than the one in use in the tag-description document itself
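The third of these options is easy to automate. A minimal Python sketch, assuming the tag-description document declares the entities amp, lt and gt (for instance by referring to the ISO public entity set ISOnum); the function name is hypothetical. Escaping “>” is not strictly required but keeps the example symmetrical.

```python
def escape_delimiters(example: str) -> str:
    """Render SGML delimiter characters in an example as entity references.

    Assumes the entities amp, lt and gt are declared in the tag-description
    document (e.g. by referring to the ISO public entity set ISOnum).
    """
    # Replace "&" first so the ampersands introduced below are not escaped again.
    return example.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


print(escape_delimiters("<ACCOMP>folding map & three diskettes</ACCOMP>"))
# &lt;ACCOMP&gt;folding map &amp; three diskettes&lt;/ACCOMP&gt;
```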

6. Extension of TAGS Design to Other Markup Conventions

As noted above, not all markup information is conveyed by SGML tags; some is conveyed by SGML attributes, by parameters to tags in non-SGML languages, by special symbolic names (corresponding to standard entity names in SGML) or by other conventions.
To record a broader range of markup conventions in a database we must broaden the field descriptions above by understanding them to apply to whatever convention is being recorded, not just to conventions which correspond to SGML tags. SGML attributes, for example, will always be classed as bound and the generic identifier of the elements for which they are defined will be given under parent tags.
Two further pieces of information are also required:
tag class (or native class)
What type of thing is this, in the classification used by the markup language being described? For SGML tag sets, the answers would be “tag”, “attribute”, “entity”, “short reference”, etc. For others, the preferred terms may be “command”, “parameter”, “control word”, etc.
attributes
Just as the data description of a tag should contain a list of elements which are expected to occur within it, the attributes field should contain a list of attributes defined for the tag, if the tag is from an SGML markup language, or in other cases a list of parameters (or whatever term is used).
These can be expressed in SGML using the tag tagclass, which should be inserted in the list above after fullname, and the tag attributes, which should be inserted before default.

A. Column Definitions for a TAGS Database in a DBMS

The TAGS database design described above has been implemented in Chicago in the Waterloo file management system WatFile.[8] The column names, column widths, and data types are specified in the following WatFile file definition. Data types used are “L” (character data, left-aligned) and “YMD” (date in YY/MM/DD form). The column widths are by no means sacred and need not be respected in other implementations; the database for TEI tags, especially, will need wider columns for descriptions and examples. (WatFile has a system maximum of 79 characters per column.)
%WATFILE/Plus V3.5  Saved 89/12/05 17:46:27
define Tag Name         = 16   L
define Full Name        = 32   L
define Native Class     = 12   L
define Description      = 72   L
define Bound            =  5   L
define Structured       =  5   L
define Data Description = 72   L
define Usage            = 64   L
define Arity            = 64   L
define Parents          = 32   L
define Attributes       = 32   L
define Default Content  = 32   L
define Example          = 72   L
define Source           = 20   L
define AddDt            =  8   YMD
define AddBy            =  3   L
define UpdDt            =  8   YMD
define UpdBy            =  3   L
title "FORMEX SGML  89/11/09"
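Because WatFile caps each column at 79 characters, records prepared elsewhere may need to be checked against the definition before loading. The following Python sketch (hypothetical helper names; it assumes every column is declared on a define line of exactly the form shown above, and ignores all other lines) reads the column definitions and reports which values in a record are too long. The five-character Bound column, for instance, accepts “half” but not “bindable”.

```python
import re

MAX_WIDTH = 79  # WatFile's system maximum per column, as stated above

# A column definition such as "define Tag Name         = 16   L".
DEFINE = re.compile(r"define\s+(?P<name>.+?)\s*=\s*(?P<width>\d+)\s+(?P<type>\w+)\s*$")


def parse_columns(definition: str) -> list:
    """Return (column name, width, data type) for each "define" line."""
    columns = []
    for line in definition.splitlines():
        match = DEFINE.match(line.strip())
        if match:
            width = int(match["width"])
            if width > MAX_WIDTH:
                raise ValueError(f"{match['name']}: width {width} exceeds {MAX_WIDTH}")
            columns.append((match["name"], width, match["type"]))
    return columns


def overlong_fields(columns: list, record: dict) -> list:
    """Names of the columns whose value in record is longer than the defined width."""
    return [name for name, width, _ in columns if len(record.get(name, "")) > width]


definition = """%WATFILE/Plus V3.5  Saved 89/12/05 17:46:27
define Tag Name         = 16   L
define Bound            =  5   L
define AddDt            =  8   YMD
title "FORMEX SGML  89/11/09"
"""
columns = parse_columns(definition)
print(columns)
print(overlong_fields(columns, {"Tag Name": "AB", "Bound": "bindable"}))
```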

B. Extract from Formex Tag Database

As an illustration, the tag name, full name, binding, and structure columns for each record in the Formex tag database in Chicago are given below.
Tag Name  Full Name                         Bound  Structured
AB        abstract                          ?      no
ACCOMP    accompanying material             yes    no
AD        address                           yes    no
AF        affiliation                       yes    yes
AN        authority number                  yes    typed
BINDING   binding                           yes    typed
BLKn      document block                    yes    yes
BODY      corporate body                    ?      yes
CCF       CCF data                          ?      yes
CLASS     classification scheme notation    ?      typed
COLn      column heading in a table         yes    yes
CY        country                           yes    typed
DATE      date                              half   no
DIM       dimensions of the item            yes    no
ED        edition statement                 ?      no
EXPL      explanatory note                  no     no
FRAGMENT  page fragment                     yes    typed
HT        highlighted text                  no     no
ID        document identification number    ?      typed
ITM       item or figure in a table         yes    no
LOC       location                          yes    no
MAT       type of material                  ?      typed
MEDIUM    physical medium                   yes    typed
MEETING   meeting                           ?      yes
NA        name                              yes    no
NO        number of meeting                 yes    typed
NOTE      note                              yes    no
OT        other part of name                yes    no
PAGE      pagination defining a part        yes    typed
PART      part statement                    ?      no
PERSON    person                            ?      yes
PHYS      physical description              no     yes
PIC       picture                           no     typed
PIECES    number of pieces and designation  yes    no
PRICE     price of the item                 yes    typed
QT        quotation                         no     no
QUAL      qualifier                         yes    no
REF       bibliographic reference           no     no
ROWn      row heading in a table            yes    yes
SO        Agency at source of record        ?      typed
SUBJECT   description of subject            ?      typed
TARIFF    tariff                            ?      yes
TBL       table                             no     yes
TI        title                             no     no
WEIGHT    weight                            yes    typed

C. Sample SGML-tagged Tag Descriptions for Formex

The full database information for a few tags is given below, in SGML form.
[In the electronic form of this document, the SGML tag examples given below must remain uninterpreted. In a conformant SGML document, such uninterpreted data might be placed in an external entity declared as CDATA, within a marked section using the CDATA keyword, or within an element whose content is declared CDATA or RCDATA. It is not clear which of these possibilities will be preferred by the TEI Metalanguage committee. The technique used here is to refer to an external file containing the material to remain uninterpreted. As a result, two files are included in the electronic distribution of this document. You are now reading one of them. The other is to be embedded immediately following this note.]
     <tagdef>
        <tagname>AB
        <fullname>abstract
        <native>tag
        <description>A brief description of the contents of the item.
        <bound>?
        <structured>no
        <datadesc>Free format.  Attribute LA.
        <usage>optional
        <arity>repeatable for each language
        <parents>?
        <attributes>LA
        <default>none
        <example>
             <ab LA = DE>Text of abstract ... </ab>
        <source>p.119 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>ACCOMP
        <fullname>accompanying material
        <native>tag
        <description>describes any material that accompanies the document
        <bound>yes
        <structured>no
        <datadesc>Free format
        <usage>optional
        <arity>not repeatable
        <parents>PHYS
        <attributes>
        <default>none
        <example>
            <ACCOMP>folding map and three diskettes</ACCOMP>
        <source>p.120 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>AD
        <fullname>address
        <native>tag
        <description>postal address of a corporate body or private address of
        a person
        <bound>yes
        <structured>no
        <datadesc>free format?
        <usage>optional
        <arity>not repeatable inside an AF, BODY, or PERSON group
        <parents>AF, BODY, PERSON
        <attributes>
        <default>none
        <example>
            <AD>5, rue du Commerce, L-2410 Luxembourg</AD>
        <source>p.121 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>AF
        <fullname>affiliation
        <native>tag
        <description>name and/or address of organization to which a person is
        associated
        <bound>yes
        <structured>yes
        <datadesc>Contains AD, AN, CY, NA (either AN or NA mandatory)
        <usage>optional
        <arity>repeatable for each person or affiliation
        <parents>PERSON
        <attributes>
        <default>none
        <example>
            <AF><NA>Office for ...<OT>New Techn....</OT></NA><AD>...</AD></AF>
        <source>p.122 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>AN
        <fullname>authority number
        <native>tag
        <description>a unique number code assigned to a corporate body or
        group
        <bound>yes
        <structured>typed
        <datadesc>use AN if there is an assigned code; else use NA and name
        <usage>optional
        <arity>not repeatable inside a group
        <parents>AF, BODY, PERSON
        <attributes>
        <default>none
        <example>(none given)
        <source>p.123 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/12/05 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>BINDING
        <fullname>binding
        <native>tag
        <description>physical binding:  stapled, hardcover, softcover ...
        <bound>yes
        <structured>typed
        <datadesc>Code taken from appendix D10
        <usage>optional
        <arity>not repeatable
        <parents>PHYS
        <attributes>
        <default>none
        <example>
            <phys> ... <binding>BR</binding> ... </phys>
        <source>p.124 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>BLKn
        <fullname>document block
        <native>tag
        <description>block of text of the document itself, exclusive of
        secondary information
        <bound>yes
        <structured>yes
        <datadesc>free format; doc body divided into nested blocks.  Attr. ID,
        LA, TYPE
        <usage>optional
        <arity>repeatable
        <parents>BLK(n-1)
        <attributes>ID, LA, TYPE
        <default>none
        <example>
            <BLK1><TI>Commission Regulation ...</TI><BLK2 TYPE=preamble>The ...
        <source>p.125 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>VAM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>BODY
        <fullname>corporate body
        <native>tag
        <description>corp. body responsible for the work
        <bound>?
        <structured>yes
        <datadesc>Contains AD, AN, CY, LOC, NA (AN or NA is mandatory).  Attr:
        RESP, ROLE
        <usage>mandatory when applicable
        <arity>repeatable for each corporate body
        <parents>free
        <attributes>RESP, ROLE
        <default>none
        <example>
            <BODY resp=1 role=260><NA>Office for Official ...</NA> ...</BODY>
        <source>p.126 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>VAM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>CCF
        <fullname>CCF data
        <native>tag
        <description>Delimits purely CCF data to be ignored by SGML
        <bound>?
        <structured>yes
        <datadesc>CCF data
        <usage>optional
        <arity>repeatable
        <parents>free
        <attributes>
        <default>none
        <example>(none given)
        <source>p.127 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>VAM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>CLASS
        <fullname>classification scheme notation
        <native>tag
        <description>notation assigned to item according to classification
        scheme
        <bound>?
        <structured>typed
        <datadesc>use notation defined by classification scheme
        <usage>optional
        <arity>repeatable for each classification scheme
        <parents>?
        <attributes>SCHEME
        <default>none
        <example>
            <class scheme='Dewey 1982'>812.23</class>
        <source>p.128 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>VAM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>COLn
        <fullname>column heading in a table
        <native>tag
        <description>column heading in a table
        <bound>yes
        <structured>yes
        <datadesc>free format + seq. of lower-ranked columns.  Attr:  ID,
        DATA, LA
        <usage>mandatory
        <arity>repeatable
        <parents>TBL
        <attributes>ID, DATA, LA
        <default>none
        <example>
            <TBL>...<COL1>Country<COL1>Direct costs<COL2>Total ... </TBL>
        <source>p.129 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>VAM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>CURRENCY
        <fullname>currency
        <native>attribute
        <description>the currency unit in which the amount is expressed
        <bound>yes
        <structured>yes
        <datadesc>a code taken from Appendix D11
        <usage>mandatory
        <arity>not repeatable
        <parents>PRICE
        <attributes>
        <default>ECU
        <example>
            <PRICE currency=esc>50</PRICE>
        <source>p.153 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/12/05 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>CY
        <fullname>country
        <native>tag
        <description>country where person, corporate body, or meeting is
        situated
        <bound>yes
        <structured>typed
        <datadesc>a code taken from Appendix D2, or free form.  Attr:  STD
        <usage>optional
        <arity>not repeatable
        <parents>AF, BODY, MEETING
        <attributes>STD
        <default>none
        <example>
            <BODY><NA>Office ...</NA><AD>5, rue du ...</AD><CY>LU</CY></BODY>
        <source>p.130 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/12/05 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>DATA
        <fullname>Type of data in column
        <native>attribute
        <description>nature of data in the column (data-type) -- will guide
        presentation
        <bound>yes
        <structured>no
        <datadesc>short name describing data:  integer, decimal number,
        string, code ...
        <usage>optional
        <arity>not repeatable
        <parents>COLn
        <attributes>
        <default>data-type of next higher level
        <example>(none given)
        <source>p.129 Formex Manual
        <added> <date>89/11/30 <by>MSM </added>
        <updated> <date>89/12/05 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>DATE
        <fullname>date
        <native>tag
        <description>calendar date (of publication or other event)
        <bound>half
        <structured>no
        <datadesc>ISO std form, or free form.  Attributes:  LA, STD, TYPE
        <usage>mandatory
        <arity>repeatable for different dates
        <parents>MEETING
        <attributes>LA, STD, TYPE
        <default>none
        <example>
            <date>19850000</date><date man>19850501</date>
        <source>p.131 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/12/05 <by>MSM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>DIM
        <fullname>dimensions of the item
        <native>tag
        <description>size in centimeters
        <bound>yes
        <structured>no
        <datadesc>customary to enter only height or height times width
        <usage>optional
        <arity>not repeatable
        <parents>PHYS
        <attributes>
        <default>none
        <example>
            <PHYS LA=EN><MEDIUM> ... </> ... <DIM>23 cm.</DIM> ... </PHYS>
        <source>p.132 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>VAM </updated>
     </tagdef>
 
     <tagdef>
        <tagname>ED
        <fullname>edition statement
        <native>tag
        <description>edition number and/or identifier for monograph or
        collection
        <bound>?
        <structured>no
        <datadesc>As given in the item, abbreviating; incl. word 'edition'.
        Attr: LA
        <usage>mandatory
        <arity>repeatable for multiple or parallel edition statements
        <parents>free?
        <attributes>LA
        <default>none
        <example>
            <ed la=en>Braille edition</ed>
        <source>p.133 Formex Manual
        <added> <date>89/11/13 <by>VAM </added>
        <updated> <date>89/11/30 <by>MSM </updated>
     </tagdef>
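Each record above puts one field per line, and a long value simply wraps onto the following lines. That regularity means the appendix can be loaded mechanically into the kind of database the paper proposes. The Python sketch below is one minimal way to do it. It rests on a few assumptions. The layout is inferred from the records themselves, not from any formal specification. Only the field names that actually occur above are treated as field starters, so example markup such as `<class ...>` or `<date>` is read as continuation text. It is not a general SGML parser, because the records omit end tags that a real parser would resolve from a DTD. The names `parse_tagdefs`, `FIELDS`, `FIELD_RE`, and `STAMP_RE` are hypothetical.

```python
import re

# Field names as they occur in the tagdef records above (assumed to be the
# complete set). A line beginning with one of these opens a new field; any
# other non-blank line continues the field opened most recently.
FIELDS = (
    "tagname", "fullname", "native", "description", "bound", "structured",
    "datadesc", "usage", "arity", "parents", "attributes", "default",
    "example", "source", "added", "updated",
)
FIELD_RE = re.compile(r"<(%s)>(.*)" % "|".join(FIELDS))
# The <added> and <updated> fields carry "<date>YY/MM/DD <by>INITIALS".
STAMP_RE = re.compile(r"<date>(\S+)\s+<by>(\S+)")


def parse_tagdefs(text):
    """Return one dict per <tagdef> ... </tagdef> record found in text."""
    records = []
    for body in re.findall(r"<tagdef>(.*?)</tagdef>", text, re.S):
        record, current = {}, None
        for line in body.splitlines():
            line = line.strip()
            if not line:
                continue
            match = FIELD_RE.match(line)
            if match:
                current = match.group(1)
                record[current] = match.group(2).strip()
            elif current is not None:
                # Continuation line: join it to the previous field's value.
                record[current] = (record[current] + " " + line).strip()
        # Split the date/author stamps into their two parts.
        for key in ("added", "updated"):
            stamp = STAMP_RE.search(record.get(key, ""))
            if stamp:
                record[key] = {"date": stamp.group(1), "by": stamp.group(2)}
        records.append(record)
    return records
```

If this parser were applied to the records in this part of the appendix, filtering the result on `native == "attribute"` would pick out CURRENCY and DATA. Those are the two entries here that describe attributes rather than tags.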

Notes

[1] For Formex, see C. Guittet, ed., Formex: Formalized Exchange of Electronic Publications (Luxembourg: Office for Official Publications of the European Communities, “New Technologies -- Project Management” Department, 1984); for the starter set, see ISO 8879-1986 Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML) ([n.p.]: ISO, 1986), Annex E.1, pp. 136-139. The starter set is discussed, and various national-language versions of it are defined, in ISO technical report ISO/TR 9573-1988(E) Information processing -- SGML support facilities -- Techniques for using SGML.
[2] More rigorously, the DTD directly determines only a name set: the set of tag names declared in the DTD. Taken in conjunction with the semantic rules of the encoding scheme, however, the name set uniquely determines a tag set, so within the context of a specific markup language, a DTD does uniquely determine a tag set.
[3] Examples of synonymous tags have already been given. Use of the same tag name for different features is rarer but does occur. Formex, for example, uses a BODY tag to mark names of corporate bodies; the ISO starter set and other tag sets use the same tag name to mark the main body of a document. In our terms, Formex and the starter set are using the same tag name but not the same tag, because they assign different features to it.
[4] This process may be referred to informally as tompering with the DTD.
[5] In a tompered feature syntax, all features are unbound.
[6] In document TEI TRR7, Steven DeRose proposes the term crystals for structured features, but as he uses it the term appears also to apply to some typed features.
[7] In a normal-form relational database, this information should go into a separate table with columns for feature name (with some canonical name for the feature), markup language, tag name, and comments. In a hierarchical database, it should go either in a separate file with fields as just given, or in a repeating structure comprising fields for markup language, tag name, and comments.
[8] Terry Wilkinson, WATFILE/Plus Data Manipulation System Tutorial and Reference, Third Edition 1987 (Waterloo, Ont.: WATCOM Publications Limited, 1987). The program itself was written by T.G. Galvin, S.G. McDowell, and T.A. Wilkinson.
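The relational layout suggested in note 7 also makes the distinction drawn in note 3 easy to query. Synonymous tags share a feature but differ in tag name. A reused tag name pairs one name with different features, and so in the paper's terms names different tags. The sketch below is a minimal illustration. It assumes SQLite, through Python's standard `sqlite3` module, as a stand-in for whatever database system is actually used. The canonical feature labels (such as `graphic-location`) and the function names are hypothetical. The rows are only the pairings cited in section 2 and note 3. Tag names are compared case-insensitively, following the upper-case name folding of SGML's reference concrete syntax.

```python
import sqlite3

# Hypothetical rows: (canonical feature, markup language, tag name, comments).
ROWS = [
    ("graphic-location", "Formex", "pic", None),
    ("graphic-location", "ISO starter set", "artwork", None),
    ("document-number", "Formex", "id", None),
    ("document-number", "ISO starter set", "docnum", None),
    ("corporate-body-name", "Formex", "BODY", None),
    ("document-main-body", "ISO starter set", "body", None),
]


def build_feature_table(rows):
    """Create an in-memory table with the four columns suggested in note 7."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE feature_tag ("
        " feature TEXT NOT NULL,"
        " markup_language TEXT NOT NULL,"
        " tag_name TEXT NOT NULL COLLATE NOCASE,"  # case-insensitive names
        " comments TEXT)"
    )
    db.executemany("INSERT INTO feature_tag VALUES (?, ?, ?, ?)", rows)
    return db


def synonyms(db):
    """Pairs of differently named tags that mark the same feature."""
    return db.execute(
        "SELECT a.feature, a.markup_language, a.tag_name,"
        "       b.markup_language, b.tag_name"
        " FROM feature_tag a JOIN feature_tag b"
        "   ON a.feature = b.feature AND a.markup_language < b.markup_language"
        " WHERE a.tag_name <> b.tag_name"
        " ORDER BY a.feature"
    ).fetchall()


def homonyms(db):
    """Pairs sharing a tag name but assigning it different features."""
    return db.execute(
        "SELECT a.tag_name, a.markup_language, a.feature,"
        "       b.markup_language, b.feature"
        " FROM feature_tag a JOIN feature_tag b"
        "   ON a.tag_name = b.tag_name AND a.markup_language < b.markup_language"
        " WHERE a.feature <> b.feature"
        " ORDER BY a.tag_name"
    ).fetchall()
```

With these rows, `synonyms` reports the pic/artwork and id/docnum pairs. `homonyms` reports the two BODY tags of note 3, which share a name but not a feature.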