1 World Wide Web Consortium / MIT Laboratory for Computer Science
cmsmcq@acm.org
2 University of Bergen / Humanistic Information Technology Centre
Claus.Huitfeldt@hit.uib.no
[[This paper was published in DDEP-PODDP 2000, ed. P. King and E.V. Munson, Lecture Notes in Computer Science 2023 (Berlin: Springer, 2004), pp. 139-160. The version given here was generated from a pre-final version of the text, so it may differ in some minor details from the published volume. Footnote and bibliographic references are styled slightly differently. Post-publication additions and corrections are enclosed in double square brackets.]]
<lg> <l>Summer grass --</l> <l>all that's left</l> <l>of warriors' dreams.</l> </lg>
<p><s><del>Der Anblick</del> <add>Das Bild </add><del>der</del> <add>einer</add> menschlichen Gestalt sowie die menschliche Gestalt selbst sind uns wohlvertraute Gegenstände.</s> <s>Von einem Wiedererkennen aber ist hier keine Rede.</s></p>
The start-tags mark the beginning of a passage characterized by some feature (e.g. being a paragraph, or having been deleted); the matching end-tags mark the end of the passage. In a markup language with conventional interpretation, the label on each node in the tree identifies some feature possessed by the subtree it dominates. Sometimes the feature is possessed by the subtree as a unit, as with the p element here; sometimes it is inherited by each part of the subtree, as with the del and add elements here.
AASE: Peer, du lyver!
PEER GYNT: Nej, jeg gjør ej!
AASE: Naa, saa band paa, det er sant!
PEER GYNT: Hvorfor bande?
AASE: Tvi, du tør ej!
Alt ihob er Tøv og Tant!
In English:
AASE: Peer, you're lying!
PEER GYNT: No, I'm not!
AASE: Well then, swear to me it's true.
PEER GYNT: Swear? why should I?
AASE: See, you dare not!
Every word of it's a lie.
<sp who='Aase'> Peer, you're lying! </sp> <sp who='Peer'> No, I'm not! </sp> <sp who='Aase'> Well then, swear to me it's true! </sp> <sp who='Peer'> Swear? why should I? </sp> <sp who='Aase'> See, you dare not! Every word of it's a lie! </sp>
<l> Peer, you're lying! No, I'm not! </l> <l> Well then, swear to me it's true! </l> <l> Swear? why should I? See, you dare not! </l> <l> Every word of it's a lie! </l>
the metrical structure in Fig. 4.
<l> <sp who='Aase'> Peer, you're lying! </sp> <sp who='Peer'> No, I'm not! </sp> </l> <l> <sp who='Aase'> Well then, swear to me it's true! </sp> </l> <l> <sp who='Peer'> Swear? why should I? </sp> <sp who='Aase'> See, you dare not! </l> <l> Every word of it's a lie! </l> </sp>
The failure of the elements to nest properly makes it impossible to create an SGML document type definition (DTD) for the document, and thus impossible to use SGML or XML tools to process it.
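The nesting failure can be observed directly with an off-the-shelf XML parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the `<div>` wrapper and the helper name `is_well_formed` are hypothetical and are introduced only so that the fragment has a single root element.

```python
import xml.etree.ElementTree as ET

# The overlapping encoding from the example above, wrapped in a
# hypothetical <div> root (an assumption, not part of the original).
OVERLAPPING = """<div>
<l> <sp who='Aase'> Peer, you're lying! </sp> <sp who='Peer'> No, I'm not! </sp> </l>
<l> <sp who='Aase'> Well then, swear to me it's true! </sp> </l>
<l> <sp who='Peer'> Swear? why should I? </sp> <sp who='Aase'> See, you dare not! </l>
<l> Every word of it's a lie! </l> </sp>
</div>"""


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised here because </l> arrives while <sp> is still open.
        return False


if __name__ == "__main__":
    print(is_well_formed(OVERLAPPING))
```

The parser stops at the first `</l>` that closes a line while a speech is still open, so no tree is ever built for the document.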
<div1 type="act"> <(D,V)div2 type="scene"> <(V)l> <(D)sp who='Aase'> Peer, you're lying! </(D)sp> <(D)sp who='Peer'> No, I'm not! </(D)sp> </(V)l> <(V)l> <(D)sp who='Aase'> Well then, swear to me it's true! </(D)sp> </(V)l> <(V)l> <(D)sp who='Peer'> Swear? why should I? </(D)sp> <(D)sp who='Aase'> See, you dare not! </(V)l> <(V)l> Every word of it's a lie!</(D)sp> </(V)l> ... </(D,V)div2> </div1>
Note that each tag carries, in parentheses, the names of the root elements of the concurrent document types to which it applies; when the tag belongs to all concurrent views, the list may be exhaustive, as shown in the <div2> tags, or omitted, as shown in the <div1> tags.
<sp who='Aase'><lb n="1"/> Peer, you're lying! </sp> <sp who='Peer'> No, I'm not! </sp> <sp who='Aase'><lb n="2"/> Well then, swear to me it's true! </sp> <sp who='Peer'><lb n="3"/> Swear? why should I? </sp> <sp who='Aase'> See, you dare not! <lb n="4"/> Every word of it's a lie! </sp>
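In the encoding above the verse lines are not elements at all; only their starting points are marked, by empty `<lb/>` elements. A minimal sketch of how the lines could be recovered from such markers follows. It assumes Python's standard `xml.etree.ElementTree`, a hypothetical `<div>` wrapper to give the fragment a single root, and whitespace normalization when the pieces are joined; none of these come from the original text.

```python
import xml.etree.ElementTree as ET

# Fragment from the example above, inside a hypothetical <div> root.
DOC = """<div><sp who='Aase'><lb n="1"/> Peer, you're lying! </sp>
<sp who='Peer'> No, I'm not! </sp>
<sp who='Aase'><lb n="2"/> Well then, swear to me it's true! </sp>
<sp who='Peer'><lb n="3"/> Swear? why should I? </sp>
<sp who='Aase'> See, you dare not! <lb n="4"/> Every word of it's a lie! </sp></div>"""


def lines_from_milestones(xml_text: str) -> dict:
    """Group character data by the most recent <lb/> seen in document order."""
    root = ET.fromstring(xml_text)
    pieces = {}      # line number -> list of text fragments
    current = None   # value of n on the latest <lb/>, if any

    def walk(el):
        nonlocal current
        if el.tag == "lb":
            current = el.get("n")
            pieces[current] = []
        elif el.text and current is not None:
            pieces[current].append(el.text)
        for child in el:
            walk(child)
            # Text following a child belongs to whatever line is now open.
            if child.tail and current is not None:
                pieces[current].append(child.tail)

    walk(root)
    # Collapse runs of whitespace so each line reads as plain text.
    return {n: " ".join(" ".join(parts).split()) for n, parts in pieces.items()}


if __name__ == "__main__":
    for n, text in lines_from_milestones(DOC).items():
        print(n, text)
```

The speeches remain ordinary elements, while each verse line exists only as the stretch of text between two markers and has to be reassembled by the application.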
<sp who="Aase"> <l part="i">Peer, you're lying!</l> </sp> <sp who="Peer"> <stage>without stopping</stage> <l part="f">No, I'm not!</l> </sp> <sp who="Aase"> <l part="n">Well then, swear to me it's true!</l> </sp> <sp who="Peer"> <l part="i">Swear? Why should I?</l> </sp> <sp who="Aase"> <l part="f">See, you dare not!</l> <l part="n">Every word of it's a lie.</l> </sp>
<sp who="Aase"> <l id="L1a">Peer, you're lying!</l> </sp> <sp who="Peer"> <stage>without stopping</stage> <l id="L1b">No, I'm not!</l> </sp> <sp who="Aase"> <l id="L2">Well then, swear to me it's true!</l> </sp> <sp who="Peer"> <l id="L3a">Swear? Why should I?</l> </sp> <sp who="Aase"> <l id="L3b">See, you dare not!</l> <l id="L4" >Every word of it's a lie.</l> </sp>
The <join> elements themselves each signal the existence of a virtual element of type <l>, created by concatenating the elements pointed at by the targets attribute:
<joinGrp result="l" targOrder="y" targType="L"> <join scope="branches" targets="L1a L1b"/> <join scope="branches" targets="L3a L3b"/> </joinGrp>
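The concatenation that each <join> calls for can be sketched as follows. This is an illustration only: it assumes Python's standard `xml.etree.ElementTree`, a hypothetical `<div>` wrapper holding both the speeches and the <joinGrp>, and a single space as the separator between joined fragments. The function name `virtual_lines` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Speeches and joins from the examples above, in a hypothetical <div> root.
DOC = """<div>
<sp who="Aase"> <l id="L1a">Peer, you're lying!</l> </sp>
<sp who="Peer"> <stage>without stopping</stage> <l id="L1b">No, I'm not!</l> </sp>
<sp who="Aase"> <l id="L2">Well then, swear to me it's true!</l> </sp>
<sp who="Peer"> <l id="L3a">Swear? Why should I?</l> </sp>
<sp who="Aase"> <l id="L3b">See, you dare not!</l>
<l id="L4">Every word of it's a lie.</l> </sp>
<joinGrp result="l" targOrder="y" targType="L">
<join scope="branches" targets="L1a L1b"/>
<join scope="branches" targets="L3a L3b"/>
</joinGrp>
</div>"""


def virtual_lines(xml_text: str) -> list:
    """Build the text of each virtual line by following join targets."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    result = []
    for join in root.iter("join"):
        targets = join.get("targets").split()
        parts = ["".join(by_id[t].itertext()).strip() for t in targets]
        result.append(" ".join(parts))  # separator is an assumption
    return result


if __name__ == "__main__":
    for line in virtual_lines(DOC):
        print(line)
```

The virtual lines never appear in the parsed tree; they exist only after an application has resolved the pointers.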
<sp/<speaker/Aase/speaker> <l/ Peer, you're lying! /sp> <sp/<speaker/Peer/speaker> No, I'm not! /l> /sp> <sp/<speaker/Aase/speaker> <l/ Well then, swear to me it's true! /l> /sp> <sp/<speaker/Peer/speaker> <l/ Swear? Why should I? /sp> <sp/<speaker/Aase/speaker> See, you dare not! /l> <l/ Every word of it's a lie! /l> /sp>
<s/<a/ John <b/ likes /a> Mary /b>/s>
Overlap can be represented by graphs that are very like trees, but in which nodes may have multiple parents.
<s/<a/ John <b/ likes /a> Mary /b>/s>
we will process, in turn, eleven segments:
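The segmentation can be sketched in Python as follows. The sketch assumes that start-tags have the form `<name/`, that end-tags have the form `/name>`, and that the (possibly empty) string between any two adjacent tags counts as a segment of its own. The regular expression and the function name `segments` are illustrative and are not taken from any MECS software.

```python
import re

# Start-tags look like <name/ and end-tags like /name> (assumed syntax).
TAG = re.compile(r"<(\w+)/|/(\w+)>")


def segments(text: str) -> list:
    """Split `text` into tags and the character data between them.

    Character data between two adjacent tags is reported even when it is
    the empty string. Text before the first tag or after the last tag is
    ignored in this sketch.
    """
    out = []
    pos = 0
    for i, match in enumerate(TAG.finditer(text)):
        if i > 0:
            out.append(("text", text[pos:match.start()]))
        if match.group(1):
            out.append(("start", match.group(1)))
        else:
            out.append(("end", match.group(2)))
        pos = match.end()
    return out


if __name__ == "__main__":
    for kind, value in segments("<s/<a/ John <b/ likes /a> Mary /b>/s>"):
        print(kind, repr(value))
```

For the sample document this yields six tags and five character-data segments, two of which are empty, for eleven segments in all.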
There are no doubly dominated nodes, so no further arcs are removed. After the character-data segment for " Mary " is added, processing the end-tag for the <b> element does not cause any arcs to be removed. Once the empty string between the <b> and <s> end-tags has been processed, the graph is as shown in Fig. 11.
At this point, we remove the direct arcs from s to the leaf nodes other than the first and last. The result is shown in Fig. 12.
<a/ John <b/ likes /a> Mary /b>
If any one of the regions is empty, then the overlap is spurious: the document can be rewritten without overlap, without changing the interpretation of any character of the document. Here are three examples of spurious overlap.
<a/<b/ John likes /a> Mary /b> <a/ John likes <b//a> Mary /b> <a/ John likes <b/ Mary /a>/b>
<b/<a/ John likes /a> Mary /b> <a/ John likes /a><b/ Mary /b> <a/ John likes <b/ Mary /b>/a>
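The emptiness test for spurious overlap can be sketched as follows. The sketch assumes the simplest case only: a single pair of overlapping elements in which the first element opens before the second and closes before it, with no other tags inside the three regions. The names `overlap_regions` and `is_spurious` are hypothetical.

```python
import re

# Start-tags look like <name/ and end-tags like /name> (assumed syntax).
TAG = re.compile(r"<(\w+)/|/(\w+)>")


def overlap_regions(text: str, first: str, second: str) -> tuple:
    """Return the three regions of `first` overlapping `second`.

    The regions are: text only in `first`, text in both elements, and
    text only in `second`.
    """
    found = {}
    for match in TAG.finditer(text):
        if match.group(1):
            found[("start", match.group(1))] = match
        else:
            found[("end", match.group(2))] = match
    first_start, second_start = found[("start", first)], found[("start", second)]
    first_end, second_end = found[("end", first)], found[("end", second)]
    return (
        text[first_start.end():second_start.start()],
        text[second_start.end():first_end.start()],
        text[first_end.end():second_end.start()],
    )


def is_spurious(text: str, first: str = "a", second: str = "b") -> bool:
    """An overlap is spurious if any one of its three regions is empty."""
    return any(region == "" for region in overlap_regions(text, first, second))


if __name__ == "__main__":
    print(is_spurious("<a/ John <b/ likes /a> Mary /b>"))
    print(is_spurious("<a/<b/ John likes /a> Mary /b>"))
```

Under these assumptions the original example, with " John ", " likes " and " Mary " in its three regions, is a genuine overlap, while each of the three examples of spurious overlap has one empty region.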
1. Association for Computers and the Humanities (ACH), Association for Computational Linguistics (ACL), and Association for Literary and Linguistic Computing (ALLC). 1994. Guidelines for Electronic Text Encoding and Interchange (TEI P3), ed. C. M. Sperberg-McQueen and Lou Burnard. Chicago, Oxford: TEI, 1994.
2. Barnard, David, Ron Hayter, Maria Karababa, George Logan, and John McFadden. 1988. “SGML-based markup for literary texts: Two problems and some solutions.” Computers and the Humanities 22: 265-276.
3. Barnard, David T., Lou Burnard, Jean-Pierre Gaspart, Lynne A. Price, C. M. Sperberg-McQueen, and Giovanni Battista Varile. 1995. “Hierarchical encoding of text: Technical problems and SGML solutions.” Computers and the Humanities 29: 211-231.
4. Bray, Tim, Jean Paoli, and C. M. Sperberg-McQueen, ed. 1998. Extensible Markup Language (XML) 1.0. [Cambridge, Mass., Sophia-Antipolis, Tokyo]: World Wide Web Consortium, 1998. [[3d ed. 2004 edited by the above with Eve Maler and François Yergeau. Available at http://www.w3.org/TR/REC-xml/]]
5. Goldfarb, Charles F. 1990. The SGML Handbook. Oxford: Clarendon Press, 1990.
6. International Organization for Standardization (ISO). 1986. ISO 8879: Information processing — Text and office systems — Standard Generalized Markup Language (SGML). [Geneva]: ISO, 1986.
7. McKelvie, D., C. Brew, and H. S. Thompson. 1998. “Using SGML as a basis for data-intensive natural language processing.” Computers and the Humanities 31: 367-388.
8. Murata, M. 1995. “File format for documents containing both logical structures and layout structures.” Electronic publishing 8: 295-317.
9. Sperberg-McQueen, C. M., and Claus Huitfeldt. 1999. “Concurrent document hierarchies in MECS and SGML.” Literary & Linguistic Computing 14.1: 29-42.
10. Sperberg-McQueen, C. M., Claus Huitfeldt, and Allen Renear. 2000. “Meaning and Interpretation of Markup.” Markup Languages Theory & Practice [forthcoming]. [[Published in 2.3 (2000): 215-234. Available at http://www.w3.org/People/cmsmcq/2000/mim.html]] Paper originally presented at ALLC/ACH 2000, Glasgow, and at Extreme Markup Languages 2000, Montréal.