SGML/XML and corpora |
SGML |
XML |
Who is the W3C? |
W3C goals and operating principles |
Markup |
Markup languages, SGML, XML |
1875
PEER GYNT
by Henrik Ibsen

THE CHARACTERS
ASE, a peasant's widow.
PEER GYNT, her son.
TWO OLD WOMEN with corn-sacks.
ASLAK, a smith.
WEDDING-GUESTS. A MASTER-COOK, A FIDDLER, etc.
A MAN AND WIFE, newcomers to the district.
SOLVEIG and LITTLE HELGA, their daughters.
THE FARMER AT HEGSTAD.
INGRID, his daughter.
THE BRIDEGROOM and His PARENTS.
THREE SAETER-GIRLS.
A GREEN-CLAD WOMAN.
THE OLD MAN OF THE DOVRE.
A TROLL-COURTIER. SEVERAL OTHERS.
TROLL-MAIDENS and TROLL-URCHINS.
A COUPLE OF WITCHES.
BROWNIES, NIXIES, GNOMES, etc.
AN UGLY BRAT.
A VOICE IN THE DARKNESS.
BIRD-CRIES.
KARI, a cottar's wife.
Master COTTON, Monsieur BALLON, Herren VON EBERKOPF and TRUMPETERSTRALE, gentlemen on their travels.
A THIEF and A RECEIVER.
ANITRA, daughter of a Bedouin chief.
ARABS, FEMALE SLAVES, DANCING-GIRLS, etc.
THE MEMNON-STATUE (singing).
THE SPHINX AT GIZEH (muta persona).
PROFESSOR BEGRIFFENFELDT, Dr. Phil., director of the madhouse at Cairo.
HUHU, a language-reformer from the coast of Malabar.
HUSSEIN, an eastern Minister.
A FELLAH, with a royal mummy.
SEVERAL MADMEN, with their KEEPERS.
A NORWEGIAN SKIPPER and HIS CREW.
A STRANGE PASSENGER.
A PASTOR.
A FUNERAL-PARTY.
A PARISH-OFFICER.
A BUTTON-MOULDER.
A LEAN PERSON.

The action, which opens in the beginning of the nineteenth century, and ends around the 1860's, takes place partly in Gudbrandsdalen, and on the mountains around it, partly on the coast of Morocco, in the desert of Sahara, in a madhouse at Cairo, at sea, etc.

ACT FIRST
SCENE FIRST

[A wooded hillside near ASE's farm. A river rushes down the slope. On the further side of it an old mill shed. It is a hot day in summer.]

[PEER GYNT, a strongly-built youth of twenty, comes down the pathway. His mother, ASE, a small, slightly built woman, follows him, scolding angrily.]

ASE
Peer, you're lying!

PEER [without stopping].
No, I am not!

ASE
Well then, swear that it is true!

PEER
Swear? Why should I?

ASE
See, you dare not! It's a lie from first to last.
Further examples |
|s001
|l001 ich sâz ûf eime steine
|l002 und dâhte bein mit beine
|l003 dar ûf sazt ich den ellenbogen
or
|b001
|l001a Hw*a/et, we GAR-DENA
|l001b in geardagum t*rym gefrunnon
|l002a Hu t*a *a/ed*elingas
|l002b ellen fremedon
Annotation |
S0CF6003 v [S [N TROUBLED_JJ [ morning_NNT1 television_NN1 ] station_NN1 GMTV_NP1 N] finally_RR [V had_VHD [N something_PN1 [Ti to_TO smile_VVI [P about_II P]Ti]N][Nr last_MD night_NNT1 [Fr when_RRQ [N it_PPH1 N][V was_VBDZ revealed_VVN [Fn[N it_PPH1 N][V gained_VVD [N an_AT1 extra_JJ million_NNO viewers_NN2 N][P over_II [N the_AT last_MD two_MC weeks_NNT2 N]P]V]Fn]V]Fr]Nr]V] ._YSTP S]
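The word_TAG convention in the annotated sentence above can be unpacked mechanically. The following is a minimal Python sketch, assuming that tokens are separated by whitespace, that the tag is whatever follows the last underscore, and that bracket labels such as `[N` or `N]` carry no underscore. The name `tagged_words` is hypothetical, and `SAMPLE` is a shortened excerpt of the sentence, not the full annotation.

```python
import re

# Shortened excerpt of the annotated sentence (an assumption for illustration).
SAMPLE = (
    "[S [N TROUBLED_JJ [ morning_NNT1 television_NN1 ] station_NN1 "
    "GMTV_NP1 N] finally_RR [V had_VHD [N something_PN1 N]V] ._YSTP S]"
)

# A word token is "form_TAG"; the tag is upper-case letters and digits.
TOKEN = re.compile(r"^(?P<word>\S+)_(?P<tag>[A-Z0-9]+)$")


def tagged_words(line):
    """Return (word, tag) pairs, skipping the syntactic bracket labels."""
    pairs = []
    for tok in line.split():
        m = TOKEN.match(tok)
        if m:
            pairs.append((m.group("word"), m.group("tag")))
    return pairs


if __name__ == "__main__":
    for word, tag in tagged_words(SAMPLE):
        print(f"{tag:6} {word}")
```

The sketch recovers only the word-level tagging; the bracket structure (the parse) is discarded, which is exactly the information that explicit markup would make easier to address.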
Markup languages |
Greetings example |
<!DOCTYPE greetings [
  <!ELEMENT greetings (hello+) >
  <!ELEMENT hello (#PCDATA) >
  <!ATTLIST hello
            lang CDATA #IMPLIED >
  <!ENTITY szlig "ß" >
  <!ENTITY uuml "ü" >
]>
<greetings>
<hello lang="en">Hello, world!</hello>
<hello lang="fr">Bon jour, tout le monde!</hello>
<hello lang="no">Goddag!</hello>
<hello lang="de">Guten Tag!</hello>
<hello lang="de-franken">Grüß Gott!</hello>
</greetings>
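To see what a processor makes of the greetings document, it can be read with Python's standard `xml.etree.ElementTree`. This is a sketch under two assumptions: the sample is a shortened copy of the document above, and the last greeting is written with the declared entities (`&uuml;`, `&szlig;`) to show them being expanded. The function name `greetings_by_lang` is hypothetical. ElementTree is non-validating: it reads the internal subset and expands the entities, but it does not check the content models.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the greetings document (an assumption for illustration).
DOC = """<!DOCTYPE greetings [
  <!ELEMENT greetings (hello+) >
  <!ELEMENT hello (#PCDATA) >
  <!ATTLIST hello lang CDATA #IMPLIED >
  <!ENTITY szlig "ß" >
  <!ENTITY uuml "ü" >
]>
<greetings>
  <hello lang="en">Hello, world!</hello>
  <hello lang="de-franken">Gr&uuml;&szlig; Gott!</hello>
</greetings>"""


def greetings_by_lang(xml_text):
    """Map each hello element's lang attribute to its character content."""
    root = ET.fromstring(xml_text)
    return {h.get("lang"): h.text for h in root.findall("hello")}


if __name__ == "__main__":
    for lang, text in greetings_by_lang(DOC).items():
        print(lang, "->", text)
```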
The XML landscape: applications |
The XML landscape: related specs |
Document grammars (DGs) |
The uses of DGs |
DTDs as DGs |
A document grammar |
poem ::= limerick | canzone
limerick ::= trimeter trimeter dimeter dimeter trimeter
trimeter ::= CHAR+
dimeter ::= CHAR+
canzone ::= aufgesang abgesang
aufgesang ::= stollen stollen
stollen ::= line+
abgesang ::= line+
A DTD |
<!ELEMENT poem (limerick | canzone) >
<!ELEMENT limerick (trimeter, trimeter, dimeter, dimeter, trimeter)>
<!ELEMENT trimeter (#PCDATA)>
<!ELEMENT dimeter (#PCDATA)>
<!ELEMENT canzone (aufgesang, abgesang) >
<!ELEMENT aufgesang (stollen, stollen) >
<!ELEMENT stollen (l+) >
<!ELEMENT abgesang (l+) >
<!ELEMENT l (#PCDATA) >
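The content models in a DTD are regular expressions over child-element names, which is what makes a DTD usable as a document grammar. The Python sketch below takes that reading literally. The `CONTENT_MODELS` table and the `conforms` function are hypothetical names, the regexes were transcribed by hand from the declarations above, and character data is not checked at all, so this is an illustration of the idea and not a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Each content model as a regex over space-terminated child names.
# Elements declared (#PCDATA) may have no element children: empty regex.
CONTENT_MODELS = {
    "poem": r"(limerick |canzone )",
    "limerick": r"trimeter trimeter dimeter dimeter trimeter ",
    "trimeter": r"",
    "dimeter": r"",
    "canzone": r"aufgesang abgesang ",
    "aufgesang": r"stollen stollen ",
    "stollen": r"(l )+",
    "abgesang": r"(l )+",
    "l": r"",
}


def conforms(elem):
    """True if elem and all its descendants match their content models."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:  # undeclared element type
        return False
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in elem)


if __name__ == "__main__":
    good = ET.fromstring(
        "<canzone><aufgesang><stollen><l>a</l></stollen>"
        "<stollen><l>b</l></stollen></aufgesang>"
        "<abgesang><l>c</l></abgesang></canzone>"
    )
    print(conforms(good))
```

A canzone with only one `stollen` in its `aufgesang` fails the check, because `stollen ` does not match the regex `stollen stollen `.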
A limerick |
<limerick>
  <trimeter>
    There was a young lady named Bright
  </trimeter>
  <trimeter>
    whose speed was much faster than light.
  </trimeter>
  <dimeter>She set out one day,</dimeter>
  <dimeter>in a relative way,</dimeter>
  <trimeter>
    and returned on the previous night.
  </trimeter>
</limerick>
A canzone |
<canzone>
  <aufgesang>
    <stollen>
      <l>unter den linden an der heide</l>
      <l>da unser zweier bette was</l>
    </stollen>
    <stollen>
      <l>da mugt ir vinden schone beide</l>
      <l>gebrochen bluomen unde gras</l>
    </stollen>
  </aufgesang>
  <abgesang>
    <l>kuste er mich? wol tusentstunt</l>
    <l>tandaradei</l>
    <l>seht wie rot mir ist der munt</l>
  </abgesang>
</canzone>
Note on the canzone DTD |
Removing non-terminals |
<!ENTITY % aufgesang "stollen, stollen" >
<!ENTITY % lines "l+" >
<!ELEMENT canzone (%aufgesang;, abgesang) >
<!ELEMENT stollen (%lines;) >
<!ELEMENT abgesang (%lines;) >
<!ELEMENT l (#PCDATA) >
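Parameter entities act like macros: the parser substitutes their replacement text before it interprets the declaration, so the non-terminal `aufgesang` disappears from the element structure. A small Python sketch of that substitution follows. The names `PARAM_ENTITIES` and `expand` are hypothetical, and the sketch simplifies one detail: a real XML processor pads the replacement text with a space on each side, which does not change the content model.

```python
import re

# Replacement texts copied from the two <!ENTITY % ...> declarations.
PARAM_ENTITIES = {
    "aufgesang": "stollen, stollen",
    "lines": "l+",
}

# A parameter-entity reference has the form %name;
PE_REFERENCE = re.compile(r"%([A-Za-z_][\w.-]*);")


def expand(declaration, entities):
    """Replace every %name; reference with its replacement text."""
    return PE_REFERENCE.sub(lambda m: entities[m.group(1)], declaration)


if __name__ == "__main__":
    for decl in (
        "<!ELEMENT canzone (%aufgesang;, abgesang) >",
        "<!ELEMENT stollen (%lines;) >",
        "<!ELEMENT abgesang (%lines;) >",
    ):
        print(expand(decl, PARAM_ENTITIES))
```

After expansion, `canzone` has the content model `(stollen, stollen, abgesang)`: the two-stollen constraint survives, but no `aufgesang` element appears in the document.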
The canzone minus explicit Aufgesang |
<canzone>
  <stollen>
    <l>unter den linden an der heide</l>
    <l>da unser zweier bette was</l>
  </stollen>
  <stollen>
    <l>da mugt ir vinden schone beide</l>
    <l>gebrochen bluomen unde gras</l>
  </stollen>
  <abgesang>
    <l>kuste er mich? wol tusentstunt</l>
    <l>tandaradei</l>
    <l>seht wie rot mir ist der munt</l>
  </abgesang>
</canzone>
XML Schema |
The canzone schema v.1 |
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--* element declarations go here *-->
</xsd:schema>
N.B. The schema does not identify a document-root element / start symbol.
Declaring elements |
<xsd:element name="canzone">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="aufgesang"/>
      <xsd:element ref="abgesang"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="aufgesang">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="stollen"/>
      <xsd:element ref="stollen"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Declaring elements |
Positive closure |
<xsd:element name="abgesang">
  <xsd:complexType>
    <xsd:sequence minOccurs="1" maxOccurs="unbounded">
      <xsd:element ref="l"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="stollen">
  <xsd:complexType>
    <xsd:sequence minOccurs="1" maxOccurs="unbounded">
      <xsd:element ref="l"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
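The `minOccurs` / `maxOccurs` pair on the sequence plays the role of the DTD occurrence indicator: 1 and unbounded together give positive closure (`+`). The Python sketch below reads one such declaration and prints the corresponding DTD-style content model. It assumes the declaration carries the usual XML Schema namespace binding and contains only `ref` children; `INDICATORS` and `dtd_content_model` are hypothetical names, and occurrence ranges with no DTD shorthand (for example 2 to 5) are not handled.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# One declaration from the schema, with the namespace binding added
# so that it can be parsed on its own (an assumption for illustration).
DECL = """<xsd:element name="stollen"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType>
    <xsd:sequence minOccurs="1" maxOccurs="unbounded">
      <xsd:element ref="l"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>"""

# (minOccurs, maxOccurs) -> DTD occurrence indicator.
INDICATORS = {
    ("1", "1"): "",
    ("0", "1"): "?",
    ("0", "unbounded"): "*",
    ("1", "unbounded"): "+",
}


def dtd_content_model(decl_xml):
    """Render the sequence inside an element declaration DTD-style."""
    decl = ET.fromstring(decl_xml)
    seq = decl.find(f"{XSD}complexType/{XSD}sequence")
    names = ", ".join(e.get("ref") for e in seq.findall(f"{XSD}element"))
    # Both attributes default to 1 when absent.
    key = (seq.get("minOccurs", "1"), seq.get("maxOccurs", "1"))
    return f"({names}){INDICATORS[key]}"


if __name__ == "__main__":
    print(dtd_content_model(DECL))
```

The result `(l)+` accepts the same sequences as the DTD's `(l+)`; the schema attaches the repetition to the sequence rather than to the element reference.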
XML Query, XSLT, and XPath |
Why does XSLT have two parts? |
XSLT |
XML Query |
XPath |
XPath: an addressing language |
XPath data model |
XPath expressions |
XPath steps |
XPath selection axes |
Simple XPath examples |
Long syntax: more complex |
Long syntax: predicates |
XPath short syntax |
Sample queries |
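A few abbreviated-syntax paths can be tried against the canzone with Python's standard `xml.etree.ElementTree`. This is a sketch with one caveat: ElementTree implements only a subset of the XPath short syntax (child steps, `.//`, positional and attribute predicates), not the full language or the long axis syntax. The helper name `texts` is hypothetical; the comments give the long-syntax equivalent of each path.

```python
import xml.etree.ElementTree as ET

CANZONE = """<canzone>
  <aufgesang>
    <stollen>
      <l>unter den linden an der heide</l>
      <l>da unser zweier bette was</l>
    </stollen>
    <stollen>
      <l>da mugt ir vinden schone beide</l>
      <l>gebrochen bluomen unde gras</l>
    </stollen>
  </aufgesang>
  <abgesang>
    <l>kuste er mich? wol tusentstunt</l>
    <l>tandaradei</l>
    <l>seht wie rot mir ist der munt</l>
  </abgesang>
</canzone>"""

root = ET.fromstring(CANZONE)  # the context node is the canzone element


def texts(path):
    """Evaluate a path relative to the canzone and return the line texts."""
    return [node.text for node in root.findall(path)]


# self::node()/descendant-or-self::node()/child::l
all_lines = texts(".//l")

# child::aufgesang/child::stollen[position() = 2]/child::l
second_stollen = texts("aufgesang/stollen[2]/l")

# child::abgesang/child::l
abgesang_lines = texts("abgesang/l")

if __name__ == "__main__":
    print(len(all_lines), "lines in total")
    print(second_stollen)
    print(abgesang_lines)
```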
XPath as query language |
XQuery as query language |
for $d in doc("depts.xml")//deptno
let $e := doc("emps.xml")//emp[deptno = $d]
where count($e) >= 10
order by avg($e/salary) descending
return
  <big-dept>
    {
      $d,
      <headcount>{count($e)}</headcount>,
      <avgsal>{avg($e/salary)}</avgsal>
    }
  </big-dept>
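For readers who do not know the FLWOR clauses, the same computation can be spelled out procedurally. The Python sketch below mirrors the query clause by clause over two already-parsed trees. It assumes each `emp` has one `deptno` and one numeric `salary` child; the function name `big_depts` and its `min_headcount` parameter are hypothetical, and it returns tuples where the query constructs `big-dept` elements.

```python
import xml.etree.ElementTree as ET
from statistics import mean


def big_depts(depts, emps, min_headcount=10):
    """Return (deptno, headcount, average salary) for large departments."""
    rows = []
    for d in depts.iter("deptno"):                   # for $d in ...//deptno
        staff = [emp for emp in emps.iter("emp")     # let $e := ...//emp[deptno = $d]
                 if emp.findtext("deptno") == d.text]
        if len(staff) >= min_headcount:              # where count($e) >= 10
            avg = mean(float(emp.findtext("salary")) for emp in staff)
            rows.append((d.text, len(staff), avg))   # return ...
    rows.sort(key=lambda row: row[2], reverse=True)  # order by avg(...) descending
    return rows


if __name__ == "__main__":
    # Tiny made-up data set, with the threshold lowered to fit it.
    depts = ET.fromstring("<depts><deptno>A</deptno><deptno>B</deptno></depts>")
    emps = ET.fromstring(
        "<emps>"
        "<emp><deptno>A</deptno><salary>100</salary></emp>"
        "<emp><deptno>A</deptno><salary>300</salary></emp>"
        "<emp><deptno>B</deptno><salary>500</salary></emp>"
        "</emps>"
    )
    print(big_depts(depts, emps, min_headcount=2))
```

The procedural version has to fix an evaluation order (a nested loop); the declarative query leaves that choice to the processor.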
Observables |
Consequences |
Thank you |