Introduction to XSLT

As a tool for humanities computing

1 Introductions

ALLC/ACH 2002, Tübingen

21-22 July 2002

C. M. Sperberg-McQueen

Wendell Piez



I. Sunday morning 1: introductions

Goals of this tutorial

XSLT is a tool for processing XML documents. After this course, you should know how to:
  • transform XML from one document type to another
  • style XML for display in off-the-shelf Web browsers
  • use XPath to denote specific parts of a document
  • use XSLT as a search engine
  • find anomalies in your data, normalize tagging
  • segment your data
  • retag your data
  • express recursive algorithms in XSLT

Assumptions

  • You know XML and can edit XML documents.
  • You expect to work with XML data, and need to:
    • extract information
    • display information
    • reformat information
  • You know something about programming (or are tolerant of programmer talk).
  • You are interested in literary and historical texts and databases (or are tolerant of talk about them).
  • You know a bit about HTML, CSS, and TEI Lite.

Non-assumptions

Not assumed:
  • You have used XSLT.
  • You are a programmer.
Not covered in the course:
  • XML
  • TEI
  • HTML, CSS (regard as magic)
  • XSL formatting objects
  • Web design, page design, typography
  • tricks of the trade (production use)

Introductions

Who the instructors are.
Who the participants are.

Overview

  1. Sunday a.m.: introductions
  2. Sunday a.m.: basics (simple transformations, if, choose, selection by attribute values)
  3. Sunday p.m.: modes (e.g. tables of contents)
  4. Sunday p.m.: XPath
  5. Monday a.m.: functions, numbering
  6. Monday a.m.: near-identity transformations
  7. Monday p.m.: named templates, recursion
  8. Monday p.m.: sorting and grouping

Sunday a.m.: introductions

  • overview of course
  • first pass over material:
    • basics of XSLT, simple transformations
    • modes
    • XPath
    • functions and numbering
    • XML to XML transformations
    • named templates
    • sorting and grouping

Basics of XSLT

  • What is XSLT and how does it work?
  • Hello-world example
  • Built-in rules
  • Flow of control (if, choose and why you often don't need them)

What is XSLT?

  • a solution to a ‘high-level formatting’ problem
  • in XML syntax
  • a language for specifying transformations
    • into XSL formatting objects
    • into arbitrary XML
    • into HTML
    • into non-XML formats
    with
    • data replication
    • data movement
    • data insertion and deletion

XSLT processing


Push and pull

How do we transform one tree to another?
  • procedurally (DOM)
  • declaratively (XSLT, DSSSL)
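    • push model: the input drives processing (template rules, xsl:apply-templates)
    • pull model: the stylesheet fetches what it needs (xsl:for-each, xsl:value-of)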
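
Within a single XSLT stylesheet the two styles can sit side by side. The following is a minimal sketch, assuming an input document whose root element greetings contains hello children (the example used later in this course); the wrapping div is only there to give the output a single root.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="greetings">
    <div>
      <!-- Push: hand the children over to whichever template matches them. -->
      <ol>
        <xsl:apply-templates select="hello"/>
      </ol>
      <!-- Pull: fetch the nodes explicitly and build the output in place. -->
      <ul>
        <xsl:for-each select="hello">
          <li><xsl:value-of select="."/></li>
        </xsl:for-each>
      </ul>
    </div>
  </xsl:template>

  <!-- Used only by the push branch above. -->
  <xsl:template match="hello">
    <li><xsl:apply-templates/></li>
  </xsl:template>

</xsl:stylesheet>
```

Both lists contain the same items; the difference is where the decision about how to render a hello element lives.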

Syntax of XSLT

  • It's XML.
  • Yes, there's a DTD; it has 35 elements.
  • Next question?

Syntax of XSLT (Take 2)

  • It's three XML vocabularies:
    • the XSLT vocabulary describes stylesheet structure: namespace http://www.w3.org/1999/XSL/Transform
    • one or more XML vocabularies in the input
    • one or more XML vocabularies in the output
  • It's also XPath, a language for pointing into XML.

Hello, world (XML)

Consider the following XML (greeting.xml):
<!DOCTYPE greetings [
<!ELEMENT greetings (hello+) >
<!ELEMENT hello (#PCDATA) >
<!ATTLIST hello
          lang CDATA #IMPLIED >
<!ENTITY szlig "&#223;" >
<!ENTITY uuml "&#252;" >
]>

<greetings>
<hello lang="en">Hello, world!</hello>
<hello lang="fr">Bonjour, tout le monde!</hello>
<hello lang="no">Goddag!</hello>
<hello lang="de">Guten Tag!</hello>
<hello lang="de-schwaben">Gr&uuml;&szlig; Gott!</hello>
</greetings>

Hello, world (XSL)

The XSLT stylesheet greeting-03.xsl:
<!DOCTYPE xsl:stylesheet PUBLIC 'http://www.w3.org/1999/XSL/Transform'
      '/SGML/Public/W3C/xslt10.dtd' [
<!ENTITY lt     "&#38;#60;" >
<!ENTITY gt     ">"      >
]>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:template match="greetings">
  <html>
   <head>
    <title>
     Hello, world! A simple XSLT demo
    </title>
   </head>
   <body>
    <h1>Hello, world! A simple XSLT demo</h1>
    <p>A hundred ways to say hello (er, well,
     <xsl:value-of select="count(//hello)"/>
     ways, anyway).</p>
    <ol>
     <xsl:apply-templates/>
    </ol>
   </body>
  </html>

 </xsl:template>

 <xsl:template match="hello">
  <xsl:element name="li">
   <xsl:apply-templates/>
  </xsl:element>
 </xsl:template>

</xsl:stylesheet>

Hello, world

How does it work?
  1. show XML source in editor
  2. show XML source in browser
  3. add stylesheet link
  4. show in browser
  5. run batch process
  6. show HTML in editor
  7. show HTML in browser
If you want to follow along ... then choose editor, launch browser, find workshop directory.
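
For step 3, the stylesheet link is an xml-stylesheet processing instruction placed in the prolog of the XML document. A minimal sketch, assuming greeting-03.xsl sits in the same directory as the XML file (the href value is an assumption about your directory layout, and the document is cut down to one greeting for brevity):

```xml
<?xml version="1.0"?>
<!-- The processing instruction must come before the root element. -->
<?xml-stylesheet type="text/xsl" href="greeting-03.xsl"?>
<greetings>
  <hello lang="en">Hello, world!</hello>
</greetings>
```

A browser that supports XSLT applies the stylesheet when it loads the file. For step 5, a command-line XSLT processor takes the same stylesheet and source document as arguments and writes the HTML to a file; the exact invocation depends on the processor installed.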

A simple transformation

Let's look at the details. First, the framework:
<!DOCTYPE xsl:stylesheet PUBLIC 'http://www.w3.org/1999/XSL/Transform'
      '/SGML/Public/W3C/xslt10.dtd' [
<!ENTITY lt     "&#38;#60;" >
<!ENTITY gt     ">"      >
]>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!--* ... guts of stylesheet here ... *-->

</xsl:stylesheet>

The greetings element

One template handles the top-level greetings element:
 <xsl:template match="greetings">
  <html>
   <head>
    <title>Hello, world! A simple XSLT demo</title>
   </head>
   <body>
    <h1>Hello, world! A simple XSLT demo</h1>
    <p>A hundred ways to say hello (er, well,
     <xsl:value-of select="count(//hello)"/>
     ways, anyway).</p>
    <ol>
     <xsl:apply-templates/>
    </ol>
   </body>
  </html>

 </xsl:template>

Handling the hello element

The other handles the hello element:
 <xsl:template match="hello">
  <xsl:element name="li">
   <xsl:apply-templates/>
  </xsl:element>
 </xsl:template>
Note the two different ways to specify the output: element constructors (<xsl:element name="li">...</xsl:element>) and literal result elements (<html>).

Built-in rules

There are three built-in rules.
For the root node and elements:
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
For text and attributes:
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
For comments and processing instructions:
<xsl:template match="processing-instruction()
                     | comment()"/>

Flow of control

Flow of control = ‘how the processor decides what to do when’.
Which part of the stylesheet fires (is ‘in control’) at any given moment?

Control flow in XSLT

Several ways to affect sequence of processing:
  • <xsl:template match="xpe">...</xsl:template>
  • <xsl:apply-templates/>
  • <xsl:apply-templates select="xpe"/>
  • <xsl:apply-templates mode="..."/>
  • <xsl:if test="..."> ... </xsl:if>
  • <xsl:choose>
      <xsl:when test="..."> ... </xsl:when>
      <xsl:when test="..."> ... </xsl:when>
      <xsl:otherwise> ... </xsl:otherwise>
    </xsl:choose>
  • <xsl:for-each select="xpe"> ... </xsl:for-each>

Flow of control using match

 <xsl:template match="hello[@lang='no']" priority="2">
  <li style="font-family: Comic Sans MS; color: brown;">
   <xsl:apply-templates/>
  </li>
 </xsl:template>

 <xsl:template match="hello[@lang='fr']" priority="2">
  <li style="font-family: Script; color: blue; 
             font-size: larger;">
   <xsl:apply-templates/>
  </li>
 </xsl:template>
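
For comparison, the same styling can be written with xsl:choose inside a single template. This is a sketch of the alternative, not the course's recommended form; it illustrates why match patterns often make if and choose unnecessary.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="hello">
    <li>
      <!-- Attributes must be added before any child content. -->
      <xsl:choose>
        <xsl:when test="@lang='no'">
          <xsl:attribute name="style">font-family: Comic Sans MS; color: brown;</xsl:attribute>
        </xsl:when>
        <xsl:when test="@lang='fr'">
          <xsl:attribute name="style">font-family: Script; color: blue; font-size: larger;</xsl:attribute>
        </xsl:when>
        <!-- No xsl:otherwise: other languages get a plain li. -->
      </xsl:choose>
      <xsl:apply-templates/>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

With match patterns, adding a new language means adding a new template; with xsl:choose, it means editing an existing one.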

Flow of control using select

<ul>
 <xsl:apply-templates 
  select="hello[@lang='de-schwaben']"/>
</ul>

Modes

  • What they are
  • How to use them

Modes

Process the same element multiple times:
 <xsl:template match="greetings"> 
   ...
    <h3>Index of language codes</h3>
    <ul>
     <xsl:apply-templates mode="index"/>
    </ul>
   ... 
 </xsl:template>
...
 <xsl:template match="hello" mode="index" priority="1">
  <xsl:element name="li">
   <xsl:value-of select="@lang"/>
  </xsl:element>
 </xsl:template>
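
Filling in the elided parts gives a complete stylesheet. This is a minimal sketch: the surrounding HTML is an assumption, and only the mode="index" pieces come from the fragment above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="greetings">
    <html>
      <body>
        <!-- First pass: default (unnamed) mode. -->
        <ol>
          <xsl:apply-templates/>
        </ol>
        <h3>Index of language codes</h3>
        <!-- Second pass over the same children, in mode "index". -->
        <ul>
          <xsl:apply-templates mode="index"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- Default mode: show the greeting itself. -->
  <xsl:template match="hello">
    <li><xsl:apply-templates/></li>
  </xsl:template>

  <!-- Index mode: show only the language code. -->
  <xsl:template match="hello" mode="index">
    <li><xsl:value-of select="@lang"/></li>
  </xsl:template>

</xsl:stylesheet>
```

Each hello element is visited twice, and the mode decides which of its two templates fires.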

XPath

  • What is XPath?
  • XPath data model
  • Expressions as location ladders
  • Axes
  • Long syntax, short syntax

XPath: an addressing language

Many applications need to ‘address’ parts of XML documents:
  • formatting (e.g. XSLT)
  • hyperlinking
  • document construction
  • query / search and retrieval
  • schema / language specification
  • ...
XPath captures the common functionality.
XSLT select and match values are XPath expressions.

XPath data model

A document is an ordered tree with
  • a root node
  • element nodes
  • text nodes
  • attribute nodes
  • namespace nodes
  • processing instructions
  • comment nodes
No structure sharing. No entity boundaries. Namespace prefixes resolved.

Data model example

greetings.xml drawn as a tree (color-coded).

Data model example

greetings.xml drawn as a sideways tree.

XPath expressions

A location path is a sequence of steps:
/step/step/step/step ...
A step is
axis::node-test[predicate][predicate] ...
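
As an illustration, here is a two-step path in the long syntax, evaluated against the greetings document. A minimal sketch; the xsl:output declaration is an addition so that the result is plain text.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Step 1: axis child, node test greetings, no predicate.
         Step 2: axis child, node test hello, two predicates
                 applied from left to right. -->
    <xsl:value-of
      select="/child::greetings/child::hello[attribute::lang='de'][position()=1]"/>
  </xsl:template>

</xsl:stylesheet>
```

Only one hello element has lang equal to de exactly (de-schwaben does not match), so the output is Guten Tag!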

XPath selection axes

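Node types each axis can return (e = element, t = text, c = comment, p = processing instruction, a = attribute, n = namespace):
  • child (→ e, t, c, p)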
  • parent (→ e)
  • attribute (→ a)
  • following, following-sibling (→ e, t, c, p)
  • preceding, preceding-sibling (→ e, t, c, p)
  • self
  • namespace (→ n)
  • ancestor, ancestor-or-self (→ e)
  • descendant, descendant-or-self (→ e, t, c, p)

XPath long syntax: simple

  • child::para
  • child::* (all element children)
  • child::text() (all text node children)
  • child::node() (all children)
  • attribute::name
  • attribute::*
  • descendant::para
  • ancestor::div
  • ancestor-or-self::div
  • descendant-or-self::para

Long syntax: more complex

  • self::para (context node if para, otherwise nothing)
  • child::chapter/descendant::para
  • child::*/child::para (all para grandchildren)
  • / (the root node)
  • /descendant::para (all para elements in the document)
  • /descendant::olist/child::item

Long syntax: predicates

  • child::para[position()=1]
  • child::para[position()=last()-1]
  • child::para[position()>1]
  • following-sibling::chapter[position()=1]
  • /child::doc/child::chapter[position()=5]/child::section[position()=2]
  • child::para[attribute::type="warning"]
  • child::chapter[child::title='Introduction']
  • child::chapter[child::title]
  • child::*[self::chapter or self::appendix][position()=last()]

XPath short syntax

  • para
  • * (all element children)
  • text() (all text node children)
  • node() (all children)
  • @name
  • @*
  • //para
  • chapter//para
  • */para (all para grandchildren)
  • para[1]
  • /doc/chapter[5]/section[2]
  • para[@type="warning"]
  • para[@type="warning"][5] (the fifth warning para)
  • para[5][@type="warning"] (the fifth para, if it is a warning)
  • chapter[title='Introduction']
  • chapter[title]
  • *[self::chapter or self::appendix]
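
Predicate order matters, as the two para[...] expressions with swapped predicates show. A minimal sketch, assuming a hypothetical input document in which chapter elements contain para children, some of them with type="warning":

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//chapter">
      <!-- Filter first, then count: the fifth of the warning paras. -->
      <xsl:text>fifth warning: </xsl:text>
      <xsl:value-of select="para[@type='warning'][5]"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Count first, then filter: the fifth para,
           selected only if it happens to be a warning. -->
      <xsl:text>fifth para, if a warning: </xsl:text>
      <xsl:value-of select="para[5][@type='warning']"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

When an expression selects nothing, xsl:value-of writes an empty string, so a chapter with fewer than five warnings simply shows a blank after the first label.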

Functions and numbering

  • Built-in functions:
    • for node sets
    • for extracting names
    • for strings
    • Booleans
    • numerics
  • Numbering elements: xsl:number

Built-in functions

Functions come from several sources.
XPath functions for node sets:
  • number last()  → number equal to context size
  • number position()  → number equal to context position
  • number count(node-set)
  • node-set id(object)   → the element(s) with this ID

XPath name functions

  • string local-name(node-set?)
  • string namespace-uri(node-set?)
  • string name(node-set?)

XPath string functions

  • string string(object?)
  • string concat(string, string, string*)
  • boolean starts-with(string, string)
  • boolean contains(string, string)
  • string substring-before(string, string)
  • string substring-after(string, string)
  • string substring(string, number, number?)
  • number string-length(string?)
  • string normalize-space(string?)
  • string translate(string, string, string)
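
A few of these functions applied to the greetings document; this is a minimal sketch, and the output layout is invented for illustration. XPath 1.0 has no upper-case function, so translate() with two alphabets is the usual workaround.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="greetings/hello">
      <!-- Upper-case the language code letter by letter. -->
      <xsl:value-of select="translate(@lang,
                            'abcdefghijklmnopqrstuvwxyz',
                            'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
      <xsl:text>: </xsl:text>
      <!-- Trim and collapse whitespace in the greeting. -->
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="string-length(normalize-space(.))"/>
      <xsl:text> characters)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the first greeting this writes the line EN: Hello, world! (13 characters).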

XPath Boolean functions

  • boolean boolean(object)
  • boolean not(boolean)
  • boolean true()
  • boolean false()
  • boolean lang(string) (is context node in language named?)

XPath numeric functions

  • number number(object?)
  • number sum(node-set)
  • number floor(number)
  • number ceiling(number)
  • number round(number)

XSLT functions

  • node-set document(object, node-set?)
  • node-set key(string, object)
  • string format-number(number, string, string?)
  • node-set current()
  • string unparsed-entity-uri(string)
  • string generate-id(node-set?)
  • object system-property(string)
  • boolean element-available(string)
  • boolean function-available(string)
The XPath processor must recognize these as function calls; the XSLT processor must implement them.

Numbering

The xsl:number element does element numbering:
level = single | multiple | any
count = pattern
from = pattern
value = number-expr
format = {string}
...

Numbering example: Footnote numbers

To number footnotes in the document with a single running Arabic number:
<xsl:number level="any" 
            count="note[@place='foot']" 
            format="1"/>

Numbering example: Div1 numbers

To number div1 elements within a div0 or higher, Arabic within the body and letters in back matter:
<xsl:number level="single" 
            count="div1" 
            format="1. "/>
<!--* ... *-->
<xsl:number level="single" 
            count="div1|div" 
            format="A. "/>

Numbering example: Dewey-style sections

Assigning Dewey-style numbering to vanilla div elements:
<xsl:number level="multiple" 
            count="div" 
            format="1. "/>
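
In context, xsl:number sits inside the template for the node being numbered. A minimal sketch, assuming nested div elements that each carry a head child (the head element and the h2 output are assumptions for illustration):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="div/head">
    <h2>
      <!-- level="multiple" looks at the current node and its ancestors,
           keeps those matching count, and emits one number per level. -->
      <xsl:number level="multiple"
                  count="div"
                  format="1. "/>
      <xsl:apply-templates/>
    </h2>
  </xsl:template>

</xsl:stylesheet>
```

With this format string, the head of the second div inside the first top-level div should be numbered 1.2. followed by a space: the period between levels is the default separator, and the trailing ". " is the suffix.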

XML to XML transformations

Two kinds
  • into a different tag set
  • into same tag set, with different tagging (near-identity transform)
Development:
  • start with null transform
  • work by successive approximation

Identity transform

The identity transform changes the default rules for elements and attributes:
 <xsl:template match="@*|*">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>

PIs and comments

The identity transform changes the default rules for comments and processing instructions:

 <xsl:template match='comment()'>
  <xsl:comment>
   <xsl:value-of select="."/>
  </xsl:comment>
 </xsl:template>

 <xsl:template match='processing-instruction()'>
  <xsl:variable name="pitarget" select="name()"/>
  <xsl:processing-instruction name="{$pitarget}">
   <xsl:value-of select="."/>
  </xsl:processing-instruction>
 </xsl:template>
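
Once an identity rule is in place, a near-identity transform overrides only the elements that should change. The sketch below retags hello as greeting (a hypothetical target name) and copies everything else unchanged; it uses the compact match="@*|node()" form of the identity rule, which also covers text, comments and processing instructions.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy every node and attribute as is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: rename hello, keeping its attributes and content. -->
  <xsl:template match="hello">
    <greeting>
      <xsl:apply-templates select="@*|node()"/>
    </greeting>
  </xsl:template>

</xsl:stylesheet>
```

The more specific pattern hello has a higher default priority than node(), so no priority attribute is needed.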

Supplying IDs

A simple near-identity transform:

 <xsl:template match="*">
  <xsl:copy>
   <xsl:if test="not(@id)">
    <xsl:attribute name="id">
     <xsl:value-of select="generate-id(.)"/>
    </xsl:attribute>
   </xsl:if>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="@*">
  <xsl:copy>
   <xsl:apply-templates select="node()"/>
  </xsl:copy>
 </xsl:template>

Named templates, variables

  • syntax
  • usage
  • variables and parameters
  • recursion

Named templates

Syntax: supply a name attribute:

<xsl:template name="xyz">
 ...
</xsl:template>
and call it using xsl:call-template:

<xsl:call-template name="xyz"/>
Use like a function call: clarify what is happening, call from different locations.

Variables and parameters

We can declare variables
<xsl:variable name="pitarget" select="name()"/>
and parameters:
<xsl:param name="n">0</xsl:param>
and call named templates with parameters
    <xsl:call-template name="count-down">
     <xsl:with-param name="n" select="$n - 1"/>
    </xsl:call-template>
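
The pieces above fit together as follows. The body of count-down is not shown in these slides, so what follows is a guess at a plausible version rather than the original; the starting value 3 is arbitrary.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:call-template name="count-down">
      <xsl:with-param name="n" select="3"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="count-down">
    <!-- Default value, used when the caller passes no parameter. -->
    <xsl:param name="n">0</xsl:param>
    <!-- Stop when n reaches zero; otherwise print n and recurse. -->
    <xsl:if test="$n &gt; 0">
      <xsl:value-of select="$n"/>
      <xsl:text> </xsl:text>
      <xsl:call-template name="count-down">
        <xsl:with-param name="n" select="$n - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Run against any input document, this writes 3 2 1 with a space after each number. Variables in XSLT cannot be reassigned, which is why the loop is expressed as recursion with a new parameter value on each call.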

Example: find the last slash

We have data of the form x/y/z, x/y/z/w, x/y/z/r/r, x/y/z/r/s, x/y/v; find the last segment (the part after the last slash).
findlast(s : string) {
   if s does not contain '/', then {
      // no slash left: s is the last segment
      return s
   }
   else { // recursion
      t := the part of s after the first '/'
      return findlast(t)
   }
}

Last slash in XSLT

 <xsl:template name="findlast">
  <xsl:param name="sValue"/>
  <xsl:choose>
   <xsl:when test='contains($sValue,"/")'>
    <xsl:call-template name="findlast">
     <xsl:with-param name="sValue">
      <xsl:value-of 
       select='substring-after($sValue,"/")'/>
     </xsl:with-param>
    </xsl:call-template>
   </xsl:when>
   <xsl:otherwise>
    <xsl:value-of select="$sValue"/>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
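
A usage sketch: the stylesheet below calls findlast on a literal string. The template is repeated in slightly condensed form (the parameter is passed with a select attribute) so that the example is self-contained; the test string is taken from the sample data on the previous slide.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:call-template name="findlast">
      <!-- Note the inner quotes: a string literal, not an element name. -->
      <xsl:with-param name="sValue" select="'x/y/z/w'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="findlast">
    <xsl:param name="sValue"/>
    <xsl:choose>
      <!-- Still a slash: drop everything up to the first one and recurse. -->
      <xsl:when test="contains($sValue, '/')">
        <xsl:call-template name="findlast">
          <xsl:with-param name="sValue"
                          select="substring-after($sValue, '/')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No slash left: this is the last segment. -->
      <xsl:otherwise>
        <xsl:value-of select="$sValue"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The calls proceed through y/z/w, z/w and w, and the output is w.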

Sorting and grouping

  • sorting (easy)
  • grouping (takes some thought)

Sorting

The xsl:sort element occurs as a child of xsl:apply-templates (or xsl:for-each):
    <xsl:apply-templates select="//w" mode="conc">
     <xsl:sort select="."/>
    </xsl:apply-templates>
Attributes:
select
data-type (text, number)
order (ascending, descending)
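
Applied to the greetings document, a sort on the lang attribute might look like this. A minimal sketch; the choice of descending order is arbitrary.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="greetings">
    <ol>
      <!-- Process the hello elements in sorted order,
           not in document order. -->
      <xsl:apply-templates select="hello">
        <xsl:sort select="@lang" data-type="text" order="descending"/>
      </xsl:apply-templates>
    </ol>
  </xsl:template>

  <xsl:template match="hello">
    <li>
      <xsl:value-of select="@lang"/>
      <xsl:text>: </xsl:text>
      <xsl:apply-templates/>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

Several xsl:sort children can be given; the first is the primary key, the second breaks ties, and so on.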

Grouping

Not easy to solve alone. Declarative nature of XSLT works against us.
Fortunately, there are recipes. Find one, follow it.
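
One widely used XSLT 1.0 recipe is the Muenchian method, which combines xsl:key with generate-id(). It is sketched here as an example of such a recipe, not necessarily the one the instructors had in mind. The input is assumed to contain w elements (one per word, as in the sorting example); the key name w-by-form is hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every w element by its string value. -->
  <xsl:key name="w-by-form" match="w" use="."/>

  <xsl:template match="/">
    <ul>
      <!-- Keep only the first w of each group: the one whose identity
           equals that of the first node the key returns for its value. -->
      <xsl:for-each
          select="//w[generate-id() = generate-id(key('w-by-form', .)[1])]">
        <xsl:sort select="."/>
        <li>
          <xsl:value-of select="."/>
          <xsl:text>: </xsl:text>
          <!-- Size of the group this word form belongs to. -->
          <xsl:value-of select="count(key('w-by-form', .))"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

The result is one list item per distinct word form, with its frequency: a simple word list built by grouping.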

Overview

  1. Sunday a.m.: introductions
  2. Sunday a.m.: basics (simple transformations, if, choose, selection by attribute values)
  3. Sunday p.m.: modes (e.g. tables of contents)
  4. Sunday p.m.: XPath
  5. Monday a.m.: functions, numbering
  6. Monday a.m.: near-identity transformations
  7. Monday p.m.: named templates, recursion
  8. Monday p.m.: sorting and grouping