Introduction to XSLT

As a tool for humanities computing

1 Introductions

ALLC/ACH 2002, Tübingen

21-22 July 2002

C. M. Sperberg-McQueen

Wendell Piez


I.1. Goals of this tutorial
I.2. Assumptions
I.3. Non-assumptions
I.4. Introductions
I.5. Overview
I.6. Sunday a.m.: introductions
I.7. Basics of XSLT
I.8. What is XSLT?
I.9. XSLT processing
I.10. Push and pull
I.11. Syntax of XSLT
I.12. Syntax of XSLT (Take 2)
I.13. Hello, world (XML)
I.14. Hello, world (XSL)
I.15. Hello, world
I.16. A simple transformation
I.17. The greetings element
I.18. Handling the hello element
I.19. Built-in rules
I.20. Flow of control
I.21. Control flow in XSLT
I.22. Flow of control using match
I.23. Flow of control using select
I.24. Modes
I.25. Modes
I.26. XPath
I.27. XPath: an addressing language
I.28. XPath data model
I.29. Data model example
I.30. Data model example
I.31. XPath expressions
I.32. XPath selection axes
I.33. XPath long syntax: simple
I.34. Long syntax: more complex
I.35. Long syntax: predicates
I.36. XPath short syntax
I.37. Functions and numbering
I.38. Built-in functions
I.39. XPath name functions
I.40. XPath string functions
I.41. XPath Boolean functions
I.42. XPath numeric functions
I.43. XSLT functions
I.44. Numbering
I.45. Numbering example: Footnote numbers
I.46. Numbering example: Div1 numbers
I.47. Numbering example: Dewey-style sections
I.48. XML to XML transformations
I.49. Identity transform
I.50. PIs and comments
I.51. Supplying IDs
I.52. Named templates, variables
I.53. Named templates
I.54. Variables and parameters
I.55. Example: find the last slash
I.56. Last slash in XSLT
I.57. Sorting and grouping
I.58. Sorting
I.59. Grouping
I.60. Overview

I. Sunday morning 1: introductions

Goals of this tutorial

1 of 60

XSLT is a tool for processing XML documents. After this course, you should know how to:

transform XML from one datatype to another
style XML for display in off-the-shelf Web browsers
use XPath to denote specific parts of a document
use XSLT as a search engine
find anomalies in your data, normalize tagging
segment your data
retag your data
express recursive algorithms in XSLT

Assumptions

2 of 60

You know XML and can edit XML documents.
You expect to work with XML data, and need to:
- extract information
- display information
- reformat information
You know something about programming (or are tolerant of programmer talk).
You are interested in literary and historical texts and databases (or are tolerant of talk about them).
You know a bit* about HTML, CSS, and TEI Lite.

Non-assumptions

3 of 60

Not assumed:

You have used XSLT.
You are a programmer.

Not covered in the course:

XML
TEI
HTML, CSS (regard as magic*)
XSL formatting objects
Web design, page design, typography
tricks of the trade (production use)

Introductions

4 of 60

Who the instructors are.

Who the participants are.

Overview

5 of 60

Sunday a.m.: introductions
Sunday a.m.: basics (simple transformations, if, choose, selection by attribute values)
Sunday p.m.: modes (e.g. tables of contents)
Sunday p.m.: XPath
Monday a.m.: functions, numbering
Monday a.m.: near-identity transformations
Monday p.m.: named templates, recursion
Monday p.m.: sorting and grouping

Sunday a.m.: introductions

6 of 60

overview of course
first pass over material:
basics of XSLT, simple transformations
modes
XPath
functions and numbering
XML to XML transformations
named templates
sorting and grouping

Basics of XSLT

7 of 60

What is XSLT and how does it work?
Hello-world example
Built-in rules
Flow of control (if, choose and why you often don't need them)

What is XSLT?

8 of 60

a solution to a ‘high-level formatting’ problem
in XML syntax
a language for specifying transformations
- into XSL formatting objects
- into arbitrary XML
- into HTML
- into non-XML formats
with
- data replication
- data movement
- data insertion and deletion

XSLT processing

9 of 60

Push and pull

10 of 60

How do we transform one tree to another?

procedurally (DOM)
declaratively (XSLT, DSSSL)
- push model (DSSSL)
- pull model (XSLT)

Syntax of XSLT

11 of 60

It's XML.*
Yes, there's a DTD; it has 35 elements.
Next question?

Syntax of XSLT (Take 2)

12 of 60

It's three* XML vocabularies:
- the XSLT vocabulary describes stylesheet structure: namespace http://www.w3.org/1999/XSL/Transform
- one or more XML vocabularies in the input
- one or more XML vocabularies in the output
It's also XPath, a language for pointing into XML.

Hello, world (XML)

13 of 60

Consider the following XML (greeting.xml):

<!DOCTYPE greetings [
<!ELEMENT greetings (hello+) >
<!ELEMENT hello (#PCDATA) >
<!ATTLIST hello
          lang CDATA #IMPLIED >
<!ENTITY szlig   "&#223;" >
<!ENTITY uuml    "&#252;" >
]>

<greetings>
<hello lang="en">Hello, world!</hello>
<hello lang="fr">Bonjour, tout le monde!</hello>
<hello lang="no">Goddag!</hello>
<hello lang="de">Guten Tag!</hello>
<hello lang="de-schwaben">Gr&uuml;&szlig; Gott!</hello>
</greetings>

Hello, world (XSL)

14 of 60

The XSLT stylesheet greeting-03.xsl:

<!DOCTYPE xsl:stylesheet PUBLIC 'http://www.w3.org/1999/XSL/Transform'
      '/SGML/Public/W3C/xslt10.dtd' [
<!ENTITY lt     "&#38;#60;" >
<!ENTITY gt     ">"      >
]>

<xsl:stylesheet version="1.0"
     xmlns:xsl=
       "http://www.w3.org/1999/XSL/Transform">

 <xsl:template match="greetings">
  <html>
   <head>
    <title>
     Hello, world! A simple XSLT demo
    </title>
   </head>
   <body>
    <h1>Hello, world! A simple XSLT demo</h1>
    <p>A hundred ways to say hello (er, well,
     <xsl:value-of select="count(//hello)"/>
     ways, anyway).</p>
    <ol>
     <xsl:apply-templates/>
    </ol>
   </body>
  </html>
 </xsl:template>

 <xsl:template match="hello">
  <xsl:element name="li">
   <xsl:apply-templates/>
  </xsl:element>
 </xsl:template>

</xsl:stylesheet>
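For readers who think procedurally, here is a rough Python equivalent of what greeting-03.xsl produces, using only the standard library's xml.etree.ElementTree. It is a sketch under simplifying assumptions: the DTD is dropped, the entities are written as literal characters, and whitespace-only text nodes (which the real stylesheet copies into the output) are ignored. It is not how an XSLT processor works internally.

```python
import xml.etree.ElementTree as ET

# greeting.xml, simplified: the DTD is omitted and the entity
# references are written as literal characters.
GREETING_XML = """<greetings>
<hello lang="en">Hello, world!</hello>
<hello lang="fr">Bonjour, tout le monde!</hello>
<hello lang="no">Goddag!</hello>
<hello lang="de">Guten Tag!</hello>
<hello lang="de-schwaben">Grüß Gott!</hello>
</greetings>"""

TITLE = "Hello, world! A simple XSLT demo"


def greetings_to_html(src: str) -> str:
    """Procedural stand-in for the two template rules in greeting-03.xsl."""
    greetings = ET.fromstring(src)

    # Rule for <greetings>: build the HTML skeleton (the literal result elements).
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = TITLE
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = TITLE
    count = len(greetings.findall(".//hello"))  # count(//hello)
    ET.SubElement(body, "p").text = (
        f"A hundred ways to say hello (er, well, {count} ways, anyway).")

    # <xsl:apply-templates/> inside <ol>: the rule for <hello> emits an <li>.
    ol = ET.SubElement(body, "ol")
    for hello in greetings.findall("hello"):
        ET.SubElement(ol, "li").text = hello.text
    return ET.tostring(html, encoding="unicode")


print(greetings_to_html(GREETING_XML))
```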

Hello, world

15 of 60

How does it work?

show XML source in editor
show XML source in browser
add stylesheet link
show in browser
run batch process
show HTML in editor
show HTML in browser

If you want to follow along, choose an editor, launch a browser, and find the workshop directory.

A simple transformation

16 of 60

Let's look at the details. First, the framework:

<!DOCTYPE xsl:stylesheet PUBLIC 'http://www.w3.org/1999/XSL/Transform'
      '/SGML/Public/W3C/xslt10.dtd' [
<!ENTITY lt     "&#38;#60;" >
<!ENTITY gt     ">"      >
]>

<xsl:stylesheet version="1.0"
     xmlns:xsl=
       "http://www.w3.org/1999/XSL/Transform">

<!--* ... guts of stylesheet here ... *-->

</xsl:stylesheet>

The greetings element

17 of 60

One template handles the top-level greetings element:

 <xsl:template match="greetings">
  <html>
   <head>
    <title>Hello, world! A simple XSLT demo</title>
   </head>
   <body>
    <h1>Hello, world! A simple XSLT demo</h1>
    <p>A hundred ways to say hello (er, well,
     <xsl:value-of select="count(//hello)"/>
     ways, anyway).</p>
    <ol><xsl:apply-templates/></ol>
   </body>
  </html>
 </xsl:template>

Handling the hello element

18 of 60

The other handles the hello element:

 <xsl:template match="hello">
  <xsl:element name="li">
   <xsl:apply-templates/>
  </xsl:element>
 </xsl:template>

Note the two ways to specify output elements: element constructors (<xsl:element name="li">...</xsl:element>) or literal result elements (<html>).

Built-in rules

19 of 60

There are three built-in rules. For elements and the root node:

<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

For text and attributes:

<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

For comments and processing instructions:

<xsl:template match="processing-instruction()
                     | comment()"/>
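A small Python model of what these three rules add up to: a stylesheet with no rules of its own outputs just the document's text, in document order. This is an illustrative sketch using xml.etree.ElementTree; note that ElementTree's default parser already discards comments and processing instructions, which happens to match the third rule.

```python
import xml.etree.ElementTree as ET


def builtin_only(node: ET.Element) -> str:
    """What a stylesheet with no template rules of its own outputs for node."""
    # Element (and root) rule: <xsl:apply-templates/> processes child nodes only,
    # so attribute values are never reached.
    # Text rule: <xsl:value-of select="."/> copies the text.
    # ElementTree keeps text in .text (before the first child) and .tail
    # (after each child), so we visit them in document order.
    parts = [node.text or ""]
    for child in node:
        parts.append(builtin_only(child))
        parts.append(child.tail or "")
    return "".join(parts)


doc = ET.fromstring('<greetings><hello lang="en">Hello, world!</hello>'
                    '<!-- dropped --><hello lang="fr">Bonjour!</hello></greetings>')
print(builtin_only(doc))  # Hello, world!Bonjour!
```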

Flow of control

20 of 60

flow of control = ‘how processor decides what to do when’

Which part of the stylesheet fires (is ‘in control’) at any given moment?

Control flow in XSLT

21 of 60

Several ways to affect the sequence of processing (xpe = an XPath expression; in match, a pattern):

<xsl:template match="xpe">...</xsl:template>
<xsl:apply-templates/>
<xsl:apply-templates select="xpe"/>
<xsl:apply-templates mode="..."/>
<xsl:if test="..."> ... </xsl:if>
<xsl:choose>
  <xsl:when test="..."> ... </xsl:when>
  <xsl:when test="..."> ... </xsl:when>
  <xsl:otherwise> ... </xsl:otherwise>
</xsl:choose>
<xsl:for-each select="xpe"> ... </xsl:for-each>

Flow of control using match

22 of 60

 <xsl:template match="hello[@lang='no']" priority="2">
  <li style="font-family: Comic Sans MS; color: brown;">
   <xsl:apply-templates/>
  </li>
 </xsl:template>

 <xsl:template match="hello[@lang='fr']" priority="2">
  <li style="font-family: Script; color: blue; 
             font-size: larger;">
   <xsl:apply-templates/>
  </li>
 </xsl:template>
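How does the processor choose between the generic hello rule and these more specific ones? A Python sketch of the conflict-resolution step, assuming nodes are modelled as plain dicts and each pattern as a hypothetical predicate function; import precedence is ignored. In XSLT 1.0 a bare name pattern such as hello has default priority 0 and a pattern with a predicate has 0.5, so these rules would win here even without priority="2".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    pattern: str                     # for display only
    matches: Callable[[dict], bool]  # hypothetical stand-in for pattern matching
    priority: float
    style: Optional[str]


# Nodes are modelled as plain dicts such as {"name": "hello", "lang": "fr"}.
RULES = [
    Rule("hello", lambda n: n["name"] == "hello", 0.0, None),
    Rule("hello[@lang='no']",
         lambda n: n["name"] == "hello" and n.get("lang") == "no",
         2.0, "font-family: Comic Sans MS; color: brown;"),
    Rule("hello[@lang='fr']",
         lambda n: n["name"] == "hello" and n.get("lang") == "fr",
         2.0, "font-family: Script; color: blue; font-size: larger;"),
]


def select_rule(node: dict) -> Optional[Rule]:
    """Pick the matching rule with the highest priority (None: use a built-in rule)."""
    candidates = [r for r in RULES if r.matches(node)]
    if not candidates:
        return None
    best = max(r.priority for r in candidates)
    # On a tie, XSLT 1.0 allows an error or choosing the last rule; we choose the last.
    return [r for r in candidates if r.priority == best][-1]


print(select_rule({"name": "hello", "lang": "no"}).pattern)  # hello[@lang='no']
print(select_rule({"name": "hello", "lang": "en"}).pattern)  # hello
```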

Flow of control using select

23 of 60

<ul>
 <xsl:apply-templates 
  select="hello[@lang='de-schwaben']"/>
</ul>

Modes

24 of 60

What they are
How to use them

Modes

25 of 60

Process the same element multiple times:

 <xsl:template match="greetings"> 
   ...
    <h3>Index of language codes</h3>
    <ul>
     <xsl:apply-templates mode="index"/>
    </ul>
   ... 
 </xsl:template>
...
 <xsl:template match="hello" mode="index" priority="1">
  <xsl:element name="li">
   <xsl:value-of select="@lang"/>
  </xsl:element>
 </xsl:template>
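One way to picture modes, sketched in Python: template rules are looked up by the pair (mode, element name), so the same hello elements can be processed once for the list and again for the index. The dispatch table and function names are illustrative assumptions, and the output strings are built without escaping.

```python
import xml.etree.ElementTree as ET


def hello_default(e: ET.Element) -> str:
    # Like <xsl:template match="hello">: one list item holding the greeting.
    return f"<li>{e.text}</li>"


def hello_index(e: ET.Element) -> str:
    # Like <xsl:template match="hello" mode="index">: the language code instead.
    return f"<li>{e.get('lang')}</li>"


# Template rules keyed by (mode, element name); None stands for the default mode.
TEMPLATES = {
    (None, "hello"): hello_default,
    ("index", "hello"): hello_index,
}


def apply_templates(parent: ET.Element, mode=None) -> str:
    """Process each child element with the rule for (mode, name).
    Unlike XSLT, there is no built-in fallback: a missing rule raises KeyError."""
    return "".join(TEMPLATES[(mode, child.tag)](child) for child in parent)


doc = ET.fromstring('<greetings><hello lang="en">Hello, world!</hello>'
                    '<hello lang="fr">Bonjour, tout le monde!</hello></greetings>')
print("<ol>" + apply_templates(doc) + "</ol>")
print("<ul>" + apply_templates(doc, mode="index") + "</ul>")
```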

XPath

26 of 60

What is XPath?
XPath data model
Expressions as location ladders
Axes
Long syntax, short syntax

XPath: an addressing language

27 of 60

Many applications need to ‘address’ parts of XML documents:

formatting (e.g. XSLT)
hyperlinking
document construction
query / search and retrieval
schema / language specification
...

XPath captures the common functionality.

XSLT select values are XPath expressions; match values are patterns, a restricted form of XPath expression.

XPath data model

28 of 60

A document is an ordered tree with

a root node
element nodes
text nodes
attribute nodes
namespace nodes
processing instructions
comment nodes

No structure sharing. No entity boundaries. Namespace prefixes resolved.

Data model example

29 of 60

greetings.xml drawn as a tree (color-coded).

Data model example

30 of 60

greetings.xml drawn as a sideways tree.

XPath expressions

31 of 60

An expression is a sequence of steps:

/step/step/step/step ...

A step is

axis::node test [predicate] [predicate] ...
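To make the step-by-step reading concrete, here is a minimal Python evaluator for absolute location paths, written against xml.etree.ElementTree. It models only the child, descendant, parent and self axes with name tests (no predicates), and adds a synthetic #root element because ElementTree has no separate root node. Each step is applied to every node in the current context; the union, deduplicated and put in document order, becomes the context for the next step.

```python
import xml.etree.ElementTree as ET


def evaluate(document_element: ET.Element, steps):
    """Evaluate an absolute location path given as (axis, name test) steps.
    Only the child, descendant, parent and self axes are modelled."""
    root = ET.Element("#root")      # stands in for the XPath root node "/"
    root.append(document_element)  # (ElementTree has no separate root node)
    parent = {c: p for p in root.iter() for c in p}
    order = {e: i for i, e in enumerate(root.iter())}  # document order

    def along(axis, node):
        if axis == "child":
            return list(node)
        if axis == "descendant":
            return [d for d in node.iter() if d is not node]
        if axis == "parent":
            return [parent[node]] if node in parent else []
        if axis == "self":
            return [node]
        raise ValueError(f"axis not modelled here: {axis}")

    def name_ok(node, test):
        # "*" matches any element, but never the root node.
        return node is not root and (test == "*" or node.tag == test)

    context = [root]
    for axis, test in steps:
        # Apply the step to every context node; the union (no duplicates,
        # document order) becomes the context for the next step.
        found = {n for node in context for n in along(axis, node) if name_ok(n, test)}
        context = sorted(found, key=order.__getitem__)
    return context


doc = ET.fromstring("<doc><chapter><para/><section><para/></section></chapter><para/></doc>")
print(len(evaluate(doc, [("descendant", "para")])))  # /descendant::para
```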

XPath selection axes

32 of 60

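child (→ e, t, c, p)
parent (→ e, or the root)
attribute (→ a)
following, following-sibling (→ e, t, c, p)
preceding, preceding-sibling (→ e, t, c, p)
self (→ the context node)
namespace (→ n)
ancestor, ancestor-or-self (→ e, or the root)
descendant, descendant-or-self (→ e, t, c, p)

(e = element, t = text, c = comment, p = processing instruction, a = attribute, n = namespace)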

XPath long syntax: simple

33 of 60

child::para
child::* (all element children)
child::text() (all text node children)
child::node() (all children)
attribute::name
attribute::*
descendant::para
ancestor::div
ancestor-or-self::div
descendant-or-self::para

Long syntax: more complex

34 of 60

self::para (context node if para, otherwise nothing)
child::chapter/descendant::para
child::*/child::para (all para grandchildren)
/ (the root node)
/descendant::para (all para elements in the document)
/descendant::olist/child::item

Long syntax: predicates

35 of 60

child::para[position()=1]
child::para[position()=last()-1]
child::para[position()>1]
following-sibling::chapter[position()=1]
/child::doc/child::chapter[position()=5]/child::section[position()=2]
child::para[attribute::type="warning"]
child::chapter[child::title='Introduction']
child::chapter[child::title]
child::*[self::chapter or self::appendix][position()=last()]

XPath short syntax

36 of 60

para
* (all element children)
text() (all text node children)
node() (all children)
@name
@*
//para
chapter/descendant::para
*/para (all para grandchildren)
para[1]
/doc/chapter[5]/section[2]
para[@type="warning"]
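para[@type='warning'][5] (the fifth of the para children whose type is warning)
para[5][@type="warning"] (the fifth para child, if its type is warning)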
chapter[title='Introduction']
chapter[title]
*[self::chapter or self::appendix]
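Predicate order matters: each predicate filters the node-set produced so far, and position() counts within that filtered set. A small Python illustration with made-up data (ten para children, every second one a warning), using plain list operations so the order of filtering is explicit:

```python
import xml.etree.ElementTree as ET

# Ten para children: odd positions are notes, even positions are warnings.
div = ET.Element("div")
for i in range(1, 11):
    ET.SubElement(div, "para", n=str(i), type="warning" if i % 2 == 0 else "note")


def paras(parent):
    return [c for c in parent if c.tag == "para"]


def warning_then_fifth(parent):
    """para[@type='warning'][5]: filter first, then take the fifth survivor."""
    warnings = [p for p in paras(parent) if p.get("type") == "warning"]
    return warnings[4:5]


def fifth_then_warning(parent):
    """para[5][@type='warning']: take the fifth para, keep it only if it is a warning."""
    return [p for p in paras(parent)[4:5] if p.get("type") == "warning"]


print([p.get("n") for p in warning_then_fifth(div)])  # ['10']
print([p.get("n") for p in fifth_then_warning(div)])  # []
```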

Functions and numbering

37 of 60

Built-in functions:
- for node sets
- for extracting names
- for strings
- Booleans
- numerics
Numbering elements: xsl:number

Built-in functions

38 of 60

Functions from several sources: XPath functions for node sets

number last() → number equal to context size
number position() → number equal to context position
number count(node-set) → number of nodes in the node-set
node-set id(object) → the element(s) with the given ID value(s)

XPath name functions

39 of 60

string local-name(node-set?)
string namespace-uri(node-set?)
string name(node-set?)

XPath string functions

40 of 60

string string(object?)
string concat(string, string, string*)
boolean starts-with(string, string)
boolean contains(string, string)
string substring-before(string, string)
string substring-after(string, string)
string substring(string, number, number?)
number string-length(string?)
string normalize-space(string?)
string translate(string, string, string)
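Several of these functions have Python counterparts, but the edge cases differ (XPath positions are 1-based, and XPath round() is not Python's round()). A sketch of Python equivalents for a few of them, checked against the examples given in the XPath 1.0 Recommendation; NaN and infinite arguments are not handled.

```python
import math
import re
from typing import Optional


def xpath_round(x: float) -> int:
    # XPath round(): nearest integer, ties go toward positive infinity
    # (Python's built-in round() uses banker's rounding instead).
    return math.floor(x + 0.5)


def substring(s: str, start: float, length: Optional[float] = None) -> str:
    """Characters at 1-based positions p with round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    last = math.inf if length is None else first + xpath_round(length)
    return "".join(c for p, c in enumerate(s, start=1) if first <= p < last)


def substring_before(s: str, sep: str) -> str:
    i = s.find(sep)
    return s[:i] if i >= 0 else ""          # "" when sep does not occur


def substring_after(s: str, sep: str) -> str:
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def normalize_space(s: str) -> str:
    # XPath whitespace is only space, tab, CR and LF.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")


def translate(s: str, src: str, dst: str) -> str:
    out = []
    for c in s:
        i = src.find(c)             # the first occurrence in src decides
        if i < 0:
            out.append(c)           # not in src: keep unchanged
        elif i < len(dst):
            out.append(dst[i])      # replace with the character at the same position
        # else: src is longer than dst here, so the character is dropped
    return "".join(out)


print(substring("12345", 1.5, 2.6))           # 234
print(translate("--aaa--", "abc-", "ABC"))    # AAA
```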

XPath Boolean functions

41 of 60

boolean boolean(object)
boolean not(boolean)
boolean true()
boolean false()
boolean lang(string) (is context node in language named?)

XPath numeric functions

42 of 60

number number(object?)
number sum(node-set)
number floor(number)
number ceiling(number)
number round(number)

XSLT functions

43 of 60

node-set document(object, node-set?)
node-set key(string, object)
string format-number(number, string, string?)
node-set current()
string unparsed-entity-uri(string)
string generate-id(node-set?)
object system-property(string)
boolean element-available(string)
boolean function-available(string)

An XPath processor must recognize these as function names; an XSLT processor must implement them.

Numbering

44 of 60

The xsl:number element does element numbering:

level = single | multiple | any

count = pattern

from = pattern

value = number-expr

format = {string}

...

Numbering example: Footnote numbers

45 of 60

To number footnotes throughout the document with a single running Arabic number:

<xsl:number level="any" 
            count="note[@place='foot']" 
            format="1"/>

Numbering example: Div1 numbers

46 of 60

To number div1 elements within their div0 (or higher-level) parent, using Arabic numerals in the body and letters in the back matter:

<xsl:number level="single" 
            count="div1" 
            format="1. "/>
<!--* ... *-->
<xsl:number level="single" 
            count="div1|div" 
            format="A. "/>

Numbering example: Dewey-style sections

47 of 60

Assigning Dewey-style numbering to vanilla div elements:

<xsl:number level="multiple" 
            count="div" 
            format="1. "/>
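A Python sketch of how level="any" and level="multiple" arrive at their numbers, assuming an xml.etree.ElementTree tree and ignoring the from attribute. number_any counts matching nodes in document order up to the current one; number_multiple takes one sibling position per ancestor-or-self div and formats them the way format="1. " would. The sample document is made up.

```python
import xml.etree.ElementTree as ET


def number_any(root, node, counts):
    """level="any": how many nodes matching `counts` occur in document order
    up to and including `node` (ancestors come first, so they are included)."""
    n = 0
    for e in root.iter():          # iter() walks the tree in document order
        if counts(e):
            n += 1
        if e is node:
            return n
    raise ValueError("node is not in this tree")


def number_multiple(root, node, tag="div"):
    """level="multiple": one number per ancestor-or-self `tag` element, each being
    its position among same-named siblings, joined with "." as format="1. " does."""
    parent = {c: p for p in root.iter() for c in p}
    numbers = []
    e = node
    while e is not None:
        if e.tag == tag:
            p = parent.get(e)
            siblings = [s for s in p if s.tag == tag] if p is not None else [e]
            numbers.append(next(i for i, s in enumerate(siblings, start=1) if s is e))
        e = parent.get(e)
    return ".".join(str(k) for k in reversed(numbers)) + ". "


def is_footnote(e):
    return e.tag == "note" and e.get("place") == "foot"


body = ET.fromstring(
    "<body><div><head>A</head><div><note place='foot'/></div><div/></div>"
    "<div><note place='foot'/><note place='margin'/></div></body>")
print(repr(number_multiple(body, body[0][2])))   # '1.2. '
print(number_any(body, body[1][0], is_footnote))  # 2
```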

XML to XML transformations

48 of 60

Two kinds

into a different tag set
into same tag set, with different tagging (near-identity transform)

Development:

start with null transform
work by successive approximation

Identity transform

49 of 60

The identity transform overrides the built-in rules for elements and attributes, copying the nodes themselves rather than only their text:

 <xsl:template match="@*|*">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>
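The same recursion in Python terms, as a sketch with xml.etree.ElementTree: copy the node, then process its attributes and children the same way. ElementTree stores text in .text and .tail attributes rather than as separate text nodes, so those are copied alongside the elements.

```python
import xml.etree.ElementTree as ET


def identity(node: ET.Element) -> ET.Element:
    """Copy node recursively: like <xsl:copy> plus apply-templates to @*|node()."""
    copy = ET.Element(node.tag, dict(node.attrib))  # the element and its attributes
    copy.text = node.text                           # text before the first child
    for child in node:
        child_copy = identity(child)
        child_copy.tail = child.tail                # text after this child
        copy.append(child_copy)
    return copy


src = ET.fromstring('<doc n="1"><p>Hello, <b>world</b>!</p></doc>')
out = identity(src)
print(ET.tostring(out, encoding="unicode"))  # <doc n="1"><p>Hello, <b>world</b>!</p></doc>
```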

PIs and comments

50 of 60

The identity transform also overrides the built-in rules for comments and processing instructions, which would otherwise discard them:


 <xsl:template match='comment()'>
  <xsl:comment>
   <xsl:value-of select="."/>
  </xsl:comment>
 </xsl:template>

 <xsl:template match='processing-instruction()'>
  <xsl:variable name="pitarget" select="name()"/>
  <xsl:processing-instruction name="{$pitarget}">
   <xsl:value-of select="."/>
  </xsl:processing-instruction>
 </xsl:template>

Supplying IDs

51 of 60

A simple near-identity transform:


 <xsl:template match="*">
  <xsl:copy>
   <xsl:if test="not(@id)">
    <xsl:attribute name="id">
     <xsl:value-of select="generate-id(.)"/>
    </xsl:attribute>
   </xsl:if>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="@*">
  <xsl:copy>
   <xsl:apply-templates select="node()"/>
  </xsl:copy>
 </xsl:template>
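A Python sketch of the same near-identity transform. Since generate-id() values are processor-specific, the gen1, gen2, ... scheme here is a hypothetical stand-in; it does not check for clashes with ids already in the document.

```python
import itertools
import xml.etree.ElementTree as ET


def add_ids(node, counter=None):
    """Copy the tree, giving every element that lacks an id attribute a new one.
    "gen1", "gen2", ... is a hypothetical stand-in for generate-id()."""
    if counter is None:
        counter = itertools.count(1)
    copy = ET.Element(node.tag, dict(node.attrib))   # existing attributes kept
    if "id" not in node.attrib:
        copy.set("id", f"gen{next(counter)}")
    copy.text = node.text
    for child in node:
        child_copy = add_ids(child, counter)
        child_copy.tail = child.tail
        copy.append(child_copy)
    return copy


doc = ET.fromstring('<doc><p id="intro">a</p><p>b</p></doc>')
print(ET.tostring(add_ids(doc), encoding="unicode"))
```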

Named templates, variables

52 of 60

syntax
usage
variables and parameters
recursion

Named templates

53 of 60

Syntax: supply a name attribute:


<xsl:template name="xyz">
 ...
</xsl:template>

and call it with xsl:call-template:


<xsl:call-template name="xyz"/>

Use it like a function call: to make clear what is happening, and to call the same code from different locations.

Variables and parameters

54 of 60

We can declare variables

<xsl:variable name="pitarget" select="name()"/>

and parameters:

<xsl:param name="n">0</xsl:param>

and call named templates with parameters:

    <xsl:call-template name="count-down">
     <xsl:with-param name="n" select="$n - 1"/>
    </xsl:call-template>

Example: find the last slash

55 of 60

We have data of the form x/y/z, x/y/z/w, x/y/z/r/r, x/y/z/r/s, x/y/v; find the last segment.

findslash(s : string) {
   if the string does not contain '/', then {
      // no slash, we are done
      return s
   }
   else { // recursion
      strip s up to and including the first '/'
      return findslash(s)
   }
}

Last slash in XSLT

56 of 60

 <xsl:template name="findlast">
  <xsl:param name="sValue"/>
  <xsl:choose>
   <xsl:when test='contains($sValue,"/")'>
    <xsl:call-template name="findlast">
     <xsl:with-param name="sValue">
      <xsl:value-of 
       select='substring-after($sValue,"/")'/>
     </xsl:with-param>
    </xsl:call-template>
   </xsl:when>
   <xsl:otherwise>
    <xsl:value-of select="$sValue"/>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
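The same recursion in Python, as a sketch for comparison. In everyday Python one would write s.rsplit("/", 1)[-1]; XSLT 1.0 has no loops or reassignable variables, which is why the stylesheet recurses instead.

```python
def findlast(s: str) -> str:
    """Last '/'-separated segment of s, by the same recursion as the XSLT template."""
    if "/" in s:
        # substring-after($sValue, "/"): drop everything up to and including the first slash
        return findlast(s.split("/", 1)[1])
    return s  # no slash left: this is the last segment


for value in ["x/y/z", "x/y/z/w", "x/y/z/r/r", "x/y/z/r/s", "x/y/v"]:
    print(value, "->", findlast(value))
```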

Sorting and grouping

57 of 60

sorting (easy)
grouping (takes some thought)

Sorting

58 of 60

The xsl:sort element occurs as a child of xsl:apply-templates (or xsl:for-each):

    <xsl:apply-templates select="//w" mode="conc">
     <xsl:sort select="."/>
    </xsl:apply-templates>

Attributes:

select

data-type (text, number)

order (ascending, descending)
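A Python model of the data-type and order attributes, showing why data-type matters for numeric keys. It is a sketch: text order here is plain code-point order, whereas XSLT processors may use a language-sensitive collation, and non-numeric values under data-type="number" are not handled.

```python
def xsl_sort(values, data_type="text", order="ascending"):
    """Rough model of xsl:sort's data-type and order attributes (stable, like xsl:sort)."""
    key = float if data_type == "number" else str
    return sorted(values, key=key, reverse=(order == "descending"))


values = ["10", "9", "100", "2"]
print(xsl_sort(values))                       # ['10', '100', '2', '9']
print(xsl_sort(values, data_type="number"))   # ['2', '9', '10', '100']
```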

Grouping

59 of 60

Not easy to solve alone. Declarative nature of XSLT works against us.

Fortunately, there are recipes. Find one, follow it.
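The recipe most often cited for XSLT 1.0 is Muenchian grouping: index the items with xsl:key, then process an item only if it is the first one returned by key() for its value (typically tested by comparing generate-id() values). A Python sketch of that logic, with made-up sample data:

```python
def group_first_occurrence(items, key):
    """Group items by key, groups ordered by each key's first occurrence.

    Models key-based grouping: build an index (like <xsl:key>), then treat
    an item as the start of a group only if it is the first item indexed
    under its key."""
    index = {}
    for pos, item in enumerate(items):
        index.setdefault(key(item), []).append(pos)   # like <xsl:key>
    groups = []
    for pos, item in enumerate(items):
        positions = index[key(item)]
        if positions[0] == pos:                       # first member of its group?
            groups.append((key(item), [items[p] for p in positions]))
    return groups


greetings = [("en", "Hello, world!"), ("de", "Guten Tag!"), ("en", "Hi!"),
             ("fr", "Bonjour!"), ("de-schwaben", "Grüß Gott!"), ("de", "Hallo!")]
for lang, members in group_first_occurrence(greetings, key=lambda g: g[0]):
    print(lang, [text for _, text in members])
```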

Overview

60 of 60

Sunday a.m.: introductions
Sunday a.m.: basics (simple transformations, if, choose, selection by attribute values)
Sunday p.m.: modes (e.g. tables of contents)
Sunday p.m.: XPath
Monday a.m.: functions, numbering
Monday a.m.: near-identity transformations
Monday p.m.: named templates, recursion
Monday p.m.: sorting and grouping