Introduction to XSLT
As a tool for humanities computing
1 Introductions
ALLC/ACH 2002, Tübingen
21-22 July 2002
C. M. Sperberg-McQueen
Wendell Piez
I. Sunday morning 1: introductions
XSLT is a tool for processing XML documents. After this
course, you should know how to:
- transform XML from one document type to another
- style XML for display in off-the-shelf Web browsers
- use XPath to denote specific parts of a document
- use XSLT as a search engine
- find anomalies in your data, normalize tagging
- segment your data
- retag your data
- express recursive algorithms in XSLT
Assumed:
- You know XML and can edit XML documents.
- You expect to work with XML data, and need to:
- extract information
- display information
- reformat information
- You know something about programming (or are
tolerant of programmer talk).
- You are interested in literary and historical
texts and databases (or are tolerant of talk about
them).
- You know a bit* about HTML, CSS, and TEI Lite.
Not assumed:
- You have used XSLT.
- You are a programmer.
Not covered in the course:
- XML
- TEI
- HTML, CSS (regard as magic*)
- XSL formatting objects
- Web design, page design, typography
- tricks of the trade (production use)
Who the instructors are.
Who the participants are.
- Sunday a.m.: introductions
- Sunday a.m.: basics (simple transformations, if, choose, selection by attribute values)
- Sunday p.m.: modes (e.g. tables of contents)
- Sunday p.m.: XPath
- Monday a.m.: functions, numbering
- Monday a.m.: near-identity transformations
- Monday p.m.: named templates, recursion
- Monday p.m.: sorting and grouping
Sunday a.m.: introductions
- overview of course
- first pass over material:
- basics of XSLT, simple transformations
- modes
- XPath
- functions and numbering
- XML to XML transformations
- named templates
- sorting and grouping
- What is XSLT and how does it work?
- Hello-world example
- Built-in rules
- Flow of control (if, choose, and why you often don't need them)
- a solution to a ‘high-level formatting’
problem
- in XML syntax
- a language for specifying transformations
- into XSL formatting objects
- into arbitrary XML
- into HTML
- into non-XML formats
with
- data replication
- data movement
- data insertion and deletion
How do we transform one tree to another?
- procedurally (DOM)
- declaratively (XSLT, DSSSL)
- push model (DSSSL; XSLT template rules with apply-templates)
- pull model (XSLT for-each and value-of)
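In XSLT terms, the push/pull distinction can be sketched roughly as follows. This is only an illustration, assuming an input with a greetings element containing hello children (as in the hello-world example of this course); the wrapping div is an arbitrary choice.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="greetings">
    <div>
      <!-- push: hand the children to whatever template rules match them -->
      <ol>
        <xsl:apply-templates select="hello"/>
      </ol>
      <!-- pull: this template fetches exactly the nodes it wants -->
      <ol>
        <xsl:for-each select="hello">
          <li><xsl:value-of select="."/></li>
        </xsl:for-each>
      </ol>
    </div>
  </xsl:template>

  <!-- used only by the push half -->
  <xsl:template match="hello">
    <li><xsl:apply-templates/></li>
  </xsl:template>

</xsl:stylesheet>
```

Both lists come out the same here; the push version is easier to extend, because a new element type only needs a new template rule.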
- It's XML.*
- Yes, there's a DTD; it has 35 elements.
- Next question?
- It's three* XML vocabularies:
- the XSLT vocabulary describes stylesheet structure:
namespace http://www.w3.org/1999/XSL/Transform
- one or more XML vocabularies in the input
- one or more XML vocabularies in the output
- It's also XPath, a language for pointing into XML.
Consider the following XML (greeting.xml):
<!DOCTYPE greetings [
<!ELEMENT greetings (hello+) >
<!ELEMENT hello (#PCDATA) >
<!ATTLIST hello
lang CDATA #IMPLIED >
<!ENTITY szlig "ß" >
<!ENTITY uuml "ü" >
]>
<greetings>
<hello lang="en">Hello, world!</hello>
<hello lang="fr">Bonjour, tout le monde!</hello>
<hello lang="no">Goddag!</hello>
<hello lang="de">Guten Tag!</hello>
<hello lang="de-schwaben">Grüß Gott!</hello>
</greetings>
The XSLT stylesheet greeting-03.xsl:
<!DOCTYPE xsl:stylesheet PUBLIC 'http://www.w3.org/1999/XSL/Transform'
'/SGML/Public/W3C/xslt10.dtd' [
<!ENTITY lt "&#38;#60;" >
<!ENTITY gt ">" >
]>
<xsl:stylesheet version="1.0"
xmlns:xsl=
"http://www.w3.org/1999/XSL/Transform">
<xsl:template match="greetings">
<html>
<head>
<title>
Hello, world! A simple XSLT demo
</title>
</head>
<body>
<h1>Hello, world! A simple XSLT demo</h1>
<p>A hundred ways to say hello (er, well,
<xsl:value-of select="count(//hello)"/>
ways, anyway).</p>
<ol>
<xsl:apply-templates/>
</ol>
</body>
</html>
</xsl:template>
<xsl:template match="hello">
<xsl:element name="li">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
How does it work?
- show XML source in editor
- show XML source in browser
- add stylesheet link
- show in browser
- run batch process
- show HTML in editor
- show HTML in browser
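The "add stylesheet link" step is usually done with an xml-stylesheet processing instruction in the prolog of the source document. A minimal sketch, assuming greeting-03.xsl sits in the same directory as the XML file; type="text/xsl" is the value browsers commonly accept, not a formally registered media type.

```xml
<?xml version="1.0"?>
<!-- assumption: greeting-03.xsl is in the same directory as this file -->
<?xml-stylesheet type="text/xsl" href="greeting-03.xsl"?>
<greetings>
  <hello lang="en">Hello, world!</hello>
  <!-- ... remaining hello elements as in greeting.xml ... -->
</greetings>
```

A batch processor ignores or honors this instruction depending on how it is invoked; naming the stylesheet explicitly on the command line is the safer habit.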
If you want to follow along ... then
choose editor, launch browser, find workshop directory.
Let's look at the details. First, the framework:
<!DOCTYPE xsl:stylesheet PUBLIC 'http://www.w3.org/1999/XSL/Transform'
'/SGML/Public/W3C/xslt10.dtd' [
<!ENTITY lt "&#38;#60;" >
<!ENTITY gt ">" >
]>
<xsl:stylesheet version="1.0"
xmlns:xsl=
"http://www.w3.org/1999/XSL/Transform">
<!--* ... guts of stylesheet here ... *-->
</xsl:stylesheet>
One template handles the top-level greetings element:
<xsl:template match="greetings">
<html>
<head>
<title>Hello, world! A simple XSLT demo</title>
</head>
<body>
<h1>Hello, world! A simple XSLT demo</h1>
<p>A hundred ways to say hello (er, well,
<xsl:value-of select="count(//hello)"/>
ways, anyway).</p>
<ol><xsl:apply-templates/></ol>
</body>
</html>
</xsl:template>
Handling the hello element
The other handles the hello element:
<xsl:template match="hello">
<xsl:element name="li">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
Note the two ways to specify output elements: element
constructors (<xsl:element name="li">...</xsl:element>) or
literal result elements (<html>).
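When to prefer one over the other: a literal result element is shorter, but an element constructor can compute the element name at run time. A hedged sketch; the div/head input structure is hypothetical and not part of greeting.xml.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- literal result element: the name is fixed when the stylesheet is written -->
  <xsl:template match="hello">
    <li><xsl:apply-templates/></li>
  </xsl:template>

  <!-- element constructor: the name is computed from the input.
       Hypothetical input: head elements inside nested div elements;
       a head inside one div becomes h1, inside two divs h2, and so on. -->
  <xsl:template match="div/head">
    <xsl:element name="h{count(ancestor::div)}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

The curly braces mark an attribute value template; nesting deeper than six levels would produce names that HTML does not define, so a real stylesheet would need a cap.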
There are three built-in rules.
For elements and the root node:
<xsl:template match="*|/">
<xsl:apply-templates/>
</xsl:template>
For text and attributes:
<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template>
For comments and processing instructions:
<xsl:template match="processing-instruction()
| comment()"/>
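One consequence of these rules: text reaches the output even when no template asks for it. A minimal sketch of overriding the built-in text rule so that only explicitly handled content appears; the choice of hello[@lang='de'] is just an illustration against greeting.xml.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- No template matches / or greetings, so the built-in rule for
       elements and the root recurses into their children. -->

  <!-- Override the built-in text rule: drop text nobody asked for. -->
  <xsl:template match="text()"/>

  <!-- The one element we want: write its string value and a newline. -->
  <xsl:template match="hello[@lang='de']">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against greeting.xml, this should emit only the line Guten Tag!, since the other hello elements fall through to the built-in element rule and their text is then suppressed.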
flow of control =
‘how processor decides what to do when’
Which part of the stylesheet fires (is ‘in control’)
at any given moment?
Several ways to affect sequence of processing:
- <xsl:template match="xpe">...</xsl:template>
- <xsl:apply-templates/>
- <xsl:apply-templates select="xpe"/>
- <xsl:apply-templates mode="..."/>
- <xsl:if test="..."> ... </xsl:if>
- <xsl:choose>
<xsl:when test="..."> ... </xsl:when>
<xsl:when test="..."> ... </xsl:when>
<xsl:otherwise> ... </xsl:otherwise>
</xsl:choose>
- <xsl:for-each select="xpe"> ... </xsl:for-each>
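For comparison with the match-based templates that follow, here is a sketch of the same kind of per-language styling done with xsl:choose inside a single template. The style values are simplified placeholders.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="hello">
    <li>
      <!-- attributes must be added before any child content -->
      <xsl:choose>
        <xsl:when test="@lang='no'">
          <xsl:attribute name="style">color: brown;</xsl:attribute>
        </xsl:when>
        <xsl:when test="@lang='fr'">
          <xsl:attribute name="style">color: blue;</xsl:attribute>
        </xsl:when>
        <!-- no xsl:otherwise: other languages get no style attribute -->
      </xsl:choose>
      <xsl:apply-templates/>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

Every new case means editing this one template; with match patterns, each case is a separate rule, which is one reason if and choose are often unnecessary.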
Flow of control using match
<xsl:template match="hello[@lang='no']" priority="2">
<li style="font-family: Comic Sans MS; color: brown;">
<xsl:apply-templates/>
</li>
</xsl:template>
<xsl:template match="hello[@lang='fr']" priority="2">
<li style="font-family: Script; color: blue;
font-size: larger;">
<xsl:apply-templates/>
</li>
</xsl:template>
Flow of control using select
<ul>
<xsl:apply-templates
select="hello[@lang='de-schwaben']"/>
</ul>
- What they are
- How to use them
Process the same element multiple times:
<xsl:template match="greetings">
...
<h3>Index of language codes</h3>
<ul>
<xsl:apply-templates mode="index"/>
</ul>
...
</xsl:template>
...
<xsl:template match="hello" mode="index" priority="1">
<xsl:element name="li">
<xsl:value-of select="@lang"/>
</xsl:element>
</xsl:template>
- What is XPath?
- XPath data model
- Expressions as location ladders
- Axes
- Long syntax, short syntax
XPath: an addressing language
Many applications need to ‘address’ parts of
XML documents:
- formatting (e.g. XSLT)
- hyperlinking
- document construction
- query / search and retrieval
- schema / language specification
- ...
XPath captures the common functionality.
XSLT select values are XPath expressions; match values are
patterns, a restricted subset of XPath.
A document is an ordered tree with
- a root node
- element nodes
- text nodes
- attribute nodes
- namespace nodes
- processing instructions
- comment nodes
No structure sharing. No entity boundaries. Namespace prefixes resolved.
A location path is a sequence of steps separated by /.
A step is
axis::node-test[predicate][predicate] ...
- child (→ e, t, c, p)
- parent (→ e)
- attribute (→ a)
- following, following-sibling (→ e, t, c, p)
- preceding, preceding-sibling (→ e, t, c, p)
- self
- namespace (→ n)
- ancestor, ancestor-or-self (→ e)
- descendant, descendant-or-self (→ e, t, c, p)
XPath long syntax: simple
- child::para
- child::*
(all element children)
- child::text()
(all text node children)
- child::node()
(all children)
- attribute::name
- attribute::*
- descendant::para
- ancestor::div
- ancestor-or-self::div
- descendant-or-self::para
Long syntax: more complex
- self::para
(context node if para, otherwise nothing)
- child::chapter/descendant::para
- child::*/child::para
(all para grandchildren)
- /
- /descendant::para
(all para elements in the document)
- /descendant::olist/child::item
- child::para[position()=1]
- child::para[position()=last()-1]
- child::para[position()>1]
- following-sibling::chapter[position()=1]
- /child::doc/child::chapter[position()=5]/child::section[position()=2]
- child::para[attribute::type="warning"]
- child::chapter[child::title='Introduction']
- child::chapter[child::title]
- child::*[self::chapter or self::appendix][position()=last()]
- para
- *
(all element children)
- text()
(all text node children)
- node()
(all children)
- @name
- @*
- //para
- chapter//para
- */para
(all para grandchildren)
- para[1]
- /doc/chapter[5]/section[2]
- para[@type="warning"]
- para[@type='warning'][5]
- para[5][@type="warning"]
- chapter[title='Introduction']
- chapter[title]
- *[self::chapter or self::appendix]
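Predicate order matters, which is easy to miss in the short syntax. A small sketch, assuming a hypothetical input in which chapter elements have para children with an optional type attribute:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="chapter">
    <!-- the fifth of the warning paragraphs -->
    <xsl:value-of select="para[@type='warning'][5]"/>
    <xsl:text>&#10;</xsl:text>
    <!-- the fifth paragraph, but only if it is a warning -->
    <xsl:value-of select="para[5][@type='warning']"/>
    <xsl:text>&#10;</xsl:text>
    <!-- short and long forms select the same nodes, so this prints true -->
    <xsl:value-of select="count(.//para) =
        count(descendant-or-self::node()/child::para)"/>
  </xsl:template>

</xsl:stylesheet>
```

Each predicate filters the node list left by the one before it, so the position test counts different things in the two expressions.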
- Built-in functions:
- for node sets
- for extracting names
- for strings
- Booleans
- numerics
- Numbering elements: xsl:number
Functions from several sources: XPath functions for node sets
- number last() → number equal to context size
- number position()
- number count(node-set)
- node-set id(object)
→ the element(s) with this ID
- string local-name(node-set?)
- string namespace-uri(node-set?)
- string name(node-set?)
XPath functions for strings:
- string string(object?)
- string concat(string, string, string*)
- boolean starts-with(string, string)
- boolean contains(string, string)
- string substring-before(string, string)
- string substring-after(string, string)
- string substring(string, number, number?)
- number string-length(string?)
- string normalize-space(string?)
- string translate(string, string, string)
XPath functions for Booleans:
- boolean boolean(object)
- boolean not(boolean)
- boolean true()
- boolean false()
- boolean lang(string)
(is context node in language named?)
XPath functions for numbers:
- number number(object?)
- number sum(node-set)
- number floor(number)
- number ceiling(number)
- number round(number)
XSLT functions:
- node-set document(object, node-set?)
- node-set key(string, object)
- string format-number(number, string, string?)
- node-set current()
- string unparsed-entity-uri(string)
- string generate-id(node-set?)
- object system-property(string)
- boolean element-available(string)
- boolean function-available(string)
XPath processor must recognize as functions;
XSLT processor must implement.
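A few of the string functions in combination, as a hedged sketch against greeting.xml: it lists each greeting with its position, the primary part of its language code in upper case, and its whitespace-normalized text. The output layout is an arbitrary choice.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- select only hello children so that position() counts greetings,
       not the whitespace text nodes between them -->
  <xsl:template match="greetings">
    <xsl:apply-templates select="hello"/>
  </xsl:template>

  <xsl:template match="hello">
    <!-- text before the first hyphen; appending '-' first lets
         codes without a hyphen come through whole -->
    <xsl:variable name="primary"
        select="substring-before(concat(@lang, '-'), '-')"/>
    <xsl:value-of select="position()"/>
    <xsl:text>. </xsl:text>
    <!-- XPath 1.0 has no upper-case function; translate() does the job -->
    <xsl:value-of select="translate($primary,
        'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```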
The xsl:number element does element numbering:
level = single | multiple | any
count = pattern
from = pattern
value = number-expr
format = {string}
...
Numbering example: Footnote numbers
To number footnotes in the document with a single
running arabic number:
<xsl:number level="any"
count="note[@place='foot']"
format="1"/>
Numbering example: Div1 numbers
To number div1 elements within a div0 or higher,
Arabic within the body and letters in back matter:
<xsl:number level="single"
count="div1"
format="1. "/>
<!--* ... *-->
<xsl:number level="single"
count="div1|div"
format="A. "/>
Numbering example: Dewey-style sections
Assigning Dewey-style numbering to vanilla
div elements:
<xsl:number level="multiple"
count="div"
format="1. "/>
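To show where such an xsl:number instruction sits, here is a sketch of a heading template for nested div elements. The head child and the h2 output element are assumptions for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: nested div elements, each with a head child. -->
  <xsl:template match="div/head">
    <h2>
      <!-- level="multiple" collects one number per enclosing div,
           each counting that div among its div siblings; with a
           single format token the numbers are joined by periods,
           e.g. 2.3. for the third div inside the second top-level div -->
      <xsl:number level="multiple" count="div" format="1. "/>
      <xsl:apply-templates/>
    </h2>
  </xsl:template>

</xsl:stylesheet>
```

The current node is the head, not the div; xsl:number still works because it looks at the ancestors of the current node that match the count pattern.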
XML to XML transformations
Two kinds
- into a different tag set
- into same tag set, with different tagging
(near-identity transform)
Development:
- start with null transform
- work by successive approximation
The identity transform changes the default rules for
elements and attributes:
<xsl:template match="@*|*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
The identity transform changes the default rules for
comments and processing instructions:
<xsl:template match='comment()'>
<xsl:comment>
<xsl:value-of select="."/>
</xsl:comment>
</xsl:template>
<xsl:template match='processing-instruction()'>
<xsl:variable name="pitarget" select="name()"/>
<xsl:processing-instruction name="{$pitarget}">
<xsl:value-of select="."/>
</xsl:processing-instruction>
</xsl:template>
A simple near-identity transform:
<xsl:template match="*">
<xsl:copy>
<xsl:if test="not(@id)">
<xsl:attribute name="id">
<xsl:value-of select="generate-id(.)"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*">
<!-- attributes have no children, so a bare copy is enough -->
<xsl:copy/>
</xsl:template>
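Retagging and normalizing follow the same pattern: an identity rule plus a few overriding templates. A hedged sketch against greeting.xml; the new element name greeting and the mapping of the code no to nb are hypothetical choices made only for the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity rule: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- retag: rename hello to greeting, keeping attributes and content -->
  <xsl:template match="hello">
    <greeting>
      <xsl:apply-templates select="@*|node()"/>
    </greeting>
  </xsl:template>

  <!-- normalize: rewrite one attribute value, leave all others alone -->
  <xsl:template match="hello/@lang[.='no']">
    <xsl:attribute name="lang">nb</xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

The more specific patterns win over the identity rule through the default priorities, so no priority attributes are needed here.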
Named templates, variables
- syntax
- usage
- variables and parameters
- recursion
Syntax: supply a name attribute:
<xsl:template name="xyz">
...
</xsl:template>
and call using call-template
<xsl:call-template name="xyz"/>
Use like a function call:
clarify what is happening, call from different
locations.
We can declare variables
<xsl:variable name="pitarget" select="name()"/>
and parameters:
<xsl:param name="n">0</xsl:param>
and call named templates with parameters
<xsl:call-template name="count-down">
<xsl:with-param name="n" select="$n - 1"/>
</xsl:call-template>
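Put together, the pieces above give a complete recursive template. This is a sketch of what a count-down template might look like; the starting value 3 and the one-number-per-line output are assumptions.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- entry point: start the count-down at 3 -->
  <xsl:template match="/">
    <xsl:call-template name="count-down">
      <xsl:with-param name="n" select="3"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="count-down">
    <!-- default value, used when the caller passes no parameter -->
    <xsl:param name="n">0</xsl:param>
    <xsl:value-of select="$n"/>
    <xsl:text>&#10;</xsl:text>
    <!-- recurse with n - 1 until n reaches 0 -->
    <xsl:if test="$n > 0">
      <xsl:call-template name="count-down">
        <xsl:with-param name="n" select="$n - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Variables cannot be reassigned in XSLT, so the changing value travels through the parameter of each recursive call. The output is 3, 2, 1, 0, one per line.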
Example: find the last slash
We have data of the form
x/y/z,
x/y/z/w,
x/y/z/r/r,
x/y/z/r/s,
x/y/v; find the last segment.
findlast(s : string) {
if s does not contain '/', then {
// no slash left: s is the last segment
return s
}
else { // recursion
// drop everything up to and including the first '/'
return findlast(substring-after(s, '/'))
}
}
<xsl:template name="findlast">
<xsl:param name="sValue"/>
<xsl:choose>
<xsl:when test='contains($sValue,"/")'>
<xsl:call-template name="findlast">
<xsl:with-param name="sValue">
<xsl:value-of
select='substring-after($sValue,"/")'/>
</xsl:with-param>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$sValue"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
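A usage sketch, assuming the paths arrive as the content of hypothetical path elements. To keep the example self-contained it repeats the findlast logic in a slightly shorter form, passing the parameter with a select attribute.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- hypothetical input: <path>x/y/z/w</path> yields w -->
  <xsl:template match="path">
    <xsl:call-template name="findlast">
      <xsl:with-param name="sValue" select="string(.)"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template name="findlast">
    <xsl:param name="sValue"/>
    <xsl:choose>
      <!-- a slash remains: recurse on what follows the first one -->
      <xsl:when test="contains($sValue, '/')">
        <xsl:call-template name="findlast">
          <xsl:with-param name="sValue"
              select="substring-after($sValue, '/')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- no slash: this is the last segment -->
      <xsl:otherwise>
        <xsl:value-of select="$sValue"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```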
- sorting (easy)
- grouping (takes some thought)
The xsl:sort element occurs as a child of
apply-templates or for-each
<xsl:apply-templates select="//w" mode="conc">
<xsl:sort select="."/>
</xsl:apply-templates>
Attributes:
select
data-type (text, number)
order (ascending, descending)
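The attributes combine, and several xsl:sort elements give primary and secondary keys. A sketch under assumed input: a hypothetical wordlist element whose w children carry a freq attribute.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="wordlist">
    <ul>
      <xsl:for-each select="w">
        <!-- primary key: frequency, compared as numbers, highest first -->
        <xsl:sort select="@freq" data-type="number" order="descending"/>
        <!-- secondary key: the word itself, in text order -->
        <xsl:sort select="."/>
        <li>
          <xsl:value-of select="."/>
          <xsl:text> (</xsl:text>
          <xsl:value-of select="@freq"/>
          <xsl:text>)</xsl:text>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

Without data-type="number" the frequencies would be compared as strings, which puts 10 before 9.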
Not easy to solve alone. Declarative nature of XSLT works
against us.
Fortunately, there are recipes. Find one, follow it.
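One widely used recipe for XSLT 1.0 is key-based grouping, often called the Muenchian method. A sketch, assuming w elements as in the sorting example; the key name w-by-form is made up for the illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- index every w element by its string value -->
  <xsl:key name="w-by-form" match="w" use="."/>

  <xsl:template match="/">
    <ul>
      <!-- keep only the first w of each group: its generated id equals
           the id of the first node the key returns for its value -->
      <xsl:for-each
          select="//w[generate-id() = generate-id(key('w-by-form', .)[1])]">
        <xsl:sort select="."/>
        <li>
          <xsl:value-of select="."/>
          <xsl:text>: </xsl:text>
          <!-- size of the group = number of w elements with this value -->
          <xsl:value-of select="count(key('w-by-form', .))"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

The result is one list item per distinct word form with its number of occurrences, which is the core of a simple frequency list.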