Introduction to XSLT

As a tool for humanities computing

4 XPath

ALLC/ACH 2002, Tübingen

21-22 July 2002

C. M. Sperberg-McQueen

Wendell Piez


I. Sunday afternoon 2: XPath

Overview

  1. Sunday a.m.: introductions
  2. Sunday a.m.: basics (simple transformations, if, choose, selection by attribute values)
  3. Sunday p.m.: modes (e.g. tables of contents)
  4. Sunday p.m.: XPath
  5. Monday a.m.: functions, numbering
  6. Monday a.m.: near-identity transformations
  7. Monday p.m.: named templates, recursion
  8. Monday p.m.: sorting and grouping

XPath

  • What is XPath?
  • XPath data model
  • Expressions as location ladders
  • Axes
  • Long syntax, short syntax

XPath: an addressing language

Many applications need to ‘address’ parts of XML documents:
  • formatting (e.g. XSLT)
  • hyperlinking
  • document construction
  • query / search and retrieval
  • schema / language specification
  • ...
XPath captures the common functionality.
XSLT select and match values are XPath expressions.

XPath data model

A document is an ordered tree with
  • a root node
  • element nodes
  • text nodes
  • attribute nodes
  • namespace nodes
  • processing-instruction nodes
  • comment nodes
No structure sharing. No entity boundaries. Namespace prefixes resolved.
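As a rough illustration of these node kinds, the sketch below walks a small document with Python's standard xml.dom.minidom module and labels each node. The sample document and the names SAMPLE, KIND, and walk are invented for this example. The DOM is only an approximation of the XPath data model: it has no namespace nodes, and it treats attributes as properties of an element rather than as nodes on their own axis.

```python
from xml.dom import minidom, Node

# Hypothetical sample document (not the greetings.xml used in the slides).
SAMPLE = (
    '<greetings lang="en">'
    '<!-- a comment -->'
    '<?render fast?>'
    '<hello who="world">Hello, world!</hello>'
    '</greetings>'
)

# Map DOM node types onto the XPath node kinds named on the slide.
KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}

def walk(node, depth=0, out=None):
    """Collect (depth, kind, name) for every node, in document order."""
    if out is None:
        out = []
    out.append((depth, KIND.get(node.nodeType, "other"), node.nodeName))
    # Attributes hang off their element; list them before the children.
    if node.nodeType == Node.ELEMENT_NODE:
        for i in range(node.attributes.length):
            out.append((depth + 1, "attribute", node.attributes.item(i).name))
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out

tree = walk(minidom.parseString(SAMPLE))
for depth, kind, name in tree:
    print("  " * depth + kind + " " + name)
```

The printed outline is indented by depth, so it can be read as a sideways tree.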

Data model example

greetings.xml drawn as a tree (color-coded).

Data model example

greetings.xml drawn as a sideways tree.

XPath expressions

An expression is a sequence of steps:
/step/step/step/step ...
Each step
  • starts from some current node(s),
  • moves to some result node(s).
Cf. HyTime, TEI location ladders.
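The idea that each step maps a set of current nodes to a set of result nodes can be sketched in a few lines of Python. This is only a toy model, assuming child steps with a name test and nothing else. The sample document and the helpers child and evaluate are invented for the example, and the walk starts from the doc element rather than from the XPath root node.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<doc>"
    "<chapter><section><para>a</para><para>b</para></section></chapter>"
    "<chapter><section><para>c</para></section></chapter>"
    "</doc>"
)

def child(name):
    """Build a step: from each current node, move to its children called name."""
    def step(nodes):
        return [c for n in nodes for c in n if c.tag == name]
    return step

def evaluate(start, steps):
    """Apply the steps left to right; each result becomes the next context."""
    current = [start]
    for step in steps:
        current = step(current)
    return current

# Roughly child::chapter/child::section/child::para, starting at <doc>.
result = evaluate(doc, [child("chapter"), child("section"), child("para")])
texts = [p.text for p in result]

# ElementTree's own (limited) path support should agree.
builtin = [p.text for p in doc.findall("chapter/section/para")]
print(texts, builtin)
```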

XPath steps

A step is
axis::node test [predicate] [predicate] ...
where
  • axis says which direction to move,
  • node test says which nodes go into the result,
  • predicate adds further constraints.
E.g. descendant::figure[@rend="svg"]
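The three parts of the example step can be pulled apart in Python as three successive filters. This is a sketch under stated assumptions: the sample document is invented, and ElementTree stands in for a real XPath engine, so the descendant axis is approximated with iter().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the n attributes only label the figures.
doc = ET.fromstring(
    '<div>'
    '<figure rend="svg" n="1"/>'
    '<p><figure rend="png" n="2"/><figure rend="svg" n="3"/></p>'
    '</div>'
)

# Axis: descendant:: (everything below the context node, in document order).
axis = [node for node in doc.iter() if node is not doc]

# Node test: keep only the nodes named figure.
tested = [node for node in axis if node.tag == "figure"]

# Predicate: [@rend="svg"] narrows the result further.
selected = [node for node in tested if node.get("rend") == "svg"]
by_hand = [f.get("n") for f in selected]

# The same selection with ElementTree's abbreviated path syntax.
by_path = [f.get("n") for f in doc.findall(".//figure[@rend='svg']")]
print(by_hand, by_path)
```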

XPath selection axes

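Each axis is shown with the kinds of node it can reach (e = element, t = text, c = comment, p = processing instruction, a = attribute, n = namespace, r = root).
  • child (→ e, t, c, p)
  • parent (→ e, r)
  • attribute (→ a)
  • following, following-sibling (→ e, t, c, p)
  • preceding, preceding-sibling (→ e, t, c, p)
  • self (→ the context node, of any kind)
  • namespace (→ n)
  • ancestor, ancestor-or-self (→ e, r)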
  • descendant, descendant-or-self (→ e, t, c, p)
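A few of these axes can be imitated in Python to show which direction each one moves. The sketch assumes ElementTree, which keeps no parent pointers, so a parent map is built first. The sample document and the helper names are invented, only element nodes are covered, and the XPath root node is not modeled, so the ancestor list stops at the document element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<doc>"
    "<chapter id='c1'><para id='p1'/><para id='p2'/><para id='p3'/></chapter>"
    "<chapter id='c2'/>"
    "</doc>"
)

# ElementTree has no parent axis of its own, so record each child's parent.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def ancestors(node):
    """ancestor:: nearest first (reverse document order)."""
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out

def descendants(node):
    """descendant:: in document order, excluding the node itself."""
    return list(node.iter())[1:]

def following_siblings(node):
    """following-sibling:: in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    """preceding-sibling:: nearest first (reverse document order)."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)][::-1]

p2 = doc.find(".//para[@id='p2']")
print([a.tag for a in ancestors(p2)])
print([s.get("id") for s in following_siblings(p2)])
print([s.get("id") for s in preceding_siblings(p2)])
print([d.get("id") for d in descendants(doc)])
```

Returning the reverse axes nearest first mirrors how position() counts along them: preceding-sibling::chapter[position()=1] is the closest preceding chapter, not the first one in the document.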

XPath long syntax: simple

  • child::para
  • child::* (all element children)
  • child::text() (all text node children)
  • child::node() (all children)
  • attribute::name
  • attribute::*
  • descendant::para
  • ancestor::div
  • ancestor-or-self::div
  • descendant-or-self::para

Long syntax: more complex

  • self::para (context node if para, otherwise nothing)
  • child::chapter/descendant::para
  • child::*/child::para (all para grandchildren)
  • /
  • /descendant::para (all para elements in the document)
  • /descendant::olist/child::item

Long syntax: predicates

  • child::para[position()=1]
  • child::para[position()=last()]
  • child::para[position()=last()-1]
  • child::para[position()>1]
  • following-sibling::chapter[position()=1]
  • preceding-sibling::chapter[position()=1]
  • /descendant::figure[position()=42]
  • /child::doc/child::chapter[position()=5]/child::section[position()=2]
  • child::para[attribute::type="warning"]
  • child::para[attribute::type='warning'][position()=5]
  • child::para[position()=5][attribute::type="warning"]
  • child::chapter[child::title='Introduction']
  • child::chapter[child::title]
  • child::*[self::chapter or self::appendix]
  • child::*[self::chapter or self::appendix][position()=last()]
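The order of predicates matters, because each predicate filters the list left by the one before it and position() counts within that filtered list. The sketch below imitates this with plain Python list filtering. The sample data and the helpers position and is_warning are invented for the example. ElementTree's own positional predicates are deliberately avoided here, since they are not guaranteed to follow full XPath semantics when predicates are chained.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter with eight paras; paras 1 and 5 are not warnings.
types = ["note", "warning", "warning", "warning",
         "note", "warning", "warning", "warning"]
chapter = ET.fromstring(
    "<chapter>"
    + "".join(f'<para n="{i}" type="{t}"/>' for i, t in enumerate(types, 1))
    + "</chapter>"
)

def position(nodes, k):
    """[position()=k]: keep the k-th node of the current list (1-based)."""
    return nodes[k - 1:k]

def is_warning(nodes):
    """[attribute::type='warning']: keep the paras marked as warnings."""
    return [n for n in nodes if n.get("type") == "warning"]

paras = [c for c in chapter if c.tag == "para"]   # child::para

# child::para[attribute::type='warning'][position()=5]: the fifth warning.
fifth_warning = [p.get("n") for p in position(is_warning(paras), 5)]

# child::para[position()=5][attribute::type="warning"]:
# the fifth para, but only if it is a warning.
fifth_if_warning = [p.get("n") for p in is_warning(position(paras, 5))]

# child::para[position()=last()]
last_para = [p.get("n") for p in paras[-1:]]

print(fifth_warning, fifth_if_warning, last_para)
```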

XPath short syntax

  • para
  • * (all element children)
  • text() (all text node children)
  • node() (all children)
  • @name
  • @*
  • //para
  • chapter//para (selects the same nodes as chapter/descendant::para)
  • */para (all para grandchildren)
  • para[1]
  • /doc/chapter[5]/section[2]
  • para[@type="warning"]
  • para[@type='warning'][5]
  • para[5][@type="warning"]
  • chapter[title='Introduction']
  • chapter[title]
  • *[self::chapter or self::appendix]
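Several of these short forms can be tried with Python's standard xml.etree.ElementTree, which implements a subset of the abbreviated syntax. The sample document is invented, and all paths are evaluated relative to the doc element, so //para is written .//para here. ElementTree does not support the self:: axis or the or operator, so the last selection is done in plain Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<doc>"
    "<chapter><title>Introduction</title>"
    '<para type="warning">w1</para><para>p2</para></chapter>'
    "<chapter><title>Methods</title><para>p3</para></chapter>"
    "<appendix><para>p4</para></appendix>"
    "</doc>"
)

def texts(path):
    """Text content of every element the path selects from <doc>."""
    return [e.text for e in doc.findall(path)]

chapters = doc.findall("chapter")          # chapter
all_children = doc.findall("*")            # * (all element children)
all_paras = texts(".//para")               # para descendants of <doc>
grandchild_paras = texts("*/para")         # */para (all para grandchildren)
warnings = texts("chapter[title='Introduction']/para[@type='warning']")
titled = doc.findall("chapter[title]")     # chapters that have a title child
second_first = texts("chapter[2]/para[1]")
last_in_first = texts("chapter[1]/para[last()]")

# Plain-Python stand-in for *[self::chapter or self::appendix].
chapters_or_appendices = [c.tag for c in doc if c.tag in ("chapter", "appendix")]

print(all_paras, warnings, second_first, last_in_first)
```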

XPath as query language

XPath and XSLT are already widely used for queries:
  • select elements, attributes, strings, numbers
  • select by position in tree, generic identifier, attribute name, value, content
  • co-occurrence constraints
Some drawbacks:
  • no data types beyond the four built into XPath 1.0 (node-set, boolean, number, string)
  • no type safety

Examples: searching

  • An XSLT-driven search engine.
  • Exercises from Greenstein, A historian's guide to computing. (See exercises page.)