Introduction to XSLT

As a tool for humanities computing

8 Sorting and grouping

ALLC/ACH 2002, Tübingen

21-22 July 2002

C. M. Sperberg-McQueen

Wendell Piez



I. Monday afternoon 2: sorting and grouping

Overview

  1. Sunday a.m.: introductions
  2. Sunday a.m.: basics (simple transformations, if, choose, selection by attribute values)
  3. Sunday p.m.: modes (e.g. tables of contents)
  4. Sunday p.m.: XPath
  5. Monday a.m.: functions, numbering
  6. Monday a.m.: near-identity transformations
  7. Monday p.m.: named templates, recursion
  8. Monday p.m.: sorting and grouping

Sorting and grouping

  • sorting (easy)
  • grouping (takes some thought)

Sorting

The xsl:sort element occurs as a child of xsl:apply-templates (or of xsl:for-each):
    <xsl:apply-templates select="//w" mode="conc">
     <xsl:sort select="."/>
    </xsl:apply-templates>
Attributes:
  • select (the sort key; defaults to ".")
  • data-type (text, number; default text)
  • order (ascending, descending; default ascending)
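
As a minimal sketch of these attributes working together, the stylesheet below lists people oldest first and breaks ties alphabetically by surname. The element names census, person, surname, and age are assumptions made for illustration; they are not taken from any data set used in this course.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input:
       <census><person><surname>..</surname><age>..</age></person>...</census> -->
  <xsl:template match="census">
    <ul>
      <xsl:apply-templates select="person">
        <!-- Primary key: age compared as a number, largest first -->
        <xsl:sort select="age" data-type="number" order="descending"/>
        <!-- Secondary key: surname compared as text, ascending (the defaults) -->
        <xsl:sort select="surname"/>
      </xsl:apply-templates>
    </ul>
  </xsl:template>

  <xsl:template match="person">
    <li>
      <xsl:value-of select="surname"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="age"/>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

Without data-type="number" the ages would be compared as strings, so that in ascending order "10" would come before "9".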

Group exercises

  1. Sort the Gorbals data by surname.
  2. ... by occupation.
  3. ... by town and county of birth.
  4. ... by age and marital status.

Grouping elsewhere

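A common idiom in procedural languages:
sortedWords = words.sort("lemma");
wordPrev = null;
for each w in sortedWords {
    wordThis = w.getAttribute("lemma");
    if (wordThis != wordPrev) {
       // new lemma: close the previous entry (if any), open a new one
       if (wordPrev != null) write("</word-entry>\n");
       write("<word-entry>");
    }
    ...
    wordPrev = wordThis;
}
// close the last entry, unless there were no words at all
if (wordPrev != null) write("</word-entry>\n");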

Grouping in XSLT

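Not a common idiom in functional languages! It does not carry over to XSLT because:
  • it relies on assignment statements (an XSLT variable cannot be reassigned)
  • it writes start- and end-tags to a stream; we output a tree, not a stream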
Three alternatives:
  1. make a list of unique items by checking each one for earlier duplicates
    [not(@lemma=preceding::w/@lemma)]
    at a cost of ca. n × (n - 1) / 2 comparisons (ouch?)
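
The duplicate-checking predicate can be sketched in a complete stylesheet as follows. This is only an illustration: it assumes that w elements are not nested and that each carries a lemma attribute, and the output element names lemma-list and lemma are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <lemma-list>
      <!-- Keep a w only if no earlier w (in document order) has the same lemma.
           Each w is compared against all of its predecessors, hence the
           quadratic cost. -->
      <xsl:for-each select="//w[not(@lemma = preceding::w/@lemma)]">
        <xsl:sort select="@lemma"/>
        <lemma>
          <xsl:value-of select="@lemma"/>
        </lemma>
      </xsl:for-each>
    </lemma-list>
  </xsl:template>

</xsl:stylesheet>
```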

Grouping

Basically: sort, and then emit a heading only if no preceding w has the same value:
 <xsl:template match="w" mode="conc">
  <xsl:variable name="type" select="."/>

  <xsl:if test="not(preceding::w[. = $type])">
   <xsl:element name="h3">
    <xsl:value-of select="."/>
   </xsl:element>
  </xsl:if>

  <!--* ... *-->
 </xsl:template>

A simple optimization

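  1. sort first, then check only the nearest preceding sibling, using
    [not(@lemma=preceding-sibling::w[1]/@lemma)]
    You probably want them sorted anyway.
    Note: preceding-sibling follows document order, not the order imposed by
    xsl:sort, so the w elements must already be sorted in the input
    (e.g. by an earlier pass).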
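
A minimal sketch of this check, under stated assumptions: the input is a hypothetical intermediate document in which the w elements are siblings inside a words element and already stand in lemma order (for example, the output of an earlier sorting pass). The output names lemma-list and lemma are likewise invented for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input, already sorted by lemma:
       <words><w lemma="a">..</w><w lemma="a">..</w><w lemma="b">..</w></words> -->
  <xsl:template match="words">
    <lemma-list>
      <!-- A w starts a new group if its lemma differs from that of the
           sibling immediately before it: one comparison per w. -->
      <xsl:apply-templates
          select="w[not(@lemma = preceding-sibling::w[1]/@lemma)]"/>
    </lemma-list>
  </xsl:template>

  <xsl:template match="w">
    <lemma>
      <xsl:value-of select="@lemma"/>
    </lemma>
  </xsl:template>

</xsl:stylesheet>
```

The first w has no preceding sibling, so the comparison is false and not() keeps it, as it should.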

Grouping

Or: sort, test for equality with preceding, select:
<xsl:element name="li">
 <xsl:value-of select="."/>
 <xsl:call-template name="get-scraps-by-file">
  <xsl:with-param name="fn">
   <xsl:value-of select="."/>
  </xsl:with-param>
 </xsl:call-template>
</xsl:element>

Muenchian grouping: keys

Named for Steve Muench (Oracle)
  1. Define a key:
    <xsl:key name="lemma-key" 
             match="w" 
             use="@lemma"/>
    Then test whether the current node is the first in its group, using
    generate-id(.) = 
        generate-id(key('lemma-key', @lemma)[1])
    or on
    1 = count(. | key('lemma-key',@lemma)[1])
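
Put together, the key and the first-in-group test might be used as in the sketch below. The key definition and the generate-id() test are those shown above; the output element names concordance and form are hypothetical, and word-entry simply echoes the procedural example earlier in this session.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every w by the value of its lemma attribute -->
  <xsl:key name="lemma-key" match="w" use="@lemma"/>

  <xsl:template match="/">
    <concordance>
      <!-- Outer loop: only the first w for each distinct lemma -->
      <xsl:for-each
          select="//w[generate-id(.) =
                      generate-id(key('lemma-key', @lemma)[1])]">
        <xsl:sort select="@lemma"/>
        <word-entry lemma="{@lemma}">
          <!-- Inner loop: every w that shares this lemma -->
          <xsl:for-each select="key('lemma-key', @lemma)">
            <form>
              <xsl:value-of select="."/>
            </form>
          </xsl:for-each>
        </word-entry>
      </xsl:for-each>
    </concordance>
  </xsl:template>

</xsl:stylesheet>
```

Because the key lookup replaces the scan along the preceding axis, each w is visited only a small number of times, which is what makes this approach attractive for larger inputs.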

Group exercise: reformatting census data

  1. Define key on household number.
  2. Define unique-households variable.
  3. For each household, select all individuals in it and process them.
  4. Write sanity-checker to ensure we have not lost information.

Individual exercises

If there is time, consult the exercises page.