Fundamental primitives of XSLT programming


A friend planning an introductory course on programming for linguists recently asked me what I thought such linguist-programmers absolutely needed to learn in their first semester of programming. I thought back to a really helpful “Introduction to Programming” course taught in the 1980s at the Princeton University Computer Center by Howard Strauss, then the head of User Services. As I remember it, it consisted essentially of the introduction of three flow-chart patterns (for a sequence of steps, for a conditional, and for a while-loop), with instructions on how to use them that went something like this:

  1. Start with a single box whose text describes the functionality to be implemented.
  2. If every box in the diagram is trivial to implement in your programming language, stop: you’re done. Implement the program using the obvious translation of sequences, loops, and conditionals into your language.
  3. Otherwise choose some single box in the diagram whose functionality is non-trivial (will require more than a few lines of code) and replace it with a pattern: either break it down into a sequence of steps, or make it into a while-condition-do-action loop, or make it into an if-then-else choice.
  4. Return to step 2.
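As a minimal sketch of where such a refinement ends up, the following Python fragment uses each of the three patterns exactly once. The task (counting the non-blank lines of a text) and the function name are hypothetical, chosen only for illustration; they are not taken from the course.

```python
def count_nonblank_lines(text):
    """Count the lines of `text` that contain something other than whitespace."""
    # Sequence: one step after another.
    lines = text.splitlines()
    count = 0
    index = 0
    # Loop: while-condition-do-action.
    while index < len(lines):
        # Conditional: if-then-else.
        if lines[index].strip():
            count += 1
        else:
            pass  # blank line: nothing to count
        index += 1
    return count
```

Each comment marks the box in the diagram that the code below it would have replaced.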

I recommended this idea to my friend, since when I started to learn to program I found these three patterns extremely helpful. As I thought about it further, it occurred to me that the three patterns in question correspond 1:1 to (a) the three constructors used in regular languages, and (b) the three patterns proposed in the 1970s by Michael A. Jackson. The diagrams I learned from Howard Strauss were not the same as Jackson’s diagrams graphically, but the semantics were essentially the same. I expect that a good argument can be made that together with function calls and recursion, those three patterns are the atomic patterns of software design for all conventional (i.e. sequential imperative) languages.

I think the patterns provide a useful starting point for a novice programmer: if you can see how to express an idea using those three patterns, it’s hard not to see how to capture it in a program in Pascal, or C, or Python, or whatever language you’re using. Jackson is quite good on deriving the structure of the program from the structures of the input and output in a systematic way.

The languages I most often teach, however, are XSLT and XQuery; they do not fall into the class of conventional sequential imperative languages, and the three patterns I learned from Howard Strauss (and which Howard Strauss may or may not have learned from Michael A. Jackson) do not help me structure a program in either language.

Is there a similarly small set of simple fundamental patterns that can be used to describe how to build up an XSLT transformation, or an XQuery program?

What are they?

Do they have a plausible graphical representation one could use for sketching out a stepwise refinement of a design?

The joy of testing

[5 November 2009, some additions 6 November 2009]

I’m using Jeni Tennison’s xspec to develop tests for a simple stylesheet I’m writing. An xspec test takes the form of a scenario along the lines of

  • When you match a foo element, do this.
  • When you call function bar with these arguments, expect this result.
  • When you call this named template, expect this result.

It’s a relatively young project, and the documentation is perhaps best described as nascent. Working from the documentation (it does exist, which makes for a nice change from some things I work with), I first wrote nine or ten tests to describe the behavior of an existing stylesheet; when I ran the tests against that stylesheet, all of them reported failures, because my formulation of the expected results violated various silent assumptions of the xspec code. That might indicate opportunities for making the xspec documentation more informative. I’ve spent an enjoyable hour or two this evening, however, looking at the xspec code and figuring out how my test cases are confusing it, reformulating them, and watching the bars of red in the test report change, one by one, to green. It’s nice to have a visible sign of forward progress.

There are other XSLT test frameworks I haven’t tried, and I can’t compare xspec to any of them. But I can say this: if you are developing XSLT stylesheets and aren’t using any of the available test frameworks, you really ought to look into xspec.

A helpful page about XSLT testing is maintained by Tony Graham of Menteith Consulting. If xspec doesn’t work out for you, check out the other frameworks he lists there.

Firefox and namespace nodes: an open plea

[21 October 2009]

One of the long-standing gaps in Mozilla’s support for XSLT 1.0 is its failure to support the XPath namespace axis; for the many stylesheets that don’t use that axis, the gap is not a problem. But access to the namespace axis is essential for many stylesheets that work upon XSLT stylesheets, or XSD schema documents, or any other documents which may have namespace-qualified data in attribute values and element content; Firefox’s failure to support it means that browser-based tools for those vocabularies must often carry warnings like “Works with everything except Firefox.” What a drag.
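To make concrete why the in-scope namespace bindings matter, here is a rough Python sketch (not XSLT, and not what any browser does) that resolves a QName found in an attribute value. The schema fragment, the example.com namespace, and the function name are all hypothetical; the sketch relies only on the "start-ns" and "end-ns" events of the standard library's ElementTree parser.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical schema fragment: the value of type="my:code" is a QName,
# so it can only be understood by looking up the prefix "my".
SCHEMA = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.com/ns">
  <xs:element name="item" type="my:code"/>
</xs:schema>"""


def resolve_type_qnames(xml_bytes):
    """Return (namespace URI, local name) for each prefixed type attribute."""
    in_scope = []  # stack of (prefix, uri) bindings currently in scope
    resolved = []
    events = ET.iterparse(io.BytesIO(xml_bytes),
                          events=("start-ns", "start", "end-ns"))
    for event, payload in events:
        if event == "start-ns":
            in_scope.append(payload)  # payload is a (prefix, uri) pair
        elif event == "end-ns":
            in_scope.pop()            # the binding goes out of scope
        else:
            value = payload.get("type")
            if value and ":" in value:
                prefix, local = value.split(":", 1)
                bindings = dict(in_scope)  # inner bindings override outer ones
                resolved.append((bindings[prefix], local))
    return resolved
```

A stylesheet gets the same information from the namespace axis; without that axis, the prefix in the attribute value cannot be mapped to a namespace name.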

So it was encouraging, early this year, when a team of students at Simon Fraser University provided a fix for the bug. (Thank you, SFU! Way to go!) What I don’t understand is: given that there is a fix, given that it passes all the tests, given that this fix removes one of the major blots on Firefox’s XSLT conformance, why isn’t it in the product yet?

I wonder if it’s because the responsible parties don’t perceive the bug or the fix as important; that would be understandable, since with 17 votes in favor of fixing the issue, this bug is way down among the weeds. If that’s the reason, then perhaps it would help if those who do feel the bug is important were to raise the vote total of the relevant bug a bit.

So if you care about XSLT support and have a login ID on Mozilla’s bug tracker, please navigate to bug 94270 and vote for the bug. (Click the ‘vote’ link next to the display of the Importance field.) If you care about XSLT support in Firefox and don’t have a login ID there, I urge you to consider getting one, so you can vote for this bug.

If anyone reading this has insight into the dynamics of getting a fix that’s ready and tested into the next release, I for one would be interested to learn more.

[My evil twin Enrique has produced a poetic version of this plea, addressed to the committers of Firefox:

Without a warning, you broke my heart.
I’ve got this QName to take apart,
But the namespace axis returned no nodes.
Can’t find no binding to guide my code.

So come on, Firefox, Firefox please!
an’ I’m beggin’ you, Firefox, and I’m on my knees.
Please fix this bug,
Accept the patch.
The namespace axis —
Let it match, let it match, let it match.

I’d refuse to include it, as being a sacrilege against the memory of Ron McKernan, but he’d just hack the web site and add it anyway. And if he did, he’d make it look as if I had included it myself, against my better judgement. He’s really past all controlling lately.]


[6 May 2009]

Short version: I’ve made a new toy for playing with one aspect of XSD 1.1, namely the conditional inclusion of elements in schema documents, controlled by attributes in the vc (version control) namespace. The experience has reminded me of reasons to like the vc:* design and of reasons to like (and hate) Javascript in the browser.


XPath 1, Enrique 0

[27 January 2009]

My evil twin Enrique came by the other evening, excited and full of himself. “I’ve just found a bug in Saxon!” he announced.

“Really?” I said. “That would be entertaining. Michael Kay keeps finding new bugs in the XSD 1.1 spec; it would be nice to pay him back. Still, I doubt very seriously that you’ve found a bug. What’s the story?”

“I’m working on a stylesheet to generate an SVG image showing a particular hierarchy of objects. And at one point, I have to know how many elements named object there are, descended from the object with name="X", including X itself, if X is a preceding sibling of the current element or one of its ancestors. At another point, I need the same number, but excluding X itself. So I wrote two XPath expressions, like this:


“I expected both to evaluate to 0, if X is not somewhere on our left, and to some pair of numbers n and n – 1, if it is.”

“I think I see what you’re trying to do,” I said. (And I did; I did something very similar myself, not long ago, working on a new type hierarchy diagram for XSD 1.1 Datatypes.) “And what did you get?”

He pulled out his laptop and ran a stylesheet whose messages reported that the two expressions evaluated sometimes to 0 and 0 and sometimes to 11 and 11, but never to 12 and 11, which, as we established by inspection, was what Enrique wanted.

At this point, dear reader, you may already know what Enrique’s mistake was. If so, I congratulate your perspicuity. I confess that I did not. If you share my uncertainty, it might be rewarding to pause now, before reading on, to figure out for yourself why Enrique was wrong.

“So now do you believe I’ve found a bug in Saxon?” asked Enrique.

“Well, no,” I said.

“What do you mean, no?!” he protested. “Object X has eleven descendants of type object. Right?”


“Plus one for self, so the count of objects descended from X by zero or more steps (i.e. including X itself) is twelve, right?”


“So ‘preceding::object [@name = "X"] / descendant-or-self :: object’ should be returning 12, not 11! Right?”


“You know that ‘//’ is short for descendant-or-self, right?”


[Well, wrong, actually. See below.]

“So it’s a bug! Why won’t you admit that I’ve found a bug in Saxon?”

“Well, let’s put it this way. I know Michael Kay. And I know you. And if your expectations disagree with his code — even if my expectations disagree with his code — well, I know where I’m putting my money. What does xsltproc say?”

Xsltproc also gave the same value to both expressions.

And both Saxon and xsltproc gave the expected answer of 12 for the expression


“A bug in both Saxon and xsltproc?” marveled Enrique. “That’s amazing, I must be brilliant!”

“A bug in both Saxon and xsltproc? That is incredible,” I corrected him. “As in: not believable. Michael can be wrong; that’s possible. Daniel Veillard can be wrong; that’s possible. Michael and DV both wrong? Possible, but extremely unlikely. Michael and DV wrong, and you right? Slightly less likely than the spontaneous formation, by a set of atoms, of a working Infinite Improbability Drive.”

[Enrique doesn’t need to be told this, but some readers may need to be reminded that xsltproc is the command-line interface to libxslt, which is written by my friend and former W3C colleague Daniel Veillard, best known to friends as DV. And Michael Kay, of course, is the author of Saxon. If you don’t know what Saxon and libxslt are, dear reader, why on earth are you reading this posting?]

But what was the story?

Eventually, a certain amount of groveling through the prose of section 2.5 of the XPath 1.0 spec showed where Enrique and I had gone wrong. “//” is short for “descendant-or-self” only in a rough and ready way. In particular, “$X//object” is not equivalent to “$X/descendant-or-self::object”, which is clearly what Enrique (and I) had reckoned. Strictly speaking, what XPath says is:

// is short for /descendant-or-self::node()/.

So “$X // object” is equivalent not to “$X/ descendant-or-self :: object”, but to “$X/ descendant-or-self :: node()/ child::object” — or (confusingly, to me, and despite the note in the XPath spec which appears to say differently) to “$X / descendant :: object”. (The note is making a point about how predicates are evaluated, which doesn’t apply in Enrique’s case.)
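The off-by-one can be reproduced outside XSLT. The sketch below is only an analogy, using Python's standard ElementTree module, which implements a small subset of XPath: its ".//object" pattern behaves like “$X//object” and skips the starting element, while Element.iter() behaves like the descendant-or-self axis. The test document is hypothetical and much smaller than Enrique's.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: object X has three object descendants.
DOC = """
<object name="root">
  <object name="X">
    <object name="a"><object name="b"/></object>
    <object name="c"/>
  </object>
</object>
"""

root = ET.fromstring(DOC)
x = root.find("object[@name='X']")

# Like $X//object, i.e. $X/descendant-or-self::node()/child::object:
# every object that is a child of X or of something below X, never X itself.
abbreviated = x.findall(".//object")

# Like $X/descendant-or-self::object: X itself plus its object descendants.
# Element.iter() yields the element it is called on first.
full_axis = list(x.iter("object"))

print(len(abbreviated), len(full_axis))  # prints: 3 4
```

The two counts differ by exactly one, the same gap that separates the 11 Enrique got from the 12 he expected.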

Enrique was crestfallen; he had been sure that his technical credibility would rise sharply if he had found a bug in Saxon and libxslt. I, on the other hand, was relieved; I now knew how to fix the bug in the stylesheet that generates the SVG image of the XSD 1.1 type hierarchy.

Me, I’m going back to my old habit of just ignoring the abbreviated syntax and using the full syntax all the time: it’s less error-prone, because it’s more explicit.