XPath 1, Enrique 0

[27 January 2009]

My evil twin Enrique came by the other evening, excited and full of himself. “I’ve just found a bug in Saxon!” he announced.

“Really?” I said. “That would be entertaining. Michael Kay keeps finding new bugs in the XSD 1.1 spec; it would be nice to pay him back. Still, I doubt very seriously that you’ve found a bug. What’s the story?”

“I’m working on a stylesheet to generate an SVG image showing a particular hierarchy of objects. And at one point, I have to know how many elements named object there are, descended from the object with name="X", including X itself, if X is a preceding sibling of the current element or one of its ancestors. At another point, I need the same number, but excluding X itself. So I wrote two XPath expressions, like this:

count(preceding::object[@name="X"]//object)
count(preceding::object[@name="X"]/descendant::object)

“I expected both to evaluate to 0, if X is not somewhere on our left, and to some pair of numbers n and n – 1, if it is.”

“I think I see what you’re trying to do,” I said. (And I did; I did something very similar myself, not long ago, working on a new type hierarchy diagram for XSD 1.1 Datatypes.) “And what did you get?”

He pulled out his laptop and ran a stylesheet whose messages reported that the two expressions evaluated sometimes to 0 and 0, and sometimes to 11 and 11, respectively, but never to 12 and 11, which, as we established by inspection, was what Enrique wanted.

At this point, dear reader, you may already know what Enrique’s mistake was. If so, I congratulate your perspicuity. I confess that I did not. If you share my uncertainty, it might be rewarding to pause now, before reading on, to figure out for yourself why Enrique was wrong.

“So now do you believe I’ve found a bug in Saxon?” asked Enrique.

“Well, no,” I said.

“What do you mean, no?!” he protested. “Object X has eleven descendants of type object. Right?”

“Right.”

“Plus one for self, so the count of objects descended from X by zero or more steps (i.e. including X itself) is twelve, right?”

“Right.”

“So ‘preceding::object [@name = "X"] / descendant-or-self :: object’ should be returning 12, not 11! Right?”

“Right.”

“You know that ‘//’ is short for descendant-or-self, right?”

“Right.”

[Well, wrong, actually. See below.]

“So it’s a bug! Why won’t you admit that I’ve found a bug in Saxon?”

“Well, let’s put it this way. I know Michael Kay. And I know you. And if your expectations disagree with his code — even if my expectations disagree with his code — well, I know where I’m putting my money. What does xsltproc say?”

Xsltproc also gave the same value to both expressions.

And both Saxon and xsltproc gave the expected answer of 12 for the expression

count(preceding::object[@name="X"]/descendant-or-self::object)

“A bug in both Saxon and xsltproc?” marveled Enrique. “That’s amazing, I must be brilliant!”

“A bug in both Saxon and xsltproc? That is incredible,” I corrected him. “As in: not believable. Michael can be wrong; that’s possible. Daniel Veillard can be wrong; that’s possible. Michael and DV both wrong, possible, but extremely unlikely. Michael and DV wrong, and you right? Slightly less likely than the spontaneous formation, by a set of atoms, of a working Infinite Improbability Drive.”

[Enrique doesn’t need to be told this, but some readers may need to be reminded that xsltproc is the command-line interface to libxslt, which is written by my friend and former W3C colleague Daniel Veillard, best known to friends as DV. And Michael Kay, of course, is the author of Saxon. If you don’t know what Saxon and libxslt are, dear reader, why on earth are you reading this posting?]

But what was the story?

Eventually, a certain amount of groveling through the prose of section 2.5 of the XPath 1.0 spec showed where Enrique and I had gone wrong. “//” is short for “descendant-or-self” only in a rough and ready way. In particular, “$X//object” is not equivalent to “$X/descendant-or-self::object”, which is clearly what Enrique (and I) had reckoned. Strictly speaking, what XPath says is:

// is short for /descendant-or-self::node()/.

So “$X // object” is equivalent not to “$X/ descendant-or-self :: object”, but to “$X/ descendant-or-self :: node()/ child::object” — or (confusingly, to me, and despite the note in the XPath spec which appears to say differently) to “$X / descendant :: object”. (The note is making a point about how predicates are evaluated, which doesn’t apply in Enrique’s case.)
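For readers who would rather see the difference than take the spec's word for it, here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree. The tiny document and the names A, B and current are made up for illustration (Enrique's real data had eleven descendants, not two). ElementTree implements only a subset of XPath and has no named axes, so the expanded form and the descendant-or-self form are spelled out by hand with iter(); treat this as an approximation of the three expressions, not as what Saxon or xsltproc do internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of Enrique's data: X has two 'object' descendants.
DOC = """
<root>
  <object name="X">
    <object name="A">
      <object name="B"/>
    </object>
  </object>
  <object name="current"/>
</root>
"""

root = ET.fromstring(DOC)
x = root.find("object[@name='X']")

# $X//object: ElementTree's './/' returns descendants only, never X itself.
abbreviated = x.findall(".//object")

# $X/descendant-or-self::node()/child::object, spelled out by hand:
# iter() walks X and every descendant element; from each one, take the
# 'object' children. X is never anybody's child here, so it drops out.
expanded = [child for node in x.iter() for child in node.findall("object")]

# $X/descendant-or-self::object: iter('object') does include X itself.
descendant_or_self = list(x.iter("object"))

print(len(abbreviated), len(expanded), len(descendant_or_self))  # prints: 2 2 3
```

The first two counts agree because the final child::object step can only ever reach children of X or of its descendants; only the third form counts X itself.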

Enrique was crestfallen; he had been sure that his technical credibility would rise sharply if he had found a bug in Saxon and libxslt. I, on the other hand, was relieved; I now knew how to fix the bug in the stylesheet that generates the SVG image of the XSD 1.1 type hierarchy.

Me, I’m going back to my old habit of just ignoring the abbreviated syntax and using the full syntax all the time: it’s less error-prone, because it’s more explicit.