The axes of XPath

[25 March 2010; error noticed by Dimitre Novatchev fixed 29 March 2010]

Steve DeRose and I have been discussing the XPath [1.0] data model recently (among other things), and in the course of the discussion an interesting observation has emerged.

it’s obvious that some of the axes in XPath expressions are inverses of each other, and also that some are transitive closures of others (or, going the other way, that some are transitive reductions of others). What surprised me a little was that (if for the moment you leave out of account the self and the XYZ-or-self axes, the attribute axis, and the namespace axis [and also preceding and following) all of the XPath axes fit naturally into a pattern that can be represented by three squares. (Will table markup work here? I wonder.) The first square represents the up/down axes:

Up/down
parent ancestor
child descendant

The next square covers sibling relations. Unlike parent and child, which are just short-hand for single steps along the up or down axis, XPath provides no syntactic sugar for preceding-sibling :: * [1] and following-sibling :: * [1], so I’ve invented the names “nextsib” and “prevsib” (marked with a star here to signal that they are invented):

Sideways
*prevsib preceding-sibling
*nextsib following-sibling

The third square describes overall document order; again, I’ve invented names for the single-step relations [note that the names used here for the transitive relations are given by XPath 2.0; XPath 1.0 doesn’t provide notation for them]:

Overall document order
*prevnode >>
*nextnode <<

[In the first version of this post, the right-hand columns were labeled preceding and following, but Dimitre Novatchev reminded me gently that these axes do not in fact correspond to document order: preceding excludes andestors and following excludes descendants. That’s a plausible exclusion, since no one in their right mind would say that chapter one of Moby Dick precedes the first paragraph of Moby Dick. Contains, yes; precedes, no. In fact, I remember getting into an argument with Phil Wadler about this, early on in the days of the XML Query working group, not realizing (a) that the document ordering he was describing was actually prescribed by XPath 1.0, nor (b) that saying that ancestors precede their descendants in document order didn’t mean that the ancestors would have to be present on the preceding axis. Thank you, Dimitre! And sorry, Phil!]

In each table row, the relation on the right is the positive transitive closure of the one on the left, and the one on the left is the transitive reduction of the one on the right.

In each table column, the relations in the top and bottom rows are inverses of each other.

The tables make it easy to see that it suffices to take a single pair of relations on nodes as primitive (e.g. child [or better first-child] and nextsib, or parent and prevsib); everything else in the tree can be defined in terms of the two primitive relations. (It’s not completely clear to me yet whether any two relations will do as long as they are from different tables, or not. Taking nextnode and parent seems to work, as does the pair nextnode and child but nextnode and first-child seems to pose problems — why can child be replaced by first-child in some situations but not others? Hmm.)

There seem to be implications for the formalization of the data model (which is how we got here in the first place), but maybe also for teaching new users how to think about or learn XPath.

2 thoughts on “The axes of XPath

  1. Hi Michael,

    This is *very interesting* article. Here is one first observation:

    It is not correct to label Preceding/following with “overall document order”.

    As we know well, the preceding/following and ancestor/descendant directions are orthogonal and non-overlapping. Therefore, both directions must take part in any attempt to cover the complete document order (of all nodes) of a document.

  2. Dimitre, thank you for the comment. I’m very glad you like the squares.

    And thank you for correcting my error with regard to the preceding and following axes; checking the text of the XPath 1.0 spec I see that you are right. The two axes use document order, but I had apparently forgotten that preceding excludes ancestors and following excludes descendants. It seems to me that there ought to be a square whose right-hand columns are less-than and greater-than in document order (what XPath 2.0 writes as “<<” and “>>”). If preceding and following are retained, then “<<” and “>>” must go to a fourth square.

Comments are closed.