[25 March 2010; error noticed by Dimitre Novatchev fixed 29 March 2010]
Steve DeRose and I have been discussing the XPath [1.0] data model recently (among other things), and in the course of the discussion an interesting observation has emerged.
it’s obvious that some of the axes in XPath expressions are inverses of each other, and also that some are transitive closures of others (or, going the other way, that some are transitive reductions of others). What surprised me a little was that (if for the moment you leave out of account the
self and the
XYZ-or-self axes, the attribute axis, and the namespace axis [and also
following) all of the XPath axes fit naturally into a pattern that can be represented by three squares. (Will table markup work here? I wonder.) The first square represents the up/down axes:
The next square covers sibling relations. Unlike
child, which are just short-hand for single steps along the up or down axis, XPath provides no syntactic sugar for
preceding-sibling :: *  and
following-sibling :: * , so I’ve invented the names “nextsib” and “prevsib” (marked with a star here to signal that they are invented):
The third square describes overall document order; again, I’ve invented names for the single-step relations [note that the names used here for the transitive relations are given by XPath 2.0; XPath 1.0 doesn’t provide notation for them]:
|Overall document order|
[In the first version of this post, the right-hand columns were labeled
following, but Dimitre Novatchev reminded me gently that these axes do not in fact correspond to document order:
preceding excludes andestors and
following excludes descendants. That’s a plausible exclusion, since no one in their right mind would say that chapter one of Moby Dick precedes the first paragraph of Moby Dick. Contains, yes; precedes, no. In fact, I remember getting into an argument with Phil Wadler about this, early on in the days of the XML Query working group, not realizing (a) that the document ordering he was describing was actually prescribed by XPath 1.0, nor (b) that saying that ancestors precede their descendants in document order didn’t mean that the ancestors would have to be present on the
preceding axis. Thank you, Dimitre! And sorry, Phil!]
In each table row, the relation on the right is the positive transitive closure of the one on the left, and the one on the left is the transitive reduction of the one on the right.
In each table column, the relations in the top and bottom rows are inverses of each other.
The tables make it easy to see that it suffices to take a single pair of relations on nodes as primitive (e.g.
child [or better
prevsib); everything else in the tree can be defined in terms of the two primitive relations. (It’s not completely clear to me yet whether any two relations will do as long as they are from different tables, or not. Taking
parent seems to work, as does the pair
first-child seems to pose problems — why can
child be replaced by
first-child in some situations but not others? Hmm.)
There seem to be implications for the formalization of the data model (which is how we got here in the first place), but maybe also for teaching new users how to think about or learn XPath.