[25 March 2010; error noticed by Dimitre Novatchev fixed 29 March 2010]
Steve DeRose and I have been discussing the XPath [1.0] data model recently (among other things), and in the course of the discussion an interesting observation has emerged.
it’s obvious that some of the axes in XPath expressions are inverses of each other, and also that some are transitive closures of others (or, going the other way, that some are transitive reductions of others). What surprised me a little was that (if for the moment you leave out of account the self
and the XYZ-or-self
axes, the attribute axis, and the namespace axis [and also preceding
and following
) all of the XPath axes fit naturally into a pattern that can be represented by three squares. (Will table markup work here? I wonder.) The first square represents the up/down axes:
Up/down | |
---|---|
parent |
ancestor |
child |
descendant |
The next square covers sibling relations. Unlike parent
and child
, which are just short-hand for single steps along the up or down axis, XPath provides no syntactic sugar for preceding-sibling :: * [1]
and following-sibling :: * [1]
, so I’ve invented the names “nextsib” and “prevsib” (marked with a star here to signal that they are invented):
Sideways | |
---|---|
*prevsib | preceding-sibling |
*nextsib | following-sibling |
The third square describes overall document order; again, I’ve invented names for the single-step relations [note that the names used here for the transitive relations are given by XPath 2.0; XPath 1.0 doesn’t provide notation for them]:
Overall document order | |
---|---|
*prevnode | >> |
*nextnode | << |
[In the first version of this post, the right-hand columns were labeled preceding
and following
, but Dimitre Novatchev reminded me gently that these axes do not in fact correspond to document order: preceding
excludes andestors and following
excludes descendants. That’s a plausible exclusion, since no one in their right mind would say that chapter one of Moby Dick precedes the first paragraph of Moby Dick. Contains, yes; precedes, no. In fact, I remember getting into an argument with Phil Wadler about this, early on in the days of the XML Query working group, not realizing (a) that the document ordering he was describing was actually prescribed by XPath 1.0, nor (b) that saying that ancestors precede their descendants in document order didn’t mean that the ancestors would have to be present on the preceding
axis. Thank you, Dimitre! And sorry, Phil!]
In each table row, the relation on the right is the positive transitive closure of the one on the left, and the one on the left is the transitive reduction of the one on the right.
In each table column, the relations in the top and bottom rows are inverses of each other.
The tables make it easy to see that it suffices to take a single pair of relations on nodes as primitive (e.g. child
[or better first-child
] and nextsib
, or parent
and prevsib
); everything else in the tree can be defined in terms of the two primitive relations. (It’s not completely clear to me yet whether any two relations will do as long as they are from different tables, or not. Taking nextnode
and parent
seems to work, as does the pair nextnode
and child
but nextnode
and first-child
seems to pose problems — why can child
be replaced by first-child
in some situations but not others? Hmm.)
There seem to be implications for the formalization of the data model (which is how we got here in the first place), but maybe also for teaching new users how to think about or learn XPath.
Hi Michael,
This is *very interesting* article. Here is one first observation:
It is not correct to label Preceding/following with “overall document order”.
As we know well, the preceding/following and ancestor/descendant directions are orthogonal and non-overlapping. Therefore, both directions must take part in any attempt to cover the complete document order (of all nodes) of a document.
Dimitre, thank you for the comment. I’m very glad you like the squares.
And thank you for correcting my error with regard to the
preceding
andfollowing
axes; checking the text of the XPath 1.0 spec I see that you are right. The two axes use document order, but I had apparently forgotten thatpreceding
excludes ancestors andfollowing
excludes descendants. It seems to me that there ought to be a square whose right-hand columns are less-than and greater-than in document order (what XPath 2.0 writes as “<<
” and “>>
”). Ifpreceding
andfollowing
are retained, then “<<
” and “>>
” must go to a fourth square.