[7 April 2010]

The old Paul Simon song “There must be fifty ways to leave your lover” keeps running through my head. I can see close to fifty ways to define the XPath 1.0 data model in terms of (a) a set of nodes and (b) two relations defined on that set, which are taken as primitive; all other relations (i.e. all the other axes of XPath) are defined in terms of those two primitive relations.

Strictly speaking, I make it forty-eight ways. First, pick any single relation from any of the following four groups:

- parent, child, ancestor, descendant
- prevsib, nextsib, preceding-sibling, following-sibling
- prevnode, nextnode, document-order preceding (>>), document-order-following (<<)

That’s twelve possibilities.

Second, pick any single relation from either of the other two groups; that makes eight possible choices, times twelve first choices, for ninety-six ordered pairs of relations. But the order doesn’t matter, so we have forty-eight distinct pairs.

In recent days, taking some pairs not quite at random, defining the constraints they must satisfy in order to be a suitable basis for defining an XPath 1.0 tree, and defining all the other relations in terms of the chosen primitives, I have learned a couple of mildly interesting things.

- It’s more convenient to take parent as a primitive, than child.
- It’s more convenient to take one of the single-step relations (parent, child, nextsib, prevsib, nextnode, prevnode) than one of their transitive closures (ancestor, descendant, etc.).

The nextsib relation, for example, needs to be acyclic, functional, injective, and not transitive. If it is, then its transitive closure following-sibling will automatically be suitable. But if you start with following-sibling and specify (as you will need to) that it is transitive and acyclic, that does not suffice to guarantee that its transitive reduction nextsib is functional and injective. You can of course simply say that a following-sibling relation is suitable if and only if (a) it’s transitive, (b) it’s acyclic, and (c) its transitive reduction is functional and injective, but now you’re forcing the reader to work with two relations, not just one: both following-sibling and its transitive reduction. It would be interesting either to find a way to constrain the closure directly to ensure the necessary properties in the reduction, or else to find a proof that there is no way to constrain a closure to ensure that its reduction is functional and injective without explicitly referring to the reduction.

- From any relation in any group, the other relations in that group are (relatively) easy to derive in terms of inversion, transitive closure, or transitive reduction. Defining a relation in the third group typically proves more interesting. And while it’s more convenient to choose the primitive relations from among the reductions, it turns out that at least in some cases it’s easiest to define the third group in terrms of one of the closures. For example, given the parent and next-sibling relations, it proves easier to define document-order-following in terms of the primitives than to define next-node.

It occurs to me to wonder whether there are ways to define XPath 1.0 trees that don’t reduce to or include one of these forty-eight.

The order doesn’t matter, if it’s only idle chatter of a comb’natorial kind!

Second, pick any single relation from either of the other two groups; that makes eight possible choices,

maybe you are simplifying to a single document, otherwise I think you have to use a relation from group 3 as groups 1 and 2 are only defined for nodes in the same document, so you can’t define document order which is defined to apply (if not with a defined result) to nodes in different documents. So that’s 4*8=only 32 ways

also to define document order in terms of nextnode you’d have to use the fact that there are only countably many xml documents, so practically speaking it’s vastly more convenient to take document order (or equivalently reverse document order) as primitive.

David, you are quite right; I have been simplifying life by assuming that the set taken as a starting point will constitute the nodes of a single instance. In this I believe I am following the example of the XPath 1.0 spec.

I am not sure, though, whether in the more general case it’s essential to take one of the document-order relations as primitive. When one defines the model in terms of parent and nextsib (for example) (see the Alloy model on the Web for details), it’s necessary to specify explicitly that in document order, descendants of any node all precede its next sibling; it feels off hand as if something quite similar ought to suffice to ensure that document order behaves properly. If we note, for example, that in a full instance, exactly one document node is reachable from any node by following the parent arc zero or more times, and define docnode(N) as denoting that document node, then we can say that for all nodes A, B, and C, (A << B << C and docnode(A) = docnode(C)) implies docnode(B) = docnode(C). Document order is then not uniquely determined by the rules (but that's already the case with attributes and namespace nodes). I shall have to try to see if I can work that out more explicitly. But I will want to study the 2.0 data model spec first, I think. On countability, I'm not sure I follow your argument. I think of nextnode as the successor function of document order, and the strict total document order as its transitive closure, so getting to document order from next node seems hardly a step. And in the definition of document order in the Alloy model, I am not aware of any implicit appeal to countability (which does not mean there is none.) It did become clear along the way true that going the other way round, from the << relation to the nextnode relation, requires the premise that the set of nodes is finite, in order to ensure that the number of nodes between any two nodes in the document order is finite. (There is no succeessor function on the real numbers.) Is that related to your point?

> In this I believe I am following the example of the XPath 1.0 spec.

Xpath1 certainly requires document order to be defined across documents, this is why in section 2.2 while defining the following axis in terms of document order, there needs to be an extra restriction that the nodes are in the same document.

the following axis contains all nodes in the same document as the context node that are after the context node in document order,

It’s left rather dangerously implicit in xpath1 but then made explicit in xslt1 (last para of 12.1) when the document() function is defined.

Yes quite, a successor function basically only gets you the natural numbers, ie a countable set, which is fine in this case as the set of nodes in all xml documents is countable.

> It did become clear along the way true that going the other way round, from the << relation to the nextnode relation, requires the premise that the set of nodes is finite

you only need it to be countable, not finite. Given < on the natural numbers (or integers) you can derive the successor function, despite the fact they are infinite sets.

As you say, you can’t do that on the reals as it’s not a countable set.

So if you assume that there is a < I am not sure, though, whether in the more general case itâ€™s essential to take one of the document-order relations as primitive.

I can’t see how just given in-document relations you can define <<.

In a single document a<< b is definable as b is a descendant of a or b follows a. But if a and b are in different documents then none of the relations in sets 1 or 2 apply at all.