[20 January 2010]
Consider the XML document shown below, and in particular consider its representation in the XPath 1.0 data model.
<a><b/><b/><b/></a>
How many element nodes are there in this document, regarded as an instance of the XPath 1.0 data model? I think it’s clear that, for purposes of XPath 1.0, the expected answer is four: one of type ‘a
’ and three of type ‘b
’, all children of the ‘a
’ element.
I am finding it unexpectedly difficult to prove that conclusion formally on the basis of the definition of the data model given in the spec. I wonder if anyone else will have better luck.
The document is well-formed and contains, beyond all doubt, one start-tag and three empty-tags. Therefore it contains four elements. The first sentence of XPath 1.0 section 5.2 says there is one element node for each element in the document. By the axiom of choice (which is trivially true for finite sets), there are four element nodes.
Where’s the difficulty?
More details later in another post, but in brief, I believe your second sentence is a non sequitur.
And on second thought, I have begun to wonder about the first sentence of 5.2. I suppose it must either be (a) assuming a mapping between XML documents and instances of the data model (roughly in the style of XDM), or (b) taking the term ‘document’ as short-hand for ‘an instance of the model being defined’. Empirically, (a) seems to be supported by current usage in discussions of XSLT by implementors and standards bodies; (b) seems to be supported by the usage elsewhere in the XPath 1.0 spec.
But if (a), then the sentence is assuming a mapping that has not been defined (and which appears to be a mapping from XML serialized form to model instance; I’m interested also in model instances which never came from XML’s serialized form), and assuming properties for that mapping which need to be established, not assumed into existence. If we assume (a), then I think your reasoning has a gap where it assumes that the XML spec determines how many elements we have to assume in the document shown. That is what I had in mind with my previous comment about non sequitur.
And if (b), then the sentence states a platitude, or a rule of usage: one ‘element’, one ‘element node’. If we assume (b), then I think your reasoning is performing petitio principii: you are assuming that there are four elements, when that is what is to be proven.
Am I wrong?
It’s not clear how formal any such proof could be, given that the XPath model and even more so, XML itself are defined so informally.
@David, actually I have the impression that the data model section of XPath 1.0 was intended to be read fairly formally, and on the whole I think it provides a pretty good abstract definition of the model, straightforward to translate into whatever logical formalism one prefers. In a later post, I’ll show my Alloy formulation of the XPath 1.0 data model.
There are a few slips here and there, I think, including the one at the heart of this post. But on the whole, I think the XPath 1.0 data model does rather a good job, considering the authors lacked tools like Alloy for checking to see just how a given set of rules works out in practice. It’s much cleaner, formally, than we managed in the XML 1.0 spec outside of the grammar.