Tell me, Captain Yossarian, how many elements do you see?

[22 January 2010]

In an earlier post, I asked how many element nodes are present in the following XML document’s representation in the XPath 1.0 data model.

<a><b/><b/><b/></a>

I think the spec clearly expects the answer “four” (a parent and three children). More than that, I think the spec reflects the belief of its authors and editors that that answer follows necessarily from the properties of the data model as defined in section 5 of the spec.

But I don’t think “four” is the only answer consistent with the data model as defined.

In particular, consider the answer “two”: one ‘a’ element node, and one ‘b’ element node (which for brevity I’ll just call A and B from here on out; note that A and B are element nodes, whose generic identifiers happen to be ‘a’ and ‘b’.). As far as I can tell, the abstract object just defined obeys all the rules which define the XPath 1.0 model. The rules which seem to apply to this document are these:

  1. “The tree contains nodes.”

    Check.

  2. “Some types of nodes … have an expanded-name …”

    Check: here the names are “a” and “b”.

  3. “There is an ordering, document order, defined on all the nodes in the document …”

    Check: in the model instance I have in mind the nodes are ordered. (In fact they have a total order, which is more than the spec explicitly requires here.) The root node is first, A second, and B third.

  4. “… corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities.”

    Check.

  5. “Thus, the root node will be the first node.”

    Check.

  6. “Element nodes occur before their children.”

    Check: The element node A occurs before its child B in the ordering.

  7. “Thus, document order orders element nodes in order of the occurrence of their start-tag in the XML (after expansion of entities).”

    Check: the start-tags for B begin at positions 4, 8, and 12 of the document’s serial form (counting from 1), and the start-tag for A begins at position 1. So the order of the start-tags in the XML matches the order of the nodes in the model.

    If we had several elements with multiple occurrences and thus multiple start-tags, and the positions of the start-tags were intermingled (as in <a> <b/> <c/> <b/> <c/> <b/> </a>), then it would appear that we had only a partial order on them. if the spec specified that document order was a total ordering over all nodes, we might have a problem. But it doesn’t actually say that; it just speaks of an “ordering”; it would seem strange to argue that a partial ordering is not an “ordering”.

  8. “Root nodes and element nodes have an ordered list of child nodes.”

    Check: the root node’s list of children is {1 → A}, A has the list {1 → B, 2 → B, 3 → B}, and B has the empty list {}.

  9. “Nodes never share children: if one node is not the same node as another node, then none of the children of the one node will be the same node as any of the children of another node.”

    Check: the sets {A}, {B}, and {} (representing the children, respectively, of the root node, A, and B) are pairwise disjoint.

  10. “Every node other than the root node has exactly one parent, which is either an element node or the root node.”

    Check. A has the root node as its parent, B has A as its parent.

  11. “A root node or an element node is the parent of each of its child nodes.”

    Check: the root’s only child is A, and A’s parent is the root. A’s only child is B, and B’s parent is A.

  12. “The descendants of a node are the children of the node and the descendants of the children of the node..”

    Check. The descendants of the root node are {A, B}, those of A are {B}, those of B are {}.

That’s it for the general rules; I think it’s clear that the construction we are describing satisfies them. The subsections of section 5 have some more specific rules, including one that is relevant here.

  1. “There is an element node for every element in the document.” (Sec. 5.2.)

    This rule was cited by John Cowan in his answer to the earlier post; it seems to me it can be taken in either of two ways.

    First, we can take it (as John did, and as I did the first time through this analysis) as saying that for each element node in an instance of the data model, there is an element in the corresponding serial-form XML document, and conversely (I read it as claiming a one to one correspondence) for every element in the serial-form document, there is an element node in the data model instance.

    In this case, the rule seems to me to have two problems. The first problem is that the rule assumes a mapping between XML serial-form documents and data model instances and further assumes (if we take the word “the” and the use of the singular seriously — we are, after all, dealing with a formal specification written by a group of gifted spec-writers, and edited by two of the best in the business) that the mapping from data model instance to serial-form document is a function. But how can it be a function, given that the data model does not model insignficant white space? There are infinitely many serial-form XML documents equivalent to any given data model instance. Serialization will not be a function unless additional rules are specified. And in any case, when we set out to define a formal data model as is done in the XPath 1.0 spec, I think the usual idea is that we should define the data model in such a way as to make it possible to prove that every data-model instance corresponds to a class of XML documents treated as equivalent for XPath purposes, and that every XML document corresponds to a data model instance. If the rule really does appeal to the number of elements in the serial-form XML document, then it’s assuming, rather than establishing, the correspondence. It’s hard to believe that either Steve DeRose or James Clark could make that mistake.

    The second problem, on this reading of the rule, is that it’s hard to say whether a given data model instance obeys the rule, because it’s not clear that XML gives a determinate answer for the question.

    Some argue that XML documents are, by definition, strings that match the relevant production of the XML spec (on this see my post of 5 March 2008); by the same logic we can infer that an element is a string matching the element production.

    [Note: For what it’s worth I don’t think the XML spec explicitly identifies either documents or elements with strings; the argument that XML documents and elements are strings rests on the claim that they can’t be anything else, or that making them anything else would make the spec inconsistent. As I noted in my blog post of 2008, there is at least one passage which seems to assume that documents are strings (it speaks of a document matching the document production), but I believe that passage is just a case of bad drafting.]

    If for discussion’s sake we accept this argument, then it seems we must ask ourselves: is the string consisting of the four characters U+003C, U+0062, U+002F, U+003E, in order, one string or three strings?

    The answer, as students of philosophy will have been shouting out at home for some moments now, is “yes”. If by character you mean ‘character type’, then one string (or string type). If on the other hand you mean ‘character token’, then for the document shown above, I think pretty clearly three strings (string tokens).

    So, on this first reading of the rule, check. Two distinct elements in the XML (counting string types), two in the data model instance. (To show that this rule excludes the model instance we’re discussing, it would be necessary to show that the serialized XML document has four elements, and that counting only two elements is inconsistent with the XML spec. Given how coy the XML spec is on the nature of XML documents, I don’t believe such a showing possible.)

    The second reading of the rule is that “document” does not mean, in this sentence, something different from the data model instance, but is just a way of referring to the entirety of the data model instance itself. A quick glance at the usage of the word “document” in the XPath 1.0 spec suggests that that is in fact its most common usage. In recent years, influenced perhaps by the work on the XPath 2.0 data model, with formalists of the caliber of Mary Fernández and Phil Wadler, many people have begun to think it natural to define an abstract model independently of the XML spec, and then (as I suggested above) establish in separate steps that there is a correspondence between the set of all XML documents viewed as character sequences and the set of all instances of the data model.

    The XPath 1.0 spec seems to take a slightly different tack, rhetorically. The definition of the data model begins

    XPath operates on an XML document as a tree. This section describes how XPath models an XML document as a tree.

    I take this as a suggestion that the data model instance operated on by XPath 1.0 can be thought of not as a thing separate from the XML document (whatever that is) but as a particular way of looking at and thinking about the XML document. I think it’s true that there was (historically speaking) no consensus among the XML community at that time that the term XML document referred to a string, as opposed to a tree. I think the idea would have met fierce resistance.

    On this reading, the rule quoted above is either a vacuous statement, or a statement about usage, establishing the correspondence (or equivalence) between the terms element and element node.

    So, on this second reading, check. Two elements, two element nodes. Same thing, really, element node and element.

As I say, I think it’s quite clear which answer the XPath 1.0 spec intends the question to have: plenty of places in the spec clearly rely on element nodes never having themselves as siblings, just as plenty of places rely on element nodes never having more than one parent. Both properties are a common-sensical interpretation of the element structure of XML. I believe the point of defining the data model explicitly is to eliminate, as far as possible, the need to appeal to common sense and “what everyone knows”, to get the required postulates down on paper so that any implementation which respects those postulates and obeys the constraints will conform and inter-operate. For the parent relation, the definition of the model makes the common-sense interpretation of XML explicit. But not (as far as I can see) for the sibling relation.

Perhaps the creators of the XPath 1.0 spec felt that no explicit statement about no elements being their own siblings was necessary, because it followed from other properties of the model as specified. If so, I think either I must have missed something, or (less likely, but still possible) they did. If the property is to hold for all instances of the model, and if it does not follow from the current definition of the model, then perhaps it needs to be stated explicitly as part of the definition of the model.

[When he reached the end of this post, my evil twin Enrique turned to me and asked “Who’s Yossarian? Was he a member of the XSL Working Group?” “No, he was a character in Joseph Heller’s novel Catch 22. The title of the post is a reference to an elaborate bit in chapter 18 of the novel.” “And by ‘elaborate,’” mused Enrique, “you mean —” “Exactly: that it’s too long to quote here and still claim fair use. Besides, this isn’t a commentary on Catch 22. Just search the Web (or the book) for the phrase ‘I see everything twice.’”]

4 thoughts on “Tell me, Captain Yossarian, how many elements do you see?

  1. Grumble grumble shoulda known grumble type token grumble grumble Talmudistic grumble grumble grumble.

    Now that that’s over with: I think it would be an excessively arbitrary reading of the XML Rec to suppose that a single element could have three distinct start-tags. The definition of start-tag at the beginning of Section 3.1 says “The beginning of every non-empty XML element is marked by a start-tag”, not “… by one or more start-tags”. Empty-tags may be used in place of start-tags and empty-tags for elements which have no content. So: four start/empty-tags, four elements. Non sequitur? De contrario, sequitissimur!

    Turning to the XPath 1.0 Rec, we are told that Section 5 “describes how XPath models an XML document as a tree.” So the document is that which conforms to the XML Rec, and the rules for construction of the data model say that for every element in the document there is to exist one element node in the data model. The fact that a document model can be serialized in more than one way as a document is irrelevant, because what counts here is that a document can be deserialized in only one way as a document model. “Same thing, really, element node and element”, so four elements, four element nodes.

  2. Since eternity of XPath 1.0 being in existence, we have come to know, that the answer to this question will be, “four”. I haven’t gone (again) to the sections of the XPath 1 spec, which might have an answer to this (but I will).

    But running following XPath expression, with most of known, good XPath 1.0 engines can tell us the answer:

    count(//*)

    The answer I see, with Xalan, Saxon & MSXSL is 4 (so major vendors, agree with the answer, “four”).

  3. Talmudistic, do you think? Thank you for the compliment, unworthy though I am of it.

    Let’s rewrite the document so we don’t have to keep switching back and forth between start- and sole-tags. I understand you to claim that in the document <a> <b></b> <b></b> <b></b> </a>, there are four start-tags, and to refute in this way the claim that an element is a string type, rather than a string-token.

    But if an element is a string type, then presumably a start-tag is also a string type. The view that there are only two elements in the document is perfectly consistent with the words you quote from the XML 1.0 spec: on that view the (single) element U+003C, U+0062, U+003E, U+003C, U+002F, U+0062, U+003E, which occurs three times in the document, begins with the (single) start-tag U+003C, U+0062, U+003E, which also occurs three times. In this document (on that view), two start-tags, two elements.

    It does not seem to me that the two-element analysis of the document has yet been refuted.

  4. “Talmudic” and “Talmudistic” aren’t synonyms: the first is neutral, the second definitely negative. Though it seems that most dictionaries don’t recognize the distinction.

Comments are closed.