A small gotcha in Alloy’s closure operator

March 24th, 2010

[24 March 2010]

Consider the following Alloy model:

sig Node {}
one sig gl { r, s : Node -> Node }{ s = *r }
run {}

It has no instances, and the Alloy Analyzer comments laconically “Predicate may be inconsistent.” But the similar model below does have instances. Why?

sig Node {}
one sig gl { r, s : univ -> univ }{ s = *r }
run {}

I ran into this phenomenon today, when I was trying to do some work relating to the definition of document order in the XPath data model. It’s convenient to have both a relation like the successor relation of arithmetic on the natural numbers, which gives you the next item in document order, and a relation like the less-than-or-equals relation of arithmetic, which is the reflexive transitive closure of the successor relation. And if you want to have both, you will want to specify, as is done above, that the one is the reflexive transitive closure of the other. And when you do, it’s a bit alarming to be told (and mostly very quickly) that nothing in your model has any instances at all.

It took me a couple hours to reduce the problem to the terms shown above (there were several complications which looked more likely to be the cause of the trouble, and which took time to eliminate), and then a little more time to realize that declaring the relations in question as Node -> Node or as univ -> univ made a difference.

I invite the interested reader, if a user of Alloy, to consider the two models and explain why one has instances and the other doesn’t. No peeking at what follows until you have a theory.

Found it? If you found it without spending a couple hours on it, my hat’s off to you.

The * operator produces the reflexive transitive closure of a binary relation r by essentially taking the union of the positive transitive closure of r and the identity relation iden. That is, for all relations r, *r is defined as ^r + iden.

The problem is that iden covers everything in the model, including the gl (globals) object and the automatically included integers. And the upshot is that s cannot satisfy the constraint s = *r while also satisfying its declaration (Node -> Node).

In his book Software Abstractions, Daniel Jackson remarks on the oddity of * including ‘irrelevant’ tuples in the relation, but (as he points out) it almost never matters, because in many context the irrelevant tuples are dropped out during evaluation of the containing expression. The consequence is that it’s possible to work with Alloy (as I have obviously been doing) with a mental model in which * is somehow smart enough to add only those tuples which are ‘relevant’ to the relation whose closure is being taken. That mental model proves to be an over-complexification.

One reason I like Alloy a lot is that it allows the user to operate at a fairly high level of abstraction, if you can just find the right level. Working in Alloy presents fewer of the small issues of syntax and data typing and so on that can bedevil attempts to explore problems even in high-level programming languages, and so you mostly get to your results a lot faster. But I guess no formal system is ever completely free of cases where the user stares blankly at the screen for some indefinite period of time, trying to figure out why the system is producing such a counter-intuitive and obviously wrong result.

In the words of the old MVS error message: Probable user error. Solution: correct and re-submit.

Day of Digital Humanities, 18 March 2010

March 17th, 2010

[17 March 2010]

Tomorrow I’ll be participating in a mass experiment with self-consciousness, the 2010 edition of the Day of Digital Humanities. The organizers have persuaded close to 150 people who self-identify with the description “digital humanist” (list) to blog, during the course of 18 March, about what it is they actually spend their time doing. “The goal of the project” (say the organizers) “is to create a web site that weaves together the journals of the participants into a picture that answers the question, ‘Just what do computing humanists really do?’”

For the day, I’ll be using the special Day of Digital Humanities blog set up for me by the organizers; the blogs of all participants are aggregated on the project site; there is also an RSS feed.

XML Prague is over

March 14th, 2010

[14 March 2010]

XML Prague took place yesterday and today, and I’m still coming down from the adrenaline high. Lots of good talks, in a single-tracked conference; I had the task of trying to knit them all together in the closing.

I’ll do a fuller trip report later, if working on my taxes doesn’t prevent it. But Norm Walsh has already asked that I post at least the last bit of my remarks.

[Readers who did not attend the conference or follow the streaming video need to know that in the opening sessions of the two conference days, Tony Graham and Sharon Adler had each referred at critical moments to the book of Genesis. One of Sharon's slides bore the title “In the beginning was SGML,”, and Tony began his talk on Saturday with an extended reference to the book of Genesis: “In the beginning was the page. And the page was without form and void, ...” Sharon wondered aloud whether people involved with descriptive markup all have God complexes; she may have something there. The text below also has some other references to things said at the conference, but I think for now I'll spare readers the detailed annotation necessary to explain them all. Apologies in advance to any readers made uncomfortable by parodies of scripture.]

At the end of my talk, I described wandering through the streets of Prague, trying to find my way from the Strahov Monastery, where the conference dinner was held, back to my hotel, when suddenly I had a vision — one might almost say, a revelation.

And I saw in the right hand of him that sat on the throne an XML document canonicalized and serialized with EXI, containing seven pi-trees encrypted with seven public-key encryption key pairs.

And I saw a strong angel proclaiming with a loud voice, Who is worthy to parse the XML document, and to decrypt the encryption thereof?

And no one in heaven, nor in earth, neither under the earth, was able to parse the XML document, neither by buffering it in memory nor by processing it in streaming mode.

And I wept much, because no man was found worthy to parse and to process the XML document, neither to exploit its vocabulary-specific semantics nor to perform vocabulary-independent pretty-printing thereon.

And one of the elders saith unto me, Weep not: behold, the standards-compliant XML application, with support for C14n and EXI, for XML encryption and schema validation, and for interoperable stylesheet technologies like XSLT 17.2 and XSL FO 42.0, hath prevailed to open the XML document, and to decrypt the seven encryptions thereof, and to display it coherently, yea even for those who abhor the sight of angle brackets and prefer beautiful well-formatted text with tasteful images.

And every creature which is in heaven, and on the earth, and under the earth, and such as are in the sea, and all that are in them, heard I saying, Blessing, and honour, and glory, and power, be unto those that preserve data, and provide access to information, if not for ever and ever, then at least for the foreseeable future.

And the presentations, and the coffee breaks, were the second day.

And the nine conference organizers said, Amen. And the representatives of the two gold sponsors, and the four silver sponsors, and the four bronze sponsors, and the five media partners, and the three sister events, stood up and said Amen. And the one hundred and forty-four conference particpants clapped their hands and thanked the conference organizers for a great conference focusing on information that shall outlive the applications which create and process it.

XML for the Long Haul

February 23rd, 2010

[23 February 2010]

The organizing committee for Balisage 2010 have announced the topic of this year’s one-day pre-conference symposium: “XML for the Long Haul: Issues in the Long-term preservation of XML,” and have issued a call for participation. The basic question is roughly this: what do we need to do to make sure that XML-encoded data is usable for long periods? Descriptive markup was invented by people who needed and desired longevity, application independence, and device independence for their data, so longevity is often used as a selling point for the use of SGML and XML. And as sometimes happens with selling points, the precise nature of the relation between XML and long-lived data is sometimes obscured, to the point where some potential customers may believe they have been told that the use of XML in itself guarantees data longevity. And maybe they have, but not (I should think) by anyone who knows what they are talking about. The use of XML, or more generally descriptive markup, may be a necessary condition of data longevity, but it’s unlikely to be sufficient, just as a hammer may be necessary (or extremely helpful) in getting a nail driven, but buying a hammer does not by itself get the nail into the wall.

There’s a lot to be said about the facets and ramifications of the topic, but I think I’ll save those for later posts. For now, I’ll just say that I’ll be chairing the symposium this year, and I hope to see readers of this blog in Montréal in August.

[My evil twin Enrique had been tugging my elbow for some time, and now asked “So why is the logo a moving truck? Will non-native speakers of English understand the reference?” I don't know (but if you do, I'm interested to learn: native speakers of other languages, please speak up! Does the logo make sense outside of English?), but I can at least explain. The English phrase “long haul” refers most literally to long distances, especially for the transport of freight or people (as in “long-haul flights” and “long-haul trucking”). In an extended sense (originally metaphorical, I guess) it denotes a protracted or difficult task (“we're in it for the long haul”) or an extended period of time. Long-term preservation of data and meaning involves a long haul both in the sense of being a difficult task and of involving long period of time. “Oh,” said Enrique. “I get it! The logo is a truck used for long-haul freight transport, the way XML may be used for long-haul preservation of information. Don't you think you should explain that somewhere?” “Maybe,” I said. Maybe I should.

ACH and ALLC co-sponsoring Balisage

February 12th, 2010

[12 February 2010]

The Association for Computers and the Humanities and the Association for Literary and Linguistic Computing have now signed on as co-sponsors of the Balisage conference held each year in August in Montréal. They join a number of other co-sponsors who also deserve praise and thanks, but I’m particularly happy about ACH and ALLC because they have provided such an important part of my intellectual home over the years.

Balisage will take place Tuesday through Friday, 3-6 August, this year; on Monday 2 August there will be a one-day pre-conference symposium on a topic to be announced real soon now. It’s a conference for anyone interested in descriptive markup, information preservation, access to and management of information, accessibility, device independence, data reuse — any of the things that descriptive markup helps enable. The deadline for peer review applications is 19 March; the deadline for papers is 16 April. Time to start thinking about what you’re going to write up; you don’t want to be caught up short at the last minute, without time to work out your idea properly.

Mark your calendars!

XML Prague 2010

February 10th, 2010

[10 February 2010]

Friends and colleagues who have attended XML Prague in previous years have come back with such enthusiastic tales that I have always wished I could attend. Smart people, good discussions, and of course Prague is Prague, one of the most beautiful cities in Europe. This year, it seems, my luck is in. My tickets are booked, my passport has been renewed, and my Czech phrasebook has been dusted off. (Pivo, prosim. Velký.)

There’s still time to decide to go. See you there, perhaps?

Mystery and mathematics

February 2nd, 2010

[2 February 2010]

Lately I’ve been spending some time of an evening reading E. T. Bell’s classic collection of biographies of mathematicians: Men of Mathematics (1937; rpt. New York: Simon and Schuster, 1985). (As the title suggests, Bell is thoroughly pre-feminist; a collection called Women of Mathematics has been published more recently, which may be intended as a kind of pendant to Bell, though sadly it’s nowhere like as much fun to read; its authors are for the most part better historians and much less opinionated.) And something rather puzzling caught my eye the other evening.

Bell writes (p. 357 of the Simon and Schuster paperback reprint) in his sketch of the Irish mathematician William Rowan Hamilton:

So long as there is a shred of mystery attached to any concept that concept is not mathematical.

Granted, Bell was probably tired of having to tell generations of incredulous undergraduates that there is nothing especially imaginary about imaginary numbers; one might grow peevish on the subject over the years. But still: did Bell seriously believe that the natural numbers (say) are not mysterious?

Can anyone not dead to all sense of wonder contemplate the commutative property of integer addition without feeling themselves to be in the presence of mystery?

How formal can you get?

January 26th, 2010

[26 January 2010; updated a URI 28 Jan 2010]

In a recent comment on an earlier post here, David Carlisle wrote of proofs concerning properties of the XPath 1.0 data model

It’s not clear how formal any such proof could be, given that the XPath model and even more so, XML itself are defined so informally.

It’s true that the XPath spec defines the data model in prose and not in formulas. But on the whole it’s relatively formal and explicit prose, and I would expect it to be relatively easy to translate the prose into a formal notation.

As a test of that proposition, I recently improved a shining hour or four by translating section 5 of the XPath 1.0 spec into Alloy, the modeling tool developed by Daniel Jackson and others in the MIT Software Design Group. (I would describe Alloy briefly here, but I’ve written about Alloy often enough in this blog that I won’t describe it again now; regular readers of this blog will already know about it, and those who don’t can search the Web for information, or search this blog for what I’ve said about it before.)

The full Alloy model of XPath 1.0 is available from the Black Mesa Technologies web site. It will be of most interest, of course, for those familiar with Alloy, or wanting to learn more about it. But Alloy’s formalism is simple enough that anyone with a taste for formal methods will find it easy going, and the document paraphrases all the important things in English prose.

The net result, I think, is that there are (unsurprisingly) some places where the definition of the XPath 1.0 data model could or should be more explicit, and a few places where it seems a rule or two given explicitly is strictly speaking redundant, but by and large the definition is pretty clean and seems to mean what its creators meant it to mean. By and large, the definition of the data model is formulated without reliance on the XML spec (there are a few places where this separation could be cleaner), so that the informality of the XML spec (or what I prefer to think of as its programmatic promiscuity regarding models) does not in fact make it hard to formalize the XPath 1.0 data model.

In the current version, there are a couple of rough bits that I hope to sand down before I use this model in any other contexts. The ones I’m currently aware of are these:

  • The definition of the parent relation does not require that whenever parent(c,p) is true, either child(p,c) or attribute(p,c) is true. The XPath 1.0 spec doesn’t say this explicitly, but I think its authors probably felt that it would be pedantic to say something like that for a relation with a name like parent.
  • Similarly, the relations parent and ch are not guaranteed acyclic, though I think the original spec assumes that the names parent and child make clear that the relations should be acyclic. (But see the song “I’m my own Grandpa” and the Alloy models illustrating it, developed in Daniel Jackson’s Software Abstractions and shipped with Alloy in the samples directory.)
  • The predicate precedes intended to model document order is defined recursively; this is not legal in Alloy. So a conventional relation on nodes will need to be defined instead, with appropriate constraints. It is still not clear whether the rules given in the spec suffice to make the order total (at least in the absence of multiply occurring children); if they don’t, I’d like to propose an additional predicate which has that effect.

Two, four, three, who’s counting?

January 25th, 2010

[25 January 2010; correct botched formulation, 26 January]

The XPath 1.0 puzzle introduced and discussed in earlier posts continues to occupy some of my thoughts. Consider the document which can be written

  <a><b/><b/><b/></a>

The central question is this: given an instance of the XPath 1.0 data model corresponding to this document (or to any equivalent document), how many element nodes are present in the instance?

It turns out that not only are the answers four and two both consistent (as far as I can tell) with the definition of the XPath 1.0 data model, but that other answers are possible as well. Well, another answer.

Consider the document

<!DOCTYPE a [
<!ELEMENT a ANY>
<!ELEMENT b ANY>
<!ENTITY b '<b/>' >
]>
<a>&b;&b;<b/></a>

This is not the same serial-form XML document as the one at the top of the page, and the data-model equivalent is not necessarily the same, either, I think. But they both have a document element of type ’a’, whose ordered list of children has length three, with each child being of type ‘b’.

If we take token as denoting some physical object, marks on paper or some other way of writing a character type (here, pixels on screens, magnetic fields on suitable media, or optical effects on other media), which I am told is its proper meaning as it was introduced by Peirce, then here there appear to me to be three string tokens matching the element production (one ‘a’ and two ‘b’s), not four and not two, and thus three element nodes in the data model, not four and not two.

I’m not really at all sure that the right way to define XML documents (and by extension XML elements) is as strings. But if we do wish to say that an element is a string (of some kind), there seem to be at least three different approaches we can take:

  1. a sequence of character types (i.e. a string-type)
  2. a sequence of character tokens (i.e. a string-token)
  3. an occurrence, within some context, of a sequence of character types

Some of the issues relating to these concepts are very well laid out in the article “Types and Tokens” written by Linda Wetzel for the Stanford Encyclopedia of Philosophy. It is interesting to know that SGML and XML accomplish simply, through entity references, what some philosophers have regarded as impossible: we have more occurrences of a thing than we have tokens for it. In the second document given above, it seems clear (at least to me, but I am no philosopher) that there are two string-tokens of type “<b/>” — one in the entity declaration and one in the document content — and three occurrences of that string-type in the document body. When entity expansion is performed by a machine, we are quite likely to be able to identify different tokens with the different occurrences of the string-type (as long as we are willing to entertain time-slices as part of our definition of string-token), but when I just look at the document and count the occurrences of the string-type, I don’t create new tokens in the physical world (unless you want to include mental acts as tokens, but I am pretty sure that that would be a bad idea).

The good news is that there is plenty of philosophical light to throw on these issues, and the issues are well understood by philosophers. Also good news is that the XPath 1.0 data model appears to be consistent with all three possible views.

The bad news, I think, is that while the philosophers understand the issues quite well, they don’t agree on what to do about them. Also bad news is that the XPath 1.0 spec is consistent with all three views, even though it clearly wants to have a single answer to questions like “How many elements are in this document?”. Also, it seems clear that the XPath 1.0 spec really wants to view elements as occurrences (or, at least, the occurrence story produces the results the XPath 1.0 spec relies on its data model producing), but the notion of occurrence appears to be regarded by philosophers as easily the most problematic of the three concepts competing for our patronage.

Fortunately, to make the XPath 1.0 data model do what its creators wanted it to do, all that is necessary is to say explicitly that no node occurs more than once (there’s that word again!) in its parent’s ordered list of children. Or, equivalently, that the cardinality of the set of child nodes is the same as the length of the ordered list of children.

[In full XPath terms, we may also think of it as forbidding any node to appear among the result of evaluating the expression “preceding-sibling::* | following-sibling::*” with that node as the context item. But the data model does not provide a primitive definition of sibling-hood, so it can't conveniently be formulated that way in the data model.]

Tell me, Captain Yossarian, how many elements do you see?

January 22nd, 2010

[22 January 2010]

In an earlier post, I asked how many element nodes are present in the following XML document’s representation in the XPath 1.0 data model.

  <a><b/><b/><b/></a>

I think the spec clearly expects the answer “four” (a parent and three children). More than that, I think the spec reflects the belief of its authors and editors that that answer follows necessarily from the properties of the data model as defined in section 5 of the spec.

But I don’t think “four” is the only answer consistent with the data model as defined.

In particular, consider the answer “two”: one ‘a’ element node, and one ‘b’ element node (which for brevity I’ll just call A and B from here on out; note that A and B are element nodes, whose generic identifiers happen to be ‘a’ and ‘b’.). As far as I can tell, the abstract object just defined obeys all the rules which define the XPath 1.0 model. The rules which seem to apply to this document are these:

  1. “The tree contains nodes.”

    Check.

  2. “Some types of nodes … have an expanded-name …”

    Check: here the names are “a” and “b”.

  3. “There is an ordering, document order, defined on all the nodes in the document …”

    Check: in the model instance I have in mind the nodes are ordered. (In fact they have a total order, which is more than the spec explicitly requires here.) The root node is first, A second, and B third.

  4. “… corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities.”

    Check.

  5. “Thus, the root node will be the first node.”

    Check.

  6. “Element nodes occur before their children.”

    Check: The element node A occurs before its child B in the ordering.

  7. “Thus, document order orders element nodes in order of the occurrence of their start-tag in the XML (after expansion of entities).”

    Check: the start-tags for B begin at positions 4, 8, and 12 of the document’s serial form (counting from 1), and the start-tag for A begins at position 1. So the order of the start-tags in the XML matches the order of the nodes in the model.

    If we had several elements with multiple occurrences and thus multiple start-tags, and the positions of the start-tags were intermingled (as in <a> <b/> <c/> <b/> <c/> <b/> </a>), then it would appear that we had only a partial order on them. if the spec specified that document order was a total ordering over all nodes, we might have a problem. But it doesn’t actually say that; it just speaks of an “ordering”; it would seem strange to argue that a partial ordering is not an “ordering”.

  8. “Root nodes and element nodes have an ordered list of child nodes.”

    Check: the root node’s list of children is {1 → A}, A has the list {1 → B, 2 → B, 3 → B}, and B has the empty list {}.

  9. “Nodes never share children: if one node is not the same node as another node, then none of the children of the one node will be the same node as any of the children of another node.”

    Check: the sets {A}, {B}, and {} (representing the children, respectively, of the root node, A, and B) are pairwise disjoint.

  10. “Every node other than the root node has exactly one parent, which is either an element node or the root node.”

    Check. A has the root node as its parent, B has A as its parent.

  11. “A root node or an element node is the parent of each of its child nodes.”

    Check: the root’s only child is A, and A’s parent is the root. A’s only child is B, and B’s parent is A.

  12. “The descendants of a node are the children of the node and the descendants of the children of the node..”

    Check. The descendants of the root node are {A, B}, those of A are {B}, those of B are {}.

That’s it for the general rules; I think it’s clear that the construction we are describing satisfies them. The subsections of section 5 have some more specific rules, including one that is relevant here.

  1. “There is an element node for every element in the document.” (Sec. 5.2.)

    This rule was cited by John Cowan in his answer to the earlier post; it seems to me it can be taken in either of two ways.

    First, we can take it (as John did, and as I did the first time through this analysis) as saying that for each element node in an instance of the data model, there is an element in the corresponding serial-form XML document, and conversely (I read it as claiming a one to one correspondence) for every element in the serial-form document, there is an element node in the data model instance.

    In this case, the rule seems to me to have two problems. The first problem is that the rule assumes a mapping between XML serial-form documents and data model instances and further assumes (if we take the word “the” and the use of the singular seriously — we are, after all, dealing with a formal specification written by a group of gifted spec-writers, and edited by two of the best in the business) that the mapping from data model instance to serial-form document is a function. But how can it be a function, given that the data model does not model insignficant white space? There are infinitely many serial-form XML documents equivalent to any given data model instance. Serialization will not be a function unless additional rules are specified. And in any case, when we set out to define a formal data model as is done in the XPath 1.0 spec, I think the usual idea is that we should define the data model in such a way as to make it possible to prove that every data-model instance corresponds to a class of XML documents treated as equivalent for XPath purposes, and that every XML document corresponds to a data model instance. If the rule really does appeal to the number of elements in the serial-form XML document, then it’s assuming, rather than establishing, the correspondence. It’s hard to believe that either Steve DeRose or James Clark could make that mistake.

    The second problem, on this reading of the rule, is that it’s hard to say whether a given data model instance obeys the rule, because it’s not clear that XML gives a determinate answer for the question.

    Some argue that XML documents are, by definition, strings that match the relevant production of the XML spec (on this see my post of 5 March 2008); by the same logic we can infer that an element is a string matching the element production.

    [Note: For what it's worth I don't think the XML spec explicitly identifies either documents or elements with strings; the argument that XML documents and elements are strings rests on the claim that they can't be anything else, or that making them anything else would make the spec inconsistent. As I noted in my blog post of 2008, there is at least one passage which seems to assume that documents are strings (it speaks of a document matching the document production), but I believe that passage is just a case of bad drafting.]

    If for discussion’s sake we accept this argument, then it seems we must ask ourselves: is the string consisting of the four characters U+003C, U+0062, U+002F, U+003E, in order, one string or three strings?

    The answer, as students of philosophy will have been shouting out at home for some moments now, is “yes”. If by character you mean ‘character type’, then one string (or string type). If on the other hand you mean ‘character token’, then for the document shown above, I think pretty clearly three strings (string tokens).

    So, on this first reading of the rule, check. Two distinct elements in the XML (counting string types), two in the data model instance. (To show that this rule excludes the model instance we’re discussing, it would be necessary to show that the serialized XML document has four elements, and that counting only two elements is inconsistent with the XML spec. Given how coy the XML spec is on the nature of XML documents, I don’t believe such a showing possible.)

    The second reading of the rule is that “document” does not mean, in this sentence, something different from the data model instance, but is just a way of referring to the entirety of the data model instance itself. A quick glance at the usage of the word “document” in the XPath 1.0 spec suggests that that is in fact its most common usage. In recent years, influenced perhaps by the work on the XPath 2.0 data model, with formalists of the caliber of Mary Fernández and Phil Wadler, many people have begun to think it natural to define an abstract model independently of the XML spec, and then (as I suggested above) establish in separate steps that there is a correspondence between the set of all XML documents viewed as character sequences and the set of all instances of the data model.

    The XPath 1.0 spec seems to take a slightly different tack, rhetorically. The definition of the data model begins

    XPath operates on an XML document as a tree. This section describes how XPath models an XML document as a tree.

    I take this as a suggestion that the data model instance operated on by XPath 1.0 can be thought of not as a thing separate from the XML document (whatever that is) but as a particular way of looking at and thinking about the XML document. I think it’s true that there was (historically speaking) no consensus among the XML community at that time that the term XML document referred to a string, as opposed to a tree. I think the idea would have met fierce resistance.

    On this reading, the rule quoted above is either a vacuous statement, or a statement about usage, establishing the correspondence (or equivalence) between the terms element and element node.

    So, on this second reading, check. Two elements, two element nodes. Same thing, really, element node and element.

As I say, I think it’s quite clear which answer the XPath 1.0 spec intends the question to have: plenty of places in the spec clearly rely on element nodes never having themselves as siblings, just as plenty of places rely on element nodes never having more than one parent. Both properties are a common-sensical interpretation of the element structure of XML. I believe the point of defining the data model explicitly is to eliminate, as far as possible, the need to appeal to common sense and “what everyone knows”, to get the required postulates down on paper so that any implementation which respects those postulates and obeys the constraints will conform and inter-operate. For the parent relation, the definition of the model makes the common-sense interpretation of XML explicit. But not (as far as I can see) for the sibling relation.

Perhaps the creators of the XPath 1.0 spec felt that no explicit statement about no elements being their own siblings was necessary, because it followed from other properties of the model as specified. If so, I think either I must have missed something, or (less likely, but still possible) they did. If the property is to hold for all instances of the model, and if it does not follow from the current definition of the model, then perhaps it needs to be stated explicitly as part of the definition of the model.

[When he reached the end of this post, my evil twin Enrique turned to me and asked “Who's Yossarian? Was he a member of the XSL Working Group?” “No, he was a character in Joseph Heller's novel Catch 22. The title of the post is a reference to an elaborate bit in chapter 18 of the novel.” “And by ‘elaborate,’” mused Enrique, “you mean —” “Exactly: that it's too long to quote here and still claim fair use. Besides, this isn't a commentary on Catch 22. Just search the Web (or the book) for the phrase ‘I see everything twice.’”]