[5 March 2008]
My evil twin Skippy continues his reaction to the issue Henry Thompson has raised against the SML specification. He has lost the first point, in which he claimed that the XML spec doesn’t actually say that XML documents are character sequences.
Well, OK. But even if only character sequences can be XML documents stricto sensu, how does referring to information sets instead of XML documents actually help? Any system which handles XML documents in their serial form can also, if it wishes to, handle other representations of the same information without any loss of conformance.
At this point, I interrupt to remind Skippy that some people, at least, believe that if a spec requires that its implementations work upon XML documents, then by that same rule it forbids them from accepting DOM objects, or SAX event streams, and working on them. This is roughly how the XSD spec came to describe an XML syntax for schema documents, while carefully denying that its input needed to be in XML.
Skippy responds:
Suppose I claim to be a conforming implementation of the XYZ spec, which defines document the way SML does, as well-formed XML, instead of as XSD does, as “an information set as defined in [XML-Infoset]”.
Actually, I interject, this isn’t offered as a definition of document, just as a characterization of the input required for schema-validity assessment. Skippy points out, in return, that “as defined in [XML-Infoset]” is more hand-waving, since in fact the Infoset spec does not offer any definition of the term information set.
Eventually, we stop squabbling and I let him continue his argument.
I allow people to define new documents through a DOM or SAX interface, and I work with those documents just as I work with documents I acquire through a sequence of characters conforming to the grammar of the XML spec.
Now, suppose someone who believes that angle brackets are the true test of XMLness decides to challenge my claim to conformance. They will claim, I believe, that I am working with things that are not really XML documents, and thus violating the XYZ spec, which says that they MUST be XML documents. My response will be to ask why they think that. “Show us the data!” they will cry. And I will provide a sequence of characters which is indisputably an XML document.
The claim that I am not conforming is necessarily based on a claim about what I am doing ‘under the covers’, rather than at my interfaces to the outside world. But precisely because it’s a question of what I do “inside”, there is no way for an outside observer to know whether I am working directly from the DOM data structure or SAX event stream, or am making a sequence of characters internally, and working from that. One reason the distinction between serialized XML documents and other representations causes so little trouble in practice, even in arguments about conformance, is that if XML is accepted and produced at the interfaces on request, no argument can be made that forbids other forms from being accepted at those interfaces as well.
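Skippy's "Show us the data!" scenario can be sketched in a few lines. This is only an illustration, assuming Python's standard `xml.dom.minidom` module; the function names `build_via_dom` and `show_us_the_data` and the `greeting` element are invented for the example and come from no spec. The document is created entirely through the DOM interface, and a character sequence is produced only when someone asks for it.

```python
from xml.dom import minidom


def build_via_dom():
    """Create a document through the DOM API; no angle brackets are ever typed."""
    impl = minidom.getDOMImplementation()
    doc = impl.createDocument(None, "greeting", None)  # hypothetical element name
    root = doc.documentElement
    root.setAttribute("lang", "en")
    root.appendChild(doc.createTextNode("hello & goodbye"))
    return doc


def show_us_the_data(doc):
    """Produce the serialized form on demand, as a challenger might request."""
    return doc.toxml()


doc = build_via_dom()
serialized = show_us_the_data(doc)

# The serialized form is an ordinary character sequence that an XML parser accepts.
reparsed = minidom.parseString(serialized)
root = reparsed.documentElement
print(serialized)
print(root.tagName, root.getAttribute("lang"), root.firstChild.data)
```

From the outside, nothing distinguishes an implementation that works on `doc` directly from one that serializes it first and works from the characters.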
I have always thought of myself as a reasonably good language lawyer, but I am not sure how to refute Skippy’s claim. Can any readers suggest ways to prove that it matters? Or is Skippy right that SML will do better to refer to the XML spec, to describe its inputs, than to the Infoset spec?
My certainly very naive thoughts —
<joe>: my IRC nickname is not </joe>
A sequence of characters can be anything, really anything (like a strange IRC log). It becomes something when we say it is: for example, by sending it somewhere with a specific label (over HTTP with a MIME type), by handing it manually to a product that groks only XML, or by reading it as an XML document with my own eyes because I chose to.
For it to be XML, we have to say so, which means we define a frame of reference for analyzing and processing it, namely the XML spec.