[3 December 2009]
I just encountered the following statements in the technical documentation for a family of products which I’ll leave nameless:
This document does not describe the complete XML schema for either [Application 1] or [Application 2]. The complete XML schema for both applications is not available and will not be made public.
Perhaps there can be good reasons for such a situation. Perhaps the developers really don’t know how to use any existing schema language to describe the set of documents they actually accept; perhaps only a Turing machine can actually identify the set of documents accepted, and the developers were unwilling to work with a simpler set whose membership could be more cheaply decided. (Well, wait, those may be reasons, but they don’t actually qualify as “good”.)
I wonder whether this is an insidious attempt to make the products look as if they have an open format (See? it’s XML! How much more open can you get?) while ensuring that the commercial products in question remain the only arbiters of acceptable documents, or whether the programmers in question were just too lazy to specify a clean vocabulary and ensure that their software handles all documents which meet some standard of validity that does not require Turing completeness.
Having a partially defined XML format is, at least for me, still a great deal more convenient than having the format be binary and completely undocumented. But it certainly seems to fall a long distance short of what XML could make possible.
I cannot resist the temptation to quote “we use bits, which are a widely accepted data format, and thus our interface is easy to use” from our “Document Design Matters” article, available at http://dret.net/netdret/docs/wilde-cacm2008-document-design-matters/
Thank you for reminding me of that article. Nice quote, even though in many practical cases, including this one, the fact that XML is higher-level than bits means that XQuery and other tools can be useful in studying instances of the format in question to understand it. I have got a lot further with the format mentioned in the blog post than I could have gotten in an equivalent time with a binary format, because the necessary tools are more easily accessible to me.
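As a rough illustration of that kind of instance-driven study, here is a minimal sketch in Python rather than XQuery. It is not the tooling actually used on the format in question; the function name `summarize_vocabulary` and the sample documents are invented for the example. It walks a few sample instances and records, for each element name, which child elements and attributes were observed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def summarize_vocabulary(xml_texts):
    """Return, per element name, the child elements and attributes seen."""
    children = defaultdict(set)
    attributes = defaultdict(set)
    for text in xml_texts:
        root = ET.fromstring(text)
        for elem in root.iter():
            children[elem.tag]  # register leaf elements as well
            attributes[elem.tag].update(elem.attrib)  # adds attribute names
            for child in elem:
                children[elem.tag].add(child.tag)
    return {
        tag: {
            "children": sorted(children[tag]),
            "attributes": sorted(attributes[tag]),
        }
        for tag in children
    }


# Hypothetical sample instances, standing in for real documents.
SAMPLES = [
    '<report id="r1"><title>A</title><item qty="2">x</item></report>',
    '<report><item>y</item><note/></report>',
]

summary = summarize_vocabulary(SAMPLES)
for tag, info in sorted(summary.items()):
    print(tag, info)
```

A summary like this only describes the documents one happens to have. It says nothing about what the application would accept or reject, which is exactly the information a published schema would supply.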