XSD 1.1 is a Candidate Recommendation

[4 May 2009; some typos corrected and phrases tweaked 5, 6, and 7 May]

The World Wide Web Consortium has published XSD 1.1 Part 1: Structures and Part 2: Datatypes as Candidate Recommendations, and issued a call for implementation.

As the version number is intended to suggest, XSD 1.1 is mostly very similar to XSD 1.0 and restricts itself to relatively modest changes to the spec.

[At this point, Enrique snorted loudly enough to break my concentration. “If it’s just modest changes, why did it take so long? Let’s see, when did you start? XSD 1.0 was 2001, so …”

“Well, we didn’t start on 1.1 right away,” I hurriedly interjected. “But, well, I guess you’re right. It did take a lot longer than you would have expected.”

“Why? What could possibly take that long?”

“Well, different members of the working group turned out to entertain rather different views of what counts as a modest change. So we spent a lot of the last several years arguing about the relative importance of compatibility, of fixing problems in the spec, and of making the spec more useful for users. And then, on the next issue, arguing about them again. And again. And again.”

“And again?”

“And again. You know, some people say you can be a success in committee work in several different ways: being smarter than everyone else, —”

“You mean, the James Clark approach?”

“Yeah, only that doesn’t always work for people who aren’t James Clark. Or by working harder than everyone else …”

“Paul Cotton always used to talk about how much leverage you have to influence a group if you are the one who always does the minutes. I always thought he was just trying to find a sucker.”

“Well, maybe. But I think he also meant it; it really can be an important role.”

“Then why are members of the W3C Team so strongly encouraged not to do it?”

“Long story; another time, perhaps. Or, third alternative, you can just have more endurance than everyone else.”

“The ‘Iron Butt Rule’?”

“Exactly. The XML Schema working group had several members who seemed determined to try their hand at that technique.”

“Well, there’s you, of course. That would be your only option, really, wouldn’t it? I mean, the other methods …. But you mean, others tried to play the Iron Butt card, too?”

“Hush. I was going to talk about what 1.1 has that 1.0 doesn’t have.”

“So who’s stopping you?”]

XSD 1.1 is mostly similar to 1.0, I was saying before being interrupted. But it does have a number of improvements that can make a difference.

  • XSD 1.1 supports XML 1.1 and XML 1.0 Fifth Edition. (That last does not distinguish it, in my view, from XSD 1.0. But some people believe that 1.0 requires old versions of its normative dependencies, because the working group did not instruct the editors to say explicitly that of course newer editions can be used. Some things should go without saying, you know?)

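    This is a significant improvement from the point of view of internationalization, since XML 1.1 and the Fifth Edition of XML 1.0 allow a far wider range of Unicode characters in names than earlier editions of XML 1.0 did.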

  • There’s a conditional inclusion mechanism (the vc:* attributes) that allows a schema document to provide multiple versions of a declaration and to select the right one at schema construction time, based on which version of XSD the processor supports, which spec- and implementation-defined datatypes are automatically available, and so on.

    This mechanism should make it much easier to produce new versions of XSD without being tied in knots over questions of what back-level processors will make of schema documents which use new constructs. (If XSD 1.0 had had such a mechanism, we could probably have done a better 1.1 in half the time. But we did not learn enough, when doing 1.0, from the example of, say, XSLT 1.0.)

  • Elements can now be declared with a form of conditional type assignment that makes the type assigned in an instance depend on the values of its attributes; this allows a variety of co-occurrence constraints on attributes and content to be expressed.
  • Assertions can be associated with complex and simple types. This also makes it easier (or in some cases possible for the first time) to express certain co-occurrence constraints on attributes and content.

    The assertions of XSD 1.1 are less powerful than the assertions of Schematron, in that they cannot refer to anything outside the element being validated. In some cases they will also be less convenient to write. (Ask about the HTML input rule, for example.) But they preserve the context-independence of type validity, and an aggressive optimizer should be able to check them in a streaming context, which is not true in general of Schematron assertions.

  • Attributes can be marked inherited; inherited values are written into the XDM data model instance before assertions and conditional type assignment evaluate any XPath expressions, which means that inherited attributes like xml:lang can be consulted in conditional type assignment and assertions.

    I’m proud of this not only because it helps handle internationalization better, but because it aligns the principle of context-free validation better with some of the core idioms of XML.

  • A precisionDecimal datatype has been added, which is intended to mirror the new IEEE 754-2008 specification of floating-point decimal.

    This one is controversial: some members of the XSL and XML Query working groups are vocal in saying that it’s a bad idea, that it will complicate their type hierarchy and type coercion rules yet again, and that we shouldn’t support it.

    [“Of course, some of the same members of QT also predicted that the IEEE spec would never be finished at all, and that the sky would fall, hell would freeze over, and Intel would fall into the Pacific Ocean before supporting it, didn’t they?” said Enrique. “But the spec was published, and Intel is supporting it. So …” “Hush,” I said. “They’ll hear you.” But it doesn’t matter: they don’t much care what Enrique thinks.]

  • The xsd:redefine construct has been deprecated.

    This is a disappointment to some people, who believe that it had great promise. And they are right: it did have great promise. But the 1.0 spec is vague (to put it charitably) on some points; interoperability problems in 1.0 implementations have been reported and the working group has been unable to agree on the correct interpretation of the 1.0 spec.

  • A simpler mechanism for reusing an existing schema document while changing it selectively is now provided under the name xsd:override. For the situations where redefine turns out to be under- (or over-) specified, override provides relatively clear, straightforward answers.
  • The rules for restriction have been made much simpler and more correct. It is no longer possible to use xsi:type with the name of a member type in order to evade facet restrictions on a union.
  • The determinism rule (the so-called “unique particle attribution” constraint) has been relaxed. It’s now legal for wildcards to compete with element declarations; elements win.
  • It’s easier to specify ‘open content’ and effectively insert wildcards everywhere, without cluttering up your content models.
  • Wildcards can now say, in effect, “any of these, except for those.” Some people call these “negative wildcards”.
  • All-groups can now contain wildcards, the elements and wildcards in all-groups can now have maxOccurs greater than one, and all-groups can be extended.
  • To align better with XPath 2.0 and related specs, the simple type hierarchy now includes xsd:anyAtomicType. Also, the two totally ordered subtypes of duration defined for XPath 2.0 and related specs (yearMonthDuration and dayTimeDuration) have, with the cooperation of the XML Query and XSL working groups, been integrated into the XML Schema namespace.
  • A new facet, explicitTimezone, has been added for requiring the timezone to be present (or absent) in datatypes derived by restriction from any of the date/time types; and, at the suggestion of the OWL working group, a dateTimeStamp datatype, which requires a timezone, has been added.
  • Lists and unions constructed from ID and IDREF retain the ID- and IDREFness of the ID and IDREF values. Also, you can have more than one ID on an element, which means it’s now a lot easier to support xml:id without having to whack the rest of your vocabulary.
  • Much of the spec has been rewritten, sentence by sentence and phrase by phrase. It was not possible to reorganize the exposition from the ground up (although I agree with those who believe the spec could use it), but while retaining the same organization we were able to make individual paragraphs and sentences easier to follow and understand. More liberal use of technical terms, variable notation, and section headings may seem like trivial changes, but empirically they appear to have a perceptible effect on the readability of the spec.

    Most users, of course, even power users, don’t read the spec. But implementors do, members of the working group do, and members of other working groups who need to layer their stuff on top of XSD do. And some users do. I wish we could do more to make the spec more welcoming and legible for them. But while there is a lot of room for further improvement, I think (if I say so myself) that 1.1 is somewhat easier to read than 1.0. It benefits, of course, from being the second go at formulating these things.
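To make the conditional inclusion mechanism a bit more concrete, here is a minimal Python sketch of the pruning step a processor performs on a schema document before schema construction. It is an illustration under stated assumptions, not any real processor's code: it handles only vc:minVersion and vc:maxVersion (not vc:typeAvailable and its relatives), it assumes version numbers compare as decimals with vc:maxVersion acting as an exclusive upper bound, and the function names `keep`, `prune`, and `preprocess` and the sample `price` declarations are hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import Optional

# Namespace of the vc:* attributes (the XSD 1.1 versioning namespace).
VC_NS = "http://www.w3.org/2007/XMLSchema-versioning"
MIN_VERSION = f"{{{VC_NS}}}minVersion"
MAX_VERSION = f"{{{VC_NS}}}maxVersion"


def keep(elem: ET.Element, version: Decimal) -> bool:
    """Return True if elem's vc:minVersion/vc:maxVersion admit this version."""
    lo = elem.get(MIN_VERSION)
    hi = elem.get(MAX_VERSION)
    if lo is not None and version < Decimal(lo):
        return False
    # Assumption: vc:maxVersion is an exclusive upper bound.
    if hi is not None and version >= Decimal(hi):
        return False
    return True


def prune(elem: ET.Element, version: Decimal) -> None:
    """Recursively delete excluded children (with their subtrees) in place."""
    for child in list(elem):  # iterate over a copy, since we mutate elem
        if keep(child, version):
            prune(child, version)
        else:
            elem.remove(child)


def preprocess(root: ET.Element, version: Decimal) -> Optional[ET.Element]:
    """Return the pruned schema document, or None if the root is excluded."""
    if not keep(root, version):
        return None
    prune(root, version)
    return root


# Hypothetical schema document offering one `price` declaration per version:
# a plain one for 1.0 processors, one with a 1.1 assertion for 1.1 processors.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <xs:element name="price" type="xs:decimal" vc:maxVersion="1.1"/>
  <xs:element name="price" vc:minVersion="1.1">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:assertion test="$value ge 0"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""

if __name__ == "__main__":
    for v in ("1.0", "1.1"):
        root = preprocess(ET.fromstring(SCHEMA), Decimal(v))
        decl = root[0]
        print(v, decl.get("type") or "anonymous type with assertion")
```

A 1.0 processor ends up seeing only the plain declaration and a 1.1 processor only the one with the assertion, so neither ever meets a duplicate `price` declaration or a construct it does not understand.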
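The selection rule behind conditional type assignment can likewise be sketched in a few lines of Python. This is a minimal model, assuming the rule as summarized here: alternatives are tried in document order, the first whose test is true supplies the type, an alternative with no test serves as the default, and if nothing matches the element's declared type applies. Python predicates stand in for the XPath tests a real processor would evaluate, and the names `Alternative`, `select_type`, `vehicleType`, `carType`, and `bikeType` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence

Attributes = Mapping[str, str]


@dataclass(frozen=True)
class Alternative:
    """One xs:alternative: a type name plus an optional test on attributes."""
    type_name: str
    # A Python predicate standing in for the XPath test; None means no test.
    test: Optional[Callable[[Attributes], bool]] = None


def select_type(declared: str, alternatives: Sequence[Alternative],
                attrs: Attributes) -> str:
    """Return the type of the first alternative whose test holds.

    An alternative without a test always matches (it is meant to come last,
    as the default); if nothing matches, the declared type is used.
    """
    for alt in alternatives:
        if alt.test is None or alt.test(attrs):
            return alt.type_name
    return declared


# Hypothetical declaration this mirrors:
#   <xs:element name="vehicle" type="vehicleType">
#     <xs:alternative test="@kind = 'car'" type="carType"/>
#     <xs:alternative test="@kind = 'bike'" type="bikeType"/>
#   </xs:element>
VEHICLE = [
    Alternative("carType", lambda a: a.get("kind") == "car"),
    Alternative("bikeType", lambda a: a.get("kind") == "bike"),
]

if __name__ == "__main__":
    for kind in ("car", "bike", "boat"):
        print(kind, select_type("vehicleType", VEHICLE, {"kind": kind}))
```

Because the tests look only at the element's own attributes, the choice of type stays independent of the element's context, which is the same property the assertions are designed to preserve.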

It has been a long, hard slog — I lied to Enrique, we actually did start on it in 2001, though we also were doing a lot of other things at the same time — and I think we would not have made it without the perseverance of the chair, David Ezell of Verifone, representing the National Association of Convenience Stores (to both of whom thanks for seconding David to the group and supporting the time he spends on XSD), and the hard work of Sandy Gao of IBM on the Structures spec and Dave Peterson of SGMLWorks! (who serves as an invited expert) on the Datatypes spec. XSD 1.1 is not a perfect spec, by any means. But it’s an improvement on 1.0, and it’s worth pushing forward for that reason. And without David, and Sandy, and Dave, it would not be happening. Anyone interested in the validation of XML owes these three a debt of gratitude.

The long hard slog is by no means over. Publication as a Candidate Recommendation means the W3C has now called for implementations. If you are a programmer looking for a challenge, I challenge you to implement XSD 1.1! If you are a user, not a provider, of XSD software, urge the supplier of your software to implement XSD 1.1, and test their implementation! The more you push on the implementations now, the stronger they will be when the time comes to demonstrate implementation experience and progress the spec to Proposed Recommendation. And the more experience we will have gained towards the goal of having a broadly supported validation language which supports the full spectrum of XML usage.

[“Wow!” said Enrique. “Did you know that perseverance is a theological term? ‘Continuance in a state of grace leading to a state of glory’!” “In other words,” I said, “you looked it up because you didn’t think I knew how to spell it correctly, did you?” “Oh, hush,” he said.]