Daylight analysis

[14 July 2009, Happy Bastille Day]

In interoperability testing, test cases are particularly interesting when they elicit different behaviors from different processors.

In revision of a grammar, strings are of particular interest when they are grammatical according to one version of the grammar, but not according to the other. Either the change was intended (and the string provides an example) or it was not intended (and the string exhibits a problem). Differences in the structure of the parse trees produced by the different grammars may be interesting, too.

Several working groups I’ve served on have spent time worrying about whether our spec’s rules for handling (especially escaping and unescaping) URIs and IRIs should align with the rules specified by a variety of other specs (HTML 4.01, XSLT 1.0, any of the various RFC which have at various times been the authoritative source of information, any of the various internet drafts which have later turned into, or failed to turn into, RFCs, etc. etc. ad luxuriam). At any given time, it would have been really really useful to have an answer to the question “Do these two different formulations of the rules ever actually produce different results? Or are they just different ways of saying the same thing? And if they do ever produce different results, are the cases involved already so pathological on other grounds that we don’t actually mind?”

These cases have in common that they exhibit discrepancies among things that (other things being equal) are (or might be) expected to be indistinguishable. That is, they document the daylight (which my Oxford dictionary glosses as “visible distance between one … thing and another”) between things that shouldn’t have daylight between them.

I’m coming to believe that during the development of a spec, daylight analysis — seeking and finding instances of daylight between things, or seeking and failing to find any daylight — may be the most important function of test cases. If not the most important, then surely a very important use.

If this conjecture is true, then it ought to have implications for judging the relative effectiveness of different methods for constructing collections of test cases. Traditional testing can be measured by how many bugs it finds, at what cost, and techniques for generating test cases are valued high or low depending on how likely they seem to be to find new bugs. For spec development, the utility of a test is tied to the likelihood of its finding daylight between the old and the new formulations of some rule.

Hmm. Is that a difference or not? I suppose a bug can be regarded as an instance of daylight between the implementation and the spec it’s supposed to be implementing. So perhaps bug-finding is just a special case of daylight analysis.

Of course, each revision of a rule or a grammar provides a new opportunity for daylight to arise and be found. It seems to follow that you may need new test cases for each revision. Automated methods that allow quick generation of relevant new test cases would be particularly useful.

Daylight analysis is not the only useful goal of test generation during spec development. Tests that illustrate the intended behavior of a rule are useful for clarification. Call these exemplary tests.

And for checking that a working group understands a problem area well, and that the rule for that problem is well formulated, it can be useful to construct tests with randomly selected properties, just to check to see that everyone agrees on how the rule should handle them. Call these sanity checks. Random selection of properties in sanity checks helps ensure that you don’t inadvertently feed your rule only ‘sensible’ examples which happen not to exercise its flaws. I once used an Alloy model to make twelve very simple test cases for the schema composition part of XSD; nine of them turned out to pose questions to which the spec’s answer is not obvious. (The twelve were a bit redundant: eliminating the redundancies, the twelve auto-generated examples boiled down to four non-redundant examples, for one of which the spec provides a clear analysis.) If I had limited myself to sensible examples I would almost certainly have missed most or all of the problematic cases.

Daylight analysis as a goal works well with random generation of test cases, because it helps deal with one of the great problems of random generation: most randomly generated test cases (at least, the ones I have been able to generate using random means) are rather boring. The goal of finding daylight provides a simple filter: a randomly generated test case is interesting if it finds daylight in two things you are comparing; it is uninteresting otherwise. In realistic cases, uninteresting test cases will overwhelmingly outnumber interesting ones, but if you can apply an automated filter (parse with grammar G1, parse with grammar G2, compare the results; if they differ, you have found daylight), then you can keep a few uninteresting test cases (just in case) but throw most of them away and focus on interesting ones.

Formal methods and WGs (response to Jacek Kopecky)

[30 June 2009]

Jacek Kopecky has commented on my earlier posting about Brittleness, regression testing, and formal methods. I started to reply in a further comment, but my response has gotten a bit long, so I’m making a separate post out of it.

Examples needed

Jacek writes:

Dear Michael, you are raising a good point, but you’re doing it the same way as the “formal methods proponents”: you don’t show concrete examples.

Dear Jacek, you are quite correct: I don’t have persuasive examples. I have a gut feeling that formal methods would be helpful in spec development, but I can’t point to convincing examples of places it has helped. Having really convincing examples seems to require first persuading working groups to use formal methods, which is (as you suggest) not easy in the absence of good examples. It’s also a hard sell if the spec in question is at all far along in its development, because it appears that the first thing the group has to do is stop everything else to build a formal model of the current draft. No working group is going to want to do that.

I did embark once on an effort to model the XProc spec in Alloy (also in HTML), in an attempt to capture a spec which was still being worked on, and perhaps persuade the WG to apply formal methods in its work going forward. The example seems to illustrate both the up and the down of using formal methods on a spec. On the up side: the work I did raised a number of questions about the properties of pipeline ‘components’ as they were defined in the then-current draft and may have helped persuade some in the WG to support the proposal to eliminate the separate ‘component’ level. But on the down side: the draft was moving faster than I could move: before I was able to finish the model, submit the paper formally to the WG, and propose that the WG use the model as part of our on-going design process, the draft had been revised in such a way as to make much of the model inaccurate. No one in a WG really wants to consider proposals based on an outdated draft, so no one really wanted to look at or comment on the fragmentary model I had produced. I could not keep up, and eventually abandoned the attempt. Perhaps someone with better modeling skills than mine would have been able to find a way to move faster, or would have found a way to make a lighter-weight model which would be easier to update and keep in synch with the spec. But one possible lesson is: if the group doesn’t use formalisms from the very beginning, it may be very hard to adopt them in mid-process.

Starting small

Other lessons are possible. Kaufmann, Manolios, and Moore write, in Computer-aided reasoning: ACL2 case studies ([Austin: n.p.], 2002), p. 13:

The formalizers will often be struggling with several issues at once: understanding the informal descriptions of the product, discovering and formalizing the relevant knowledge that the implementors take for granted, and formalizing the design and its specification. To a large extent this phenomenon is caused by the fact that formal methods [are] only now being injected into industry. Once a significant portion of a group’s past projects has been formalized, along with the then-common knowledge, it will be much easier to keep up. But at this moment in history, keeping up during the earliest phases of a project can be quite stressful.

When working on a new project, we recommend that the formalizers start by formalizing the simplest imaginable model, e.g., the instruction set with one data operation and a branch instruction, or the protocol with a simple handshake. Choosing this initial model requires some experience…. Since iteration is often required to get things to fit together properly, it is crucial that this initial foray into the unknown be done with a small enough model to permit complete understanding and rapid, radical revision.

In my effort to model XProc, I failed to find a suitably small initial model. (As they say, it takes some experience, which is what I was trying to gain.) But then, how persuasive would a really small toy model be to a working group skeptical of formal methods in the first place? I don’t yet know how to square this circle.

I was more successful in finding useful small partial models when I used Alloy in 2007 to model schema composition. In that case I was able to generate a number of extremely useful test cases, but the exercise, sadly, showed merely that the XML Schema WG does not have any consensus on the meaning of the XSD spec’s description of schema composition. Being able to generate those examples during the original design phase might conceivably have led to a better design. At least, I like to think so. A skeptic might say it would only have led the working group to introduce more special cases and ad hoc rules, in order to avoid having to explain how certain cases are supposed to work. (This is, after all, the XML Schema WG we’re talking about here.) Nothing is perfect.

The tedium of formalization

JK continues:

I, for one, avoid formal methods other than programmatic reference implementation for a few reasons: the tediousness of translating the spec into the formalism that would support all those tests; creating all those tests; and finally the difficulty of being relatively sure that what is written in the formalism actually corresponds to the intent of what is written in the text.

I could change my mind if it was shown that enough formalization for XML Schema that would support your regression tests is not all that tedious.

Here JK makes several excellent points, to which I do not currently have good answers.

On tedium: It’s possible to model a spec selectively (as in my model of schema composition) with relatively little tedium. Indeed, the point of light-weight formal methods (as promoted by Daniel Jackson and embodied in Alloy) is to allow the user to construct relatively small and simple models which cover just the aspects of the design one is currently worried about. There is no fixed foundational set of primitives in which models must be described; the model can specify its own primitives.

For example, I was able to model some simple aspects of XSD 1.0 schema composition without having to model all of schema validation, and without modeling all the details of XML well-formedness. If the paper on that model does not seem especially light-weight, I think it’s due to the need to specify how to inspect the behavior of implementations unambiguously.

I have to confess, thought, that it’s not always obvious to me how best to construct such partial models. For a long time I avoided trying to model any part of XSD because I thought I was likely to have to model the Infoset and XML in full, first. I’d like to do that, someday, just for the satisfaction, but I don’t expect it to be quick. It was only after a longish while that I saw a way to make a simpler partial model. The more accurately I can identify some part of the design that I’m worrying about, the easier it is to find a partial model for that part of the design.

But in the earlier post, I was thinking about a kind of regression testing which would be intended to identify problems and interactions I was not consciously worried about. For that to work, I expect I would need a model that covered pretty much all of the salient properties of the entire spec. And at the moment, I cannot say I expect the construction of such a model to be entirely free of tedium. If I think that it might nevertheless be a good idea, it’s because checking natural-language prose for consistency is also tedious (which is why working groups sometimes do not do it well, or at all).

Consistency of prose and formalism

In other words, if you have one model (the spec in English), it will have dark corners that need discovering and clearing up. That will have a cost. If you have two models (the spec in English and the formalism), in my experience both may have dark corners (but one model’s dark corner may be clarified by the other model where it’s not a dark corner, so that may be a net plus), but there’s also the consistency of the two models that comes in question. So, gimme examples, show me that the consistency is not a problem, please, and I’ll be very grateful indeed.

I read somewhere once (in a source I have not managed to find again) that some international standards bodies require, or recommend, that their working groups produce both the English and the French version of their specifications, rather than working monolingually and then having a normative translation made. Why? Because working monolingually left too many dark corners in the text; working on both versions simultaneously led to clearer texts in both languages. The initial process was slower, but the additional effort was paid back by better results with lower maintenance costs.

I speculate that working both in English and in a suitable formalism like Alloy or ACL2 would have a similar effect: initial progress would feel slower, but the results would be better and maintenance would be easier. (This is a lot like having test cases in software development: it seems to slow you down at first, but it speeds things up later.)

Consistency between the two formulations should not, in principle, be any greater problem than consistency between the natural-language formulation of different parts of a complex spec. In fact, it should (note that modal verb!) be less of a problem: natural language provides only so much help with consistency checking, whereas formalisms tend to be a bit better on that score. The only really complex spec I have ever seen with no inconsistencies I could detect was ISO 8879, whose editor was a lawyer skilled in the drafting of contracts.

There are, of course, things a working group can do to make it easier, or harder, to maintain consistency between the prose and the formalism. Putting the formal treatment in a separate document (like the XPath formal semantics) or a separate part of the same document (like the schema for schema documents in XSD) is a good way to make it easier for inconsistencies to remain undetected. Integrating fragments of the formalism with the prose describing those constructs (as is done with the BNF in the XML spec, or as is done in any good literate program) makes it easier to detect and remove inconsistencies. It also produced pressure on the working group to use a legible formalism; the XML form of XSD might be more readable if the working group had forced itself to read it more often instead of hiding it in the appendix to the spec.

Integrating the formalism into the text flow of the spec itself can work only if the members of the working group are willing to learn and use the formalism. That’s why I attach such importance to finding and using simple notations and light-weight methods.

But this long post is just more speculation on my part. You are right that examples are needed. Someday, I hope to oblige.

Brittleness, regression testing, and formal methods

[17 June 2009]

For some years now I have thought that the development of specs at places like W3C would benefit from the application of formal methods. Examples from the past few years illustrate (a) that I’m not alone in this thought, and (b) that it’s difficult to persuade a WG as a whole to adopt formal methods. Some WGs have published documents with formalizations of their spec in Z or in Gentzen calculus or in some other notation, but as far as I can tell most members of those WGs did not read those documents, let alone understand them, participate actively in their drafting and revision, or use the formalism as a way of working on or thinking about the design of the main spec. (And if I have understood what most proponents of formal methods have in mind, it’s really the last idea that some people regard as the main goal.) In most WGs I’ve been in, even the maintenance of a BNF in the spec, or the formal definition of an XML vocabulary, ends up being handled by a small minority of the group.

Part of the problem is the cost/benefit analysis. Most people in WGs don’t have any real grasp of or skill with any tool in the formal-methods toolbox.

[Enrique nudged my elbow here and said “Speak for yourself!” “Oh, I do,” I said. I use Alloy and enjoy it, but my command of Alloy is very weak compared to my command of most of the programming languages I use. And ACL2? Otter? HOL? “Ha,” muttered Enrique. “In your dreams.”]

And many formal tools look rather forbidding at first glance. And second glance.

[“How about, tenth glance?” said Enrique. “You have, what, ten books on Z on your shelf —” [yep, I just counted] “— quick, what does an arrow with a double head, no tail, and a cross mean?” “Er,” I said. “Come on, smart guy! Partial function? Total function? Total injection? Partial injection? Fuel injection? Which?” “Oh, hush.” (For the record, double head, no tail, and cross mean it’s a partial surjection. “Surjection?” asked Enrique. “What’s that?” “I can’t tell you; it’s a secret.” “A secret?” “I mean, look it up yourself.”)]

So a WG perceives a real cost, including a possibly steep learning curve and the real chance of appearing ignorant and foolish in front of colleagues. Alloy does its best to address this problem with (a) a non-forbidding syntax and (b) some chances of very fast payoff, so the learning curve is really not so steep. But it’s still a learning curve.

And the benefit? What benefits will accrue if my WG uses formal methods?

This is one place where I think formal methods proponents could do better. They always show you nifty uses of the tool to prove correctness of a design; it might be more persuasive if they showed you (a) a design plausible enough to make you say “yeah, that looks all right” and then (b) a failure to prove it correct, because of a flaw in the design, followed by (c) a fix and (d) a proof of correctness for the repaired design.

Here, too, Alloy does a good job: Daniel Jackson’s book includes the claim “Transitive closure is not axiomatizable in first-order logic”, with an exercise that involves first trying to axiomatize transitive closure in Alloy [which is first-order] and then using Alloy to find the flaw. “Now execute the command, examine the counterexample, and explain what the bug is. The official definition of UML 1.0 had this problem.” I’d like more examples like that!

This morning I found myself once more wishing, for a very specific reason, that I had a good formal model of XSD 1.1. It might or might not persuade an entire WG, but it certainly illustrates the kind of situation where I think WGs might find formal methods to their advantage.

What happened is that Tony Coates raised an issue against XSD 1.1 relating to all-groups and named model groups. The details are not important for this discussion, but the upshot is simple: (1) I think Tony has identified a real and unintended flaw in XSD, (2) I think there is a relatively straightforward fix for the flaw, and (3) I am deeply frightened by the prospect of changing the spec to include that fix now, at this stage of its development. Why? Because I can see too clearly the possibility that in making what looks like a straightforward change we might break something else which depends on what we are changing but which we overlook.

[“You wouldn’t be speaking from experience here, would you?” sneered Enrique, digging his elbow hard into my ribs. Let’s just say that here are several corrections to XSD 1.0 which themselves introduced errors. And yes, I’m embarrassed by them. But it’s not just XSD: the phenomenon is also familiar from programming. “Yeah, yeah,” said Enrique. “Spread the blame. It’s not so bad, because everybody else does it, too? Haven’t you ever heard of the Categorical Imperative?” “I told you to hush! If you can’t be quiet, you’ll have to wait outside.”]

When you get to the point where you don’t want to fix a problem because you’re afraid you’ll just break something else, then you are well on the way to paralysis.

For software, one answer to incipient paralysis is to have a good suite of regression tests. Make the change, run the regression tests, and if all the tests pass, you can feel more confident that you didn’t break things.

The reason I am wishing, today, that I had a good formal model of XSD is that formal models can serve as a kind of regression test suite for prose specs. You have a model, you’ve proven that the spec has this and that property, and that operation X on example Y produces result Z. Now you’re considering a change to the design. Make the corresponding change to the model, and check: does it still have this and that property? Does X(Y) still evaluate to Z? Mechanized systems can make it much easier to check that specified properties of a design are stable in the face of a given change. In Alloy, I find it helpful to re-check all assertions after a significant change to the model (or at least the assertions I think might be affected); Alloy makes that easy. In ACL2, similarly, you can have a set of theorems with proof scripts (including a theorem that proves that the value of X(Y) is Z) and re-run them automatically after changes.

I begin to suspect that a key advantage of formal methods is not just being able to prove that a design has certain properties, but being able to re-prove that the design has those properties, again and again, each time you revise it. So you can make sure that changing this little bit over here didn’t suddenly cause a catastrophic failure in that bit over there.

As Matt Kaufmann, Panagiotis Manolios, and J. Strother Moore put it in the introduction to Computer-aided reasoning: ACL2 case studies ([n.p.: n.p.], 2002), describing a project at Motorola:

Because the design was under constant evolution during the period, the formal models were also under constant evolution and “the” equivalence theorem [that is, a key result in establishing that the microcode worked correctly] was proved many times. This highlights the advantage of developing general proof strategies…. It also highlights the utility of having a good inference engine: minor changes in the theorem being proved do not necessarily disrupt the proof replay.

If formal methods can help us design robust specs, they will be a huge help. But they can be helpful even in working with brittle specs, the kind where changing one thing here can easily break something over there. When we work with such a spec, formal methods can help ensure that we know at once if a change breaks something, so we can avoid making that change in the normative version of the spec. (I’m pretty sure that this was one advantage of the work the CICS group at IBM’s Hursley Laboratory did when they specified parts of CICS in Z. It will have made it a bit easier to contemplate the possibility of changes to the interface if you could prove that the changes would retain the properties you cared most about keeping.)

Of course, if you use formal methods a lot, you can hope that you will learn how to make your designs more robust and cleaner. That’s part of the selling proposition for Alloy, at least. But even for those of us whose designs are sometimes, ah, not as robust as we would like, a formalization of an existing design can be a help in maintenance.

Fear, uncertainty, and XML 1.0 Fifth Edition

[11 June 2009]

From time to time people tell me that the transition from XML 1.0 Fourth Edition to XML 1.0 Fifth Edition is hard. Just as from time to time people have said the transition from XML 1.0 to XML 1.1 would be hard, and might break systems that consume XML data. I just spoke with a friend who told me their company was having internal discussions about what to do about XML 1.0 Fifth Edition, because some of their customers had expressed “concern” (possibly “deep concern”).

I’ve never understood what is hard about either transition; perhaps if I ask here someone can explain it to me.

There are two classes of software to consider: (a) software which checks that a string is a legal XML name, and (b) software which just consumes valid or well-formed XML, without doing its own checking.

Software that actively checks XML names

Obviously, if you are going to upgrade your XML processors from 1.0 Fourth Edition to 1.0 Fifth Edition (or to XML 1.1), you are going to need to change them. No one has ever argued seriously that that’s hard (not even Noah Mendelsohn). Anyone who has written a parser for names can tell you that the new definition of Name is simpler; the only serious likelihood is that a programmer comparing the complexity of the old definition with the relative simplicity of the new may be mildly depressed that the complexity was ever needed (long story, let’s not go there), and will lose a moment or two sighing deeply.

Software that isn’t an XML parser but which has decided for reasons of its own to use XML’s definition of Name may or may not also need to change. Since it’s not an XML parser, it has no obligation to follow the XML spec in the first place. But if you want to change it to keep it in synch with XML, the change is simple, just as it is for an XML parser.

Software that doesn’t check XML names but assumes they are OK

Noah Mendelsohn (my esteemed colleague in the W3C XML Schema working group) was eloquent, in presentations I heard him give, about the danger that an XML 1.1 processor would let data through that an XML 1.0 processor would not have let through, and that that new data might break other software which had been relying on the XML processor upstream for sanity-checking its input data. Such reliance is not at all a bad thing; one point of using XML is precisely that valid or well-formed XML is much more predictable than arbitrary octet sequences.

Software of this kind, which doesn’t itself check its input data, can in principle break when presented with data it’s not prepared for. So the prospect that XML 1.1 (or XML 1.0 Fifth Edition) might break such software naturally scares people a lot. Noah (and possibly others) successfully scared enough people that many people shied away from XML 1.1. Now purveyors of fear, uncertainty, and doubt are trying to scare people away from XML 1.0 Fifth Edition.

But what they are spreading was FUD when they were talking about XML 1.1, and it’s FUD now. It’s not logically impossible for software to exist which works fine when presented with XML 1.0 Fourth Edition input, and which will break when presented with Fifth Edition input. But such software would be unusual in the extreme, even eccentric. No one has ever actually identified such software to me. I’ve been asking, every time this comes up, for the last five or six years.

It’s not at all clear that such software could be constructed by any programmer of ordinary competence. To try to prevent the use of minority scripts in XML names for the sake of avoiding the hypothetical risk of breaking hypothetical software which (if it existed) would be a textbook case of poor design and poor implementation, is just insane.

Let us imagine the existence of such a piece of software; let’s call this software N.

We know very little about N, only that N has no problem with any XML 1.0 name, but will break when confronted with at least some XML 1.1 names that are not also XML 1.0 names. So, let’s see: that means that N is perfectly happy to consume a name containing Tibetan characters, but N might break in an ugly way when confronted with Hittite. Or perhaps N is perfectly happy with the Tibetan characters U+0F47 and U+0F49, which are legal in XML 1.0 4E names, but N will break if confronted with the character U+0F48, which lies between them.

How can this be? By hypothesis, N is not running its own name checker that implements the 1.0 4E rules (if it is, then N belongs in class (a) above, and when confronted with U+0F48 N does not break but issues an error message). What can it possibly be doing with data that comes in marked as a Name, that causes it to handle U+0F47 and U+0F49, but not U+0F48?

As far as I can tell, by far the most common thing to do when ingesting something marked as an XML name is to copy it into a variable typed to accept Unicode strings. Use of this variable may well exploit the fact that it won’t contain blanks. But I haven’t seen much code that is written to exploit the fact that a Unicode string does or does not contain any occurrences of U+0F48, or of characters in various minority writing systems. Maybe I’m just young and ignorant; it’s only thirty years since I started programming, and I’ve mostly worked in fairly restricted areas (text processing, markup, character set problems, that kind of thing), so there’s a lot I don’t know.

So, please, if anyone can enlighten me, please do. What rational programmer of even modest competence — or for that matter, what programmer completely lacking in competence, will write code that (a) is not an implementation of the name rules of XML 1.0 4E, that (b) accepts all names defined according to the rules of XML 1.0 4E, and that (c) will die when confronted with some name which is legal by the rules of 1.0 5E?

In earlier discussions, Michael Kay tried to suggest why a program might fail on 1.0 5E names, but all the plausible examples of such a program involve the program assuming that the characters are all ASCII, or all ISO 8859-1, or all in some other historical character set. Such programs will certainly fail when confronted with 1.0 5E names. But they will also fail when confronted with XML 1.0 4E names, so they don’t satisfy condition (b) in the list.

In order to have properties (a), (b), and (c), software would have to be seriously pathological in design and coding. And I don’t mean that in a good sense.

I conclude: insofar as the resistance to XML 1.1, and to XML 1.0 Fifth Edition is based on fear that the shift will break deployed software, it’s irrational and based on a complete misunderstanding of the detailed technical issues involved. Those who are spreading this FUD are doing neither themselves, nor their companies, nor the community, a service.

XSD 1.1 is a Candidate Recommendation

[4 May 2009; some typos corrected and phrases tweaked 5, 6, and 7 May]

The World Wide Web Consortium has published XSD 1.1 Part 1: Structures and Part 2: Datatypes as Candidate Recommendations, and issued a call for implementation.

As the version number is intended to suggest, XSD 1.1 is mostly very similar to XSD 1.0 and restricts itself to relatively modest changes to the spec.

[At this point, Enrique snorted loudly enough to break my concentration. “If it’s just modest changes, why did it take so long? Let’s see, when did you start? XSD 1.0 was 2001, so …”

“Well, we didn’t start on 1.1 right away,” I hurriedly interjected. “But, well, I guess you’re right. It did take a lot longer than you would have expected.”

“Why? What could possibly take that long?”

“Well, different members of the working group turned out to entertain rather different views of what counts as a modest change. So we spent a lot of the last several years arguing about the relative importance of compatibility, of fixing problems in the spec, and of making the spec more useful for users. And then, on the next issue, arguing about them again. And again. And again.”

“And again?”

“And again. You know, some people say you can be a success in committee work in several different ways: being smarter than everyone else, —”

“You mean, the James Clark approach?”

“Yeah — only that doesn’t always work for people who aren’t James Clark. Or by working harder than everyone else,”

“Paul Cotton always used to talk about how much leverage you have to influence a group if you are the one who always does the minutes. I always thought he was just trying to find a sucker.”

“Well, maybe. But I think he also meant it; it really can be an important role.”

“Then why are members of the W3C Team so strongly encouraged not to do it?”

“Long story; another time, perhaps. Or, third alternative, you can just have more endurance than everyone else.”

“The ‘Iron Butt Rule’?”

“Exactly. The XML Schema working group had several members who seemed determined to try their hand at that technique.”

“Well, there’s you, of course. That would be your only option, really, wouldn’t it? I mean, the other methods …. But you mean, others tried to play the Iron Butt card, too?”

“Hush. I was going to talk about what 1.1 has that 1.0 doesn’t have.”

“So who’s stopping you?”]

XSD 1.1 is mostly similar to 1.0, I was saying before being interrupted. But it does have a number of improvements that can make a difference.

  • XSD 1.1 supports XML 1.1 and XML 1.0 Fifth Edition. (That last does not distinguish it, in my view, from XSD 1.0. But some people believe that 1.0 requires old versions of its normative dependencies, because the working group did not instruct the editors to say explicitly that of course newer editions can be used. Some things should go without saying, you know?)

    This constitutes a significant improvement from the point of view of internationalization.

  • There’s a conditional inclusion mechanism (the vc:* attributes) for allowing a schema document to provide multiple versions of a declaration and select the right one at schema construction time based on which version of XSD the processor supports, what spec- and implementation-defined datatypes are automatically available, and so on.

    This mechanism should make it much easier to produce new versions of XSD without being tied in knots over questions of what back-level processors will make of schema documents which use new constructs. (If XSD 1.0 had had such a mechanism, we could probably have done a better 1.1 in half the time. But we did not learn enough, when doing 1.0, from the example of, say, XSLT 1.0.)

  • Elements can now be declared with a form of conditional type assignment that makes the type assigned in an instance depend on the values of its attributes; this allows a variety of co-occurrence constraints on attributes and content to be expressed.
  • Assertions can be associated with complex and simple types. This also makes it easier (or in some cases possible for the first time) to express certain co-occurrence constraints on attributes and content.

    The assertions of XSD 1.1 are less powerful than the assertions of Schematron, in that they cannot refer to anything outside the element being validated. They will in some cases be less convenient to express. (Ask about the HTML input rule, for example.) But they preserve the context-independence of type validity and an aggressive optimizer should be able to check them in a streaming context, which is not true in general of Schematron assertions.

  • Attributes can be marked inherited; inherited values are written into the XDM data model instance before assertions and conditional type assignment evaluate any XPath expressions, which means that inherited attributes like xml:lang can be consulted in conditional type assignment and assertions.

    I’m proud of this not only because it helps handle internationalization better, but because it aligns the principle of context-free validation better with some of the core idioms of XML.

  • A precisionDecimal datatype has been added, which is intended to mirror the new IEEE 754-2008 specification of floating-point decimal.

    This one is controversial: some members of the XSL and XML Query working groups are vocal in saying it’s a bad idea, it will complicate their type hierarchy and type coercion rules yet again, and we shouldn’t support it.

    [“Of course, some of the same members of QT also predicted that the IEEE spec would never be finished at all, and that the sky would fall, hell would freeze over, and Intel would fall into the Pacific Ocean before supporting it, didn’t they?” said Enrique. “But the spec was published, and Intel is supporting it. So …” “Hush,” I said. “They’ll hear you.” But it doesn’t matter: they don’t much care what Enrique thinks.]

  • The xsd:redefine construct has been deprecated.

    This is a disappointment to some people, who believe that it had great promise. And they are right: it did have great promise. But the 1.0 spec is vague (to put it charitably) on some points; interoperability problems in 1.0 implementations have been reported and the working group has been unable to agree on the correct interpretation of the 1.0 spec.

  • A simpler mechanism for reusing an existing schema document while changing it selectively is now provided under the name xsd:override. For the situations where redefine turns out to be under- (or over-) specified, override provides relatively clear, straightforward answers.
  • The rules for restriction have been made much simpler and more correct. It is no longer possible to use xsi:type with the name of a member type in order to evade facet restrictions on a union.
  • The determinism rule (the so-called “unique particle attribution” constraint) has been relaxed. It’s now legal for wildcards to compete with element declarations; elements win.
  • It’s easier to specify ‘open content’ and effectively insert wildcards everywhere, without cluttering up your content models.
  • Wildcards can now say, in effect, “any of these, except for those.” Some people call these “negative wildcards”.
  • All-groups can now contain wildcards, the elements and wildcards in all-groups can now have maxOccurs greater than one, and all-groups can be extended.
  • To align better with XPath 2.0 and related specs, the simple type hierarchy now includes an xsd:anyAtomicType. Also, the two totally ordered subtypes of duration defined for XPath 2.0 and related specs have (with the cooperation of the XML Query and XSL working groups) been integrated into the XML Schema namespace.
  • A new facet has been added for requiring the timezone to be present (or absent) in datatypes derived by restriction from any of the date/time types; a dateTimeStamp datatype which requires a timezone has been added, at the suggestion of the OWL working group.
  • Lists and unions contructed from ID and IDREF retain the ID- and IDREFness of the ID and IDREF values. Also, you can have more than one ID on an element, which means it’s now a lot easier to support xml:id without having to whack the rest of your vocabulary.
  • Much of the spec has been rewritten, sentence by sentence and phrase by phrase. It was not possible to reorganize the exposition from the ground up (although I agree with those who believe the spec could use it), but while retaining the same organization we were able to make individual paragraphs and sentences easier to follow and understand. More liberal use of technical terms, variable notation, and section headings may seem like trivial changes, but empirically they appear to have a perceptible effect on the readability of the spec.

    Most users, of course, don’t read the spec, even power users. But implementors do, members of the working group do, members of other working groups who need to layer their stuff on top of XSD do. And some users do. I wish we could do more to make the spec more welcoming and legible for them. But while there is a lot of room for further improvement, I think (if I say so myself) that 1.1 is somewhat easier to read than 1.0. It benefits, of course, from being the second go at formulating these things.

It has been a long, hard slog — I lied to Enrique, we actually did start on it in 2001, though we also were doing a lot of other things at the same time — and I think we would not have made it without the perseverance of the chair, David Ezell of Verifone, representing the National Association of Convenience Stores (to both of whom thanks for seconding David to the group and supporting the time he spends on XSD), and the hard work of Sandy Gao of IBM on the Structures spec and Dave Peterson of SGMLWorks! (who serves as an invited expert) on the Datatypes spec. XSD 1.1 is not a perfect spec, by any means. But it’s an improvement on 1.0, and it’s worth pushing forward for that reason. And without David, and Sandy, and Dave, it would not be happening. Anyone interested in the validation of XML owes these three a debt of gratitude.

The long hard slog is by no means over. Publication as a Candidate Recommendation means the W3C has now called for implementations. If you are a programmer looking for a challenge, I challenge you to implement XSD 1.1! If you are a user, not a provider, of XSD software, urge the supplier of your software to implement XSD 1.1, and test their implementation! The more you push on the implementations now, the stronger they will be when the time comes to demonstrate implementation experience and progress the spec to Proposed Recommendation. And the more experience we will have gained towards the goal of having a broadly supported validation language which supports the full spectrum of XML usage.

[“Wow!” said Enrique. “Did you know that perseverance is a theological term? “‘continuance in a state of grace leading to a state of glory’!” “In other words,” I said, “you looked it up because you didn’t think I knew how to spell it correctly, did you?” “Oh, hush,” he said.]