Tony Coates has written a URI resolver for .NET

Tony Coates reports that he has written a URI resolver for .NET that supports XML catalogs.

XML catalogs allow local caching of long-lived material like schemas, DTDs, and the like. They thus make it feasible to work with XML software that uses URIs to locate such material even when you are not currently network accessible. They also make it easier to work around discrepancies in software. Some programs I use support normal URIs as system identifiers; others (mostly older stuff, admittedly, but programs I still want to use) really only directly support access to the file system. But when programs support catalogs, then I can generally make them both work fine and avoid having to munge the system identifiers in a document each time I open it in a different program or my network status changes. Richard Tobin’s rxp became much more useful (and went from being something I used every few months to something I use essentially all the time) when he added catalog support; other software that I would otherwise use routinely doesn’t support catalogs, because the developer doesn’t see the point (I won’t name names, but you know who you are), with the result that I don’t use it at all when I can avoid it.
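Such a catalog is just a small XML file mapping public or system identifiers to local copies. A minimal sketch in the OASIS XML Catalogs format (the system identifier and local paths here are invented for illustration; the XHTML public identifier is the standard one):

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Resolve a well-known public identifier to a local copy of the DTD -->
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="local/xhtml1-strict.dtd"/>
  <!-- Redirect a remote schema URI to the file system -->
  <system systemId="http://www.example.org/schemas/po.xsd"
          uri="local/po.xsd"/>
</catalog>
```

A catalog-aware processor consults this file before dereferencing any identifier, so the same document validates identically whether or not the network is reachable.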

Tim Berners-Lee once told me that really, catalogs seemed just a special case of local caching, and that one should use the local cache instead of a special-purpose mechanism. In some ways, he’s right, and I’m willing to stop using catalogs when I learn about a local cache mechanism that is as well documented, as simple to use, and as application-independent as catalogs have proven to be. (Who is willing to hold their breath until that happens?)

Good work, Tony!

If only all XML software supported XML catalogs.



The XML Schema Working Group is trying to move its focus to the Structures spec, whose Last Call comment period ended a couple months ago. (Work on the remaining Datatypes issues will continue in the background, but currently it is the editors, not the Working Group, who are the bottleneck on Datatypes. If the Datatypes editors manage to work effectively in the background, they may manage to use these next few weeks to produce wording proposals for the remaining issues.)

So I just finished reading through all 104 Last Call issues (so far) opened on the Structures spec, trying to impose some order on them.

Whenever I have a longish list of tasks, or of issues that need resolution, my fingers itch to classify them. Which ones are easy and will take just a few minutes of the editors’ time to fix? (Ha! In my experience, editors’ time doesn’t come in increments smaller than half an hour — once you factor in the time it takes to generate a new version of the spec and check to make sure you actually made the change correctly, even a simple fix to a broken paragraph can take forty-five minutes.) Which ones are hard and will take a lot of time and effort, either Working Group discussion time to reach consensus on the right thing to do, or editorial time to draft, or both? Which lie in the middle?

The process is a little like some of those classic AI programs discussed in textbooks (bagging groceries, for example): it’s hard to do perfectly, but even an imperfect classification can be much better than nothing. Of course, it’s even more like the process of deciding, after a catastrophe, which victims will benefit from medical attention, which victims are too far gone to expend resources trying to save, and which can get along without help. So I always think of the process of classifying issues as triage.

In scheduling issues for WG discussion, you want (in my experience) to put a few easy items on the agenda each week, so that every week the WG has the experience of nailing a few issues. But you can’t just sequence the list from easy to hard, because that leads to a dramatic slowdown as you get past the easy ones and move into the hard ones, which can have dreadful effects on morale. So in addition to the easy ones each week, you want to schedule some hard ones early on, so that the WG can spend the time it may take to understand them, develop solutions, argue about them, develop some better solutions, and eventually converge on a good solution, before you start getting into deadline pressure.

But every time I start to perform triage on a list by estimating how much time it will take, I am reminded that some items are important, and others are less important, and that importance doesn’t really correlate well with how long things will take. There is then a crisis while I contemplate some issue that is clearly important, but will take a while — or won’t take long but also won’t actually contribute much to making the spec better. And in the time-honored fashion, I compromise by trying to do both.

I almost always end up performing a sort of double triage, classifying things along two distinct axes:

  • cost: hard, easy, work (as in “this will take some work”)
  • importance: important, medium, or thimble (thimble? yes, thimble. I’ll explain in a minute)

Of course, the result is at best a rough and ready way of subdividing the problem space in order to control complexity and stave off despair. Different people will judge the importance of issues differently, likewise their cost, and even if everyone agreed on the importance or the likely cost of an issue, they could still be wrong. (For this reason, I encourage others in the WG to perform their own rough and ready classifications, but discourage the occasional attempt to discuss the correct categorization at length. By all means, tell me you think issue 666 is likely to be devilishly hard, so you think I’m wrong to class it easy. I may change it, then. But for heaven’s sake, let’s not waste time arguing about a back of the envelope scheduling estimate!)

I use the term thimble for items that aren’t either important or medium important, mostly because I find I can’t bear to call any problem in a spec “unimportant”. No issue raised by a reader is unimportant. And yet, some are more important than others. And if some are more important, then it seems to follow with logical necessity that some are (sigh) less important than others.

The image comes from a discussion of Dante’s Paradiso during my student days. Some students found it hard to come to terms with the fact that there are hierarchies even in Dante’s paradise: some of the blessed are in the inner circles, close to God, and others are, well, they are in the outer circles. This offended some readers’ egalitarianism (are they less virtuous than the other souls? less good? less deserving? Just where does God think he gets off, banishing some virtuous souls to the outer circles?!), and so we discussed it for a while. Eventually, someone said that when they had discussed this kind of thing in school, the nun teaching the class had finally taken up a thimble and a water tumbler and filled them with water. Each, she said, demonstrating, was full, as full as it could get. One, to be sure, held more water, but saying that the tumbler held more water did not entail saying that the thimble was not full. In a similar way, we can believe that all the souls in heaven are as good as they can be, while still recognizing that their capacity for goodness may vary.

A comment correctly pointing out that a particular sentence is malformed or confusing can never be unimportant. It is as important as it can be, even if the sentence in question describes a corner case seldom encountered and thus of comparatively little practical import, compared to (say) a problem in the definition of a concept that is constantly appealed to.

I call this process “triage”, but of course it works not with three classes but nine, in a three by three table. In a perfect world, of course, you’ll resolve all problems and the categorization is used only for scheduling. If, however, in this world of woe you or your Working Group sometimes miss that level of perfection, then it can matter which issues you address and which you end up closing with a laconic WONTFIX message. If you haven’t botched the categorization, you will get the best bang for the buck if the important, easy items have gotten done, and if you haven’t poured precious resources into unsuccessful attempts to resolve the items classed thimble, hard. Me, I figure you want to start at the top left of the table (important, easy) and move more or less systematically to the bottom right.
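That traversal order is easy to make concrete: rank each axis, then sort by the pair of ranks. A small sketch (the issue numbers and their categorizations here are invented for illustration):

```python
# Rank each axis so that sorting walks the 3x3 table from
# (important, easy) at the top left toward (thimble, hard).
COST = {"easy": 0, "work": 1, "hard": 2}
IMPORTANCE = {"important": 0, "medium": 1, "thimble": 2}

# (issue number, importance, cost) -- hypothetical triage results
issues = [
    (5163, "thimble", "easy"),
    (2218, "important", "work"),
    (5078, "important", "easy"),
    (4496, "medium", "hard"),
]

# Sort first by importance, then by cost within each importance band.
triaged = sorted(issues, key=lambda i: (IMPORTANCE[i[1]], COST[i[2]]))
for number, importance, cost in triaged:
    print(number, importance, cost)
```

Swapping the order of the two ranks in the key tuple gives the “sort first by cost and then by importance” ordering instead; the trick issue trackers make awkward is a one-line key function here.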

I still haven’t figured out how to decide between spending time on important, hard items or on medium, work items. The first are more important (d’oh!), but you’re likely to get fewer of them actually done. So it’s a judgement call, at best.

Unfortunately, I’m never happy even with a three by three classification of issues.

To understand any issue, you need to understand what the commenter claims the spec is doing, what is wrong with that, and how to fix it. But that doesn’t suffice to resolve it: before you touch the spec, you must decide whether you think the commenter is right. What did the working group intend to do here? And why? What’s the underlying design story? What does the current text actually do? Is the problem reported a symptom of some larger complex of problems? What are the options for changing things? What are the relevant design principles? Which invariants should the spec be seeking to achieve or preserve? If the issue is at all complex, it can take a while to get up to speed on the relevant information. So if there are several issues that deal with related topics, you really, really want to deal with them all at once, or in succession. (Few things sap morale more effectively than the discovery that in dealing with four related issues, at four separate times, the WG made four decisions which effectively contradict each other because it failed to spin up successfully or remember what it did earlier.)

So I almost always find that I also want a sort of topical clustering of issues. “If we’re going to deal with issue 2218, then we might as well deal with 5078 at the same time. And then the answers to 5163 will just fall out from the earlier decisions.” Perfect clustering involves perfect knowledge and perfect classification, so it doesn’t happen. And I often change my mind about what real issue a given problem ticket involves. So my attempts at topic clustering are even less stable and reproducible than my cost and value estimates. But failure to cluster is like non-locality of reference in a program: it leads to thrashing.

The XML Schema Working Group maintains its issue list in public, so for what it’s worth the current triage of Structures issues is visible. You have to make the Whiteboard column visible, and sort on it, to see the triage clearly.

Several questions arise. Other people presumably face this problem, just as I do. But I don’t hear them moaning and groaning about it the way I do. Have they found better, less painful ways of managing this process? Or are they just more stoic?

And why oh why do so few issue tracking systems make it convenient to add yet another dimension on which to classify issues? Bugzilla at least provides the Whiteboard field, where you can do pretty much anything you like, and then sort. But there isn’t a convenient way to say “sort first by cost and then by importance” or “sort first by importance and then by cluster and finally by cost”, etc. What would it take to make it better?

Exposing your ignorance

A wise friend just told me something that deserves recording. I had mentioned some encounter during which I was tap-dancing furiously, trying to avoid exposing too much ignorance.

My friend (who shall remain nameless since I haven’t asked permission to broadcast this story) responded that it’s always (not almost always) a waste of time and effort to try to avoid exposing your ignorance. “Those who think you know nothing anyway won’t be impressed, and those who admire your intellect will further admire your fearlessness. In short, always dare to be wrong, even if you’re not sure you are.”

Words worth trying to live up to.

Sandboxes and test cases

Playing in sandboxes can be a lot of fun. I just spent a little while playing in a software sandbox I set up last fall, and can only wish I had gotten around to setting it up a lot earlier.

Last October, for reasons I need not go into, I conceived a strong desire to generate a large-ish number of test cases for particular areas of XML Schema, for use in thinking about the current state of implementation of XSDL 1.0, and in thinking about what ought to happen in XSDL 1.1. So I spent some time wrapping my head around the relatively elaborate test suite framework the XML Schema Working Group adopted some years ago for our test suite. (I had looked at it when we adopted it, of course, but you look at a vocabulary in a different way when you are going to be generating data using it.) Eventually, my plan was (and still is) to generate test cases automatically from Alloy models, but as a first step I mocked some test cases up by hand.

But to make sure the test cases were actually testing what I wanted to test, I needed to run them, systematically, on different processors. So I spent an afternoon or two (or three) installing every schema processor I could conveniently get my hands on and persuade to run under Mac OS X, or under Windows XP (since I started using this Mac I have not had any machines running Linux [I say it with a bit of a sigh]), and writing some scripts to wrap around them so I don’t have to remember what order they want their command-line arguments in, or what options they want.
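Such wrapper scripts amount to little more than a table of command templates, one per processor. A sketch of the idea (only the xmllint invocation reflects a real command line; the commented-out entries are placeholders to be filled in for whatever each installed processor actually expects):

```python
import subprocess

# Map each processor to a function that builds its argument list.
# xmllint's --noout and --schema options are real; other entries
# are placeholders, since each processor wants its arguments in
# its own order and with its own flags.
PROCESSORS = {
    "libxml": lambda schema, instance:
        ["xmllint", "--noout", "--schema", schema, instance],
    # "xerces-j": lambda schema, instance: [...],  # fill in per install
    # "saxon":    lambda schema, instance: [...],  # fill in per install
}

def validate(processor, schema, instance):
    """Run one processor on one test case; return its exit status."""
    argv = PROCESSORS[processor](schema, instance)
    return subprocess.run(argv).returncode
```

With one such table, running the same test case through every installed processor is a short loop rather than five acts of command-line archaeology.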

It’s always interesting to construct test cases to illustrate some question or other that arises, whether from a comment on the spec or an inquiry by email. And I have directories with scores of small test cases I have constructed over the last few years. But until I started this more systematic construction of a schema validation sandbox, I contented myself with checking those test cases with one or two processors.

But it turns out that having more processors to test is just a lot more fun. Here is a test case. OK, what does libxml say about it? MSV? Saxon? Xerces C? Xerces J? (And if I’m really energetic, move to the other machine and check MSXML 4, MSXML 6, and xsv. I ought to get a current version of XML Spy installed, too, but I haven’t been that energetic yet.) Getting five times as much information for substantially the same effort just makes the whole thing more fun. (And the more fun we can make it to construct and run test cases, the better.)

Today’s effort was an attempt to answer a question raised by Xan Gregg in a comment on an open issue against XSDL 1.1 Structures. Two schema documents (one pretty much as Xan provided it, one modified to correct a possible oversight), and five instances, provide a simple test of how implementations have interpreted a rule in XSDL which Xan and I turn out to have read differently. (The test cases, and a catalog for this tiny collection of tests, are all on the W3C server; I plan to put every schema example I generate this year there, instead of hiding it on my hard disk. At least the interesting ones.) The process of making them and testing them was delayed for a bit of yak-shaving (quick, how do you embed selected XHTML modules into your DTD, in a way that you are willing to let other people see in public?), but I got them made eventually, and demonstrated to my own satisfaction that virtually all the implementors have agreed on the meaning of this bit of the spec. (Fortunately for me, they all read it the way I read it. But Xan is right that it could be taken in a different way; the wording should be changed to make it clearer.)

Maintaining a software sandbox with installed copies of all the software one wants to play with can be time consuming. And since in the usual case, you aren’t familiar with the software yet, you may not be able to make a strong case for making it an urgent task. Uncertain cost, uncertain benefit, low priority. Other things always seem more urgent. But having such a sandbox, and playing in it from time to time, are important tasks, even if not often urgent. It’s nice when you get a chance to do it.

Happy New Year.

Spolsky (and Usdin and Piez) on specs

Joel Spolsky (of Joel on Software) has put up a talk he gave at the Yale Computer Science department a few weeks ago. (Actually, he put it up a few weeks ago, too, shortly after giving the talk. I’m just slow seeing it.)

In it, he has an interesting riff on specifications and their discontents, which feels relevant to the perennial topics of improving the quality of W3C (and other) specs, and of the possible uses of formalization in that endeavor.

If the spec defines precisely what a program will do, with enough detail that it can be used to generate the program itself, this just raises the question: how do you write the spec? Such a complete spec is just as hard to write as the underlying computer program, because just as many details have to be answered by the spec writer as by the programmer.

This is not the whole story, by any means, if only because specs can and often do explicitly refrain from specifying everything a conforming implementation is to do. But it raises an important point, which is that the more precise one tries to make a spec, the easier it can be for contradictions or other problems to creep into it. (In my experience, this is particularly likely to wreak havoc with later attempts to correct errata.)

In their XML 2007 talk on “Separating mapping from coding in transformation tasks”, Tommie Usdin and Wendell Piez talk about the utility of separating the specification of an XML-to-XML transform (“mapping”) from its implementation (“coding”), and provide a lapidary argument against one common way of trying to make a specification more precise: “Code-like prose is hard to read.” (Has there ever been a more concise diagnosis of many readers’ problems with the XML Schema spec? I am torn between the pleasure of insight and the feeling that my knuckles have just been rapped, really hard. [Deep breath.] Thank you, ma’am, may I have another?)

How do we make specs precise and complete without making them as hard to write, and as hard to read, and as likely to contain insidious bugs, as the source code for a reference implementation?