Archive for February, 2008

A little formalism (variable names)

Friday, February 29th, 2008

Many readers associate the use of variables with mathematics and feel threatened by paragraphs that begin “Let E be … and F be …. Then …” And similarly with technical terms: when a text defines and uses a lot of technical terms, it can be very daunting to the first-time reader (and many others).

So it’s understandable that sometimes, in trying to keep a text accessible to the reader, one works hard to avoid having to introduce variables to refer to things, and to avoid relying on technical terms with special meanings.

But sometimes such efforts backfire. In the XSD (XML Schema Definition Language) 1.0 spec, you end up with rules that read like this:

Validation Rule: Element Locally Valid (Type)
For an element information item to be locally ·valid· with respect to a type definition all of the following must be true:
1 The type definition must not be ·absent·;
2 It must not have {abstract} with value true.
3 The appropriate case among the following must be true:

3.1 If the type definition is a simple type definition, then all of the following must be true:
3.1.1 The element information item’s [attributes] must be empty, excepting those whose [namespace name] is identical to http://www.w3.org/2001/XMLSchema-instance and whose [local name] is one of type, nil, schemaLocation or noNamespaceSchemaLocation.
3.1.2 The element information item must have no element information item [children].
3.1.3 If clause 3.2 of Element Locally Valid (Element) (§3.3.4) did not apply, then the ·normalized value· must be ·valid· with respect to the type definition as defined by String Valid (§3.14.4).
3.2 If the type definition is a complex type definition, then the element information item must be ·valid· with respect to the type definition as per Element Locally Valid (Complex Type) (§3.4.4);

I would say “Maybe it’s just me, but I find that kind of hard to read,” but that would be disingenuous. There is ample evidence from the last eight or nine years that I am not the only reader of the XSD 1.0 spec who finds parts of it hard to read. This is a relatively mild example, as the XSD spec goes. But if we can overcome our fear of formality, the text can become a bit simpler. Two changes in particular seem useful here.

  • Introduce the names E for the element and T for the type, and use them.
  • Follow the example of most specs that define and use namespaces: specify and use a conventional prefix to represent a given namespace, and say once and for all, when that prefix is identified, that in practice the user can use any prefix they wish (or none). Then just use the QNames, rather than writing out the namespace in full each time you have to talk about names in that namespace.

Applying these rules to the fragment just given, we get something a bit easier to read.

Validation Rule: Element Locally Valid (Type)
For an element information item E to be locally ·valid· with respect to a type definition T all of the following must be true:
1 T is not ·absent·;
2 T does not have {abstract} with value true.
3 The appropriate case among the following is true:
3.1 If T is a simple type definition, then all of the following are true:
3.1.1 E’s [attributes] are empty, excepting those named xsi:type, xsi:nil, xsi:schemaLocation, or xsi:noNamespaceSchemaLocation.
3.1.2 E has no element information item [children].
3.1.3 If clause 3.2 of Element Locally Valid (Element) (§3.3.4.3) did not apply, then the ·normalized value· is ·valid· with respect to T as defined by String Valid (§3.16.4).
3.2 If T is a complex type definition, then E is ·valid· with respect to T as per Element Locally Valid (Complex Type) (§3.4.4.2);
4 If E has an xsi:type [attribute] and does not have a ·governing element declaration·, then the ·actual value· of xsi:type ·resolves· to T.

I won’t claim that the text has become easy to read and follow, but I think there is one salient difference: in the first text above, my first difficulty as a reader is understanding what the text is trying to say, and once I have figured that out, I may or may not have energy left to try to understand why it’s saying that. In the second text, it’s easier (I think) to understand what the individual clauses are saying. The reader still has the task of understanding why, but at least the difficulties of comprehension are now those related to the intrinsic difficulty of the topic, without the additional barrier of complex syntax.
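Once the rule is phrased in terms of E and T, it also reads almost directly as code. What follows is only a minimal sketch under loud assumptions: the classes `Element` and `TypeDef` and the helpers `string_valid` and `complex_type_valid` are hypothetical stand-ins invented for illustration, not anything defined by the XSD spec or by any real validator, and clause 4 is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# The four xsi: attributes that clause 3.1.1 tolerates on E.
XSI_NAMES = {"xsi:type", "xsi:nil", "xsi:schemaLocation",
             "xsi:noNamespaceSchemaLocation"}


@dataclass
class TypeDef:
    """Hypothetical stand-in for a type definition T."""
    name: str
    is_simple: bool
    abstract: bool = False


@dataclass
class Element:
    """Hypothetical stand-in for an element information item E."""
    attributes: dict = field(default_factory=dict)  # QName -> value
    children: list = field(default_factory=list)    # child Elements
    normalized_value: str = ""


def string_valid(value: str, t: TypeDef) -> bool:
    """Placeholder for String Valid; a real check depends on T."""
    return True


def complex_type_valid(e: Element, t: TypeDef) -> bool:
    """Placeholder for Element Locally Valid (Complex Type)."""
    return True


def locally_valid_type(e: Element, t: Optional[TypeDef],
                       clause_3_2_applied: bool = False) -> bool:
    """Clauses 1 to 3 of the rule, one test per clause."""
    if t is None:                      # 1: T is not absent (None models absent)
        return False
    if t.abstract:                     # 2: T is not abstract
        return False
    if t.is_simple:                    # 3.1: T is a simple type definition
        if any(name not in XSI_NAMES for name in e.attributes):  # 3.1.1
            return False
        if e.children:                 # 3.1.2: no element children
            return False
        if not clause_3_2_applied:     # 3.1.3: check the normalized value
            return string_valid(e.normalized_value, t)
        return True
    return complex_type_valid(e, t)    # 3.2: T is a complex type definition
```

The point of the sketch is only that each clause becomes one short test on E or T; with the original wording, the same function would have to be reconstructed from the long noun phrases first.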

Another tactic adopted by some in trying to make difficult material easier to read is to avoid defining technical terms. The XSD 1.0 spec raises this to a fine art; often, the easiest way to understand how a given rule came to be formulated as it is, is to imagine that it was first written in a simple, straightforward clause using technical terms, and then the technical terms were eliminated and their definitions inserted inline. And then the process was repeated once, or twice, or more. The result is mostly devoid of difficult or obscure technical usages, but it’s often also a sentence only an eighth-grade English teacher teaching the unit on sentence diagramming could love.

If we re-introduce appropriate technical terms, this process can be reversed. Sometimes the introduction of even a single technical term can do a surprising amount of good.

Take the following example from the XSD spec:

2.3.1 The element declaration is local (i.e. its {scope} must not be global), its {abstract} is false, the element information item’s [namespace name] is identical to the element declaration’s {target namespace} (where an ·absent· {target namespace} is taken to be identical to a [namespace name] with no value) and the element information item’s [local name] matches the element declaration’s {name}.

In this case the element declaration is the ·context-determined declaration· for the element information item with respect to Schema-Validity Assessment (Element) (§3.3.4) and Assessment Outcome (Element) (§3.3.5).

This is followed by another clause with almost identical wording, covering global elements.

If we make use of the term expanded name, defined by the Namespaces in XML recommendation, and refer to the expanded names of the declaration and the element instead of inlining the definition of expanded name by referring to pairs of namespace name and local name (this entails defining the term expanded name as it applies to schema components), and if we supply the obvious variable names for element and declaration, then it’s easier to see that this rule for local element declarations can be merged with the following rule for global element declarations, since the two do exactly the same thing. So we can replace both the rule above and the rule that follows it in the spec with:

If I’m smiling this evening, it’s because this morning the XML Schema working group agreed to these changes, and scores of other similar changes, to the text of the XSD 1.1 spec. The design of the language, I admit, is still very complex. The exposition, I concede, still has a sub-optimal structure. But the third source of difficulty, namely the complexity of individual sentences in the validation rules and constraints on schema components, is somewhat diminished by this change.
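The expanded-name comparison discussed above can be sketched in a few lines. This is an illustration under assumptions, not spec text: `ExpandedName`, `ElementDecl`, `from_clark` and `name_matches` are hypothetical names, `None` models an ·absent· target namespace or a [namespace name] with no value, and the `{namespace}local` notation is the one Python's ElementTree uses for element tags.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class ExpandedName(NamedTuple):
    """A (namespace name, local name) pair; None means no namespace."""
    namespace: Optional[str]
    local: str


@dataclass
class ElementDecl:
    """Hypothetical element declaration with the two properties compared."""
    target_namespace: Optional[str]
    name: str


def from_clark(tag: str) -> ExpandedName:
    """Split ElementTree-style '{ns}local' notation into an expanded name."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return ExpandedName(namespace, local)
    return ExpandedName(None, tag)


def name_matches(element_tag: str, decl: ElementDecl) -> bool:
    """True if the element's expanded name equals the declaration's."""
    return from_clark(element_tag) == ExpandedName(decl.target_namespace,
                                                   decl.name)
```

Because the comparison is a single equality test on pairs, the same function serves for local and for global declarations, which is what makes merging the two clauses natural.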

Variable names as a short-hand for complex noun phrases; technical terms to capture frequently needed concepts; conventions to allow things to be said simply instead of in convoluted clauses: it’s almost enough to make you think that mathematical writing is the way it is, in order to make things easier to read, instead of harder to read. Food for thought.

Choosing schema-validation roots

Monday, February 18th, 2008

[18 February 2008, Prague]

A colleague asks:

naive XML schema question — How does a validating parser know which xs:element is supposed to be the root/document element? I don’t see anything in the schema that tells it.

I’m not getting any love from google or the schema Recs. (I’ve looked at every use of the word “root” in the Recs, with no clues.)

I hate it when smart people who are willing to put in some work to understand things can’t find the answer to their questions in the schema spec. So first of all, I’m sorry. I apologize on behalf of the spec on which I’ve now spent a large proportion of my working life. (I wish I thought I could do something about it, but the XML Schema WG has been appallingly reluctant to fix the incomprehensibility problems of the spec. I think the 1.1 spec is marginally better than 1.0 in some ways, but only marginally and only in some ways. If you hated the 1.0 spec, you may find you hate 1.1 ever so slightly less, but it’s unlikely to charm you into liking it.)

But this question does come up a lot. And if the WG won’t explain it clearly in the spec, then at least I can try to explain it clearly here.

The choice of validation root is not specified by XSD. Formally it’s regarded as out of scope; in practice, the expectation is that processors will provide a useful method of choosing where to start validation and users will specify the validation root at invocation time, or that processors will provide a useful default choice (e.g. the document root), or that in some cases processors will provide a fixed choice (e.g. the document root). In the last case the user can be said to have chosen to start validation at that fixed point by choosing to use that particular validator. That may sound Orwellian, but in principle, at least, the rule is that if you don’t like the level of control given you by a given tool, then why are you using that tool? File a bug report, or an enhancement request, or get another tool. Or both.

The closest the XSD spec comes to talking about this is in section 5.2 (“Assessing Schema-validity”). Personally, I find the discussion in XSD 1.1 marginally clearer than the discussion in 1.0, but I may be exhibiting my bias in that.

My colleague continues:

Preliminary experiments suggest that at least in a normal schema, you can, in fact, just give a fragment of a document and have the document be considered schema valid. So “<br/>” is a schema-valid HTML document? Very odd.

Well, no and yes. “<br/>” is schema-valid against the HTML schema, if schema-validity assessment starts with that element and any of (a) the corresponding element declaration, (b) the relevant type definition, or (c) the instruction to start in lax or strict wildcard mode and look for an applicable definition. And if that element happens to be the document root, then yes, it’s a document valid against the XHTML schema.

Since the default setting for many XSD validators is to start at the document root in lax-wildcard mode, they accept your sample document as valid.

An analogous result could be achieved using a DTD, by writing

<!DOCTYPE br PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<br/>

I think that those who run an XML validator over that document will find that it is valid against the DTD.

The document type definition at http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd has no formal specification that any particular element must be the root element; the constraint on the generic identifier of the root element is specified as part of the document type declaration, the “<!DOCTYPE” part. Analogously, the XSD schema for XHTML doesn’t have any formal specification of any required root element, or required starting declaration; both get specified at validation time. Both when using DTDs and when using XSD, this allows you to validate one part of a document at a time. If you’re editing a large document and are storing different parts of it in different files, it’s convenient to be able to validate each part independently.

Another analogy is with the formal definition of a grammar: the set of productions that most of us think of as a grammar does not specify the start symbol. The start symbol is specified in a different part of the tuple that is, for formal-language purposes, the grammar. To describe schemas in these terms: the schema, or the collection of element and other declarations in a DTD file, does not define a full document grammar, but a set of productions for a document grammar. The start symbol is specified separately, in a doctype declaration for DTDs, and at validator invocation time for XSD schemas.
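The grammar analogy can be made concrete with a toy recognizer in which the productions and the start symbol are separate inputs. Everything here is an assumption made for illustration: the `PRODUCTIONS` table is a drastically simplified invention (each element just lists the child names it permits, in any order and number) and is not the XHTML DTD or schema, and documents are modeled as nested `(name, children)` tuples.

```python
# Toy "productions": element name -> names permitted as children.
# A deliberately crude stand-in for real content models.
PRODUCTIONS = {
    "html": ["head", "body"],
    "head": ["title"],
    "title": [],
    "body": ["p", "br"],
    "p": ["br"],
    "br": [],
}


def derives(productions, symbol, node):
    """True if the tree `node` can be derived from `symbol`."""
    name, children = node
    if name != symbol or name not in productions:
        return False
    allowed = productions[name]
    return all(child[0] in allowed and derives(productions, child[0], child)
               for child in children)


def accepts(productions, start, document):
    """The start symbol is supplied by the caller, not by the productions."""
    return derives(productions, start, document)


br_only = ("br", [])
full_doc = ("html", [("head", [("title", [])]),
                     ("body", [("p", [("br", [])])])])

print(accepts(PRODUCTIONS, "br", br_only))     # start symbol chosen as br
print(accepts(PRODUCTIONS, "html", br_only))   # start symbol chosen as html
print(accepts(PRODUCTIONS, "html", full_doc))
```

The same table accepts or rejects the lone `br` tree depending only on the `start` argument, which is the role played by the doctype declaration for DTDs and by the invocation options for an XSD validator.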

The rules for the HTML vocabulary specify that a conforming HTML document should start with an ‘html’ element, so if you want to check conformance to the HTML spec (as opposed to schema-validity against the XHTML schema, which is not quite the same thing) you don’t get so much choice of how to invoke the validator: you should start with the declaration for the ‘html’ element and with the document’s root element.

If the validator you’re using doesn’t allow you to specify (a) where to start, and (b) what to start with, then you really should file a bug report or a request for enhancement. And whether you do that or not, you really should understand that some of the consequences of the implementation’s default choices are properties of how you are performing validity assessment, not properties of XSD validation in itself.

Some people dislike having to say explicitly that use of a particular vocabulary must start with a particular element, so they take pains to make only that one element top-level; all other elements are defined locally to complex types. This is an effective way of preventing abuse, but it also pretty effectively prevents re-use, and it makes the schema harder to maintain, work with, or reason about. I can’t see such a schema without thinking someone has just cut off their nose in order to spite their face.

Scenes from a Recommendation 3: Boston, Prudential Tower

Tuesday, February 12th, 2008

Another memory from the development of XML.

It’s November 1996, at the GCA SGML ’96 conference, at the Sheraton in Boston. The SGML on the Web Working Group and ERB have just been through an exhausting and exhilarating few weeks, when from a standing start we prepared the first public working draft of XML. At this conference, we have been given a slot for late-breaking news and will give the first public presentation of our work.

Lou Burnard, of Oxford University Computing Services, the founder of the Oxford Text Archive, is there to give an opening plenary talk about the British National Corpus, a 100-million-word representative corpus of British English, tagged in SGML. Lou and I are old friends; since 1988 we have worked together as editors of the Guidelines of the Text Encoding Initiative. Working together to shepherd a couple dozen working groups and task forces full of recalcitrant academics and other-worldly text theorists (“but why should a stanza have to contain lines of verse? I can perfectly well imagine a stanza containing no lines at all”) from requirements to draft proposals, to turn their wildly inconsistent and incomplete results into something resembling a coherent set of rules for encoding textual material likely to be useful to scholarship, and to produce in the end 1500 pages of mostly coherent prose documentation for the TEI encoding scheme, Lou and I have been effectively joined at the hip for years. We have consumed large quantities of good Scotch whisky together, and some quantities of beer and not so good whisky. We have told each other our life stories, in a state of sufficient inebriation that neither of us remembers any details beyond our shared admiration for Frank Zappa. We have sympathized with each other in our struggles with our respective managements; we have supported each other in working group and steering committee meetings; we have pissed each other off repeatedly, and learned, with a little help from our friends (thank you, Elaine), to patch things up and keep going. No one but my wife knew me better than Lou; no one but my wife could push my buttons and enrage me more effectively. (And she didn’t push those buttons nearly as often as Lou did.)

Tim Bray is also there, naturally. He and I have not worked together nearly as long as Lou and I have, but the compressed schedule and the intensity of the XML work have made it a similarly close relationship. We spend time on the phone arguing about the best way to define this feature or that, or counting noses to see which way a forthcoming decision is likely to come out (we liked to try to draft wording in advance of the decisions, when possible). We commiserate when Charles Goldfarb calls and spends a couple hours trying to wear us down on the technical issue of the day. (Fortunately, Charles called Tim and Jon Bosak more often than me. Either he decided he couldn’t wear me down, or he concluded I was a lightweight not worth worrying about. I’m not complaining.) Like Lou, Tim often reads a passage I have drafted and says “This is way too complicated, let’s just leave this, and this, and this, and that, out. See? Now it’s a lot simpler.”

At one point I believed it was generally a good idea for an editorial team to have a minimalist and a maximalist yoked together: the maximalist gets you the functionality you need, and the minimalist keeps it from being much more than you need. Maybe it is a good idea in general. Or maybe it was just that it worked well both in the TEI and in XML. At the very least, it’s suggestive that in the work on the XML Schema spec, I was the resident minimalist; if in any working group I am the minimalist, it’s a good bet that the product of that WG will be regarded as baroque by most readers.

It’s the evening before the conference proper, and there is a reception for attendees in a lounge at the top of the Prudential Tower. I am standing chatting with Tim Bray and Lauren Wood, and suddenly Lou comes striding urgently across the room towards us. He reaches us. He looks at me; he looks at Tim; and he says, in pitch-perfect tones of the injured spouse, “So this is the other editor you’ve been seeing behind my back!”

Scenes from a Recommendation 2: subtle and devious

Monday, February 11th, 2008

Tim Bray’s prose sketch of Jon Bosak is good, and vivid, but it doesn’t mention what I think is one of Jon’s outstanding traits. In a quiet, utterly unassuming way, Jon is one of the most persuasive and politically astute people I have ever met. He will not thank me for pointing this out: I think he thinks that if people know, they’ll be on their guard. He doesn’t do a hard sell (at least not to me); he takes the trouble to understand where his interlocutor is coming from and to find common ground with them. And he has patience; he is not dissuaded from a goal by the idea that it might take a while, or that it must be approached indirectly. And he is very reticent about taking credit.

We wrote Jon’s name into the XML spec, in the passage

XML was developed by an XML Working Group … formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active participation of an XML Special Interest Group … also organized by the W3C.

because we wanted to get Jon’s contribution on record and force him to accept credit. Without Tim Bray, or without me, or any of the other members of the editorial review board and working group, the spec would have been different. Without Jon it would not have come to pass.

In memory of a particularly difficult political task undertaken and successfully negotiated, some of Jon’s friends once gave him a gift that I have always thought apposite: a dark gray bomber jacket, embroidered in dark gray (so the embroidery was virtually invisible) with the words “Subtle and devious”.

Hail, xml:Father!

Scenes from a Recommendation 1: Chicago, Cafe des Artistes

Monday, February 11th, 2008

[11 February 2008]

The XML spec became a W3C Recommendation ten years ago this week.

Tim Bray has posted some character sketches from the period; Eve Maler has followed suit with some recollections (and an online version of Maler/El Andaloussi! Woo hoo!); this has inspired me to think about doing the same. What follows is the first in (what I hope will be) a series of moments I remember from the creation of XML.

If you look, you can find a lot of stories about the beginning of XML. It surprised me, at first, that they all seem to be different; it surprised me even more to find some told in the first person by people whom I had not suspected of being involved with XML at all. But I shouldn’t have been surprised. Scores or hundreds of people were involved in the development of XML, thousands in its spread and uptake. In some sense, then, XML will have had scores, or hundreds, or thousands of beginnings. Why should I think I know about them all? Questions like “How did X start?” often mean not “How did X start?” but “How did you come to be involved in X?” — or, at least, that’s how we answer them. The beginnings of XML? I don’t know. But I’ll tell you what I do know; I know when I first heard about it.

The second WWW conference was in Chicago, in October 1994. With Bob Goldstein, one of my colleagues at the University of Illinois at Chicago computer center, I had submitted a paper on how the Web would achieve its true potential only once it had SGML awareness (“HTML to the Max”). We had to do it for the conference in Chicago, because the computer center would certainly not have paid for travel to go anywhere else; they weren’t happy at being asked to pay the registration fee, let alone travel.

This was the conference at which I saw SoftQuad Panorama for the first time, but I could not figure out the stylesheet mechanism well enough to use it to deliver my slides; later I did, and used Panorama for slides for years after that.

Sometime during the conference I ran into Jon Bosak, whom I knew from SGML conferences, and Dave Hollander, whom I may have been meeting for the first time. Jon was at Novell, Dave was at HP (where I believe he was in charge of building hp.com), and the conference hotel (as I remember it, the Hilton on Michigan Avenue) was hopelessly overcrowded; there was no chance of finding a place to sit and have a cup of coffee.

Fortunately, I was local, and knew the neighborhood slightly, so I led Jon and Dave up the street a bit to the Cafe des Artistes, where we sat on the sidewalk in the October sun and talked about markup and the Web. Browser makers didn’t want to support SGML; they thought it was too complicated and would make the browsers heavy-weight. This seemed like an excuse to me: browsers were already putting on more weight than SGML support would have required. We speculated about defining a simpler profile of SGML, so they couldn’t use complexity as an excuse. Where? The ISO working group was not going much of anywhere with the five-year review and revision of the spec; it seemed unlikely anyone would get any joy there. SGML Open? Somewhere else? Jon had an idea it might fly as an idea for a working group at the new World Wide Web Consortium. Dave thought that SGML support would be the best thing, but if we couldn’t persuade people to do that, he had a fallback idea. HTML appeared hopelessly unequal to the task of publishing serious technical documentation, but Dave’s experience developing the Semantic Delivery Language (SDL) had persuaded him that you could have a reasonably small markup language with some hooks for richer semantics. SDL had been developed for help systems and delivery on CD-ROMs, and in it each element carried an attribute that said what kind of element it had come from in the SGML vocabulary in which the document had been prepared. (Nowadays you’d do that with the HTML class attribute.)

[Googling for “semantic delivery language” just now, I find that about that time Jon put together a proposal for specifying a Hypertext Delivery Language (HDL) based on SDL and optimized for WWW delivery; the copy on the W3C site says it was last modified 1 November 1994, less than a month after the Web conference.]

Other people had specified subsets of SGML before; I had done one myself in 1992 (“Poor-Folks SGML”), and several others were also used as input to the design of XML. And of course I had just given a paper urging that Web software start supporting SGML.

But I think of that cup of coffee at the Cafe des Artistes on Michigan Avenue as the first time I heard anyone talking about the specific idea that became XML.

And you, dear reader? What are your recollections of the beginnings of XML? When did you start thinking about the ideas that became XML? When did you first encounter the effort that in the end produced the spec? Tell me here, or write about it on your own blog.

Now THAT’s a birthday present!

Monday, February 11th, 2008

Eve Maler has marked the tenth anniversary of the XML spec by posting an online copy of the book she and Jeanne El Andaloussi wrote on vocabulary development: Developing SGML DTDs: From Text to Model to Markup.

There is a huge body of knowledge, craft, and/or art about document analysis, vocabulary design, and the use of markup in systems that went into the design of SGML and XML. (Some call it “the SGML methodology” as opposed to “SGML” or “the spec”.) Almost all of it circulates largely in oral tradition; Maler/El Andaloussi was for a long time the only, and is still one of the best, attempts to write it down.

Thank you for the birthday present, Eve!

Tim Bray on XML People

Monday, February 11th, 2008

To mark the tenth anniversary of the XML Recommendation, Tim Bray has resurrected an account he wrote ten years ago of various people involved in the pre-history and creation of XML.

Well worth reading, whether you were there and are looking for an excuse to spend half an hour on nostalgia, or you weren’t there and wonder what it was like. Of course, there is no single “what it was like”: it was like different things from different vantage points. My memories of the initial development of XML are a lot longer on technical discussions and a lot shorter on memorable dinners with movers and shakers.

Another plug for XML Catalogs (and caching)

Saturday, February 9th, 2008

The W3C systems group posted a blog entry the other day about the caching of DTDs and schemas. The failure of some XML software to use caches wisely is causing unbelievable amounts of traffic on the W3C site: in some cases, the same IP address is requesting the same DTD file hundreds and thousands of times in the space of a few hours.

The blog has good pointers to resources about using HTTP caching well, and about XML Catalogs.

I’ve said it before, and I’ll say it again: every piece of software that works with XML ought to use XML Catalogs. By all means allow the user to turn it off, but support it, and turn it on by default. The main reason is: it makes the life of your users easier. And the kind of problem discussed by the systeam blog post is one more reason.
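To give a feel for how little machinery basic catalog support needs, here is a minimal sketch of catalog lookup using only the Python standard library. It rests on assumptions stated plainly: the local path `dtds/xhtml1-strict.dtd` is hypothetical, the functions `load_catalog` and `resolve` are invented for illustration, and only `public` and `system` entries are handled (a real implementation of the OASIS XML Catalogs specification also deals with `prefer`, rewrite rules, delegation and `nextCatalog`).

```python
import xml.etree.ElementTree as ET

# Namespace of the OASIS XML Catalogs vocabulary.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# A tiny catalog mapping the XHTML 1.0 Strict DTD to a local copy.
# The local path is a made-up example.
CATALOG_XML = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="dtds/xhtml1-strict.dtd"/>
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="dtds/xhtml1-strict.dtd"/>
</catalog>"""


def load_catalog(text):
    """Parse catalog text into (public map, system map)."""
    root = ET.fromstring(text)
    public, system = {}, {}
    for entry in root.iter(f"{{{CATALOG_NS}}}public"):
        public[entry.get("publicId")] = entry.get("uri")
    for entry in root.iter(f"{{{CATALOG_NS}}}system"):
        system[entry.get("systemId")] = entry.get("uri")
    return public, system


def resolve(catalog, public_id=None, system_id=None):
    """Return a local URI if the catalog has one, else the system id."""
    public, system = catalog
    if system_id in system:
        return system[system_id]
    if public_id in public:
        return public[public_id]
    return system_id  # no match: caller falls back to the original location


catalog = load_catalog(CATALOG_XML)
print(resolve(catalog,
              system_id="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
```

A parser that consults a lookup like this before opening a network connection never has to ask the W3C server for the DTD at all, which helps both the user and the server.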

Devil ain’t got no place to play ’round here

Friday, February 8th, 2008

[8 February 2008]

Idle hands, they say, are the Devil’s playground.

Poor Devil. No playground around here these days.

After the XML Schema working group buckled down at our face to face meeting week before last, and worked through all of the Last-Call issues raised against the Structures spec, the WG took a week without a meeting, to allow the editors time to get some proposals done. My co-editor Sandy Gao, whose name will someday be in reference books as a cross reference from “Stakhanovite”, produced wording proposals to close 42 open issues against Structures. In the same time, I managed a proposal to resolve one Structures issue. (Or rather, to resolve it in part.) On the Datatypes side, we didn’t manage quite such a glorious record, but we did manage proposals for eleven issues.

OK, yes, sure, many of the issues we had proposals for were simple to fix: typos, small tweaks to wording, and so on. That’s one of the reasons for doing triage: to have a list of easy items and low-hanging fruit. But still, that’s a lotta issues to close. Drafting and reviewing other editors’ drafts occupied most of my time last week, and this week most of my Schema time went to the mechanical task of just getting the proposals shipped to the working group.

I could have made lots of klog entries, but they would all have read

  • 2008-01-30: finished draft for bug 2947.
  • 2008-01-30: finished draft for bug 3265.
  • 2008-01-30: finished draft for bug 3256.
  • 2008-01-30: finished draft for bug 4839.
  • 2008-01-30: finished draft for bug 4089.

Today the working group adopted almost all of the proposals; they bounced one back and accepted the wording for one issue as a “partial” resolution, with the request that the editors try to revise it one more time, to get it a little better before we close the issue entirely. But we closed a lot of them.

So since this morning’s WG call, I have been generating new copies of the status-quo documents and updating Bugzilla records.

This is not purely mechanical work, but it’s mechanical enough that I’ve had time to think, now and then, while waiting for the server, about the significance of the two tasks.

Keeping a document that shows the current status quo

One of the best decisions I made when I began to work as an editor on XSD 1.1 was the decision to attempt, always, to ensure that decisions made by the WG during a telcon were reflected before the next WG meeting — and preferably before close of business that same day — in a copy of the spec kept in a stable location. This has turned out, I think, to be a useful technique.

I didn’t always feel this way. In 1997 it irritated me profoundly that Dan Connolly used to complain whenever there were decisions the XML working group had made that he didn’t see reflected in the most recent draft of the XML spec he could find. But while I’m still not sure I think his reasons were sound, I have come to believe that his conclusion was correct. (Maybe I didn’t understand his reasons properly.)

  • It makes it easier for WG members to see where we are.

It makes it easier for other WGs, too, although other WGs seem unaccountably leery of looking at the current status quo document instead of at public working drafts on the /TR page. I suppose they feel justifiably that there can be such a thing as Too Much Information.

  • It means that WG decisions have visible effects immediately (or, as immediately as is compatible with my having to regenerate the spec and check in the new copies, which tends to take a few hours), which is good for morale.

I at least find it alienating and depressing to work hard making decisions in a WG and find, months later, that the editors have still not gotten around to making the text of the spec reflect those decisions. Experience also shows that if much time elapses between decision and revision of the spec, the editors end up forgetting what the WG decided, or conveniently rewriting it in their memory.

  • Constant re-publication (even if it’s only in the member-only portion of the site) also helps keep the document production system in good trim. Problems get found and fixed sooner, and debugging is simpler: fewer problems around means it’s less likely that one problem will be interacting with or masking another.
  • Constant re-publication (even if it’s only in the member-only portion of the site) also helps keep the document in good trim. If I link-check and validate the status quo regularly, publishing a working draft becomes a simple task instead of requiring herculean efforts to fix HTML- and link-validity errors.

We haven’t always managed to achieve the same-day goal; sometimes the spec gets a week, or two weeks, out of date. And on a couple of occasions when outside circumstances were unfavorable, the Structures spec got as much as four months out of date. Bringing the status-quo documents up to date after that kind of interval was a nightmare — a suitable punishment, perhaps, for letting them get so far out of date in the first place.

Bugzilla as issue tracking system

Keeping the issues list up to date has many of the same advantages, although as a WG we have not been nearly so good about that as about keeping the ‘current editors’ copy’ of the spec up to date. And of course, like many working groups, we have trouble providing quick response. Most of the bugs we fixed today were reported months ago; some, well, longer ago than that. We need to do better about that; it would help keep the number of issues from climbing so high in the first place.

And if being slow to get back to the reader has a bad effect on the WG’s relation to the wider community, being slow to update the status of issues, once the WG acts on them, has an even worse effect on the WG’s relation with itself. When the issues list is out of date, you can’t look at any issue without the risk that you’re wasting time re-considering something the WG actually already decided once. Or, just as likely, you risk the WG saying “Wait, we don’t need to look at this one, we did it a few months ago” when what they remember is that it was discussed a few months ago, without remembering that it was not resolved a few months ago.

So I have come to believe it’s very helpful to get the issues list updated right away after a meeting. Or preferably during the meeting, but the XML Schema WG has never quite assimilated that idea.

For a long time, we maintained our issues list in XML, using a series of ad hoc vocabularies. Because they were in XML, it was possible to do a lot of nifty things with them; we had stylesheets to indicate status via foreground and background color, we had a good fit between the data and the information we want to maintain about issues, and so on.

But maintaining the issues list in an XML document on the server meant that only WG members with CVS access could update it. And keeping the list in a single XML document meant, in practice, that only one person at a time could (or maybe I mean would) do so. And that person invariably became a bottleneck.

Bugzilla is not well suited to the task of issue tracking for a working group developing a spec. It’s designed for tracking software bugs, not design issues or spec defect reports; its workflow doesn’t match what any of my WGs does; its terminology is sub-optimal; its notion of text has all the power and charm of ASCII email (which is particularly grating for WGs working with XML technology: we know how much better things can be with good markup!).

But it’s got a convenient Web interface and more than one WG member can be updating things at a time. And I have come to believe that those two facts trump all others. Someday I’ll get around to designing and implementing an XML-based issue tracking system for working groups; it will have a Web interface and suitable markup, and it will be fun to develop.

In the meantime, we use Bugzilla.

So it was with Bugzilla that I spent most of my afternoon, while make and XSLT and CVS checkins went on in the background.

Marking an issue resolved can take time. Since the status-quo version of the spec is not publicly accessible, whenever the comment came from someone outside the WG who doesn’t have member access to the W3C site, it’s necessary to transcribe the wording we finally adopted into the bug record, so the originator of the comment actually has a way to tell whether we got it fixed or not. And if the bug applies both to XSD 1.0 and to XSD 1.1, our decision today resolves it only for the latter, so a new issue needs to be raised for 1.0; otherwise we’ll lose track of it.
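For readers who think more easily in code, the bookkeeping just described might be sketched roughly as follows. This is only an illustration in Python, not anything Bugzilla provides: the Issue class, its fields, and the resolve_for_1_1 function are all hypothetical names invented for the sketch, and the real work is done by hand in the Bugzilla Web interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Issue:
    """Hypothetical stand-in for a bug record; not a Bugzilla data structure."""
    issue_id: int
    summary: str
    versions: List[str]          # spec versions the issue applies to, e.g. ["1.0", "1.1"]
    reporter_is_member: bool     # True if the commenter can read the member-only draft
    status: str = "OPEN"
    comments: List[str] = field(default_factory=list)


def resolve_for_1_1(issue: Issue, adopted_wording: str, next_id: int) -> Optional[Issue]:
    """Mark an issue resolved for XSD 1.1 and return a clone for 1.0 if one is needed."""
    # Outside commenters cannot see the status-quo draft, so the adopted
    # wording has to be copied into the record itself.
    if not issue.reporter_is_member:
        issue.comments.append("Adopted wording: " + adopted_wording)

    # A decision on 1.1 leaves the 1.0 half of the issue open, so split it off
    # into a new record rather than losing track of it.
    clone = None
    if "1.0" in issue.versions and "1.1" in issue.versions:
        clone = Issue(
            issue_id=next_id,
            summary=issue.summary + " (XSD 1.0)",
            versions=["1.0"],
            reporter_is_member=issue.reporter_is_member,
        )
        issue.versions = ["1.1"]

    issue.status = "RESOLVED"
    return clone
```

The point of returning the clone, rather than silently dropping the 1.0 half, is that the caller is forced to decide what to do with it.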

Two pieces of advice, then, for those using Bugzilla to maintain issues lists:

  • Learn to use the “Change several bugs at once” feature.
  • Learn to use the “Clone this bug” feature.

Enough said.

There’s a final reason that marking issues resolved can be slow going. Because these are Last-Call comments, the working group is responsible for keeping an audit trail to show that all comments have been dealt with, and to document that each person who raised an issue has been asked explicitly whether the WG’s action on the issue has satisfied their concerns.

If they’re not satisfied, the director of the consortium reviews the question when the time comes to move the spec forward. Woe then to the working group who didn’t make an effort to satisfy those who commented on its drafts. If the chair, or the staff contact, can explain plausibly what point is at issue and what the WG has done to try to accommodate the reader who raised the issue and why it’s not possible to do anything more to resolve the comment without breaking something more important, then the director may well sustain the WG in its decision. I’ve seen that happen.

But when the working group did not in fact try very hard to resolve the comment, and all you have to say is “well, we didn’t want to do that”, or “no, we didn’t want to consider that”, then you’re in for a long afternoon of sharp, skeptical questions and there’s a real possibility that the spec will be sent back to the working group so that more time can be spent resolving the open points of dispute. I’ve seen that happen, too.

It’s hard for some WGs to remember, but the goal is not to achieve consensus within the working group. The goal is to achieve consensus within the Web community as a whole.

And that, in the end, is what the whole exercise is about.

AppleScript, so close and yet so far

Saturday, February 2nd, 2008

[2 February 2008]

There are lots of big things on my mind lately: papers due and overdue and long overdue, submission deadlines coming up, and a long, long list of things to fix in the XSD 1.1 spec.

But there are some little things that refuse to stop taking up time and energy.

Years ago, tired of the hassles of trying to synchronize desktop and laptop, I followed the example of my friend Willard McCarty and started using my laptop as my only machine. This has worked pretty well on the whole, though it has saddled me with heavier laptops than some of my friends carry and given me less disk space than I could have had on desktop machines bought for the same price.

But a key part of making this work is having an external keyboard to use at my desk. I use a wave-shaped keyboard from Logitech at my desk, and to make things work as I expect, I use the Mac System Preferences interface to switch the Option and Command keys when I’m using the external keyboard.

Unfortunately, when I’m using the PowerBook’s own keyboard, this system preference must be undone. And then when I return to my desk, I have to switch the keys again.

Changing the relevant keyboard settings takes seven or eight mouse clicks. That gets old. I’d like to automate it; can AppleScript help? Yes, it can: the samples include at least one example of scripting a change to the system preferences.

So I spent some time the other day trying to script my task: one script to launch System Preferences, choose Keyboard and Mouse, choose Modifier Keys, switch Command and Option, choose OK, and quit; another to go the other way.

The documentation makes fairly clear that I need to know the names for buttons and subpanes and so on provided by the application, so I can tell AppleScript which things to activate. But I seem to be missing a step; I can’t find anything that tells me what names System Preferences gives to its panes. There’s an Open Dictionary option in the Script Editor, but the dictionary for System Preferences only tells me that it defines things called panes. It doesn’t tell me (or am I just missing something here?) what IDs those panes have, or how to find out.

At the moment, this task is out of time and is going back to the bottom of the to-do list. But every time I take my machine away from my desk, or bring it back, I’m reminded that I haven’t solved this one yet.