What matters?

C. M. Sperberg-McQueen

Closing keynote

Extreme Markup Languages

Montreal, 9 August 2002



[This is a lightly revised version of the closing address of the Extreme Markup Languages conference in August 2002. In preparing it for publication on the Web, I have tried to make some passages clearer, I have corrected some errors, and I have added notes here and there where it seemed useful. My thanks to Kate Hamilton of Mulberry Technologies for transcribing the tape (the notes about audience reaction were supplied by her), and to Syd Bauman of Brown University for making the tape. ]

1. Technologies that matter

In an indirect way, my title and topic today were given me by Tim Bray, my friend and co-editor on the XML specification.
A few months ago, a small cyclone of discussion erupted on xml-dev when someone posted a pointer to some slides of a talk Tim had given at an industry conference. In his talk, Tim tried to identify a useful predictor of wide industry uptake for a technology — or, equivalently, a predictor of the likelihood that a typical member of his audience was eventually going to have to learn about a technology or hire someone to deal with it — so that his audience could spend their time learning the things they were going to need to know, and avoid learning things that they weren't actually going to need to know. In the course of his talk, Tim included a chart listing various technologies for which that prediction had proven true (i.e., they had wide or universal uptake in general industrial IT practice, and IT practitioners have had to learn about them) and various technologies for which it had not.
Some people may cavil at his choice of words, but the two columns were headed “What mattered” and “What didn't”.
There was, as you might expect, a certain amount of dismay in some quarters that Tim had XML on the list of things that mattered, and SGML on the list of things that hadn't mattered.
I'm going to take the opposite position here, but I should mention that I don't actually think that Tim and I are in violent disagreement on this. Tim was talking about what ‘matters’ in the sense of what you have to go by when you are placing bets on the future. It is clear that the most important predictor for an IT manager eventually being required to learn about a technology is network effects. In almost any comparison you can think of, if there are two competing technologies, one of which has visible benefits from network effects, and the other of which doesn't, the one with the visible benefits from network effects is the one that's going to win. This is not inherently evil; it's also not inherently good. It does have unambiguous benefits. The network effect provides the payoff which helps induce us as a society to make choices when we need to.
In many cases, it matters much more that a choice be made, so that we as a society can benefit from the network effects, than which choice is made. I don't actually much care whether the wire that connects my machine to the power grid here in Montreal is running 110 volts or 109 volts, or 220. What I care about is that my machine and the generators here are working to a prearranged standard, so that they can be in agreement. It's very handy, of course, that there's only a small adaptor needed when I go into 220-volt land. In that sense, the choice between technologies is often of much smaller significance for us as a society than that there be a choice. Network effects have their role in this and provide many of the benefits.
But if we want to benefit not just from the network effect but also from the advantages of technology, it is in everyone's interest that the network effects cut the right way: that we choose as a society the technologies that work best.
Now, if network effects are the best predictor, then we must infer that the people who actually are responsible for making a good decision are the early adopters. In IT, that means you. You have a responsibility to judge what matters not by network effects but by technical merit. This is a special case of the Categorical Imperative of Immanuel Kant, which you may dimly remember is phrased something like this: “act only on that maxim by which you can at the same time will that it should become a universal law,”[1] but which your mother may have expressed more colloquially as, “What would the world be like if everyone did that?”
It's very easy, in some cases, to evade the Categorical Imperative. I evaded it for years in a number of contexts by saying, “That's an interesting question; but I have empirical evidence that it doesn't matter what I do: the whole world is not going to do the same thing I do. So I'm not entirely certain how relevant the question is to the choice I'm currently facing.” But it's an imperative that we cannot avoid. If we want to be thought-leaders in information technology, we have to make choices based not just on how to place our bets but on how to choose correctly.
In that sense of “what matters”, I think I would give an answer different from Tim's. I've been reading a lot of books about formal logic in connection with the work that Allen Renear and David Dubin were presenting here the other day,[2] so I currently think of this in terms of modal logic, which adds to logic the notions of logically necessary statements — statements not just true but necessarily true — and logically contingent or logically possible statements. In modal logics, these notions are often interpreted in terms of possible worlds.[3]
In that context, and for the definition of the predicate matters which I have in mind, I find it difficult to imagine that there exists any possible world which assigns to the sentence “XML matters” the value true while assigning the value not true to the sentence “SGML matters”. Either they both matter, in the sense I mean, or neither matters. I think they both matter.
I don't mean that there is no difference between SGML and XML. I am very pleased that XML is widely adopted; I was very frustrated that SGML was not widely adopted. And it's clear to me that the difference in uptake between the two has everything to do with the differences between them, and not with the similarities.
But the reasons that it is a good idea for them to be widely adopted have nothing to do with the differences between SGML and XML, and everything to do with the essential characteristics of the languages. I'm happy to be associated with XML, I'm happy to have my name on that spec and to take the undeserved credit I get as a result. But the choice of any technology is a cost/benefit calculation. And the only changes XML made to that calculation were in lowering the costs of deployment, not in adding any benefits — unless you count the benefits of the network effect, which are, as I have suggested, considerable.
Those of you with long memories will remember the late 1980s and early 1990s when some of our number were already convinced that SGML was suitable as a universal modeling language and said so, loudly and in public. The more rational, or at least more conservative, members of our community would say, “Well ... calm down, Eliot. [Audience laughs.] SGML is very good for what it does, but there are some things for which, even though you could do them in SGML, it would be pointless, it would be dumb. For example, to make a graphics format in SGML — no one would do that. [More laughter.] It would be dumb to make a programming language in SGML; nobody would do that.” [Laughter.] I come to you from teaching a workshop in XSLT in which the home-run demonstration was using XSLT to generate SVG images, and I submit to you that that is a demonstration of the network effect in action.
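The workshop's stylesheet itself is not reproduced here, but a minimal sketch of that kind of demonstration, written here in Python with lxml's XSLT 1.0 processor and an invented input vocabulary (bars, bar), looks something like this:

    # A sketch only, not the workshop's actual stylesheet. The input vocabulary
    # (bars/bar) is invented; lxml's XSLT 1.0 processor does the work.
    from lxml import etree

    data = etree.XML('<bars><bar value="3"/><bar value="7"/><bar value="5"/></bars>')

    stylesheet = etree.XML("""
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/bars">
        <svg xmlns="http://www.w3.org/2000/svg" width="130" height="100">
          <xsl:for-each select="bar">
            <rect x="{(position() - 1) * 40}" y="{100 - @value * 10}"
                  width="30" height="{@value * 10}"/>
          </xsl:for-each>
        </svg>
      </xsl:template>
    </xsl:stylesheet>
    """)

    print(etree.XSLT(stylesheet)(data))   # markup in, markup out: a little SVG bar chart

The interesting thing is not the little bar chart; it is that the input, the output, and the transformation itself are all markup.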
No one thought that not being in markup would be a disadvantage for a programming language or a graphics language. But when we tried doing them in markup, we discovered a lot of advantages that we had never suspected. That's the network effect at work.
But apart from the network effect, that is, apart from getting *ML widely adopted, the fundamental advantages of markup as represented by XML and SGML were given to the language not by the XML working group (or more precisely by the SGML-on-the-web working group that defined XML), but by ISO/IEC/JTC1/SC18/WG8, the group that created SGML. Those of you WG8 members who are here should be taking credit ... because the advantages of XML rest on the advantages that you gave us in SGML. Ever since that discussion on xml-dev, I have wanted to say this in public: we owe it to you guys.

2. Properties that matter

2.1. Descriptive markup

If SGML and XML matter, we are naturally led to ask: “Why do we think they matter?” There is a very simple answer. First and foremost, SGML and XML matter because they enable descriptive markup. I'm not entirely happy with the term “descriptive markup” because I think that people use it to describe two different things. But it's advantageous in both of those orthogonal dimensions.
First of all, SGML and XML are declarative rather than procedural; this is guaranteed to us by the way the specs are written.
The second axis that we denote by the term “descriptive markup” is not enforced by the specifications and cannot be, but it is true of SGML as a social movement, and that is that it emphasizes the use of logical markup rather than purely presentation- or process-oriented markup. This is not something that can be enforced by the specs, because the specs made one absolutely brilliant move: they steadfastly refused — to the consternation of some of our colleagues in computer science — to identify anything that looks remotely like a semantic primitive. That's not a weakness; that's a strength. There is a reason you could write a graphics language and a programming language in XML, whereas you wouldn't do that in troff. [Laughter.] Some people have done it, sort of, in TeX. [More laughter.] But there's a reason that I don't know very much about that. The advantage for everyone of the refusal of SGML to specify semantic primitives is that it places the responsibility for the quality of our modeling work where it belongs: with us, the users, the developers of systems, and not the designers of the metalanguage or the designers of our software.
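One way to see concretely what that refusal buys: because the metalanguage defines no element names of its own, one generic parser serves every vocabulary alike. A small sketch in Python, with two invented vocabularies:

    # The metalanguage supplies no element names of its own, so the same
    # parser handles a document vocabulary and a graphics-like vocabulary
    # identically. Both documents here are invented.
    import xml.etree.ElementTree as ET

    poem    = ET.fromstring("<poem><line>April is the cruellest month</line></poem>")
    drawing = ET.fromstring('<drawing><circle r="10"/><rect w="4" h="3"/></drawing>')

    for root in (poem, drawing):
        print(root.tag, [child.tag for child in root])
    # poem ['line']
    # drawing ['circle', 'rect']

What poem or circle means is entirely the user's affair; the parser neither knows nor cares.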
I'm very happy that we have seen a lot of papers at this conference which can be grouped under the general rubric, “How can we improve our information modeling?”.[4] People interested in information markup generally will always find it relevant to ask “What is [fill in the name of some datatype] really?” This is perhaps an exception to the rule Paul Prescod proposed yesterday, that says flexibility is not necessarily the way to greater utility.[5] In the right place, flexibility can give us strength. The semantic flexibility of SGML and XML gives them strength.
This is also why the considerations about Saussurian linguistics that Wendell Piez gave us the other day are relevant here:[6] because what matters in SGML and in XML is the ability to say what you mean and mean what you say, and in order to make sure that we are able to do that it's important that we understand how sign systems work. Hence also our perpetual interest in finding ways to describe the ways that semantics work in markup languages. It's easy to define semantics in programming languages like C where there are semantic primitives. It's somewhat harder to define them in a system where the user of the language defines the semantic primitives.

2.2. Serial form, data structure, validation

That's the first, last, but not the only reason for the utility or the rightness of SGML and XML. There's a list of secondary reasons that I think make an enormous amount of difference.
First, SGML gave us a reasonably pleasant syntax — not a perfect syntax in the sense of making everybody happy; I don't think a perfect syntax in the sense of making everybody happy will ever exist; but it doesn't make too many people too unhappy, which is more than some language designers have managed to achieve. It's human-readable, at least for a relevant subset of humanity. It's machine-processable, for a relevant subset of our machines. And it's designed in such a way that it can appeal to the idealist in us as well as to the pragmatist in us. I always tell people, when I teach the basics of SGML and XML, that the processing instruction has various syntactic rules and various semantic dimensions and I always say it has a sort of occult significance as well. The processing instruction is a token: it is your guarantee that this language was not designed by some hare-brained computer scientist who had never had to make a 4pm Fedex deadline. [Laughter.] So: a nice serialization format is one part of this secondary list.
A second one is that, associated with this serialization format is an obvious data structure which makes marked up data convenient to work with. It's inevitable that anyone looking at SGML or XML will say, “That looks like a labeled bracketing, a serialization of a tree. I know what I'll do; I'll build a tree!” (Well, not everybody: the people at Netscape apparently said, “I know what I'll do: I'll build a linked list!” [Laughter.]) I don't know that it's essential that the data structure be a tree, but I know that the close, natural relationship between trees and our surface syntax is important, because it makes it easy to figure out how you're going to process this stuff usefully, internally.
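A minimal sketch of that natural relationship, using Python's standard xml.etree.ElementTree and an invented document: the labeled bracketing goes in, a tree comes out.

    # A sketch of the obvious data structure: parse the labeled bracketing,
    # get a tree. The document is invented.
    import xml.etree.ElementTree as ET

    doc  = "<chapter><title>What matters?</title><p>First paragraph.</p><p>Second.</p></chapter>"
    root = ET.fromstring(doc)

    def show(element, depth=0):
        """Print one line per element, indented by its depth in the tree."""
        print("  " * depth + element.tag)
        for child in element:
            show(child, depth + 1)

    show(root)
    # chapter
    #   title
    #   p
    #   p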
Those two things are important; there's a third thing that is equally important. Both the serialization form and the data structure stand in a natural relation, an inescapable relation, with a well-understood mechanism for validation. This is not just any tree; it is a parse tree for a sentence in the language defined by the grammar expressed in our DTD. If that sentence is in that language, the document is valid. If it's not, the document is not valid. Everything linguists and computer scientists have learned about context-free languages since 1957 comes to bear here.[7] Until we have alternatives, not just for the surface syntax and the data structure, but for validation, all of our thought experiments about other things we could do, other markup languages we could design, will remain what they are now: important, useful, interesting, but just thought experiments. Until we provide validation, a natural data structure, and a readable serial syntax, we don't have anything that can seriously compete with SGML or XML.
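A sketch of that relation between grammar and validity, with an invented three-rule DTD and lxml standing in as one convenient validator:

    # A sketch of grammar-based validation: the three-rule DTD below is an
    # invented grammar; lxml is used here only as a convenient validator.
    from io import StringIO
    from lxml import etree

    grammar = etree.DTD(StringIO("""
    <!ELEMENT chapter (title, p+)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT p (#PCDATA)>
    """))

    good = etree.XML("<chapter><title>T</title><p>Some text.</p></chapter>")
    bad  = etree.XML("<chapter><p>No title here.</p></chapter>")

    print(grammar.validate(good))   # True: a sentence in the language the grammar defines
    print(grammar.validate(bad))    # False: well-formed, but not in the language
    # grammar.error_log records which rule was violated, and where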
Those three things working together help explain why SGML is an outstanding exception to what we might call Arms's Rule — for Bill Arms, the head of the Corporation for National Research Initiatives in Washington, best known to many of us as the former employer of Guido van Rossum, the inventor of Python. Arms once said he had a rule for figuring out what technology was going to matter (in the betting-on-the-future sense) — it was always hard to tell, so the best thing to do, if you could afford it, was to wait five years. You will never need to wait more than five years, according to Arms's Rule, from the time you first hear about a new technology, because within five years either the technology will have succeeded and it will be universal and it will be obvious to you that this is something you have to learn about; or it will have failed and disappeared and it will be obvious that it is not something you have to learn about.[8] So when he heard about SGML in 1986 or 1987, he said, “That's interesting,” and he waited five years. And he found an exception to his rule. SGML hadn't succeeded, in the sense that it hadn't completely dominated its obvious application niche — bearing in mind of course that because of the absence of a fixed predefined semantics the application niche for SGML is information processing, which covers a lot of ground. Even within the obvious or ‘traditional’ areas, SGML was not universally accepted, but it also hadn't disappeared. There were people who had started using SGML, and they were certainly frustrated that the rest of the world hadn't also adopted SGML; but, unlike early adopters of most other technologies which don't achieve universal uptake, they weren't giving up and going home. They were saying, “No, no: we're right, you're wrong, this is better, why should we give it up?” And ten years later, in 1996, it was still approximately the same situation.
There is a fierce loyalty among people who got interested in SGML in the days before XML, and among many people, too, who came to the party only with XML, because of these three things working together: the serial form, the data structure, and validation. Personally, I think validation may have been the most important. I like the conceptual clarity of a tree structure; I invested a lot of effort to retrain my thinking, so that instead of thinking about texts as character sequences, I think about them as having more structure than a flat sequence; so in fact I'm a little nervous when Core Range Algebra,[9] or LMNL,[10] suggests we go back to thinking that the real, pure quintessence of the text is the sequence of characters. I say, “Oh, no, you've lost too much there; you're back to water.” Maybe if character sets were designed in a process that wasn't already too familiar to me from other sources, I would have more faith in their representation of a Platonic ideal of writing systems; but I happen to know how standards committees work, and I'm not willing to buy ISO 646 or ISO 10646 as expressions of anything remotely resembling a Platonic ideal of writing systems. [Laughter.]
So the conceptual clarity of trees is good, but mechanical validation was probably more important. The light in here is not great but I'm sure that most of you can see that I have grey hairs. The reason I have grey hairs is GML. [Shouts of laughter.] And Script. [Sighs.] I wrote macros in Waterloo Script; I wrote and maintained macros in GML, that's why I have grey hair. I never tried to write macros in TeX; that's why I have hair. [Shouts of laughter; applause.]

2.3. Automating validity checking

GML and LaTeX were remarkable when they were introduced for making more explicit a notion which had always been part of document processing. Any document-processing system has the notion of a set of acceptable documents, if only negatively defined. To clarify this idea, ask yourself what would happen if the creator of such a program were sitting in his office, and a user came in and said, “I don't like this output.” He'd say, “Well let me see your input.” You can always imagine that there are some files for which the programmer is going to say, “Of course you're getting idiotic output: you can't expect the program to do anything useful when you give it input like that.” Garbage in, garbage out: it's a fundamental rule of data processing. What GML and LaTeX did which earlier systems had not done very explicitly was to try to express those rules, to get across the notion that there is an expected way: that there is a set of documents for which the program is going to do the right thing; and conversely there is a set for which you cannot expect it to do the right thing. Those other documents should raise errors, ideally. And if you look at the macros for headings in Waterloo GML, you will find that they make an explicit effort to enforce those rules. First-level sections contain second-level sections contain third-level sections. So in the macro for every heading level in Waterloo GML you set a variable to say “The most recent heading was an H1”, or an H2, or an H3, and so on. In every macro (excepting H0 and H1) you check to see what was the last one used. If the new one is an H4, then the last one has to have been an H3 or an H4; otherwise you raise an error. That's good; it improved all sorts of things. It made the relationship between errors in the input and errors in the output easier to understand. But there are an awful lot of things that can go wrong, and if you have to think to check all of them, you will never finish. [In a hushed voice.] They didn't check to see whether you were beginning a chapter inside a footnote. [Laughter.] You get some interesting results when you try it. [Laughter.]
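A sketch, in Python rather than in Script macros, of the hand-rolled check just described: remember the level of the last heading and object when a level is skipped.

    # A sketch of the check described above: remember the last heading level
    # and raise an error when a new heading skips a level (e.g. an H4
    # directly after an H2).
    class HeadingChecker:
        def __init__(self):
            self.last_level = 0                   # nothing seen yet

        def heading(self, level):
            if level > self.last_level + 1:
                raise ValueError(
                    f"H{level} after H{self.last_level}: a heading level was skipped")
            self.last_level = level

    checker = HeadingChecker()
    for level in (1, 2, 3, 3, 2):
        checker.heading(level)                    # all fine
    checker.heading(4)                            # error: H4 directly after an H2

It catches skipped heading levels; it says nothing about chapters inside footnotes.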
Partly, you may say, that's because the user is going to see that this is a disaster and they will fix it. Partly it's because “You can't expect us to hold the user's hand on everything; they have to have some sense.” Sure. But there are a lot of possibilities, and the user's not going to think of them all. Defining an explicit set of rules, so that you can mechanically check them all — that's what validation brings.
Algol is considered a better programming language than Fortran I — even Fortran devotees will accept that Algol 60 was a better language than Fortran I. Not because you could only write correct programs in it, but because, owing to the formal definition of its syntax, an entire class of programming errors could now be checked mechanically instead of by eyeball. The use of Fortran led, according to a persistent (but, as it turns out, erroneous) story, to the loss of an Atlas rocket in the early 1960s. Someone typed a period where a comma belonged in a do-loop, and therefore the statement was interpreted as an assignment to an eccentrically named variable that was not present anywhere else in the program — because variables didn't need to be declared, and because whitespace was allowed even in what we would think of as tokens.[11] In Algol 60, the faulty program would have been detected earlier because the compiler would have rejected it as syntactically incorrect.
That's a kind of progress that it would be really painful to give up. Those of you who were paying attention to IT matters in 1999 will remember the Mars Climate Orbiter, which was lost owing to a similar kind of problem. It wasn't the kind of problem that could be caught by lexical analysis, but it was a type-checking problem.[12] There are languages now that know how to check that mechanically. If those languages had been in use, we would still be getting data from that Mars orbiter. The ability to find a large subset of the set of possible errors automatically is the chief reason that no one who ever wrote programs for text processing before SGML will ever go back to a world without formal grammars for documents and formal validation.
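The orbiter's problem was a matter of units rather than of grammar, but the cure is the same in spirit: make the distinction explicit, so that a machine can check it. A minimal sketch in Python, with invented class names standing in for what a real system would get from a units library or a statically typed interface:

    # A sketch of the kind of mechanical check meant here: make the unit part
    # of the type, so that mixing unit systems is an error the machine can
    # find. The class names are invented.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class NewtonSeconds:
        value: float
        def __add__(self, other):
            if not isinstance(other, NewtonSeconds):
                raise TypeError("cannot add a non-newton-second quantity")
            return NewtonSeconds(self.value + other.value)

    @dataclass(frozen=True)
    class PoundForceSeconds:
        value: float

    total = NewtonSeconds(10.0) + NewtonSeconds(2.5)       # fine
    total = NewtonSeconds(10.0) + PoundForceSeconds(2.5)   # rejected, not silently wrong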

2.4. Overlap

Now it's true that in order to get validation, and a natural fit between serialization and data structure, we have given up some things which some will regard as (having been) advantages. Overlap, for instance, was not a problem before SGML. Pre-SGML systems had no trouble encoding what we would refer to as overlapping structures. Of course, those systems and their users didn't think of them as overlapping structures: overlap was not something that you could conveniently describe before SGML, because before SGML the notion that documents had structure was hardly something you could talk about coherently.
Understanding and controlling your data, on the other hand, was a problem. Convenient manipulation of your data using its structural units was a problem, as was defining anything in the nature of an explicit contract between a data source and a data sink. Those were the problems.
If we have to make the choice between a world in which overlap is not a problem but understanding and controlling your data is a problem, and convenient manipulation of your data is a problem, and definition of a contract between a data source and sinks is a problem, and another world where all of those are solved at least as well as we have solved them so far and overlap is a challenge, I know which one I choose. I choose the second one.
Don't underestimate the importance or the interest of overlap: it's an extremely interesting challenge.[13] But I think it's important that we keep in mind that it is an interesting problem because it is the biggest problem remaining in the residue. If we have a set of quantitative observations, and we try to fit a line to them, it is good practice to look systematically at the difference between the values predicted by our equation (our theory) and the values actually observed; the set of these differences is the residue. We look at the residue because if there is a prominent pattern in it, it can tell us something about the data which is not captured by the equation we have fitted to the data. In the context of SGML and XML, overlap is a residual problem. It is a problem which emerged — which allowed us to see it and formulate it — only when we adopted SGML and XML. SGML and XML can in some sense be said to have allowed us to discover overlap, in that they have provided the conceptual framework within which the problem of overlap can be formulated concisely for the first time.
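Stated in its smallest form, the residual problem looks like this: two annotated spans of the character stream either nest, in which case a tree can hold them, or genuinely overlap, in which case it cannot without devices like milestones or fragmentation. A short sketch with invented offsets:

    # The residual problem in its smallest form: two spans over the character
    # stream either nest (a tree can hold them) or genuinely overlap (it
    # cannot, without devices such as milestones). The offsets are invented.
    def relation(a, b):
        """Classify two (start, end) character ranges."""
        (a1, a2), (b1, b2) = a, b
        if a2 <= b1 or b2 <= a1:
            return "disjoint"
        if (a1 <= b1 and b2 <= a2) or (b1 <= a1 and a2 <= b2):
            return "nested"        # representable directly as parent and child
        return "overlapping"       # the case a single tree cannot represent directly

    sentence = (0, 47)    # a sentence of prose
    verse    = (30, 80)   # a verse line that begins inside the sentence
    quote    = (5, 20)    # a quotation wholly inside the sentence

    print(relation(sentence, quote))   # nested
    print(relation(sentence, verse))   # overlapping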

2.5. XML

So far, I have identified advantages that SGML and XML share. In fairness to my colleagues on the XML working group I should then try to answer the question, “Does XML add anything?” Well, it didn't add anything; it didn't add anything at all, and that's part of the point. One thing that Murata Makoto often says that I like a lot is that XML didn't innovate at all. It added nothing new. The only new things about XML were the things that weren't there. Because we took stuff out, we made it a lot easier to build an actual parser. We made it look trivial. (That's an illusion, by the way: it's not actually trivial. But by the time the programmer discovers that, they're too far in. [Roars of laughter & applause.]) With ten years of practice with ISO 8879 under our belts, we were ready to look at the feature set and pare away features that did not seem to be carrying their weight in terms of complexity. It felt in many ways like a redesign and it was exhilarating for that reason, but it's probably best to point out that in very salient respects it was not a redesign. When considering features and making the choice “Keep it, fix it, lose it”, you get to fix it only if you can fix it in a way that's backward-compatible. If fixing the feature would have meant breaking existing SGML data or SGML parsers, we didn't do that; we felt we couldn't do that. I think that was right, because that was what gave the SGML community the ability to adopt XML and give us our first cadre of early adopters.
The other thing that XML added is something that I'm not entirely certain I can explain. For reasons that I don't pretend to understand fully, once the document-processing community had finished simplifying SGML, to make it easier for us to do our work, we discovered that a lot of other people were interested in it as well. Database-makers suddenly were interested in marked-up documents. This comes, I'm sure, in part from internal pressures in the history of the database industry, which I can't tell you about. It may come in part from the observation that loose coupling was working very well in some contexts and the guess that it might help in the database context. But it gave us an enormous opportunity which we should not squander. For the first time, we have a wire format which is agreed on both by the people who have created documents for the last few decades and by the people who have been pouring data into SQL databases for the last few decades — accepted by both groups of people as a plausible form for their information. A single format, for all of the information in our organizations. That's the one thing that is newly interesting about XML. It's significant, perhaps, that it is a social fact as well as a technical fact.
Murata Makoto said — and this is what I think of with initial caps as Murata's Observation — “XML is interesting primarily and possibly only because it is a language plausible for use both with data and in documents.” It's ironic, in this context, that RELAX NG should be being embraced by some people as “the way to do it right for documents, and who cares about the database people.” I doubt that that was what Murata-san had in mind; at least, I think not.

3. What matters

There's one other sense in which the question “What matters?” is running through people's minds. I don't think any of us can have failed to ask ourselves that question in the weeks after September 11. It's important to remind yourself from time to time that some things are more important than others. Many of the things we spend our cycles on day-to-day matter a lot to us in the short run, but in 500 years who will remember anything we do or say at this conference, except perhaps one lone historian of technology, or religion [laughter], who infers from obscure references in contemporary sources that it was about this time that there emerged a new computational model that later revolutionized computer science; it was not based on Dijkstra's idea of Cooperating Sequential Processes, nor on Marvin Minsky's Society of Mind. It was based instead on Perry's Processing State of Nature (in which it is not the case that you are what you consume, but that you are what you produce).[14] [Laughter.] Who knows?
Nearly two hundred and fifty years ago, in 1755, there was a great earthquake in Lisbon, Portugal. It shook Europe in much the same way that Europe and America were shaken by September 11. People wondered aloud whether a divine punishment was intended, and if so, exactly which sins it was for. Many unquestioned ideals of the Enlightenment were cast into question; many found themselves doubting the idea that this is the best of all possible worlds. Voltaire scoffed at their ever having believed so in the first place, but you will recall that in the end, he concluded that whether this is the best of all possible worlds or not, it remains the one in which we find ourselves, and the one we have to make the best of. We may not be able to fix it — Voltaire was deeply sceptical of our ability to fix it, overall, since the designer responsible for it does not always seem to be interested in our suggestions for improving it, or at least doesn't answer email [laughter]. But maybe we can find one corner of it to cultivate and improve. Growing a garden may not improve the world overall, but it does satisfy the Categorical Imperative: “What would the world be like if everyone cultivated gardens?” and it keeps our hands busy and keeps us out of trouble.
Similarly, worrying about the use of relative URIs as namespace names, or about the proper use of the HTTP methods GET and POST, or about the topic naming convention and when it may safely be relaxed, or about the best way to deal with overlapping elements or concurrent hierarchies — all of this may not save the world from terrorism,[15] or from secret police tactics in the name of national security, or from an accidental ignition of rocket fuel in a Tralfamadorian test range.[16] But it may help us run our organizations more efficiently; it may help us preserve a little bit more of our cultural heritage in a form accessible to those who will come after us; it may help us in our endeavor to acquire and manipulate information, so that we may achieve knowledge, in the hopes that we will perhaps in the end receive wisdom.
If the decades that I have spent learning to use computers and learning how to apply markup languages to philology help others to produce better editions of Joyce, or Dante, or Chaucer, or Wolfram von Eschenbach, then — well, if I had my choice I would rather have been able to work on those editions myself. But since I can't, if what I do can help those editions, then my work will have been amply repaid.
In a novel I read many times in my youth, an aging professor of music says:[17]
To be honest, I for one have never in my life said a word to my students about the ‘meaning’ of music; if there is one, it has no need of me. On the other hand, I have always laid great stress upon their counting their eighth and sixteenth notes very precisely. Whether you become a teacher, or a scholar, or a musician, regard meaning with reverence, but do not mistake it for something that can be taught.
This has always stuck in my mind, perhaps because as an adolescent I thought it was really profound, perhaps because it is profound. I realized last night, thinking about it, that it is nothing but a literary echo of what may be the most famous sentence in 20th-century philosophy, the end of Wittgenstein's Tractatus, where he says, “The things we cannot talk about, we must pass over in silence.”
It is a good thing that from time to time we gather in conferences like this one to argue about whether milestone elements are a sound engineering solution to a problem or just a metaphysically ugly hack which damages orthogonality and (what is much, much worse) involves us in lying to the processor. It's good that we should investigate whether in practice validation really lives up to its promise. It's good that we should meditate on whether in such-and-such a circumstance our content models should properly be written with an AND or an OR connector.[18]
It is not that these things are so important in themselves — although Lord knows I think they are important, or else I don't think I would be here. It is through them that we can approach some of the things about which we cannot usefully speak in words, the things which really matter.

Notes

[1] “Der kategorische Imperativ ist also nur ein einziger und zwar dieser: handle nur nach derjenigen Maxime, durch die du zugleich wollen kannst, daß sie ein allgemeines Gesetz werde.” Immanuel Kant, Grundlegung zur Metaphysik der Sitten (1785).
[2] C. M. Sperberg-McQueen, David Dubin, Claus Huitfeldt, and Allen Renear, “Drawing inferences on the basis of markup”, paper at Extreme Markup Languages 2002.
[3] A widely recommended overview of modal logic is G. E. Hughes and M. J. Cresswell, A new introduction to modal logic (London, New York: Routledge, 1996).
[4] Here I should mention in particular John D. Heintz, “Versioned hyperdocuments”, Erik Hennum et al., “Specialization and modularization in the Darwin Information Typing Architecture”, Mary Nishikawa, “Organizing information in a corporate intranet: a use case for published and internal-use subjects in Topic Maps”, Robert Schmidt, “The domain of domains”, and José Luis Sierra et al., “An extensible and modular processing model for document trees”, all of them talks at Extreme Markup Languages 2002.
[5] Paul Prescod, “The REST/SOAP controversy”, talk at Extreme Markup Languages 2002.
[6] Wendell Piez, “Human and machine sign systems”, talk at Extreme Markup Languages 2002.
[7] The date 1957 is cited because that was when Noam Chomsky published Syntactic Structures (The Hague: Mouton) and put forward the idea of treating a language as a set of sentences, and assigning to grammar the task of distinguishing members of the set from non-members. The Chomsky hierarchy appears not to have made its appearance until 1959 in Noam Chomsky, “On certain formal properties of grammars”, Information and Control 2.2 (1959): 137-167.
[8] Bill Arms, opening keynote at SGML '96; this is the conference at which the design of XML was first publicly presented.
[9] Gavin Nicol, “Core range algebra”, talk at Extreme Markup Languages 2002.
[10] Jeni Tennison and Wendell Piez, “The Layered Markup and Annotation Language”, talk at Extreme Markup Languages 2002.
[11] Actually, the part of the story about the missing comma appears to be true. What is not true is that this error led to the loss of a rocket. The actual details are even more interesting than the version of the story usually told, and show NASA's software testing practices in a somewhat different light. According to one Fred Webb (quoted by the alt.folklore.computers “List of Frequently Asked Questions”, version of 10 November 1998, at http://wilson.best.vwh.net/faq/inicio.html), the typographic error (a period typed where a comma belonged) was discovered in 1963, when a Project Mercury orbit computation program was being checked for future use in Mission Control Center systems. (Pause for a moment now and meditate on this fact: a program which had already been successfully used in production was being tested again, to see if it was good enough to use for missions with slightly different conditions. How many organizations bother to re-test software merely because they have changed the requirements it's supposed to meet?) On test data with known answers, the program was found to produce results which were slightly off; after “a couple of weeks” of checking, the source was found in a line which had been entered not as “DO 10 I = 1, 10” but as “DO 10 I=1.10” — which the compiler correctly interpreted as meaning “DO10I = 1.10”. The results had apparently been close enough for the missions on which the program had been used.
The part about a rocket being lost is also true, though unconnected to the typographic error. The Atlas rocket intended to carry the space probe Mariner I to Venus was indeed destroyed owing to faulty software (interacting with a hardware failure) but the story is more complicated and cannot be blamed on the design of Fortran.
[12] A subroutine returned a value in pound-force seconds instead of newton-seconds (English rather than metric units of impulse), which threw certain critical output values off by a factor of 4.45.
[13] And it's one that received a lot of attention at Extreme this year. In addition to Nicol and Tennison/Piez, I have to mention Patrick Durusau, “Coming down from the trees: Next step in the evolution of markup?”
[14] Walter Perry, “Separate presentation from content: But how to distinguish them?”, argues that conventional descriptions of how to define validity of data streams are unhelpful in reality; in the ensuing discussion his view of the relation between processes was compared to Hobbes's description of the state of nature. I have slightly garbled the term communicating sequential processes for the sake of the humorous opposition. My attributing CSP to E. W. Dijkstra instead of to C.A.R. Hoare, on the other hand, was a mistake.
[15] Steven R. Newcomb, “Forecasting terrorism: An approach that meets the scaling requirements”.
[16] Those of you who are struggling to remember where you read about Tralfamadore may wish to re-read Kurt Vonnegut's Slaughterhouse Five.
[17] “Ich zum Beispiel habe, offen gestanden, meinen Schülern zeitlebens niemals ein Wort über den ‘Sinn’ der Musik gesagt; wenn es einen gibt, so bedarf er meiner nicht. Dagegen habe ich stets großen Wert darauf gelegt, daß meine Schüler ihre Achtel und Sechzehntel hübsch genau zählten. Ob du nun Lehrer, Gelehrter oder Musikant wirst, habe die Ehrfurcht vor dem ‘Sinn’, aber halte ihn nicht für lehrbar.” -Hermann Hesse, Das Glasperlenspiel (1943). Translation mine. In the Suhrkamp edition this passage is on page 128.
[18] B. Tommie Usdin, “When ‘It doesn't matter’ means ‘It matters’”, the opening keynote of Extreme Markup Languages 2002.