Does XML have a future on the Web?

Earlier this month, the opening session of the XML 2007 conference was a panel on the topic “Does XML have a future on the Web?” Doug Crockford (of JSON fame), Michael Day (of YesLogic), and I talked for a bit, and then the audience pitched in.

(By the way — surely there is something wrong when the top search result for “XML 2007 Boston” is the page on the 2006 conference site that mentions the plans for 2007, instead of a page from the 2007 conference site. Maybe people are beginning to take the winter XML conference for granted, and not linking to it anymore?)

Michael Day began by pointing out that in the earliest plans, the topic for the session had included the word “still”, which had made him wonder: “Did XML ever have a future on the Web?” He rather thought not: XML, he said, was yet another technology originally intended for the browser that ended up on the server, instead. No one serves XML on the Web, he said, and when they try to serve something as simple as XHTML, it’s not well-formed. (This, of course, is a simplistic caricature of his remarks, which were a lot more nuanced and thoughtful than this. But of course, while he was speaking I was trying to remember what I was supposed to say; these are the bits of his opening that penetrated my skull and stuck.)

Doug Crockford surprised me a bit; from what I had read about JSON and his sometimes fraught relations with XML, I had expected him to answer “No” to the question in the session title. But he began by saying quite firmly that yes, he thought XML had a very long future on the Web. He paused while we chewed on that a moment, before putting in the knife. We know this, he said, because once any technology is deployed, it can take forever to get rid of it again. (You can still buy Cobol compilers, he pointed out.) If I understood him correctly, his view is that XML (or XHTML, or the two together with all their associated technologies) has been a huge distraction for the Web community, and nothing to speak of has been done on HTML or critical Web technologies for several years as a result. We need, he thought, to rebuild the Web from its foundations to improve reliability and security.

It gives me some regret now that I did not interrupt at this moment to point out that XHTML and XForms are precisely an effort (all in all, a pretty good one) to improve the foundations of the Web, but I wasn’t quick enough to think of that then. (I also didn’t think to say that being compared to Grace Murray Hopper, however indirectly and with whatever intention, is surely one of the highest compliments anyone has ever paid me. Thank you, Doug!) And besides, it’s bad form to interrupt other panelists, especially when it’s your turn to speak next.

Since I have cut so short what Michael Day and Doug Crockford said, I ought in perfect fairness to truncate my own remarks just as savagely, so the reader can evaluate what we said on some sort of equal footing. But this is my blog, so to heck with that.

Revised slightly for clarity, my notes for the panel read something like the following (I have added some material in italics, either to reflect extempore additions during the session or to reflect later pentimenti). I’d like to have given some account of the ensuing discussion, as well, but this post is already a bit long; perhaps in a different post.

I agree with Doug Crockford in answering “Yes” to the question, but we have different reasons. I don’t think just that XML has a future because we can’t manage to get rid of it; I think it ought to have a future, because it has some properties it’s hard to find elsewhere.

1 What do we mean by “the Web”?

A lot depends on what we mean by “the Web”. If we mean Web 2.0 Ajax applications, we may get one answer. If we mean the universe of data publicly accessible through HTTP, the answer might be different. But neither of these, in reality, is “the Web”.

If there is a single central idea of the Web, it’s that of a single connected information space that contains all the information we might want to link to — that means, in practice, all the information we care about (or might come to care about in future): not just publicly available resources, but also resources behind my enterprise firewall, or on my personal hard disk. If there is a single technical idea at the center of the Web, it’s not HTTP (important though it is) but the idea of the Uniform Resource Identifier, a single identifier space with distributed responsibility and authority, in which anyone can name things they care about, and use their own names or names provided by others, without fear of name collisions.

Looked at in this way, “the Web” becomes a rough synonym for ‘data we care about’, or ‘the data we process, store, or manage using information technology’. And the question “Does XML have a future on the Web?” becomes another way of asking “Does XML have a future?”

Not all parts of the Web resemble each other closely. In some neighborhoods, rapid development is central, and fashion rules all things. In others, there are large enterprises for whom fashion moves more slowly, if at all. Data quality, fault tolerance, fault detection, reliability, and permanence are crucial in a lot of enterprises.

The Web is for everyone. So a data format for the Web has to have good support for internationalization and accessibility.

Any data format for “the Web” must satisfy a lot of demands beyond loading configuration data or objects in a client-side Javascript program. As Murata Makoto has often said, one reason to be interested in XML is that it offers us the possibility of managing in a single notation data that for a long time we held separately, in databases and in documents, managed by separate tool sets. General-purpose tools are sometimes cumbersome for particular specialized forms of data, but the provision of a common model and notation is a huge win; before I decide to use another specialized notation, I want to think hard about the costs of yet another notation.

I think XML has a future on the Web because it is the only format around that can plausibly be used for such a broad range of different kinds of data.

2 Loose coupling, tight coupling

One of the important technical properties of the Web is that it encourages a relatively loose coupling between parts of the larger system. Because the server and the client communicate through a relatively narrow channel, and because the HTTP server is stateless, client and server can develop independently of each other.

In a typical configuration there are lots of layers, so there are lots of points of flexibility, lots of places where we can intervene to process requests or data in a different way. By and large, the abstractions are not very leaky, so we can change things at one layer without disturbing (very much) things in the adjoining layers.

In information systems, as in physical systems [or so I think — but I am not a mechanical engineer], loose couplings incur a certain efficiency cost, and systems with tighter couplings are often more efficient. But loose coupling turns out to be extremely useful for allowing diverse communities to satisfy diverse needs on the Web. It turns out to be extremely useful in allowing the interchange of information between unlike devices: if the Web had tighter coupling, it would be impossible to provide Web access to new kinds of devices. And, of course, loose coupling turns out to be a good way of allowing a system to evolve and grow.

One of the secrets of loose coupling is not to expose more information between the two partners in information exchange than you want to.

And in this context, some of the notations sometimes offered as alternatives to XML (at least in some contexts) — or for that matter, as uses of XML — have always made me nervous. We’re building a distributed system; we want to exchange information between client and server, while limiting their mutual dependencies, so that we can refactor either side whenever we need to. And you want me to expose my object structures?! Are you out of your mind? In polite company there is such a thing as too much information. And exhibiting my object structures for the world to see is definitely a case of too much information. I don’t want to see yours, and I don’t want you to see mine. Sorry. Let’s stick to the business at hand, and leave my implementation details out of it.

So, second, I think XML has a future on the Web because (for reasons I think are social as much as technical) the discipline of developing XML vocabularies has a pretty good track record as a way of defining interfaces with loose coupling and controlled exposure of information.
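
To make the point about controlled exposure concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `_OrderRecord` class, its fields, and the little `order` vocabulary are assumptions, not anything discussed on the panel. The idea is only that the server maps its internal object onto a deliberately designed vocabulary, so bookkeeping fields never reach the wire.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class _OrderRecord:
    """Hypothetical server-side object; its fields are implementation details."""
    db_row_id: int
    customer_ref: str
    item_skus: list = field(default_factory=list)
    cache_dirty: bool = False  # internal bookkeeping, never meant for clients


def to_public_xml(order: _OrderRecord) -> str:
    """Map the internal object onto a small, deliberately designed vocabulary."""
    root = ET.Element("order")
    ET.SubElement(root, "customer").text = order.customer_ref
    items = ET.SubElement(root, "items")
    for sku in order.item_skus:
        # Only the SKU is exposed; db_row_id and cache_dirty stay private.
        ET.SubElement(items, "item", sku=sku)
    return ET.tostring(root, encoding="unicode")


order = _OrderRecord(db_row_id=42, customer_ref="C-1001", item_skus=["A1", "B2"])
public_doc = to_public_xml(order)
print(public_doc)
```

Because the client sees only the vocabulary, the server side could rename `db_row_id` or drop `cache_dirty` entirely without the exchanged document changing at all.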

3 Publication without lossy down-translation

There were something like two hundred people actively involved in the original design of XML, and among us I wouldn’t be surprised to learn that we had a few hundred, or a few thousand, different goals for XML.

One goal I had, among those many, was to be able to write documents and technical papers and essays in a descriptive vocabulary I found comfortable, and to publish them on the Web without requiring a lossy down-translation into HTML. I made an interesting discovery a while ago, about that goal: we succeeded.

XML documents can now be read, and styled using XSLT, by the large majority of major browsers (IE, Mozilla and friends, Opera, Safari). It’s been months since I had to generate an HTML form of a paper I had written, in order to put it on the Web.
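
As a rough illustration of what this looks like in practice, the sketch below uses Python's standard `xml.dom.minidom` to write a document that carries an `xml-stylesheet` processing instruction, which is the usual way of pointing a browser at an XSLT stylesheet. The `paper` and `title` element names and the `paper.xsl` file name are placeholders chosen for the example, and `type="text/xsl"` is assumed here as the value browsers commonly accept.

```python
from xml.dom import minidom


def paper_with_stylesheet(title: str, stylesheet_href: str) -> str:
    """Build a tiny descriptive-markup document that names its own XSLT stylesheet."""
    doc = minidom.Document()
    # The processing instruction must come before the document element.
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/xsl" href="{stylesheet_href}"'
    )
    doc.appendChild(pi)
    paper = doc.createElement("paper")  # hypothetical vocabulary
    doc.appendChild(paper)
    title_el = doc.createElement("title")
    title_el.appendChild(doc.createTextNode(title))
    paper.appendChild(title_el)
    return doc.toxml()


xml_text = paper_with_stylesheet("Does XML have a future on the Web?", "paper.xsl")
print(xml_text)
```

The stylesheet itself is not shown; the point is only that the published file stays in the author's own vocabulary, and the translation to something displayable happens in the reader's browser.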

I know XML has a future on the Web because XML makes it easier for publishers to publish rich information and for readers to get richer information. No one who cares about rich information will ever be willing to go back. XML will go away only after you rip it out of my cold, dead hands.

[After the session, Norm Walsh remarked “and once they’re done with your cold dead hands, they’ll also have to pry it out of mine!”]


One reason to think that XML has found broad uptake is the sheer variety of people complaining about XML and the contradictory nature of the problems they see and would like to fix. For some, XML is too complicated and they seek something simpler; for others, XML is too simple, and they want something that supports more complex structures than trees. Some would like less draconian error handling; others would like more restrictive schema languages.

Any language that can accumulate so many different enemies, with such widely different complaints, must be doing something right. Long life to descriptive markup! Long life to XML!