International versioning symposium (Balisage report 1)

[11 August 2008]

The Balisage 2008 conference begins tomorrow. Today, there is a one-day pre-conference symposium on Versioning XML Vocabularies and Systems. So far (this version of this post is being written just after lunch), I’ve learned a lot.

Peter Brown of Pensive spoke about the nature of versions (and thus of the versioning problem) as social constructs. Plato and Aristotle thought of reality as having natural joints, and of a good conceptual apparatus as cutting reality at those joints; that would suggest that there is typically some natural distinction between x (for any x) and not-x. Nowadays, Brown suggested, things look different. A look at French and American butchers’ charts shows that even the cuts of meat are culturally variable. But if even real joints don’t necessarily determine how the cow is cut, how much less will the joints of reality (as we understand reality) fully determine an ontology. Why, Brown asked, do we identify some things as versions of other things? What is the point of the identification; what is particular about the points at which the term is applied? Are versions used for identifying particular content? for tracking evolution? for tracking differences? for identifying canonical forms of variable material? The answer, essentially, is “yes”: versions are used for all of the above. That is to say, our use of the term version is culturally and functionally determined. He offered a number of useful bits of advice, most saliently perhaps the advice: “If anyone mentions versioning, assume nothing! In particular, don’t assume that you know what they mean.”

David Orchard talked about the fundamental problems of versioning from a down to earth point of view, with particular emphasis on the issues of forwards and backwards compatibility, their definition in terms of the sets of documents produced and accepted by various agents, and their effect on deployment of protocols and software. He talked about specific things vocabulary designers can do to make it easier to extend and evolve their languages.

A couple of points made in the discussion after David’s talk may be worth recording. He had pointed to the difficulty some schema authors experience in writing a schema which allows extension in all the right places, while still making clear what sort of messages are actually expected in the absence of extension; there is no law, however, that says a language or a protocol spec must be defined by a single schema. If you want producers of v.1 messages to produce only a particular kind of messages, and v.1 consumers to accept all of those messages, but also a larger set of messages (so that they will react calmly when confronted with v.2 messages), then you could do worse than to define two schemas, one for the producer language and one for the acceptor language.
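As a sketch of that two-schema idea (all element and file names here are invented for illustration, not taken from any real vocabulary): the producer schema pins down exactly what a v.1 producer may emit, while the acceptor schema describes the same content plus a lax wildcard, so that a v.1 consumer validates everything the producer sends but also tolerates elements added in later versions.

```xml
<!-- v1-producer.xsd (hypothetical): the strict producer language. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="msg">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id"   type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- v1-acceptor.xsd (hypothetical): the same content model, followed by
     a lax wildcard so that trailing v.2 additions are tolerated rather
     than rejected. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="msg">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id"   type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
        <xs:any processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because the required `id` and `body` elements precede the wildcard, the acceptor's content model stays deterministic; the v.1 producer set is, by construction, a subset of the v.1 acceptor set.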

Marc de Graauw picked up the notion of compatibility, and the idea of defining compatibility in terms of set theory, and gave a thorough exposition of how they interrelate. He gave short shrift to the idea of defining compatibility in terms of a formal logic like (for example) OWL; it’s way too complicated to contemplate a serious logical definition of something as complicated as a real-life vocabulary like HL7 v3. Also, definitions like that are hard to reconcile with the idea of consumers or producers of messages as black boxes. Instead, he suggested defining the semantics of a vocabulary in terms of behavior. His slides pushed Venn diagrams for message sets just about as far as they can go.
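The set-theoretic framing can be sketched in a few lines of code. This toy model is mine, not Marc's: it enumerates message sets explicitly, whereas real message sets are defined intensionally by schemas. The containment tests are the point.

```python
# Toy model: compatibility between producers and consumers as set containment.
# Real message sets are defined by schemas, not enumerated; the names here
# ("a", "b", ...) just stand in for distinct kinds of message.

def is_compatible(produced, accepted):
    """A producer/consumer pair interoperates if every message the
    producer can emit is one the consumer will accept."""
    return produced <= accepted

v1_produced = {"a", "b"}
v1_accepted = {"a", "b", "c"}           # v1 consumers tolerate some extension
v2_produced = {"a", "b", "c"}           # v2 producers add a new message kind
v2_accepted = {"a", "b", "c", "d"}

# Backward compatibility: v2 consumers accept everything v1 producers emit.
backward = is_compatible(v1_produced, v2_accepted)
# Forward compatibility: v1 consumers accept everything v2 producers emit.
forward = is_compatible(v2_produced, v1_accepted)

print(backward, forward)  # True True
```

If v1 consumers had not left room for extension (i.e. if `v1_accepted` equalled `v1_produced`), the forward-compatibility test would fail, which is exactly the deployment problem the wildcard-style extension points are meant to avoid.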

Laurel Shifrin of LexisNexis gave the first of a series of case studies, describing the various mechanisms LexisNexis has adopted to try to manage vocabulary change in an extremely heterogeneous and decentralized organization. Answer: it’s not easy, and you really do benefit from executive buy-in. (Again, there was a lot more; I should try to elaborate this description further, later on.)

Debbie Lapeyre spoke about a vocabulary for (mostly biomedical) journal content used by the U.S. National Library of Medicine. The perceived stability and backwards-compatibility guarantees of the vocabulary were among the key reasons for its success and its wide adoption in a larger community. But over time, patience with the shortcomings and outright errors in the vocabulary waned, and even the user community was asking for a revision which would include incompatible changes. I had the impression that Debbie was a little surprised that the members of the responsible group were not lynched, glad that they weren’t, and relieved that the new version can repair some problems that have been on her conscience as a vocabulary designer for a while.

Ken Holman talked about the way versioning problems are handled in UBL (the Universal Business Language). UBL has moved to a two-layer validation story: the only normative definition of UBL documents is given by the XSD schema documents, and business rules are enforced in a separate business-rule validation step (rather than by customizing local versions of the schema documents to incorporate business rules). Perhaps the most surprising restriction, for some auditors, may be the UBL policy that its schema documents will never use xsd:choice constructions. Life is made harder by the fact that many consumers of UBL documents appear (by Ken’s account) to use systems which simply will not, or cannot, show their users invalid documents. It’s hard to make informed decisions about how to recover from a problem if you’re using a system that won’t show you the problematic data. Fortunately, UBL has some ingenious people.
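To make the two-layer idea concrete, here is an invented example of the kind of rule a second, business-rule validation pass can enforce and that a grammar-based schema states awkwardly or not at all. It is written in Schematron style; the element names are made up for illustration and are not taken from UBL.

```xml
<!-- Illustrative second-pass rule (hypothetical elements, not real UBL):
     structural validity comes from the XSD pass; value-level business
     constraints are checked separately, in a pass like this one. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="Invoice">
      <assert test="number(TotalAmount) &gt;= 0">
        An invoice total must not be negative.
      </assert>
      <assert test="not(DueDate &lt; IssueDate)">
        An invoice must not fall due before it is issued.
      </assert>
    </rule>
  </pattern>
</schema>
```

Keeping such rules out of the schema documents means the normative grammar can stay stable across deployments while each trading community layers its own constraints on top.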

Digital Humanities 2008

After two days of drizzle and gray skies, the sun came out on Saturday to make the last day of Digital Humanities 2008 memorable and ensure that the participants all remember Finland and Oulu as beautiful and not (only) as gray and wet and chilly. Watching the sun set over the water, a few minutes before midnight, by gliding very slowly sideways beneath the horizon, gave me great satisfaction.

The Digital Humanities conference is the successor to the annual joint conference of the Association for Computers and the Humanities (ACH) and the Association for Literary and Linguistic Computing (ALLC), now organized by the umbrella organization they have founded, which, in a bit of nomenclature worthy of Garrison Keillor, is called the Alliance of Digital Humanities Organizations.

There were a lot of good papers this year, and I don’t have time to go through them all here, since I’m supposed to be getting ready to catch the airport bus. So I hope to do a sort of fragmented trip report in the form of follow-up posts on a number of projects and topics that caught my eye. A full-text XML search engine I had never heard of before (TauRo, from the Scuola Normale Superiore in Pisa), bibliographic software from Brown, and a whole long series of digital editions and databases are what jump to mind now, in my haste. The attendance was better than I had expected, and confirmed what some have long suspected: King’s College London has become the 800-pound gorilla of humanities computing. Ten percent of the attendees had King’s affiliations, there was an endless series of reports on intelligently conceived and deftly executed projects from King’s, and King’s delegates seemed to play a disproportionately large role in the posing of incisive questions and in the interesting parts of discussions. There were plenty of good projects done elsewhere, too, but what Harold Short and his colleagues have done at King’s is really remarkable. Someone interested in how institutions are built up to eminence (whether as a study in organizational management or because they want to build up an organization of their own) should really do a study of how they have gone about it.

As local organizer, Lisa-Lena Opas-Hänninen has done an amazing job, and Espen Ore’s program committee deserves credit for a memorable program. Next year’s organizers at the University of Maryland in College Park have a tough act to follow.

Balisage preliminary program

The Balisage 2008 program committee met this week, and we now have a preliminary program. (As I write, the schedule-at-a-glance isn’t up yet, but it really should be up any moment now.)

We’ve got a good range of topics, with talks on resource-oriented architectures, a framework for managing constraints, how markup relates to the philosophical problem of universals, and how to handle overlap in a format that makes good use of the non-overlapping properties of XML. There will also be a parallel algorithm for XML parsing and validation that exploits multi-core machines, an XML exchange for messaging that uses XQuery to route messages, structural metadata as a socio-technological problem, and using the Atom Publishing Protocol to build dynamic applications.

The day before Balisage begins, there will be a one-day symposium on versioning issues, which has also shaped up very nicely. (More on that in another post.)

Great topics, great people (I’ve heard a lot of these speakers, and I know they’re good), and Montreal in August. What more could you want?

So think hard about being in Montreal the week of 11-15 August, for the versioning symposium and for Balisage 2008. I look forward to seeing people there!

Balisage — the paper deadline nears

Two weeks to go before the Balisage paper submission deadline. That’s far enough away that with a little luck and some late nights, you can still get a draft paper put together that is complete enough for review.

But it’s close enough that you have to start now. The Balisage review process has a very tight schedule and long extensions are not an option.

(Enrique interrupts me at this point to remind me that not everyone in the world — not even every reader of this blog — will necessarily know what Balisage is. He’s right.)

Balisage is a new conference on markup and its applications, organized by a group which has, in past years, worked together as the organizing committee for Extreme Markup Languages and a variety of other conferences organized by IDEAlliance (and, before that, by the Graphic Communications Association). It will take place this August in Montréal. To be very specific, it will take place 12-15 August, with a pre-conference symposium on 11 August.

Any topic related to the theory or practice of markup (preferably both! there is nothing as practical as a good theory! there is nothing so theoretically fruitful as thoughtful practice!) is in scope. We are hoping for papers on vocabulary design, XQuery, XSLT, SGML, XML, alternatives to XML, overlapping hierarchies, general problems of knowledge engineering and information management, topic maps, OWL, RDF, UBL, validation, the futility of validation, the utility of validation despite its futility, using markup for graphics, SVG, XSL-FO, ontologies, content management, markup theory, information structure, and more. Theory, case studies, developer reports, all are in scope.

Last year, the day before Extreme Markup Languages we organized a one-day symposium on the handling of overlapping structures, which was hugely informative. This year, there will be a one-day symposium on versioning, covering any and all issues relating to revision, variation, and managing change in XML documents and vocabularies. I confidently expect to learn a lot, and I hope you will, too. But for that to happen, you have to attend.

I’m one of the organizers, so I have a vested interest in making you, dear Reader, think that Balisage is an interesting and informative conference to attend, and a cool place to be, and a wonderful place to get your interesting new idea heard by smart people who care and know about markup.

So don’t take my word for it: check out the Web site and ask around, and decide for yourself. More info at http://www.balisage.net.

I hope to hear from you soon.