XML for the Long Haul

[23 February 2010]

The organizing committee for Balisage 2010 have announced the topic of this year’s one-day pre-conference symposium: “XML for the Long Haul: Issues in the Long-term Preservation of XML,” and have issued a call for participation. The basic question is roughly this: what do we need to do to make sure that XML-encoded data is usable for long periods? Descriptive markup was invented by people who needed and desired longevity, application independence, and device independence for their data, so longevity is often used as a selling point for the use of SGML and XML. And as sometimes happens with selling points, the precise nature of the relation between XML and long-lived data is sometimes obscured, to the point where some potential customers may believe they have been told that the use of XML in itself guarantees data longevity. And maybe they have, but not (I should think) by anyone who knows what they are talking about. The use of XML, or more generally descriptive markup, may be a necessary condition of data longevity, but it’s unlikely to be sufficient, just as a hammer may be necessary (or extremely helpful) in getting a nail driven, but buying a hammer does not by itself get the nail into the wall.

There’s a lot to be said about the facets and ramifications of the topic, but I think I’ll save those for later posts. For now, I’ll just say that I’ll be chairing the symposium this year, and I hope to see readers of this blog in Montréal in August.

[My evil twin Enrique had been tugging my elbow for some time, and now asked, “So why is the logo a moving truck? Will non-native speakers of English understand the reference?” I don't know (but if you do, I'm interested to learn: native speakers of other languages, please speak up! Does the logo make sense outside of English?), but I can at least explain. The English phrase “long haul” refers most literally to long distances, especially for the transport of freight or people (as in “long-haul flights” and “long-haul trucking”). In an extended sense (originally metaphorical, I guess) it denotes a protracted or difficult task (“we're in it for the long haul”) or an extended period of time. Long-term preservation of data and meaning involves a long haul both in the sense of being a difficult task and in the sense of involving a long period of time. “Oh,” said Enrique. “I get it! The logo is a truck used for long-haul freight transport, the way XML may be used for long-haul preservation of information. Don't you think you should explain that somewhere?” “Maybe,” I said. Maybe I should.]

ACH and ALLC co-sponsoring Balisage

[12 February 2010]

The Association for Computers and the Humanities and the Association for Literary and Linguistic Computing have now signed on as co-sponsors of the Balisage conference held each year in August in Montréal. They join a number of other co-sponsors who also deserve praise and thanks, but I’m particularly happy about ACH and ALLC because they have provided such an important part of my intellectual home over the years.

Balisage will take place Tuesday through Friday, 3-6 August, this year; on Monday 2 August there will be a one-day pre-conference symposium on a topic to be announced real soon now. It’s a conference for anyone interested in descriptive markup, information preservation, access to and management of information, accessibility, device independence, data reuse — any of the things that descriptive markup helps enable. The deadline for peer review applications is 19 March; the deadline for papers is 16 April. Time to start thinking about what you’re going to write up; you don’t want to be caught short at the last minute, without time to work out your idea properly.

Mark your calendars!

Metadata and search – a concrete example

[18 August 2009]

Here’s a concrete example of the difference between the metadata-aware search we would like to have, and the metadata-oblivious full-text search we mostly have today, encountered the other day at the Balisage 2009 conference in Montréal.

Try to find a video of the song “I don’t want to go to Toronto”, by a group called Radio Free Vestibule.

When I search video.google.com for “I don’t want to go to Toronto”, I get, in first place, a song called “I don’t want to go”, performed live in Toronto. When I put quotation marks around the title, it tells me nothing matches and shows me a video of Elvis Costello singing “I don’t want to go to Chelsea”.

It’s always good to have concrete examples, and I always like real ones better than made-up examples. (Real examples do often have a disconcerting habit of bringing in one complication after another and involving more than one problem, which is why good ones are so hard to find. But I don’t see many extraneous complications in this one.)
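For readers who like to see such things spelled out, here is a minimal sketch in Python of the two kinds of search. Everything in it is an assumption made for illustration: the three catalog records are invented (loosely modeled on the results described above), the names `Video`, `full_text_search`, and `title_search` are hypothetical, and nothing here claims to reflect how any real video search engine works. The metadata-oblivious search treats each record as a bag of words; the metadata-aware search knows which field is the title.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    description: str


# Invented records for illustration only.
CATALOG = [
    Video("I Don't Want to Go", "Performed live in Toronto"),
    Video("I Don't Want to Go to Chelsea", "Elvis Costello"),
    Video("I Don't Want to Go to Toronto", "Radio Free Vestibule"),
]

QUERY = "I don't want to go to Toronto"


def words(text):
    """Lower-case the text, drop apostrophes, and return its set of words."""
    return set(text.lower().replace("'", "").split())


def full_text_search(query, catalog):
    """Metadata-oblivious: every query word must occur somewhere in the record."""
    wanted = words(query)
    return [v for v in catalog if wanted <= words(v.title + " " + v.description)]


def title_search(query, catalog):
    """Metadata-aware: the query must match the title field (ignoring case)."""
    return [v for v in catalog if v.title.lower() == query.lower()]


if __name__ == "__main__":
    print([v.title for v in full_text_search(QUERY, CATALOG)])
    print([v.title for v in title_search(QUERY, CATALOG)])
```

In this toy catalog the full-text search returns the song performed live in Toronto alongside the one actually wanted, because “Toronto” appears in the description; the title search returns only the record whose title matches.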