Even imperfect technologies …

[17 November 2010, Brussels]

“Even imperfect technologies can change the world.”
-Hans Uszkoreit, at the META-FORUM meeting in Brussels today

XForms and XQuery tutorials at TEI members’ meeting

[23 August 2010]

The TEI has published a list of workshops to be offered at the TEI Members’ Meeting this November in Zadar, Croatia.

Together with Syd Bauman of Brown University, I’m offering two tutorial workshops: one on XForms and one on XQuery. Each will last a day and a half, and involve some talking heads, some group discussion, and as much hands-on work as we can manage.

There are several other very good workshops on offer: Norm Walsh on XProc, the TEI@Oxford team on the ODD system, Elena Pierazzo and Malte Rehbein on the encoding of genetic editions, and Andreas Witt et al. on TEI for transcriptions of speech.

The organizers remind me that there is an early-bird discount for those who register before 31 August. Tutorials that fail to attract enough participants may be canceled, so if you definitely want to come, you definitely want to register early, to help make sure your tutorial has enough registrants to make the cut.

Counting down to Balisage paper submission deadline

[6 April 2010; addenda 7 April 2010]

After discovering earlier this year that the definition of the XPath 1.0 data model falls short of the goal of guaranteeing the desired properties to all instances of the data model, I’ve been spending some time experimenting with alternative definitions, trying to see what must be specified a priori and what properties can be left to follow from others.

It’s no particular surprise that the data model can be defined in a variety of different ways. I’ve worked out three with a certain degree of precision. Here is one, which is not the usual way of defining things. For simplicity, it ignores attributes and namespace nodes; it’s easy enough to add them in once the foundations are a bit firmer.

Assume a non-empty finite set S and two binary relations R and Q on S, with the following properties [Some constraints are shown here as deleted: they were included in the first version of this list but later proved to be redundant; see below]:

  1. R is functional, acyclic, and injective (i.e. for any x and y, R(x) = R(y) implies x = y).
  2. There is exactly one member e of S which is not in the domain of R (i.e. R(e) has no value), and exactly one member g which is not in the range of R (i.e. for no element f do we have g = R(f)).
  3. Q is transitive and acyclic.
  4. The transitive reduction of Q is functional and injective.
  5. It will be observed that R essentially places the elements of S in a sequence without duplicates. For all elements e, f, g, h of S, if Q includes the pairs (e, f) and (g, h) and if g falls between e and f in the sequence defined by R (or, more formally, if the transitive closure of R contains the pairs (e, f), (e, g), and (g, f)), then h also falls between e and f in that sequence.
  6. The transitive closure of the inverse of R (i.e. R⁻¹*) contains Q as a subset.
  7. The single element of S which is not in the domain of R is also neither in the domain nor the range of Q.
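For readers who want to try the exercise on small cases, the constraints can be transcribed fairly literally into executable checks. The Python sketch below is my own illustration, not part of the definition itself: relations are plain sets of pairs, the helper names (`transitive_closure`, `transitive_reduction`, `check_constraints`) are hypothetical, the closure is a naive fixpoint that is only sensible for tiny sets, and the transitive reduction is computed the usual way for finite acyclic relations (keep a pair only if no intermediate element links its two ends). Constraint 5 is checked using its formal wording as given.

```python
# Sketch: executable checks for the seven constraints on (S, R, Q).
# Relations are Python sets of (x, y) pairs; all names here are hypothetical.


def inverse(rel):
    """The inverse relation: (y, x) for every (x, y)."""
    return {(y, x) for (x, y) in rel}


def transitive_closure(rel):
    """Smallest transitive relation containing rel (naive fixpoint; fine for tiny S)."""
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step


def is_functional(rel):
    """No x is related to two different y."""
    image = {}
    for x, y in rel:
        if image.setdefault(x, y) != y:
            return False
    return True


def is_injective(rel):
    """R(x) = R(y) implies x = y, i.e. the inverse relation is functional."""
    return is_functional(inverse(rel))


def is_acyclic(rel):
    """No element reaches itself via the transitive closure."""
    return all(a != b for (a, b) in transitive_closure(rel))


def transitive_reduction(rel):
    """For a finite acyclic relation: keep closure pairs with no intermediate element."""
    closure = transitive_closure(rel)
    nodes = {x for pair in closure for x in pair}
    return {(a, c) for (a, c) in closure
            if not any((a, b) in closure and (b, c) in closure for b in nodes)}


def check_constraints(S, R, Q):
    """Map each constraint number (1-7) to True if (S, R, Q) satisfies it."""
    R_plus = transitive_closure(R)
    not_in_domain = S - {x for (x, _) in R}
    not_in_range = S - {y for (_, y) in R}
    Q_red = transitive_reduction(Q)
    return {
        1: is_functional(R) and is_acyclic(R) and is_injective(R),
        2: len(not_in_domain) == 1 and len(not_in_range) == 1,
        3: transitive_closure(Q) == set(Q) and is_acyclic(Q),
        # Only meaningful when Q is acyclic (constraint 3).
        4: is_functional(Q_red) and is_injective(Q_red),
        # Literal reading of the formal wording of constraint 5.
        5: all((e, h) in R_plus and (h, f) in R_plus
               for (e, f) in Q for (g, h) in Q
               if {(e, f), (e, g), (g, f)} <= R_plus),
        6: set(Q) <= transitive_closure(inverse(R)),
        7: len(not_in_domain) == 1
           and all(x not in pair for x in not_in_domain for pair in Q),
    }


# Usage: four elements in a chain; here R maps each element to its predecessor.
S = {0, 1, 2, 3}
R = {(1, 0), (2, 1), (3, 2)}
print(check_constraints(S, R, {(1, 2)}))
# {1: True, 2: True, 3: True, 4: True, 5: True, 6: True, 7: True}
```

In the usage lines, having R map each element to the one before it is an assumption of the sketch; the list above fixes the orientation only indirectly, through constraints 6 and 7. Brute-force checks like these are a convenient way to hunt for redundant constraints: drop one, search small triples for a counterexample, and see whether anything breaks.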

It turns out that if we have any relations R and Q defined on some set S with these properties, then we have an instance of the XPath 1.0 data model. The nodes in the model instance, the axes defining their interrelations, and so on can all be defined in terms of S, R, and Q.

[It also turns out that several of the constraints included in the list above are redundant. The fact that relation R is functional and injective, for example, follows from the others shown. Actually it follows from a subset of them. The deletions above show one way of reducing the number of a priori constraints: they all follow from the others and can be dropped. None of the remaining items follows from the others; if any of them are deleted, the constraints no longer suffice to ensure the properties required by XPath.]

For the moment, I’ll leave the details as an exercise for the reader. (I also realize, as I’m about to click “Publish”, that I have not actually checked to see whether the set of constraints given above is minimal. I started with a short list and added constraints until S, R, and Q sufficed to determine a unique data model instance, but I have not checked to see whether any of the later additions rendered any of the earlier additions unnecessary. So points for any reader who identifies redundant constraints in the list given above.)

[9 April 2010]

Just a week to go before paper submissions for Balisage 2010 are due. Time for the procrastinators, the delayers, the mañana-sayers to buckle down and get their papers written and submitted.

Speaking of which, I have some work to do now K thx bye.

XML Prague is over

[14 March 2010]

XML Prague took place yesterday and today, and I’m still coming down from the adrenaline high. There were lots of good talks in a single-track conference; I had the task of trying to knit them all together in my closing remarks.

I’ll do a fuller trip report later, if working on my taxes doesn’t prevent it. But Norm Walsh has already asked that I post at least the last bit of my remarks.

[Readers who did not attend the conference or follow the streaming video need to know that in the opening sessions of the two conference days, Tony Graham and Sharon Adler had each referred at critical moments to the book of Genesis. One of Sharon’s slides bore the title “In the beginning was SGML,” and Tony began his talk on Saturday with an extended reference to the book of Genesis: “In the beginning was the page. And the page was without form and void, …” Sharon wondered aloud whether people involved with descriptive markup all have God complexes; she may have something there. The text below also has some other references to things said at the conference, but I think for now I’ll spare readers the detailed annotation necessary to explain them all. Apologies in advance to any readers made uncomfortable by parodies of scripture.]

At the end of my talk, I described wandering through the streets of Prague, trying to find my way from the Strahov Monastery, where the conference dinner was held, back to my hotel, when suddenly I had a vision — one might almost say, a revelation.

And I saw in the right hand of him that sat on the throne an XML document canonicalized and serialized with EXI, containing seven pi-trees encrypted with seven public-key encryption key pairs.

And I saw a strong angel proclaiming with a loud voice, Who is worthy to parse the XML document, and to decrypt the encryption thereof?

And no one in heaven, nor in earth, neither under the earth, was able to parse the XML document, neither by buffering it in memory nor by processing it in streaming mode.

And I wept much, because no man was found worthy to parse and to process the XML document, neither to exploit its vocabulary-specific semantics nor to perform vocabulary-independent pretty-printing thereon.

And one of the elders saith unto me, Weep not: behold, the standards-compliant XML application, with support for C14n and EXI, for XML encryption and schema validation, and for interoperable stylesheet technologies like XSLT 17.2 and XSL FO 42.0, hath prevailed to open the XML document, and to decrypt the seven encryptions thereof, and to display it coherently, yea even for those who abhor the sight of angle brackets and prefer beautiful well-formatted text with tasteful images.

And every creature which is in heaven, and on the earth, and under the earth, and such as are in the sea, and all that are in them, heard I saying, Blessing, and honour, and glory, and power, be unto those that preserve data, and provide access to information, if not for ever and ever, then at least for the foreseeable future.

And the presentations, and the coffee breaks, were the second day.

And the nine conference organizers said, Amen. And the representatives of the two gold sponsors, and the four silver sponsors, and the four bronze sponsors, and the five media partners, and the three sister events, stood up and said Amen. And the one hundred and forty-four conference participants clapped their hands and thanked the conference organizers for a great conference focusing on information that shall outlive the applications which create and process it.

XML for the Long Haul

[23 February 2010]

The organizing committee for Balisage 2010 have announced the topic of this year’s one-day pre-conference symposium: “XML for the Long Haul: Issues in the Long-term preservation of XML,” and have issued a call for participation. The basic question is roughly this: what do we need to do to make sure that XML-encoded data is usable for long periods? Descriptive markup was invented by people who needed and desired longevity, application independence, and device independence for their data, so longevity is often used as a selling point for the use of SGML and XML. And as can happen with selling points, the precise nature of the relation between XML and long-lived data is sometimes obscured, to the point where some potential customers may believe they have been told that the use of XML in itself guarantees data longevity. And maybe they have, but not (I should think) by anyone who knows what they are talking about. The use of XML, or more generally descriptive markup, may be a necessary condition of data longevity, but it’s unlikely to be sufficient, just as a hammer may be necessary (or extremely helpful) in getting a nail driven, but buying a hammer does not by itself get the nail into the wall.

There’s a lot to be said about the facets and ramifications of the topic, but I think I’ll save those for later posts. For now, I’ll just say that I’ll be chairing the symposium this year, and I hope to see readers of this blog in Montréal in August.

[My evil twin Enrique had been tugging at my elbow for some time, and now asked “So why is the logo a moving truck? Will non-native speakers of English understand the reference?” I don’t know (but if you do, I’m interested to learn: native speakers of other languages, please speak up! Does the logo make sense outside of English?), but I can at least explain. The English phrase “long haul” refers most literally to long distances, especially for the transport of freight or people (as in “long-haul flights” and “long-haul trucking”). In an extended sense (originally metaphorical, I guess) it denotes a protracted or difficult task (“we’re in it for the long haul”) or an extended period of time. Long-term preservation of data and meaning involves a long haul both in the sense of being a difficult task and in the sense of involving a long period of time. “Oh,” said Enrique. “I get it! The logo is a truck used for long-haul freight transport, the way XML may be used for long-haul preservation of information. Don’t you think you should explain that somewhere?” “Maybe,” I said. Maybe I should.]