Day of the Dead 2011

[7 November 2011]

Last week’s celebration of the Day of the Dead (aka All Souls’ Day, 2 November) was a little more thoughtful for me than it has been in some years. Partly this was because John McCarthy had just died, and partly because this year seems to have taken an unusually high toll among people whose work I have had occasion to value.

News of McCarthy’s death came through when I was on the phone with John Cowan and my brother Roger Sperberg. We paused for a few moments, and then we spent half an hour thinking about technical topics, which seemed like a good way to mark the occasion. (For example: if the original plan was for Lisp programs to be written not in S-expressions but in an Algol-like syntax called M-expressions, is that a sign that McCarthy was less far-sighted than he might have been? How could he not have seen the importance of the idea that Lisp data and Lisp programs should use the same primitive data structures? Perhaps he had feet of clay, so to speak? Or, on the contrary, should we infer, from the fact that the plan for M-expressions was abandoned and that Lisp became what it became, that McCarthy was astute enough to recognize great ideas when he saw them, and nimble enough to change his plans to capture them? On the whole, I guess I lean toward the latter view.)

This year, Father Roberto Busa also died. Many people (including me) regard him as the founder of the field of digital humanities, because of his work, beginning in 1948, on a machine-readable text of the works of Thomas Aquinas. The Index Thomisticus was completed in 1978, several IT revolutions later. Busa, too, was astute enough to adjust his plans in mid-project: his initial plan involved clever use of punched cards and sorters, and it was only after the project had been going for some years that it began to use computers instead of unit-record equipment. I met Busa only briefly, once as a young man at my first job in humanities computing, and once years later when I chaired the committee which voted to award him what became the Busa Award for contributions to the application of information technology to humanistic scholarship. But he made a strong impression on me with his sweetness of temper and his intelligence. He made an even stronger impression on me indirectly: Antonio Zampolli worked with Busa as a student. And without Antonio, I think my life would have had a rather different shape.

Oh, well. Nobody gets out of here alive, anyway.

Another reason to use the microphone

[Hamburg, 29 September 2011]

Every now and then conference speakers want to avoid using a microphone; they dislike the introduction of technology into the speaker/audience relation, perhaps, and sometimes they are so confident of their ability to be heard in the room that any suggestion that they might use a mike is almost an affront to their lung power. (Is this last class of speaker always male? Well, usually, I think.)

I have been told on good authority that users of hearing aids benefit a good deal from amplification of the speaker’s voice; that’s a good reason to use the microphone.

But sitting here listening to a very interesting speaker who is completely ignoring the microphone, I am reminded of a different reason: where amplification is concerned, non-native speakers of the presenter’s language are effectively hard of hearing. When the speaker strays into range of the podium’s microphone and happens to be facing the audience, I can understand every word he says; when he faces away from the audience or wanders over to the side of the room, I miss at least every fifth word, which turns the talk into a kind of aural cloze test. That’s OK for me (I pass the test, more or less, though I missed that nice joke everyone else laughed at). But for my neighbor (for whom German is not a second but a fourth or fifth language), the experience is clearly a real trial.

If you are attending an international conference and want to be understood by people who are not native speakers of your language, then there is a simple piece of advice:

Use the microphone.

Enough said.

XQuery in the cloud

[10 August 2011]

Recently I had occasion to build a small web application (feedback forms for the Balisage conference) using XForms. I used XForms because it delivers the information from the user as an XML document, which makes it easier for me to work with the data later. As an experiment, I developed the app using Sausalito, the XQuery engine in the cloud developed by 28msec. Quick summary: Cool! Thumbs UP!
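For readers who haven’t used XForms: the point is that what arrives at the server is an XML document, not a bag of name/value pairs. A minimal sketch of the idea (the element names and the submission URI here are invented for illustration; the real Balisage forms are more elaborate):

<xf:model xmlns:xf="http://www.w3.org/2002/xforms">
  <xf:instance>
    <review-form xmlns="">
      <paper-id/>
      <rating/>
      <comments/>
    </review-form>
  </xf:instance>
  <!-- on submit, the whole instance is posted to the server as XML -->
  <xf:submission id="send" method="post" resource="/reviews/submit" replace="all"/>
</xf:model>

The form controls bind to nodes in the instance (ref="rating" and so on), so the server side never has to reassemble the data from form fields; it just receives the review-form element.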

[Obligatory hand-waving and disclaimer: Sausalito is not the only way to deploy XQuery in the cloud: MarkLogic has defined Amazon machine instances with MarkLogic Server pre-installed, and I’m sure there are, or will be, other options as well. I will continue to make a point of working with as many different XQuery implementations as I can, just to know what’s out there. But I had a lot of fun with Sausalito, and if you have a use for a Web-based XML application, Sausalito is definitely worth a look.]

The basic structure of a Sausalito project is fairly straightforward, and well documented on their site: the URIs you want to serve are matched either against static resources in a public subdirectory of the project, or against a directory of XQuery modules containing handlers for requests. For example, in the Balisage feedback application, the URI /reviews/single is handled by the single() function in the module reviews.xq; it can call library functions defined elsewhere. Sausalito has all the functions usual in XQuery, and also some fairly extensive libraries of things you may want for web applications (to query aspects of the incoming HTTP request, for example, and to set properties in the response). They have an Eclipse-based IDE that’s reasonably nice (though I still missed Emacs from time to time), and also a command-line interface (so I can shift to that and use Emacs, if I want to).
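Concretely, a handler module can be very small. What follows is only a sketch of the shape (the module namespace is a placeholder, and I’ve left out the Sausalito request/response libraries mentioned above), not code from the actual feedback application:

(: reviews.xq: handlers for URIs under /reviews :)
module namespace reviews = "http://example.org/balisage/reviews";

(: handles /reviews/single: return the XML for one feedback form;
   the browser renders it with XSLT and CSS :)
declare function reviews:single() as element(review-form)
{
  <review-form>
    <paper-id/>
    <rating/>
    <comments/>
  </review-form>
};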

Unsurprisingly, I found it very pleasant to be able to write the core of the application in XQuery, with no JavaScript, returning XML to the browser and using XSLT and CSS to render it there. What did surprise me a little, because I had not expected it, was the exhilarating speed with which I was able to move from idea to deployed application. I’ve deployed XForms applications on the Web before, and I have an eight-point checklist for setting up a WebDAV server using Subversion and Apache. It’s not particularly difficult or strenuous, but it’s tedious and takes a few hours each time I have to do it. And developing the checklist was very painful; it took a long time to find configurations that worked for me in the environment provided by my service providers.

The developer configures a collection of documents in Sausalito by declaring the collection:

declare ordered collection my:docs as node()*;

Then they deploy the application. And it’s … just … there. Instant gratification, or as close to instant as your network latency and bandwidth will allow.
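To give a sense of what the payoff looks like, here is a sketch of a handler reading that collection back, say to build a summary page. (The import URI and the dml:* function names are placeholders for whatever collection-access library the engine provides; I’m not quoting the Sausalito API.)

(: placeholder import: stands in for the engine's collection library :)
import module namespace dml = "http://example.org/modules/collection-access";
declare namespace my = "http://example.org/balisage/feedback";

(: wrap every stored feedback document in a summary element :)
<feedback-summary count="{ count(dml:collection(xs:QName('my:docs'))) }">{
  dml:collection(xs:QName('my:docs'))
}</feedback-summary>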

As I wrote to the developers at 28msec:

I’m … very taken with the convenience of deploying to the cloud; having an XML database on demand is a lot like having running water on demand — those who have never had it may think it’s a luxury anyone should be able to live without, but once you’ve had it, it can be hard to go back.

Vocabulary specialization and generalization, use and interchange: things our schema languages should make easier than they do

[23 January 2011]

Perhaps it’s because the call for participation in the 2011 Balisage Pre-conference Symposium on XML Document Interchange has just come out, or perhaps it’s for other reasons, but I found myself thinking today about the problem of specialized uses of generalized vocabularies.

There are lots of vocabularies defined in fairly general terms: HTML, TEI, DocBook, the NLM article tag set; you can surely think of plenty yourself.

Often, for a specific purpose in a specific organization or project, it would be handy to have a much tighter, much more specific vocabulary (and thus one that’s semantically richer, easier to process, and easier to validate tightly). For example, consider writing and managing an issues list (or a list of use cases, or any other list containing items of a specialized genre) in a generic vocabulary. It’s easy enough: you just have a section for each issue, and within that section you have standard subsections on where the issue came from, what part of the project it relates to, its current status, and the history of your work on it. Easy enough. And if you’re the kind of person who writes macros in whatever editor you use, you can write a macro to set up a new issue by adding a section of type ‘issue’ with subsections with appropriate types and headings. But isn’t that precisely what a markup-aware editor is supposed to do for you? Well, yes, typically: any schema-aware editor can look at the schema, and as soon as you say “add a new issue” it can populate the new element with all of the required subelements. Or rather, it could, if you had an element type called ‘issue’, with appropriately named sub-elements. If instead you are using a generic ‘div’ element, your editor is going to have a hard time helping you, because you haven’t said what you really mean. You want an issue, but what you’ve said is ‘add a div’.
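To make the contrast concrete, here is a hypothetical fragment (all element names and content invented for illustration) showing the same issue marked up both ways:

<!-- generic vocabulary: all the schema knows is that you have divs -->
<div type="issue">
  <head>Issue 1: an example issue</head>
  <div type="origin">Where it came from.</div>
  <div type="status">Open.</div>
  <div type="history">What has happened so far.</div>
</div>

<!-- specialized vocabulary: the schema (and thus the editor) knows what an issue needs -->
<issue>
  <head>Issue 1: an example issue</head>
  <origin>Where it came from.</origin>
  <status>Open.</status>
  <history>What has happened so far.</history>
</issue>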

Some schemas, and some schema languages, try to make this easier by allowing you to say, essentially, that an issue element is a kind of div, and that the content model for issue is a specialization of that for div (and so on). This is better than nothing, but I’m probably not the only person who fails to use these facilities in all the cases where they would be helpful. And when I do, I have to extend the standard stylesheets for my generic vocabulary to handle my new elements, because even when the stylesheet language supports the specialization mechanisms of the schema language (as XSLT 2.0 supports element substitution groups in XSD), most stylesheets are not written to take advantage of them. And if I’m exchanging documents with someone else, they may or may not want to have to deal with my extensions to the schema.
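In XSD, for example, the mechanism looks roughly like this (a minimal, hypothetical sketch, not drawn from any real vocabulary): issue is declared as a member of div’s substitution group, with a type derived from div’s type, so it can appear wherever div can while still telling a schema-aware editor exactly which children it requires.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- the generic element -->
  <xs:complexType name="divType">
    <xs:sequence>
      <xs:element name="head" type="xs:string"/>
      <xs:element ref="div" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="type" type="xs:string"/>
  </xs:complexType>
  <xs:element name="div" type="divType"/>

  <!-- the specialized element: substitutable for div, but with named,
       required children an editor can populate for you -->
  <xs:complexType name="issueType">
    <xs:complexContent>
      <xs:extension base="divType">
        <xs:sequence>
          <xs:element name="origin" type="xs:string"/>
          <xs:element name="status" type="xs:string"/>
          <xs:element name="history" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="issue" type="issueType" substitutionGroup="div"/>

</xs:schema>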

I wonder if we might get a better answer if (a) in our schema languages it were as easy to write a rule for div type='issue' as for issue, and (b) in our validation tools it were as easy to apply multiple grammars to a document as a single grammar, and to specify that the class of documents we are interested in is given by the intersection of the grammars, or by their union, or (for grammars A, B, C) by A ∪ (B ∩ ¬ C). Also (c) if for any schema extension mechanism it were easy to generate a transformation to take documents in the extended schema into the base schema, and vice versa.

Perhaps NVDL is in a position to help with (b), though I’ve never learned it well enough to know, and it seems to be more heavily freighted with unacknowledged assumptions about schema languages and validation than I’d like.

And perhaps Relax NG already can handle both (a) and (b).

Homework to do.

Giants and the KISS principle

[12 January 2011]

From time to time, my evil twin Enrique runs across passages in literature that seem to him to provide useful illustrations of important principles in information management. I don’t know, maybe he’s saving them up for the next time he teaches a class or something.

The other day, he came by and pointed me to a passage in J.K. Rowling’s Harry Potter and the Order of the Phoenix ([New York]: Scholastic; London: Bloomsbury, 2003):

“In any case, giants … — overload ’em with information an’ they’ll kill yeh jus’ to simplify things.”

“That’s nice,” I said. “The KISS principle in a nutshell.”

“What I want to know,” Enrique said, “is: Where were the giants when they were needed in the [WG name suppressed] working group?”