Archive for the ‘Balisage’ Category

Counting down to Balisage paper submission deadline

Friday, April 9th, 2010

[9 April 2010]

Just a week to go before paper submissions for Balisage 2010 are due. Time for the procrastinators, the delayers, the mañana-sayers to buckle down and get their papers written and submitted.

Speaking of which, I have some work to do now K thx bye.

XML for the Long Haul

Tuesday, February 23rd, 2010

[23 February 2010]

The organizing committee for Balisage 2010 have announced the topic of this year’s one-day pre-conference symposium: “XML for the Long Haul: Issues in the Long-term preservation of XML,” and have issued a call for participation. The basic question is roughly this: what do we need to do to make sure that XML-encoded data is usable for long periods? Descriptive markup was invented by people who needed and desired longevity, application independence, and device independence for their data, so longevity is often used as a selling point for the use of SGML and XML. And as sometimes happens with selling points, the precise nature of the relation between XML and long-lived data is sometimes obscured, to the point where some potential customers may believe they have been told that the use of XML in itself guarantees data longevity. And maybe they have, but not (I should think) by anyone who knows what they are talking about. The use of XML, or more generally descriptive markup, may be a necessary condition of data longevity, but it’s unlikely to be sufficient, just as a hammer may be necessary (or extremely helpful) in getting a nail driven, but buying a hammer does not by itself get the nail into the wall.

There’s a lot to be said about the facets and ramifications of the topic, but I think I’ll save those for later posts. For now, I’ll just say that I’ll be chairing the symposium this year, and I hope to see readers of this blog in Montréal in August.

[My evil twin Enrique had been tugging my elbow for some time, and now asked “So why is the logo a moving truck? Will non-native speakers of English understand the reference?” I don't know (but if you do, I'm interested to learn: native speakers of other languages, please speak up! Does the logo make sense outside of English?), but I can at least explain. The English phrase “long haul” refers most literally to long distances, especially for the transport of freight or people (as in “long-haul flights” and “long-haul trucking”). In an extended sense (originally metaphorical, I guess) it denotes a protracted or difficult task (“we're in it for the long haul”) or an extended period of time. Long-term preservation of data and meaning involves a long haul both in the sense of being a difficult task and in the sense of involving a long period of time. “Oh,” said Enrique. “I get it! The logo is a truck used for long-haul freight transport, the way XML may be used for long-haul preservation of information. Don't you think you should explain that somewhere?” “Maybe,” I said. Maybe I should.]

ACH and ALLC co-sponsoring Balisage

Friday, February 12th, 2010

[12 February 2010]

The Association for Computers and the Humanities and the Association for Literary and Linguistic Computing have now signed on as co-sponsors of the Balisage conference held each year in August in Montréal. They join a number of other co-sponsors who also deserve praise and thanks, but I’m particularly happy about ACH and ALLC because they have provided such an important part of my intellectual home over the years.

Balisage will take place Tuesday through Friday, 3-6 August, this year; on Monday 2 August there will be a one-day pre-conference symposium on a topic to be announced real soon now. It’s a conference for anyone interested in descriptive markup, information preservation, access to and management of information, accessibility, device independence, data reuse — any of the things that descriptive markup helps enable. The deadline for peer review applications is 19 March; the deadline for papers is 16 April. Time to start thinking about what you’re going to write up; you don’t want to be caught up short at the last minute, without time to work out your idea properly.

Mark your calendars!

Metadata and search - a concrete example

Tuesday, August 18th, 2009

[18 August 2009]

Here’s a concrete example of the difference between the metadata-aware search we would like to have, and the metadata-oblivious full-text search we mostly have today, encountered the other day at the Balisage 2009 conference in Montréal.

Try to find a video of the song “I don’t want to go to Toronto”, by a group called Radio Free Vestibule.

When I search video.google.com for “I don’t want to go to Toronto”, I get, in first place, a song called “I don’t want to go”, performed live in Toronto. When I put quotation marks around the title, it tells me nothing matches and shows me a video of Elvis Costello singing “I don’t want to go to Chelsea”.
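To make the contrast concrete, here is a minimal Python sketch. It is not how any real search engine works; the record layout, the field names, the performer "Some Band", and both search functions are invented for illustration. The full-text search treats every field as one bag of words, while the metadata-aware search compares the query against the title field only.

```python
import re

# Hypothetical catalogue records; only the song titles and the two named
# performers come from the example above, everything else is made up.
VIDEOS = [
    {"title": "I Don't Want to Go", "performer": "Some Band",
     "notes": "Performed live in Toronto"},
    {"title": "I Don't Want to Go to Chelsea", "performer": "Elvis Costello",
     "notes": ""},
    {"title": "I Don't Want to Go to Toronto", "performer": "Radio Free Vestibule",
     "notes": ""},
]


def words(text):
    """Lower-case the text and split it into words (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def full_text_search(query, records):
    """Metadata-oblivious: match if every query word occurs in ANY field."""
    wanted = set(words(query))
    return [r["title"] for r in records
            if wanted <= set(words(" ".join(r.values())))]


def title_search(query, records):
    """Metadata-aware: match only if the title field is the queried phrase."""
    wanted = words(query)
    return [r["title"] for r in records if words(r["title"]) == wanted]
```

With these records, the full-text search for the song title also returns "I Don't Want to Go", because the word "Toronto" turns up in its notes; the title search returns only the song actually being sought.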

It’s always good to have concrete examples, and I always like real ones better than made-up examples. (Real examples do often have a disconcerting habit of bringing in one complication after another and involving more than one problem, which is why good ones are so hard to find. But I don’t see many extraneous complications in this one.)

International Symposium on Processing XML Efficiently

Wednesday, August 12th, 2009

[10 August 2009, delayed by network problems ...]

The International Symposium on Processing XML Efficiently, chaired by Michael Kay, has now reached its midpoint, at lunch time.

The morning began with a clear and useful report by Michael Leventhal and Eric Lemoine at LSI, talking about six years of experience building XML chips. Tarari, which was spun off by Intel in 2002, started with a tokenizer on a chip, based on a field-programmable gate array (FPGA) and has gradually developed more capabilities, including parse-time evaluation of XPath queries, later on full XSLT, and even parse-time XSD schema validation. The goal is not to perform full XML processing on the chip, but to perform tasks which software higher up in the stack will find useful.

One property of the chip I found interesting was that it attempts to treat parsing as a stateless activity, which aids security and allows several XML documents to be processed at once. Of course, parsing is not by nature stateless, but the various specialized processes implemented on the chip produce relevant state information as part of their output, and that state information is cycled around to be fed into the input side of the chip together with the next bits of the document in question. It reminds me of the way Web (and CICS) applications make themselves stateless by externalizing the state of a long conversational interaction by sending it to the client as hidden data.
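The idea of externalizing state can be sketched in a few lines of Python. This is only an illustration of the general pattern, not a description of the LSI chip: the `State` tuple, the `step` function, and the toy tag/text tokenizer are all my own assumptions, and the tokenizer ignores comments, CDATA sections, and ‘>’ inside attribute values.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Everything the tokenizer must remember between chunks of one document."""
    in_tag: bool = False   # are we between '<' and '>'?
    pending: str = ""      # characters seen but not yet emitted as a token


def step(state, chunk):
    """Process one chunk; return (tokens, new_state). Nothing is kept inside."""
    in_tag, pending = state
    tokens = []
    for ch in chunk:
        if not in_tag and ch == "<":
            if pending:
                tokens.append(("text", pending))
            in_tag, pending = True, "<"
        elif in_tag and ch == ">":
            tokens.append(("tag", pending + ">"))
            in_tag, pending = False, ""
        else:
            pending += ch
    return tokens, State(in_tag, pending)


# Two documents interleaved chunk by chunk: the caller carries each state.
tokens_a1, state_a = step(State(), "<a>hi</")
tokens_b1, state_b = step(State(), "<b/")
tokens_a2, state_a = step(state_a, "a>")
tokens_b2, state_b = step(state_b, ">yo")
```

Because `step` keeps nothing between calls, chunks from any number of documents can be fed to it in any interleaving, as long as each chunk arrives together with the state that its own document produced last time.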

David Maze of IBM then spoke about the DataPower XML appliance; I had heard people talk about it before, but had never gotten a very clear idea of what the box actually does. This talk dispelled a lot of uncertainty. In contrast to the LSI chip, the DataPower appliance is designed to perform full XML processing, and with throughput rather than reduced latency as the design goal. But the list of services offered is still rather similar: low-level parsing, XPath evaluation, XSLT processing, schema validation. Some are done during parsing, and some by means of a set of specialized post-processing primitives.

Rob Cameron and some of his students at Simon Fraser University came next. They have been exploring ways to exploit the single-instruction/multiple-data (SIMD) instructions which have been appearing in the instruction sets of recent chips. They turn a stream of octets into eight streams of bits, and can thus process eight times as much of the document in a single operation as they would otherwise be able to. The part that blew my mind was the explanation of parsing by means of bitstream addition. He used decimal numeric character references to illustrate. I can’t explain in detail here, but the basic idea is: you make a bit stream for the ‘&’ character (where a 1 in position n means that the nth character in the input stream is a ‘&’). Make a similar bit stream for the ‘#’. AND them together (after shifting one of them by one position, so that adjacent characters line up); you have the occurrences of the ‘&#’ delimiter in the document. Make a similar bit stream for decimal digits; you may frequently have multiple decimal digits in a row. Now comes an extremely unexpected trick. Reverse the bit array, so it runs right to left. Now shift the ‘&#’ delimiter bit stream by one position, and ADD it to the decimal-digit bit stream. If the delimiter was followed by a decimal digit (as in a decimal character reference it must be), there will be two ones in the same column. They will sum to ‘10’, and the one will carry. If the following character is also a decimal digit, it will sum, with the carry, to ‘10’. And so on, until finally you reach the end of the sequence of adjacent decimal digits, and are left with a 1 in the result bitstream just past the last digit. AND that together with the complement of the bit stream for the ‘;’ character, and wherever there is a 1 in the result you have now diagnosed a well-formedness error in the input.

Upshot: A single machine operation has scanned from the beginning to the end of (multiple instances of) a variable-length construct and identified the end-point of (each instance of) the construct. With a little cleverness, this can be applied to pretty much all the regular-language phenomena in XML parsing. The speedups they report as a result are substantial.
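The addition trick can be tried out in Python, whose integers are arbitrarily long and so can stand in for a bit stream. This is a sketch under my own assumptions, not Parabix code: the function names are invented, bit i of each integer stands for character i of the input (which gives the reversed layout, so that carries travel in reading direction), and hexadecimal references like ‘&#x41;’ are left out of account. The sketch also makes explicit two details that a prose summary easily skips: the sum is masked with the complement of the digit stream, so that only the positions just past each marked digit run survive, and an error is a surviving position that is not a ‘;’.

```python
def bitstream(text, predicate):
    """Return an int whose bit i is 1 when predicate(text[i]) is true."""
    bits = 0
    for i, ch in enumerate(text):
        if predicate(ch):
            bits |= 1 << i
    return bits


def find_bad_decimal_refs(text):
    """Return positions where a decimal character reference goes wrong."""
    amp = bitstream(text, lambda c: c == "&")
    hash_ = bitstream(text, lambda c: c == "#")
    digit = bitstream(text, lambda c: c in "0123456789")
    semi = bitstream(text, lambda c: c == ";")

    # A 1 at each '#' that directly follows an '&'.
    delim = (amp << 1) & hash_
    # Move the marker to the character right after each '&#'.
    start = delim << 1

    # A reference with no digits at all is already an error.
    missing_digits = start & ~digit

    # ONE addition: the carry ripples through every marked run of digits
    # at once and comes to rest just past the last digit of each run.
    run_starts = start & digit
    ends = (run_starts + digit) & ~digit

    # Each run must be followed by ';'.
    unterminated = ends & ~semi

    errors = missing_digits | unterminated
    # Position len(text) means "ran off the end of the input".
    return [i for i in range(len(text) + 1) if (errors >> i) & 1]
```

For the input `a&#65;b&#12x&#;` this reports positions 11 (the ‘x’ where a ‘;’ should be) and 14 (the ‘;’ where a digit should be), and the well-formed `&#65;` is scanned by the same single addition without any per-character loop in the scanning step itself.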

The use of bitstream operations provides a lot of parallelism; the one place the Parabix system has to drop into a loop seems to be for the recognition of attribute-value pairs. I keep wondering whether optimistic parallelism might not allow that problem to be solved, too.

Mohamed Zergaoui gave a characteristically useful discussion of streamability and its conceptual complications, and David Lee and Norm Walsh presented some work on timing various different ways of writing and running pipelines of XML processes. When components written in Java were run from scripting languages like bash, the time required for the XML processing (in the real-world application they used for testing) was dwarfed by the cost of repeatedly launching the Java Virtual Machine. Shifting to a system like xmlsh or calabash, which launches the JVM once rather than repeatedly, gained fifty- and hundred-fold speedups.

Reminder - Balisage late-breaking news

Tuesday, June 16th, 2009

[16 June 2009]

Most readers of this blog will (I hope) have seen the Balisage 2009 call for late-breaking news. If you haven’t, go look at it.

The deadline for late-breaking submissions is this Friday; that’s enough time to put together a persuasive proposal if there are recent developments in some field of interest that you are in a position to report on.

See you in Montréal!

Efficient processing of XML

Tuesday, June 9th, 2009

[9 June 2009]

The organizers of Balisage (among them me) have announced that the program is now available for the International Symposium on Processing XML Efficiently, chaired by Michael Kay of Saxonica, which will take place the day before the Balisage conference starts, in the same venue. I reproduce the announcement below.

PROGRAM NOW AVAILABLE

International Symposium on Processing XML Efficiently:
Overcoming Limits on Space, Time, or Bandwidth

Monday August 10, 2009
Hotel Europa, Montréal, Canada

Chair: Michael Kay, Saxonica
Symposium description: http://www.balisage.net/Processing/
Detailed Program: http://www.balisage.net/Processing/Program.html
Registration: http://www.balisage.net/registration.html

Developers have said it: “XML is too slow!”, where “slow” can mean many things including elapsed time, throughput, latency, memory use, and bandwidth consumption.

The aim of this one-day symposium is to understand these problems better and to explore and share approaches to solving them. We’ll hear about attempts to tackle the problem at many levels of the processing stack. Some developers are addressing the XML parsing bottleneck at the hardware level with custom chips or with hardware-assisted techniques. Some researchers are looking for ways to compress XML efficiently without sacrificing the ability to perform queries, while others are focusing on the ability to perform queries and transformations in streaming mode. We’ll hear from a group who believe the problem (and its solution) lies not with the individual component technologies that make up an application, but with the integration technology that binds the components together.

We’ll also hear from someone who has solved the problems in real life, demonstrating that it is possible to build XML-based applications handling very large numbers of documents and a high throughput of queries while offering good response time to users. And that with today’s technologies, not tomorrow’s.

If you are interested in this symposium we invite you to read about
“Balisage: The Markup Conference”, which follows it in the same location:
http://www.balisage.net/

Questions: info@balisage.net

XML in the browser at Balisage

Saturday, June 6th, 2009

[6 June 2009]

It’s been some time since XML was first specified, with the hope that once browsers supported XML, it would become easy to deliver content on the Web in XML.

A lot of people spent a lot of time, in those early years, warning that you couldn’t really deliver XML on the Web in practice, because too many people were still using non-XML-capable (or non-XSLT-enabled) browsers. (At least, it seemed that way to me.) I got used to the idea that you couldn’t really do it. So it was a bit of a surprise to me, a few years ago, to discover that I could, after all. There are some dark corners in some browsers’ implementations of XSLT (no information about unparsed entities in Opera, no namespace nodes in Firefox, though that last one is being fixed even as I write, which is good news) but there are workarounds; the situation is probably at least as good with respect to XSLT as it is with respect to JavaScript and CSS. I have not had to draft an important paper in HTML, or worry about translating it into HTML in order to deliver it on the Web, in years.

Why is this fact not more widely known and exploited by users of XML?

It will surprise no one, after what I just said, that one of the papers I’m looking forward to hearing at Balisage 2009 (coming up very soon — the deadline for late-breaking news submissions is 19 June, and the conference itself is the week of 10 August) is a talk by Alex Milowski under the title “XML in the browser: the next decade”. Alex Milowski is one of the smartest and most thoughtful technologists I know, and I look forward to hearing what he has to say.

He’ll talk (as I know from having read the paper proposal) about what some people hoped for, from XML in the browser, ten years ago, and about what has happened instead. He’ll talk about what can be done with XML in the browser today, and what most needs fixing in the current situation. There will probably be demos. And, most interesting, I expect he will provide some thoughts on the long-term direction we should be trying to take, in order to make the Web and XML get along better together.

If you think XML can help make the Web and the world a better place, you might want to come to Montréal this August to hang around with Alex, and with me, and with others of your kind. It’s always a rewarding experience.

Balisage preliminary program is up

Friday, May 29th, 2009

[29 May 2009; typo corrected 13 August 2009]

The preliminary program for Balisage 2009 is now up on the Web, in both full and brief forms.

Balisage: The markup conference 2009

Friday, April 3rd, 2009

[3 April 2009]

Three weeks to go until the 24 April deadline for papers for the 2009 edition of Balisage: The markup conference.

We want your paper. So give it to us, already!

This is a peer-reviewed conference that seeks to be of interest both to the theorist and to the practitioner of markup. That makes it a lot of fun (at least for people like me who are interested both in theory and in practice, and who like to see them informing each other). And the peer reviews are unusual (at least in my experience of conference paper submissions) in the detail and passion of their comments.

If you have markup-related work to report on, you will not get better feedback from any conference on the planet. (Disclaimer: I am one of the organizers, and have been known to have a small soft spot in my heart for this conference. But don’t take my word for it: ask anyone who has spoken at Balisage how the peer review and the questions at the conference compared with other conferences.)

Details of submission procedure, of course, in the call for participation.

I look forward to reading your papers.