Archive for the ‘Conferences’ Category

XForms and XQuery tutorials at TEI members’ meeting

Monday, August 23rd, 2010

[23 August 2010]

The TEI has published a list of workshops to be offered at the TEI Members’ Meeting this November in Zadar, Croatia.

Together with Syd Bauman of Brown University, I’m offering two tutorial workshops: one on XForms and one on XQuery. Each will last a day and a half, and involve some talking heads, some group discussion, and as much hands-on work as we can manage.

There are several other very good workshops on offer: Norm Walsh on XProc, the TEI@Oxford team on the ODD system, Elena Pierazzo and Malte Rehbein on the encoding of genetic editions, and Andreas Witt et al. on TEI for transcriptions of speech.

The organizers remind me that there is an early-bird discount for those who register before 31 August. There is some chance that tutorials which fail to attract enough registrations will be canceled, so if you definitely want to come, you definitely want to register early, to help make sure your tutorial makes the cut.

Counting down to Balisage paper submission deadline

Friday, April 9th, 2010

[9 April 2010]

Just a week to go before paper submissions for Balisage 2010 are due. Time for the procrastinators, the delayers, the mañana-sayers to buckle down and get their papers written and submitted.

Speaking of which, I have some work to do now K thx bye.

XML Prague is over

Sunday, March 14th, 2010

[14 March 2010]

XML Prague took place yesterday and today, and I’m still coming down from the adrenaline high. Lots of good talks, in a single-tracked conference; I had the task of trying to knit them all together in the closing.

I’ll do a fuller trip report later, if working on my taxes doesn’t prevent it. But Norm Walsh has already asked that I post at least the last bit of my remarks.

[Readers who did not attend the conference or follow the streaming video need to know that in the opening sessions of the two conference days, Tony Graham and Sharon Adler had each referred at critical moments to the book of Genesis. One of Sharon's slides bore the title “In the beginning was SGML,” and Tony began his talk on Saturday with an extended reference to the book of Genesis: “In the beginning was the page. And the page was without form and void, ...” Sharon wondered aloud whether people involved with descriptive markup all have God complexes; she may have something there. The text below also has some other references to things said at the conference, but I think for now I'll spare readers the detailed annotation necessary to explain them all. Apologies in advance to any readers made uncomfortable by parodies of scripture.]

At the end of my talk, I described wandering through the streets of Prague, trying to find my way from the Strahov Monastery, where the conference dinner was held, back to my hotel, when suddenly I had a vision — one might almost say, a revelation.

And I saw in the right hand of him that sat on the throne an XML document canonicalized and serialized with EXI, containing seven pi-trees encrypted with seven public-key encryption key pairs.

And I saw a strong angel proclaiming with a loud voice, Who is worthy to parse the XML document, and to decrypt the encryption thereof?

And no one in heaven, nor in earth, neither under the earth, was able to parse the XML document, neither by buffering it in memory nor by processing it in streaming mode.

And I wept much, because no man was found worthy to parse and to process the XML document, neither to exploit its vocabulary-specific semantics nor to perform vocabulary-independent pretty-printing thereon.

And one of the elders saith unto me, Weep not: behold, the standards-compliant XML application, with support for C14n and EXI, for XML encryption and schema validation, and for interoperable stylesheet technologies like XSLT 17.2 and XSL FO 42.0, hath prevailed to open the XML document, and to decrypt the seven encryptions thereof, and to display it coherently, yea even for those who abhor the sight of angle brackets and prefer beautiful well-formatted text with tasteful images.

And every creature which is in heaven, and on the earth, and under the earth, and such as are in the sea, and all that are in them, heard I saying, Blessing, and honour, and glory, and power, be unto those that preserve data, and provide access to information, if not for ever and ever, then at least for the foreseeable future.

And the presentations, and the coffee breaks, were the second day.

And the nine conference organizers said, Amen. And the representatives of the two gold sponsors, and the four silver sponsors, and the four bronze sponsors, and the five media partners, and the three sister events, stood up and said Amen. And the one hundred and forty-four conference participants clapped their hands and thanked the conference organizers for a great conference focusing on information that shall outlive the applications which create and process it.

XML for the Long Haul

Tuesday, February 23rd, 2010

[23 February 2010]

The organizing committee for Balisage 2010 have announced the topic of this year’s one-day pre-conference symposium: “XML for the Long Haul: Issues in the Long-term preservation of XML,” and have issued a call for participation. The basic question is roughly this: what do we need to do to make sure that XML-encoded data is usable for long periods? Descriptive markup was invented by people who needed and desired longevity, application independence, and device independence for their data, so longevity is often used as a selling point for the use of SGML and XML. And as sometimes happens with selling points, the precise nature of the relation between XML and long-lived data is sometimes obscured, to the point where some potential customers may believe they have been told that the use of XML in itself guarantees data longevity. And maybe they have, but not (I should think) by anyone who knows what they are talking about. The use of XML, or more generally descriptive markup, may be a necessary condition of data longevity, but it’s unlikely to be sufficient, just as a hammer may be necessary (or extremely helpful) in getting a nail driven, but buying a hammer does not by itself get the nail into the wall.

There’s a lot to be said about the facets and ramifications of the topic, but I think I’ll save those for later posts. For now, I’ll just say that I’ll be chairing the symposium this year, and I hope to see readers of this blog in Montréal in August.

[My evil twin Enrique had been tugging my elbow for some time, and now asked “So why is the logo a moving truck? Will non-native speakers of English understand the reference?” I don't know (but if you do, I'm interested to learn: native speakers of other languages, please speak up! Does the logo make sense outside of English?), but I can at least explain. The English phrase “long haul” refers most literally to long distances, especially for the transport of freight or people (as in “long-haul flights” and “long-haul trucking”). In an extended sense (originally metaphorical, I guess) it denotes a protracted or difficult task (“we're in it for the long haul”) or an extended period of time. Long-term preservation of data and meaning involves a long haul both in the sense of being a difficult task and of involving a long period of time. “Oh,” said Enrique. “I get it! The logo is a truck used for long-haul freight transport, the way XML may be used for long-haul preservation of information. Don't you think you should explain that somewhere?” “Maybe,” I said. Maybe I should.]

ACH and ALLC co-sponsoring Balisage

Friday, February 12th, 2010

[12 February 2010]

The Association for Computers and the Humanities and the Association for Literary and Linguistic Computing have now signed on as co-sponsors of the Balisage conference held each year in August in Montréal. They join a number of other co-sponsors who also deserve praise and thanks, but I’m particularly happy about ACH and ALLC because they have provided such an important part of my intellectual home over the years.

Balisage will take place Tuesday through Friday, 3-6 August, this year; on Monday 2 August there will be a one-day pre-conference symposium on a topic to be announced real soon now. It’s a conference for anyone interested in descriptive markup, information preservation, access to and management of information, accessibility, device independence, data reuse — any of the things that descriptive markup helps enable. The deadline for peer review applications is 19 March; the deadline for papers is 16 April. Time to start thinking about what you’re going to write up; you don’t want to be caught up short at the last minute, without time to work out your idea properly.

Mark your calendars!

XML Prague 2010

Wednesday, February 10th, 2010

[10 February 2010]

Friends and colleagues who have attended XML Prague in previous years have come back with such enthusiastic tales that I have always wished I could attend. Smart people, good discussions, and of course Prague is Prague, one of the most beautiful cities in Europe. This year, it seems, my luck is in. My tickets are booked, my passport has been renewed, and my Czech phrasebook has been dusted off. (Pivo, prosim. Velký.)

There’s still time to decide to go. See you there, perhaps?

Topic maps on my mind

Friday, November 20th, 2009

[20 November 2009, additional pointer 23 November]

Last week I spent in Leipzig, attending the Topic Maps Research and Applications 2009 conference organized by the Topic Maps Lab at the university there. And since I returned, I have found myself spending a lot of time thinking about topic maps. (Enough that I really need to stop for a bit and get back to other work.)

I’ve written a short trip report on the conference, which can be read in the archives of the Topic Maps in LIS list run by Kevin Trainor. [23 November. The Topic Maps Lab has posted a version of the trip report that may be easier to read. Many thanks to them.] It doesn’t really do the conference justice, but perhaps it’s better than nothing. Other trip reports are [also] pointed to from the TM Lab page.

The short version of my report is: “Gosh, that was fun! And wow! is Leipzig ever worth a visit!”

Metadata and search - a concrete example

Tuesday, August 18th, 2009

[18 August 2009]

Here’s a concrete example of the difference between the metadata-aware search we would like to have, and the metadata-oblivious full-text search we mostly have today, encountered the other day at the Balisage 2009 conference in Montréal.

Try to find a video of the song “I don’t want to go to Toronto”, by a group called Radio Free Vestibule.

When I search video.google.com for “I don’t want to go to Toronto”, I get, in first place, a song called “I don’t want to go”, performed live in Toronto. When I put quotation marks around the title, it tells me nothing matches and shows me a video of Elvis Costello singing “I don’t want to go to Chelsea”.

It’s always good to have concrete examples, and I always like real ones better than made-up examples. (Real examples do often have a disconcerting habit of bringing in one complication after another and involving more than one problem, which is why good ones are so hard to find. But I don’t see many extraneous complications in this one.)
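
To make the contrast concrete, here is a minimal sketch in Python. It is not how any real search engine works; the three-record catalogue, its field names, and both search functions are made up for illustration, with titles and performers taken from the example above. It shows only that a search which treats every record as a bag of words cannot tell the song I want from a different song that happens to have been performed in Toronto, while a search that knows which words belong to the title can.

```python
import re

# Hypothetical catalogue; the field names are assumptions made for this sketch.
CATALOGUE = [
    {"title": "I don't want to go to Toronto",
     "performer": "Radio Free Vestibule",
     "description": "Comedy song."},
    {"title": "I don't want to go",
     "performer": "(unknown performer)",
     "description": "Performed live in Toronto."},
    {"title": "I don't want to go to Chelsea",
     "performer": "Elvis Costello",
     "description": "Live performance."},
]


def normalize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower().replace("’", "'"))


def full_text_search(query, records):
    """Metadata-oblivious search: count query words found anywhere in a record."""
    wanted = set(normalize(query))
    scored = []
    for rec in records:
        bag = set(normalize(" ".join(rec.values())))
        scored.append((len(wanted & bag), rec))
    best = max(score for score, _ in scored)
    # Return every record that ties for the best score.
    return [rec for score, rec in scored if score == best]


def title_search(query, records):
    """Metadata-aware search: the query must match the title field exactly."""
    wanted = normalize(query)
    return [rec for rec in records if normalize(rec["title"]) == wanted]


query = "I don't want to go to Toronto"
print([r["title"] for r in full_text_search(query, CATALOGUE)])
print([r["performer"] for r in title_search(query, CATALOGUE)])
```

In this toy catalogue the full-text search returns two equally good hits, because “Toronto” in a description counts for as much as “Toronto” in a title; the title search returns only the one record whose title matches.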

Trip report: Digital Humanities 2009

Monday, June 29th, 2009

[29 June 2009; subheads added 30 June 2009]

Last week I was at the University of Maryland attending Digital Humanities 2009, the annual joint conference of the Association for Computers and the Humanities (ACH), the Association for Literary and Linguistic Computing (ALLC), and the Society for Digital Humanities / Société pour l’étude des médias interactifs (SDH/SEMI), the constituent societies of the Alliance of Digital Humanities Organizations. It had a fine concentration of digital humanists, ranging from stylometricians and adepts of authorship attribution to theorists of video games (this is a branch of cultural and media studies, not to be confused with the game theory of von Neumann and Morgenstern; there may be a promotional opportunity in the slogan “Not your grandfather’s game theory!”, but I’ll let others take it up).

I saw a number of talks; some of the ones that stick in my mind are these.

Rockwell / Sinclair on Knowledge radio

Geoffrey Rockwell (Univ. of Alberta) and Stéfan Sinclair (McMaster U.) talked about “Animating the knowledge radio” and showed how one could lower the threshold of text analysis by processing raw streams of data without requiring that they be indexed first. The user’s wait while the stream is being processed can be made bearable, and perhaps even informative, by animating the processing being performed. In one of their examples, word clouds of the high-frequency words in a text are created, and the cloud changes as new words are read in and the list of most-frequent words changes. The analogy with radio (as a stream you tap into without saving it to disk first) is arresting and may have potential for doing more work than they currently make it do. I wonder, too, whether going forward their analysis would benefit by considering current work on continuous queries (a big topic in database research and practice today) or streaming query processors (more XQuery processors act on streams than act on static indexed data). Bottom line: the visuals were pretty and the discipline of making tools work on streams appears to be beneficial.
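
The core of the idea can be sketched in a few lines of Python. This is my own illustration, not Rockwell and Sinclair's code: the function name, its parameters, and the sample stream are all invented here, and a real tool would redraw a word cloud where this sketch merely yields a list. The point is that the counts are updated as each line arrives, so there is something to show the user long before the stream ends, and nothing is indexed or saved first.

```python
from collections import Counter
import re


def running_top_words(lines, top_n=3, every=2):
    """Consume an iterable of text lines and periodically yield a snapshot.

    After every `every` lines, yields (lines_read, top_n most frequent words).
    `lines` may be any iterable, including one that is still being produced.
    """
    counts = Counter()
    for lines_read, line in enumerate(lines, start=1):
        counts.update(re.findall(r"[a-z']+", line.lower()))
        if lines_read % every == 0:
            yield lines_read, counts.most_common(top_n)


# A made-up stream; in practice this could be a socket or a file being written.
stream = iter([
    "the cat sat on the mat",
    "the dog",
    "a dog and a dog",
    "dog dog dog",
])

for lines_read, top in running_top_words(stream, top_n=1):
    print(lines_read, top)
```

Each snapshot corresponds to one frame of the animation: the most frequent word changes as more of the stream is read.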

Roued, Cayless, Stokes on paleography and the reading of manuscripts

Henriette Roued (Oxford) spoke on an Interpretation Support System she is building in company with others, to help readers and editors of ancient documents keep track of cruces, conjectures, arguments in favor of this or that reading of a crux, and so on. It was very impressive stuff. In the same session, Hugh Cayless (NYU) sketched out a kind of theoretical framework for the annotation of manuscript images, starting from a conventional scan, processing it into a page full of SVG paths, and attempting from that to build up a web of links connecting transcriptions to the image at the word token and line levels. This led to various hallway and lunchroom conversations about automatic detection of page structure, or mise en page, about which I know mostly that there are people who have studied it in some detail and whose results really ought to be relevant here. The session ended with Peter Stokes (Cambridge) talking about the past and future of computer-aided paleography. Among them, the three speakers seemed to have anticipated a good deal of what Claus Huitfeldt, Yves Marcoux, and I were going to say later in the week, and their pictures were nicer. This could have been depressing. But we decided to take this fact as a confirmation that our topic really is relevant.

One thing troubles me a bit. Both Roued and Cayless seem to take as a given that the regions of a document containing basic tokens provide a tessellation of the page; surely this is an oversimplification. It is perhaps true for most typewritten pages using the Roman alphabet, if they have no manuscript additions, but high ascenders, low descenders, complex scribal abbreviations, even printers’ ligatures all seem to require or suggest that it might be wise to assume that the regions occupied by basic tokens might overlap each other. (Not to mention the practice in times of paper shortage of overwriting the page with lines at right angles to the first set. And of course there are palimpsests.) And in pages with a lot of white space, it doesn’t seem obvious to me that all of the whitespace need be accounted for in the tabulation of basic tokens.

Bradley on the contributions of technical specialists to interdisciplinary projects

John Bradley (King’s College London) closed (my experience of) the first day of the conference by presenting a thought-provoking set of reflections on the contribution of specialists in digital humanities to projects undertaken jointly with humanists who are not particularly focused on the digital (analog humanists?). Of course, in my case he was preaching to the choir, but his arguments that those who contribute to the technical side of such projects should be regarded as partners, not as factotums, ought to be heeded by everyone engaged in interdisciplinary projects. Those who have ears to hear, let them hear.

Pierazzo on diplomatic editions

One of the high points of the conference for me was a talk on Wednesday by Elena Pierazzo (King’s College London), who spoke under the title “The limit of representation” about digital diplomatic editions, with particular reference to experience with a three-year project devoted to Jane Austen’s manuscripts of fiction. She spoke eloquently and insightfully about the difference between transcriptions (even diplomatic transcriptions) and the original, and about the need to choose intelligently when to capture some property of the original in a diplomatic edition and when to gesture instead toward the facsimile or leave the property uncaptured. This is a quiet step past Thomas Tanselle’s view (Studies in Bibliography 31 [1978]) that “the editor’s goal is to reproduce in print as many of the characteristics of the document as he can” — the history of digital editions, short as it is, provides plenty of examples to illustrate the proposition that editorial decisions should be driven by the requirements of the material and of the intended users of the edition, not (as in Tanselle’s view) by technology.

Ruecker and Galey on hermeneutics of design

Stan Ruecker (U. Alberta) and Alan Galey (Toronto) gave a paper on “Design as a hermeneutic process: Thinking through making from book history to critical design” which I enjoyed a great deal, and think I learned a lot from, but which appears to defy paraphrase. After discussing the competing views that design should be the handmaiden of content and that design can and should itself embody an argument, they considered several examples, reading each as the embodiment of an argument, elucidating the work and the argument, and critiquing the embodiment. It gave me a pleasure much like that of sitting in on a master class in design.

Huitfeldt, Marcoux, and Sperberg-McQueen on transcription

In the same session (sic), Claus Huitfeldt (Univ. of Bergen), Yves Marcoux (Univ. de Montréal), and I gave our paper on “What is transcription? part 2”; the slides are on the Web.

Rockwell and Day, Memento mori for projects

The session concluded with a presentation by Geoff Rockwell (again! and still at Alberta) and Shawn Day (Digital Humanities Observatory, RIA Dublin) called “Burying dead projects: Depositing the Globalization Compendium”. They talked about some of the issues involved in depositing digital work with archives and repositories, as illustrated by their experience with a several-year project to develop a collection of resources on globalization (the Globalization Compendium of the title). Deposit is, they said, a requirement for all projects funded by the Canadian Social Sciences and Humanities Research Council (SSHRC), and has been for some time, but the repositories they worked with were still working out the kinks in their processes, and their own initial plans for deposit were also subject to change (deposit of the material was, interestingly, from the beginning planned into the project schedule and budget, but in the course of the project they changed their minds about what “the material” to be deposited should include).

I was glad to hear the other talks in the session, but I never did figure out what the program committee thought these three talks had in common.

Caton on transcription and its collateral losses

On the final day of the conference, Paul Caton (National Univ. of Ireland, Galway) gave a talk on transcription, in which he extended the analysis of transcription which Claus Huitfeldt and I had presented at DH 2007 (later published in Literary & Linguistic Computing) to consider information beyond the sequence of graphemes presented by a transcription and its exemplar.

There are a number of methodological and terminological pitfalls here, which mean caution is advised. For example, we seem to have different ideas about the meaning of the term token, which some people use to denote a concrete physical object (or distinguishable part of an object), but which Paul seems to use to denote a particular glyph or graphetic form. And is the uppercase / lowercase distinction of English to be taken as graphemic? I think the answer is yes (changing the case of a letter does not always produce a minimal pair, but it sometimes does, which I think suffices); Paul explicitly says the answer is no.

Paul identifies, under the cover term modality, some important classes of information which are lost by (most) transcriptions: presentation modality (e.g. font shifts), accidental modality (turned letters, malformed letters, broken type, even incorrect letters and words out of sequence), and temporal modality (the effects of time upon a document).

I think that some of the phenomena he discusses can in fact be treated as extensions of the set of types used to read and transcribe a document, but that raises thorny questions to which I do not have the answer. I think Paul has placed his finger upon a sore spot in the analysis of types and tokens: the usual view of the types instantiated by tokens is that we have a flat unstructured set of them, but as the examples of upper- and lower-case H, roman, italic, and bold instances of the word here, and other examples (e.g. long and short s, i/j, v/u) illustrate, the types we use in practice often do not form a simple flat set in which the identity of the type is the only salient information: often types are related in special ways. We can say, for purposes of analysis and discussion, that a set of types which merges upper and lower case, on the one hand, and one which distinguishes them, on the other, are simply two different sets of types. But then, in practice, we operate not with one type system but with several, and the relations among type systems become a topic of interest. In particular, it’s obvious that some sets of types subsume others, and conversely that some are refinements of others. It’s not obvious that subsumption / refinement is the only relation among type sets that is worth worrying about. I assume that phonology has similar issues, both with identifying phonemes and with choosing the level of detail for phonetic transcriptions, but I know too little of phonology to be able to identify useful morals for application here.
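
One way to make the refinement relation concrete is to model a type system, very crudely, as a function from marks to types, and to say that one system refines another if any two marks the first treats as the same type are also treated as the same type by the second. The Python sketch below rests on that assumption; the three toy type systems and the `refines` helper are invented for illustration and are far simpler than anything a real transcription would need.

```python
def case_sensitive(mark):
    """Every distinct character is its own type."""
    return mark


def case_folded(mark):
    """Upper and lower case are merged (str.lower leaves long s untouched)."""
    return mark.lower()


def long_s_merged(mark):
    """Long s and short s are merged; case is still distinguished."""
    return "s" if mark == "ſ" else mark


def refines(fine, coarse, marks):
    """True if marks that `fine` assigns to one type always share a `coarse` type."""
    coarse_type_of = {}
    for mark in marks:
        f, c = fine(mark), coarse(mark)
        # setdefault records the first coarse type seen for this fine type.
        if coarse_type_of.setdefault(f, c) != c:
            return False
    return True


MARKS = "HhSsſ"

print(refines(case_sensitive, case_folded, MARKS))   # distinguishing case refines merging it
print(refines(case_folded, case_sensitive, MARKS))   # but not the other way round
print(refines(case_folded, long_s_merged, MARKS))
print(refines(long_s_merged, case_folded, MARKS))
```

On these five marks the case-sensitive system refines each of the other two, but the case-folded system and the one that merges long and short s are not related by refinement in either direction: each draws a distinction the other ignores. That is at least one small example of a pair of type systems whose relation is something other than subsumption.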

What, no markup theory?

Looking back over this trip report, I notice that I haven’t mentioned any talks on markup theory or practice. Partly this reflects the program: a lot of discussions of markup theory seem to have migrated from the Digital Humanities conference to the Balisage conference. But partly it’s illusory: there were plenty of mentions of markup, markup languages, and so on. Syd Bauman and Dot Porter talked about the challenge of improving the cross referencing of the TEI Guidelines, and many talks mentioned their encoding scheme explicitly (usually the TEI). The TEI appears to be in wide use, and some parts of the TEI which have long been neglected appear to be coming into their own: Jan Christoph Meister of Hamburg and his students have built an annotation system (CATMA) based on TEI feature structures, and at least one other poster or paper also applied feature structures to its problem. Several people also mentioned standoff markup (though when one otherwise interesting presenter proposed using character offsets as the way to point into a base text, I left quietly to avoid screaming at him during the question session).

The hallway conversations were also very rewarding this year. Old friends and new ones were present in abundance, and I made some new acquaintances I look forward to renewing at future DH conferences. The Twitter stream from the conference was also abundant (archive); not quite as active as an IRC channel during a typical W3C meeting, but quite respectable nonetheless.

All in all, the local organizers at the Maryland Institute for Technology in the Humanities, and the program committee, are to be congratulated. Good job!

Reminder - Balisage late-breaking news

Tuesday, June 16th, 2009

[16 June 2009]

Most readers of this blog will (I hope) have seen the Balisage 2009 call for late-breaking news. If you haven’t, go look at it.

The deadline for late-breaking submissions is this Friday; that’s enough time to put together a persuasive proposal if there are recent developments in some field of interest that you are in a position to report on.

See you in Montréal!