Vocabulary specialization and generalization, use and interchange: things our schema languages should make easier than they do

[23 January 2011]

Perhaps it’s because the call for participation in the 2011 Balisage Pre-conference Symposium on XML Document Interchange has just come out, or perhaps it’s for other reasons, but I found myself thinking today about the problem of specialized uses of generalized vocabularies.

There are lots of vocabularies defined in fairly general terms: HTML, TEI, DocBook, the NLM article tag set, you can surely think of plenty yourself.

Often, for a specific purpose in a specific organization or project, it would be handy to have a much tighter, much more specific vocabulary (and thus one that’s semantically richer, easier to process, and easier to validate tightly). For example, consider writing and managing an issues list (or a list of use cases, or any other list containing items of a specialized genre), in a generic vocabulary. It’s easy enough: you just have a section for each issue and within that section you have standard sections on where the issue came from, what part of the project it relates to, its current status, and the history of your work on it. Easy enough. And if you’re the kind of person who writes macros in whatever editor you use, you can write a macro to set up a new issue by adding a section of type ‘issue’ with subsections with appropriate types and headings. But isn’t that precisely what a markup-aware editor typically does? Well, yes, typically: any schema-aware editor can look at the schema, and as soon as you say “add a new issue” it can populate the new element with all of the required subelements. Or, it could, if you had an element type called ‘issue’, with appropriately named sub-elements. If instead you are using a generic ‘div’ element, your editor is going to have a hard time helping you, because you haven’t said what you really mean. You want an issue, but what you’ve said is ‘add a div’.

Some schemas, and some schema languages, try to make this easier by allowing you to say, essentially, that an issue element is a kind of div, and that the content model for issue is a specialization of that for div (and so on). This is better than nothing, but I’m probably not the only person who fails to use these facilities in all the cases where they would be helpful. And when I do use them, I have to extend the standard stylesheets for my generic vocabulary to handle my new elements, because even when the stylesheet language supports the specialization mechanisms of the schema language (as XSLT 2.0 supports element substitution groups in XSD), most stylesheets are not written to take advantage of that support. And if I’m exchanging documents with someone else, they may or may not want to have to deal with my extensions to the schema.
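To make the interchange problem concrete, here is a minimal sketch in Python of mapping a specialized vocabulary down to the generic one and back again. It assumes (purely for illustration) that each specialized element such as issue corresponds to a generic div carrying a type attribute; the element names in SPECIALIZED and the function names generalize and specialize are hypothetical, not taken from any real vocabulary or tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialized element names; each is treated as "a kind of div".
SPECIALIZED = {"issue", "status", "history"}

def generalize(root):
    """Rewrite each specialized element in place as <div type='name'>."""
    for elem in root.iter():
        if elem.tag in SPECIALIZED:
            elem.set("type", elem.tag)
            elem.tag = "div"
    return root

def specialize(root):
    """Inverse mapping: rewrite <div type='name'> as <name> when name is known."""
    for elem in root.iter():
        if elem.tag == "div" and elem.get("type") in SPECIALIZED:
            elem.tag = elem.attrib.pop("type")
    return root

source = "<doc><issue><head>Foo</head><status>open</status></issue></doc>"
generic = ET.tostring(generalize(ET.fromstring(source)), encoding="unicode")
restored = ET.tostring(specialize(ET.fromstring(generic)), encoding="unicode")
print(generic)
print(restored)
```

Under these assumptions the round trip is lossless, so a correspondent who wants only the base vocabulary can be sent the generic form, and the specialized form can be recovered later. A real vocabulary would need more care (namespaces, divs that already carry a type attribute, content-model differences).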

I wonder if we might get a better answer if (a) in our schema languages it were as easy to write a rule for div type='issue' as for issue, and (b) in our validation tools it were as easy to apply multiple grammars to a document as a single grammar, and to specify that the class of documents we are interested in is given by the intersection of the grammars, or by their union, or (for grammars A, B, C) by A ∪ (B ∩ ¬ C). Also (c) if for any schema extension mechanism it were easy to generate a transformation to take documents in the extended schema into the base schema, and vice versa.
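As a rough illustration of point (b), one can model a grammar as nothing more than a predicate on documents and combine predicates with union, intersection, and complement. The Python sketch below does that; the three toy "grammars" and all the function names are invented for the example and say nothing about how NVDL or Relax NG actually work.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# A "grammar" here is just a function that accepts or rejects a document.
Grammar = Callable[[ET.Element], bool]

def union(a: Grammar, b: Grammar) -> Grammar:
    return lambda doc: a(doc) or b(doc)

def intersection(a: Grammar, b: Grammar) -> Grammar:
    return lambda doc: a(doc) and b(doc)

def complement(a: Grammar) -> Grammar:
    return lambda doc: not a(doc)

# Toy grammars (hypothetical): A, B, and C in the formula A ∪ (B ∩ ¬C).
def root_is_doc(doc: ET.Element) -> bool:
    return doc.tag == "doc"

def has_issue_div(doc: ET.Element) -> bool:
    # A rule keyed on div type='issue' rather than on an element name.
    return any(d.get("type") == "issue" for d in doc.iter("div"))

def has_untyped_div(doc: ET.Element) -> bool:
    return any(d.get("type") is None for d in doc.iter("div"))

accepts = union(root_is_doc,
                intersection(has_issue_div, complement(has_untyped_div)))

samples = [
    "<doc/>",
    "<article><div type='issue'/></article>",
    "<article><div type='issue'/><div/></article>",
]
results = [accepts(ET.fromstring(s)) for s in samples]
print(results)
```

The point of the sketch is only that the combinators are trivial once every grammar exposes the same accept/reject interface; the hard part in practice is getting real validators for different schema languages to present such an interface.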

Perhaps NVDL may be in a position to help with (b), though I’ve never learned it well enough to know and it seems to be more heavily freighted with unacknowledged assumptions about schema languages and validation than I’d like.

And perhaps Relax NG already can handle both (a) and (b).

Homework to do.

Giants and the KISS principle

[12 January 2011]

From time to time, my evil twin Enrique runs across passages in literature that seem to him to provide useful illustrations of important principles in information management. I don’t know, maybe he’s saving them up for the next time he teaches a class or something.

The other day, he came by and pointed me to a passage in J.K. Rowling’s Harry Potter and the Order of the Phoenix ([New York]: Scholastic; London: Bloomsbury, 2003):

“In any case, giants … — overload ’em with information an’ they’ll kill yeh jus’ to simplify things.”

“That’s nice,” I said. “The KISS principle in a nutshell.”

“What I want to know,” Enrique said, “is: Where were the giants when they were needed in the [WG name suppressed] working group?”

Introduction to XForms, 14-15 February 2011

[5 January 2011]

It’s official; on Monday and Tuesday, 14 and 15 February, I’ll be teaching a two-day hands-on course on XForms in Rockville, Maryland. Thanks to Mulberry Technologies for allowing me the use of their facilities.

If you use XML seriously, particularly in a multi-person project or organization (but even if you are on your own), and you don’t use XForms, then I think you owe it to yourself to look into the possibilities XForms offers for developing special-purpose editing interfaces for your XML documents. Sometimes, you want a specialized tool to perform one particular task on your documents. Consistency in some matters is a lot easier to achieve if you go over an entire body of material checking just the one thing in all documents. Special-purpose interfaces can help here.

For example: after long drawn-out battles, your project finally agrees on how to capitalize words in section headings: sentence case or title case? It would be nice to have a specialized editor that just showed you the section headings and let you edit them. Or suppose you decide it’s time to make a pass over your entire Web site to improve accessibility. As one task, you want to ensure that all of your images have alt text. That will be easier if you have an interface in which you can pull up each Web page and have a text widget next to each image allowing you to type in the description.

If you’re interested in the course, see the course Web page. If you’re interested in email announcements of courses (and other events at Black Mesa), subscribe to our (new!) announcements list.