Xerophily, a parser for XSD regular expressions

[30 December 2009]

A while back, in connection with the work of the W3C XML Schema working group, I wrote a parser, in Prolog, for the regular-expression language defined by the XSD spec. This has been on the Web for some time, at http://www.w3.org/XML/2008/03/xsdl-regex/.

Not long ago, though, I received an inquiry from Jan Wielemaker, the maintainer of SWI Prolog (and ipso facto someone to whom I owe a great debt, along with every other user of that well maintained system), asking if it wouldn’t be possible to re-issue the code under the Lesser Gnu Public License, so that he could use parts of it in some work he’s doing on support for the simple types of XSD. (Strictly speaking, he could do that under the W3C license, but that would mean he has to manage not one license but two for the same library: too complicated.)

Fine with me, but since I wrote it while working at W3C, I don’t own the copyright. To make a long story short, after some cavilling and objections and assorted mutual misunderstandings, agreement was reached, and W3C gave their blessing.

The LGPL boilerplate assumes that the package being issued has a name, so I have given the XSD regex parser the name Xerophily.

[“Xerophily? What on earth is that?” asked Enrique. The word, I explained to him, denotes the property possessed by plants well adapted to growing in dry, especially hot and dry, conditions. “Well, that makes it apt for code produced in New Mexico, I guess,” he allowed. “And your code, like those plants, is a little scraggly and scrawny. But what name are you going to use for your next program?” “It’s also one of the few words I could find containing X (for XSD), R (for regular expression) and P (for parser, and for Prolog).” “Oh.”]

A word of caution: Xerophily was written to support some very dry work on the grammar of the regex language; it doesn’t really have a user interface; and it doesn’t actually perform regex matching against strings. It just parses the regex into an abstract syntax tree (AST). I do have code, or I think I did some years back, to do the matching, but the shape of the AST has changed since then. In other words, Jan Wielemaker may be the only person on earth who could have any plausible reason for wanting to use or read this code.

But for what it’s worth, as of a few minutes ago, it’s on the Web, under the LGPL, at the address http://www.w3.org/XML/2008/03/xsdl-regex/LGPL/. The W3C-licensed version remains available, at the address given above.