Grail for regular languages

[11 December 2009]

Every now and then — not constantly, but recurrently — I experience a strong need to have a running copy of Grail, a software package first written by Derick Wood and Darrell Raymond and described by its documentation as “a symbolic computation environment for finite-state machines, regular expressions, and other formal language theory objects.”

Among other things, Grail is handy for answering questions about the equivalence or non-equivalence of regular expressions, or about subset/superset relations holding between the languages recognized by them. A few years ago, for example, the W3C XML Schema Working Group found itself in possession of two different descriptions of the lexical space of the XSD duration type. The working group wished, not unreasonably, to check that the two really were equivalent.

The first description provided three regular expressions, and said the lexical space of duration included all the strings which matched all three expressions:

  • -?P([0-9]+Y)?([0-9]+M)?([0-9]+D)?(T([0-9]+H)?([0-9]+M)?([0-9]+(.[0-9]+)?S)?)? (strings in which the fields of an ISO 8601 duration appear in the correct order, and in which each field appears only if it has at least one digit present)
  • .*[YMDHS].* (strings in which at least one field is present)
  • [^T]+(T[^HMS]+[HMS].*)? (if the character T appears, it must be followed by one of the time-related fields)

The second description translated the context-free grammar into regular-expression form (I’ve introduced white space for legibility):

-?P(([0-9]+Y)([0-9]+M)?([0-9]+D)?
(T(([0-9]+H)([0-9]+M)?([0-9]+(.[0-9]+)?S)?
|([0-9]+M)?([0-9]+(.[0-9]+)?S)?
|([0-9]+(.[0-9]+)?S)))?
|([0-9]+M)([0-9]+D)?
(T(([0-9]+H)([0-9]+M)?([0-9]+(.[0-9]+)?S)?
|([0-9]+M)?([0-9]+(.[0-9]+)?S)?
|([0-9]+(.[0-9]+)?S)))?
|([0-9]+D)?
(T(([0-9]+H)([0-9]+M)?([0-9]+(.[0-9]+)?S)?
|([0-9]+M)?([0-9]+(.[0-9]+)?S)?
|([0-9]+(.[0-9]+)?S)))?
|T(([0-9]+H)([0-9]+M)?([0-9]+(.[0-9]+)?S)?
|([0-9]+M)?([0-9]+(.[0-9]+)?S)?
|([0-9]+(.[0-9]+)?S)))

Easy enough to eyeball, for some people, I guess, but the working group wanted a more reliable method.

After a few hours trying vainly to compile Grail for my Linux box, I found an RPM that worked for me, and in ten minutes or so I had used Grail to establish that the two descriptions are equivalent.

Today I realized that another problem I face could best be solved by using Grail, but I no longer have a Linux box (and have not, in any case, found that old RPM). Grail 2.5 is dated March 1996, and the C++ in which it is written does not seem to please GCC 4.0.1. Grail+ 3.0, a successor project in other hands, may have been touched as recently as 2002 or 2004, but most of the dates appear to be in summer or fall 1998. GCC doesn’t like it, either.

So I have thus far been unable to recompile this very helpful tool.

If anyone out there knows of anyone who has either massaged the source of Grail into a form more like what modern C++ compilers will compile, or found out what combination of compile-time flags will persuade GCC to put itself in a more forgiving frame of mind and compile the thing, please get in touch. (And no, -Wno-deprecated does not suffice to do the trick.)

And any C++ proficients looking for interesting and useful projects to undertake could do a lot worse for themselves and for the world than to bring Grail into the twenty-first century.