[18 March 2008]
Good news.
The second edition of Parsing techniques: A practical guide, by Dick Grune and Ceriel J. H. Jacobs, has now appeared. I found this out not long ago and ordered a copy, and it was here waiting for me when I came home from my most recent trip, and I’ve now had a day or so to look at it.
The first edition has been out of print for a while, which was frustrating at first since I liked recommending the book. But then the authors got the rights back from the publisher, and made Postscript and PDF of the first edition available on Dick Grune’s web site. A few years ago Dick Grune told me a second edition was planned, and I have been waiting rather impatiently ever since.
It was worth the wait. Everything that made the first edition so useful and fun to read has been retained.
Well, almost; there is one minor exception. The second edition has 662 pages, compared to the first edition’s 322, and while Springer has done a very good job of keeping the book compact and handy, it does weigh more, so that when I sit on the couch to read it, eventually the weight makes my little finger hurt. But the new material is, well, worth the weight. (Sorry!)
The best way to give a feel for the book is to quote from the preface to the first edition:
This book does not address a strictly uniform audience. On the contrary, while writing this book, we have consistently tried to imagine giving a course on the subject to a diffuse mixture of students and faculty members of assorted faculties, sophisticated laymen, the avid readers of the science supplement of the large newspapers, etc. Such a course was never given; a diverse audience like that would be too uncoordinated to convene at regular intervals, which is why we wrote this book, to be read, studied, perused or consulted wherever or whenever desired.
For myself, I’ve read, studied, and perused the first edition repeatedly since I bought the first edition at Kroch’s and Brentano’s in downtown Chicago in 1992 (back when their computer science department had computer science books, not just dummies books about popular computer programs). And I’ve consulted it for reference material many times.
They give a less formal account of automata and the key theorems (pumping lemma and so on) than, say, Hopcroft and Ullman, but nevertheless a clear one, and incomparably more accessible. And they give full, interesting accounts of a lot of topics not well covered in other books on parsing I’ve looked at. Not everyone knows that parsing does not necessarily proceed left to right, but can run right to left, or even non-directionally, but anyone who has read this book will have a very clear understanding of the matter. (This came up once in connection with an obscure passage of ISO 8879, which defines SGML: the spec says that certain attributes default to the “most recently specified” value. But most recent on what time axis? It turned out that the spec was written in the belief that constructs in a language must necessarily be recognized left to right; the editor expressed disbelief when I told him that other recognition orders were possible.)
The first edition had a very good treatment of van Wijngaarden grammars, affix grammars, and attribute grammars; the second edition expands the treatment into a full chapter on non-Chomsky grammars which also describes tree-adjoining grammars (which I now understand in principle, though I still don’t know how to write a parser for them), coupled grammars, ordered grammars, ‘recognition systems’, and boolean grammars. I had just noticed that there’s no discussion of adaptive grammars and was feeling just a little smug (“wow, there’s an unusual approach to parsing that I know about that Grune and Jacobs don’t mention; I must be hot stuff”) when I found a paragraph that says
One interesting possibility not explored here is to modify the CF grammar under the influence of parser actions. This leads to dynamic grammars, also called modifiable grammars, or adaptable grammars.
And the web-based bibliography for the second edition lists publications on the topic going back to 1963.
The book is much more than “a reservation for endangered parsers” (in the words of Kees van Reeuwijk, quoted in the preface): the descriptions of the standard algorithms are clear and easy to follow, and the authors provide a clear sense of the varying points of view and value schemes of mathematicians studying formal languages, computer scientists, and users from other fields seeking to apply grammars and parsing techniques in those fields.
If you have any interest at all in formal languages, grammars, or parsing, this book should be on your shelf.
Hello!
I accidentally find your klog (Google reader recomend it). You
write about xml and also in one of posts mention that you use
emacs, what’s why I ask.
I’m trying to write mode for emacs for testing web services (send
them soap request and get response). It will work so: I give
emacs wsdl file or url to wsdl, emacs parses it, asks me what
operation of what port I want to invoke, creates request and
sends it.
I parse wsdl (and xsds it uses) using nxml-parse-file, it returns
lisp structure that represents xml (I think it is similiar to DOM
tree). My problem is creating xml request. I have create instance
document that is valid according to used xsd scheme (one of it’s
element of type).
My current approach is recursively traversing lisp-structure that
represents xsd: find element instance of which I want to build,
find it’s type, and repeat that procedure for every children
elements of this type, until I will reach predefined xsd
types. What do you think, is it right approach? Can “Parsing
Techniques” helps me somehow? Will it be better to parse xsd by
hand instead of using tree from nxml?