Grune and Jacobs, Parsing Techniques, Second Edition

[18 March 2008]

Good news.

The second edition of Parsing techniques: A practical guide, by Dick Grune and Ceriel J. H. Jacobs, has now appeared. I found this out not long ago and ordered a copy, and it was here waiting for me when I came home from my most recent trip, and I’ve now had a day or so to look at it.

The first edition has been out of print for a while, which was frustrating at first since I liked recommending the book. But then the authors got the rights back from the publisher, and made Postscript and PDF of the first edition available on Dick Grune’s web site. A few years ago Dick Grune told me a second edition was planned, and I have been waiting rather impatiently ever since.

It was worth the wait. Everything that made the first edition so useful and fun to read has been retained.

Well, almost; there is one minor exception. The second edition has 662 pages, compared to the first edition’s 322, and while Springer has done a very good job of keeping the book compact and handy, it does weigh more, so that when I sit on the couch to read it, eventually the weight makes my little finger hurt. But the new material is, well, worth the weight. (Sorry!)

The best way to give a feel for the book is to quote from the preface to the first edition:

This book does not address a strictly uniform audience. On the contrary, while writing this book, we have consistently tried to imagine giving a course on the subject to a diffuse mixture of students and faculty members of assorted faculties, sophisticated laymen, the avid readers of the science supplement of the large newspapers, etc. Such a course was never given; a diverse audience like that would be too uncoordinated to convene at regular intervals, which is why we wrote this book, to be read, studied, perused or consulted wherever or whenever desired.

For myself, I’ve read, studied, and perused the first edition repeatedly since I bought the first edition at Kroch’s and Brentano’s in downtown Chicago in 1992 (back when their computer science department had computer science books, not just dummies books about popular computer programs). And I’ve consulted it for reference material many times.

They give a less formal account of automata and the key theorems (pumping lemma and so on) than, say, Hopcroft and Ullman, but nevertheless a clear one, and incomparably more accessible. And they give full, interesting accounts of a lot of topics not well covered in other books on parsing I’ve looked at. Not everyone knows that parsing does not necessarily proceed left to right, but can run right to left, or even non-directionally, but anyone who has read this book will have a very clear understanding of the matter. (This came up once in connection with an obscure passage of ISO 8879, which defines SGML: the spec says that certain attributes default to the “most recently specified” value. But most recent on what time axis? It turned out that the spec was written in the belief that constructs in a language must necessarily be recognized left to right; the editor expressed disbelief when I told him that other recognition orders were possible.)

The first edition had a very good treatment of van Wijngaarden grammars, affix grammars, and attribute grammars; the second edition expands the treatment into a full chapter on non-Chomsky grammars which also describes tree-adjoining grammars (which I now understand in principle, though I still don’t know how to write a parser for them), coupled grammars, ordered grammars, ‘recognition systems’, and boolean grammars. I had just noticed that there’s no discussion of adaptive grammars and was feeling just a little smug (“wow, there’s an unusual approach to parsing that I know about that Grune and Jacobs don’t mention; I must be hot stuff”) when I found a paragraph that says

One interesting possibility not explored here is to modify the CF grammar under the influence of parser actions. This leads to dynamic grammars, also called modifiable grammars, or adaptable grammars.

And the web-based bibliography for the second edition lists publications on the topic going back to 1963.

The book is much more than “a reservation for endangered parsers” (in the words of Kees van Reeuwijk, quoted in the preface): the descriptions of the standard algorithms are clear and easy to follow, and the authors provide a clear sense of the varying points of view and value schemes of mathematicians studying formal languages, computer scientists, and users from other fields seeking to apply grammars and parsing techniques in those fields.

If you have any interest at all in formal languages, grammars, or parsing, this book should be on your shelf.