Grune and Jacobs, Parsing Techniques, Second Edition

[18 March 2008]

Good news.

The second edition of Parsing techniques: A practical guide, by Dick Grune and Ceriel J. H. Jacobs, has now appeared. I found this out not long ago and ordered a copy; it was waiting for me when I came home from my most recent trip, and I’ve now had a day or so to look at it.

The first edition has been out of print for a while, which was frustrating at first since I liked recommending the book. But then the authors got the rights back from the publisher and made PostScript and PDF versions of the first edition available on Dick Grune’s web site. A few years ago Dick Grune told me a second edition was planned, and I have been waiting rather impatiently ever since.

It was worth the wait. Everything that made the first edition so useful and fun to read has been retained.

Well, almost; there is one minor exception. The second edition has 662 pages, compared to the first edition’s 322, and while Springer has done a very good job of keeping the book compact and handy, it does weigh more, so that when I sit on the couch to read it, eventually the weight makes my little finger hurt. But the new material is, well, worth the weight. (Sorry!)

The best way to give a feel for the book is to quote from the preface to the first edition:

This book does not address a strictly uniform audience. On the contrary, while writing this book, we have consistently tried to imagine giving a course on the subject to a diffuse mixture of students and faculty members of assorted faculties, sophisticated laymen, the avid readers of the science supplement of the large newspapers, etc. Such a course was never given; a diverse audience like that would be too uncoordinated to convene at regular intervals, which is why we wrote this book, to be read, studied, perused or consulted wherever or whenever desired.

For myself, I’ve read, studied, and perused the first edition repeatedly since I bought it at Kroch’s and Brentano’s in downtown Chicago in 1992 (back when their computer science department had computer science books, not just dummies books about popular computer programs). And I’ve consulted it for reference material many times.

They give a less formal account of automata and the key theorems (pumping lemma and so on) than, say, Hopcroft and Ullman, but nevertheless a clear one, and incomparably more accessible. And they give full, interesting accounts of a lot of topics not well covered in other books on parsing I’ve looked at. Not everyone knows that parsing does not necessarily proceed left to right, but can run right to left, or even non-directionally; anyone who has read this book, however, will have a very clear understanding of the matter. (This came up once in connection with an obscure passage of ISO 8879, which defines SGML: the spec says that certain attributes default to the “most recently specified” value. But most recent on what time axis? It turned out that the spec was written in the belief that constructs in a language must necessarily be recognized left to right; the editor expressed disbelief when I told him that other recognition orders were possible.)

The first edition had a very good treatment of van Wijngaarden grammars, affix grammars, and attribute grammars; the second edition expands the treatment into a full chapter on non-Chomsky grammars which also describes tree-adjoining grammars (which I now understand in principle, though I still don’t know how to write a parser for them), coupled grammars, ordered grammars, ‘recognition systems’, and boolean grammars. I had just noticed that there’s no discussion of adaptive grammars and was feeling just a little smug (“wow, there’s an unusual approach to parsing that I know about that Grune and Jacobs don’t mention; I must be hot stuff”) when I found a paragraph that says

One interesting possibility not explored here is to modify the CF grammar under the influence of parser actions. This leads to dynamic grammars, also called modifiable grammars, or adaptable grammars.

And the web-based bibliography for the second edition lists publications on the topic going back to 1963.

The book is much more than “a reservation for endangered parsers” (in the words of Kees van Reeuwijk, quoted in the preface): the descriptions of the standard algorithms are clear and easy to follow, and the authors provide a clear sense of the varying points of view and value schemes of mathematicians studying formal languages, computer scientists, and users from other fields seeking to apply grammars and parsing techniques in those fields.

If you have any interest at all in formal languages, grammars, or parsing, this book should be on your shelf.

An odd fact about fifth powers

[27 December 2007; crucial typo corrected, graphics added 29 December]

Over Christmas I got a copy of Øystein Ore’s book Number theory and its history (New York: McGraw-Hill, 1948; reprint New York: Dover, 1988).
In section 2-3, exercise 5 reads

5. Prove that in the decadic number system [i.e. using decimal numerals] the fifth power of any number has the same last digit as the number itself.

This is straightforward enough, if you’ve just read the preceding section. (My exposition isn’t as good as Ore’s, so I won’t try to explain it.)

But now consider a related problem.

For numbers n whose decimal representation ends in a particular digit d, we can draw a little directed graph that makes it easy to calculate the final digit of any number multiplied by n. Each graph will have ten nodes, labeled 0 through 9, and there will be an arc from node i to node j if and only if multiplying n by a number ending in i produces a number ending in j.

For the digit 3, for example, we’ll have the arcs 0 → 0, 1 → 3, 2 → 6, 3 → 9, 4 → 2, 5 → 5, 6 → 8, 7 → 1, 8 → 4, and 9 → 7. Like this:

Exponentiation graph for the digit 3

And for the digit 5, there will be arcs from each odd-numbered state to state 5, and from each even-numbered state to state 0. And so on.
Like this:

Exponentiation graph for the digit 5

There ought to be a name for these graphs, but I don’t know of one. I’ll call them multiplication graphs.
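Since an arc i → j exists exactly when i times the final digit ends in j, these graphs are easy to build mechanically. Here is a small sketch in Python (not from the post; the function name is my own):

```python
def multiplication_graph(d):
    # The multiplication graph for final digit d: an arc i -> j exists
    # exactly when a number ending in i, multiplied by a number ending
    # in d, produces a number ending in j.  That is, j = (i * d) mod 10.
    return {i: (i * d) % 10 for i in range(10)}

# The arcs listed above for digit 3: 1 -> 3, 4 -> 2, 7 -> 1, and so on.
print(multiplication_graph(3))
# {0: 0, 1: 3, 2: 6, 3: 9, 4: 2, 5: 5, 6: 8, 7: 1, 8: 4, 9: 7}

# Digit 5: every odd state goes to 5, every even state goes to 0.
print(multiplication_graph(5))
# {0: 0, 1: 5, 2: 0, 3: 5, 4: 0, 5: 5, 6: 0, 7: 5, 8: 0, 9: 5}
```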

Now, consider what problem 5 means in terms of these multiplication graphs. For the given number, we pick the appropriate graph. If we raise the number to the power 0, the result is a number whose decimal representation ends in ‘1’. (Because it’s always the number 1.) Call the state labeled ‘1’ the start state. A path of length 1 will always take us to the state named for the digit whose multiplication properties the graph represents: in graph 3, a path of length 1 takes us to state 3; in graph 7, to state 7. Call this state the ‘characteristic state’.

If we raise the number to the power n, the result will have a decimal representation ending in the digit we reach by following a path through the graph, beginning at the start state and having exactly n steps. Final digit of the square? Follow a path two steps long. Final digit of the fifth power? Follow a path five steps long.
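The path-following procedure can be sketched in a few lines of Python (again my own names, not the post’s; the three-argument form of the built-in pow gives an independent check):

```python
def final_digit_of_power(n, k):
    # Follow a path of k steps through n's multiplication graph,
    # starting at the start state 1 (since n**0 == 1).  Each step
    # multiplies the current final digit by n's final digit.
    d = n % 10
    state = 1
    for _ in range(k):
        state = (state * d) % 10
    return state

# Final digit of the fifth power of 7: a path five steps long.
print(final_digit_of_power(7, 5))   # 7
print(pow(7, 5, 10))                # 7, the same answer the long way
```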

So: in terms of multiplication graphs, exercise 5 amounts to proving that in each graph, a path of length 5 takes you to the same state as a path of length 1. Or, equivalently, that there is a cycle of length 4, or 2, or 1, beginning at the characteristic state of the graph.
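Exercise 5, restated this way, is easy to check exhaustively, since there are only ten graphs. A quick Python sketch (my own formulation):

```python
def step(state, d):
    # One arc in the multiplication graph for final digit d.
    return (state * d) % 10

# In every graph, a path of length 5 from the start state ends in the
# same state as a path of length 1 -- which is exercise 5 in graph form.
for d in range(10):
    state = 1
    for _ in range(5):
        state = step(state, d)
    assert state == step(1, d)
print("exercise 5 holds in all ten graphs")
```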

It’s clear that there must be cycles in the graphs, and that the cycles can’t have length greater than 10, since there are only ten states. It’s clear, after a moment’s thought, that any cycle must either hit only even numbers or only odd numbers (if the digit is odd, each arc preserves parity; if it is even, every arc leads to an even state), and so any cycle must have length less than or equal to 5.

In fact, however, the cycles are all of length 1 (in graphs 0, 1, 5, and 6), 2 (graphs 4 and 9), or 4 (graphs 2, 3, 7, and 8).
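That tally can be confirmed by walking each graph from its characteristic state until it returns there. A sketch in Python (the function name is mine):

```python
def cycle_length(d):
    # Walk the multiplication graph for digit d, starting one step past
    # the characteristic state d, until we come back around to d.
    state = (d * d) % 10
    length = 1
    while state != d:
        state = (state * d) % 10
        length += 1
    return length

print({d: cycle_length(d) for d in range(10)})
# {0: 1, 1: 1, 2: 4, 3: 4, 4: 2, 5: 1, 6: 1, 7: 4, 8: 4, 9: 2}
```

Every length is 1, 2, or 4; none is 5.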

Why are there no cycles of length 5?