Daylight analysis

[14 July 2009, Happy Bastille Day]

In interoperability testing, test cases are particularly interesting when they elicit different behaviors from different processors.

In revision of a grammar, strings are of particular interest when they are grammatical according to one version of the grammar, but not according to the other. Either the change was intended (and the string provides an example) or it was not intended (and the string exhibits a problem). Differences in the structure of the parse trees produced by the different grammars may be interesting, too.

Several working groups I’ve served on have spent time worrying about whether our spec’s rules for handling (especially escaping and unescaping) URIs and IRIs should align with the rules specified by a variety of other specs (HTML 4.01, XSLT 1.0, any of the various RFCs which have at various times been the authoritative source of information, any of the various internet drafts which have later turned into, or failed to turn into, RFCs, etc. etc. ad luxuriam). At any given time, it would have been really really useful to have an answer to the question “Do these two different formulations of the rules ever actually produce different results? Or are they just different ways of saying the same thing? And if they do ever produce different results, are the cases involved already so pathological on other grounds that we don’t actually mind?”
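To make the question concrete, here is a minimal sketch in Python. The two escaping rules below are hypothetical stand-ins invented for illustration; they are not the rules of HTML 4.01, XSLT 1.0, or any RFC. The only library call used is urllib.parse.quote from the standard library, and the names rule_a, rule_b and escaping_daylight are my own.

```python
from urllib.parse import quote

# Hypothetical formulation A: escape everything except unreserved
# characters and "/".
def rule_a(s):
    return quote(s, safe="/")

# Hypothetical formulation B: additionally leave reserved characters
# and existing "%" signs alone.
def rule_b(s):
    return quote(s, safe="/:?#[]@!$&'()*+,;=%")

def escaping_daylight(inputs):
    """Return (input, result_a, result_b) for each input on which
    the two formulations produce different results."""
    found = []
    for s in inputs:
        a, b = rule_a(s), rule_b(s)
        if a != b:
            found.append((s, a, b))
    return found

if __name__ == "__main__":
    samples = ["abc", "a b", "a%20b", "x?y=1"]
    for s, a, b in escaping_daylight(samples):
        print(repr(s), "->", repr(a), "vs", repr(b))
```

Inputs on which the two functions agree answer the first half of the question for those inputs; the ones that come back from escaping_daylight are the ones a working group would then have to judge as pathological or not.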

These cases have in common that they exhibit discrepancies among things that (other things being equal) are (or might be) expected to be indistinguishable. That is, they document the daylight (which my Oxford dictionary glosses as “visible distance between one … thing and another”) between things that shouldn’t have daylight between them.

I’m coming to believe that during the development of a spec, daylight analysis — seeking and finding instances of daylight between things, or seeking and failing to find any daylight — may be the most important function of test cases. If not the most important, then surely a very important use.

If this conjecture is true, then it ought to have implications for judging the relative effectiveness of different methods for constructing collections of test cases. Traditional testing can be measured by how many bugs it finds, at what cost, and techniques for generating test cases are valued high or low depending on how likely they seem to be to find new bugs. For spec development, the utility of a test is tied to the likelihood of its finding daylight between the old and the new formulations of some rule.

Hmm. Is that a difference or not? I suppose a bug can be regarded as an instance of daylight between the implementation and the spec it’s supposed to be implementing. So perhaps bug-finding is just a special case of daylight analysis.

Of course, each revision of a rule or a grammar provides a new opportunity for daylight to arise and be found. It seems to follow that you may need new test cases for each revision. Automated methods that allow quick generation of relevant new test cases would be particularly useful.

Daylight analysis is not the only useful goal of test generation during spec development. Tests that illustrate the intended behavior of a rule are useful for clarification. Call these exemplary tests.

And for checking that a working group understands a problem area well, and that the rule for that problem is well formulated, it can be useful to construct tests with randomly selected properties, just to check to see that everyone agrees on how the rule should handle them. Call these sanity checks. Random selection of properties in sanity checks helps ensure that you don’t inadvertently feed your rule only ‘sensible’ examples which happen not to exercise its flaws. I once used an Alloy model to make twelve very simple test cases for the schema composition part of XSD; nine of them turned out to pose questions to which the spec’s answer is not obvious. (The twelve were a bit redundant: eliminating the redundancies, the twelve auto-generated examples boiled down to four non-redundant examples, for one of which the spec provides a clear analysis.) If I had limited myself to sensible examples I would almost certainly have missed most or all of the problematic cases.

Daylight analysis as a goal works well with random generation of test cases, because it helps deal with one of the great problems of random generation: most randomly generated test cases (at least, the ones I have been able to generate using random means) are rather boring. The goal of finding daylight provides a simple filter: a randomly generated test case is interesting if it finds daylight between the two things you are comparing; it is uninteresting otherwise. In realistic cases, uninteresting test cases will overwhelmingly outnumber interesting ones, but if you can apply an automated filter (parse with grammar G1, parse with grammar G2, compare the results; if they differ, you have found daylight), then you can keep a few uninteresting test cases (just in case) but throw most of them away and focus on interesting ones.
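The filter just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "grammars" are toy regular expressions made up for the purpose (G2 differs from G1 by also allowing an underscore), the comparison is on acceptance only and not on parse-tree structure, and the function names are hypothetical.

```python
import random
import re

# Hypothetical stand-ins for two versions of a grammar (not from any spec):
# G1 allows letters and digits after the first letter; G2 also allows "_".
G1 = re.compile(r"[a-z][a-z0-9]*")
G2 = re.compile(r"[a-z][a-z0-9_]*")

def accepts(grammar, s):
    """True if the whole string is grammatical according to `grammar`."""
    return grammar.fullmatch(s) is not None

def random_string(rng, alphabet="ab1_", max_len=4):
    """Generate one random candidate test case."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))

def find_daylight(n_cases, seed=0, keep_boring=3):
    """Sort random test cases into interesting ones (the grammars disagree)
    and a small retained sample of uninteresting ones."""
    rng = random.Random(seed)
    interesting, boring = set(), []
    for _ in range(n_cases):
        s = random_string(rng)
        if accepts(G1, s) != accepts(G2, s):
            interesting.add(s)      # daylight found
        elif len(boring) < keep_boring:
            boring.append(s)        # keep a few, just in case
    return sorted(interesting), boring

if __name__ == "__main__":
    interesting, boring = find_daylight(1000)
    print("daylight:", interesting)
    print("kept just in case:", boring)
```

With real grammars the accepts function would be replaced by calls to two parsers, and the comparison could be extended to the parse trees they produce; the shape of the loop stays the same.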

Mainframe terminal rooms and the oral tradition

[7 July 2009]

A number of XML experts I know use Emacs for editing XML, employing either James Clark’s nxml mode or Lennart Staflin’s psgml mode. But few people who don’t already know Emacs are eager to learn it.

My evil twin Enrique suggested a reason: “In the old days,” (he means thirty years ago, when he first learned to use computers), “using a computer mostly meant using a mainframe. Which meant, on most university campuses, using a public terminal room. Which meant there were usually other people around who might be able to help figure out how to make the editor do something. Emacs was able to spread widely in that culture because the written documentation was not the only available source of information. (Did Emacs even have written documentation in those days?) Emacs, and a lot of other tools, were propagated by oral tradition.

“Nowadays, however, the oral traditions of the public terminal room are mostly dead. What the user cannot figure out how to use from the user interface and (perhaps) a glance at the documentation, might as well not be in the program. Fewer and fewer users will trouble to learn Emacs.

“I predict that when the people who first learned computing in a mainframe terminal room are dead, Emacs will be effectively dead, too. Its natural method of propagation is by looking over someone’s shoulder at what they are doing and asking ‘How did you do that?’ That doesn’t happen when computing almost always happens in private places.

“R.I.P., Emacs,” he intoned mournfully. “And probably TeX and LaTeX, too.”

“Well, hang on,” I said. “Neither Emacs nor TeX is dead yet.”

“Maybe not, but it’s only a matter of time. They’ll end up in the Retro-Computing Museum.” I could have sworn I saw a tear in his eye.

“But, you know, it’s only a matter of time for all of us. And besides, you’re wrong in at least some ways. I did indeed spend the first few years of my computing life haunting university terminal rooms. I got a lot of help from other people, and I passed it on. But I didn’t use TeX or Emacs until years later. The oral traditions of the terminal room, if they ever actually existed, had nothing to do with it. Both Emacs and TeX are perfectly capable of acquiring new users without oral transmission.”

He looked up. “You mean, there’s hope yet?”

“There’s always hope. But no, I’m still not going to help you debug that self-modifying 360 Assembler program you brought over. I’ve got work to do.”