Two styles of Monte Carlo testing for grammars

[8 August 2008]

In an earlier post I talked about generating test cases for grammars, and asked a number of questions. I’ve been meaning to get back to the topic; this afternoon I revisited some work I did with the XSD 1.1 grammar for regular expressions, and so the topic is again on my mind.

Pseudo-random strings

In the past, I’ve used Monte Carlo test generation to test grammars, in particular to find strings that distinguish between grammars. For example, given two grammars for the lexical space of a particular datatype, can we find strings that are accepted by one grammar and rejected by the other?

But random strings can also be used just to understand a given grammar better.

At one point in our work on XSD 1.1, for example, I proposed that in order to have a looser coupling between XSD and the RFCs that define URIs, we should just accept any string at all as a type-valid URI, much the same way that HTML browsers ‘accept’ any string as the value of an href attribute. They may not understand it, but they don’t raise an error message. The rationale for my proposal was that in any case, the generic grammars defined by RFC 3986 and its predecessors are so vague, so general, and so forgiving, that it wasn’t really very clear that they outlawed very much at all. I’ve heard people who know URIs well say that really the only thing strings that grammar won’t accept are strings with two hash signs. Others, equally well informed, say that those are legal, too.

The scheme-specific grammars (e.g. the grammar for http: URLs) are much more useful in catching errors, I argued. But we don’t require that they be checked, and don’t really want to make them part of the definition of type validity for the anyURI type. (I think a good validator should check anyURI value against the scheme-specific grammars and issue warnings if there are problems, but I don’t know any that do that.) So, I argued, the XSD spec was giving the illusion of useful value checking, but not the reality. Others in the working group maintained that the generic grammar does perform some useful checks, and wanted to retain it.

Generating ten thousand or so strings of twenty or forty characters each, with each character a randomly chosen character in the printable range of the ASCII character set, persuaded me that I was wrong. Even if the only useful thing the generic URI grammar does is eliminate strings containing forbidden characters, it rejects well over half the random strings I generated.

So I abandoned my proposal, and in XSD 1.1, type validity for anyURI does involve checking the literal against the grammar given in the RFC.

More interesting pseudo-random strings

One problem, for a human looking at the data, is that if you generate random strings as I did, by generating twenty random numbers between 32 and 127, and then concatenating the corresponding characters, then you generate an awful lot of uninteresting test cases, and you can go a long long time before you actually exercise a specific part of your grammar.

So when I faced the task of generating test cases for XSD’s regular-expression grammar, I wanted some method that would generate more interesting test cases, test cases more likely to exercise all the parts of the grammar. In particular, I wanted to make sure we got good coverage of the more obscure corners of the character-class constructs. These have relatively elaborate substructures and the likelihood of generating a complex character-class expression using randomly chosen characters was way too low.

To make a long story short, I think I found a way. This is what I did.

  1. I built a set of strings in the following way.
    1. For each non-terminal in the grammar, I generated one or more strings that matched that non-terminal, and added them to the set.
    2. For each literal string mentioned in the grammar, I included that literal string in the set. For example, the first production rule in the regex grammar is
      regExp ::= branch ( '|' branch )* 

      So I added “|” to the set of strings.

    3. For each class of terminal symbol, I included one or two examples in the set of strings.

    For the regex grammar, I ended up with a set of a hundred or so strings, from “a” to “(” to “[p{IsBasicLatin}-[aeiou]]”.

  2. I put the set of strings into an array, in no particular order.
  3. Then, I generated sequences of five or ten random numbers between 0 and the size of my set of string. For example, “[92, 21, 48, 6, 41, 69, 47”.
  4. Then I used each number to look up the corresponding string in the array, and concatenated the results. For the random numbers just given, this produced the strings ”}”, ”a”, ”2”, ”)”, ”|xyz”, ”|”, ”[\p{IsBasicLatin}-[aeiou]]”, and ”(”.
    Together, these yield the test string “ }a2)|xyz|[\p{IsBasicLatin}-[aeiou]](

The test cases generated by this modified method (random selection of substrings from a set of strings known to be grammatically interesting) exercised the grammar far better than test cases generated by the random-character method.

If I were a better mathematician, I might be able to characterize the difference. But in a purely informal way, I think one can think of the production rules of any grammar as a set of regular expressions over an alphabet consisting of both the terminal and the non-terminal symbols of the grammar, not just of characters. In any normal grammar, for most non-terminals some or most random character sequences will fail to match the non-terminal. Test cases generated by selecting characters at random will have the same effect as selecting items randomly from the terminal + non-terminal vocabulary, but with a bias (typically a very strong bias) against selecting any non-terminal symbol. By making a set of substrings as described, the second Monte Carlo method evens the odds a bit, and you get sequences that are much more likely to match the regular expressions over the terminal + non-terminal vocabulary, or to come very close to matching them. And that combination (matching all the productions, and coming close to matching each of the productions, but missing) is as close as I can come to characterizing the set of ‘interesting’ test cases.

Certainly, it was not necessary to generate tens of thousands of test cases in order to find examples that were valid against one form of the grammar, and invalid against a different form of the grammar.

If anyone wants to look at this quick and dirty test generator, the code for the test generator is at http://www.w3.org/XML/2008/03/xsdl-regex/testgenerator1.pl; the parser for the grammars I was testing is in the other files of the same directory.

4 thoughts on “Two styles of Monte Carlo testing for grammars

  1. ‘Sneakily random’ Michael. Quite widely applicable too, since so many things are constrained in the same way. Did you settle on any regular approach to selecting your test cases for each of the non-terminals? Or was that purely random (number and selection).

    btw, thanks for the recommendation (art of software testing). A great read.

    DaveP

  2. Thank you, Dave. I’m glad you liked it.

    Ideally, I would like to have some systematic approach to generating the strings used for each non-terminal, in the array from which the random strings will later be built. Perhaps one way would be to use a random string generator (i.e. randomly selected characters) and throw random strings at the non-terminal until you get a match, or two matches, or three, and put those randomly generated strings into the array.

    But in this case, I had no particular system: I just went through the grammar and made up strings by hand for each non-terminal. Otherwise, I think it might have taken a long time before the random-string generator emitted a string that matched the character-class subtraction rule. (Hmm. There might be ways of bootstrapping the process, to make this step quicker; I’ll have to think about that.)

    I did check to make sure they matched the non-terminal I was trying to make them match, but because I made them up there is always a possibility that these manually constructed strings fail to exercise the grammar the same way more systematically random strings would do. The wonderful advantage of randomly generated test strings is that random number generators have no mercy, and (if the random number generator is a good one) don’t get into mental ruts the way humans do.

  3. Pingback: Messages in a bottle » Blog Archive » More questions about Monte Carlo test-case generation

  4. Pingback: User links about "regex" on iLinkShare

Comments are closed.