[30 June 2009]
Jacek Kopecky has commented on my earlier posting about Brittleness, regression testing, and formal methods. I started to reply in a further comment, but my response has gotten a bit long, so I’m making a separate post out of it.
Examples needed
Jacek writes:
Dear Michael, you are raising a good point, but you’re doing it the same way as the “formal methods proponents”: you don’t show concrete examples.
Dear Jacek, you are quite correct: I don’t have persuasive examples. I have a gut feeling that formal methods would be helpful in spec development, but I can’t point to convincing examples of places it has helped. Having really convincing examples seems to require first persuading working groups to use formal methods, which is (as you suggest) not easy in the absence of good examples. It’s also a hard sell if the spec in question is at all far along in its development, because it appears that the first thing the group has to do is stop everything else to build a formal model of the current draft. No working group is going to want to do that.
I did embark once on an effort to model the XProc spec in Alloy (also in HTML), in an attempt to capture a spec which was still being worked on, and perhaps persuade the WG to apply formal methods in its work going forward. The example seems to illustrate both the up and the down of using formal methods on a spec. On the up side: the work I did raised a number of questions about the properties of pipeline ‘components’ as they were defined in the then-current draft and may have helped persuade some in the WG to support the proposal to eliminate the separate ‘component’ level. But on the down side: the draft was moving faster than I could move: before I was able to finish the model, submit the paper formally to the WG, and propose that the WG use the model as part of our on-going design process, the draft had been revised in such a way as to make much of the model inaccurate. No one in a WG really wants to consider proposals based on an outdated draft, so no one really wanted to look at or comment on the fragmentary model I had produced. I could not keep up, and eventually abandoned the attempt. Perhaps someone with better modeling skills than mine would have been able to find a way to move faster, or would have found a way to make a lighter-weight model which would be easier to update and keep in synch with the spec. But one possible lesson is: if the group doesn’t use formalisms from the very beginning, it may be very hard to adopt them in mid-process.
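For readers who have not seen Alloy, a drastically simplified sketch may convey the flavor of the kind of model I was attempting. The signature and field names here are inventions for this post, not the vocabulary of the XProc draft or of my actual model:

    // A toy pipeline model: steps with input and output ports, where
    // each input may be bound to the output of another step.
    // All names here are illustrative, not those of the XProc draft.
    sig Port {}
    sig Step {
      inputs, outputs: set Port,
      reads: inputs -> lone Port  // each input reads from at most one source
    }
    sig Pipeline { steps: some Step }

    // A port never serves as both input and output of the same step.
    fact { all s: Step | no s.inputs & s.outputs }

    // Within a pipeline, a step's inputs are bound only to outputs
    // of other steps in the same pipeline.
    fact { all p: Pipeline, s: p.steps | Port.(s.reads) in (p.steps - s).outputs }

    // Ask the Analyzer for a small instance to inspect.
    run {} for 4 but 1 Pipeline

Even a sketch at this level of detail is enough to make the Analyzer show you sample pipelines, some of which are likely to surprise you: nothing above rules out cyclic bindings, for example, and seeing such an instance on screen is a good way to discover that the prose never ruled them out either.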
Starting small
Other lessons are possible. Kaufmann, Manolios, and Moore write, in Computer-aided reasoning: ACL2 case studies ([Austin: n.p.], 2002), p. 13:
The formalizers will often be struggling with several issues at once: understanding the informal descriptions of the product, discovering and formalizing the relevant knowledge that the implementors take for granted, and formalizing the design and its specification. To a large extent this phenomenon is caused by the fact that formal methods [are] only now being injected into industry. Once a significant portion of a group’s past projects has been formalized, along with the then-common knowledge, it will be much easier to keep up. But at this moment in history, keeping up during the earliest phases of a project can be quite stressful.
When working on a new project, we recommend that the formalizers start by formalizing the simplest imaginable model, e.g., the instruction set with one data operation and a branch instruction, or the protocol with a simple handshake. Choosing this initial model requires some experience…. Since iteration is often required to get things to fit together properly, it is crucial that this initial foray into the unknown be done with a small enough model to permit complete understanding and rapid, radical revision.
In my effort to model XProc, I failed to find a suitably small initial model. (As they say, it takes some experience, which is what I was trying to gain.) But then, how persuasive would a really small toy model be to a working group skeptical of formal methods in the first place? I don’t yet know how to square this circle.
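For concreteness, here is roughly the scale of initial model Kaufmann, Manolios, and Moore seem to have in mind, transposed into Alloy: a handshake protocol reduced to three phases. It is entirely my own construction, offered only to show how small such a starting point can be.

    open util/ordering[Tick]

    // A handshake reduced to its skeleton: idle, request sent, ack sent.
    abstract sig Phase {}
    one sig Idle, ReqSent, AckSent extends Phase {}
    sig Tick { phase: one Phase }

    // The protocol starts idle and moves forward one phase at a time.
    fact {
      first.phase = Idle
      all t: Tick - last | let u = t.next |
        (t.phase = Idle    implies u.phase in Idle + ReqSent) and
        (t.phase = ReqSent implies u.phase in ReqSent + AckSent) and
        (t.phase = AckSent implies u.phase = AckSent)
    }

    // Sanity check: no acknowledgement before a request.
    assert NoAckBeforeReq {
      all t: Tick | t.phase = AckSent implies some t2: t.prevs | t2.phase = ReqSent
    }
    check NoAckBeforeReq for 6

Twenty lines, and already the Analyzer can confirm or refute simple safety properties. Whether twenty such lines would impress a skeptical working group is, of course, exactly the question.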
I was more successful in finding useful small partial models when I used Alloy in 2007 to model schema composition. In that case I was able to generate a number of extremely useful test cases, but the exercise, sadly, showed merely that the XML Schema WG does not have any consensus on the meaning of the XSD spec’s description of schema composition. Being able to generate those examples during the original design phase might conceivably have led to a better design. At least, I like to think so. A skeptic might say it would only have led the working group to introduce more special cases and ad hoc rules, in order to avoid having to explain how certain cases are supposed to work. (This is, after all, the XML Schema WG we’re talking about here.) Nothing is perfect.
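The flavor of that model, again in drastically simplified and purely illustrative form (the XSD spec’s own vocabulary is quite different), was something like this:

    // Schema documents declare components and point at other documents.
    sig Component {}
    sig SchemaDoc {
      includes, imports: set SchemaDoc,
      declares: set Component
    }

    // The components visible from a document: its own, plus those of
    // everything it transitively includes or imports.
    fun visible [d: SchemaDoc] : set Component {
      d.*(includes + imports).declares
    }

    // One of the questions the prose leaves open: can two documents in
    // one composition contribute the same component?
    pred clash [d: SchemaDoc] {
      some disj d1, d2: d.*(includes + imports) | some d1.declares & d2.declares
    }
    run clash for 4

Every instance the Analyzer finds for a predicate like clash is, in effect, a candidate test case to put to implementations, or to the working group.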
The tedium of formalization
JK continues:
I, for one, avoid formal methods other than programmatic reference implementation for a few reasons: the tediousness of translating the spec into the formalism that would support all those tests; creating all those tests; and finally the difficulty of being relatively sure that what is written in the formalism actually corresponds to the intent of what is written in the text.
I could change my mind if it was shown that enough formalization for XML Schema that would support your regression tests is not all that tedious.
Here JK makes several excellent points, to which I do not currently have good answers.
On tedium: It’s possible to model a spec selectively (as in my model of schema composition) with relatively little tedium. Indeed, the point of light-weight formal methods (as promoted by Daniel Jackson and embodied in Alloy) is to allow the user to construct relatively small and simple models which cover just the aspects of the design one is currently worried about. There is no fixed foundational set of primitives in which models must be described; the model can specify its own primitives.
For example, I was able to model some simple aspects of XSD 1.0 schema composition without having to model all of schema validation, and without modeling all the details of XML well-formedness. If the paper on that model does not seem especially light-weight, I think it’s due to the need to specify how to inspect the behavior of implementations unambiguously.
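The trick, to the extent there is one, is that Alloy lets a model posit its own primitives. Instead of modeling the Infoset, a model can declare elements and names as opaque atoms and state only the few facts it actually needs; something like this (again illustrative, not quoted from the real model):

    // Elements and names as uninterpreted atoms: no character model,
    // no attributes, no namespaces, only what the model needs.
    sig Name {}
    sig Element {
      name: one Name,
      children: set Element
    }

    // The one fact about well-formedness this model relies on:
    // element structures are trees, not cycles.
    fact { no e: Element | e in e.^children }

Everything else about XML can simply be left unsaid, and the Analyzer will not miss it.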
I have to confess, though, that it’s not always obvious to me how best to construct such partial models. For a long time I avoided trying to model any part of XSD because I thought I was likely to have to model the Infoset and XML in full, first. I’d like to do that, someday, just for the satisfaction, but I don’t expect it to be quick. It was only after a longish while that I saw a way to make a simpler partial model. The more accurately I can identify some part of the design that I’m worrying about, the easier it is to find a partial model for that part of the design.
But in the earlier post, I was thinking about a kind of regression testing which would be intended to identify problems and interactions I was not consciously worried about. For that to work, I expect I would need a model that covered pretty much all of the salient properties of the entire spec. And at the moment, I cannot say I expect the construction of such a model to be entirely free of tedium. If I think that it might nevertheless be a good idea, it’s because checking natural-language prose for consistency is also tedious (which is why working groups sometimes do not do it well, or at all).
Consistency of prose and formalism
JK again:
In other words, if you have one model (the spec in English), it will have dark corners that need discovering and clearing up. That will have a cost. If you have two models (the spec in English and the formalism), in my experience both may have dark corners (but one model’s dark corner may be clarified by the other model where it’s not a dark corner, so that may be a net plus), but there’s also the consistency of the two models that comes in question. So, gimme examples, show me that the consistency is not a problem, please, and I’ll be very grateful indeed.
I read somewhere once (in a source I have not managed to find again) that some international standards bodies require, or recommend, that their working groups produce both the English and the French version of their specifications, rather than working monolingually and then having a normative translation made. Why? Because working monolingually left too many dark corners in the text; working on both versions simultaneously led to clearer texts in both languages. The initial process was slower, but the additional effort was paid back by better results with lower maintenance costs.
I speculate that working both in English and in a suitable formalism like Alloy or ACL2 would have a similar effect: initial progress would feel slower, but the results would be better and maintenance would be easier. (This is a lot like having test cases in software development: it seems to slow you down at first, but it speeds things up later.)
Consistency between the two formulations should not, in principle, be any harder to maintain than consistency between the natural-language formulations of different parts of a complex spec. In fact, it should (note that modal verb!) be easier: natural language provides only so much help with consistency checking, whereas formalisms tend to be a bit better on that score. The only really complex spec I have ever seen with no inconsistencies I could detect was ISO 8879, whose editor was a lawyer skilled in the drafting of contracts.
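To make the point about mechanical consistency checking concrete: given two rules stated pages apart in the prose, a formalism lets you ask directly whether they can both hold. The rules below are invented for the purpose, not taken from any real spec:

    sig Decl {}
    sig Scope { local, inherited: set Decl }

    // Rule A, as chapter 3 might put it: inherited declarations
    // are never redeclared locally.
    pred ruleA { all s: Scope | no s.local & s.inherited }

    // Rule B, as chapter 7 might put it: every scope redeclares at
    // least one inherited declaration.
    pred ruleB { all s: Scope | some s.local & s.inherited }

    // If no instance with a Scope satisfies both rules,
    // the two chapters contradict each other.
    run { ruleA and ruleB and some Scope } for 3

The Analyzer reports that no instance exists, in seconds; spotting the same conflict across several hundred pages of prose is left as an exercise for the working group.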
There are, of course, things a working group can do to make it easier, or harder, to maintain consistency between the prose and the formalism. Putting the formal treatment in a separate document (like the XPath formal semantics) or a separate part of the same document (like the schema for schema documents in XSD) is a good way to make it easier for inconsistencies to remain undetected. Integrating fragments of the formalism with the prose describing those constructs (as is done with the BNF in the XML spec, or as is done in any good literate program) makes it easier to detect and remove inconsistencies. It also produces pressure on the working group to use a legible formalism; the XML form of XSD might be more readable if the working group had forced itself to read it more often instead of hiding it in the appendix to the spec.
Integrating the formalism into the text flow of the spec itself can work only if the members of the working group are willing to learn and use the formalism. That’s why I attach such importance to finding and using simple notations and light-weight methods.
But this long post is just more speculation on my part. You are right that examples are needed. Someday, I hope to oblige.