Zero-knowledge software verification?

[28 August 2018]

Volkswagen is in the news again today, in connection with their engine software that defeated pollution tests by detecting the test situation and changing the engine’s running characteristics.

One might wonder (at least I have wondered) why regulatory agencies don’t require that software for such applications come with a proof of correctness. One obvious answer is that many people think (wrongly) that such proofs are impossible for complicated software, or think (rightly) that if not impossible they are at least rather difficult and probably expensive. But are they more expensive than the recalls and the fines and the lasting hit to the company’s reputation? No, I suspect proofs of correctness would have been cheap by comparison.

Another obvious answer is that the software is almost certainly chock-full of proprietary information, trade secrets, etc., which a proof of correctness would almost certainly end up revealing to anyone who read and understood it. To protect the proprietary information, the proof would have to remain confidential. But then skeptical outsiders would have no way of checking that the proof is valid. There would be a certain inevitable temptation to cheat on the proof, or even to claim that a proof exists when none does. How might one go about persuading outsiders that one has not succumbed to that temptation?

Could zero-knowledge proofs be used to allow outsiders to verify that a formal proof exists showing that a particular piece of software has specified properties, without exposing unwanted information about the software?

Perhaps this amounts to the question: Does the problem of showing that a particular property holds of a piece of software (e.g. that it doesn’t turn on pollution controls only under test conditions) fall into the class NP-Complete? Or is there an NP-Complete problem we could use as a substitute for the informal demand to see that there is a proof for this particular version of this particular manufacturer’s engine control software?

Sequence as source of identity

[19 February 2013]

For reasons not worth going into (it’s a long story) I was once involved in a discussion of how to tell, given a sequence of objects of a given kind and two descriptions (or variable references) denoting items X and Y in the sequence, whether X and Y are identical or distinct. (This topic came up during a discussion of whether we needed clearer notions of identity conditions for this class of objects.)

One of my interlocutors suggested that we did not need to specify any identity conditions at all. Since X and Y were presented to us in a sequence, we could always tell whether X and Y are the same or different by looking at their position in the sequence. If X and Y are both at the same position in the sequence, then X must be identical to Y, and if X and Y are at different positions in the sequence, then X and Y must be distinct from each other.

Now, the first claim (if X and Y are at the same position in the sequence, then X = Y) is obviously true. And the second (if X and Y are at different positions in the sequence, then X ≠ Y) is manifestly false, though sadly my interlocutor never stood still long enough for me to point this out. Let X be “the Fibonacci number F1” and Y be “the Fibonacci number F2“. These occur at different positions (1 and 2) in the Fibonacci series, but both expressions denote the integer 1. So we cannot safely infer, from the fact that X and Y identify things at different positions in a sequence, that X and Y identify different things.

As I discovered the other day, Frege turns out to have a word to say on this topic, too. I wish I had had this quotation ready to hand during that discussion.

… die Stelle in der Reihe kann nicht der Grund des Unterscheidens der Gegenstände sein, weil diese schon irgendworan unterschieden sein müssen, um in einer Reihe geordnet werden zu können.

… their positions in the series cannot be the basis on which we distinguish the objects, since they must already have been distinguished somehow or other, for us to have been able to arrange them in a series.

(Gottlob Frege, Die Grundlagen der Arithmetic, 1884; rpt Stuttgart: Reclam, 1987; tr. by J. L. Austin as The Foundations of arithmetic 1950, rpt. Evanston, Illinois: Northwestern University Press, 1980, § 42.)

Regular approximations (again)

[9 April 2012]

For a while it has seemed to me interesting and potentially useful that any context-free language L can be approximated by regular languages which accept either a subset or a superset of L. (See earlier posts from 2008 and 2009.)

Apart from the intrinsic interest of the fact, this means that it’s possible in principle to define XSD simple types using those regular approximation languages, which in turn means it’s possible to use XSD validation to catch at least some syntactic errors in strings which should be members of L.

Some time ago, I had occasion to translate the ABNF of RFC 3986 and RFC 3987 into XSD regular expressions (possible in that case without an approximation, since the languages defined by those to RFCs are all regular). The mechanism I used was simple: each ABNF production turned into an entity declaration, and each non-terminal on the right-hand side of a rule turned into an entity reference.

Today I was reviewing that work and it occurred to me that if we could find a regular approximation by algebraic manipulation of the grammar (without any detour through finite-state automata), it would make it much simpler to write the necessary XSD patterns.

But how?

Given a context-free grammar for L, can we identify algebraic rules which would allow us to derive a regular grammar for a regular approximation to L?

Automatic theorem provers and proof trees

[13 December 2010]

When I took an introductory course in symbolic logic, all those many years ago, we used a textbook (Richard C. Jeffrey, Formal logic: its scope and limits [New York: McGraw-Hill, 1967], in case you’re curious) which presented a proof method based on proof trees, which had the nice property that for valid inferences it’s guaranteed to terminate, and that for invalid inferences it will never terminate with a false positive. Allen Renear informs me that the locus classicus for proof trees is Raymond M. Smullyan, First-order logic (2d ed. New York: Dover, 1995), which I have been reading lately with pleasure.

All the theorem provers I’ve read about, however, seem to require more or less active participation and guidance from the user; like the proof-tree method, they produce a proof only when a proof exists, but unlike the proof-tree method they aren’t guaranteed to find a proof if one exists.

So I’ve been wondering: why aren’t there automatic theorem provers based on the proof-tree method?

Or are there?

There must be fifty ways …

[7 April 2010]

The old Paul Simon song “There must be fifty ways to leave your lover” keeps running through my head. I can see close to fifty ways to define the XPath 1.0 data model in terms of (a) a set of nodes and (b) two relations defined on that set, which are taken as primitive; all other relations (i.e. all the other axes of XPath) are defined in terms of those two primitive relations.

Strictly speaking, I make it forty-eight ways. First, pick any single relation from any of the following four groups:

  1. parent, child, ancestor, descendant
  2. prevsib, nextsib, preceding-sibling, following-sibling
  3. prevnode, nextnode, document-order preceding (>>), document-order-following (<<)

That’s twelve possibilities.

Second, pick any single relation from either of the other two groups; that makes eight possible choices, times twelve first choices, for ninety-six ordered pairs of relations. But the order doesn’t matter, so we have forty-eight distinct pairs.

In recent days, taking some pairs not quite at random, defining the constraints they must satisfy in order to be a suitable basis for defining an XPath 1.0 tree, and defining all the other relations in terms of the chosen primitives, I have learned a couple of mildly interesting things.

  • It’s more convenient to take parent as a primitive, than child.
  • It’s more convenient to take one of the single-step relations (parent, child, nextsib, prevsib, nextnode, prevnode) than one of their transitive closures (ancestor, descendant, etc.).

The nextsib relation, for example, needs to be acyclic, functional, injective, and not transitive. If it is, then its transitive closure following-sibling will automatically be suitable. But if you start with following-sibling and specify (as you will need to) that it is transitive and acyclic, that does not suffice to guarantee that its transitive reduction nextsib is functional and injective. You can of course simply say that a following-sibling relation is suitable if and only if (a) it’s transitive, (b) it’s acyclic, and (c) its transitive reduction is functional and injective, but now you’re forcing the reader to work with two relations, not just one: both following-sibling and its transitive reduction. It would be interesting either to find a way to constrain the closure directly to ensure the necessary properties in the reduction, or else to find a proof that there is no way to constrain a closure to ensure that its reduction is functional and injective without explicitly referring to the reduction.

  • From any relation in any group, the other relations in that group are (relatively) easy to derive in terms of inversion, transitive closure, or transitive reduction. Defining a relation in the third group typically proves more interesting. And while it’s more convenient to choose the primitive relations from among the reductions, it turns out that at least in some cases it’s easiest to define the third group in terrms of one of the closures. For example, given the parent and next-sibling relations, it proves easier to define document-order-following in terms of the primitives than to define next-node.

It occurs to me to wonder whether there are ways to define XPath 1.0 trees that don’t reduce to or include one of these forty-eight.