An XForms thought experiment: expression languages

[19 July 2012]

Consider the following application scenario. We are building an XML application for an XML vocabulary that includes XML representations of arithmetic expressions. Expressed as an extended BNF grammar, the structure of the expressions would look something like this:

expression = term { addition-op term }

term = factor { multiplication-op factor }

factor = num | variable-reference | function-call | ‘(’ expression ‘)’

variable-reference = ‘$’ name

function-call = name ‘(’ expression { ‘,’ expression } ‘)’

addition-op = ‘+’ | ‘-’

multiplication-op = ‘*’ | ‘/’ | ‘div’ | ‘mod’

The basic alphabet of the grammar consists of name, num, and the various quoted strings that appear in the rules.
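To make the alphabet concrete, here is a minimal tokenizer sketch in Python. The lexical forms it assumes for num (decimal digits with an optional fraction) and name (an ASCII identifier) are my own guesses, since the grammar leaves them open, and the names TOKEN, KEYWORDS, and tokenize are purely illustrative.

```python
import re

# One token per match: a number, a name, or one of the quoted strings.
# ASSUMPTION: the shapes of num and name are not fixed by the grammar;
# these patterns are illustrative only.
TOKEN = re.compile(r"""
    \s*(?:
        (?P<num>\d+(?:\.\d+)?)
      | (?P<name>[A-Za-z_][A-Za-z0-9_]*)
      | (?P<punct>[$(),+\-*/])
    )""", re.VERBOSE)

# 'div' and 'mod' look like names but are operators in the grammar.
KEYWORDS = {"div", "mod"}


def tokenize(text):
    """Split an expression string into a list of (kind, value) pairs."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize input near offset {pos}")
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "name" and value in KEYWORDS:
            kind = "punct"
        tokens.append((kind, value))
        pos = m.end()
    return tokens
```

For instance, tokenize("max($line) mod 3") yields a name, the punctuation around the variable reference, the operator mod (reclassified from name to punct), and a num.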

Our XML representation will have elements named num, var, funcall, sum, diff, product, quotient, iquotient, and modulo, which will nest in the natural way. The arithmetic expression (4 + max($line)) mod 3 is encoded as follows:

<modulo>
  <sum>
    <num>4</num>
    <funcall f="max"><var>line</var></funcall>
  </sum>
  <num>3</num>
</modulo>
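One way to see what "nest in the natural way" means is to walk such a tree recursively. The following Python sketch evaluates the encoding above using the standard xml.etree.ElementTree module. It rests on several assumptions of mine that the vocabulary does not state: num holds an integer, each operator element has exactly two children, $line is bound to a list of numbers, and the function library (FUNCTIONS) contains only max and min. The names BINARY, FUNCTIONS, and evaluate are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<modulo>
  <sum>
    <num>4</num>
    <funcall f="max"><var>line</var></funcall>
  </sum>
  <num>3</num>
</modulo>
"""

# Operator elements and the arithmetic each one stands for.
BINARY = {
    "sum": lambda a, b: a + b,
    "diff": lambda a, b: a - b,
    "product": lambda a, b: a * b,
    "quotient": lambda a, b: a / b,
    "iquotient": lambda a, b: a // b,
    "modulo": lambda a, b: a % b,
}

# ASSUMPTION: a tiny function library; the vocabulary does not define one.
FUNCTIONS = {"max": max, "min": min}


def evaluate(node, variables):
    """Recursively evaluate one expression element."""
    tag = node.tag
    if tag == "num":
        return int(node.text.strip())  # ASSUMPTION: integers only
    if tag == "var":
        return variables[node.text.strip()]
    if tag == "funcall":
        args = [evaluate(child, variables) for child in node]
        return FUNCTIONS[node.get("f")](*args)
    if tag in BINARY:
        left, right = (evaluate(child, variables) for child in node)
        return BINARY[tag](left, right)
    raise ValueError(f"unknown expression element: {tag}")


root = ET.fromstring(DOC.strip())
# ASSUMPTION: $line is bound to a list of numbers.
result = evaluate(root, {"line": [3, 9, 5]})
```

With that binding, max($line) is 9, the sum is 13, and the result is 1. The evaluator needs one case per element type and recurses to whatever depth the document has; a form-based editor for these expressions faces the same variable-depth recursion.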

We would like to provide a convenient interface, using XForms, to allow users to create or modify arithmetic expressions.

Now we are ready for the thought experiment, which consists of asking these questions:

1. Are there any really good ways to do this today (in XForms 1.1, or with extensions in existing XForms implementations)?

2. What are the possible ways of doing this today? This question is not restricted to techniques we might regard as really good; here we’ll settle for technically feasible. Extra credit for sound analysis of the pros and cons of the various techniques, and for pointers to examples to illustrate them.

3. What changes to XForms might allow implementations of a future version of the spec to handle this class of problem (more) easily?

I think sometimes discussions of XForms conflate the issue of variable-depth recursive structures with the issue of mixed-content support; when I think about use cases like this one, they seem to me quite different. (Concretely: for mixed-content editing, people often propose one rich-text-editor widget or another, but I don’t think a rich-text editor would be likely to help much here.)

To count as really good, a solution ought to be applicable to other similar expression languages with similar structural properties:

  • well-formed formulae of symbolic logic (sentential logic provides a simple case, first-order predicate calculus a more complex case)
  • grammars (in various notations: BNF, EBNF, content models in XML schema languages)
  • XPath expressions
  • CSS selector expressions

If a solution can handle XML representations of programming-language constructs, that would be good, too.