An XForms thought experiment: expression languages

[19 July 2012]

Consider the following application scenario. We are building an XML application for an XML vocabulary that includes XML representations of arithmetic expressions. Expressed as an extended BNF grammar, the structure of the expressions would look something like this:

expression = term { addition-op term }

term = factor { multiplication-op factor }

factor = num | variable-reference | function-call | ‘(’ expression ‘)’

variable-reference = ‘$’ name

function-call = name ‘(’ expression { ‘,’ expression } ‘)’

addition-op = ‘+’ | ‘-’

multiplication-op = ‘*’ | ‘/’ | ‘div’ | ‘mod’

The basic alphabet of the grammar includes name, num, and the various quoted strings in the grammar.

Our XML representation will have elements named num, var, funcall, sum, diff, product, quotient, iquotient, and modulo, which will nest in the natural way. The arithmetic expression (4 + max($line)) mod 3 will be encoded

<modulo>
  <sum>
    <num>4</num>
    <funcall f="max"><var>line</var></funcall>
  </sum>
  <num>3</num>
</modulo>
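To make the mapping from grammar to elements concrete, here is a minimal sketch in Python of a recursive-descent parser that emits this encoding. Several things in it are my assumptions rather than part of the vocabulary as described: the assignment of operators to element names (+ to sum, - to diff, * to product, / to quotient, div to iquotient, mod to modulo), left-associative nesting of repeated operators, the lexical form of names and numbers, and the helper names tokenize, element, Parser and to_xml. It treats a parenthesized expression as a factor, which the example (4 + max($line)) mod 3 requires.

```python
import re
import xml.etree.ElementTree as ET

# Assumed operator-to-element mapping; the post only lists the element names.
ADD_OPS = {"+": "sum", "-": "diff"}
MUL_OPS = {"*": "product", "/": "quotient", "div": "iquotient", "mod": "modulo"}

NAME = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[$(),+\-*/])")


def tokenize(text):
    """Split the input into numbers, names, and single-character symbols."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def element(tag, *children, text=None, **attrs):
    """Build an element with optional text, attributes, and child elements."""
    node = ET.Element(tag, attrs)
    node.text = text
    node.extend(children)
    return node


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {tok!r}")
        self.i += 1
        return tok

    def expression(self):      # expression = term { addition-op term }
        node = self.term()
        while self.peek() in ADD_OPS:
            node = element(ADD_OPS[self.take()], node, self.term())
        return node

    def term(self):            # term = factor { multiplication-op factor }
        node = self.factor()
        while self.peek() in MUL_OPS:
            node = element(MUL_OPS[self.take()], node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "(":         # parenthesized expression: no element of its own
            node = self.expression()
            self.take(")")
            return node
        if tok == "$":         # variable-reference = '$' name
            name = self.take()
            if not NAME.fullmatch(name):
                raise SyntaxError(f"expected a variable name, found {name!r}")
            return element("var", text=name)
        if tok[0].isdigit():
            return element("num", text=tok)
        if NAME.fullmatch(tok):  # function-call = name '(' expression {',' expression} ')'
            self.take("(")
            args = [self.expression()]
            while self.peek() == ",":
                self.take()
                args.append(self.expression())
            self.take(")")
            return element("funcall", *args, f=tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def to_xml(text):
    """Parse an arithmetic expression and serialize it in the XML encoding."""
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return ET.tostring(tree, encoding="unicode")
```

Under these assumptions, to_xml("(4 + max($line)) mod 3") returns the encoding shown above, serialized on one line without the indentation. Going in this direction is the easy part; the thought experiment is about the reverse problem of letting a user build and edit such trees interactively.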

We would like to provide a convenient interface, using XForms, to allow users to create or modify arithmetic expressions.

Now we are ready for the thought experiment, which consists of asking these questions:

1. Are there any really good ways to do this today (in XForms 1.1, or with extensions in existing XForms implementations)?

2. What are the possible ways of doing this today? This question is not restricted to techniques we might regard as really good; here we’ll settle for technically feasible. Extra credit for sound analysis of the pros and cons of the various techniques, and for pointers to examples to illustrate them.

3. What changes to XForms might allow implementations of a future version of the spec to handle this class of problem (more) easily?

I think sometimes discussions of XForms conflate the issue of variable-depth recursive structures with the issue of mixed-content support; when I think about use cases like this one, they seem to me quite different. (Concretely: for mixed-content editing, people often propose one rich-text-editor widget or another, but I don’t think a rich-text editor would be likely to help much here.)

To count as really good, a solution ought to be applicable to other similar expression languages with similar structural properties:

  • well-formed formulae of symbolic logic (sentential logic provides a simple case, first-order predicate calculus a more complex case)
  • grammars (in various notations: BNF, EBNF, content models in XML schema languages)
  • XPath expressions
  • CSS selector expressions

If a solution can handle XML representations of programming-language constructs, that would be good, too.

Regular approximations (again)

[9 April 2012]

For a while it has seemed to me interesting and potentially useful that any context-free language L can be approximated by regular languages which accept either a subset or a superset of L. (See earlier posts from 2008 and 2009.)

Apart from the intrinsic interest of the fact, this means that it’s possible in principle to define XSD simple types using those regular approximation languages, which in turn means it’s possible to use XSD validation to catch at least some syntactic errors in strings which should be members of L.
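A small illustration of what subset and superset approximations look like, as a sketch in Python. The toy grammar S = ‘a’ | ‘(’ S ‘)’ and the names subset_pattern and SUPERSET_PATTERN are assumptions chosen for the example, not anything from the earlier posts. The subset approximation simply unfolds the recursion a fixed number of times; the superset approximation stops checking that the parentheses balance. Python's re.fullmatch stands in for XSD's implicitly anchored patterns, and the patterns use only syntax that I believe both regex dialects share.

```python
import re


def subset_pattern(depth):
    """Regex for S = 'a' | '(' S ')' with at most `depth` levels of nesting.

    Every string it accepts is in L, but strings nested more deeply
    than `depth` are rejected even though they are in L.
    """
    pattern = "a"
    for _ in range(depth):
        pattern = r"(a|\(" + pattern + r"\))"
    return pattern


# Accepts every string of L, plus strings whose parentheses do not balance.
SUPERSET_PATTERN = r"\(*a\)*"

for candidate in ["((a))", "(((a)))", "((a)", "a("]:
    in_subset = re.fullmatch(subset_pattern(2), candidate) is not None
    in_superset = re.fullmatch(SUPERSET_PATTERN, candidate) is not None
    print(f"{candidate!r}: subset={in_subset} superset={in_superset}")
```

A simple type based on the subset pattern never accepts a bad string but may reject good ones; one based on the superset pattern never rejects a good string but lets some bad ones through. Which of the two is more useful for validation depends on which kind of error is cheaper.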

Some time ago, I had occasion to translate the ABNF of RFC 3986 and RFC 3987 into XSD regular expressions (possible in that case without an approximation, since the languages defined by those two RFCs are all regular). The mechanism I used was simple: each ABNF production turned into an entity declaration, and each non-terminal on the right-hand side of a rule turned into an entity reference.
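The same substitution can be imitated outside a DTD. The following Python sketch is not the entity-based translation itself, only an analogue of it: the RULES table holds a few hand-translated productions (scheme and pct-encoded follow RFC 3986; the regex renderings are mine), non-terminals appear as &name; references, and a hypothetical expand function replaces each reference with the parenthesized body of its rule. The parentheses are an assumption added to keep operator precedence safe.

```python
import re

# Hand-translated productions; "&name;" marks a reference to another rule.
RULES = {
    "ALPHA": "[A-Za-z]",
    "DIGIT": "[0-9]",
    "HEXDIG": "[0-9A-Fa-f]",
    # scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    "scheme": r"&ALPHA;(&ALPHA;|&DIGIT;|[+\-.])*",
    # pct-encoded = "%" HEXDIG HEXDIG
    "pct-encoded": "%&HEXDIG;&HEXDIG;",
}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9-]*);")


def expand(name, rules, active=()):
    """Replace every reference in rule `name` with the body of the rule it names.

    `active` holds the rules currently being expanded; meeting one of them
    again means the grammar is recursive and cannot be flattened this way.
    """
    if name in active:
        raise ValueError(f"rule {name!r} is recursive; substitution alone cannot expand it")

    def replace(match):
        return "(" + expand(match.group(1), rules, active + (name,)) + ")"

    return ENTITY_REF.sub(replace, rules[name])


print(expand("scheme", RULES))
```

The ValueError marks exactly the point where plain substitution stops working: as soon as a rule refers to itself, directly or indirectly, the expansion never terminates, and some kind of approximation is needed.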

Today I was reviewing that work and it occurred to me that if we could find a regular approximation by algebraic manipulation of the grammar (without any detour through finite-state automata), it would make it much simpler to write the necessary XSD patterns.

But how?

Given a context-free grammar for L, can we identify algebraic rules which would allow us to derive a regular grammar for a regular approximation to L?

Team work, specialization, fact-checking

[26 January 2012]

This week’s New Yorker has an interesting essay on brainstorming (doesn’t work, it says). It brought my evil twin Enrique running, waving his copy in the air. “Look at this. Look at this!” he shouted.

I looked at the passage he pointed out. Pursuing the observation that “like it or not, human creativity has increasingly become a group process”, the author quotes one Ben Jones, a professor at the Kellogg School of Management at Northwestern University, who has quantified the trend away from solo work and towards work in teams.

“‘A hundred years ago, the Wright brothers could build an airplane all by themselves,’ Jones says. ‘Now Boeing needs hundreds of engineers just to design and produce the engines.’”

“Well,” I said to Enrique, “no question that teams are bigger today.” “But …” he spluttered. “But what?” I said. “But Boeing doesn’t make engines.” “They don’t?” (I love to play dumb; it drives Enrique speechless with frustration. But he seems to be right. If I’m reading their Web site correctly, Boeing hasn’t manufactured an engine since 1968, and those weren’t aircraft engines in any case.) “But what makes the airplane go, then?” “GE makes engines,” Enrique snarled. “Rolls-Royce makes engines. Pratt and Whitney makes engines. Boeing makes airframes” (along with many other things, I hasten to add, none of them engines). “How can someone be interested in specialization and not know that?”

Didn’t the New Yorker use to have a fact-checking department?

Copyright and other unwelcome issues

[10 November 2011]

One of the unwelcome side effects of recent trends in copyright (I mean the gradual shift, over the last fifty years, towards more and more protection for commercial interests and less and less protection of the public benefit) is that while it used to be easy to make one’s own work readily available for reuse by others, it now requires more careful planning. It used to be, for example, that if you didn’t care to claim or protect copyright in something you wrote, all you had to do was nothing: if you didn’t claim copyright, and the work was public, then it was in the public domain.

[“Hmm. How sure are you of that?” asked my evil twin Enrique, with a suspicious look. “Well, kind of sort of sure, I think.” “Better add a disclaimer, then, don’t you think?” “OK, right you are.”]

At least, that’s how I understand it, at a first approximation. (I am not a lawyer and have never much wanted to be, though a friend of mine who did go to law school once told me I’d enjoy the mysteries and mystique of tax law. So no reader should take anything I write as providing guidance about the law of the U.S. or any other country.) If you want good information about copyright, go find something by Pamela Samuelson.

[“OK, that’ll do, I guess. Is Pamela Samuelson really a good source?” “The best. Thank heavens she’s writing that column for Communications of the ACM again.”]

Nowadays, of course, in the U.S. this is no longer true: from the moment you write anything down, you own copyright in it, whether you want it or not, unless you do something to avoid it.

Of course, many people ignore this and behave as if the old legal regime were still in place. I’ve had representatives of U.S. universities say “Oh, feel free to reuse that stylesheet we wrote” — as if, because it carried no copyright statement, it were available for reuse by anyone interested. On the contrary! Since the stylesheet didn’t carry any licensing information or dedication to the public domain, it was certainly copyright either by the individual who wrote it or by the institution for which it was written. And since it didn’t carry any copyright information, it was impossible to know with any confidence who actually did (or does) own the copyright, and whom to contact for permission.

[“And these people who were trying to give you permission to reuse that stylesheet, were they legally empowered to enter into binding agreements on behalf of their institutions?” Enrique asked. “Dunno. I doubt it.” “So, tell me, do you have a barge pole handy?” “A barge pole? No, why?” “Because I want to warn you not to touch that code with a barge pole, that’s why. And if you don’t have a barge pole, it just feels kind of pointless. Do you have an eleven-foot pole, maybe?” “Oh, hush.”]

End result: I politely ignored (at least, I hope my silence was polite) their invitation to reuse that code, and I wrote new code from scratch.

[“Oh, come on,” Enrique hissed. “You know perfectly well you wouldn’t have reused that code anyway! It was full of xsl:for-each elements.” (New readers may need to be informed that I seem to have an issue with xsl:for-each elements; I’m sure it’s a perfectly fine construct and there’s nothing wrong with it. I only know that when I have a stylesheet with a bug and discover that it has for-each constructs, rewriting the for-each as an apply-templates always seems to make the bug disappear. Go figure.) “Well, yeah. But even if I had loved the code, I would not have felt able to reuse it.”]

Of course, there are plenty of open-source and Creative Commons licenses to choose from, if you want to ensure that work you do can be re-used.

But who, in a collaborative project, is “you”?

If you write code or prose as an individual, outside the course and scope of your normal employment duties, then it’s straightforward to assert copyright in your own name. But if you are collaborating with others in a project, and you want to apply an appropriate license, in whose name should copyright be claimed? If only one person works on a given item (a program or a document) it’s easy to say that person should assert copyright and grant the license. But if more than one person works on it?

Some people incline to claim copyright in the name of the project, which feels plausible at some level: project is a name we sometimes give to the intentional collaboration of individuals to achieve some goal, and work done in furtherance of that goal can plausibly be said to be done for “the project”.

But can a project which is not a legal entity actually be the owner of a copyright? If there’s a legal entity involved, it’s possible in principle to figure out, in case of disputes, who speaks for the entity and who makes decisions. But if there’s no legal entity?

Can copyright usefully be claimed by a research project, in the name of the research project?

[“Well, wouldn’t a research project be legally a form of partnership?” asked Enrique. “A partnership doesn’t have to be incorporated to be a legal person, right?” “Maybe,” I equivocated. “But remember, I am not a lawyer. And a fortiori you, as a figment of my imagination, are also not a lawyer.” “Oh, go soak your head. Whom are you calling a figment … ?”]

I notice that W3C, for example, which is not a legal entity, claims copyright in the name of W3C, but immediately after adds, in parentheses, the names of the three host institutions of W3C, which are legal entities.

It would be nice, wouldn’t it, if intellectual property rights served to promote the useful arts and sciences, instead of being an unproductive drain on the time and effort of creative people and a barrier to normal intellectual work? Oh, well, maybe someday.