A new look / Death to spies

[6 December 2012]

This blog has a new look, courtesy of the crackers who broke in and rendered it necessary to delete the blog, reinstall WordPress from trusted media, re-configure things, change passwords, and re-import the posts and comments. The old theme wasn’t part of the new installation, and while I kind of liked it, I didn’t like it enough to spend any time trying to retrieve it. The old theme was the then-current default theme for WordPress blogs; the new theme is (again) the default for WordPress blogs.

[“Are you kidding me? You can’t even be bothered to change the _____ theme?” hissed my evil twin Enrique at this point. “I did look at alternatives. Really. I just happened to like the default pretty well. It’s not like I’m lazy.” “Not just that you’re lazy, I think you mean?” sighed Enrique. “Oh, hush,” I said.]

The new theme seems to want a header image; I am grateful to Flickr for providing search qualifiers that allow one to search only for photos licensed under Creative Commons and allowing commercial use. The image above is drawn from a photo published on Flickr under the name Glass Bead Game by the photographer Darren Kirby of Edmonton, Alberta, to whom thanks (and an acknowledgement in the footer).

[“Wow,” said Enrique. “His blog photo makes him look way too young to be a reader of Hermann Hesse. Has there been a Hesse resurgence while I wasn’t looking?” “Not sure, but I think I heard that in fact there has been. It’s not the 80s anymore, Enrique.” “Are you sure? The House majority does not agree with you. And haven’t you heard anything about higher education in the UK lately? Sure sounds like Mrs. Thatcher’s Britain to me!” “Oh, be quiet, and leave politics out of this.”]

On the minus side, the links to other blogs have been lost, at least for the moment (I do have a list; it’s just a matter of typing them in again). And of course, I’ve lost a few days’ work (and counting), cleaning up after the Viagra hawkers.

On the plus side, I now have nicer blog backup utilities and more convenient tools for intrusion detection (on the theory that making them more convenient is a good way to increase the likelihood of their being used regularly). It turns out that when a Web site is just a checked-out working copy of a Subversion repository, updated automatically by a commit hook whenever things are checked into the repository (as this Web site is), then just logging in to the server and running svn status gives you a nice list of things that have been placed on the server by intruders coming in the back door. So for those who manage their sites this way and have never got around to installing tripwire or similar tools, a local shell script reading, in its entirety,

ssh hostname 'for d in *.com ; do (echo; cd $d; pwd; svn status); done'

may serve as at least a poor, partial substitute.
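For readers who would rather get a per-site summary than scroll through raw output, here is a minimal Python sketch that groups what the one-liner prints by the directory that pwd reports. It is an illustration, not part of the setup described here; the function name svn_changes is hypothetical, and the sketch assumes the usual svn status layout of seven one-character status columns, a space, and then the path.

```python
def svn_changes(output: str) -> dict[str, list[tuple[str, str]]]:
    """Group `svn status` entries by the site directory that `pwd` printed.

    Assumes seven one-character status columns, one space, then the path.
    """
    report: dict[str, list[tuple[str, str]]] = {}
    site = "(unknown)"
    for line in output.splitlines():
        if not line.strip():
            continue                      # blank lines come from `echo`
        if line.startswith("/"):          # an absolute path printed by `pwd`
            site = line.strip()
            report.setdefault(site, [])
            continue
        codes, path = line[:7].rstrip(), line[8:]
        # '?' marks an unversioned file (the usual sign of a file dropped by
        # an intruder); 'M' marks a locally modified versioned file.
        report.setdefault(site, []).append((codes, path))
    return report
```

A site whose list comes back empty has nothing on the server that the repository does not know about; anything else deserves a look.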

Still, while it’s almost always nice to learn new things, there are other things I’d rather have spent the last few days on. I find I have new sympathy for the motto Death to spies.

Another XForms thought experiment: mixed-content editing

[23 July 2012]

The other day, I posted a kind of design challenge for XForms: what is the best way to provide an editing interface for tightly constrained, arbitrarily recursive structures like those found in arithmetic or logical expressions or in languages like XPath or programming languages?

Another topic comes up from time to time as a challenge for XForms interfaces: mixed content.

One view, which seems to be reasonably widely held, is that the best way to handle mixed content in XForms is to provide some sort of ‘rich-text’ editor like the tools provided by some libraries for editing HTML documents in the browser. (I put scare quotes around ‘rich text’ in that phrase because HTML-encoded text doesn’t seem particularly rich by some standards.) Many (all?) of the major XForms implementations have rich-text editors of one kind or another that work this way.

I think such widgets work well for some applications. But they are almost never what I think I want when I think about handling mixed content in applications I’d like to build with XForms. I think I can distinguish two cases, or classes of cases.

The first case is similar to that handled by the existing rich-text widgets: allow the user to edit a paragraph or a series of paragraphs, and to mark phrases within the paragraphs with phrase-level markup. In this use case, I think I want several things (some of which may be achievable with current extension widgets, with a little work or a lot):

  • I’d like to be able to specify, for each individual instantiation of the control, which phrase-level elements are allowed. (E.g.: bold and italic here, italic-only over there.)
  • I’d like to be able to specify elements in vocabularies other than HTML. (E.g. TEI, TEI Lite, DocBook, JATS, …)
  • I’d like the editing widget to allow crystals as sub-structures. (By crystals I mean sub-elements with fixed internal structure; they may float in a sort of text soup, but they retain their own internal structure while doing so. In XHTML, lists [ordered, unordered, and definition-] are a simple example.)
  • If the editor automatically interprets hard returns as marking paragraph boundaries (as some do), I’d like to be able to specify what the paragraph element is called. And ideally I’d like to allow more than one such top-level element, so the user can use the widget to create a sequence of (for example) p, div, or ab elements.
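The first and last of those wishes (per-control sets of permitted phrase-level and paragraph-level elements) come down to a constraint that the widget, or the form around it, would have to enforce on whatever the user produces. Outside XForms, the check itself can be sketched in a few lines of Python with ElementTree. The function name and its parameters are hypothetical; hi, emph, and ab are TEI element names used only as examples.

```python
import xml.etree.ElementTree as ET

def disallowed_elements(fragment: str, phrase_level: set[str],
                        top_level: frozenset[str] = frozenset({"p"})) -> list[str]:
    """Return the sorted names of elements in `fragment` that this control forbids.

    `top_level` lists the paragraph-like elements the user may create (p, ab, ...);
    `phrase_level` lists the elements allowed anywhere inside them.
    Names are compared as ElementTree tags, so namespaced names need {uri}local form.
    """
    # Wrap the fragment so that a sequence of paragraphs parses as a single tree.
    root = ET.fromstring(f"<fragment>{fragment}</fragment>")
    bad: set[str] = set()
    for block in root:
        if block.tag not in top_level:
            bad.add(block.tag)
        for el in block.iter():
            if el is not block and el.tag not in phrase_level:
                bad.add(el.tag)
    return sorted(bad)
```

The same edited text can then be acceptable in one instance of the control (say, with {"hi", "emph"}) and rejected in another (with {"emph"} only).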

In the second case (or class of cases), I want to be able to display mixed-content elements in a normal read-only way, with flowed text and font shifts and so on, and I want to be able to allow the user to edit selected aspects of the material, but not all. For example:

  • In some cases I’d want the user to be able to edit the key attribute on the person and place elements in the text, but I would like it to be impossible for the user to change the text of the paragraph.
  • In some cases I’d like to allow the user to change element types (changing a person element to place, or vice versa), and in other cases not.
  • Often I’d like to allow the user to select a contiguous sequence of characters in the paragraph and say that they should be tagged as person or place.
  • And then there is the scenario Henry Thompson once used to introduce me to the idea of padded-cell editors: we are preparing a corpus for linguistic research and we have a provisional segmentation of the text into sentences (or sentence-like objects). We want an interface that will allow a human being to review the segmentation and correct it. They should be able to open a document, split existing s elements, join adjacent s elements, save, or quit without saving.
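Stripped of the interface, the split and join operations in that last scenario are small tree edits. A minimal Python sketch with ElementTree, assuming for simplicity that each s element contains only text (no child markup); split_s and join_s are hypothetical names. Both operations keep every character of the document, which is presumably the point of confining the user to them.

```python
import xml.etree.ElementTree as ET

def split_s(parent: ET.Element, i: int, offset: int) -> None:
    """Split the i-th <s> child of `parent` at character `offset` of its text."""
    old = parent.findall("s")[i]
    text = old.text or ""
    if not 0 < offset < len(text):
        raise ValueError("a split must fall strictly inside the sentence")
    new = ET.Element("s")
    old.text, new.text = text[:offset], text[offset:]
    # Whatever text followed the old sentence now follows the new one.
    new.tail, old.tail = old.tail, None
    parent.insert(list(parent).index(old) + 1, new)


def join_s(parent: ET.Element, i: int) -> None:
    """Merge the i-th <s> child of `parent` with the <s> that follows it."""
    sents = parent.findall("s")
    first, second = sents[i], sents[i + 1]
    children = list(parent)
    if children.index(second) != children.index(first) + 1:
        raise ValueError("only adjacent s elements can be joined")
    # Keep every character: first's text, the text between the two, second's text.
    first.text = (first.text or "") + (first.tail or "") + (second.text or "")
    first.tail = second.tail
    parent.remove(second)
```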

The same questions apply here as for the earlier thought experiment:

1 Are there any really good ways to implement interfaces of this class today (in XForms 1.1, or with extensions in existing XForms implementations)?

2 What are the possible ways of doing this kind of thing (or: any of these kinds of things) today? (In the absence of a really good way to do it, any technically feasible way is worth knowing about; it’s better than nothing.) Extra credit, as always, for sound analysis of the pros and cons and for pointers to examples to illustrate the techniques.

3 What changes to XForms might allow implementations of a future version of the spec to handle this class of problem (more) easily?

An XForms thought experiment: expression languages

[19 July 2012]

Consider the following application scenario. We are building an XML application for an XML vocabulary that includes XML representations of arithmetic expressions. Expressed as an extended BNF grammar, the structure of the expressions would look something like this:

expression = term { addition-op term }

term = factor { multiplication-op factor }

factor = number | variable-reference | function-call | ‘(’ expression ‘)’

variable-reference = ‘$’ name

function-call = name ‘(’ expression { ‘,’ expression } ‘)’

addition-op = ‘+’ | ‘-’

multiplication-op = ‘*’ | ‘/’ | ‘div’ | ‘mod’

The basic alphabet of the grammar includes name, number, and the various quoted strings in the grammar.

Our XML representation will have elements named num, var, funcall, sum, diff, product, quotient, iquotient, and modulo, which will nest in the natural way. The arithmetic expression (4 + max($line)) mod 3 will be encoded as

<modulo>
  <sum>
    <num>4</num>
    <funcall f="max"><var>line</var></funcall>
  </sum>
  <num>3</num>
</modulo>
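To make the mapping from the surface syntax to the elements concrete, here is a minimal recursive-descent sketch in Python that parses an expression and builds the XML encoding. It is not an XForms solution, only a statement of the target structure. It makes assumptions the grammar leaves open: ‘/’ maps to quotient and ‘div’ to iquotient, a parenthesized expression counts as a factor (the example needs that), numbers are unsigned decimals, and binary operators associate to the left. For (4 + max($line)) mod 3 it yields a modulo element whose children are a sum and a num.

```python
import re
import xml.etree.ElementTree as ET

# Assumed operator-to-element mapping ('/' -> quotient, 'div' -> iquotient).
OPS = {"+": "sum", "-": "diff", "*": "product", "/": "quotient",
       "div": "iquotient", "mod": "modulo"}
ADD_OPS = ("+", "-")
MUL_OPS = ("*", "/", "div", "mod")
# Tokens: unsigned decimal numbers, identifier-like names, or any other single character.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|\S")


class Parser:
    def __init__(self, src: str) -> None:
        self.toks = TOKEN.findall(src)
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {tok!r}")
        self.i += 1
        return tok

    def binary(self, operand, ops):
        # Left-associative: 1 - 2 - 3 becomes diff(diff(1, 2), 3).
        left = operand()
        while self.peek() in ops:
            node = ET.Element(OPS[self.take()])
            node.extend([left, operand()])
            left = node
        return left

    def expression(self):   # expression = term { addition-op term }
        return self.binary(self.term, ADD_OPS)

    def term(self):         # term = factor { multiplication-op factor }
        return self.binary(self.factor, MUL_OPS)

    def factor(self):       # number | '$' name | name '(' args ')' | '(' expression ')'
        tok = self.take()
        if tok[0].isdigit():
            node = ET.Element("num")
            node.text = tok
            return node
        if tok == "(":
            inner = self.expression()
            self.take(")")
            return inner
        if tok == "$":
            name = self.take()
            if not (name[0].isalpha() or name[0] == "_"):
                raise SyntaxError(f"expected a variable name, found {name!r}")
            node = ET.Element("var")
            node.text = name
            return node
        if tok[0].isalpha() or tok[0] == "_":
            node = ET.Element("funcall", f=tok)
            self.take("(")
            node.append(self.expression())
            while self.peek() == ",":
                self.take(",")
                node.append(self.expression())
            self.take(")")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")


def to_xml(src: str) -> str:
    parser = Parser(src)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return ET.tostring(tree, encoding="unicode")
```

Note that the recursion in the parser mirrors the recursion in the grammar exactly; it is that unbounded nesting, not any question of mixed content, that a form-based editor has to cope with.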

We would like to provide a convenient interface, using XForms, to allow users to create or modify arithmetic expressions.

Now we are ready for the thought experiment, which consists of asking these questions:

1 Are there any really good ways to do this today (in XForms 1.1, or with extensions in existing XForms implementations)?

2 What are the possible ways of doing this today? This question is not restricted to techniques we might regard as really good — here we’ll settle for technically feasible. Extra credit for sound analysis of the pros and cons of the various techniques, and for pointers to examples to illustrate them.

3 What changes to XForms might allow implementations of a future version of the spec to handle this class of problem (more) easily?

I think sometimes discussions of XForms conflate the issue of variable-depth recursive structures with the issue of mixed-content support; when I think about use cases like this one, they seem to me quite different. (Concretely: for mixed-content editing, people often propose one rich-text-editor widget or another, but I don’t think a rich-text editor would be likely to help much here.)

To count as really good, a solution ought to be applicable to other expression languages with similar structural properties:

  • well-formed formulae of symbolic logic (sentential logic provides a simple case, first-order predicate calculus a more complex case)
  • grammars (in various notations: BNF, EBNF, content models in XML schema languages)
  • XPath expressions
  • CSS selector expressions

If a solution can handle XML representations of programming-language constructs, that would be good, too.

Regular approximations (again)

[9 April 2012]

For a while it has seemed to me interesting and potentially useful that any context-free language L can be approximated by regular languages which accept either a subset or a superset of L. (See earlier posts from 2008 and 2009.)

Apart from the intrinsic interest of the fact, this means that it’s possible in principle to define XSD simple types using those regular approximation languages, which in turn means it’s possible to use XSD validation to catch at least some syntactic errors in strings which should be members of L.

Some time ago, I had occasion to translate the ABNF of RFC 3986 and RFC 3987 into XSD regular expressions (possible in that case without an approximation, since the languages defined by those two RFCs are all regular). The mechanism I used was simple: each ABNF production turned into an entity declaration, and each non-terminal on the right-hand side of a rule turned into an entity reference.
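The entity trick is, in effect, macro substitution, and it can be sketched in Python. This is an illustration, not the tooling actually used: each rule body is written as a regular expression in which %name; stands for a reference to another rule, and each reference is replaced by the parenthesized expansion of that rule. The function name expand is hypothetical. The example rule is scheme from RFC 3986, ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ). Substitution alone terminates only when no rule refers to itself, directly or indirectly; recursive rules need some other treatment.

```python
import re

# A reference to another rule, written like an XML parameter-entity reference.
REF = re.compile(r"%([A-Za-z][A-Za-z0-9_-]*);")

def expand(rules: dict[str, str], name: str, active: tuple[str, ...] = ()) -> str:
    """Return one regular expression for rule `name` by substituting, recursively,
    the parenthesized expansion of every rule it references."""
    if name in active:
        raise ValueError(f"rule {name!r} is recursive; substitution would not terminate")
    return REF.sub(
        lambda m: "(" + expand(rules, m.group(1), active + (name,)) + ")",
        rules[name],
    )

# ABNF from RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
RULES = {
    "ALPHA": "[A-Za-z]",
    "DIGIT": "[0-9]",
    "scheme": r"%ALPHA;(%ALPHA;|%DIGIT;|[+\-.])*",
}
```

For this example the result uses only parentheses, alternation, character classes, and quantifiers, so it should also be usable as an XSD pattern.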

Today I was reviewing that work and it occurred to me that if we could find a regular approximation by algebraic manipulation of the grammar (without any detour through finite-state automata), it would make it much simpler to write the necessary XSD patterns.

But how?

Given a context-free grammar for L, can we identify algebraic rules which would allow us to derive a regular grammar for a regular approximation to L?
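One naive possibility, offered only as a baseline and not as the kind of algebraic rule the question is after, is bounded unfolding: expand references to a fixed depth, then replace every reference still left either with a pattern matching nothing (which yields a regular subset of L) or with one matching everything (which yields a regular superset). A Python sketch, with rule bodies written as regular expressions in which %name; refers to another rule; the names approximate, NOTHING, and ANYTHING are hypothetical, and the patterns are Python patterns, not XSD. The subset and superset claims hold only if rule bodies use nothing but concatenation, alternation, and repetition, since those operations are monotone.

```python
import re

# A reference to another rule, written like an XML parameter-entity reference.
REF = re.compile(r"%([A-Za-z][A-Za-z0-9_-]*);")
NOTHING = r"[^\s\S]"   # matches no character, hence no string: gives a subset
ANYTHING = r"[\s\S]*"  # matches every string: gives a superset

def approximate(rules: dict[str, str], name: str, depth: int, fallback: str) -> str:
    """Unfold %name; references `depth` levels deep, then replace every
    reference still left with `fallback`."""
    if depth < 0:
        return fallback
    return REF.sub(
        lambda m: "(" + approximate(rules, m.group(1), depth - 1, fallback) + ")",
        rules[name],
    )

# Balanced parentheses: e = ( "(" e ")" )*
DYCK = {"e": r"(\(%e;\))*"}
```

With DYCK, the depth-1 subset accepts ()() but not (()), while the depth-0 superset accepts the empty string and every string that begins with ( and ends with ), balanced or not. Neither bound says anything about how close the approximation is, which is where better rules would earn their keep.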