[begun 11 August 2008, finished 19 August 2008]
At the versioning symposium the day before the Balisage 2008 conference, my colleague Sandro Hawke talked about a method he has developed for allowing a core language to be extended in a variety of ways while still retaining reasonably good interoperability. The details of his implementation are interesting, but what strikes me most forcibly is the underlying model of interoperability, which seems to me to bring some very useful new ideas to bear on the problem. There may be — I mean there must be — situations where these new ideas don’t help. It’s not actually clear to me how best to characterize the situations where they do help, and the situations where they don’t, but it does seem to me clear that in the situations where they do help, they can help a lot.
It took me a bit of effort to grasp the basic idea, and in fact I didn’t really get it until my evil twin Enrique started to explain it to me. The summary of the basic ideas that follows is Enrique’s, not Sandro’s, and Sandro should not be held responsible for it.
1 Start at the beginning.
“It’s a very good place to start,” I murmured. “What was that?” “Nothing.” “One more Sound of Music reference out of you and this explanation is over, got that?”
It’s obvious, and thus well understood by at least some people, that interchange is easy when everyone involved is speaking exactly the same language.
“The Lord said, ‘If as one people speaking one language they have begun to do this, then nothing they plan to do will be impossible for them.’” “And right he was. Sometimes Yahweh really knows what he’s talking about.”
2 But we have a lot of situations where we don’t want to, or can’t, speak exactly the same language.
3 In some cases, with unrelated languages, the only option seems to be translation, or non-communication.
“Non-communication?” I asked. “Always a popular option,” said Enrique. “Popular?” “Or at least, a common result. I assume it must be popular, given how many normally intelligent working group members engage in it. Do you really think that in the average working group argument, people are speaking the same language? Or even trying to?” “Well, of course they are. People working together in good faith …” “Good faith? Working together? When was the last time you actually listened to what people were saying in a working group meeting?” “Don’t say that where my Working Group chairs can hear you, OK?” “You think they don’t know? Stick around, kid. Your naïveté is refreshing.”
4 An important case: We have a lot of situations where we want to support multiple versions / variants of the same vocabulary: a sequence of version 1, version 2, and version 3 is a simple example.
Another example is the definition of a core architecture and various competing (or complementary) extensions. For example, in the context of the W3C Rule Interchange Format, Sandro’s slide imagines a Core RIF dialect, extended separately by adding actions (to create Production Rule Dialect) and by adding equality and functions (to create a Basic Logic Dialect), the Basic Logic Dialect in turn being extended by a negation-as-failure feature and a classical-negation feature (producing, respectively, a logic-programming dialect and a version of first order logic).
“Is it an important pattern here,” I asked, “that we are talking about different dialects with overlapping expressive power?” “Possibly. What do you mean?” “The examples you mention seem to involve different dialects with semantics or functionality in common, or partially in common: you can approximate negation-as-failure with classical negation, or vice versa, if you translate direct from the one dialect to the other. If you had to translate them down into a common core language without any negation at all, you wouldn’t be able to approximate them nearly so well.” “You’re getting ahead of yourself here, but yes, that’s true of the examples so far.”
Another example is the definition of a central reference architecture and various extensions or specializations.
“You mean like HL7 and different specializations of it for general practitioners and for hospitals and other health-care institutions?” “Possibly.”
5 A common approach to such situations is a kind of hub and spoke model: specify an interchange language, an interlingua. Everybody translates into it, everybody translates out of it. So you have a 2×n problem, not an n-squared problem.
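The arithmetic behind “2×n, not n-squared” can be sketched as follows. This is only an illustration: the function name is mine, and it assumes every translator is one-directional, so that a direct pair between two variants needs two translators.

```python
def translator_counts(n: int) -> dict:
    """Count the one-directional translators needed for n language variants."""
    return {
        # one translator for every ordered pair of distinct variants
        "pairwise": n * (n - 1),
        # one translator into the interlingua and one out of it, per variant
        "interlingua": 2 * n,
    }


for n in (3, 5, 10):
    print(n, translator_counts(n))
```

With only three variants the two approaches cost the same; the hub starts to pay off from the fourth variant on.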
6 But in some situations, e.g. core language + multiple independent extensions, the only feasible interlingua is the core language without extensions. (Example: consider any attempt to write portable C, or portable SQL. Basic rule: stay away from vendor extensions at all costs, even if they would be a lot of help. Secondary rule: use them, but put ifdefs and so on around them.)
I.e. the interlingua is often not as expressive as the specific variants and extensions people are using. (That’s why they’re using the variants and not the interlingua.)
Put yet another way: going from the variants to the interlingua can be lossy, and often is.
7 Two things are wrong / can be wrong with an interlingua like that (or ANY interlingua solution), especially when a round trip through the interlingua produces something less useful or idiomatic than the original. (This is not logically necessary, it just almost always seems to be the case.)
Well, three.
(a) If two people (say, Ana Alicia and Ignacio) are doing blind interchange, and using the same extensions, it seems dumb for Ana Alicia to filter out all the extensions she uses, and for Ignacio not to get them. They would not have hurt him, they would have been helpful to him. The same applies if Ana and Ignacio are using similar but distinct extensions: if it’s possible to translate Ana’s extensions into a form that Ignacio can understand, that may be better than losing them entirely.
But that doesn’t work well if the interlingua can’t handle Ana Alicia’s extensions.
Sandro gave, among others, the example of the blink tag. The rules of HTML say that a browser that doesn’t understand the blink element should ignore its start- and end-tags. That makes the defined tags a sort of interlingua, and prescribes a simple (brutally simple, and also brutal) translation into the interlingua. But for text that should ideally be presented as blinking, it would be better to fall back to strong or some other form of strong highlighting, rather than rendering the contents using the same style as the surrounding text.
(b) Having extensions Ignacio doesn’t understand may or may not be a problem, depending on what Ignacio is planning on doing with the data.
(c) Even more important than (b): the best lossy translation to use may vary with Ignacio’s use of the data.
If one translation preserves the logical semantics of a set of logical rules (e.g. in RIF), but makes the rules humanly illegible, and another preserves human legibility but damages the logical content, then it matters whether Ignacio is running an inference engine or just displaying rules to users without doing any inference.
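The blink example in (a) can be made concrete with a small sketch. Everything here is hypothetical: the set of known tags, the fallback table, and the assumption that tags carry no attributes are mine, chosen only to contrast “ignore what you don’t understand” with “fall back to something nearby”.

```python
import re

KNOWN_TAGS = {"p", "em", "strong"}  # hypothetical: tags the receiver understands
FALLBACKS = {"blink": "strong"}     # hypothetical: extension -> nearest known tag

# Matches attribute-free start- and end-tags only (a simplifying assumption).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def strip_unknown(html: str) -> str:
    """Drop the start- and end-tags of unknown elements, keeping their content."""
    return TAG.sub(
        lambda m: m.group(0) if m.group(2).lower() in KNOWN_TAGS else "", html
    )


def fall_back(html: str) -> str:
    """Rewrite unknown tags to a known fallback where one is declared."""
    def repl(m):
        slash, name = m.group(1), m.group(2).lower()
        if name in KNOWN_TAGS:
            return m.group(0)
        if name in FALLBACKS:
            return f"<{slash}{FALLBACKS[name]}>"
        return ""  # no fallback declared: behave like strip_unknown
    return TAG.sub(repl, html)


sample = "<p>Sale ends <blink>today</blink>!</p>"
print(strip_unknown(sample))  # <p>Sale ends today!</p>
print(fall_back(sample))      # <p>Sale ends <strong>today</strong>!</p>
```

Both outputs are lossy; the second merely loses less for a reader who cares about emphasis.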
8 One advantage of direct translation pairs (the n-squared approach instead of 2×n with an interlingua) is that the output is often higher quality. This is true both for artificial and for natural languages.
“Really?” I asked. “Sure,” said Enrique. “Look at the secret history of any of the European machine-translation projects. Even the ones based on having an interlingua ended up with annotations that say in effect ‘if you’re going from Spanish to Italian, do this; if from Spanish to German, do that.’” “Where did you read that?” “Somewhere. No, I can’t cite a source. But I bet it’s true.”
But what about when we don’t know in advance which pairs are needed, or how many people are going to be playing, and thus cannot prepare all n-squared translation pairs in advance? What do we do then?
9 Could we manage a system where the pairwise translation is selected dynamically, based on simpler parts (so the total cost is not quadratic but less)?
10 Yes. Here’s how, in principle.
(a) Identify one or more base interlingua notations.
“That’s got to be a centralized function, surely?” I asked. “Probably; if it’s not centralized you’re unlikely to reach critical mass. But when you’re thinking of this technique as a way to provide for the extensibility of a particular language, then it’s not likely to be an issue in practice, is it? That language itself will be the candidate interlingua.” “Then why say ‘one or more’?” I asked. “Because in principle you could have more than one. Some widely deployed language in the application domain might be a second interlingua.”
(b) Provide names for various extensions / variants / versions of the base notation(s).
This means you can describe any end point in the process as accepting a particular base language with zero or more named extensions.
“Does this need to be centralized?” I asked. “Not at all; just use normal procedures to avoid name conflicts between extensions named separately.” “You mean, use URIs?” “Yes, use URIs. Sometimes you’re not as dumb as you look, you know?”
(c) Identify well known purposes for the use of the data (display to user, logical reasoning, etc.). This can if necessary be done in a distributed way, although it’s probably most effective if at least the basic purposes are agreed upon and named up front.
(d) For each extension, say how (and whether) it can be translated into some other target language — whether the target is one of the interlinguas identified in step (a) or is an interlingua plus a particular set of extensions.
Define the translation. And quantify (somehow, probably using some subjective quality measure) how lossy it is, how good the output is, for each of the specific well known purposes.
“This,” said Enrique, “is why it’s better to centralize the naming of well known purposes. If you don’t have an agreed upon set of purposes, you end up with spotty, incomplete characterizations of various translations.” “And how is the translation to be specified?” I asked. “Any way you like. XTAN and GRDDL both use XSLT, but at this level of abstraction that’s an implementation detail. I take back what I said about your looks.”
(e) Now we have a graph of language variants, each defined as an interlingua plus extensions. Each node is a language variant. Each arc is a translation from one variant to another. Each arc has a cost associated with each well known purpose.
(f) To get from Ana Alicia’s language variant to Ignacio’s language variant, all Ignacio has to do (or Ana Alicia — whoever is taking charge of the translation) is find a path from Ana’s node in the graph
“The node representing Ana Alicia’s dialect, you mean.” “Pedant! Yes, of course that’s what I mean.”
to Ignacio’s. It need not be direct; it may involve several arcs.
(g) If there’s more than one path, Ignacio (let’s assume he is doing the translation) can choose the path that produces the highest quality output for his purposes.
Finding the optimal path through a weighted graph is not without its exciting points —
“You mean it’s hard.” “Well, of course I do. What did you think exciting meant in this context?! But only when the graph is large.”
— not (I was saying) without its exciting points, but it’s also a reasonably well understood problem, and perfectly tractable in a small graph of the kind you’re going to have in applications of this technique. Sandro’s implementation uses logical rules here to good effect.
(h) If there’s only one path, Ignacio can decide whether the output is going to do him any good, or if he should just bag the idea of reusing Ana Alicia’s data.
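Steps (e) through (h) can be sketched as a shortest-path search. To be clear about what is assumed: Sandro’s implementation uses logical rules, not this; the sketch below swaps in plain Dijkstra’s algorithm. The variant names, the arcs, and the loss numbers are all invented, the short names stand in for the URIs of step (b), and I assume that the loss of a path is simply the sum of the losses of its arcs, which is a convenient simplification and nothing more.

```python
import heapq

# Each arc: (source variant, target variant, {purpose: loss}).
# All names and numbers are invented for illustration; lower loss is better.
ARCS = [
    ("lp",  "fol",  {"reasoning": 2, "display": 1}),
    ("lp",  "bld",  {"reasoning": 5, "display": 2}),
    ("fol", "bld",  {"reasoning": 4, "display": 6}),
    ("bld", "core", {"reasoning": 3, "display": 1}),
    ("fol", "core", {"reasoning": 9, "display": 1}),
]


def best_path(arcs, start, goal, purpose):
    """Dijkstra's algorithm over the arcs, using the loss for one purpose.

    Returns (total loss, list of variants) or None if no path exists.
    """
    graph = {}
    for src, dst, loss in arcs:
        if purpose in loss:  # arcs not rated for this purpose are skipped
            graph.setdefault(src, []).append((dst, loss[purpose]))
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, step in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None  # no translation path: time to bag the idea


print(best_path(ARCS, "lp", "core", "reasoning"))  # (8, ['lp', 'bld', 'core'])
print(best_path(ARCS, "lp", "core", "display"))    # (2, ['lp', 'fol', 'core'])
print(best_path(ARCS, "core", "lp", "display"))    # None
```

The same pair of endpoints yields a different route depending on the purpose, which is point (g); the returned loss is what Ignacio would look at in case (h) to decide whether the result is worth having.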
11 The description above is how it works in principle. But can it actually be implemented? Yes.
But for details, you’re going to have to look at Sandro’s talk on XTAN, or at other writeups he has done (cited in the talk).