[22 July 2008; minor revisions 23 July]
Some colleagues and I spent time not long ago discussing the proposition that RDF has intrinsic semantics in a way that XML does not. My view, influenced by some long-ago thoughts about RDF, was that there is no serious difference between RDF and XML here: from interesting semantics we learn things about the real world, and neither the RDF spec nor the XML spec provides any particular set of semantic primitives for talking about the world. The maker of the vocabulary can (I oversimplify slightly, complexification below) make terms mean pretty much anything they want: this is critical both to XML and to RDF. The only way, looking at an RDF graph or the markup in an XML document, to know whether it is talking about the gross national product or the correct way to make adobe, is to look at the documentation. This analysis, of course, is based on interpreting the proposition we were discussing in a particular way, as claiming that in some way you know more about what an RDF graph is saying than you know about what an SGML or XML document is saying, without the need for human intervention. Such a claim does not seem plausible to me, but it is certainly what I have understood some of my RDF-enthusiast friends to have been saying over the years.
(I should point out that if one understands the vocabulary used to define classes and subclasses in the RDF graph, of course, the chances of hitting upon useful documentation are somewhat increased. If you don’t know what vug means, but know that it is a subclass of cavity, which in turn is (let’s say) a subclass of the class of geological formations, then even if vug is otherwise inadequately documented you may have a chance of understanding, sort of, kind of, what’s going on in the part of the RDF graph that mentions vugs. I was about to say that this means one’s chances of finding useful documentation may be better with RDF than with naked XML, but my evil twin Enrique points out that the same point applies if you understand the notation used to define superclass/subclass relations [or, as they are more usually called, supertype/subtype relations] in XSD [the XML Schema Definition Language]. He’s right, so the ability to find documentation for sub- and superclasses doesn’t seem to distinguish RDF from XML.)
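A minimal sketch of the inference the parenthesis relies on, assuming a toy vocabulary: the class names and the `superclasses` helper are hypothetical, not taken from any real ontology. Following subclass links transitively is what lets a reader who knows nothing about vug learn that it is at least some kind of geological formation; the same traversal over XSD type derivations would work just as well, which is Enrique's point.

```python
# Hypothetical mini-vocabulary: each pair is (subclass, superclass),
# standing in for triples of the form  ex:X rdfs:subClassOf ex:Y.
SUBCLASS_OF = {
    ("vug", "cavity"),
    ("cavity", "geological_formation"),
}


def superclasses(cls: str, edges: set[tuple[str, str]]) -> set[str]:
    """Return every class reachable from cls by following subclass edges,
    i.e. the transitive closure of the subclass relation starting at cls."""
    found: set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in edges:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


print(sorted(superclasses("vug", SUBCLASS_OF)))
```

The printed list contains `cavity` and `geological_formation`: thin knowledge, but a start on finding the right documentation.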
This particular group of colleagues, however, had (for the most part) a different reason for saying that RDF has more semantics than XML.
Thomas Roessler has recently posted a concise but still rather complex statement of the contract that producers of RDF enter into with the consumers of RDF, and the way in which it can be said to justify the proposition that RDF has more semantics built-in than XML.
My bumper-sticker summary, though, is simpler. When looking at an XML document, you know that the meaning of the document is given by an interaction of (1) the rules for interpreting the document shaped by the designer of the vocabulary and by the usage of the document creator with (2) the actual content of the document. The rules given by the vocabulary designer and document author, in turn, are limited only by human ingenuity. If someone wants to specify a vocabulary in which the correct interpretation of an element requires that you perform gematriya on the element’s generic identifier (element type name, as the XML spec calls it) and then feed the resulting number into a specific random number generator as a seed, then we can say that that’s probably not good design, but we can’t stop them. (Actually, I’m not sure that RDF can stop that particular case, either. Hmm. I keep trying to identify differences and finding similarities instead.)
(Enrique interrupted me here. “Gematriya?” “A hermeneutic tool beloved of some Jewish mystics. Each letter of the alphabet has a numeric value, and the numerical value for a concept may be derived from the numbers of the letters which spell the word for the concept. Arithmetic relations among the gematriya for different words signal conceptual relations among the ideas they denote.” “Where do you get this stuff? Reading Chaim Potok or something?” “Well, yeah, and Knuth for the random-number generator, but there are analogous numerological practices in other traditions, too. Should I add a note saying that the output of the random number generator is used to perform the sortes Vergilianae?” “No,” he said, “just shut up, would you?”)
In RDF, on the other hand, you do know some things.
- You know the “meaning” of the RDF graph can be paraphrased as the conjunction of a set of declarative sentences.
- You know that each of those declarative sentences is atomic and semantically independent of all others. (That is, RDF allows no compound structures other than conjunction; it differs in this way from programming languages and from predicate logic — indeed, from virtually all formally defined notations which require context-free grammars — which allow recursive structures whose meaning must be determined top-down, and whose meaning is not the same as the conjunction of their parts. The sentences P and Q are both part of the sentence “if P then Q”, but the meaning of that sentence is not the same as the conjunction of the parts P and Q.)
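The two facts above can be made concrete with a small sketch, assuming a graph with no blank nodes and hypothetical `ex:` names. Treating the graph as a plain set of triples makes the conjunctive reading explicit, and for such ground graphs the RDF Semantics notion of simple entailment comes down to a subset test: throwing away conjuncts never changes what the remaining ones say.

```python
# A ground RDF graph (no blank nodes) modelled as a set of
# (subject, predicate, object) triples. The ex: names are hypothetical.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("ex:vug1", "rdf:type", "ex:Vug"),
    ("ex:vug1", "ex:foundIn", "ex:quarry7"),
}


def paraphrase(g: set[Triple]) -> str:
    """Render the graph as a conjunction of one declarative sentence per triple."""
    return " AND ".join(f"{s} {p} {o}" for (s, p, o) in sorted(g))


def entails_ground(g: set[Triple], h: set[Triple]) -> bool:
    """For ground graphs, g simply entails h exactly when h is a subset of g."""
    return h <= g


print(paraphrase(graph))
```

Nothing in this representation allows one triple to sit inside another; there is no place for "if" or "not" to live.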
When my colleagues succeeded in making me understand that on the basis of these two facts one could plausibly claim that RDF has, intrinsically, more semantics than XML, I was at first incredulous. It seems a very thin claim. Knowing that the graph in front of me can be paraphrased as a set of short declarative sentences doesn’t seem to tell me what it means, any more than suspecting that the radio traffic between spies and spymasters consists of reports going one direction and instructions going the other tells us how to crack the code being used. But as Thomas points out, these two facts are fairly important as principles that allow RDF graphs to be merged without violence to their meaning, which is an important task in data integration. Similar principles (or perhaps at this level of abstraction they are the same principles) are important in allowing topic maps to be merged safely.
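The point about merging can be sketched as follows, assuming the same set-of-triples representation and a `_:label` convention for blank nodes (the function names are illustrative). Because each triple stands on its own, merging two graphs is essentially set union; the only care needed is to rename blank nodes apart, so that two unrelated "somebody" nodes from different sources are not accidentally identified.

```python
import itertools

Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    # Convention assumed for this sketch: blank nodes are written "_:label".
    return term.startswith("_:")


def merge(g1: set[Triple], g2: set[Triple]) -> set[Triple]:
    """Rename g2's blank nodes apart from g1's, then take the union.
    Every triple of g1 survives unchanged, so nothing g1 said is retracted."""
    used = {term for triple in g1 for term in triple if is_blank(term)}
    counter = itertools.count()
    renaming: dict[str, str] = {}

    def fresh(term: str) -> str:
        if not is_blank(term):
            return term
        if term not in renaming:
            candidate = f"_:m{next(counter)}"
            while candidate in used:
                candidate = f"_:m{next(counter)}"
            renaming[term] = candidate
            used.add(candidate)
        return renaming[term]

    renamed = {(fresh(s), fresh(p), fresh(o)) for (s, p, o) in g2}
    return g1 | renamed


g1 = {("_:b", "ex:name", '"Alice"')}
g2 = {("_:b", "ex:name", '"Bob"')}
print(len(merge(g1, g2)))
```

This prints 2: the two `_:b` nodes stay distinct, and neither source's claims are altered. The same monotonicity is what makes a naive union safe when no blank nodes are involved.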
Of course, there is a flip side. If a notation restricts itself to a monotonic semantics of this kind (in which no well-formed formula ever appears in an expression without licensing us to throw away the rest of the expression and assume that the formula we found in it has been asserted), then some important conveniences seem to be lost. I am told that for a given statement P, it’s not impossible to express the proposition “not P” in RDF, but I gather that it does not involve any construct that resembles the expression for P itself. And similarly, constructions familiar from sentential logic like “P or Q”, “P only if Q”, and “P if and only if Q” must all be translated into constructions which do not contain, as subexpressions, the expressions for P or Q themselves.
At the very least, this seems likely to be inconvenient and opaque.
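To see what such a translation can look like, here is a hedged sketch of one workaround (not the only one): describing each disjunct with the RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`), which talks about a triple without asserting it. The `ex:Disjunction` and `ex:disjunct` terms are hypothetical; RDF itself assigns them no disjunctive meaning, so that meaning lives only in their documentation.

```python
import itertools

Triple = tuple[str, str, str]
_ids = itertools.count()


def reify(t: Triple) -> tuple[str, set[Triple]]:
    """Describe triple t with the RDF reification vocabulary, without asserting t."""
    node = f"_:stmt{next(_ids)}"
    s, p, o = t
    return node, {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    }


def disjunction(p: Triple, q: Triple) -> set[Triple]:
    """Encode 'p or q' using hypothetical ex:Disjunction / ex:disjunct terms."""
    d = f"_:or{next(_ids)}"
    node_p, graph_p = reify(p)
    node_q, graph_q = reify(q)
    return graph_p | graph_q | {
        (d, "rdf:type", "ex:Disjunction"),
        (d, "ex:disjunct", node_p),
        (d, "ex:disjunct", node_q),
    }


P = ("ex:vug1", "rdf:type", "ex:Vug")
Q = ("ex:vug1", "rdf:type", "ex:Geode")
g = disjunction(P, Q)
print(P in g, Q in g, len(g))
```

This prints `False False 11`: eleven triples, none of which is P or Q. That is exactly the opacity complained of above.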
Several questions come thronging to the fore whenever I get this far in my ruminations on this topic.
- Do Topic Maps have a similarly restrictive monotonic semantics?
- Could we get a less baroque representation of complex conditionals with something like Lars-Marius Garshol’s quads, in which the minimal atomic form of utterance has subject, verb, object, and who-said-so components, so that having a quad in your store does not commit you to belief in the proposition captured in its triple the way that having a triple in your triple-store does? Or do quads just lead to other problems?
- If we accept as true my claim that XML can in theory express imperative, interrogative, exclamatory, or other non-declarative semantics (fans of Roman Jakobson’s 1960 essay on Linguistics and Poetics may now chant, in unison, “expressive, conative, meta-lingual, phatic, poetic”, thank you very much, no, don’t add “referential”, that’s the point, the ability to do referential semantics is not a distinguishing feature here), does that fact do anyone any good? The fundamental idea of descriptive markup has sometimes been analysed as consisting of (a) declarative (not imperative!) semantics and (b) logical rather than appearance-oriented markup of the document; if that analysis is sound (and I had always thought so), then presumably the use of XML for non-declarative semantics should be regarded as eccentric and probably not good practice, but unavoidable. In order to achieve declarative semantics, it was necessary to invent SGML (or something like it), but neither SGML nor XML enforces, or attempts to enforce, a declarative semantics. So is the ability to define XML vocabularies with non-declarative semantics anything other than an artifact of the system design? (I’m tempted to say “a spandrel”, but let’s not go into evolutionary biology.)
- Is there a short, clear story about the relation between the kinds of things you can and cannot express in RDF, or Topic Maps, and the kinds of things expressible and inexpressible in other notations like first-order predicate calculus, sentential calculus, the relational model, and natural language? (Or even a long opaque story?) What I have in mind here is chapter 10 in Clocksin and Mellish’s Programming in Prolog, “The Relation of Prolog to Logic”, in which they clarify the relative expressive powers of first-order predicate calculus and Prolog by showing how to translate sentences from the first to the second, observing along the way exactly when and how expressive power or nuance gets lost. Can I translate arbitrary first-order predicate calculus expressions into RDF? How? Into Topic Maps? How? What gets lost on the way?
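The quads question can at least be made concrete. Lars-Marius Garshol’s proposal is described above only in outline, so this is a hypothetical sketch, not his design: each statement carries a fourth "who said so" component (loosely similar to what later RDF work calls named graphs), and holding a quad commits the store only to the fact that someone said something. Which triples are actually believed is a separate, policy-dependent projection.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # the "who said so" component


# Hypothetical store: two sources making incompatible claims.
store: set[Quad] = {
    Quad("ex:vug1", "rdf:type", "ex:Vug", "ex:surveyor"),
    Quad("ex:vug1", "rdf:type", "ex:Geode", "ex:rumour"),
}


def asserted(quads: set[Quad], trusted: set[str]) -> set[tuple[str, str, str]]:
    """Project out the triples we are willing to believe: only claims
    attributed to a trusted source become assertions."""
    return {(q.subject, q.predicate, q.obj) for q in quads if q.source in trusted}


print(asserted(store, {"ex:surveyor"}))
```

Whether this merely relocates the problem (a quad is itself an unconditional assertion about who said what) is precisely the open question posed in the list above.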
It will not surprise me to learn that these are old well understood questions, and that all I really need to do is RTFM. (Actually, that would be good news: it would indicate that it’s a well understood and well solved problem. In another sense, of course, it would be less good news to be told to RTFM. I’ve tried several times to read Resource Description Framework (RDF) Model and Syntax Specification but never managed to get my head around it. But knowing that there is an FM to read would be comforting in its way, even if I never managed to read it. RDF isn’t really my day job, after all.)
How comfortable can we be in our formalization of the world, when for the sake of tractability our formalizations are weaker than predicate calculus, given that even predicate calculus is so poor at capturing even simple natural-language discourse? Don’t tell me we are expending all this effort to build a Semantic Web in which we won’t even be able to utter counterfactual conditionals?! What good is a formal notation for information which does not allow us to capture a sentence like the one with which Lou Burnard once dismissed a claim I had made:
“If that is the case, then I am the Queen of Romania.”