Thoughts to come back to …

[27 August 2008]

Observation: both RDF and Topic Maps seem to aspire to make it easy to find all the ways in which a given thing (topic or resource) may appear in a given body of data.

In both, the basic goal appears to be that if you look for Essex or Washington, you should be able to specify whether you mean the human being or the geographic entity (and probably be able to distinguish the state of Washington from the various cities named Washington), and find it no matter where it appears in the system. In RDF terms, this would mean no matter which triples it appears in, and no matter whether it appears as subject or object of the triples; in topic map terms, it would mean no matter which roles it plays in which associations.
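To make the RDF half of this concrete, here is a minimal sketch in plain Python rather than in any real RDF library. The `ex:` identifiers, the `Triple` type, and the `occurrences` function are all invented for illustration. It assumes only that the person and the state have distinct identifiers, so that looking up one never returns the other.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers: the person and the state get distinct names,
# even though both are called "Washington" in ordinary speech.
WASHINGTON_PERSON = "ex:GeorgeWashington"
WASHINGTON_STATE = "ex:WashingtonState"

TRIPLES = [
    Triple(WASHINGTON_PERSON, "ex:bornIn", "ex:Virginia"),
    Triple("ex:Seattle", "ex:locatedIn", WASHINGTON_STATE),
    Triple(WASHINGTON_STATE, "ex:namedAfter", WASHINGTON_PERSON),
    Triple("ex:Olympia", "ex:capitalOf", WASHINGTON_STATE),
]


def occurrences(triples, resource):
    """Return every triple in which the resource is the subject or the object."""
    return [t for t in triples if resource in (t.subject, t.obj)]


for triple in occurrences(TRIPLES, WASHINGTON_STATE):
    print(triple)
```

The lookup does not care which predicate is involved or which end of the triple the resource sits on, which is the property described above. The topic map analogue would ask for every association in which a topic plays any role.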

Observation: Codd insists firmly that, to be acceptable in his eyes, relational database management systems must provide built-in system-level support for domains. (Domains may be regarded as a form of extended data types with slightly more semantics than the basic types in a typical programming language, so you can distinguish people from places even if you use VARCHAR(60) columns to represent both.) They must also include ways of finding all the values actively in use for a given domain, regardless of relation or column, and of finding all the occurrences of a particular value of a particular domain, without getting it mixed up with values from different domains which happen to use the same underlying basic datatype in their representation. (For those with The relational model for database management, version 2 (Reading: Addison-Wesley, 1990) on the shelf, I’m looking at sections 3.4.1, The FAO_AV Command, and 3.4.2, The FAO_LIST Command.)
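A loose illustration of the two lookups just described, again in plain Python. This is not Codd's syntax and makes no claim about the exact semantics of the commands in sections 3.4.1 and 3.4.2. The catalog, the domain names, the toy relations, and the function names `active_values` and `find_occurrences` are all assumptions made up for the sketch.

```python
# Hypothetical catalog: the domain each (relation, column) draws its values from.
# Both domains would be stored as plain strings (think VARCHAR(60)).
CATALOG = {
    ("employees", "surname"): "PERSON_NAME",
    ("employees", "office"): "PLACE_NAME",
    ("suppliers", "city"): "PLACE_NAME",
}

# Toy relations, each a list of rows.
DATABASE = {
    "employees": [
        {"surname": "Essex", "office": "Washington"},
        {"surname": "Washington", "office": "Essex"},
    ],
    "suppliers": [
        {"city": "Washington"},
        {"city": "Reading"},
    ],
}


def active_values(domain):
    """All values currently in use for a domain, across every relation and column."""
    return {
        row[column]
        for (relation, column), dom in CATALOG.items()
        if dom == domain
        for row in DATABASE[relation]
    }


def find_occurrences(domain, value):
    """Every (relation, column, row index) where the value occurs in that domain."""
    return [
        (relation, column, index)
        for (relation, column), dom in CATALOG.items()
        if dom == domain
        for index, row in enumerate(DATABASE[relation])
        if row[column] == value
    ]


print(sorted(active_values("PLACE_NAME")))
print(find_occurrences("PLACE_NAME", "Washington"))
print(find_occurrences("PERSON_NAME", "Washington"))
```

Searching for "Washington" as a place finds the office and the supplier city but not the employee with that surname, even though all three are stored as the same string type. The catalog keyed by domain is what makes that separation possible.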

Question: Are the RDF and TM communities and Codd after essentially the same thing here? Is there some sense in which they fulfil (or are trying to fulfil) this part of Codd’s ideal for data management better than SQL systems do?

What exactly is the relation between this aspect of both RDF and TM on the one hand, and Codd’s notion of domains or extended data types on the other?

I’ve wanted to think about this for years, but have not managed to find anyone to discuss it with who had (a) sufficient knowledge of both semweb or Topic-Map technology and relational theory (or rather Codd’s particular doctrines) and (b) the time and inclination to go into it, at (c) a time when I myself had the time and inclination. But someday, perhaps …

One thought on “Thoughts to come back to …”

  1. Codd domains are just the simple datatypes of XML Schema; there’s an underlying implementation corresponding to the value space, but two simple types derived from String are themselves incommensurable. SQL systems have always ignored this notion in favor of primitive datatypes only, to their detriment.
