Gavagai and topic maps

[23 November 2009]

In Leipzig earlier this month, at TMRA 2009, there was (understandably, in light of the conference motto “Linked Topic Maps”) a certain amount of discussion of using public subject identifiers for topics, to increase the likelihood that information collected by different people could be merged usefully, without preconcertation by the parties involved. In principle, this raises all sorts of thorny questions about ontology (of both the philosophical and the engineering kinds). In practice, however, people would like to be able to build systems and share data without waiting for the philosophers and engineers of the world to agree on an answer to the question “What exists?”, and they seem to be willing to risk making a mistake or two along the way.

At some point (I’m now a bit hazy about when this happened), someone mentioned a proposal made by Vegard Sandvold, an interesting rough-and-ready approach to the problem: use Wikipedia (or DBpedia) as a sort of barometer of a reasonably democratic consensus on the question.

Steve Pepper has responded that one problem with that (among others of no concern here) is that if he wants to say something about (for example) the International Socialists organization in the UK (founded in 1962), a restriction to Wikipedia as a source of public subject identifiers would make it impossible: Wikipedia redirects from International Socialists (UK) to a page about the Socialist Workers Party (Britain). This organization, founded in 1977, is really not the same thing, said Steve (even if Wikipedia says the SWP was formed by renaming the IS, which suggests a continuity of essential identity).
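To make the mechanics concrete, here is a minimal Python sketch of the rule that makes shared identifiers attractive in the first place: topics that share a subject identifier are merged into one. It is an illustration only, not anything proposed at TMRA. Strictly, the Topic Maps Data Model compares subject identifiers as strings, so a Wikipedia redirect by itself merges nothing; the `canonical` hook below models a hypothetical application that follows redirects before merging. The `Topic` class, the `merge` function, and the `REDIRECTS` table are all assumptions made up for the example, and the two URLs simply spell out the Wikipedia page titles mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

WP = "http://en.wikipedia.org/wiki/"
IS_UK = WP + "International_Socialists_(UK)"
SWP = WP + "Socialist_Workers_Party_(Britain)"

# Hypothetical snapshot of the redirect: the IS page now points at the SWP page.
REDIRECTS = {IS_UK: SWP}


@dataclass
class Topic:
    names: set[str]
    subject_identifiers: set[str] = field(default_factory=set)


def merge(topics: Iterable[Topic],
          canonical: Callable[[str], str] = lambda iri: iri) -> list[Topic]:
    """Merge topics that share a subject identifier (after applying `canonical`).

    Topics already in `merged` never share identifiers with one another, so
    absorbing every one that overlaps the incoming topic keeps that true.
    """
    merged: list[Topic] = []
    for t in topics:
        ids = {canonical(i) for i in t.subject_identifiers}
        names = set(t.names)
        rest = []
        for m in merged:
            if m.subject_identifiers & ids:
                # Shared identifier: treat both topics as the same subject.
                ids |= m.subject_identifiers
                names |= m.names
            else:
                rest.append(m)
        merged = rest + [Topic(names, ids)]
    return merged


my_map = [Topic({"International Socialists"}, {IS_UK})]
your_map = [Topic({"Socialist Workers Party"}, {SWP})]

# Identifiers compared as plain strings: the two topics stay apart.
print(len(merge(my_map + your_map)))  # → 2
# Identifiers normalized through the (hypothetical) redirect: one topic.
print(len(merge(my_map + your_map,
                canonical=lambda iri: REDIRECTS.get(iri, iri))))  # → 1
```

Whether the second result is a welcome consolidation or a silent change of meaning depends entirely on what each author meant by the identifier, which is the question the rest of this post is about.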

Steve appears to argue that there is a simple fact of the matter in this case, and that treating the International Socialists and the Socialist Workers Party as the same is simply wrong. He may be right. But (without wanting to argue the ins and outs of this particular case) the example seems to me to be the kind of question which does not always have a determinate answer. It nicely illustrates a kind of ontological indeterminacy which necessarily haunts both RDF and Topic Maps, and it shows why accounts of how those technologies can combine information from multiple sources can strike careful readers as naive.

Within a given discourse, for a given purpose, we may decide to analyse a particular portion of reality as consisting of several distinct entities which succeed each other in time; in another discourse, for another purpose, we may choose to analyse that same portion of reality as consisting of a single entity which undergoes some changes of (accidental) properties. Concretely: sometimes we want to treat the International Socialists as distinct from the Socialist Workers Party, and sometimes we want to treat them as two names for the same thing. Sometimes I want to treat the journal Computers and the Humanities and the journal Language Resources and Evaluation as two quite distinct journals; at other times, I want to treat them as the same journal, published under one title through 2004 and then under a new title. There is not a simple fact of the matter; it depends on what, exactly, I want to refer to when I refer to things.

It reminds me of W.V.O. Quine’s question: if a field linguist sees a rabbit run by and hears a native informant say “gavagai”, how is the linguist to determine whether this utterance means ‘Look, a rabbit!’ or ‘Food!’ or ‘Undetached rabbit-parts’? Quine talked about the indeterminacy of translation, suggesting that he is concerned essentially with the problem of understanding what someone else really means by something. (Oddly, he seems to think the difficulty arises primarily with foreign languages, which seems to me optimistic.) But I think the difficulty reflects the fact that the field linguist is likely to confront an analysis of reality that does not match his own. This happens to the rest of us, too, not just to field linguists.

When I say the phrase “International Socialists (UK)”, can you reliably determine whether I intend to denote an organization which ceased to exist in 1977, or an organization which continues to exist today?

You can ask me, of course, and maybe I’ll answer. And if I answer, maybe I’ll tell you the truth. But if you don’t have a chance to ask me?

6 thoughts on “Gavagai and topic maps”

  1. I think Steve’s basically correct. Cicero is Tully (and Hesperus is Phosphorus) in the world of objects, but as we know, “Steve agrees that Cicero is a great Roman orator” does not entail “Steve agrees that Tully is a great Roman orator” unless we have done something to commit ourselves to a de re interpretation of all terms. As subject matters, Cicero and Tully are distinct, for they have distinct intensions.

  2. There are a number of issues here:

    (1) Are the IS and the SWP “the same thing”?
    (2) Should one use the Wikipedia URLs as subject identifiers for the one or the other (or both) or anything else for that matter?
    (3) Does this have anything to do with Quine’s gavagai?

    The short answer to (1) is “yes and no”. In some contexts one might want to regard the IS and the SWP as the same subject; in other contexts they are clearly different (e.g. someone who left IS for principled reasons when it transmuted into the SWP – as some people did – will understandably want to be able to assert that she was a member of IS *without* it being inferred that she was (also) a member of the SWP.) This is perfectly natural, and it is not in general a problem for the concept of PSIs.

    The answer to (2) is: Not if you care about the longevity and stability of identifiers. This is the point I wanted to illustrate using the IS/SWP example. At some point in time “T” there were separate Wikipedia pages for IS and SWP. Later the IS URL was redirected to the SWP page. It is the fact of this *change* that concerns me and makes me reluctant to use Wikipedia URLs as PSIs. If I had been looking for a PSI for IS (as distinct from SWP) at time “T”, I might have chosen to use the Wikipedia PSI. When Wikipedia then decided to redirect to the SWP page, they effectively merged those two topics and *changed the meaning* of my topic map. This is what I find unacceptable.

    Of course, one cannot blame Wikipedia for effecting such changes from time to time, because its purpose is to be an encyclopedia, not a repository of identifiers. The problem is that there are conflicting interests if one tries to combine the two purposes: for an encyclopedia, flexibility is very important; for a repository of identifiers, stability is paramount.

    As for (3), I’m not sure this has anything to do with the gavagai problem. That problem is essentially solved, as far as I’m concerned, by the published subject *descriptor* (a.k.a. indicator) side of the PSI/PSD dichotomy. If the native says “” you simply paste his URI into your browser and try to interpret what you get back. If it gives you a sufficiently unambiguous indication of the intended referent (“rabbit”, “food”, or whatever), and if that referent is what you yourself want to refer to, and if you trust the stability and longevity of the PSI, you can choose to use it. If not, you can choose not to. (And if you can’t find a suitable PSI, you can mint one yourself.)

  3. What I should have written was:

    *To the extent that it is relevant here*, that problem is essentially solved by the published subject *descriptor* […] side of the PSI/PSD dichotomy.

  4. Steve, thank you for the clarification. You did mention stability of the URI as an issue, but I fixed (perhaps because of my own worries) upon the difficulty of deciding (both for others, and even for oneself in some cases where you don’t much care one way or the other) just which individual or entity one wishes to refer to, in cases where the decision may be complicated. The history of the URIs is relevant to a full understanding of this situation, but even without reference to that history it seems that at the moment one difficulty is that some agents active on the Web treat as one entity what others wish to treat as two distinct entities.

    I’m a little less optimistic than SP about the ability to tell, from a record in a public directory of identifiers, just what entity it identifies, particularly in cases where the difference is tricky. Or perhaps I’m just more depressed about the likely consequences of that difficulty.

    Inge Henriksen’s post identifies several ways of trying to convey what public identifiers mean, and correctly diagnoses many of the associated problems. The post is worth reading, though as I’ve said in a comment on it, I think the only real grounding for any identifier is, ultimately, based on human-readable prose. (Logical formalisms are a useful addition, but not, I think, a substitute.)
