[19 February 2013]

For reasons not worth going into (it’s a long story) I was once involved in a discussion of how to tell, given a sequence of objects of a given kind and two descriptions (or variable references) denoting items X and Y in the sequence, whether X and Y are identical or distinct. (This topic came up during a discussion of whether we needed clearer notions of identity conditions for this class of objects.)

One of my interlocutors suggested that we did not need to specify any identity conditions at all. Since X and Y were presented to us in a sequence, we could always tell whether X and Y are the same or different by looking at their position in the sequence. If X and Y are both at the same position in the sequence, then X must be identical to Y, and if X and Y are at different positions in the sequence, then X and Y must be distinct from each other.

Now, the first claim (if X and Y are at the same position in the sequence, then X = Y) is obviously true. And the second (if X and Y are at different positions in the sequence, then X ≠ Y) is manifestly false, though sadly my interlocutor never stood still long enough for me to point this out. Let X be “the Fibonacci number F_{1}” and Y be “the Fibonacci number F_{2}“. These occur at different positions (1 and 2) in the Fibonacci series, but both expressions denote the integer 1. So we cannot safely infer, from the fact that X and Y identify things at different positions in a sequence, that X and Y identify different things.

As I discovered the other day, Frege turns out to have a word to say on this topic, too. I wish I had had this quotation ready to hand during that discussion.

… die Stelle in der Reihe kann nicht der Grund des Unterscheidens der Gegenstände sein, weil diese schon irgendworan unterschieden sein müssen, um in einer Reihe geordnet werden zu können.

… their positions in the series cannot be the basis on which we distinguish the objects, since they must already have been distinguished somehow or other, for us to have been able to arrange them in a series.

(Gottlob Frege, *Die Grundlagen der Arithmetic*, 1884; rpt Stuttgart: Reclam, 1987; tr. by J. L. Austin as *The Foundations of arithmetic* 1950, rpt. Evanston, Illinois: Northwestern University Press, 1980, § 42.)

There is fundamental difference between sequence of (instances of) value types and sequence of instances of reference types. In XDM representatives of the latter are sequences of nodes, and of the former — sequences of non-nodes (such as xs:string, xs:integer, … etc.), aka anyAtomicType.

The question of identity has been resolved well in traditional programming languages — remember VB’s “By Ref” and “By Val”.

It isn’t just a whim to follow the rule that value types are passed by value, and reference types are passed by reference. This just reflects the fact that equality between two identically-typed non-atomic objects isn’t generally possible based only on the values of their members (properties) — no conclusion can be made unless we have the exact reference to (handle of) the object.

The same is in full force when comparing sequences of non-atomic objects. I find that the only correct definition of equality of such sequences is the expression that compares for equality the ids (such as generate-id()) of their members:

deep-equal( $seq1/generate-id(), $seq2/generate-id() )

We don’t actually perform deep equality above, because all items of the two sequences are atomic — I used deep-equal just for convenience, as I am not aware of any other standard function that could be used for sequence comparison.

Exactly this definition of “sequence-equal” needs to be used if/when we define the rules for function memoization.

It is easy to give an example where comparing two sequence of non-atomic items with deep-equal() leads to error, despite the fact that the two sequences compare as truly deep-equal — it is enough to try to make use of the parents of these nodes and see that they are generally not the same.

My modest 2c.

Dimitre