Sequence as source of identity

[19 February 2013]

For reasons not worth going into (it’s a long story) I was once involved in a discussion of how to tell, given a sequence of objects of a given kind and two descriptions (or variable references) denoting items X and Y in the sequence, whether X and Y are identical or distinct. (This topic came up during a discussion of whether we needed clearer notions of identity conditions for this class of objects.)

One of my interlocutors suggested that we did not need to specify any identity conditions at all. Since X and Y were presented to us in a sequence, we could always tell whether X and Y are the same or different by looking at their position in the sequence. If X and Y are both at the same position in the sequence, then X must be identical to Y, and if X and Y are at different positions in the sequence, then X and Y must be distinct from each other.

Now, the first claim (if X and Y are at the same position in the sequence, then X = Y) is obviously true. And the second (if X and Y are at different positions in the sequence, then X ≠ Y) is manifestly false, though sadly my interlocutor never stood still long enough for me to point this out. Let X be “the Fibonacci number F1” and Y be “the Fibonacci number F2”. These occur at different positions (1 and 2) in the Fibonacci series, but both expressions denote the integer 1. So we cannot safely infer, from the fact that X and Y identify things at different positions in a sequence, that X and Y identify different things.
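The counterexample is easy to check mechanically. What follows is only a minimal sketch in Python: the helper name `fib` and the 1-based convention (F1 = F2 = 1) are my own assumptions for illustration, not anything that came up in the original discussion.

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, assuming F1 = F2 = 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    a, b = 1, 1
    for _ in range(n - 2):
        a, b = b, a + b
    return b

# The sequence as (position, value) pairs, positions counted from 1.
series = [(n, fib(n)) for n in range(1, 8)]

x_pos, x = series[0]  # "the Fibonacci number F1"
y_pos, y = series[1]  # "the Fibonacci number F2"

same_position = x_pos == y_pos  # compares where the items sit
same_object = x == y            # compares what the items are
```

Here `same_position` is false while `same_object` is true: two positions, one integer. Position settles identity only in the one direction.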

As I discovered the other day, Frege turns out to have a word to say on this topic, too. I wish I had had this quotation ready to hand during that discussion.

… die Stelle in der Reihe kann nicht der Grund des Unterscheidens der Gegenstände sein, weil diese schon irgendworan unterschieden sein müssen, um in einer Reihe geordnet werden zu können.

… their positions in the series cannot be the basis on which we distinguish the objects, since they must already have been distinguished somehow or other, for us to have been able to arrange them in a series.

(Gottlob Frege, Die Grundlagen der Arithmetik, 1884; rpt Stuttgart: Reclam, 1987; tr. by J. L. Austin as The Foundations of Arithmetic, 1950, rpt. Evanston, Illinois: Northwestern University Press, 1980, § 42.)

Postel’s Law vs the Whiteboard Marker Rule

[28 June 2010]

Many people have been influenced by what is often called Postel’s Law, usually quoted as saying “Be conservative in what you send and liberal in what you accept”.

In a room with a whiteboard, however, the only workable rule is: when you find a marker that no longer writes, do not be liberal in what you accept, and do not put it back where you got it. Throw it away. Otherwise, every whiteboard you run into will soon have twenty-odd markers, some of which actually work, if you can find them, but most of them duds.

Cleaning up as you go is a good principle in cooking, and in managing the set of whiteboard markers. It’s also a good idea in data management; it’s a shame so many people misread Postel’s Law to mean the opposite.

What is a character?

[25 November 2009]

The other day I posted about a proposal to use Wikipedia as a rough-and-ready guide to the ontology of most public entities. I’ve been thinking about it, and wondering why it felt somehow sort of familiar.

Eventually, I decided that the proposal reminds me of the way in which some people (including me) eventually disposed of the thorny question of what to count as a ‘character’ when analysing writing systems. (For example: when are e and é to be regarded as the same character, when as two? Or is the latter a sequence of two characters, e and an acute accent?) The answer some people eventually converged upon is simple:

For virtually all engineering purposes, treat something as a character if and only if there is a code point for it in the Universal Character Set (UCS) defined by Unicode and ISO 10646.

Some exceptions may need to be made in principle for the private use area, or for particular special cases. But unless you and your project are the kind of people who actually run into, or identify, special cases related to subtle issues in the history of the world’s writing systems (that means, for 99.999% of the world’s population, and at least 50% of the readers of this blog), you don’t need to worry about exceptions.

The reasoning is not that the Unicode Consortium and SC 2 got the answers right. On the contrary, any reasonable observer will agree that they got some of them wrong. Many members of the relevant committees will agree. (They answer, for example, that é is BOTH a single character and a sequence of two characters. Thank you very much; may I have another drink now?) It’s not likely, of course, that any two reasonable observers will agree on which questions the UCS gets right, and which it gets wrong.
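The two answers for é can be seen side by side using Python's standard `unicodedata` module. This is only a sketch; the variable names are mine, chosen for illustration.

```python
import unicodedata

precomposed = "\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# One code point in the first spelling, two in the second.
code_point_counts = (len(precomposed), len(decomposed))

# Compared code point by code point, the two strings differ ...
equal_as_strings = precomposed == decomposed

# ... but each normalization form maps one spelling onto the other.
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
equal_after_nfd = unicodedata.normalize("NFD", precomposed) == decomposed
```

Both spellings render as é. They compare unequal as raw strings and equal once both are brought to the same normalization form, which is the UCS saying "both" in practice.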

But some questions are just hard to answer in a universally satisfactory way; if you decide for yourself what counts as a character and what does not count as a character, your answers may differ in details from those enshrined in the UCS, but they will not be more persuasive on the whole: there are no answers to that question that are persuasive in every regard.

The definition of ‘character’ embodied in the UCS is as good an answer to the question “What is a character” as we as a community are going to get, and for those for whom that question is incidental to other more important concerns, it’s far better to accept that answer and move on than to try to provide a better one.

If the question is not incidental, but central to your concerns (if, for example, you are a historian of writing systems, or of a particular writing system), then a standardized answer is not much use to you, except perhaps as an object of study.

Hmm. Perhaps one of the main purposes of standardization is to allow us to ignore things we are not particularly interested in, at the moment? Is the purpose of standards to make things boring and ignorable?

That could affect whether we think it’s a good idea to adopt such a de facto standard for ontology, or whether we think such standardization is just one step along a slippery slope with thought police at the bottom.

Changing stylesheets in midstream

[19 October 2009]

My evil twin Enrique came by the other day in a great state of excitement. There’s been a bit of a kerfuffle in some W3C working groups lately, he told me. As some readers will know, the W3C recently unveiled a new design for their web site. (Many people seem to want to call this a site redesign, but as far as I know most of the site was originally developed by individuals and working groups working autonomously, and outside the front page, the Tech Reports page, and the other pages maintained by the Communications Team, the site never had a consistent design to begin with. Surely it’s only a redesign if there was a design there in the first place?)

Almost all the comments on the new design appear to be positive — at least, they were until some spec editors and working group chairs noticed that the site redesign had included reformatted versions of their working groups’ current Recommendations, which the working groups had not looked at before and which proved, when examined, to be sub-optimal in some ways.

“Sub-optimal is putting it mildly,” laughed Enrique. “Some of the specs looked like night soil on toast. And some of the editors were fit to be tied.” Enough pain was expressed over the new look of the old specs, apparently, that after a couple of days the standard URLs for existing Recommendations were all reset, and no longer point to the reformatted versions. (The reformatted versions are still around — no one at W3C ever deletes anything, it’s a point of some pride — though you have to know what URIs to point to.)

One of the most visible problems is that in some specs, extra space was appearing before and after large numbers of hyperlinked special terms. “You know what it was?” chortled Enrique. “Some bright young thing at some bright young design agency seems to have thought a 20px padding would be a good idea for the CODE element. Do these people not know any HTML? Here, look at the stylesheet!” He pulled out a hand-held and showed me a rule from one of the new stylesheets (reformatted here for legibility):

```
h1, h2, h3, h4, h5, h6, ul, ol, dl,
p, pre, blockquote, code {
}
```

He was cackling with malice now. “The stylesheet author seems to have thought that `code` was not for inline material but for indented blocks. Where do they get these people? And giving measurements in pixels is so dead-tree-oriented!”

“Now, now,” I said. “I’m sure you were a bright young thing once yourself.”

“Not me,” he returned brightly. “I was fifty-two the day I was born, and I’ve always been dumb as a post.”

“Two, actually. Odd, though,” I said. “When I retrieve the reformatted versions of the XML and XSLT 2.0 specs, I don’t see extra white space around `code` elements.” I retrieved the stylesheet with the bogus padding values for `code`; the rule now read

```
h1, h2, h3, h4, h5, h6, ul, ol, dl,
p, pre, blockquote {
}
```

“Those bastards!” Enrique cried. “You mean they’ve fixed it? I was going to charge them big bucks to tell them what was wrong!” And he stomped off again in spluttering disappointment. I haven’t seen him since, but I’m not worried; he’ll get over it.

[The new W3C site is the result of a long design history, and really does appear to be an improvement, for the most part. It makes it much easier than the old site to find your way around (or so I believe — I knew the old site structure well enough that the new one just confuses me; I assume that will pass). The new look intended for W3C technical reports (i.e. Recommendations, Notes, Working Drafts, etc.) can be inspected on the beta site’s Tech Reports page, or the beta site’s version of the new Standards page. I haven’t yet decided whether I think the new tech report styling is an improvement or not, and if it is, whether it’s enough of an improvement to justify the disruption of restyling the entire body of existing Recommendations. I’ll be interested in readers’ reactions.

One thing is unsurprising: if you launch a new stylesheet on technical material whose authors and editors pride themselves on precision, you would do well not to make it public until they have confirmed that it is OK. And it would be smart, before you let them see it at all, and certainly before you make it public, to make sure the new stylesheet doesn’t introduce highly visible problems like 20 pixels of extra white space around every `code` element.

Live and learn.]

NACS and W3C

[8 August 2009]

Just read a long interview between Ian Jacobs of W3C and David Ezell, the chair of the W3C XML Schema working group and the representative of the National Association of Convenience and Petroleum Retailers (NACS) on the W3C’s Advisory Committee.

I may be biased, since I’ve worked closely with David for several years, but what he says in the interview seems to illustrate well the advantages for user organizations to be involved in standards development, instead of just leaving the standards work to their vendors. User organizations are woefully underrepresented on pretty much every standards group I know about; I wish more organizations took the enlightened approach NACS has taken in participating in W3C work and in supporting David’s work as chair of the XML Schema working group. Boeing deserves kudos, too, for their participation in XML Schema work. But if we had had even more users, and a less pronounced dominance of the group by vendors, I think the spec would have been better for it.

If your organization can join W3C, or other standards bodies relevant to your work, think about doing so.

Even without being a W3C member, of course, any member of the public is invited to comment on published drafts of specs, and the comments are typically taken very seriously. If you can’t afford the commitment of membership and working group participation, commenting on drafts is a good way to influence the specs. But if you can join, I encourage you to do so!