Erik Naggum, R.I.P.

[20 June 2009]

It appears from reports on the Net that Erik Naggum, long-time genius loci of comp.text.sgml, has died.

In person, he was (as far as I could tell, on the very few occasions I encountered him in the flesh) a very sweet individual. On the net — well, he taught me what a flame war was. His work on internationalization gave hints of great generosity; his resentment against the Unicode Consortium was almost comic in its ferocity (even to me, never one of that organization’s greatest fans).

Erik Naggum, dead? Is it possible? One person fewer who remembers the old days.

So it goes.

Grant and contract-supported software development

[7 April 2009]

Bob Sutor asks, in a blog post this morning, some questions about government funding and open source software. Since some of them, at least, are questions I have thought about for a while as a reviewer for the National Endowment for the Humanities and other funding agencies, I think I’ll take a shot at answering them. To increase the opportunity for independent thought to occur, I’ll answer them before I read Bob Sutor’s take on them; if we turn out to agree or disagree in ways that require comment, that can be separate.

He asks:

  • When a government provides funding to a research project, should any software created in the project be released under an open source license?

It depends.

In practice, I think it almost always should, but in theory I recognize the possibility of cases in which it needn’t.

When I review a funding proposal, I ask (among other things): what is the quid pro quo? The people of the country fund this proposal; what are they buying with that money? A reliable survey of the work of Ramon Llull and its relevance to today? Sounds good (assuming I think the applicant is actually in a position to produce a reliable survey, and the cost is not exorbitant). A better tool for finding and studying emblem books? Insight into methods of performing some important task? (Digitizing cultural artefacts, archiving digital research results for posterity, creating reliable language corpora, handling standoff annotation, … there are a whole lot of tasks it would be good to know better how to do.) How interesting is what we would be learning about? How much are we likely to learn?

My emphasis on what we get for the money sometimes leads other reviewers or panelists to regard me as cold and mean-hearted, insufficiently concerned with encouraging a movement here, nurturing a career there. But I have noticed that the smartest and most attractive members of panels I’ve been on are almost always even tougher graders than I am. When funds are as tight as they typically are, you really do need to put them where they will do the most good.

If the value proposition of the funding proposal is “we’ll develop this cool software”, then as a reviewer I want the public to own that software. Otherwise, what did we buy with that money?

If the value proposition is “we’ll develop these cool algorithms and techniques, and write them up, so the community now has better knowledge of how to do XYZ — oh, and along the way we will write this software, it’s necessary to the plan but it’s not what this grant is buying”, then I don’t think I want to insist on open-sourcing the software. But it does make it harder for the applicant to make the case that the results will be worth the money.

Stipulating that software produced in a project will be open-source does usually help persuade me that its benefit will be greater and more permanent. If the primary deliverable I care about is insight, or an algorithm, open-sourcing the software may not be essential. But it helps guarantee that there will be a mercilessly complete account of the algorithm with all details. (It does have the potential danger, though, that it may allow other reviewers or the applicants to believe that the source code provides an adequate account of the algorithm and there is no need for a well written paper or series of papers on the technical problem. I am told that some programmers write source code so clear and beautiful that it might suffice as a description of the algorithm. I say, if writing documentation as well as source code is good enough for Donald Knuth, it’s good enough for the rest of us.)

On the other hand, I don’t think deciding not to open-source the software is necessarily an insuperable barrier. The question is: what value is the nation or the world going to get from this funding? Sometimes the value clearly lies with the software people are proposing to develop, sometimes it clearly lies elsewhere and the software plays a purely subordinate, if essential, role. (But although I admit this in principle, I am not sure that in practice I have ever liked a proposal that proposed to spend a lot of effort on software but not to make it generally available. So maybe my generosity toward non-open-source projects is a purely theoretical quantity, not observable in practice.)

If software is involved, you also have to ask yourself as a reviewer how well it is likely to be engineered and whether the release of the software will serve the greater good, or whether it will act like a laboratory escape, not providing good value but inhibiting the devotion of resources to creating better software.

The chances and consequences of suboptimal engineering vary, of course, with whether the research in question is focused specifically on computer science and software engineering, or on an application domain, in which case there is a long and often proud history of good science being performed with software that would make any self-respecting software engineer gag. (A long time ago, I worked at a well known university where the music department burned more CPU cycles on the university mainframe than any other department. Partly this was because Physics had its own machines, of course, and partly it was because the music people were doing some really really cool and interesting stuff. But was it also partly because they were lousy programmers who ran the worst optimized code east of the Mississippi? I never found out.)

  • Does this change if commercial companies are involved? How?

If the work is being done by a commercial company, they are historically perhaps less likely to want to make the software they develop open-source. That’s one way the process is affected.

But also, if a government agency is contracting with a commercial organization to develop some software, there may be a higher chance that the agency wants some software for particular parties to use, and the main benefit to be gained is the availability to those parties of the software involved. In some cases, the benefit may be the existence of commercially viable organizations willing and able to support software of a particular class and develop it further.

There are plenty of examples of commercial codebases developed in close consultation with an initial client or with a small group of initial clients. The developer gets money with which to do the development; the initial clients get to help shape the product and ensure that at least one commercial product on the market meets their needs. In the cases I have heard of, the clients don’t typically turn around and demand that the code base be open-source.

It’s not clear to me that government funding agencies should be barred from acting as clients in scenarios like this. This kind of arrangement isn’t precisely what I tend to think of as “research”, but whether it’s appropriate or not in a given research program really depends on the terms of reference of that program, and not on what counts as research in the institutions that trained me.

I have been told on what I think is good authority that if it had not been for contracts let by various defence agencies, the original crop of SGML software might never have been commercially viable. (And since it was that crop of software that demonstrated the utility of descriptive markup and made XML possible, I wouldn’t like to try to outlaw whatever practices led to that software being developed.)

  • Does this change if academic institutions are involved? How?

I don’t think so.

  • How should the open source license be chosen? Who gets to decide?

Yes.

Two umbrellas and a prime number.

I think I mean “Huh? Is this a trick question?”

To the extent that we think of funded research as the purchase (on spec) of certain research products we hope the funding will produce, then the funding agency can certainly say “We want the … license”. And then the Golden Rule of Arts and Sciences applies. Or the people writing the proposal can say “We want to use the … license; take it or leave it.” And the funding agency, guided by its reviewers and panelists and staff and the native wit of those responsible for the final decision, will either leave it or take it.

The only thing that would make me more suspicious and worried than this chaotic back and forth would be an attempt to make an iron-clad rule to cover all cases, for all projects, for all governmental funding agencies.

Heinrich Hertz and the empty set of tomatoes

[2 April 2009]

Why does Nelson Goodman want to work so hard just to avoid talking about classes or sets?

Earlier this year I spent some time reading the section on the calculus of individuals in Nelson Goodman’s The structure of appearance (3d ed. Boston: Reidel, 1977) and the paper Goodman wrote on the subject with Henry S. Leonard (Henry S. Leonard and Nelson Goodman, “The calculus of individuals and its uses” The journal of symbolic logic 5.2 (1940): 45-55).

I was struck by the lengths Goodman goes to in order to avoid talking about sets, although his compound individuals which contain other individuals seem to be doing very much the same work as sets. Indeed, the 1940 paper makes a selling point of this fact. On page 46, Leonard and Goodman write “To any analytic proposition of the Boolean algebra will correspond a postulate or theorem of this calculus provided that …” (In other words, with some few provisos, if you can make a true statement about sets, you can make a corresponding true statement about individuals in the calculus of individuals. The provisos aren’t even statements you can’t make, just restrictions on the form you make them in. Instead of saying “the intersection of x and y is the empty set” you have to say they are discrete. And so on.) And the concluding sentence of the paper (p. 55) is: “The dispute between nominalist and realist as to what actual entities are individuals and what are classes is recognized as devolving upon matters of interpretative convenience rather than upon metaphysical necessity.”

In other words, Goodman seems at first glance to be simplifying the world by eliminating the notion of sets and classes, and then to be complicating it again in precisely similar ways by taking all of the fundamental ideas we have about sets or classes, and reconstructing them as funny ways of talking about individuals. Cui bono?

This afternoon I saw a review by Anthony Gottlieb, in the New Yorker, of a recent book about the Wittgenstein family (Alexander Waugh, The House of Wittgenstein: A family at war), which seems to suggest a solution. Gottlieb quotes a suggestion from the physicist Heinrich Hertz:

Hertz had suggested a novel way to deal with the puzzling concept of force in Newtonian physics: the best approach was not to try to define it but to restate Newton’s theory in a way that eliminates any reference to force. Once this was done, according to Hertz, “the question as to the nature of force will not have been answered; but our minds, no longer vexed, will cease to ask illegitimate questions.”

(Throws a new light on Wittgenstein’s remark about not wanting to solve problems but to dissolve them, doesn’t it?)

It’s true that once you rebuild the ideas of set union, intersection, difference, etc. as ideas about individuals which can overlap or contain other individuals, and eliminate the word ‘set’, it becomes a lot harder to describe a set which contains as members all sets which are members of themselves, or a set which contains as members all sets which are not members of themselves. The closest you can conveniently get are statements about individuals which overlap themselves (they all do) or which do not overlap themselves (no such individual). Good-bye, Russell’s Paradox!
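The contrast can be made concrete with a small sketch. It assumes, purely for illustration, that an individual can be modelled as a non-empty bundle of atomic parts; the names `Individual`, `overlaps`, `discrete` and `product` are mine, and nothing in Leonard and Goodman's calculus requires atoms or this representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """Toy model: an individual is a non-empty bundle of atomic parts.

    The atoms are an assumption of this sketch, not of the calculus.
    """
    atoms: frozenset

    def __post_init__(self):
        # There is no null individual, so nothing plays the role
        # of the empty set.
        if not self.atoms:
            raise ValueError("there is no empty individual")


def overlaps(x: Individual, y: Individual) -> bool:
    """x and y overlap when they have some part in common."""
    return bool(x.atoms & y.atoms)


def discrete(x: Individual, y: Individual) -> bool:
    """x and y are discrete when they have no part in common."""
    return not overlaps(x, y)


def product(x: Individual, y: Individual) -> Individual:
    """The common part of x and y; defined only when they overlap."""
    return Individual(x.atoms & y.atoms)


# Three hypothetical individuals built from made-up atoms.
a = Individual(frozenset({"stem", "skin"}))
b = Individual(frozenset({"skin", "seeds"}))
c = Individual(frozenset({"leaf"}))
```

In this model every individual overlaps itself, simply because it is non-empty, and asking for the common part of `a` and `c` raises an error instead of returning an empty individual: all one can say is that they are discrete.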

And consider the surrealist joke I ran into the other day:

Q. What is red and invisible?
A. No tomatoes.

A user of the calculus of individuals can enjoy this on its own terms, without having to worry about whether it’s a veiled reference to the fact that some typed logics end up with multiple forms of empty set, one for each type in the system. One for integers, if you’re going to reason about integers. One for customer records, if you’re going to reason about customers. And … one for tomatoes?

Q. What is red and invisible?
A. The empty set of tomatoes.
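To see what “one empty set per type” amounts to, here is a minimal sketch of typed sets. The class `TypedSet` and the stand-in class `Tomato` are hypothetical, invented for the illustration; they do not model any particular typed logic, and Python's built-in sets are untyped, with a single empty set.

```python
from dataclasses import dataclass


class Tomato:
    """Stand-in element type for the joke; hypothetical."""


@dataclass(frozen=True)
class TypedSet:
    """A set tagged with the type of its members (illustrative only)."""
    element_type: type
    members: frozenset = frozenset()

    def __post_init__(self):
        # Reject members that do not belong to the declared type.
        for m in self.members:
            if not isinstance(m, self.element_type):
                raise TypeError(f"{m!r} is not a {self.element_type.__name__}")


no_integers = TypedSet(int)
no_tomatoes = TypedSet(Tomato)
```

Here `no_integers` and `no_tomatoes` have the same (empty) membership but compare unequal, because the type tag is part of their identity; untyped `frozenset()` values, by contrast, are all equal.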

</w3c:msm>

[30 January 2009]

Today is my last workday as a staff member at the World Wide Web Consortium.

I have learned a lot during my time here, I’ve enjoyed the work, and I have tried to help make the Web a better place for people who care about information and for the information we care about. But it’s ten years since I joined the staff, and it’s time to move on.

What’s next? I will be doing consulting and contract work through Black Mesa Technologies LLC. If you have interesting problems touching on documents, electronic representation of information (documents or other), validation, XSLT, XQuery, or the like; if you have concerns about the proper application of information technology to the preservation of commercial or cultural-heritage information, then give me a call; I’ll be around.

Schmidt on networking

[26 October 2008]

I’ve recently set up a LinkedIn profile for myself, and I’ve been assiduously searching in LinkedIn for old and current friends and colleagues and ‘building my network’ — so I was rather struck by the following slightly sobering remarks in Helmut Schmidt’s recent book Außer Dienst, which I picked up in Germany a couple weeks ago and have been reading at odd moments on airplanes:

Den Ausdruck Netzwerk hat es zu meiner Zeit noch nicht gegeben. Aber natürlich hat man vielfältige persönliche Kontakte geknüpft und sie langfristig aufrechterhalten. Wer sich gegen seine Zeitgenossen abschließt, hat es schwerer, zu abgewogenen Urteilen zu gelangen, als einer, der sich öffnet und Kontakt und Austausch sucht. … Mir wollen weniger jene Netzwerke wichtig erscheinen, welche einem bestimmten Interesse oder der eigenen Karriere dienen, als vielmehr solche, die der geistigen Anregung und dem gedanklichen Austausch förderlich sind.

Or (my translation):

In my day the expression ‘network’ did not yet exist. But of course people made a lot of personal contacts and kept them up over long periods. Anyone who is closed off from contact with their contemporaries will find it harder to reach well founded judgements than someone who is open to new contacts and to exchange of views. … Networks of contacts which serve only a particular interest or which are made only in the interest of one’s career seem to me less important than contacts which serve to provide intellectual stimulation and promote the exchange of thoughts.

Of course, Schmidt has things like the Mittwochsgesellschaft (and a Freitagsgesellschaft in Hamburg) in mind, which set an awfully high bar. But still — it makes me wonder not just how well LinkedIn and other social networking sites measure up to these high standards, but how one might use them to pursue the same kinds of intellectual cross-fertilization and mutual education Schmidt describes.