Copyright and other unwelcome issues

[10 November 2011]

One of the unwelcome side effects of recent trends in copyright (I mean the gradual shift, over the last fifty years, towards more and more protection for commercial interests and less and less protection of the public benefit) is that while it used to be easy to make one’s own work readily available for reuse by others, it now requires more careful planning. It used to be, for example, that if you didn’t care to claim or protect copyright in something you wrote, all you had to do was nothing: if you didn’t claim copyright, and the work was public, then it was in the public domain.

[“Hmm. How sure are you of that?” asked my evil twin Enrique, with a suspicious look. “Well, kind of sort of sure, I think.” “Better add a disclaimer, then, don’t you think?” “OK, right you are.”]

At least, that’s how I understand it, at a first approximation. (I am not a lawyer and have never much wanted to be, though a friend of mine who did go to law school once told me I’d enjoy the mysteries and mystique of tax law. So no reader should take anything I write as providing guidance about the law of the U.S. or any other country.) If you want good information about copyright, go find something by Pamela Samuelson.

[“OK, that’ll do, I guess. Is Pamela Samuelson really a good source?” “The best. Thank heavens she’s writing that column for Communications of the ACM again.”]

Nowadays, of course, in the U.S. this is no longer true: from the moment you write anything down, you own copyright in it, whether you want it or not, unless you do something to avoid it.

Of course, many people ignore this and behave as if the old legal regime were still in place. I’ve had representatives of U.S. universities say “Oh, feel free to reuse that stylesheet we wrote” — as if, because it carried no copyright statement, it were available for reuse by anyone interested. On the contrary! Since the stylesheet didn’t carry any licensing information or dedication to the public domain, it was certainly under copyright, held either by the individual who wrote it or by the institution for which it was written. And since it didn’t carry any copyright information, it was impossible to know with any confidence who actually did (or does) own the copyright, and whom to contact for permission.

[“And these people who were trying to give you permission to reuse that stylesheet, were they legally empowered to enter into binding agreements on behalf of their institutions?” Enrique asked. “Dunno. I doubt it.” “So, tell me, do you have a barge pole handy?” “A barge pole? No, why?” “Because I want to warn you not to touch that code with a barge pole, that’s why. And if you don’t have a barge pole, it just feels kind of pointless. Do you have an eleven-foot pole, maybe?” “Oh, hush.”]

End result: I politely ignored (at least, I hope my silence was polite) their invitation to reuse that code, and I wrote new code from scratch.

[“Oh, come on,” Enrique hissed. “You know perfectly well you wouldn’t have reused that code anyway! It was full of xsl:for-each elements.” (New readers may need to be informed that I seem to have an issue with xsl:for-each elements; I’m sure it’s a perfectly fine construct and there’s nothing wrong with it. I only know that when I have a stylesheet with a bug and discover that it has for-each constructs, rewriting the for-each as an apply-templates always seems to make the bug disappear. Go figure.) “Well, yeah. But even if I had loved the code, I would not have felt able to reuse it.”]
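The rewrite Enrique alludes to usually goes something like this (element names invented for illustration). The logic is identical; only the control structure changes:

```xml
<!-- with for-each: the loop body is buried inside one template -->
<xsl:template match="issues">
  <ul>
    <xsl:for-each select="issue">
      <li><xsl:value-of select="head"/></li>
    </xsl:for-each>
  </ul>
</xsl:template>

<!-- with apply-templates: each element type gets its own template -->
<xsl:template match="issues">
  <ul><xsl:apply-templates select="issue"/></ul>
</xsl:template>
<xsl:template match="issue">
  <li><xsl:value-of select="head"/></li>
</xsl:template>
```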

Of course, there are plenty of open-source and Creative Commons licenses to choose from, if you want to ensure that work you do can be re-used.

But who, in a collaborative project, is “you”?

If you write code or prose as an individual, outside the course and scope of your normal employment duties, then it’s straightforward to assert copyright in your own name. But if you are collaborating with others in a project, and you want to apply an appropriate license, in whose name should copyright be claimed? If only one person works on a given item (a program or a document) it’s easy to say that person should assert copyright and grant the license. But if more than one person works on it?

Some people incline to claim copyright in the name of the project, which feels plausible at some level: “project” is a name we sometimes give to the intentional collaboration of individuals to achieve some goal, and work done in furtherance of that goal can plausibly be said to be done for “the project”.

But can a project which is not a legal entity actually be the owner of a copyright? If there’s a legal entity involved, it’s possible in principle to figure out, in case of disputes, who speaks for the entity and who makes decisions. But if there’s no legal entity?

Can copyright usefully be claimed by a research project, in the name of the research project?

[“Well, wouldn’t a research project be legally a form of partnership?” asked Enrique. “A partnership doesn’t have to be incorporated to be a legal person, right?” “Maybe,” I equivocated. “But remember, I am not a lawyer. And a fortiori you, as a figment of my imagination, are also not a lawyer.” “Oh, go soak your head. Whom are you calling a figment … ?”]

I notice that W3C, for example, which is not a legal entity, claims copyright in the name of W3C, but immediately after adds, in parentheses, the names of the three host institutions of W3C, which are legal entities.

It would be nice, wouldn’t it, if intellectual property rights served to promote the useful arts and sciences, instead of being an unproductive drain on the time and effort of creative people and a barrier to normal intellectual work? Oh, well, maybe someday.

Day of the dead 2011

[7 November 2011]

Last week’s celebration of the Day of the Dead (aka All Souls’ Day, 2 November) was a little more thoughtful for me than it is in some years. Partly this was because John McCarthy had just died, and partly because this year seems to have taken an unusually high toll in people whose work I have had occasion to value.

News of McCarthy’s death came through when I was on the phone with John Cowan and my brother Roger Sperberg. We paused for a few moments, and then we spent half an hour thinking about technical topics, which seemed like a good way to mark the occasion. (For example: if the original plan was for Lisp programs to be written not in S-expressions but in an Algol-like syntax called M-expressions, is that a sign that McCarthy was less far-sighted than he might have been? How could he not have seen the importance of the idea that Lisp data and Lisp programs should use the same primitive data structures? Perhaps he had feet of clay, so to speak? Or on the contrary should we infer, from the fact that the plan for M-expressions was abandoned and that Lisp became what it became, that McCarthy was astute enough to recognize great ideas when he saw them, and nimble enough to change his plans to capture them? On the whole, I guess I lean toward the latter view.)

This year, Father Roberto Busa also died. Many people (including me) regard him as the founder of the field of digital humanities, because of his work, beginning in 1948, on a machine-readable text of the work of Thomas Aquinas. The Index Thomisticus was completed in 1978, several IT revolutions later. Busa, too, was astute enough to adjust his plans in mid-project: his initial plans involved clever use of punched cards and sorters, and it was only after the project had been going for some years that it began to use computers instead of unit-record equipment. I met Busa only briefly, once as a young man at my first job in humanities computing, and once years later when I chaired the committee which voted to award him what became the Busa Award for contributions to the application of information technology to humanistic scholarship. But he made a strong impression on me with his sweetness of temper and his intelligence. He made an even stronger impression on me indirectly: Antonio Zampolli worked with Busa as a student. And without Antonio, I think my life would have had a rather different shape.

Oh, well. Nobody gets out of here alive, anyway.

Another reason to use the microphone

[Hamburg, 29 September 2011]

Every now and then conference speakers want to avoid using a microphone; they dislike the introduction of technology into the speaker/audience relation, perhaps, and sometimes they are so confident of their ability to be heard in the room that any suggestion that they might use a mike is almost an affront to their lung power. (Is this last class of speaker always male? Well, usually, I think.)

I have been told on good authority that users of hearing aids benefit a good deal from amplification of the speaker’s voice; that’s a good reason to use the microphone.

But sitting here listening to a very interesting speaker who is completely ignoring the microphone, I am reminded of a different reason: where amplification of the speaker’s voice is concerned, non-native listeners are effectively hard of hearing. When the speaker strays into range of the podium’s microphone and happens to be facing the audience, I can understand every word he says; when he faces away from the audience or wanders over to the side of the room, I am missing at least every fifth word, which makes the talk into a kind of aural cloze test. That’s OK for me (I pass the test, more or less, though I missed that nice joke everyone else laughed at). But for my neighbor (for whom German is not a second but a fourth or fifth language), the experience is clearly a real trial.

If you are attending an international conference and want to be understood by people who are not native speakers of your language, then there is a simple piece of advice:

Use the microphone.

Enough said.

XQuery in the cloud

[10 August 2011]

Recently I had occasion to build a small web application (feedback forms for the Balisage conference) using XForms. I used XForms since XForms delivers the information from the user in an XML document, which makes it easier for me to work with the data later. As an experiment, I developed the app using Sausalito, the XQuery engine in the cloud developed by 28msec. Quick summary: Cool! Thumbs UP!

[Obligatory hand-waving and disclaimer: Sausalito is not the only way to deploy XQuery in the cloud: MarkLogic has defined Amazon machine instances with MarkLogic Server pre-installed, and I’m sure there are, or will be, other options as well. I will continue to make a point of working with as many different XQuery implementations as I can, just to know what’s out there. But I had a lot of fun with Sausalito, and if you have a use for a Web-based XML application, Sausalito is definitely worth a look.]

The basic structure of a Sausalito project is fairly straightforward, and well documented on their site: the URIs you want to serve are matched either against static resources in a public subdirectory of the project, or against a directory of XQuery modules containing handlers for requests. For example, in the Balisage feedback application, the URI /reviews/single is handled by the single() function in the module reviews.xq; it can call library functions defined elsewhere. Sausalito has all the functions usual in XQuery, and also some fairly extensive libraries of things you may want for web applications (to query aspects of the incoming HTTP request, for example, and to set properties in the response). They have an Eclipse-based IDE that’s reasonably nice (though I still missed Emacs from time to time), and also a command-line interface (so I can shift to that and use Emacs, if I want to).
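To make that concrete, here is a minimal sketch of what such a handler module might look like. A caution: the request/response module URIs and function names below are my own inventions for illustration, not Sausalito’s actual API; consult their documentation for the real names.

```xquery
module namespace reviews = "http://www.example.org/handlers/reviews";

(: hypothetical request/response modules; the real namespace URIs
   and function names in Sausalito will differ :)
import module namespace req  = "http://example.org/http/request";
import module namespace resp = "http://example.org/http/response";

(: handles GET /reviews/single?paper=... :)
declare function reviews:single() {
  let $paper := req:parameter("paper")
  return (
    resp:set-content-type("application/xml"),
    <feedback-form paper="{$paper}">
      <comments/>
      <rating/>
    </feedback-form>
  )
};
```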

Unsurprisingly, I found it very pleasant to be able to write the core of the application in XQuery, with no Javascript, returning XML to the browser and using XSLT and CSS to render it there. What did surprise me a little, because I had not expected it, was the exhilarating speed with which I was able to move from idea to deployed application. I’ve deployed XForms applications on the Web before, and I have an eight-point checklist for setting up a WebDAV server using Subversion and Apache. It’s not particularly difficult or strenuous, but it’s tedious and takes a few hours each time I have to do it. And developing the checklist was very painful; it took a long time to find configurations that worked for me, in the environment provided by my service providers.

The developer configures a collection of documents in Sausalito by declaring the collection:

declare ordered collection my:docs as node()*;

Then they deploy the application. And it’s … just … there. Instant gratification, or as close to instant as your network latency and bandwidth will allow.
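For completeness, here is the sort of code that writes to and reads from such a collection. I am writing the module URI and function names from memory (Sausalito is built on Zorba’s data-definition facility), so treat every name below as an assumption to be checked against the documentation, not as Sausalito’s documented API.

```xquery
declare namespace my = "http://www.example.org/mine";

(: hypothetical collection-manipulation module; the URI and the
   function names are assumptions, not documented API :)
import module namespace dml = "http://example.org/store/collections/dml";

(: append one feedback form to the collection :)
declare updating function local:save($form as element(feedback)) {
  dml:insert-nodes-last(xs:QName("my:docs"), $form)
};

(: read the whole collection back :)
declare function local:all-feedback() as element(feedback)* {
  dml:collection(xs:QName("my:docs"))
};
```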

As I wrote to the developers at 28msec:

I’m … very taken with the convenience of deploying to the cloud; having an XML database on demand is a lot like having running water on demand — those who have never had it may think it’s a luxury anyone should be able to live without, but once you’ve had it, it can be hard to go back.

Vocabulary specialization and generalization, use and interchange: things our schema languages should make easier than they do

[23 January 2011]

Perhaps it’s because the call for participation in the 2011 Balisage Pre-conference Symposium on XML Document Interchange has just come out, or perhaps it’s for other reasons, but I found myself thinking today about the problem of specialized uses of generalized vocabularies.

There are lots of vocabularies defined in fairly general terms: HTML, TEI, DocBook, the NLM article tag set, you can surely think of plenty yourself.

Often, for a specific purpose in a specific organization or project, it would be handy to have a much tighter, much more specific vocabulary (and thus one that’s semantically richer, easier to process, and easier to validate tightly). For example, consider writing and managing an issues list (or a list of use cases, or any other list containing items of a specialized genre) in a generic vocabulary. It’s easy enough: you just have a section for each issue, and within that section you have standard subsections on where the issue came from, what part of the project it relates to, its current status, and the history of your work on it. Easy enough. And if you’re the kind of person who writes macros in whatever editor you use, you can write a macro to set up a new issue by adding a section of type ‘issue’ with subsections with appropriate types and headings. But isn’t that precisely what a markup-aware editor typically does? Well, yes, typically: any schema-aware editor can look at the schema, and as soon as you say “add a new issue” it can populate the new issue with all of the required subelements. Or it could, if you had an element type called ‘issue’, with appropriately named sub-elements. If instead you are using a generic ‘div’ element, your editor is going to have a hard time helping you, because you haven’t said what you really mean. You want an issue, but what you’ve said is ‘add a div’.
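To make the contrast concrete, here is the same issue in both forms (all element names invented for illustration). First as generic divs, then as the markup I actually mean:

```xml
<!-- what the generic vocabulary lets me say -->
<div type="issue">
  <head>Whitespace in mixed content</head>
  <div type="source"><p>Raised on the mailing list.</p></div>
  <div type="status"><p>Open.</p></div>
  <div type="history"><p>No action yet.</p></div>
</div>

<!-- what I mean -->
<issue>
  <head>Whitespace in mixed content</head>
  <source><p>Raised on the mailing list.</p></source>
  <status><p>Open.</p></status>
  <history><p>No action yet.</p></history>
</issue>
```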

Some schemas, and some schema languages, try to make this easier by allowing you to say, essentially, that an issue element is a kind of div, and that the content model for issue is a specialization of that for div (and so on). This is better than nothing, but I’m probably not the only person who fails to use these facilities in all the cases where they would be helpful. And when I do, I have to extend the standard stylesheets for my generic vocabulary to handle my new elements, because even when the stylesheet language supports the specialization mechanisms of the schema language (as XSLT 2.0 supports element substitution groups in XSD), most stylesheets are not written to take advantage of it. And if I’m exchanging documents with someone else, they may or may not want to have to deal with my extensions to the schema.
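In XSD, the mechanism in question is the substitution group. A minimal sketch follows (the content model of divType is invented for illustration; in a real schema, issue would get its own type, derived by restriction from divType, rather than reusing divType unchanged):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- a generic division: a heading, then paragraphs and nested divs -->
  <xs:complexType name="divType">
    <xs:sequence>
      <xs:element name="head" type="xs:string"/>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="p" type="xs:string"/>
        <xs:element ref="div"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="div" type="divType"/>

  <!-- issue may now appear wherever div is allowed -->
  <xs:element name="issue" type="divType" substitutionGroup="div"/>
</xs:schema>
```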

I wonder if we might get a better answer if (a) in our schema languages it were as easy to write a rule for div type='issue' as for issue, and (b) in our validation tools it were as easy to apply multiple grammars to a document as a single grammar, and to specify that the class of documents we are interested in is given by the intersection of the grammars, or by their union, or (for grammars A, B, C) by A ∪ (B ∩ ¬ C). Also (c) if for any schema extension mechanism it were easy to generate a transformation to take documents in the extended schema into the base schema, and vice versa.

Perhaps NVDL may be in a position to help with (b), though I’ve never learned it well enough to know and it seems to be more heavily freighted with unacknowledged assumptions about schema languages and validation than I’d like.

And perhaps Relax NG already can handle both (a) and (b).
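At least for (a), a RELAX NG grammar seems to have no trouble distinguishing a rule for div type='issue' from the rule for other divs, since patterns can key on attribute values. A sketch in the compact syntax, with element names again invented:

```rnc
start = element body { (issue | div)* }

# the generic rule: any div at all
div = element div {
  attribute type { text }?,
  element head { text },
  element p { text }*
}

# the tighter rule: a div which is really an issue
issue = element div {
  attribute type { "issue" },
  element head { text },
  element source { text },
  element status { text },
  element history { text }
}
```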

Homework to do.