7500 to 1

[14 December 2017]

A little less than a month ago, noticing that spam had gotten out of hand here again, I started monitoring comments on this blog manually, to observe them more closely.

Since that time, the About page and the posts still open for comments have received something more than 7500 comments (so, between 200 and 300 a day, not counting those blocked by the Bad Behavior plugin), of which one was a relevant comment made by a human, and the others were spam.

Sigh.

Team work, specialization, fact-checking

[26 January 2012]

This week’s New Yorker has an interesting essay on brainstorming (doesn’t work, it says). It brought my evil twin Enrique running, waving his copy in the air. “Look at this. Look at this!” he shouted.

I looked at the passage he pointed out. Pursuing the observation that “like it or not, human creativity has increasingly become a group process”, the author quotes one Ben Jones, a professor at the Kellogg School of Management at Northwestern University, who has quantified the trend away from solo work and towards work in teams.

“‘A hundred years ago, the Wright brothers could build an airplane all by themselves,’ Jones says. ‘Now Boeing needs hundreds of engineers just to design and produce the engines.’”

“Well,” I said to Enrique, “no question that teams are bigger today.” “But …” he spluttered. “But what?” I said. “But Boeing doesn’t make engines.” “They don’t?” (I love to play dumb; it drives Enrique speechless with frustration. But he seems to be right. If I’m reading their Web site correctly, Boeing hasn’t manufactured an engine since 1968, and those weren’t aircraft engines in any case.) “But what makes the airplane go, then?” “GE makes engines,” Enrique snarled. “Rolls-Royce makes engines. Pratt and Whitney makes engines. Boeing makes airframes” (along with many other things, I hasten to add, none of them engines). “How can someone be interested in specialization and not know that?”

Didn’t the New Yorker use to have a fact-checking department?

Copyright and other unwelcome issues

[10 November 2011]

One of the unwelcome side effects of recent trends in copyright (I mean the gradual shift, over the last fifty years, towards more and more protection for commercial interests and less and less protection of the public benefit) is that while it used to be easy to make one’s own work readily available for reuse by others, it now requires more careful planning. It used to be, for example, that if you didn’t care to claim or protect copyright in something you wrote, all you had to do was nothing: if you didn’t claim copyright, and the work was public, then it was in the public domain.

[“Hmm. How sure of that are you?” asked my evil twin Enrique, with a suspicious look. “Well, kind of sort of sure, I think.” “Better add a disclaimer, then, don’t you think?” “OK, right you are.”]

At least, that’s how I understand it, at a first approximation. (I am not a lawyer and have never much wanted to be, though a friend of mine who did go to law school once told me I’d enjoy the mysteries and mystique of tax law. So no reader should take anything I write as providing guidance about the law of the U.S. or any other country.) If you want good information about copyright, go find something by Pamela Samuelson.

[“OK, that’ll do, I guess. Is Pamela Samuelson really a good source?” “The best. Thank heavens she’s writing that column for Communications of the ACM again.”]

Nowadays, of course, in the U.S. this is no longer true: from the moment you write anything down, you own copyright in it, whether you want it or not, unless you do something to avoid it.

Of course, many people ignore this and behave as if the old legal regime were still in place. I’ve had representatives of U.S. universities say “Oh, feel free to reuse that stylesheet we wrote” — as if, because it carried no copyright statement, it were available for reuse by anyone interested. On the contrary! Since the stylesheet didn’t carry any licensing information or dedication to the public domain, it was certainly copyright either by the individual who wrote it or by the institution for which it was written. And since it didn’t carry any copyright information, it was impossible to know with any confidence who actually did (or does) own the copyright, and whom to contact for permission.

[“And these people who were trying to give you permission to reuse that stylesheet, were they legally empowered to enter into binding agreements on behalf of their institutions?” Enrique asked. “Dunno. I doubt it.” “So, tell me, do you have a barge pole handy?” “A barge pole? No, why?” “Because I want to warn you not to touch that code with a barge pole, that’s why. And if you don’t have a barge pole, it just feels kind of pointless. Do you have an eleven-foot pole, maybe?” “Oh, hush.”]

End result: I politely ignored (at least, I hope my silence was polite) their invitation to reuse that code, and I wrote new code from scratch.

[“Oh, come on,” Enrique hissed. “You know perfectly well you wouldn’t have reused that code anyway! It was full of xsl:for-each elements.” (New readers may need to be informed that I seem to have an issue with xsl:for-each elements; I’m sure it’s a perfectly fine construct and there’s nothing wrong with it. I only know that when I have a stylesheet with a bug and discover that it has for-each constructs, rewriting the for-each as an apply-templates always seems to make the bug disappear. Go figure.) “Well, yeah. But even if I had loved the code, I would not have felt able to reuse it.”]

Of course, there are plenty of open-source and Creative Commons licenses to choose from, if you want to ensure that work you do can be re-used.

But who, in a collaborative project, is “you”?

If you write code or prose as an individual, outside the course and scope of your normal employment duties, then it’s straightforward to assert copyright in your own name. But if you are collaborating with others in a project, and you want to apply an appropriate license, in whose name should copyright be claimed? If only one person works on a given item (a program or a document) it’s easy to say that person should assert copyright and grant the license. But if more than one person works on it?

Some people incline to claim copyright in the name of the project, which feels plausible at some level: project is a name we sometimes give to the intentional collaboration of individuals to achieve some goal, and work done in furtherance of that goal can plausibly be said to be done for “the project”.

But can a project which is not a legal entity actually be the owner of a copyright? If there’s a legal entity involved, it’s possible in principle to figure out, in case of disputes, who speaks for the entity and who makes decisions. But if there’s no legal entity?

Can copyright usefully be claimed by a research project, in the name of the research project?

[“Well, wouldn’t a research project be legally a form of partnership?” asked Enrique. “A partnership doesn’t have to be incorporated to be a legal person, right?” “Maybe,” I equivocated. “But remember, I am not a lawyer. And a fortiori you, as a figment of my imagination, are also not a lawyer.” “Oh, go soak your head. Whom are you calling a figment … ?”]

I notice that W3C, for example, which is not a legal entity, claims copyright in the name of W3C, but immediately after adds, in parentheses, the names of the three host institutions of W3C, which are legal entities.

It would be nice, wouldn’t it, if intellectual property rights served to promote the useful arts and sciences, instead of being an unproductive drain on the time and effort of creative people and a barrier to normal intellectual work? Oh, well, maybe someday.

Day of the dead 2011

[7 November 2011]

Last week’s celebration of the Day of the Dead (aka All Souls’ Day, 2 November) was a little more thoughtful for me than it is in some years. Partly this was because John McCarthy had just died, and partly because this year seems to have taken an unusually high toll in people whose work I have had occasion to value.

News of McCarthy’s death came through when I was on the phone with John Cowan and my brother Roger Sperberg. We paused for a few moments, and then we spent half an hour thinking about technical topics, which seemed like a good way to mark the occasion. (For example: if the original plan was for Lisp programs to be written not in S-expressions but in an Algol-like syntax called M-expressions, is that a sign that McCarthy was less far-sighted than he might have been? How could he not have seen the importance of the idea that Lisp data and Lisp programs should use the same primitive data structures? Perhaps he had feet of clay, so to speak? Or on the contrary should we infer, from the fact that the plan for M-expressions was abandoned and that Lisp became what it became, that McCarthy was astute enough to recognize great ideas when he saw them, and nimble enough to change his plans to capture them? On the whole, I guess I lean toward the latter view.)

This year, Father Roberto Busa also died. Many people (including me) regard him as the founder of the field of digital humanities, because of his work, beginning in 1948, on a machine-readable text of the work of Thomas Aquinas. The Index Thomisticus was completed in 1978, several IT revolutions later. Busa, too, was astute enough to adjust his plans in mid-project: his initial plans involved clever use of punched cards and sorters, and it was only after the project had been going for some years that it began to use computers instead of unit-record equipment. I met Busa only briefly, once as a young man at my first job in humanities computing, and once years later when I chaired the committee which voted to award him what became the Busa Award for contributions to the application of information technology to humanistic scholarship. But he made a strong impression on me with his sweetness of temper and his intelligence. He made an even stronger impression on me indirectly: Antonio Zampolli worked with Busa as a student. And without Antonio, I think my life would have had a rather different shape.

Oh, well. Nobody gets out of here alive, anyway.

Finding accessible color schemes for data graphics: one problem, four solutions

[11 June 2010]

Yesterday I spent a little time writing a stylesheet to make it easier to browse through some potentially complicated and voluminous data. I ran into a problem, and I found four solutions and learned several lessons, which I record here for those who face similar problems.

[“What stylesheet was that?” hissed my evil twin Enrique. I don’t know why he always whispers when asking these questions, but he does. “It doesn’t matter for the point at issue, but since you’re wondering, it was a stylesheet for the catalog files used to organize the XSD test suite. If you’re curious about it, you can see the stylesheet in use for a small sample catalog file.” “This is supposed to be easy to browse?” “Well, click View Source and compare it to the underlying XML.” “OK, I guess I see your point. But where’s the rest of the test suite? Don’t tell me the XML Schema working group has a test suite with … lessee, eighteen test cases?” “Hush. No, of course not. But the test suite as a whole is not set up for interactive browsing on the web. Yet.”]

[“And what was the problem?” “Stop interrupting and maybe you’ll find out.”]

To make it a little easier to see which test cases involve valid documents and which involve invalid documents (not to mention documents with [validity] = notKnown and test cases with implementation-defined results), I supplied a little pale green, pink, blue, or gray background for the different possible expected results.

So far, so good, but this morning Liam Quin mentioned casually that I might want to consider changing the colors, since the red and green backgrounds would be indistinguishable for some readers. He was right, of course, and the original color scheme was really just a sort of quick and dirty proof of concept, not intended for final use.

[“Boy, do you sound defensive. Thought you could get away with it, huh?” asked Enrique. “Oh, hush. Like you never take shortcuts with things.”]

So I tested the color scheme using the incredibly useful image analysis tool at VisCheck.com: you upload an image, and can perform simulations of deuteranopy (red/green color deficit), protanopy (a different form of red/green deficit), and tritanopy (a blue/yellow color deficit). And Liam was absolutely right: the backgrounds for valid and invalid test cases were virtually indistinguishable in the output of the first simulation.

So I set about to fix it, and I learned some things.

First solution. First, I taught myself something about how not to go about this task. Using the helpful Color Sphere widget I downloaded some time ago from colorjack.com, I found a set of three colors that were pretty well distinct in all the different filters it provides: a green, a red, and a blue. They were too intense to use as background colors, so I spent some time with a color picker finding RGB values that matched those hues but had less saturation (select the RGB sliders pane, convert from hex to decimal because the widget doesn’t understand hex; change to the HSB pane, slide the saturation level down a bit, go back to the RGB pane, convert the decimal numbers back to hex, write them down) and checking the resulting color scheme with VisCheck (update the stylesheet, refresh the page in the browser, take a screen snapshot of part of the page showing all the colors, go to VisCheck, upload, perform deuteranopy simulation, whoops, that won’t work, go back and change one of the colors, come back, upload, perform deuteranopy simulation, upload, perform protanopy simulation, upload, perform tritanopy simulation).
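For readers who would rather script the hex-to-decimal-to-HSB round trip described above, here is a minimal sketch in Python. It is an illustration only, not the procedure used for the stylesheet: the function name desaturate and its factor parameter are made up for this example, and it relies on the standard colorsys module, whose HSV model corresponds to the HSB pane of a typical color picker.

```python
import colorsys


def desaturate(hex_color: str, factor: float) -> str:
    """Return hex_color with its HSB saturation multiplied by factor.

    `desaturate` and `factor` are hypothetical names for this sketch;
    factor=1.0 leaves the color alone, factor=0.0 removes all saturation.
    """
    value = hex_color.lstrip("#")
    # Hex pairs -> floats in the range 0..1, as colorsys expects.
    r, g, b = (int(value[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Keep hue and brightness, scale only the saturation.
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s * factor, v)
    return "#{:02x}{:02x}{:02x}".format(
        round(r2 * 255), round(g2 * 255), round(b2 * 255)
    )


if __name__ == "__main__":
    # Halve the saturation of three strong example hues.
    for color in ("#00aa00", "#cc0000", "#0000cc"):
        print(color, "->", desaturate(color, 0.5))
```

This only automates the arithmetic; the results still need to be checked against the color-deficit simulations.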

I’m very grateful for these tools, but sometimes I do wish it were more convenient to run all three filters on the input.
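A rough way to get all three filters in one pass is to apply a color transformation to the palette itself rather than to a screen shot. The sketch below assumes a set of coarse linear RGB matrices that circulate widely as approximations of the three deficits; they are not the algorithm VisCheck uses, and the function names (simulate, simulate_all, min_distance) are invented for this example. Plain RGB distance is also only a crude stand-in for perceived difference, so treat the output as a first screening, not a verdict.

```python
import math
from itertools import combinations

# ASSUMPTION: coarse, widely circulated linear approximations of the three
# deficits. They are not VisCheck's method and ignore gamma correction.
MATRICES = {
    "protanopy": ((0.567, 0.433, 0.0), (0.558, 0.442, 0.0), (0.0, 0.242, 0.758)),
    "deuteranopy": ((0.625, 0.375, 0.0), (0.7, 0.3, 0.0), (0.0, 0.3, 0.7)),
    "tritanopy": ((0.95, 0.05, 0.0), (0.0, 0.433, 0.567), (0.0, 0.475, 0.525)),
}


def simulate(rgb, deficit):
    """Apply one approximation matrix to an (r, g, b) tuple of 0..255 ints."""
    return tuple(
        min(255, max(0, round(sum(w * c for w, c in zip(row, rgb)))))
        for row in MATRICES[deficit]
    )


def simulate_all(palette):
    """Run every deficit simulation over the whole palette in one call."""
    return {name: [simulate(rgb, name) for rgb in palette] for name in MATRICES}


def min_distance(colors):
    """Smallest Euclidean RGB distance between any two colors (crude proxy)."""
    return min(math.dist(a, b) for a, b in combinations(colors, 2))


if __name__ == "__main__":
    # Hypothetical pale green, pink, blue, and gray backgrounds.
    palette = [(204, 255, 204), (255, 204, 204), (204, 204, 255), (221, 221, 221)]
    for name, colors in simulate_all(palette).items():
        print(name, colors, "closest pair:", round(min_distance(colors), 1))
```

A small closest-pair distance under any one simulation is a hint that two backgrounds may be hard to tell apart for that type of vision.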

After a couple of passes and errors and false starts (which means: after an hour and a half or so), I had a color scheme that worked for all the vision types I was checking.

The only problem was that I then decided the green was too virulent and intense.

No problem, I thought: these three hues are good for all three types of color deficit, I’ll just lower the saturation to make them less intense, and I’m good to go. Wrong: on one of the simulations, there was no longer any distinction between the pink and the gray. And the green still bothered me.

So I learned an important lesson: In trying to make colors for data graphics accessible to all types of vision, it’s not just hue that counts: hue and saturation interact with the different deficits.

Second solution. I went back to the colorjack.com Color Sphere widget on my system, and noticed that I could control not just the hue of the primary color but its saturation as well: as you move toward the center of the color circle, saturation goes down. And as you move things around with the deuteranopy filter set, you can see for yourself that the same three hues can be distinct at one saturation and not at another. So I got another color scheme that worked.

Second lesson: if you know in advance that you want the colors to be unsaturated and unobtrusive, use the color picker or color scheme generator you’re working with to get close to the saturation and brightness you need. The color picker is better at these kinds of conversions than you are.

The only problem was that as I continued working with the data I grew to like this new color scheme less and less. The green wasn’t virulent any more, but it also wasn’t green; more a greenish yellow.

So I gave up on the associations green = go = valid outcome, red = stop = invalid outcome. All that matters is that the colors be distinct; they don’t need to be particularly mnemonic (the distinction is, after all, also carried by the text). So back to the color picker, once or twice. It gave me schemes that worked in the sense of being visually distinct for normal vision, deuteranopy, etc., but … well, I didn’t much like any of them visually.

Third solution. I remembered that some time ago, I had found a nice monochromatic color scheme (various shades of blue) for the XSD type hierarchy diagram, which I had concluded was in fact more attractive than the polychrome color scheme we had started with. So I copied that, and that’s the scheme in use in the stylesheet now.

When I did that work on the type hierarchy diagram, a friend (not Enrique, I promise) asked me “So, can you write down an accessible color scheme I can use whenever I need one, so I don’t have to go through all this hassle of testing things and experimenting and changing things?” I told him no, I didn’t think so: too much depends on the information you’re trying to convey.

But I think one useful general rule can be formulated. If you are looking for a coherent color scheme for data graphics, and the only required function of the colors is that they be visually distinct from each other both for normal vision and for the various color deficits, then one very simple approach is to go to a color scheme generator like the one at colorschemedesigner.com, pick a hue, and generate a monochromatic color scheme (hmm, not all color scheme generators have this as an option). You should definitely use the color-deficit simulations on the result, just to make sure, but virtually all the variation among the colors of a monochromatic scheme is variation in saturation and brightness, not in hue, so the chances are much greater that the variation will be perceptible to all types of vision.
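If no scheme generator is at hand, the same idea can be sketched in a few lines of Python: fix one hue and vary only saturation and brightness. This is an illustration under stated assumptions, not the output of colorschemedesigner.com: the function name monochromatic_scheme is made up, and the saturation range (0.15 to 0.70) and brightness range (1.00 down to 0.65) are arbitrary starting values chosen to give pale backgrounds.

```python
import colorsys


def monochromatic_scheme(hue_degrees: float, steps: int = 4) -> list:
    """Return `steps` hex colors that share one hue, from pale to strong.

    Hypothetical helper: the saturation and brightness ranges below are
    assumptions for this sketch, not values from any particular tool.
    """
    hue = (hue_degrees % 360) / 360
    colors = []
    for i in range(steps):
        t = i / (steps - 1) if steps > 1 else 0.0
        saturation = 0.15 + 0.55 * t  # pale -> more saturated
        brightness = 1.0 - 0.35 * t   # light -> darker
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, brightness)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(
                round(r * 255), round(g * 255), round(b * 255)
            )
        )
    return colors


if __name__ == "__main__":
    # Hue 210 is a blue; print four shades from pale to strong.
    for color in monochromatic_scheme(210):
        print(color)
```

Because the colors differ only in saturation and brightness, they are more likely to stay distinct under the simulations, but the result should still be run through them.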

So: third lesson. Try a monochromatic color scheme.

Fourth solution. Another fairly quick and easy solution is to use the Daltonize tool at VisCheck.com, which takes the visual information in an image (in the case of coloring for data graphics, this means the information conveyed by distinctions of color) and makes it more readily perceptible to color-blind viewers by increasing the red/green contrast and by converting information conveyed by red/green distinctions into variations in brightness and on the blue/yellow axis. If you are working with something like false-color photography, where there are many variations in color and tone, things like color-scheme generators are not going to help you; Daltonize is a very cool tool for those applications.

Disclaimer: I know enough accessibility specialists, and enough people with good graphical skills, to know that I am neither one nor the other. I lay no claim to particular expertise in accessibility issues, only to a firm belief that they are important. I think in fact that they are too important to be the concern only of accessibility specialists, just as design is too important to be left only to designers: every maker of Web pages needs to think about the visual communication of information, and about making that information accessible to all the members of your audience. When you do, you’ll appreciate the contributions and the expertise of specialists all the more. (They will remind you, for example, that color-blindness is just one barrier to accessibility, among many. It just happens to be one that lends itself to pretty, colorful illustrations.)

Only when they are accessible to everyone who may need them will your web pages, and The Web, achieve their full potential.