
Finding accessible color schemes for data graphics: one problem, four solutions

Friday, June 11th, 2010

[11 June 2010]

Yesterday I spent a little time writing a stylesheet to make it easier to browse through some potentially complicated and voluminous data. I ran into a problem, and I found four solutions and learned several lessons, which I record here for those who face similar problems.

[“What stylesheet was that?” hissed my evil twin Enrique. I don't know why he always whispers when asking these questions, but he does. “It doesn't matter for the point at issue, but since you're wondering, it was a stylesheet for the catalog files used to organize the XSD test suite. If you're curious about it, you can see the stylesheet in use for a small sample catalog file.” “This is supposed to be easy to browse?” “Well, click View Source and compare it to the underlying XML.” “OK, I guess I see your point. But where's the rest of the test suite? Don't tell me the XML Schema working group has a test suite with ... lessee, eighteen test cases?” “Hush. No, of course not. But the test suite as a whole is not set up for interactive browsing on the web. Yet.”]

[“And what was the problem?” “Stop interrupting and maybe you'll find out.”]

To make it a little easier to see which test cases involve valid documents and which involve invalid documents (not to mention documents with [validity] = notKnown and test cases with implementation-defined results), I supplied a little pale green, pink, blue, or gray background for the different possible expected results.

So far, so good, but this morning Liam Quin mentioned casually that I might want to consider changing the colors, since the red and green backgrounds would be indistinguishable for some readers. He was right, of course, and the original color scheme was really just a sort of quick and dirty proof of concept, not intended for final use.

“Boy, do you sound defensive. Thought you could get away with it, huh?” asked Enrique. “Oh, hush. Like you never take shortcuts with things.”

So I tested the color scheme using the incredibly useful image analysis tool at VisCheck.com: you upload an image, and can perform simulations of deuteranopy (red/green color deficit), protanopy (a different form of red/green deficit), and tritanopy (a blue/yellow color deficit). And Liam was absolutely right: the backgrounds for valid and invalid test cases were virtually indistinguishable in the output of the first simulation.

So I set about to fix it, and I learned some things.

First solution. To begin with, I taught myself something about how not to go about this task. Using the helpful Color Sphere widget I downloaded some time ago from colorjack.com, I found a set of three colors that were pretty well distinct in all the different filters it provides: a green, a red, and a blue. They were too intense to use as background colors, so I spent some time with a color picker finding RGB values that matched those hues but had less saturation (select the RGB sliders pane; convert from hex to decimal, because the widget doesn’t understand hex; change to the HSB pane, slide the saturation level down a bit, go back to the RGB pane, convert the decimal numbers back to hex, write them down) and checking the resulting color scheme with VisCheck (update the stylesheet, refresh the page in the browser, take a screen snapshot of part of the page showing all the colors, go to VisCheck, upload, perform deuteranopy simulation, whoops, that won’t work, go back and change one of the colors, come back, upload, perform deuteranopy simulation, upload, perform protanopy simulation, upload, perform tritanopy simulation).
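A few lines of script can do the hex-to-decimal-to-HSB round trip described above. Here is a minimal sketch in Python using the standard `colorsys` module. The function names and the example color are illustrative assumptions. `colorsys` calls the color model HSV, which is the same model the widget calls HSB.

```python
import colorsys


def hex_to_rgb(hex_color: str) -> tuple[float, float, float]:
    """Convert '#rrggbb' to three floats in the range [0, 1]."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return r, g, b


def rgb_to_hex(rgb: tuple[float, float, float]) -> str:
    """Convert three floats in [0, 1] back to '#rrggbb'."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def desaturate(hex_color: str, factor: float) -> str:
    """Scale the HSB/HSV saturation by `factor`, keeping hue and brightness."""
    h, s, v = colorsys.rgb_to_hsv(*hex_to_rgb(hex_color))
    return rgb_to_hex(colorsys.hsv_to_rgb(h, s * factor, v))


# A fully saturated red, reduced to 20% saturation, gives a pale pink.
print(desaturate("#ff0000", 0.2))  # #ffcccc
```

This handles only the arithmetic. Whether the desaturated colors remain distinguishable still has to be checked separately.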

I’m very grateful for these tools, but sometimes I do wish it were more convenient to run all three filters on the input.
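Running all three filters at once on a list of colors is not hard to script. The sketch below is not VisCheck's algorithm. It makes three assumptions:

- It uses the full-severity simulation matrices published by Machado, Oliveira and Fernandes (2009), applied to linear RGB.
- It uses plain Euclidean distance in sRGB as a crude stand-in for a perceptual color-difference measure such as CIEDE2000.
- The palette and the 0.15 threshold are hypothetical.

```python
import math
from itertools import combinations

# Full-severity simulation matrices (Machado, Oliveira & Fernandes 2009),
# meant to be applied to *linear* RGB values.
SIMULATIONS = {
    "deuteranopy": ((0.367322, 0.860646, -0.227968),
                    (0.280085, 0.672501, 0.047413),
                    (-0.011820, 0.042940, 0.968881)),
    "protanopy": ((0.152286, 1.052583, -0.204868),
                  (0.114503, 0.786281, 0.099216),
                  (-0.003882, -0.048116, 1.051998)),
    "tritanopy": ((1.255528, -0.076749, -0.178779),
                  (-0.078411, 0.930809, 0.147602),
                  (0.004733, 0.691367, 0.303900)),
}


def hex_to_rgb(hex_color):
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))


def to_linear(c):
    """Undo the sRGB gamma encoding."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def to_srgb(c):
    """Re-apply sRGB gamma encoding, clipping to [0, 1] first."""
    c = min(1.0, max(0.0, c))
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def simulate(rgb, matrix):
    """Approximate how an sRGB color appears under one color deficit."""
    lin = [to_linear(c) for c in rgb]
    return tuple(to_srgb(sum(m * c for m, c in zip(row, lin))) for row in matrix)


def too_close(palette, threshold=0.15):
    """List (vision, name, name, distance) for every pair of colors closer
    than `threshold` under normal vision or any simulated deficit."""
    colors = {name: hex_to_rgb(h) for name, h in palette.items()}
    problems = []
    for vision in ("normal", *SIMULATIONS):
        seen = {name: rgb if vision == "normal" else simulate(rgb, SIMULATIONS[vision])
                for name, rgb in colors.items()}
        for a, b in combinations(seen, 2):
            d = math.dist(seen[a], seen[b])
            if d < threshold:
                problems.append((vision, a, b, round(d, 3)))
    return problems


# Hypothetical pale backgrounds like the original green/pink/blue/gray scheme.
palette = {"valid": "#ccffcc", "invalid": "#ffcccc",
           "notKnown": "#ccccff", "implementation-defined": "#dddddd"}
for problem in too_close(palette):
    print(problem)
```

With these hypothetical colors, the green/pink pair is clearly distinct under normal vision but falls below the threshold in the simulated deuteranopy check. That matches what Liam pointed out.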

After a couple of passes and errors and false starts (which means: after an hour and a half or so), I had a color scheme that worked for all the vision types I was checking.

The only problem was that I then decided the green was too virulent and intense.

No problem, I thought: these three hues are good for all three types of color deficit, so I’ll just lower the saturation to make them less intense, and I’m good to go. Wrong: on one of the simulations, there was no longer any distinction between the pink and the gray. And the green still bothered me.

So I learned an important lesson: In trying to make colors for data graphics accessible to all types of vision, it’s not just hue that counts: hue and saturation interact with the different deficits.

Second solution. I went back to the colorjack.com Color Sphere widget on my system, and noticed that I could control not just the hue of the primary color but its saturation as well: as you move toward the center of the color circle, saturation goes down. And as you move things around with the deuteranopy filter set, you can see for yourself that the same three hues can be distinct at one saturation and not at another. So I got another color scheme that worked.

Second lesson: if you know in advance that you want the colors to be unsaturated and unobtrusive, use the color picker or color scheme generator you’re working with to get close to the saturation and brightness you need. The color picker is better at these kinds of conversions than you are.

The only problem was that as I continued working with the data I grew to like this new color scheme less and less. The green wasn’t virulent any more, but it also wasn’t green; more a greenish yellow.

So I gave up on the associations green = go = valid outcome, red = stop = invalid outcome. All that matters is that the colors be distinct; they don’t need to be particularly mnemonic (the distinction is, after all, also carried by the text). So back to the color picker, once or twice. It gave me schemes that worked in the sense of being visually distinct for normal vision, deuteranopy, etc., but … well, I didn’t much like any of them visually.

Third solution. I remembered that some time ago, I had found a nice monochromatic color scheme for the XSD type hierarchy diagram, which I had concluded was in fact more attractive than the polychrome color scheme we had started with. So I copied that, and that’s the scheme in use in the stylesheet now.

When I did that work on the type hierarchy diagram, a friend (not Enrique, I promise) asked me “So, can you write down an accessible color scheme I can use whenever I need one, so I don’t have to go through all this hassle of testing things and experimenting and changing things?” I told him no, I didn’t think so: too much depends on the information you’re trying to convey.

But I think one useful general rule can be formulated. If you are looking for a coherent color scheme for data graphics, and the only required function of the colors is that they be visually distinct from each other both for normal vision and for the various color deficits, then one very simple approach is to go to a color scheme generator like the one at colorschemedesigner.com, pick a hue, and generate a monochromatic color scheme (hmm, not all color scheme generators have this as an option). You should definitely use the color-deficit simulations on the result, just to make sure, but virtually all the variation among the colors of a monochromatic scheme is variation in saturation and brightness, not in hue, so the chances are much greater that the variation will be perceptible to all types of vision.
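A monochromatic scheme generator holds the hue fixed and varies only saturation and brightness. The minimal sketch below makes that concrete. The ranges (pale, fairly bright colors suitable for backgrounds) are arbitrary assumptions and are not what colorschemedesigner.com actually does. The output still needs to go through the color-deficit simulations.

```python
import colorsys


def monochromatic_scheme(hue_degrees: float, n: int = 4) -> list[str]:
    """Return n '#rrggbb' colors sharing one hue.

    Only saturation and brightness vary; the ranges are arbitrary choices
    meant to give pale background colors.
    """
    colors = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0
        saturation = 0.10 + 0.35 * t   # 0.10 up to 0.45
        brightness = 0.98 - 0.25 * t   # 0.98 down to 0.73
        r, g, b = colorsys.hsv_to_rgb(hue_degrees / 360, saturation, brightness)
        colors.append("#" + "".join(f"{round(c * 255):02x}" for c in (r, g, b)))
    return colors


# Four shades of blue (hue 210 degrees), lightest first.
print(monochromatic_scheme(210))
```

Brightness falls as saturation rises, so neighboring colors differ along two axes at once. This is the property the rule above relies on.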

So: third lesson. Try a monochromatic color scheme.

Fourth solution. Another fairly quick and easy solution is to use the Daltonize tool at VisCheck.com. It takes the visual information in an image (in the case of coloring for data graphics, this means the information conveyed by distinctions of color) and makes it more readily perceptible to color-blind viewers by increasing the red/green contrast and by converting information conveyed by red/green distinctions into variations in brightness and on the blue/yellow axis. If you are working with something like false-color photography, where there are many variations in color and tone, color-scheme generators are not going to help you; Daltonize is a very cool tool for those applications.
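The general idea behind daltonizing can be sketched in a few lines. This version is an assumption, not VisCheck's published algorithm, and works in three steps:

1. It simulates deuteranopy with the Machado et al. (2009) matrix in linear RGB.
2. It computes what the simulated viewer loses, that is, the error between the original and the simulation.
3. It adds that error back into channels the viewer can still distinguish, using an error-shifting matrix that circulates in many open-source daltonize scripts.

```python
# Deuteranopy simulation matrix (Machado et al. 2009), applied to linear RGB.
DEUTERANOPY = ((0.367322, 0.860646, -0.227968),
               (0.280085, 0.672501, 0.047413),
               (-0.011820, 0.042940, 0.968881))

# Heuristic error-shifting matrix: the lost red/green information is pushed
# into the green and blue channels. An assumption, not VisCheck's values.
SHIFT = ((0.0, 0.0, 0.0),
         (0.7, 1.0, 0.0),
         (0.7, 0.0, 1.0))


def to_linear(c):
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def to_srgb(c):
    c = min(1.0, max(0.0, c))
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def daltonize(rgb):
    """Return an sRGB color (floats in [0, 1]) adjusted for deuteranopes."""
    lin = [to_linear(c) for c in rgb]
    simulated = mat_vec(DEUTERANOPY, lin)
    error = [o - s for o, s in zip(lin, simulated)]   # what the viewer loses
    shifted = mat_vec(SHIFT, error)                    # moved to visible axes
    return tuple(to_srgb(o + d) for o, d in zip(lin, shifted))


print(daltonize((1.0, 0.0, 0.0)))  # red is kept; green and blue both rise
```

Neutral grays pass through unchanged, because the simulation loses nothing for them. Strongly red or green colors are shifted the most.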

Disclaimer: I know enough accessibility specialists, and enough people with good graphical skills, to know that I am neither one nor the other. I lay no claim to particular expertise in accessibility issues, only to a firm belief that they are important. I think in fact that they are too important to be the concern only of accessibility specialists, just as design is too important to be left only to designers: if you make Web pages, you need to think about the visual communication of information, and about making that information accessible to all the members of your audience. When you do, you’ll appreciate the contributions and the expertise of specialists all the more. (They will remind you, for example, that color-blindness is just one barrier to accessibility, among many. It just happens to be one that lends itself to pretty, colorful illustrations.)

Only when they are accessible to everyone who may need them will your web pages, and The Web, achieve their full potential.

Boomerangs, bad pennies, encores

Thursday, June 10th, 2010

[10 June 2010]

As of 1 June, W3C is paying for a quarter of my time, to work with the W3C XML Schema working group. Given the current state of XSD 1.1 (the working group is mostly waiting for implementations to be completed before progressing it to Proposed Recommendation), my time will mostly be devoted to work on the XSD 1.1 test suite and whatever else can be done to smooth the path of the implementors.

In the working group, the air has (predictably) been full of the expected boomerang references, Thomas Wolfe quotations, and Terminator jokes, with the occasional mention of encore performances. I leave the imagination of the scene as an exercise to the reader.

“Encore performances?” sneered my evil twin Enrique, when he read this over my shoulder. “I’ll tell you about encore performances!” Long ago, he said, he attended a recital where the pianist’s performance of a very difficult piece was greeted by thunderous applause. The astonished pianist had not prepared an encore, so as an encore he simply played the final piece over again. And the second time, said Enrique, the audience was still wildly enthusiastic. There were further cries of “Encore, encore!”, mixed with others of “Again! Again!” And so he played the piece yet again.

After the third rendition the audience was still calling for more, but Enrique swears he heard the man behind him shouting, above the din, “Again! Again! And you’re gonna keep playing it, until you get it right!”

[Thank you, Enrique, for that vote of confidence. I agree, at least, that it's worthwhile to try to get a spec right.]

Three quarters of my time will continue to be spent on consulting and contract work, with a focus on using standards and descriptive markup to help make digital information more widely useful and give it longer life. Just as it’s best to combine theory and practice, if possible, so also it’s helpful to combine standards work with work on practical applications. For example: I just finished a project together with an archival collection at a major U.S. university; they are encoding their finding aids in XML using the Encoded Archival Description, and they are publishing the finding aids on the Web, in XML, with XSLT stylesheets to display them nicely for humans. Because they don’t have a separate EAD-to-HTML translation step, their workflow is simpler: they update a collection record in their archival management system, save the collection description as XML, copy it to their Web server, and it’s published. Their finding-aids site is very cool.

In other words, they are using the Web and XML in just the ways their originators hoped they could be used: for sharing semantically rich information. It’s a great pleasure to work on exploiting the open standards W3C has produced, and I look forward to helping produce more of them.

Thought for the day

Tuesday, January 19th, 2010

[19 January 2010]

Not quite heard in conversation: “You will never gain full mastery of any tool until you have misused it.”

Day of the dead

Monday, November 2nd, 2009

[2 November 2009]

Today is the Feast of All Souls, better known where I come from as the Day of the Dead. It’s a useful day to remember the dead.

Today, I am thinking particularly of Donald Walker, Antonio Zampolli, Yuri Rubinsky, each important in different ways to me. Life remains (as I expected when they died) a little harder without them around.

It’s also a good day to think about the death that will come for each of us before long.

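Deyr fé,    deyia frœndr,
deyr siálfr it sama;
enn orðztírr    deyr aldregi,
hveim er sér góðan getr.
Deyr fé,    deyia frœndr,
deyr siálfr it sama;
ec veit einn,    at aldri deyr:
dómr um dauðan hvern.

(Cattle die, kinsmen die,
the self dies likewise;
but renown never dies
for the one who wins a good name.
Cattle die, kinsmen die,
the self dies likewise;
I know one thing that never dies:
the judgment on each one dead.)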

What will we leave to those who stay here after us? What would we like to be remembered by?

Balisage is calling …

Wednesday, August 5th, 2009

[5 August 2009]

This week I’m busy trying to wrap things up before heading to Montréal next week for Balisage. Songs from South Pacific keep running through my head, starting of course with “Bali Ha’i” (to which Enrique is working on a contrafacture).

I had meant to post periodically over the summer about papers I’m particularly looking forward to hearing, in the interest of reminding people about the conference and trying to encourage attendance. I only managed one or two, but it seems I needn’t feel guilty, after all. The conference chair, Tommie Usdin of Mulberry Technologies, tells me that we have now pre-registered more people for Balisage 2009 than we have had at any previous Balisage.

So even without my reminding people about what is on the program, people are coming to the conference anyway. Good! But I can’t resist mentioning here: Fabio Vitali and his colleagues have a really super idea for encoding overlapping structures by using RDF (which automatically means that we can try using SPARQL to query such documents). The work on XML representations of overlap in Bielefeld and Lyon continues to bear fruit: Maik Stührenberg and Daniel Jettka of Bielefeld are talking about XStandoff, the successor to the Sekimo General Format (SGF) developed earlier in Bielefeld, while Pierre Edouard Portier and Sylvie Calabretto of Lyon are talking about the problem of constructing documents using formats like Lyon’s MultiX. And Desmond Schmidt of Queensland University of Technology is coming, to talk about his work on overlapping structures in multi-versioned documents.

Norm Walsh and Michael Kay are both talking about pipelines in XML processing. Michael is also chairing a full-day symposium on Monday about efficient processing of XML. (Why did no one from the EXI effort offer a paper?!) Kurt Cagle is talking about XML and linked data (that would be the rebranding of the Semantic Web).

And there’s a lot more. See the program for details.

So this year Balisage will be bigger than ever before.

I hope to see you in Montréal next week!

Another example of the curb-cut effect

Monday, June 29th, 2009

[29 June 2009]

The XSD Datatypes spec has a diagram showing the hierarchical derivation relations among the built-in datatypes. The old version (created by Asir Vedamuthu, to whom thanks, and used in XSD 1.0 and in earlier drafts of XSD 1.1) has simple color-coding to distinguish various classes of datatypes (what are now called the special datatypes, the primitives, and the other built-ins).

For the Candidate Recommendation draft of XSD 1.1, though, we needed to make a new drawing to show the built-in datatypes added in 1.1 (anyAtomicType, dateTimeStamp, dayTimeDuration, yearMonthDuration, precisionDecimal).

The new version created for the Candidate Recommendation draft has a new color scheme, which I made with the help of a very nice tool for color, now to be found at colorschemedesigner.com (I used the previous version, but the functionality I counted on is still there). This tool (and some others) allows you to see an approximation of the effect of your color scheme for a reader with various forms of color perception deficit (protanopy, deuteranopy, tritanopy, etc.), which means you can try to ensure that the distinctions in your diagrams are visible also to readers with those forms of vision.

I was surprised to end up with a color scheme I find more attractive than the old one; it’s striking how many people have told me they think the same (without realizing the proximate cause of the change).

SVG, of course, makes it easy to create diagrams whose color scheme can be changed later. And XSLT makes it easier to generate this diagram and to modify it systematically in various ways (including color scheme). But it’s the idea of universal design that gets the credit for making the diagram visually more attractive.
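The design point is that the colors live in one place. The generator below is a minimal sketch in Python, not the XSLT actually used for the diagram, and its class names and colors are hypothetical. It writes every box with a class attribute only and puts the palette in a single CSS `<style>` element inside the SVG, so switching color schemes means changing one dictionary.

```python
import xml.etree.ElementTree as ET


def type_diagram(types, palette):
    """Draw one labeled box per (type name, category) pair.

    Boxes carry only a class attribute; all colors come from `palette`,
    a mapping of category -> CSS color, emitted once as a <style> element.
    """
    svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg",
                             "width": "220", "height": str(30 * len(types))})
    style = ET.SubElement(svg, "style")
    style.text = " ".join(f".{cat} {{ fill: {color}; }}"
                          for cat, color in palette.items())
    for i, (name, category) in enumerate(types):
        y = 30 * i
        ET.SubElement(svg, "rect", {"class": category, "x": "0", "y": str(y),
                                    "width": "220", "height": "25"})
        label = ET.SubElement(svg, "text", {"x": "6", "y": str(y + 17)})
        label.text = name
    return ET.tostring(svg, encoding="unicode")


types = [("anySimpleType", "special"), ("decimal", "primitive"),
         ("integer", "derived")]
blues = {"special": "#dde8f5", "primitive": "#a9c4e6", "derived": "#6f9bd1"}
print(type_diagram(types, blues))
```

Trying a different color scheme, for example one that passes the color-deficit simulations, touches only the `palette` argument, never the drawing code.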

Universal design: try it sometime. You’ll be glad you did.

Erik Naggum, R.I.P.

Saturday, June 20th, 2009

[20 June 2009]

It appears from reports on the Net that Erik Naggum, long-time genius loci of comp.text.sgml, has died.

In person, he was (as far as I could tell, on the very few occasions I encountered him in the flesh) a very sweet individual. On the net — well, he taught me what a flame war was. His work on internationalization gave hints of great generosity; his resentment against the Unicode Consortium was almost comic in its ferocity (even to me, never one of that organization’s greatest fans).

Erik Naggum, dead? Is it possible? One person fewer who remembers the old days.

So it goes.

Grant and contract-supported software development

Tuesday, April 7th, 2009

[7 April 2009]

Bob Sutor asks, in a blog post this morning, some questions about government funding and open source software. Since some of them, at least, are questions I have thought about for a while as a reviewer for the National Endowment for the Humanities and other funding agencies, I think I’ll take a shot at answering them. To increase the opportunity for independent thought to occur, I’ll answer them before I read Bob Sutor’s take on them; if we turn out to agree or disagree in ways that require comment, that can be separate.

He asks:

  • When a government provides funding to a research project, should any software created in the project be released under an open source license?

It depends.

In practice, I think it almost always should, but in theory I recognize the possibility of cases in which it needn’t.

When I review a funding proposal, I ask (among other things): what is the quid pro quo? The people of the country fund this proposal; what are they buying with that money? A reliable survey of the work of Ramon Llull and its relevance to today? Sounds good (assuming I think the applicant is actually in a position to produce a reliable survey, and the cost is not exorbitant). A better tool for finding and studying emblem books? Insight into methods of performing some important task? (Digitizing cultural artefacts, archiving digital research results for posterity, creating reliable language corpora, handling standoff annotation, … there are a whole lot of tasks it would be good to know better how to do.) How interesting is what we would be learning about? How much are we likely to learn?

My emphasis on what we get for the money sometimes leads other reviewers or panelists to regard me as cold and mean-hearted, insufficiently concerned with encouraging a movement here, nurturing a career there. But I have noticed that the smartest and most attractive members of panels I’ve been on are almost always even tougher graders than I am. When funds are as tight as they typically are, you really do need to put them where they will do the most good.

If the value proposition of the funding proposal is “we’ll develop this cool software”, then as a reviewer I want the public to own that software. Otherwise, what did we buy with that money?

If the value proposition is “we’ll develop these cool algorithms and techniques, and write them up, so the community now has better knowledge of how to do XYZ — oh, and along the way we will write this software, it’s necessary to the plan but it’s not what this grant is buying”, then I don’t think I want to insist on open-sourcing the software. But it does make it harder for the applicant to make the case that the results will be worth the money.

Stipulating that software produced in a project will be open-source does usually help persuade me that its benefit will be greater and more permanent. If the primary deliverable I care about is insight, or an algorithm, open-sourcing the software may not be essential. But it helps guarantee that there will be a mercilessly complete account of the algorithm with all details. (It does have the potential danger, though, that it may allow other reviewers or the applicants to believe that the source code provides an adequate account of the algorithm and there is no need for a well written paper or series of papers on the technical problem. I am told that some programmers write source code so clear and beautiful that it might suffice as a description of the algorithm. I say, if writing documentation as well as source code is good enough for Donald Knuth, it’s good enough for the rest of us.)

On the other hand, I don’t think deciding not to open-source the software is necessarily an insuperable barrier. The question is: what value is the nation or the world going to get from this funding? Sometimes the value clearly lies with the software people are proposing to develop, sometimes it clearly lies elsewhere and the software plays a purely subordinate, if essential, role. (But although I admit this in principle, I am not sure that in practice I have ever liked a proposal that proposed to spend a lot of effort on software but not to make it generally available. So maybe my generosity toward non-open-source projects is a purely theoretical quantity, not observable in practice.)

If software is involved, you also have to ask yourself as a reviewer how well it is likely to be engineered and whether the release of the software will serve the greater good, or whether it will act like a laboratory escape, not providing good value but inhibiting the devotion of resources to creating better software.

The chances and consequences of suboptimal engineering vary, of course, with whether the research in question is focused specifically on computer science and software engineering, or on an application domain, in which case there is a long and often proud history of good science being performed with software that would make any self-respecting software engineer gag. (A long time ago, I worked at a well known university where the music department burned more CPU cycles on the university mainframe than any other department. Partly this was because Physics had its own machines, of course, and partly it was because the music people were doing some really really cool and interesting stuff. But was it also partly because they were lousy programmers who ran the worst optimized code east of the Mississippi? I never found out.)

  • Does this change if commercial companies are involved? How?

If the work is being done by a commercial company, they are historically perhaps less likely to want to make the software they develop open-source. That’s one way the process is affected.

But also, if a government agency is contracting with a commercial organization to develop some software, there may be a higher chance that the agency wants some software for particular parties to use, and the main benefit to be gained is the availability to those parties of the software involved. In some cases, the benefit may be the existence of commercially viable organizations willing and able to support software of a particular class and develop it further.

There are plenty of examples of commercial codebases developed in close consultation with an initial client or with a small group of initial clients. The developer gets money with which to do the development; the initial clients get to help shape the product and ensure that at least one commercial product on the market meets their needs. In the cases I have heard of, the clients don’t typically turn around and demand that the code base be open-source.

It’s not clear to me that government funding agencies should be barred from acting as clients in scenarios like this. This kind of arrangement isn’t precisely what I tend to think of as “research”, but whether it’s appropriate or not in a given research program really depends on the terms of reference of that program, and not on what counts as research in the institutions that trained me.

I have been told on what I think is good authority that if it had not been for contracts let by various defence agencies, the original crop of SGML software might never have been commercially viable. (And since it was that crop of software that demonstrated the utility of descriptive markup and made XML possible, I wouldn’t like to try to outlaw whatever practices led to that software being developed.)

  • Does this change if academic institutions are involved? How?

I don’t think so.

  • How should the open source license be chosen? Who gets to decide?

Yes.

Two umbrellas and a prime number.

I think I mean “Huh? Is this a trick question?”

To the extent that we think of funded research as the purchase (on spec) of certain research products we hope the funding will produce, then the funding agency can certainly say “We want the … license”. And then the Golden Rule of Arts and Sciences applies. Or the people writing the proposal can say “We want to use the … license; take it or leave it.” And the funding agency, guided by its reviewers and panelists and staff and the native wit of those responsible for the final decision, will either leave it or take it.

The only thing that would make me more suspicious and worried than this chaotic back and forth would be an attempt to make an iron-clad rule to cover all cases, for all projects, for all governmental funding agencies.

Heinrich Hertz and the empty set of tomatoes

Thursday, April 2nd, 2009

[2 April 2009]

Why does Nelson Goodman want to work so hard just to avoid talking about classes or sets?

Earlier this year I spent some time reading the section on the calculus of individuals in Nelson Goodman’s The structure of appearance (3d ed. Boston: Reidel, 1977) and the paper Goodman wrote on the subject with Henry S. Leonard (Henry S. Leonard and Nelson Goodman, “The calculus of individuals and its uses” The journal of symbolic logic 5.2 (1940): 45-55).

I was struck by the lengths Goodman goes to in order to avoid talking about sets, although his compound individuals which contain other individuals seem to be doing very much the same work as sets. Indeed, the 1940 paper makes a selling point of this fact. On page 46, Leonard and Goodman write “To any analytic proposition of the Boolean algebra will correspond a postulate or theorem of this calculus provided that …” (In other words, with some few provisos, if you can make a true statement about sets, you can make a corresponding true statement about individuals in the calculus of individuals. The provisos aren’t even statements you can’t make, just restrictions on the form you make them in. Instead of saying “the intersection of x and y is the empty set” you have to say they are discrete. And so on.) And the concluding sentence of the paper (p. 55) is: “The dispute between nominalist and realist as to what actual entities are individuals and what are classes is recognized as devolving upon matters of interpretative convenience rather than upon metaphysical necessity.”
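A toy model shows how closely the correspondence tracks Boolean algebra. It is my own illustrative assumption, not Leonard and Goodman's formalism: each individual is represented by the nonempty collection of atoms that compose it. The calculus's relations then map onto set operations as follows:

- Overlap corresponds to nonempty intersection.
- Being discrete corresponds to empty intersection.
- The fusion of two individuals corresponds to union.

There is no empty individual, so two discrete individuals simply have no common part. (Modeling a nominalist calculus with Python sets is ironic, but convenient.)

```python
def individual(*atoms):
    """An individual, modeled as the nonempty set of atoms composing it."""
    if not atoms:
        raise ValueError("there is no null individual")
    return frozenset(atoms)


def overlaps(x, y):
    """x and y share some part."""
    return bool(x & y)


def discrete(x, y):
    """The calculus's way of saying 'their intersection is empty'."""
    return not overlaps(x, y)


def part_of(x, y):
    return x <= y


def fusion(x, y):
    """The individual made up of x and y together (cf. set union)."""
    return x | y


def common_part(x, y):
    """Cf. set intersection; undefined when x and y are discrete."""
    if discrete(x, y):
        raise ValueError("discrete individuals have no common part")
    return x & y


stem = individual("stem")
fruit = individual("skin", "flesh", "seeds")
plant = fusion(stem, fruit)
print(discrete(stem, fruit), part_of(stem, plant))  # True True
```

Every individual overlaps itself in this model, and there is nothing that fails to. Questions shaped like Russell's are therefore hard even to pose.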

In other words, Goodman seems at first glance to be simplifying the world by eliminating the notion of sets and classes, and then to be complicating it again in precisely similar ways by taking all of the fundamental ideas we have about sets or classes, and reconstructing them as funny ways of talking about individuals. Cui bono?

This afternoon I saw a review by Anthony Gottlieb, in the New Yorker, of a recent book about the Wittgenstein family (Alexander Waugh, The House of Wittgenstein: A family at war), which seems to suggest a solution. Gottlieb quotes a suggestion from the physicist Heinrich Hertz:

Hertz had suggested a novel way to deal with the puzzling concept of force in Newtonian physics: the best approach was not to try to define it but to restate Newton’s theory in a way that eliminates any reference to force. Once this was done, according to Hertz, “the question as to the nature of force will not have been answered; but our minds, no longer vexed, will cease to ask illegitimate questions.”

(Throws a new light on Wittgenstein’s remark about not wanting to solve problems but to dissolve them, doesn’t it?)

It’s true that once you rebuild the ideas of set union, intersection, difference, etc. as ideas about individuals which can overlap or contain other individuals, and eliminate the word ‘set’, it becomes a lot harder to describe a set which contains as members all sets which are members of themselves, or a set which contains as members all sets which are not members of themselves. The closest you can conveniently get are statements about individuals which overlap themselves (they all do) or which do not overlap themselves (no such individual). Good-bye, Russell’s Paradox!

And consider the surrealist joke I ran into the other day:

Q. What is red and invisible?
A. No tomatoes.

A user of the calculus of individuals can enjoy this on its own terms, without having to worry about whether it’s a veiled reference to the fact that some typed logics end up with multiple forms of empty set, one for each type in the system. One for integers, if you’re going to reason about integers. One for customer records, if you’re going to reason about customers. And … one for tomatoes?

Q. What is red and invisible?
A. The empty set of tomatoes.

</w3c:msm>

Friday, January 30th, 2009

[30 January 2009]

Today is my last workday as a staff member at the World Wide Web Consortium.

I have learned a lot during my time here, I’ve enjoyed the work, and I have tried to help make the Web a better place for people who care about information and for the information we care about. But it’s ten years since I joined the staff, and it’s time to move on.

What’s next? I will be doing consulting and contract work through Black Mesa Technologies LLC. If you have interesting problems touching on documents, electronic representation of information (documents or other), validation, XSLT, XQuery, or the like; if you have concerns about the proper application of information technology to the preservation of commercial or cultural-heritage information, then give me a call; I’ll be around.