For collectors of headline ambiguities …

[22 April 2009; unintended injection of draft notes for other postings removed 18 November 2017]

Linguists (and others) like to collect cases in which the compressed, telegraphic style of newspaper headlines leads to unexpected syntactic ambiguities.

The Albuquerque Journal and the Santa Fe New Mexican both carried stories yesterday about the results of a new University of New Mexico study on the incidence of spinal injuries in the U.S.

I showed them to Enrique, who glanced at the headline in the Journal:

Paralysis More Widespread Than Thought

and asked “When did the Albuquerque Journal start covering W3C and ISO Working Groups?”

Consistency checking in prose

[26 March 2009]

Twice in the last couple of days I’ve run into documents with consistency problems. So I’ve been thinking about consistency checking in prose as a challenge.

The web site of a large organization has, in the About Us section of the site, a sidebar saying that the organization has so-and-so many employees in so-and-so many countries. And one of the documents within the About Us section talks about the organization’s efforts to be a good corporate citizen and employer at all of its so-and-so many locations. If you are in n locations spread over m countries, though, shouldn’t n be greater than or equal to m?
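
One way to make that sort of check mechanical is to put both figures into a small formal model. Here is a minimal sketch using Alloy (the tool described in the 13 February entry further down this page); the signature and field names are my own invention, not anything taken from the site in question:

sig Country {}
sig Location { country : one Country }
-- every location lies in exactly one country, so the countries
-- actually occupied can never outnumber the locations
assert enough_locations {
#Location >= #(Location.country)
}
check enough_locations for 5

Nothing deep is going on here; the point is just that once both numbers live in the same machine-readable model, the sanity check comes for free.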

The other example was documentation for a specialized XML vocabulary which included a summary of gaps in the vocabulary’s coverage and shortcomings in the design. The main gap, said the documentation, was that “the vocabulary offers no solution to the problem of XYZ.” But the vocabulary does offer a solution to that problem: the revision of the vocabulary to deal with problem XYZ is described with some pride two or three sections further up in the document.

One may speculate that in both cases, a perfectly true statement in the document was rendered false by later events, and statements added later to the document, reflecting the later state of affairs, contradict the earlier statements. (There was a gap in the vocabulary, and the documentation mentioned it as a potentially painful one. Eventually it was painful enough to be filled. And the tag-by-tag account of the markup was even revised to document the new constructs. But the description of gaps and shortcomings was not revised. And it’s not hard to believe that an organization may be in n locations at one point, and later on be in a larger number of locations, spread over m countries.)

In neither of these cases is the contradiction particularly embarrassing or problematic.

[“But I notice you name no names,” said Enrique. “Chicken.” “Hush,” I said. “The names are not relevant.”]

But of course the same problem happens in specs, where inconsistencies may have graver consequences.

[“Ha! I can think of some names under this rubric!” crowed Enrique. “Shut up!” I explained. “I’m trying to describe and understand the problem, not confess my sins.” “Oh, go on! Admitting you have a problem is the first step towards recovery. If you don’t admit that you have the problem, you’ll never — what the ?!” At this point, I managed to get the duct tape over his mouth.]

I think there must be two basic approaches to trying to avoid inconsistencies.

(1) You can try to prevent them arising at all.

(2) You can try to make them easier to detect automatically, so that an automated process can review a set of documents and flag passages that need attention.

Neither of these seems to be easy to do. But for both of them, it’s not hard to think of techniques that can help. And thinking about any kind of information system, whether it’s an XML vocabulary or a database management system or a programming language or a Web site content management system, or a complicated combination of the above, we can usefully ask ourselves:

How could we make it easier to prevent inconsistency in a set of documents?

How could we make it easier to keep a set of documents in synch with each other as they change?

How could we document the information dependencies between documents at a useful level of granularity? (Helpful, perhaps, to say “if document X changes, documents Y and Z, which depend on it, must be checked to see if they need corresponding revisions”, but a lot more helpful if you can say which sections in Y and Z depend on which bits of X.) Could we do it automatically?

It seems plausible that detecting inconsistencies between distant parts of a document would be easier if we could get a list of (some of) the entailments of each bit of a document.

How can we make it easier to document the entailments of a particular passage in a spec?

For making the entailments of a passage explicit (and thus amenable to mechanical consistency checking with other parts of the document set) I think there are several obvious candidates: RDF, Topic Maps, RDFa, the work Sam Hunting has been doing with embedded topic maps (see for example his report at Balisage 2008), colloquial XML structures designed for the specific system in question. Years ago, José Carlos Ramalho and colleagues were talking about semantic validation of document content; they must have had something to say about this too. (In the DTD, I recall, they used processing instructions.) Even standard indexing tools may be relevant.

How do these candidates compare? Are they all essentially equally expressive? Does one or the other make it easier to annotate the document? Is one or the other easier to process?

[“If you don’t admit that you have the problem, you’ll never be able to fix it. And you keep that roll of duct tape away from me, you hear?”]

One reason I love Alloy (Goodman, calculus of individuals)

[13 February 2009]

In the last couple of years I have become rather fond of a tool called Alloy. I’ve been talking to a couple of people about it lately, and this week I had an extremely enlightening experience with Alloy, so it occurred to me it might be useful to give a simple example of its use here. (This has gotten longer than expected, because some context is needed to understand the problem I had this week and how Alloy helped me solve it. Reader, persevere!)

First, some background.

Alloy is intended for use in what its developers call ‘light-weight formal methods’. It can be used as a sort of desk-checker for software designs. Alloy allows you to describe a design in a simple notation for symbolic logic, define predicates which hold either of particular objects in the universe, or of the universe as a whole, make assertions about the design, and so on. So far, it’s a lot like the specification language Z, only it doesn’t use Greek symbols and it’s much less intimidating to programmers who turn off when things look too formal or mathy. But one nice feature of Alloy not shared by Z is that Alloy can be used to find instances of predicates you define, or counter-examples to assertions you make. It doesn’t actually prove your assertions true, but it does an exhaustive check on all possible instances of the model you’ve defined, up to a user-specified size, and if it finds a counter-example, it will show it to you.

So if there is a problem in your design which can be illustrated by an example small enough to fit into the scope you specify, Alloy will find it.

Not all bugs have small test cases that exercise them. But a whole lot of them do. And Alloy’s clever exploitation of current SAT-solver technology means you can find those bugs really easily.

Now, my former W3C colleague Dan Connolly pointed out to me a long, long time ago that one can use specification languages like Alloy not just to check software designs, but to test our understanding of things we read. Dan told me once that when he really wanted to understand something, he tried to model it using a specification language and prove at least a few trivial theorems about it. As an example, he showed me a formalization he had done of part of a paper by Murata Makoto, using Larch (and in particular the Larch Prover LP) for this purpose; nowadays I think he uses ACL2 for this kind of thing. (I like LP and ACL2, too, but I find them hard to use, which means I don’t use them very often. And I’m not sure how many theorems Dan proves in a year using these techniques.)

The other day, Claus Huitfeldt and I were discussing Nelson Goodman’s account of properties in his book The structure of appearance. Some properties, Goodman points out, have the property that if they are true of a thing, they are true of every part of that thing. (The property “is located in Indiana” is an example.) Other properties have other (second-order) properties; Goodman identifies several properties of properties that he finds useful. Claus and I worked through the definitions of dissective, expansive, collective, and nucleative properties, translating them into symbols and trying to think of examples, and we were just starting on the properties of two-place predicates when I thought it might be useful to check some of our conjectures by formalizing things a little more fully.

This involved backing up a bit, because in defining the terms dissective, expansive, etc., Goodman appeals to notions like “is part of”, “is the sum of”, and “is the product of”. These, in turn, are secondary concepts defined in terms of a single primitive, “overlaps”. (This alone would probably have sufficed to hook Claus and me.) So we set out to describe, in Alloy, the notion of overlap in Goodman’s calculus of individuals.

My first cut was simple. An individual is anything a particular system wants to treat as an individual; OO programmers would call it “object” not “individual”, but Goodman antedates OO programming. Overlap is a relation among individuals, and relations among things are specified in Alloy by naming them in the description of the first member of the ordered pairs. So I wrote

sig Individual {
overlaps : set Individual
}

This defines a signature, or set of objects, named Individual. Each individual has a field (or property) named overlaps, the value of which is a set of individuals. Another, equivalent way to describe it is to say that there is a relation named overlaps, whose members are ordered pairs of individuals.

Looking at instances of this model (using the Alloy Analyzer, you add the instruction run {} and ask that it be executed) showed that this formulation had an intuitive drawback: for two individuals a and b, a could overlap b without b overlapping a. That didn’t seem right, so I added a constraint: if b is in the overlaps set of a, then a is in the overlaps set of b. (The “@” sign is a syntactic artifact; don’t worry about it.)

sig Individual {
overlaps: set Individual
}{
all i : Individual | i in overlaps iff this in i.@overlaps
}

The second pair of braces contains a set of constraints (in this case, just one) that hold of every member of the signature. This constraint has the desired effect of guaranteeing that if a.overlaps contains b, then b.overlaps contains a.

To avoid having to say awkward things like b in a.overlaps, we can define a predicate that allows us to say simply overlap[a,b]. (For technical reasons, predicate arguments in Alloy 4 are enclosed in square brackets, not parentheses. You’ll get used to it.)

pred overlap[x, y : Individual] {
x in y.overlaps or y in x.overlaps
}

[Enrique piped up here, to say “The constraint in the declaration of Individual guarantees that if y in x.overlaps then x in y.overlaps; you can delete either half of the or without losing anything except unnecessary complexity.” I was doubtful, but to shut him up, I wrote

pred overlap2[x, y : Individual] {
x in y.overlaps
}
assert defs_equivalent {
all x, y : Individual | overlap[x,y] iff overlap2[x,y]
}
check defs_equivalent for 5

The Analyzer found no counter-examples to the assertion. This isn’t exactly a proof, but it persuaded me that Enrique was right.]

Now, Goodman takes overlap as a primitive notion, but he provides a sort of formal definition by saying that individuals x and y overlap if and only if there is some individual z such that anything that overlaps z also overlaps x and y. Intuitively, z can be any individual which is part of both x and y. (You can think of x and y as sets, and of z as any individual in their intersection, but only if you don’t mind hearing howls of outrage from the general direction of Goodman’s grave: in defining the calculus of individuals, he was trying to provide an alternative to set theory for those who, like him, preferred to avoid having to assume that sets exist.)

We translated this into Alloy thus:

fact overlap_axiom {
all x, y : Individual |
overlap[x,y] iff
(some z : Individual |
(all w : Individual |
overlap[w,z] => (overlap[w,x] and overlap[w,y])))
}

Now, it seems clear from our intuitive understanding of the nature of overlap that in a plausible formalization the overlap predicate has to be both symmetric (overlap[a,b] iff overlap[b,a]) and reflexive (everything overlaps itself).

We added appropriate assertions to the model and checked them. It seemed clear that the symmetry must follow from what we had said so far, and I was not surprised to see that it did: in every universe with ten Individuals or fewer, the symmetry of the overlap relation holds. It seemed equally clear that based on our model so far, overlap would not (yet) be reflexive, and we would need to make it so by adding another fact asserting the reflexivity of overlap as an axiom.
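
For the record, the assertions looked roughly like this (a reconstruction rather than a verbatim copy of what we wrote; the scope of ten matches the checks just described):

assert overlap_symmetric {
all a, b : Individual | overlap[a,b] iff overlap[b,a]
}
assert overlap_reflexive {
all a : Individual | overlap[a,a]
}
-- check each assertion in every universe of up to ten Individuals
check overlap_symmetric for 10
check overlap_reflexive for 10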

So I admit I was surprised when Alloy found no examples of the model which violated the assertion that overlap is reflexive.

I have spent several odd hours, over the last couple of days, trying to understand this result. I assumed it was a typo on my part, some error in my definition of overlap_axiom. Perhaps the scoping of the operators was wrong? But no, I can never remember operator priorities reliably, so the version I was testing was already fully parenthesized. Perhaps some weirdness of the syntax rules was tripping me up?

Finally, I sat down and tried to write out a description of a universe in which all the axioms of the Alloy model held but in which overlap was not reflexive. After two failed attempts, I realized that reflexivity follows necessarily from Goodman’s characterization of overlap (i.e. from overlap_axiom). (Hint: in a universe with only one individual a, bind x, y, and z to a and try to make overlap[a,a] false. You won’t. Q.E.D.)

Goodman’s axiom guarantees the symmetry of overlap, too. So we can lose the constraint in the definition of the Individual signature.
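
Concretely, the declaration can shrink back to its first, constraint-free form (the overlap predicate and overlap_axiom stay exactly as given above):

-- no appended signature fact needed: overlap_axiom by itself
-- makes the overlap predicate symmetric and reflexive
sig Individual {
overlaps : set Individual
}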

What I like about this experience is not so much having spent two hours or so looking fruitlessly for a syntax error in my formulation of overlap_axiom; that I could have done without. What I really like is that formalizing Goodman’s concept of overlap in Alloy surprised me by exposing a bug in my understanding. I was convinced that the reflexivity of overlap, as Goodman described it, was a hidden premise smuggled in from our extra-logical understanding of the English word “overlap”. At the cost of about ten minutes’ work with Alloy, I discovered that my conviction was misplaced. It did take me several more hours of thought, over a couple of days, to understand why reflexivity follows from Goodman’s characterization, but that really illustrates a shortcoming in my central nervous system, not one in Alloy.

Any tool that can so effortlessly expose misconceptions about fundamental concepts and thus help clarify our thinking about a design or a theory is a tremendous aid. Why isn’t every Working Group in the W3C and every other standards-development organization working in information technology writing Alloy models to check their understanding of their designs?

Finally, some links: If I have succeeded in making Alloy sound worth looking at, you’ll want to know that Alloy was developed by the Software Design Group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) under the leadership of Daniel Jackson. The Alloy site has tutorials and other ancillary information, as well as the current downloadable version of the Alloy Analyzer.

Baby, you should fix your site

[5 January 2009]

Maybe it was an hallucination, or just an after-effect of seeing the article on T. V. Raman in the Sunday New York Times yesterday. But my evil twin Enrique showed up this evening.

“I’ve written a song,” he said. “You have, have you?” I said, guardedly. “Want to hear it?” he asked. “I doubt it.” But there was no stopping him.

[Before you read this, you may need to be reminded that “WAI” (pronounced “way”) is the W3C’s Web Accessibility Initiative. And “section 508” refers to the section of the U.S. legal code that requires at least some Web sites to be accessible to the disabled. There are analogous requirements in many civilized countries, but the girl in this song appears to be an American.]

I asked a girl what she wanted to do,
And she said “Baby, surf the Web, just like you.
I got some software, it’s readin’ my screen,
but your damn page breaks the whole machine!

Chorus:
“Baby, you should fix your site!
WAI can help you make it right.
Baby, if you fix your site,
Then maybe I’ll love you.”

I told that girl that my pages were good.
And she said “Listen, babe, they don’t do what they should.
You know, there’s rules, section 508
Fix that page, if you want a date!

Chorus:
“Baby, you should fix your site!
WAI can help you make it right.
Baby, if you fix your site,
Then maybe I’ll love you.

[Instrumental.]

Chorus:
“Baby, you should fix your site!
WAI can help you make it right.
Baby, if you fix your site,
Then maybe I’ll love you.”

I asked that girl how to start right away.
And she said, “Baby, do Level A.
You won’t stop there, not if you are smart.
But Level A, baby, now that’s a start!

Chorus:
“Baby, you can make it right.
WAI can help you fix your site!
Baby, if you see the light,
Then, baby, I’ll love you!”

I hate to encourage Enrique in his song-writing career, and especially I hate to encourage further songs using Beatles tunes (if you didn’t recognize “Baby, you can drive my car” here, then you are a lot younger than I am), but well, you know, that girl has got a point.

If your site is not accessible, you really should fix it.

My management asks me to point out that we do not warrant that making your Web site more accessible will actually get you more dates. But it’s still the right thing to do.

… And it don’t look like I’ll ever stop my wandering (travels, part 3)

[4 November 2008]

This is the third in a series of posts about recent travels.

From Mannheim, I traveled to Dublin to visit the Digital Humanities Observatory headed by Susan Schreibman; they stand at the beginning of a three-year project to provide access to Irish work in digital humanities and to prepare the way for long-term preservation. I wish them the best of luck in persuading the individual projects with whom they are to collaborate that the use of appropriate standards is the right way forward.

From Dublin, Susan and I traveled to Trier for <philtag n="7"/>, a small gathering whose name is a macaronic pun involving the German words Philologie (philology), Tag (congress, conference, meeting), and the English word tag. The meeting gathered together a number of interesting people, including several of those most actively interested in computer applications in philology, among them Werner Wegstein, who has organized most of the series, and whom I know from way back as a supporter of the TEI; Andrea Rapp, one of the senior staff at the Trier center of expertise in electronic access and publishing in the humanities; and Fotis Jannidis, currently teaching in Darmstadt and the founder and editor of the annual Computerphilologie, as well as a co-editor of the important electronic edition of the young Goethe. Wegstein is retiring from his chair in Würzburg, thus leading to the creation of a new chair in computational philology, for which both Rapp and Jannidis were on the short list; on the preceding Friday, they had given their trial lectures in Würzburg. Either way, Würzburg will get a worthy successor to Wegstein.

The general topic this year was “Communicating eHumanities: Archives, Textcentres, Portals”, and several of the reports were indeed focused on archives, or text centers, or portals. I spoke about the concept of schema mapping as a way of making it possible to provide a single, simple, unified user interface to heterogeneous collections, while still retaining rich tagging in resources that have it, and providing access to that rich markup through other interfaces. Susan Schreibman spoke about the DHO. Haraldur Bernharðsson of Reykjavík spoke about an electronic edition of the Codex Regius of the Poetic Edda, which cheered me a great deal, since the Edda is dear to my heart and I’m glad to see a well done electronic edition. Heike Neuroth, who is affiliated both with the Max Planck Digital Library in Berlin and with the Lower Saxon State and University Library in Göttingen, spoke on the crucial but underappreciated topic of data curation. (I did notice that many of the papers she cited as talking about the need for long-term preservation of data were published in proprietary formats, which struck me as unfortunate for both practical and symbolic reasons. But data curation is important, even if some who say so are doing a good job of making it harder to curate the data they produce.)

There were a number of other talks, all interesting and useful. But I think the high point of the two days was probably the public lecture by Fotis Jannidis under the title Die digitale Evolution der Kultur oder der bescheidene Beitrag der Geisteswissenschaften zur Virtualisierung der Welt (‘The digital evolution of culture, or the modest contribution of the humanities to the virtualization of the world’). Jannidis took as his point of departure a suggestion by Brewster Kahle that we really should try to digitize all of the artifacts produced till now by human culture, and he refined and augmented Kahle’s back-of-the-envelope calculations about how much information that would involve and how one might go about it. At one point he showed a graphic with representations of books and paintings and buildings and so on in the upper left, and digitizations of them in the upper right, and a little row of circles labeled Standards at the bottom, like the logs on which the stones of the pyramids make their way to the construction site in illustrations in books about ancient Egypt.

It was at about this point that, as already pointed out, he said “Standards are the essential axle grease that makes all of this work.”