An ineffable moment

[2008-12-31+07:00 / 2009-01-01Z]

If we add it all up, the XML Schema Working Group has spent a lot of time, since the beginning of the group, worrying about leap seconds.

XSD 1.0 attempts to accommodate them in its descriptions of the date/time types, but leaves some aspects of behavior unspecified. Accordingly, implementations of XSD 1.0 vary wildly in how carefully they handle leap seconds; not every implementation comes with built-in knowledge of when leap seconds have occurred in the past, and not every implementation enforces the rules which specify that, when they occur in the future, leap seconds will occur only at midnight, UTC, at the end of a month. You can read some things on the Web that seem to imply leap seconds can only occur at the end of June or December, or possibly also March and September, but that’s not what the relevant spec says. The quarter days are to be preferred, but in principle a leap second could occur at the end of any month. You can also read things on the Web that suggest many people are uncertain whether the responsible authorities could in principle insert (or delete) two leap seconds at a time, or even more. I was unsure myself until, some time after I read the relevant specification, it finally dawned on me that the answer is no.

[“The relevant spec?” asked Enrique. “Which is that?” “Well, if you want to be precise, it’s Recommendation ITU-R TF.460-6: Standard-frequency and time-signal emissions, published by the International Telecommunication Union (Geneva: ITU, February 2002). I don’t remember how I managed to acquire a copy.” “And how can you be sure there can never be adjacent leap seconds, if it doesn’t say that flat out?” “What it says is that leap seconds are to be added at the end of some month, UTC, in order to keep UTC within 0.8 seconds of the appropriate solar time measure (maybe UT1 or UT2, but it’s been a while and I forget the details). If after adding two leap seconds, UTC is within 0.8 seconds of solar time, then before they were added it must have been more than 0.8 seconds off. But it’s not allowed to be more than 0.8 seconds off; that’s the point of adding or deleting leap seconds.” “What if there was a large change between January and June?” insisted Enrique. “Then the spec implies that a leap second should be added between January and June. The spec does not limit the insertion of leap seconds to December 31 and June 30, it just says to prefer those dates. I think the implication is pretty clear that if you need to add a leap second at the end of May, you are supposed to do so.”

[“Yeah, but what if the world slowed down by two seconds in the course of a single month? Isn’t that logically possible?” “Logically possible, yeah, I guess so. But astronomically implausible. If the rotation of the earth starts to vary that much, it’s likely to be because a large asteroid just hit us, or something. Under those circumstances, schema-validity is likely to be the least of our worries.” “Well, my point exactly,” said Enrique. “If the world is falling apart, that’s the last time you want your systems to start failing because the schema validator doesn’t like your time stamps. There will be more important things to be worrying about!”]

In developing XSD 1.1, we spent a lot of time trying to nail things down better, but ultimately reached the conclusion that there just was no good way to allow all real leap seconds and only real leap seconds, to handle validation of dateTime values for the future, and to maintain the principle that a document’s schema-validity against a given schema is the same today and in the future; it should not change from day to day depending on decisions made by the managers of Coordinated Universal Time. In the end, we said that XSD 1.1 processors just don’t handle leap seconds at all: the moments in the global time-line which are occupied by leap seconds do not correspond to values in the xsd:dateTime value space.
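To make that decision concrete, here is a minimal sketch in Python of a check on the time-of-day part of a dateTime. The function name and the pattern are illustrative assumptions, a simplified reading of the XSD 1.1 lexical rules rather than a quotation from them; the date, the time zone, and the special end-of-day value 24:00:00 are all left out.

```python
import re

# Simplified pattern for hh:mm:ss(.fraction) in an xsd:dateTime.
# Assumption: seconds run from 00 to 59 only, so a leap second
# (seconds field = 60) has no lexical form at all.
TIME_OF_DAY = re.compile(
    r"([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?"
)

def is_valid_time_of_day(text):
    """Return True if text matches the simplified time-of-day pattern."""
    return TIME_OF_DAY.fullmatch(text) is not None

print(is_valid_time_of_day("23:59:59"))  # True
print(is_valid_time_of_day("23:59:60"))  # False: the leap second is rejected
```

Under this reading, 23:59:60 names a real moment of UTC but has no counterpart in the value space, whatever the date.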

It’s an important principle of schema design (and of the use of other formalisms as well, I think) that in the general case, what the formal notation can express may be only an approximation to the reality you are modeling. Some things may exist without being able to be spoken. Mostly we mean by that that specific rules that apply in a given context may not be expressible in a given formal notation, since the expressive power of the formalism may be hobbled in order to preserve its tractability. It’s nice, I think, that the principle is also instantiated by the dateTime type: there are some moments of UTC time that cannot be captured as values in the dateTime value space.

All this is on my mind, of course, because one of those moments is scheduled to occur today. At midnight UTC. Any moment, now, in fact.

[Pause.]

[“Shouldn’t it be midnight local time?” hissed Enrique. “No, you’re thinking of shifting to and from Daylight Saving Time. Leap seconds are inserted at the same moment all around the globe. Hush, now, don’t spoil it. Just wait and watch.”]

And there it went. Midnight UTC has passed, and the sequence of seconds shown by the applet at http://www.time.gov for Mountain Time went:

  • 16:59:56
  • 16:59:57
  • 16:59:58
  • 16:59:59
  • 16:59:60 [That’s it! That’s it!! “Hey, come look at this!” I wanted to call to my wife.]
  • 17:00:00 [“No, wait, never mind. It’s over already.”]
  • 17:00:01

All of these past weeks, as events in W3C and in the economy and in the world have gone from bad to worse, I’ve been waiting impatiently to shake the dust of this year from my feet, and yearning for 2009 and a new leaf. The new year will surely be hard in many ways, I tell myself, but it cannot be as bad as the year just ending. As far as I can tell, I am not alone; 2009 has a heavy freight of hope and expectations to carry. A heavier freight than it’s fair to ask any year to bear.

So I like the idea that between the old year, so widely and deservingly anathematized, and the new one which carries so much fragile hope, time paused, just for a second, to gather its forces before picking up its burden and marching forward again.

Happy New Year, o my readers. Happy New Year.

Moving a WordPress blog to a new domain

[12 December 2008]

Having just moved this blog from people.w3.org to cmsmcq.com, I think it might be useful (to others, or to me down the road) to record what I did in order to make the move relatively smooth.

I started out thinking that I would have to export all the data from MySQL on people.w3.org (using my handy backup routine), move the resulting mib.sql file to cmsmcq.com, and load it from the command line.

Reading the various documents about moving blogs from one site to another, or within a site, however, I discovered that that wasn’t what anyone recommended. Export from the old site, I read, and then import into the new site. WordPress has developed a handy export format that can be used conveniently for this purpose.

I tried it. Export worked fine, and I edited the resulting XML document to change all occurrences of “http://people.w3.org/~cmsmcq/blog” to “http://cmsmcq.com/mib”. Then I imported the file to the new WordPress installation. Several articles loaded successfully, but by no means all, and those that did load did not have similar query strings in their URIs. Article http://people.w3.org/~cmsmcq/blog/?p=12 might appear as http://cmsmcq.com/mib/?p=3, not as ...?p=12. That’s a pain, because I’d like to redirect from the old locations to the new, so existing references to the blog don’t break. I know I can build a table containing all the URIs of everything in the blog, and map each to the appropriate URI on the new host, but I’d really rather not have to spend time on that.

I never did figure out why only part of the data was loading successfully; deleting spam from the site, and then re-exporting helped some (more of the posts loaded), but I never got everything to load.

So I reconsidered. I made a new SQL dump of the database on the old site, and edited it to change URIs from the old name to the new. (I also deleted the commands to load data into the user and options tables, since I didn’t want to overwrite them. I deleted the Spam Karma 2 tables, too, since my new host has a newer version of WordPress and the existing SK2, which is no longer maintained, may or may not work with it. I’ll install Bad Behavior instead.)
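The URI edit itself is a plain search and replace, which any editor can do. As a minimal sketch in Python, using the old and new addresses given in this post (the function name and the sample line are hypothetical, made up for illustration):

```python
OLD_URL = "http://people.w3.org/~cmsmcq/blog"
NEW_URL = "http://cmsmcq.com/mib"

def rewrite_urls(sql_text, old=OLD_URL, new=NEW_URL):
    """Return the rewritten text and the number of replacements made."""
    count = sql_text.count(old)
    return sql_text.replace(old, new), count

# Hypothetical one-line excerpt standing in for the real dump file.
sample = "INSERT INTO wp_posts VALUES (12, 'See http://people.w3.org/~cmsmcq/blog/?p=3');"
rewritten, n = rewrite_urls(sample)
print(n)          # 1
print(rewritten)  # the same line, now pointing at http://cmsmcq.com/mib/?p=3
```

For a real dump one would read the whole file, apply the function, and write the result under a new name. Reporting the replacement count is a cheap way to confirm the edit touched roughly as many cross references as expected.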

I tried to load this edited SQL dump to the new host by using the Web interface to MySQL provided by phpMyAdmin; it complained about a problem, and after I fixed that the process kept hanging.

So I split the file into smaller pieces, to evade any timeout and data-volume restrictions, and tried importing each in turn. Either the host choked, or my name service went away about this time; I think some of the smaller SQL files were successfully imported, but not all. Tried again the next day, and it hung again.

So I went back to Plan A: I copied the entire edited SQL file to the new host and loaded it in MySQL from the command line — took about five minutes (including the file transfer), if you don’t count the six or eight hours of time I burned trying to follow other people’s directions.

For the sake of keeping the old URIs stable, I then added a .htaccess file to the ~cmsmcq/blog directory on people.w3.org to redirect from the old addresses to the new.

Concisely, what worked best for me was:

  1. Export the data from the old server. I did this with the command mysqldump --verbose --add-drop-table --all --extended-insert --quick --skip-lock-tables --user mysql-userid --password dbname > mib.sql, but it might have been better to export individual tables more selectively.
  2. Edit the mib.sql file, changing the old address (in this case “http://people.w3.org/~cmsmcq/blog”) to the new address (in this case “http://cmsmcq.com/mib”) wherever it occurs (it will occur primarily in cross references from one post to another). Some authorities also recommend doing a global search and replace on your old email address. I also took this opportunity to delete tables I didn’t want in the dump: wp_options, wp_usermeta, wp_users, and the tables used by Spam Karma 2 (RIP). And I modified the wp_ prefix in the table names to match the one provided by my hosting service’s auto-install of WordPress.
  3. Copy the edited SQL dump file (in my case named mib.edited.sql) to the new host.
  4. Invoke MySQL from the command line in the obvious way: mysql -h hostname -u username -p dbname < mib.edited.sql
  5. On the original host, add a .htaccess file to the blog directory (here “~cmsmcq/public_html/blog”) including
    RedirectMatch permanent ^/~cmsmcq/blog/(.*)$ http://cmsmcq.com/mib/$1
    RedirectMatch permanent ^/~cmsmcq/blog$ http://cmsmcq.com/mib
    

No WordPress export/import, no phpMyAdmin, just command line tools. I'm all in favor of Web interfaces and so I think that WordPress export and import, and phpMyAdmin, are great ideas; they just didn't work at all well for me in this situation. But one possible take-home message is: it pays to be comfortable with the command line.
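The two patterns in step 5 can be sanity-checked before they go live. The sketch below is an approximation in Python: its re module stands in for Apache's regular-expression engine, the function name is hypothetical, and only the URL path is modeled (query strings such as ?p=12 are outside the sketch).

```python
import re

# The two patterns from the .htaccess rules, applied to the URL path only.
OLD_PREFIX = re.compile(r"^/~cmsmcq/blog/(.*)$")
OLD_ROOT = re.compile(r"^/~cmsmcq/blog$")
NEW_BASE = "http://cmsmcq.com/mib"

def new_location(path):
    """Return the redirect target for an old path, or None if no rule applies."""
    m = OLD_PREFIX.match(path)
    if m:
        # Carry over whatever followed the old blog prefix.
        return f"{NEW_BASE}/{m.group(1)}"
    if OLD_ROOT.match(path):
        return NEW_BASE
    return None

print(new_location("/~cmsmcq/blog/2008/12/"))  # http://cmsmcq.com/mib/2008/12/
print(new_location("/~cmsmcq/blog"))           # http://cmsmcq.com/mib
print(new_location("/~cmsmcq/other"))          # None
```

Running a handful of known old addresses through a check like this is quicker than discovering a broken redirect from a reader's bug report.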

Participant observation / moving house

[12 December 2008]

Some ill-defined thoughts are occupying my musings.

Some time ago, my colleague Liam Quin decided to include advertising on his site http://www.fromoldbooks.org/, which makes available high-quality scans of public-domain images he finds in … well, in old books. When we have discussed it, he has occasionally observed that one of his goals in doing so is to understand Web technology and Web usage from a slightly different vantage point. I understand him to mean that it is one thing to have a deep factual knowledge about the specifications which undergird and constitute the web, but a different thing to experience them in the process of running a web site. By accepting ads, and experimenting with different advertising programs, and watching his search engine rankings, Liam says, he has learned a good deal.

In a way, it sounds a bit like what one reads about participant observers in introductory anthropology courses. Some kinds of knowledge are more accessible from the inside than from the outside.

A second observation has concerned me for some time. The Semantic Web proposes to use URIs to denote things we want to talk about, and this has the nice side effect that proposals to mint a new term for something are safe from name collisions while still not needing to go through any central registration authority. All of my colleagues at W3C, from Tim Berners-Lee on down, recommend the use of HTTP URIs for such purposes. But new HTTP URIs can be minted, in practice, only by people who own domain names, or who have arrangements with people who own domain names. (It’s a bit like freedom of the press, which guarantees the right of uncensored publication to those who own a press. Fortunately, the Web makes owning a virtual press fairly simple, but it does tend to involve, again, owning a domain name.)

These lines of thought, together with some other considerations that need not concern us here at the moment, have led me to think it’s really high time I moved into the domain-owning classes.

So: We’re moving, or rather, we’ve moved. Messages in a Bottle is now hosted at http://cmsmcq.com/mib instead of the old address on people.w3.org.

I believe all existing references to posts and comments in the old location should be successfully redirected to the same posts and the same comments in the new location; this was a bit harder than it really ought to have been (details in a later post). If any reader finds exceptions or failures, please let me know at the email address whose username is “mib” and whose host name is “cmsmcq.com”.

Six-month retrospective and evaluation

[16 July 2008]

This klog started about six months ago, as an experiment. In an early post, I wrote:

So I’m going to start a six-month experiment in keeping a work log. Think of it, dear reader, as my lab notebook. (I was going to do it starting a year ago, but, well, I didn’t. So I’m going to start now.)

My original plan was to make it accessible only to the W3C Team, so that I could talk about things that probably shouldn’t be discussed in public or in member space. Norm Walsh has blown a hole in that idea by pointing to this log [Hi, Norm!]. So public it is. (Ideally, I’d have a blog in which each item could be marked with an ACL, like resources in W3C date space: Team-only, Member-only, World-readable. Maybe later.)

Next year about June, if I remember, I will evaluate the experiment and decide whether it’s been useful for me or not.

So, as one of my teachers used to say at the beginning of a group evaluation of some student work: what works, what doesn’t work?

Things that don’t work as well as I would like:

  • As might have been predicted, the fact that Messages in a Bottle is public, not private, has encouraged me to be circumspect in ways that fight with the lab-notebook goal. I don’t want to be carelessly rude about colleagues or others in public, the way one can be in private conversations and to their faces. Across a dinner table, one can greet a claim made by a colleague with a straightforward cry of “But that’s bullcrap!” without impeding a useful discussion. (This depends in part on the conversational style cultivated by individuals and groups, of course. But as some readers of this post will know, this is not a speculation but a report.) It doesn’t feel quite right, however, to say in public of something proposed by someone acting in good faith that it’s just bullcrap. You have to spend some time thinking of another way to put it. Enrique comes in handy here, since he will say anything. It has not been proven, however, that Enrique will never piss anyone off.
  • For the same reason, I have not yet found a good way of recording issues and concerns I don’t have good answers for. In a lab notebook, or a private conversation, one can talk more forthrightly about things that are going wrong, or things that have gone wrong, and how to right them. But in public, members of a Working Group, and editors of a specification, do better to accept a sort of cabinet responsibility for the work product. You do the best you can to lead the group to what you believe is the right decision, and then you accept the decision and defend it in public. I have not yet found a way to combine the acceptance of that joint responsibility, and the concomitant need to avoid bad-mouthing decisions one is responsible for defending, on the one hand, with forthright analysis of errors on the other. Sometimes careful phrasing can do the job, but any need for care in phrasing constitutes a tax on the writing of posts about tricky subject matter.
  • So try as I might to keep pushing these posts toward being a work log, the genre keeps pushing back and trying to make them into something like a first-person newspaper column. That’s a fine and worthy thing, and I can’t say I don’t enjoy that genre, but it’s not quite what I was aiming for when I started. As a result, one cannot read back through the archives and get the kind of record one wants in a lab notebook, and I’m not sure Messages in a Bottle is working optimally as a means for me to communicate with myself, or with those I work with most closely.

And on the other side, some things do seem to work.

  • At one level of abstraction, the primary goal of this worklog is to improve communication between me and those I work with. There is some evidence, both in the comments here and in other channels, that some of those I work with do read these postings and find them useful, or at least diverting. I have never bothered to try to check the server logs for hit or visitor counts — my guess, based on my Spam Karma 2 reports, is that humans are strongly outnumbered by spambots among my readers, and I’d just as soon not have that demonstrated in quantitative detail — but it’s clear that more people read these posts at least sporadically than I would ever dream of pestering by sending them email meditations on these topics. If they read these posts and derive any insight from the reading, then this klog would appear to have improved communication at least somewhat.
  • It’s probably not actually a bad thing that I think of this as a public space. It makes me a bit more likely to try to write coherently, to supply relevant context, and to do the other things that help ensure that a communication can be read with understanding by readers distant in time, space, sentiment, or context from the author. If I occasionally indulge in a private joke or two, I hope you will bear with me.
  • It’s easier for me to find records of points of view and analyses that have gone into posts here than to find records kept only in files on my hard disk or on paper shoved into the shelves behind me.
  • So far, no one has complained even about the really boring technical discussions about regular grammars, even though it’s clear some of my readers would rather be reading about Enrique.

In sum, I think I believe the experiment can be adjudged modestly successful, and I will continue it for another six months.

A nicer editing environment

Since starting this worklog, I’ve been experimenting with the screen that WordPress provides for writing blog posts.

It’s good to see every now and then how the other half lives.

And I’m impressed: the blog editing screen WordPress gives you is significantly better than a poke in the eye with a sharp stick, which distinguishes it from some other systems I have experienced.

But, well, not so nice that anyone could plausibly expect me to use it instead of Emacs. So today I decided it was time to get a bit more realistic about how to draft posts, particularly those that are tricky and that I want to get right. That means I need a way to edit in XEmacs, using psgml.el’s XML mode, and conveniently upload to WordPress.

The editing-in-XEmacs part is easy. All I have to do is stop not doing it. What was needed was a stylesheet to translate from the version of TEI Lite I habitually use for writing, into the odd mix of normal XHTML and blank lines for paragraph breaks that WordPress uses. (I should probably experiment with using normal p elements; will WordPress let me use them?)

Answer: yes. So I don’t have to render paragraphs using just blank lines. And I didn’t really need to do a new stylesheet; I could just have used my existing TEI-to-HTML translation. I’ll comfort myself with the thought that the new stylesheet is smaller and has less cruft.
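For readers who want a feel for what such a translation involves, here is a minimal sketch. It is written in Python rather than as an XSLT stylesheet, so it is a stand-in for the idea and not the stylesheet described here. The element mapping is an assumption covering only a handful of TEI Lite elements, and the input is assumed to carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from a few TEI Lite elements to XHTML equivalents.
# Elements not listed here keep their original names.
TAG_MAP = {"p": "p", "emph": "em", "list": "ul", "item": "li"}

def tei_to_xhtml(elem):
    """Return a copy of elem with TEI element names mapped to XHTML ones."""
    if elem.tag == "ref":
        # A TEI ref with a target attribute becomes an XHTML link.
        out = ET.Element("a", {"href": elem.get("target", "")})
    else:
        out = ET.Element(TAG_MAP.get(elem.tag, elem.tag))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(tei_to_xhtml(child))
    return out

def convert(tei_fragment):
    """Convert the children of a wrapper element to an XHTML string."""
    body = ET.fromstring(tei_fragment)
    return "".join(
        ET.tostring(tei_to_xhtml(child), encoding="unicode") for child in body
    )

sample = '<body><p>See <ref target="http://example.org/">this</ref>, <emph>really</emph>.</p></body>'
print(convert(sample))
# <p>See <a href="http://example.org/">this</a>, <em>really</em>.</p>
```

Because paragraphs come out as ordinary p elements, the result can be pasted into the write-post screen as it stands.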

It’s still a little awkward that I have to cut and paste from my Emacs buffer into the WordPress write-post screen; I notice that David Carlisle seems to be able to edit existing posts from within emacs, using T. V. Raman’s g-client package. This makes me jealous, but not jealous enough to drop everything to work on an emacs mode for posts to WordPress. (Everything I haven’t already dropped in order to play with my tei-to-wordpress stylesheet and to write this post, that is.)

When the cut-and-paste gets annoying enough, I’ll take the time to make the stylesheet generate the post in Web-forms submission format, and write a script to upload to the server. But for now, being back in emacs for editing is enough.