Treating our information with the care it deserves

[21-22 July 2008]

I don’t make a habit of recording here all of the interesting, useful, or amusing things I read. But I am quite taken with Steve Pepper’s account of the situation in which many large organizations find themselves. In a blog post devoted to a different topic (the history of Norway’s vote on OOXML), he describes (his understanding of) one organization’s point of view and motivations:

They are a big MS Office user, they participated in TC45 (the Ecma committee responsible for OOXML) and they clearly feel that OOXML is important to them.

I can understand why. An enormous amount of their intellectual capital is tied up in proprietary formats – in particular Excel – that have been owned and controlled by a vendor for the last 20 or so years. StatoilHydro has literally had no way of getting at its own information, short of paying license fees to Microsoft. Recently the company has started to realize the enormity of the mistake it has made in not treating its information with the care and respect it deserves.

As he points out, they are of course not alone in having made this mistake, particularly if one includes other proprietary formats beyond Office, and other vendors than Microsoft.

Several points occur to me:

  • It’s easy for me to feel superior and to lack interest in the problems of converting legacy data: I stopped using proprietary formats about twenty years ago, not very long after I had acquired a personal computer and gained the opportunity to start using them in the first place, and so with very few exceptions pretty much every piece of information I have created over my career is still readable. (A prospective collaboration did collapse once when at the end of a full-day meeting, as we were deciding who would draft what piece of the grant proposal, I asked what DTD we would be using, and my soon-to-be-former prospective collaborators said they had planned to be working in a proprietary word processor.) But feeling superior is not really a useful analysis of the situation.

Enrique: Are you saying that there are no proprietary data formats on mainframes? Whom are you trying to kid?

Me: No, but all my mainframe usage was on university mainframes; we don’t seem to have been able to afford any seriously proprietary software, at least any that was interesting to me. I was mostly doing document preparation, and later on database work. And for a while I maintained the terminal translation tables.

Enrique: The what?

Me: Never mind. There used to be things called terminals, and … Sorry I brought it up.

Enrique: And your databases didn’t use proprietary formats?

Me: Internally, sure. But they could all dump the data in a reusable text file format. I think I translated the Spires dump of my bibliographic data to XML once. Or maybe that was just something that went on the Someday pile.

Enrique: You’re right. Feelings of superiority are not really an adequate analysis of a complex situation. Even if the feelings were justified, which in this case, Bucko, does not seem to be the case.

  • The right solution for these organizations is, perhaps, to move away from such closed systems once and for all, and use semantically richer markup. Certainly that’s where my immediate sympathies lie. It’s not impossible: lots of organizations use surprisingly rich markup for data they care about.
  • But how are they to get there, starting from where they are now? Even if the long-term benefits are substantial (which is close to self-evident for me, but is likely to sound very unproven to any serious organizational IT person), you have to get through the short term in order to reach the long term. So the ideal migration path starts paying off very quickly, even before you’ve gone very far. (Paoli’s Law: if people put five cents of effort in, they want to see a nickel in return, and quickly.) Can there be such a migration path? Or is going cold turkey the only way to go?
  • The desire to get as much benefit for as little work as possible seems to make everyone with a legacy-data problem easy prey for snake-oil salesmen. I don’t see any prospect of this changing, though, ever.

Enrique: Nah. Snake oil, now there’s a growth stock.
