Schmidt on networking

[26 October 2008]

I’ve recently set up a LinkedIn profile for myself, and I’ve been assiduously searching in LinkedIn for old and current friends and colleagues and ‘building my network’ — so I was rather struck by the following slightly sobering remarks in Helmut Schmidt’s recent book Außer Dienst, which I picked up in Germany a couple weeks ago and have been reading at odd moments on airplanes:

Den Ausdruck Netzwerk hat es zu meiner Zeit noch nicht gegeben. Aber natürlich hat man vielfältige persönliche Kontakte geknüpft und sie langfristig aufrechterhalten. Wer sich gegen seine Zeitgenossen abschließt, hat es schwerer, zu abgewogenen Urteilen zu gelangen, als einer, der sich öffnet und Kontakt und Austausch sucht. … Mir wollen weniger jene Netzwerke wichtig erscheinen, welche einem bestimmten Interesse oder der eigenen Karriere dienen, als vielmehr solche, die der geistigen Anregung und dem gedanklichen Austausch förderlich sind.

Or (my translation):

In my day the expression ‘network’ did not yet exist. But of course people made a lot of personal contacts and kept them up over long periods. Anyone who is closed off from contact with one’s contemporaries will find it harder to reach well founded judgements than someone who is open to new contacts and to exchange of views. … Networks of contacts which serve only a particular interest or which are made only in the interest of one’s career seem to me less important than contacts which serve to provide intellectual stimulation and promote the exchange of thoughts.

Of course, Schmidt has things like the Mittwochsgesellschaft (and a Freitagsgesellschaft in Hamburg) in mind, which set an awfully high bar. But still — it makes me wonder not just how well LinkedIn and other social networking sites measure up to these high standards, but how one might use them to pursue the same kinds of intellectual cross-fertilization and mutual education Schmidt describes.

… I’ve been wandering late … (travels, part 2)

[26 October 2008]

This is the second in series of posts recording some of my impressions from recent travels.

After the XSLT meetings described in the previous post, and then a week at home, during which I was distracted by events that don’t need to be described here, I left again in early October for Europe. During the first half of last week [week before last, now], I was in Mannheim attending a workshop on organized by the electronic publications working group of the Union of German Academies of Science. Most of the projects represented were dictionaries of one stripe or another, many of them historical dictionaries (the Thesaurus Linguae Latinae, the Dictionnaire Etymologique de l’Ancien Français, the Deutsches Rechtswörterbuch, the Qumran-Wörterbuch, the Wörterbuch der deutschen Vinzersprache, both an Althochdeutsches Wörterbuch and an Althochdeutsches Etymologisches Wörterbuch, a whole set of dialect dictionaries, and others too numerous to name).

Some of the projects are making very well planned, good use of information technology (the Qumran dictionary in Göttingen sticks particularly in my mind), but many suffer from the huge weight of a paper legacy, or from short-sighted decisions made years ago. I’m sure it seemed like a good idea at the time to standardize on Word 6, and to build the project work flow around a set of Word 6 macros which are thought not to run properly in Word 7 or later versions of Word, and which were built by a clever participant in the project who is now long gone, leaving no one who can maintain or recreate them. But however good an idea it seemed at the time, it was in reality a foolish decision for which project is now paying a price (being stuck in outdated software, without the ability to upgrade, and with increasing difficulty finding support), and for which the academy sponsoring the project, and the future users of the work product, will continue paying for many years to come.

I gave a lecture Monday evening under the title “Standards in der geisteswissenschaftlichen Textdatenverarbeitung: Über die Zukunftssicherung von Sprachdaten”, in which I argued that the IT practice of projects involved with the preservation of our common cultural heritage must attend to a variety of problems that can make their work inaccessible to posterity.

The consistent use of suitably developed and carefully chosen open standards is by far the best way to ensure that the data and tools we create today can still be used by the intended beneficiaries in the future. I ended with a plea for people with suitable backgrounds in the history of language and culture to participate in standardization work, to ensure that the standards developed at W3C or elsewhere provide suitable support for the cultural heritage. The main problem, of course, is that the academy projects are already stretched thin and have no resources to spare for extra work. But I hope that the academies will realize that they have a role to play here, which is directly related to their mission.

It’s best, of course, if appropriate bodies join W3C as members and provide people to serve in Working Groups. More universities, more academies, more user organizations need to join and help guide the Web. (Boeing and Chevron and other user organizations within W3C do a lot for all users of the Web, but there is only so much they can accomplish as single members; they need reinforcements!) But even if an organization does not or cannot join W3C, an individual can have an impact by commenting on draft W3C specifications. All W3C working groups are required by the W3C development process to respond formally to comments received on Last-Call drafts, and to make a good-faith effort to satisfy the originator of the comment, either by doing as suggested or by providing a persuasive rationale for not doing so. (Maybe it’s not a good idea after all, maybe it’s a good idea but conflicts with other good ideas, etc.) It is not unknown for individuals outside the working group (and outside W3C entirely) to have a significant impact on a spec just by commenting on it systematically.

Whether anyone in the academies will take the invitation to heart remains to be seen, though at least a couple of people asked hesitantly after the lecture how much membership dues actually run. So maybe someday …

I’ve been wandering early …

[21 October 2008]

I’ve been traveling a good deal lately.

This is the first in series of posts recording some of my impressions.

In late September the XSL Working Group held a one-week meeting in Prague to work on revisions of XSLT to make it easier to support streaming transformations in XSLT. By streaming, the WG means:

  • The transformation can run in memory independent of document size (sometimes constant memory, sometimes memory proportional to document depth, sometimes memory proportional to the size of discrete windows of data in the document).
  • The transformation can begin delivering results before all of the input is available (e.g. can work on so-called ‘infinite’ XML documents like streams of stock quotations).
  • The transformation can be preformed in a single pass over the document.

It turns out that for different use cases it can be necessary or useful to:

  • declare a particular input document as streamable / to be streamed if possible
  • declare a particular set of templates as streamable
  • declare that particular parts of the document need to be available in full (buffered for random access) for part of the transform, but can then be discarded (e.g. for windowing use cases)

Some members of the WG may have been apprehensive at the thought of five straight days of WG discussions. Would we have enough to do, or would we run out of ideas on Tuesday and spend Wednesday through Friday staring at the floor in embarrassment while the chair urged us to get some work done? (If no one else harbored these fears, I certainly did.) But in fact we had a lively discussion straight through to the end of the week, and made what I think was good progress toward concrete proposals for the spec.

Among the technical ideas with most legs is (I think) the idea that sometimes what you want to do with a particular node in the input tree can actually be done with a partial copy of the input tree, and that different kinds of partial copy may be appropriate in different situations.

If you perform a deep copy of an element node in an XDM data model instance, for example, you have access to the entire subtree rooted at that node, but not to any of its siblings or ancestors, nor to anything else in the tree from which it came. For cases where you wish to (or must) allow access to the subtree rooted in a node, but to nothing else, such a deep copy is ideal: it preserves the information you want to have available, and it makes all other information inaccessible. (This is essentially the way that XSD 1.1 restricts assertions to the subtree rooted in a given node: logically speaking the assertions are evaluated against a copy of the node, not against the node itself.)

Several kinds of copy can be distinguished. In the terminology of the XSL Working Group (using terms introduced mostly by Michael Kay and Mohamed Zergaoui):

  • Y-copy: contains the subtree rooted in the node being copied, and also all of its ancestor nodes and their attributes, but none of their siblings. It is thus shaped like an upside down uppercase Y.
  • Nabla-copy: contains just the subtree rooted in the node being copied. It is thus shaped like an upside-down nabla. (Yes, also like a right-side-up delta, but since the Y-copy requires the Y to be inverted, we say nabla copy not delta copy. Besides, a delta copy would sound more like something used in change management.)
  • Dot-copy: contains just the node being copied, itself, and its attributes if any.
  • Yen-copy: like a Y-copy, but includes the siblings of each ancestor together with their attributres (although not their children).
  • Spanish exclamation-point copy: contains just the node being copied, and its ancestors, together with their attributes. Shaped like an exclamation point (dot, with something above it), or like an upside-down Spanish exclamation point.

I’ve been quite taken recently by one possible application of these ideas outside of the streaming XSLT work. In the current draft, assertions in XSD 1.1 are restricted to / are evaluated against a nabla-copy of the element or attribute being validated, and the conditions used for conditional type assignment are evaluated against a dot copy of the element. These restrictions are painful, especially the latter since it makes it impossible to select the type of an element depending on its xml:lang value (which is inherited from an ancestor if not specified locally). But XSD 1.1 could readily support conditions on the nearest value of xml:lang if conditions were evaluated on a Spanish-exclamation-point copy, instead of on a dot copy, of the element in question. I don’t know whether the XML Schema WG will buy this approach, but the possibility does seem to suggest that there is value in the idea of thinking about things in terms of invariants preserved by different kinds of node copying operations.