<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Messages in a Bottle</title>
	<atom:link href="http://cmsmcq.com/mib/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://cmsmcq.com/mib</link>
	<description>CMSMcQ's klog</description>
	<pubDate>Tue, 24 Aug 2010 01:06:41 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>XForms and XQuery tutorials at TEI members&#8217; meeting</title>
		<link>http://cmsmcq.com/mib/?p=1141</link>
		<comments>http://cmsmcq.com/mib/?p=1141#comments</comments>
		<pubDate>Tue, 24 Aug 2010 01:06:41 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[Conferences]]></category>

		<category><![CDATA[TEI]]></category>

		<category><![CDATA[computing in the humanities]]></category>

		<category><![CDATA[tei2010]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1141</guid>
		<description><![CDATA[[23 August 2010]
The TEI has published a list of workshops to be offered at the TEI Members&#8217; Meeting this November in Zadar, Croatia. 
Together with Syd Bauman of Brown University, I&#8217;m offering two tutorial workshops:  one on XForms and one on XQuery.  Each will last a day and a half, and involve some [...]]]></description>
			<content:encoded><![CDATA[<p>[23 August 2010]</p>
<p>The TEI has published a <a href = "http://ling.unizd.hr/~tei2010/workshops/index.en.html" >list of workshops</a> to be offered at the <a href = "http://ling.unizd.hr/~tei2010/index.en.html" >TEI Members&#8217; Meeting</a> this November in Zadar, Croatia. </p>
<p>Together with Syd Bauman of Brown University, I&#8217;m offering two tutorial workshops:  one on XForms and one on XQuery.  Each will last a day and a half, and involve some talking heads, some group discussion, and as much hands-on work as we can manage.  </p>
<p>There are several other very good workshops on offer:  Norm Walsh on XProc, the TEI@Oxford team on the ODD system, Elena Pierazzo and Malte Rehbein on the encoding of genetic editions, and Andreas Witt et al. on TEI for transcriptions of speech.</p>
<p>The organizers remind me that there is an early-bird discount for those who register before 31 August.  There is some chance that tutorials which fail to attract enough registrations will be canceled, so if you definitely want to come, you definitely want to register early, to help make sure your tutorial has enough registrants to make the cut.</p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1141</wfw:commentRss>
		</item>
		<item>
		<title>Postel&#8217;s Law vs the Whiteboard Marker Rule</title>
		<link>http://cmsmcq.com/mib/?p=1135</link>
		<comments>http://cmsmcq.com/mib/?p=1135#comments</comments>
		<pubDate>Tue, 29 Jun 2010 00:13:35 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[Information modeling]]></category>

		<category><![CDATA[Standards development]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1135</guid>
		<description><![CDATA[[28 June 2010]
Many people have been influenced by what is often called Postel&#8217;s Law, usually quoted as saying &#8220;Be conservative in what you send and liberal in what you accept&#8221;.  
In a room with a whiteboard, however, the only workable rule is:  when you find a marker that no longer writes, do not [...]]]></description>
			<content:encoded><![CDATA[<p>[28 June 2010]</p>
<p>Many people have been influenced by what is often called Postel&#8217;s Law, usually quoted as saying &#8220;Be conservative in what you send and liberal in what you accept&#8221;.  </p>
<p>In a room with a whiteboard, however, the only workable rule is:  when you find a marker that no longer writes, do <em>not</em> be liberal in what you accept, and do <em>not</em> put it back where you got it.  <em>Throw it away.</em>  Otherwise, every whiteboard you run into will soon have twenty-odd markers, some of which actually work, if you can find them, but most of them duds. </p>
<p>Cleaning up as you go is a good principle in cooking and in managing the set of whiteboard markers.  It&#8217;s also a good idea in data management; it&#8217;s a shame so many people misread Postel&#8217;s Law to mean the opposite.</p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1135</wfw:commentRss>
		</item>
		<item>
		<title>Finding accessible color schemes for data graphics:  one problem, four solutions</title>
		<link>http://cmsmcq.com/mib/?p=1113</link>
		<comments>http://cmsmcq.com/mib/?p=1113#comments</comments>
		<pubDate>Sat, 12 Jun 2010 02:09:31 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1113</guid>
		<description><![CDATA[[11 June 2010]
Yesterday I spent a little time writing a stylesheet to make it easier to browse through some potentially complicated and voluminous data.   I ran into a problem, and I found four solutions and learned several lessons, which I record here for those who face similar problems.
[&#8220;What stylesheet was that?&#8221; hissed my [...]]]></description>
			<content:encoded><![CDATA[<p>[11 June 2010]</p>
<p>Yesterday I spent a little time writing a stylesheet to make it easier to browse through some potentially complicated and voluminous data.   I ran into a problem, and I found four solutions and learned several lessons, which I record here for those who face similar problems.</p>
<p style="font-size: 80%;">[&ldquo;What stylesheet was that?&rdquo; hissed my evil twin Enrique.  I don't know why he always whispers when asking these questions, but he does.  &ldquo;It doesn't matter for the point at issue, but since you're wondering, it was a stylesheet for the catalog files used to organize the XSD test suite. If you're curious about it, you can see the stylesheet in use for a <a href="http://www.w3.org/XML/2008/xsdl-exx/substitution-groups-1.1/sg.testSet">small sample catalog file</a>.&rdquo; &ldquo;This is supposed to be easy to browse?&rdquo; &ldquo;Well, click View Source and compare it to the underlying XML.&rdquo; &ldquo;OK, I guess I see your point.  But where's the rest of the test suite?  Don't tell me the XML Schema working group has a test suite with ... lessee, eighteen test cases?&rdquo; &ldquo;Hush.  No, of course not.  But the test suite as a whole is not set up for interactive browsing on the web.  Yet.&rdquo;]</p>
<p style="font-size: 80%;">[&ldquo;And what was the problem?&rdquo; &ldquo;Stop interrupting and maybe you'll find out.&rdquo;]</p>
<p>To make it a little easier to see which test cases involve valid documents and which involve invalid documents <img src="images/colors.v1.png" alt="Extract from catalog display showing lines with colored backgrounds" style="float: left; width: 60%; padding-right: 0.5em;"/>(not to mention documents with [validity] = <code>notKnown</code> and test cases with implementation-defined results), I supplied a little pale <span style="background-color: #DFD;">green</span>, <span style="background-color: #FDD;">pink</span>, <span style="background-color: #DDF;">blue</span>, or <span style="background-color: #DDD;">gray</span> background for the different possible expected results.</p>
<p>So far, so good, but this morning Liam Quin mentioned casually that I might want to consider changing the colors, since the red and green backgrounds would be indistinguishable for some readers.  He was right, of course, and the original color scheme was really just a sort of quick and dirty proof of concept, not intended for final use.</p>
<p style="font-size: 80%;">&ldquo;Boy, do you sound defensive.  Thought you could get away with it, huh?&rdquo; asked Enrique.  &ldquo;Oh, hush.  Like you never take shortcuts with things.&rdquo;</p>
<p>So I tested the color scheme using the incredibly useful image analysis tool at <a href="http://vischeck.com/">VisCheck.com</a>:  <img style="float: right; width: 60%; padding-left: 0.5em;" src="images/colors.v1.deuteranope.jpg" alt="A simulation of deuteranopy (red/green color blindness) on the image given earlier" /> you upload an image, and can perform simulations of deuteranopy (red/green color deficit), protanopy (a different form of red/green deficit), and tritanopy (a blue/yellow color deficit).  And Liam was absolutely right:  the backgrounds for valid and invalid test cases were virtually indistinguishable in the output of the first simulation.</p>
<p>So I set about to fix it, and I learned some things.</p>
<p><b>First solution.</b> First, I taught myself something about how <em>not</em> to go about this task.  Using the helpful Color Sphere widget I downloaded some time ago from <a href="http://www.colorjack.com">colorjack.com</a>, I found a set of three colors that were pretty well distinct in all the different filters it provides: a <span style="background-color: #BBFF6D;">green</span>, a <span style="background-color: #FF6DBB;">red</span>, and a <span style="background-color: #6DBBFF;">blue</span>.  They were too intense to use as background colors, so I spent some time with a color picker finding RGB values that matched those hues <img src="images/colors.v2a.png" alt="Screen shot fragment showing one version of the color scheme" style="width: 20%; float: left; padding-right: 0.5em;" />but had less saturation (select the RGB sliders pane, convert from hex to decimal because the widget doesn&#8217;t understand hex; change to the HSB pane, slide the saturation level down a bit, go back to the RGB pane, convert the decimal numbers back to hex, write them down) and <img src="images/colors.v2b.png" alt="Another version of the color scheme"  style="width: 15%; float: right; padding-left: 0.5em;"/> checking the resulting color scheme with VisCheck (update the stylesheet, refresh the page in the browser, take a screen snapshot of part of the page showing all the colors, go to VisCheck, upload, perform deuteranopy simulation, whoops, that won&#8217;t work, go back and change one of the colors, come back, upload, perform deuteranopy simulation, upload, perform protanopy simulation, upload, perform tritanopy simulation).  </p>
<p>I&#8217;m very grateful for these tools, but sometimes I do wish it were more convenient to run all three filters on the input.   </p>
<p>After a couple of passes and errors and false starts (which means: after an hour and a half or so), I had a color scheme <img src="images/colors.v2c.png" alt="Yet another version of the color scheme" style="float: left; width: 20%; padding-right: 0.5em;"/> that worked for all the vision types I was checking.</p>
<p>The only problem was that I then decided the green was too virulent and intense.</p>
<p>No problem, I thought:  <img src="images/colors.v3.png" alt="Another version of the color scheme"  style="width: 20%; float: right; padding-left: 0.5em;"/> these three hues are good for all three types of color deficit, I&#8217;ll just lower the saturation to make them less intense, and I&#8217;m good to go.   Wrong:  on one of the simulations, there was no longer any distinction between the pink and the gray.  And the green still bothered me.</p>
<p>So I learned an important lesson:  <strong>In trying to make colors for data graphics accessible to all types of vision, it&#8217;s not just hue that counts:  hue and saturation interact with the different deficits.</strong></p>
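<p>The manual round trip described above (hex to decimal RGB, over to HSB, lower the saturation, back to hex) can be sketched in a few lines of Python.  This is only an illustration using the standard <code>colorsys</code> module, not the procedure actually followed in this post; the function name <code>desaturate</code> and the factor 0.5 are assumptions made for the example, and the output still has to be checked with a color-deficit simulation.</p>

```python
import colorsys

def desaturate(hex_color: str, factor: float) -> str:
    """Scale the HSB (HSV) saturation of a '#RRGGBB' color by factor (0.0 to 1.0)."""
    # Hex digits to RGB components in the range 0.0 to 1.0
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Keep hue and brightness, lower only the saturation
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s * factor, v)
    return "#{:02X}{:02X}{:02X}".format(round(r2 * 255), round(g2 * 255), round(b2 * 255))

# The three intense colors mentioned in the text; 0.5 is an arbitrary factor
for name, color in [("green", "#BBFF6D"), ("red", "#FF6DBB"), ("blue", "#6DBBFF")]:
    print(name, color, "->", desaturate(color, 0.5))
```

<p>Because hue and saturation interact with the different deficits, a scheme that passes at one saturation may fail at another, so each factor tried needs its own check.</p>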
<p><b>Second solution.</b>  I went back to the colorjack.com Color Sphere widget on my system, and noticed that I could control not just the hue of the primary color but its saturation as well:  as you move toward the center of the color circle, saturation goes down. <img src="images/colors.v4.png" alt="A new color scheme" style="float: left; width: 20%; padding-right: 0.5em;"/> And as you move things around with the deuteranopy filter set, you can see for yourself that the same three hues can be distinct at one saturation and not at another.  So I got another color scheme that worked.</p>
<p>Second lesson:  if you know in advance that you want the colors to be unsaturated and unobtrusive, use the ability of the color picker / color scheme generator you&#8217;re working with, to get close to the saturation and brightness you need.  The color picker is better at these kinds of conversions than you are.</p>
<p>The only problem was that as I continued working with the data I grew to like this new color scheme less and less.  The green wasn&#8217;t virulent any more, but it also wasn&#8217;t green; more a greenish yellow.  </p>
<p>So I gave up on the associations green = go = valid outcome, red = stop = invalid outcome.  All that matters is that the colors be distinct; they don&#8217;t need to be particularly mnemonic (the distinction is, after all, also carried by the text).  <img src="images/colors.v5.png" alt="A new color scheme" style="float: right; width: 20%; padding-left: 0.5em;"/> So back to the color picker, once or twice.  It gave me schemes that worked in the sense of being visually distinct for normal vision, deuteranopy, etc., but &#8230; well, I didn&#8217;t much like any of them visually.</p>
<p><b>Third solution.</b>  I remembered that some time ago, I had found <a href="?p=652">a nice monochromatic color scheme</a> for the XSD type hierarchy diagram, <img src="http://www.w3.org/TR/xmlschema11-2/type-hierarchy-200901.svg" alt="Much-reduced image showing the XSD type hierarchy, with a color scheme in various shades of blue" style="float: left; width: 25%; padding-right: 0.5em;"/> which I had concluded was in fact more attractive than the polychrome color scheme we had started with.  So I copied that, and that&#8217;s the scheme in use in the stylesheet now.</p>
<p>When I did that work on the type hierarchy diagram, a friend (not Enrique, I promise) asked me &ldquo;So, can you write down an accessible color scheme I can use whenever I need one, so I don&#8217;t have to go through all this hassle of testing things and experimenting and changing things?&rdquo;  I told him no, I didn&#8217;t think so:  too much depends on the information you&#8217;re trying to convey.</p>
<p>But I think one useful general rule can be formulated.  If you are looking for a coherent color scheme for data graphics, and the <em>only</em> required function of the colors is that they be visually distinct from each other both for normal vision and for the various color deficits, then one very simple approach is to go to a color scheme generator like the one at <a href="http://colorschemedesigner.com/#">colorschemedesigner.com</a>, pick a hue, and generate a <em>monochromatic</em> color scheme (hmm, not all color scheme generators have this as an option).  You should definitely use the color-deficit simulations on the result, just to make sure, but virtually all the variation among the colors of a monochromatic scheme is variation in saturation and brightness, not in hue, so the chances are much greater that the variation will be perceptible to all types of vision.</p>
<p>So:  third lesson.  <strong>Try a monochromatic color scheme.</strong></p>
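<p>For readers without a scheme generator to hand, the idea of a monochromatic scheme (one hue, varying only saturation and brightness) can be sketched as follows.  The hue of 210 degrees and the four saturation/brightness pairs are arbitrary assumptions chosen for illustration, not the values used in the stylesheet, and the result should still be run through the color-deficit simulations.</p>

```python
import colorsys

def monochromatic(hue_degrees, steps):
    """Return '#RRGGBB' strings that share one hue.

    steps is a list of (saturation, brightness) pairs, each in the range 0.0 to 1.0.
    """
    colors = []
    for saturation, brightness in steps:
        r, g, b = colorsys.hsv_to_rgb(hue_degrees / 360.0, saturation, brightness)
        colors.append("#{:02X}{:02X}{:02X}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors

# Hypothetical pale-to-darker blue scheme for four expected-result categories
scheme = monochromatic(210, [(0.10, 1.00), (0.25, 0.95), (0.40, 0.85), (0.55, 0.70)])
print(scheme)
```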
<p><b>Fourth solution.</b> Another fairly quick and easy solution is to use the <a href="http://www.vischeck.com/daltonize/">Daltonize</a> tool at VisCheck.com, which takes the visual information in an image (in the case of coloring for data graphics, this means the information conveyed by distinctions of color) and makes it more readily perceptible to color-blind viewers by increasing the red/green contrast and by converting information conveyed by red/green distinctions into variations in brightness and on the blue/yellow axis.  If you are working with something like false-color photography, where there are many variations in color and tone, things like color-scheme generators are not going to help you; Daltonize is a very cool tool for those applications.</p>
<p><b>Disclaimer:</b> I know enough accessibility specialists, and enough people with good graphical skills, to know that I am neither one nor the other.  I lay no claim to particular expertise in accessibility issues, only to a firm belief that they are important.  I think in fact that they are too important to be the concern only of accessibility specialists, just as design is too important to be left only to designers:  <em>every</em> maker of Web pages needs to think about the visual communication of information, and about making that information accessible to all the members of your audience.  When you do, you&#8217;ll appreciate the contributions and the expertise of specialists all the more.  (They will remind you, for example, that color-blindness is just one barrier to accessibility, among many.  It just happens to be one that lends itself to pretty, colorful illustrations.)  </p>
<p>Only when they are accessible to everyone who may need them will your web pages, and The Web, achieve their full potential. </p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1113</wfw:commentRss>
		</item>
		<item>
		<title>Boomerangs, bad pennies, encores</title>
		<link>http://cmsmcq.com/mib/?p=1110</link>
		<comments>http://cmsmcq.com/mib/?p=1110#comments</comments>
		<pubDate>Thu, 10 Jun 2010 20:29:32 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1110</guid>
		<description><![CDATA[[10 June 2010]
As of 1 June, W3C is paying for a quarter of my time, to work with the W3C XML Schema working group.  Given the current state of XSD 1.1 (the working group is mostly waiting for implementations to be completed before progressing it to Proposed Recommendation), my time will mostly be devoted [...]]]></description>
			<content:encoded><![CDATA[<p>[10 June 2010]</p>
<p>As of 1 June, W3C is paying for a quarter of my time, to work with the W3C XML Schema working group.  Given the current state of XSD 1.1 (the working group is mostly waiting for implementations to be completed before progressing it to Proposed Recommendation), my time will mostly be devoted to work on the XSD 1.1 test suite and whatever else can be done to smooth the path of the implementors.</p>
<p>In the working group, the air has (predictably) been full of boomerang references, Thomas Wolfe quotations, and Terminator jokes, with the occasional mention of encore performances.  I leave the imagination of the scene as an exercise to the reader.</p>
<p>&#x201C;Encore performances?&#x201D; sneered my evil twin Enrique, when he read this over my shoulder.  &#x201C;I&#8217;ll tell you about encore performances!&#x201D; Long ago, he said, he attended a recital where the pianist&#8217;s performance of a very difficult piece was greeted by thunderous applause. The astonished pianist had not prepared an encore, so as an encore he simply played the final piece over again. And the second time, said Enrique, the audience was still wildly enthusiastic. There were further cries of &#x201C;Encore, encore!&#x201D;, mixed with others of &#x201C;Again! Again!&#x201D; And so he played the piece yet again.</p>
<p>After the third rendition the audience was still calling for more, but Enrique swears he heard the man behind him shouting, above the din, &#x201C;Again!  Again! And you&#8217;re gonna <em>keep</em> playing it, until you get it right!&#x201D;</p>
<p> [Thank you, Enrique, for that vote of confidence. I agree, at  least, that it's worth while to try to get a spec right.]</p>
<p> Three quarters of my time will continue to be spent on consulting and contract work, with a focus on using standards and descriptive markup to help make digital information more widely useful and give it longer life.  Just as it&#8217;s best to combine theory and practice, if possible, so also it&#8217;s helpful to combine standards work with work on practical applications.  For example:  I just finished a project together with an archival collection at a major U.S. university; they are encoding their finding aids in XML using the Encoded Archival Description, and they are publishing the finding aids on the Web, in XML, with XSLT stylesheets to display them nicely for humans.  Because they don&#8217;t have a separate EAD-to-HTML translation step, their workflow is simpler:  they update a collection record in their archival management system, save the collection description as XML, copy it to their Web server, and it&#8217;s published.  Their finding-aids site is very cool. </p>
<p>In other words, they are using the Web and XML in just the ways their originators hoped they could be used: for sharing semantically rich information.  It&#8217;s a great pleasure to work on exploiting the open standards W3C has produced, and I look forward to helping produce more of them.</p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1110</wfw:commentRss>
		</item>
		<item>
		<title>Aquamacs, XEmacs, and psgml</title>
		<link>http://cmsmcq.com/mib/?p=1104</link>
		<comments>http://cmsmcq.com/mib/?p=1104#comments</comments>
		<pubDate>Mon, 26 Apr 2010 17:47:22 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1104</guid>
		<description><![CDATA[[26 April 2010]
The other day I thought perhaps it was time to try Aquamacs again, a nice, actively maintained port of FSF Emacs to Mac OS X.  I&#8217;ve been using a copy of XEmacs I compiled myself years ago, with Andrew Choi&#8217;s Carbon XEmacs patches, but recently it has accumulated some problems I don&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>[26 April 2010]</p>
<p>The other day I thought perhaps it was time to try <a href="http://aquamacs.org/">Aquamacs</a> again, a nice, actively maintained port of FSF Emacs to Mac OS X.  I&#8217;ve been using a copy of XEmacs I compiled myself years ago, with Andrew Choi&#8217;s <a href="http://members.shaw.ca/akochoi-xemacs/">Carbon XEmacs</a> patches, but recently it has accumulated some problems I don&#8217;t have the patience to diagnose.</p>
<p>Got Lennart Staflin&#8217;s <a href="http://www.lysator.liu.se/projects/about_psgml.html">psgml</a> package (a major mode for SGML and XML documents) installed and compiled; this is a prerequisite for any Emacs being habitable for me.  (Why is package management such a dirty word in FSF Emacs, by the way?)  And discovered that psgml takes about 50% longer in some tests (9 seconds vs 14 seconds) to parse one of my larger documents in Aquamacs than in XEmacs.  In other tests, it was 9 seconds vs. 90 seconds (or so &mdash; I kept getting bored and losing count between sixty and eighty seconds).</p>
<p>I may not be leaving XEmacs after all (undiagnosed problems or no undiagnosed problems).</p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1104</wfw:commentRss>
		</item>
		<item>
		<title>Counting down to Balisage paper submission deadline</title>
		<link>http://cmsmcq.com/mib/?p=1100</link>
		<comments>http://cmsmcq.com/mib/?p=1100#comments</comments>
		<pubDate>Fri, 09 Apr 2010 18:40:56 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[Balisage]]></category>

		<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1100</guid>
		<description><![CDATA[[9 April 2010]
Just a week to go before paper submissions for Balisage 2010 are due.  Time for the procrastinators, the delayers, the ma&#241;ana-sayers to buckle down and get their papers written and submitted.
Speaking of which, I have some work to do now K thx bye.
]]></description>
			<content:encoded><![CDATA[<p>[9 April 2010]</p>
<p>Just a week to go before paper submissions for <a href="http://balisage.net/">Balisage 2010</a> are due.  Time for the procrastinators, the delayers, the ma&ntilde;ana-sayers to buckle down and get their papers written and submitted.</p>
<p>Speaking of which, I have some work to do now K thx bye.</p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1100</wfw:commentRss>
		</item>
		<item>
		<title>There must be fifty ways &#8230;</title>
		<link>http://cmsmcq.com/mib/?p=1090</link>
		<comments>http://cmsmcq.com/mib/?p=1090#comments</comments>
		<pubDate>Wed, 07 Apr 2010 16:26:57 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[XPath]]></category>

		<category><![CDATA[formal methods]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1090</guid>
		<description><![CDATA[[7 April 2010]
The old Paul Simon song &#8220;There must be fifty ways to leave your lover&#8221; keeps running through my head.  I can see close to fifty ways to define the XPath 1.0 data model in terms of (a) a set of nodes and (b) two relations defined on that set, which are taken [...]]]></description>
			<content:encoded><![CDATA[<p>[7 April 2010]</p>
<p>The old Paul Simon song &ldquo;There must be fifty ways to leave your lover&rdquo; keeps running through my head.  I can see close to fifty ways to define the XPath 1.0 data model in terms of (a) a set of nodes and (b) two relations defined on that set, which are taken as primitive; all other relations (i.e. all the other axes of XPath) are defined in terms of those two primitive relations.</p>
<p>Strictly speaking, I make it forty-eight ways.  First, pick any single relation from any of the following three groups:</p>
<ol>
<li>parent, child, ancestor, descendant</li>
<li>prevsib, nextsib, preceding-sibling, following-sibling</li>
<li>prevnode, nextnode, document-order-preceding (&gt;&gt;), document-order-following (&lt;&lt;)</li>
</ol>
<p>That&#8217;s twelve possibilities.</p>
<p>Second, pick any single relation from either of the other two groups; that makes eight possible choices, times twelve first choices, for ninety-six ordered pairs of relations. But the order doesn&#8217;t matter, so we have forty-eight distinct pairs.</p>
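<p>The count can be checked mechanically.  The sketch below enumerates every unordered pair of relations drawn from two different groups; the group labels (&ldquo;vertical&rdquo;, &ldquo;sibling&rdquo;, &ldquo;document order&rdquo;) are names invented for this example and do not appear in the list above.</p>

```python
from itertools import combinations

# The three groups of relations; the dictionary keys are hypothetical labels
groups = {
    "vertical": ["parent", "child", "ancestor", "descendant"],
    "sibling": ["prevsib", "nextsib", "preceding-sibling", "following-sibling"],
    "document order": ["prevnode", "nextnode",
                       "document-order-preceding", "document-order-following"],
}

# One relation from each of two different groups, order ignored
pairs = [
    (a, b)
    for (_, first), (_, second) in combinations(groups.items(), 2)
    for a in first
    for b in second
]
print(len(pairs))  # 48
```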
<p>In recent days, taking some pairs not quite at random, defining the constraints they must satisfy in order to be a suitable basis for defining an XPath 1.0 tree, and defining all the other relations in terms of the chosen primitives, I have learned a couple of mildly interesting things.</p>
<ul>
<li>It&#8217;s more convenient to take parent as a primitive than to take child.</li>
<li>It&#8217;s more convenient to take one of the single-step relations (parent, child, nextsib, prevsib, nextnode, prevnode) than one of their transitive closures (ancestor, descendant, etc.). </li>
</ul>
<p>The nextsib relation, for example, needs to be acyclic, functional, injective, and not transitive.  If it is, then its transitive closure following-sibling will automatically be suitable.  But if you start with following-sibling and specify (as you will need to) that it is transitive and acyclic, that does not suffice to guarantee that its transitive reduction nextsib is functional and injective.  You can of course simply say that a following-sibling relation is suitable if and only if (a) it&#8217;s transitive, (b) it&#8217;s acyclic, and (c) its transitive reduction is functional and injective, but now you&#8217;re forcing the reader to work with two relations, not just one:  both following-sibling and its transitive reduction.  It would be interesting either to find a way to constrain the closure directly to ensure the necessary properties in the reduction, or else to find a proof that there is no way to constrain a closure to ensure that its reduction is functional and injective without explicitly referring to the reduction.</p>
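<p>A small sketch may make the asymmetry concrete.  It models a relation as a set of pairs and checks the properties named above; all function names and the sample node names <code>a</code>, <code>b</code>, <code>c</code> are assumptions made for illustration.  The second example is a relation that is transitive and acyclic but whose transitive reduction is not functional, so it could not serve as following-sibling.</p>

```python
def is_functional(rel):
    """No element is related to more than one target."""
    sources = [x for x, _ in rel]
    return len(sources) == len(set(sources))

def is_injective(rel):
    """No element is the target of more than one source."""
    targets = [y for _, y in rel]
    return len(targets) == len(set(targets))

def transitive_closure(rel):
    closure = set(rel)
    while True:
        new = {(x, w) for x, y in closure for z, w in closure if y == z}
        if new.issubset(closure):
            return closure
        closure |= new

def is_transitive(rel):
    return transitive_closure(rel) == set(rel)

def is_acyclic(rel):
    return all(x != y for x, y in transitive_closure(rel))

def transitive_reduction(rel):
    """For a transitive, acyclic relation: drop every pair implied by two others."""
    mids = {y for _, y in rel}
    return {(x, z) for x, z in rel
            if not any((x, m) in rel and (m, z) in rel for m in mids)}

# A well-behaved nextsib: its closure is a suitable following-sibling relation
nextsib = {("a", "b"), ("b", "c")}
following_sibling = transitive_closure(nextsib)

# Transitive and acyclic, yet "a" would have two next siblings
bad = {("a", "b"), ("a", "c")}
print(is_transitive(bad), is_acyclic(bad), is_functional(transitive_reduction(bad)))
```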
<ul>
<li>From any relation in any group, the other relations in that group are (relatively) easy to derive in terms of inversion, transitive closure, or transitive reduction.  Defining a relation in the third group typically proves more interesting.  And while it&#8217;s more convenient to choose the primitive relations from among the reductions, it turns out that at least in some cases it&#8217;s easiest to define the third group in terms of one of the closures.  For example, given the parent and next-sibling relations, it proves easier to define document-order-following in terms of the primitives than to define next-node.</li>
</ul>
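<p>As an illustration of that last point, here is one possible way (not necessarily the formulation worked out for this post) to derive document order from the parent and next-sibling primitives:  x precedes y if x is an ancestor of y, or if some ancestor-or-self of x has a following sibling that is an ancestor-or-self of y.  The dictionary representation and the five-node sample tree are assumptions made for the example; attributes and namespace nodes are ignored.</p>

```python
def ancestors_or_self(node, parent):
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def following_siblings(node, nextsib):
    found = []
    while node in nextsib:
        node = nextsib[node]
        found.append(node)
    return found

def precedes(x, y, parent, nextsib):
    """True if x comes before y in document order."""
    if x == y:
        return False
    if x in ancestors_or_self(y, parent):
        return True  # an ancestor precedes all of its descendants
    targets = set(ancestors_or_self(y, parent))
    return any(sibling in targets
               for a in ancestors_or_self(x, parent)
               for sibling in following_siblings(a, nextsib))

# Hypothetical tree: r has children a and b; a has children a1 and a2
parent = {"a": "r", "b": "r", "a1": "a", "a2": "a"}
nextsib = {"a": "b", "a1": "a2"}
nodes = ["b", "a2", "r", "a1", "a"]

# Sort each node by how many nodes precede it
ordered = sorted(nodes, key=lambda n: sum(precedes(m, n, parent, nextsib) for m in nodes))
print(ordered)
```

<p>Defining next-node from the same primitives needs a case analysis (first child if any, else the next sibling, else the next sibling of the nearest ancestor that has one), which is one way to see why the closure is the easier of the two to state.</p>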
<p>It occurs to me to wonder whether there are ways to define XPath 1.0 trees that don&#8217;t reduce to or include one of these forty-eight.</p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1090</wfw:commentRss>
		</item>
		<item>
		<title>One way to define the XPath data model</title>
		<link>http://cmsmcq.com/mib/?p=1084</link>
		<comments>http://cmsmcq.com/mib/?p=1084#comments</comments>
		<pubDate>Tue, 06 Apr 2010 20:52:42 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[XML]]></category>

		<category><![CDATA[XPath]]></category>

		<category><![CDATA[formal methods]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1084</guid>
		<description><![CDATA[[6 April 2010; addenda and copy editing 7-8 April 2010]
After discovering earlier this year that the definition of the XPath 1.0 data model falls short of the goal of guaranteeing the desired properties to all instances of the data model, I&#8217;ve been spending some time experimenting with alternative definitions, trying to see what must be [...]]]></description>
			<content:encoded><![CDATA[<p>[6 April 2010; addenda and copy editing 7-8 April 2010]</p>
<p>After discovering earlier this year that the definition of the XPath 1.0 data model falls short of the goal of guaranteeing the desired properties to all instances of the data model, I&#8217;ve been spending some time experimenting with alternative definitions, trying to see what must be specified <em>a priori</em> and what properties can be left to follow from others.</p>
<p>It&#8217;s no particular surprise that the data model can be defined in a variety of different ways.  I&#8217;ve worked out three with a certain degree of precision.  Here is one, which is not the usual way of defining things.  For simplicity, it ignores attributes and namespace nodes; it&#8217;s easy enough to add them in once the foundations are a bit firmer.</p>
<p>Assume a non-empty finite set S and two binary relations R and Q on S, with the following properties <span style="font-style: italic;">[Some constraints are shown here as deleted:  they were included in the first version of this list but later proved to be redundant; see below]</span> :</p>
<ol>
<li>R is <del>functional,</del> acyclic<del>, and injective (i.e. for any x and y, R(x) = R(y) implies x = y)</del>.</li>
<li>There is <del>exactly one member of S which is not in the domain of R (i.e. R(e) has no value), and</del> exactly one which is not in the range of R (i.e. there is one element e such that for no element f do we have e = R(f)).</li>
<li>Q is transitive<del> and acyclic</del>.</li>
<li>The transitive reduction of Q is <del>functional and</del> injective.</li>
<li>It will be observed that R essentially places the elements of S in a sequence without duplicates. For all elements e, f, g, h of S, if Q includes the pairs (e, f) and (g, h) and if g falls between e and f in the sequence defined by R (or, more formally, if the transitive closure of R contains the pairs (e, f), (e, g), and (g, f)), then h also falls between e and f in that sequence.</li>
<li>The transitive closure of the inverse of R (i.e. R<sup>-1</sup>*) contains Q as a subset.</li>
<li>The single element of S which is not in the domain of R is also neither in the domain nor the range of Q.</li>
</ol>
<p>It turns out that whenever two relations R and Q defined on some set S satisfy these constraints, we have an instance of the XPath 1.0 data model.  The nodes in the model instance, the axes defining their interrelations, and so on can all be defined in terms of S, R, and Q.</p>
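<p>As a rough illustration of the vocabulary used in the list above, here is a minimal Python sketch that treats a relation as a set of pairs and tests a few of the named properties.  The function names and the four-element example are assumptions made for illustration; the sketch does not encode the full list of constraints, and it is not the formalization described in this post.</p>

```python
from itertools import product

def transitive_closure(rel):
    """Return the positive transitive closure of a set of (x, y) pairs."""
    closure = set(rel)
    while True:
        new_pairs = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new_pairs.issubset(closure):
            return closure
        closure |= new_pairs

def is_acyclic(rel):
    """True if no element reaches itself in one or more steps."""
    return all(a != b for (a, b) in transitive_closure(rel))

def is_functional(rel):
    """True if each element is related to at most one element."""
    sources = [a for (a, _) in rel]
    return len(sources) == len(set(sources))

def is_injective(rel):
    """True if no two distinct elements are related to the same element."""
    targets = [b for (_, b) in rel]
    return len(targets) == len(set(targets))

def is_transitive(rel):
    """True if the relation already equals its own transitive closure."""
    return transitive_closure(rel) == set(rel)

# Hypothetical example: R arranges four elements in a chain, (x, R(x)).
S = {"a", "b", "c", "d"}
R = {("a", "b"), ("b", "c"), ("c", "d")}

# Elements of S that never appear as a value of R (i.e. not in its range).
not_in_range = S - {b for (_, b) in R}
```

<p>On this example R is acyclic, functional, and injective, and exactly one element of S falls outside its range, which is the shape that constraints 1 and 2 ask for.</p>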
<p>For the moment, I&#8217;ll leave the details as an exercise for the reader.  (I also realize, as I&#8217;m about to click &#8220;Publish&#8221;, that I have not actually checked to see whether the set of constraints given above is minimal.  I started with a short list and added constraints until S, R, and Q sufficed to determine a unique data model instance, but I have not checked to see whether any of the later additions rendered any of the earlier additions unnecessary.  So points for any reader who identifies redundant constraints in the list given above.)</p>
<p style="font-style: italic;">[When I did check for minimality, it turned out that several of the constraints included in the list above are redundant.  The fact that relation R is functional and injective, for example, follows from the others shown.  Actually it follows from a subset of them.  The deletions above show one way of reducing the number of <i>a priori</i> constraints:  they all follow from the others and can be dropped.  None of the remaining items follows from the others; if any of them are deleted, the constraints no longer suffice to ensure the properties required by XPath.]</p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1084</wfw:commentRss>
		</item>
		<item>
		<title>Alloy as logical desk calculator</title>
		<link>http://cmsmcq.com/mib/?p=1073</link>
		<comments>http://cmsmcq.com/mib/?p=1073#comments</comments>
		<pubDate>Sat, 27 Mar 2010 02:37:59 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[formal methods]]></category>

		<category><![CDATA[Alloy]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1073</guid>
		<description><![CDATA[[26 March 2010]
Long ago I used a wonderful file-oriented database system called Watfile, which was designed as a sort of desk-calculator for data.  It was designed for personal use, not industrial-strength data management, and its designers successfully resisted the temptation to add more features and more power at the cost of a more complex [...]]]></description>
			<content:encoded><![CDATA[<p>[26 March 2010]</p>
<p>Long ago I used a wonderful file-oriented database system called Watfile, which was designed as a sort of desk-calculator for data.  It was designed for personal use, not industrial-strength data management, and its designers successfully resisted the temptation to add more features and more power at the cost of a more complex user interface. Watfile was to a full enterprise-class database as a desk calculator of the 1960s was to &#8230; oh, perhaps to Fortran.  For suitable problems, the ease of setup far outweighed any considerations of power or completeness.</p>
<p>The experience of using Watfile for data manipulation tasks established in my mind the class of &#8216;desk-calculator-like&#8217; packages for various kinds of problem.</p>
<p>Today I experimented with Alloy as a sort of logical desk calculator, and I&#8217;m happy to report that it passed the test with flying colors.</p>
<p>For reasons I won&#8217;t go into here, I&#8217;ve wondered a bit recently what it might look like to apply the technique of distinctive-feature analysis (originally developed for phonological descriptions of sound systems of language) to writing systems.  When I sat down a few months ago with pencil and paper to see if I could devise a smallish set of typographic features which could (say) distinguish the twenty-six letters of the alphabet as I was taught it in first grade, I rapidly found that working solely with pen and paper made me impatient:  it was too tedious to look at the set of features already identified and see which letters could not yet be distinguished (because they had the same value for all the features in question).  </p>
<p>When I came back to this problem this afternoon, I thought for a few minutes about what questions I&#8217;d like to be able to ask the machine.  Given a specified set of graphemes (as a first exercise, I chose the lower-case alphabet) and a specified set of binary features (does the letter have an ascender?  a descender?  a vertical stroke?  a full or partial circle or bowl? Is the stroke to the left of the bowl? &#8230;), with information about which graphemes have the feature in question, I want to be able to ask, first, whether a particular set of features suffices to distinguish each individual grapheme.  Or are there two or more graphemes which have the same value for all features in the set? And of course, if there are such sets of indistinct graphemes, what are they?</p>
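<p>The questions above can be sketched in a few lines of Python (not the Alloy actually used for this experiment).  The feature names, the feature assignments, and the six-letter grapheme set below are assumptions chosen purely for illustration; they are not the features in the files linked at the end of this post.</p>

```python
from collections import defaultdict

# Hypothetical feature assignments: each feature maps to the set of
# graphemes that have it.  Illustrative only.
FEATURES = {
    "ascender": {"b", "d"},
    "descender": {"p", "q"},
    "stroke_left_of_bowl": {"b", "p"},
}
GRAPHEMES = ["b", "d", "p", "q", "o", "c"]

def signature(grapheme, feature_names):
    """The tuple of feature values for one grapheme."""
    return tuple(grapheme in FEATURES[name] for name in feature_names)

def indistinct_groups(graphemes, feature_names):
    """Groups of two or more graphemes with identical feature values."""
    groups = defaultdict(list)
    for g in graphemes:
        groups[signature(g, feature_names)].append(g)
    return [members for members in groups.values() if len(members) > 1]

def suffices(graphemes, feature_names):
    """True if the features distinguish every grapheme from every other."""
    return not indistinct_groups(graphemes, feature_names)

all_features = list(FEATURES)
print(indistinct_groups(GRAPHEMES, all_features))
print(suffices(["b", "d", "p", "q"], all_features))
```

<p>With these made-up features, <code>o</code> and <code>c</code> remain indistinct, so another feature would be needed to separate them, while the four letters <code>b</code>, <code>d</code>, <code>p</code>, and <code>q</code> are fully distinguished.</p>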
<p>It occurred to me to solve the problem in Prolog; it would take just a relatively simple set of Prolog predicates to do what I wanted.  But as I was preparing to launch X Windows, so that I could launch Prolog, I realized that I already had the Alloy Analyzer running.  And so I wrote the predicates I wanted in Alloy instead of Prolog, to see whether it would work.  </p>
<p>The upshot is:  yes, it worked, and it was probably a bit easier to do than it would have been in Prolog.  When I was thinking about how to set up the problem in Prolog, I found myself wondering about the best data representation to choose, and so on, almost as much as about the structure of the problem.  I won&#8217;t say that Alloy posed no analogous problems &mdash; I did have to think for a moment or two about the best way to describe graphemes and distinctive features.  But the high level of abstraction afforded by Alloy made the decision feel less binding, and made me feel a bit more comfortable experimenting.  (It sounds strange to put it this way:  after all, Prolog&#8217;s high level of abstraction is one of its great design strengths.  But Prolog is also designed to be an efficient and effective programming language, which means that some details are exposed which have only procedural significance, and sometimes you find yourself thinking about them, even in situations where questions of execution efficiency don&#8217;t arise.)</p>
<p>In very short order, I found it possible to define a suitably abstract representation of graphemes and features, specify some useful functions and predicates for asking the questions described above, and specify a small set of features (ten) which have a certain degree of typographic plausibility and which suffice to distinguish the graphemes in question.  (Ten binary features for twenty-six graphemes may seem high, given that the theoretical minimum is only five, and that ten bits suffice to distinguish a thousand objects, not just twenty-six.  But writing, like natural language, has some redundancy.  Feature sets used to analyse natural language sound systems are also often very inefficient.)  The visualization tools did not prove very helpful, but the Evaluator feature of the Alloy Analyzer was a great help.  </p>
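<p>The arithmetic behind the parenthetical remark can be checked with a short Python sketch; the function name is an assumption made for illustration.</p>

```python
def min_binary_features(n_objects):
    """Smallest k such that 2**k is at least n_objects."""
    return (n_objects - 1).bit_length()

print(min_binary_features(26))  # theoretical minimum for 26 graphemes
print(2 ** 10)                  # objects distinguishable with ten features
```

<p>Five features give 32 possible combinations, enough for twenty-six letters; ten give 1024.</p>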
<p>If I pursue this work any further, I probably will put it into Prolog, where the interactive interface for expression evaluation is perhaps a bit more convenient than in Alloy.  But it&#8217;s nice to know that Alloy can be used for this kind of problem, too. </p>
<p>Interested readers can find both <a href="http://www.blackmesatech.com/2010/03/graphemes.als">the generic definitions</a> and <a href="http://www.blackmesatech.com/2010/03/lc_english.als">the specific graphemes and features for lower-case Latin letters</a> (as used in Anglophone countries) on the Black Mesa Technologies web site.</p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1073</wfw:commentRss>
		</item>
		<item>
		<title>The axes of XPath</title>
		<link>http://cmsmcq.com/mib/?p=1062</link>
		<comments>http://cmsmcq.com/mib/?p=1062#comments</comments>
		<pubDate>Fri, 26 Mar 2010 01:41:03 +0000</pubDate>
		<dc:creator>cmsmcq</dc:creator>
		
		<category><![CDATA[XPath]]></category>

		<category><![CDATA[formal methods]]></category>

		<guid isPermaLink="false">http://cmsmcq.com/mib/?p=1062</guid>
		<description><![CDATA[[25 March 2010; error noticed by Dimitre Novatchev fixed 29 March 2010]
Steve DeRose and I have been discussing the XPath [1.0] data model recently (among other things), and in the course of the discussion an interesting observation has emerged.
It&#8217;s obvious that some of the axes in XPath expressions are inverses of each other, and also [...]]]></description>
			<content:encoded><![CDATA[<p>[25 March 2010; error noticed by Dimitre Novatchev fixed 29 March 2010]</p>
<p>Steve DeRose and I have been discussing the XPath [1.0] data model recently (among other things), and in the course of the discussion an interesting observation has emerged.</p>
<p>It&#8217;s obvious that some of the axes in XPath expressions are inverses of each other, and also that some are transitive closures of others (or, going the other way, that some are transitive reductions of others).  What surprised me a little was that (if for the moment you leave out of account the <code>self</code> and the <code>XYZ-or-self</code> axes, the attribute axis, and the namespace axis [and also <code>preceding</code> and <code>following</code>]) all of the XPath axes fit naturally into a pattern that can be represented by three squares.  (Will table markup work here?  I wonder.)  The first square represents the up/down axes:</p>
<table border="1" style="margin-left: 4em;">
<tbody>
<tr>
<th colspan="2">Up/down</th>
</tr>
<tr>
<td><code> parent </code></td>
<td><code> ancestor </code></td>
</tr>
<tr>
<td><code> child </code></td>
<td><code> descendant </code></td>
</tr>
</tbody>
</table>
<p>The next square covers sibling relations.  Unlike <code>parent</code> and <code>child</code>, which are just short-hand for single steps along the up or down axis, XPath provides no syntactic sugar for <code>preceding-sibling :: * [1]</code> and <code>following-sibling :: * [1]</code>, so I&#8217;ve invented the names &ldquo;nextsib&rdquo; and &ldquo;prevsib&rdquo; (marked with a star here to signal that they are invented): </p>
<table border="1" style="margin-left: 4em;">
<tbody>
<tr>
<th colspan="2">Sideways</th>
</tr>
<tr>
<td> *prevsib </td>
<td><code> preceding-sibling </code></td>
</tr>
<tr>
<td> *nextsib </td>
<td><code> following-sibling </code></td>
</tr>
</tbody>
</table>
<p>The third square describes overall document order; again, I&#8217;ve invented names for the single-step relations [note that the names used here for the transitive relations are given by XPath 2.0; XPath 1.0 doesn't provide notation for them]:</p>
<table border="1" style="margin-left: 4em;">
<tbody>
<tr>
<th colspan="2">Overall document order</th>
</tr>
<tr>
<td> *prevnode </td>
<td><code> &gt;&gt; </code></td>
</tr>
<tr>
<td> *nextnode </td>
<td><code> &lt;&lt; </code></td>
</tr>
</tbody>
</table>
<p style="font-size: 80%;">[In the first version of this post, the right-hand columns were labeled <code>preceding</code> and <code>following</code>, but Dimitre Novatchev reminded me gently that these axes do not in fact correspond to document order:  <code>preceding</code> excludes ancestors and <code>following</code> excludes descendants.  That's a plausible exclusion, since no one in their right mind would say that chapter one of <em>Moby Dick</em> precedes the first paragraph of <em>Moby Dick</em>.  Contains, yes; precedes, no.  In fact, I remember getting into an argument with Phil Wadler about this, early on in the days of the XML Query working group, not realizing (a) that the document ordering he was describing was actually prescribed by XPath 1.0, nor (b) that saying that ancestors precede their descendants in document order didn't mean that the ancestors would have to be present on the <code>preceding</code> axis.  Thank you, Dimitre!  And sorry, Phil!]</p>
<p>In each table row, the relation on the right is the positive transitive closure of the one on the left, and the one on the left is the transitive reduction of the one on the right. </p>
<p>In each table column, the relations in the top and bottom rows are inverses of each other.</p>
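<p>The row and column relationships can be checked mechanically for the up/down square.  The following Python sketch does so on a hypothetical five-node tree; the tree and the helper names are assumptions made for illustration, and the reduction helper relies on the relation being transitive and acyclic, as <code>ancestor</code> is in a tree.</p>

```python
# Hypothetical tree: a has children b and c; b has children d and e.
# Each entry maps a node to its parent.
PARENT = {"b": "a", "c": "a", "d": "b", "e": "b"}

def inverse(rel):
    """Swap the two members of every pair."""
    return {(y, x) for (x, y) in rel}

def compose(rel1, rel2):
    """Pairs (x, z) such that rel1 has (x, y) and rel2 has (y, z)."""
    return {(x, z) for (x, y) in rel1 for (w, z) in rel2 if y == w}

def transitive_closure(rel):
    """Positive transitive closure: one or more steps along rel."""
    closure = set(rel)
    while True:
        step = compose(closure, rel)
        if step.issubset(closure):
            return closure
        closure |= step

def transitive_reduction(closed_rel):
    """Single-step pairs of a transitive, acyclic relation."""
    return closed_rel - compose(closed_rel, closed_rel)

parent = set(PARENT.items())           # pairs (node, parent of node)
child = inverse(parent)                # column: top and bottom are inverses
ancestor = transitive_closure(parent)  # row: right is closure of left
descendant = transitive_closure(child)

print(descendant == inverse(ancestor))
print(transitive_reduction(ancestor) == parent)
```

<p>The same helpers could be applied to the other two squares, given single-step relations standing in for the invented names <code>prevsib</code>, <code>nextsib</code>, <code>prevnode</code>, and <code>nextnode</code>.</p>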
<p>The tables make it easy to see that it suffices to take a single pair of relations on nodes as primitive (e.g. <code>child</code> [or better <code>first-child</code>] and <code>nextsib</code>, or <code>parent</code> and <code>prevsib</code>); everything else in the tree can be defined in terms of the two primitive relations.  (It&#8217;s not completely clear to me yet whether any two relations will do as long as they are from different tables, or not.  Taking <code>nextnode</code> and <code>parent</code> seems to work, as does the pair <code>nextnode</code> and <code>child</code> but <code>nextnode</code> and <code>first-child</code> seems to pose problems &mdash; why can <code>child</code> be replaced by <code>first-child</code> in some situations but not others?  Hmm.) </p>
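<p>To make the first claim concrete, here is a minimal Python sketch that takes only <code>first-child</code> and <code>nextsib</code> as given and derives children, parents, and document order from them.  The five-node tree, the dictionary names, and the function names are assumptions made for illustration; the sketch covers only this one pair of primitives and does not address the open question about other pairs.</p>

```python
# Hypothetical tree: a has children b and c; b has children d and e.
FIRST_CHILD = {"a": "b", "b": "d"}   # node mapped to its first child
NEXTSIB = {"b": "c", "d": "e"}       # node mapped to its next sibling
NODES = ["a", "b", "c", "d", "e"]

def children(node):
    """All children in order: the first child, then repeated nextsib steps."""
    result = []
    current = FIRST_CHILD.get(node)
    while current is not None:
        result.append(current)
        current = NEXTSIB.get(current)
    return result

def parent_of(node):
    """The node whose child list contains the given node, or None for the root."""
    for candidate in NODES:
        if node in children(candidate):
            return candidate
    return None

def document_order(root):
    """Pre-order walk: each node comes before its descendants and later siblings."""
    order = [root]
    for kid in children(root):
        order.extend(document_order(kid))
    return order

print(children("a"))
print(parent_of("e"))
print(document_order("a"))
```

<p>Once document order is available as a list, the single-step relations of the third square follow from adjacent positions in that list.</p>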
<p>There seem to be implications for the formalization of the data model (which is how we got here in the first place), but maybe also for teaching new users how to think about or learn XPath. </p>
]]></content:encoded>
			<wfw:commentRss>http://cmsmcq.com/mib/?feed=rss2&amp;p=1062</wfw:commentRss>
		</item>
	</channel>
</rss>
