C. M. Sperberg-McQueen
This document list points to some of the documents I make available in
this Web space.
- A small test suite for
(one version of) the purchase-order schema in the XML Schema 1.0 Primer;
prepared in connection with the work on schemas and definite clause
translation grammars listed below.
- Notes on finite
state automata with counters, a paper (still unpolished and in
some respects unfinished) trying to work out what finite-state
automata would be like if they were augmented with counters, in
ways analogous to the extension of regular expressions to allow
them to use integer exponents instead of just the Kleene star.
Someone has to have done this before me, but I haven't found it.
Not touched since May 2004; if I'm not going to finish and polish it
anytime soon, it may as well be visible in its current state.
- A brief introduction
to definite clause grammars
and definite clause translation grammars,
a working paper prepared for the W3C XML Schema Working Group,
January 2004.
- A definite-clause grammar representation
of an XSD schema,
a working paper prepared for the W3C XML Schema Working Group,
January 2004.
- Trip report: XML 2003,
Philadelphia, December 2003.
- Two posters from XML 2003, December 2003:
“Why
would you want to perform the world's most expensive identity transform?”
and
“How being schema-valid is different
from being pregnant”.
- Logic grammars
and XML Schema.
Paper for Extreme Markup Languages 2003, Montréal.
(An abbreviated form of the paper
Notes on logic grammars
and XML Schema, which I never quite completed in its
original form and which has now been split into
several papers.)
- XML Schema 1.0:
A language for document grammars,
slides for a talk I gave at ACH/ALLC 2003
(see the trip report listed below).
- Trip report:
ACH/ALLC 2003 / Web X: a decade of the World Wide Web,
Annual joint conference
of the Association for Computers and the Humanities
and the Association for Literary and Linguistic Computing,
Athens, Georgia, 29 May - 2 June 2003.
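For readers who have not met the idea, the counter-augmented automata mentioned in the list above can be illustrated with a minimal Python sketch. It is not taken from the notes themselves, and their formalism may well differ; the names `Transition`, `accepts`, and `TRANSITIONS`, and the example language a{2,3}b (two or three a's followed by one b), are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Transition:
    symbol: str                   # input symbol consumed by this move
    target: str                   # state entered after the move
    guard: Callable[[int], bool]  # must hold for the current counter value
    update: Callable[[int], int]  # computes the counter value after the move


def accepts(transitions: Dict[str, List[Transition]], start: str,
            finals: Set[str], text: str) -> bool:
    """Run a one-counter automaton, tracking every reachable (state, counter) pair."""
    configs: Set[Tuple[str, int]] = {(start, 0)}
    for ch in text:
        nxt: Set[Tuple[str, int]] = set()
        for state, count in configs:
            for t in transitions.get(state, []):
                if t.symbol == ch and t.guard(count):
                    nxt.add((t.target, t.update(count)))
        if not nxt:
            return False          # no move is possible on this symbol
        configs = nxt
    return any(state in finals for state, _ in configs)


# Hypothetical automaton for a{2,3}b: one looping state plus a counter,
# instead of one plain state per repetition.
TRANSITIONS = {
    "q0": [
        Transition("a", "q0", lambda c: c < 3, lambda c: c + 1),
        Transition("b", "q1", lambda c: 2 <= c <= 3, lambda c: 0),
    ],
}


def matches_a2_3_b(text: str) -> bool:
    return accepts(TRANSITIONS, "q0", {"q1"}, text)
```

The point of the sketch is only that the exponent bounds live in the guards, so changing a{2,3} to a{2,300} changes two numbers rather than adding hundreds of states.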
I include a couple papers here for which I am not, technically,
the author, but in which I have a certain paternal interest.
- Sevastopol: An XSD schema represented as a definite-clause translation grammar,
a working paper prepared for the W3C XML Schema Working Group,
January 2004 - October 2005.
- How schema-validity is different from being married,
a paper submitted to
XML 2005 in
Atlanta, November 2005.
- Applications of Brzozowski
derivatives to XML Schema processing, a paper given at
Extreme Markup Languages
2005 in August. A brief summary of what Brzozowski derivatives
are, followed by discussions of how to use them in validation
(they allow validation of content models which cannot be
translated into finite state automata without prohibitive cost),
checking content model determinism (they allow a precise definition
of the determinism rule in terms of content models, without any detour
through finite state automata), checking subsumption relations between
content models, and other applications. In general, Brzozowski
derivatives are a beautiful and elegant tool that deserves to be
better known. (If special characters like ε
and ∅ do not display usefully [i.e. as a Greek epsilon
and the empty-set symbol] in your browser, you'll want
the asciified version of the
paper.) The text is available (styled differently) from the Extreme Markup
Languages proceedings online in
XML,
HTML,
and
PDF.
The slides are also available.
- XML Schema validation outcomes, a tabular display created
together with Henry Thompson and Richard Tobin.
- Notes on schema
resolution and Notes on schema-validation
results, both of which are draft notes on topics relating
to XML Schema, expanded from some old postings on email lists.
- An unfinished paper on S-expressions for XML documents and
for XML Schema components (rev. August 2001), which attempts to
rewrite the rules for XML Schema component structure and document
validation in Lisp, using S-expressions to represent the components
and defining functions to capture the meaning of key terms in XML
Schema. (Currently accessible only to W3C members, sorry. I'll make
it public when it's further along. [If ever. It doesn't look as if
I'm ever going to get back to this.])
- Context-sensitive rules in XML Schema
(February 2000, rev. April 2000).
(XML version)
The title is a slight
misnomer: what is offered is an example of limited
non-local effects, not really arbitrary context-sensitivity.
There is a clear tradeoff between the limited awareness of
context and the size of the grammar (approximately: one bit of
context-sensitivity requires doubling the size of the affected
portion of the grammar). I hope to clean this up and publish
it sometime.
- Replicating DTD Functionality
Using XML Schema (April 2000).
See also the talk given at ACH/ALLC 2001.
- Notes on schema
annotation (February and April 2002). Some general
thoughts on what the actual problem is and what a solution might
look like, in an attempt to define terms clearly enough to allow
useful discussion about what should be done.
- Meaning and Interpretation of Markup,
by C. M. Sperberg-McQueen,
Claus Huitfeldt, and Allen Renear. Published in
Markup Languages: Theory & Practice 2.3 (2000): 215-234.
- GODDAG:
A Data Structure for Overlapping Hierarchies,
by C. M. Sperberg-McQueen and
Claus Huitfeldt. Paper given at Principles of Digital Document Processing 2000,
Munich, September 2000.
Published in
DDEP-PODDP
2000, ed. P. King and E.V. Munson, Lecture Notes in Computer
Science 2023 (Berlin: Springer, 2004), pp. 139-160.
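As a rough illustration of the Brzozowski derivatives mentioned in the list above, here is a minimal Python sketch of validating a sequence of child-element names against a content model without building a finite state automaton. It is a textbook-style sketch, not the code or notation of the paper; all class and function names are assumptions, and the sample content model (shipTo, billTo, an optional comment, items) is only loosely modeled on the purchase-order example.

```python
from dataclasses import dataclass


class Regex:
    """Base class for content-model expressions."""


@dataclass(frozen=True)
class Empty(Regex):   # the empty set: matches nothing
    pass


@dataclass(frozen=True)
class Eps(Regex):     # epsilon: matches only the empty sequence
    pass


@dataclass(frozen=True)
class Sym(Regex):     # a single element name
    name: str


@dataclass(frozen=True)
class Seq(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r: Regex) -> bool:
    """True if r matches the empty sequence."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Sym)):
        return False
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    raise TypeError(f"unknown expression: {r!r}")


def seq(a: Regex, b: Regex) -> Regex:
    """Sequence constructor that simplifies away Empty and Eps."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Seq(a, b)


def alt(a: Regex, b: Regex) -> Regex:
    """Choice constructor that simplifies away Empty and duplicates."""
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty) or a == b:
        return a
    return Alt(a, b)


def derive(r: Regex, name: str) -> Regex:
    """The derivative of r with respect to one element name."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Sym):
        return Eps() if r.name == name else Empty()
    if isinstance(r, Alt):
        return alt(derive(r.left, name), derive(r.right, name))
    if isinstance(r, Seq):
        first = seq(derive(r.left, name), r.right)
        return alt(first, derive(r.right, name)) if nullable(r.left) else first
    if isinstance(r, Star):
        return seq(derive(r.inner, name), r)
    raise TypeError(f"unknown expression: {r!r}")


def matches(r: Regex, names) -> bool:
    """Valid if the expression left after consuming every name is nullable."""
    for name in names:
        r = derive(r, name)
    return nullable(r)


# Illustrative content model: shipTo, billTo, comment?, items
MODEL = Seq(Sym("shipTo"),
            Seq(Sym("billTo"),
                Seq(Alt(Sym("comment"), Eps()), Sym("items"))))
```

With these definitions, `matches(MODEL, ["shipTo", "billTo", "items"])` holds and `matches(MODEL, ["shipTo", "items"])` does not. The simplifying constructors `seq` and `alt` are a design choice to keep the derived expressions small; they are not essential to the technique.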
Mostly slides from talks; in a few cases, post-hoc expansions
or transcriptions from tape.
In some cases, you can get a reasonable
idea of what I said from my slides, and in other cases, not.
- XML vocabulary design and specification
Using W3C XML Schema 1.0,
slides from talk during the 'training track' at
XML 2007, sponsored by IDEAlliance,
Boston, 5 December 2007.
- Meaning and interpretation of markup: a
report on the Bechamel Project,
slides from talk sponsored by the W3C German/Austrian Office at
the Fraunhofer Gesellschaft Institut Medienkommunikation
in Sankt Augustin, Germany, 1 October 2004.
- What does XML have to do with Immanuel Kant?,
slides from talk at
Net.Object Days 2004, Erfurt, 29 September 2004.
- Semantic interpretation of
XML documents, slides from talk at
Modelling linguistic information resources /
Modellierung sprachlicher Informationsressourcen,
workshop organized by the Zentrum für interdisziplinäre Forschung
(ZiF) of the Universität Bielefeld,
12-14 January 2004.
- Perspectives on XML and related standards,
slides from talk at Korpuslinguistik
deutsch: synchron, diachron, kontrastiv,
University of Würzburg, 22 February 2003.
- What matters?
(August 2002). Closing remarks at the conference Extreme
Markup Languages 2002.
- Slides from The TEI is dead;
Long live the TEI
(November 2001), the opening keynote at the first members meeting
of the TEI Consortium,
held in Pisa 16-17 November 2001
(printer-friendlier version).
- Slides from A gentle
introduction to XML Schema and XML document grammars
(October 2001), a talk I gave as part of a series of seminars and
talks organized by the HIT
Centre (Humanities Information Technology Research Programme) at
the University of Bergen, where I am a guest researcher. (XML version, printer version). Mostly this is a
very slightly revised version of the slides from Darmstadt earlier
this month, but I've added another extended example based on some
software in Bergen.
- Slides from Constrain early
and often: XML Schema and the definition of document grammars for XML
vocabularies (October 2001), a talk I gave at the
Fraunhofer Institut für integrierte Publikations- und
Informationssysteme in Darmstadt. Thanks to my host, Peter
Fankhauser, who argues that it would be really cool to have a
well-defined method for providing type labeling on an XML instance
independent of any schema or schema validation. (printer version)
- Slides from The World Wide Web
Consortium and Standards (June 2001), a talk I gave on W3C
work relevant to computing in the humanities, in a session at ACH/ALLC
2001, the annual conference of the Association for Computers and the
Humanities and the Association for
Literary and Linguistic Computing.
- Slides from a talk on Practical Extraction
of Meaning from Markup I gave at ACH/ALLC
2001 (June 2001), reporting on work being done together with Claus
Huitfeldt (University of Bergen) and Allen Renear (University of
Illinois at Urbana/Champaign).
- What is XML and Why Should Humanists
Care?, an abstract for a talk delivered at the Digital Resources
for the Humanities DRH '97 conference in Oxford, September 1997.
- Back to the
Frontiers and Edges,
Closing Remarks at SGML '92: the quiet revolution, sponsored
by the Graphic Communications Association. Danvers, Massachusetts,
29 October 1992. This got the TEI document number ED W31.
See also trip report, above.
These documents were done as part of my work on the Text
Encoding Initiative. (A lot of others will eventually appear here,
when I get around to converting them into XML and making HTML
versions to put here.)
- ED P1: Design Principles for Text Encoding Guidelines
(1988, rev. 1990).
An early attempt to enunciate the ground rules for the TEI.
- ED P3: Theoretical Stance and Resolution of Theory Conflict
(1989).
A statement of the problem of arriving at consensus in fields
where different people take very different theoretical stances.
It had no visible effect on most of the TEI working groups, but
at least it was useful for me.
- ED W03: SGML Problems for Research (1988).
A transcription of my notes for a talk at SGML '88. Not complete:
the notes may cover a quarter of what I said, but not much more. (In
particular, the example from the Glossa ordinaria is not here.)
- ED W05: Notes on Features and Tags (1989).
Lou Burnard and I wrote the first draft of this over a quiet weekend
in Luxembourg; this paper introduced the notion of the ‘Waterloo DTD’,
which (as I later discovered) infuriated some of our colleagues
in Waterloo, who were unhappy at the notion that they used any kind
of DTD at all. (It was intended as a compliment.)
- ED W12: Tagging parts of speech (1990).
A study of various ways of marking parts of speech in SGML,
based on a careful study of the part-of-speech tagging in the
Lancaster/Oslo/Bergen (LOB) corpus.
- AI1 W02: List of common morphological features
for inclusion in TEI Starter Set
of grammatical-annotation tags (1991).
A set of lists of word classes and features which some Western
European languages express morphologically, and which may
therefore be useful in linguistic annotation of Western European
languages. I helped draft this, but the intellectual responsibility
lies with the linguistic-annotation working group (AI 1) of the
TEI (members included Terry Langendoen, Stephen Anderson,
Geoffrey Sampson, Nicoletta Calzolari, and Gary Simons).
As a result, this is linguistically much more sophisticated
than ED W12.
- A URI recognizer: a transcription into
definite-clause grammar of the ABNF productions for URIs and URI
references in RFC
3986. (ABNF is defined in RFC 2234.)
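To give a feel for what transcribing an ABNF production into a recognizer involves, here is a small Python sketch for a single production from RFC 3986, `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`. It is not the definite-clause grammar itself: each function merely imitates the way a DCG rule threads a position through the input, and the function names are assumptions made for this illustration.

```python
import string
from typing import Optional

ALPHA = set(string.ascii_letters)
DIGIT = set(string.digits)
SCHEME_TAIL = ALPHA | DIGIT | set("+-.")


def alpha(s: str, i: int) -> Optional[int]:
    """ALPHA: consume one ASCII letter at position i, or fail with None."""
    return i + 1 if i < len(s) and s[i] in ALPHA else None


def scheme(s: str, i: int) -> Optional[int]:
    """scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )"""
    j = alpha(s, i)
    if j is None:
        return None
    while j < len(s) and s[j] in SCHEME_TAIL:
        j += 1
    return j


def scheme_prefix(s: str) -> Optional[str]:
    """Return the scheme if s starts with scheme ":", else None."""
    j = scheme(s, 0)
    if j is not None and j < len(s) and s[j] == ":":
        return s[:j]
    return None
```

Each function takes the input and a start position and returns the position after a successful match, which is roughly the role the hidden difference-list arguments play in a definite-clause grammar.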
An unfinished paper on Namespaces
and RDDL (September 2001).
SWeb: an SGML Tag Set for Literate
Programming (1993, rev. 1994, 1995, 1996). A set of tags for
literate programming, designed to be used as a DTD module, in
combination with some tag set for document structure, etc., such as
TEI Lite or DocBook or HTML. An XML version of the combined SWeb + TEI
Lite DTD (swebxml.dtd) is
available, as is a display stylesheet (swebtohtml.xsl,
which relies on tltohtml.xsl
being in the same directory). A simple tangle processor
in XSLT has replaced swebyacc,
my earlier yacc+lex based program. It took a couple of hours to
implement, as compared to several weeks' intermittent work for
swebyacc.
Working documentation for the
Web ORB
and Oasis (WOO) system (1998-1999), the
last production database I worked on before leaving the University
of Illinois at Chicago. Not likely to be of general interest,
though I'm still proud of parts of it.
AIK L2: a letter to the staff of
Amalgamated Interkludge (AIK) from You Know Who, dated 8 November
1993. Amalgamated Interkludge was (is?) a fictional organization
whose products, policies and marketing slogans ("AIK: we have hammers
of many sizes") were a topic of humorous discussion among
members of the technical staff in the organization where I worked at
the time.
This piece was a way of letting off steam about issues of quality
control (in particular, the apparent determination of management to
prevent programmers from exerting any); the names used are either
those of well-known computer scientists or of colleagues (now: former
colleagues). It was distributed anonymously, but often quoted.
My colleague John Andrews did actually use the Andrews
test, though he used it as a milestone, rather than a completion test,
and when I wrote this
Fred Damen was well on his way to becoming a figure of legend,
though there is no record that he ever used the Damen test.
A directed-graph
data structure for text manipulation, abstract for a talk
given at the 1989 ICCH/ALLC conference The Dynamic Text
at the University of Toronto. Describes the ‘Rhine Delta’
data structure for the representation of textual variation. I
can't believe I was the first to invent this, but I have not found
clear antecedents.
The formal paper I intended to write on this work was never
finished, partly because other work was more pressing and partly
because I never quite figured out how to reconcile the Rhine
Delta with the non-linear representation of texts as trees or
graphs which is implicit in SGML and XML. (There is a reconciliation
of sorts in the TEI markup for textual variants, which can readily
be used to generate Rhine Delta structures within single elements
like paragraphs or lines of verse, but variations which span
structural boundaries still feel like an unsolved problem.)
There have been some requests for the XSL stylesheet I use for my
slides. I write my slides (like almost everything else I write) in
TEI Lite, and after years of using an aging copy of SoftQuad
Panorama Pro to project them, have changed to using an XSLT stylesheet
in an XSL-capable Web browser to do so. If you want to look at what
I've done and how it works, here are some pointers:
- TEI Lite DTD, hacked for XML (this
is my copy; there is a more official
copy at the TEI Consortium web site, which may vary in minor
ways)
- TEI Lite documentation
- tltohtml.xsl, my base XSLT
style sheet for translating TEI Lite into HTML
- tlslides.xsl, the stylesheet
which turns TEI div1 or top-level div elements into
slides and adds the hyperlinks; for the rest, it imports
the base tltohtml stylesheet. Set the browser locally to use
a large font; otherwise the slide titles are smaller than the slide
text (I haven't quite figured out CSS font-size properties, I guess).
- sample slides, with
HTML version produced by
tlslides.xsl and
printer-friendlier version
produced by tltohtml.xsl.
- A DTD for XSLT stylesheets; this works
for stylesheets that use xsl:element instead of having
literal result elements, and I use it to get better editing of stylesheets
in emacs using PSGML.
When giving talks, I currently use Galeon to display my slides; the
one drawback is that in order to persuade Galeon to display any
graphics I have to do a batch transformation to HTML, rather than
displaying directly from the XML. It's a pain, but a manageable pain.
Before moving to Linux, I used Microsoft's Internet Explorer for
displaying my slides, because of its built-in XSLT support. These
stylesheets worked with version 6.0; I haven't tried them with earlier
versions. (Before using IE for displaying slides, I used SoftQuad
Panorama [pause for teary-eyed recollections], but those stylesheets
are different.)
As a training and debugging aid, I frequently use the options on
the xsv and Xerces-J schema validators which cause them to write out
an XML representation of the post-schema-validation info set (PSVI),
and then run that through XSLT to display the document in a Web browser:
in the output, the text color reflects the value of the [validity]
property on each element or attribute: green text means it's
valid, red means invalid, amber means unknown. The background
reflects the [validation attempted] property: white
background means fully validated, light gray partially validated,
dark gray unvalidated.
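The color scheme just described amounts to two small lookup tables. The following Python sketch shows the mapping only; it is not the XSLT stylesheet, and the function name, the exact CSS color values (including the hex code used for amber), and the shape of the returned style string are all assumptions. The property values valid / invalid / notKnown and full / partial / none are those defined for the PSVI by XML Schema.

```python
# Text color keyed on the [validity] property.
TEXT_COLOR = {
    "valid": "green",
    "invalid": "red",
    "notKnown": "#ffbf00",   # amber (assumed hex value)
}

# Background keyed on the [validation attempted] property.
BACKGROUND = {
    "full": "white",
    "partial": "lightgray",
    "none": "darkgray",
}


def psvi_style(validity: str, validation_attempted: str) -> str:
    """Return a CSS style string for one element or attribute."""
    color = TEXT_COLOR[validity]
    background = BACKGROUND[validation_attempted]
    return f"color: {color}; background-color: {background}"
```

For instance, `psvi_style("invalid", "partial")` gives red text on a light gray background. An unrecognized property value raises `KeyError`, which seemed preferable in a debugging aid to silently picking a default color.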
For example, to use both XSV and Xerces-J to validate the XML
file theduck.xml against
the schema pointed to by its xsi:schemaLocation attribute,
I say simply
psvixsv theduck.xml
psvixercesj theduck.xml
The scripts cause new tabs to open in my Web browser window
with the two PSVI dumps in XML (for exploration) and the HTML versions
of the PSVI (for visual examination).
The things I use for this are:
- psvihtml.xsl: my XSLT stylesheet
for transforming PSVI dumps into HTML. This is the only thing you
really need; the other things listed here are merely for convenience
in using this XSLT file.
- shell scripts for invoking the processors with the right
options to elicit a PSVI dump in XML form:
- psvixsv: for XSV
- psvixercesj: for
Xerces-J (this script also uses sed to
insert namespace declarations into the Xerces PSVI dump)
- xsv.xsl: an XSLT stylesheet I use to make
XSV output display with slightly more detail (not essential to the display
of the PSVI, but included because my script mentions it)
- sample input:
- theduck.xml: an XML document
- tds.xsd: a sample schema, which makes
parts of theduck.xml valid, some parts invalid, and some parts
unknown.
- sample output:
Note that XSV and Xerces disagree on whether the author element is
valid or invalid and on whether its child elements are notKnown or
invalid; open the two output files side by side and
the differences in color should make clear where the two processors
disagree.
Since I wrote these only for myself, I have not done
anything to the shell scripts to make them system independent; you
will need to adapt them to make sure the paths are right for your
system. If you have trouble, please send me email and I'll try to
help, within the limits of the time I have available and my
deplorably short attention span.
$Id: doclist.html,v 1.57 2007/03/24 02:05:04 cmsmcq Exp $