Bare-bones TEI

[6 January 2010]

The markup vocabulary defined by the Text Encoding Initiative is, for good and sound reasons, rather large and in some ways rather complicated. From time to time, however, it’s useful to have a radically cut-down version of the TEI vocabulary. For people just learning TEI markup (to name just one instance), a cut-down version can simplify initial training and make the TEI feel a little less intimidating.

Many people use TEI Lite, which is much smaller than full TEI. But for some work I’m doing with Yves Marcoux and Claus Huitfeldt, we felt it would be handy to have an even smaller subset of TEI to work with. Years ago (1994), I defined a profile of TEI called ‘Bare Bones TEI’, intended not so much for serious use as for training and for thought experiments.

I had long regarded the full details of bare-bones TEI as lost to history: the documentation has been preserved by Robin Cover at the XML Cover Pages, but it didn’t include the DTD modification file showing the precise changes from the then-current TEI DTD. I had tried a few times, searching both the Web and my hard disk, to find the original data, but had had no luck. But the other day, for reasons I don’t think I can explain, an attempt at one more search eventually found an old copy of the documentation, the modification files, and the full DTD for bare-bones TEI.

I’ve now translated the original documentation to XML, added updated links to the various DTD files, and added parameter entities to the DTD to make it work both for SGML contexts and for XML. The documentation for Bare bones TEI is now available on this site, as is the modified DTD.

The current version of the bare-bones DTD is based on TEI P3, and the translation to XML loses a small amount of information involving the pb (page break) element. Eventually, perhaps, I’ll apply the bare-bones customization to TEI P5 and produce an updated version of the schema and documentation.

Note: in searching for Bare-bones TEI on the TEI Consortium site just now, I discovered that someone has produced a similar profile, called ‘Bare TEI’. [Further research shows that although the page on customizations does not identify the authors, the work was done originally by Laurent Romary and later edited by Syd Bauman, Lou Burnard, and possibly also Sebastian Rahtz.] This may be worth exploring, for those seeking a minimal profile of TEI. Unfortunately, I haven’t found any usable documentation for Bare TEI, so I don’t know the design principles that govern it, and the DTD is full of undefined ghost elements (or ‘zombies’), that is, element names that appear in content models but are never declared, which makes it cumbersome to use in a syntax-directed editor. So for now I’m sticking with bare-bones TEI.
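For readers who want to check a DTD for such ghost elements themselves, here is a minimal sketch in Python. It is an illustration only, not a tool used in preparing either DTD. It assumes XML-style element declarations (no SGML name groups or tag-minimization flags), parameter entities already expanded, and no `>` inside a content model. The function name `find_zombies` and the sample are hypothetical: the `TEI` declaration follows the one quoted in the comments below, while the `teiHeader` and `text` declarations are simplified stand-ins.

```python
import re

# Matches <!ELEMENT name content-model>. Assumes parameter entities are
# already expanded and that no '>' occurs inside a content model.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+([^>]*)>")

# Rough approximation of an XML name; enough for ASCII element names.
NAME = re.compile(r"[A-Za-z_:][\w.:-]*")

# Tokens in a content model that are keywords, not element names.
# (An element literally named EMPTY, ANY, or PCDATA would be missed.)
KEYWORDS = {"EMPTY", "ANY", "PCDATA"}


def find_zombies(dtd_text):
    """Return sorted names used in content models but never declared."""
    declared = set()
    referenced = set()
    for name, model in ELEMENT_DECL.findall(dtd_text):
        declared.add(name)
        referenced.update(NAME.findall(model))
    return sorted(referenced - declared - KEYWORDS)


# Simplified sample: only the TEI declaration mirrors the real DTD.
sample = """
<!ELEMENT TEI (teiHeader, ((_DUMMY_model.resourceLike+, text?) | text))>
<!ELEMENT teiHeader (#PCDATA)>
<!ELEMENT text (#PCDATA)>
"""

print(find_zombies(sample))  # prints ['_DUMMY_model.resourceLike']
```

A regular-expression scan like this is deliberately crude; a real DTD parser would handle comments, conditional sections, and entity expansion, which this sketch ignores.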

5 thoughts on “Bare-bones TEI”

  1. If you visit the Roma website at http://www.tei-c.org/Roma/ you will find that this can generate a perfectly good (zombie free) DTD, and also some HTML documentation, from the TEI Bare ODD file. You’ll have to ask Laurent what the “design principles” underlying it were, since he’s responsible for the original ODD file, but I am pretty sure they were the same as those in your original P3 (!) version.

    We use this schema as part of the P5 validation suite, and also in teaching Roma as a very simple way into the TEI, to show that TEI schemas can be very very simple — and that this is not always an unmixed blessing.

  2. I must also apologise for the fact that the TEI vault is currently unavailable from the main TEI website. The folks responsible for maintaining this site are aware of the problem and assure me it will be rectified imminently. You can meantime access it at http://projects.oucs.ox.ac.uk/teiweb/Vault

    Much to my chagrin however I discover it doesn’t seem to have a copy of TEI U6 or associated files in it as yet! Assuming I don’t find them somewhere, would you allow us to take copies from your website?

  3. Lou writes

    If you visit the Roma website at http://www.tei-c.org/Roma/ you will find that this can generate a perfectly good (zombie free) DTD, …

    but in fact at the moment the DTD Roma is generating looks much the same as the one on the tei-c.org site. Maybe I’m misreading something, but in a declaration like

    <!ELEMENT TEI (teiHeader,
                  ((_DUMMY_model.resourceLike+,text?) |
                  text))>
    

    the reference to _DUMMY_model.resourceLike (which is not declared as an element anywhere in the DTD) looks like a zombie to me.

  4. Hmm, I think this is a fair cop. I was wrong to imply that Roma was clearing up undeclared elements. It obviously isn’t. On the other hand, none of the DTD parsers I’ve tried seems to give two hoots about the presence of said undeclared elements in the content models. I’m going to try redefining tei_bare using the new ODD syntax to see if that does better (it’s the ghost of those parameter entities which is at the root of this problem).

  5. The “TEI Bare” customization was indeed concocted by Laurent Romary as a teaching aid; it is meant to be used as the starting point for real work. CMSMcQ’s Bare is slightly different, perhaps closer in rationale to TEI Tite.

    I would point out that other syntax-directed editors than emacs psgml are available, and have no problems with the Haitian parts of the TEI P5 DTDs…. though of course one will get better results (eg datatyping) with the relaxng or w3c schemas.
