Trip Report:
International Conference on Very Large-Scale Knowledge Bases and
Knowledge Systems, Tokyo, 1-2 December 1993
Workshop on Very Large-Scale Knowledge Bases and Knowledge Systems,
Tokyo, 3-4 December 1993
SGML '93, Boston, 6-9 December 1993
SGML Open Technical Committee meeting, Boston, 10 December 1993
C. M. Sperberg-McQueen
December 1993
This trip report must be too brief to do justice to the meetings I
attended, but perhaps I can record some points of interest.
The International Conference on Building and Sharing of Very Large-Scale
Knowledge Bases '93 was organized by the Japan Information Processing
Development Center, with the avowed hope that it will be followed by
other similar conferences held elsewhere. Some speculation around the
conference held that it was intended as a prelude to a large-scale
knowledge base project intended to be a successor to the Fifth
Generation and Electronic Dictionary Research projects, a connection
subtly conveyed by the keynote addresses, given by Kazuhiro Fuchi of
the Fifth Generation Project and Toshio Yokoi of the EDR.
Most notable to me, in Yokoi's keynote address, was the emphasis on
problems like "Making information machine-readable" and "Making
information hyperstructured", for which SGML appears to offer an
important tool for future work. I was also interested by the stress on
reusability and modifiability of the knowledge representation methods
used in building large-scale knowledge bases, since many of the concerns
expressed mirror those of the TEI. As the conference wore on, I began
to realize that for this audience, at least, the simplest way to
describe the TEI's product is as a 'text representation language',
analogous to a 'knowledge representation language.' (This notion then
invites speculation on how the SGML and TEI communities can develop
better languages for modeling, querying, and manipulation of texts.)
During the first morning, Norio Fujisawa gave an outline of a Platonic
view of 'knowledge' which provided a high standard of clarity and a
refreshing background for consideration of what everyone else was
talking about. "Plato, however, clearly denies the information stored
in a book (or in a computer) to be in itself true
knowledge." Most interesting was Fujisawa's objection to
the usual subject-predicate, substance-attribute method of expressing
propositional content; his remarks could not help reminding me, however,
of an imaginary language described by Jorge Luis Borges in which no
nouns exist and the only open lexical class is that of verbs, precisely
in order to avoid the use of subject-predicate patterns.
There followed a survey of current language technology, in which Makoto
Nagao gave an informative survey of NLP work, and Susan
Armstrong-Warwick spoke of problems involved in the acquisition and
exploitation of textual resources for NLP. SAW rather stressed the
difficulties of acquiring permissions, and downplayed the fundamental
problems of data representation, which rather disappointed me, but
otherwise I liked her talk quite well.
A session on "Sharable Knowledge Sources" allowed Antonio Zampolli to
list the bewildering variety of European projects aimed at producing
same, and Susan Hockey to describe the TEI and its relevance for
sharable materials. In the same session, Douglas Lenat spoke about the
Cyc project, providing (inter alia) a virtuoso set of reasons for not
publishing a lot of papers about a project, including the unanswerable
one that getting the project done is more important than publishing
papers about it. This endeared him to my heart enough that I was able
to forgive him when he, too, radically understated the amount of
information and sophistication present in properly done representations
of textual material.
The concluding panel included a great deal of wise speculation about the
future of knowledge bases, which I cannot summarize if I am to get this
report out today. Its most memorable moment, for me, came during the
question period, when Hisao Yamada (of the National Center for Science
Information Systems of Japan) said everything everyone had said seemed
to be floating in the air, because all the techniques they were talking
about were suitable for Western languages, but not for scripts which use
Chinese characters. Knowledge representation languages, SGML, and the TEI
were fine for alphabets, but had not addressed the problem of writing
in kana, let alone kanji. I had risen to reply, but the chair
recognized Prof. Yokoi, who spoke at some length about the fact that
the TEI had in fact addressed the character set issue head on, thanks in
large part to the work of Prof. Tutiya, and that continued
collaboration with the TEI was an obvious desideratum for Japanese work
in document and natural language processing, for which funding was being
actively sought. Having nothing to add (and indeed not wanting to spoil
the moment), I sat back down without speaking.
During the conference, several members of the steering committee met
with several representatives of the Japanese research establishment; a
memorandum of the discussion at this meeting will be distributed
separately.
The next two days were occupied by a workshop on the same topic as the
conference, but somewhat less formal and limited to 60 people, instead
of the 450-odd who had been attending the conference. There were a
number of good talks, as well as some rather disappointing ones, but the
most important results of the meeting for me lay in the personal contact
made. I believe that at least Tim Finin of Univ. Maryland/Baltimore
County, who is working on a language for distributed knowledge querying,
now has a stronger interest in SGML and the TEI as a basis for text
representation, which should fill a gaping hole in the language he is
designing as it now exists. Also notable was the interest in SGML from
the database people, including separate invitations from Joachim Schmidt
of Hamburg and A. Desai Narasimhalu of Singapore to consider
collaborating with them, and/or using their software systems as a basis
for work with SGML. Schmidt is developing an object-oriented dbms for
objects of a polymorphic type system with variable persistence;
Narasimhalu and his colleagues have already developed an SQL-based
Document Query Language (which was presented at SGML '93) and, in
conjunction with Fujitsu, an SGML system built on a dbms foundation.
On the final morning of the workshop, Syun Tutiya arranged for me to
have breakfast with several representatives of East Asian countries,
including China, Korea, Malaysia, and Thailand. We exchanged
compliments and expressions of mutual interest, and I explained a bit
about TEI character set handling, stressing that the definition of TEI
conformance has no component governing character set usage, so that TEI
conformance is possible no matter what character set mechanisms one is
using; this seemed to reassure the Thais in particular. ST expressed
some interest in getting people from other countries in East Asia to
attend a TEI workshop in Asia sometime in 1994, and even suggested that
perhaps it should be held elsewhere in Asia, rather than in Japan. Our
colleague in Bangkok (Vilas Wuwongse of the Asian Institute of
Technology) later privately offered assistance with organizing such a
workshop in Bangkok.
From Tokyo I went directly to Boston, where the Graphic Communications
Association sponsored its annual SGML conference, and where Lou Burnard
and I made a TEI sandwich of the meeting by giving, respectively, the
opening and the closing keynote addresses. Lou managed, by focusing on
the accomplishments of the TEI in using SGML, to clarify the direct
relevance of the TEI to other SGML projects in a way that previous talks
on the TEI have not always succeeded in doing; focusing as they often
have done on the
peculiarly difficult problems posed by some older texts, some of our
earlier presentations clearly left some people with the idea that the
TEI was all about medieval manuscripts, and had nothing to do with
problems like modularity of DTDs, class systems for SGML elements,
version control, and the like.
As usual, the conference was full of interesting talks and attended by
many interesting people; probably the most important items to note in
this report are these:
- Charles Goldfarb and Erik Naggum announced the release, in the first
  quarter of 1994, of a C++ library of SGML parsing routines
  and a 'portable object-oriented entity manager' (POEM),
  implemented by a consortium going under the name of Project YAO.
  This library should make it easier to embed SGML awareness in
  processors other than SGML parsers, and POEM should make it
  easier to use external entities other than files in SGML
  documents.
- A talk by Dave Sklar of Electronic Book Technologies, and a panel of
  implementers, on the subject of SGML transformation engines,
  showed quite clearly that the problem of text manipulation
  is receiving a good deal of attention. The sample problems, and
  their solutions using the SGML Hammer (Avalanche), OmniMark
  (Exoterica), Balise and Polypus (AIS/Berger-Levrault), TagWrite
  (Zandar), and CoST (the Copenhagen SGML Tool, a non-commercial
  tool written by Claus Harbo, rather in the style of Lou
  Burnard's Spitbol-based tf filter program) should all be posted
  on comp.text.sgml, and may be collected and published in a
  journal of some kind.
- WordPerfect appeared for the first time among the vendors; alas, I
  never did see the demo. But they were there.
- Several vendors showed database-oriented projects, most (but I think
  not all) based on an underlying database technology (rather than
  on the left-right scanning characteristic of most existing
  SGML products).
The day after the meeting concluded, the vendor consortium SGML Open
held a full day of meetings, which Lou and I attended on behalf of the
TEI. According to Yuri Rubinsky, the TEI will be an affiliated
organization, which gives us the same rights and privileges as a
corporate associate member (i.e. somewhat less than the rights of a
corporate sponsoring member, and much more than a simple subscriber).
Notably, we will have the ability to participate as members in the
technical committees of SGML Open, though not the ability to vote on the
adoption of committee reports by SGML Open. I was impressed into
service on a working committee to address character set issues, but
managed to persuade Steve Edwards of Recording for the Blind to chair
the committee, and to persuade the committee to accept chapters CH and
WD as representing the TEI position on character sets.
C. M. Sperberg-McQueen