How being schema-valid is different
from being pregnant
C. M. Sperberg-McQueen
Poster presentation at XML 2003, Philadelphia, December 2003
Validation with some schema languages (e.g. XML 1.0 and SGML
DTDs) is a black-or-white question: either the document is (wholly)
valid or it's not valid (at all). There are no gray areas: a
document cannot be ‘mostly valid’ any more
than one can be ‘a little bit pregnant’.
The theoretical information content of a validation result in
these systems is thus exactly one bit: yes/valid or no/invalid.
In practice, good validators try to provide a little more information
about errors, but the quality of error diagnostics varies, and error
handling is not standardized.
In other schema languages (e.g. XML Schema), validity assessment
is designed to provide more than one binary digit of useful
information. XML Schema allows various forms of partial validation:
- Validation can start at an element other than the root.
- Wildcards can specify that elements they match should
not be validated but skipped (aka ‘black-box
processing’ -- the data must be well-formed XML, but
don't look inside).
- Wildcards can specify ‘lax’ validation
(matching elements must be well-formed XML, and if the schema has
declarations for them, they'll be validated, but the absence of
declarations doesn't make the container invalid).
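As a rough illustration of the two wildcard forms above, the sketch below embeds a small schema and lists the processContents setting of each wildcard. The schema itself (the element names message, payload and extensions) is invented for this example, and wildcard_modes is a hypothetical helper, not part of any XML Schema processor. It only reads the schema document; it does not perform validation.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: all element names here are invented for illustration.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="payload">
          <xs:complexType>
            <xs:sequence>
              <!-- skip: children must be well-formed XML, nothing more -->
              <xs:any processContents="skip"
                      minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="extensions">
          <xs:complexType>
            <xs:sequence>
              <!-- lax: children are validated only if declarations exist -->
              <xs:any processContents="lax"
                      minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def wildcard_modes(schema_text):
    """Return the processContents value of each xs:any, in document order."""
    root = ET.fromstring(schema_text)
    # When the attribute is omitted, XML Schema defaults it to "strict".
    return [w.get("processContents", "strict") for w in root.iter(f"{{{XS}}}any")]


print(wildcard_modes(SCHEMA))
```

A wildcard that omits processContents is reported as strict, which is the default in XML Schema.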
In XML Schema, the schema-validity of an item is captured by three properties:
- [validation attempted]: Did we try to validate this
item? Values: full, none, partial.
- [validity]: Is the item valid? Values:
valid, invalid, notKnown.
- [schema error code]: If the item is not valid,
a list of error codes (references to XML Schema validation rules) explaining
why.
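One way to picture the three properties is as a small record attached to each assessed item. The sketch below is a minimal model under stated assumptions: the class and function names are invented here, and the rule used by one_bit to collapse the record into a DTD-style yes/no answer is only one possible policy that an application might choose, not something XML Schema prescribes.

```python
from dataclasses import dataclass, field
from enum import Enum


class ValidationAttempted(Enum):
    FULL = "full"
    NONE = "none"
    PARTIAL = "partial"


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_KNOWN = "notKnown"


@dataclass
class Assessment:
    """Hypothetical record holding the three properties for one item."""
    validation_attempted: ValidationAttempted
    validity: Validity
    schema_error_codes: list = field(default_factory=list)


def one_bit(a):
    """Collapse an assessment to yes/no (an assumed, strict policy)."""
    return (a.validation_attempted is ValidationAttempted.FULL
            and a.validity is Validity.VALID)


# An item matched by a 'skip' wildcard: nothing was attempted or learned.
skipped = Assessment(ValidationAttempted.NONE, Validity.NOT_KNOWN)

# An invalid item; the error code shown is illustrative only.
broken = Assessment(ValidationAttempted.FULL, Validity.INVALID,
                    ["cvc-complex-type.2.4"])

fine = Assessment(ValidationAttempted.FULL, Validity.VALID)

print(one_bit(skipped), one_bit(broken), one_bit(fine))
```

Both skipped and broken collapse to the same "no", although they describe quite different situations; that lost distinction is the extra information the three properties preserve.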
[P.S. 3 February 2004: The stylesheet and shell scripts I use to
generate these results are now linked from my People page.]
Thanks to Michael Hahn (Document Management Solutions, Inc.) for
suggesting the Ogden Nash poem as a useful example.
Thanks to the XML 2003 poster chair Kate Hamilton of Mulberry Technologies
for encouragement and advice.
$Id: poster1.html,v 1.1 2004/02/03 16:39:27 cmsmcq Exp $