An XML Schema validator in logic-grammar form

A working paper prepared for the W3C XML Schema Working Group

C. M. Sperberg-McQueen

17 October 2002

N.B. not complete: work in progress



This document describes a translation of the XML Schema 1.0 specification into logic-grammar form. A Prolog interpreter can interpret the logic grammar in such a way as to provide either a schema compiler or a schema interpreter.
In its current state, this paper is unfinished. Some highlighted notes about what needs to be done to complete the paper are included.

1. Introduction

Outline of context, overview of paper

2. DCTGs and schema components

Translating schema documents into schema components (nodes with properties).
A processor may also accept schemas written in the XML transfer syntax; to make our processor do this, we need to meet the two obligations imposed on such processors: in addition to the core conformance requirements outlined above, they must
  • implement (enforce?) the Schema Representation Constraints (see Appendix C.3 and subsection 3 of each component's section). Proof obligation: provide routines which check the input schema documents against these constraints, and prove them correct.
  • follow the rules for mapping from schema documents to schema components. Proof obligation: provide routines which translate from the Prolog representation of an XML schema document into some representation of components; if this representation differs from that described in [Sperberg-McQueen 2003b], provide further routines which map from this initial representation to the working representation described above.

2.1. Attribute Declarations

In handling attribute declarations, we need to check the following:
  • SRC: Attribute declaration representation OK
  • * VR: Attribute locally valid
  • * VR: Schema-validity assessment (Attribute)
  • * SISC: Assessment outcome (Attribute)
  • SISC: Validation failure (Attribute)
  • SISC: Attribute declaration
  • SISC: Attribute validated by type
  • SCC: Attribute declaration properties correct
  • SCC: xmlns not allowed
  • SCC: xsi not allowed

2.2. Element Declarations

In handling element declarations, we need to check the following:
  • SRC: Element declaration representation OK
  • * VR: Element locally valid (Element)
  • * VR: Element locally valid (Type)
  • VR: Validation root valid (ID/IDREF)
  • * VR: Schema-validity assessment (Element)
  • SISC: Assessment outcome (Element)
  • SISC: Validation failure (Element)
  • SISC: Element declaration (Element)
  • SISC: Element validated by type
  • SISC: Element default value
  • SCC: Element declaration properties correct
  • SCC: Element default valid (immediate)
  • SCC: Substitution group OK (transitive)
  • SCC: Substitution group

2.3. Complex Type Definitions

In handling complex type definitions, we need to check the following:
  • SRC: Complex type definition representation OK
  • * VR: Element locally valid (Complex type)
  • SISC: Attribute default value
  • SCC: Complex type definition properties correct
  • SCC: Derivation valid (extension)
  • SCC: Derivation valid (restriction, complex)
  • SCC: Type derivation OK (Complex)

2.4. Attribute Uses

In handling attribute uses, we need to check the following:
  • * VR: Attribute locally valid (use)
  • SCC: Attribute use correct

2.5. Attribute Group Definitions

In handling attribute group definitions, we need to check the following:
  • SRC: Attribute group definition representation OK
  • SCC: Attribute group definition properties correct

2.6. Model Group Definitions

In handling model group definitions, we need to check the following:
  • SRC: Model group definition representation OK
  • SCC: Model group definition properties correct

2.7. Model Groups

In handling model groups, we need to check the following:
  • SRC: Model group representation OK
  • * VR: Element sequence valid
  • SCC: Model group correct
  • SCC: All group limited
  • SCC: Element declarations consistent
  • SCC: Unique particle attribution
  • SCC: Effective total range (all and sequences)
  • SCC: Effective total range (choice)

2.8. Particles

In handling particles, we need to check the following:
  • * VR: Element sequence locally valid (particle)
  • SCC: Particle correct
  • SCC: Particle valid (extension)
  • SCC: Particle valid (restriction)
  • SCC: Occurrence range OK
  • SCC: Particle restriction OK (Elt:Elt — Name and type OK)
  • SCC: Particle derivation OK (Elt:Any — NSCompat)
  • SCC: Particle derivation OK (Elt:All/Choice/Sequence — RecurseAsIfGroup)
  • SCC: Particle derivation OK (Any:Any — NSSubset)
  • SCC: Particle derivation OK (All/Choice/Sequence:Any — NSRecurseCheckCardinality)
  • SCC: Particle derivation OK (All:All, Sequence:Sequence — Recurse)
  • SCC: Particle derivation OK (Choice:Choice — RecurseLax)
  • SCC: Particle derivation OK (Sequence:All — RecurseUnordered)
  • SCC: Particle derivation OK (Sequence:Choice — MapAndSum)
  • SCC: Particle emptiable

2.9. Wildcards

In handling wildcards, we need to check the following:
  • SRC: Wildcard representation OK
  • VR: Item valid (Wildcard)
  • VR: Wildcard allows namespace name
  • SCC: Wildcard properties correct
  • SCC: Wildcard subset
  • SCC: Attribute wildcard union
  • SCC: Attribute wildcard intersection

2.10. Identity-constraint Definitions

In handling identity-constraint definitions, we need to check the following:
  • SRC: Identity-constraint definition representation OK
  • VR: Identity-constraint satisfied
  • SISC: Identity-constraint table
  • SCC: Identity-constraint definition properties correct
  • SCC: Selector value OK
  • SCC: Fields value OK

2.11. Notation Declarations

In handling notation declarations, we need to check the following:
  • SRC: Notation definition representation OK
  • SISC: Validated with notation
  • SCC: Notation declaration correct

2.12. Annotations

In handling annotations, we need to check the following:
  • SRC: Annotation definition representation OK
  • SCC: Annotation correct

2.13. Simple Type Definitions

In handling simple type definitions, we need to check the following:
  • SRC: Simple type definition representation OK
  • SRC: Simple type restriction (Facets)
  • * VR: String valid
  • SCC: Simple type definition properties correct
  • SCC: Derivation valid (Restriction, simple)
  • SCC: Type derivation OK (Simple)

2.14. Schemas as a Whole

In handling schemas as a whole, we need to check the following:
  • SRC: QName interpretation
  • SRC: QName resolution (Schema document)
  • * VR: QName resolution (Instance)
  • SISC: Schema information
  • SISC: ID/IDREF table
  • SCC: Schema properties correct

2.15. Schema composition

When allowing for schema composition, we need to check the following:
  • SRC: Inclusion constraints and semantics
  • SRC: Redefinition constraints and semantics
  • SRC: Individual component redefinition
  • SRC: Import constraints and semantics
  • SRC: Schema document location strategy

3. A two-level system

We can combine the two applications of DCTGs we have just outlined to create a two-level system:
  • At one level, a DCTG for schema documents in XML Schema 1.0 notation parses the documents and builds a schema, that is, a set of components, represented as Prolog structures in the format defined by DCTG.
  • This set of components can be translated into a DCTG for performing schema-validity assessment against documents, at the second level.
The only missing piece is the translation of schema components (in the format described in the preceding section) into a definite-clause translation grammar (of the form described in the paper “A logic grammar representation for XML Schema” [Sperberg-McQueen 2003b]).
Translating schema components into DCTGs

4. Schema representation constraints (SRCs) not yet placed

[ Attribute Declaration Representation OK
In addition to the conditions imposed on <attribute> element information items by the schema for schemas, all of the following must be true:
  • default and fixed must not both be present.
  • If default and use are both present, use must have the actual value optional.
  • If the item's parent is not <schema>, then all of the following must be true:
    • One of ref or name must be present, but not both.
    • If ref is present, then all of <simpleType>, form and type must be absent.
  • type and <seg><simpleType></seg> must not both be present.
  • The corresponding attribute declaration must satisfy the conditions set out in Constraints on Attribute Declaration Schema Components.
]
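The first five clauses of this SRC are local checks on a single <attribute> item, so they can be expressed as one function. The Python sketch below is illustrative only and assumes a hypothetical Elem tuple that loosely mirrors the element(Name, Attributes, Content) terms produced by the SWI-Prolog SGML/XML parser [Wielemaker 2001]. The function name, the error strings, and the decision to skip both the schema-for-schemas checks and the final clause (the component constraints) are all assumptions made for the sketch.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    """Hypothetical stand-in for an element(Name, Attributes, Content) term."""
    name: str        # local name, e.g. "attribute"
    attrs: dict      # attribute name -> actual value
    children: list   # child Elem items; text nodes are ignored here


def attribute_decl_repr_errors(item: Elem, parent_name: str) -> list:
    """List the violated clauses of 'Attribute declaration representation OK'.

    Checks against the schema for schemas and against the component
    constraints (the last clause) are deliberately left out.
    """
    a = item.attrs
    kids = {c.name for c in item.children}
    errors = []
    if "default" in a and "fixed" in a:
        errors.append("default and fixed both present")
    if "default" in a and "use" in a and a["use"] != "optional":
        errors.append("default present but use is not optional")
    if parent_name != "schema":
        if ("ref" in a) == ("name" in a):
            errors.append("exactly one of ref or name required")
        if "ref" in a and ("simpleType" in kids or "form" in a or "type" in a):
            errors.append("ref present with simpleType, form or type")
    if "type" in a and "simpleType" in kids:
        errors.append("type and simpleType both present")
    return errors


# A local attribute that both refers to a global declaration and names a type.
bad = Elem("attribute", {"ref": "xml:lang", "type": "xs:string"}, [])
print(attribute_decl_repr_errors(bad, "complexType"))
# ['ref present with simpleType, form or type']
```

The function collects every violation rather than stopping at the first one. That choice fits the proof obligation in section 2, where each clause should correspond to an identifiable check.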
[ Element Declaration Representation OK
In addition to the conditions imposed on <element> element information items by the schema for schemas, all of the following must be true:
  • default and fixed must not both be present.
  • If the item's parent is not <schema>, then all of the following must be true:
    • One of ref or name must be present, but not both.
    • If ref is present, then all of <complexType>, <simpleType>, <key>, <keyref>, <unique>, nillable, default, fixed, form, block and type must be absent, i.e. only minOccurs, maxOccurs and id are allowed in addition to ref, along with <annotation>.
  • type and either <simpleType> or <complexType> are mutually exclusive.
  • The corresponding particle and/or element declarations must satisfy the conditions set out in Constraints on Element Declaration Schema Components and Constraints on Particle Schema Components.
]
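The element version of the check differs from the attribute version mainly in the longer list of items that must be absent when ref is present. The Python sketch below is illustrative and encodes that list directly. It uses the explicit enumeration from the clause instead of the "i.e." paraphrase, so attributes from other namespaces are not flagged. The Elem tuple, the constant names, and the error strings are hypothetical, and the final clause on component constraints is omitted.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    """Hypothetical stand-in for an element(Name, Attributes, Content) term."""
    name: str
    attrs: dict
    children: list


# Attributes and children that must be absent when a local <element> uses ref.
FORBIDDEN_ATTRS_WITH_REF = {"nillable", "default", "fixed", "form", "block", "type"}
FORBIDDEN_KIDS_WITH_REF = {"complexType", "simpleType", "key", "keyref", "unique"}


def element_decl_repr_errors(item: Elem, parent_name: str) -> list:
    """List the violated clauses of 'Element declaration representation OK'."""
    a = item.attrs
    kids = {c.name for c in item.children}
    errors = []
    if "default" in a and "fixed" in a:
        errors.append("default and fixed both present")
    if parent_name != "schema":
        if ("ref" in a) == ("name" in a):
            errors.append("exactly one of ref or name required")
        if "ref" in a:
            offending = sorted((set(a) & FORBIDDEN_ATTRS_WITH_REF)
                               | (kids & FORBIDDEN_KIDS_WITH_REF))
            if offending:
                errors.append("ref present with " + ", ".join(offending))
    if "type" in a and kids & {"simpleType", "complexType"}:
        errors.append("type and an anonymous type definition both present")
    return errors
```

For example, a local `<element ref="po:comment" minOccurs="0" nillable="true"/>` yields a single complaint about nillable.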
[ Complex Type Definition Representation OK
In addition to the conditions imposed on <complexType> element information items by the schema for schemas, all of the following must be true:
]
[ Attribute Group Definition Representation OK
In addition to the conditions imposed on <attributeGroup> element information items by the schema for schemas, all of the following must be true:
]
[ Model Group Definition Representation OK
In addition to the conditions imposed on <group> element information items by the schema for schemas, the corresponding model group definition, if any, must satisfy the conditions set out in Constraints on Model Group Schema Components.
]
[ Model Group Representation OK
In addition to the conditions imposed on <all>, <choice> and <sequence> element information items by the schema for schemas, the corresponding particle and model group must satisfy the conditions set out in Constraints on Model Group Schema Components and Constraints on Particle Schema Components.
]
[ Wildcard Representation OK
In addition to the conditions imposed on <any> element information items by the schema for schemas, the corresponding particle and model group must satisfy the conditions set out in Constraints on Model Group Schema Components and Constraints on Particle Schema Components.
]
[ Identity-constraint Definition Representation OK
In addition to the conditions imposed on <key>, <keyref> and <unique> element information items by the schema for schemas, the corresponding identity-constraint definition must satisfy the conditions set out in Constraints on Identity-constraint Definition Schema Components.
]
[ Notation Definition Representation OK
In addition to the conditions imposed on <notation> element information items by the schema for schemas, the corresponding notation definition must satisfy the conditions set out in Constraints on Notation Declaration Schema Components.
]
[ Annotation Definition Representation OK
In addition to the conditions imposed on <annotation> element information items by the schema for schemas, the corresponding annotation must satisfy the conditions set out in Constraints on Annotation Schema Components.
]
[ Simple Type Definition Representation OK
In addition to the conditions imposed on <simpleType> element information items by the schema for schemas, all of the following must be true:
  • The corresponding simple type definition, if any, must satisfy the conditions set out in Constraints on Simple Type Definition Schema Components.
  • If the <restriction> alternative is chosen, either it must have a base attribute or a <simpleType> among its children, but not both.
  • If the <list> alternative is chosen, either it must have an itemType attribute or a <simpleType> among its children, but not both.
  • Circular union type definition is disallowed. That is, if the <union> alternative is chosen, there must not be any entries in the memberTypes attribute at any depth which resolve to the component corresponding to the <simpleType>.
]
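The ban on circular union definitions is a reachability test over the memberTypes graph: starting from a union type, follow memberTypes at any depth and see whether you arrive back at the starting type. The Python sketch below assumes that QNames have already been resolved and that the schema's unions have been flattened into a hypothetical dictionary from type name to member type names. It follows only memberTypes edges, as the clause does, and the seen set keeps it from looping on cycles that do not pass through the starting type.

```python
def union_is_circular(type_name: str, member_types: dict) -> bool:
    """True if type_name is reachable from its own memberTypes at any depth.

    member_types maps each (already resolved) simple type name to the list of
    names in its memberTypes attribute; non-union types may be omitted.
    """
    seen = set()
    pending = list(member_types.get(type_name, []))
    while pending:
        current = pending.pop()
        if current == type_name:
            return True
        if current not in seen:
            seen.add(current)
            pending.extend(member_types.get(current, []))
    return False


defs = {"A": ["xs:int", "B"], "B": ["xs:string", "A"], "C": ["xs:int", "xs:date"]}
print(union_is_circular("A", defs), union_is_circular("C", defs))  # True False
```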
[ Simple Type Restriction (Facets)
For a simple type definition (call it R) to restrict another simple type definition (call it B) with a set of facets (call this S), all of the following must be true:
If above holds, the {facets} of R constitute a restriction of the {facets} of B with respect to S.
]
[ QName Interpretation
Where the type of an attribute information item in a document involved in validation is identified as QName, its actual value is composed of a local name and a namespace name. Its actual value is determined based on its normalized value and the containing element information item's in-scope namespaces following [W3C 1999]:
The appropriate case among the following must be true:
In the absence of the in-scope namespaces property in the infoset for the schema document in question, processors must reconstruct equivalent information as necessary, using the namespace attributes of the containing element information item and its ancestors.
]
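A QName value is interpreted against the in-scope namespaces, and when the infoset lacks that property those namespaces have to be rebuilt from namespace attributes on the element and its ancestors. The Python sketch below shows both steps under the usual Namespaces in XML rules [W3C 1999]: the xml prefix is always bound, inner declarations override outer ones, and xmlns="" undeclares the default namespace. An unprefixed value takes the default namespace if one is in scope. The function names and the use of None as the key for the default namespace are illustrative assumptions, and the lexical form of the QName is not checked.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(attr_chain: list) -> dict:
    """Rebuild in-scope namespaces from namespace attributes.

    attr_chain holds the attribute dicts of the outermost ancestor first and of
    the containing element last. Returns prefix -> namespace name, using the
    key None for the default namespace.
    """
    scope = {"xml": XML_NS}  # the xml prefix is always bound
    for attrs in attr_chain:
        for name, value in attrs.items():
            if name == "xmlns":
                if value:
                    scope[None] = value
                else:
                    scope.pop(None, None)  # xmlns="" undeclares the default namespace
            elif name.startswith("xmlns:"):
                scope[name[len("xmlns:"):]] = value
    return scope


def interpret_qname(normalized_value: str, scope: dict) -> tuple:
    """Return (namespace name or None, local name) for a QName-typed value."""
    prefix, colon, local = normalized_value.partition(":")
    if not colon:
        # Unprefixed: the default namespace applies, if there is one.
        return scope.get(None), prefix
    if prefix not in scope:
        raise ValueError(f"prefix {prefix!r} is not in scope")
    return scope[prefix], local
```

An undeclared prefix raises an error rather than yielding an absent namespace name, because Namespaces in XML treats an unbound prefix as an error.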
[ QName resolution (Schema Document)
For a QName to resolve to a schema component of a specified kind, all of the following must be true:
]
[ Inclusion Constraints and Semantics
In addition to the conditions imposed on <include> element information items by the schema for schemas, all of the following must be true:
  • If the actual value of the schemaLocation attribute successfully resolves, one of the following must be true:
    • It resolves to (a fragment of) a resource which is an XML document (of type application/xml or text/xml with an XML declaration for preference, but this is not required), which in turn corresponds to a <schema> element information item in a well-formed information set, which in turn corresponds to a valid schema.
    • It resolves to a <schema> element information item in a well-formed information set, which in turn corresponds to a valid schema.
    In either case call the <include>d <schema> item SII, the valid schema I and the <include>ing item's parent <schema> item SII’.
  • One of the following must be true:
  • The appropriate case among the following must be true:
    • or above is satisfied
      the schema corresponding to SII’ must include not only definitions or declarations corresponding to the appropriate members of its own children, but also components identical to all the schema components of I.
    • above is satisfied
      the schema corresponding to the <include>d item's parent <schema> must include not only definitions or declarations corresponding to the appropriate members of its own children, but also components identical to all the schema components of I, except that anywhere the absent target namespace name would have appeared, the actual value of the targetNamespace attribute of SII’ is used. In particular, it replaces absent in the following places:
      1. The target namespace of named schema components, both at the top level and (in the case of nested type definitions and nested attribute and element declarations whose code was qualified) nested within definitions;
      2. The {namespace constraint} of a wildcard, whether negated or not;
It is not an error for the actual value of the schemaLocation attribute to fail to resolve at all, in which case no corresponding inclusion is performed. It is an error for it to resolve but the rest of clause 1 above to fail to be satisfied. Failure to resolve may well cause less than complete assessment outcomes, of course.
[ As discussed in Missing Sub-components, QNames in XML representations may fail to resolve, rendering components incomplete and unusable because of missing subcomponents. During schema construction, implementations are likely to retain QName values for such references, in case subsequent processing provides a referent. Absent target namespace names of such as-yet unresolved reference QNames in <include>d components should also be converted if is satisfied. ]
]
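When an included document has no targetNamespace, the components copied from it take on the includer's namespace. The Python sketch below implements the two replacement sites named above over a hypothetical dictionary encoding of components. It assumes a boolean "qualified" key that marks components which carry the target namespace (top-level components, nested type definitions, qualified local declarations), so that unqualified local declarations keep an absent namespace. A wildcard's {namespace constraint} is encoded as "any", ("not", ns), or a frozenset, with None standing for absent. Converting unresolved reference QNames, as the note above suggests, is not covered.

```python
from copy import deepcopy

ABSENT = None  # stands for an absent namespace name


def convert_constraint(constraint, target_ns):
    """Replace absent inside a wildcard {namespace constraint}.

    A constraint is "any", ("not", ns) or a frozenset of namespace names.
    """
    if constraint == "any":
        return constraint
    if isinstance(constraint, tuple) and constraint[0] == "not":
        negated = constraint[1]
        return ("not", target_ns if negated is ABSENT else negated)
    return frozenset(target_ns if ns is ABSENT else ns for ns in constraint)


def chameleon_include(component: dict, target_ns: str) -> dict:
    """Copy a component tree from a no-namespace schema document into target_ns."""
    result = deepcopy(component)  # the components of I themselves stay unchanged
    pending = [result]
    while pending:
        comp = pending.pop()
        # 'qualified' marks components that carry the target namespace:
        # top-level ones, nested type definitions, qualified local declarations.
        if comp.get("qualified") and comp.get("target_namespace") is ABSENT:
            comp["target_namespace"] = target_ns
        if "namespace_constraint" in comp:
            comp["namespace_constraint"] = convert_constraint(
                comp["namespace_constraint"], target_ns)
        pending.extend(comp.get("children", []))
    return result
```

Because the function works on a deep copy, the same included document can be pulled into schemas with different target namespaces without any interference between them.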
[ Redefinition Constraints and Semantics
In addition to the conditions imposed on <redefine> element information items by the schema for schemas, all of the following must be true:
  • If there are any element information items among the children other than <annotation>, then the actual value of the schemaLocation attribute must successfully resolve.
  • If the actual value of the schemaLocation attribute successfully resolves, one of the following must be true:
    • It resolves to (a fragment of) a resource which is an XML document (see ), which in turn corresponds to a <schema> element information item in a well-formed information set, which in turn corresponds to a valid schema.
    • It resolves to a <schema> element information item in a well-formed information set, which in turn corresponds to a valid schema.
    In either case call the <redefine>d <schema> item SII, the valid schema I and the <redefine>ing item's parent <schema> item SII’.
  • One of the following must be true:
  • The appropriate case among the following must be true:
  • Within the children, each <simpleType> must have a <restriction> among its children and each <complexType> must have a restriction or extension among its grandchildren, the actual value of whose base attribute must be the same as the actual value of its own name attribute plus target namespace;
  • Within the children, for each <group> the appropriate case among the following must be true:
  • Within the children, for each <attributeGroup> the appropriate case among the following must be true:
    [ An attribute group restrictively redefined per corresponds to an attribute group whose {attribute uses} consist of all and only those attribute uses corresponding to <attribute>s explicitly present among the children of the <redefine>ing <attributeGroup>. No inheritance from the <redefine>d attribute group occurs. Its {attribute wildcard} is similarly based purely on an explicit <anyAttribute>, if present. ]
]
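The clause requiring a redefined type to derive from itself can be checked with local information plus a QName resolver. The Python sketch below reads the base requirement as applying to both <simpleType> and <complexType> children, which is an interpretation of the clause's wording. It also assumes a hypothetical caller-supplied resolve function that maps a base value to a (namespace name, local name) pair, for instance one built on the QName Interpretation rules. As elsewhere, Elem is a hypothetical stand-in for the parser's element terms.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Elem(NamedTuple):
    """Hypothetical stand-in for an element(Name, Attributes, Content) term."""
    name: str
    attrs: dict
    children: list


ExpandedName = Tuple[Optional[str], str]  # (namespace name or None, local name)


def redefined_type_ok(item: Elem, target_ns: Optional[str],
                      resolve: Callable[[str], ExpandedName]) -> bool:
    """Does a <simpleType>/<complexType> child of <redefine> derive from itself?"""
    own_name = (target_ns, item.attrs["name"])
    if item.name == "simpleType":
        derivations = [c for c in item.children if c.name == "restriction"]
    elif item.name == "complexType":
        # restriction/extension sit one level down, under simpleContent/complexContent
        derivations = [g for c in item.children for g in c.children
                       if g.name in ("restriction", "extension")]
    else:
        raise ValueError(f"unexpected child of <redefine>: {item.name}")
    return any("base" in d.attrs and resolve(d.attrs["base"]) == own_name
               for d in derivations)
```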
[ Individual Component Redefinition
Corresponding to each non-<annotation> member of the children of a <redefine> there are one or two schema components in the <redefine>ing schema:
  1. The <simpleType> and <complexType> children information items each correspond to two components:
    1.1. One component which corresponds to the top-level definition item with the same name in the <redefine>d schema document, as defined in Schema Component Details, except that its name is absent;
    1.2. One component which corresponds to the information item itself, as defined in Schema Component Details, except that its base type definition is the component defined in 1.1 above.
    This pairing ensures the coherence constraints on type definitions are respected, while at the same time achieving the desired effect, namely that references to names of redefined components in both the <redefine>ing and <redefine>d schema documents resolve to the redefined component as specified in 1.2 above.
  2. The <group> and <attributeGroup> children each correspond to a single component, as defined in Schema Component Details, except that if and when a self-reference based on a ref attribute whose actual value is the same as the item's name plus target namespace is resolved, a component which corresponds to the top-level definition item of that name and the appropriate kind in I is used.
In all cases there must be a top-level definition item of the appropriate name and kind in the <redefine>d schema document.
]
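The two-component pairing for redefined types can be pictured with a drastically reduced type-definition record. In the Python sketch below, the TypeDef dataclass, its source field, and the functions redefine_type and install are all hypothetical illustrations and not part of any existing API. The sketch renames the original definition to an absent name (1.1), makes it the base of the redefining definition (1.2), and binds the shared expanded name to the 1.2 component. Any reference resolved through that binding, from either schema document, therefore reaches the redefined component.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TypeDef:
    """Hypothetical, much-reduced type definition component."""
    name: Optional[str]                # None stands for an absent {name}
    target_namespace: Optional[str]
    base: Optional["TypeDef"]          # {base type definition}
    source: str                        # schema document it came from (for illustration)


def redefine_type(original: TypeDef, redefining: TypeDef):
    """Return the pair of components for a redefined type (clauses 1.1 and 1.2)."""
    renamed = replace(original, name=None)         # 1.1: original, name made absent
    redefined = replace(redefining, base=renamed)  # 1.2: new definition based on 1.1
    return renamed, redefined


def install(symbols: dict, original: TypeDef, redefining: TypeDef) -> dict:
    """Bind the type's expanded name to the redefined component.

    References from both the redefining and the redefined documents that go
    through this table then reach the component from 1.2.
    """
    _, redefined = redefine_type(original, redefining)
    return {**symbols, (redefined.target_namespace, redefined.name): redefined}
```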
[ Import Constraints and Semantics
In addition to the conditions imposed on <import> element information items by the schema for schemas, all of the following must be true:
  • The appropriate case among the following must be true:
  • If the application schema reference strategy, using the actual values of the schemaLocation and namespace attributes, provides a referent, as defined by Schema Document Location Strategy, one of the following must be true:
    • The referent is (a fragment of) a resource which is an XML document (see ), which in turn corresponds to a <schema> element information item in a well-formed information set, which in turn corresponds to a valid schema.
    • The referent is a <schema> element information item in a well-formed information set, which in turn corresponds to a valid schema.
    In either case call the <schema> item SII and the valid schema I.
  • The appropriate case among the following must be true:
It is not an error for the application schema reference strategy to fail. It is an error for it to resolve but the rest of above to fail to be satisfied. Failure to find a referent may well cause less than complete assessment outcomes, of course.
The schema components (that is {type definitions}, {attribute declarations}, {element declarations}, {attribute group definitions}, {model group definitions}, {notation declarations}) of a schema corresponding to a <schema> element information item with one or more <import> element information items must include not only definitions or declarations corresponding to the appropriate members of its children, but also, for each of those <import> element information items for which above is satisfied, a set of schema components identical to all the schema components of I.
]
[ Schema Document Location Strategy
Given a namespace name (or none) and (optionally) a URI reference from xsi:schemaLocation or xsi:noNamespaceSchemaLocation, schema-aware processors may implement any combination of the following strategies, in any order:
  1. Do nothing, for instance because a schema containing components for the given namespace name is already known to be available, or because it is known in advance that no efforts to locate schema documents will be successful (for example in embedded systems);
  2. Based on the location URI, identify an existing schema document, either as a resource which is an XML document or a <schema> element information item, in some local schema repository;
  3. Based on the namespace name, identify an existing schema document, either as a resource which is an XML document or a <schema> element information item, in some local schema repository;
  4. Attempt to resolve the location URI, to locate a resource on the web which is or contains or references a <schema> element;
  5. Attempt to resolve the namespace name to locate such a resource.
Whenever possible, configuration and/or invocation options for selecting and/or ordering the implemented strategies should be provided.
]
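Because processors may implement any combination of these strategies in any order, a natural design is an ordered list of interchangeable lookup functions. The Python sketch below covers strategies 1 to 3 only. An empty list does nothing, and the two local-repository lookups are backed by plain dictionaries. Strategies 4 and 5 would need network access and are left out. The names Strategy, by_location, by_namespace, and locate are hypothetical, and a "schema document" here is just a string.

```python
from typing import Callable, Iterable, Optional

# A strategy maps (namespace name or None, location URI or None) to a schema
# document (here just its text) or None when it finds nothing.
Strategy = Callable[[Optional[str], Optional[str]], Optional[str]]


def by_location(repository: dict) -> Strategy:
    """Strategy 2: look the location URI up in a local repository."""
    return lambda ns, loc: repository.get(loc) if loc is not None else None


def by_namespace(repository: dict) -> Strategy:
    """Strategy 3: look the namespace name up in a local repository."""
    return lambda ns, loc: repository.get(ns)


def locate(ns: Optional[str], loc: Optional[str],
           strategies: Iterable[Strategy]) -> Optional[str]:
    """Try the configured strategies in order and return the first referent.

    An empty strategy list corresponds to strategy 1 (do nothing). Returning
    None is not an error: assessment simply proceeds with what it has.
    """
    for strategy in strategies:
        found = strategy(ns, loc)
        if found is not None:
            return found
    return None
```

The order of the list is what a configuration or invocation option would control. Putting the same strategies in a different order changes which referent wins when both repositories can answer.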

A. Works cited and further reading

Abramson, Harvey. 1984. “Definite Clause Translation Grammars”. Proceedings of the 1984 International Symposium on Logic Programming, Atlantic City, New Jersey, February 6-9, 1984, pp. 233-240. (IEEE-CS 1984, ISBN 0-8186-0522-7)

Abramson, Harvey, and Veronica Dahl. 1989. Logic Grammars. Symbolic Computation AI Series. Springer-Verlag, 1989.

Abramson, Harvey, and Veronica Dahl, rev. Jocelyn Paine. 1990. DCTG: Prolog definite clause translation grammar translator. (Prolog code for translating from DCTG notation to standard Prolog. A note in the code says the syntax was extended slightly by Jocelyn Paine to accept && between specifications of grammatical attributes, to minimize the need for parentheses. Available from numerous AI/NLP software repositories, including http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/prolog/code/syntax/dctg/0.html, http://www.ims.uni-stuttgart.de/ftp/pub/languages/prolog/libraries/imperial_college/dctg.tar.gz, and http://www.ifs.org.uk/~popx/prolog/dctg/.)

Alblas, Henk. 1991. “Introduction to attribute grammars”. Attribute grammars, applications and systems: International Summer School SAGA, Prague, Czechoslovakia, June 4-13, 1991, Proceedings, pp. 1-15. Berlin: Springer, 1991. Lecture Notes in Computer Science, 545.

Bratko, Ivan. 1990. Prolog programming for artificial intelligence. Second edition. Wokingham: Addison-Wesley. xxi, 597 pp.

Brown, Allen L., Jr., and Howard A. Blair. 1990. “A logic grammar foundation for document representation and layout”. In EP90: Proceedings of the International Conference on Electronic Publishing, Document Manipulation and Typography, ed. Richard Furuta. Cambridge: Cambridge University Press, 1990, pp. 47-64.

Brown, Allen L., Jr., Toshiro Wakayama, and Howard A. Blair. 1992. “A reconstruction of context-dependent document processing in SGML”. In EP92: Proceedings of Electronic Publishing, 1992, ed. C. Vanoirbeek and G. Coray. Cambridge: Cambridge University Press, 1992. Pages 1-25.

Brüggemann-Klein, Anne. 1993. Formal models in document processing. Habilitationsschrift, Freiburg i.Br., 1993. 110 pp. Available at ftp://ftp.informatik.uni-freiburg.de/documents/papers/brueggem/habil.ps (Cover pages archival copy also at http://www.oasis-open.org/cover/bruggDissert-ps.gz).

[Brüggemann-Klein provides a formal definition of 1-unambiguity, which corresponds to the notion of unambiguity in ISO 8879 and determinism in XML 1.0. Her definition of 1-unambiguity can be used to check XML Schema's Unique Particle Attribution constraint by changing every minOccurs and maxOccurs value greater than 1 to 1 if the two are equal, and otherwise changing a minOccurs greater than 1 to 1 and a maxOccurs greater than 1 to unbounded.]
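The occurrence-range rewrite described in this note is small enough to state as code. The Python sketch below applies the rule as the note gives it to a single (minOccurs, maxOccurs) pair. It does not test 1-unambiguity, and it does not assert that the rewrite preserves the Unique Particle Attribution verdict in every case. The function name and the string "unbounded" used as the maxOccurs sentinel are illustrative assumptions.

```python
UNBOUNDED = "unbounded"


def normalize_occurs(min_occurs: int, max_occurs):
    """Rewrite one occurrence range as the note describes.

    max_occurs is an int or the string "unbounded".
    """
    if min_occurs == max_occurs and min_occurs > 1:
        return 1, 1                      # equal and greater than 1: both become 1
    new_min = 1 if min_occurs > 1 else min_occurs
    if max_occurs == UNBOUNDED or max_occurs > 1:
        new_max = UNBOUNDED              # short-circuit avoids comparing str with int
    else:
        new_max = max_occurs
    return new_min, new_max
```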

Clocksin, W. F., and C. S. Mellish. 1984. Programming in Prolog. Second edition. Berlin: Springer, 1984.

Gal, Annie, Guy Lapalme, Patrick Saint-Dizier, and Harold Somers. 1991. Prolog for natural language processing. Chichester: Wiley, 1991. xiii, 306 pp.

Gazdar, Gerald, and Chris Mellish. 1989. Natural language processing in PROLOG: An introduction to computational linguistics. Wokingham: Addison-Wesley, 1989. xv, 504 pp.

Grune, Dick, and Ceriel J. H. Jacobs. 1990. Parsing techniques: a practical guide. New York, London: Ellis Horwood, 1990. Postscript of the book is available from the first author's Web site at http://www.cs.vu.nl/~dick/PTAPG.html

Knuth, D. E. 1968. “Semantics of context-free languages”. Mathematical Systems Theory 2: 127-145.

König, Esther, and Roland Seiffert. 1989. Grundkurs PROLOG für Linguisten. Tübingen: Francke, 1989. [= Uni-Taschenbücher 1525]

Sperberg-McQueen, C. M. 2003a. “Notes on logic grammars and XML Schema”. Working paper prepared for the W3C XML Schema Working Group. [Incomplete; current draft is at dctgnotes.html. Introduction to logic grammar notation, illustrative translation of purchase-order schema into logic grammar form.]

Sperberg-McQueen, C. M. 2003b. “A logic grammar representation for XML Schema”. Working paper prepared for the W3C XML Schema Working Group. [Incomplete; incomplete outline is at lgrxs.html. Attempt at a systematic demonstration that DCTGs (can be made to) correctly implement the validation rules, provide the schema-infoset contributions, and obey the constraints on schemas defined in and .]

Stepney, Susan. High-integrity compilation. Prentice-Hall. Available from http://www-users.cs.york.ac.uk/~susan/bib/ss/hic/index.htm. Chapter 3 (Using Prolog) provides a terse introduction to DCTG notation and use.

[W3C 1999] W3C. 1999. Namespaces in XML, ed. Tim Bray et al. W3C Recommendation 14 January 1999. See http://www.w3.org/TR/1999/REC-xml-names-19990114/

[W3C 2001a] W3C. 2001. XML Schema Part 0: Primer, ed. David Fallside. W3C Recommendation 2 May 2001. [Cambridge, Sophia-Antipolis, and Tokyo]: World Wide Web Consortium. http://www.w3.org/TR/xmlschema-0/

[W3C 2001b] W3C. 2001. XML Schema Part 1: Structures, ed. Henry S. Thompson, David Beech, Murray Maloney, and Noah Mendelsohn. W3C Recommendation 2 May 2001. [Cambridge, Sophia-Antipolis, and Tokyo]: World Wide Web Consortium. http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/

[W3C 2001c] W3C. 2001. XML Schema Part 2: Datatypes, ed. Paul V. Biron and Ashok Malhotra. W3C Recommendation 2 May 2001. [Cambridge, Sophia-Antipolis, and Tokyo]: World Wide Web Consortium. http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/

Wielemaker, Jan. 2001. “SWI-Prolog SGML/XML parser: Version 1.0.14, March 2001”. http://www.swi-prolog.org/packages/sgml2pl.html

For more information on the representation of SGML and XML documents as Prolog structures, see the SWI add-ons documentation [Wielemaker 2001]. Other representations are possible; I have used this one because it's convenient and because representing sibling sets in a list makes it easier to use DCGs and DCTGs.
Definite-clause grammars (DCGs) are introduced in almost any good Prolog textbook: e.g. [Clocksin/Mellish 1984], [Bratko 1990]. They are discussed at somewhat greater length in treatments of Prolog for natural-language processing, including [König/Seiffert 1989], [Gazdar/Mellish 1989], and [Gal et al. 1991]. Most extended discussions show how to use additional arguments to record syntactic structure or handle the semantics of the material.
Definite-clause translation grammars were introduced as a way of making it easier to handle semantics; they provide explicit names for attributes (in the sense of attribute grammars [Knuth 1968]).

B. To do

C. Toward a useful layering

The layering proposed above is a start; I'd like to confirm it by walking through a validation episode or two. I'll start with the purchase order from the tutorial. We visit the following validation rules; under each, I note some features I think should probably be omitted from layer 1, and introduced as elaborations later, in some sequence to be determined.
  • Sec. 5.2, rule 3
    • ability of user or application to stipulate a type definition at startup
    • ability of user or application to stipulate an element declaration at startup
    • ability of user or application to stipulate a starting point other than the root of the document
  • sva_element_334
    • use of xsi:type (and with it clause 1.2)
  • qname_resolution_instance_3f4
  • elementlocallyvalid_element_334
    • absent declarations
    • abstract element declarations
    • xsi:nil
    • default values, fixed values (?)
  • element_locally_valid_type_334
    • absent type definitions
    • abstract types
  • string valid (and from there to simple-type checking)
  • element_locally_valid_complextype
    • attribute wildcards
    • wild IDs
  • element sequence locally valid (particle)
    • wildcards
    • ...
  • element sequence valid
  • attribute locally valid (use) 354
  • schema-validity assessment (attribute) 324
  • assessment outcome (attribute) 325