A logic grammar representation for XML Schema

A working paper prepared for the W3C XML Schema Working Group

C. M. Sperberg-McQueen

25 March 2003

N.B. not complete: work in progress

1. Introduction
2. DCTGs as attribute grammars
3. Element-only validation
- 3.1. Overview of validation rules
- 3.2. Starting a validation episode
- 3.3. QName lookup
- 3.4. Assessing the schema-validity of an element
- 3.5. Local validity of elements
- 3.6. Local validity against a complex type
- 3.7. Local validity of element sequences (children)
- 3.8. Element sequence validity
- 3.9. PSVI properties
- 3.10. Error handling and diagnosis
- 3.11. Constraints on schemas
  - 3.11.1. Elements
  - 3.11.2. Complex types
  - 3.11.3. Other
4. Simple types
- 4.1. Validation
- 4.2. SICs
- 4.3. Component structure
5. Handling attributes
- 5.1. Validation
- 5.2. SICs
- 5.3. Constraints on schema components
6. xsi:type
- 6.1. xsi:type
- 6.2. Top-level invocation with type definition
7. Wildcards
8. Substitution groups
9. Derivation relations
10. Miscellaneous
- 10.1. ID/IDREF
- 10.2. Identity constraints
- 10.3. Notations
- 10.4. Annotations
- 10.5. Top-level schema component
- 10.6. Groups
11. Schema information set contributions (SICs) not yet placed
12. Terms and primitive concepts to be defined

A. Works cited and further reading
B. To do
C. Toward a useful layering

This document describes a method of representing XML Schema 1.0 schemas as logic grammars. The term logic grammar denotes grammars written in logic-programming systems; this paper works with definite-clause translation grammars (DCTGs), whose formalism closely resembles that of attribute grammars as described by [Knuth 1968] and later writers.

In its current state, this paper is unfinished. Some highlighted notes about what needs to be done to complete the paper are included.

1. Introduction

Brief review of the context; overview of the paper.

In a separate paper [Sperberg-McQueen 2003a], I have illustrated how logic grammar formalisms can be used to elaborate a document grammar which supports the most important features of XML Schema 1.0. The goal has been to familiarize the reader with the notations and to make plausible the idea that logic grammars can in fact be used to represent schemas.

In this paper, I begin the systematic construction of a logic-grammar representation of XML Schema 1.0, by showing how to represent the Validation Rules of XML Schema in logic-grammar or pure-Prolog form. Following a process of stepwise refinement, I'll start with a core layer of functionality similar to that already shown in [Sperberg-McQueen 2003a], leaving out as many complications and optional features as possible, then add features one by one until all the validation rules have been covered. In the process, I will also pause from time to time to discuss the Constraints on Schemas (COS) rules imposed by XML Schema 1.0 and the conditions under which DCTGs as used here are guaranteed to conform to those rules.

Section 2 describes DCTGs as a kind of attribute grammar and establishes some basic terminology. Section 3 constructs the basic framework of a validator: starting from section 5.2 of [W3C 2001b], which describes how a schema-validation episode may be initiated, the exposition walks through the validation rules one by one, focusing on the validation of elements against content models. Later sections will add support for the other features of XML Schema 1.0.

2. DCTGs as attribute grammars

Following [Alblas 1991], one may characterize an attribute grammar as a 5-tuple AG = (G, SD, AD, R, C):

G = (V_N, V_T, P, Z) is the underlying context-free grammar ...
SD = (TYPE-SET, FUNC-SET) is a semantic domain.
AD = (A, I, S, TYPE) is a description of attributes.
R(p) is a finite set of attribute evaluation rules associated with production p ∈ P.
C(p) is a finite set of semantic conditions associated with production p ∈ P.

A definite-clause translation grammar (DCTG) is, similarly, ...

Complete this description ...

3. Element-only validation

3.1. Overview of validation rules

Overview of the validation rules encountered in validating an element, with dependency graph, to give the user some sense of what lies ahead and how it's structured.

3.2. Starting a validation episode

Section 5.2 of [W3C 2001b] says there are three ways to start a validation episode. In each case, the invoker of the schema processor specifies an information item (an element or an attribute[1]) to be validated. The three variants are:

The invoker specifies a top-level type.[2]
The invoker specifies a top-level declaration for the element [or attribute], against which it should be validated.
The invoker specifies neither.

The formal description in the spec (section 5.2) reads:

With a schema [...], the schema-validity of an element information item can be assessed. Three primary approaches to this are possible:
The user or application identifies a complex type definition from among the {type definitions} of the schema, and appeals to Schema-Validity Assessment (Element) (§3.3.4) (clause 1.2);

The user or application identifies a element declaration from among the {element declarations} of the schema, checks that its {name} and {target namespace} match the [local name] and [namespace name] of the item, and appeals to Schema-Validity Assessment (Element) (§3.3.4) (clause 1.1);

The processor starts from Schema-Validity Assessment (Element) (§3.3.4) with no stipulated declaration or definition, and either strict or lax assessment ensues, depending on whether or not the element information and the schema determine either an element declaration (by name) or a type definition (via xsi:type) or not.

In order to model all three approaches described, we can define three ways to invoke the schema validator:

with an element and a complex or simple type definition
with an element and a top-level element declaration
with an element alone

If we assume that there is only one schema available in the current Prolog workspace, these can be implemented with the following Prolog predicates:

validity_by_type(Element,Type,Result).
validity_by_decl(Element,Elemdecl,Result).
validity(Element,Result).

The last of these forms seeks an xsi:type attribute or an element declaration, and then invokes the appropriate predicate with the type definition or element declaration it finds:

< 1 Validity assessment with lookup (first cut) > ≡

validity(Element,Result) :-
  xsitype_label(Element,Type), !,
  validity_by_type(Element,Type,Result).
validity(Element,Result) :-
  qnamelookup(Element,Decl), !,
  validity_by_decl(Element,Decl,Result).

Continued in < 2 >
This code is not used elsewhere.

If neither a type definition nor an element declaration is found, lax validation is undertaken:

< 2 [continues 1 Validity assessment with lookup (first cut)] > ≡

validity(Element,Result) :-
  validity_lax(Element,Result).

These predicates will also be useful when we recursively validate the attributes and children of an element.

The outcome of the validation will be recorded in the [validation attempted] and [validity] properties of the element we start with (and normally also the same properties on its descendants, except where we hit a black-box or white-box wildcard). I'll show the details of that later. But the PSVI should also record the starting point of the validation episode, which means we have to remember which element we started with.

Note that every element and attribute information item participating in the assessment will also have a [validation context] property which refers back to the element information item at which assessment began. [Definition:] This item, that is the element information item at which assessment began, is called the validation root.

In order to be able to fill in the [validation context] property, the recursive validation predicates need a new argument. So the sketch given above should be replaced with a new one:

< 3 Validity assessment with lookup > ≡

/* Top-level invocation of schema-validity assessment with a type:
   recur to validity_by_type/4, passing starting element as validation
   root. Ditto for assessment with an element declaration. */

validity_by_type(Element,Type,Result) :-
  validity_by_type(Element,Type,Element,Result).
validity_by_decl(Element,Decl,Result) :-
  validity_by_decl(Element,Decl,Element,Result).

/*  Top-level invocation of schema-validity assessment without any
    type or declaration: find one, if you can, otherwise go lax. */
validity(Element,Result) :-
  xsitype_label(Element,Type), !,
  validity_by_type(Element,Type,Element,Result).
validity(Element,Result) :-
  Element = element(Ns:Local,_,_),
  qname_lookup(elementDeclarations,Ns,Local,Decl), !,
  validity_by_decl(Element,Decl,Element,Result).
validity(Element,Result) :-
  validity_lax(Element,Element,Result).

This code is not used elsewhere.
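The order of the three validity/2 clauses matters: an xsi:type label is tried first, then a declaration found by name, and lax assessment is the fallback. The following is a minimal, self-contained sketch of that dispatch only. The predicates toy_decl/2, toy_type/2 and start_mode/2 are hypothetical names invented for the illustration, and the sketch assumes that the value of the xsi:type attribute has already been resolved to an Ns:Local pair.

```prolog
% Hypothetical toy schema: one top-level element declaration, one type.
toy_decl(ns1, po).
toy_type(ns1, 'POType').

xsi_ns('http://www.w3.org/2001/XMLSchema-instance').

% start_mode(+Element, -Mode)
% Element has the shape element(Ns:Local, Attributes, Children);
% Attributes is a list of Name=Value pairs.
% Mode reports which branch of validity/2 would be taken.
start_mode(element(_, Attrs, _), by_type(Ns:Local)) :-
    xsi_ns(Xsi),
    memberchk((Xsi:type)=(Ns:Local), Attrs),   % assumed pre-resolved QName
    toy_type(Ns, Local), !.
start_mode(element(Ns:Local, _, _), by_decl(Ns:Local)) :-
    toy_decl(Ns, Local), !.
start_mode(_, lax).
```

Called with an unbound second argument, start_mode(element(ns1:po,[],[]), M) selects the declaration branch, while an element with an unknown name and no xsi:type falls through to lax.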

3.3. QName lookup

As in the third case of the list of ways to start validation, it is from time to time necessary to look up components in the schema by means of their namespace-qualified names. The qname_lookup predicate implements this function.

The formal requirements of this process are expressed in the following validation rule:

Validation Rule: QName resolution (Instance)

A pair of a local name and a namespace name (or absent) resolve to a schema component of a specified kind in the context of validation by appeal to the appropriate property of the schema being used for the assessment. Each such property indexes components by name. The property to use is determined by the kind of component specified, that is, the appropriate case among the following must be true:

If the kind specified is simple or complex type definition,

then the property is the {type definitions}.
If the kind specified is attribute declaration,

then the property is the {attribute declarations}.
If the kind specified is element declaration,

then the property is the {element declarations}.
If the kind specified is attribute group,

then the property is the {attribute group definitions}.
If the kind specified is model group,

then the property is the {model group definitions}.
If the kind specified is notation declaration,

then the property is the {notation declarations}.

The component resolved to is the entry in the table whose local name matches the local name of the pair and whose target namespace is identical to the namespace name of the pair.

As described, this process involves access to the properties of the schema component. Because the ^^ operator provides such a convenient notation for properties of information items and components, it is handy to assume that it can be used on the schema component, as well as on other components.

In other words, for now I assume that the schema component will be represented using the same node structure used in DCTGs for nodes in the attributed parse tree. So the general structure of a schema component is

node(schema,
     [],
     [typeDefinitions([...]),
      attributeDeclarations([...]),
      elementDeclarations([...]),
      attributeGroupDefinitions([...]),
      modelGroupDefinitions([...]),
      notationDeclarations([...])
      ])

As noted above, for now I am assuming only a single schema in the Prolog workspace. In order to apply the ^^ operator, I'll need to instantiate a variable to the schema component; I'll use the predicate the_schema to find ‘the’ schema in the workspace. If one wanted to allow multiple schemas in the workspace, it is easy to see how to change this predicate.

< 4 QName lookup > ≡

qname_lookup(Comptype,Ns,Local,Result) :-
  the_schema(S),
  qname_lookup(S,Comptype,Ns,Local,Result).

qname_lookup(S,Comptype,Ns,Local,Result) :-
  /* Build the property term, e.g. elementDeclarations(List);
   * a variable cannot be written directly as a functor. */
  Property =.. [Comptype,List],
  S^^Property,
  listfind_by_qname(List,Ns,Local,Result).

/* If the head of the list has the right namespace
 * and local name, we have found the item we seek. 
 * Otherwise, look in the tail of the list.
 */
listfind_by_qname([Head|_],Ns,Local,Head) :- 
  Head^^targetNamespace(Ns),
  Head^^name(Local).
listfind_by_qname([_|Tail],Ns,Local,Result) :- 
  listfind_by_qname(Tail,Ns,Local,Result).

This code is not used elsewhere.
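To try the lookup logic without the DCTG machinery, one can replace the node structure and the ^^ operator with plain terms. The sketch below does that. The comp/4 terms, the toy_schema/1 fact and the predicate name toy_qname_lookup/4 are assumptions made for the illustration, not part of the representation described in this paper.

```prolog
% Components as plain terms comp(Kind, TargetNamespace, Name, Body).
% Kind doubles as the name of the schema property that indexes them.
toy_schema([ comp(elementDeclarations, 'urn:po', order,       decl1),
             comp(elementDeclarations, 'urn:po', item,        decl2),
             comp(typeDefinitions,     'urn:po', 'OrderType', type1) ]).

% toy_qname_lookup(+Kind, +Ns, +Local, -Component)
% Succeeds at most once, with the first component of the requested
% kind whose target namespace and name match the pair (Ns, Local).
toy_qname_lookup(Kind, Ns, Local, comp(Kind, Ns, Local, Body)) :-
    toy_schema(Components),
    memberchk(comp(Kind, Ns, Local, Body), Components).
```

Because the kind is part of the key, a name can be looked up independently as an element declaration and as a type definition, which mirrors the separate properties listed in the validation rule.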

3.4. Assessing the schema-validity of an element

The rules in section 5.2 send us to the validation rule Schema-validity assessment (element) in section 3.3.4 of the spec, which can be paraphrased thus: To assess an element's schema-validity,

Either an element declaration or a type definition must be known and available, and must be used to check the element's local validity, and
The attributes and element children of the element must be validated recursively.

An element declaration may be known in various ways:

The user may stipulate it as part of the initial invocation of the validator.
The declaration may be the ‘context-determined declaration’ for the element; i.e. the validator may know the appropriate declaration because it has just finished looking at the declaration of the parent element and is now performing the recursion step.
The declaration may be found by means of QName lookup; this can happen during recursive lax validation.

The behavior of the validator does not depend on which of these methods is used; we model all of them in the same way, with a call to validity_by_decl(E,D,VR,R), where E is an element (as returned by the SWI Prolog SGML/XML parser), D is an element declaration, VR is the validation root (an element), and R is the result.[3] The different ways of knowing which element declaration to use correspond simply to the different contexts from which validity_by_decl(E,D,VR,R) may be called.

A type definition for the element may be known in two different ways:

The user may stipulate it as part of the initial invocation of the validator.
The definition may be specified by means of an xsi:type attribute and located by means of QName lookup.

For now, I'm going to skip over the details of validity_by_type; I'll come back to them later.

The actual text of the spec reads:

Validation Rule: Schema-Validity Assessment (Element)

The schema-validity assessment of an element information item depends on its validation and the assessment of its element information item children and associated attribute information items, if any.

So for an element information item's schema-validity to be assessed all of the following must be true:

1. One of the following must be true:
  1.1. All of the following must be true:
    1.1.1. A non-absent element declaration must be known for it, because one of the following is true:
      1.1.1.1. A declaration was stipulated by the processor (see Assessing Schema-Validity).
      1.1.1.2. A declaration has been established as its context-determined declaration.
      1.1.1.3. All of the following must be true:
        1.1.1.3.1. Its context-determined declaration is not skip.
        1.1.1.3.2. Its [local name] and [namespace name] resolve to an element declaration as defined by QName resolution (Instance).
    1.1.2. Its validity with respect to that declaration must have been evaluated as per Element Locally Valid (Element).
    1.1.3. If that evaluation involved the evaluation of Element Locally Valid (Type), clause 1 thereof must be satisfied.
  1.2. All of the following must be true:
    1.2.1. A non-absent type definition is known for it because one of the following is true:
      1.2.1.1. A type definition was stipulated by the processor (see Assessing Schema-Validity).
      1.2.1.2. All of the following must be true:
        1.2.1.2.1. There is an attribute information item among the element information item's [attributes] whose [namespace name] is identical to http://www.w3.org/2001/XMLSchema-instance and whose [local name] is type.
        1.2.1.2.2. The normalized value of that attribute information item is valid with respect to the built-in QName simple type, as defined by String Valid.
        1.2.1.2.3. The local name and namespace name (as defined in QName Interpretation), of the actual value of that attribute information item resolve to a type definition, as defined in QName resolution (Instance) -- [Definition:] call this type definition the local type definition.
        1.2.1.2.4. If there is also a processor-stipulated type definition, the local type definition must be validly derived from that type definition given its {prohibited substitutions}, as defined in Type Derivation OK (Complex) (if it is a complex type definition), or given the empty set, as defined in Type Derivation OK (Simple) (if it is a simple type definition).
    1.2.2. The element information item's validity with respect to the local type definition (if present and validly derived) or the processor-stipulated type definition (if no local type definition is present) has been evaluated as per Element Locally Valid (Type).
2. The schema-validity of all the element information items among its [children] has been assessed as per Schema-Validity Assessment (Element), and the schema-validity of all the attribute information items among its [attributes] has been assessed as per Schema-Validity Assessment (Attribute).

[Definition:] If either case of clause 1 above holds, the element information item has been strictly assessed.

If the item cannot be strictly assessed, because neither clause 1.1 nor clause 1.2 above are satisfied, [Definition:] an element information item's schema validity may be laxly assessed if its context-determined declaration is not skip by validating with respect to the lax ur-type definition as per Element Locally Valid (Type).

Note: In general if clause 1.1 above holds clause 1.2 does not, and vice versa. When an xsi:type [attribute] is involved, however, clause 1.2 takes precedence, as is made clear in Element Locally Valid (Element).
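The distinction between strict and lax assessment drawn in this rule can be sketched as a small decision predicate. The sketch assumes that the search for a declaration and for a type definition has already been carried out, with the atom absent standing for "none found"; the predicate name how_assessed/4 and the result atom not_assessed are hypothetical.

```prolog
% how_assessed(+Decl, +Type, +ContextDecl, -How)
%   Decl, Type:   a component term, or the atom absent
%   ContextDecl:  the context-determined declaration (e.g. skip),
%                 or any other term if there is none
%   How:          strict, lax, or not_assessed
how_assessed(Decl, Type, _, strict) :-
    (   Decl \== absent          % a declaration is known
    ;   Type \== absent          % or a type definition is known
    ), !.
how_assessed(absent, absent, ContextDecl, lax) :-
    ContextDecl \== skip, !.     % lax assessment is allowed unless skip
how_assessed(absent, absent, skip, not_assessed).
```

The sketch leaves out the further conditions attached to each case (for example, that a local type definition be validly derived from a stipulated one); it only shows which kind of assessment is attempted.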

3.5. Local validity of elements

Section 3.3.4 formulates the following validation rule:

Validation Rule: Element Locally Valid (Element)

For an element information item to be locally valid with respect to an element declaration all of the following must be true:

1. The declaration must not be absent.
2. Its {abstract} must be false.
3. The appropriate case among the following must be true:
  3.1. If {nillable} is false, then there must be no attribute information item among the element information item's [attributes] whose [namespace name] is identical to http://www.w3.org/2001/XMLSchema-instance and whose [local name] is nil.
  3.2. If {nillable} is true and there is such an attribute information item and its actual value is true, then all of the following must be true:
    3.2.1. The element information item must have no character or element information item [children].
    3.2.2. There must be no fixed {value constraint}.
4. If there is an attribute information item among the element information item's [attributes] whose [namespace name] is identical to http://www.w3.org/2001/XMLSchema-instance and whose [local name] is type, then all of the following must be true:
  4.1. The normalized value of that attribute information item must be valid with respect to the built-in QName simple type, as defined by String Valid;
  4.2. The local name and namespace name (as defined in QName Interpretation), of the actual value of that attribute information item must resolve to a type definition, as defined in QName resolution (Instance) -- [Definition:] call this type definition the local type definition;
  4.3. The local type definition must be validly derived from the {type definition} given the union of the {disallowed substitutions} and the {type definition}'s {prohibited substitutions}, as defined in Type Derivation OK (Complex) (if it is a complex type definition), or given {disallowed substitutions} as defined in Type Derivation OK (Simple) (if it is a simple type definition).
  [Definition:] The phrase actual type definition occurs below. If the above three clauses are satisfied, this should be understood as referring to the local type definition, otherwise to the {type definition}.
5. The appropriate case among the following must be true:
  5.1. If the declaration has a {value constraint}, the item has neither element nor character [children] and clause 3.2 has not applied, then all of the following must be true:
    5.1.1. If the actual type definition is a local type definition then the canonical lexical representation of the {value constraint} value must be a valid default for the actual type definition as defined in Element Default Valid (Immediate).
    5.1.2. The element information item with the canonical lexical representation of the {value constraint} value used as its normalized value must be valid with respect to the actual type definition as defined by Element Locally Valid (Type).
  5.2. If the declaration has no {value constraint} or the item has either element or character [children] or clause 3.2 has applied, then all of the following must be true:
    5.2.1. The element information item must be valid with respect to the actual type definition as defined by Element Locally Valid (Type).
    5.2.2. If there is a fixed {value constraint} and clause 3.2 has not applied, all of the following must be true:
      5.2.2.1. The element information item must have no element information item [children].
      5.2.2.2. The appropriate case among the following must be true:
        5.2.2.2.1. If the {content type} of the actual type definition is mixed, then the initial value of the item must match the canonical lexical representation of the {value constraint} value.
        5.2.2.2.2. If the {content type} of the actual type definition is a simple type definition, then the actual value of the item must match the canonical lexical representation of the {value constraint} value.
6. The element information item must be valid with respect to each of the {identity-constraint definitions} as per Identity-constraint Satisfied.
7. If the element information item is the validation root, it must be valid per Validation Root Valid (ID/IDREF).
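The part of this rule that deals with {nillable} and the xsi:nil attribute is self-contained enough to sketch on its own. The following is an illustration under stated assumptions: elements have the shape element(Name, Attributes, Children), the actual value of xsi:nil has already been computed and is the atom true or false, and the declaration is reduced to two flags. The predicate names nil_attr/2 and nil_clause_ok/3 are hypothetical.

```prolog
xsi_ns('http://www.w3.org/2001/XMLSchema-instance').

% nil_attr(+Element, -Value): Element carries xsi:nil with this value.
nil_attr(element(_, Attrs, _), Value) :-
    xsi_ns(Xsi),
    memberchk((Xsi:nil)=Value, Attrs).

% nil_clause_ok(+Nillable, +HasFixed, +Element)
%   Nillable: the declaration's {nillable} (true or false)
%   HasFixed: true if the declaration has a fixed {value constraint}
nil_clause_ok(false, _, Element) :-
    \+ nil_attr(Element, _).            % no xsi:nil attribute at all
nil_clause_ok(true, HasFixed, Element) :-
    (   nil_attr(Element, true)
    ->  Element = element(_, _, []),    % no character or element children
        HasFixed == false               % no fixed value constraint
    ;   true                            % xsi:nil absent or false: nothing to check
    ).
```

A nillable declaration thus accepts element(a, [], [text]) unchanged, but rejects the same element once xsi:nil is true, since a nilled element may not have children.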

3.6. Local validity against a complex type

The spec formulates the following validation rule:

Validation Rule: Element Locally Valid (Type)

For an element information item to be locally valid with respect to a type definition all of the following must be true:

1. The type definition must not be absent;
2. It must not have {abstract} with value true.
3. The appropriate case among the following must be true:
  3.1. If the type definition is a simple type definition, then all of the following must be true:
    3.1.1. The element information item's [attributes] must be empty, excepting those whose [namespace name] is identical to http://www.w3.org/2001/XMLSchema-instance and whose [local name] is one of type, nil, schemaLocation or noNamespaceSchemaLocation.
    3.1.2. The element information item must have no element information item [children].
    3.1.3. If clause 3.2 of Element Locally Valid (Element) did not apply, then the normalized value must be valid with respect to the type definition as defined by String Valid.
  3.2. If the type definition is a complex type definition, then the element information item must be valid with respect to the type definition as per Element Locally Valid (Complex Type);

We'll deal with simple types later. For now, let's work on handling complex types.

The spec formulates the following validation rule:

Validation Rule: Element Locally Valid (Complex Type)

For an element information item to be locally valid with respect to a complex type definition all of the following must be true:

1. {abstract} is false.
2. If clause 3.2 of Element Locally Valid (Element) did not apply, then the appropriate case among the following must be true:
  2.1. If the {content type} is empty, then the element information item has no character or element information item [children].
  2.2. If the {content type} is a simple type definition, then the element information item has no element information item [children], and the normalized value of the element information item is valid with respect to that simple type definition as defined by String Valid.
  2.3. If the {content type} is element-only, then the element information item has no character information item [children] other than those whose [character code] is defined as a white space in [W3C 2000].
  2.4. If the {content type} is element-only or mixed, then the sequence of the element information item's element information item [children], if any, taken in order, is valid with respect to the {content type}'s particle, as defined in Element Sequence Locally Valid (Particle).
3. For each attribute information item in the element information item's [attributes] excepting those whose [namespace name] is identical to http://www.w3.org/2001/XMLSchema-instance and whose [local name] is one of type, nil, schemaLocation or noNamespaceSchemaLocation, the appropriate case among the following must be true:
  3.1. If there is among the {attribute uses} an attribute use with an {attribute declaration} whose {name} matches the attribute information item's [local name] and whose {target namespace} is identical to the attribute information item's [namespace name] (where an absent {target namespace} is taken to be identical to a [namespace name] with no value), then the attribute information must be valid with respect to that attribute use as per Attribute Locally Valid (Use). In this case the {attribute declaration} of that attribute use is the context-determined declaration for the attribute information item with respect to Schema-Validity Assessment (Attribute) and Assessment Outcome (Attribute).
  3.2. otherwise all of the following must be true:
    3.2.1. There must be an {attribute wildcard}.
    3.2.2. The attribute information item must be valid with respect to it as defined in Item Valid (Wildcard).
4. The {attribute declaration} of each attribute use in the {attribute uses} whose {required} is true matches one of the attribute information items in the element information item's [attributes] as per clause 3.1 above.
5. Let [Definition:] the wild IDs be the set of all attribute information item to which clause 3.2 applied and whose validation resulted in a context-determined declaration of mustFind or no context-determined declaration at all, and whose [local name] and [namespace name] resolve (as defined by QName resolution (Instance)) to an attribute declaration whose {type definition} is or is derived from ID. Then all of the following must be true:
  5.1. There must be no more than one item in wild IDs.
  5.2. If wild IDs is non-empty, there must not be any attribute uses among the {attribute uses} whose {attribute declaration}'s {type definition} is or is derived from ID.
  Note: This clause serves to ensure that even via attribute wildcards no element has more than one attribute of type ID, and that even when an element legitimately lacks a declared attribute of type ID, a wildcard-validated attribute must not supply it. That is, if an element has a type whose attribute declarations include one of type ID, it either has that attribute or no attribute of type ID.

Note: When an {attribute wildcard} is present, this does not introduce any ambiguity with respect to how attribute information items for which an attribute use is present amongst the {attribute uses} whose name and target namespace match are assessed. In such cases the attribute use always takes precedence, and the assessment of such items stands or falls entirely on the basis of the attribute use and its {attribute declaration}. This follows from the details of clause 3.
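The constraints this rule places on character children, according to the {content type}, can be checked without looking at the content model at all. The sketch below covers only that part (the empty, element-only and mixed cases) and assumes that character data appears among the children as atoms, as in the element/3 terms returned by the SWI-Prolog SGML/XML parser. The names content_ok/2 and whitespace_only/1 are hypothetical.

```prolog
% content_ok(+ContentType, +Children)
% Children mixes element(_,_,_) terms with atoms (character data).
content_ok(empty, []).                  % no children of either kind
content_ok(element_only, Children) :-   % character data must be white space
    forall(( member(C, Children), atom(C) ),
           whitespace_only(C)).
content_ok(mixed, _).                   % character data is unrestricted

% whitespace_only(+Atom): every character is space, tab, LF or CR.
whitespace_only(Atom) :-
    atom_codes(Atom, Codes),
    forall(member(Code, Codes), memberchk(Code, [32, 9, 10, 13])).
```

Validation of the element children themselves against the particle is a separate step and is not attempted here.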

3.7. Local validity of element sequences (children)

The spec formulates the following validation rule:

Validation Rule: Element Sequence Locally Valid (Particle)

For a sequence (possibly empty) of element information items to be locally valid with respect to a particle the appropriate case among the following must be true:

1. If the {term} is a wildcard, then all of the following must be true:
  1.1. The length of the sequence must be greater than or equal to the {min occurs}.
  1.2. If {max occurs} is a number, the length of the sequence must be less than or equal to the {max occurs}.
  1.3. Each element information item in the sequence must be valid with respect to the wildcard as defined by Item Valid (Wildcard).
2. If the {term} is an element declaration, then all of the following must be true:
  2.1. The length of the sequence must be greater than or equal to the {min occurs}.
  2.2. If {max occurs} is a number, the length of the sequence must be less than or equal to the {max occurs}.
  2.3. For each element information item in the sequence one of the following must be true:
    2.3.1. The element declaration is local (i.e. its {scope} must not be global), its {abstract} is false, the element information item's [namespace name] is identical to the element declaration's {target namespace} (where an absent {target namespace} is taken to be identical to a [namespace name] with no value) and the element information item's [local name] matches the element declaration's {name}.
      In this case the element declaration is the context-determined declaration for the element information item with respect to Schema-Validity Assessment (Element) and Assessment Outcome (Element).
    2.3.2. The element declaration is top-level (i.e. its {scope} is global), {abstract} is false, the element information item's [namespace name] is identical to the element declaration's {target namespace} (where an absent {target namespace} is taken to be identical to a [namespace name] with no value) and the element information item's [local name] matches the element declaration's {name}.
      In this case the element declaration is the context-determined declaration for the element information item with respect to Schema-Validity Assessment (Element) and Assessment Outcome (Element).
    2.3.3. The element declaration is top-level (i.e. its {scope} is global), its {disallowed substitutions} does not contain substitution, the [local name] and [namespace name] of the element information item resolve to an element declaration, as defined in QName resolution (Instance) -- [Definition:] call this declaration the substituting declaration and the substituting declaration together with the particle's element declaration's {disallowed substitutions} is validly substitutable for the particle's element declaration as defined in Substitution Group OK (Transitive).
      In this case the substituting declaration is the context-determined declaration for the element information item with respect to Schema-Validity Assessment (Element) and Assessment Outcome (Element).
3. If the {term} is a model group, then all of the following must be true:
  3.1. There is a partition of the sequence into n sub-sequences such that n is greater than or equal to {min occurs}.
  3.2. If {max occurs} is a number, n must be less than or equal to {max occurs}.
  3.3. Each sub-sequence in the partition is valid with respect to that model group as defined in Element Sequence Valid.

Note: Clause 1 and clause 2.3.3 do not interact: an element information item validatable by a declaration with a substitution group head in a different namespace is not validatable by a wildcard which accepts the head's namespace but not its own.

3.8. Element sequence validity

The spec formulates the following validation rule:

Validation Rule: Element Sequence Valid

[Definition:] Define a partition of a sequence as a sequence of sub-sequences, some or all of which may be empty, such that concatenating all the sub-sequences yields the original sequence.

For a sequence (possibly empty) of element information items to be locally valid with respect to a model group the appropriate case among the following must be true:

If the {compositor} is sequence,

then there must be a partition of the sequence into n sub-sequences where n is the length of {particles} such that each of the sub-sequences in order is valid with respect to the corresponding particle in the {particles} as defined in Element Sequence Locally Valid (Particle).
If the {compositor} is choice,

then there must be a particle among the {particles} such that the sequence is valid with respect to that particle as defined in Element Sequence Locally Valid (Particle).
If the {compositor} is all,

then there must be a partition of the sequence into n sub-sequences where n is the length of {particles} such that there is a one-to-one mapping between the sub-sequences and the {particles} where each sub-sequence is valid with respect to the corresponding particle as defined in Element Sequence Locally Valid (Particle).

Nothing in the above should be understood as ruling out groups whose {particles} is empty: although no sequence can be valid with respect to such a group whose {compositor} is choice, the empty sequence is valid with respect to empty groups whose {compositor} is sequence or all.
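
As an informal illustration (in Python, not in the DCTG notation this paper is concerned with), the three compositor cases and the partition-based particle rule can be sketched as a brute-force checker. The data representation is an assumption made for this sketch only: a particle is a tuple (min occurs, max occurs, term) with None standing for unbounded, a term is either an element name (a string) or a group (compositor, list of particles), and wildcards and substitution groups are ignored. The search is exponential and is meant to mirror the wording of the rule, not to be efficient.

```python
from itertools import permutations

def splits(seq, n, allow_empty=True):
    """Yield each way to cut seq into n consecutive sub-sequences."""
    if n == 0:
        if not seq:
            yield []
        return
    start = 0 if allow_empty else 1
    for i in range(start, len(seq) + 1):
        for rest in splits(seq[i:], n - 1, allow_empty):
            yield [seq[:i]] + rest

def valid_group(seq, group):
    """Element Sequence Valid: seq against a (compositor, particles) group."""
    compositor, particles = group
    if compositor == "choice":
        # Some particle must accept the whole sequence (none can if empty).
        return any(valid_particle(seq, p) for p in particles)
    # 'sequence' pairs sub-sequences with particles in order;
    # 'all' may pair them in any one-to-one arrangement.
    orders = permutations(particles) if compositor == "all" else [tuple(particles)]
    return any(all(valid_particle(sub, p) for sub, p in zip(subs, order))
               for order in orders
               for subs in splits(seq, len(particles)))

def valid_particle(seq, particle):
    """Element Sequence Locally Valid (Particle), elements and groups only."""
    lo, hi, term = particle
    if isinstance(term, str):
        n = len(seq)
        return (all(name == term for name in seq) and lo <= n
                and (hi is None or n <= hi))
    # Group term: look for k non-empty valid sub-sequences; empty
    # sub-sequences can pad the partition only if the group accepts [].
    empty_ok = valid_group([], term)
    for k in range(len(seq) + 1):
        if hi is not None and k > hi:
            break
        if not (k >= lo or empty_ok):
            continue
        if any(all(valid_group(c, term) for c in chunks)
               for chunks in splits(seq, k, allow_empty=False)):
            return True
    return False
```

With these definitions the empty sequence is accepted by an empty sequence or all group and rejected by an empty choice group, as the closing remark of the rule requires.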

3.9. PSVI properties

...

Schema Information Set Contribution: Assessment Outcome (Element)

If the schema-validity of an element information item has been assessed as per Schema-Validity Assessment (Element), then in the post-schema-validation infoset it has properties as follows:

PSVI Contributions for element information items
[validation context]	The nearest ancestor element information item with a {schema information} property (or this element item itself if it has such a property).
[validity]	The appropriate case among the following: If it was strictly assessed, then the appropriate case among the following: If all of the following are true: (1) one of the following is true: (1.1) clause 1.1 of Schema-Validity Assessment (Element) applied and the item was valid as defined by Element Locally Valid (Element); (1.2) clause 1.2 of Schema-Validity Assessment (Element) applied and the item was valid as defined by Element Locally Valid (Type); (2) neither its [children] nor its [attributes] contains an information item (element or attribute respectively) whose validity is `invalid`; (3) neither its [children] nor its [attributes] contains an information item (element or attribute respectively) with a context-determined declaration of `mustFind` whose validity is `notKnown`; then `valid`; otherwise `invalid`. Otherwise `notKnown`.
[validation attempted]	The appropriate case among the following: If it was strictly assessed and neither its [children] nor its [attributes] contains an information item (element or attribute respectively) whose validation attempted is not `full`, then `full`; If it was not strictly assessed and neither its [children] nor its [attributes] contains an information item (element or attribute respectively) whose validation attempted is not `none`, then `none`; otherwise `partial`.
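
The nested case analysis for [validity] and [validation attempted] may be easier to follow as two small functions. This is a Python sketch for illustration only; the function names and the way outcomes of [children] and [attributes] are passed in (as lists of values already computed) are assumptions of the sketch, not part of the specification or of the grammar developed in this paper.

```python
def validity(strictly_assessed, locally_valid, members):
    """Compute [validity] for an element.

    locally_valid: outcome of Element Locally Valid (Element) or (Type),
                   whichever clause applied.
    members: (validity, context_determined_declaration) pairs, one for
             each child element and attribute of the item.
    """
    if not strictly_assessed:
        return "notKnown"
    ok = (locally_valid
          and all(v != "invalid" for v, _ in members)
          and not any(d == "mustFind" and v == "notKnown"
                      for v, d in members))
    return "valid" if ok else "invalid"

def validation_attempted(strictly_assessed, member_attempts):
    """Compute [validation attempted] from the members' own values."""
    if strictly_assessed and all(a == "full" for a in member_attempts):
        return "full"
    if not strictly_assessed and all(a == "none" for a in member_attempts):
        return "none"
    return "partial"
```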

Schema Information Set Contribution: Validation Failure (Element)

If the local validity, as defined by Element Locally Valid (Element) above and/or Element Locally Valid (Type) below, of an element information item has been assessed, in the post-schema-validation infoset the item has a property:

PSVI Contributions for element information items
[schema error code]	The appropriate case among the following: If the item is not valid, then a list. Applications wishing to provide information as to the reason(s) for the validation failure are encouraged to record one or more error codes (see Outcome Tabulations (normative)) herein. Otherwise absent.

Schema Information Set Contribution: Element Declaration

If an element information item is valid with respect to an element declaration as per Element Locally Valid (Element) then in the post-schema-validation infoset the element information item must, at processor option, have either:

PSVI Contributions for element information items
[element declaration]	an item isomorphic to the declaration component itself

or

PSVI Contributions for element information items
[nil]	`true` if clause 3.2 of Element Locally Valid (Element) above is satisfied, otherwise `false`

Schema Information Set Contribution: Element Validated by Type

If an element information item is valid with respect to a type definition as per Element Locally Valid (Type), in the post-schema-validation infoset the item has a property:

PSVI Contributions for element information items
[schema normalized value]	The appropriate case among the following: If clause 3.2 of Element Locally Valid (Element) and Element Default Value above have not applied and either the type definition is a simple type definition or its {content type} is a simple type definition, then the normalized value of the item as validated. Otherwise absent.

Furthermore, the item has one of the following alternative sets of properties:

Either

PSVI Contributions for element information items
[type definition]	An item isomorphic to the type definition component itself.
[member type definition]	If and only if that type definition is a simple type definition with {variety} `union`, or a complex type definition whose {content type} is a simple type definition with {variety} `union`, then an item isomorphic to that member of the union's {member type definitions} which actually validated the element item's normalized value.

or

PSVI Contributions for element information items
[type definition type]	`simple` or `complex`, depending on the type definition.
[type definition namespace]	The target namespace of the type definition.
[type definition anonymous]	`true` if the name of the type definition is absent, otherwise `false`.
[type definition name]	The name of the type definition, if it is not absent. If it is absent, schema processors may, but need not, provide a value unique to the definition.

If the type definition is a simple type definition or its {content type} is a simple type definition, and that type definition has {variety} union, then calling [Definition:] that member of the {member type definitions} which actually validated the element item's normalized value the actual member type definition, there are three additional properties:

PSVI Contributions for element information items
[member type definition namespace]	The {target namespace} of the actual member type definition.
[member type definition anonymous]	`true` if the {name} of the actual member type definition is absent, otherwise `false`.
[member type definition name]	The {name} of the actual member type definition, if it is not absent. If it is absent, schema processors may, but need not, provide a value unique to the definition.

The first (item isomorphic) alternative above is provided for applications such as query processors which need access to the full range of details about an item's assessment, for example the type hierarchy; the second, for lighter-weight processors for whom representing the significant parts of the type hierarchy as information items might be a significant burden.

Also, if the declaration has a {value constraint}, the item has a property:

PSVI Contributions for element information items
[schema default]	The canonical lexical representation of the declaration's {value constraint} value.

Note that if an element is laxly assessed, then the {type definition} and {member type definition} properties, or their alternatives, are based on the lax ur-type definition.

3.10. Error handling and diagnosis

...

3.11. Constraints on schemas

3.11.1. Elements

Schema Component Constraint: Element Declaration Properties Correct

All of the following must be true:

The values of the properties of an element declaration must be as described in the property tableau in The Element Declaration Schema Component, modulo the impact of Missing Sub-components.
If there is a {value constraint}, the canonical lexical representation of its value must be valid with respect to the {type definition} as defined in Element Default Valid (Immediate).
If there is a {substitution group affiliation}, the {type definition} of the element declaration must be validly derived from the {type definition} of the {substitution group affiliation}, given the value of the {substitution group exclusions} of the {substitution group affiliation}, as defined in Type Derivation OK (Complex) (if the {type definition} is complex) or as defined in Type Derivation OK (Simple) (if the {type definition} is simple).
If the {type definition} or {type definition}'s {content type} is or is derived from ID then there must not be a {value constraint}.

Note: The use of ID as a type definition for elements goes beyond XML 1.0, and should be avoided if backwards compatibility is desired.
Circular substitution groups are disallowed. That is, it must not be possible to return to an element declaration by repeatedly following the {substitution group affiliation} property.
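
The circularity clause amounts to a cycle check on the {substitution group affiliation} relation. A minimal Python sketch follows, assuming (for illustration only) that element declarations are identified by name and that the relation is given as a dictionary from each declaration to its head, or to None when it has none.

```python
def has_circular_substitution_group(affiliation):
    """Return True if following {substitution group affiliation}
    from some declaration can lead back into a cycle."""
    for start in affiliation:
        seen = set()
        current = affiliation.get(start)
        while current is not None:
            if current == start or current in seen:
                return True
            seen.add(current)
            current = affiliation.get(current)
    return False
```

The same walk, applied to {base type definition} instead, would serve for the corresponding clause on type definitions, with the difference that there the walk is required to end at the ur-type definition.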

3.11.2. Complex types

Schema Component Constraint: Complex Type Definition Properties Correct

All of the following must be true:

The values of the properties of a complex type definition must be as described in the property tableau in The Complex Type Definition Schema Component, modulo the impact of Missing Sub-components.
If the {base type definition} is a simple type definition, the {derivation method} must be extension.
Circular definitions are disallowed, except for the ur-type definition. That is, it must be possible to reach the ur-type definition by repeatedly following the {base type definition}.
Two distinct attribute declarations in the {attribute uses} must not have identical {name}s and {target namespace}s.
Two distinct attribute declarations in the {attribute uses} must not have {type definition}s which are or are derived from ID.

3.11.3. Other

Schema Component Constraint: Model Group Correct

All of the following must be true:

The values of the properties of a model group must be as described in the property tableau in The Model Group Schema Component, modulo the impact of Missing Sub-components.
Circular groups are disallowed. That is, within the {particles} of a group there must not be at any depth a particle whose {term} is the group itself.

Schema Component Constraint: All Group Limited

When a model group has {compositor} all, all of the following must be true:

one of the following must be true:
1. It appears as the model group of a model group definition.
2. It appears in a particle with {min occurs}={max occurs}=1, and that particle must be part of a pair which constitutes the {content type} of a complex type definition.
The {max occurs} of all the particles in the {particles} of the group must be 0 or 1.

Schema Component Constraint: Element Declarations Consistent

If the {particles} contains, either directly, indirectly (that is, within the {particles} of a contained model group, recursively) or implicitly two or more element declaration particles with the same {name} and {target namespace}, then all their type definitions must be the same top-level definition, that is, all of the following must be true:

all their {type definition}s must have a non-absent {name}.
all their {type definition}s must have the same {name}.
all their {type definition}s must have the same {target namespace}.

[Definition:] A list of particles implicitly contains an element declaration if a member of the list contains that element declaration in its substitution group.

Schema Component Constraint: Unique Particle Attribution

A content model must be formed such that during validation of an element information item sequence, the particle contained directly, indirectly or implicitly therein with which to attempt to validate each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence.

Note: This constraint reconstructs for XML Schema the equivalent constraints of [W3C 2000] and SGML. Given the presence of element substitution groups and wildcards, the concise expression of this constraint is difficult; see Analysis of the Unique Particle Attribution Constraint (non-normative) for further discussion.

Schema Component Constraint: Effective Total Range (all and sequence)

The effective total range of a particle whose {term} is a group whose {compositor} is all or sequence is a pair of minimum and maximum, as follows:

minimum

The product of the particle's {min occurs} and the sum of the {min occurs} of every wildcard or element declaration particle in the group's {particles} and the minimum part of the effective total range of each of the group particles in the group's {particles} (or 0 if there are no {particles}).

maximum

unbounded if the {max occurs} of any wildcard or element declaration particle in the group's {particles} or the maximum part of the effective total range of any of the group particles in the group's {particles} is unbounded, or if any of those is non-zero and the {max occurs} of the particle itself is unbounded, otherwise the product of the particle's {max occurs} and the sum of the {max occurs} of every wildcard or element declaration particle in the group's {particles} and the maximum part of the effective total range of each of the group particles in the group's {particles} (or 0 if there are no {particles}).

Schema Component Constraint: Effective Total Range (choice)

The effective total range of a particle whose {term} is a group whose {compositor} is choice is a pair of minimum and maximum, as follows:

minimum

The product of the particle's {min occurs} and the minimum of the {min occurs} of every wildcard or element declaration particle in the group's {particles} and the minimum part of the effective total range of each of the group particles in the group's {particles} (or 0 if there are no {particles}).

maximum

unbounded if the {max occurs} of any wildcard or element declaration particle in the group's {particles} or the maximum part of the effective total range of any of the group particles in the group's {particles} is unbounded, or if any of those is non-zero and the {max occurs} of the particle itself is unbounded, otherwise the product of the particle's {max occurs} and the maximum of the {max occurs} of every wildcard or element declaration particle in the group's {particles} and the maximum part of the effective total range of each of the group particles in the group's {particles} (or 0 if there are no {particles}).
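
The two effective-total-range definitions differ only in whether the contributions of the group's particles are summed (all, sequence) or reduced with minimum and maximum (choice). The following Python sketch computes both. Its representation is an assumption for illustration: a particle is (min occurs, max occurs, term), None stands for unbounded, and a term is either a string (an element declaration or wildcard, whose own occurrence range is used directly) or a pair (compositor, particles). Treating "unbounded times zero" as 0 is likewise an assumption of the sketch.

```python
def effective_total_range(particle):
    """Return (minimum, maximum) with None meaning unbounded."""
    lo, hi, term = particle
    if isinstance(term, str):
        # Element declaration or wildcard particle: its own occurrence range.
        return lo, hi
    compositor, particles = term
    ranges = [effective_total_range(p) for p in particles]
    mins = [r[0] for r in ranges]
    maxs = [r[1] for r in ranges]
    pick_min, pick_max = (min, max) if compositor == "choice" else (sum, sum)
    minimum = lo * pick_min(mins) if mins else 0
    if any(m is None for m in maxs):
        return minimum, None
    if hi is None:
        # Unbounded repetition of something that can occur at all.
        return minimum, (None if any(m != 0 for m in maxs) else 0)
    return minimum, (hi * pick_max(maxs) if maxs else 0)
```

For example, a sequence of a{1,2} and b{1,1} repeated between 2 and 3 times has range (4, 9), while the corresponding choice of a{1,2} and b{0,1} has range (0, 6).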

Schema Component Constraint: Particle Correct

All of the following must be true:

The values of the properties of a particle must be as described in the property tableau in The Particle Schema Component, modulo the impact of Missing Sub-components.
If {max occurs} is not unbounded, that is, it has a numeric value, then all of the following must be true:
1. {min occurs} must not be greater than {max occurs}.
2. {max occurs} must be greater than or equal to 1.

4. Simple types

4.1. Validation

Section formulates the following validation rule:

Validation Rule: String Valid

A string is locally valid with respect to a simple type definition if it is schema-valid with respect to that definition as defined by Datatype Valid in [W3C 2001c].

4.2. SICs

Schema Information Set Contribution: Element Default Value

If the local validity, as defined by Element Locally Valid (Element) above, of an element information item has been assessed, in the post-schema-validation infoset the item has a property:

PSVI Contributions for element information items
[schema specified]	The appropriate case among the following: If the item is valid with respect to an element declaration as per Element Locally Valid (Element) and the {value constraint} is present, but clause 3.2 of Element Locally Valid (Element) above is not satisfied and the item has no element or character information item [children], then `schema`. Furthermore, the post-schema-validation infoset has the canonical lexical representation of the {value constraint} value as the item's {schema normalized value} property. Otherwise `infoset`.

4.3. Component structure

Schema Component Constraint: Element Default Valid (Immediate)

For a string to be a valid default with respect to a type definition the appropriate case among the following must be true:

If the type definition is a simple type definition,

then the string must be valid with respect to that definition as defined by String Valid.
If the type definition is a complex type definition,
then all of the following must be true:
1. its {content type} must be a simple type definition or mixed.
2. The appropriate case among the following must be true:
  1. If the {content type} is a simple type definition,
    
    then the string must be valid with respect to that simple type definition as defined by String Valid.
  2. If the {content type} is mixed,
    
    then the {content type}'s particle must be emptiable as defined by Particle Emptiable.

Schema Component Constraint: Simple Type Definition Properties Correct

All of the following must be true:

The values of the properties of a simple type definition must be as described in the property tableau in Datatype definition, modulo the impact of Missing Sub-components.
All simple type definitions must be derived ultimately from the simple ur-type definition (so circular definitions are disallowed). That is, it must be possible to reach a built-in primitive datatype or the simple ur-type definition by repeatedly following the {base type definition}.
The {final} of the {base type definition} must not contain restriction.
If the {base type definition} is not the simple ur-type definition, all of the following must be true:
1. The definition must be a valid restriction as defined in Derivation Valid (Restriction, Simple).
2. If {variety} is not atomic, then the appropriate case among the following must be true:
  1. If the {variety} is list,
    
    then the {final} of the {base type definition} must not contain list.
  2. If the {variety} is union,
    
    then the {final} of the {base type definition} must not contain union.

5. Handling attributes

... these are not really sequenced yet ...

5.1. Validation

Section 3.2.4 formulates the following validation rule:

Validation Rule: Attribute Locally Valid

For an attribute information item to be locally valid with respect to an attribute declaration all of the following must be true:

The declaration must not be absent (see Missing Sub-components for how this can fail to be the case).
Its {type definition} must not be absent.
The item's normalized value must be locally valid with respect to that {type definition} as per String Valid.
The item's actual value must match the value of the {value constraint}, if it is present and fixed.

Section 3.2.4 formulates the following validation rule:

Validation Rule: Schema-Validity Assessment (Attribute)

The schema-validity assessment of an attribute information item depends on its validation alone.

[Definition:] During validation, associations between element and attribute information items among the [children] and [attributes] on the one hand, and element and attribute declarations on the other, are established as a side-effect. Such declarations are called the context-determined declarations. See clause 3.1 (in Element Locally Valid (Complex Type)) for attribute declarations, clause 2 (in Element Sequence Locally Valid (Particle)) for element declarations.

For an attribute information item's schema-validity to have been assessed all of the following must be true:

A non-absent attribute declaration must be known for it, namely one of the following:
1. A declaration which has been established as its context-determined declaration;
2. A declaration resolved to by its [local name] and [namespace name] as defined by QName resolution (Instance), provided its context-determined declaration is not skip.
Its validity with respect to that declaration must have been evaluated as per Attribute Locally Valid.
Both clause 1 and clause 2 of Attribute Locally Valid must be satisfied.

[Definition:] For attributes, there is no difference between assessment and strict assessment, so if the above holds, the attribute information item has been strictly assessed.

Section formulates the following validation rule:

Validation Rule: Attribute Locally Valid (Use)

For an attribute information item to be valid with respect to an attribute use its normalized value must match the canonical lexical representation of the attribute use's {value constraint} value, if it is present and fixed.

5.2. SICs

Schema Information Set Contribution: Assessment Outcome (Attribute)

If the schema-validity of an attribute information item has been assessed as per Schema-Validity Assessment (Attribute), then in the post-schema-validation infoset it has properties as follows:

PSVI Contributions for attribute information items
[validation context]	The nearest ancestor element information item with a {schema information} property.
[validity]	The appropriate case among the following: If it was strictly assessed, then the appropriate case among the following: If it was valid as defined by Attribute Locally Valid, then `valid`; otherwise `invalid`. Otherwise `notKnown`.
[validation attempted]	The appropriate case among the following: If it was strictly assessed, then `full`; otherwise `none`.
[schema specified]	`infoset`. See Attribute Default Value for the other possible value.

Schema Information Set Contribution: Validation Failure (Attribute)

If the local validity, as defined by Attribute Locally Valid above, of an attribute information item has been assessed, in the post-schema-validation infoset the item has a property:

PSVI Contributions for attribute information items
[schema error code]	The appropriate case among the following: If the item is not valid, then a list. Applications wishing to provide information as to the reason(s) for the validation failure are encouraged to record one or more error codes (see Outcome Tabulations (normative)) herein. Otherwise absent.

Schema Information Set Contribution: Attribute Declaration

If an attribute information item is valid with respect to an attribute declaration as per Attribute Locally Valid then in the post-schema-validation infoset the attribute information item may, at processor option, have a property:

PSVI Contributions for attribute information items
[attribute declaration]	An item isomorphic to the declaration component itself.

Schema Information Set Contribution: Attribute Validated by Type

If clause 3 of Attribute Locally Valid applies with respect to an attribute information item, in the post-schema-validation infoset the attribute information item has a property:

PSVI Contributions for attribute information items
[schema normalized value]	The normalized value of the item as validated.

Furthermore, the item has one of the following alternative sets of properties:

Either

PSVI Contributions for attribute information items
[type definition]	An item isomorphic to the relevant attribute declaration's {type definition} component.
[member type definition]	If and only if that type definition has {variety} `union`, then an item isomorphic to that member of its {member type definitions} which actually validated the attribute item's [normalized value].

or

PSVI Contributions for attribute information items
[type definition type]	`simple`.
[type definition namespace]	The {target namespace} of the type definition.
[type definition anonymous]	`true` if the {name} of the type definition is absent, otherwise `false`.
[type definition name]	The {name} of the type definition, if it is not absent. If it is absent, schema processors may, but need not, provide a value unique to the definition.

If the type definition has {variety} union, then calling [Definition:] that member of the {member type definitions} which actually validated the attribute item's normalized value the actual member type definition, there are three additional properties:

PSVI Contributions for attribute information items
[member type definition namespace]	The {target namespace} of the actual member type definition.
[member type definition anonymous]	`true` if the {name} of the actual member type definition is absent, otherwise `false`.
[member type definition name]	The {name} of the actual member type definition, if it is not absent. If it is absent, schema processors may, but need not, provide a value unique to the definition.

Also, if the declaration has a {value constraint}, the item has a property:

PSVI Contributions for attribute information items
[schema default]	The canonical lexical representation of the declaration's {value constraint} value.

If the attribute information item was not strictly assessed, then instead of the values specified above,

The item's {schema normalized value} property has the initial value of the item as its value;
The {type definition} and {member type definition} properties, or their alternatives, are based on the simple ur-type definition.

Schema Information Set Contribution: Attribute Default Value

For each attribute use in the {attribute uses} whose {required} is false and whose {value constraint} is not absent but whose {attribute declaration} does not match one of the attribute information items in the element information item's [attributes] as per clause 3.1 of Element Locally Valid (Complex Type) above, the post-schema-validation infoset has an attribute information item whose properties are as below added to the [attributes] of the element information item.

[local name]

The {attribute declaration}'s {name}.

[namespace name]

The {attribute declaration}'s {target namespace}.

{schema normalized value}

The canonical lexical representation of the {value constraint} value.

{schema default}

The canonical lexical representation of the {value constraint} value.

{validation context}

The nearest ancestor element information item with a {schema information} property.

{validity}

valid.

{validation attempted}

full.

{schema specified}

schema.

The added items should also either have {type definition} (and {member type definition} if appropriate) properties, or their lighter-weight alternatives, as specified in Attribute Validated by Type.
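
The defaulting step can be sketched as follows in Python. The sketch is illustrative only and rests on several assumptions: attribute uses are given as dictionaries with hypothetical keys name, target_namespace, required and value_constraint (the last already in canonical lexical form, or None when absent); the element's [attributes] are a dictionary keyed by (namespace name, local name); and matching an attribute use against an attribute information item is done by that key alone. The type-related properties mentioned in the last sentence above are omitted.

```python
def add_defaulted_attributes(attribute_uses, attributes, validation_context=None):
    """Return a copy of attributes with defaulted attributes added."""
    result = dict(attributes)
    for use in attribute_uses:
        key = (use["target_namespace"], use["name"])
        # Only optional uses with a value constraint and no matching
        # attribute in the instance give rise to an added item.
        if use["required"] or use["value_constraint"] is None or key in attributes:
            continue
        value = use["value_constraint"]
        result[key] = {
            "schema normalized value": value,
            "schema default": value,
            "validation context": validation_context,
            "validity": "valid",
            "validation attempted": "full",
            "schema specified": "schema",
        }
    return result
```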

5.3. Constraints on schema components

Schema Component Constraint: Attribute Declaration Properties Correct

All of the following must be true:

The values of the properties of an attribute declaration must be as described in the property tableau in The Attribute Declaration Schema Component, modulo the impact of Missing Sub-components.
If there is a {value constraint}, the canonical lexical representation of its value must be valid with respect to the {type definition} as defined in String Valid.
If the {type definition} is or is derived from ID then there must not be a {value constraint}.

Schema Component Constraint: xmlns Not Allowed

The {name} of an attribute declaration must not match xmlns.

Note: The {name} of an attribute is an NCName, which implicitly prohibits attribute declarations of the form xmlns:*.

Schema Component Constraint: xsi: Not Allowed

The {target namespace} of an attribute declaration, whether local or top-level, must not match http://www.w3.org/2001/XMLSchema-instance (unless it is one of the four built-in declarations given in the next section).

Note: This reinforces the special status of these attributes, so that they not only need not be declared to be allowed in instances, but must not be declared. It also removes any temptation to experiment with supplying global or fixed values for e.g. xsi:type or xsi:nil, which would be seriously misleading, as they would have no effect.

Schema Component Constraint: Attribute Use Correct

All of the following must be true:

The values of the properties of an attribute use must be as described in the property tableau in The Attribute Use Schema Component, modulo the impact of Missing Sub-components.
If the {attribute declaration} has a fixed {value constraint}, then if the attribute use itself has a {value constraint}, it must also be fixed and its value must match that of the {attribute declaration}'s {value constraint}.

6. xsi:type

6.1. xsi:type

6.2. Top-level invocation with type definition

For simplicity, in the first pass through the validation rules I ignored the case where the user stipulates a type definition when invoking the validator.

7. Wildcards

Section formulates the following validation rule:

Validation Rule: Item Valid (Wildcard)

For an element or attribute information item to be locally valid with respect to a wildcard constraint its [namespace name] must be valid with respect to the wildcard constraint, as defined in Wildcard allows Namespace Name.

When this constraint applies the appropriate case among the following must be true:

If {process contents} is lax,

then the item has no context-determined declaration with respect to Assessment Outcome (Element), Schema-Validity Assessment (Element) and Schema-Validity Assessment (Attribute).
If {process contents} is strict,

then the item's context-determined declaration is mustFind.
If {process contents} is skip,

then the item's context-determined declaration is skip.

Section formulates the following validation rule:

Validation Rule: Wildcard allows Namespace Name

For a value which is either a namespace name or absent to be valid with respect to a wildcard constraint (the value of a {namespace constraint}) one of the following must be true:

The constraint must be any.
All of the following must be true:
1. The constraint is a pair of not and a namespace name or absent ([Definition:] call this the namespace test).
2. The value must not be identical to the namespace test.
3. The value must not be absent.
The constraint is a set, and the value is identical to one of the members of the set.
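
A direct transcription of this rule into Python may help in checking the three cases. The encoding is an assumption of the sketch: the constraint is the string "any", a pair ("not", namespace test), or a frozenset of namespace names, and None stands for absent in all positions.

```python
def wildcard_allows(constraint, value):
    """Wildcard allows Namespace Name, for value a namespace name or None."""
    if constraint == "any":
        return True
    if isinstance(constraint, tuple):
        # ("not", test): value must differ from the test and not be absent.
        return value != constraint[1] and value is not None
    # Otherwise the constraint is a set of namespace names and/or absent.
    return value in constraint
```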

Schema Component Constraint: Wildcard Properties Correct

The values of the properties of a wildcard must be as described in the property tableau in The Wildcard Schema Component, modulo the impact of Missing Sub-components.

Schema Component Constraint: Attribute Wildcard Union

For a wildcard's {namespace constraint} value to be the intensional union of two other such values (call them O1 and O2): the appropriate case among the following must be true:

If O1 and O2 are the same value,

then that value must be the value.
If either O1 or O2 is any,

then any must be the value.
If both O1 and O2 are sets of (namespace names or absent),

then the union of those sets must be the value.
If the two are negations of different values (namespace names or absent),

then a pair of not and absent must be the value.
If either O1 or O2 is a pair of not and a namespace name and the other is a set of (namespace names or absent) (call this set S),
then the appropriate case among the following must be true:
1. If the set S includes both the negated namespace name and absent,
  
  then any must be the value.
2. If the set S includes the negated namespace name but not absent,
  
  then a pair of not and absent must be the value.
3. If the set S includes absent but not the negated namespace name,
  
  then the union is not expressible.
4. If the set S does not include either the negated namespace name or absent,
  
  then whichever of O1 or O2 is a pair of not and a namespace name must be the value.
If either O1 or O2 is a pair of not and absent and the other is a set of (namespace names or absent) (again, call this set S),
then the appropriate case among the following must be true:
1. If the set S includes absent,
  
  then any must be the value.
2. If the set S does not include absent,
  
  then a pair of not and absent must be the value.

In the case where there are more than two values, the intensional union is determined by identifying the intensional union of two of the values as above, then the intensional union of that value with the third (providing the first union was expressible), and so on as required.
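
The case analysis for the intensional union can be written out as a single function. In this Python sketch, offered only as an illustration, a {namespace constraint} value is encoded (by assumption) as the string "any", a pair ("not", value), or a frozenset, with None standing for absent; the exception NotExpressible is a name introduced here for the "not expressible" outcome.

```python
from functools import reduce

class NotExpressible(Exception):
    """Raised when the union cannot be expressed as a namespace constraint."""

def wildcard_union(o1, o2):
    if o1 == o2:
        return o1
    if o1 == "any" or o2 == "any":
        return "any"
    if isinstance(o1, frozenset) and isinstance(o2, frozenset):
        return o1 | o2
    if isinstance(o1, tuple) and isinstance(o2, tuple):
        # Negations of different values.
        return ("not", None)
    # One negation and one set S.
    negation, s = (o1, o2) if isinstance(o1, tuple) else (o2, o1)
    negated = negation[1]
    if negated is not None:
        if negated in s and None in s:
            return "any"
        if negated in s:
            return ("not", None)
        if None in s:
            raise NotExpressible
        return negation
    # The negation is a pair of not and absent.
    return "any" if None in s else ("not", None)

def wildcard_union_all(values):
    """Fold the pairwise union over more than two values, left to right."""
    return reduce(wildcard_union, values)
```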

Schema Component Constraint: Attribute Wildcard Intersection

For a wildcard's {namespace constraint} value to be the intensional intersection of two other such values (call them O1 and O2): the appropriate case among the following must be true:

If O1 and O2 are the same value,

then that value must be the value.
If either O1 or O2 is any,

then the other must be the value.
If either O1 or O2 is a pair of not and a value (a namespace name or absent) and the other is a set of (namespace names or absent),

then that set, minus the negated ~~namespace name~~value if it was in the set, minus absent if it was in the set, must be the value.
If both O1 and O2 are sets of (namespace names or absent),

then the intersection of those sets must be the value.
If the two are negations of different namespace names,

then the intersection is not expressible.
If the one is a negation of a namespace name and the other is a negation of absent,

then the one which is the negation of a namespace name must be the value.

In the case where there are more than two values, the intensional intersection is determined by identifying the intensional intersection of two of the values as above, then the intensional intersection of that value with the third (providing the first intersection was expressible), and so on as required.
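As an illustration only, the intersection rules might be coded as follows in Python. The encoding is an assumption of this sketch ("any" as a string, sets as frozensets, absent as None, a negation as a tuple), and the function names are hypothetical.

```python
ANY = "any"
ABSENT = None  # assumed encoding of the 'absent' namespace


def is_neg(constraint):
    """True for a pair (not, value)."""
    return isinstance(constraint, tuple) and constraint[0] == "not"


def wildcard_intersection(o1, o2):
    """Intensional intersection; ValueError if it is not expressible."""
    if o1 == o2:
        return o1
    if o1 == ANY:
        return o2
    if o2 == ANY:
        return o1
    if isinstance(o1, frozenset) and isinstance(o2, frozenset):
        return o1 & o2
    if is_neg(o1) and is_neg(o2):
        if ABSENT not in (o1[1], o2[1]):
            # negations of two different namespace names
            raise ValueError("intersection is not expressible")
        # one negates a namespace name, the other negates absent
        return o1 if o1[1] is not ABSENT else o2
    # One operand is a negation, the other a set.
    negated, s = (o1, o2) if is_neg(o1) else (o2, o1)
    return s - {negated[1], ABSENT}
```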

8. Substitution groups

Schema Component Constraint: Substitution Group OK (Transitive)

For an element declaration (call it D) ~~together with a blocking constraint (a subset of substitution, extension, restriction, the value of a {disallowed substitutions})~~ to be validly substitutable for another element declaration (call it C) subject to a blocking constraint (a subset of substitution, extension, restriction, the value of a {disallowed substitutions}) one of the following must be true:

D and C are the same element declaration.
All of the following must be true:
1. The blocking constraint does not contain substitution.
2. There is a chain of {substitution group affiliation}s from D to C, that is, either D's {substitution group affiliation} is C, or D's {substitution group affiliation}'s {substitution group affiliation} is C, or . . .
3. The set of all {derivation method}s involved in the derivation of D's {type definition} from C's {type definition} does not intersect with the union of the blocking constraint, C's {prohibited substitutions} (if C is complex, otherwise the empty set) and the {prohibited substitutions} (respectively the empty set) of any intermediate {type definition}s in the derivation of D's {type definition} from C's {type definition}.

Schema Component Constraint: Substitution Group

[Definition:] Every element declaration (call this HEAD) in the {element declarations} of a schema defines a substitution group, a subset of those {element declarations}, as follows:

Define PSG, the potential substitution group for HEAD, as follows:

The element declaration itself is in ~~the group~~PSG;
~~The group~~PSG is closed with respect to {substitution group affiliation}, that is, if any element declaration in the {element declarations} has a {substitution group affiliation} in ~~the group~~PSG, then it is also in ~~the group~~PSG itself.

HEAD's actual substitution group is then the set consisting of each member of PSG such that all of the following must be true:

Its {abstract} is false.
It is validly substitutable for HEAD subject to an empty blocking constraint, as defined in Substitution Group OK (Transitive).
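The two-stage definition (potential group, then actual group) can be sketched in Python as a closure computation followed by a filter. The record type and function names below are hypothetical, and the check of Substitution Group OK (Transitive) is left as a caller-supplied predicate that accepts everything by default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementDecl:
    name: str
    affiliation: Optional[str] = None  # name of the {substitution group affiliation}
    abstract: bool = False


def potential_substitution_group(head, decls):
    """Names in PSG: head itself, closed under {substitution group affiliation}."""
    psg = {head.name}
    changed = True
    while changed:
        changed = False
        for d in decls:
            if d.affiliation in psg and d.name not in psg:
                psg.add(d.name)
                changed = True
    return psg


def substitution_group(head, decls, substitutable=lambda d, h: True):
    """Members of PSG that are not abstract and pass the substitutability test.

    `substitutable` stands in for Substitution Group OK (Transitive) with an
    empty blocking constraint; the default is a placeholder, not the real check.
    """
    psg = potential_substitution_group(head, decls)
    return {d.name for d in decls
            if d.name in psg and not d.abstract and substitutable(d, head)}
```

Note that an abstract head belongs to its own potential group but not to the actual one.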

9. Derivation relations

The rules governing substitution groups and the use of xsi:type all specify that certain derivation relations must hold between the types of the substitution group head and of members, or between the type specified in an element declaration and the type specified by xsi:type. So far, we have taken the relevant relations on faith, but there are some rules which must be observed if one type is to count as a valid derivation from another type.

Schema Component Constraint: Derivation Valid (Extension)

If the {derivation method} is extension, the appropriate case among the following must be true:

If the {base type definition} is a complex type definition,
then all of the following must be true:
1. The {final} of the {base type definition} must not contain extension.
2. Its {attribute uses} must be a subset of the {attribute uses} of the complex type definition itself, that is, for every attribute use in the {attribute uses} of the {base type definition}, there must be an attribute use in the {attribute uses} of the complex type definition itself whose {attribute declaration} has the same {name}, {target namespace} and {type definition} as its attribute declaration.
3. If it has an {attribute wildcard}, the complex type definition must also have one, and the base type definition's {attribute wildcard}'s {namespace constraint} must be a subset of the complex type definition's {attribute wildcard}'s {namespace constraint}, as defined by Wildcard Subset.
4. One of the following must be true:
  1. The {content type} of the {base type definition} and the {content type} of the complex type definition itself must be the same simple type definition.
  2. The {content type} of both the {base type definition} and the complex type definition itself must be empty.
  3. All of the following must be true:
    
    The {content type} of the complex type definition itself must specify a particle.
    
    One of the following must be true:
    
    The {content type} of the {base type definition} must be empty.
    
    All of the following must be true:
    
    Both {content type}s must be mixed or both must be element-only.
    
    The particle of the complex type definition must be a valid extension of the {base type definition}'s particle, as defined in Particle Valid (Extension).
5. It must in principle be possible to derive the complex type definition in two steps, the first an extension and the second a restriction (possibly vacuous), from that type definition among its ancestors whose {base type definition} is the ur-type definition or the lax ur-type definition, whichever is encountered first in the chain of {base type definition}s.
  
  Note: This requirement ensures that nothing removed by a restriction is subsequently added back by an extension. It is trivial to check if the extension in question is the only extension in its derivation, or if there are no restrictions bar the first from the ur-type definition or the lax ur-type definition.
  Constructing the intermediate type definition to check this constraint is straightforward: simply re-order the derivation to put all the extension steps first, then collapse them into a single extension. If the resulting definition can be the basis for a valid restriction to the desired definition, the constraint is satisfied.
If the {base type definition} is a simple type definition,
then all of the following must be true:
1. The {content type} must be the same simple type definition.
2. The {final} of the {base type definition} must not contain extension.

[Definition:] If this constraint Derivation Valid (Extension) holds of a complex type definition, it is a valid extension of its {base type definition}.

Schema Component Constraint: Derivation Valid (Restriction, Complex)

If the {derivation method} is restriction all of the following must be true:

The {base type definition} must be a complex type definition whose {final} does not contain restriction.
For each attribute use (call this R) in the {attribute uses} the appropriate case among the following must be true:
1. If there is an attribute use in the {attribute uses} of the {base type definition} (call this B) whose {attribute declaration} has the same {name} and {target namespace},
  then all of the following must be true:
  1. one of the following must be true:
    
    B's {required} is false.
    
    R's {required} is true.
  2. R's {attribute declaration}'s {type definition} must be validly derived from B's {type definition} given the empty set as defined in Type Derivation OK (Simple).
  3. [Definition:] Let the effective value constraint of an attribute use be its {value constraint}, if present, otherwise its {attribute declaration}'s {value constraint}. Then one of the following must be true:
    
    B's effective value constraint is absent or default.
    
    R's effective value constraint is fixed with the same string as B's.
2. otherwise the {base type definition} must have an {attribute wildcard} and the {target namespace} of the R's {attribute declaration} must be valid with respect to that wildcard, as defined in Wildcard allows Namespace Name.
For each attribute use in the {attribute uses} of the {base type definition} whose {required} is true, there must be an attribute use with an {attribute declaration} with the same {name} and {target namespace} as its {attribute declaration} in the {attribute uses} of the complex type definition itself whose {required} is true.
If there is an {attribute wildcard}, all of the following must be true:
1. The {base type definition} must also have one.
2. The complex type definition's {attribute wildcard}'s {namespace constraint} must be a subset of the {base type definition}'s {attribute wildcard}'s {namespace constraint}, as defined by Wildcard Subset.
3. Note: See also the Note below regarding {process contents}.
One of the following must be true:
1. All of the following must be true:
  1. The {content type} of the complex type definition must be a simple type definition
  2. One of the following must be true:
    1. The {content type} of the {base type definition} must be a simple type definition ~~of~~from which the {content type} is ~~a valid restriction~~ validly derived given the empty set as defined in ~~Derivation Valid (Restriction, Simple)~~Type Derivation OK (Simple).
    2. The {base type definition} must be mixed and have a particle which is emptiable as defined in Particle Emptiable.
2. All of the following must be true:
  1. The {content type} of the complex type itself must be empty
  2. One of the following must be true:
    1. The {content type} of the {base type definition} must also be empty.
    2. The {content type} of the {base type definition} must be elementOnly or mixed and have a particle which is emptiable as defined in Particle Emptiable.
3. All of the following must be true:
  1. ~~the {content type} of the {base type definition} is mixed or the {content type} of the complex type definition itself is element-only~~
    One of the following must be true:
    1. The {content type} of the complex type definition itself must be element-only
    2. The {content type} of the complex type definition itself and of the {base type definition} must be mixed
  2. The particle of the complex type definition itself must be a valid restriction of the particle of the {content type} of the {base type definition} as defined in Particle Valid (Restriction).
Note: Attempts to derive complex type definitions whose {content type} is element-only by restricting a {base type definition} whose {content type} is empty are not ruled out by this clause. However if the complex type definition itself has a non-pointless particle it will fail to satisfy Particle Valid (Restriction). On the other hand some type definitions with pointless element-only content, for example an empty <sequence>, will satisfy Particle Valid (Restriction) with respect to an empty {base type definition}, and so be valid restrictions.

[Definition:] If this constraint Derivation Valid (Restriction, Complex) holds of a complex type definition, it is a valid restriction of its {base type definition}.

Schema Component Constraint: Type Derivation OK (Complex)

For a complex type definition (call it D, for derived) to be validly derived from a type definition (call this B, for base) given a subset of extension, restriction all of the following must be true:

If B and D are not the same type definition, then the {derivation method} of D must not be in the subset.
One of the following must be true:
1. B and D must be the same type definition.
2. B must be D's {base type definition}.
3. All of the following must be true:
  1. D's {base type definition} must not be the ur-type definition.
  2. The appropriate case among the following must be true:
    
    If D's {base type definition} is complex,
    
    then it must be validly derived from B given the subset as defined by this constraint.
    
    If D's {base type definition} is simple,
    
    then it must be validly derived from B given the subset as defined in Type Derivation OK (Simple).
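The recursive structure of this constraint can be sketched in Python. The sketch makes a simplifying assumption: all types are modelled by a single hypothetical record, so the hand-off to Type Derivation OK (Simple) for a simple base type is not modelled separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeDef:
    name: str
    base: Optional["TypeDef"] = None  # None only for the ur-type in this sketch
    method: Optional[str] = None      # "extension" or "restriction"


UR_TYPE = TypeDef("anyType")


def derivation_ok(d, b, blocked):
    """Is d validly derived from b, given the blocked subset of methods?"""
    # First clause: unless d and b coincide, d's own method must not be blocked.
    if d is not b and d.method in blocked:
        return False
    # Second clause: same type, direct base, or recursively via d's base.
    if d is b or d.base is b:
        return True
    if d.base is None or d.base is UR_TYPE:
        return False
    return derivation_ok(d.base, b, blocked)
```

Because the first clause is re-checked at every recursive step, a blocked derivation method anywhere on the chain from D up to B makes the test fail.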

Schema Component Constraint: Particle Valid (Extension)

[Definition:] For a particle (call it E, for extension) to be a valid extension of another particle (call it B, for base) one of the following must be true:

They are the same particle.
E's {min occurs}={max occurs}=1 and its {term} is a sequence group whose {particles}' first member is a particle all of whose properties, recursively, are identical to those of B, with the exception of annotation properties.

Schema Component Constraint: Particle Valid (Restriction)

[Definition:] For a particle (call it R, for restriction) to be a valid restriction of another particle (call it B, for base) one of the following must be true:

They are the same particle.

depending on the kind of particle, per the table below, with the qualifications that all of the following must be true:

Any top-level element declaration particle (in R or B) which is the {substitution group affiliation} of one or more other element declarations and whose substitution group contains at least one element declaration other than itself is treated as if it were a choice group whose {min occurs} and {max occurs} are those of the particle, and whose {particles} consists of one particle with {min occurs} and {max occurs} of 1 ~~for the top-level element declaration and~~ for each of the declarations in its substitution group.
Any pointless occurrences of <sequence>, <choice> or <all> are ignored, where pointlessness is understood as follows:
<sequence>
One of the following must be true:
1. {particles} is empty.
2. All of the following must be true:
  The particle within which this <sequence> appears has {max occurs} and {min occurs} of 1.
  
  One of the following must be true:
  
  The <sequence>'s {particles} has only one member.
  
  The particle within which this <sequence> appears is itself among the {particles} of a <sequence>.
<all>
One of the following must be true:
1. {particles} is empty.
2. {particles} has only one member.
<choice>
One of the following must be true:
1. {particles} is empty and the particle within which this <choice> appears has {min occurs} of 0.
2. All of the following must be true:
  The particle within which this <choice> appears has {max occurs} and {min occurs} of 1.
  
  One of the following must be true:
  
  The <choice>'s {particles} has only one member.
  
  The particle within which this <choice> appears is itself among the {particles} of a <choice>.

	Base particle
Derived particle	elt	any	all	choice	sequence
elt	NameAndTypeOK	NSCompat	RecurseAsIfGroup	RecurseAsIfGroup	RecurseAsIfGroup
any	Forbidden	NSSubset	Forbidden	Forbidden	Forbidden
all	Forbidden	NSRecurseCheckCardinality	Recurse	Forbidden	Forbidden
choice	Forbidden	NSRecurseCheckCardinality	Forbidden	RecurseLax	Forbidden
sequence	Forbidden	NSRecurseCheckCardinality	RecurseUnordered	MapAndSum	Recurse
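For illustration, the table can be held as a lookup keyed by the kinds of the derived and base particles. In this Python sketch the cell positions are taken from the names of the constraints that follow, which are written Derived:Base (for example Sequence:Choice -- MapAndSum); the names RULES and restriction_rule are hypothetical.

```python
# Keys are (derived kind, base kind); every pair not listed is Forbidden.
RULES = {
    ("elt", "elt"): "NameAndTypeOK",
    ("elt", "any"): "NSCompat",
    ("elt", "all"): "RecurseAsIfGroup",
    ("elt", "choice"): "RecurseAsIfGroup",
    ("elt", "sequence"): "RecurseAsIfGroup",
    ("any", "any"): "NSSubset",
    ("all", "any"): "NSRecurseCheckCardinality",
    ("choice", "any"): "NSRecurseCheckCardinality",
    ("sequence", "any"): "NSRecurseCheckCardinality",
    ("all", "all"): "Recurse",
    ("sequence", "sequence"): "Recurse",
    ("choice", "choice"): "RecurseLax",
    ("sequence", "all"): "RecurseUnordered",
    ("sequence", "choice"): "MapAndSum",
}


def restriction_rule(derived_kind, base_kind):
    """Name of the constraint that governs this derived/base combination."""
    return RULES.get((derived_kind, base_kind), "Forbidden")
```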

Schema Component Constraint: Occurrence Range OK

For a particle's occurrence range to be a valid restriction of another's occurrence range all of the following must be true:

Its {min occurs} is greater than or equal to the other's {min occurs}.
one of the following must be true:
1. The other's {max occurs} is unbounded.
2. Both {max occurs} are numbers, and the particle's is less than or equal to the other's.
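A minimal Python sketch of this check, assuming that an unbounded {max occurs} is encoded as None (an encoding chosen here for illustration):

```python
UNBOUNDED = None  # assumed encoding of an unbounded {max occurs}


def occurrence_range_ok(r_min, r_max, b_min, b_max):
    """Is the range (r_min, r_max) a valid restriction of (b_min, b_max)?"""
    if r_min < b_min:
        return False
    if b_max is UNBOUNDED:
        return True
    # Both maxima must be numbers, and the restriction's must not exceed the base's.
    return r_max is not UNBOUNDED and r_max <= b_max
```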

Schema Component Constraint: Particle Restriction OK (Elt:Elt -- NameAndTypeOK)

For an element declaration particle to be a valid restriction of another element declaration particle all of the following must be true:

The declarations' {name}s and {target namespace}s are the same.
Either B's {nillable} is true or R's {nillable} is false.
R's occurrence range is a valid restriction of B's occurrence range as defined by Occurrence Range OK.
either B's declaration's {value constraint} is absent, or is not fixed, or R's declaration's {value constraint} is fixed with the same value.
R's declaration's {identity-constraint definitions} is a subset of B's declaration's {identity-constraint definitions}, if any.
R's declaration's {disallowed substitutions} is a superset of B's declaration's {disallowed substitutions}.
R's {type definition} is validly derived given extension, list, union from B's {type definition} as defined by Type Derivation OK (Complex) or Type Derivation OK (Simple), as appropriate.

Note: The above constraint on {type definition} means that in deriving a type by restriction, any contained type definitions must themselves be explicitly derived by restriction from the corresponding type definitions in the base definition, or be one of the member types of a corresponding union.

Schema Component Constraint: Particle Derivation OK (Elt:Any -- NSCompat)

For an element declaration particle to be a valid restriction of a wildcard particle all of the following must be true:

The element declaration's {target namespace} is valid with respect to the wildcard's {namespace constraint} as defined by Wildcard allows Namespace Name.
R's occurrence range is a valid restriction of B's occurrence range as defined by Occurrence Range OK.

Schema Component Constraint: Particle Derivation OK (Elt:All/Choice/Sequence -- RecurseAsIfGroup)

For an element declaration particle to be a valid restriction of a group particle (all, choice or sequence) a group particle of the variety corresponding to B's, with {min occurs} and {max occurs} of 1 and with {particles} consisting of a single particle the same as the element declaration must be a valid restriction of the group as defined by Particle Derivation OK (All:All,Sequence:Sequence -- Recurse), Particle Derivation OK (Choice:Choice -- RecurseLax) or Particle Derivation OK (All:All,Sequence:Sequence -- Recurse), depending on whether the group is all, choice or sequence.

Schema Component Constraint: Particle Derivation OK (Any:Any -- NSSubset)

For a wildcard particle to be a valid restriction of another wildcard particle all of the following must be true:

R's occurrence range must be a valid restriction of B's occurrence range as defined by Occurrence Range OK.
R's {namespace constraint} must be an intensional subset of B's {namespace constraint} as defined by Wildcard Subset.

Note: As defined above, this constraint is not strong enough to prevent the derivation of type definitions with a larger membership than their base, which should not be allowed. The next version of this specification will constrain valid restrictions along the following lines:

R's {process contents} must be identical to or stronger than B's {process contents}, where strict is stronger than lax is stronger than skip.

Although derivations which violate the above constraint on {process contents} are not ruled out by this specification as it stands, they are likely to become invalid in a future release, and their use is therefore to be avoided. Many uses of {process contents} with value skip in existing schemas will occur in type definitions derived (eventually) by restriction from the lax ur-type definition, and therefore fall into the class of definitions which will become invalid. The conformant way to do this, which is both correct per this specification and will continue to be correct, is to change such type definitions to be derived explicitly from the ur-type definition instead, i.e. as follows:

<xs:complexType>
 <xs:complexContent>
  <xs:restriction base="xs:urType">
   . . .
   <xs:any . . . processContents="skip"/>
   . . .
  </xs:restriction>
 </xs:complexContent>
</xs:complexType>

Schema Component Constraint: Particle Derivation OK (All/Choice/Sequence:Any -- NSRecurseCheckCardinality)

For a group particle to be a valid restriction of a wildcard particle all of the following must be true:

Every member of the {particles} of the group is a valid restriction of the wildcard as defined by Particle Valid (Restriction).
The effective total range of the group, as defined by Effective Total Range (all and sequence) (if the group is all or sequence) or Effective Total Range (choice) (if it is choice) is a valid restriction of B's occurrence range as defined by Occurrence Range OK.

Schema Component Constraint: Particle Derivation OK (All:All,Sequence:Sequence -- Recurse)

For an all or sequence group particle to be a valid restriction of another group particle with the same {compositor} all of the following must be true:

R's occurrence range is a valid restriction of B's occurrence range as defined by Occurrence Range OK.
There is a complete order-preserving functional mapping from the particles in the {particles} of R to the particles in the {particles} of B such that all of the following must be true:
1. Each particle in the {particles} of R is a valid restriction of the particle in the {particles} of B it maps to as defined by Particle Valid (Restriction).
2. All particles in the {particles} of B which are not mapped to by any particle in the {particles} of R are emptiable as defined by Particle Emptiable.

Note: Although the validation semantics of an all group does not depend on the order of its particles, derived all groups are required to match the order of their base in order to simplify checking that the derivation is OK.

[Definition:] A complete functional mapping is order-preserving if each particle r in the domain R maps to a particle b in the range B which follows (not necessarily immediately) the particle in the range B mapped to by the predecessor of r, if any, where “predecessor” and “follows” are defined with respect to the order of the lists which constitute R and B.
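The existence of such a mapping can be tested by a small search. The Python sketch below is one possible approach, not a method prescribed by the specification; the predicates restricts and emptiable are hypothetical stand-ins for Particle Valid (Restriction) and Particle Emptiable, supplied by the caller, and particles are assumed to be hashable values.

```python
from functools import lru_cache


def recurse_mapping_exists(r_particles, b_particles, restricts, emptiable):
    """Is there a complete order-preserving mapping from R's particles to B's
    such that each R particle restricts its image and every unmapped B
    particle is emptiable?"""
    r, b = tuple(r_particles), tuple(b_particles)

    @lru_cache(maxsize=None)
    def match(i, j):
        if i == len(r):
            # Everything left over in B is unmapped and must be emptiable.
            return all(emptiable(p) for p in b[j:])
        if j == len(b):
            return False
        # Either map r[i] to b[j] ...
        if restricts(r[i], b[j]) and match(i + 1, j + 1):
            return True
        # ... or leave b[j] unmapped, which requires it to be emptiable.
        return emptiable(b[j]) and match(i, j + 1)

    return match(0, 0)
```

Both branches are tried because a purely greedy left-to-right match can fail where a mapping exists: an R particle may restrict an emptiable B particle that should instead be skipped so that a later, non-emptiable B particle receives it.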

Schema Component Constraint: Particle Derivation OK (Choice:Choice -- RecurseLax)

For a choice group particle to be a valid restriction of another choice group particle all of the following must be true:

R's occurrence range is a valid restriction of B's occurrence range as defined by Occurrence Range OK;
There is a complete order-preserving functional mapping from the particles in the {particles} of R to the particles in the {particles} of B such that each particle in the {particles} of R is a valid restriction of the particle in the {particles} of B it maps to as defined by Particle Valid (Restriction).

Note: Although the validation semantics of a choice group does not depend on the order of its particles, derived choice groups are required to match the order of their base in order to simplify checking that the derivation is OK.

Schema Component Constraint: Particle Derivation OK (Sequence:All -- RecurseUnordered)

For a sequence group particle to be a valid restriction of an all group particle all of the following must be true:

R's occurrence range is a valid restriction of B's occurrence range as defined by Occurrence Range OK.
There is a complete functional mapping from the particles in the {particles} of R to the particles in the {particles} of B such that all of the following must be true:
1. No particle in the {particles} of B is mapped to by more than one of the particles in the {particles} of R;
2. Each particle in the {particles} of R is a valid restriction of the particle in the {particles} of B it maps to as defined by Particle Valid (Restriction);
3. All particles in the {particles} of B which are not mapped to by any particle in the {particles} of R are emptiable as defined by Particle Emptiable.

Note: Although this clause allows reordering, because of the limits on the contents of all groups the checking process can still be deterministic.

Schema Component Constraint: Particle Derivation OK (Sequence:Choice -- MapAndSum)

For a sequence group particle to be a valid restriction of a choice group particle all of the following must be true:

There is a complete functional mapping from the particles in the {particles} of R to the particles in the {particles} of B such that each particle in the {particles} of R is a valid restriction of the particle in the {particles} of B it maps to as defined by Particle Valid (Restriction).
The pair consisting of the product of the {min occurs} of R and the length of its {particles} and unbounded if {max occurs} is unbounded otherwise the product of the {max occurs} of R and the length of its {particles} is a valid restriction of B's occurrence range as defined by Occurrence Range OK.

Note: This clause is in principle more restrictive than absolutely necessary, but in practice will cover all the likely cases, and is much easier to specify than the fully general version.

Note: This case allows the “unfolding” of iterated disjunctions into sequences. It may be particularly useful when the disjunction is an implicit one arising from the use of substitution groups.
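The arithmetic in the second clause of MapAndSum can be sketched in Python as follows. The None encoding of unbounded and the function names are assumptions of this sketch.

```python
UNBOUNDED = None  # assumed encoding of an unbounded {max occurs}


def map_and_sum_range(r_min, r_max, n_particles):
    """Range of the sequence R for comparison with the choice B: its
    {min occurs} and {max occurs} scaled by the length of its {particles}."""
    low = r_min * n_particles
    high = UNBOUNDED if r_max is UNBOUNDED else r_max * n_particles
    return low, high


def occurrence_range_ok(r_min, r_max, b_min, b_max):
    """Occurrence Range OK, as stated earlier in this section."""
    if r_min < b_min:
        return False
    if b_max is UNBOUNDED:
        return True
    return r_max is not UNBOUNDED and r_max <= b_max


def map_and_sum_range_ok(r_min, r_max, n_particles, b_min, b_max):
    low, high = map_and_sum_range(r_min, r_max, n_particles)
    return occurrence_range_ok(low, high, b_min, b_max)
```

For example, a sequence of three particles occurring exactly once counts as the range (3, 3); it fits a choice that may repeat without bound but not one limited to two occurrences.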

Schema Component Constraint: Particle Emptiable

[Definition:] For a particle to be emptiable one of the following must be true:

Its {min occurs} is 0.
Its {term} is a group and the minimum part of the effective total range of that group, as defined by Effective Total Range (all and sequence) (if the group is all or sequence) or Effective Total Range (choice) (if it is choice), is 0.

Schema Component Constraint: Wildcard Subset

For a namespace constraint (call it sub) to be an intensional subset of another namespace constraint (call it super) one of the following must be true:

super must be any.
All of the following must be true:
1. sub must be a pair of not and a value (a namespace name or absent).
2. super must be a pair of not and the same value.
All of the following must be true:
1. sub must be a set whose members are either namespace names or absent.
2. One of the following must be true:
  1. super must be the same set or a superset thereof.
  2. super must be a pair of not and a value (a namespace name or absent) and neither that value nor absent must ~~not~~ be in sub's set.
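A Python sketch of the subset test, under an encoding assumed here for illustration ("any" as a string, sets as frozensets, absent as None, a negation as a tuple):

```python
ANY = "any"
ABSENT = None  # assumed encoding of the 'absent' namespace


def is_neg(constraint):
    """True for a pair (not, value)."""
    return isinstance(constraint, tuple) and constraint[0] == "not"


def wildcard_subset(sub, sup):
    """Is `sub` an intensional subset of `sup`?"""
    if sup == ANY:
        return True
    if is_neg(sub):
        # Only a negation of the same value contains a negation.
        return is_neg(sup) and sub[1] == sup[1]
    if isinstance(sub, frozenset):
        if isinstance(sup, frozenset):
            return sub <= sup
        if is_neg(sup):
            # Neither the negated value nor absent may be in sub's set.
            return sup[1] not in sub and ABSENT not in sub
    return False
```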

Schema Component Constraint: Derivation Valid (Restriction, Simple)

The appropriate case among the following must be true:

If the {variety} is atomic,
then all of the following must be true:
1. The {base type definition} must be an atomic simple type definition or a built-in primitive datatype.
2. The {final} of the {base type definition} must not contain restriction.
3. For each facet in the {facets} (call this DF) all of the following must be true:
  1. DF must be an allowed constraining facet for the {primitive type definition}, as specified in the appropriate subsection of 3.2 Primitive datatypes.
  2. If there ~~must be~~is a facet of the same kind in the {facets} of the {base type definition} (call this BF), ~~of whose value~~then the ~~facet in question~~DF's value must be a valid restriction of BF's value as defined in [W3C 2001c].
If the {variety} is list,
then all of the following must be true:
1. The {item type definition} must have a {variety} of atomic or union (in which case all the {member type definitions} must be atomic).
2. ~~Only length, minLength, maxLength, pattern and enumeration facet components are allowed among the {facets}.~~
3. The appropriate case among the following must be true:
  1. If the {base type definition} is ~~not~~ the simple ur-type definition,
    then all of the following must be true:
    
    The {final} of the {item type definition} must not contain list.
    
    The {facets} must be empty.
  2. otherwise all of the following must be true:
    
    The {base type definition} must have a {variety} of list.
    
    The {final} of the {base type definition} must not contain restriction.
    
    Only length, minLength, maxLength, pattern and enumeration facet components are allowed among the {facets}.
    
    For each facet in the {facets} (call this DF), if there ~~must be~~is a facet of the same kind in the {facets} of the {base type definition} (call this BF), ~~of whose value~~then the ~~facet in question~~DF's value must be a valid restriction of BF's value as defined in [W3C 2001c].
  The first case above will apply when a list is derived by specifying an item type, the second when derived by restriction from another list.
If the {variety} is union,
then all of the following must be true:
1. The {member type definitions} must all have {variety} of atomic or list.
2. ~~Only pattern and enumeration facet components are allowed among the {facets}.~~
3. The appropriate case among the following must be true:
  1. If the {base type definition} is ~~not~~ the simple ur-type definition,
    then all of the following must be true:
    
    All of the {member type definitions} must have a {final} which does not contain union.
    
    The {facets} must be empty.
  2. otherwise all of the following must be true:
    
    The {base type definition} must have a {variety} of union.
    
    The {final} of the {base type definition} must not contain restriction.
    
    Only pattern and enumeration facet components are allowed among the {facets}.
    
    For each facet in the {facets} (call this DF), if there ~~must be~~is a facet of the same kind in the {facets} of the {base type definition} (call this BF), ~~of whose value~~then the ~~facet in question~~DF's value must be a valid restriction of BF's value as defined in [W3C 2001c].
  The first case above will apply when a union is derived by specifying one or more member types, the second when derived by restriction from another union.

[Definition:] If this constraint Derivation Valid (Restriction, Simple) holds of a simple type definition, it is a valid restriction of its base type definition.

Schema Component Constraint: Type Derivation OK (Simple)

For a simple type definition (call it D, for derived) to be validly derived from a simple type definition (call this B, for base) given a subset of extension, restriction, list, union (of which only restriction is actually relevant) one of the following must be true:

They are the same type definition.
All of the following must be true:
1. restriction is not in the subset, or in the {final} of its own {base type definition};
2. One of the following must be true:
  1. D's base type definition is B.
  2. D's base type definition is not the simple ur-type definition and is validly derived from B given the subset, as defined by this constraint.
  3. D's {variety} is list or union and B is the simple ur-type definition.
  4. B's {variety} is union and D is validly derived from a type definition in B's {member type definitions} given the subset, as defined by this constraint.
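One way to read this constraint operationally is sketched below in Python. The record type, its field names, and the stand-in for the simple ur-type definition are hypothetical; identity comparison (`is`) is used for "the same type definition".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class SimpleType:
    name: str
    base: Optional["SimpleType"] = None  # None only for the simple ur-type here
    variety: str = "atomic"              # "atomic", "list" or "union"
    final: frozenset = frozenset()
    members: tuple = ()                  # {member type definitions} of a union


ANY_SIMPLE = SimpleType("anySimpleType")  # stand-in for the simple ur-type


def simple_derivation_ok(d, b, blocked=frozenset()):
    """Is d validly derived from b, given the blocked subset of methods?"""
    if d is b:
        return True
    # restriction must be neither blocked nor in the {final} of d's base.
    if "restriction" in blocked:
        return False
    if d.base is not None and "restriction" in d.base.final:
        return False
    if d.base is b:
        return True
    if (d.base is not None and d.base is not ANY_SIMPLE
            and simple_derivation_ok(d.base, b, blocked)):
        return True
    if d.variety in ("list", "union") and b is ANY_SIMPLE:
        return True
    if b.variety == "union":
        return any(simple_derivation_ok(d, m, blocked) for m in b.members)
    return False
```

The search terminates because each recursive call either moves D one step up its chain of base types or moves B one level down into the members of a union.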

10. Miscellaneous

10.1. ID/IDREF

The XML Schema specification formulates the following validation rule:

Validation Rule: Validation Root Valid (ID/IDREF)

For an element information item which is the validation root to be valid all of the following must be true:

There must be no ID/IDREF binding in the item's [ID/IDREF table] whose [binding] is the empty set.
There must be no ID/IDREF binding in the item's [ID/IDREF table] whose [binding] has more than one member.

See ID/IDREF Table for the definition of ID/IDREF binding.

Note: The first clause above applies when there is a reference to an undefined ID. The second applies when there is a multiply-defined ID. They are separated out to ensure that distinct error codes (see Outcome Tabulations (normative)) are associated with these two cases.

Note: Although this rule applies at the validation root, in practice processors, particularly streaming processors, may wish to detect and signal the clause 2 case as it arises.

Note: This reconstruction of [W3C 2000]'s ID/IDREF functionality is imperfect in that if the validation root is not the document element of an XML document, the results will not necessarily be the same as those a validating parser would give were the document to have a DTD with equivalent declarations.

Schema Information Set Contribution: ID/IDREF Table

In the post-schema-validation infoset a set of ID/IDREF binding information items is associated with the validation root element information item:

PSVI Contributions for element information items
[ID/IDREF table]	A (possibly empty) set of ID/IDREF binding information items, as specified below.

[Definition:] Let the eligible item set be the set consisting of every attribute or element information item for which all of the following are true:

its validation context is the validation root;
it was successfully validated with respect to an attribute declaration as per Attribute Locally Valid or element declaration as per Element Locally Valid (Element) (as appropriate) whose attribute {type definition} or element {type definition} (respectively) is the built-in ID, IDREF or IDREFS simple type definition or a type derived from one of them.

Then there is one ID/IDREF binding in the {ID/IDREF table} for every distinct string which is one of the following:

the actual value of a member of the eligible item set whose type definition is or is derived from ID or IDREF;
one of the items in the actual value of a member of the eligible item set whose type definition is or is derived from IDREFS.

Each ID/IDREF binding has properties as follows:

PSVI Contributions for ID/IDREF binding information items
[id]	The string identified above.
[binding]	A set consisting of every element information item for which all of the following are true: its {validation context} is the validation root; it has an attribute information item in its [attributes] or an element information item in its [children] which was validated by the built-in ID simple type definition or a type derived from it whose schema normalized value is the {id} of this ID/IDREF binding.

The net effect of the above is to have one entry for every string used as an id, whether by declaration or by reference, associated with those elements, if any, which actually purport to have that id. See Validation Root Valid (ID/IDREF) above for the validation rule which actually checks for errors here.

Note: The ID/IDREF binding information item, unlike most other aspects of this specification, is essentially an internal bookkeeping mechanism. It is introduced to support the definition of Validation Root Valid (ID/IDREF) above. Accordingly, conformant processors may, but are not required to, expose it in the post-schema-validation infoset. In other words, the above constraint may be read as saying assessment proceeds as if such an infoset item existed.
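
As a rough illustration of the bookkeeping just described, the following Python sketch builds an ID/IDREF table and applies the two clauses of Validation Root Valid (ID/IDREF). It is not the DCTG formulation. The encoding of the eligible item set as `(owner, kind, value)` triples and the names `build_table` and `check_root` are assumptions made for the example.

```python
from collections import defaultdict


def build_table(eligible_items):
    """Build a dict from id string to its [binding] (a set of owner elements).

    eligible_items is an iterable of (owner, kind, value) triples, a
    hypothetical encoding: owner is the element carrying the item, kind is
    'ID', 'IDREF' or 'IDREFS', and value is a string (a list of strings for
    IDREFS).
    """
    table = defaultdict(set)
    for owner, kind, value in eligible_items:
        strings = value if kind == "IDREFS" else [value]
        for s in strings:
            binding = table[s]       # one binding per distinct string, declared or only referenced
            if kind == "ID":
                binding.add(owner)   # only elements that purport to have the id are recorded
    return dict(table)


def check_root(table):
    """Return (undefined ids, multiply defined ids), one list per clause of the rule."""
    undefined = sorted(s for s, binding in table.items() if len(binding) == 0)
    duplicated = sorted(s for s, binding in table.items() if len(binding) > 1)
    return undefined, duplicated
```

The validation root is valid with respect to this rule exactly when both returned lists are empty. Keeping the two lists separate mirrors the distinct error codes mentioned in the first note above.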

10.2. Identity constraints

[W3C 2001b] formulates the following validation rule:

Validation Rule: Identity-constraint Satisfied

For an element information item to be locally valid with respect to an identity-constraint all of the following must be true:

The {selector}, with the element information item as the context node, evaluates to a node-set (as defined in [W3C 1999]). [Definition:] Call this the target node set.
Each node in the target node set is an element node among the descendants of the context node.
For each node in the target node set all of the {fields}, with that node as the context node, evaluate to either an empty node-set or a node-set with exactly one member, which must have a simple type. [Definition:] Call the sequence of the type-determined values (as defined in [W3C 2001c]) of the schema normalized value of the element and/or attribute information items in those node-sets in order the key-sequence of the node.
[Definition:] Call the subset of the target node set for which all the {fields} evaluate to a node-set with exactly one member which is an element or attribute node with a simple type the qualified node set. The appropriate case among the following must be true:
1. If the {identity-constraint category} is unique, then no two members of the qualified node set have key-sequences whose members are pairwise equal, as defined by Equal in [W3C 2001c].
2. If the {identity-constraint category} is key, then all of the following must be true:
  1. The target node set and the qualified node set are equal, that is, every member of the target node set is also a member of the qualified node set and vice versa.
  2. No two members of the qualified node set have key-sequences whose members are pairwise equal, as defined by Equal in [W3C 2001c].
  3. No element member of the key-sequence of any member of the qualified node set was assessed as valid by reference to an element declaration whose {nillable} is true.
3. If the {identity-constraint category} is keyref, then for each member of the qualified node set (call this the keyref member), there must be a node table associated with the {referenced key} in the {identity-constraint table} of the element information item (see Identity-constraint Table, which must be understood as logically prior to this clause of this constraint, below) and there must be an entry in that table whose key-sequence is equal to the keyref member's key-sequence member for member, as defined by Equal in [W3C 2001c].

Note: The use of schema normalized value in the definition of key sequence above means that default or fixed value constraints may play a part in key sequences.
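
The three cases of the rule can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the DCTG formulation. The target node set is encoded as a dict from node to the list of its field values, with `None` standing for a field that evaluated to an empty node-set. Python's `==` stands in for Equal as defined in [W3C 2001c]. The {nillable} clause of the key case is left out. All names are hypothetical.

```python
def qualified_nodes(target):
    """Map each qualified node to its key-sequence (a tuple of field values)."""
    return {node: tuple(values) for node, values in target.items()
            if all(v is not None for v in values)}


def pairwise_distinct(qualified):
    """True if no two qualified nodes have equal key-sequences."""
    sequences = list(qualified.values())
    return len(sequences) == len(set(sequences))


def satisfied(category, target, referenced_table=frozenset()):
    """Check one identity-constraint against a target node set.

    referenced_table is the set of key-sequences in the node table of the
    {referenced key}; it is consulted only for keyref.
    """
    qualified = qualified_nodes(target)
    if category == "unique":
        return pairwise_distinct(qualified)
    if category == "key":
        # every target node must be qualified, and the key-sequences must be distinct
        return len(qualified) == len(target) and pairwise_distinct(qualified)
    if category == "keyref":
        return all(seq in referenced_table for seq in qualified.values())
    raise ValueError("unknown identity-constraint category: " + category)
```

Note how a node with a missing field is simply ignored for unique and keyref, but makes a key constraint fail.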

Schema Component Constraint: Identity-constraint Definition Properties Correct

All of the following must be true:

The values of the properties of an identity-constraint definition must be as described in the property tableau in The Identity-constraint Definition Schema Component, modulo the impact of Missing Sub-components.
If the {identity-constraint category} is keyref, the cardinality of the {fields} must equal that of the {fields} of the {referenced key}.

Schema Component Constraint: Selector Value OK

All of the following must be true:

The {selector} must be a valid XPath expression, as defined in [W3C 1999].
One of the following must be true:
1. It must conform to the following extended BNF (Selector XPath expressions):
```
Selector ::= Path ( '|' Path )*
Path     ::= ('.//')? Step ( '/' Step )*
Step     ::= '.' | NameTest
NameTest ::= QName | '*' | NCName ':' '*'
```
2. It must be an XPath expression involving the child axis whose abbreviated form is as given above.

For readability, whitespace may be used in selector XPath expressions even though not explicitly allowed by the grammar: whitespace may be freely added within patterns before or after any token.

When tokenizing, the longest possible token is always returned.

Lexical productions

token      ::= '.' | '/' | '//' | '|' | '@' | NameTest
whitespace ::= S
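
The selector grammar is small enough that a recognizer fits in a few lines. The Python sketch below tokenizes with the lexical productions and then follows the Selector, Path, and Step productions directly. It is an illustration, not part of the DCTG representation. The `NCNAME` pattern is an ASCII-only simplification of the real NCName production, and the function names are assumptions made for the example.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"   # ASCII-only simplification of NCName (assumption)
# Alternatives are ordered so that the longest possible token is tried first.
TOKEN = re.compile(rf"\s*({NCNAME}:\*|{NCNAME}:{NCNAME}|{NCNAME}|\*|//|/|\.|\||@)")
PUNCTUATION = {".", "/", "//", "|", "@"}


def tokenize(text):
    """Split text into tokens, skipping whitespace between them."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_step(tokens, i):
    """Step ::= '.' | NameTest. Return the index after the step, or -1."""
    if i < len(tokens) and (tokens[i] == "." or tokens[i] not in PUNCTUATION):
        return i + 1
    return -1


def parse_path(tokens, i):
    """Path ::= ('.//')? Step ( '/' Step )*. Return the index after the path, or -1."""
    if tokens[i:i + 2] == [".", "//"]:
        i += 2
    i = parse_step(tokens, i)
    while i != -1 and i < len(tokens) and tokens[i] == "/":
        i = parse_step(tokens, i + 1)
    return i


def is_selector(text):
    """Selector ::= Path ( '|' Path )*"""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    i = parse_path(tokens, 0)
    while i != -1 and i < len(tokens) and tokens[i] == "|":
        i = parse_path(tokens, i + 1)
    return i == len(tokens)
```

For example, `is_selector(".//item | po:*")` holds, while `is_selector("a//b")` and `is_selector("@id")` do not, since `//` is allowed only in the leading `.//` and attribute steps are not allowed in selectors.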

Schema Component Constraint: Fields Value OK

All of the following must be true:

Each member of the {fields} must be a valid XPath expression, as defined in [W3C 1999].
One of the following must be true:
1. It must conform to the extended BNF given above for Selector, with the following modification (Path in Field XPath expressions):
```
Path ::= ('.//')? ( Step '/' )* ( Step | '@' NameTest )
```
  This production differs from the one above in allowing the final step to match an attribute node.
2. It must be an XPath expression involving the child and/or attribute axes whose abbreviated form is as given above.

For readability, whitespace may be used in field XPath expressions even though not explicitly allowed by the grammar: whitespace may be freely added within patterns before or after any token.

When tokenizing, the longest possible token is always returned.
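
Because the field grammar is regular, it can also be checked with a single regular expression assembled from the productions. The Python sketch below does this, allowing optional whitespace around each token. It is an illustration only. As in any such sketch, the `NCNAME` pattern is an ASCII-only simplification of the real NCName production, and the names used are assumptions made for the example.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"    # ASCII-only simplification of NCName (assumption)
# NameTest ::= QName | '*' | NCName ':' '*'
NAMETEST = rf"(?:{NCNAME}:\*|(?:{NCNAME}:)?{NCNAME}|\*)"
# Step ::= '.' | NameTest
STEP = rf"(?:\.|{NAMETEST})"
# Path ::= ('.//')? ( Step '/' )* ( Step | '@' NameTest )
FIELD_PATH = rf"(?:\.\s*//\s*)?(?:{STEP}\s*/\s*)*(?:{STEP}|@\s*{NAMETEST})"
# Selector ::= Path ( '|' Path )*, using the field version of Path
FIELD = re.compile(rf"\s*{FIELD_PATH}\s*(?:\|\s*{FIELD_PATH}\s*)*")


def is_field(text):
    """True if text conforms to the field variant of the grammar."""
    return FIELD.fullmatch(text) is not None
```

The only difference from the selector case is the last alternative of `FIELD_PATH`, which lets the final step, and only the final step, select an attribute. So `is_field("a/@id")` holds and `is_field("@id/a")` does not.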

Schema Information Set Contribution: Identity-constraint Table

[Definition:] An eligible identity-constraint of an element information item is one such that clause 4.1 or clause 4.2 of Identity-constraint Satisfied is satisfied with respect to that item and that constraint, or such that any of the element information item [children] of that item have an {identity-constraint table} property whose value has an entry for that constraint.

[Definition:] A node table is a set of pairs each consisting of a key-sequence and an element node.

Whenever an element information item has one or more eligible identity-constraints, in the post-schema-validation infoset that element information item has a property as follows:

PSVI Contributions for element information items

[identity-constraint table]

one Identity-constraint Binding information item for each eligible identity-constraint, with properties as follows:

PSVI Contributions for Identity-constraint Binding information items
[definition]	The eligible identity-constraint.
[node table]	A node table with one entry for every key-sequence (call it k) and node (call it n) such that one of the following must be true:
1. There is an entry in one of the node tables associated with the {definition} in an Identity-constraint Binding information item in at least one of the {identity-constraint table}s of the element information item [children] of the element information item whose key-sequence is k and whose node is n;
2. n appears with key-sequence k in the qualified node set for the {definition}.
provided no two entries have the same key-sequence but distinct nodes. Potential conflicts are resolved by not including any conflicting entries which would have owed their inclusion to clause 1 above. Note that if all the conflicting entries arose under clause 1 above, this means no entry at all will appear for the offending key-sequence.

Note: The complexity of the above arises from the fact that keyref identity-constraints may be defined on domains distinct from the embedded domain of the identity-constraint they reference, or the domains may be the same but self-embedding at some depth. In either case the node table for the referenced identity-constraint needs to propagate upwards, with conflict resolution.

The Identity-constraint Binding information item, unlike others in this specification, is essentially an internal bookkeeping mechanism. It is introduced to support the definition of Identity-constraint Satisfied above. Accordingly, conformant processors may, but are not required to, expose them via {identity-constraint table} properties in the post-schema-validation infoset. In other words, the above constraints may be read as saying validation of identity-constraints proceeds as if such infoset items existed.
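
The upward propagation of node tables, with its conflict resolution, can be sketched in Python as follows. This is an illustration, not the DCTG formulation. Node tables are encoded as sets of `(key_sequence, node)` pairs and the name `merge_node_tables` is hypothetical. The sketch also assumes that an entry present both locally and in a child's table counts as owing its inclusion to the qualified node set (clause 2), which the text does not state explicitly.

```python
def merge_node_tables(local_pairs, child_tables):
    """Compute the [node table] of an element for one identity-constraint.

    local_pairs:  (key_sequence, node) pairs from the element's own
                  qualified node set (clause 2).
    child_tables: the node tables of the element's children for the same
                  definition (clause 1), each a set of such pairs.
    """
    local = set(local_pairs)
    inherited = set().union(*child_tables) - local

    # Collect, for every key-sequence, all the nodes competing for it.
    nodes_by_key = {}
    for key, node in local | inherited:
        nodes_by_key.setdefault(key, set()).add(node)

    table = set(local)                       # clause 2 entries are always kept
    for key, node in inherited:
        if len(nodes_by_key[key]) == 1:      # keep clause 1 entries only if unchallenged
            table.add((key, node))
    return table
```

With this encoding, a key-sequence claimed by two different nodes in two children disappears from the parent's table, while a key-sequence claimed both locally and by a child keeps only the local entry.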

10.3. Notations

Schema Component Constraint: Notation Declaration Correct

The values of the properties of a notation declaration must be as described in the property tableau in The Notation Declaration Schema Component, modulo the impact of Missing Sub-components.

Schema Information Set Contribution: Validated with Notation

Whenever an attribute information item is valid with respect to a NOTATION, in the post-schema-validation infoset its parent element information item either has a property as follows:

PSVI Contributions for element information items
[notation]	An item isomorphic to the notation declaration whose {name} and {target namespace} match the local name and namespace name (as defined in QName Interpretation) of the attribute item's actual value

or has a pair of properties as follows:

PSVI Contributions for element information items
[notation system]	The value of the {system identifier} of that notation declaration.
[notation public]	The value of the {public identifier} of that notation declaration.

Note: For compatibility, only one such attribute should appear on any given element. If more than one such attribute does appear, which one supplies the infoset property or properties above is not defined.

10.4. Annotations

Schema Component Constraint: Annotation Correct

The values of the properties of an annotation must be as described in the property tableau in The Annotation Schema Component, modulo the impact of Missing Sub-components.

10.5. Top-level schema component

Schema Component Constraint: Schema Properties Correct

All of the following must be true:

The values of the properties of a schema must be as described in the property tableau in The Schema Itself, modulo the impact of Missing Sub-components;
Each of the {type definitions}, {element declarations}, {attribute group definitions}, {model group definitions} and {notation declarations} must not contain two or more schema components with the same name and target namespace.
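
The second clause amounts to a duplicate check within each of the named sets. A minimal Python sketch, assuming a hypothetical encoding of a schema as a dict from property name to a list of `(name, target_namespace)` pairs (with `None` for an absent target namespace):

```python
from collections import Counter


def duplicate_components(schema):
    """Return, per property, the (name, target namespace) pairs occurring more than once."""
    problems = {}
    for prop, components in schema.items():
        counts = Counter(components)
        duplicates = [pair for pair, n in counts.items() if n > 1]
        if duplicates:
            problems[prop] = duplicates
    return problems
```

The check is made separately for each property, so a type definition and an element declaration may share a name and target namespace without violating the constraint.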

10.6. Groups

It's not clear that it makes sense to have named attribute or model groups in a DCTG representation of a schema. But for completeness, the constraints on such groups are listed here.

Schema Component Constraint: Model Group Definition Properties Correct

The values of the properties of a model group definition must be as described in the property tableau in The Model Group Definition Schema Component, modulo the impact of Missing Sub-components.

11. Schema information set contributions (SICs) not yet placed

Schema Information Set Contribution: Schema Information

Schema components provide a wealth of information about the basis of assessment, which may well be of relevance to subsequent processing. Reflecting component structure into a form suitable for inclusion in the post-schema-validation infoset is the way this specification provides for making this information available.

Accordingly, [Definition:] by an item isomorphic to a component is meant an information item whose type is equivalent to the component's, with one property per property of the component, with the same name, and value either the same atomic value, or an information item corresponding in the same way to its component value, recursively, as necessary.

Processors must add a property in the post-schema-validation infoset to the element information item at which assessment began, as follows:

[schema information]

A set of namespace schema information information items, one for each namespace name which appears as the target namespace of any schema component in the schema used for that assessment, and one for absent if any schema component in the schema had no target namespace. Each namespace schema information information item has the following properties and values:

[schema namespace] A namespace name or absent.

[schema components] A (possibly empty) set of schema component information items, each one an item isomorphic to a component whose target namespace is the sibling {schema namespace} property above, drawn from the schema used for assessment.

[schema documents]

A (possibly empty) set of schema document information items, with properties and values as follows, for each schema document which contributed components to the schema, and whose targetNamespace matches the sibling {schema namespace} property above (or whose targetNamespace was absent but that contributed components to that namespace by being <include>d by a schema document with that targetNamespace as per Assembling a schema for a single target namespace from multiple schema definition documents):

PSVI Contributions for schema document information items
[document location]	Either a URI reference, if available, otherwise absent
[document]	A document information item, if available, otherwise absent.

The {schema components} property is provided for processors which wish to provide a single access point to the components of the schema which was used during assessment. Lightweight processors are free to leave it empty, but if it is provided, it must contain at a minimum all the top-level (i.e. named) components which actually figured in the assessment, either directly or (because an anonymous component which figured is contained within) indirectly.
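
The grouping of components by target namespace that underlies [schema information] can be pictured with a short Python sketch. It is an illustration only. Components are reduced to `(target_namespace, component)` pairs, `None` stands for absent, and the dict layout and function name are assumptions made for the example. The {schema documents} property is left empty, as a lightweight processor without document information might do.

```python
def schema_information(components):
    """Group components into one namespace schema information item per namespace."""
    by_namespace = {}
    for namespace, component in components:
        by_namespace.setdefault(namespace, []).append(component)
    return [
        {
            "schema namespace": namespace,     # a namespace name, or None for absent
            "schema components": members,
            "schema documents": [],            # not modelled in this sketch
        }
        for namespace, members in by_namespace.items()
    ]
```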

12. Terms and primitive concepts to be defined

stipulation of declaration or definition (used only of top-level invocation?)

validator (cf. processor)

processor (cf. validator)

user

invocation

A. Works cited and further reading

Abramson, Harvey. 1984. “Definite Clause Translation Grammars”. Proceedings of the 1984 International Symposium on Logic Programming, Atlantic City, New Jersey, February 6-9, 1984, pp. 233-240. (IEEE-CS 1984, ISBN 0-8186-0522-7)

Abramson, Harvey, and Veronica Dahl. 1989. Logic Grammars. Symbolic Computation AI Series. Springer-Verlag, 1989.

Abramson, Harvey, and Veronica Dahl, rev. Jocelyn Paine. 1990. DCTG: Prolog definite clause translation grammar translator. (Prolog code for translating from DCTG notation to standard Prolog. Note says syntax extended slightly by Jocelyn Paine to accept && between specifications of grammatical attributes, to minimize need for parentheses. Available from numerous AI/NLP software repositories, including http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/prolog/code/syntax/dctg/0.html, http://www.ims.uni-stuttgart.de/ftp/pub/languages/prolog/libraries/imperial_college/dctg.tar.gz, and http://www.ifs.org.uk/~popx/prolog/dctg/.)

Alblas, Henk. 1991. “Introduction to attribute grammars”. Attribute grammars, applications and systems: International Summer School SAGA, Prague, Czechoslovakia, June 4-13, 1991, Proceedings, pp. 1-15. Berlin: Springer, 1991. Lecture Notes in Computer Science, 545.

Bratko, Ivan. 1990. Prolog programming for artificial intelligence. Second edition. Wokingham: Addison-Wesley. xxi, 597 pp.

Brown, Allen L., Jr., and Howard A. Blair. 1990. “A logic grammar foundation for document representation and layout”. In EP90: Proceedings of the International Conference on Electronic Publishing, Document Manipulation and Typography, ed. Richard Furuta. Cambridge: Cambridge University Press, 1990, pp. 47-64.

Brown, Allen L., Jr., Toshiro Wakayama, and Howard A. Blair. 1992. “A reconstruction of context-dependent document processing in SGML”. In EP92: Proceedings of Electronic Publishing, 1992, ed. C. Vanoirbeek and G. Coray. Cambridge: Cambridge University Press, 1992. Pages 1-25.

Brüggemann-Klein, Anne. 1993. Formal models in document processing. Habilitationsschrift, Freiburg i.Br., 1993. 110 pp. Available at ftp://ftp.informatik.uni-freiburg.de/documents/papers/brueggem/habil.ps (Cover pages archival copy also at http://www.oasis-open.org/cover/bruggDissert-ps.gz).

Note: Brüggemann-Klein provides a formal definition of 1-unambiguity, which corresponds to the notion of unambiguity in ISO 8879 and determinism in XML 1.0. Her definition of 1-unambiguity can be used to check XML Schema's Unique Particle Attribution constraint by changing every minOccurs and maxOccurs value greater than 1 to 1, if the two are equal, and otherwise changing minOccurs to 1 and any maxOccurs greater than 1 to unbounded.
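
The rewriting of occurrence bounds described in this note can be sketched in Python as follows. The function name is hypothetical and `None` stands for unbounded. The sketch assumes that a minOccurs of 0 or 1 is left unchanged, which the note does not spell out.

```python
def normalize_occurs(min_occurs, max_occurs):
    """Reduce occurrence bounds to the forms 0, 1, and unbounded (None)."""
    if min_occurs == max_occurs:
        # Equal bounds greater than 1 both become 1.
        return (1, 1) if min_occurs > 1 else (min_occurs, max_occurs)
    new_min = 1 if min_occurs > 1 else min_occurs
    # A maxOccurs greater than 1 (or already unbounded) becomes unbounded.
    new_max = None if max_occurs is None or max_occurs > 1 else max_occurs
    return (new_min, new_max)
```

For instance, bounds of 3 and 3 become 1 and 1, while bounds of 2 and 5 become 1 and unbounded.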

Clocksin, W. F., and C. S. Mellish. 1984. Programming in Prolog. Second edition. Berlin: Springer, 1984.

Gal, Annie, Guy Lapalme, Patrick Saint-Dizier, and Harold Somers. 1991. Prolog for natural language processing. Chichester: Wiley, 1991. xiii, 306 pp.

Gazdar, Gerald, and Chris Mellish. 1989. Natural language processing in PROLOG: An introduction to computational linguistics. Wokingham: Addison-Wesley, 1989. xv, 504 pp.

Grune, Dick, and Ceriel J. H. Jacobs. 1990. Parsing techniques: a practical guide. New York, London: Ellis Horwood, 1990. Postscript of the book is available from the first author's Web site at http://www.cs.vu.nl/~dick/PTAPG.html

Knuth, D. E. 1968. “Semantics of context-free languages”. Mathematical Systems Theory 2: 127-145.

König, Esther, and Roland Seiffert. 1989. Grundkurs PROLOG für Linguisten. Tübingen: Francke, 1989. [= Uni-Taschenbücher 1525]

Sperberg-McQueen, C. M. 2003a. “Notes on logic grammars and XML Schema”. Working paper prepared for the W3C XML Schema Working Group. [Incomplete; current draft is at dctgnotes.html. Introduction to logic grammar notation, illustrative translation of purchase-order schema into logic grammar form.]

Sperberg-McQueen, C. M. 2003b. “An XML Schema validator in logic-grammar form”. Working paper prepared for the W3C XML Schema Working Group. [Incomplete; incomplete outline is at xsvlgf.html. Will provide a DCTG representation of the schema for schemas, with actions to perform the XML-to-component transformation and code to check the constraints on schemas.]

Stepney, Susan. 1993. High-integrity compilation. Prentice-Hall. Available from http://www-users.cs.york.ac.uk/~susan/bib/ss/hic/index.htm. Chapter 3 (Using Prolog) provides a terse introduction to DCTG notation and use.

[W3C 1999] W3C. XML Path Language, ed. James Clark and Steve DeRose. W3C Recommendation 16 November 1999. [Cambridge, Sophia-Antipolis, and Tokyo]: World Wide Web Consortium. See http://www.w3.org/TR/1999/REC-xpath-19991116

[W3C 2000] W3C. Extensible Markup Language (XML) 1.0, Second Edition, ed. Tim Bray et al. W3C Recommendation 6 October 2000. [Cambridge, Sophia-Antipolis, and Tokyo]: World Wide Web Consortium. http://www.w3.org/TR/2000/REC-xml-20001006

[W3C 2001a] “XML Schema Part 0: Primer”, ed. David Fallside. W3C Recommendation, 2 May 2001. [Cambridge, Sophia-Antipolis, Tokyo: W3C] http://www.w3.org/TR/xmlschema-0/.

[W3C 2001b] W3C. 2001. XML Schema Part 1: Structures, ed. Henry S. Thompson, David Beech, Murray Maloney, and Noah Mendelsohn. W3C Recommendation 2 May 2001. [Cambridge, Sophia-Antipolis, and Tokyo]: World Wide Web Consortium. http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/

[W3C 2001c] W3C. 2001. XML Schema Part 2: Datatypes, ed. Biron, Paul V. and Ashok Malhotra. W3C Recommendation 2 May 2001. [Cambridge, Sophia-Antipolis, and Tokyo]: World Wide Web Consortium. http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/

Wielemaker, Jan. 2001. “SWI-Prolog SGML/XML parser: Version 1.0.14, March 2001”. http://www.swi-prolog.org/packages/sgml2pl.html

For more information on the representation of SGML and XML documents as Prolog structures, see the SWI add-ons documentation [Wielemaker 2001]. Other representations are possible; I have used this one because it's convenient and because representing sibling sets in a list makes it easier to use DCGs and DCTGs.

Definite-clause grammars (DCGs) are introduced in almost any good Prolog textbook: e.g. [Clocksin/Mellish 1984], [Bratko 1990]. They are discussed at somewhat greater length in treatments of Prolog for natural-language processing, including [König/Seiffert 1989], [Gazdar/Mellish 1989], and [Gal et al. 1991]. Most extended discussions show how to use additional arguments to record syntactic structure or handle the semantics of the material.

Definite-clause translation grammars were introduced as a way of making it easier to handle semantics; they provide explicit names for attributes (in the sense of attribute grammars [Knuth 1968]).

B. To do

I won't be working seriously on this paper until after I have finished Notes on logic grammars and XML Schema. Then:

Walk through a simple validation exercise (e.g. the one Henry and I did on Matthew's example document, or a validation of the tutorial's purchase order), and put each validation rule into a section, in sequence. Filter out the clauses and validation rules that can be associated with wildcards, etc.; they will be added later.
As each component type is encountered, move the appropriate constraints on schemas to the relevant sections.
Move schema-information-set contributions to appropriate locations.
Implement in Prolog.
Add to later layers the features / constructs omitted in earlier layers.

C. Toward a useful layering

The layering proposed above is a start; I'd like to confirm it by walking through a validation episode or two. I'll start with the purchase order from the tutorial. We visit the following validation rules; under each, I note some features I think should probably be omitted from layer 1, and introduced as elaborations later, in some sequence to be determined.

Sec. 5.2, rule 3
- ability of user or application to stipulate a type definition at startup
- ability of user or application to stipulate an element declaration at startup
- ability of user or application to stipulate a starting point other than the root of the document
sva_element_334
- use of xsi:type (and with it clause 1.2)
qname_resolution_instance_3f4
elementlocallyvalid_element_334
- absent declarations
- abstract element declarations
- xsi:nil
- default values, fixed values (?)
element_locally_valid_type_334
- absent type definitions
- abstract types
string valid (and from there to simple-type checking)
element_locally_valid_complextype
- attribute wildcards
- wild IDs
element sequence locally valid (particle)
wildcards

...
element sequence valid
attribute locally valid (use) 354
schema-validity assessment (attribute) 324
assessment outcome (attribute) 325