Introduction to

W3C XML Schema 1.0

C. M. Sperberg-McQueen

18 August 2003

http://www.w3.org/People/cmsmcq/2003/xstut.sydney.html

I. Welcome and overview
I.1. Workshop goals
I.2. Non-goals
I.3. Rules
I.4. Acknowledgements
I.5. Workshop overview
II. Introduction
II.1. What's a schema?
II.2. What's XML Schema?
II.3. Other schema languages
II.4. Why use schemas?
II.5. What is well-formedness?
II.6. The duck
II.7. The duck
II.8. The duck
II.9. What the computer sees
II.10. Find the errors
II.11. Now imagine ...
II.12. The Iron Law
II.13. Why well-formedness isn't enough
II.14. Document grammars
II.15. Conceptual layers
II.16. Conceptual layers (2)
II.17. Uses of document grammars
II.18. DTDs as a schema language
II.19. DTDs: special notation
III. Basic ideas of XML Schema
III.1. XML Schema
III.2. The XML Schema 1.0 specification
III.3. Implementations
III.4. Use cases
III.5. Data-intensive applications
III.6. Document-oriented applications
III.7. Schema-validity and schema-validity assessment
III.8. Validation
III.9. Some fundamental ideas
III.10. Schema components
IV. A simple example
IV.1. Making a schema document
IV.2. The schematic duck (1)
IV.3. Linking document and schema
IV.4. Running the validator
IV.5. Post-schema-validation infoset
IV.6. Validation outcomes
IV.7. The schematic duck (2a)
IV.8. The schematic duck (2b)
IV.9. The schematic duck (3)
V. A more complex example: the purchase order
V.1. The purchase order schema
V.2. Declaring elements
V.3. Declaring elements
V.4. Declaring complex types
V.5. Character data
V.6. Exploring the purchase order schema
VI. Simple types and facets
VI.1. Simple datatypes
VI.2. Simple type hierarchy
VI.3. Built-in primitive datatypes
VI.4. Built-in derived datatypes
VI.5. Using a built-in type
VI.6. Using xsi:type
VI.7. What is an atomic?
VI.8. What is an atomic? (take 2)
VI.9. Derivation of simple types
VI.10. Derivation of simple types (2)
VI.11. Regular expressions for the pattern facet
VI.12. Regular expressions (2)
VI.13. Enumerations
VI.14. Non-atomic simple types
VI.15. Creating a list of integers
VI.16. Creating a union
VI.17. Examples
VII. Complex types
VII.1. Content models
VII.2. Content model: example
VII.3. Content model: example
VII.4. Content model: example
VII.5. Attributes
VII.6. Global and local declarations
VII.7. Wildcards
VII.8. A schema for xsi:type usage
VII.9. Examples
VIII. Post-schema-validation infoset
VIII.1. Post-schema-validation infoset (PSVI)
VIII.2. Infoset contributions
VIII.3. Validation outcomes
VIII.4. Reflecting / serializing the PSVI
VIII.5. The ‘alternating form’ PSVI
IX. Some questions of usage
IX.1. ~~Inheritance~~ Type derivation
IX.2. Inheritance in document systems
IX.3. Schemas and namespaces
IX.4. Schema layers
IX.5. Modularization
IX.6. Modularizing vocabularies: tasks
IX.7. Modularizing vocabularies: techniques
IX.8. The tag/type distinction and non-local effects
IX.9. Non-local effects in XML Schema
IX.10. Determinism
IX.11. Practical issues
X. Review and conclusion
X.1. Why schema languages?
X.2. Fundamental ideas of XML Schema 1.0
X.3. Simple types
X.4. Complex types
X.5. Deployment

I. Welcome and overview

goals
non-goals
rules
overview

I.1. Workshop goals

At the end of the morning, you should

know what XML Schema is
understand why schema languages in general are needed
have a general grasp of XML Schema's basic concepts and how to impose various kinds of constraints using XML Schema
know how XML syntax is used in XML Schema documents
understand how schemas are put together from schema documents and how multiple-namespace documents can be validated

I.2. Non-goals

At the end of the morning, you will not:

know how to perform the document and data analysis necessary to define XML vocabularies well
have hands-on experience writing a DTD or schema
be able to impress computer scientists with a profound understanding of the relative merits of single and multiple inheritance for handling limited context sensitivity in a basically context-free environment

— unless, of course, you already do.

I.3. Rules

Just to let you know what I expect:

If you cannot hear me, please interrupt.
If something is unclear, please interrupt and ask.
If a rabbit hole is tempting, please hold back.
Break at 10:30. Sharp.*

* Asterisks in the slides mean that annotation or interpretation is needed; if I don't provide it, ask.

I.4. Acknowledgements

I'm indebted (for general discussions and for specific material included here) to

Elaine Brennan
Robin Cover
David Fallside
Michael Hahn
Dave Hollander
Eve L. Maler
Murray Maloney
Jeni Tennison
Henry S. Thompson

as well as to my colleagues at W3C and in the W3C XML Schema Working Group.

I.5. Workshop overview

Overview and introduction
Basic ideas of XML Schema
A simple example (the duck)
A more complex example (purchase order)

[break]

Simple types and their facets
Complex types and their derivation hierarchy
The post-schema-validation infoset
Some usage questions (modularization, designing for reuse and extension, multi-namespace documents)
Review and conclusion

II. Introduction

What's XML Schema?
Why use schemas? What's wrong with well-formed XML?
DTDs as a schema language
Core ideas of XML Schema

II.1. What's a schema?

For our purposes, a schema is

a formal expression

of the structure

of an XML document

and of constraints on the text therein

There are other meanings in DBMS, programming languages, mathematics, and elsewhere. No* relation.

In ISO 8879, the term document type definition has this* meaning.[1]

II.2. What's XML Schema?

XML Schema 1.0 is

A W3C Recommendation

issued in May 2001,

developed by the W3C XML Schema Working Group,

which defines

a system of simple and complex types
several types of schema components
an XML transfer syntax for schema documents
rules for schema-validity assessment of XML infosets
contributions to a post-schema-validation infoset

II.3. Other schema languages

There are other schema languages for XML document types:

Relax NG
Schematron
XML Data Reduced (XDR)
Relax
Trex
Document structure definition (DSD)
Schema for object-oriented XML (SOX)
various DTD extensions

II.4. Why use schemas?

So what's wrong with well-formed XML?

What is well-formedness?
The duck
A malformed duck
What the computer sees

II.5. What is well-formedness?

A document is well-formed if it obeys all the rules of XML itself:

Start-tags match end-tags.
Elements nest properly.
Attributes are quoted.
There is a single root element.
All entities used are declared.
...

Any additional constraints are imposed by the application, not by XML.

II.6. The duck

by Ogden Nash

Behold the duck.
It does not cluck.
A cluck it lacks.
It quacks.

It is especially fond
Of a puddle or pond.
When it dines or sups
It bottoms-ups.

II.7. The duck

Let us consider a straightforward XML encoding:

<poem>
<title>The duck</title>
<author>Ogden Nash</author>
<stanza>
<line>Behold the duck.</line>
<line>It does not cluck.</line>
<line>A cluck it lacks.</line>
<line>It quacks.</line>
</stanza>
<stanza>
<line>It is especially fond</line>
<line>Of a puddle or pond.</line>
<line>When it dines or sups</line>
<line>It bottoms-ups.</line>
</stanza>
</poem>

II.8. The duck

Even if the data are meaningless, some errors are obvious:

<poem>
<author>Btqra Anfu</author>
<stanza>
<line>Orubyq gur qhpx.</line>
<line>Vg qbrf abg pyhpx.</line>
<line>N pyhpx vg ynpxf.</line>
<line>Vg dhnpxf.</line>
</stanza>
<title>Gur qhpx</title>
<stanza>
<line>Vg vf rfcrpvnyyl sbaq</line>
<line>Bs n chqqyr be cbaq.</line>
<line>Jura vg qvarf be fhcf</line>
<line>Vg obggbzf-hcf.</line>
</stanza>
</poem>

II.9. What the computer sees

What the computer sees, however, is less clear:

<cbrz>
<gvgyr>Gur qhpx</gvgyr>
<nhgube>ol Btqra Anfu</nhgube>
<fgnamn>
<yvar>Orubyq gur qhpx.</yvar>
<yvar>Vg qbrf abg pyhpx.</yvar>
<yvar>N pyhpx vg ynpxf.</yvar>
<yvar>Vg dhnpxf.</yvar>
</fgnamn>
<fgnamn>
<yvar>Vg vf rfcrpvnyyl sbaq</yvar>
<yvar>Bs n chqqyr be cbaq.</yvar>
<yvar>Jura vg qvarf be fhcf</yvar>
<yvar>Vg obggbzf-hcf.</yvar>
</fgnamn>
</cbrz>

II.10. Find the errors

This document is well-formed, but has several typos.

<cbrz>
<gvgyr>Gur qhpx</gvgyr>
<nhgube>ol Btqra Anfu</nhgube>
<fgnamn>
<yvar>Orubyq gur qhpx.</yvar>
<yvar>Vg qbrf abg pyhpx.</yvar>
<yvar>N pyhpx vg ynpxf.</yvar>
<yvar>Vg dhnpxf.</yvar>
</fgnamn>
<fgnanm>
<yyar>Vg vf rfcrpvnyyl sbaq</yyar>
<yyar>Bs n chqqyr be cbaq.</yyar>
<yyar>Jura vg qvarf be fhcf</yyar>
<yvar>Vg obggbzf-hcf.</yvar>
</fgnanm>
</cbrz>

II.11. Now imagine ...

that it's production data:

The document is well-formed but has typos.
It's not a poem but a purchase-order.
Owing to the typos, your order for ten laser printers has become an order for ten gross of laser printers.
(And you just learned your supplier isn't good at correcting errors in their computer systems.)

II.12. The Iron Law

Garbage* in, garbage out.

Three questions:

Can errors exist? or is every string of bits a possible message?
Can errors be found
- automatically?
- by clerical inspection?
- through inspections by highly trained experts?
Is the cost of undetected errors
- trivial?
- small?
- large?
- catastrophic?

II.13. Why well-formedness isn't enough

Well-formed documents

can have errors

with serious consequences

some of which* can be caught mechanically.

II.14. Document grammars

Origin: pragmatic, not theoretical; partial post hoc alignment with formal language theory.

Formal specification of validity conditions → automated validation.

Er, ah, partial formal specification of validity conditions → partial automated validation.

Distinction between

document type definition (DTD) and
“set of effective formal declarations”

→ division of labor.

II.15. Conceptual layers

Three layers of rules governing data.

II.16. Conceptual layers (2)

Distinguish logical and physical structure.

II.17. Uses of document grammars

Document grammars may have several uses:

in the struggle against dirty data (sanity checking)
as documentation of the content of data flows
as documentation of a contract between data provider and data consumer
as specification of client/server protocols
validation (to enforce the contract or check the implementation)
automation for document authoring
reasoning about data and software (query processing, completeness checking for software)

II.18. DTDs as a schema language

SGML/XML DTDs resemble Backus-Naur Form grammars, but:

They describe bracketed languages* ...
... so ‘non-terminals’ are visible*.
SGML allows inclusion and exclusion exceptions (Rizzi: NP-complete parsing problem for non-bracketed L).
They are not purely grammatical (notations, entities).
Determinism rule.

II.19. DTDs: special notation

compact, clear distinction of levels
ad hoc, adds complexity (1/3 of the rules in the SGML grammar)
because the notation is different, DTDs require
- special parsers
- special editors
- special processors
learning curve? (not really relevant, even if we agreed on which is harder)
no datatypes*
do not play well with namespaces
no formal role for documentation
no inheritance (no kind-of information, only part-of)
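
For comparison, a DTD for the duck poem might look like the sketch below. It assumes the element names of the earlier encoding and puts the declarations in an internal subset so that the document is self-contained. The declarations use their own notation rather than element syntax, and #PCDATA is all that can be said about character data in content.

```xml
<?xml version="1.0"?>
<!DOCTYPE poem [
  <!-- A poem is a title, an author, and one or more stanzas, in that order. -->
  <!ELEMENT poem   (title, author, stanza+)>
  <!ELEMENT stanza (line+)>
  <!-- Character data can only be declared as #PCDATA. -->
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT line   (#PCDATA)>
]>
<poem>
  <title>The duck</title>
  <author>Ogden Nash</author>
  <stanza>
    <line>Behold the duck.</line>
    <line>It does not cluck.</line>
  </stanza>
</poem>
```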

III. Basic ideas of XML Schema

XML Schema: a first approximation
the spec and some implementations
applications
validation / schema-validity assessment
fundamental ideas
schema components

III.1. XML Schema

DTD++, DTD--
instance syntax
supporting programming-language and database-oriented types (inheritance)
schema combination rules
better hooks for documentation and semantics

III.2. The XML Schema 1.0 specification

Three parts:

Part 0: Primer (introduction to XML Schema)
http://www.w3.org/TR/xmlschema-0/
Part 1: Structures (structural constraints, schema components and XML representation, validation rules)
http://www.w3.org/TR/xmlschema-1/
Part 2: Datatypes (atomic datatypes, lists, unions)
http://www.w3.org/TR/xmlschema-2/

III.3. Implementations

Among the most widely used validators:

MSXML4 (Microsoft)
Xerces-J, Xerces-C++ (Apache)
XSV (Henry Thompson, Richard Tobin)
Schema Quality Checker (IBM)
Multi Schema Validator (Sun)
Topologi Schematron Validator (uses MSXML for XML Schema validation)

Editors include:

XML Spy (Altova)
XMetaL (Corel)
...

III.4. Use cases

quality assurance
database exchange
translation to/from OO systems
reuse of schemas and schema fragments
smooth evolution of schemas, applications, data, software

III.5. Data-intensive applications

electronic commerce
Web Services
database exchange
inter-process communication
metadata processing
process modeling

III.6. Document-oriented applications

publishing
Web page design
form controls
online catalogs
multimedia presentations
electronic books
maps, directories, ...

III.7. Schema-validity and schema-validity assessment

XML Schema 1.0 defines

schema-validity assessment:
input infoset × schema → output infoset (PSVI)
schema (≡ set of abstract schema components)
schema document (XML representation)
mapping from XML transfer syntax to schema components

III.8. Validation

In schema-validity assessment, we

identify an XML element information item* to validate
identify a schema to validate against
assess the schema-validity of the element and its descendants
add validity and type information to the infoset

III.9. Some fundamental ideas

The syntax is not the schema.
Namespaces are fundamental (but not the same as schemas).
Schema-validity assessment is an infoset-to-infoset mapping.
We separate tags and types.
Types can be simple or complex.
Elements and types can be global (top-level) or local.
Schemas can be combined.
Declarations can use wildcards, type derivation (extension, restriction), substitution groups, application-specific annotations.

III.10. Schema components

XML Schema 1.0 defines fourteen types of component. The most important:

element declarations
attribute declarations
complex type definitions
simple type definitions
the schema itself
annotations

Entities conspicuous by their absence.

IV. A simple example

a schema for the duck
validating the data with
- XSV
- Xerces
- MSXML (using Topologi interface)

IV.1. Making a schema document

the schema element
declaring elements
occurrence indicators
character data
linking the schema and the document

[Shift back and forth to emacs for construction of a simple schema for “The Duck”]

IV.2. The schematic duck (1)

First, let's just declare all the element types:

<xsd:schema 
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="poem"></xsd:element>
 <xsd:element name="title"></xsd:element>
 <!--* ... etc. ... *-->
</xsd:schema>

Validate:

with complete schema
omitting some declarations
with errors in document (misspellings, order, ...)

IV.3. Linking document and schema

Two ways to link document and schema:

inline (clunky, problematic, but easy to understand and implement)
out-of-band (not standardized)

An inline example:

<poem 
 xmlns:xsi
 ="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="tds03.xsd"
 >
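
For a document whose elements are in a namespace, the corresponding inline hint is xsi:schemaLocation. Its value is a list of pairs: a namespace name followed by the location of a schema document for that namespace. A sketch, assuming the purchase-order namespace used later in this workshop and a hypothetical file name po.xsd:

```xml
<po:purchaseOrder
 xmlns:po="http://www.example.com/PO1"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.example.com/PO1 po.xsd">
 <!--* content of the purchase order goes here *-->
</po:purchaseOrder>
```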

IV.4. Running the validator

XSV: xsv -t -w -o xsvout.xml -s "file:///c:/Program%20Files/XSV/xsv.xsl" file.xml
- -t = show timings
- -w = include warnings
- -o file = write errors to file
- -s file = include stylesheet file in error output
Xerces: java sax.Counter -v -s -f file.xml
- -v = validate
- -s = use XML Schema
- -f = schema full checking (check schema for correctness)
Topologi (menu interface)

IV.5. Post-schema-validation infoset

XSV and Xerces-J can also dump the PSVI:

XSV: xsv -t -w -o xsvout.xml -s "file:///c:/Program%20Files/XSV/xsv.xsl" -r alt file.xml > psvi.out.xml
- -t, -w, -o, -s as before
- -r alt = write PSVI in alternating normal form
- -r ind = write PSVI in individual normal form
Xerces: java sax.Writer -v -s -f -p xni.parser.PSVIParser file.xml > psvi.out.xerces.xml [2]
- -v, -s, -f as before
- -p parser = use specified parser in lieu of default

IV.6. Validation outcomes

N.B. there are six outcomes, not two:

Validation attempted	Validity: valid	Validity: invalid	Validity: notKnown
full	OK. Entire subtree valid.	OK. Entire subtree assessed; error here or at some descendant.	Not possible (contradictory)
partial	OK. This node assessed and valid. Some descendant skipped.	OK. Problem at this node, or in a child. Also, some descendant skipped.	OK. This node not assessed (but a descendant was).
none (subtree skipped)	Not possible (contradictory)	Not possible (contradictory)	OK. This subtree was skipped.

IV.7. The schematic duck (2a)

Next, let's declare the rules more correctly:

<xsd:schema 
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="poem">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element ref="title"/>
    <xsd:element ref="author"/>
    <xsd:element ref="stanza" maxOccurs="unbounded"/>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
 <!--* ... etc. ... *-->
</xsd:schema>

IV.8. The schematic duck (2b)

To handle textual data:

 <xsd:element name="title">
  <xsd:complexType mixed="true"/>
 </xsd:element>

 <xsd:element name="author" type="xsd:string"/>

Validate:

with complete schema
omitting some declarations
with errors in document (misspellings, order, ...)

IV.9. The schematic duck (3)

A better way for textual data:

  <xsd:complexType name="words" mixed="true"/>

which allows us to say simply:

 <xsd:element name="title" type="words"/>
 <xsd:element name="author" type="words"/>
 <xsd:element name="line" type="words"/>
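
Putting the pieces together, a complete schema document for the duck might look like the following sketch. The declaration of stanza (one or more lines) is an assumption, since the slides elide it; everything else is taken from the fragments above.

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <!--* text-only elements share one named type *-->
 <xsd:complexType name="words" mixed="true"/>

 <xsd:element name="poem">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element ref="title"/>
    <xsd:element ref="author"/>
    <xsd:element ref="stanza" maxOccurs="unbounded"/>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>

 <!--* assumed: a stanza is one or more lines *-->
 <xsd:element name="stanza">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element ref="line" maxOccurs="unbounded"/>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>

 <xsd:element name="title" type="words"/>
 <xsd:element name="author" type="words"/>
 <xsd:element name="line" type="words"/>
</xsd:schema>
```

Against this schema, the version of the poem with title after the first stanza is invalid, because the sequence in poem fixes the order of title, author, and stanza.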

V. A more complex example: the purchase order

Once more, slowly:

schema element
element declarations with named types
element declarations with anonymous types
handling natural-language data: string, mixed content

V.1. The purchase order schema

At the outer level is a schema element:

<xsd:schema 
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:po="http://www.example.com/PO1"
     targetNamespace="http://www.example.com/PO1"
>
 <!--* declarations and definitions go here *-->
</xsd:schema>

N.B. the schema does not identify a document-root element / start symbol.

V.2. Declaring elements

With named types:

 <xsd:element name="purchaseOrder" 
              type="po:PurchaseOrderType"/>
 <xsd:element name="comment"       
              type="xsd:string"/>

V.3. Declaring elements

With anonymous types:

 <xsd:element name="quantity">
  <xsd:simpleType>
   <xsd:restriction base="xsd:positiveInteger">
    <xsd:maxExclusive value="100"/>
   </xsd:restriction>
  </xsd:simpleType>
 </xsd:element>

V.4. Declaring complex types

 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="shipTo"    type="po:USAddress"/>
   <xsd:element name="billTo"    type="po:USAddress"/>
   <xsd:element ref="po:comment" minOccurs="0"/>
   <xsd:element name="items"  type="po:Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
 </xsd:complexType>

Note difference between element declaration and element reference.
Implicit occurrence information: min = max = 1.
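
The difference shows up in the instance. The schema element shown earlier sets no elementFormDefault, so locally declared elements (shipTo, billTo, items) are unqualified, while purchaseOrder and the referenced comment are global and belong to the target namespace. A sketch of a matching instance, assuming that po:USAddress has the children name, street, city, state, zip and that po:Items permits an empty items element, as in the Primer's version of this example:

```xml
<po:purchaseOrder
 xmlns:po="http://www.example.com/PO1"
 orderDate="2003-08-18">
 <!--* local elements: no namespace prefix *-->
 <shipTo>
  <name>Alice Smith</name>
  <street>123 Maple Street</street>
  <city>Mill Valley</city>
  <state>CA</state>
  <zip>90952</zip>
 </shipTo>
 <billTo>
  <name>Robert Smith</name>
  <street>8 Oak Avenue</street>
  <city>Old Town</city>
  <state>PA</state>
  <zip>95819</zip>
 </billTo>
 <!--* global element, brought in with ref: namespace-qualified *-->
 <po:comment>Hurry, my lawn is going wild</po:comment>
 <items/>
</po:purchaseOrder>
```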

V.5. Character data

 <xsd:element name="comment"       
              type="xsd:string"/>

or as mixed content:

 <xsd:element name="comment">
  <xsd:complexType mixed="true">
  </xsd:complexType>
 </xsd:element>

V.6. Exploring the purchase order schema

Validate:

correct purchase order
missing billTo
invalid product count
invalid product number

VI. Simple types and facets

classification of simple types
overview of built-in types
what is a simple type?
examples

VI.1. Simple datatypes

built-in
- primitive
- derived
user-defined (all derived)

VI.2. Simple type hierarchy

VI.3. Built-in primitive datatypes

string
boolean
decimal, float, double
dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth
duration
hexBinary, base64Binary
anyURI
QName
NOTATION

VI.4. Built-in derived datatypes

normalizedString, token, language
IDREFS, ENTITIES, NMTOKEN, NMTOKENS, Name, NCName, ID, IDREF, ENTITY
integer, nonPositiveInteger, negativeInteger, long, int, short, byte, nonNegativeInteger, unsignedLong, unsignedInt, unsignedShort, unsignedByte, positiveInteger

VI.5. Using a built-in type

We can declare built-ins:

<xsd:element name="USPrice"  type="xsd:decimal"/>
<xsd:attribute name="orderDate" type="xsd:date"/>

Or just use them dynamically:

<shipDate xsi:type="xsd:date">2003-08-18</shipDate>

VI.6. Using xsi:type

The special attribute xsi:type can be used to associate specific elements in an instance with types. Conditions:

elements only (why?)
if schema says type="B" and instance says xsi:type="D", then D must be derived* from B.

Some people think the use of xsi:type for simple types is the easiest way to edge into schema usage.

VI.7. What is an atomic?

Extensional view:

a set of values V
a set of lexical forms L
a mapping from L to V

VI.8. What is an atomic? (take 2)

Intensional view:

a base mapping L → V
a set of fundamental facets:
- equality (identity)
- order (partial, total, none)
- boundedness
- cardinality
- numeric
a set of constraining facets:
- length, minLength, maxLength
- pattern (constrains lexical space)
- enumeration
- whiteSpace
- maxInclusive, maxExclusive, minInclusive, minExclusive
- totalDigits, fractionDigits
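
Several constraining facets can be combined in a single restriction step. A minimal sketch, with hypothetical type names amount and shortCode:

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <!--* value-space facets: not negative, at most eight digits,
       at most two of them after the decimal point *-->
 <xsd:simpleType name="amount">
  <xsd:restriction base="xsd:decimal">
   <xsd:minInclusive value="0"/>
   <xsd:totalDigits value="8"/>
   <xsd:fractionDigits value="2"/>
  </xsd:restriction>
 </xsd:simpleType>

 <!--* length facets count characters for string-based types *-->
 <xsd:simpleType name="shortCode">
  <xsd:restriction base="xsd:token">
   <xsd:minLength value="2"/>
   <xsd:maxLength value="8"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:schema>
```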

VI.9. Derivation of simple types

Simple types can be derived by restricting a facet:

<xsd:simpleType>
  <xsd:restriction 
       base="xsd:positiveInteger">
    <xsd:maxExclusive value="100"/>
  </xsd:restriction>
</xsd:simpleType>

Most facets directly control the value space (and the lexical space indirectly).

VI.10. Derivation of simple types (2)

Some facets control the lexical space directly (and the value space indirectly):

<xsd:simpleType name="SKU">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>

VI.11. Regular expressions for the pattern facet

The regular expressions for the pattern facet are mostly conventional:

concatenation: ab
alternation: a | b
repetition: a*b+c?
character classes: [a-zA-Z], [^aeiou]
single-character escapes: \n, \r, etc.

...

VI.12. Regular expressions (2)

... with some extensions:

numeric exponents: a{1,5}
class subtraction: [a-zA-Z-[aeiou]]
Unicode-property classes: \p{Lu} (characters with property Lu, i.e. upper-case letters), \P{Lu} (negation: characters lacking property Lu)
Unicode-block classes: \p{IsBasicLatin} (characters in the Basic Latin block), \P{IsBasicLatin} (negation: characters outside that block)
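
A sketch of these extensions in use, with hypothetical type names. A pattern must match the whole lexical form, so no anchors such as ^ and $ are needed.

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <!--* one upper-case letter, then 1 to 5 lower-case consonants *-->
 <xsd:simpleType name="consonantWord">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="\p{Lu}[a-z-[aeiou]]{1,5}"/>
  </xsd:restriction>
 </xsd:simpleType>

 <!--* any non-empty string of Basic Latin characters *-->
 <xsd:simpleType name="asciiOnly">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="\p{IsBasicLatin}+"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:schema>
```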

VI.13. Enumerations

Enumerations can be used to specify a list of legal values:

 <xsd:simpleType name="width-keywords">
  <xsd:restriction base="xsd:string">
   <xsd:enumeration value="full"/>
   <xsd:enumeration value="half"/>
   <xsd:enumeration value="none"/>
   <xsd:enumeration value="default"/>
  </xsd:restriction>
 </xsd:simpleType>

VI.14. Non-atomic simple types

list (white-space delimited)
unions (ordered)

VI.15. Creating a list of integers

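Lists can be created by restricting anySimpleType. The example below builds a list of dates; a list of integers differs only in using itemType="xsd:integer":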

<xsd:simpleType name="listofdates">
  <xsd:list itemType="xsd:date"/>
</xsd:simpleType>

VI.16. Creating a union

Unions are similarly restrictions of anySimpleType:

 <xsd:simpleType name="widthType">
  <xsd:union 
   memberTypes
    ="width-keywords xsd:positiveInteger">
  </xsd:union>
 </xsd:simpleType>
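
To see list and union types at work, the sketch below gathers the definitions into one schema document and adds two hypothetical element declarations, width and holidays. The comments show instance content that should and should not be accepted.

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:simpleType name="width-keywords">
  <xsd:restriction base="xsd:string">
   <xsd:enumeration value="full"/>
   <xsd:enumeration value="half"/>
  </xsd:restriction>
 </xsd:simpleType>

 <!--* member types are tried in the order listed *-->
 <xsd:simpleType name="widthType">
  <xsd:union memberTypes="width-keywords xsd:positiveInteger"/>
 </xsd:simpleType>

 <xsd:simpleType name="listofdates">
  <xsd:list itemType="xsd:date"/>
 </xsd:simpleType>

 <!--* valid: <width>half</width> and <width>42</width>;
       invalid: <width>wide</width> *-->
 <xsd:element name="width" type="widthType"/>

 <!--* valid: <holidays>2003-12-25 2004-01-01</holidays> *-->
 <xsd:element name="holidays" type="listofdates"/>
</xsd:schema>
```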

VI.17. Examples

numbers: decimal, integer, positive integer
strings: patterns
dates and times: minima and maxima, date format
binary data (hex, base 64)
lists, unions

[Switch to emacs.]

VII. Complex types

content models
attributes
global and local declarations
wildcards

VII.1. Content models

Productions in the document grammar:

regular expression-like
primitive tokens are elements (recognized by name)
sequence, choice, all
numeric occurrence indicators
determinism rule

VII.2. Content model: example

<xsd:sequence>
 <xsd:element name="name"   type="xsd:string"/>
 <xsd:element name="street" type="xsd:string"/>
 <xsd:element name="city"   type="xsd:string"/>
 <xsd:element name="state"  type="xsd:string"/>
 <xsd:element name="zip"    type="xsd:decimal"/>
</xsd:sequence>

VII.3. Content model: example

Let's allow ourselves three kinds of customers:

<xsd:choice>
 <xsd:element name="indiv" type="po:person"/>
 <xsd:element name="corp" type="po:organization"/>
 <xsd:element name="internal" type="po:dept"/>
</xsd:choice>

VII.4. Content model: example

Mixing choice and sequence:

<xsd:sequence>
 <xsd:choice>
  <xsd:element name="customer" type="po:USAddress"/>
  <xsd:sequence>
   <xsd:element name="shipTo" type="po:USAddress"/>
   <xsd:element name="billTo" type="po:USAddress"/>
  </xsd:sequence>
 </xsd:choice>
 <xsd:element ref="po:comment" minOccurs="0"/>
 <xsd:element name="items"  type="po:Items"/>
</xsd:sequence>

VII.5. Attributes

<xsd:attribute name="orderDate" 
               type="xsd:date"/>

use = (prohibited | optional | required) (default is optional)
form = (qualified | unqualified) (default is set on schema element)
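
A sketch of attribute declarations in context, using a hypothetical order element. Attribute declarations follow the content model inside the complex type; default and fixed supply values when the attribute is absent from the instance.

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="order">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element name="note" type="xsd:string" minOccurs="0"/>
   </xsd:sequence>
   <!--* must be present in every instance *-->
   <xsd:attribute name="orderDate" type="xsd:date" use="required"/>
   <!--* optional; USD is supplied if the attribute is omitted *-->
   <xsd:attribute name="currency" type="xsd:string" default="USD"/>
   <!--* optional; if present, the value must equal 1.0 *-->
   <xsd:attribute name="version" type="xsd:decimal" fixed="1.0"/>
  </xsd:complexType>
 </xsd:element>
</xsd:schema>
```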

VII.6. Global and local declarations

Elements can be

global (top-level) to a namespace, or
local to a complex type

Types can be

named / global (top-level) to a namespace, or
anonymous / local to an element or attribute

VII.7. Wildcards

Two kinds:

element wildcard: xsd:any
attribute wildcard: xsd:anyAttribute

Two parameters:

processContents = (strict | lax | skip)
namespace = (##any | ##other | ##targetNamespace | ##local | namespace URI)

VII.8. A schema for xsi:type usage

Some people think the use of xsi:type for simple types is the easiest way to edge into schema usage. Here's one way:

 <xsd:element name="mydoc">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:any 
     namespace="##any" 
     processContents="lax" 
     minOccurs="0" 
     maxOccurs="unbounded">
    </xsd:any>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
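
Assuming the declaration of mydoc is the only one in the schema, an instance might look like the sketch below. The child element names are hypothetical. Children that carry xsi:type are checked against the type they name; the others have no declaration and no type to check against.

```xml
<mydoc
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <!--* checked against the named built-in type *-->
 <shipDate xsi:type="xsd:date">2003-08-18</shipDate>
 <quantity xsi:type="xsd:positiveInteger">10</quantity>
 <!--* no declaration and no xsi:type: cannot be strictly assessed *-->
 <note>Any content at all</note>
</mydoc>
```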

VII.9. Examples

element declarations
extension
restriction
complex types with simple content
wildcards
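
A sketch of two of these cases, with type names modeled on the Primer's address example (Price is hypothetical). Extension of a complex type appends to the base content model; a complex type with simple content is a simple value plus attributes. Restriction is not shown here.

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:complexType name="Address">
  <xsd:sequence>
   <xsd:element name="name" type="xsd:string"/>
   <xsd:element name="street" type="xsd:string"/>
   <xsd:element name="city" type="xsd:string"/>
  </xsd:sequence>
 </xsd:complexType>

 <!--* extension: state and zip follow the children of Address *-->
 <xsd:complexType name="USAddress">
  <xsd:complexContent>
   <xsd:extension base="Address">
    <xsd:sequence>
     <xsd:element name="state" type="xsd:string"/>
     <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
   </xsd:extension>
  </xsd:complexContent>
 </xsd:complexType>

 <!--* simple content: a decimal value plus an attribute *-->
 <xsd:complexType name="Price">
  <xsd:simpleContent>
   <xsd:extension base="xsd:decimal">
    <xsd:attribute name="currency" type="xsd:string"/>
   </xsd:extension>
  </xsd:simpleContent>
 </xsd:complexType>
</xsd:schema>
```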

VIII. Post-schema-validation infoset

what it is
how you get at it
what it's good for

VIII.1. Post-schema-validation infoset (PSVI)

XML-Schema validation: infoset → infoset.

additions, no changes
type assignment information
validation-attempted information (full, partial, none)
validation-outcome information

VIII.2. Infoset contributions

type information:
- [type definition] (or: [type definition name], [type definition namespace], [type definition type], and [type definition anonymous])
- [member type definition] if needed
- [attribute declaration] or [element declaration]
default values:
- [schema default] = default / fixed value
- [schema specified] = infoset or schema
white-space processing: [schema normalized value]
validity:
- [validity] = valid or invalid or notKnown
- [validation context] = element where validation started
- [validation attempted] = full or partial or none
- [schema error code] if needed

VIII.3. Validation outcomes

There are six outcomes, not two:

Validation attempted	Validity: valid	Validity: invalid	Validity: notKnown
full	OK. Entire subtree valid.	OK. Entire subtree assessed; error here or at some descendant.	Not possible (contradictory)
partial	OK. This node assessed and valid. Some descendant skipped.	OK. Problem at this node, or in a child. Also, some descendant skipped.	OK. This node not assessed (but a descendant was).
none (subtree skipped)	Not possible (contradictory)	Not possible (contradictory)	OK. This subtree was skipped.

VIII.4. Reflecting / serializing the PSVI

In principle, PSVI is abstract.

In practice, exposed

through API
through XML serialization
- additional attributes on input XML
- various normal-form reflections of PSVI graph

VIII.5. The ‘alternating form’ PSVI

For example, part of a PSVI for a purchase order:

<document xmlns:p="http://www.w3.org/2001/05/PSVInfosetExtension"
          xmlns="http://www.w3.org/2001/05/XMLInfoset"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <children>
    <element id="g1">
      <namespaceName>http://www.example.com/PO1</namespaceName>
      <localName>purchaseOrder</localName>
      <prefix xsi:nil="true"/>
      <children>
        <character>
          <characterCode>10</characterCode>
          <elementContentWhitespace>true</elementContentWhitespace>
        </character>
        <character>
          <characterCode>32</characterCode>
          <elementContentWhitespace>true</elementContentWhitespace>
        </character>
        <element>
          <namespaceName>http://www.example.com/PO1</namespaceName>
          <localName>shipTo</localName>
          <prefix xsi:nil="true"/>
          <children> 
            ...

[Examples as output from XSV and Xerces]

IX. Some questions of usage

modularization
design for reuse (final, abstract, block)
design for extension (abstract, substitution groups)
multi-namespace documents
limited sensitivity to context

IX.1. ~~Inheritance~~ Type derivation

It turns out to be hard to model stepwise refinement of types:

restriction (preserves subset semantics)
extension (preserves prefix semantics)

IX.2. Inheritance in document systems

Existing document systems turn out to have a very different model of class systems and inheritance.

inheritance of attributes
inheritance of locations

XML Schema models these with

inheritance (by extension or restriction)
substitution groups

IX.3. Schemas and namespaces

Some (unpleasant) facts of life:

Namespaces are not incompatible with document grammars
— but they don't play well with DTDs.
Namespaces allow us to distinguish mine from not-mine.
Namespaces do not provide universal names.
The namespace : language relation is 1:n.
The language : grammar relation is 1:n.
Therefore, the namespace : schema relation is 1:n.

Live with it.

IX.4. Schema layers

We distinguish:

schema documents (with single target namespace)
schemas (sets of abstract components)

Schema composition operations:

import
include
include with override / redefine
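
A sketch of the first two operations in a schema document. The file names, the address namespace, and the type addr:USAddress are hypothetical. include pulls in a schema document with the same target namespace; import makes components from a different namespace available for reference.

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:po="http://www.example.com/PO1"
 xmlns:addr="http://www.example.com/address"
 targetNamespace="http://www.example.com/PO1">
 <!--* include: same target namespace as this schema document *-->
 <xsd:include schemaLocation="po-items.xsd"/>
 <!--* import: components from a different namespace *-->
 <xsd:import namespace="http://www.example.com/address"
             schemaLocation="address.xsd"/>

 <!--* a reference across the namespace boundary *-->
 <xsd:element name="shipTo" type="addr:USAddress"/>
</xsd:schema>
```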

IX.5. Modularization

XML Schema makes it possible to write modular document type definitions:

late collection of schema components
namespace-aware name matching, validation
white-box wildcards (lax / opportunistic)
black-box wildcards (skip)

IX.6. Modularizing vocabularies: tasks

The basic requirements for defining modules:

control over exposing and hiding
a way to refer to items in different modules
a way to say “anything from another module goes here”
*a way to allow the integrator to say “these specific things from other modules go there”

IX.7. Modularizing vocabularies: techniques

The corresponding techniques in XML Schema:

expose by making top-level; hide by making local
refer to items using namespaces and qualified names
use wildcards to allow unrestricted insertion
use substitution groups to let integrators / extenders allow specific items in specific places
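
A sketch of the substitution-group technique, with element names modeled on the Primer's comment example; NotesType is hypothetical. The content model refers only to the head element, and an extender can later declare further members without touching it.

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:po="http://www.example.com/PO1"
 targetNamespace="http://www.example.com/PO1">
 <!--* head element: content models refer only to this *-->
 <xsd:element name="comment" type="xsd:string"/>

 <!--* members: may appear wherever po:comment is referenced *-->
 <xsd:element name="shipComment" type="xsd:string"
              substitutionGroup="po:comment"/>
 <xsd:element name="customerComment" type="xsd:string"
              substitutionGroup="po:comment"/>

 <xsd:complexType name="NotesType">
  <xsd:sequence>
   <xsd:element ref="po:comment"
                minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
 </xsd:complexType>
</xsd:schema>
```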

IX.8. The tag/type distinction and non-local effects

Consider the HTML input element:

legal only in p and similar elements
legal only within form elements

SGML DTDs have partial solutions:

inclusion exceptions
content models

IX.9. Non-local effects in XML Schema

Fundamentally, we trade verbosity for context-sensitivity:

 <xsd:element name="div" type="div-type"/>
 <xsd:element name="div" type="div-in-form-type"/>

 <xsd:element name="p" type="p-type"/>
 <xsd:element name="p" type="p-in-form-type"/>

 <xsd:element name="ul" type="ul-type"/>
 <xsd:element name="ul" type="ul-in-form-type"/>

 <xsd:element name="li" type="li-type"/>
 <xsd:element name="li" type="li-in-form-type"/>

One bit of context information = double the size of grammar.

Cf. van Wijngaarden grammars (infinite size, unlimited context sensitivity).
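
In a schema document, two declarations with the same name cannot both be top-level, so the distinction is carried by local declarations: the same tag is bound to a different type depending on the complex type in which it is declared. A reduced sketch with hypothetical type definitions, covering only p and form:

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="body" type="body-type"/>

 <!--* outside a form: p is bound to p-type *-->
 <xsd:complexType name="body-type">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
   <xsd:element name="p" type="p-type"/>
   <xsd:element name="form" type="form-type"/>
  </xsd:choice>
 </xsd:complexType>

 <!--* inside a form: the same tag p is bound to p-in-form-type *-->
 <xsd:complexType name="form-type">
  <xsd:sequence>
   <xsd:element name="p" type="p-in-form-type"
                minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
 </xsd:complexType>

 <xsd:complexType name="p-type" mixed="true"/>

 <!--* only this variant of p admits input *-->
 <xsd:complexType name="p-in-form-type" mixed="true">
  <xsd:sequence>
   <xsd:element name="input"
                minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
 </xsd:complexType>
</xsd:schema>
```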

IX.10. Determinism

The determinism rule remains controversial:

LL(1) guarantees may help implementors
All regular languages have a deterministic FSA;
... but not necessarily a deterministic regular expression!
Implications for closure under union, intersection.
Implications for subsumption tests.
Implications for interoperability, single-pass processing.
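
A small illustration, with hypothetical element names a, b, c: the content model (a, b?) | (a, c) breaks the determinism rule, because on seeing a the parser cannot tell which branch it is in without looking ahead. Factoring out the common prefix gives the same language in deterministic form, as sketched below. Not every content model can be rewritten this way.

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <!--* accepts a, a b, or a c: the same language as
       (a, b?) | (a, c), but each element in the input
       can match only one particle *-->
 <xsd:complexType name="factored">
  <xsd:sequence>
   <xsd:element name="a" type="xsd:string"/>
   <xsd:choice minOccurs="0">
    <xsd:element name="b" type="xsd:string"/>
    <xsd:element name="c" type="xsd:string"/>
   </xsd:choice>
  </xsd:sequence>
 </xsd:complexType>
</xsd:schema>
```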

IX.11. Practical issues

XML notation*
Linking document and schema
- namespace name
- schemaLocation hint
Hooks for schema annotation: the annotation element
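
A sketch of the annotation element on an element declaration. documentation is meant for human readers and appinfo for applications; the ui-hint element and its namespace are hypothetical and stand for whatever an application chooses to put there.

```xml
<xsd:schema
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="comment" type="xsd:string">
  <xsd:annotation>
   <!--* for human readers *-->
   <xsd:documentation xml:lang="en">
    A free-text remark attached to a purchase order.
   </xsd:documentation>
   <!--* for applications; the content is application-defined *-->
   <xsd:appinfo>
    <ui-hint xmlns="http://www.example.com/hints">textarea</ui-hint>
   </xsd:appinfo>
  </xsd:annotation>
 </xsd:element>
</xsd:schema>
```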

X. Review and conclusion

motivation for schema languages
fundamental ideas of XML Schema 1.0
simple types
complex types
PSVI
deployment issues

X.1. Why schema languages?

documentation
contract
firewall

X.2. Fundamental ideas of XML Schema 1.0

conventional data typing as in programming languages and database management systems
systematic separation of tags and types
capture inheritance
support multiple namespace schemas, late integration (wildcards, local/top-level, substitution groups, xsi:type)
schemas are not data streams

X.3. Simple types

alignment with programming languages and database management systems
atomic values
lexical spaces
limited lists and unions

X.4. Complex types

match DTD content models and attributes
extension and restriction
some alignment with OO systems
wildcards

X.5. Deployment

the schemaLocation attribute is a hint, not a directive

Pretty much all else follows from that.