Contents
Overview
The Need for Validation
Writing an XML Schema
Extending an XML Schema
Validating XML in a Client/Server Environment
Lab 8: Validating XML Data Using Schemas
Review
Module 8: Validating XML Data Using Schemas
Information in this document is subject to change without notice. The names of companies,
products, people, characters, and/or data mentioned herein are fictitious and are in no way intended
to represent any real individual, company, product, or event, unless otherwise noted. Complying
with all applicable copyright laws is the responsibility of the user. No part of this document may
be reproduced or transmitted in any form or by any means, electronic or mechanical, for any
purpose, without the express written permission of Microsoft Corporation. If, however, your only
means of access is electronic, permission to print one copy is hereby granted.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual
property rights covering subject matter in this document. Except as expressly provided in any
Manufacturing Manager: John Williams
Group Product Manager: Steve Elston
Instructor Notes
This module describes how to create and use Microsoft XML schemas to
validate XML documents. Students have already been introduced to the concept
of validation in Module 2, “Overview of XML Technologies.” In that module,
we introduced the need for validation, showed the syntax of simple Document
Type Definitions (DTDs), and briefly mentioned the use of XML schemas as
the preferred alternative to DTDs.
In this module, students are now shown the full syntax for XML schemas, as
supported by Microsoft. The Microsoft standard for XML schemas is also
referred to as XDR (XML Data Reduced), and differs in several respects from
the schema definition in the W3C draft standard.
Once the XML schema syntax has been introduced, this module describes how
an XML schema can be applied to an XML document received at the server.
You should impress upon students the need to perform validation of XML data
at the server whenever it is received from an unknown client.
After completing this module, students will be able to:
• Describe when validation is needed.
• Create an XML schema.
• Validate an XML document by using an XML schema.
• Apply an XML schema to an XML document, both statically and dynamically.
Read the latest information about Microsoft’s support for XML schemas at
/>.
Presentation: 150 Minutes
Lab: 60 Minutes
Module Strategy
Use the following strategies to present this module:
• The Need for Validation
This section sets the scene for the rest of the module — students need to
appreciate the need for validation before looking at the details. This section
begins by differentiating between the structure of an XML document (that
is, what elements and attributes are allowed where) and the semantics of an
XML document (for example, what value an <address> element can have).
You should emphasize that validation can help to identify what data is
allowed, but it does not specify how you should process that data.
You should also spend some time describing why a document might have an
invalid structure, whether due to programming error, malicious or accidental
corruption, or different versions of a document being used at the client and
server.
• Writing an XML Schema
This is the main section of the module. It covers all the syntax for defining
XML schemas, and how to apply an XML schema to a static XML
document.
• Validating XML in a Client/Server Environment
In this section, we show how to apply an XML schema to an XML
document in a more realistic client/server environment. Initially, we show
how the client can attach an XML schema before sending the document to
the server. In this scenario, when the document is loaded into an XMLDOM
object at the server, the document is automatically validated (as long as the
validateOnParse property is set to True, which is the default).
However, when the server loads the XML document, it has no way of
knowing in advance whether an XML schema is present. Only when the
document has been loaded can the server check for the XML schema. If
there is no XML schema attached, the server can programmatically apply
the XML schema, then load the XML document (plus the XML schema)
into a new XMLDOM object to enable the document to be validated.
At the server, you cannot reference an external XML schema by specifying
a URL in the xmlns attribute; you can only use schemas found on the local
Web server. For more information, search for the Knowledge Base article
#Q235344: “FIX: DTDs and Schemas Not Resolved When Using loadXML
Method” on the MSDN™ Web site.
• Lab 8: Validating XML Data Using Schemas
This is the final lab in the course, and completes the LitWare Books Web
application. By this point in the course, students should have a good grasp
of the application and be comfortable with the flow of XML data between
the client and the server.
In this lab, students write a new XML schema to validate the XML customer
order submitted from the client during checkout.
The first exercise takes the majority of the allocated time for this lab. In this
exercise, students write the XML schema and test it against a static XML
document to ensure that the XML schema is correct.
In the second exercise, students use the XML schema in the context of the
structure or the format of these XML documents can be agreed upon and
defined.
The proper format of an XML document may be defined in either an XML
schema or a Document Type Definition (DTD). Once defined, the XML schema
or DTD can be used to validate the contents of an XML document.
XML schemas offer several technical advantages over DTDs. This module
describes how to create an XML schema, and shows you how to apply the
schema to an XML document. You will also learn how to apply an XML
schema to an XML document dynamically when the document is received at
the Web server.
After completing this module, you will be able to:
• Describe when validation is needed.
• Create an XML schema.
• Validate an XML document by using an XML schema.
• Apply an XML schema to an XML document, both statically and dynamically.
Slide Objective
To provide an overview of the module topics and objectives.
Lead-in
In this module, you will learn how to use XML schemas to validate XML documents in a Web-based application.
same organization, and across organizational boundaries.
This section highlights what can and cannot be achieved by using XML
schemas or DTDs to validate XML documents, and also describes some of the
typical causes of invalid XML documents.
Slide Objective
To provide an overview of the topics in this section.
Lead-in
This section introduces the need for validation of XML data, and describes the benefits of using XML schemas rather than DTDs to achieve validation.
What Can Be Validated: Structure
• When XML documents are exchanged, it is necessary to agree on the structure of the documents
• XML schemas and DTDs define the structure of XML documents
Elements
- What elements are allowed?
- What child elements are required or optional?
- Is the order or number of child elements important?
- What content type is allowed?
Attributes
- What is the data type of each attribute?
- Is there a restricted set of values that can be used for each attribute?
- Is there a default value for each attribute?
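The following sketch shows how these questions map onto Microsoft XML schema syntax, which is described in detail later in this module. It declares a hypothetical <book> element and binding attribute. All names and enumeration values are assumptions chosen for the example. Each comment names the questions that the declaration beneath it answers.

```xml
<?xml version="1.0"?>
<!-- Hypothetical XDR schema: every name and value here is illustrative -->
<Schema name="structureSketch"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- Attribute data type, restricted set of values, and default value -->
  <AttributeType name="binding" dt:type="enumeration"
                 dt:values="hardback paperback" default="paperback"/>

  <!-- Content type: textOnly allows character data but no child elements -->
  <ElementType name="title" content="textOnly"/>
  <ElementType name="subtitle" content="textOnly"/>

  <!-- Allowed children, their order (seq), and how many of each -->
  <ElementType name="book" content="eltOnly" order="seq">
    <attribute type="binding"/>
    <element type="title"/>
    <element type="subtitle" minOccurs="0" maxOccurs="1"/>
  </ElementType>
</Schema>
```

Under this sketch, <title> must appear exactly once (the default occurrence) and <subtitle> is optional. A <book> whose <subtitle> came before its <title> would fail validation, because order="seq" fixes the sequence.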
Slide Objective
To describe how the structure of an XML document can be tested by validation.
Lead-in
XML schemas and DTDs were introduced earlier in the course as a means of defining the format of XML documents.
What Cannot Be Validated: Semantics
• Data cannot be meaningfully processed without validating its semantics
• DTDs and XML schemas provide validation tests for structure, not semantics
• Solution:
<postal-address>12 Main Street</postal-address>
<email-address></email-address>
Slide Objective
To explain why the semantic meaning of an XML document cannot be tested by validation.
Lead-in
Just because an XML document has the correct structure, it doesn’t necessarily follow that the data is meaningful.
A slight variation on this theme might be to use an <address> element for each
type of address as before, but to add an attribute to describe what type of
address is being represented:
<address type="postal">12 Main Street</address>
<address type="email"></address>
This approach preserves the semantic differences between the different address
types, but prevents the proliferation of new element types that might clutter up
an XML document.
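A Microsoft XML schema for this variation shows where the boundary between structure and semantics falls. The schema can require the type attribute and restrict it to a fixed list of values. It cannot tell whether the text inside an <address> element is meaningful. The schema name and the two enumeration values are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical XDR schema for the <address type="..."> variation -->
<Schema name="addressSketch"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- Structure: type is required and must be one of two values -->
  <AttributeType name="type" dt:type="enumeration"
                 dt:values="postal email" required="yes"/>

  <!-- Structure: <address> contains text only. Nothing here checks whether
       that text is a real street address or a working e-mail address -->
  <ElementType name="address" content="textOnly">
    <attribute type="type"/>
  </ElementType>
</Schema>
```

A validating parser would reject <address type="fax"> or an <address> with no type attribute. It would still accept <address type="email">12 Main Street</address>, so catching that mistake needs application logic.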
Detecting Incorrect Documents Using Validation
will be able to validate the document.
Alternatively, the document author can choose to omit the XML schema or
DTD initially. Recipients can then apply the XML schema or DTD themselves
when the document is received. This entails more work for the recipient, but
allows the recipient to be sure that the correct XML schema or DTD is applied.
This is the preferred method in a client/server environment because a server that
receives XML data from clients cannot be sure that an XML schema or DTD
has been applied beforehand.
Slide Objective
To describe the process for validating XML documents.
Lead-in
There are certain tasks you must undertake if you want your XML documents to be validated when they are loaded.
Enabling the parser
When an XML schema or DTD is applied to an XML document, validating
parsers will check the structure of the document as it is loaded. Any validation
error causes the load operation to fail.
You can perform validation at either the client or the server. For example, the
client might wish to confirm that an XML document received from the server is
valid. Conversely, the server might need to validate XML posted from the
client. Depending on where you wish to perform validation, you can write
client-side script or server-side script to test the outcome of the validation
operation, and respond accordingly.
document in Internet Explorer 5.0.
Delivery Tip
1. Open Windows Explorer and navigate to the folder \InetPub\WWWRoot\1905\DemoCode\Mod08. Note that the folder contains files named msxmlval.htm and msxmlval.inf.
2. Right-click msxmlval.inf, and then click Install to install the validator utility.
3. Start Internet Explorer 5.0 (or restart it if it is already running) and open books.xml from the same folder.
4. Right-click anywhere in the main window of Internet Explorer 5.0, and then click Validate XML. A message box appears confirming that this is a valid document.
5. In Notepad, open books.xml and edit it to make it invalid, for example, by deleting a <price> element.
6. Refresh the view in
elements and/or attributes to be present within an element without having to
declare each and every element in the XML schema. In contrast, DTDs
define a “closed” model, which means a document cannot contain additional
content except that explicitly defined in the DTD.
• Data type for every element or attribute
XML schemas allow you to specify a data type for an element or attribute.
Data types indicate the format of the data, provide for validation of the type
by the XML parser, and enable processing specific to the data type in the
DOM. DTDs have no data types for element content and only a few attribute
types, such as CDATA, ID, and enumerated values.
• Extensible
XML schemas are extensible; that is, custom schemas can be built from
standard schemas. For example, suppose that another schema contains a
definition for an <address> element that meets your current needs. By using
namespaces, your schema can refer to that definition in the original schema
rather than copying it.
Only one DTD document can be attached per XML document.
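The following hypothetical fragment sketches two of these advantages: typed content and an open content model. It types a date attribute, an integer element and a fixed-point element. The element, attribute and schema names are assumptions. The dt:type values are Microsoft data types from the urn:schemas-microsoft-com:datatypes namespace.

```xml
<?xml version="1.0"?>
<!-- Hypothetical XDR schema: typed content and an open content model -->
<Schema name="advantagesSketch"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- Typed attribute: the value must be a date such as 1999-10-01 -->
  <AttributeType name="published" dt:type="date"/>

  <!-- Typed elements: text that must be an integer or a fixed-point number -->
  <ElementType name="pages" content="textOnly" dt:type="int"/>
  <ElementType name="price" content="textOnly" dt:type="fixed.14.4"/>

  <!-- model="open": the open content model described above -->
  <ElementType name="book" content="eltOnly" model="open">
    <attribute type="published"/>
    <element type="pages"/>
    <element type="price"/>
  </ElementType>
</Schema>
```

With this sketch, a validating parser would reject <pages>many</pages> because the content is not an integer. A DTD could only declare <pages> as (#PCDATA), which accepts any text.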
Slide Objective
To describe the technical advantages of XML schemas as compared to DTDs.
Lead-in
DTDs are established and standardized, but XML schemas are more powerful. Microsoft recommends the use of XML schemas for the
Schema Conversion on the Student CD-ROM.
This folder contains the dtd2schema.exe utility.
2. Open a Command window and navigate to this folder.
3. Run dtd2schema.exe as follows:
dtd2schema -o myxmlfile.xml mydtdfile.dtd
This creates an XML schema named myxmlfile.xml.
Delivery Tip
This is an important distinction. DTDs apply to the entire document. With schemas, you can have different schemas for different elements.
Delivery Tip
As an optional demonstration, you can describe how to use the DTD-to-Schema conversion utility to translate a DTD into a schema.
1. Open Windows Explorer and navigate to the folder \InetPub\WWWRoot\1905\DemoCode\Mod08, which contains the dtd2schema.exe utility.
2. Open a Command window and navigate to this
- Misinterpretation of XML schema or DTD
- Programming error
There are several situations that might give rise to an invalid document being
delivered to a Web server. Some of the possibilities are outlined below.
Version skew
Versioning is an issue that affects many aspects of the Information Technology
industry, and XML is no exception. A client program might build an XML
document according to a previous version of an XML schema/DTD, unaware
that a newer version exists on the server.
The server must ensure that the correct XML schema/DTD is applied to the
document, and validate the document against this grammar.
Malicious alteration of XML data
When an XML document is submitted from a client to a server, the possibility
exists that the document might be tampered with in transit — document content
might be maliciously added, modified, or deleted.
The server must be capable of validating the XML data as it is received, rather
than assuming that if the XML data was valid at the client, it is still valid when
received at the server.
Misinterpretation of XML schema or DTD
XML schemas and DTDs perform a dual role. As well as enabling validation to
take place when an XML document is loaded, they also act as a source of
documentation for application developers, telling them what structure of XML
to build.
XML schemas and DTDs can be quite complex to read and understand, and the
developer might misinterpret the grammar rules and build an invalid XML
document by mistake.
options are available:
- Reject the document and issue an error message
- Use the DOM to access the valid parts of the document
- Use XSL to transform the document into a valid format
- Ignore the validation error
When an invalid document is received, you can choose to abandon the
document altogether and issue an error message to its creator. Alternatively,
you can try to recover as much meaningful data from the document as possible,
and continue on that basis.
Another course of action might be to ignore the validation error altogether and
carry on regardless. This might be appropriate if you simply wish to record the
XML data that has been received, without performing much processing on the
data.
The following list describes techniques for handling invalid documents.
• Reject the document and issue an error message
This is the most straightforward way of dealing with invalid documents.
Return an error message to the application that generated the document,
stating the reason why validation failed and the location of the problem
within the XML document.
This approach is suitable in situations where the creator of the invalid XML
document can respond to the error message, rectify the problem, and issue a
valid XML document instead.
!
of a document into another version.
For example, in the bookstore scenario described earlier, you can define a
style sheet that creates an empty e-mail address element and adds it to the
document.
Another use of style sheets in this context is to strip out repeating elements
where only a single instance is required. For example, the XML document
sent from the bookstore to the publisher might itemize all the authors for
each book. If the publisher only expects a single author per book, an XSL
style sheet can be applied in order to extract the first author and ignore the
others.
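The two repairs described above could be combined in one style sheet. In this hypothetical example, the customer, book and author element names are assumptions. Only <email-address> appears earlier in this module. The sketch uses XSLT 1.0 syntax. The XSL processor in Internet Explorer 5.0 implements an earlier working-draft dialect of XSL, so the sketch assumes a processor that supports XSLT 1.0, such as MSXML 3.0 or later.

```xml
<?xml version="1.0"?>
<!-- Hypothetical repair style sheet (XSLT 1.0); element names are assumptions -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every attribute and node unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Add an empty <email-address> to any <customer> that lacks one -->
  <xsl:template match="customer[not(email-address)]">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <email-address/>
    </xsl:copy>
  </xsl:template>

  <!-- Keep only the first <author> of each <book>: later ones produce no output -->
  <xsl:template match="book/author[preceding-sibling::author]"/>
</xsl:stylesheet>
```

The identity template copies the document unchanged. The two more specific templates override it only where a repair is needed. If the schema requires a particular order of child elements, the added <email-address> element must be emitted at the position the schema expects, not appended last as it is here.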
• Ignore the validation error
There are many reasons for exchanging XML documents between a client
and server, such as placing orders, issuing confirmation messages,
exchanging technical information, and so on. In some situations, you might
not need to perform detailed processing on the XML data.
For example, a popular Web site might invite visitors to add their name and
address to the Visitors Book. In this case the Web site doesn’t perform any
significant processing on the data, so it doesn’t really matter if the data
format is slightly incorrect.
In XML terms, the way to ignore a validation error is to create an
XMLDOM object, disable validation, and load the document into the parser
programmatically. You will learn how to do this later in this module.
Writing an XML Schema
Element          Description
<Schema>         Is the root element of the XML schema document.
<ElementType>    Defines an element type that may be used within the XML schema document.
<AttributeType>  Defines an attribute type that may be used within the XML schema document.
<element>        Appears within <ElementType>, to define the allowed child elements for that <ElementType>.
<group>          Appears within <ElementType>, to define how child elements are grouped within that <ElementType>.
<attribute>      Appears within <ElementType>, to define the allowed attributes for that <ElementType>.
<datatype>       Appears within <ElementType> or <AttributeType>, to define the data type for that <ElementType> or <AttributeType>.
<description>    Provides documentation about <Schema>, <ElementType>, or <AttributeType> elements.
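The following sketch shows how these elements fit together. It is a complete schema for a hypothetical book list that uses each element in the table at least once. The names booklist, book, title, author, editor, price and isbn are assumptions for illustration and are not taken from the course files.

```xml
<?xml version="1.0"?>
<Schema name="bookSchema"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- <description>: human-readable documentation -->
  <description>Hypothetical grammar for a list of books.</description>

  <!-- <AttributeType>: an attribute that element types can use -->
  <AttributeType name="isbn" dt:type="string" required="yes"/>

  <ElementType name="title" content="textOnly"/>
  <ElementType name="author" content="textOnly"/>
  <ElementType name="editor" content="textOnly"/>
  <ElementType name="price" content="textOnly">
    <!-- <datatype>: the data type of the element content -->
    <datatype dt:type="fixed.14.4"/>
  </ElementType>

  <ElementType name="book" content="eltOnly" order="seq">
    <!-- <attribute>: isbn is allowed, and required, on <book> -->
    <attribute type="isbn"/>
    <!-- <element>: an allowed child element -->
    <element type="title"/>
    <!-- <group order="one">: exactly one of author or editor -->
    <group order="one">
      <element type="author"/>
      <element type="editor"/>
    </group>
    <element type="price"/>
  </ElementType>

  <ElementType name="booklist" content="eltOnly">
    <element type="book" minOccurs="0" maxOccurs="*"/>
  </ElementType>
</Schema>
```

Child element types are declared before the element types that refer to them. This keeps the schema readable from the leaves upward.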
Slide Objective
To provide an overview of the topics in this section.
Lead-in
This section describes the detailed syntax for creating Microsoft XML schemas, and shows how to apply them to static XML documents.
Delivery Tip
The company Extensibility has a product called XML Authority that can create an
To use an XML schema to validate an XML document, first create the XML
schema, and then apply it to the XML document.
Create the XML schema
An XML schema is an XML document. The Microsoft definition of XML
schemas requires a root element named <Schema>. The <Schema> element
contains all the rules defined in the XML schema document.
The <Schema> element requires a name attribute that defines the name of the
schema. Unlike DTDs, where the name of the DTD must match the root
element of the XML document, the name attribute in an XML schema may be
assigned any value:
<Schema name="mySchema" ... >
</Schema>
The <Schema> element must be defined with the following namespace
declaration. It is convenient to make this the default namespace so that you do
not have to add a namespace prefix to every element in the XML schema
document:
xmlns="urn:schemas-microsoft-com:xml-data"
remember namespaces from earlier in the course. Remind them about namespaces if necessary: the concepts and the syntax.
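The following example is a minimal XML schema document. Besides the default
namespace, it declares the dt prefix for the urn:schemas-microsoft-com:datatypes
namespace, which the schema uses when it assigns data types to elements and
attributes: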
<?xml version="1.0"?>
<Schema name="mySchema"
xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
</Schema>
Applying the XML schema to a static document
To apply an XML schema to a static XML document, you must add a
namespace declaration of the following form to your XML document:
<?xml version="1.0" ?>
<myElement xmlns="x-schema:
</myElement>
The namespace declaration indicates the URL of the appropriate XML schema
document. The URL is prefixed by “x-schema” to indicate to the parser that the
URL defines an XML schema.
When the XML document is loaded into a validating XML parser, the parser
loads the XML schema so that validation may take place.
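In the following hypothetical instance document, the x-schema URL is a relative file name, bookSchema.xml. It is assumed to sit in the same folder as the document, and the element names are illustrative.

```xml
<?xml version="1.0"?>
<!-- bookSchema.xml is a hypothetical schema file in the same folder as this document -->
<booklist xmlns="x-schema:bookSchema.xml">
  <!-- The default namespace is inherited, so <book> and <title> are also
       validated against bookSchema.xml -->
  <book>
    <title>Sample title</title>
  </book>
</booklist>
```

The namespace is declared once on the root element, and every unprefixed descendant inherits it. This is why attaching the schema at the root is enough to cover the whole document.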
Notice that the XML schema is applied to the root element of the XML
document, for example, <myElement>, so that it defines the grammar for the
<ElementType name="book" content="eltOnly">
</ElementType>
<ElementType> introduces a new type of element to the XML schema, and
defines the rules for that element when it appears in the XML document.
The XML schema must define a separate <ElementType> for each type of
element that can appear in the XML document (unless an “open” content model
is defined). You can specify an “open” element by setting the model attribute to
open.
<ElementType
name = "element tag name"
content = "empty"|"textOnly"|"eltOnly"|"mixed"
model = "open" | "closed"
order = "one" | "seq" | "many"
dt:type = "XML data type" >
</ElementType>
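The following hypothetical schema fills in the syntax above with concrete values. It declares one element type for each content value and shows model, order and dt:type in use. The names cover, pages, book and note are assumptions.

```xml
<?xml version="1.0"?>
<!-- Hypothetical XDR schema showing the <ElementType> attributes in use -->
<Schema name="elementTypeSketch"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- content="empty": no text and no child elements (attributes are allowed) -->
  <ElementType name="cover" content="empty"/>

  <!-- content="textOnly" plus dt:type: text that must be an integer -->
  <ElementType name="pages" content="textOnly" dt:type="int"/>

  <!-- content="eltOnly", order="seq", model="closed": the listed children,
       in this order, and nothing else -->
  <ElementType name="book" content="eltOnly" order="seq" model="closed">
    <element type="cover"/>
    <element type="pages"/>
  </ElementType>

  <!-- content="mixed": text interleaved with the listed child elements -->
  <ElementType name="note" content="mixed">
    <element type="pages"/>
  </ElementType>
</Schema>
```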
<ElementType> has the attributes name, content, model, order, and dt:type.
• name
The name attribute must be provided. This attribute defines the tag name for
the <ElementType>, and therefore defines a valid tag name for elements in
the XML document. For example, if the <ElementType> name is “book”,
the XML document will have elements named <book>.
• content
The content attribute is optional. This attribute defines the content type for
the element in the XML document.
Slide Objective