<xsl:apply-templates/>
</A>
</xsl:template>
<xsl:template match="url[@protocol='mailto']">
<A>
<xsl:attribute name="href">mailto:<xsl:apply-templates/>
</xsl:attribute>
<xsl:apply-templates/>
</A>
</xsl:template>
<xsl:template match="p">
<P><xsl:apply-templates/></P>
</xsl:template>
<xsl:template match="abstract | date | keywords | copyright"/>
</xsl:stylesheet>
DOM and SAX
DOM (Document Object Model) and SAX (Simple API for XML) are APIs to
access XML documents. They allow applications to read XML documents
without having to worry about the syntax (not unlike translators). They are
complementary: DOM is best suited for forms and editors, SAX is best with
application-to-application exchange.
✔ DOM and SAX are covered in Chapter 7, “The Parser and DOM,” page 191 and Chapter 8,
“Alternative API: SAX,” page 231. Chapter 9, “Writing XML,” page 269 discusses how to
create XML documents.
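To make the difference between the two APIs concrete, here is a minimal sketch that extracts phone numbers both ways. It is written in Python with the standard library's xml.dom.minidom and xml.sax modules, which is a choice made for this illustration only (the book's own examples use JavaScript and Java). The sample document and the names tel_numbers_dom, tel_numbers_sax, and TelHandler are hypothetical.

```python
import xml.sax
from xml.dom import minidom

DOCUMENT = """<?xml version="1.0"?>
<entry>
  <name>John Doe</name>
  <tel preferred="true">513-555-8889</tel>
  <tel>513-555-7098</tel>
</entry>"""


def tel_numbers_dom(text):
    """DOM style: build the whole tree in memory, then query it."""
    document = minidom.parseString(text)
    return [node.firstChild.data for node in document.getElementsByTagName("tel")]


class TelHandler(xml.sax.ContentHandler):
    """SAX style: react to events while the parser streams through the text."""

    def __init__(self):
        super().__init__()
        self.numbers = []
        self._inside_tel = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "tel":
            self._inside_tel = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._inside_tel:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "tel":
            self.numbers.append("".join(self._buffer))
            self._inside_tel = False


def tel_numbers_sax(text):
    handler = TelHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.numbers


dom_result = tel_numbers_dom(DOCUMENT)
sax_result = tel_numbers_sax(DOCUMENT)
```

Both functions return the same list. The DOM version keeps the whole document available for further navigation, while the SAX version holds only the state it chooses to remember.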
XLink and XPointer
XLink and XPointer are two parts of one standard currently under
development to provide a mechanism to establish relationships between
documents.
Listing 1.12 demonstrates how a set of links can be maintained in XML.
Listing 1.12: A Set of Links in XML
<?xml version="1.0" standalone="no"?>
vendors are supporting it. This, in turn, means that many applications are
available to manipulate XML documents.
This section lists some of the most commonly used XML applications.
Again, this is not a complete list. We will discuss these products in more
detail in the following chapters.
XML Browser
An XML browser is the first application you would think of because it is so
close to the familiar HTML browser. An XML browser is used to view and
print XML documents. At the time of this writing, there are not many
high-quality XML browsers.
Microsoft Internet Explorer has supported XML since version 4.0. Internet
Explorer 5.0 has greatly enhanced the XML support. Unfortunately, the
support is based on early versions of the style sheet standards and is not
complete. Yet Internet Explorer 5.0 is the closest thing to a widely deployed
XML browser today.
Chapter 1: The XML Galaxy
Netscape Communicator currently has no support for XML. The exception is
Mozilla, the open-source version of Netscape Communicator, which has
strong support for XML. However, because Mozilla is still a work in
progress, it is not yet stable enough for practical use.
Several other vendors have produced XML browsers. These browsers are at
various stages of development. One of the most interesting is InDelv XML
Browser, which has the most complete implementation of XSL at the time
of writing.
✔ Browsers are discussed in Chapter 5, “XSL Transformation,” and Chapter 6, “XSL
Formatting Objects and Cascading Style Sheet.”
XML Editors
✔ XSL processors are discussed in Chapter 5, “XSL Transformation.”
What’s Next
The book is organized as follows:
• Chapters 2 through 4 will teach you the XML syntax, including the
syntax for DTDs and namespaces.
• Chapters 5 and 6 will teach you how to use style sheets to publish
documents.
• Chapters 7, 8, and 9 will teach you how to manipulate XML documents
from JavaScript applications.
• Chapter 10 will discuss the topic of modeling. You have seen in this
introduction how structure is important for XML. Modeling is the
process of creating the structure.
• Chapter 11, “N-Tiered Architecture and XML,” and Chapter 12,
“Putting It All Together: An e-Commerce Example,” will wrap it up
with a realistic electronic commerce application. This application
exercises most, if not all, of the techniques introduced in the previous
chapters.
• Appendix A will teach you just enough Java to be able to follow the
examples in Chapters 8 and 12. It also discusses when you should use
JavaScript and when you should use Java.
2
The XML Syntax
The structure is the key.
Getting Started with XML Markup
Listing 2.1 is a (small) address book in XML. It has only two entries: John
Doe and Jack Smith. Study it because we will use it throughout most of
this chapter and the next.
Listing 2.1: An Address Book in XML
<?xml version="1.0"?>
<!-- loosely inspired by vCard 3.0 -->
<address-book>
  <entry>
    <name>John Doe</name>
    <address>
      <street>34 Fountain Square Plaza</street>
      <region>OH</region>
      <postal-code>45202</postal-code>
      <locality>Cincinnati</locality>
      <country>US</country>
    </address>
    <tel preferred="true">513-555-8889</tel>
    <tel>513-555-7098</tel>
    <email href="mailto:[email protected]"/>
  </entry>
  <entry>
    <name><fname>Jack</fname><lname>Smith</lname></name>
    <tel>513-555-3465</tel>
In both cases, it is easy to recognize the names, the phone numbers, the
email addresses, and so on. If anything, Listing 2.2 is probably more
readable.
A First Look at the XML Syntax
For software, however, it’s exactly the opposite. Software needs to be told
which is what. It needs to be told what the name is, what the address is,
and so on. That’s what the markup is all about; it breaks the text into its
constituents so software can process it.
Software does have one major advantage—speed. While it would take you a
long time to sort through a long list of a thousand addresses, software will
plunge through the same list in less than a minute.
However, before it can start, it needs to have the information in a
predigested format. This chapter and the following two chapters will
concentrate on XML as a predigested format.
The reward comes in Chapter 5, “XSL Transformation,” and subsequent
chapters where we will see how to tell the computer to do something useful
with these documents.
Element’s Start and End Tags
The building block of XML is the element: XML documents are made up of
elements. Each element has a name and content.
<tel>513-555-7098</tel>
The content of an element is delimited by special markup known as the start
tag and the end tag. The tagging mechanism is similar to HTML, which is
logical because both HTML and XML inherited their tagging from SGML.
The start tag is the name of the element (tel in the example) in angle
brackets; the end tag adds an extra slash character before the name.
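As a rough illustration of how software sees an element, the following sketch parses the tel element and reads back its name and its content. Python and its standard xml.etree.ElementTree module are assumptions of this example, not tools used by the book.

```python
import xml.etree.ElementTree as ET

# Parse a single element: start tag, content, end tag.
element = ET.fromstring("<tel>513-555-7098</tel>")

name = element.tag      # the element name, taken from the tags
content = element.text  # the text between the start tag and the end tag

print(name, content)
```

The parser hands the application the name and the content separately; the angle brackets and the slash never reach the application.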
XML specification itself.
NOTE
There is one more character you can use in names—the colon (:). However, the colon is
reserved for namespaces; therefore, it will be introduced in Chapter 4, “Namespaces.”
The following are examples of valid element names in XML:
<copyright-information>
<p>
<base64>
<décompte.client>
<firstname>
The following are examples of invalid element names. You could not use
these names in XML:
<123>
<first name>
<tom&jerry>
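One way to check these naming rules is to hand each candidate name to a parser. The sketch below does so with Python's standard xml.etree.ElementTree module (an assumption of this example); the helper is_well_formed is hypothetical. Each name is wrapped in an empty-element tag such as <p/> so that it forms a complete document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


valid_names = ["copyright-information", "p", "base64", "décompte.client", "firstname"]
invalid_names = ["123", "first name", "tom&jerry"]

# Wrap each candidate name in an empty-element tag and ask the parser.
accepted = [name for name in valid_names if is_well_formed("<" + name + "/>")]
rejected = [name for name in invalid_names if not is_well_formed("<" + name + "/>")]
```

With this parser, every name in the first list is accepted and every name in the second list is rejected.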
Unlike HTML, names are case sensitive in XML. So, the following names
are all different:
<address>
<ADDRESS>
<Address>
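Case sensitivity has practical consequences, which the following sketch tries to show. It uses Python's standard xml.etree.ElementTree module (an assumption of this example), and the helper parses is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The end tag must repeat the start tag's name with the same case.
matching = parses("<address>34 Fountain Square Plaza</address>")
mismatched = parses("<address>34 Fountain Square Plaza</ADDRESS>")

# Lookups by name are case sensitive too.
entry = ET.fromstring("<entry><address>Cincinnati</address></entry>")
found_lower = entry.find("address") is not None
found_upper = entry.find("ADDRESS") is not None
```

The mismatched pair is rejected because, as far as the parser is concerned, address and ADDRESS are two unrelated names.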
By convention, HTML elements in XML are always in uppercase. (And, yes,
it is possible to include HTML elements in XML documents. In Chapter 5,
you will see when it is useful.)
By convention, XML elements are frequently written in lowercase. When a
name consists of several words, the words are usually separated by a
hyphen, as in address-book.
or address or ADDRESS? Mixing case is cumbersome and is considered
poor style.
NOTE
As we will see in the “Unicode” section, XML supports characters from most spoken
languages. You can use letters from any alphabet in names, including letters from the
Greek, Japanese, or Cyrillic alphabets.
Attributes
It is possible to attach additional information to elements in the form of
attributes. Attributes have a name and a value. The names follow the same
rules as element names.
Again, the syntax is similar to HTML. Elements can have one or more
attributes in the start tag, and the name is separated from the value by the
equal sign. The value of the attribute is enclosed in double or single
quotation marks.
For example, the tel element can have a preferred attribute:
<tel preferred="true">513-555-8889</tel>
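The sketch below reads the preferred attribute back from the element, once with double and once with single quotation marks, and then tries the same tag with no quotation marks at all. It relies on Python's standard xml.etree.ElementTree module, which is an assumption of this example.

```python
import xml.etree.ElementTree as ET

double_quoted = ET.fromstring('<tel preferred="true">513-555-8889</tel>')
single_quoted = ET.fromstring("<tel preferred='true'>513-555-8889</tel>")

# Attributes are exposed as name/value pairs on the element.
value_double = double_quoted.get("preferred")
value_single = single_quoted.get("preferred")

# An element without the attribute simply has no value for it.
missing = ET.fromstring("<tel>513-555-7098</tel>").get("preferred")

# Leaving the quotation marks out makes the parser reject the document.
try:
    ET.fromstring("<tel preferred=true>513-555-8889</tel>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
```

Both quoting styles yield the same value, so the choice between them is a matter of convenience.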
Unlike HTML, XML insists on the quotation marks. The XML processor
element. The entry for John Doe has two tel elements. Figure 2.1 is the
tree of Listing 2.1.
Figure 2.1: Tree of the address book
An element that is enclosed in another element is called a child. The
element that encloses it is its parent. In the following example, the name
element has two children: the fname and the lname elements. name is the
parent of both elements.
<name>
<fname>Jack</fname>
<lname>Smith</lname>
</name>
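The parent and child relationship can be explored programmatically. This sketch uses Python's standard xml.etree.ElementTree module (an assumption of this example); because that module does not store parent links, the hypothetical parent_of dictionary is built by hand.

```python
import xml.etree.ElementTree as ET

name = ET.fromstring("<name><fname>Jack</fname><lname>Smith</lname></name>")

# Iterating over an element yields its children in document order.
children = [child.tag for child in name]

# Build a child -> parent map by walking every element in the tree.
parent_of = {child: parent for parent in name.iter() for child in parent}
parents = [parent_of[child].tag for child in name]
```

Here children lists fname and lname, and both map back to name as their parent.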
Start and end tags must always be balanced and children are always com-
It is easy to fix the previous example. It suffices to introduce a new root,
such as address-book.
<?xml version="1.0"?>
<address-book>
  <entry>
    <name>John Doe</name>
    <email href="mailto:[email protected]"/>
  </entry>
  <entry>
    <name>Jack Smith</name>
    <email href="mailto:[email protected]"/>
  </entry>
</address-book>
There is no rule that says the top-level element must be address-book.
If there is only one entry, then entry can act as the top-level element.
<?xml version="1.0"?>
<entry>
  <name>John Doe</name>
  <email href="mailto:[email protected]"/>
</entry>
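The single-root rule is easy to observe with a parser. The sketch below uses Python's standard xml.etree.ElementTree module (an assumption of this example), a hypothetical parses helper, and shortened entries without email elements.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


two_entries = (
    "<entry><name>John Doe</name></entry>"
    "<entry><name>Jack Smith</name></entry>"
)

# Two top-level elements: not a well-formed document.
without_root = parses(two_entries)

# The same entries wrapped in one root element are accepted.
with_root = parses("<address-book>" + two_entries + "</address-book>")

# A single entry can serve as the root on its own.
single_entry = parses("<entry><name>John Doe</name></entry>")
```

Only the first document is rejected; the name of the root element does not matter, as long as there is exactly one.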
XML Declaration
The XML declaration is the first line of the document. The declaration iden-
As you can see, the core of the XML syntax is not difficult. Furthermore, if
you already know HTML, XML is familiar.
One of the design goals of XML was to develop a simple markup language
that would be easy to use and would remain human-readable. I think it
achieved that goal.
This section covers more advanced features of XML. You might not use
them in every document, but they are often useful.
Comments
To insert comments in a document, enclose them between “<!--” and “-->”.
Comments are used for notes, indication of ownership, and more. They are
intended for the human reader and they are ignored by the XML processor.
In the following example, a comment is made that the document was
inspired by vCard. The software does nothing with this comment but it
helps us next time we open this document.
<!-- loosely inspired by vCard 3.0 -->
Comments cannot be inserted inside markup, that is, inside a tag. They must
appear before or after the tag.
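The following sketch shows both points: comments placed between tags are skipped, and a comment placed inside a tag is rejected. It assumes Python's standard xml.etree.ElementTree module, whose default parser discards comments; the parses helper and the second comment are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


document = """<!-- loosely inspired by vCard 3.0 -->
<entry>
  <!-- preferred number first -->
  <tel>513-555-8889</tel>
</entry>"""

# Comments before and between elements leave no trace in the parsed tree.
entry = ET.fromstring(document)
child_tags = [child.tag for child in entry]

# A comment inside a start tag is not allowed.
comment_in_tag = parses("<tel <!-- work --> >513-555-8889</tel>")
```

The application sees only the tel element, so comments are safe to use for notes aimed at human readers.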
Unicode
Characters in XML documents follow the Unicode standard. Unicode is a
major extension to the familiar ASCII character set. The Unicode
<nom>José Dupont</nom>
<email href="mailto:[email protected]"/>
</entrée>
NOTE
You might wonder how the XML processor can read the encoding parameter. Indeed, to
reach the encoding parameter, the processor must read the declaration. However, to
read the declaration, the processor needs to know which encoding is being used.
This looks like a dog chasing its tail until you realize that the first characters of
an XML document always are <?xml. The XML processor can match these characters
against the encodings it supports and guess enough of the encoding (is it 8 or 16
bits?) to read the declaration.
Advanced Topics
What about those documents that have no declaration (since the declaration is
optional)? These documents must use one of the default encodings (UTF-8
or UTF-16). Again, the XML processor can match the first character (which must be a <)
against its encoding in UTF-8 or UTF-16.
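The guessing step described in this note can be sketched in a few lines. The function below is a deliberately simplified illustration in Python (an assumption of this example): it only distinguishes 8-bit, ASCII-compatible encodings from UTF-16 and ignores every other encoding family a real XML processor may support. The name guess_unit_size is hypothetical.

```python
def guess_unit_size(data):
    """Guess whether an XML document uses 8-bit or 16-bit code units.

    Simplified sketch: looks only at the first bytes of the document.
    """
    # UTF-16 documents normally start with a byte order mark.
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return "16-bit"
    # Without a byte order mark, "<" in UTF-16 is a zero byte plus 0x3C.
    if data[:2] in (b"\x00<", b"<\x00"):
        return "16-bit"
    # A UTF-8 byte order mark, or a plain "<" stored in a single byte.
    if data[:3] == b"\xef\xbb\xbf" or data[:1] == b"<":
        return "8-bit"
    return "unknown"


declaration = '<?xml version="1.0" encoding="ISO-8859-1"?><entry/>'

latin1_guess = guess_unit_size(declaration.encode("iso-8859-1"))
utf16_guess = guess_unit_size(declaration.encode("utf-16"))
utf16_be_guess = guess_unit_size(declaration.encode("utf-16-be"))
no_declaration_guess = guess_unit_size(b"<entry/>")
```

Once the unit size is known, the processor can decode the declaration far enough to read the encoding parameter and then switch to the exact encoding.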
Entities
The document in Listing 2.1 (page 42) is self-contained: The document is
complete and it can be stored in just one file. Complex documents are often
split over several files: the text, the accompanying graphics, and so on.
XML, however, does not reason in terms of files. Instead it organizes docu-
ments physically in entities. In some cases, entities are equivalent to files;
in others, they are not.
XML entities are a complex topic that we will revisit in the next chapter,
when we will see how to declare entities in the DTD. In this chapter, we
nation ]]> in CDATA sections (see the following)
• &apos; single quote “'” can be escaped with &apos;, essentially in
attribute values
• &quot; double quote “"” can be escaped with &quot;, essentially in
attribute values
The following is not valid because the ampersand would confuse the XML
processor:
<company>Mark & Spencer</company>
Instead, it must be rewritten to escape the ampersand with an &amp;
entity:
<company>Mark &amp; Spencer</company>
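The effect of escaping can be checked with a parser. This sketch uses Python's standard xml.etree.ElementTree module and the escape function from xml.sax.saxutils (both assumptions of this example); the parses helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A bare ampersand is rejected.
raw_accepted = parses("<company>Mark & Spencer</company>")

# The escaped form is accepted, and the application gets the "&" back.
company = ET.fromstring("<company>Mark &amp; Spencer</company>")
text = company.text

# When writing XML, a library function can do the escaping.
escaped = escape("Mark & Spencer")
```

The entity exists only in the file; after parsing, the application works with the plain character again.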
guage of the element’s content. For example:
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
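An application can use the xml:lang attribute to pick the right text. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports the attribute under the reserved XML namespace; the wrapping doc element is hypothetical and only there to give the two paragraphs a single root.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined "xml" prefix to its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

document = """<doc>
  <p xml:lang="en-GB">What colour is it?</p>
  <p xml:lang="en-US">What color is it?</p>
</doc>"""

paragraphs = ET.fromstring(document).findall("p")

# Index the paragraphs by their declared language.
by_language = {p.get(XML_LANG): p.text for p in paragraphs}
```

With this index, an application could show the en-GB or the en-US wording depending on the reader's preference.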
Processing Instructions
Processing instructions (abbreviated PI) are a mechanism to insert non-XML
statements, such as scripts, in the document.
At first sight, processing instructions are at odds with the XML concept that
processing is always derived from the structure. As we saw in the first
chapter, with SGML and XML, processing is derived from the structure of
the document. There should be no need to insert specific instructions in a
document. This is one of the major improvements of SGML when compared
to earlier markup languages.
That’s the theory. In practice, there are cases where it is easier to insert
processing instructions rather than define complex structure. Processing
instructions are a concession to reality from the XML standard developers.
You are already familiar with the syntax of processing instructions because the
XML declaration is written the same way:
<?xml version="1.0" encoding="ISO-8859-1"?>
✔ In Chapter 5, “XSL Transformation,” you will see how to use processing instructions to
attach style sheets to documents (page 125).
<?xml-stylesheet href="simple-ie5.xsl" type="text/xsl"?>
Finally, processing instructions are used by specific applications. For exam-
ple, XMetaL (an XML editor) uses them to create templates. This process-
ing instruction is specific to XMetaL:
<?xm-replace_text {Click here to type the name}?>
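To see how an application receives a processing instruction, the following sketch lists the instructions found at the top of a document. It uses Python's standard xml.dom.minidom module (an assumption of this example), and the empty doc element is hypothetical.

```python
from xml.dom import minidom

document = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="simple-ie5.xsl" type="text/xsl"?>'
    '<doc/>'
)

# Collect (target, data) for every processing instruction node at the top level.
instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
```

The parser passes the target (xml-stylesheet) and the rest of the instruction as one uninterpreted string; it is up to the application to make sense of it. With this parser, the XML declaration is consumed by the parser itself and does not show up in the list.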
The processing instruction is enclosed in