Java and XML
Copyright © 2000 O'Reilly & Associates, Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.
The Java™ Series is a trademark of O'Reilly & Associates, Inc. Java™ and all Java-based
trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc., in the
United States and other countries. O'Reilly & Associates, Inc. is independent of Sun Microsystems.
The O'Reilly logo is a registered trademark of O'Reilly & Associates, Inc. Many of the designations
used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where
those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark
claim, the designations have been printed in caps or initial caps. The association between the image
of a Tupperware SHAPE-O® and Java™ and XML is a trademark of O'Reilly & Associates, Inc.
SHAPE-O® is a registered trademark of Dart Industries Inc. (Tupperware Worldwide) and is used
with permission.
While every precaution has been taken in the preparation of this book, the publisher assumes no
responsibility for errors or omissions, or for damages resulting from the use of the information
contained herein.
© 2001, O'Reilly & Associates, Inc.
Comments and Questions
Acknowledgments
Chapter 1. Introduction
What Is It?
How Do I Use It?
Why Should I Use It?
What’s Next?
Chapter 2. Creating XML
An XML Document
An XML Document
The Content
What’s Next?
Chapter 3. Parsing XML
Getting Prepared
SAX Readers
Content Handlers
Error Handlers
Error Handlers
"Gotcha!"
What’s Next?
Chapter 4. Constraining XML
Why Constrain XML Data?
Document Type Definitions
XML Schema
What’s Next?
Chapter 5. Validating XML
Configuring the Parser
Output of XML Validation
The DTDHandler Interface
"Gotcha!"
Putting the Load on the Server
The Real World
What’s Next?
Chapter 11. XML for Configurations
EJB Deployment Descriptors
Creating an XML Configuration File
Reading an XML Configuration File
The Real World
What’s Next?
Chapter 12. Creating XML with Java
Loading the Data
Modifying the Data
XML from Scratch
The Real World
What’s Next?
Chapter 13. Business-to-Business
The Foobar Public Library
mytechbooks.com
Push Versus Pull
The Real World
What’s Next?
Chapter 14. XML Schema
To DTD or Not To DTD
Java Parallels
What’s Next?
Appendix A. API Reference
A.1 SAX 2.0
A.2 DOM Level 2
A.3 JAXP 1.0
A.4 JDOM 1.0
This is a book about XML, but it is geared specifically towards Java developers. While both XML
and Java are powerful tools in their own right, it is their marriage that this book is concerned with,
and that gives XML its true power. We will cover the various XML vocabularies, look at creating,
constraining, and transforming XML, and examine all of the APIs for handling XML from Java
code. Additionally, we cover the hot topics that have made XML such a popular solution for
dynamic content, messaging, e-business, and data stores. Through it all, we take a very narrow
view: that of the developer who has to put these tools to work. We take a candid look at the tools XML
provides, and if something is not useful (even if it is popular!), we will address it and move
on. If a particular facet of XML is a hidden gem, we will extract the value of the item and put it to
use. Java and XML is meant to serve as a handbook to help you, and is neither a reference nor a
book geared towards marketing XML.
Finally, the back half of this book is filled with working, practical code. Although the code is
available for download, its purpose is to walk you through creating several XML applications, and
you are encouraged to follow along with the examples rather than skimming the code. We introduce
a new API for manipulating XML from Java as well, and complete coverage and examples are
included. This book is for you, the Java developer, and it is about the real world; it is not a
theoretical or fanciful flight through what is "cool" in the industry. We abandon buzzwords when
possible, and define them clearly when not. All of the code and concepts within this book have been
entered by hand into an editor, prodded and tested, and are intended to aid you on the path to
mastering Java and XML.
Organization
This book is structured in a very particular way: the first half of the book (Chapter 1 through
Chapter 7) focuses on getting you grounded in XML and the core Java APIs for handling XML.
Although these chapters are not glamorous, they should be read in order, and at least skimmed even
if you are familiar with XML. We cover the basics, from creating XML to transforming it. Chapter
8 serves as a halfway point in the book, covering an exciting new API for handling XML within
Java, JDOM. This chapter is a must-read, as the API is being publicly released as this book goes to
Continuing to look at transforming XML documents, we discuss XSL transformation
processors and how they can be used to convert XML into other formats. We also examine
the Document Object Model (DOM) and how it can be used for handling XML data.
Chapter 8
We begin by looking at the Java API for XML Parsing (JAXP), and discuss the importance
of vendor-independence when using XML. I then introduce the JDOM API, discuss the
motivation behind its development, and detail its use, comparing it to SAX and DOM.
Chapter 9
This chapter looks at what a web publishing framework is, why it matters to you, and how to
choose a good one. We then cover the Apache Cocoon framework, taking an in-depth look
at its feature set and how it can be used to serve highly dynamic content over the Web.
Chapter 10
In this chapter, we cover Remote Procedure Calls (RPC), their relevance in distributed
computing as compared to RMI, and how XML makes RPC a viable solution for some
problems. We then look at using XML-RPC Java libraries and building XML-RPC clients
and servers.
Chapter 11
In this chapter, we look at using configuration data in an XML format and why that format
is so important to cross-platform applications, particularly as it relates to distributed
systems.
Chapter 12
Although this topic is covered in part in other chapters, here we look at the process of
generating and mutating XML from Java and how to perform these modifications from
server-side components such as Java servlets, and outline concerns when mutating XML.
Chapter 13
This chapter details a "case study" of creating inter- and intra-business communication
channels using XML as a portable data format. Using multiple languages, we build several
Knudsen (O'Reilly & Associates), before starting this book. I do not assume that you know anything
about XML, and so I start with the basics. However, I do assume that you are willing to work hard
and learn quickly; for this reason, we move rapidly through the basics so that the bulk of the book
can deal with advanced concepts. Material is not repeated unless appropriate, so you may need to
re-read previous sections or be prepared to flip back and forth, as previously covered concepts are
used in later chapters. If you want to learn XML, know some Java, and are prepared to enter some
example code into your favorite editor, you should be able to get through this book without any real
problem.
Software and Versions
This book covers XML 1.0 and the various XML vocabularies in their latest form as of April 2000.
Because various XML specifications that are covered are not final, minor inconsistencies may be
present between printed publications of this book and the current version of the specification in
question.
All of the Java code used is based on the Java 1.1 platform, with the exception of the JDOM 1.0
coverage. This variance with regard to JDOM is noted in the text in Chapter 8, and addressed there.
The Apache Xerces parser, Apache Xalan processor, and Apache FOP libraries were the latest
stable versions available as of April 2000, and the Apache Cocoon web publishing framework used
was Version 1.7.3. The XML-RPC Java libraries used were Version 1.0 beta 3. All software used is
freely available and can be obtained online from , , and
.
The source code for the examples in this book, including the com.oreilly.xml utility classes, is
contained completely within the book itself. Both source and binary forms of all examples
(including extensive Javadoc not necessarily included in the text) are available online from
and . All of the examples that
• Additions to code examples
• Parts of code examples that are discussed specifically in the text
Comments and Questions
Please address comments and questions concerning this book to the publisher:
O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
(800) 998-9938 (in the U.S. or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)
You can also send us messages electronically. To be put on our mailing list or to request a catalog,
send email to:
To ask technical questions or comment on the book, send email to:
We have a web site for the book, where we'll list errata and any plans for future editions. You can
access this page at:
For more information about this book and others, see the O'Reilly web site at:
Acknowledgments
As I look at the stack of pages that comprise the manuscript of this book, it seems absurd to try and
thank all the people involved in making this book in only a few paragraphs. However, as this is
XML more in line with each other, as well as keeping the focus of using XML on the Java
programming language and usability, rather than on vague concepts and obscurity. Second, Jason
has become an invaluable friend, and has helped me through the often confusing process of
completing a book and being an O'Reilly author. We spent entirely too many evenings talking for
hours into the night across the country about how to make JDOM and other code samples work in
an intuitive way.
Most importantly, I owe everything in these pages to my wife, Leigh. Miraculously, she has
managed to not kick me out of the house over the last six months, as I have been tired, inaccessible,
and extremely busy almost constantly. The few moments I had with her away from writing and my
full-time consulting job have been what made everything worthwhile. I have missed her terribly,
and am anxious to return to spending time with her, my three basset hounds (Charlie, Molly, and
Daisy), and my labs (Seth and Moses).
And to my grandfather, Robert Earl Burden, who didn't get to see this, you are everything that I
have ever wanted to be; thanks for teaching me that other people's expectations were always lower
than I should be satisfied with.
Chapter 1. Introduction
XML. These three letters have brought shivers to almost every developer in the world today at some
point in the last two years. While those shivers were often fear at another acronym to memorize,
excitement at the promise of a new technology, or annoyance at another source of confusion for
today's developer, they were shivers all the same. Surprisingly, almost every type of response was
well merited with regard to XML. It is another acronym to memorize, and in fact brings with it a
dizzying array of companions: XSL, XSLT, PI, DTD, XHTML, and more. It also brings with it a
huge promise: what Java did for portability of code, XML claims to do for portability of data. Sun
has even been touting the rather ambitious slogan "Java + XML = Portable Code + Portable Data"
in recent months. And yes, XML does bring with it a significant amount of confusion. We will seek
to unravel and demystify XML, without being so abstract and general as to be useless, and without
diving in so deeply that this becomes just another dull specification to wade through. This is a
language parser. For example, HTML has a strict set of tags that are allowed. You may use the tag
<TABLE> but not the tag <CHAIR>. While the first tag has a specific meaning to an application using
the data, and is used to signify the start of a table in HTML, the second tag has no specific meaning,
and although most browsers will ignore it, unexpected things can happen when it appears. That is
because when HTML was defined, the tag set of the language was defined with it. With each new
version of HTML, new tags are defined. However, if a tag is not defined, it may not be used as part
of the markup language without generating an error when the document is parsed. The grammar of
a markup language defines the correct use of the language's tags. Again, let's use HTML as an
example. When using the <TABLE> tag, several attributes may be included, such as the width, the
background color, and the alignment. However, you cannot define the TYPE of the table because the
grammar of HTML does not allow it.
XML, by defining neither the tags nor the grammar, is completely extensible; thus its name. If you
choose to use the tag <TABLE> and then nest within that tag several <CHAIR> tags, you may do so. If
you wish to define a TYPE attribute for the <CHAIR> tag, you may do that also. You could even use
tags named after your children or co-workers if you so desired! To demonstrate, let's take a look at
<cushion> tags (although one could, just as the XHTML specification defines HTML tags in XML);
they are completely concocted. This is the power of XML: it allows you to define the content of
your data in a variety of ways as long as you conform to the general structure that XML requires.
Later we will go into detail on some additional constraints, but for now it is sufficient to realize
that XML is built to allow flexibility of data formatting.
Although this flexibility is one of XML's strongest points, it also creates one of its greatest
weaknesses: because XML documents can be processed in so many different ways and for so many
different purposes, there are a large number of XML-related standards to handle translation and
specification of data. These additional acronyms, and their constant pairing with XML itself, often
confuse what XML is and what it is not. More often than not, when you hear "XML," the speaker is
not referring specifically to the Extensible Markup Language, but to all or part of the suite of XML
tools. Although sometimes these will be referred to separately, be aware that "XML" does not just
mean XML; more often it means "XML and all the great ways there are to manipulate and use it."
With those preliminaries out of the way, we are ready to define some of the most common XML
acronyms and give short descriptions of each. These will be fundamental to everything else in the
book, so keep this chapter marked for reference. These descriptions should start to help you
understand how the XML suite of tools fits together, what XML is, and what it isn't. Discussion of
publishing engines, applications, and tools for XML is avoided; these are discussed later when we
talk about specific XML topics. Rather, this section only refers to specifications and
recommendations in various stages of consideration. Most of these are initiatives of the W3C, the
World Wide Web Consortium. This group defines standards for the XML community that help
provide a common base of knowledge for this technology, much as Sun provides standards for Java
and related APIs. For more on the W3C, visit on the Web.
1.1.1 XML
XML, of course, is the root of all these three- and four-letter acronyms. It defines the core language
1.1.1.1 PI
A PI in an XML document is a processing instruction. A processing instruction tells an application
to perform some specific task. While PIs are a small portion of the XML specification, they are
important enough to warrant a section in our discussion of XML acronyms. A PI is distinguished
from other XML data because it represents a command to either the XML parser or a program that
would use the XML document. For example, in our sample XML document in Example 1.1, the
first line, which indicates the version of XML, has the form of a processing instruction (strictly
speaking, the XML 1.0 specification calls this line the XML declaration, but it shares the PI syntax).
It indicates to the parser what version of XML is being used. Processing instructions are of the form
<?target instructions?>. Any PI whose target begins with xml belongs to the set reserved by the
XML standard, often called XML instructions, that parsers should recognize, but PIs can also
specify information to be used by applications that may be wrapping the parsing behavior; in this
case, the wrapping application might have a keyword (such as "cocoon") that could be used as the
PI's target.
Processing instructions become extremely important when XML data is used in XML-aware
applications. As a more salient example, consider an application that might process our sample
XML file and then create advertisements for a furniture store based on what stock is available and
listed in the XML document. A processing instruction could tell the application that some
furniture is on a "want" list: it must be routed to another application, such as one that sends
requests for more inventory, and should not be included in the advertisement. Other
application-specific instructions could be passed the same way. An XML parser will see PIs with
external targets and pass them on unchanged to the external application.
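As a rough illustration of a parser handing PIs on to application code, the following sketch uses the SAX callback for processing instructions. It assumes a JAXP-capable parser is available on the classpath (one is bundled with current JDKs); the wantlist target, its data, and the class name are hypothetical and do not come from the examples in this book.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ProcessingInstructionSketch {

    /** Parses the XML text and returns each processing instruction as "target -> data". */
    static List<String> collectInstructions(String xml) {
        final List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void processingInstruction(String target, String data) {
                // The parser does not interpret the PI; it simply reports it here.
                found.add(target + " -> " + data);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return found;
    }

    public static void main(String[] args) {
        // "wantlist" is a made-up target that an advertising application might define.
        String xml = "<?xml version=\"1.0\"?>"
                + "<?wantlist route=\"inventory\"?>"
                + "<furniture><chair/></furniture>";
        for (String pi : collectInstructions(xml)) {
            System.out.println(pi);
        }
    }
}
```

Note that the handler is called for the application-specific PI only; the parser consumes the first line of the document itself rather than reporting it through this callback.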
1.1.1.2 DTD
A DTD is a document type definition. A DTD establishes a set of constraints for an XML document
(or a set of documents). DTD is not a specification on its own, but is defined as part of the XML
specification. Within an XML document, a document type declaration can both include markup
constraints and refer to an external document with markup constraints. The sum of these two sets of
example XML file to know how to process and search within the received file. The DTD is what
adds portability to an XML document's extensibility, resulting not only in flexible data, but data that
can be processed and validated by any machine that can locate the document's DTD.
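To give a feel for what "validated by any machine that can locate the DTD" means in code, here is a minimal sketch that turns on validation in a JAXP SAX parser and collects the validity errors it reports. The DTD is placed inside the document so the example is self-contained; the chair and cushion constraints and the class name are assumptions made for illustration, not the DTD developed later in this book.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationSketch {

    // Hypothetical constraints: a chair must contain exactly one cushion.
    static final String DTD =
            "<!DOCTYPE chair [\n"
          + "  <!ELEMENT chair (cushion)>\n"
          + "  <!ELEMENT cushion (#PCDATA)>\n"
          + "]>\n";

    /** Returns the validity errors reported while parsing; an empty list means valid. */
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity violations are recoverable errors, so parsing continues.
                problems.add(e.getMessage());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // ask the parser to check the DTD constraints
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(DTD + "<chair><cushion>soft</cushion></chair>"));
        System.out.println(validate(DTD + "<chair><leg/></chair>"));
    }
}
```

The first document satisfies the constraints, while the second uses an undeclared element and so produces at least one error message whose exact wording depends on the parser.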
1.1.2 Namespaces
Namespaces is one of the few XML-related concepts that has not been converted into an acronym.
It even has a name that describes its purpose! A namespace is a mapping between an element prefix
and a URI. This mapping is used for handling namespace collisions and defining data structures that
allow parsers to handle collisions. As an example of a possible namespace collision, consider an
XML document that might include a <price> tag for a chair, between a <chair> and </chair> tag.
However, we also include in the chair definition a <cushion> tag, which might also have a <price>
tag. Also consider that the document may reference another XML document for copyright
information. Both documents could reasonably have <date> or possibly <company> tags.
Conflicting tags such as these result in ambiguity as to which tag means what. This ambiguity
creates significant problems for an XML parser. Should the
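As an illustration of how a prefix-to-URI mapping keeps two price tags apart, the sketch below parses a small document with a namespace-aware SAX parser and records each element as {URI}local-name. The URIs, prefixes, and class name are hypothetical, and the code assumes a JAXP parser such as the one bundled with current JDKs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceSketch {

    /** Returns every element in the document written as "{namespace URI}localName". */
    static List<String> qualifiedNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The URI, not the prefix, is what distinguishes the two prices.
                names.add("{" + uri + "}" + localName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // report URIs and local names
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up URIs: one namespace for the chair's price, one for the cushion's.
        String xml = "<chair xmlns:c=\"http://example.com/chair\""
                + " xmlns:u=\"http://example.com/cushion\">"
                + "<c:price>100</c:price>"
                + "<cushion><u:price>20</u:price></cushion>"
                + "</chair>";
        for (String name : qualifiedNames(xml)) {
            System.out.println(name);
        }
    }
}
```

Both elements share the local name price, yet the application can tell them apart because each carries a different namespace URI.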
the data because of a different representation, XSL provides a complete separation of data, or
content, and presentation. If an XML document needs to be mapped to another representation, then
XSL is an excellent solution. It provides a method comparable to writing a Java program to
translate data into a PDF or HTML document, but supplies a standard interface to accomplish the
task.
To perform the translation, an XSL document can contain formatting objects. These formatting
objects are specific named tags that can be replaced with appropriate content for the target
document type. A common formatting object might define a tag that some processor uses in the
transformation of an XML document into PDF; in this case, the tag would be replaced by PDF-
specific information. Formatting objects are specific XSL instructions, and although we will lightly
discuss them, they are largely beyond the scope of this book. Instead, we will focus more on XSLT,
a completely text-based transformation process. Through the process of XSLT (XSL
Transformations), an XSL textual stylesheet and a textual XML document are
"merged" together, and what results is the XML data formatted according to the XSL stylesheet. To
help clarify this difficult concept further, let's look at another sample XML file, shown in Example
1.2.
Example 1.2. Another Sample XML File
<?xml version="1.0"?>
<?xml-stylesheet href="hello.xsl" type="text/xsl"?>
<!-- Here is a sample XML file -->
<page>
  <title>Test Page</title>
  <content>
    <paragraph>What you see is what you get!</paragraph>
  </content>
</xsl:template>
</xsl:stylesheet>
This stylesheet is designed to convert our basic XML document and its data into HTML suitable for
a web browser. While most of these details are things we will discuss later, concentrate on the
<xsl:template match="[element name]"> tags. Any time this type of tag occurs, the element at
the matching tag, for example, paragraph, is replaced by the contents of the XSL stylesheet, which
in this case results in a <p> tag with italicized font encoding. What results from the transformation
of the XML document by the XSL stylesheet is shown in Example 1.4.
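For readers who would like to try this kind of merge from Java, here is a small sketch. It relies on the javax.xml.transform API found in current JDKs, which is an assumption on our part and not the processor setup used in this book; the stylesheet in the code is a simplified, hypothetical stand-in and not a copy of Example 1.3.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {

    static final String XML =
            "<page><title>Test Page</title><content>"
          + "<paragraph>What you see is what you get!</paragraph>"
          + "</content></page>";

    // Hypothetical stylesheet: one template for the page, one for each paragraph.
    static final String XSL =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
          + "<xsl:template match='page'>"
          + "<html><head><title><xsl:value-of select='title'/></title></head>"
          + "<body><xsl:apply-templates select='content'/></body></html>"
          + "</xsl:template>"
          + "<xsl:template match='paragraph'>"
          + "<p><i><xsl:apply-templates/></i></p>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the stylesheet text to the XML text and returns the result as a string. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Each paragraph element in the input is replaced by the body of the template that matches it, which is the behavior the surrounding text describes.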
Example 1.4. HTML Result from Example 1.2 and Example 1.3
<html>
<head>
<title>
Test Page
</title>
</head>
<body bgcolor="#ffffff">
<p align="center">
</JavaXML:Content>
<JavaXML:Copyright>&OReillyCopyright;</JavaXML:Copyright>
</JavaXML:Book>
evaluating the expression when the current node is the JavaXML:Book element would yield the
JavaXML:Content and JavaXML:Copyright elements. The complete XPath specification is online
at
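To make the idea of evaluating an expression relative to a current node concrete, the sketch below uses the javax.xml.xpath API that ships with current JDKs (an assumption; it is not one of the APIs covered in this book). The expression, the namespace URI, and the shortened document are hypothetical stand-ins chosen so that the result matches the description above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {

    static final String NS = "http://example.com/javaxml"; // hypothetical namespace URI

    /** Evaluates the expression with the root element as the current node. */
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Tell the XPath engine which URI the JavaXML prefix stands for.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "JavaXML".equals(prefix) ? NS : "";
                }
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "JavaXML" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList("JavaXML").iterator()
                            : Collections.<String>emptyList().iterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, doc.getDocumentElement(), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<JavaXML:Book xmlns:JavaXML=\"" + NS + "\">"
                + "<JavaXML:Title>Java and XML</JavaXML:Title>"
                + "<JavaXML:Content/>"
                + "<JavaXML:Copyright>O'Reilly</JavaXML:Copyright>"
                + "</JavaXML:Book>";
        // Select every child of the current node except the title.
        System.out.println(select(xml, "*[not(self::JavaXML:Title)]"));
    }
}
```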
1.1.5 XML Schema
XML Schema is designed to replace and amplify DTDs. XML Schema offers an XML-centric
means to constrain XML documents. Though we have only looked briefly at DTDs so far, they have
some rather critical limitations: they have no knowledge of hierarchy, they have difficulty handling
namespace conflicts, and they have no means of specifying allowed relationships between XML
documents. This is understandable, as the members of the working group who wrote the
specification certainly had no idea that XML would be used in so many different ways! However,
the limitations of DTDs have become constricting to XML authors and developers.
The most significant fact about XML Schema is that it brings DTDs back into line with XML itself.
That may sound confusing; consider, though, that every acronym we have talked about uses XML
documents to define its purpose. XSL stylesheets, namespaces, and the rest all use XML to define
specific uses and properties of XML. But a DTD is entirely different. A DTD does not look like
XML, it does not share XML's hierarchical structure, and it does not even represent data in the same
way. This makes the DTD a bit of an oddball in the XML world, and because DTDs currently
define how XML documents must be constructed, this has been causing some confusion. XML
<author name="William Crawford" location="Massachusetts" />
</book>
</xql:result>
There will most likely be quite a bit of change as the specification matures and is hopefully adopted
by the W3C, but XQL is a technology worth keeping an eye on. The current proposal for XQL is at
This proposal made its way to the W3C in January of
2000, and current requirements for the XML Query language can be found at
1.1.7 And All the Rest . . .
You have now been sped through a very brief introduction of some of the major XML-related
specifications we will cover. You can probably think of one or two acronyms we didn't cover, if not
more. We have selected only the particular acronyms that are especially relevant to our discussions
on handling XML within Java. There are quite a few more, and they are listed here with the URLs
for the appropriate recommendations or working drafts:
• Resource Description Framework (RDF):
• XML Link Language (XLL)
• XLink:
• XPointer:
• XHTML:
This list will probably be outdated by the time you read this chapter, as more XML-based ideas are
being examined and proposed every day. That these are not given significant time or space
in this book should not make you think they are somehow less important; they are just not as
of errors and warnings is defined, allowing handling of the various situations that can occur in XML
parsing, such as an invalid document, or one that is not well-formed. Behavior can be added to
customize the parsing process, resulting in very application-specific tasks being available for
definition, all with a standard interface into XML documents. For the SAX API documentation and
other information on SAX, visit
Before continuing, it is important to clear up a common misconception about SAX. SAX is often
mistaken for an XML parser. We even discuss SAX here as providing a means to parse XML data.
However, SAX provides a framework for parsers to use, and defines events within the parsing
process to monitor. A parser must be supplied to SAX to perform any XML parsing. This has
resulted in many excellent parsers being made available in Java, such as Sun's Project X, the
Apache Software Foundation's Xerces, Oracle's XML Parser, and IBM's XML4J. These can all be
plugged into the SAX APIs and result in parsed XML data. SAX APIs provide the means to parse a
document, not the XML parser itself.
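The split between the SAX interfaces and the parser behind them can be sketched as follows. The handler is written purely against the org.xml.sax interfaces, while the parser implementation is obtained separately; here it is looked up through JAXP, which is an assumption made for convenience and could be swapped for any of the parsers named above. The class name and sample document are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {

    /** Returns the start and end events that the parser reports for the document. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        // The handler depends only on SAX, never on a particular parser.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("start " + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        try {
            // The parser implementation is supplied separately from the handler.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<chair><cushion/></chair>")) {
            System.out.println(event);
        }
    }
}
```

Because the handler never names a concrete parser class, replacing the parser means changing only the line that obtains the XMLReader.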
1.2.2 DOM
DOM is an API for the Document Object Model. While SAX only provides access to the data
within an XML document, DOM is designed to provide a means of manipulating that data. DOM
provides a representation of an XML document as a tree. Because a tree is an age-old data
representation, traversal and manipulation of tree structures are easy to accomplish in programming
languages, Java being no exception. DOM also reads an entire XML document into memory,
storing all the data in nodes, so the entire document is very fast to access; it is all in memory for the
length of its existence in the DOM tree. Each node represents a piece of the data pulled from the
original document.
There is a significant drawback to DOM, however. Because DOM reads an entire document into
memory, resources can become very heavily taxed, often slowing down or even crippling an
application. The larger and more complex the document, the more pronounced this performance
degradation becomes. Keep in mind that while DOM is a good, prevalent means of manipulating
XML data, it is not the only means of accomplishing this task. We will spend time using DOM, and
we will also write code that manipulates data straight from SAX. Your application requirements
will most likely define which solution is correct for your specific development project. To read the
DOM recommendations at W3C, go to in your web browser.
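As a small sketch of the tree model described above, the following code reads a document into memory, walks it, and then modifies it. It uses the org.w3c.dom interfaces with a parser obtained through JAXP (assumed to be available, as it is in current JDKs); the furniture document and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomSketch {

    /** Reads the whole document into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    /** Counts the element nodes at or below the given node by walking the tree. */
    static int countElements(Node node) {
        int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    /** Adds a new, empty child element to the document's root element. */
    static Element addChild(Document doc, String name) {
        Element child = doc.createElement(name);
        doc.getDocumentElement().appendChild(child);
        return child;
    }

    public static void main(String[] args) {
        Document doc = parse("<furniture><chair/></furniture>");
        System.out.println("Elements before: " + countElements(doc));
        addChild(doc, "table"); // manipulate the tree in memory
        System.out.println("Elements after: " + countElements(doc));
    }
}
```

Every node stays in memory for as long as the Document is referenced, which is what makes both the quick access and the memory cost discussed above possible.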
applications, and give you a reason to use XML in your applications today. We will first look at
how XML is being used today in applications, and we'll give you the information to convince that
boss of yours that "everybody's doing it." Next we will take a look at support for XML and related
technologies, all in light of Java applications. In Java, there is a wealth of available parsers,
transformers, publishing engines, and frameworks designed specifically for XML. Finally, we will
spend some time looking at where XML is going and try to anticipate how it will affect applications
six months and a year from now. This is the information to use to convince your boss's boss that
XML can not only keep you even with your competitors, but give your company the leading edge in
your industry, and help get you that next promotion!
1.3.1 Java and XML: A Perfect Match
Even if you have been convinced that XML is a great technology, and that it is taking the world by
storm, we have yet to mention why this book is about Java and XML, rather than just XML alone.
Java is, in fact, the ideal counterpart for XML, and the reason can be summed up in a single phrase:
Java is portable code, and XML is portable data. Taken separately, both technologies are wonderful,
but have limitations. Java requires the developer to dream up formats for network data and formats
for presentation, and to use technologies like JavaServer Pages™ (JSP) that do not provide a real
separation of content and presentation layers. XML is simply metadata, and without programs like
parsers and XSL processors, is essentially "vapor-ware." However, Java and XML matched
together fill in the gaps in the application development picture.
Writing Java code assures that any operating system and hardware with a Java™ Virtual Machine
(JVM) can run your compiled bytecode. Add to this the ability to represent input and output to your
applications with a system-independent, standards-based data layer, and your data is now portable.
Your application is completely portable, and can communicate with any other application using the
same (widely accepted) standards. If this isn't enough, we've already mentioned that Java provides
the most robust set of APIs, parsers, processors, publishing frameworks, and tools for XML use of
any programming language. With this synergy in mind, let's look at how these two technologies fit
together, both today and tomorrow.
Although an application may not need to support a wireless phone, there are certainly advantages to
offering that service to employees or customers who have the equipment; and while a handheld
organizer may not allow a user to perform all the operations that a web browser might, frequent
travelers who could manage their accounts online would certainly be more likely to keep using
a service that a company provides. The shift from offering rich functionality to a few specific
types of clients to offering a standard set of functionality to an enormous variety of client
types has left many companies and application developers scratching their heads. XML can resolve
this confusion.
Although we said earlier that XML is not a presentation technology, it can be used to generate a
presentation layer. If there doesn't seem to be much of a difference between the two, consider this:
HTML is a presentation technology. It is a markup language designed specifically to allow
graphical views of content for web browser clients. However, HTML is not by any means a good
data representation. An HTML document is not easy to parse, search, or manipulate. It follows only
a loose format, and is at least one-half presentation information, if not more, while only a small
percentage of the document is actual data. XML is substantially different, as it is a data-driven
markup language. Nearly all of an XML document is data and data structure. Only instructions to an
XML parser or wrapping application are not data-centric. XML is easily searchable and can be
manipulated with APIs and tools due to the strict structure a DTD or schema can impose. This
makes it very non-presentation-oriented. However, it can be used for presentation with its
companion technologies, XSL and XSLT. XSL allows definition of presentation and formatting
constructs and instructions on how to apply these constructs to the data within an XML document.
And through XSLT, the original XML can be displayed to a client in a variety of ways, including
very complex HTML. Still, the core XML document remains separate from any presentation-
specific information and can just as easily be transformed into an entirely different style of
presentation, such as a Swing user interface, with no change to the underlying content.
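As a rough illustration of this separation, the following sketch applies an XSL stylesheet to a
small XML document and produces HTML, leaving the XML itself untouched. It assumes the JAXP
transformation classes that ship with a current JDK (javax.xml.transform); the class name, the
account document, and the stylesheet are all hypothetical, made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToHtml {
    // Hypothetical content: pure data, no presentation information.
    static final String XML =
        "<account><owner>Pat</owner><balance>100.00</balance></account>";

    // Hypothetical stylesheet: all of the presentation lives here.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/account'>"
      + "<html><body>"
      + "<h1><xsl:value-of select='owner'/></h1>"
      + "<p>Balance: <xsl:value-of select='balance'/></p>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given document and returns the result as text.
    static String transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSL));
    }
}
```

Supporting a different kind of client would mean passing a different stylesheet to transform();
the account document would not change.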
Perhaps the most powerful component offered by XML and XSL for presentation is the ability to
specify multiple stylesheets to an XML document, or to impose XSL stylesheets on an XML
need data formatted in a manner somewhat specific to the purpose the data is being used for.
This exercise should convince you that data is almost always transformed, often multiple times.
Consider an XML document that is converted to a format usable for another application by an XSL
stylesheet (see Figure 1.2). The result remains XML. That application may then use the data to gain
a new result set, and create a new XML document. The original application then needs this
information, so the new XML document is transformed back into the format used by the original
application, although it now contains different data! This scenario is a very common one.
Figure 1.2. XML/XSL transformations between applications
This repeated process of transforming a document, and always generating a new XML result, is
what makes XML such a powerful tool for communication. The same set of rules can be used at
every step, always starting with XML, applying one or more XSL stylesheets over one or more
transformations, and resulting in XML that is still usable with the same tools that initially created
the original document.
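The round trip described above can be sketched in a few lines. This is only an illustration under
stated assumptions: it uses the JAXP classes bundled with a current JDK, and the order and
warehouse formats, the two stylesheets, and the class and method names are all hypothetical. In a
real exchange the second application would also change the data between the two transformations;
that step is left out here to keep the sketch short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlPipeline {
    static final String XSL_OPEN =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>";
    static final String XSL_CLOSE = "</xsl:stylesheet>";

    // Hypothetical document in the first application's format.
    static final String ORDER =
        "<order><item>widget</item><quantity>3</quantity></order>";

    // Stylesheet 1: order format to a (hypothetical) warehouse format.
    static final String TO_WAREHOUSE = XSL_OPEN
      + "<xsl:template match='/order'>"
      + "<pick sku='{item}' count='{quantity}'/>"
      + "</xsl:template>" + XSL_CLOSE;

    // Stylesheet 2: warehouse format back to the order format.
    static final String TO_ORDER = XSL_OPEN
      + "<xsl:template match='/pick'>"
      + "<order><item><xsl:value-of select='@sku'/></item>"
      + "<quantity><xsl:value-of select='@count'/></quantity></order>"
      + "</xsl:template>" + XSL_CLOSE;

    // Applies one stylesheet to one document; the result is XML text again.
    static String transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    // Parses XML text with the same DOM parser at every step of the chain.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String warehouse = transform(ORDER, TO_WAREHOUSE);
        String order = transform(warehouse, TO_ORDER);
        // Every intermediate result is still XML, so the same parser reads each one.
        System.out.println(parse(warehouse).getDocumentElement().getNodeName());
        System.out.println(parse(order).getDocumentElement().getNodeName());
    }
}
```

The point of the sketch is that transform() and parse() never change: each stage takes XML in and
hands XML out, so stages can be chained in any order the applications require.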
Also consider that XML is a purely textual representation of data. Because text is such a lightweight
and easily serialized data representation, XML provides a fast means of transmitting data across a
network. Although some binary data formats can be transmitted very efficiently, textual network
transmissions will typically average out as a faster means of communication.
1.3.2.3 XML-RPC
One specification that uses XML for communication is XML-RPC. XML-RPC addresses
communication not between applications, but between components within an
application, or to a shared set of services functioning across applications. RPC stands for Remote
Procedure Call, one of the primary predecessors of Remote Method Invocation (RMI). RPC is used
for making procedural calls over a network, and receiving a response, also over the network. Note
that this is significantly different from RMI, which actually allows a client to invoke methods on an
object via stubs and skeletons loaded over the network. The primary difference is that RPC calls
generate a remote response, and the response is returned over the network; the client never interacts