XML Features in ADO.NET
Dino Esposito
Wintellect
December 13, 2001
XML and some of its related technologies, including XPath, XSL Transformation, and XML Schema, are
unquestionably at the foundation of ADO.NET. As a matter of fact, XML is the key element behind the greatly
improved interoperability of the ADO.NET object model when compared to ADO. In ADO, XML was merely a
(non-default) I/O format used to persist the content of a disconnected recordset. The participation of XML in the
building and in the inner workings of ADO.NET is much deeper. The aspects of ADO.NET where the interaction and
integration with XML are strongest can be summarized in the following points:
• Object serialization and remoting
• A dual programming interface
• XML-driven batch update (for SQL Server 2000 only)
In ADO.NET, you have several options to save objects to, and restore objects from, XML documents. Strictly
speaking, this ability belongs to one object only, the DataSet, but it can be extended to other container objects
with minimal coding. Saving objects like DataTable and DataView to XML is essentially a special case of DataSet
serialization.
Furthermore, ADO.NET and XML classes provide a sort of unified intermediate API that is made available to
programmers through a dual, synchronized programming interface. You can access and update data using either
the hierarchical, node-based approach of XML, or the relational approach of column-based tabular data sets. At
any time, you can switch from a DataSet representation of the data to XMLDOM, and vice versa. Data is
synchronized, and any change you enter in either model is immediately reflected and visible in the other. In this
article, I'll cover ADO.NET-to-XML serialization and XML data access, that is, the first two points in the list above.
Next month, I'll tackle XML-driven batch update, one of the coolest features you get from SQL Server 2000 XML
Extensions (SQLXML 2.0).
DataSet and XML
Just like any other .NET object, the DataSet object is stored in memory in a binary format. Unlike other objects,
though, the DataSet is always remoted and serialized in a special XML format called the DiffGram. When the
entities. You can take the XML schema out of a DataSet and use it as a string. Alternatively, you could write it to a
disk file or load it into an empty DataSet object. Alongside the methods listed in the table above, the
DataSet object also features two XML-related properties: Namespace and Prefix. Namespace determines the
XML namespace used to scope XML attributes and elements when you read them into a DataSet. The prefix that
aliases the namespace is stored in the Prefix property.
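As a minimal sketch of the two properties at work, the following code scopes a DataSet to a namespace and then extracts the schema and the data as strings through GetXmlSchema and GetXml. The namespace URI, the prefix, and the NamespaceDemo class are hypothetical names chosen for illustration only.

```csharp
using System;
using System.Data;

public static class NamespaceDemo
{
    // Builds a one-table DataSet scoped to an illustrative namespace.
    public static DataSet BuildDataSet()
    {
        DataSet ds = new DataSet("MyDataSet");
        ds.Namespace = "urn:example:customers"; // hypothetical URI
        ds.Prefix = "ex";                       // alias for the namespace

        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("CustomerID", typeof(int));
        customers.Columns.Add("FName", typeof(string));
        customers.Rows.Add(1, "John");
        return ds;
    }

    // Returns the schema (index 0) and the data (index 1) as XML strings.
    public static string[] SchemaAndData()
    {
        DataSet ds = BuildDataSet();
        return new string[] { ds.GetXmlSchema(), ds.GetXml() };
    }
}
```

Calling NamespaceDemo.SchemaAndData() should yield a schema string and a data string that both mention the namespace URI assigned above.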
Building a DataSet from XML
The ReadXml method populates a DataSet object by reading from a variety of sources, including disk files, .NET
streams, or instances of XmlReader objects. The method can process any type of XML file but, of course, XML files
with a non-tabular, irregularly shaped structure may create some problems when rendered in terms of
rows and columns.
The ReadXml method has several overloads, all of which are rather similar. They take the XML source plus an
optional XmlReadMode value as arguments. For example:
public XmlReadMode ReadXml(String, XmlReadMode);
The method creates the relational schema for the DataSet depending on the read mode specified, and whether or
not a schema already exists in the DataSet. The following code snippet illustrates the typical code you would use to
load a DataSet from XML.
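// Requires the System.Data and System.IO namespaces
DataSet ds = new DataSet();
using (StreamReader sr = new StreamReader(fileName))
{
    ds.ReadXml(sr); // defaults to XmlReadMode.Auto
} // the reader is closed here, even if ReadXml throws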
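For a self-contained variant that does not depend on a disk file, the sketch below reads the XML from a string and lets ReadXml infer the relational schema. The sample document and the ReadXmlDemo class are illustrative assumptions, not part of the original listing.

```csharp
using System;
using System.Data;
using System.IO;

public static class ReadXmlDemo
{
    // Illustrative document: two <Customers> elements under one root.
    public const string SampleXml =
        "<MyDataSet>" +
        "<Customers><CustomerID>1</CustomerID><FName>John</FName></Customers>" +
        "<Customers><CustomerID>2</CustomerID><FName>Joe</FName></Customers>" +
        "</MyDataSet>";

    public static DataSet Load(string xml)
    {
        DataSet ds = new DataSet();
        using (StringReader reader = new StringReader(xml))
        {
            // Neither the DataSet nor the XML carries a schema here,
            // so the default read mode ends up inferring one.
            ds.ReadXml(reader);
        }
        return ds;
    }
}
```

After ReadXmlDemo.Load(ReadXmlDemo.SampleXml), the DataSet should contain a single Customers table with two rows, and the inferred columns are typed as strings.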
When loading the contents of XML sources into a DataSet, ReadXml does not merge rows whose primary key
information matches. To merge an existing DataSet with one loaded from XML, you first have to load the XML into
a new DataSet, and then merge the two using the Merge method. During the merge, the rows that get overwritten
are those with matching primary keys. An alternative way to merge existing DataSet objects with contents read
from XML is through the DiffGram format (more on this later).
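The merge procedure just described can be sketched as follows. The code assumes that the incoming XML matches the schema of the existing DataSet, so it clones that schema (primary key included) before loading the data. MergeDemo and its method names are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;

public static class MergeDemo
{
    // An existing DataSet with a primary key on CustomerID.
    public static DataSet BuildExisting()
    {
        DataSet ds = new DataSet("MyDataSet");
        DataTable t = ds.Tables.Add("Customers");
        DataColumn id = t.Columns.Add("CustomerID", typeof(int));
        t.Columns.Add("FName", typeof(string));
        t.PrimaryKey = new DataColumn[] { id };
        t.Rows.Add(1, "John");
        ds.AcceptChanges();
        return ds;
    }

    public static DataSet MergeFromXml(DataSet existing, string xml)
    {
        // Clone copies the schema (and the primary key) but no rows.
        DataSet incoming = existing.Clone();
        using (StringReader reader = new StringReader(xml))
        {
            // Load data only; the schema is already in place.
            incoming.ReadXml(reader, XmlReadMode.IgnoreSchema);
        }

        // Rows with a matching primary key are overwritten, others are added.
        existing.Merge(incoming);
        return existing;
    }
}
```

If the XML holds a customer with ID 1 and a new customer with ID 2, the merged DataSet should end up with two rows, the first carrying the values read from XML.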
The table below illustrates the various read modes that ReadXml supports. You can set them using the
XmlReadMode enumeration.
The default read mode, XmlReadMode.Auto, is not listed in the table. When this mode is set, or when no
read mode has been explicitly set, the ReadXml method examines the XML source and chooses the most
appropriate option.
been inferred. Existing schemas are extended by adding new tables, or by adding new columns to existing tables, as
appropriate. You can use the DataSet's InferXmlSchema method to load the schema from the specified XML file
into the DataSet. You can control, to some extent, the XML elements processed during the schema inference
operation. The signature of the method InferXmlSchema allows you to specify an array of namespaces whose
elements will be excluded from inference.
void InferXmlSchema(String fileName, String[] rgNamespace);
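A minimal sketch of schema inference with an excluded namespace follows. It uses the overload of InferXmlSchema that takes a TextReader rather than a file name, so that the example is self-contained. The InferSchemaDemo class and the namespace URI in the usage note are assumptions made for illustration.

```csharp
using System;
using System.Data;
using System.IO;

public static class InferSchemaDemo
{
    // Infers the relational schema only; no rows are loaded.
    public static DataSet InferSchemaOnly(string xml, string[] excludedNamespaces)
    {
        DataSet ds = new DataSet();
        using (StringReader reader = new StringReader(xml))
        {
            // Items that belong to the listed namespaces are skipped
            // while the schema is being inferred.
            ds.InferXmlSchema(reader, excludedNamespaces);
        }
        return ds;
    }
}
```

For instance, passing a document whose Customers element carries an attribute in the hypothetical urn:example:ignore namespace, together with an array containing that URI, should produce a Customers table that has a CustomerID column, no column for the excluded attribute, and no rows.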
A DiffGram is an XML format that ADO.NET uses to persist the state of a DataSet. Similar to SQLXML's
updategram format, the DiffGram contains both current and original versions of data rows. Loading a DiffGram using
ReadXml will merge rows that have matching primary keys. You explicitly instruct ReadXml to work on a
DiffGram using the XmlReadMode.DiffGram flag. When using the DiffGram format, the target DataSet must have
the same schema as the DiffGram, otherwise the merge operation fails and an exception is thrown.
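To illustrate how a DiffGram carries both row versions, the following sketch writes a DataSet containing a modified row to a DiffGram and reads it back into a DataSet with the same schema. The DiffGramRoundTrip class and the sample data are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;

public static class DiffGramRoundTrip
{
    // A DataSet whose only row has been modified after AcceptChanges.
    public static DataSet BuildModified()
    {
        DataSet ds = new DataSet("MyDataSet");
        DataTable t = ds.Tables.Add("Customers");
        DataColumn id = t.Columns.Add("CustomerID", typeof(int));
        t.Columns.Add("FName", typeof(string));
        t.PrimaryKey = new DataColumn[] { id };
        t.Rows.Add(1, "John");
        ds.AcceptChanges();
        t.Rows[0]["FName"] = "Johnny"; // the row is now Modified
        return ds;
    }

    public static DataSet RoundTrip(DataSet source)
    {
        StringWriter sw = new StringWriter();
        source.WriteXml(sw, XmlWriteMode.DiffGram);

        // The target must already have the same schema as the DiffGram.
        DataSet target = source.Clone();
        using (StringReader sr = new StringReader(sw.ToString()))
        {
            target.ReadXml(sr, XmlReadMode.DiffGram);
        }
        return target;
    }
}
```

The row in the DataSet returned by RoundTrip should still be in the Modified state, with "Johnny" as its current value and "John" available through DataRowVersion.Original.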
When the XmlReadMode.Fragment option is set, the DataSet is loaded from an XML fragment. An XML fragment
is a valid piece of XML that identifies elements, attributes, and documents. The XML fragment for an element is the
markup text that fully qualifies the XML element (node, CDATA, processing instruction, comment). The fragment for
an attribute is the attribute value, and for a document is the entire content set. When the XML data is a fragment,
the root level rules for well-formed XML documents are not applied. Fragments that match the existing schema are
appended to the appropriate tables and fragments that do not match the schema are discarded. ReadXml reads
from the current position to the end of the stream. The XmlReadMode.Fragment option should not be used to
populate an empty, and consequently schema-less, DataSet.
Serializing DataSet Objects to XML
The XML representation of the DataSet can be written to a file, a stream, an XmlWriter object, or a string, using
the WriteXml method. The XML representation can include, or not include, schema information. The actual behavior
of the WriteXml method can be controlled through the optional XmlWriteMode parameter you can pass. The
values in the XmlWriteMode enum determine the output's layout. The DataSet representation includes tables,
relations, and constraints definitions. The rows in the DataSet's tables are written in their current versions unless
you choose to employ the DiffGram format. The table below summarizes the writing options available with
XmlWriteMode.
XmlWriteMode.IgnoreSchema is the default option. The following code shows the typical way to serialize a
DataSet to XML.
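A minimal sketch of this pattern, assuming a small sample DataSet and writing to a string instead of a disk file, might look like the following. WriteXmlDemo and its methods are hypothetical names.

```csharp
using System;
using System.Data;
using System.IO;

public static class WriteXmlDemo
{
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("MyDataSet");
        DataTable t = ds.Tables.Add("Customers");
        t.Columns.Add("CustomerID", typeof(int));
        t.Columns.Add("FName", typeof(string));
        t.Rows.Add(1, "John");
        return ds;
    }

    // Serializes the DataSet to an XML string using the given write mode.
    public static string Serialize(DataSet ds, XmlWriteMode mode)
    {
        using (StringWriter sw = new StringWriter())
        {
            ds.WriteXml(sw, mode);
            return sw.ToString();
        }
    }
}
```

With XmlWriteMode.IgnoreSchema the output should contain only the data rows, while XmlWriteMode.WriteSchema adds an inline <xs:schema> block ahead of them.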
<CustomerID>2</CustomerID>
<FName>Joe</FName>
<LName>Users</LName>
</Customers>
<Orders>
<CustomerID>1</CustomerID>
<OrderID>000A01</OrderID>
</Orders>
<Orders>
<CustomerID>1</CustomerID>
<OrderID>000B01</OrderID>
</Orders>
</MyDataSet>
From the listing above, you can hardly tell that the two tables are related. Some information about this is set in
the <xs:schema> tree, but aside from this, nothing else hints at that conclusion. Put into words, a relation set on
the CustomerID field means: all the orders issued by a given customer. The XML tree above
does not provide an immediate representation of this information. To change the order of the nodes when a data
relation is present in the DataSet, you can set the Nested property of the DataRelation object to true. As a result
of this change, the XML code from above changes as follows:
<MyDataSet>
<xs:schema ... />
<Customers>
<CustomerID>1</CustomerID>
<FName>John</FName>
<LName>Smith</LName>
<Orders>
<CustomerID>1</CustomerID>
<OrderID>000A01</OrderID>
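The effect of the Nested property can be sketched in code as follows. The example assumes two tables shaped like the Customers and Orders tables in the listings; the relation name and the NestedRelationDemo class are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;

public static class NestedRelationDemo
{
    // Writes a Customers/Orders DataSet with or without nesting.
    public static string WriteXml(bool nested)
    {
        DataSet ds = new DataSet("MyDataSet");

        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("CustomerID", typeof(int));
        customers.Columns.Add("FName", typeof(string));

        DataTable orders = ds.Tables.Add("Orders");
        orders.Columns.Add("CustomerID", typeof(int));
        orders.Columns.Add("OrderID", typeof(string));

        customers.Rows.Add(1, "John");
        customers.Rows.Add(2, "Joe");
        orders.Rows.Add(1, "000A01");
        orders.Rows.Add(1, "000B01");

        // Relate the two tables on CustomerID.
        DataRelation rel = ds.Relations.Add("CustomerOrders",
            customers.Columns["CustomerID"], orders.Columns["CustomerID"]);
        rel.Nested = nested;

        StringWriter sw = new StringWriter();
        ds.WriteXml(sw);
        return sw.ToString();
    }
}
```

With nested set to true, each <Orders> element should be written inside the <Customers> element it belongs to. With false, all <Orders> elements appear as direct children of <MyDataSet>.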
DataSet. It is in no way a .NET type. The following code snippet shows how to serialize a DataSet object to a
DiffGram.
// Requires the System.Data and System.IO namespaces
using (StreamWriter sw = new StreamWriter(fileName))
{
    ds.WriteXml(sw, XmlWriteMode.DiffGram);
} // the writer is flushed and closed here
The resulting XML code is rooted in the <diffgr:diffgram> node, and contains up to three distinct sections of data,
as shown below:
<diffgr:diffgram>
<MyDataSet>
:
</MyDataSet>
<diffgr:before>
:
</diffgr:before>
<diffgr:errors>
:
</diffgr:errors>
</diffgr:diffgram>
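To see all three sections in practice, the sketch below builds a DataSet with one modified row and one row flagged with an error, and writes it out as a DiffGram. The sample data, the error text, and the DiffGramSectionsDemo class are assumptions made for illustration.

```csharp
using System;
using System.Data;
using System.IO;

public static class DiffGramSectionsDemo
{
    public static string WriteDiffGram()
    {
        DataSet ds = new DataSet("MyDataSet");
        DataTable t = ds.Tables.Add("Customers");
        t.Columns.Add("CustomerID", typeof(int));
        t.Columns.Add("FName", typeof(string));
        t.Rows.Add(1, "John");
        t.Rows.Add(2, "Joe");
        ds.AcceptChanges();

        // A modified row: its original values go to <diffgr:before>.
        t.Rows[0]["FName"] = "Johnny";

        // A row with an error: its message goes to <diffgr:errors>.
        t.Rows[1].RowError = "Sample error";

        StringWriter sw = new StringWriter();
        ds.WriteXml(sw, XmlWriteMode.DiffGram);
        return sw.ToString();
    }
}
```

The returned string should contain the current data under <MyDataSet>, the original "John" value under <diffgr:before>, and the error message under <diffgr:errors>.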
Mapping        Description
Element        Mapped to an XML node element: <CustomerID>value</CustomerID>
Attribute      Mapped to an XML node attribute: <Customers CustomerID="value">
Hidden         Not displayed in the XML data unless the DiffGram format is used
SimpleContent  Mapped to simple text: <Customers>value</Customers>
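The mapping names in the table correspond to values of the MappingType enumeration, which you assign through the ColumnMapping property of a DataColumn. The following is a minimal sketch under that assumption. The Notes column and the ColumnMappingDemo class are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;

public static class ColumnMappingDemo
{
    public static string WriteWithMappings()
    {
        DataSet ds = new DataSet("MyDataSet");
        DataTable t = ds.Tables.Add("Customers");
        DataColumn id = t.Columns.Add("CustomerID", typeof(int));
        DataColumn name = t.Columns.Add("FName", typeof(string));
        DataColumn notes = t.Columns.Add("Notes", typeof(string));

        id.ColumnMapping = MappingType.Attribute; // written as an attribute
        name.ColumnMapping = MappingType.Element; // the default mapping
        notes.ColumnMapping = MappingType.Hidden; // left out of plain XML output

        t.Rows.Add(1, "John", "internal only");

        StringWriter sw = new StringWriter();
        ds.WriteXml(sw); // default write mode, not a DiffGram
        return sw.ToString();
    }
}
```

In the resulting XML, CustomerID should show up as an attribute of <Customers>, FName as a child element, and the Notes value should not appear at all.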