Tài liệu Expert SQL Server 2008 Development- P5 - Pdf 92

CHAPTER 7  SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS
server. For a further discussion of the correct placement of data and application logic, refer back to
Chapter 1.
Examples of commonly cited situations in which SQLCLR is perhaps a better choice than TSQL
include manipulation of string or XML data, certain math functions that are provided by dedicated
methods in the .NET Base Class Library, and situations where procedural code is more efficient than set-
based logic. In order to test the validity of these claims, I decided to set up some simple test cases to
compare the relative performance of T-SQL against SQLCLR, which are described in the following
sections.
Creating a “Simple Sieve” for Prime Numbers
For this test, I created two simple procedures that return a list of all prime numbers up to a supplied
maximum value—one implemented in T-SQL, and one using SQLCLR. The logic of these tests was made
as simple as possible: each is supplied with a maximum value that is decremented in a loop, and in each
iteration of the loop, the modulo operator is used to determine the remainder when that value is divided
by every lesser number. If the remainder of the division is 0 (in other words, we have found a factor), we
know that the current value is not a prime number. The loop therefore moves on to test the next possible
value. If the inner loop tests every possible divisor and has not found any factors, then we know the
value must be a prime. Using this kind of “simple sieve” algorithm for finding prime numbers relies on
basic mathematical functions and procedural logic, which makes it a good test to compare the
performance of T-SQL and SQLCLR. Here’s the T-SQL implementation:
CREATE PROCEDURE ListPrimesTSQL (
@Limit int
)
AS BEGIN
DECLARE
-- @n is the number we're testing to see if it's a prime
@n int = @Limit,
--@m is all the possible numbers that could be a factor of @n
@m int = @Limit - 1;
-- Loop descending through the candidate primes
WHILE (@n > 1)

public static void ListPrimesCLR(SqlInt32 Limit)
{
int n = (int)Limit;
int m = (int)Limit - 1;

while(n > 1)
{
while(m > 0)
{
if(m == 1)
{
SqlContext.Pipe.Send(n.ToString() + " is a prime");
}

if(n%m != 0)
{
m = m - 1;
continue;
}
else
{
break;
}
}
n = n - 1;
m = n - 1;
}
}
 Note Clearly, if you actually wanted to get a list of the prime numbers, you would NOT use such a naive
approach as this. The example used here is intended to provide a simple procedure that can be implemented

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 7  SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS
Unfortunately, the process required to satisfy this query is not very efficient. Assuming that an index
exists on the x column of the table, the preceding query generates the execution plan shown in Figure
7-2. Figure 7-2. A nested index seek used to create a running sum in T-SQL
To sum all of the previous values in the column requires a nested loop containing an index seek. The
number of rows returned by this seek increases exponentially as more rows are processed. On the first
row, this seek must only sum one value, but to find the running sum over a set of 100 rows, 5,050 total
rows need to be read. For a set of 200 rows, the query processor needs to process 20,100 total rows—four
times the amount of work required to satisfy the previous query. Thus, the performance of this approach
to calculate running aggregates degrades rapidly as more rows are added to the table.
An alternative solution, which can yield some significant performance benefits, is to make use of a
cursor. There is a commonly held perception in the SQL Server development world that cursors are a
bad thing, but they do have valid use cases, and this might be one of them. However, there are a number
of good reasons why many developers are reluctant to use cursors, and I’m certainly not advocating their
use in general.
A better approach would be to use SQLCLR to loop through and store the running values using local
variables, and then stream the results one row at a time via the SqlPipe. An example of such a solution is
given in the following code listing:
[Microsoft.SqlServer.Server.SqlProcedure]
public static void RunningSum()
{
using (SqlConnection conn = new SqlConnection("context connection=true;"))
{
SqlCommand comm = new SqlCommand();
comm.Connection = conn;
comm.CommandText = "SELECT x FROM T ORDER BY x";

}
}
I’ve used this solution on a number of occasions and find it to be very efficient and maintainable,
and it avoids the need for any temp tables to be used to hold the running sums as required by the
alternatives. When testing against a table containing 100,000 rows, I achieve an average execution time
of 2.7 seconds for the SQLCLR query, compared to over 5 minutes for the TSQL equivalent.
String Manipulation
To compare the performance of string-handling functions between T-SQL and SQLCLR, I wanted to
come up with a fair, practical test. The problem is that there are lots of ingenious techniques for working
with string data: in T-SQL, some of the best performing methods use one or more common table
expressions (CTEs), CROSS APPLY operators, or number tables; or convert text strings to XML in or order
to perform nontrivial manipulation of character data. Likewise, in SQLCLR, the techniques available
differ considerably depending on whether you use the native String methods or those provided by the
StringBuilder class.
I decided that, rather than try to define a scenario that required a string-handling technique, the
only fair test was to perform a direct comparison of two built-in methods that provided the equivalent
functionality in either environment. I decided to settle on the T-SQL CHARINDEX and .NET’s
String.IndexOf(), each of which searches for and returns the position of one string inside another
string. For the purposes of the test, I created nvarchar(max) strings of different lengths, each composed
entirely of the repeating character a. I then appended a single character x onto the end of each string,
and timed the performance of the respective methods to find the position of that character over 10,000
iterations.
183
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 7  SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS
The following code listing demonstrates the T-SQL method:
CREATE PROCEDURE SearchCharTSQL
(
@needle nchar(1),
@haystack nvarchar(max)

available through SQLCLR—you might be surprised by the performance gain you achieve.
Enhancing Service Broker Scale-Out with SQLCLR
Having discussed some of the theory behind working with the SQLCLR and given some isolated
performance comparisons, let’s now turn our attention to a more detailed example that puts these ideas
into practice. Service Broker is frequently mentioned as an excellent choice for helping to scale out
database services. One of the more compelling use cases is a Service Broker service that can be used to
asynchronously request data from a remote system. In such a case, a request message would be sent to
the remote data service from a local stored procedure, which could do some other work while waiting for
the response—the requested data—to come back.
There are many ways to architect such a system, and given that Service Broker allows messages to
be sent either as binary or XML, I wondered which would provide the best overall performance and
value from a code reuse perspective. In the following sections, I’ll guide you through my investigations
into XML and binary serialization using SQLCLR.
XML Serialization
I started working with the HumanResources.Employee table from the AdventureWorks2008 database as a
sample data set, imagining a remote data service requesting a list of employees along with their
attributes. After some experimentation, I determined that the FOR XML RAW option is the easiest way to
serialize a table in XML format, and I used the ROOT option to make the XML valid:
DECLARE @x xml;
SET @x = (
SELECT *
FROM HumanResources.Employee
FOR XML RAW, ROOT('Employees')
);
GO
XML is, of course, known to be an extremely verbose data interchange format, and I was not
surprised to discover that the data size of the resultant XML is 105KB, despite the fact that the
HumanResources.Employee table itself has only 56KB of data. I experimented with setting shorter column
names, but it had very little effect on the size and created what I feel to be unmaintainable code.
Next, I set up a trace to gather some idea of the performance of the XML serialization (for more


SELECT
col.value('@BusinessEntityID', 'int') AS BusinessEntityID,
col.value('@NationalIDNumber', 'nvarchar(15)') AS NationalIDNumber,
col.value('@LoginID', 'nvarchar(256)') AS LoginID,
CAST(col.value('@OrganizationNode', 'nvarchar(256)') AS hierarchyid)
AS OrganizationNode,
col.value('@JobTitle', 'nvarchar(50)') AS JobTitle,
col.value('@BirthDate', 'datetime') AS BirthDate,
col.value('@MaritalStatus', 'nchar(1)') AS MaritalStatus,
col.value('@Gender', 'nchar(1)') AS Gender,
col.value('@HireDate', 'datetime') AS HireDate,
col.value('@SalariedFlag', 'bit') AS SalariedFlag,
col.value('@VacationHours', 'smallint') AS VacationHours,
col.value('@SickLeaveHours', 'smallint') AS SickLeaveHours,
col.value('@CurrentFlag', 'bit') AS CurrentFlag,
col.value('@rowguid', 'uniqueidentifier') AS rowguid,
col.value('@ModifiedDate', 'datetime') AS ModifiedDate
FROM @x.nodes ('/Employees/row') x (col);
GO
The next problem was performance. When I tested deserializing the XML using the preceding query,
performance went from poor to downright abysmal—averaging 6.8157 seconds per iteration.
At this point, I decided to investigate SQLCLR options for solving the problem, focusing on both
reuse potential and performance.
186
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 7  SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS
Binary Serialization with SQLCLR
My first thought was to return binary serialized DataTables; in order to make that happen, I needed a
way to return binary-formatted data from my CLR routines. This of course called for .NET’s

public static object getObject(byte[] theBytes)
{
using (System.IO.MemoryStream ms =
new System.IO.MemoryStream(theBytes, false))
{
return(getObject(ms));
}
}

public static object getObject(System.IO.Stream s)
{
SecurityPermission sp =
new SecurityPermission(
SecurityPermissionFlag.SerializationFormatter);
sp.Assert();

187
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 7  SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS
BinaryFormatter bf = new BinaryFormatter();

return (bf.Deserialize(s));
}
};
Use of this class is fairly straightforward: to serialize an object, pass it into the getBytes method. This
method first uses an assertion, as discussed previously, to allow SAFE callers to use it, and then uses the
binary formatter to serialize the object to a Stream. The stream is then returned as a collection of bytes.
Deserialization can be done using either overload of the getObject method. I found that depending on
the scenario, I might have ready access to either a Stream or a collection of bytes, so creating both
overloads made sense instead of duplicating code to produce one from the other. Deserialization also

NationalIDNumber,
LoginID,
OrganizationNode.ToString(),
OrganizationLevel,
JobTitle,
BirthDate,
188
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 7  SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS
MaritalStatus,
Gender,
HireDate,
SalariedFlag,
VacationHours,
SickLeaveHours,
CurrentFlag,
rowguid,
ModifiedDate
FROM HumanResources.Employee';

DECLARE @x varbinary(max);
SET @x = dbo.GetDataTable_Binary(@sql);
GO
 Note The
hierarchyid
CLR datatype is not marked as serializable, so in the preceding query I use the
ToString()
method to serialize the string representation of the
OrganizationNode
value.

{
List<object[]> theList = new List<object[]>();

using (SqlConnection conn =
new SqlConnection("context connection = true;"))
{
SqlCommand comm = new SqlCommand();
comm.Connection = conn;
comm.CommandText = query;

conn.Open();

SqlDataReader read = comm.ExecuteReader();
The next step is to pull the metadata for each column out of the SqlDataReader. A method called
GetSchemaTable is used to return a DataTable populated with one row per column. The available fields
are documented in the MSDN Library, but I’m using the most common of them in the code that follows.
After populating the object collection with the metadata, it is added to the output List:
DataTable dt = read.GetSchemaTable();

//Populate the field list from the schema table
object[] fields = new object[dt.Rows.Count];
for (int i = 0; i < fields.Length; i++)
{
object[] field = new object[5];
field[0] = dt.Rows[i]["ColumnName"];
field[1] = dt.Rows[i]["ProviderType"];
field[2] = dt.Rows[i]["ColumnSize"];
field[3] = dt.Rows[i]["NumericPrecision"];
field[4] = dt.Rows[i]["NumericScale"];


JobTitle,
BirthDate,
MaritalStatus,
Gender,
HireDate,
SalariedFlag,
VacationHours,
SickLeaveHours,
CurrentFlag,
rowguid,
ModifiedDate FROM HumanResources.Employee'

DECLARE @x varbinary(max);
SET @x = dbo.GetBinaryFromQueryResult(@sql);
GO
The result, 57KB worth of binary data, represented a 15 percent reduction in size compared to the
DataTable method. If using this method to transfer data between broker instances on remote servers, the
associated decrease in network traffic could make a big difference to performance. What’s more, the
serialization performance using SqlDataReader was the fastest yet, with an average query execution time
of just 0.0490 seconds.
Binary Deserialization
Pleased with the results of binary serialization using SQLCLR, I decided to go ahead with deserialization.
Continuing with my stress on reuse potential, I decided that a stored procedure would be a better choice
than a UDF. A stored procedure does not have a fixed output as does a UDF, so any input table can be
deserialized and returned without worrying about violating column list contracts.
The first part of the stored procedure follows:
[Microsoft.SqlServer.Server.SqlProcedure]
public static void GetTableFromBinary(SqlBytes theTable)
{
//Deserialize the input

dbType,
(byte)field[3],
(byte)field[4]);
break;
case SqlDbType.Binary:
case SqlDbType.Char:
case SqlDbType.NChar:
case SqlDbType.NVarChar:
case SqlDbType.VarBinary:
case SqlDbType.VarChar:
switch ((int)field[2])
{
//If it's a MAX type, use -1 as the size
case 2147483647:
cols[i] = new SqlMetaData(
(string)field[0],
dbType,
-1);
break;
default:
cols[i] = new SqlMetaData(
(string)field[0],
dbType,
(long)((int)field[2]));
break;
192
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 7  SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS
}
break;

Method Average Serialization Time Average Deserialization Time Size
XML (with TYPE) 3.6687 6.8157 105KB
Binary (DataTable) 0.0576 68KB
Binary (SqlDataReader) 0.0490 0.2666 57KB The combination of better network utilization and much faster serialization/deserialization
demonstrated in this example shows how SQLCLR can be a great technique for transferring tabular data
between Service Broker instances in scale-out and distributed processing scenarios.
193
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 7  SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS
194
Summary
Getting the most out of SQLCLR routines involves a bit of thought investment. Up-front design and
architecture considerations will yield great benefits in terms of security, reliability, and performance.
You should also consider reuse at every stage, in order to minimize the amount of work that must be
done when you need the same functionality six months or a year down the road. If you’ve already coded
it once, why code it again?
To illustrate these concepts, I showed an ex
ample that serialized tables using the BinaryFormatter,
which could be used to extend SQL Server Service Broker. I used a common, core set of more highly
privileged utility assemblies in order to limit the outer surface area, and tried to design the solution to
promote flexibility and potential for use in many projects throughout the lifetime of the code.

Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
C H A P T E R 8

  


195
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
CHAPTER 8  DYNAMIC T-SQL
Dynamic T-SQL vs. Ad Hoc T-SQL
Before I begin a serious discussion about how dynamic SQL should be used, it’s first important to
establish a bit of terminology. Two terms that are often intermingled in the database world with regard
to SQL are dynamic and ad hoc. When referring to these terms in this chapter, I define them as follows:
• Ad hoc SQL is any batch of SQL generated within an application layer and sent to
SQL Server for execution. This includes almost all of the code samples in this
book, which are entered and submitted via SQL Server Management Studio.
• Dynamic SQL, on the other hand, is a batch of SQL that is generated within T-SQL
and executed using the EXECUTE statement or, preferably, via the sp_executesql
system stored procedure (which is covered later in this chapter).
Most of this chapter focuses on how to use dynamic SQL effectively using stored procedures.
However, if you are one of those working with systems that do not use stored procedures, I advise you to
still read the “SQL Injection” and “Compilation and Parameterization” sections at a minimum. Both
sections are definitely applicable to ad hoc scenarios and are extremely important.
All of that said, I do not recommend the use of ad hoc SQL in application development, and feel that
many potential issues, particularly those affecting application security and performance, can be
prevented through the use of stored procedures.
The Stored Procedure vs. Ad Hoc SQL Debate
A seemingly never-ending battle among members of the database development community concerns
the question of whether database application development should involve the use of stored procedures.
This debate can become quite heated, with proponents of rapid software development methodologies
such as test-driven development (TDD) claiming that stored procedures slow down their process, and
fans of object-relational mapping (ORM) technologies making claims about the benefits of those
technologies over stored procedures. I highly recommend that you search the Web to find these debates
and reach your own conclusions. Personally, I heavily favor the use of stored procedures, for several
reasons that I will briefly discuss here.
First and foremost, stored procedures create an abstraction layer between the database and the

of course, is the question of performance. Proponents of ad hoc SQL make the valid claim that, thanks to
better support for query plan caching in recent versions of SQL Server, stored procedures no longer have
a significant performance benefit when compared to ad hoc queries. Although this sounds like a great
argument for not having to use stored procedures, I personally believe that it is a nonissue. Given
equivalent performance, I think the obvious choice is the more maintainable and secure option (i.e.,
stored procedures).
In the end, the stored procedure vs. ad hoc SQL question is really one of purpose. Many in the ORM
community feel that the database should be used as nothing more than a very simple object persistence
layer, and would probably be perfectly happy with a database that only had a single table with only two
columns: a GUID to identify an object’s ID and an XML column for the serialized object graph.
In my eyes, a database is much more than just a collection of data. It is also an enforcer of data rules,
a protector of data integrity, and a central data resource that can be shared among multiple applications.
For these reasons, I believe that a decoupled, stored procedure–based design is the best way to go.
Why Go Dynamic?
As mentioned in the introduction for this chapter, dynamic SQL can help create more flexible data
access layers, thereby helping to enable more flexible applications, which makes for happier users. This
is a righteous goal, but the fact is that dynamic SQL is just one means by which to attain the desired end
result. It is quite possible—in fact, often preferable—to do dynamic sorting and filtering directly on the
client in many desktop applications, or in a business layer (if one exists) to support either a web-based or
client-server–style desktop application. It is also possible not to go dynamic at all, by supporting static
stored procedures that supply optional parameters—but that’s not generally recommended because it
can quickly lead to very unwieldy code that is difficult to maintain, as will be demonstrated in the
“Optional Parameters via Static T-SQL” section later in this chapter .
Before committing to any database-based solution, determine whether it is really the correct course
of action. Keep in mind the questions of performance, maintainability, and most important, scalability.
Database resources are often the most taxed of any used by a given application, and dynamic sorting
and filtering of data can potentially mean a lot more load put on the database. Remember that scaling
the database can often be much more expensive than scaling other layers of an application.
For example, consider the question of sorting data. In order for the database to sort data, the data
must be queried. This means that it must be read from disk or memory, thereby using I/O and CPU time,

the query processor how to physically access the tables and indexes in the database in order to satisfy
the query. However, query compilation can be expensive for certain queries, and when the same queries
or types of queries are executed over and over, there is generally no reason to compile them each time.
In order to save on the cost of compilation, SQL Server caches query plans in a memory pool called the
query plan cache.
The query plan cache uses a simple hash lookup based on the exact text of the query in order to find
a previously compiled plan. If the exact query has already been compiled, there is no reason to
recompile it, and SQL Server skips directly to the execution phase in order to get the results for the caller.
If a compiled version of the query is not found, the first step taken is parsing of the query. SQL Server
determines which operations are being conducted in the SQL, validates the syntax used, and produces a
parse tree, which is a structure that contains information about the query in a normalized form. The
parse tree is further validated and eventually compiled into a query plan, which is placed into the query
plan cache for future invocations of the query.
The effect of the query plan cache on execution time can be seen even with simple queries. To
demonstrate this, first use the DBCC FREEPROCCACHE command to empty out the cache:
DBCC FREEPROCCACHE;
GO
Keep in mind that this command clears out the cache for the entire instance of SQL Server—doing
this is not generally recommended in production environments. Then, to see the amount of time spent
in the parsing and compilation phase of a query, turn on SQL Server’s SET STATISTICS TIME option,
which causes SQL Server to output informational messages about time spent in parsing/compilation
and execution:
SET STATISTICS TIME ON;
GO
198
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.


Nhờ tải bản gốc

Tài liệu, ebook tham khảo khác

Music ♫

Copyright: Tài liệu đại học © DMCA.com Protection Status