SQL Server MVP Deep Dives- P3 - Pdf 70

36
C
HAPTER
3
Finding functional dependencies
None of these examples were rejected by the domain expert, so I was able to conclude
that there are no more single-column dependencies in this table.
Note that I didn’t produce these three examples at the same time. I created them
one by one, for if there had been more functional dependencies I could’ve further
reduced the number of tests still needed. But because there turned out to be no more
dependencies, I decided to combine them in this description, to save space and
reduce the repetitiveness.
Second step: finding two-attribute dependencies
After following the preceding steps, I can now be sure that I’ve found all the cases
where an attribute depends on one of the other attributes. But there can also be attri-
butes that depend on two, or even more, attributes. In fact, I hope there are, because
I’m still left with a few attributes that don’t depend on any other attribute. If you ever
run into this, it’s a sure sign of one or more missing attributes on your shortlist—one
of the hardest problems to overcome in data modeling.
The method for finding multiattribute dependencies is the same as that for single-
attribute dependencies—for every possible combination, create a sample with two
rows that duplicate the columns to test and don’t duplicate any other column. If at
this point I hadn’t found any dependency yet, I’d be facing an awful lot of combina-
tions to test. Fortunately, I’ve already found some dependencies (which you’ll find is
almost always the case if you start using this method for your modeling), so I can rule
out most of these combinations.
Table 6 Testing functional dependencies for Qty
OrderNo CustomerID Product Qty TotalPrice OrderTotal

cover it explicitly. Each of the remaining five possibilities will be covered below.
CANDIDATE KEY AND DEPENDENT ATTRIBUTE
This combination (as well as the combination of two candidate keys, as I already men-
tioned) can be omitted completely. I won’t bother you with the mathematical proof,
but instead will try to explain in language intended for mere mortals.
Given three attributes (A, B, and C), if there’s a dependency from the combination
of A and B to C, that would imply that for each possible combination of values for A
and B, there can be at most one value of C. But if there’s also a dependency of A to B,
this means that for every value of A, there can be at most one value of B—in other
words, there can be only one combination of A and B for every value of A; hence there
can be only one value of C for every value of A. So it naturally follows that if B depends
on A, then every attribute that depends on A will also depend on the combination of
A and B, and every attribute that doesn’t depend on A can’t depend on the combina-
tion of A and B.
CANDIDATE KEY AND INDEPENDENT ATTRIBUTE
For this combination, some testing is required. In fact, I’ll test combination first,
because it’s the most common—and the sooner I find extra dependencies, the sooner
I can start removing attributes from the table, cutting down on the number of other
combinations to test.
But, as before, it’s not required to test all other attributes for dependency on a
given combination of a candidate key and an independent attribute. Every attribute
that depends on the candidate key will also appear to depend on any combination of
the candidate key with any other attribute. This isn’t a real dependency, so there’s no
need to test for it, or to conclude the existence of such a dependency.
This means that in my example, I need to test the combinations of OrderNo and
Product, OrderNo and Qty, and OrderNo and TotalPrice. And when testing the first
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
38
C

only other attribute is OrderID, the key.
So I create the test population of table 10
to check for a possible alternate key.
The domain expert saw no reason to
reject this example (after I populated the
related tables with data that observes all rules discovered so far), so there’s obviously
no dependency from CustomerID and OrderTotal to OrderNo.
Table 9 Testing functional dependencies for the combination of OrderNo and Product
OrderNo CustomerID Product Qty TotalPrice OrderTotal
7001 12 Gizmo 10 125.00 225.00
7001 Gizmo 12 150.00
Table 10 Testing functional dependencies for
the combination of CustomerID and OrderTotal
OrderNo CustomerID OrderTotal
7001 12 125.00
7002 12 125.00
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
39
Modeling the sales order
TWO INDEPENDENT ATTRIBUTES
Because the Orders table used in my example has no independent columns anymore,
I can obviously skip this combination. But if there still were two or more independent
columns left, then I’d have to test each combination for a possible dependency of a
candidate key or any other independent attribute upon this combination.
DEPENDENT AND INDEPENDENT ATTRIBUTES
This last possible combination is probably the least common—but there are cases
where an attribute turns out to depend on a combination of a dependent and an inde-
pendent attribute. Attributes that depend on the key attribute can’t also depend on a
combination of a dependent and an independent column (see the sidebar a few pages

least one of the candidate key attributes.
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
40
C
HAPTER
3
Finding functional dependencies
Summary
I’ve shown you a method to find all functional dependencies between attributes. If
you’ve just read this chapter, or if you’ve already tried the method once or twice, it
may seem like a lot of work for little gain. But once you get used to it, you’ll find that
this is very useful, and that the amount of work is less than it appears at first sight.
For starters, in a real situation, many dependencies will be immediately obvious if
you know a bit about the subject matter, and it’ll be equally obvious that there are no
dependencies between many attributes. There’s no need to verify those with the
domain expert. (Though you should keep in mind that some companies may have a
specific situation that deviates from the ordinary.)
Second, you’ll find that if you start by testing the dependencies you suspect to be
there, you’ll quickly be able to divide the data over multiple relations with relatively
few attributes each, thereby limiting the number of combinations to be tested.
And finally, by cleverly combining multiple tests into a single example, you can
limit the number of examples you have to run by the domain expert. This may not
reduce the amount of work you have to do, but it does reduce the number of exam-
ples your domain expert has to assess—and she’ll love you for it!
As a bonus, this method can be used to develop sample data for unit testing, which
can improve the quality of the database schema and stored procedures.
A final note of warning—there are some situations where, depending on the order
you choose to do your tests, you might miss a dependency. You can find them too, but
they’re beyond the scope of this chapter. Fortunately this will only happen in cases

faster rate every year. This tremendous growth can easily be seen in the many
facets of the Microsoft database platform.
SQL
Server is no longer just a simple
SQL
database system; it has become an application platform, a vehicle for the
creation of complex and multifaceted data solutions.
Today’s database developer is expected to understand not only the Transact-
SQL
dialect spoken by
SQL
Server, but also the intricacies of the many compo-
nents that must be controlled in order to make the database system do their bid-
ding. This variety can be seen in the many topics discussed in the pages ahead:
indexing, full-text search,
SQL

CLR
integration,
XML
, external interfaces such as
ADO
.
NET
, and even mobile device development are all subjects within the realm
of database development.
The sheer volume of knowledge both required and available for consump-
tion can seem daunting, and giving up is not an option. The most important
thing we can do is understand that while no one can know everything, we can
strive to continually learn and enhance our skill sets, and that is where this book

2008 Internals (Microsoft Press, 2009) and Expert
SQL
Server 2005
Development (Apress, 2007). Adam regularly speaks at user
groups, community events, and conferences on a variety of
SQL
Server and .
NET
-related topics. He is a Microsoft Most Valuable
Professional (
MVP
) for
SQL
Server, Microsoft Certified
IT
Pro-
fessional (
MCITP
), and a member of the
INETA
North American
Speakers Bureau.
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
43
4 Set-based iteration,
the third alternative
Hugo Kornelis
When reading
SQL

Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
44
C
HAPTER
4
Set-based iteration, the third alternative
The answer is that there are cases (admittedly, not many) where none of the existing
options yield acceptable performance, and set-based iteration does.
Declarative (set-based) code
Declarative coding is, without any doubt, the most-used way to manipulate data in
SQL
Server. And for good reason, because in most cases it’s the fastest possible code.
The basic principle of declarative code is that you don’t tell the computer how to
process the data in order to create the required results, but instead declare the results you
want and leave it to the
DBMS
to figure out how to get those results. Declarative code is
also called set-based code because the declared required results aren’t based on individ-
ual rows of data, but on the entire set of data.
For example, if you need to find out which employees earn more than their man-
ager, the declarative answer would involve one single query, specifying all the tables
that hold the source data in its
FROM
clause, all the required output columns in its
SELECT
clause, and using a
WHERE
clause to filter out only those employees that meet
the salary requirement.

Server does know them) can’t be specified
in the language.
The only way to calculate running totals in a set-based fashion is to specify each
running total as the sum of the values in all preceding rows. That implies that a lot
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
45
The common methods and their shortcomings
more summation is done than would be required if intermediate results were avail-
able. This results in performance that degrades exponentially with the amount of
data, so even if you have no problems in your test environment, you will have prob-
lems in your 100-million-row production database!
Iterative (cursor-based) code
The base principle of iterative coding is to write
T-SQL
as if it were just another third-
generation programming language, like
C#
,
VB
.
NET
, Cobol, and Pascal. In those lan-
guages, the only way to process a set of data (such as a sequentially organized file) is to
iterate over the data, reading one record at a time, processing that record, and then
moving to the next record until the end of the file has been reached.
SQL
Server has
cursors as a built-in mechanism for this iteration, hence the term cursor-based code as an
alternative to the more generic iterative code.

clause
The full
ANSI
standard specification of the
OVER
clause includes windowing exten-
sions that allow for simple specification of running totals. This would result in short
queries with probably very good performance—if
SQL
Server had implemented them.
Unfortunately, these extensions aren’t available in any current version of
SQL
Server,
so we still have to code the running totals ourselves.
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
46
C
HAPTER
4
Set-based iteration, the third alternative
for each employee read the details of his or her department, and that’s exactly what’ll
happen. But tell
SQL
Server that you want results of employees and departments com-
bined, and that’s only one of the options for the query optimizer to consider.
Set-based iteration
An aspect that’s often overlooked in the “set-based or cursor” discussion is that they
represent two extremes, and there’s plenty of room for alternate solutions in
between. Iterative algorithms typically use one iteration for each row in the table or

SET @BatchSize = 10000;
SET @RowCnt = @BatchSize;
WHILE @RowCnt = @BatchSize
BEGIN;
DELETE TOP (@BatchSize)
FROM TransactionHistory
WHERE TranDate < '20050101';
SET @RowCnt = @@ROWCOUNT;
END;
This form of set-based iteration won’t increase performance of the code. It’s used to
limit the impact of code on concurrency, but may make the code run slower.
This form of set-based iteration isn’t sophisticated enough to warrant much discus-
sion. I merely wanted to include it for the sake of completeness. Using set-based
Listing 1 Set-based iteration with the
TOP
clause
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
47
Set-based iteration
iteration to increase performance of problematic code takes, unfortunately, more
than just adding a
TOP
clause to the query.
Running totals
Adding running totals to a report is a common business requirement. It’s also one of
the few situations where declarative code often (though not always) results in poor
performance.
In this example, I’ll use the AdventureWorks sample database to report all sales,
arranged by customer, ordered by date, and with a running total of all order amounts

ITERATIVE CODE
Because the declarative running totals code usually performs poorly, this problem is
commonly solved with iterative code, using a server-side cursor. Listing 3 shows the
code typically used for this.
Listing 2 Declarative code for calculating running totals
SalesOrderID used
as tie breaker
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
48
C
HAPTER
4
Set-based iteration, the third alternative

USE AdventureWorks;
SET NOCOUNT ON;
DECLARE @Results TABLE
(CustomerID int NOT NULL,
OrderDate datetime NOT NULL,
SalesOrderID int NOT NULL,
TotalDue money NOT NULL,
RunningTotal money NULL,
PRIMARY KEY (CustomerID, OrderDate, SalesOrderID));
INSERT INTO @Results(CustomerID, OrderDate, SalesOrderID, TotalDue)
SELECT CustomerID, OrderDate, SalesOrderID, TotalDue
FROM Sales.SalesOrderHeader;
DECLARE @CustomerID int, @OrderDate datetime,
@SalesOrderID int, @TotalDue money,
@CurrCustomerID int, @RunningTotal money;

FROM @Results
ORDER BY CustomerID, OrderDate, SalesOrderID;
Listing 3 Iterative code for calculating running totals
B
STATIC cursor here faster
than FAST_FORWARD
D
E
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
49
Set-based iteration
The code is pretty straightforward. In order to get all results as one result set, a table
variable
B
is used to store the base data and the calculated running totals. The pri-
mary key on the table variable is there primarily to create a good clustered index for
the iteration, which explains why it includes more columns than the key (which is on
SalesOrderID only). The only way to index a table variable is to add
PRIMARY

KEY
or
UNIQUE
constraints to it.
A
T-SQL
cursor is then used to iterate over the rows. For each row, the variable
holding the running total is incremented with the total of that order and then stored
in the results table

DECLARE @Results TABLE
(CustomerID int NOT NULL,
OrderDate datetime NOT NULL,
SalesOrderID int NOT NULL,
TotalDue money NOT NULL,
RunningTotal money NULL,
Rnk int NOT NULL,
PRIMARY KEY (Rnk, CustomerID));
INSERT INTO @Results
(CustomerID, OrderDate, SalesOrderID,
TotalDue, RunningTotal, Rnk)
Listing 4 Set-based iteration for calculating running totals
B
C
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
50
C
HAPTER
4
Set-based iteration, the third alternative
SELECT CustomerID, OrderDate, SalesOrderID,
TotalDue, TotalDue,
RANK() OVER (PARTITION BY CustomerID
ORDER BY OrderDate,
SalesOrderID)
FROM Sales.SalesOrderHeader;
DECLARE @Rank int,
@RowCount int;
SET @Rank = 1;

While populating the table variable, I also set the running total for each order
equal to its order total. This is, of course, incorrect for all except the first orders, but it
saves the need for a separate
UPDATE
statement for the first orders, and the running
totals for all other orders will eventually be replaced later in the code.
The core of this algorithm is the
UPDATE
statement
D
that joins a selection of all
orders with the next rank to those of the previous rank, so that the next running total
can be set to the sum of the previous running total and the next order total.
On my laptop, this code runs in 0.4 seconds. This speed depends not only on the
amount of data, but also on the required number of iterations. If I change the code
to calculate running totals per territory rather than per customer, the number of
C
D
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
51
Set-based iteration
iterations goes up to almost 7,000, causing the execution time to rise to
approximately 0.9 seconds. And if I change the code to calculate overall running
totals (forcing the number of iterations to be equal to the number of rows), the
clock stops at 2 seconds.
The bottom line is that, even though declarative code runs slightly faster in cases
with a very low iteration count and iterative code is slightly better for very high itera-
tion counts, set-based iteration presents a good algorithm that’s the fastest in many sit-
uations and only slightly slower in the other cases.

or two meters from each roll and still have to use a new roll for the last required
length of six meters.
According to mathematicians, you can only be 100 percent sure that you get the abso-
lute minimum number of bins by trying every possible permutation. It’s obvious that,
however you implement this, it’ll never scale, as the number of possible permutations
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

Nhờ tải bản gốc

Tài liệu, ebook tham khảo khác

SQL Server MVP Deep Dives- P3 - Pdf 70

Tài liệu, ebook tham khảo khác

Học thêm