Tài liệu SQL Server MVP Deep Dives- P14 - Pdf 87

476
C
HAPTER
36
Understated changes in SQL Server 2005 replication
Reading the text of hidden replication stored procedures
Consider the case when you need to examine the text of a replication system stored
procedure. You might want to see this text for several reasons. Perhaps you want to
perform the same actions as the
SQL
Server Management Studio (
SSMS
)
GUI
, but
hope to gain programmatic control of the process for yourself. Perhaps you want to
get a better understanding of what happens under the hood in order to increase your
replication expertise and troubleshooting ability. Whatever the reason, usually you
will need to know the name of the system stored procedure and then use
sp_helptext
or the
OBJECT_DEFINITION
function to see the whole procedure definition. For some
of the replication stored procedures, though, you will find that the text is hidden and
these two methods will not work. For example, if you try the following code in a nor-
mal query window, you will have
NULL
returned:
SELECT OBJECT_DEFINITION (OBJECT_id('sys.sp_MSrepl_helparticlecolumns'))
On the other hand, if you use the dedicated administrator connection (
DAC

tion system, don’t forget to close the
DAC
connection when you are done with it!
Creating snapshots without any data—only the schema
When we look in
BOL
at the definition of a replication stored procedure or a replica-
tion agent, we find that the permitted values for the parameters are all clearly listed.
But it occasionally becomes apparent that there are other acceptable values that have
never been documented. The exact number of these hidden parameters is something
we’ll never know, and in all cases they will be unsupported for the general public.
Even so, sometimes they start being used and recommended prior to documentation,
usually in order to fix a bug. A case in point is the
sp_addpublication
procedure, in
which there is now the acceptable value of
database snapshot
for the
@sync_method
.
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
477
Undocumented or partially documented changes in behavior
This value was for some time known about, undocumented and yet used, but it now
exists in fully documented (and supported) form in
BOL
. The usual caveats apply if
you decide to use any such workaround; you must take full responsibility, and any
such modifications are completely unsupported.

Select Steps from the left pane.
5
Select the second Run Agent step, and click the Edit button to open it. You
should see the command line in the Command text box.
Once you have added the
/NoBcpData
parameter to the command line, as shown in
figure 1, click
OK
in the Job Step dialog box and click
OK
again in the Job dialog
box to make sure that the change is committed. The
/NoBcpData
switch tells the
Snapshot Agent to create empty
BCP
files instead of bulk-copying data out from the
published tables.
Figure 1 In the Snapshot Agent’s job step, the unofficial (unsupported!)
/NoBcpData
is entered.
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
478
C
HAPTER
36
Understated changes in SQL Server 2005 replication
Some changed replication defaults

Automatic range management ensures that each subscriber is reseeded with
its own identity ranges without any extra configuration, whereas manual means
that you will need to change either the seed or the increment of the identity
range on each subscriber to avoid conflicts with the publisher. If you previously
relied on leaving this article property alone and chose to manually administer
the identity range, beware because a range of 1,000 values will have already
been allocated to each of your subscribers.

Default range sizes have increased. The publisher range has changed from
100 to 10,000, and the subscriber range size has increased from 100 to 1,000.

Overflow range is allocated. The merge trigger code on the published article
implements an overflow range that is the same size as the normal range. This
means that by default you will have two ranges of 1,000 values allocated to a sub-
scriber. The clever part is that the overflow range is automatically allocated by
the merge insert trigger and therefore doesn’t require a connection to the pub-
lisher. However, the reseeding performed in the trigger is restricted to those
cases where a member of the db_owner role does the insert.

The
threshold
parameter is no longer used. Although it appears in the arti-
cle properties dialog box much the same as in
SQL
Server 2000, the
threshold
parameter only applies to subscribers running
SQL
Server Mobile or previous
versions of

removed.
(If you are now thinking that it is not possible in
T-SQL
to directly add the
NFR
attribute to an existing identity column, please take a look inside the
sp_identitycolumnforreplication
system stored procedure, because this is the pro-
cedure that marks the identity column as
NFR
.)
DEFERRED UPDATE TRACE FLAGS
For transactional replication, you might be using deferred update trace flags unneces-
sarily. In
SQL
Server 2000, updates to columns that do not participate in a unique key
constraint are replicated as updates to the subscriber unless trace flag 8202 is enabled,
after which they are treated as deferred updates (paired insert/deletes). On the other
hand, updates to columns that do participate in unique constraints are always treated
as deferred updates (paired insert/deletes) unless trace flag 8207 is enabled. In
SQL
Server 2005, all such changes are replicated as updates on the subscriber regardless of
whether the columns being updated participate in a unique constraint or not.
PARTITIONING OF SNAPSHOT FILES
The following change to a replication default is more complicated to explain, but it
deals with a significant improvement that has been made to the initial snapshot pro-
cess. In
SQL
Server 2000, when an article is
BCP

action. In contrast, if you have a
SQL
Server 2005 distributor and
SQL
Server 2005
subscribers, there is now much greater granularity in the process. The article rows are
partitioned into the separate text files, and each partition is applied in a separate
transaction, meaning that after an outage, the snapshot distribution is able to con-
tinue with the partition where it left off and complete the remaining partitions. For a
table containing a lot of rows, this could lead to a huge saving in time.
Other useful side effects are that this can cause less expansion of the transaction
log (assuming that the migration crosses a backup schedule or the subscriber uses
the simple recovery model), and it can lead to paths of parallel execution of the
BCP
process for those machines having more than one processor. (It is true that parallel
execution existed in
SQL
Server 2000, but this was only for the processing of several
articles concurrently and not for a single table.)
Similarly, the same benefits apply when creating the initial snapshot using the
Snapshot Agent. Note that the
–BcpBatchSize
parameter of the Snapshot and Distri-
bution Agents governs how often progress messages are logged and has no bearing at
all on the number of partitions.
Source table
Articlename#1.bcp
Articlename#2.bcp
Figure 2 Snapshot data from
a table is now partitioned

X
(where
X

>

1
) for the Distribution Agent and ensure that the target table doesn’t have
any indexes on it before the Distribution Agent delivers the snapshot.
More efficient methodologies
In the previous section, we looked at several undocumented techniques that can be
used to enhance the replication behavior. We’ll now look at some capabilities that are
fully documented, but that are not always understood to be replacements for less-
efficient methodologies.
Remove redundant pre-snapshot and post-snapshot scripts
In
SQL
Server 2000 publications, we sometimes use pre-snapshot and post-snapshot
scripts. The pre-snapshot scripts are
T-SQL
scripts that run before the snapshot files
are applied, whereas the post-snapshot scripts apply once the snapshot has com-
pleted. Their use is often to overcome
DRI
(declarative referential integrity) issues on
the subscriber.
Remember that the initialization process starts by dropping tables on the sub-
scriber. If all the tables on the subscriber originate from one publication, this is not
an issue, but if there is more than one publication involved, we might have a scenario
where the dropping of tables at the subscriber during initialization would be invalid


dbo.MSsavedforeignkeyextendedproperties
Once the information is safely hived away, the foreign keys are dropped. To re-add the
foreign keys, the Distribution Agent calls the new
sp_MSrestoresavedforeignkeys
system stored procedure once the snapshot has been applied. Note that all this hap-
pens automatically and requires no manual scripts to be created.
Take a look at your existing pre-snapshot and post-snapshot scripts. If they deal
with the maintenance of foreign keys, there’s a good chance they are doing work that
is already done by default, in which case you’ll be able to drop the scripts entirely and
remove the maintenance issue.
Replace merge -EXCHANGETYPE parameters
In
SQL
Server 2005 merge replication, we can now mark articles as download-only,
meaning that changes to the table are only allowed at the publisher and not at the
subscriber. Previously, in
SQL
Server 2000, we would use the
-EXCHANGETYPE
value to
set the direction of merge replication changes. This was implemented by manually
editing the Merge Agent’s job step and adding
-EXCHANGETYPE 1|2|3
as text.
When using
SQL
Server 2000, entering a value of
-EXCHANGETYPE 2
means that

setting mentioned previously, but in the
SQL
Server 2005 case there are no triggers at all on the subscriber table. Another distinc-
tion is that this setting is made at the more granular article level rather than set for the
entire publication. This means that although the
-EXCHANGETYPE
and
sp_add-
mergearticle
methods are logically equivalent, the implementation has become
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
483
Summary
much more sophisticated in
SQL
Server 2005. Triggers that unnecessarily log meta-
data at the subscriber are no longer fired; therefore both subscriber data changes and
the subsequent synchronization are significantly faster.
Put simply, you should replace the use of
EXCHANGETYPE
with download-only articles!
Incidentally, this setting is also implemented by a separate check box in
SSMS
, as
shown in figure 3. This check box does a similar job but sets the value of
@subscriber_upload_options
to 2, which again makes the changes download-only,
but in this case any subscriber settings are prohibited and rolled back.

SQL
Server replication—and has answered over 6,000
questions on the Microsoft
SQL
Server Replication newsgroup.
When not working, he likes spending time with his wife and
son, Ewa and Thomas, going fell-walking in the Lake District,
and learning Ba Gua, a Chinese martial art.
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
485
Summary
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
484
37 High-performance
transactional replication
Hilary Cotter
The purpose of this chapter is to educate
DBA
s on how to get maximum perfor-
mance from their high-performance transactional replication topology across all
versions and editions. Most
DBA
s are concerned with latency—in other words, how
old the transactions are when they are applied on the Subscriber.
To set expectations, you should know that the minimum latency of any transac-
tional replication solution will be several seconds (lower limits are between 1 and 2
seconds). Should you need replication solutions which require lower latencies, you
should look at products like Golden Gate, which is an

485
Performance kiss of death factors in transactional replication
Although these performance monitor counters are the best way to get a handle on
your current throughput and latency in your production environments, in this study
we’ll be focusing primarily on throughput. We’ll focus mainly on worker time, or how
long the distribution agent has to work to replicate a given set of commands. We’ll
focus on the Distribution Agent metrics, as the Log Reader is rarely the bottleneck in
a replication topology. Additionally the Log Reader Agent operates asynchronously
from the Distribution Agent; therefore, the Log Reader Agent can keep current with
reading the log, while the Distribution Agent can be experiencing high latencies. By
studying the output of the replication agents themselves, when you replay your work-
loads through them (or measure your workloads as they are replicated by the
agents), you can determine the optimal configuration of profile settings for your
workloads, and determine how to group articles into different publications for the
maximum throughput.
This chapter assumes that you have a good understanding of replication concepts.
Should you be unfamiliar with replication concepts, I advise you to study the section
Replication Administrator InfoCenter in Books Online, accessible online at http:
//
msdn.microsoft.com/en-us/library/ms151314(
SQL.90
).aspx.
Before we begin it is important to look at factors that are the performance kiss of
death to any replication solution. After we look at these factors and possible ways to
mitigate them, we’ll look at tuning the replication agents themselves for maximum
performance.
Performance kiss of death factors
in transactional replication
The following factors will adversely affect the throughput of any replication solution:


High-performance transactional replication
insert into tableName (Col1, Col2) values(1,2)
update tableName set Col1=1, Col2=2 where pk=1
delete from tableName where pk=1
Each singleton is wrapped in a transaction. Contrast this with the following batch
updates (the term update refers to any
DML
—an
insert
,
update
, or
delete
:
insert into tableName
select * from tableName1
update tableName set col1=1 where pk<=20
delete from tableName where pk<=20
In the insert statement the insert batch update will insert as many rows as there are in
tableName1
into
tableName
(as a transaction). Assuming there were 20 rows with a
pk
less than or equal to 20 in
tableName
, 20 rows would be affected by the batch update
and batch deletes.
If you use any transaction log analysis tool, you’ll see that the batch updates are
decomposed into singleton commands. The following

consistency (no rows affected), or some other event that causes the
DML
to fail (for
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
487
Performance kiss of death factors in transactional replication
example, the subscription database transaction log filling up). This can mean that a
lengthy period of time is required to apply large batch updates. While these batch
updates are being applied, the Distribution Agent will wrap them in a transaction so
that it can roll them back on errors, and Subscriber resources are consumed to hold
this transaction open. Latencies that were previously several seconds can quickly grow
to many minutes, and occasionally to hours (for large numbers of modified rows).
SQL
Server will get bogged down when replicating transactions that affect large num-
bers of rows—typically in the tens or hundreds of thousands of rows. Strategies for
improving performance in this regard are presented in the sections that follow.
REPLICATING THE EXECUTION OF STORED PROCEDURES
Replication involves doing your batch
DML
through a stored procedure and then rep-
licating the execution of the stored procedure. If you choose to replicate the execu-
tion of a stored procedure, every time you execute that stored procedure its name and
its parameters will be written to the log, and the Log Reader Agent will pick it up and
write it to the distribution database, where the Distribution Agent will pick it up and
apply it on the Subscriber. The performance improvements are due to two reasons:

Instead of 100,000 commands (for example) being replicated, only one stored
procedure statement would be replicated.


value of the last transaction identifier (
xact_seqno
) and then use
sp_setsub
scriptionxactseqno
to tell the subscription database that all the batch updates have
arrived on the Subscriber. Then restart your Distribution Agent, and the agent will
write only transactions that occurred after the batch update.
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
488
C
HAPTER
37
High-performance transactional replication
Take care to note any transactions that may be in the distribution database and
occurred after the batch update started and before it stopped. You’ll need to ensure
that these commands are also applied on the subscription database.
The problem with this approach is that the Log Reader Agent still has to process
all of the batch update commands that are written to the log. This approach will elim-
inate the lengthy time required for the Distribution Agent to apply the commands to
the Subscriber, but will not address the time that it takes for the Log Reader Agent to
read the batch commands from the log and write them to the distribution database.
MAXCMDSINTRAN
MaxCmdsInTran
is a Log Reader Agent parameter, which will break a large transaction
into small batches. For example, if you set this to 1,000 and do a batch insert of 10,000
rows, as the Log Reader Agent reads 1,000 commands in the log it will write them to
the distribution database, even before that batch insert has completed. This allows
them to be replicated to the Subscriber. If this batch insert was wrapped in a transac-

varbinary(max)
with
filestream
enabled,
var-
binary(max)
, and
XML
.
Like a batch update, when you replicate text, the constituent commands may be
spread over multiple rows in MSrepl_commands.
For example, this statement,
Insert into tableName (col1) values(replicate('x',8000)
Licensed to Kerri Ross <[email protected]>
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
489
Performance kiss of death factors in transactional replication
is spread over eight rows in MSrepl_commands. When the text is being replicated,
there is overhead not only when the command is read out of the Publisher’s log and
broken into eight commands in MSrepl_commands, but there is also overhead for the
Distribution Agent in assembling these eight commands into one insert statement to
apply on the Subscriber.
Unfortunately there is no easy way of getting around the overhead, other than
using vertical filtering to avoid replicating text columns. On the newsgroups I fre-
quently encounter misconceptions about the
max

text

repl

The y axis is worker time (ms), and the x axis is
OutputVerboseLevel
. Notice how
a setting for
HistoryVerboseLevel
of 0 and using the default for
OutputVerboseLevel
(1) will give you the best performance and replicate 20 percent faster than its nearest
competitor; 20 percent faster meant a total of 18,356 transactions per second.
The characteristics are completely different for 100 transactions of 100 singleton
inserts as displayed in figure 2.
The y axis is worker time, and the x axis is
OutputVerboseLevel
.
1400
1200
1000
800
600
400
200
0
1 2 3
HistoryVerboseLevel=2
HistoryVerboseLevel=1
HistoryVerboseLevel=0
Figure 1 The effect of
HistoryVerboseLevel
and
OutputVerboseLevel

Nhờ tải bản gốc

Tài liệu, ebook tham khảo khác

Tài liệu SQL Server MVP Deep Dives- P14 - Pdf 87

Tài liệu, ebook tham khảo khác

Học thêm