MySQL High Availability- P4 - Pdf 66

the timeout limit—and acknowledge that the transaction has been properly written
to disk.
You can use the rpl-semi-sync-master-wait-no-slave={ON|OFF} option to turn off
this behavior, in which case the master reverts to asynchronous replication if there
are no connected slaves.
Monitoring semisynchronous replication
Both plug-ins install a number of status variables that allow you to monitor semisyn-
chronous replication. We will cover the most interesting ones here—for a complete
list, consult the online reference manual for semisynchronous replication.
rpl_semi_sync_master_clients
This status variable reports the number of connected slaves that support and have
registered for semisynchronous replication.
rpl_semi_sync_master_status
The status of semisynchronous replication on the master is 1 if it is active, and 0 if
it is inactive—either because it has not been enabled or because it was enabled but
has reverted to asynchronous replication.
rpl_semi_sync_slave_status
The status of semisynchronous replication on the slave is 1 if active—that is, it has
been enabled and the I/O thread is running—and 0 if it is inactive.
You can read the values of these variables either using the SHOW STATUS command or
through the information schema table GLOBAL_STATUS. If you want to use the values for
other purposes, the SHOW STATUS command is hard to use and a query as shown in
Example 4-5 uses SELECT on the information schema to extract the value and store it in
a user-defined variable.
Example 4-5. Retrieving values using the information schema
master> SELECT Variable_value INTO @value
-> FROM INFORMATION_SCHEMA.GLOBAL_STATUS
-> WHERE Variable_name = 'Rpl_semi_sync_master_status';
Query OK, 1 row affected (0.00 sec)
Slave Promotion
The procedures described so far work well when you have a master running that you

Figure 4-8 shows a typical setup with a master and several slaves.
Figure 4-8. Promoting a slave to replace a failed master
128 | Chapter 4: Replication for High Availability
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
For the traditional method of slave promotion, the following are required:
• Each promotable slave must have a user account for the replication user.
• Each promotable slave should run with --log-bin, that is, with the binary log
enabled.
• Each promotable slave should run without the --log-slave-updates option (the
reason will become obvious shortly).
Assume you are starting with the original setup shown in Figure 4-8 and that the master
fails. You can promote a slave to be the new master by doing the following:
1. Stop the slave using STOP SLAVE.
2. Reset the slave that is going to be the new master using RESET MASTER. This will
ensure the slave starts as the new master and that any connecting slave will start
reading events from the time the slave was promoted.
3. Connect the other slaves to the new master using CHANGE MASTER TO. Since you reset
the new master, you can start replication from the beginning of the binary log, so
it is not necessary to provide any position to CHANGE MASTER TO.
Unfortunately, this approach is based on an assumption that is not generally true—
that the slaves have received all changes that the master has made. In a typical setup,
the slaves will lag behind the master to various degrees. It might be just a few transac-
tions, but nevertheless, they lag behind. In the next section you will see a solution to
that problem.
Regardless of that, this approach is so simple that it is useful if you can handle lost
transactions or if you are operating under a low load.
A revised method for promoting a slave
The traditional approach to promoting a slave is inadequate in most cases because
slaves usually lag behind the master. Figure 4-9 illustrates the typical situation when
the master disappears unexpectedly. The box labeled “binary log” in the center is the

and structure of the tags don’t matter; they just need to be uniquely identifiable no
matter who executed the transaction so each transaction on the master can be found
in the promoted slave’s binary log. We call this kind of tag the global transaction ID.
The easiest way to accomplish this is to insert a statement at the end of each transaction
that updates a special table and use that to keep track of where each slave is. Just before
Figure 4-9. Binary log positions of the master and the connected slaves
130 | Chapter 4: Replication for High Availability
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
committing each transaction, a statement updates the table with a number that is
unique for the transaction.
Tagging can be handled in two main ways:
• Extending the application code to perform the necessary statements
• Calling a stored procedure to perform each commit and writing the tag in the
procedure
Because the first approach is easier to follow, it will be demonstrated here. If you are
interested in the second approach, see “Stored Procedures to Commit Transac-
tions” on page 141.
To implement the global transaction ID, we have created the two tables in Exam-
ple 4-6: one table named Global_Trans_ID to generate sequence numbers and a
separate table named Last_Exec_Trans to record the global transaction ID.
The server ID is added to the definition of Last_Exec_Trans to distinguish transactions
committed on different servers. If, for example, the promoted slave fails before all the
slaves have managed to connect, it is very important to distinguish between the trans-
action ID of the original master and the transaction ID of the promoted slave. Other-
wise, the slaves that didn’t manage to connect to the promoted slave might start to
execute from a position that is wrong when being redirected to the second promoted
slave. This example uses MyISAM to define the counter table, but it is possible to use
InnoDB for this as well.
Example 4-6. Tables used for generating and tracking global transaction IDs
CREATE TABLE Global_Trans_ID (

you can remove its row from the transaction counter table to save space. This
optional step works only for a MyISAM table. If you use InnoDB, you have to be
careful about leaving the last used global transaction ID in the table. InnoDB de-
termines the next number from the maximum value in the autoincrement column
currently in the table.
master> DELETE FROM Global_Trans_ID WHERE number < 235;
Query OK, 1 row affected (0.00 sec)
4. Turn on the binary log:
master> SET SQL_LOG_BIN = 1;
Query OK, 0 rows affected (0.00 sec)
5. Update the Last_Exec_Trans tracking table with the server ID and the transaction
ID you got in step 2. This is the last step before committing the transaction through
a COMMIT:
master> UPDATE Last_Exec_Trans SET server_id = 0, trans_id = 235;
Query OK, 1 row affected (0.00 sec)
master> COMMIT;
Query OK, 0 rows affected (0.00 sec)
Each global transaction ID represents a point where replication can be resumed. There-
fore, you must carry out this procedure for every transaction. If it is not used for some
transaction, the transaction will not be tagged properly and it will not be possible to
start from that position.
Now, to promote a slave after the master is lost, find the slave that has the latest changes
of all the slaves—that is, has the largest binlog position—and promote it to master.
Then have each of the other slaves connect to it.
For a slave to connect to the promoted slave and start replication at the right position,
it is necessary to find out what position on the promoted slave has the last executed
transaction of the slave. Scan the binary log of the promoted slave to find the right
transaction ID.
132 | Chapter 4: Replication for High Availability
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.

the binary log on slave-3, which we’ll obtain in Example 4-7.
Procedures | 133
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Example 4-7. Master positions of slave-3, which will be promoted
slave-3> SHOW MASTER LOGS;
+--------------------+-----------+
| Log_name | File_size |
+--------------------+-----------+
| slave-3-bin.000001 | 3115 |
| slave-3-bin.000002 | 345217 |
| slave-3-bin.000003 | 24665 |
| slave-3-bin.000004 | 788243 |
| slave-3-bin.000005 | 1778 |
+--------------------+-----------+
5 row in set (0.00 sec)
The
important thing to know from the output of SHOW MASTER LOGS is the names of the
logs, so you can scan them for global transaction IDs. For instance, when reading the
slave-3-bin.000005 file using mysqlbinlog, part of the output will look like that shown
in Example 4-8. The transaction received by slave-3 starting at position 596 (highlighted
in the first line of the output) has the global transaction ID received by slave-1, as shown
by an UPDATE of the Last_Exec_Trans table.
Example 4-8. Output from the mysqlbinlog command for one transaction
# at 596
#091018 18:35:42 server id 1 end_log_pos 664 Query thread_id=952 ...
SET TIMESTAMP=1255883742/*!*/;
BEGIN
/*!*/;
# at 664
#091018 18:35:42 server id 1 end_log_pos 779 Query thread_id=952 ...

going backward
in this manner—locating each of the transactions that you recorded
in the first step in the procedure—you can connect the slaves one by one to the new
master at exactly the right position.
This technique works well if the update statement is added to every transaction commit.
Unfortunately, there are statements that perform an implicit commit before and after
the statement. Typical examples include CREATE TABLE, DROP TABLE, and ALTER TABLE.
Since these statements do an implicit commit, they cannot be tagged properly, hence
it is not possible to restart just after them. This means that if the sequence of statements
in Example 4-9 is executed and there is a crash, you will potentially have problems.
If a slave has just executed the CREATE TABLE and then loses the master, the last seen
global transaction ID is for the INSERT INTO—that is, just before the CREATE TABLE state-
ment. Therefore, the slave will try to reconnect to the promoted slave with the trans-
action ID of the INSERT INTO statement. Since it will find the position in the binary log
of the promoted slave, it will start by replicating the CREATE TABLE statement again,
causing the slave to stop with an error.
You can avoid these problems through careful use and design of statements; for ex-
ample, if CREATE TABLE is replaced with CREATE TABLE IF NOT EXISTS, the slave will notice
that the table already exists and skip execution of the statement.
Example 4-9. Statements where global transaction ID cannot be assigned
INSERT INTO message_board VALUES ('', 'Hello World!');
CREATE TABLE admin_table (a INT UNSIGNED);
INSERT INTO message_board VALUES ('', '');
Slave promotion in Python
You have now seen two techniques for promoting a slave: a traditional technique that
suffers from a loss of transactions on some slaves, and a more complex technique that
recovers all available transactions. The traditional technique is straightforward to im-
plement in Python, so let’s concentrate on the more complicated one. To handle slave
promotion this way, it is necessary to:
• Configure all the slaves correctly

self.__master = master
self.__user = repl_user
def _add_global_id_tables(self, master):
master.sql(_GLOBAL_TRANS_ID_DEF)
master.sql(_LAST_EXEC_TRANS_DEF)
if not master.sql("SELECT @@warning_count"):
master.sql("INSERT INTO Last_Exec_Trans() VALUES ()")
def _relay_events(self, server, config):
config.set('mysqld', 'log-slave-updates')
def imbue(self, server):
# Fetch and update the configuration
config = server.get_config()
self._set_server_id(server, config)
self._enable_binlog(server, config)
self._relay_event(server, config)
# Put the new configuration in place
server.stop()
server.put_config(config)
server.start()
136 | Chapter 4: Replication for High Availability
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
# Add tables to master
self._add_global_id_tables(self.__master)
server.repl_user = self.__master.repl_user
This
routine configures
the slaves and the master correctly for using global transaction
IDs. You still have to update the Last_Exec_Trans table when committing each trans-
action. In Example 4-11 you can see an example implementation in PHP for committing
transactions. The code is written using PHP, since this is part of the application code

$query = sprintf("SELECT user_id FROM user WHERE email = '%s'", $email);
$result = mysql_query($query, $link);
$row = mysql_fetch_row($result);
$user_id = $row[0];
$update_user = "UPDATE user SET messages = messages + 1 WHERE user_id = %d";
mysql_query(sprintf($update_user, $user_id), $link);
Procedures | 137
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
$insert_message = "INSERT INTO message VALUES (%d,'%s')";
mysql_query(sprintf($insert_message, $user_id, $message), $link);
commit_trans($link);
}
$conn = mysql_connect(":/var/run/mysqld/mysqld1.sock", "root");
add_message('', "MySQL Python Replicant rules!", $conn);
What
remains is
the job of handling the actual slave promotion in the event of a failure.
The procedure was already outlined above, but the implementation is more involved.
The first step is to fetch the binlog files remotely, similarly to the method used in
Chapter 2. In this case, we need to fetch the entire binlog file, since we do not know
where to start reading. The fetch_remote_binlog function in Example 4-12 returns an
iterator to the lines of the binary log.
Example 4-12. Fetching a remote binary log
def fetch_remote_binlog(server, binlog_file):
command = ["mysqlbinlog",
"--read-from-remote-server",
"--force",
"--host=%s" % (server.host),
"--user=%s" % (server.sql_user.name)]
if server.sql_user.passwd:

if event.find(delimiter + "\nBEGIN\n" + delimiter) >= 0:
in_transaction = True
elif not in_transaction:
yield group
group = []
else:
p = event.find("\nCOMMIT")
if p >= 0 and (event.startswith(delimiter, p+7)
or event.startswith(delimiter, p+8)):
yield group
group = []
in_transaction = False
Example
4-14 shows
a function named scan_logfile that scans the mysqlbinlog output
for the global transaction IDs that were introduced. The function accepts a master from
which to fetch the binlog file, the name of a binlog file to scan (the filename is the name
of the binary log on the master), and a callback function on_gid that will be called
whenever a global transaction ID is seen. The on_gid function will be called with the
global transaction ID (consisting of a server_id and a trans_id) and the binlog position
of the end of the transaction.
Example 4-14. Scanning a binlog file for global transaction IDs
_GIDCRE = re.compile(r"^UPDATE Last_Exec_Trans SET\s+"
r"server_id = (?P<server_id>\d+),\s+"
r"trans_id = (?P<trans_id>\d+)$", re.MULTILINE)
_HEADCRE = re.compile(r"#\d{6}\s+\d?\d:\d\d:\d\d\s+"
r"server id\s+(?P<sid>\d+)\s+"
r"end_log_pos\s+(?P<end_pos>\d+)\s+"
r"(?P<type>\w+)")
def scan_logfile(master, logfile, on_gid):

def promote_slave(slaves):
slave_info = {}
# Collect the global transaction ID of each slave
for slave in slaves:
slave.connect()
server_id, trans_id = fetch_global_trans_id(slave)
slave_info.setdefault(trans_id, []).append((server_id, trans_id, slave))
slave.disconnect()
# Pick the slave to promote by taking the slave with the highest
# global transaction id.
new_master = slave_info[max(slave_info)].pop()[2]
def maybe_change_master(server_id, trans_id, position):
from mysqlrep.utility import change_master
try:
for sid, tid, slave in slave_info[trans_id]:
if slave is not new_master:
change_master(slave, new_master, position)
except KeyError:
pass
# Read the the master logfiles of the new master.
new_master.connect()
logs = [row["Log_name"] for row in new_master.sql("SHOW MASTER LOGS")]
new_master.disconnect()
# Read the master logfiles one by one in reverse order, the
# latest binlog file first.
logs.reverse()
for log in logs:
scan_logfile(new_master, log, maybe_change_master)
Worth noting in the code is that the slaves are collected into a dictionary using the
transaction ID from the global transaction ID as a key. Since there can be several slaves

know table names and the intricacies of how to produce and manipulate the global
transaction ID. Once you understand them, the complexities are not as much of a
barrier as they might seem at first. Often, you can handle them with relative ease by
creating functions in the application code that the application writer can call without
having to know the details.
Another approach is to put the transaction commit logic in the database server by using
stored procedures. Depending on the situation, this can sometimes be a better alter-
native. For example, the commit procedure can be changed without having to change
the application code.
For this technique to work, it is necessary to put the transaction ID from the
Global_Trans_ID table and the server ID into either a user-defined variable or a local
variable in the stored routine. Depending on which approach you select, the query in
the binary log will look a little different.
Using local variables is less likely to interfere with surrounding code since user-defined
variables “leak” out from the stored procedure.
Procedures | 141
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
The procedure for committing a transaction will then be:
CREATE PROCEDURE commit_trans ()
SQL SECURITY DEFINER
BEGIN
DECLARE trans_id, server_id INT UNSIGNED;
SET SQL_LOG_BIN = 0;
INSERT INTO global_trans_id() values ();
SELECT LAST_INSERT_ID() INTO trans_id,
@@server_id INTO server_id;
SET SQL_LOG_BIN = 1;
INSERT INTO last_exec_trans(server_id, trans_id)
VALUES (server_id, trans_id);
COMMIT;

failure. The reasons for this will become clear in a moment.
Using a circular replication setup with three or more servers can be quite practical for
reasons of locality. As a real-life example, consider the case of a mobile phone operator
with subscribers all over Europe. Since the mobile phone owners travel around quite a
lot, it is convenient to have the registry for the customers close to the actual phone, so
142 | Chapter 4: Replication for High Availability
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
by placing the data centers at some strategic places in Europe, it is possible to quickly
verify call data and also register new calls locally. The changes can then be replicated
to all the servers in the ring, and eventually all servers will have accurate billing infor-
mation. In this case, circular replication is a perfect setup: all subscriber data is repli-
cated to all sites, and updates of data are allowed in all data centers.
Setting up circular replication (as shown in Figure 4-10) is quite easy. Example 4-16
provides a script that sets up circular replication automatically, so where are the com-
plications? As in every setup, you should ask yourself, “What happens when something
goes wrong?”
Figure 4-10. Circular replication setup
Example 4-16. Setting up circular replication
def circular_replication(server_list):
count = len(server_list)
for i in range(0, count):
change_master(server_list[(i+1) % count], server_list[i])
In Figure
4-10, there are four servers named for the cities in which they are located (the
names are arbitrarily picked and do not reflect a real setup). Replication goes in a circle:
“Stockholm” to “Moscow” to “Paris” to “London” and back to “Stockholm.” This
means that “Moscow” is upstream of “Paris,” but downstream of “Stockholm.” Sup-
pose that “Moscow” goes down suddenly and unexpectedly. To allow replication to
continue, it is necessary to reconnect the “downstream” server “Paris” to the “up-
stream” server “Stockholm” to ensure the continuing operation of the system.

using the following command:
paris> CHANGE MASTER TO
-> MASTER_HOST='stockholm.example.com',
-> IGNORE_SERVER_IDS = (2);
For versions of MySQL earlier than version 5.5, there is no such support and you may
have to devise some other means of removing the offending events. The easiest method
Figure 4-11. Changing topology in response to a failing server
144 | Chapter 4: Replication for High Availability
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
is probably to bring in a server temporarily with the same ID as the crashed server for
the sole purpose of removing the offending event.
The complete procedure to shrink the ring in a circular setup—assuming that you are
using MySQL 5.5—is as follows:
1. Determine the global transaction IDs of the last committed transactions on the
downstream server for all servers that are still up and running.
paris> SELECT Server_ID, Trans_ID FROM Last_Exec_Trans WHERE Server_ID != 2;
+-----------+----------+
| Server_ID | Trans_ID |
+-----------+----------+
| 1 | 5768 |
| 3 | 4563 |
| 4 | 768 |
+-----------+----------+
3 rows in set (0.00 sec)
2. Scan the binary log of the upstream server for the last of the global transaction IDs
seen in Last_Exec_Trans.
3. Connect the downstream server to this position using CHANGE MASTER.
paris> CHANGE MASTER TO
-> MASTER_HOST='stockholm.example.com',
-> IGNORE_SERVER_IDS = (2);

| 985761 |
+-------------------------------------------------+
1 row in set (156.32 sec)
7. Determine
the position
of the event on the restored server by scanning the binary
log of the restored server for the global ID that was last seen by the downstream
server.
8. Connect the downstream server to the restored server and start replication.
paris> CHANGE MASTER TO
-> MASTER_HOST='moscow.example.com',
-> MASTER_LOG_FILE='moscow-bin.000107',
-> MASTER_LOG_POS=196758,
Query OK, 0 rows affected (0.18 sec)
moscow> START SLAVE;
Query OK, 0 rows affected (0.00 sec)
Conclusion
High availability is a nontrivial concept to implement in practice. In this chapter, we
presented a look into high availability and how you can achieve it with MySQL. In the
next chapter, we will look more at high availability as we examine a companion topic:
scaling out.
Joel’s email notification chime sounded. He clicked on his email and opened the latest
message. It was
from Mr. Summerson, who made comments about his report. He read
it through and at the bottom found what he expected. It read, “I like the redundancy
ideas and especially the hot standby strategy. Make this happen.”
Joel sighed as he realized his plans for getting to know some of his coworkers were
going to have to wait. He had a lot of work to do.
146 | Chapter 4: Replication for High Availability
Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.


Nhờ tải bản gốc
Music ♫

Copyright: Tài liệu đại học © DMCA.com Protection Status