Providing a suitable High Availability (HA) solution for a database system is one of the more challenging tasks for a DBA, and it starts with answering questions like the following:

  1. What is the cost of that HA solution?
  2. Does it require changing the system structure or the DB design?
  3. Is it complicated to understand, use or maintain?

Choosing the suitable HA solution for each system will depend on the answers to such questions …
In this post, I’m going to write about MySQL Master High Availability (MHA), a nice tool which provides HA for MySQL by performing automatic fail-over or fast online master switching with almost no downtime!

Before going through more details about MHA, let’s first answer the previous questions:

  1. MHA is a free, open-source tool; there is no cost to use it in your system and no additional servers are required.
  2. No system changes or DB redesign are required to use MHA; keep your system as it is and just use the tool! (This assumes MySQL replication is already being used.)
  3. MHA is easy to implement and its usage is simple.

Overview

As I mentioned above, MHA provides two main benefits:

  • Automatic fail-over: helpful for an unexpected master failure, crash, etc. (the fail-over can also be performed manually).
  • Fast online master switching: helpful when performing regular planned or scheduled maintenance operations like kernel updates, MySQL upgrades, HW upgrades, etc.

If we have 3 servers, a master and two slaves (slave1 and slave2), and we want to perform planned maintenance on the master, we would need to do the following:

  1. Make sure there is no slave lag and that all servers are at the same point.
  2. Promote one of the slaves (slave1) to be a new master.
  3. Point the other slave (slave2) to slave1 instead of master.
  4. Configure master to be a slave of slave1 so it gets all updates that happen during the maintenance.
  5. Stop the MySQL service on master to perform the maintenance operation (kernel update with a server restart, MySQL version upgrade, etc.).
  6. Start the MySQL service on master and wait until it has received all missing transactions from slave1.
  7. Promote master back to being the current master.
  8. Re-point slave2 to master again.
  9. Configure slave1 to be a slave of master again.

It’s not as simple as it appears, especially if classic replication is being used, because we will need to find the matching binary log file name and position between the old master and the new one to use on the slave(s) when re-pointing them to the new master!
Also, it makes sense to use a VIP on top of the DB layer, to be used by the application, to avoid extra work on the application side – or maybe downtime – during the fail-over process. So we will need extra steps to remove/add the VIP from/to the demoted/promoted master!
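For illustration, here is roughly what the re-pointing and VIP steps look like when done by hand with classic replication. This is a simplified sketch only: it assumes all servers are already at the same point (step 1) and reuses the example hostnames, the eth1 interface, the 192.168.1.100 VIP and the repl/s3cret replication credentials used later in this post; the binary log file and position are placeholder values:

    # On slave1 (the new master): note its current binary log coordinates
    mysql -e "SHOW MASTER STATUS\G"    # e.g. File: mysql-bin.000123, Position: 4567

    # On slave2: re-point replication to slave1 using those coordinates
    mysql -e "STOP SLAVE;
              CHANGE MASTER TO MASTER_HOST='slave1', MASTER_USER='repl', MASTER_PASSWORD='s3cret',
              MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4567;
              START SLAVE;"

    # Move the VIP from the old master to the new one and refresh the ARP tables
    ssh root@master "ip addr del 192.168.1.100/24 dev eth1"
    ssh root@slave1 "ip addr add 192.168.1.100/24 dev eth1 && arping -U 192.168.1.100 -I eth1 -c 1"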


MHA can make such a process pretty simple and easy!!

Implementation of MHA (version 0.56 at the time of writing this post)

Installation:

MHA consists of two packages, node and manager:

  1. Node package: Should be installed on all DB servers (the master and the two slaves in this example).
  2. Manager package: Recommended to be installed on a separate server, but it’s OK to install it on any one of them (preferably not the master, in case the master fails; e.g. slave1).

Note:

  • MHA packages can be downloaded from here.
  • Some Perl dependencies are required: DBD::mysql on all servers, while Config::Tiny, Log::Dispatch, Parallel::ForkManager and Time::HiRes are additionally required on the manager.
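As a rough illustration only, on an RPM-based system with EPEL enabled the installation could look like the following; the exact package file names depend on the MHA version and distribution you download:

    # On all DB servers: Perl DBD::mysql plus the node package
    yum install -y perl-DBD-MySQL
    rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

    # On the manager (slave1 here): the remaining Perl modules plus node and manager packages
    yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
    rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
    rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm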

Scripts preparation (on the manager):

  1. Create an application config file (check /sample/conf):

    vi /etc/mha/app1.cnf

    [server default]
    # mysql user and password for MHA (should be created on all DB servers)
    user=mhauser
    password=M0llP@ss
    # Replication user password
    repl_password=s3cret
    ssh_user=root
    # working directory on the manager
    manager_workdir=/var/log/masterha/app1
    # working directory on MySQL servers
    remote_workdir=/var/log/masterha/app1
    #In case of online master switching, the following script will be used
    master_ip_online_change_script=/etc/mha/master_ip_online_change
    #In case of automatic fail-over, the following script will be used
    master_ip_failover_script=/etc/mha/master_ip_failover

    [server1]
    hostname=master

    [server2]
    hostname=slave1
    #Higher priority to become the new master (not guaranteed if it is lagging too much)
    candidate_master=1

    [server3]
    hostname=slave2
    #if this server should never be the master (diff data center, weak HW, … etc.)
    no_master=1

    Notes:

    • No need to specify which server is the master as MHA detects it automatically.
    • The candidate_master and no_master options are needed only for fail-over; for online master switching we specify the new master host explicitly.
  2. Copy the scripts “master_ip_failover” (executed when performing the fail-over) and “master_ip_online_change” (executed when performing the online master switching) from /sample/scripts to “/etc/mha/” and make sure they are executable (you can copy only the one you plan to use; a short copy/chmod sketch follows this list).
  3. Add the VIP handling part in both scripts:

    my $vip = '192.168.1.100'; # Virtual IP
    my $ssh_add_vip = "/sbin/ip addr add $vip/24 dev eth1";
    my $ssh_del_vip = "/sbin/ip addr del $vip/24 dev eth1";
    my $ssh_send_garp = "/sbin/arping -U $vip -I eth1 -c 1";
    .
    .
    sub main {
        if ( $command eq "stop" ) {
            .
            .
            ## Removing the VIP
            print "Disabling the VIP on the old master if the server is still UP: $orig_master_host \n";
            &del_vip();
            print "DONE ... \n";
            .
            .
        }
        elsif ( $command eq "start" ) {
            .
            .
            ## Adding the VIP
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &add_vip();
            print "DONE ... \n";
            .
            .
        }
    }
    # A simple function that enables the VIP on the new master (called when starting the new master)
    sub add_vip() {
        `ssh $new_master_ssh_user\@$new_master_host \" $ssh_add_vip \"`;
        `ssh $new_master_ssh_user\@$new_master_host \" $ssh_send_garp \"`;
    }
    # A simple function that disables the VIP on the old master (called when stopping the old master)
    sub del_vip() {
        `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_del_vip \"`;
    }
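A quick sketch of step 2 above, assuming the manager package sources were extracted under /usr/local/src/mha4mysql-manager (the samples directory location depends on how you obtained the packages):

    mkdir -p /etc/mha
    cp /usr/local/src/mha4mysql-manager/samples/scripts/master_ip_failover      /etc/mha/
    cp /usr/local/src/mha4mysql-manager/samples/scripts/master_ip_online_change /etc/mha/
    chmod +x /etc/mha/master_ip_failover /etc/mha/master_ip_online_change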

Environment preparation:

  1. Create MySQL user on all DB servers to be used by MHA:
    GRANT ALL ON *.* TO 'mhauser'@'%' IDENTIFIED BY 'M0llP@ss';
  2. Create the replication user on the candidate-master server (slave1 in this example):
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 's3cret';
  3. Generate ssh key on all servers if it was not already generated:
    ssh-keygen -t rsa
  4. Allow connections between all servers using the ssh keys without prompting for passwords (passphraseless login). On the master:
    root@master:/# ssh-copy-id root@slave1
    root@master:/# ssh-copy-id root@slave2
    root@master:/# ssh-copy-id root@master ## Self referential is necessary!!
    And do the same on slave1 and slave2 …
  5. Check from the manager (“slave1”) if SSH settings are correct or not:
    masterha_check_ssh --conf=/etc/mha/app1.cnf
    The message “All SSH connection tests passed successfully” should be printed at the end; otherwise, whatever is wrong should be fixed first.
  6. Check if the replication health is OK or not (from the manager also):
    masterha_check_repl --conf=/etc/mha/app1.cnf
    The message “MySQL Replication Health is OK” should be printed at the end; otherwise, whatever is wrong should be fixed first.
  7. Add the virtual IP on the master to be used by the application (if it was not already added):
    ip addr add 192.168.1.100/24 dev eth1
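A quick way to double-check that the VIP is really up on the master, using the same interface and address as above:

    # On the master: the VIP should show up as an additional address on eth1
    ip addr show dev eth1 | grep 192.168.1.100

    # From another server in the same subnet: the VIP should answer ARP requests
    arping -c 3 -I eth1 192.168.1.100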

Now we are ready to perform the fail-over 😉

Perform the fail-over:

Automatic fail-over:

  • To perform the fail-over automatically, the manager should monitor the master. Use the following command to do so:
    slave1# nohup masterha_manager --conf=/etc/mha/app1.cnf > /var/log/masterha/app1/app1.log 2>&1 &
  • To check if it is monitoring or not, execute the following command:
    slave1# masterha_check_status --conf=/etc/mha/app1.cnf

Once the manager fails to connect to the master three times (you might need to configure secondary_check_script to make sure it is not just a network issue), it will fail over to a new master automatically and stop monitoring (see the configuration sketch after the notes below).

Notes:

  • shutdown_script should be used to power-off the master to make sure it is really down and to avoid split-brain.
  • If you want to stop monitoring the master for any reason, use the following command:
    masterha_stop --conf=/etc/mha/app1.cnf
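For reference, both of those scripts are set in the [server default] section of app1.cnf. A hedged example using the masterha_secondary_check helper that ships with the manager package; the two remote hosts and the power-off script path are placeholders for whatever fits your environment, not something this post prescribes:

    # Double-check the master through other hosts before declaring it dead
    secondary_check_script=masterha_secondary_check -s remote_host1 -s remote_host2
    # Power off the dead master (STONITH) to avoid split-brain
    shutdown_script=/etc/mha/power_manager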

Manual fail-over:

Sometimes automatic fail-over is not preferred, so we can perform it manually once the master has failed, using the following command (detecting the master failure is our responsibility!):

masterha_master_switch --master_state=dead --conf=/etc/mha/app1.cnf --dead_master_host=master

Online master switching (planned maintenance):

To perform online master switching for maintenance purposes, just use the following command:
masterha_master_switch --master_state=alive --conf=/etc/mha/app1.cnf --new_master_host=slave1 --orig_master_is_new_slave
Notes:

  • It is recommended to double-check the SSH connectivity and replication health directly before switching the master, using the masterha_check_ssh and masterha_check_repl scripts.
  • MHA will execute “FLUSH TABLES WITH READ LOCK;” after freezing the writes on the master.
  • Using the --orig_master_is_new_slave option will make the old master a slave of the new master after the switch.
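After the switch completes, it is worth verifying that the roles really changed. A minimal check, assuming the same hostnames as in this example and that the MySQL client credentials are already configured:

    # The old master should now be replicating from slave1
    mysql -h master -e "SHOW SLAVE STATUS\G" | grep -E 'Master_Host|Slave_IO_Running|Slave_SQL_Running'

    # The VIP should have moved to the new master
    ssh root@slave1 "ip addr show dev eth1 | grep 192.168.1.100"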

Common issues and useful tips:

SSH checking is not successful!!

  • Are the hostnames with their IPs added to /etc/hosts?
  • Is remote root login permitted? Check for “PermitRootLogin yes” in /etc/ssh/sshd_config (sshd should be restarted after that change)!
  • Don’t forget the self-referential key copy!!

Replication checking is not successful!!

  • Is binary logging enabled on the candidate-master slave? It has to be!
  • Is binary logging enabled on any slave where the replication user was not created? MHA considers any slave with binary logging enabled as a candidate master during the replication check, so either disable binary logging on all slaves except the candidate master or create the replication user on every slave that has it enabled (see the sketch after this list).
  • Are the replication filtering rules different on the slaves? They should be the same on all slave servers!
  • Is the binlog path different? Consider adding master_binlog_dir to app1.cnf!
  • The same applies to the pid file path, MySQL port, MySQL client bin dir, etc. if they differ between servers (the full parameter list can be checked out here).
  • Master/master in an active/active setup? One of them should be read_only, as only one active master is allowed (“active/passive”).
  • Does the MySQL user for MHA have enough privileges?
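For the binary-logging point above, a hedged sketch of the two possible fixes on a slave (slave2 here) that has binary logging enabled but is not meant to be a candidate master:

    # Option 1: create the replication user there as well so the check passes
    mysql -h slave2 -e "GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 's3cret';"

    # Option 2: disable binary logging on that slave (remove/comment log_bin in my.cnf, then restart MySQL)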

Slaves can’t connect to the new master!

Does the replication user exist on the candidate master?

Moving the VIP to the new master is not reflected immediately on servers in other networks!!

A gratuitous ARP needs to be sent after adding the VIP on the new master to update the ARP tables:
arping -U $vip -I eth1 -c 1

One manager can manage many systems!

E.g. app1, app2, etc.; it is recommended to have one manager per data center.
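In practice that just means one config file and one manager process per application. A sketch assuming a second, hypothetical application app2 configured the same way as app1:

    # Each application has its own config file and its own manager process
    nohup masterha_manager --conf=/etc/mha/app1.cnf > /var/log/masterha/app1/app1.log 2>&1 &
    nohup masterha_manager --conf=/etc/mha/app2.cnf > /var/log/masterha/app2/app2.log 2>&1 &

    # Monitoring status is also checked per application
    masterha_check_status --conf=/etc/mha/app1.cnf
    masterha_check_status --conf=/etc/mha/app2.cnf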

Examples of who uses MHA in production:

Conclusion

  • MHA implementation in short:
    1. Install node package on all MySQL servers.
    2. Install manager package on the manager.
    3. Create the needed scripts (app1.cnf, master_ip_failover, etc.).
    4. masterha_check_ssh --conf=/etc/mha/app1.cnf
    5. masterha_check_repl --conf=/etc/mha/app1.cnf
    6. masterha_manager for fail-over or masterha_master_switch for master switch.
    7. That’s it !!!
  • Making your system highly available with no additional costs, no system or DB redesign and no extra complexity is possible with the open-source tool MHA!
  • Finally, thanks to Yoshinori Matsunobu for creating such a wonderful tool!!