Galera gotchas

We’ve recently implemented Galera clustering and have been pleased with the relatively easy install and implementation. A quick description of Galera: it joins individual MySQL databases as nodes to create a cluster that features multi-threaded synchronous replication, which allows for true high availability while still using your original database and engine (InnoDB); likewise, you can quickly break the cluster back down into individual servers if need be. Our specific setup includes two database nodes and an arbitrator to avoid ‘split-brain’ in case of a node failure. The process of adding a node simply involves connecting to the group; the data transfer and sync are handled automatically via SST (state snapshot transfer), with the typical overhead associated with the familiar backup methods: mysqldump, rsync and, more recently, xtrabackup. With xtrabackup you have a truly non-blocking method to add and re-add nodes. Recent improvements also include the addition of IST (incremental state transfer), which allows you to disconnect a node, do work, reconnect, and quickly catch up on the transactions that node missed.
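
For reference, a minimal sketch of the wsrep settings involved (the provider path, cluster name and SST credentials below are placeholders, not our actual values):

wsrep_provider     = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name = my_cluster
wsrep_sst_method   = xtrabackup       # non-blocking donor, as described above
wsrep_sst_auth     = sstuser:sstpass  # MySQL account xtrabackup uses on the donor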

As mentioned, the install and implementation have been quite smooth; however, here are a few things to keep in mind:

1. During installation, when you run into errors, it’s important to analyze both the joiner AND the donor error logs, as they will of course contain different messages. We ran into what ended up being an xtrabackup issue, which was misleading in the joiner logs but clear as day in the donor logs.

2. On initial installation, you’ll be told to set wsrep_cluster_address=gcomm:// for your first node, as there is nothing to join to yet. However, DON’T keep this in your my.cnf: on restart you’ll end up creating a new cluster (again) rather than joining the one you’ve already made. Change it to specify the IP of one of the other nodes.
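
For example (the IPs below are placeholders), the my.cnf line might look roughly like this once the cluster exists:

# only when bootstrapping the very first node:
# wsrep_cluster_address = gcomm://
# on every restart after that, point at one or more existing nodes:
wsrep_cluster_address = gcomm://192.168.0.11,192.168.0.12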

3. Similar to standard replication setups, Galera offsets auto-increment values by the number of nodes (it manages auto_increment_increment and auto_increment_offset for you). This is automatic; however, keep it in mind with regard to large tables and datatype limits.
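
You can see what Galera has chosen on each node (illustrative output for a two-node cluster; Galera handles this via wsrep_auto_increment_control):

SHOW VARIABLES LIKE 'auto_increment%';
-- auto_increment_increment = 2   (the number of nodes)
-- auto_increment_offset    = 1   (differs on each node)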

4. You may be surprised to learn that, without some care, some fundamental operations will lock the entire cluster for the duration of the operation. Here are two examples, and these are by no means the only statements that can be long-running and cause grief:

  • An ALTER TABLE locks the entire cluster, even on a table that is not in use
  • A LOAD DATA INFILE also locks the entire cluster, even on a table that is not in use

The first is partially due to how Galera handles DDL statements: total order isolation (TOI) is the default. More info can be found here:

http://www.codership.com/wiki/doku.php?id=rolling_schema_upgrade

The fact that it affects all tables and databases is a bug / feature; more details here:

https://bugs.launchpad.net/galera/+bug/928919

I assume the LOAD DATA INFILE lock is due to the synchronous requirements of the cluster waiting to receive confirmation of the commit on the second node.

You have a couple of options to avoid a cluster-wide lock:

a) For the ALTER TABLE scenario, as detailed at the link above, you can use a DDL handling method known as rolling schema upgrade (RSU), which automatically detaches the node for the duration of the operation, then resynchronizes when finished (see the sketch after option c below).

b) For both LOAD DATA INFILE and ALTER TABLE, you can do a more manual version of this by simply disconnecting each node, performing the operation, and reconnecting.

c) A third option is to tell the node to apply the statement locally only:

SET wsrep_on=0;  -- stop replicating this session's statements to the cluster
-- run your statement here (ALTER TABLE, LOAD DATA INFILE, etc.)
SET wsrep_on=1;  -- resume replication for this session
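
For option a), a minimal sketch of RSU in action (the table and column names are made up for illustration):

SET GLOBAL wsrep_OSU_method = 'RSU';  -- this node now desyncs for DDL instead of using TOI
ALTER TABLE mytable ADD COLUMN notes VARCHAR(255) NULL;
SET GLOBAL wsrep_OSU_method = 'TOI';  -- back to the default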

All of these methods have to be performed on every node, and particularly with ALTER, you’ll need to consider whether temporarily inconsistent schemas across the nodes will cause replication problems.

5. As noted in the limitations section,

http://www.codership.com/wiki/doku.php?id=limitations

only InnoDB is supported (MyISAM support is only starting to become experimental). When dealing with the mysql permissions database (whose tables are MyISAM), use GRANT and CREATE statements instead of INSERT if you want the commands to replicate.
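
For example (the user, host and db names are made up), the first statement reaches the whole cluster while the second stays on the local node:

GRANT SELECT, INSERT ON mydb.* TO 'appuser'@'%' IDENTIFIED BY 'secret';  -- replicates (handled like DDL)
-- INSERT INTO mysql.user ... ;  -- does NOT replicate (DML against a MyISAM table)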

6. A caveat and further explanation to the last point: for tables other than InnoDB, DML statements are not replicated, but DDL statements are, because DDL is passed on using a different method than DML. This difference can cause some confusion. On our two nodes, we were mystified as to why DDL statements replicated in both directions (node1 <–> node2), yet DML statements replicated in only one direction (node1 –> node2). Since the DDL was replicating both ways, we ruled out a configuration restriction, which was a mistake: the eventual cause was a replicate-wild-do-table= entry in my.cnf for a particular db, while we were using the ‘test’ db for the test. The setting prevented DML replication, yet Galera still replicated the DDL (CREATE and ALTER) of the table in the ‘test’ db.
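
The offending kind of entry looked roughly like this in my.cnf (the db name here is a placeholder); it filtered the DML applied from the other node, yet the DDL still came through:

replicate-wild-do-table = mydb.%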

7. It may take some time for a node to become fully operational. For instance, you might not be able to log in at first, or you log in, issue a command, and ‘command not found’ is returned. Just give it a little time and/or check the error logs to confirm it’s on track.
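
One quick way to confirm a node is on track is to check the wsrep status variables once you can log in:

SHOW STATUS LIKE 'wsrep_ready';                -- should be ON
SHOW STATUS LIKE 'wsrep_local_state_comment';  -- should reach 'Synced'
SHOW STATUS LIKE 'wsrep_cluster_size';         -- should match your node count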

One Reply to “Galera gotchas”

  1. For paragraph 4) I would add a d) point:

    We are currently using the following strategy so that our DDL does not run into the Galera limitations; it has worked correctly and has never caused Galera to hang so far:

    CREATE TABLE reservations2 LIKE reservations;          -- empty copy with the same structure
    LOCK TABLES reservations WRITE, reservations2 WRITE;
    INSERT INTO reservations2 SELECT * FROM reservations;  -- stash the existing rows
    TRUNCATE TABLE reservations;
    -- the real DDL, now running against an empty table
    ALTER TABLE `reservations` CHANGE COLUMN `cancellerid` `cancellerid` VARCHAR(100) NULL DEFAULT NULL;

    INSERT INTO reservations SELECT * FROM reservations2;  -- restore the rows
    UNLOCK TABLES;
    DROP TABLE reservations2;
