November 2012 – Scalemysql

Replication has a multitude of uses and a very effective tool to solve a number of problems, however, it must be used with some forethought, particularly if you are using it to scale OLTP queries, as of course the drawbacks of it’s single-threaded, asynchronous nature can cause all sort of issues.

Instead of a multi-slave replication setup, We recently applied it to a sharded implementation. There are a few fundamental choices regarding sharding, one of which is what to do with the common tables that are not sharded. You can have one copy either on one or a separate node, or replicate these tables to all shards. Factors include the network traffic, the queries (joins across tables), and the write frequency. Considering these factors, we chose to replicate to all nodes, as we wanted to minimize code changes and minimize any additional network traffic and connection overhead going to an additional node on reads (as we run a high volume of simple queries).

Then the question becomes how you replicate, as you only want the common tables to be replicated. Especially when starting your sharding with a few shards, you may be tempted to leave the common tables loose and/or in another db on one of the shards and either use ‘replicate-do-db’ or ‘replicate-do-table’ on the slave side, or ‘binlog-do-db’ on the server side (the db options would of course require these tables in a separate db). Using binlog-do-db means you only write the to the binlog or relay log for the tables in question, while replicate-do-db/table means only when the SQL is replayed on the slave do non-needed tables get ignored. In our case this difference in traffic would be 100:1, the load of a full write to the slave would be significant. However, the major drawback is you no longer are able to use the binlog on the master for point in time recovery which is a deal breaker for us. By leaving the tables in a common instance, you are forcing yourself to make this choice between less replication traffic vs. losing point-in-time backup coverage.

The alternative is to put these common tables in a separate mysql instance, (now the instance can be on the same server). The data is written to this instance, and then this instance containing ‘common tables’ db is replicated to all shards as that db, with the advantages of a separate binlog for this instance, which in our case was modest traffic while still keeping the original binlogs on the shards intact. This is also more in line with scaling the shards, going from 3 -> 20 shards will be transparent, as the common db can simply be moved to it’s own hardware when needed, with lots of room to grow on the replication end of things as well.

For any application needing high availability/scalability on the database, you’ll probably look at a load balancing solution such as HAproxy. The typical description of the architecture boils down to a situation where your app servers are connecting to (usually) more than one Haproxy server on separate hardware, with a keep alive script between them for failover purposes. The active Haproxy serves loadbalanced requests to the pool of databases, either in a failover or balanced scenario.

There is an alternative method to this set up. You can simply add an Haproxy server directly on the app servers (6 app servers, 6 haproxy servers), with no keep alive script. If an app server goes down, the haproxy goes down with it, which is just fine, as each app server has it’s own dedicated Haproxy. Each one also serves a pool of databases either in a failover or balanced scenario.

There are not many situations I can say one option is clearly better than the other, however, the for life of me, I can’t see the tradeoff of why you wouldn’t do the latter setup. One less vertical hardware layer, no need for haproxy failover, no bottleneck and/or complicated haproxy loading logic at haproxy especially as more app and dbs are added. Sure there are many haproxies in play, but they all read from the same dbs and act based on exactly the same test from these dbs, which you should certainly be doing above and beyond a connection test.

Month: November 2012

using replication wisely

HA database pooling