warehousing and data analytics in a mysql world

MySQL in its early days had a reputation for being simple and fast but lacking critical features for many use cases. Over the years it has become more stable and mature, with a ton of advancements in the InnoDB storage engine and features such as stored code, replication improvements, query planner improvements, and so on. One area MySQL has not tackled is BI and analytic OLAP. I can only assume it has not been a priority with the user base to date. Oracle and SQL Server cover a different market and have these options; however, in this new era of big data, a wider array of businesses want to collect and analyze data such as performance statistics to improve their business offering. Fortunately, a few 3rd party vendors have taken on this area with the spirit of open source and/or a tiered pay structure in mind, as you'll find it can be very costly otherwise.

In our search to analyze the ever-increasing volume of statistical data we were collecting, we considered the following products:

  • Infobright
  • InfiniDB
  • LucidDB
  • MonetDB
  • FastBit
  • Metakit
  • Greenplum
  • Hive

Almost all of these solutions implement a columnar architecture in one way or another: essentially, rows become columns and columns become rows. This is done to take advantage of the layout of the data on disk and allow sequential reads (vs. random reads) of each long row (which was originally a column), which is what happens much of the time in OLAP-type queries aggregating over a dataset much larger than memory.
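
To get a feel for why that layout matters, consider the kind of query an analytics workload runs all day. A minimal sketch, with a hypothetical table and column names: a column store only has to scan the few columns the query references, reading each one sequentially, rather than dragging every full row off disk.

    -- illustrative OLAP-style aggregate: only the four referenced columns need to be read
    SELECT metric_date,
           server_id,
           SUM(request_count)    AS total_requests,
           AVG(response_time_ms) AS avg_response_ms
    FROM   performance_stats
    WHERE  metric_date BETWEEN '2012-01-01' AND '2012-03-31'
    GROUP  BY metric_date, server_id;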

Back to the products: a number of these we rejected for one reason or another, generally because they were either developed more for the scientific community (Metakit, FastBit), geared towards another open source database such as PostgreSQL (Greenplum), or focused on distributed systems (Hive), which is a bit of overkill for us at this point.

So we were left with:

  • Infobright
  • InfiniDB
  • LucidDB
  • MonetDB

Infobright and InfiniDB each have a free and a paid version (the free versions both have limitations); LucidDB and MonetDB are open source.

Our priorities, in order, were as follows:

  • high compression
  • ease of use (ideally a MySQL client and connection)
  • full DDL and DML statement use

There is a very good benchmark comparison done by Percona on these 4 databases: http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/

Keep in mind it's a couple of years old and run against a specific set of data, so it's definitely worth testing yourself for your particular needs.

So why do we care about compression? The pros are two-fold. First, significant compression obviously saves space, one of the premiums of an OLAP system, and lets you put off sharding (a distributed system) that much longer. Typical compression on the full products ranges from a very impressive 10:1 to 40:1, so at 10:1 you can essentially fit 10TB of data on a typical 1TB disk. Second, the compression means the disk arm only travels a fraction of the distance on a sequential read, making the operation faster.

Right off the bat, InfiniDB offers only a very modest 22% compression reduction, which is the major limitation of the free version (paid, you get up to 40:1 compression). With MonetDB it's not clear there is any meaningful compression at all; the tests match their non-committal statements about the product's compression.

Infobright and LucidDB both offer in the neighborhood of 10:1 compression, very nice for a free product.

Note: if we were in the market for a paid solution, we would take a hard look at InfiniDB. Along with the 40:1 compression, it is a MySQL build with their proprietary storage engine incorporated, making it an easy transition from stock MySQL.

So we are down to,

  • Infobright
  • LucidDB

The two have very different remaining pros and cons. Infobright is also a MySQL build with a proprietary storage engine incorporated; the drawback, however, is the limitations placed on the free version, namely that most of the DDL and DML statement functionality has been stripped away. That means you cannot insert, update, delete, truncate, or alter a table. At first glance that may sound insane, but keep in mind that an analytics environment in many cases needs primarily two things: to load data and to select data, quickly, both of which it does very well, MySQL style. If you do wish to change the table structure or the data, you'll need to fully reload the table, and that restriction may be tough for an environment needing to serve customers. Our requirements, however, are only a daily load appending to the existing data and internal analysis, and we rarely change table structure on these particular tables, so we are willing to deal with a dump and reload of a table occasionally if need be.
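
In practice, then, the daily workflow on the free edition boils down to bulk loads and selects. A minimal sketch, assuming LOAD DATA INFILE as the load path and hypothetical table and file names (adjust the delimiters to whatever your export produces):

    -- append the day's export; there is no UPDATE/DELETE to fall back on
    LOAD DATA INFILE '/data/exports/stats_2012-06-01.csv'
    INTO TABLE performance_stats
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n';

    -- reads work as usual, regular MySQL client and SQL
    SELECT server_id, SUM(request_count) AS total_requests
    FROM   performance_stats
    WHERE  metric_date = '2012-06-01'
    GROUP  BY server_id;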

LucidDB is a completely different animal, requiring Java to run and using its own SQL variant and data loading methods (for instance, there is no datetime datatype; your only choice is varchar if you want to keep the data intact). It does have full DDL and DML statement functionality and no other obvious restrictions. With some learning curve we could certainly use this product, but when considering the people who would use this database, such as BAs and developers, it made sense to choose a product that is as consistent as possible with what we are currently using.

So Infobright was the choice, with some workarounds to make it usable. Truncate and rename table can be emulated, and other issues had to be dealt with. For instance, in many cases you cannot simply copy an InnoDB schema over to a Brighthouse schema; it will complain about keys, foreign keys, auto increment, unsigned datatypes, etc. We wrote a script to parse this stuff out. It's not even close to being generally usable, just for our specific tables, otherwise I'd include it here. Another example: the utf8 charset is handled badly, so you'll need to increase your datatype definition size 4x, e.g. varchar(32) -> varchar(128). There are a number of odd things like this, and I'm not so sure I'd be as eager to use it for outside customers, but for our internal purposes it will fit just fine.
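
To give a feel for the kind of rewriting our script does, here is a simplified, hypothetical example based on the complaints listed above (our actual script only knows about our own tables):

    -- original InnoDB definition
    CREATE TABLE performance_stats (
        id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
        server_name   VARCHAR(32),
        request_count INT UNSIGNED,
        PRIMARY KEY (id),
        KEY idx_server (server_name)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    -- stripped down for Brighthouse: no keys, no auto_increment, no unsigned,
    -- and utf8 varchar sizes bumped 4x
    CREATE TABLE performance_stats (
        id            INT NOT NULL,
        server_name   VARCHAR(128),
        request_count INT
    ) ENGINE=BRIGHTHOUSE DEFAULT CHARSET=utf8;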

pt-online-schema-change

This is one of the many Perl script tools from Percona that make life easier administrating, monitoring, and testing MySQL. This one solves the need to alter a table (adding a key, a column, partitioning, etc.) on a production system that can't be taken down. Many of the current Percona tools came from Maatkit, developed by Baron Schwartz; this particular one, from my understanding, was initiated by the Facebook development team for the myriad of database servers they administrate. However, there seem to be two distinct OSC products; the one we are going to talk about is pt-online-schema-change.

It's very impressive from a conceptual and safe-use point of view. The official description and download page is at http://www.percona.com/doc/percona-toolkit/2.1/pt-online-schema-change.html

how it works under the hood

The description above will tell you the basics of how it works: essentially an empty altered copy of the table is created, then filled by selecting data from the active table in manageable chunks as a precaution against overloading a busy server. Triggers on the active table are temporarily created to keep the altered copy up to date with real-time changes. You may question how a trigger can update a row on the new table that may not be there yet. This is done by using 'replace' (an insert, or a delete and insert) for the insert, update and delete statements in the triggers. Inserts are of course inserted, updates become inserts on the new table (or a delete and insert if the row is already there), and deletes do the delete portion of the replace with nothing to insert afterwards. To avoid conflict with the rows being selected and inserted from the active table, an 'ignore' flag is used on those bulk inserts, leaving rows already inserted by the triggers as is. Once the copy is completed, an atomic rename of the two tables is done.
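
Here is a stripped-down sketch of that mechanism, using a hypothetical stats table and made-up trigger names (the real tool generates its own names and column lists, so treat this as conceptual only):

    -- triggers on the active table keep the altered copy (_stats_new) in sync
    CREATE TRIGGER stats_osc_ins AFTER INSERT ON stats
        FOR EACH ROW REPLACE INTO _stats_new (id, val) VALUES (NEW.id, NEW.val);

    CREATE TRIGGER stats_osc_upd AFTER UPDATE ON stats
        FOR EACH ROW REPLACE INTO _stats_new (id, val) VALUES (NEW.id, NEW.val);

    CREATE TRIGGER stats_osc_del AFTER DELETE ON stats
        FOR EACH ROW DELETE FROM _stats_new WHERE id = OLD.id;

    -- the bulk copy runs chunk by chunk, ignoring rows the triggers already wrote
    INSERT IGNORE INTO _stats_new (id, val)
    SELECT id, val FROM stats WHERE id BETWEEN 1 AND 10000;

    -- once the copy completes, an atomic swap
    RENAME TABLE stats TO _stats_old, _stats_new TO stats;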

safety first

There are many safeguards to ensure your data doesn't get screwed up or your server overloaded. Among the options are the ability to control the chunk size of the selects, setting a maximum number of threads connected and running (to keep triggered queries and general load in check), a dry run vs. execute option which you have to explicitly specify, a rollback feature if the process is terminated at any point along the line, not dropping the old table, not swapping the tables, and so on. The link above has specific implementation details.
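
For example, a cautious invocation might look something like this (hypothetical database, table, and thresholds; check the documentation linked above for the authoritative option list):

    # dry run first: creates the altered copy but does not copy data or swap tables
    pt-online-schema-change --alter "ADD COLUMN response_time_ms INT" \
        --chunk-size 1000 --max-load Threads_running=50 \
        --dry-run D=statsdb,t=performance_stats

    # the real run, keeping the old table around just in case
    pt-online-schema-change --alter "ADD COLUMN response_time_ms INT" \
        --chunk-size 1000 --max-load Threads_running=50 \
        --no-drop-old-table \
        --execute D=statsdb,t=performance_stats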

things to keep in mind (as of pt 2.1.2)

A few things I've run across:

1) When dropping foreign keys, you will need to specify the altered table's foreign key name, which will not be the same as the active copy's foreign key name. The tool puts a '_' in front of it on the altered copy, so you'll need to do the same when specifying those foreign keys (see the example after this list).

2) As per MySQL, partitioning does not support foreign keys, so you will not be able to both partition and drop foreign keys in the same operation. You'll most likely need to run the tool twice, separately.

3) If you are running bin-logs with statement-based replication and without tx_isolation=READ-COMMITTED or innodb_locks_unsafe_for_binlog, the 'insert into … select * from …' for each chunk in the copying process will need a read lock on the table, and depending on what else is going on, such as backups, it can potentially hit a lock wait timeout and exit without completing.

4) There can be a momentary stall while the table switch is applied; see below for details.
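
To illustrate item 1 with a hypothetical constraint name: if the active table carries a constraint named fk_server, the altered copy's version of it gets the leading underscore, and that is the name the --alter clause has to reference:

    pt-online-schema-change --alter "DROP FOREIGN KEY _fk_server" \
        --execute D=statsdb,t=performance_stats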

production results

We successfully used pt-online-schema-change to indirectly alter a 1.7 billion row table handling 50,000 queries per second, with no downtime, taking about 15 hours to complete. There was a momentary stall (a fraction of a second) resulting in a load spike when switching out the tables, although max_used_connections does not seem to corroborate query build-up as the reason. We'll take a closer look next time.