Tags

, , ,

Having big tables is one of the expected database problems, especially, for the fast growing database systems. In fact, big tables itself is not a problem but with big tables, the following problems are strongly expected:

  1. Retrieving data from big tables is so slow.
  2. It is a very hard job to maintain those tables like adding/removing an index, adding/dropping/modifying a column, … etc.
  3. System resources, especially, the IO system might not be able to handle such huge traffic of writes and reads.
  4. When it comes to the reporting queries, it might be a horrible nightmare!
  5. Always cause disk space problem.

All the above problems will show up the need for scaling! So, let’s check out what are the possible solutions for that problem:

  • MySQL Partitioning: Is a good solution but we will still face disk space and server resources problems.
  • MySQL Replication: Actually, MySQL replication is good for read scaling not write scaling which means only retrieving data and reporting queries problems can be solved but not the other ones!
  • Galera Cluster for MySQL: Mainly for High Availability and also read scaling but not for writes as data will be available (replicate) on all nodes!!
  • MySQL Sharding: In most cases, this is the most suitable solution for such problems and here I’ll talk about a promising storage engine called Spider for MySQL as a sharding solution.

Note: MySQL Cluster is one of the MySQL sharding solutions and can be used to solve such problems with specific circumstances and conditions.

What is sharding?

Sharding is splitting up (or horizontally partition) big tables among several isolated DB instances, each partition or DB instance called a database shard!

The problem of sharding (not all sharding solutions) is that a logic or a decision point is required between the application and the shards to determine which shard(s) should be selected for each query!

Now, what is Spider for MySQL?

Spider is a Storage Engine (SE) depends on XA transactions and partitioning which allows MySQL server to treating tables on different MySQL instances as they are only one table on a same instance!

Still not clear?! Let’s check the following figure:

spider_overview.png

With Spider for MySQL, we can divide a big table among multiple database instances (having the normal MySQL binaries) and from another MySQL instance (having Spider plugin) we will have – let’s say – a virtual table pointing to the tables in the other instances. The application will only need to connect to the Spider instance and will be able to get the required data from the other ones – through Spider – without the need to connect to them directly or even care about the logic for choosing which server(s) or shard(s) for each query.

Sounds interesting, right?! let’s check the following example for deeper dive:

spider_detail.png

If we have a big table called “logs” containing data for years 2012, 2013, 2014 and 2015 then we can divide that table among four MySQL instances by storing the data for years 2012, 2013, 2014 and 2015 on servers A, B, C and D respectively. Then we will create a mapping table on the Spider instance pointing to the other ones using MySQL partitioning specifying for each partition the connection information for the server in which will be used when hitting that condition.

If for ex. a query on the spider wanted to get the data for the year 2013 (hitting partition2 condition) then the host information for server B will be selected and the data will be retrieved from server B.

By the way, this is not only for selects but also for DMLs (insert, update and delete statements) the same rule will be used.

Advantages and disadvantages of Spider SE

Advantages

  • Divide the load among many database servers (scaling reads and writes) and processing in parallel.
  • Independent from the application (storage layer).
  • Easy to extend. When introducing a new host, a new partition needs to be added to the spider table.
  • Creating non Spider SE tables is also possible on the Spider bundled server.
  • If a shard failed, the application won’t be affected except for those queries which targeting that failed shard.
  • Spider SE was introduced in MariaDB starting from version 10.0.4.
  • A slave for the Spider server can be created to have all data in one server (XA transaction must be enabled).
  • Spider supports SSL.

Disadvantages

  • Query cache and FULL TEXT indexes are not supported in Spider tables. However, FULL TEXT indexes can be added on the individual backend tables.
  • Physical backups (like Xtrabackup) won’t backup the data on the spider tables as the physical data is stored on the shards (shards can be backed up individually).
  • Spider server is a single point of failure (we might need to make it High Available by duplicating it and using VIP on the top of them).
  • Depends on MySQL Partitioning, which means choosing wrong partition key or condition might kill the performance.
  • Also as a result of using MySQL Partitioning, foreign keys are not supported.

Spider SE is not ready for production yet but at least its a promising Engine.

Conclusion

Spider SE for MySQL is a good solution for MySQL sharding, if you face big tables problem and you are evaluating MySQL sharding. Just give it a try!

In the next post, I’ll go through a simple example on how to implement Spider for MySQL.