What is the default sharding key in MySQL Cluster?

Tags

Abdel-Mawla Gharieb, MySQL, MySQL Cluster, Partitioning, Sharding

MySQL Cluster automatically shards (partitions) tables across data nodes, enabling databases to scale horizontally to serve read- and write-intensive workloads. But what is the default sharding key used to partition the data?
According to the October 2016 update of the MySQL Cluster white paper, the primary key is the default sharding key:

By default, sharding is based on hashing of the primary key, which generally leads to a more even distribution of data and queries across the cluster than alternative approaches such as range partitioning.

~~However, that is not the case in all MySQL Cluster versions so far!~~
In this post, I’ll run some test cases on a MySQL Cluster with 4 data nodes to confirm the default sharding key.

Testing on MySQL Cluster 7.2.26
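
The test can be sketched roughly as follows. The table name shard_check_pk is the one mentioned in the comments on this post; the name column and the inserted rows are assumptions added for illustration, and the explain output itself is not reproduced here.

```sql
-- Sketch only: the `name` column and the sample rows are hypothetical.
CREATE TABLE shard_check_pk (
    id   INT NOT NULL,
    name VARCHAR(20),
    PRIMARY KEY (id)
) ENGINE=NDBCLUSTER;

INSERT INTO shard_check_pk VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');

-- MySQL Cluster 7.2 is based on MySQL 5.5, where the PARTITIONS keyword
-- adds the "partitions" column to the EXPLAIN output.
EXPLAIN PARTITIONS SELECT * FROM shard_check_pk WHERE id = 1;
```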

If the PK is the sharding key, a lookup by PK value should be checked in only one partition. According to the previous explain output, that is the case in this example: retrieving a single id value (id=1) requires scanning only one partition, “p3”.

What happens, then, if we don’t specify a PK for a table in MySQL Cluster?
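
A sketch of the same check on a table without a PK, assuming a hypothetical table name shard_check_no_pk and the same illustrative columns as before:

```sql
-- Sketch only: table name, `name` column and sample rows are hypothetical.
-- No PRIMARY KEY is declared, so NDB adds its own hidden key column.
CREATE TABLE shard_check_no_pk (
    id   INT NOT NULL,
    name VARCHAR(20)
) ENGINE=NDBCLUSTER;

INSERT INTO shard_check_no_pk VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');

-- Same lookup as on the table with a PK; compare the "partitions" column.
EXPLAIN PARTITIONS SELECT * FROM shard_check_no_pk WHERE id = 1;
```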

As we can see in the previous explain plan, all partitions in the table (p0, p1, p2 & p3) will be scanned to retrieve a single id value (id=1). This is because MySQL Cluster creates a hidden column to serve as the sharding key on tables without a PK.

Up to here, the results are as expected, but in the later MySQL Cluster versions (7.3, 7.4 and 7.5) the explain plan has different output!

Testing on MySQL Cluster 7.5.4 (same results on 7.3 and 7.4):

Starting from MySQL Cluster 7.3, the explain plan looks as if the hidden column is always used as the default sharding key, even on a table that has a PK (see my bug report about this case, Bug #84374). According to the ndb_desc tool, that is not true! Thanks Mauritz for the correction.
If we specify the sharding key explicitly, the output is correct:
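
One way to specify the sharding key explicitly is a PARTITION BY KEY clause on the PK column. This is a sketch under stated assumptions (the table name shard_check_explicit and the name column are hypothetical), not necessarily the exact statement used in the test:

```sql
-- Sketch only: table name and `name` column are hypothetical.
-- NDB tables support KEY partitioning, and the partitioning columns
-- must be part of the primary key.
CREATE TABLE shard_check_explicit (
    id   INT NOT NULL,
    name VARCHAR(20),
    PRIMARY KEY (id)
) ENGINE=NDBCLUSTER
  PARTITION BY KEY (id);

-- Repeat the single-value lookup and compare the "partitions" column
-- with the output for the table that relies on the default.
EXPLAIN PARTITIONS SELECT * FROM shard_check_explicit WHERE id = 1;
```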

Conclusion:

The primary key is the default sharding key in MySQL Cluster 7.2, and a hidden column is used if no PK is defined for a table.
From MySQL Cluster 7.3 onward, and until the bug is fixed, ~~the hidden column is always the default sharding key even on a table that has a PK~~ the PK is still the default sharding key, but the explain plan looks as if a hidden column is always used instead, even on a table that has a PK!

3 thoughts on “What is the default sharding key in MySQL Cluster?”

Mauritz Sundell said:

January 9, 2017 at 7:51 am

The default sharding key is still the primary key!

To see that 7.5.4 does not behave as hidden pk case in 7.2.26, look also in the other columns of explain output. The partitions column should be ignored for NDB tables.

Still, you found a bug. The explain output should not depend on how partitioning is specified as long as it is the same.

For NDB the partition pruning is actually not done by the optimizer in MySQL server but within NDB itself.

To inspect the table definition used within NDB one can use ‘ndb_desc -d shard_check_pk’ it should reveal that there are no extra hidden columns, and also that the sharding key called distribution key in NDB is the same as the primary key.

- Moll said:
  
  January 10, 2017 at 8:37 am
  
  Thanks Mauritz, you are right. I’ll update the post but still the explain plan is confusing!
  
- Moll said:
  
  January 13, 2017 at 6:08 pm
  
  I’m also wondering that if the partitions column should be ignored for NDB tables – as you said – , why in the “Optimizing MySQL Cluster Performance” white paper, “4.3 Distribution aware application” section, the partitions column in explain output was not ignored and was actually used to confirm the partitions and the distribution key(s) in NDB tables?! That’s really confusing !
  
MySQL Step-by-Step Blog
