Sharding in ProxySQL

Sharding is another of the key use cases for ProxySQL, essentially there are 3x main approaches for sharding.

  • User based sharding
  • Schema based sharding
  • Data based sharding

The next examples deal with these 3x specific use cases. Keep in mind that query rules are flexible enough to support many other approaches.

User based sharding

In this simplest form of sharding, ProxySQL will route queries to the appropriate shard based on the “user” defined on the connection. All we need to do is define our
MySQL users along with their “default_hostgroup”, no query rules are required for this approach.

For example:

It is recommended to maintain a lookup of sorts in your application to map your application design against your ProxySQL shard layout e.g.:

Schema based sharding

Schema based sharding is also straight forward, here define the query rules mapping the “schemaname” to the appropriate “destination_hostgroup”, assuming the schemas are named “shard_xxx”:

Data based sharding

Data based sharding is the most complex type of sharding.

  • It is crucial to design your application from the ground up with solid design principles that will ensure your queries are aligned to your query rules
  • Several approaches can be implemented, typically using “match_digest” or “match_pattern”
  • When performing data based sharding it means we are sharding specific tables rather than full schemas:

Lets take a simple example of a table which stores users based on location in a MySQL instance as loc_account_data with a related mapping table loc_mapping e.g.:

Now we need to consider all the INSERT / UPDATE / DELETE / SELECT statements that will execute against this table, query rules will need to be implemented for each of the query digests e.g.:

The following rules will cover our needs for INSERTS (rules for INSERTS configured first since it is expected to have more INSERTS than UPDATES / DELETES / SELECTS):

The following rules will cover our needs for UPDATES / DELETES / SELECTS:

Keep in mind that as you add values to the loc_mapping table, you will need to synchronize your ProxySQL query rules configuration accordingly.

Sharding considerations

It is still necessary to implement some logic in your application to make it “sharding friendly”
– Simpler “user” or “schema” based approaches are easier to implement and maintain
– If you require “cross shard” joins this logic will need to be part of your application
– When implementing data based sharding additional rules (for example for RW split) need to be added per shard
– Sharding is not limited to user, schema or table based sharding – these are just the three main methods.