Sharding in ProxySQL
Sharding is another of the key use cases for ProxySQL, essentially there are 3x main approaches for sharding.
– User based sharding
– Schema based sharding
– Data based sharding
The next examples deal with these 3x specific use cases. Keep in mind that query rules are flexible enough to support many other approaches.
User based sharding
In this simplest form of sharding, ProxySQL will route queries to the appropriate shard based on the “user” defined on the connection. All we need to do is define our
MySQL users along with their “default_hostgroup”, no query rules are required for this approach.
For example:
INSERT INTO mysql_users
(username, password, active, default_hostgroup, comment)
VALUES
('accounts', 'shard0_pass', 1, 0, 'Routed to the accounts shard'),
('transactions', 'shard1_pass', 1, 1, 'Routed to the transactions shard'),
('logging', 'shard2_pass', 1, 2, 'Routed to the logging shard');
LOAD MYSQL USERS RULES TO RUNTIME;
SAVE MYSQL USERS RULES TO DISK;
It is recommended to maintain a lookup of sorts in your application to map your application design against your ProxySQL shard layout e.g.:
+--------------+--------------+-------------+
| app_module | shard_user | shard_pass |
+--------------+--------------+-------------+
| CRM | accounts | shard0_pass |
| OLTP | transactions | shard1_pass |
| Log Manager | logging | shard1_pass |
+--------------+--------------+-------------+
Schema based sharding
Schema based sharding is also straightforward, here we need to define the query rules mapping the “schemaname” to the appropriate “destination_hostgroup”, assuming the schemas are named “shard_xxx”:
INSERT INTO mysql_query_rules (rule_id, active, schemaname,
destination_hostgroup, apply)
VALUES
(1, 1, 'shard_0', 0, 1),
(2, 1, 'shard_1', 1, 1),
(3, 1, 'shard_2', 2, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
Data based sharding
Data based sharding is the most complex type of sharding.
– It is crucial to design your application from the ground up with solid design principles that will ensure your queries are aligned to your query rules
– Several approaches can be implemented, typically using “match_digest” or “match_pattern”
– When performing data based sharding it means we are sharding specific tables rather than full schemas:
Let’s take a simple example of a table which stores `users` based on `location` in a MySQL instance as `loc_account_data` with a related mapping table `loc_mapping` e.g.:
# loc_account_data
+----------------------------------------+---------+---------+
| loc_id | user | acc_id |
+----------------------------------------+---------+---------+
| 20086020554955909836090724037181646035 | joe32 | 1 |
| 21503957780049285539986052866765125704 | sam57 | 2 |
| 75863560943999160082133817802533222835 | pam18 | 3 |
+----------------------------------------+---------+---------+
# loc_mapping
+----------------------------------------+---------+
| loc_id | region |
+----------------------------------------+---------+
| 20086020554955909836090724037181646035 | AMERICA |
| 21503957780049285539986052866765125704 | EMEA |
| 75863560943999160082133817802533222835 | OCEANIA |
+----------------------------------------+---------+
Now we need to consider all the INSERT / UPDATE / DELETE / SELECT statements that will execute against this table, query rules will need to be implemented for each of the query digests e.g.:
- INSERT INTO loc_account_data (loc_id, user, acc_id) VALUES (?, ?, ?);
- UPDATE loc_account_data SET user = ? WHERE loc_id = ? AND acc_id = ?;
- UPDATE loc_account_data SET acc_id = ? WHERE loc_id = ? AND user = ?;
- DELETE FROM loc_account_data WHERE loc_id = ? AND acc_id = ?;
- SELECT * FROM loc_account_data WHERE loc_id = ? AND user = ?;
The following rules will cover our needs for INSERTS (rules for INSERTS are configured first since it is expected to have more INSERTS than UPDATES / DELETES / SELECTS):
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup,
apply)
VALUES
(1, 1, 'loc_account_data \(loc_id, user, acc_id\) VALUES
\(20086020554955909836090724037181646035', 0, 1),
(2, 1, 'loc_account_data \(loc_id, user, acc_id\) VALUES
\(20086020554955909836090724037181646035’, 1, 1),
(3, 1, 'loc_account_data \(loc_id, user, acc_id\) VALUES
\(20086020554955909836090724037181646035’, 2, 1);
The following rules will cover our needs for UPDATES / DELETES / SELECTS:
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup,
apply)
VALUES
(4, 1, 'loc_id = 20086020554955909836090724037181646035', 0, 1),
(5, 1, 'loc_id = 21503957780049285539986052866765125704', 1, 1),
(6, 1, 'loc_id = 75863560943999160082133817802533222835', 2, 1);
Keep in mind that as you add values to the `loc_mapping` table, you will need to synchronize your ProxySQL query rules configuration accordingly.
Sharding considerations
It is still necessary to implement some logic in your application to make it “sharding friendly”
– Simpler “user” or “schema” based approaches are easier to implement and maintain
– If you require “cross shard” joins this logic will need to be part of your application
– When implementing data based sharding additional rules (for example for RW split) need to be added per shard
– Sharding is not limited to user, schema or table based sharding – these are just the three main methods.