ProxySQL Read/Write Split

Read/Write split is perhaps one of the most common types of query routing used, while the other most commonly used query routing implementation is Sharding. There are many other use-cases for query routing such as routing reporting queries to specific replicas etc. however Read/Write Split and Sharding are just the most common.

Read/Write split using different ports

If you are using a standard TCP Load Balancer for MySQL you will typically configure it to listen on two separate ports. Each port acts as an endpoint, one for writes and another for reads. It is common to configure ProxySQL using a similar approach, especialy when migrating for another TCP Load Balancer that implements such a mechanism already. In this configuration, the query routing criteria is the incoming port.

Here is an example of how to implement query routing based on incoming port in ProxySQL Admin. This assumes you already have the primary and replicas configured in the right hostgroups: MySQL writer in hostgroup 1, and MySQL readers in hostgroup 2. A similar approach would apply if you are using Galera or Group Replication. The steps are the following:

  • Configure ProxySQL to listen on two ports and restart it : mysql-interfaces is one of the few variables that cannot be changed at runtime and requires a restart
    SET mysql-interfaces='0.0.0.0:6401;0.0.0.0:6402';
    ## save it on disk and restart proxysql
    SAVE MYSQL VARIABLES TO DISK;
    PROXYSQL RESTART;
    
  • Add routing based on incoming port:
    INSERT INTO mysql_query_rules (rule_id,active,proxy_port,destination_hostgroup,apply)
    VALUES (1,1,6401,1,1), (2,1,6402,2,1);
    LOAD MYSQL QUERY RULES TO RUNTIME;
    SAVE MYSQL QUERY RULES TO DISK; # if you want this change to be permanent
    

Now all the queries coming to port 6401 will be sent to the MySQL server in hostgroup 1, while all the queries coming to port 6402 will be sent to one of the MySQL servers in hostgroup 2.

Limitation of Read/Write split based on incoming port

Routing based on incoming port has a major drawback: the application needs to have built-in read/write split capabilities in order to distinguish between reads and writes. Often this is not the case and applications use only a single connection endpoint, that of course turns out to be the MySQL primary.

ProxySQL analyzes traffic; all traffic accepted on a single port and be re-routed based on query type or other criteria

This is very convenient because it doesn’t require application changes. The main advantage with ProxySQL though is not really the ability to route traffic without application changes yet rather that there is a system in place that can be used to control traffic sent to the database. Operations teams need to address production issues without the intervention of development teams, ProxySQL makes this possible by allowing full control of DB traffic.

It is recommened to leverage more advanced ProxySQL Query Rules to implement intelligent routing for specific queries or query types

Generic Read/Write split using regex

Start by clearing the query rules created previously:

DELETE FROM mysql_query_rules;

Then create generic rules to route ALL SELECTs to a replica for the read/write split as follows:

UPDATE mysql_users SET default_hostgroup=1; # by default, all goes to HG1
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK; # if you want this change to be permanent
INSERT INTO mysql_query_rules (rule_id,active,match_digest,destination_hostgroup,apply)
VALUES
(1,1,'^SELECT.*FOR UPDATE$',1,1),
(2,1,'^SELECT',2,1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK; # if you want this change to be permanent

Now routing will work as follows:

  • all SELECT FOR UPDATE statements will be sent to HG1
  • all other SELECT statements will be sent to HG2
  • everything else will be sent to HG1 (the default)

DO NOT USE THESE RULES IN PRODUCTION, GENERIC READ/WRITE RULES LEAD TO PROBLEMS

A better approach is to properly analyze the database workload and implement efficient rules, clear the previous rules once again:

DELETE FROM mysql_query_rules;
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK; # if you want this change to be permanent

Intelligent Read/Write split using regex and digest

A configuration procedure to more efficiently set up read/write split is the following:

  • first configure ProxySQL to send all the traffic to only one MySQL node, the primary (both writes and reads)
  • analyze stats_mysql_query_digest to identify the most expensive SELECT statements
  • determine which of these statements are SAFE to route reader nodes (i.e. not susceptible to replication lag, replica hardware is suitable to support queries etc)
  • configure mysql_query_rules to selectively route only these expensive statements on your replicas

Find expensive queries using stats_mysql_query_digest

Here is a list of examples on how to identify potential queries for Read/Write split. ProxySQL exposes its internal metrics in the stats database tables and you can run standard SQL queries to gather this information on a live instance via CLI.

These results in these examples are based on a very busy ProxySQL instance that’s been running for several months so it has already processed around a hundred billion queries so far.

  • How to find the top 5 queries based on total execution time:
    Admin> SELECT digest,SUBSTR(digest_text,0,25),count_star,sum_time FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%' ORDER BY sum_time DESC LIMIT 5;
    +--------------------+--------------------------+------------+---------------+
    | digest             | SUBSTR(digest_text,0,25) | count_star | sum_time      |
    +--------------------+--------------------------+------------+---------------+
    | 0x037C3E6D996DAFE2 | SELECT a.ip_id as ip_id, | 2030026798 | 1479082636017 |
    | 0xB081A85245DEA5B7 | SELECT a.ip_id as ip_id, | 2025902778 | 1206116187539 |
    | 0x38BE36BDFFDBE638 | SELECT instance.name as  | 59343662   | 1096236803754 |
    | 0xB4233552504E43B8 | SELECT ir.type as type,  | 1362897166 | 488971769571  |
    | 0x4A131A16DCFFD6C6 | SELECT i.id as id, i.sta | 934402293  | 475253770301  |
    +--------------------+--------------------------+------------+---------------+
    5 rows in set (0.01 sec)
    
  • How to find the top 5 queries based on count:
    Admin> SELECT digest,SUBSTR(digest_text,0,25),count_star,sum_time FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%' ORDER BY count_star DESC LIMIT 5;
    +--------------------+--------------------------+------------+---------------+
    | digest             | SUBSTR(digest_text,0,25) | count_star | sum_time      |
    +--------------------+--------------------------+------------+---------------+
    | 0x037C3E6D996DAFE2 | SELECT a.ip_id as ip_id, | 2030040688 | 1479092529369 |
    | 0xB081A85245DEA5B7 | SELECT a.ip_id as ip_id, | 2025916528 | 1206123010791 |
    | 0x22E0A5C585C53EAD | SELECT id as instanceid, | 1551361254 | 426419508609  |
    | 0x3DB4B9FA4B2CB36F | SELECT i.id as instancei | 1465274289 | 415565419867  |
    | 0xB4233552504E43B8 | SELECT ir.type as type,  | 1362906755 | 488974931108  |
    +--------------------+--------------------------+------------+---------------+
    5 rows in set (0.00 sec)
    
  • How to find the top 5 queries based on maximum execution time:
    Admin> SELECT digest,SUBSTR(digest_text,0,25),count_star,sum_time,sum_time/count_star avg_time, min_time, max_time FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%' ORDER BY max_time DESC LIMIT 5;
    +--------------------+--------------------------+------------+--------------+----------+----------+-----------+
    | digest             | SUBSTR(digest_text,0,25) | count_star | sum_time     | avg_time | min_time | max_time  |
    +--------------------+--------------------------+------------+--------------+----------+----------+-----------+
    | 0x36CE5295726DB5B4 | SELECT COUNT(*) as total | 146390     | 185951894994 | 1270249  | 445      | 237344243 |
    | 0xDA8C56B5644C0822 | SELECT COUNT(*) as total | 44130      | 24842335265  | 562935   | 494      | 231395575 |
    | 0x8C1B0405E1AAB9DB | SELECT COUNT(*) as total | 1194       | 1356742749   | 1136300  | 624      | 216677507 |
    | 0x6C03197B4A2C34BE | Select *, DateDiff(Date_ | 4796       | 748804483    | 156131   | 607      | 197881845 |
    | 0x1DEFCE9DEF3BDF87 | SELECT DISTINCT i.extid  | 592196     | 40209254260  | 67898    | 416      | 118055372 |
    +--------------------+--------------------------+------------+--------------+----------+----------+-----------+
    5 rows in set (0.01 sec)
    
  • How to find the top 5 queries ordered by total execution time, and with a minimum execution time of at least 1 millisecond:
    Admin> SELECT digest,SUBSTR(digest_text,0,20),count_star,sum_time,sum_time/count_star avg_time, min_time, max_time FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%' AND min_time > 1000 ORDER BY sum_time DESC LIMIT 5;
    +--------------------+--------------------------+------------+-------------+----------+----------+----------+
    | digest             | SUBSTR(digest_text,0,20) | count_star | sum_time    | avg_time | min_time | max_time |
    +--------------------+--------------------------+------------+-------------+----------+----------+----------+
    | 0x9EED412C6E63E477 | SELECT a.id as acco      | 961733     | 24115349801 | 25074    | 10994    | 7046628  |
    | 0x8DDD43A9EA37750D | Select ( Coalesce((      | 107069     | 3156179256  | 29477    | 1069     | 24600674 |
    | 0x9EED412C6E63E477 | SELECT a.id as acco      | 91996      | 1883354396  | 20472    | 10095    | 497877   |
    | 0x08B23A268C35C08E | SELECT id as reward      | 49401      | 244088592   | 4940     | 1237     | 1483791  |
    | 0x437C846F935344F8 | SELECT Distinct i.e      | 164        | 163873101   | 999226   | 1383     | 7905811  |
    +--------------------+--------------------------+------------+-------------+----------+----------+----------+
    5 rows in set (0.01 sec)
    
  • How to find the top 5 queries ordered by total execution time, with an average execution time of at least 1 second. Also show the percentage of the total execution time:
    Admin> SELECT digest,SUBSTR(digest_text,0,25),count_star,sum_time,sum_time/count_star avg_time, ROUND(sum_time*100.00/(SELECT SUM(sum_time) FROM stats_mysql_query_digest),3) pct FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%' AND sum_time/count_star > 1000000 ORDER BY sum_time DESC LIMIT 5;
    +--------------------+--------------------------+------------+--------------+----------+-------+
    | digest             | SUBSTR(digest_text,0,25) | count_star | sum_time     | avg_time | pct   |
    +--------------------+--------------------------+------------+--------------+----------+-------+
    | 0x36CE5295726DB5B4 | SELECT COUNT(*) as total | 146390     | 185951894994 | 1270249  | 2.11  |
    | 0xD38895B4F4D2A4B3 | SELECT instance.name as  | 9783       | 12409642528  | 1268490  | 0.141 |
    | 0x8C1B0405E1AAB9DB | SELECT COUNT(*) as total | 1194       | 1356742749   | 1136300  | 0.015 |
    +--------------------+--------------------------+------------+--------------+----------+-------+
    3 rows in set (0.00 sec)
    
  • How to find the top 5 queries ordered by total execution time, with an average execution time of at least 15 milliseconds and also show the percentage of the total execution time:
    Admin> SELECT digest,SUBSTR(digest_text,0,25),count_star,sum_time,sum_time/count_star avg_time, ROUND(sum_time*100.00/(SELECT SUM(sum_time) FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%'),3) pct FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%' AND sum_time/count_star > 15000 ORDER BY sum_time DESC LIMIT 5;
    +--------------------+--------------------------+------------+---------------+----------+--------+
    | digest             | SUBSTR(digest_text,0,25) | count_star | sum_time      | avg_time | pct    |
    +--------------------+--------------------------+------------+---------------+----------+--------+
    | 0x38BE36BDFFDBE638 | SELECT instance.name as  | 59360371   | 1096562204931 | 18472    | 13.006 |
    | 0x36CE5295726DB5B4 | SELECT COUNT(*) as total | 146390     | 185951894994  | 1270249  | 2.205  |
    | 0x1DEFCE9DEF3BDF87 | SELECT DISTINCT i.extid  | 592281     | 40215136635   | 67898    | 0.477  |
    | 0xDA8C56B5644C0822 | SELECT COUNT(*) as total | 44130      | 24842335265   | 562935   | 0.295  |
    | 0x9EED412C6E63E477 | SELECT a.id as accountid | 961768     | 24116011513   | 25074    | 0.286  |
    +--------------------+--------------------------+------------+---------------+----------+--------+
    5 rows in set (0.00 sec)
    

Do all these queries need to be executed on the primary? If the average execution time of one query is above 1 second, the answer is probably no yet this will depend on your application needs. For some applications, even a query running with an average execution time of 15ms can perhaps go to a slave since it queries data that is not frequently updated.

In the results it is evident that some queries have a very high maximum execution time, while minimum execution time is very small, and the average time is reasonably slow as well. Specifically, the query with digest 0x36CE5295726DB5B4 has an average execution time of 1.27 seconds, a minimum execution time of 0.4ms , and a maximum execution time of 237.34 seconds. Maybe it is worth investigating why execution time is so random, it could be a sign of infrastructure issues.

After performing a solid analysis you might decide that query with digest 0x38BE36BDFFDBE638 should go to your replicas:

INSERT INTO mysql_query_rules (rule_id,active,digest,destination_hostgroup,apply)
VALUES
(1,1,'0x38BE36BDFFDBE638',20,1);

Similarly, after checking the output of this statement to evaluate SELECT COUNT(*) statements:

SELECT digest,digest_text,count_star,sum_time,sum_time/count_star avg_time, 
       ROUND(sum_time*100.00/(SELECT SUM(sum_time) FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT%'),3) pct 
FROM stats_mysql_query_digest 
WHERE digest_text LIKE 'SELECT COUNT%' ORDER BY sum_time DESC;

It might make sense to shift all the queries starting with SELECT COUNT(*) to replicas and create a ProxySQL Query Rule such as:

INSERT INTO mysql_query_rules (rule_id,active,match_digest,destination_hostgroup,apply)
VALUES
(1,1,'^SELECT COUNT\(\*\)',20,1);

Load to runtime to apply the configuration and save to disk for persistance:

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK; # if you want this change to be permanent

Finally, make sure to re-evaluate your workload after applying changes as it will surely change and you might find new queries that can be optimised. The query analysis methods described here should be an on-going activity to ensure that you are getting the most value out of your ProxySQL deployment.

Looking for more? Read on about How to set up ProxySQL Sharding