Read/Write Split

Read/write split is perhaps one of the most common type of query routing used, while the other most commonly used query routing implementation is for sharding.

Read/write split using different ports

If you are using a proxy like HAProxy you can configure it to listen on two ports: one as an endpoint for writes, while the second port as an endpoint for reads.
People have often asked how to configure ProxySQL using the same approach, and routing queries based on incoming port.

Here is an example of how to implement query routing based on incoming port, running the following on ProxySQL’s Admin. I will assume you already have master and slaves configured in the right hostgroups: MySQL writer in hostgroup 10, and MySQL readers in hostgroup 20. A similar approach will apply if you are using Galera or Group Replication. The steps are the following:

  • Configure ProxySQL to listen on two ports and restart it : mysql-interfaces is one of the few variables that cannot be changed at runtime and requires a restart
  • add routing based on incoming port:

Done! Now all the queries coming to port 6401 will be sent to the MySQL server in hostgroup 10, while all the queries coming to port 6402 will be sent to one of the MySQL servers in hostgroup 20.

Limitation of read/write split based on incoming port

In the previous paragraph I wrote that people have often asked how to configure ProxySQL to use routing based on incoming port.
Although this is a valid approach sometimes, in my opinion it has a big drawback: the application needs to have built-in read/write split capabilities in order to distinguish between reads and writes.
Often this is not the case. Often an application only uses a connection endpoint, that of course turns out to be the MySQL master. If ProxySQL is used, this can accept all the traffic in a single port and can analyze the traffic to perform a read/write split based on the query type.
This is very convenient because it doesn’t requires any application changes.
Nonetheless, the main advantage is not the ability to route traffic without application changes. The main advantage is that the DBA has now the tool to control the traffic sent to the database. The DBA is the one that gets paged in the middle of the night because the DB server is overloaded, when there are no developers around, and making changes in the application is not an option: he now has the option to control the traffic.

Basic read/write split using regex

In this paragraph I will show an example of how to perform read/write split using regular expression.
First of all, we should remove query rules created previously:

Afterward, we create basic rules for read/write :

Now routing will work as follows:

  • all SELECT FOR UPDATE will be sent to HG10
  • all other SELECT will be sent to HG20
  • everything else will be sent to HG10 (the default)

Note that I believe the above is not a good approach to read/write split.
I often used this example to describe how to configure rules, and it was often misinterpreted as the way to configure read/write split.
DO NOT USE THE ABOVE EXAMPLE IN PRODUCTION

In the next paragraph I will show a better approach.

For now, let’s remove all rules:

Read/write split using regex and digest

A configuration procedure to efficiently set up read/write split is the following:

  • configure ProxySQL to send all the traffic to only one MySQL node, the master (both writes and reads)
  • check in stats_mysql_query_digest which are the most expensive SELECT statements
  • determine which expensive statements should be moved to reader nodes;
  • configure mysql_query_rules (create rules) to send only expensive SELECT statements to the readers

The idea is therefore quite simple: send to slaves/readers only what you want to send to slaves/readers, not just any SELECT.

Find expensive queries using stats_mysql_query_digest

Here is a list of examples of how you can identify potential queries that can be sent to readers. Because ProxySQL exports all metrics in tables, you can create complex queries to gather information.

These results are based on a very busy ProxySQL instance that’s been running for several months and has processed around a hundred billion queries so far.

  • Find the top 5 queries based on total execution time:
  • Find the top 5 queries based on count:

    Hmm, can some of these queries be cached? Maybe ProxySQL query cache can help!

  • Find the top 5 queries based on maximum execution time:

    This specific result shows that some queries have a very high maximum execution time, while minimum execution time is very small, and average is reasonably slow as well.
    For example, the query with digest 0x36CE5295726DB5B4 has an average execution time of 1.27 seconds, a minimum execution time of 0.4ms , and a maximum execution time of 237.34 seconds. Maybe it is worth to investigate why the execution time is so uneven.

  • Find the top 5 queries ordered by total execution time, and with a minimum execution time of at least 1 millisecond:

  • Find the top 5 queries ordered by total execution time, with an average execution time of at least 1 second. Also show the percentage of the total execution time:

  • Find the top 5 queries ordered by total execution time, with an average execution time of at least 15 milliseconds. Also show the percentage of the total execution time:

Do all these queries need to be executed on master? If the average execution time of one query is above 1 second, probably the answer is no.
For some application, even a query running with an average execution time of 15ms can perhaps go to a slave.

After checking with the application’s owner, we can for example decide that query with digest 0x38BE36BDFFDBE638 can go to slaves:

Similarly, after checking the output of this:

we agreed that all the queries starting with SELECT COUNT(*) can go to slaves:

Finally, load every rule to runtime:

Conclusion

ProxySQL can be efficiently used to be very selective with query routing.
While for some applications it is acceptable to send all the SELECTs to readers/slaves while sending everything else to writer/master, for many other applications/workloads it is not so simple. A DBA should be able to configure ProxySQL with complex rules to send only the queries that do not need to be executed on master to the slaves, without any application changes.