On demand query routing - how to prevent downtime on your main MySQL server

  • Date:
  • Tags: proxy mysql proxysql query routing oncall
routing

In a previous post I wrote about how read/write split is implemented in ProxySQL, and what I believe is the best practice for read/write split on top of MySQL servers:

  • start sending all traffic to MySQL master
  • identify what queries can be routed to the MySQL slaves
  • create rules to route these queries

I believe I am remarking that in ProxySQL read/write split is a special case of query routing.
ProxySQL doesn't try to magically perform read/write split for every use case scenarios, but instead it gives you the tools and the power to route traffic dynamically, on demand, through a simple and powerful Admin interface.
It is a tool for DBAs that need to maintain a farm of MySQL servers, and guarantee uptime, high availability, and performance.

In this blog post I will present a use case on how ProxySQL runtime reconfiguration is extremely useful to implement query routing when needed.

Imagine you have a classic MySQL asynchronous replication with one master (db02) and 2 slaves (db01 and db03).
You also have 15 proxysql instances managing all the traffic from all your webnodes.
Because most of the application cannot tolerate to read stale data, you have configured all the proxysql instances to send most of the traffic to the master, and only few expensive and long running queries to the slaves.
MHA is configured to automatically perform a failover in case of master failure, and ProxySQL is correctly configured to check the read_only variable of the servers automatically reconfigure servers in the read and write hostgroups.
The system is in place for several months, not a problem. Until one day, in the middle of the night, you get a page about high load on master.
You check trending graphs and indeed there is an increased load on the master. You can quickly figure out that if you don't take immediate action, the master will soon become so overloaded that will provide terrible response time, so you need to identify the offensive queries and move them away from the master.

cpu before
load before

You start checking slow query log, Anemometer, table stats_mysql_query_digest in ProxySQL itself, or any other tool you currently have deployed (PMM, ClusterControl, VividCortex, SolarWinds, etc etc), and you are able to identify that there are no new queries. Neither is increased the volume of the queries that are normally running.
You will realize that data distribution in some tables have changed, and some queries have become several times slower.
You also know that no additional index will improve the performance of the queries that are particularly slow.
You know that it is late at night, and even if you escalate the issue to developers, no developer will be able to come with a quick fix to change the application logic behind these queries.

Based on historical data, you know that these type of queries were normally running in 5-10 seconds, while now it is running for 60-600 seconds.
Because of the nature of the queries you know that is perfectly fine to send them to slaves. So here is your action plan:

  • using stats_mysql_query_digest in ProxySQL's Admin you are able to identify the digests of these queries
  • in only one of proxysql instance you create 2 new rules to send these queries to slaves
  • once you verify that the routing works as expected, you configure all the proxysql instances with the new query rules

In details:

Identify the queries

I won't go into the details of this specific case, but my previous blog post on read/write split describes how to identify digest in ProxySQL's stats_mysql_query_digest. Once you identified them, you can move to the next step, creating new rules.

Create rules on one proxysql instance

One of the advantage of having multiple proxysql instances, beside removing the single point of failure, is the ability to apply changes in only one of the instances, verify their correctness and if they provide the expected results before applying the changes to all the instances.

Rules were created executing the following in only one of the proxysql instances:

INSERT INTO mysql_query_rules(rule_id,active,username,schemaname,digest,destination_hostgroup,apply)
VALUES
(71,1, "user-proxysql", "myschema", "0xBCC7262FE1010068", 2,1);
INSERT INTO mysql_query_rules(rule_id,active,username,schemaname,digest,destination_hostgroup,apply)
VALUES
(72,1, "user-proxysql", "myschema", "0xA3FAA1B846D4B62F", 2,1);
LOAD MYSQL QUERY RULES TO RUNTIME;

Apply the rules to all the proxysql instances

Once you have verified that the rules work as expected (these queries are routed to slaves) the next step is to apply the change to all the proxies. The way of performing this really depends from how you want to configure a farm of proxies.
Service discovery and configuration management are two common ways to achieve this.
In our case, this change was executed running the above commands to all the proxies, using this simple one liner from any of the server:

$ for i in `seq 42 55` ; do
ssh 10.10.1.$i -- 'mysql -u admin -padmin -h 127.0.0.1 -P6032 -e
"INSERT INTO mysql_query_rules(rule_id,active,username,schemaname,digest,destination_hostgroup,apply)
VALUES
(71,1,\"user-proxysql\",\"myschema\",\"0xBCC7262FE1010068\",2,1),
(72,1,\"user-proxysql\",\"myschema\",\"0xA3FAA1B846D4B62F\",2,1);
LOAD MYSQL QUERY RULES TO RUNTIME;"'
done

Once you are sure that the rules solve the problem, you can make the change permanent:

$ for i in `seq 41 55` ; do
ssh 10.10.1.$i -- 'mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SAVE MYSQL QUERY RULES TO DISK"'
done

As we moved traffic from master to the slaves, CPU usage and load immediately dropped on master, while increased on slaves.

cpu after
load after

Conclusion

ProxySQL provides the ability to dynamically route traffic on demand, as quickly as just executing few commands.
DBAs can now take control of all the traffic sent to the database servers, and take actions in circumstances that normally were beyond their control.

For more real use case scenarios on how to use ProxySQL in production to maintain your MySQL databases, do not miss our session about ProxySQL use case scenarios on April 26, 2017 at Percona Live 2017