How to configure ProxySQL for the first time

This mini HOWTO describes how to configure some of the components of ProxySQL step by step.

This is not a complete guide.

We assume you are already familiar with the ProxySQL architecture. This HOWTO also assumes that ProxySQL is being configured through its standard SQL admin interface, available by default on port 6032 with trivial (and changeable) credentials.
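For example, assuming the default admin/admin credentials and the default admin port, you can connect with the mysql client:

    mysql -u admin -padmin -h 127.0.0.1 -P6032 --prompt='Admin> '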

First, let’s verify that nothing is configured yet: no entries in the mysql_servers, mysql_replication_hostgroups, or mysql_query_rules tables.
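A quick check from the admin interface (all three queries return an empty set on a fresh installation):

    SELECT * FROM mysql_servers;
    SELECT * FROM mysql_replication_hostgroups;
    SELECT * FROM mysql_query_rules;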

Add backends

For this demo, I started 3 mysql servers locally using MySQL Sandbox.
Let’s add them to ProxySQL.
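A sketch of the inserts, assuming the three sandbox instances listen on 127.0.0.1, ports 21891, 21892 and 21893 (the master on 21891, as referenced later), and that we initially put them all in hostgroup 1:

    INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (1, '127.0.0.1', 21891);
    INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (1, '127.0.0.1', 21892);
    INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (1, '127.0.0.1', 21893);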

All looks good so far.

NOTE: by default, MySQL Sandbox configures the slaves with read_only = 0. Run SET GLOBAL read_only = 1 on the slaves.

Configure monitoring

ProxySQL constantly monitors the servers it has configured. To do so, it is important to configure some variables.
Let’s configure them.

Add the credentials of the user required to monitor the backends (the user must already exist on the MySQL servers).
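For example, assuming a monitoring user named monitor with password monitor has been created on the backends:

    UPDATE global_variables SET variable_value='monitor' WHERE variable_name='mysql-monitor_username';
    UPDATE global_variables SET variable_value='monitor' WHERE variable_name='mysql-monitor_password';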

Then we configure the various monitoring intervals:
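A sketch, using 2000 ms for the main monitoring intervals (adjust the values to your needs):

    UPDATE global_variables SET variable_value='2000'
    WHERE variable_name IN ('mysql-monitor_connect_interval',
                            'mysql-monitor_ping_interval',
                            'mysql-monitor_read_only_interval');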

There are a lot of variables, and some are not used (yet) or not relevant for this HOWTO. For now, consider only the ones listed above.
Changes related to MySQL Monitor in the global_variables table take effect only after running the command LOAD MYSQL VARIABLES TO RUNTIME, and they are persisted to disk after running SAVE MYSQL VARIABLES TO DISK.
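For example:

    LOAD MYSQL VARIABLES TO RUNTIME;
    SAVE MYSQL VARIABLES TO DISK;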
Details are available [here][1].

Backend’s health check

Now, let’s see if ProxySQL is able to communicate with these hosts.
ProxySQL has several tables where it stores monitoring information.

Not all the tables in the monitor schema are currently used.
For now, we can check the most relevant ones.
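For example, looking at the most recent connect and ping attempts:

    SELECT * FROM monitor.mysql_server_connect_log ORDER BY time_start_us DESC LIMIT 6;
    SELECT * FROM monitor.mysql_server_ping_log ORDER BY time_start_us DESC LIMIT 6;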

We can conclude that all of the configured servers are healthy.
One important thing to note here is that connect and ping monitoring is performed based on the content of the mysql_servers table, even before it is loaded to RUNTIME. This is intentional: it makes it possible to perform basic health checks before putting the nodes into production.

Now that we know that the servers are correctly monitored and alive, let’s enable them.
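This is done by loading the mysql_servers table to runtime:

    LOAD MYSQL SERVERS TO RUNTIME;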

MySQL replication hostgroups

Let’s check another table in the monitor schema, monitor.mysql_server_read_only_log.
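For example:

    SELECT * FROM monitor.mysql_server_read_only_log ORDER BY time_start_us DESC LIMIT 3;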

This table is currently empty.
The reason is that ProxySQL checks the value of read_only only for servers in hostgroups that are listed in mysql_replication_hostgroups, and that table is also currently empty.
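We can verify this from the admin interface:

    SELECT * FROM mysql_replication_hostgroups;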

But what is the functionality of this table?
With this table, hostgroups can be configured in pairs of writer and reader hostgroups.
ProxySQL will monitor the value of read_only for all the servers in the specified hostgroups and, based on the value of read_only, it will assign each server to the writer or the reader hostgroup.
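For example, a sketch assuming hostgroup 1 holds the writer(s) and hostgroup 2 the readers:

    INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup) VALUES (1, 2);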

Now, all the servers that are either configured in hostgroup 1 or 2 will be moved to the correct hostgroup:

  • If they have read_only=0 , they will be moved to hostgroup 1
  • If they have read_only=1 , they will be moved to hostgroup 2

But at this moment the algorithm is still not running, because the new entry has not been loaded to runtime yet.
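A quick check against the runtime table should still show it empty at this point:

    SELECT * FROM runtime_mysql_replication_hostgroups;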

Let’s load mysql_replication_hostgroups to runtime using the same LOAD command used for MySQL servers: in fact, LOAD MYSQL SERVERS TO RUNTIME processes both the mysql_servers and the mysql_replication_hostgroups tables.
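That is:

    LOAD MYSQL SERVERS TO RUNTIME;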

Wait a few seconds, and check the status again:
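For example, looking at both the read_only log and the servers table:

    SELECT * FROM monitor.mysql_server_read_only_log ORDER BY time_start_us DESC LIMIT 3;
    SELECT hostgroup_id, hostname, port, status FROM mysql_servers;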

Alright, ProxySQL is monitoring the read_only value for the servers.
It also created hostgroup2, to which it moved the servers with read_only=1 (the readers) from hostgroup1.

All looks good. It is time to save the configuration to disk:
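For example:

    SAVE MYSQL SERVERS TO DISK;
    SAVE MYSQL VARIABLES TO DISK;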

MySQL Users

After configuring the servers in mysql_servers, we also need to configure MySQL users.
This is done through the mysql_users table:

The table is initially empty.
Let’s start configuring users.
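A sketch, assuming the default MySQL Sandbox credentials (msandbox/msandbox) and hostgroup 1, the writer hostgroup, as the default:

    INSERT INTO mysql_users (username, password, default_hostgroup)
    VALUES ('msandbox', 'msandbox', 1);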

We left most fields at their default values. The most important fields we configured are:

  • username
  • password
  • default_hostgroup

The meaning of username and password should be clear.
default_hostgroup is the hostgroup used for traffic generated by that specific user when no query rule matches a given query (more details later on).

Again, load configuration to runtime to make it live, and save it to disk to make it persistent across restarts.
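That is:

    LOAD MYSQL USERS TO RUNTIME;
    SAVE MYSQL USERS TO DISK;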


We can now try to connect from a different terminal:
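A sketch, assuming ProxySQL is listening for MySQL traffic on the default port 6033, and using SELECT @@port to see which backend answers:

    mysql -u msandbox -pmsandbox -h 127.0.0.1 -P6033 -e 'SELECT @@port'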

It seems to have worked, and not surprisingly the query was sent to the server listening on port 21891, the master, because it is in hostgroup1, the default hostgroup for user msandbox.

Functional tests

Now we can run a quick “benchmark” to verify that ProxySQL is functional.

Assuming you already created the sysbench table, you can run a load test using:
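An illustrative invocation for a recent sysbench (1.0+), assuming the sbtest schema was already prepared and ProxySQL is listening for application traffic on the default port 6033; adjust the options to your sysbench version, as noted below:

    sysbench oltp_read_write \
      --mysql-host=127.0.0.1 --mysql-port=6033 \
      --mysql-user=msandbox --mysql-password=msandbox \
      --mysql-db=sbtest --tables=1 --table-size=10000 \
      --threads=4 --time=60 --report-interval=5 \
      run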

Everything ran correctly through ProxySQL. Does ProxySQL export metrics about what was running? Yes…

For older versions of sysbench, report-interval should be removed and --db-ps-mode=disable added.

ProxySQL Statistics

ProxySQL collects a lot of real time statistics in the stats schema:
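The available tables can be listed with:

    SHOW TABLES FROM stats;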

A lot of tables are present in the stats schema. We will not analyze them all here, only some of the most relevant ones.

stats.stats_mysql_connection_pool
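The table can be inspected with a plain SELECT:

    SELECT * FROM stats.stats_mysql_connection_pool;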

A small parenthesis: currently, when a server is removed (either removed completely or moved away from a hostgroup), it is internally marked as OFFLINE_HARD and not really removed.
This is why the server on port 21892 shows as OFFLINE_HARD for hostgroup1.

This table returns a lot of information about the traffic sent to each server.
As expected, all traffic was sent to the server on port 21891, the master.

stats_mysql_commands_counters

Exactly what type of queries were sent? Table stats_mysql_commands_counters answers this question:
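For example, showing only the commands that were actually executed:

    SELECT * FROM stats.stats_mysql_commands_counters WHERE Total_cnt > 0;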

Table stats_mysql_commands_counters returns detailed information about the type of statements executed, and the distribution of execution time!

stats_mysql_query_digest

Table stats_mysql_commands_counters provides very useful information.
Can we get more details about the queries that were executed? Table stats_mysql_query_digest helps with this:
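For instance:

    SELECT * FROM stats.stats_mysql_query_digest;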

That is too much information to format nicely here.
Let’s select only the most important metrics:
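A possible query, keeping the hostgroup, the counters and the digest text:

    SELECT hostgroup, count_star, sum_time, digest_text
    FROM stats.stats_mysql_query_digest
    ORDER BY sum_time DESC;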

All traffic is sent to hostgroup1. Let’s assume that now we want to send specific queries to slaves…

MySQL Query Rules

Table mysql_query_rules has a lot of fields and it is a very powerful vehicle to control the traffic passing through ProxySQL.
Its table definition is as follows:
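It can be retrieved from the admin interface with, for example:

    SHOW CREATE TABLE mysql_query_rules;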

We can now configure ProxySQL to send the top 2 queries to the slaves, and everything else to the master.
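A sketch of two such rules, assuming the two sysbench queries discussed below (a point lookup on sbtest1 matched via its full digest, and the DISTINCT query matched via a partial digest); both route to hostgroup 2, the readers:

    INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
    VALUES (10, 1, '^SELECT c FROM sbtest1 WHERE id=\?$', 2, 1);

    INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
    VALUES (20, 1, 'DISTINCT c FROM sbtest1', 2, 1);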

A few notes:

  • query rules are processed as ordered by rule_id
  • only rules with active=1 are processed. Because query rules are a very powerful tool and, if misconfigured, can lead to hard-to-debug behavior (we all love regex, right?), active defaults to 0 (active=0). You should double-check your regexes before enabling the rules!
  • the first rule example uses caret (^) and dollar ($): these are special regex characters that mark the beginning and the end of a pattern. In that case it means that match_digest or match_pattern must match the whole query
  • in contrast to the first rule example, the second rule example doesn’t use caret or dollar: the match can occur anywhere in the query
  • pay a lot of attention to your regexes, to avoid rules matching what they shouldn’t!
  • you probably noticed that the question mark is escaped: it has a special meaning in regex, so, as said, pay really close attention to regex syntax!
  • apply=1 means that no further rules are checked if there is a match

Table mysql_query_rules looks like this:
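A condensed view of the two rules:

    SELECT rule_id, active, match_digest, destination_hostgroup, apply FROM mysql_query_rules;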

Queries matching these 2 rules will be sent to the slaves.
If no rule matches a query, the default_hostgroup applies (which is 1 for user msandbox).

Next, let’s reset the contents of the stats_mysql_query_digest table. To achieve this, we can simply run any query against stats_mysql_query_digest_reset.
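For example:

    SELECT COUNT(*) FROM stats_mysql_query_digest_reset;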

Querying stats_mysql_query_digest_reset atomically returns the content of the stats_mysql_query_digest table and truncates it!

Now we can load the query rules to runtime.
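That is:

    LOAD MYSQL QUERY RULES TO RUNTIME;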

And finally we re-execute the sysbench load:

And let’s verify the content of table stats_mysql_query_digest:
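For example, re-running the digest query used earlier:

    SELECT hostgroup, count_star, sum_time, digest_text
    FROM stats_mysql_query_digest
    ORDER BY count_star DESC;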

As expected, the top 2 queries are sent to hostgroup2 (the slaves).
Table stats_mysql_query_digest allows us to aggregate results, for example:
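A sketch, aggregating per hostgroup (the stats tables can be queried with regular SQL, including GROUP BY):

    SELECT hostgroup, SUM(count_star) AS total_queries, SUM(sum_time) AS total_time_us
    FROM stats_mysql_query_digest
    GROUP BY hostgroup;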

Query Caching

A popular use of ProxySQL is to act as a query cache. By default, queries aren’t cached; caching is enabled by setting cache_ttl (in milliseconds) in mysql_query_rules.

Assume we also want to cache, for 5 seconds, all the queries sent to the slaves.
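A sketch, assuming the routing rules created earlier are the only ones sending traffic to hostgroup 2:

    UPDATE mysql_query_rules SET cache_ttl=5000 WHERE active=1 AND destination_hostgroup=2;
    LOAD MYSQL QUERY RULES TO RUNTIME;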

Now, we can run the load test again:

We can now verify the content of table stats_mysql_query_digest:

We can see that the former top 2 queries, which were being sent to hostgroup2, now have the following characteristics:

  • they are still sent to hostgroup2
  • if they are present in the query cache, they aren’t sent to any hostgroup and are marked with the special hostgroup -1
  • the total execution time for the cached queries is 0 (this means that the request was served within the same event loop)

Note: currently it is not possible to define the maximum amount of memory used by the query cache, nor is it possible to force a selective or complete flush of the query cache.
Right now, the memory footprint and the lifetime of a result set can be controlled only through cache_ttl in mysql_query_rules: choose cache_ttl wisely until more control over the query cache becomes available.

Query Rewrite

ProxySQL supports multiple ways to match a query, such as flagIN, username, and schemaname.

The most common way to match a query is to write a regular expression that matches the text of the query itself.
To match the text of a query, ProxySQL provides 2 mechanisms, using 2 different fields:

  • match_digest: matches the regular expression against the digest of the query, as shown in stats_mysql_query_digest.digest_text
  • match_pattern: matches the regular expression against the unmodified text of the query

Why are there two different mechanisms? The digest of a query can be much smaller than the query itself (for example, an INSERT statement with several MB of data), and running a regex against a smaller string is faster.
So, if you aren’t trying to match specific literals in the query, it is recommended (and faster) to use match_digest.

However, if you want to rewrite queries, you must match against the original query (using match_pattern), because it is the original query that needs to be rewritten.

An example:
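A sketch of such a rewrite rule, assuming (purely as an illustration) that we want to strip the ORDER BY c clause from the sysbench DISTINCT query; \1 in replace_pattern refers to the first capture group of match_pattern:

    INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
    VALUES (30, 1, 'DISTINCT(.*)ORDER BY c', 'DISTINCT\1', 1);

    LOAD MYSQL QUERY RULES TO RUNTIME;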

Let’s try this new rule.

Something looks wrong, as no rewrite seems to have happened.
This is intentional, so we can now troubleshoot.
A very useful table for troubleshooting is stats.stats_mysql_query_rules:
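For example, checking how many times each rule matched:

    SELECT rule_id, hits FROM stats.stats_mysql_query_rules;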

It seems clear that something is wrong: rule with rule_id=30 has 0 hits!

The problem is that the rule with rule_id=20 also matches the queries that would match rule_id=30, and apply=1 in rule 20 prevents them from ever reaching rule 30.
Let’s fix this:
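One way to fix it, consistent with the note further down (rules 20 and 30 are deliberately kept separate to illustrate the apply field), is to let processing continue past rule 20:

    UPDATE mysql_query_rules SET apply=0 WHERE rule_id=20;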

Now it should work!
Let’s load the rules again:
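As before:

    LOAD MYSQL QUERY RULES TO RUNTIME;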

Note: when running LOAD MYSQL QUERY RULES TO RUNTIME, not only are the internal query processing structures reset, but the counters in stats.stats_mysql_query_rules are reset as well.

Let’s try again:

And now we can verify:
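For example, by checking the rule counters again:

    SELECT rule_id, hits FROM stats.stats_mysql_query_rules;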

There are rewrites, it looks good :-)

What about query execution?
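A possible check is to look at the digest table again and verify that the rewritten query (without the ORDER BY c clause, in this sketch) now shows up there:

    SELECT hostgroup, count_star, sum_time, digest_text
    FROM stats_mysql_query_digest
    ORDER BY sum_time DESC;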

Note:
Rules with rule_id=20 and rule_id=30 could be merged into a single rule.
They are kept separate here to illustrate the importance of the apply field, and to show that not only can multiple rules match the same query, but multiple rules can also transform and apply settings to the same query.

A few more examples of query rewriting.

We want to rewrite the sysbench-style queries against sbtest1 so that they read from a different table instead, but only for ids between 1000 and 3999…
I know, this makes little sense on its own; it is just to show some of the potential, including the ability to implement some complex sharding!

How does this look in regex? :-)
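A sketch, assuming (purely for illustration) that point lookups on sbtest1 with an id between 1000 and 3999 should be redirected to a hypothetical table sbtest2; the capture group keeps the original id, and the rule uses rule_id=5 so it is evaluated before the earlier routing rule on the same query (rule_id=10, which has apply=1):

    INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
    VALUES (5, 1, '^SELECT c FROM sbtest1 WHERE id=([1-3][0-9]{3})$', 'SELECT c FROM sbtest2 WHERE id=\1', 1);

    LOAD MYSQL QUERY RULES TO RUNTIME;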

Note that “c” and “1” (in sbtest1) are selected just to show the syntax

Let’s try it.
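A possible smoke test, assuming the default traffic port 6033: send a query with an id in the target range through ProxySQL, then check the hits of the new rule from the admin interface:

    mysql -u msandbox -pmsandbox -h 127.0.0.1 -P6033 -e 'SELECT c FROM sbtest1 WHERE id=2578'

    SELECT rule_id, hits FROM stats.stats_mysql_query_rules WHERE rule_id=5;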

Did it work? Apparently it did :)

Exporting the configuration to a file

Once a ProxySQL server has been configured using the statements above, it might be necessary to copy the runtime configuration to another ProxySQL server, or it might simply be convenient to back up the configuration to a file. The following commands are available for these cases:

  1. SELECT CONFIG INTO OUTFILE /tmp/f1
    This command saves the ProxySQL runtime settings into the specified file, in the same format as the /etc/proxysql.cnf file. It allows you to store the current runtime configuration and reuse it later. This command has the alias SAVE CONFIG TO FILE /tmp/f2.

  2. SELECT CONFIG FILE
    This command exports the ProxySQL runtime settings as a resultset. The resultset has one row with one column, which contains the configuration in the format of the /etc/proxysql.cnf file. Usage example: