ProxySQL’s new routing algorithm to handle hundreds of thousands of schemas/shards
ProxySQL is a service designed to scale and handle traffic in very large setups.
In the past we already showed examples of how ProxySQL can handle thousands of MySQL servers, and millions (yes, millions!) of distinct users.
Although, recently a new challenge raised: can ProxySQL perform routing based on MySQL schemaname for a vary large number of schemas?
We know that in production setups ProxySQL is already performing routing to MySQL servers based on schemaname for few hundreds schemas, but can it handle thousands or more?
Technically there is no limit on how many shards ProxySQL can handle, as long as there are rules for them.
Tibor Korocz already pointed in a blog post that more query rules can affect performance.
In case of query routing based on
schemaname no regular expressions are involved (therefore each rule is not expensive), but the same principle applies: the more rules need to be evaluated, longer it takes to make a decision (in this case to perform the routing).
Scope of this blog post is to understand the performance implication of performing routing based on the number of rules.
To simplify the setup to run some benchmark, we used only 1 MySQL server as backend, and we created 50000 schemas:
for i in `seq 10001 60000` ; do echo "CREATE DATABASE IF NOT EXISTS shard_$i;" done | mysql -u root
To run the benchmark we used a various number of rules.
For every test, we configured ProxySQL setting the right number of rules: for 100 schemas we created 100 rules, for 1000 schema 1000 rules, etc.
For example, for 200 rules we used this configuration:
( echo "DELETE FROM mysql_query_rules; INSERT INTO mysql_query_rules (active,username,cache_ttl) VALUES (1,\"sbtest\",120000);" for i in `seq 10001 10200` ; do echo "INSERT INTO mysql_query_rules (active,username,schemaname,destination_hostgroup,apply) VALUES (1,\"sbtest\",\"shard_$i\",1,1);" done echo "LOAD MYSQL QUERY RULES TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032
We then ran queries as the following:
for j in `seq 1 50000` ; do echo "USE shard_$(($RANDOM%200+10001))" ; echo "SELECT 1;" done | mysql -u sbtest -psbtest -h 127.0.0.1 -P6033 --ssl-mode=disabled -NB > /dev/null
It is relevant to note that we aren’t interested in the total execution time (this is why we used this simple one-liner to generate traffic), but in the time spent inside the Query Processor.
Because the execution time of the queries is not relevant, we also configured the query cache.
To compute the time spent in the Query Processor we set variable
mysql-stats_time_query_processor='true' and computed the value of status variable
Query_Processor_time_nsec. This status variable measures the time spent inside the Query Processor, in nanoseconds.
As we expected, the more query rules ProxySQL needs to evaluate, the longer it takes to compute the destination hostgroup for that schemaname. This results in latency before executing the query:
The graph above shows that for 10 rules the average latency is around 1us (microsecond), for 100 rules the average latency is 2us , and that it gradually grows to 6us for 500 rules and 256us for 20000 rules.
6us average latency for 500 rules doesn’t seem to be an issue, but 256us (0.256ms) average latency for 20000 does seem like an issue.
What is also important to note is that these values are the average values for all the shards. Routing decision for some shards are faster than others, depending in which order the rules are written.
In fact, with 20000 rules the average latency for
shard_10001 is 0.89us , while the average latency for
shard_30000 is 581.1us !
Surely, this doesn’t seem a scalable solution.
MySQL Query Rules Fast Routing
ProxySQL 1.4.7 introduces a new routing capability that enhances what already configurable in
The new capability is configurable using table
mysql_query_rules_fast_routing. This table is simpler compared to
mysql_query_rules, and its purpose is simple: given
flagIN, finds the
For reference, this is the table definition of
CREATE TABLE mysql_query_rules_fast_routing ( username VARCHAR NOT NULL, schemaname VARCHAR NOT NULL, flagIN INT NOT NULL DEFAULT 0, destination_hostgroup INT CHECK (destination_hostgroup >= 0) NOT NULL, comment VARCHAR NOT NULL, PRIMARY KEY (username, schemaname, flagIN) )
mysql_query_rules_fast_routing should be considered as an extension of
mysql_query_rules. After processing the rules in
mysql_query_rules, the rules in
mysql_query_rules_fast_routing will be evaluated, unless the last matching rule defined in
Because of the nature of the rules in
mysql_query_rules_fast_routing, one and only one rule will be evaluated.
The rules defined in
mysql_query_rules_fast_routing are loaded in a hash table, where the key is
FlagIN , and the value is the
New routing algorithm in action
This is the command we used to configure
mysql_query_rules_fast_routing with a different number of rules, up to 50000 rules in this example:
( echo "DELETE FROM mysql_query_rules; DELETE FROM mysql_query_rules_fast_routing;" echo "INSERT INTO mysql_query_rules (active,username,cache_ttl) VALUES (1,\"sbtest\",120000);" for i in `seq 10001 60000` ; do echo "INSERT INTO mysql_query_rules_fast_routing (username,schemaname,flagIN,destination_hostgroup,comment) VALUES (\"sbtest\",\"shard_$i\",0,1,\"\");" done echo "LOAD MYSQL QUERY RULES TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032
The results are really interesting:
In other words:
- between 10 rules to 10k rules, the average latency is between 1.7us to 1.9us
- at 20k rules the average latency is 2.2us
- at 50k rules the average latency is 2.4us
What is really interesting is that this new routing algorithm allows ProxySQL to perform schema routing for up to 10000 shards introducing less latency the old routing algorithm was introducing for 100 shards, and it scales very well to 50k rules wth very small overhead compared to 100 shards in the old routing algorithm.
Disabling mysql-stats_time_query_processor by default
In ProxySQL 1.4.4 the new variable
mysql-stats_time_query_processor was introduced to dynamically enable or disable measuring the time spent in the Query Processor. Although timing is very important to understand the impact of having too many rules, measuring time elapse with good precision has some performance impact.
In fact, on this same server where we ran these benchmark, measuring time elapse with very good precision has a cost (latency) of 0.3us , mostly spent in kernel space (system calls).
That also means that from the metrics above you should remove 0.3us to determine the latency when measuring time is not enabled. For example, “at 50k rules the average latency is 2.4us” should become “at 50k rules the average latency is 2.1us” .
For this reason, since ProxySQL 1.4.4 ,
mysql-stats_time_query_processor is disabled by default.
Why flagIN is relevant?
The new routing algorithm performs routing based on
I think some are asking why
flagIN is relevant.
mysql_query_rules_fast_routing allows to set
mysql_query_rules based on other criteria later relevant for routing.
For example, assume that you don’t want to only perform routing based on
schemaname, but also read/write split.
You can use
flagOUT = 1for all queries to be sent to the server that is the master, no matter in which cluster
flagOUT = 2for all queries to be sent to the server that is the slave, no matter in which cluster
The value of
flagOUT resulting from
mysql_query_rules will become the
mysql_query_rules_fast_routing. This will allow to combine read/write split to routing based on
For benchmark purpose, let’s create 200.000 rules:
( echo "DELETE FROM mysql_query_rules; DELETE FROM mysql_query_rules_fast_routing; INSERT INTO mysql_query_rules (active,username,cache_ttl) VALUES (1,\"sbtest\",120000);" for i in `seq 10001 60000` ; do echo "INSERT INTO mysql_query_rules_fast_routing (username,schemaname,flagIN,destination_hostgroup,comment) VALUES (\"sbtest\",\"shard_$i\",0,1,\"\");" done echo "INSERT INTO mysql_query_rules_fast_routing SELECT username, schemaname, FlagIN+1, destination_hostgroup, comment FROM mysql_query_rules_fast_routing;" echo "INSERT INTO mysql_query_rules_fast_routing SELECT username, schemaname, FlagIN+2, destination_hostgroup, comment FROM mysql_query_rules_fast_routing;" echo "LOAD MYSQL QUERY RULES TO RUNTIME;" ) | mysql -u admin -padmin -h 127.0.0.1 -P6032 mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT COUNT(*) FROM mysql_query_rules_fast_routing" +----------+ | COUNT(*) | +----------+ | 200000 | +----------+
Updating the graph after testing routing based on 200k rules:
As you can see from the graph above, ProxySQL is now able to perform routing decision based on 200k rules with almost no extra impact compared to 100 rules in the legacy algorithm.
Drawback of the new algorithm
This new routing algorithm seems to have a minimum overhead of 1.7us (or 1.4us with
The minimum overhead is not present in the old/legacy algorithm, therefore it makes sense to use the new routing algorithm only if you have more than 100 routing rules.
Furthermore, ProxySQL Cluster capability doesn’t support
As of v2.0.0
mysql_query_rules_fast_routing is supported also in ProxySQL Cluster mode.
To reduce contention between worker threads in ProxSQL, threads maintain their own copy of query rules. This means that the more rules you have and the more threads you have configured, the more memory is needed. In this example, loading 200k rules and using 4 threads lead to a memory usage of 280MB.
In future we plan to make configurable if, for query processing, ProxySQL should use more memory and be lock free, or use less memory and not be lock free, or a mix of the two options:
mysql_query_rulesto be lock free
mysql_query_rules_fast_routingto be shared between threads
These are implementation details we will be looking into when planning to optimize ProxySQL even further.
ProxySQL 1.4.7 introduces a new routing algorithm that doesn’t replace the legacy algorithm, but enhances it.
Thank to the new routing algorithm, now ProxySQL can easily handle routing based on hundreds of thousands of schemas/shards with almost no impact (few microseconds).