Scaling with ProxySQL Query Cache

  • Date:
  • Tags: proxy mysql caching scalability
ProxySQL Query Cache

MySQL Query Cache

Before writing about ProxySQL Query Cache, let's see what MySQL Query Cache is.

MySQL Query Cache is a very interesting feature that, quoting the documentation:

stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again

It is a cache, so it is meant to improve performance. Although, it is not the magic bullet and it is not unusual to see severe performance degradation or random freezes. Why?
There is a series of well written articles from Peter Zaitsev that describe what MySQL Query Cache is, and a list of idea about giving it a second chance.

But the truth is that MySQL Query Cache is not able to scale due to its locking and invalidation algorithms. I won't repeat the technical details of why MySQL Query Cache doesn't scale: the mentioned articles describe the reason very well.
And if you want to tune your MySQL Query Cache, I strongly recommend query cache tuner by Domas Mituzas!

ProxySQL Query Cache

ProxySQL Query Cache has a completely different nature than MySQL Query Cache.
It is an in-memory key/value storage that uses:

  • as key a combination of username, schema, and query text
  • as value the resultset returned by the backend (mysqld, or another proxysql)

The only way to invalidate entries from ProxySQL QC is through a time-to-live in milliseconds.
Some believes that invalidation through TTL is a limitation, but this isn't the case for many applications.
If application needs absolutely correct data, transparent caching is perhaps not the correct solution.
Any application that can accept to read slightly stale data from a slave, can benefit from QC.
The concept isn't new at all, and there are implementations of query cache in the driver itself: mysqlnd is an example.

Benchmark on Query Cache

The reason why I described MySQL Query Cache and ProxySQL Query Cache is that they are different by nature.
This also means that comparing MySQL QC vs ProxySQL QC is not trivial, and it is not compating apples to apples.

MySQL Query Cache is known to not scale very well. But it still great to get real numbers.
A recent detailed benchmark I found about MySQL Query Cache inability to scale is a blog post from Szymon Komendera (Lead Database Engineer at Amazon Aurora). In the same blog post, Aurora with a QC of 4GB can boost MySQL performance up to 3.1x times (take note of this value).
I will run benchmark following the same methodologies and see if I get similar results on MySQL Query Cache, and see how much ProxySQL Query Cache can boost performance.

Initial setup:

  • 2 client hosts with sysbench 0.5 .
    We are using 2 client hosts because on this hardware a single client is not able to generate enough traffic to push ProxySQL Query Cache to its limits.
    sysbench was executed as:

./sysbench --num-threads=512 --max-time=900 --max-requests=0 --test=./tests/db/oltp.lua --mysql-user=sbtest --mysql-password=sbtest --mysql-host=10.1.1.22 --oltp-table-size=10000000 --mysql-port=${PORT} --mysql-ps-mode=disable --oltp-read-only=on --oltp-point-selects=25 --oltp-skip-trx=on --oltp-sum-ranges=0 --oltp-simple-ranges=0 --oltp-distinct-ranges=0 --oltp-order-ranges=0 --oltp-dist-type=uniform run

  • 1 host with mysqld (Percona Server 5.6.25) and proxysql (1.4.0) installed locally

In all the benchmarks, the whole dataset was already into InnoDB buffer pool (in memory, no IO involved) and query cache was reset before every benchmark.

Query Cache results

The above results show that indeed MySQL QC is unable to scale, and enabling it can cause a perform drop of 84%.
On the other hand, ProxySQL QC boosts performance up to 3.3x .

Another interesting result worth noticing is how results differ depending from the length of the benchmark, comparing results of benchmark running for 1 minute or 15 minutes.
With MySQL Query Cache, the longer the benchmark, the lower the throughput (visible drop).
With ProxySQL Query Cache there is no drop in throughput with a longer benchmark, but a performance improvement of 1% that perhaps can be considered a simple fluctuation.

Yet another note about the results above: these are obtained with a workload that could generate up to 10 millions unique SELECT statements and therefore up to 10 millions entries in the query cache, because the table size is 10 millions.
A smaller value for --oltp-table-size would lead to way higher results for MySQL without QC and ProxySQL QC. In fact, out of curiousity, with --oltp-table-size=1000000 a single ProxySQL instance can return more than 1 million QPS on this hardware.

Moving the Query Cache closer to the application

So far, I ran ProxySQL co-located with MySQL Server. Why? This was done to emulate the current expectation of Query Cache right in front of the data itself.
Although I believe that for many workloads, the caching layer should not be close to where the data is stored (the backend), but should be close to where the data is consumed (the frontend). Again, mysqldnd is an example.

What happens if, using the same hardware from previous benchmark, we move ProxySQL away from the database server and into the application server?
We will now have two ProxySQL instances.

ProxySQL Query Cache in application

Not surprising, it scales better. The database server now needs to only execute the queries that are not in the query caches located in the application servers.
When ProxySQL is located in the database server, Query Cache boosts performance up to 3.3x.
When ProxySQL is located in the client, Query Cache boosts performance up to 5.2x !

Query Cache results

Can we achieve even better result? Probably yes. Because the Query Cache can be moved around and be decentralized (no more in the database server), we can also create complex setup, and shard the caching layer itself. For example, it is possible to create a two caching shards each one of them processing and caching only half of the queries, or it is also possible to create a multi-layer caching system.

Conclusion

Although MySQL Query Cache was meant to improve performance, it has serious scalability issues and it can easily become a severe bottleneck.
ProxySQL Query Cache can drastically boost performance for some specific workloads: read intensive workloads with a lot of resultset that can be cached. ProxySQL also allows to decentralize the caching layer, and move the cache away from the database server and closer to the application.