Testing performance of MySQL Group Replication

MySQL Group Replication is a hot topic in the MySQL ecosystem. We have evaluated it previously, and it is now time to measure its performance, especially now that it is GA. In our setup, we wanted to see how it performs compared to standard asynchronous MySQL replication. MySQL 5.7.17 was therefore tested first with regular async replication, then with MTS, semi-sync, and finally Group Replication.

Setup

The first testing environment is a 3-node “cluster”, with 1 master and 2 async slaves in the same network.
Each node has Intel Xeon CPU E5-2660 v2 @ 2.20GHz, 20 cores.

For the benchmark we ran sysbench on a 4th node against a database with 8 tables of 10M rows each; the dataset fits completely in memory.
sysbench was executed as:

./sysbench --num-threads=64 --max-time=300 --max-requests=0 --test=./lua/oltp_update_index.lua --mysql-user=sbtest --mysql-password=sbtest --mysql-host=10.11.22.42 --mysql-port=5717 --oltp-table-size=10000000 --oltp-tables-count=8 --report-interval=10 --oltp-auto-inc=off run

This benchmark was run with only 64 connections because the number of connections is not relevant at this stage: the bottleneck is single-threaded replication.

While the master was able to process 47332 trx/sec, the slaves were only able to process around 2065 trx/sec. Since replication lag is undesirable, we should try not to exceed the number of transactions that the slave is able to process.
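To get a feel for how quickly lag builds up at these rates, the arithmetic can be sketched in shell. The 300-second duration comes from --max-time=300 in the sysbench command above. The variable names are illustrative, and the sketch assumes that the master writes at its full rate for the whole run and that the slave keeps applying at the same rate afterwards.

```shell
#!/bin/sh
# Rates measured in the benchmark above (transactions per second).
master_tps=47332
slave_tps=2065
run_seconds=300   # from --max-time=300

# Transactions the slave still has to apply when the run ends.
backlog=$(( (master_tps - slave_tps) * run_seconds ))

# Time the slave needs to drain that backlog once writes stop
# (assumption: it keeps applying at the same rate; integer division).
catchup_seconds=$(( backlog / slave_tps ))

echo "backlog after ${run_seconds}s: ${backlog} trx"
echo "catch-up time: ${catchup_seconds}s"
```

Under these assumptions, a five-minute burst at full master speed leaves a single-threaded slave roughly 110 minutes behind.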

Parallel replication

Since MySQL 5.6 it is possible to enable slave parallel workers (also known as multi-threaded slave, MTS).
However, several limitations apply (see the doc).
In MySQL 5.7 a new variable, slave_parallel_type, controls the algorithm used for MTS; the two possible values are DATABASE (5.6 style) and LOGICAL_CLOCK. We enabled 16 parallel workers with slave_parallel_workers=16. See the doc.

For this specific workload, slave_parallel_type=DATABASE does not show much benefit, because it parallelizes per schema and all tables are in the same schema, while slave_parallel_type=LOGICAL_CLOCK shows a great performance boost.
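For reference, the MTS settings described above could be applied on a running slave roughly as follows. This is a sketch, not the exact procedure used for the benchmark: the helper name mts_sql is made up for illustration, and the statements are only printed so they can be reviewed before being piped to a mysql client connected to the slave. The SQL thread is stopped first because slave_parallel_type cannot be changed while it is running.

```shell
#!/bin/sh
# Print the statements that switch a 5.7 slave to 16 LOGICAL_CLOCK workers.
# mts_sql is a hypothetical helper name, not part of MySQL.
mts_sql() {
  cat <<'SQL'
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 16;
START SLAVE SQL_THREAD;
SQL
}

# Review the output; to apply it, pipe it to a mysql client connected to the slave.
mts_sql
```

To survive a restart, the same two variables would also need to be set in the [mysqld] section of the configuration file.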

Semi-sync replication

Now that we have a baseline, the next configuration I tried was semi-sync, as this is what we use in production.
Interestingly, with semi-sync and slave_parallel_type=LOGICAL_CLOCK there is no replication lag at all: semi-sync slows the master down enough that the slave can apply all replication events as they arrive.
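As a rough illustration of what enabling semi-sync involves on MySQL 5.7, the statements below load and activate the semi-sync plugins. This is a sketch under assumptions: the function names are hypothetical helpers, the .so file names assume a Linux build, and the post does not say which semi-sync settings (such as timeouts) were used, so none are shown. The statements are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Print the statements that enable semi-sync on the master and on a slave.
# The function names are hypothetical helpers for this sketch.
semisync_master_sql() {
  cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SQL
}

semisync_slave_sql() {
  cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
SQL
}

echo "-- run on the master"
semisync_master_sql
echo "-- run on each slave"
semisync_slave_sql
```

The I/O thread is restarted on the slave so that it reconnects to the master as a semi-sync slave.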

The following graph shows:

throughput of the master without semi-sync
throughput of a slave in the same rack without MTS
throughput of a slave in the same rack with slave_parallel_type=DATABASE
throughput of a slave in the same rack with slave_parallel_type=LOGICAL_CLOCK
throughput of the master with semi-sync enabled, slaves in the same rack and slave_parallel_type=LOGICAL_CLOCK
throughput of the master with semi-sync enabled, slaves in different DCs and slave_parallel_type=LOGICAL_CLOCK

Group Replication

With async replication we only tested throughput with 64 connections on a write-intensive workload, to understand the upper bounds. With Group Replication we tried several connection counts. In every test, only one writer node (single master) was used.
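A sweep over connection counts can be scripted along these lines. This is only a sketch: the post does not list the exact connection counts, so the list below is hypothetical, and it assumes the same sysbench options, host and port as the command shown in the Setup section, which may not match the Group Replication runs. The loop is a dry run that prints each command instead of executing it.

```shell
#!/bin/sh
# Hypothetical list of connection counts; the post does not give the exact values.
THREADS_LIST="1 2 4 8 16 32 64"

# Build the sysbench command for one thread count, reusing the options shown earlier
# (host and port are assumed to be unchanged).
sysbench_cmd() {
  echo "./sysbench --num-threads=$1 --max-time=300 --max-requests=0" \
       "--test=./lua/oltp_update_index.lua --mysql-user=sbtest --mysql-password=sbtest" \
       "--mysql-host=10.11.22.42 --mysql-port=5717 --oltp-table-size=10000000" \
       "--oltp-tables-count=8 --report-interval=10 --oltp-auto-inc=off run"
}

# Dry run: print one command per thread count instead of executing it.
for t in $THREADS_LIST; do
  sysbench_cmd "$t"
done
```

Replacing the call inside the loop with eval "$(sysbench_cmd "$t")" would actually run each benchmark in turn.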

Group Replication – same rack

During the first series of tests, the 3 nodes were located in the same rack, so network latency was minimal. The results include both throughput and 95th percentile response time.

Group Replication – multiple facilities

Running a cluster with all the nodes in the same rack creates a single point of failure, so the first series of benchmarks was executed only to provide a baseline for further benchmarks.
During the second series of tests, the 3 nodes were located in 3 different facilities/DCs, so latency was higher.
The results include both throughput and 95th percentile response time.

Conclusion

Several conclusions can be drawn from the above results: some are very positive, while others show room for improvement.

The introduction of slave_parallel_type=LOGICAL_CLOCK in 5.7 makes it possible to achieve replication performance that was impossible in MySQL 5.6. If you need high replication throughput, you should consider upgrading to 5.7, whether or not you use Group Replication.
The same parallelism algorithm (slave_parallel_type=LOGICAL_CLOCK) shows very good performance in Group Replication.
Not much difference was noticed in throughput and response time between the cases where the 3 nodes are located in the same rack and across 3 different DCs.
Group Replication flow control makes performance at 32 and 64 connections a bit unstable.
Every 60 seconds there is a performance drop, but fortunately this is noticeable only at high throughput. See bug 84774.

FOSDEM 2017

Final note: Remember to join us at FOSDEM and pre-FOSDEM MySQL Day with plenty of talks on Group Replication and other interesting topics, like ProxySQL!