ProxySQL Native Galera Support

  • Tags: proxy mysql galera cluster release proxysql ha gpl 2.0.0 2.0.1 2.0.2

One of the latest enhancements in ProxySQL v2.0 is native support for Galera Clustering. In previous versions of ProxySQL, an external scheduler was required to track the status of Galera nodes; due to its widespread use, we have now integrated this support directly into ProxySQL's core configuration.

This blog discusses how to take advantage of the new feature and integrate ProxySQL with Galera Cluster to monitor node status and implement a read-write split, using a three-node cluster.

So, let's have a look at what's new in ProxySQL's admin interface! There you'll find the following new tables and variables available to configure your Galera cluster and monitor the cluster's status.

ProxySQL Admin

The definition of the mysql_galera_hostgroups table used to configure your Galera cluster is as follows:
        +--------------------------------------------+
        | tables                                     |
        +--------------------------------------------+
        | [..]                                       |
        | mysql_galera_hostgroups                    |
        | [..]                                       |
        | runtime_mysql_galera_hostgroups            |
        | [..]                                       |
        +--------------------------------------------+
        CREATE TABLE mysql_galera_hostgroups (
            writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY,
            backup_writer_hostgroup INT CHECK (backup_writer_hostgroup>=0 AND backup_writer_hostgroup<>writer_hostgroup) NOT NULL,
            reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND backup_writer_hostgroup<>reader_hostgroup AND reader_hostgroup>0),
            offline_hostgroup INT NOT NULL CHECK (offline_hostgroup<>writer_hostgroup AND offline_hostgroup<>reader_hostgroup AND backup_writer_hostgroup<>offline_hostgroup AND offline_hostgroup>=0),
            active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
            max_writers INT NOT NULL CHECK (max_writers >= 0) DEFAULT 1,
            writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1,2)) NOT NULL DEFAULT 0,
            max_transactions_behind INT CHECK (max_transactions_behind>=0) NOT NULL DEFAULT 0,
            comment VARCHAR,
            UNIQUE (reader_hostgroup),
            UNIQUE (offline_hostgroup),
            UNIQUE (backup_writer_hostgroup));

The fields have the following semantics:

  • writer_hostgroup: the id of the hostgroup that will contain all the members that are writers
  • backup_writer_hostgroup: if the cluster is running in multi-primary mode (i.e. there are multiple nodes with read_only=0) and max_writers is set to a number smaller than the total number of nodes, the additional nodes are moved to this backup writer hostgroup
  • reader_hostgroup: the id of the hostgroup that will contain all the members that are readers (i.e. nodes that have read_only=1)
  • offline_hostgroup: when ProxySQL's monitoring determines a host to be OFFLINE, the host will be moved to the offline_hostgroup
  • active: a boolean value (0 or 1) to activate a hostgroup
  • max_writers: controls the maximum number of nodes allowed in the writer hostgroup; as mentioned previously, additional nodes will be moved to the backup_writer_hostgroup
  • writer_is_also_reader: when 1, a node in the writer_hostgroup will also be placed in the reader_hostgroup so that it will be used for reads. When set to 2, the nodes from the backup_writer_hostgroup will be placed in the reader_hostgroup instead of the node(s) in the writer_hostgroup.
  • max_transactions_behind: determines the maximum number of writesets a node in the cluster can have queued before the node is SHUNNED to prevent stale reads (this is determined by querying the wsrep_local_recv_queue Galera variable).

For reference: https://github.com/sysown/proxysql/wiki/Main-(runtime)#mysql_galera_hostgroups

The definition of the monitor.mysql_server_galera_log table used to monitor your Galera cluster is as follows:
        +------------------------------------+
        | tables                             |
        +------------------------------------+
        | [..]                               |
        | mysql_server_galera_log            |
        | [..]                               |
        +------------------------------------+
        CREATE TABLE mysql_server_galera_log (
            hostname VARCHAR NOT NULL,
            port INT NOT NULL DEFAULT 3306,
            time_start_us INT NOT NULL DEFAULT 0,
            success_time_us INT DEFAULT 0,
            primary_partition VARCHAR NOT NULL DEFAULT 'NO',
            read_only VARCHAR NOT NULL DEFAULT 'YES',
            wsrep_local_recv_queue INT DEFAULT 0,
            wsrep_local_state INT DEFAULT 0,
            wsrep_desync VARCHAR NOT NULL DEFAULT 'NO',
            wsrep_reject_queries VARCHAR NOT NULL DEFAULT 'NO',
            wsrep_sst_donor_rejects_queries VARCHAR NOT NULL DEFAULT 'NO',
            error VARCHAR,
            PRIMARY KEY (hostname, port, time_start_us));

The fields have the following semantics:

  • hostname: the Galera node hostname or IP address
  • port: the Galera MySQL service port
  • time_start_us: the time the monitor check was started (microseconds)
  • success_time_us: the amount of time for the monitor check to complete (microseconds)
  • primary_partition: whether the Galera member node is PRIMARY
  • read_only: whether the node is READ ONLY
  • wsrep_local_recv_queue: the length of the node's receive queue during the check (see wsrep_local_recv_queue)
  • wsrep_local_state: the node's local state number (see wsrep_local_state)
  • wsrep_desync: whether the node has been set to desync (see wsrep_desync)
  • wsrep_reject_queries: whether the node has been set to reject queries (see wsrep_reject_queries)
  • wsrep_sst_donor_rejects_queries: whether the node has been set to reject queries while acting as an SST donor (see wsrep_sst_donor_rejects_queries)
  • error: any error messages that occurred while checking a node

The following global variables control the health check interval and timeout:
        ProxySQL Admin> select * from global_variables where variable_name like '%monitor_galera%';
        +-------------------------------------------+----------------+
        | variable_name                             | variable_value |
        +-------------------------------------------+----------------+
        | mysql-monitor_galera_healthcheck_interval | 5000           |
        | mysql-monitor_galera_healthcheck_timeout  | 800            |
        +-------------------------------------------+----------------+
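
These thresholds can be tuned like any other ProxySQL variable from the admin interface. Below is a minimal sketch; the values are illustrative, not recommendations:

        -- check every 2 seconds, time out after 1 second (illustrative values)
        UPDATE global_variables SET variable_value=2000
          WHERE variable_name='mysql-monitor_galera_healthcheck_interval';
        UPDATE global_variables SET variable_value=1000
          WHERE variable_name='mysql-monitor_galera_healthcheck_timeout';

        LOAD MYSQL VARIABLES TO RUNTIME;
        SAVE MYSQL VARIABLES TO DISK;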

ProxySQL Configuration

We will configure the three-node cluster in ProxySQL as follows:

  • one writer node responsible for handling all write traffic
  • one backup writer node which will be available as a standby in case the primary writer node goes offline (or becomes unavailable for writes)
  • one reader node for handling SELECT traffic

[Figure: ProxySQL Galera Cluster topology]

The nodes that will be used are:

  • 172.16.1.112: db-node01
  • 172.16.1.113: db-node02
  • 172.16.1.114: db-node03
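
Before adding these nodes to ProxySQL, it's worth confirming that the cluster itself is healthy. A quick sanity check from any node, using standard Galera status variables:

        db-node01 mysql> SHOW GLOBAL STATUS WHERE Variable_name IN
            ('wsrep_cluster_size','wsrep_cluster_status','wsrep_local_state_comment');

On a healthy three-node cluster this should report wsrep_cluster_size=3, wsrep_cluster_status=Primary and wsrep_local_state_comment=Synced.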

First connect to the admin interface to start configuring ProxySQL:

        mysql -P6032 -uadmin -padmin -h 127.0.0.1 --prompt "ProxySQL Admin> "

Now we can set up ProxySQL's behavior for our Galera cluster. Let's define the following hostgroups:

  • offline_hostgroup with hostgroup_id=1
  • writer_hostgroup with hostgroup_id=2
  • reader_hostgroup with hostgroup_id=3
  • backup_writer_hostgroup with hostgroup_id=4

We'll set max_writers=1 to ensure we only have one writer at a time, and since the writer should be dedicated to writes ONLY, we'll also set writer_is_also_reader=0.

        INSERT INTO mysql_galera_hostgroups 
            (offline_hostgroup,
             writer_hostgroup,
             reader_hostgroup,
             backup_writer_hostgroup,
             active,
             max_writers,
             writer_is_also_reader,
             max_transactions_behind) 
          VALUES
            (1,2,3,4,1,1,0,100);

We need to manually populate the mysql_servers table with information about our Galera nodes. The node with the lowest weight will be moved to the backup_writer_hostgroup automatically after the configuration is loaded to runtime; for now we'll just add it to the writer_hostgroup.

        INSERT INTO mysql_servers(hostgroup_id,hostname,port,weight)
          VALUES (2,'172.16.1.112',3306,100);

        INSERT INTO mysql_servers(hostgroup_id,hostname,port,weight)
          VALUES (2,'172.16.1.113',3306,10);

        INSERT INTO mysql_servers(hostgroup_id,hostname,port,weight)
          VALUES (3,'172.16.1.114',3306,100);

Now, we can verify the configured tables:

        ProxySQL Admin> select hostgroup_id,hostname,port,status,weight,max_connections from mysql_servers;
        +--------------+--------------+------+--------+--------+-----------------+
        | hostgroup_id | hostname     | port | status | weight | max_connections |
        +--------------+--------------+------+--------+--------+-----------------+
        | 2            | 172.16.1.112 | 3306 | ONLINE | 100    | 1000            |
        | 2            | 172.16.1.113 | 3306 | ONLINE | 10     | 1000            |
        | 3            | 172.16.1.114 | 3306 | ONLINE | 100    | 1000            |
        +--------------+--------------+------+--------+--------+-----------------+

        ProxySQL Admin> select * from mysql_galera_hostgroups;
        +------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
        | writer_hostgroup | backup_writer_hostgroup | reader_hostgroup | offline_hostgroup | active | max_writers | writer_is_also_reader | max_transactions_behind | comment |
        +------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
        | 2                | 4                       | 3                | 1                 | 1      | 1           | 0                     | 100                     | NULL    |
        +------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+---------+
        1 row in set (0.00 sec)

We can now load our configuration to runtime and also save the configuration to disk to persist across restarts:

        LOAD MYSQL SERVERS TO RUNTIME;
        SAVE MYSQL SERVERS TO DISK;

After loading the configuration to runtime, we can now see that host 172.16.1.113, which is configured with a lower weight, has been moved to hostgroup 4. To recap, this happened because we configured ProxySQL to have max_writers=1 and backup_writer_hostgroup=4.

The backup writer node will only be used in case the node 172.16.1.112 becomes unavailable.

        ProxySQL Admin> select hostgroup,srv_host,status,ConnUsed,MaxConnUsed,Queries,Latency_us from stats.stats_mysql_connection_pool order by srv_host;
        +-----------+--------------+--------+----------+-------------+-----------+------------+
        | hostgroup | srv_host     | status | ConnUsed | MaxConnUsed | Queries   | Latency_us |
        +-----------+--------------+--------+----------+-------------+-----------+------------+
        | 2         | 172.16.1.112 | ONLINE | 3        | 4           | 930742390 | 118        |
        | 4         | 172.16.1.113 | ONLINE | 0        | 0           | 0         | 136        |
        | 3         | 172.16.1.114 | ONLINE | 1        | 1           | 233130345 | 123        |
        +-----------+--------------+--------+----------+-------------+-----------+------------+

Now it’s time to go ahead and define some query rules, which will handle read/write split. For illustrative purposes we'll just use some generic rules to redirect SELECT traffic (NOTE: IT IS NOT RECOMMENDED TO USE GENERIC QUERY RULES IN A PRODUCTION ENVIRONMENT, QUERY RULES SHOULD BE GENERATED FOR SPECIFIC QUERIES OR QUERY PATTERNS INSTEAD).

The default_hostgroup for the application user in the mysql_users table is set to 2, i.e. the writer_hostgroup.
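
If the application user has not been created yet, it can be added along the following lines (the username and password here are placeholders, substitute your own):

        INSERT INTO mysql_users (username, password, default_hostgroup, active)
          VALUES ('app_user', 'app_pass', 2, 1);

        LOAD MYSQL USERS TO RUNTIME;
        SAVE MYSQL USERS TO DISK;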

  • 1st Rule: the query processor scans the query rules looking for a match against the ^SELECT.* pattern; if a match is found, ProxySQL forwards the query to destination_hostgroup=3. This rule is created with apply=0, so evaluation continues against the remaining rules.
  • 2nd Rule: queries matching the ^SELECT.* FOR UPDATE pattern should always be served from the writer hostgroup, so we set destination_hostgroup=2. A SELECT ... FOR UPDATE also matches the first rule, but this second rule then overrides the destination and, with apply=1, stops further evaluation.
  • All traffic not matching the above criteria will be routed to the default hostgroup, i.e. the writer_hostgroup.
        INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply)
          VALUES (1, '^SELECT.*',3, 0);

        INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply)
          VALUES (1, '^SELECT.* FOR UPDATE',2, 1);

        LOAD MYSQL QUERY RULES TO RUNTIME;
        SAVE MYSQL QUERY RULES TO DISK;

Make sure read_only is enabled in MySQL for any hosts that should be part of the reader_hostgroup, and optionally set writer_is_also_reader=0 if you want to prevent hosts in your writer_hostgroup from being used for reads.

        db-node03 mysql> SET GLOBAL read_only=ON;

Alternatively, you can configure writer_is_also_reader=2: this way you won't need to set read_only=ON, because the host(s) in the backup_writer_hostgroup will be used as the reader(s).
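
To verify that the read/write split behaves as intended, you can send a test query through ProxySQL's MySQL interface (listening on port 6033 by default; app_user is the placeholder user created earlier) and then check the digest statistics from the admin interface, where the SELECT should be accounted against hostgroup 3:

        mysql -P6033 -uapp_user -papp_pass -h 127.0.0.1 -e "SELECT @@hostname;"

        ProxySQL Admin> SELECT hostgroup, digest_text, count_star
            FROM stats.stats_mysql_query_digest ORDER BY count_star DESC;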

Once you are done with the configuration, there is a useful table in ProxySQL, shown below, which gives a quick view of the state of each node of the cluster as seen by ProxySQL:

        ProxySQL Admin> select * from mysql_server_galera_log order by time_start_us desc limit 3;
        +--------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+
        | hostname     | port | time_start_us    | success_time_us | primary_partition | read_only | wsrep_local_recv_queue | wsrep_local_state | wsrep_desync | wsrep_reject_queries | wsrep_sst_donor_rejects_queries | error |
        +--------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+
        | 172.16.1.114 | 3306 | 1529510693289001 | 1234            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
        | 172.16.1.113 | 3306 | 1529510693287804 | 1209            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
        | 172.16.1.112 | 3306 | 1529510693286879 | 1158            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
        +--------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+

Testing time!

Let’s enable a Donor/Desync state on the Galera node we've configured as the priority writer node and check how ProxySQL handles write traffic. When the writer node changes to a Donor/Desync status we expect to see ProxySQL move all write traffic to the backup writer node after promoting it from HG4 (backup_writer_hostgroup) to HG2 (writer_hostgroup).

1.) Check the initial ProxySQL configuration:

        ProxySQL Admin> select hostgroup,srv_host,status,ConnUsed,MaxConnUsed,Queries,Latency_us from stats.stats_mysql_connection_pool order by srv_host;
        +-----------+--------------+--------+----------+-------------+------------+------------+
        | hostgroup | srv_host     | status | ConnUsed | MaxConnUsed | Queries    | Latency_us |
        +-----------+--------------+--------+----------+-------------+------------+------------+
        | 2         | 172.16.1.112 | ONLINE | 4        | 4           | 2295114892 | 131        |
        | 4         | 172.16.1.113 | ONLINE | 0        | 0           | 0          | 162        |
        | 3         | 172.16.1.114 | ONLINE | 1        | 1           | 539211603  | 142        |
        +-----------+--------------+--------+----------+-------------+------------+------------+

2.) Desync db-node01 by setting wsrep_desync=ON:

        db-node01 mysql> SET GLOBAL wsrep_desync=ON;

3.) Re-check ProxySQL's configuration:

        ProxySQL Admin> select hostgroup,srv_host,status,ConnUsed,MaxConnUsed,Queries,Latency_us from stats.stats_mysql_connection_pool order by srv_host;
        +-----------+--------------+--------+----------+-------------+-----------+------------+
        | hostgroup | srv_host     | status | ConnUsed | MaxConnUsed | Queries   | Latency_us |
        +-----------+--------------+--------+----------+-------------+-----------+------------+
        | 1         | 172.16.1.112 | ONLINE | 0        | 0           | 0         | 149        |
        | 2         | 172.16.1.113 | ONLINE | 3        | 4           | 1156479   | 129        |
        | 3         | 172.16.1.114 | ONLINE | 0        | 1           | 542028027 | 128        |
        +-----------+--------------+--------+----------+-------------+-----------+------------+

Great stuff! We see that db-node01 was moved to the offline_hostgroup as expected and db-node02 has been allocated to the writer_hostgroup in order to continue serving write traffic.

The same behaviour will occur when the primary writer node goes down and leaves the cluster. As a final note, it's worthwhile to mention that apart from monitoring the wsrep_desync variable, ProxySQL also continuously checks the status of the wsrep_reject_queries and wsrep_sst_donor_rejects_queries variables and takes the required action when needed.
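
For example, to simulate this behaviour you can tell the writer node to reject queries (wsrep_reject_queries accepts NONE, ALL and ALL_KILL; setting it back to NONE returns the node to normal operation) and watch how ProxySQL reacts in stats_mysql_connection_pool:

        db-node01 mysql> SET GLOBAL wsrep_reject_queries=ALL;
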
Happy ProxySQLing!

Authored by: Ashwini Ahire & Nick Vyzas