Query rewrite with ProxySQL: use case scenario

One of ProxySQL most interesting features is query rewrite.

This functionality is one of the main reasons to write a proxy that sits between the application and the database server: to rewrite queries on the fly, without the need to have a developer involved in rewriting said query ASAP (if ever possible!).

In this short post I will describe a use case scenario on how to use ProxySQL to troubleshoot performance and quickly rewrite queries if needed. The setup in this specific blog post is the Nylas platform. Nylas is a sharded environment, which has over 80 ProxySQL instances running.

The Issue

We noticed that an odd workload originated from Nylas’ open source sync engine. See the below graph on InnoDB rows read:

Analysis

Therefore, we asked ProxySQL to get the list of the most time consuming queries:

mysql> SELECT SUM(sum_time), SUM(count_star), digest_text FROM stats_mysql_query_digest GROUP BY digest ORDER BY SUM(sum_time) DESC LIMIT 3\G
*************************** 1. row ***************************
  SUM(sum_time): 1907314442459
SUM(count_star): 15033880
    digest_text: SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.updated_at AS trx_updated_at, 
trx.deleted_at AS trx_deleted_at, trx.id AS trx_id, trx.namespace_id AS trx_namespace_id, trx.object_type AS
 trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, 
trx.command AS trx_command FROM trx WHERE trx.id >= ? AND trx.namespace_id IN (?) AND trx.object_type = ? 
AND trx.command = ? ORDER BY trx.id ASC LIMIT ?
*************************** 2. row ***************************
  SUM(sum_time): 1753837589659
SUM(count_star): 3533157
    digest_text: SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.updated_at AS trx_updated_at,
 trx.deleted_at AS trx_deleted_at, trx.id AS trx_id, trx.namespace_id AS trx_namespace_id, trx.object_type AS
 trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS
 trx_command FROM trx WHERE trx.id >= ? AND trx.namespace_id IN (?, ?) AND trx.object_type = ? AND trx.command = ? 
ORDER BY trx.id ASC LIMIT ?
*************************** 3. row ***************************
  SUM(sum_time): 764491985658
SUM(count_star): 1705938
    digest_text: SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.updated_at AS trx_updated_at,
trx.deleted_at AS trx_deleted_at, trx.id AS trx_id, trx.namespace_id AS trx_namespace_id, trx.object_type AS 
trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS
trx_command FROM trx WHERE trx.id >= ? AND trx.namespace_id IN (?, ?, ?) AND trx.object_type = ? AND trx.command = ? 
ORDER BY trx.id ASC LIMIT ?
3 rows in set (0.35 sec)

Then we also asked for the query with the highest average execution time:

mysql> SELECT SUM(sum_time), SUM(count_star), SUM(sum_time)/SUM(count_star) avg,  
digest_text FROM stats_mysql_query_digest WHERE digest_text LIKE 'SELECT trx.public_id AS
 trx_public_id%' GROUP BY
 digest ORDER BY SUM(sum_time)/SUM(count_star) DESC LIMIT 1\G
*************************** 1. row ***************************
  SUM(sum_time): 1753837589659
SUM(count_star): 3533157
            avg: 496393
    digest_text: SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.updated_at AS
 trx_updated_at, trx.deleted_at AS trx_deleted_at, trx.id AS trx_id, trx.namespace_id AS trx_namespace_id,
 trx.object_type AS trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id AS 
trx_object_public_id, trx.command AS trx_command FROM trx WHERE trx.id >= ? AND 
trx.namespace_id IN (?, ?) AND trx.object_type = ? AND trx.command = ? ORDER BY trx.id ASC LIMIT ?

We could have gotten the same information from information_schema.events_statements_summary_by_digest. However, we already knew that the load was coming from a specific node so we queried stats.stats_mysql_query_digest inside ProxySQL itself.

From the output above, it was obvious that the problem was a specific query type running at that time.

Without copying the whole query again, it could be simplified as:

SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ?
    AND idx_col1 IN (? , ?)
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?

col2 and col3 are not indexed, because they do not provide enough cardinality. However, idx_col1 does, so it is indexed. Because of the WHERE and the ORDER BY clauses, mysqld performs a range scan on the primary key. The optimizer could optimize this query if it were smart enough. But it isn’t.

Therefore, the query can be rewritten as:

SELECT * FROM (
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ?
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
UNION ALL
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ?
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
) t ORDER BY PK LIMIT ?

The Solution: Query Rewrite

How do you solve this with ProxySQL? Extremely easy!

The snippet below shows how to create a rule to rewrite the query, load the new rule into runtime, and persist the new rule on disk:

INSERT INTO mysql_query_rules (rule_id,active,flagIN,match_pattern, replace_pattern) VALUES (92,1,0,
'^SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.id AS trx_id, trx.namespace_id
 AS trx_namespace_id, trx.object_type AS trx_object_type, trx.record_id AS trx_record_id, trx.object_public_id
 AS trx_object_public_id, trx.command AS trx_command(.*)\nFROM trx(.*)\nWHERE trx.id >= (\d+) 
AND trx.namespace_id IN \((\d+), (\d+)\) AND trx.object_type = (.*) ORDER BY trx.id ASC(.*)\n LIMIT (.*)$',
'SELECT * FROM (
(SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.id AS trx_id,
trx.namespace_id AS trx_namespace_id, trx.object_type AS trx_object_type,
trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS trx_command \1
FROM trx \2 WHERE trx.id >= \3 AND trx.namespace_id = \4 AND trx.object_type = \6 ORDER BY trx.id ASC LIMIT \8)
UNION ALL
(SELECT trx.public_id AS trx_public_id, trx.created_at AS trx_created_at, trx.id AS trx_id,
trx.namespace_id AS trx_namespace_id, trx.object_type AS trx_object_type,
trx.record_id AS trx_record_id, trx.object_public_id AS trx_object_public_id, trx.command AS trx_command \1
FROM trx \2 WHERE trx.id >= \3 AND trx.namespace_id = \5 AND trx.object_type = \6 ORDER BY trx.id ASC LIMIT \8)
) t ORDER BY trx_id LIMIT \8'
);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

From the following graph of InnoDB rows read, the effect of the new query rewrite is immediately clear:

Conclusion

Query rewrite is a feature that, depending from the queries running against the database server, quickly allows to isolate and correct problematic queries and improve performance.