Obfuscate data from mysqldump with ProxySQL

  • Date:
  • Tags: proxy mysql proxysql mysqldump obfuscate datamasking
safe

While me and Fred were presenting Inexpensive Datamasking for MySQL with ProxySQL at FOSDEM 2017, we got a very interesting question: can ProxySQL be used to obfuscate backups obtained from mysqldump?
The short answer is yes!

Instead of dumping directly from MySQL Server, you can point mysqldump to ProxySQL that will perform the datamasking/obfuscation.
The dump file can then be used to provision new staging database without using production data, and follow compliance requirements.
This extremely simplify the "common" approach of dumping original data and storing it in an not obfuscated format (not good for compliance I believe), load the original data into a database, and finally perform some obfuscating query (that can really take a lot of time).

For this example, we will use the employees database.
Employees data seems a very good example of data that needs to be masked!

In this blog we will assume that ProxySQL is already configured, and the employees database is already loaded.

If you are familiar with mysqldump, you will know that it runs SELECT /*!40001 SQL_NO_CACHE */ * FROM tablename .

In our first exercise, we do not want that the dump file generates from mysqldump includes any real salary.
Let's first have a look at how mysqldump will dump the salaries table:

$ mysqldump --user=sbtest --password=sbtest -h 127.0.0.1 -P6033 employees salaries --skip-extended-insert | less

In the output you will see something like:

DROP TABLE IF EXISTS `salaries`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `salaries`
--

LOCK TABLES `salaries` WRITE;
/*!40000 ALTER TABLE `salaries` DISABLE KEYS */;
INSERT INTO `salaries` VALUES (10001,60117,'1986-06-26','1987-06-26');
INSERT INTO `salaries` VALUES (10001,62102,'1987-06-26','1988-06-25');
INSERT INTO `salaries` VALUES (10001,66074,'1988-06-25','1989-06-25');
INSERT INTO `salaries` VALUES (10001,66596,'1989-06-25','1990-06-25');
INSERT INTO `salaries` VALUES (10001,66961,'1990-06-25','1991-06-25');
INSERT INTO `salaries` VALUES (10001,71046,'1991-06-25','1992-06-24');
INSERT INTO `salaries` VALUES (10001,74333,'1992-06-24','1993-06-24');
INSERT INTO `salaries` VALUES (10001,75286,'1993-06-24','1994-06-24');
INSERT INTO `salaries` VALUES (10001,75994,'1994-06-24','1995-06-24');
INSERT INTO `salaries` VALUES (10001,76884,'1995-06-24','1996-06-23');
INSERT INTO `salaries` VALUES (10001,80013,'1996-06-23','1997-06-23');
INSERT INTO `salaries` VALUES (10001,81025,'1997-06-23','1998-06-23');
INSERT INTO `salaries` VALUES (10001,81097,'1998-06-23','1999-06-23');
INSERT INTO `salaries` VALUES (10001,84917,'1999-06-23','2000-06-22');
INSERT INTO `salaries` VALUES (10001,85112,'2000-06-22','2001-06-22');
INSERT INTO `salaries` VALUES (10001,85097,'2001-06-22','2002-06-22');
INSERT INTO `salaries` VALUES (10001,88958,'2002-06-22','9999-01-01');

We can now create a rule in ProxySQL to mask the salaries.

Admin> INSERT INTO mysql_query_rules (rule_id,active,schemaname,match_pattern,replace_pattern) VALUES (1,1,'employees','^SELECT /\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries`','SELECT SQL_NO_CACHE emp_no, ROUND(RAND()*100000), from_date, to_date FROM salaries');
Query OK, 1 row affected (0.00 sec)

Admin> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

We can now re-execute mysqldump to dump the salaries table:

$ mysqldump --user=sbtest --password=sbtest -h 127.0.0.1 -P6033 employees salaries --skip-extended-insert --set-gtid-purged=OFF | egrep '^INSERT' | head -n 10
Warning: Using a password on the command line interface can be insecure.
INSERT INTO `salaries` VALUES (10001,70472,'1986-06-26','1987-06-26');
INSERT INTO `salaries` VALUES (10001,20417,'1987-06-26','1988-06-25');
INSERT INTO `salaries` VALUES (10001,90668,'1988-06-25','1989-06-25');
INSERT INTO `salaries` VALUES (10001,92090,'1989-06-25','1990-06-25');
INSERT INTO `salaries` VALUES (10001,88446,'1990-06-25','1991-06-25');
INSERT INTO `salaries` VALUES (10001,65960,'1991-06-25','1992-06-24');
INSERT INTO `salaries` VALUES (10001,64463,'1992-06-24','1993-06-24');
INSERT INTO `salaries` VALUES (10001,24433,'1993-06-24','1994-06-24');
INSERT INTO `salaries` VALUES (10001,28777,'1994-06-24','1995-06-24');
INSERT INTO `salaries` VALUES (10001,70586,'1995-06-24','1996-06-23');
mysqldump: Got errno 32 on write
$ mysqldump --user=sbtest --password=sbtest -h 127.0.0.1 -P6033 employees salaries --skip-extended-insert --set-gtid-purged=OFF | egrep '^INSERT' | head -n 10
Warning: Using a password on the command line interface can be insecure.
INSERT INTO `salaries` VALUES (10001,29895,'1986-06-26','1987-06-26');
INSERT INTO `salaries` VALUES (10001,58190,'1987-06-26','1988-06-25');
INSERT INTO `salaries` VALUES (10001,1266,'1988-06-25','1989-06-25');
INSERT INTO `salaries` VALUES (10001,31761,'1989-06-25','1990-06-25');
INSERT INTO `salaries` VALUES (10001,55004,'1990-06-25','1991-06-25');
INSERT INTO `salaries` VALUES (10001,79739,'1991-06-25','1992-06-24');
INSERT INTO `salaries` VALUES (10001,33681,'1992-06-24','1993-06-24');
INSERT INTO `salaries` VALUES (10001,29187,'1993-06-24','1994-06-24');
INSERT INTO `salaries` VALUES (10001,44895,'1994-06-24','1995-06-24');
INSERT INTO `salaries` VALUES (10001,36913,'1995-06-24','1996-06-23');
mysqldump: Got errno 32 on write

It seems the masking is working pretty well!!
We can now save the new query rule:

Admin> SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Next, we want to mask the data in employees table.
Let's first have a look at how mysqldump will dump the employees table:

$ mysqldump --user=sbtest --password=sbtest -h 127.0.0.1 -P6033 employees employees --skip-extended-insert | less

In the output you will see something like:

DROP TABLE IF EXISTS `employees`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `employees` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `employees`
--

LOCK TABLES `employees` WRITE;
/*!40000 ALTER TABLE `employees` DISABLE KEYS */;
INSERT INTO `employees` VALUES (10001,'1953-09-02','Georgi','Facello','M','1986-06-26');
INSERT INTO `employees` VALUES (10002,'1964-06-02','Bezalel','Simmel','F','1985-11-21');
INSERT INTO `employees` VALUES (10003,'1959-12-03','Parto','Bamford','M','1986-08-28');
INSERT INTO `employees` VALUES (10004,'1954-05-01','Chirstian','Koblick','M','1986-12-01');
INSERT INTO `employees` VALUES (10005,'1955-01-21','Kyoichi','Maliniak','M','1989-09-12');
INSERT INTO `employees` VALUES (10006,'1953-04-20','Anneke','Preusig','F','1989-06-02');
INSERT INTO `employees` VALUES (10007,'1957-05-23','Tzvetan','Zielinski','F','1989-02-10');
INSERT INTO `employees` VALUES (10008,'1958-02-19','Saniya','Kalloufi','M','1994-09-15');
INSERT INTO `employees` VALUES (10009,'1952-04-19','Sumant','Peac','F','1985-02-18');
INSERT INTO `employees` VALUES (10010,'1963-06-01','Duangkaew','Piveteau','F','1989-08-24');
INSERT INTO `employees` VALUES (10011,'1953-11-07','Mary','Sluis','F','1990-01-22');
INSERT INTO `employees` VALUES (10012,'1960-10-04','Patricio','Bridgland','M','1992-12-18');
INSERT INTO `employees` VALUES (10013,'1963-06-07','Eberhardt','Terkki','M','1985-10-20');
INSERT INTO `employees` VALUES (10014,'1956-02-12','Berni','Genin','M','1987-03-11');
INSERT INTO `employees` VALUES (10015,'1959-08-19','Guoxiang','Nooteboom','M','1987-07-02');
INSERT INTO `employees` VALUES (10016,'1961-05-02','Kazuhito','Cappelletti','M','1995-01-27');
INSERT INTO `employees` VALUES (10017,'1958-07-06','Cristinel','Bouloucos','F','1993-08-03');
INSERT INTO `employees` VALUES (10018,'1954-06-19','Kazuhide','Peha','F','1987-04-03');
INSERT INTO `employees` VALUES (10019,'1953-01-23','Lillian','Haddadi','M','1999-04-30');
INSERT INTO `employees` VALUES (10020,'1952-12-24','Mayuko','Warwick','M','1991-01-26');

We can now create a rule in ProxySQL to mask the salaries.

Admin> INSERT INTO mysql_query_rules (rule_id,active,schemaname,match_pattern,replace_pattern) VALUES (2,1,'employees','\* FROM `employees`',"emp_no, CONCAT(LEFT(birth_date,2),FLOOR(RAND()*50)+10,RIGHT(birth_date,6)) birth_date, CONCAT(LEFT(first_name,2),REPEAT('x',LENGTH(first_name)-2)) first_name, CONCAT(LEFT(last_name,3),REPEAT('x',LENGTH(last_name)-3)) last_name, gender, hire_date FROM employees");
Query OK, 1 row affected (0.00 sec)

Admin> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

We can now re-execute mysqldump to dump the employees table:

$ mysqldump --user=sbtest --password=sbtest -h 127.0.0.1 -P6033 employees employees --skip-extended-insert --set-gtid-purged=OFF | egrep '^INSERT' | head -n 20
Warning: Using a password on the command line interface can be insecure.
INSERT INTO `employees` VALUES (10001,'1958-09-02','Gexxxx','Facxxxx','M','1986-06-26');
INSERT INTO `employees` VALUES (10002,'1912-06-02','Bexxxxx','Simxxx','F','1985-11-21');
INSERT INTO `employees` VALUES (10003,'1927-12-03','Paxxx','Bamxxxx','M','1986-08-28');
INSERT INTO `employees` VALUES (10004,'1941-05-01','Chxxxxxxx','Kobxxxx','M','1986-12-01');
INSERT INTO `employees` VALUES (10005,'1915-01-21','Kyxxxxx','Malxxxxx','M','1989-09-12');
INSERT INTO `employees` VALUES (10006,'1942-04-20','Anxxxx','Prexxxx','F','1989-06-02');
INSERT INTO `employees` VALUES (10007,'1956-05-23','Tzxxxxx','Ziexxxxxx','F','1989-02-10');
INSERT INTO `employees` VALUES (10008,'1942-02-19','Saxxxx','Kalxxxxx','M','1994-09-15');
INSERT INTO `employees` VALUES (10009,'1935-04-19','Suxxxx','Peax','F','1985-02-18');
INSERT INTO `employees` VALUES (10010,'1939-06-01','Duxxxxxxx','Pivxxxxx','F','1989-08-24');
INSERT INTO `employees` VALUES (10011,'1931-11-07','Maxx','Sluxx','F','1990-01-22');
INSERT INTO `employees` VALUES (10012,'1929-10-04','Paxxxxxx','Brixxxxxx','M','1992-12-18');
INSERT INTO `employees` VALUES (10013,'1944-06-07','Ebxxxxxxx','Terxxx','M','1985-10-20');
INSERT INTO `employees` VALUES (10014,'1920-02-12','Bexxx','Genxx','M','1987-03-11');
INSERT INTO `employees` VALUES (10015,'1910-08-19','Guxxxxxx','Nooxxxxxx','M','1987-07-02');
INSERT INTO `employees` VALUES (10016,'1930-05-02','Kaxxxxxx','Capxxxxxxxx','M','1995-01-27');
INSERT INTO `employees` VALUES (10017,'1912-07-06','Crxxxxxxx','Bouxxxxxx','F','1993-08-03');
INSERT INTO `employees` VALUES (10018,'1910-06-19','Kaxxxxxx','Pehx','F','1987-04-03');
INSERT INTO `employees` VALUES (10019,'1954-01-23','Lixxxxx','Hadxxxx','M','1999-04-30');
INSERT INTO `employees` VALUES (10020,'1933-12-24','Maxxxx','Warxxxx','M','1991-01-26');

It seems that also in this case masking and obfuscating is working pretty well.
We can now save the new query rule:

Admin> SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Conclusion

ProxySQL can be easily used to mask and obfuscate data from dump generated by mysqldump: this is very useful for compliance requirements, as sensitive data will not leave your database server.
Also in this case, ProxySQL creates a layer of abstraction to protect the data in your database.
For this and more examples on how to use ProxySQL for datamasking, do not miss our session about Inexpensive Datamasking for MySQL with ProxySQL on April 27, 2017 at Percona Live 2017