MCP Autodiscovery
ProxySQL v4.0 features a deterministic “Two-Phase Discovery” mechanism that allows LLMs to understand the database structure with high accuracy.
How it works
The discovery process is typically triggered by an AI agent using the discovery.run_static tool. This process iterates through the backend MySQL/PostgreSQL servers and populates the MCP Catalog.
Phase 1: Static Harvesting
ProxySQL connects to the configured backend servers and extracts:
- Schema names.
- Table and View definitions.
- Column metadata (types, nullability, defaults).
- Index definitions.
- Foreign key relationships.
Phase 2: Metadata Enrichment
Once the basic structure is harvested, ProxySQL (optionally with LLM assistance) can:
- Inferred relationships between tables.
- Generate summaries for complex views or tables.
- Track “Agent Runs” to maintain context between different LLM sessions.
Deterministic Run IDs
Every discovery execution is assigned a unique run_id. This allows ProxySQL to maintain a history of the schema state. AI agents can bind themselves to a specific run_id to ensure consistency even if the schema changes while they are processing a request.
Storage (The MCP Catalog)
The harvested metadata is stored in a local SQLite database named mcp_catalog.db, located in ProxySQL’s data directory. This catalog powers the search and retrieval tools used by the MCP server.
For a detailed look at the catalog structure, see the MCP Catalog documentation.
Configuration
Discovery uses the backend servers that are already configured in ProxySQL’s standard server tables. No additional server configuration is required.
- MySQL backends: Discovery reads from hostgroups defined in
mysql_servers. The user specified inmysql_usersmust haveSELECTprivileges onINFORMATION_SCHEMA. - PostgreSQL backends: Discovery reads from hostgroups defined in
pgsql_servers. The configured user must have access toinformation_schemaorpg_catalog.
ProxySQL connects to each backend server in the target hostgroup and queries INFORMATION_SCHEMA (or the PostgreSQL equivalent) to harvest the schema. Only servers in ONLINE status are included in a discovery pass.
If you have many schemas or tables, consider limiting discovery scope using the schema_filter option (if available in your ProxySQL build) to avoid harvesting schemas that are not relevant to your AI workloads.
Performance Considerations
Discovery queries INFORMATION_SCHEMA on every reachable backend server in the configured hostgroup. This can take significant time on large schemas:
- Databases with thousands of tables may take tens of seconds to fully harvest.
- Discovery is a one-off (or infrequent) operation and does not affect query routing performance once complete.
- During discovery, ProxySQL opens short-lived connections to each backend; these are separate from the normal connection pool and do not affect the pool’s availability.
- It is recommended to run discovery during low-traffic periods if the backend has a very large number of objects, since
INFORMATION_SCHEMAqueries can themselves be resource-intensive on MySQL.
Example: Triggering Discovery and Querying the Catalog
The following illustrates a typical workflow using the MCP tools via an AI agent or MCP client:
Step 1 — Trigger discovery:
{ "tool": "discovery.run_static" }
Response:
{
"run_id": "disc_20260403_001",
"schemas_harvested": 3,
"tables_harvested": 142,
"columns_harvested": 1087
}
Step 2 — Search the catalog for a relevant table:
{ "tool": "catalog.search", "query": "customer orders" }
Response:
[
{ "schema": "shop", "name": "orders", "type": "table", "snippet": "…order_id, customer_id, total…" },
{ "schema": "shop", "name": "customers", "type": "table", "snippet": "…customer_id, name, email…" }
]
Step 3 — Inspect the table definition:
{ "tool": "catalog.get_object", "schema": "shop", "name": "orders" }
Step 4 — Run a read-only query:
{ "tool": "run_sql_readonly", "sql": "SELECT COUNT(*) FROM shop.orders WHERE status = 'pending'", "schema": "shop" }
The catalog persists between sessions; you do not need to re-run discovery unless the schema has changed.