MCP Autodiscovery

ProxySQL v4.0 features a deterministic “Two-Phase Discovery” mechanism that allows LLMs to understand the database structure with high accuracy.

How it works

The discovery process is typically triggered by an AI agent using the discovery.run_static tool. This process iterates through the backend MySQL/PostgreSQL servers and populates the MCP Catalog.

Phase 1: Static Harvesting

ProxySQL connects to the configured backend servers and extracts:

  • Schema names.
  • Table and view definitions.
  • Column metadata (types, nullability, defaults).
  • Index definitions.
  • Foreign key relationships.
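
To make the harvesting step concrete, the sketch below shows the kind of INFORMATION_SCHEMA queries a Phase 1 pass might issue against a MySQL backend. These are illustrative only, not the exact queries ProxySQL runs internally:

```python
# Illustrative sketch of Phase 1 static harvesting for one schema on a
# MySQL backend. These queries are an ASSUMPTION for illustration; the
# actual queries ProxySQL issues are internal to its implementation.

def build_harvest_queries(schema: str) -> dict[str, str]:
    """Return one parameterized query per metadata category for `schema`."""
    return {
        "tables": (
            "SELECT TABLE_NAME, TABLE_TYPE "
            "FROM information_schema.TABLES WHERE TABLE_SCHEMA = %s"
        ),
        "columns": (
            "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_DEFAULT "
            "FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = %s"
        ),
        "indexes": (
            "SELECT TABLE_NAME, INDEX_NAME, COLUMN_NAME, NON_UNIQUE "
            "FROM information_schema.STATISTICS WHERE TABLE_SCHEMA = %s"
        ),
        "foreign_keys": (
            "SELECT TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME "
            "FROM information_schema.KEY_COLUMN_USAGE "
            "WHERE TABLE_SCHEMA = %s AND REFERENCED_TABLE_NAME IS NOT NULL"
        ),
    }

queries = build_harvest_queries("shop")
```

Running each query once per schema (rather than once per table) keeps the number of round-trips to the backend small even for large schemas.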

Phase 2: Metadata Enrichment

Once the basic structure is harvested, ProxySQL (optionally with LLM assistance) can:

  • Infer relationships between tables.
  • Generate summaries for complex views or tables.
  • Track “Agent Runs” to maintain context between different LLM sessions.

Deterministic Run IDs

Every discovery execution is assigned a unique run_id. This allows ProxySQL to maintain a history of the schema state. AI agents can bind themselves to a specific run_id to ensure consistency even if the schema changes while they are processing a request.
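
The run_id format shown in the example later on this page ("disc_20260403_001") suggests a sortable, per-day sequential identifier. The sketch below is only an illustration of that idea; the actual generation scheme is internal to ProxySQL:

```python
# Sketch of a deterministic, sortable run-ID scheme in the style of the
# "disc_YYYYMMDD_NNN" identifiers shown in the example response on this
# page. This is an ASSUMPTION for illustration, not ProxySQL's internal code.
from datetime import date

def next_run_id(day: date, previous_seq: int) -> str:
    """Build the next run_id for `day`, given the last sequence number used."""
    return f"disc_{day.strftime('%Y%m%d')}_{previous_seq + 1:03d}"

rid = next_run_id(date(2026, 4, 3), 0)  # → "disc_20260403_001"
```

Because the identifier embeds the date and a monotonically increasing sequence, runs sort chronologically as plain strings, which is convenient for keeping a history of schema states.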

Storage (The MCP Catalog)

The harvested metadata is stored in a local SQLite database named mcp_catalog.db, located in ProxySQL’s data directory. This catalog powers the search and retrieval tools used by the MCP server.
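
Because the catalog is plain SQLite, it can be inspected with any SQLite client. The sketch below uses Python's stdlib sqlite3 module against an in-memory database with a hypothetical objects table; the real schema of mcp_catalog.db is described in the MCP Catalog documentation:

```python
# Inspecting a catalog like mcp_catalog.db with Python's stdlib sqlite3.
# The `objects` table and its columns are a HYPOTHETICAL stand-in for the
# real catalog schema; an in-memory database keeps the example self-contained.
import sqlite3

conn = sqlite3.connect(":memory:")  # replace with the path to mcp_catalog.db
conn.execute(
    "CREATE TABLE objects (run_id TEXT, schema_name TEXT, name TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [
        ("disc_20260403_001", "shop", "orders", "table"),
        ("disc_20260403_001", "shop", "customers", "table"),
    ],
)

# List every object harvested by a given discovery run.
rows = conn.execute(
    "SELECT schema_name, name, type FROM objects "
    "WHERE run_id = ? ORDER BY rowid",
    ("disc_20260403_001",),
).fetchall()
print(rows)  # → [('shop', 'orders', 'table'), ('shop', 'customers', 'table')]
```

Keying rows by run_id is what lets the catalog hold several historical snapshots of the same schema side by side.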

For a detailed look at the catalog structure, see the MCP Catalog documentation.

Configuration

Discovery uses the backend servers that are already configured in ProxySQL’s standard server tables. No additional server configuration is required.

  • MySQL backends: Discovery reads from hostgroups defined in mysql_servers. The user specified in mysql_users must have SELECT privileges on INFORMATION_SCHEMA.
  • PostgreSQL backends: Discovery reads from hostgroups defined in pgsql_servers. The configured user must have access to information_schema or pg_catalog.

ProxySQL connects to each backend server in the target hostgroup and queries INFORMATION_SCHEMA (or the PostgreSQL equivalent) to harvest the schema. Only servers in ONLINE status are included in a discovery pass.

💡 Tip

If you have many schemas or tables, consider limiting discovery scope using the schema_filter option (if available in your ProxySQL build) to avoid harvesting schemas that are not relevant to your AI workloads.

Performance Considerations

Discovery queries INFORMATION_SCHEMA on every reachable backend server in the configured hostgroup. This can take significant time on large schemas:

  • Databases with thousands of tables may take tens of seconds to fully harvest.
  • Discovery is a one-off (or infrequent) operation and does not affect query routing performance once complete.
  • During discovery, ProxySQL opens short-lived connections to each backend; these are separate from the normal connection pool and do not affect the pool’s availability.
  • It is recommended to run discovery during low-traffic periods if the backend has a very large number of objects, since INFORMATION_SCHEMA queries can themselves be resource-intensive on MySQL.

Example: Triggering Discovery and Querying the Catalog

The following illustrates a typical workflow using the MCP tools via an AI agent or MCP client:

Step 1 — Trigger discovery:

{ "tool": "discovery.run_static" }

Response:

{
  "run_id": "disc_20260403_001",
  "schemas_harvested": 3,
  "tables_harvested": 142,
  "columns_harvested": 1087
}

Step 2 — Search the catalog for a relevant table:

{ "tool": "catalog.search", "query": "customer orders" }

Response:

[
  { "schema": "shop", "name": "orders",    "type": "table", "snippet": "…order_id, customer_id, total…" },
  { "schema": "shop", "name": "customers", "type": "table", "snippet": "…customer_id, name, email…" }
]

Step 3 — Inspect the table definition:

{ "tool": "catalog.get_object", "schema": "shop", "name": "orders" }

Step 4 — Run a read-only query:

{ "tool": "run_sql_readonly", "sql": "SELECT COUNT(*) FROM shop.orders WHERE status = 'pending'", "schema": "shop" }

The catalog persists between sessions; you do not need to re-run discovery unless the schema has changed.
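
The four steps above can be assembled programmatically. The sketch below builds the same tool-call payloads as plain Python dicts; how they are transported to the MCP server (stdio, HTTP, etc.) depends on your MCP client and is omitted here:

```python
# Building the tool-call payloads from the four workflow steps above as
# Python dicts. The transport layer (how an MCP client actually delivers
# these to ProxySQL) is intentionally left out.
import json

calls = [
    {"tool": "discovery.run_static"},
    {"tool": "catalog.search", "query": "customer orders"},
    {"tool": "catalog.get_object", "schema": "shop", "name": "orders"},
    {
        "tool": "run_sql_readonly",
        "sql": "SELECT COUNT(*) FROM shop.orders WHERE status = 'pending'",
        "schema": "shop",
    },
]

payloads = [json.dumps(c) for c in calls]
```

Since the catalog persists between sessions, a client would typically issue the first call only when it knows the schema has changed, and start at the catalog.search step otherwise.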