Orchestrator for PostgreSQL: the HA brain, now first-party

Part of the broader ProxySQL for PostgreSQL — the failover model series.

Most PostgreSQL operators we meet have opinions about Patroni. Fewer have opinions about Orchestrator. That’s a historical artefact — Orchestrator was born in the MySQL world at Outbrain, then refined at Booking.com , then became mainstream at Github, and only picked up PostgreSQL support after the ProxySQL team forked it. If you run PostgreSQL and haven’t looked at Orchestrator, there are three reasons to look now: the PG support is real and under CI, it comes from the same people who ship the proxy you already talk to, and — unlike Patroni — it does not require you to deploy and operate an external distributed consensus store (etcd / Consul / ZooKeeper) alongside it.

This post is a short tour. We’ll cover what Orchestrator is, how its mental model differs from Patroni’s, what the minimum working configuration for PostgreSQL looks like, and what Orchestrator deliberately does not do so you know what pieces you still need to bring. A companion post measures the actual unplanned-primary-failure window with Orchestrator and ProxySQL back-to-back.

The fork chain

Orchestrator started in 2014 as a MySQL topology manager, authored by Shlomi Noach. When Shlomi left GitHub, Percona took over the public fork and used Orchestrator as the control plane inside their operator. That fork is stable but not maintained outside the scope of the operator, and it is MySQL-only: pg_is_in_recovery, pg_promote, and the rest of the PostgreSQL vocabulary were never part of it.

The ProxySQL team forked Percona’s Orchestrator and added:

a PostgreSQL provider (ProviderType: "postgresql") that speaks libpq, reads pg_stat_replication for topology discovery, calls pg_promote() to elect new primaries, and reconfigures primary_conninfo on surviving replicas after a failover;
matching CI: every push and pull request runs test-postgresql.sh against PostgreSQL 15, 16, and 17 via GitHub Actions. The functional-postgresql job exercises topology discovery, graceful switchover (happy path and negative cases), and kill-current-primary failover on every commit;
release packaging matched to ProxySQL’s own release cadence, so a deployment can pin both sides to versions that were tested together.

The code is Apache-2.0 licensed; the fork chain is intact; everything upstream that Percona shipped for MySQL still works on the MySQL provider. The PG provider is additive, not a rewrite.

Mental model

Orchestrator is a topology manager. It is not a proxy, not a fencing tool, and not a consensus system. It does three things:

Discovery. You hand Orchestrator one instance through the API and it walks the replication graph — with one asymmetry worth knowing about on the PG side. Walking upward from a standby works cleanly: the standby’s primary_conninfo carries the primary’s full host:port, so Orchestrator finds the primary by parsing that string. Walking downward from a primary to its replicas reads pg_stat_replication, which only gives you each replica’s client_addr and an ephemeral source client_port — not the replica’s actual listening port. The PG provider falls back to DefaultInstancePort (see the config below) to fill that in, which works fine when every replica does listen on that port (typical containerised or one-per-host deployments), and doesn’t when they don’t. If your PG instances live on mixed ports, seed each of them explicitly via /api/discover. Each instance’s state (primary vs in recovery, read-only or writable, LSN, replica of whom) is recorded in Orchestrator’s own backing database.
Continuous probing. Every InstancePollSeconds, Orchestrator polls every known instance and updates the record. If an instance stops answering, it is flagged LastCheckValid=false; if it stays flagged across consecutive probes, Orchestrator emits a replication analysis — DeadPrimary, UnreachablePrimary, DeadIntermediateMaster, etc.
Recovery. For every analysis that matches a Recover…ClusterFilters regex, Orchestrator promotes the most-caught-up standby, reconfigures the rest to stream from the new primary, and fires a user-supplied shell-command hook. The hook is where you plug in anything ProxySQL-side (or anything else outside the database layer).

Two things about that model differ from Patroni in ways worth paying attention to.

First, fewer moving parts. Patroni treats each cluster as a set of nodes agreeing on a distributed consensus store (DCS) — in production, usually a 3+ node etcd, Consul, or ZooKeeper cluster. Every Patroni process races to acquire a short-lived leader lease on the DCS, and the DCS’s own consensus algorithm breaks ties. That is a correct design, but it means “deploy Patroni” is really “deploy Patroni and deploy and operate a DCS cluster”: two distributed systems to monitor, upgrade, back up, secure, and page on. A typical production footprint is 3 Patroni nodes + 3 etcd nodes = six processes and two bodies of expertise.

Orchestrator does not need a DCS at all. For its own high availability it uses the Raft consensus algorithm between Orchestrator nodes themselves — three Orchestrator processes talking to each other, no third-party cluster, no extra port to secure, no extra dashboard to watch. For small deployments you can go even lower: a single Orchestrator node with a SQLite backing DB is a legitimate production choice when the HA tool itself is not load-bearing for your tier. The lab in the companion post runs exactly that — one Orchestrator, one SQLite file, three PostgreSQL instances.

The practical consequence is operational: fewer clusters to run and one less piece of software to upgrade in lock-step. If you’re the person who stays up at 02

when the pager goes off, “the HA tool itself has no DCS to be wedged by” is a real feature.

Second, API-first and topology-graph-first. Patroni’s primary abstraction is “which node in this cluster holds the lease.” The Orchestrator abstraction is a directed graph of replication relationships across however many clusters you’re managing, with a REST API over the full graph. That lends itself to good visualisation (Orchestrator ships a web UI with a live topology graph), good ad-hoc operations (you can relocate a replica to stream from a different intermediate primary with one curl), and good scriptability — the entire control surface is REST.

A minimum working config

Orchestrator is configured via a single JSON file. For a PostgreSQL deployment with a SQLite-backed control database and automatic recovery enabled, the minimum configuration is about a dozen keys:

{
  "ListenAddress": ":3098",
  "ProviderType": "postgresql",
  "PostgreSQLTopologyUser": "orchestrator",
  "PostgreSQLTopologyPassword": "orch_pass",
  "PostgreSQLSSLMode": "disable",
  "DefaultInstancePort": 5432,
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.sqlite3",
  "InstancePollSeconds": 1,
  "FailureDetectionPeriodBlockSeconds": 5,
  "RecoveryPeriodBlockSeconds": 10,
  "RecoverMasterClusterFilters": [".*"],
  "PostMasterFailoverProcesses": [
    "/usr/local/bin/proxysql-rewire-hook.sh"
  ]
}

Keys worth understanding:

ProviderType — the switch that activates the PostgreSQL code path. Without this, Orchestrator will try to speak the MySQL protocol to your instances.
PostgreSQLTopologyUser / PostgreSQLTopologyPassword — a PG role with the pg_monitor role and SUPERUSER (needed for pg_promote() and to reload primary_conninfo on surviving replicas). The CI fixture creates it as orchestrator:orch_pass.
DefaultInstancePort — set this to 5432. Orchestrator’s history is MySQL; the default in the code is still 3306. If you leave the default and run PostgreSQL instances on non-standard listening ports, the PG provider will fall back to 3306 when it can’t determine a replica’s listening port from pg_stat_replication. The result is noisy DiscoverInstance(...:3306) instance is nil lines in the log.
InstancePollSeconds — polling cadence. Your failover window’s lower bound is roughly one of these intervals plus a probe round-trip. We use 1 s in the lab. The original upstream default is 5 s, which is reasonable for production at scale.
FailureDetectionPeriodBlockSeconds — debounce window for emitting the same analysis twice. Lower means faster failover at the cost of more false positives on transient network blips.
RecoverMasterClusterFilters: [".*"] — enables automatic recovery for every discovered cluster. Regex-match against ClusterName in production to scope auto-recovery to specific tiers.
PostMasterFailoverProcesses — an array of shell commands fired after Orchestrator has successfully promoted a replica. Each entry runs with environment variables populated (ORC_FAILURE_TYPE, ORC_FAILED_HOST, ORC_FAILED_PORT, ORC_SUCCESSOR_HOST, ORC_SUCCESSOR_PORT). This is where ProxySQL gets its admin rewrite; more on that in the companion post.

BackendDB: "sqlite" is fine for single-node Orchestrator deployments, including small production deployments where the HA tool itself is not load-bearing. For larger clusters, set up a MySQL backend with its own replication and run Orchestrator in Raft mode across three nodes — Orchestrator’s HA story for itself.

Discovering the topology

Once Orchestrator is up, you seed it through the API. In the common case — every instance listens on DefaultInstancePort (5432) — seeding the primary is enough:

curl -sf http://orch:3098/api/discover/db-primary.example.com/5432

Orchestrator reads pg_stat_replication on the primary, picks up each replica’s client_addr, assumes DefaultInstancePort for the listening port, and crawls to them. A second or two later you can ask what it knows:

curl -s http://orch:3098/api/clusters
# ["db-primary.example.com:5432"]

curl -s http://orch:3098/api/all-instances | jq -r '
  .[] | "\(.Key.Hostname):\(.Key.Port) RO=\(.ReadOnly) Master=\(.MasterKey.Hostname):\(.MasterKey.Port)"'
# db-primary.example.com:5432 RO=false Master=:0
# db-standby1.example.com:5432 RO=true  Master=db-primary.example.com:5432
# db-standby2.example.com:5432 RO=true  Master=db-primary.example.com:5432

If your instances live on mixed ports on the same host (as in the lab below, where the three PostgreSQL processes listen on 5433/5434/5435), seed each one explicitly — one /api/discover/host/port call per instance. The walk still works going up from a replica (the replica’s primary_conninfo carries the primary’s full host

), so seeding any one replica gets you the primary for free; it’s the primary-to-other-replicas hop that wants DefaultInstancePort to be right.

The cluster name defaults to the primary’s host:port; you can override it with a custom ClusterNameToInstancesMap or ClusterAliasOverride if you want operator-friendly names. The web UI at http://orch:3098/web displays the same graph visually, with instance state colour-coded and failover buttons wired to the same API endpoints.

Everything is a REST call. POST /api/graceful-master-takeover to run a planned switchover. POST /api/move-below to relocate a replica. POST /api/begin-downtime to mark an instance down for maintenance and suppress recovery. The full API is documented in the repository’s docs/ tree.

What Orchestrator deliberately doesn’t do

Orchestrator is a detect-and-decide layer. It is not:

A proxy. Client applications do not talk to Orchestrator. When it promotes a new primary, the application has no way to know unless something in the data path changes its routing. That’s where ProxySQL fits.
A fencing tool. Orchestrator can tell you the old primary is unreachable, and it can refuse to promote until every replica has caught up past a configured LSN. It cannot guarantee the old primary will stay dead. For workloads where split-brain cannot be tolerated, you layer fencing (STONITH, watchdog, cloud-provider force-detach) on top.
A quorum system for storage. Orchestrator’s own Raft layer is for Orchestrator’s HA, not for the PostgreSQL data plane. Your replication is still plain streaming replication (synchronous or asynchronous, your choice).

This minimalism is a feature. Orchestrator handles the parts that are hard to get right — discovery, decision-making across many clusters, consistent audit trails — and stays out of everything else.

The CI story

Before trusting a new tool with the primary of a production database, most operators want to know how the tool is tested. The short answer for Orchestrator’s PG port: the same GitHub Actions workflow that runs on every commit tests discovery, graceful switchover, and unplanned failover against real PostgreSQL containers, across three major PG versions.

The workflow lives at .github/workflows/functional.yml and the PG matrix job is functional-postgresql. Its driver script, tests/functional/test-postgresql.sh, runs five test groups:

Topology discovery (primary + standby)
API tests
Graceful primary switchover (happy path)
Graceful switchover negative cases (refusing to promote a replica with too much lag, etc.) and a round-trip switch-back
Failover test: kill current primary — stops the primary container mid-workload, waits for Orchestrator to detect DeadPrimary, promote the standby, and record a successful recovery in /api/v2/recoveries

Every one of those runs on PG 15, PG 16, and PG 17 for every push and every pull request. When the PG provider ships a config change, you see it fail on the tier where it breaks.

Try it

The easiest way to explore is the lab in the companion post. We will soon publish another blog post where we sets up one primary and two replicas locally, wires Orchestrator and ProxySQL in, and gives you a reproducible kill-primary benchmark on an empty afternoon.

If you’ve been running Patroni and are curious what an API-first-topology-graph HA tool feels like — grab the lab, seed discovery with one curl, and see your cluster light up.

Github repository: Orchestrator