Skip to main content
友田 陽大
PostgreSQL operations & reliability
PostgreSQL
アーキテクチャ設計

PostgreSQL streaming replication and high availability (HA, sync/async, read replicas, failover, v18)

A practical guide to achieving PostgreSQL high availability (HA) with streaming replication. Faithful to the official docs, it explains building physical replication, the durability-vs-latency trade-off of sync/async (synchronous_commit), read replicas and replication conflicts on a hot standby, the fact that the PostgreSQL core doesn't provide automatic failover and STONITH, WAL retention via replication slots, and PostgreSQL 18 improvements.

Published
Reading time
8 min read
Author
友田 陽大
Share

Servers always break. Disks die, AZs go down, kernels panic. Betting everything on a single PostgreSQL is the same as signing a contract to "pay in downtime" for the failure that will surely come.

The basis of high availability (HA) is replication — always have a nearly-latest replica on another server and switch over when it counts. This article explains how to build HA with PostgreSQL physical streaming replication and the reality of the most-misunderstood "automatic failover," faithful to the official docs. It's a deep dive of production-operations guide §3.

Rules for this article: replication specs, parameter defaults, and PostgreSQL 18 changes are based on the PostgreSQL 18 official documentation (as of June 2026). Failover-automation tools (Patroni, etc.) are within the scope the official docs collectively call "external tools," with specific names noted as community knowledge.


1. HA options: physical vs logical vs shared disk

PostgreSQL HA has multiple approaches (the official comparison).

MethodMechanismSuited forConstraint
Physical streamingWAL to the standby byte-by-byteWhole-cluster HA/DR, read replicasSame major version, whole-replication only
Logical replicationRow changes via publish/subscribeTable-level, cross-version, bidirectionalNeeds conflict resolution, DDL not replicated (separate article)
Shared diskShare a single storageThe storage itself is a SPOF, no replication

This article's main subject is physical streaming replication. The official explanation:

Streaming replication allows a standby server to stay more up-to-date than is possible with file-based log shipping. The standby connects to the primary, which streams WAL records to the standby as they're generated, without waiting for the WAL file to be filled.

It's "nearly real-time, byte-level replication." Being physical, the standby is block-for-block identical to the primary — the same major version is the premise.


2. Building physical streaming replication

The flow of the minimal setup.

2.1 Primary-side configuration

# primary postgresql.conf
wal_level = replica            # replica 以上(既定)
max_wal_senders = 10           # 同時WAL送信プロセス数(既定 10)
max_replication_slots = 10     # レプリケーションスロット数(既定 10)
-- 専用のレプリケーションロール(公式推奨:REPLICATION + LOGIN の最小権限)
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD '***';
-- スロットを作っておくと、スタンバイが遅れてもWALを保持してくれる(§5)
SELECT pg_create_physical_replication_slot('standby1');
# primary pg_hba.conf(replication は専用キーワード。TLS強制+SCRAM)
hostssl replication replicator 10.0.0.0/24 scram-sha-256

2.2 Building the standby

# プライマリからベースバックアップを取得し、復旧設定まで書き出す(-R)
pg_basebackup -h primary.internal -U replicator -D /var/lib/postgresql/18/main \
  -Ft -z -X stream -c fast -R -S standby1
#   -R: standby.signal と primary_conninfo を自動生成 / -S: 上で作ったスロットを使う

The postgresql.auto.conf that -R generates is written like this.

primary_conninfo = 'host=primary.internal user=replicator passfile=... sslmode=verify-full'
primary_slot_name = 'standby1'

And because there's a standby.signal file, on startup it comes up in standby mode and keeps receiving the primary's WAL. Thanks to hot_standby (default on), you can run read queries on the standby (§4).

pg_ctl -D /var/lib/postgresql/18/main start

3. Sync vs async: a durability-vs-latency trade-off

"Whether to wait for the standby's replication before returning a commit" is sync/async. The default is async — officially, "if no synchronous standby name is specified in synchronous_standby_names, synchronous replication is disabled and commits don't wait for replication. This is the default."

Choose the durability level with the synchronous_commit value (default on). The lower you go, the stronger and slower.

ValueWhat the commit waits forWhat's protected
offWaits for nothing (not even local flush)Fastest. Can lose the latest few on a crash (no corruption)
localOnly local disk flushSingle-node durability
remote_writeThe standby receives and OS-writes the WALSurvives a standby PostgreSQL crash (an OS crash is separate)
on (default)The standby disk-flushes the WALNot lost unless the primary + all sync standbys are corrupted simultaneously
remote_applyThe standby applies (makes visible) the WALImmediately reflected in the standby's queries. Slowest
# 「1件も失えない」要件:同期レプリケーション
synchronous_standby_names = 'standby1'   # このスタンバイの確認を待つ
synchronous_commit = on                   # 受信+flush を待つ

Design decision: if "commit = absolutely never lost" is a requirement, like finance/payments, sync (on or higher). General web apps are often fine with async (latency-first, accepting a slight data-loss risk). remote_apply is only when you want immediate reflection on the read replica — the commit delay grows large.


4. Read replicas: offload reads (and its price)

A hot standby can take read-only queries. Official: "when hot_standby is true, it accepts connections once recovery reaches a consistent state. All connections are strictly read-only (not even writing to a temp table)." The standard configuration of offloading heavy aggregations/reports to the replica and letting the primary concentrate on writes.

But there's the inherent problem of replication conflicts. Official: "the most common cause of conflict between standby queries and WAL replay is 'early cleanup.' … The primary's cleanup (VACUUM) may delete row versions that a standby's transaction can still see."

The standby is forced to either "delay WAL replay" or "cancel the conflicting query." Adjust with two knobs.

# ノブ1:競合時、どれだけWAL適用を遅らせるか(超えたらクエリをキャンセル)
max_standby_streaming_delay = 30s

# ノブ2:スタンバイの実行中クエリをプライマリにフィードバックし、VACUUMの早期クリーンを抑止
hot_standby_feedback = on    # 既定 off

The official warning: hot_standby_feedback = on can eliminate conflicts but "delays cleanup of dead rows on the primary, potentially causing table bloat." It's a trade-off — hot_standby_feedback if you run long analytics queries on the replica, max_standby_streaming_delay if you want to avoid primary bloat — design the acceptable delay.


5. The WAL-retention safety device: slots vs wal_keep_size

If the primary deletes WAL the standby needs while the standby is behind/down, replication breaks. Two mechanisms prevent this.

  • wal_keep_size (default 0): the minimum size to retain in pg_wal. A standby that falls behind beyond this loses the needed WAL and disconnects. Just a fixed buffer.
  • Replication slot: reliably retains WAL until the standby consumes it. Safer.

The slot is safer but has a fatal side effect. Official (max_slot_wal_keep_size, default -1): "if -1, a replication slot may retain unlimited WAL." That is, leaving a standby down causes the primary's pg_wal to swell unboundedly and eat the disk.

# スロットは使うが、保持上限を設けて pg_wal の暴走を防ぐ
max_slot_wal_keep_size = 64GB   # これを超えたら、遅れたスロットは無効化(複製は切れるがDBは守る)

The iron rule of operations: a slot is "convenient but dangerous." Always set a cap with max_slot_wal_keep_size and monitor the lag in pg_replication_slots. PostgreSQL 18 added idle_replication_slot_timeout, letting you auto-invalidate slots idle for a long time.


6. Failover: PostgreSQL doesn't do it automatically

This is the most misunderstood point. The operation to "promote" a standby itself is easy.

-- スタンバイをプライマリに昇格(手動フェイルオーバー)
SELECT pg_promote();
-- または: pg_ctl promote -D <datadir>

But who decides "when to promote"? The official docs state clearly.

PostgreSQL does not provide the system software required to detect a primary failure and notify the standby.

That is, automatic failover isn't in the core. Plus the danger of split-brain (both think they're the primary):

A mechanism is needed to tell the old primary "you're no longer the primary." This is known as STONITH (Shoot The Other Node In The Head), and is needed to avoid a situation where both systems think they're the primary, leading to data loss.

The official docs only state that "many such tools exist and are well-integrated with OS features (like IP-address migration)." In practice, you build it with the following external tools (community standard):

  • Patroni (integrates with etcd/Consul, the de facto)
  • repmgr, pg_auto_failover

This is managed's biggest value (operations guide §8). RDS/Aurora Multi-AZ does this "failure detection + automatic promotion + endpoint switching + STONITH-equivalent" for you. For self-operation, the real cost is building and operating this automatic-failover foundation, more than building replication.


7. Monitoring: watch the lag and the slots

-- プライマリ側:各スタンバイの遅延(write/flush/replay lag)
SELECT application_name, state, sync_state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

-- スロットの滞留(restart_lsn が進まない=WALが溜まる)
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

replay_lag continuously increasing = the standby can't keep up (insufficient I/O or replication conflicts). retained_wal keeps increasing = the slot is stuck and pg_wal is swelling — always set a cap and alerts.


8. Summary

  • Build HA with physical streaming replication: wal_level=replica, a dedicated role, pg_basebackup -R, a slot.
  • Sync/async is a durability-vs-latency trade-off. The default is async. Choose synchronous_commit by requirement.
  • Read replicas can offload reads, but watch for replication conflicts (max_standby_streaming_delay / hot_standby_feedback).
  • The core has no automatic failover (officially stated). Do detection, promotion, and STONITH with an external tool like Patroni, or managed.
  • Slots are safe but can run away — set a cap with max_slot_wal_keep_size and monitor the lag.

Once "don't stop on failure" is settled, next is "don't stop on change" — non-stop schema changes, and cross-version logical replication & zero-downtime upgrade.


References (PostgreSQL 18 official documentation)

友田

友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.

Got a challenge?

From design to implementation and operations — solo × generative AI

Implementation like this article's, end to end from requirements to production. Start with a free 30-minute technical consult and tell me about your situation.

Available for both project-based (contract) and advisory engagements. Start with a free 30-minute consult.

Also worth reading