Skip to main content
友田 陽大
PostgreSQL internals & performance
PostgreSQL
パフォーマンス
アーキテクチャ設計

PostgreSQL production performance-tuning overview (v18 support): speed it up in the correct order of measure → index → execution plan → memory → VACUUM

A systematic guide to making PostgreSQL fast in production. Faithful to the official documentation (v18), with real code it explains measurement starting with pg_stat_statements, how to read the execution plan with EXPLAIN, the meaning of memory settings like shared_buffers/work_mem, MVCC/VACUUM and indexes, connection pooling, and PostgreSQL 18's asynchronous I/O and B-tree skip scan.

Published
Last updated
Updated
Reading time
14 min read
Author
友田 陽大
Share

90% of "PostgreSQL is slow" consultations are caused by not measuring in the correct order. Suddenly enlarging shared_buffers, or adding indexes by guesswork — in many cases that not only doesn't work but slows writes, confuses the planner, and eats up memory.

This article is the overall map for making PostgreSQL fast in production. Centered on the methodology of "where to start, what to measure, and in what order to optimize," it re-orders indexes, execution plans, memory, VACUUM, connection pooling, and partitioning in the correct priority. The deep dive into each theme is left to individual articles; this article concentrates on "the design judgments that run through the whole."

Rules for this article: setting defaults, meanings, SQL syntax, and PostgreSQL 18's new features are all based on the PostgreSQL 18 official documentation (as of June 2026). A managed service's (RDS/Aurora, Cloud SQL, Supabase, etc.) defaults can differ from upstream due to the engine minor or service-side overrides. Before applying to production, always confirm with SHOW <param>; and measurement on your instance. The figures can change by workload.


1. The big principle: don't speculate, measure

The iron rule of performance tuning is "Don't speculate, measure." PostgreSQL's planner runs cost-based, and its judgment is derived from statistics and settings. Human intuition is usually wrong.

Before starting optimization, always grasp these 2 points.

  1. Which query occupies how much of the total load (→ pg_stat_statements)
  2. Why is that slow query slow (→ EXPLAIN (ANALYZE, BUFFERS))

And touch them in the following order of effectiveness. The higher, the bigger the effect and the lower the cost.

PriorityLayerMain weaponDeep-dive article
1Query/index designproper indexes, N+1 removal, SARGable predicatesindex design
2Understanding the execution planEXPLAIN ANALYZE, statistics freshnessEXPLAIN and slow-query improvement
3Memory settingswork_mem / shared_buffers / effective_cache_sizethis article §4
4MVCC/VACUUMautovacuum, bloat measures, HOT updatesMVCC and VACUUM
5Connection managementconnection pooling (PgBouncer, etc.)this article §6
6Physical designdeclarative partitioningpartitioning

Optimizations you must not do: ① touching settings before measuring, ② adding indexes without looking at EXPLAIN ANALYZE, ③ routinely using SET enable_seqscan = off in production (symptomatic treatment that breaks the planner's judgment), ④ indexing every column (writes slow down and it bloats).


2. The starting point of measurement: pg_stat_statements

pg_stat_statements is an extension that normalizes (replaces literals with $1) and aggregates executed queries. It bundles "SELECT * FROM orders WHERE id = 42" and "= 99" as the same query and records call count, total execution time, mean time, and rows returned. Get the production "heavy-query Top N" from here.

Register it in shared_preload_libraries (a server restart is needed) and enable it on the target DB.

-- postgresql.conf: shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

The iron rule of cost-effectiveness is to crush from "long cumulative execution time = big impact on the whole" queries. A query of 5ms once but called a million times is heavier system-wide than a query of 0.5s once.

-- 全体負荷への寄与が大きい順(時間は ms)
SELECT
  substring(query, 1, 80)              AS query,
  calls,
  round(total_exec_time::numeric, 1)   AS total_ms,
  round(mean_exec_time::numeric, 2)    AS mean_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

This total_exec_time DESC list is your optimization To-Do list. Improving just the top 5 dramatically changes the perceived experience.

In production, also use auto_explain, which logs the execution plan itself of slow queries that exceed a threshold (for details, the EXPLAIN article).


3. The planner runs on statistics (so ANALYZE is critical)

PostgreSQL's planner, before actually scanning a table, estimates "how many rows this WHERE narrows to (selectivity)" from statistics and chooses the cheapest plan. If statistics are stale, the estimate is off and it chooses a wrong plan (a sequential scan where it should use an index, etc.).

There are 2 kinds of statistics (official planner-stats).

  • Per table (pg_class): reltuples (row count), relpages (block count). Updated by VACUUM / ANALYZE / CREATE INDEX, etc., and always somewhat stale.
  • Per column (pg_statistic → the pg_stats view): most_common_vals (high-frequency values), histogram_bounds, n_distinct, null_frac. Updated by ANALYZE / VACUUM ANALYZE, and always approximate.

The practical guidance is simple.

-- 統計の収集精度(既定 100)。偏った分布の列だけ列単位で引き上げる
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

Multiple correlated columns (e.g., city and postal_code), because the planner assumes "each condition is independent," underestimate the row count by orders of magnitude for a combined condition. Correct this with extended statistics.

-- 「都市が決まれば郵便番号もほぼ決まる」関数従属を学習させる
CREATE STATISTICS orders_city_zip (dependencies, ndistinct)
  ON city, postal_code FROM orders;
ANALYZE orders;

The typical cause of "I added an index but it isn't used" is stale statistics. Always ANALYZE right after mass INSERT/update. Rather than leaving it to autovacuum, explicitly running it at the end of a batch process is safe.


4. Memory settings: touch them understanding the meaning

Memory settings aren't "bigger = faster." Because each protects a different target, touch them understanding the meaning. The following are all defaults of the official runtime-config-resource / runtime-config-query.

4.1 shared_buffers — PostgreSQL's own cache

The shared memory where PostgreSQL caches data pages. Default 128MB. The official guidance is "if RAM is 1GB or more, start from 25%" and "rarely better even over 40%." The reason is that PostgreSQL also strongly depends on the OS page cache, and raising shared_buffers too much double-holds with the OS cache, conversely lowering efficiency.

# 例:RAM 16GB のサーバー
shared_buffers = 4GB              # RAM の約25%。サーバー起動時のみ反映
effective_cache_size = 12GB       # 後述。OS含めキャッシュ可能と見込む量

4.2 work_mem — the biggest mine (not per query)

The memory a sort (ORDER BY / DISTINCT / merge join) or hash (hash join, hash aggregate) can use before spilling to a temporary file. Default 4MB.

This is the biggest pitfall. Citing the official expression —

A complex query may run multiple sort or hash operations simultaneously, and each operation can use up to work_mem. Furthermore, multiple sessions can run simultaneously. So the total memory usage can be many times work_mem.

That is, the effective upper limit is work_mem × the number of simultaneous sort/hash nodes in the query × simultaneous sessions. Casually set work_mem = 1GB and the server goes down with OOM at peak. Furthermore, hashes have hash_mem_multiplier (default 2.0) applied, so they can actually use up to work_mem × 2.0.

-- グローバルは控えめ(例 16MB〜64MB)。重い集計クエリだけセッション/トランザクションで上げる
SET LOCAL work_mem = '256MB';   -- このトランザクション内だけ。BEGIN〜COMMIT で囲む
SELECT customer_id, sum(amount) FROM orders GROUP BY customer_id;

If Sort Method: external merge Disk: 12345kB appears in EXPLAIN ANALYZE, it's evidence that the sort overflowed work_mem and spilled to disk = a sign of insufficient work_mem.

4.3 maintenance_work_mem — for maintenance work

The memory used by VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY, etc. Default 64MB. Since the premise is only one runs per session at a time, the official also says "safe to set far larger than work_mem." Temporarily raising it before index re-creation or bulk load has a huge effect.

SET maintenance_work_mem = '1GB';   -- 索引作成・VACUUM を高速化

4.4 effective_cache_size and random_page_cost — the planner's worldview

These two don't reserve memory. They're parameters that teach the planner "how it sees the world."

  • effective_cache_size (default 4GB): the estimate of the total disk cache (shared_buffers + OS cache) one query is likely to use. The larger, the more favorably an index scan is evaluated. Aim for 50-75% of RAM.
  • random_page_cost (default 4.0): the relative cost of a random page read (seq_page_cost is 1.0). It's an HDD-premised default and doesn't match reality on SSD/NVMe. For SSD, lowering it to about 1.1 makes an index scan more justly chosen.
effective_cache_size = 12GB       # RAM 16GB の 75%
random_page_cost = 1.1            # SSD/NVMe の現実に合わせる(HDD既定は 4.0)

This is an "effective but often overlooked" setting. Cloud managed DBs are nearly all SSD, but with random_page_cost still at 4.0, the planner misunderstands "indexes are expensive" and tends to choose a sequential scan.


5. PostgreSQL 18's new features: what got faster

PostgreSQL 18 (released September 2025, the latest at writing is 18.4) greatly boosted I/O performance. The premise of tuning changes, so grasp it.

5.1 Asynchronous I/O (AIO) — up to 3×

Per the official release announcement, PG18 introduced an asynchronous-I/O subsystem and can now issue I/O requests concurrently without waiting sequentially. The targets are sequential scans, bitmap heap scans, and VACUUM, and benchmarks show up to 3× performance improvement.

What controls this is the new parameter io_method (server startup only). There are 3 values.

ValueContent
workerdefault. Execute async I/O with dedicated worker processes (io_workers default 3)
io_uringUse Linux's io_uring (needs a --with-liburing build)
syncSynchronous I/O as before (disabled)

Together, the defaults of effective_io_concurrency and maintenance_io_concurrency were raised from 1 → 16 (official comment: "more accurately reflects modern hardware").

5.2 B-tree skip scan

Cases increased where a composite index can be used by a following column's condition even without an equality condition on the leading column. Citing the official release notes —

Allow a multi-column B-tree index to be used even when leading or earlier columns have no constraint (or only a non-equality constraint), when later columns have a useful constraint.

This partially eases the conventional iron rule of "get the column order of a composite index wrong and it's not used at all" (but it works more the lower the leading column's cardinality). For details, head to the index-design article.

5.3 EXPLAIN ANALYZE shows BUFFERS by default

Plain but effective. From PG18, EXPLAIN ANALYZE now automatically includes BUFFERS (official: "Buffers information is automatically included when ANALYZE is used."). Since buffer hits/reads are visible by default, the diagnosis of "is it on cache / is it hitting disk" gets a notch easier.

-- PG18 では BUFFERS を明示しなくても shared hit/read が出る
EXPLAIN (ANALYZE) SELECT * FROM orders WHERE customer_id = 42;
-- 抑制したいときだけ: EXPLAIN (ANALYZE, BUFFERS OFF) ...

Other practically-effective improvements are in too: uuidv7() (time-ordered UUID = good index locality), virtual generated columns becoming default, being able to use a non-btree unique index for a composite primary key, etc.


6. Connection management: processes not threads, so pooling is essential

PostgreSQL spawns one OS process per connection (not a thread). One connection consumes several MB+ of memory, and the simultaneous consumption of work_mem rides here too. Open app connections unlimitedly and it becomes the typical accident of "raise max_connections → memory exhaustion."

The remedy is connection pooling. Insert a pooler (PgBouncer, etc.) between the app ↔ DB and reuse physical connections with a small number. It's essentially mandatory in serverless (Lambda, etc.) or many web workers.

[many app processes] ──▶ [PgBouncer transaction mode] ──▶ [few physical connections] ──▶ PostgreSQL
                                  ↑ share physical connections and keep max_connections low

On the app side too, in serverless the iron rule is "don't create a connection per request" and "narrow the pool upper limit on the premise of a pooler."

// node-postgres: アプリ内プールは小さく、外側で PgBouncer に集約する設計
import { Pool } from "pg";

export const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // 秘密情報は環境変数(コードに埋めない)
  max: 10,                    // 1インスタンスあたりの上限。台数×max が物理接続を超えないこと
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 5_000,
});

// クエリは必ずパラメータ化(文字列連結は SQL インジェクションの温床)
export async function findOrder(id: string) {
  const { rows } = await pool.query(
    "SELECT id, customer_id, amount FROM orders WHERE id = $1",
    [id],
  );
  return rows[0] ?? null;
}

The most important security point: always pass user input with a parameterized query ($1). The moment you build SQL by string concatenation, the door to SQL injection opens. The parameterization of an ORM (Drizzle/Prisma, etc.) or pg is a mechanism to structurally protect this boundary.


7. Make writes fast: bulk load and HOT updates

Not only reads but insertion and update have a theory.

7.1 Mass insertion is COPY, not INSERT

The official populate guidance: for mass data, use COPY not INSERT (flow all rows in one command and amortize the parse, plan, and commit overhead). Furthermore,

  • Create indexes and foreign keys after insertion (avoid index maintenance during insertion)
  • Temporarily raise maintenance_work_mem and max_wal_size
  • Always ANALYZE after insertion (otherwise the planner malfunctions on empty-table statistics)
-- 初期ロードの定石
ALTER TABLE big_table SET (autovacuum_enabled = false);  -- ロード中の自動VACUUMを止める
-- (索引を落としてから)COPY で投入
COPY big_table FROM '/path/data.csv' WITH (FORMAT csv, HEADER true);
-- 索引・FKを作成し直す → 統計を更新
CREATE INDEX ...;
ANALYZE big_table;
ALTER TABLE big_table SET (autovacuum_enabled = true);

7.2 Updates produce "bloat" (suppress with HOT updates)

PostgreSQL's UPDATE, for MVCC, doesn't immediately delete the old row but appends a new row version (detailed in the MVCC article). This is the cause of table bloat.

The mitigation is the HOT update. If "the update doesn't change indexed columns" and "there's free space on the same page," it can skip creating a new index entry, suppressing bloat. For a heavily-updated table, lowering fillfactor to ensure free space on the same page makes HOT updates hold more easily.

-- 更新の多いテーブルは fillfactor を下げて HOT 更新を促す(既定100)
ALTER TABLE sessions SET (fillfactor = 85);

8. Emergency checklist (when production is heavy)

The order to isolate "slow now." Confirm from the top.

  1. Which query is clogged? Confirm state = 'active' and wait_event with pg_stat_activity. A long idle in transaction session is the culprit that hinders VACUUM and keeps holding locks.
  2. Any lock waits? Join pg_locks with pg_stat_activity and identify the blocking PID.
  3. What's the heavy query's plan? EXPLAIN (ANALYZE, BUFFERS) the relevant query. Look for Seq Scan + a large Rows Removed by Filter, external merge Disk, a huge loops.
  4. Are statistics fresh? ANALYZE tables with a recent mass update.
  5. Is it bloated? Confirm n_dead_tup in pg_stat_user_tables and the last autovacuum run time.
-- いま実行中で重いセッション(長時間 idle in transaction は要注意)
SELECT pid, state, wait_event_type, wait_event,
       now() - xact_start AS xact_age, substring(query,1,60) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY xact_start;

9. Summary of design judgments (trade-off table)

MeasureEffective targetCost/riskDecision criterion
Add indexreadswrite degradation, bloat, capacitystick it only on the search conditions of the pg_stat_statements top
Raise work_memsort/hashOOM (multiplies under concurrency)conservative globally, SET LOCAL only for heavy aggregation
Lower random_page_costplan choicemistakenly discard seq scanaround 1.1 for SSD (keep 4.0 for HDD)
Partitioningultra-large tablescomplexity of design/operationguide is when table size exceeds physical memory
Connection poolingmany simultaneous connectionsan added operational componentmandatory in serverless/many-worker
Non-durable settingsinsertion speeddata loss on OS crashonly for re-creatable processing like initial load

Note on non-durable settings (official non-durability): synchronous_commit = off "has a risk of losing transactions on an OS crash but doesn't corrupt data." fsync = off has a corruption risk. Use only for data you can always recreate (initial load, re-creatable aggregation).


10. Conclusion: speed is decided by "order"

  • Measurement first. Identify heavy queries with pg_stat_statements, see the cause with EXPLAIN (ANALYZE, BUFFERS), then act.
  • The order of effectiveness is ① index/query design → ② statistics (ANALYZE) → ③ memory → ④ VACUUM → ⑤ connections → ⑥ physical design. The higher, the cheaper and more effective.
  • work_mem is per-operation. The total usage swells multiplicatively. Conservative globally, SET LOCAL only where needed.
  • Lower random_page_cost for SSD. Make effective_cache_size large to fit reality.
  • PG18 made I/O faster (async I/O, BUFFERS on by default, skip scan). The premise changed, so re-measure on the latest.

In this series, I deep-dive into each layer lined up in this article individually. Start from how to read EXPLAIN and index design. PostgreSQL, "if you measure correctly and touch it in the correct order," answers production demands plenty without adding a dedicated DB.


References (PostgreSQL 18 official documentation)

Frequently asked questions

What should I start PostgreSQL tuning from?
From measurement, not speculation. Identify heavy queries with pg_stat_statements, confirm the execution plan with EXPLAIN (ANALYZE, BUFFERS), then act. Touching only settings without seeing the bottleneck tends to be counterproductive.
What's the guide for shared_buffers and work_mem?
shared_buffers' initial guide is about 25% of RAM, and over 40% tends to be counterproductive. work_mem is consumed not per query but per plan 'node,' and the total usage can be work_mem × simultaneous sorts/hashes × simultaneous sessions, so set it conservatively.
What's the order of effective tuning?
① index/query design → ② statistics (ANALYZE) → ③ memory settings → ④ VACUUM/autovacuum → ⑤ connection pooling → ⑥ partitioning. The more upstream, the bigger the effect, and keeping the order directly ties to speed.

References

友田

友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.

Got a challenge?

From design to implementation and operations — solo × generative AI

Implementation like this article's, end to end from requirements to production. Start with a free 30-minute technical consult and tell me about your situation.

Available for both project-based (contract) and advisory engagements. Start with a free 30-minute consult.

Also worth reading