Skip to main content
友田 陽大
PostgreSQL internals & performance
PostgreSQL
パフォーマンス
アーキテクチャ設計

How to read PostgreSQL EXPLAIN ANALYZE and improve slow queries (reading the plan, the meaning of each node, auto_explain, v18 support)

A practical guide to diagnosing PostgreSQL's slow queries with EXPLAIN ANALYZE and reliably making them fast. Faithful to the official docs, it covers: reading cost/rows/width, the gap between estimate and actual (stale statistics), the meaning of each node such as Seq Scan / Bitmap / Nested Loop, warning signs like external sort and Heap Fetches, auto_explain to log the plans of production slow queries, and PostgreSQL 18's BUFFERS-on-by-default.

Published
Reading time
10 min read
Author
友田 陽大
Share

You don't need intuition to fix "the query is slow." PostgreSQL shows you the execution plan with EXPLAIN and the measurement with EXPLAIN ANALYZE. Once you can read them, "why it's slow" is clear at a glance, and the move (index, statistics update, query rewrite, memory) is determined.

This article is a reading guide for actually diagnosing and fixing the "heavy queries" identified in §2 of the performance overview. Faithful to the official documentation's definitions, it goes all the way down to the warning signs you see in the field and their remedies.

Rules for this article: the meaning of EXPLAIN's output items, each option, and the PostgreSQL 18 changes are all based on the PostgreSQL 18 official documentation (as of June 2026). Cost is "an arbitrary unit with disk-page fetch as 1.0" and is used for relative comparison between plans, not absolute values.


1. Read the four numbers: cost / rows / width

First, plain EXPLAIN (doesn't run = estimate only).

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
Index Scan using idx_orders_cust on orders  (cost=0.42..8.45 rows=3 width=72)
  Index Cond: (customer_id = 42)

The four numbers in parentheses are the basics (official using-explain).

ItemMeaning
cost=0.42..8.45start cost..total cost. 0.42 = until it can return the first row, 8.45 = until it finishes returning all rows
rows=3the number of rows this node is estimated to output
width=72the estimated average bytes per row

Points.

  • Start cost is "preparation until output begins" (for a sort node, until the sort completes).
  • Total cost assumes "reading all the way to the end." If the parent cuts it off midway with LIMIT, it won't actually take this long.
  • Cost is cumulative. A parent node's cost includes its children.
  • The unit is arbitrary (seq_page_cost = 1.0 basis). The absolute value is meaningless; use it to compare plans.

PostgreSQL 18 tidbit: EXPLAIN's rows may be shown as a decimal (formerly rounded to integers). Tiny selectivities look more accurate.


2. EXPLAIN ANALYZE: show the measurement (be careful with DML)

Add ANALYZE and PostgreSQL actually runs the query and lays the estimate alongside the measurement.

EXPLAIN (ANALYZE) SELECT * FROM orders WHERE customer_id = 42;
Index Scan using idx_orders_cust on orders
  (cost=0.42..8.45 rows=3 width=72) (actual time=0.018..0.021 rows=3 loops=1)
  Index Cond: (customer_id = 42)
  Buffers: shared hit=4
Planning Time: 0.10 ms
Execution Time: 0.04 ms

actual time=0.018..0.021 rows=3 loops=1 is the measurement. Comparing (cost=…) (actual=…) side by side is the basic diagnostic action.

Danger: ANALYZE really runs it. As the official warning says, adding EXPLAIN ANALYZE to INSERT/UPDATE/DELETE/MERGE changes data. The SELECT return is discarded, but the side effects occur. When measuring DML, always wrap it in a transaction and roll back.

BEGIN;
EXPLAIN (ANALYZE) UPDATE orders SET status = 'shipped' WHERE id = 42;
ROLLBACK;   -- 計画は分かったが、変更は取り消す

3. Most important: the gap between estimated rows and actual rows

What the docs explicitly call "most important" is — how close the estimated row count is to the actual.

Seq Scan on orders
  (cost=0.00..18334.00 rows=1 width=72) (actual ... rows=48210 loops=1)
                       ^^^^^^                            ^^^^^^^^^^^
                       estimated 1 row          actual is 48,210 rows (off by orders of magnitude!)

When it is off by orders of magnitude like this, the planner is building the plan on wrong premises. The cause is almost always stale statistics. Re-collecting statistics with ANALYZE solves this kind of problem in many cases.

ANALYZE orders;   -- 統計を更新してから、もう一度 EXPLAIN ANALYZE

If it's still off, the cause is often multiple correlated columns (the planner assumes independence). Give it extended statistics with CREATE STATISTICS (see overview §3). Once the estimate is correct, the planner naturally chooses a good plan. "Correcting the statistics" rather than "forcing an index" is the royal road.


4. The loops trap: the measured values are "the per-loop average"

Very many people misread this. The official definition:

When a subplan node is executed multiple times, loops represents the total number of executions, and actual time and rows are displayed as the per-execution average. To know the total time, multiply by loops.

That is, actual time=0.5 rows=2 loops=10000 means "average 0.5 ms × 10,000 times = 5 seconds total." When this happens inside a Nested Loop, the seemingly small numbers explode in total time.

Nested Loop  (actual time=... rows=10000 loops=1)
  ->  Seq Scan on a   (actual ... rows=10000 loops=1)
  ->  Index Scan on b (actual time=0.05..0.06 rows=1 loops=10000)
                                                        ^^^^^^^^^^
                          the inner ran 10,000 times → 0.06ms × 10000 ≈ 0.6s is hidden here

If a Nested Loop is chosen despite a large outer row count, there's a high chance the planner failed to pick a Hash/Merge join due to a row-count estimate error. Again, suspect the statistics.


5. The meaning of nodes: what does it mean when each is chosen

The execution plan is a tree; leaves are scans, branches are joins/aggregations. The meaning of the major nodes and "what to think when it appears."

Scans

NodeMeaningWhat to think when it appears
Seq Scanreads all rows in ordernormal if it returns many rows. If it appears while returning few, suspect a missing index
Index Scanlocate via index → fetch from heapoptimal for selective conditions / ordering. One of the ideal forms
Index Only Scancompletes with the index alone (no heap needed)top-class fast. Check Heap Fetches (index article)
Bitmap Index Scan → Bitmap Heap Scangather positions into a bitmap, then fetch from the heap in physical ordermedium selectivity. Efficient, between Seq and Index

Joins

NodeMeaningGood situation
Nested Loopre-scans the inner for each outer rowwhen the outer is few. Explodes if many (§4)
Hash Joinbuilds a hash table from one side and matcheslarge unordered equi-join
Merge Joinsorts both sides and mergeslarge join where both sides are already sorted
-- VERBOSE で出力列・スキーマ修飾名まで、FORMAT json で機械可読に出せる
EXPLAIN (ANALYZE, VERBOSE, FORMAT JSON) SELECT ...;

6. Warning-sign collection: when this appears, fix it this way

A field collection of patterns for reading "the reason it's slow" from the execution plan.

① A huge Rows Removed by Filter → missing index

Seq Scan on orders  (actual ... rows=120 loops=1)
  Filter: (status = 'pending')
  Rows Removed by Filter: 1488000   ← read 1.49M rows and narrowed to 120 (huge waste)

It read 1.49M rows and threw them away with the filter. Put an index (or partial index) on status to narrow the rows read from the start.

Sort Method: external merge Disk: … → work_mem shortage

Sort  (actual time=820..950 rows=500000 loops=1)
  Sort Key: created_at
  Sort Method: external merge  Disk: 86016kB   ← didn't fit in memory and spilled to disk

Evidence that the sort overflowed work_mem and wrote to disk. Either raise SET LOCAL work_mem for that query only (see overview §4.2), or prepare an index that satisfies ORDER BY to eliminate the sort itself. Memory: NkB (quicksort) means it completes in memory = healthy.

③ Many Heap Fetches → VACUUM shortage

Index Only Scan using idx_orders_cust_covering on orders
  (actual ...) Heap Fetches: 43120   ← supposed to be index-only but it visited the heap 40,000 times

Because the visibility map isn't all-visible, the index-only scan goes to look at the heap. VACUUM the target table and Heap Fetches drops (see MVCC/VACUUM article).

④ A huge inner loops in a Nested Loop → wrong join-method choice

As in §4. Often an estimate error. Correcting the estimate with ANALYZE / extended statistics is the right path.


7. Read cache efficiency with BUFFERS (on by default in PG18)

BUFFERS shows the number of buffers each node touched. You can tell whether it's on memory or hitting disk.

ItemMeaning
shared hitcache hit (avoided disk read)
shared readread from disk (cold)
shared dirtieda previously-unmodified block this query modified
shared writtena dirty block evicted during this query's execution

PostgreSQL 18 change: EXPLAIN ANALYZE now automatically includes BUFFERS (official: "Buffers information is automatically included when ANALYZE is used."). PG17 and earlier required it explicitly. Use EXPLAIN (ANALYZE, BUFFERS OFF) only when you want to suppress it.

Index Scan ... Buffers: shared hit=4          ← all cache hits (ideal)
Seq Scan   ... Buffers: shared read=18334     ← read 18k blocks from disk (heavy)

Run the same query twice, and if read turns into hit, that's evidence the second time landed in cache. Distinguish the first run (cold) from the second (warm) when evaluating. In PG18, additionally, EXPLAIN ANALYZE reports the number of index searches per index scan, and the memory/disk usage of Material/Window Aggregate/CTE nodes is also shown.


8. Catch production slow queries: auto_explain

During development you can hit EXPLAIN ANALYZE by hand, but in production you need to trace "when and which query was slow" after the fact. The extension for that is auto_explain — it automatically logs the execution plan of queries that exceed a threshold.

# postgresql.conf(shared_preload_libraries 推奨)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '500ms'   # 500ms 超のクエリの計画をログ(-1=無効, 0=全件)
auto_explain.log_format = 'json'          # 構造化ログとして集約しやすい
auto_explain.sample_rate = 1.0            # 高トラフィックでは下げて負荷を抑える

The most important caution (strongly warned by the docs): setting auto_explain.log_analyze = on makes it equivalent to EXPLAIN ANALYZE and captures not just the plan but the measurement, but "per-node timing runs for all queries, including short ones not targeted for logging, and can have an extremely bad impact on performance." Avoid keeping it always on in production; use it only when needed, combined with log_timing = off or sample_rate < 1 to keep the load down.

ParameterDefaultRole
log_min_duration-1 (disabled)record the plan if it exceeds this time (ms). 0 for all
log_analyzeoffalso take the measurement. Heavy-load caution
log_timingonper-node timing (assumes log_analyze). off reduces load
log_buffersoffinclude BUFFERS (assumes log_analyze)
sample_rate1the fraction of queries recorded. Lower it under heavy load

9. The diagnose-to-improve workflow (summary)

  1. Identify: choose the queries with long cumulative time via pg_stat_statements (overview §2).
  2. Measure: run EXPLAIN (ANALYZE) (DML in BEGIN; … ROLLBACK;).
  3. Check the estimate gap: rows estimate vs. actual. If off by orders of magnitude, ANALYZE → extended statistics.
  4. Identify the heavy node: nodes with large actual timeloops), Rows Removed by Filter, external merge Disk, Heap Fetches.
  5. Determine the move:
    • heavy filter → index (index article)
    • external sort → work_mem or an ordering index
    • Heap Fetches → VACUUM
    • wrong join choice → fix the statistics
  6. Re-measure: confirm the improvement numerically with the same EXPLAIN ANALYZE (mind cold/warm).
  7. Production monitoring: detect recurrence with auto_explain.

What not to do: bending the planner by brute force with SET enable_seqscan = off, etc., is symptomatic treatment that hides the cause. It's usable for isolation during debugging, but routine production use is forbidden. Make the planner choose correctly with correct statistics and indexes — that's the right path.


10. Conclusion

  • EXPLAIN is the estimate, EXPLAIN ANALYZE is the measurement. Compare the two side by side.
  • The gap between estimate and actual is the most important signal. If off by orders of magnitude, it's a statistics problem (ANALYZE / extended statistics).
  • actual is the per-loop average. Total time is loops times — beware the Nested Loop trap.
  • For each warning sign (Rows Removed by Filter / external merge Disk / Heap Fetches), the move is determined.
  • PG18 has EXPLAIN ANALYZE BUFFERS-on by default. In production, record plans with auto_explain (keeping log_analyze always on is heavy).

Once you can read the execution plan, tuning goes from "intuition" to "diagnosis." Next, on to the physical layer that sways the plan — MVCC, transactions, and VACUUM.


References (PostgreSQL 18 official documentation)

友田

友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.

Got a challenge?

From design to implementation and operations — solo × generative AI

Implementation like this article's, end to end from requirements to production. Start with a free 30-minute technical consult and tell me about your situation.

Available for both project-based (contract) and advisory engagements. Start with a free 30-minute consult.

Also worth reading