You don't need intuition to fix "the query is slow." PostgreSQL shows you the execution plan with EXPLAIN and the measurement with EXPLAIN ANALYZE. Once you can read them, "why it's slow" is clear at a glance, and the move (index, statistics update, query rewrite, memory) is determined.
This article is a reading guide for actually diagnosing and fixing the "heavy queries" identified in §2 of the performance overview. Faithful to the official documentation's definitions, it goes all the way down to the warning signs you see in the field and their remedies.
Rules for this article: the meaning of
EXPLAIN's output items, each option, and the PostgreSQL 18 changes are all based on the PostgreSQL 18 official documentation (as of June 2026). Cost is "an arbitrary unit with disk-page fetch as 1.0" and is used for relative comparison between plans, not absolute values.
1. Read the four numbers: cost / rows / width
First, plain EXPLAIN (doesn't run = estimate only).
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
Index Scan using idx_orders_cust on orders (cost=0.42..8.45 rows=3 width=72)
Index Cond: (customer_id = 42)
The four numbers in parentheses are the basics (official using-explain).
| Item | Meaning |
|---|---|
cost=0.42..8.45 | start cost..total cost. 0.42 = until it can return the first row, 8.45 = until it finishes returning all rows |
rows=3 | the number of rows this node is estimated to output |
width=72 | the estimated average bytes per row |
Points.
- Start cost is "preparation until output begins" (for a sort node, until the sort completes).
- Total cost assumes "reading all the way to the end." If the parent cuts it off midway with
LIMIT, it won't actually take this long. - Cost is cumulative. A parent node's cost includes its children.
- The unit is arbitrary (
seq_page_cost = 1.0basis). The absolute value is meaningless; use it to compare plans.
PostgreSQL 18 tidbit:
EXPLAIN'srowsmay be shown as a decimal (formerly rounded to integers). Tiny selectivities look more accurate.
2. EXPLAIN ANALYZE: show the measurement (be careful with DML)
Add ANALYZE and PostgreSQL actually runs the query and lays the estimate alongside the measurement.
EXPLAIN (ANALYZE) SELECT * FROM orders WHERE customer_id = 42;
Index Scan using idx_orders_cust on orders
(cost=0.42..8.45 rows=3 width=72) (actual time=0.018..0.021 rows=3 loops=1)
Index Cond: (customer_id = 42)
Buffers: shared hit=4
Planning Time: 0.10 ms
Execution Time: 0.04 ms
actual time=0.018..0.021 rows=3 loops=1 is the measurement. Comparing (cost=…) (actual=…) side by side is the basic diagnostic action.
Danger: ANALYZE really runs it. As the official warning says, adding
EXPLAIN ANALYZEtoINSERT/UPDATE/DELETE/MERGEchanges data. TheSELECTreturn is discarded, but the side effects occur. When measuring DML, always wrap it in a transaction and roll back.
BEGIN;
EXPLAIN (ANALYZE) UPDATE orders SET status = 'shipped' WHERE id = 42;
ROLLBACK; -- 計画は分かったが、変更は取り消す
3. Most important: the gap between estimated rows and actual rows
What the docs explicitly call "most important" is — how close the estimated row count is to the actual.
Seq Scan on orders
(cost=0.00..18334.00 rows=1 width=72) (actual ... rows=48210 loops=1)
^^^^^^ ^^^^^^^^^^^
estimated 1 row actual is 48,210 rows (off by orders of magnitude!)
When it is off by orders of magnitude like this, the planner is building the plan on wrong premises. The cause is almost always stale statistics. Re-collecting statistics with ANALYZE solves this kind of problem in many cases.
ANALYZE orders; -- 統計を更新してから、もう一度 EXPLAIN ANALYZE
If it's still off, the cause is often multiple correlated columns (the planner assumes independence). Give it extended statistics with CREATE STATISTICS (see overview §3). Once the estimate is correct, the planner naturally chooses a good plan. "Correcting the statistics" rather than "forcing an index" is the royal road.
4. The loops trap: the measured values are "the per-loop average"
Very many people misread this. The official definition:
When a subplan node is executed multiple times,
loopsrepresents the total number of executions, andactual timeandrowsare displayed as the per-execution average. To know the total time, multiply by loops.
That is, actual time=0.5 rows=2 loops=10000 means "average 0.5 ms × 10,000 times = 5 seconds total." When this happens inside a Nested Loop, the seemingly small numbers explode in total time.
Nested Loop (actual time=... rows=10000 loops=1)
-> Seq Scan on a (actual ... rows=10000 loops=1)
-> Index Scan on b (actual time=0.05..0.06 rows=1 loops=10000)
^^^^^^^^^^
the inner ran 10,000 times → 0.06ms × 10000 ≈ 0.6s is hidden here
If a Nested Loop is chosen despite a large outer row count, there's a high chance the planner failed to pick a Hash/Merge join due to a row-count estimate error. Again, suspect the statistics.
5. The meaning of nodes: what does it mean when each is chosen
The execution plan is a tree; leaves are scans, branches are joins/aggregations. The meaning of the major nodes and "what to think when it appears."
Scans
| Node | Meaning | What to think when it appears |
|---|---|---|
| Seq Scan | reads all rows in order | normal if it returns many rows. If it appears while returning few, suspect a missing index |
| Index Scan | locate via index → fetch from heap | optimal for selective conditions / ordering. One of the ideal forms |
| Index Only Scan | completes with the index alone (no heap needed) | top-class fast. Check Heap Fetches (index article) |
| Bitmap Index Scan → Bitmap Heap Scan | gather positions into a bitmap, then fetch from the heap in physical order | medium selectivity. Efficient, between Seq and Index |
Joins
| Node | Meaning | Good situation |
|---|---|---|
| Nested Loop | re-scans the inner for each outer row | when the outer is few. Explodes if many (§4) |
| Hash Join | builds a hash table from one side and matches | large unordered equi-join |
| Merge Join | sorts both sides and merges | large join where both sides are already sorted |
-- VERBOSE で出力列・スキーマ修飾名まで、FORMAT json で機械可読に出せる
EXPLAIN (ANALYZE, VERBOSE, FORMAT JSON) SELECT ...;
6. Warning-sign collection: when this appears, fix it this way
A field collection of patterns for reading "the reason it's slow" from the execution plan.
① A huge Rows Removed by Filter → missing index
Seq Scan on orders (actual ... rows=120 loops=1)
Filter: (status = 'pending')
Rows Removed by Filter: 1488000 ← read 1.49M rows and narrowed to 120 (huge waste)
It read 1.49M rows and threw them away with the filter. Put an index (or partial index) on status to narrow the rows read from the start.
② Sort Method: external merge Disk: … → work_mem shortage
Sort (actual time=820..950 rows=500000 loops=1)
Sort Key: created_at
Sort Method: external merge Disk: 86016kB ← didn't fit in memory and spilled to disk
Evidence that the sort overflowed work_mem and wrote to disk. Either raise SET LOCAL work_mem for that query only (see overview §4.2), or prepare an index that satisfies ORDER BY to eliminate the sort itself. Memory: NkB (quicksort) means it completes in memory = healthy.
③ Many Heap Fetches → VACUUM shortage
Index Only Scan using idx_orders_cust_covering on orders
(actual ...) Heap Fetches: 43120 ← supposed to be index-only but it visited the heap 40,000 times
Because the visibility map isn't all-visible, the index-only scan goes to look at the heap. VACUUM the target table and Heap Fetches drops (see MVCC/VACUUM article).
④ A huge inner loops in a Nested Loop → wrong join-method choice
As in §4. Often an estimate error. Correcting the estimate with ANALYZE / extended statistics is the right path.
7. Read cache efficiency with BUFFERS (on by default in PG18)
BUFFERS shows the number of buffers each node touched. You can tell whether it's on memory or hitting disk.
| Item | Meaning |
|---|---|
shared hit | cache hit (avoided disk read) |
shared read | read from disk (cold) |
shared dirtied | a previously-unmodified block this query modified |
shared written | a dirty block evicted during this query's execution |
PostgreSQL 18 change:
EXPLAIN ANALYZEnow automatically includesBUFFERS(official: "Buffers information is automatically included when ANALYZE is used."). PG17 and earlier required it explicitly. UseEXPLAIN (ANALYZE, BUFFERS OFF)only when you want to suppress it.
Index Scan ... Buffers: shared hit=4 ← all cache hits (ideal)
Seq Scan ... Buffers: shared read=18334 ← read 18k blocks from disk (heavy)
Run the same query twice, and if read turns into hit, that's evidence the second time landed in cache. Distinguish the first run (cold) from the second (warm) when evaluating. In PG18, additionally, EXPLAIN ANALYZE reports the number of index searches per index scan, and the memory/disk usage of Material/Window Aggregate/CTE nodes is also shown.
8. Catch production slow queries: auto_explain
During development you can hit EXPLAIN ANALYZE by hand, but in production you need to trace "when and which query was slow" after the fact. The extension for that is auto_explain — it automatically logs the execution plan of queries that exceed a threshold.
# postgresql.conf(shared_preload_libraries 推奨)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '500ms' # 500ms 超のクエリの計画をログ(-1=無効, 0=全件)
auto_explain.log_format = 'json' # 構造化ログとして集約しやすい
auto_explain.sample_rate = 1.0 # 高トラフィックでは下げて負荷を抑える
The most important caution (strongly warned by the docs): setting auto_explain.log_analyze = on makes it equivalent to EXPLAIN ANALYZE and captures not just the plan but the measurement, but "per-node timing runs for all queries, including short ones not targeted for logging, and can have an extremely bad impact on performance." Avoid keeping it always on in production; use it only when needed, combined with log_timing = off or sample_rate < 1 to keep the load down.
| Parameter | Default | Role |
|---|---|---|
log_min_duration | -1 (disabled) | record the plan if it exceeds this time (ms). 0 for all |
log_analyze | off | also take the measurement. Heavy-load caution |
log_timing | on | per-node timing (assumes log_analyze). off reduces load |
log_buffers | off | include BUFFERS (assumes log_analyze) |
sample_rate | 1 | the fraction of queries recorded. Lower it under heavy load |
9. The diagnose-to-improve workflow (summary)
- Identify: choose the queries with long cumulative time via
pg_stat_statements(overview §2). - Measure: run
EXPLAIN (ANALYZE)(DML inBEGIN; … ROLLBACK;). - Check the estimate gap:
rowsestimate vs. actual. If off by orders of magnitude,ANALYZE→ extended statistics. - Identify the heavy node: nodes with large
actual time(×loops),Rows Removed by Filter,external merge Disk,Heap Fetches. - Determine the move:
- heavy filter → index (index article)
- external sort →
work_memor an ordering index - Heap Fetches → VACUUM
- wrong join choice → fix the statistics
- Re-measure: confirm the improvement numerically with the same
EXPLAIN ANALYZE(mind cold/warm). - Production monitoring: detect recurrence with
auto_explain.
What not to do: bending the planner by brute force with
SET enable_seqscan = off, etc., is symptomatic treatment that hides the cause. It's usable for isolation during debugging, but routine production use is forbidden. Make the planner choose correctly with correct statistics and indexes — that's the right path.
10. Conclusion
EXPLAINis the estimate,EXPLAIN ANALYZEis the measurement. Compare the two side by side.- The gap between estimate and actual is the most important signal. If off by orders of magnitude, it's a statistics problem (
ANALYZE/ extended statistics). actualis the per-loop average. Total time is loops times — beware the Nested Loop trap.- For each warning sign (
Rows Removed by Filter/external merge Disk/Heap Fetches), the move is determined. - PG18 has
EXPLAIN ANALYZEBUFFERS-on by default. In production, record plans withauto_explain(keepinglog_analyzealways on is heavy).
Once you can read the execution plan, tuning goes from "intuition" to "diagnosis." Next, on to the physical layer that sways the plan — MVCC, transactions, and VACUUM.