Skip to main content
友田 陽大
PostgreSQL internals & performance
PostgreSQL
アーキテクチャ設計
パフォーマンス

PostgreSQL declarative partitioning in practice (RANGE/LIST/HASH, partition pruning, rolling-window operation, v18-ready)

A practical guide to correctly using PostgreSQL declarative partitioning in production. With real code faithful to the official documentation, it explains: when to partition (and when not to), the RANGE/LIST/HASH syntax, the boundary trap that the upper bound is exclusive, primary-key constraint rules, how to confirm partition pruning, ATTACH/DETACH and time-series rolling-window operation, partitionwise join, and PostgreSQL 18 improvements.

Published
Reading time
11 min read
Author
友田 陽大
Share

"The table has hundreds of millions of rows and is slow," "deleting old logs takes hours" — that may be partitioning's turn. Physically split one huge table into multiple child tables by rule, so the planner reads only the necessary splits and you can discard an old split whole in an instant.

But partitioning isn't a "just add it" optimization. It increases design constraints (especially the primary key) and operational effort. This article explains "when and how to partition, and when not to," faithful to the PostgreSQL official documentation, with production-operation procedures. It's a deep dive on the "physical design" layer raised in §9 of the performance overview.

Rules for this article: the partitioning spec, syntax, default settings, limits, and PostgreSQL 18 changes are all based on the PostgreSQL 18 official documentation (as of June 2026). In managed environments, parallel settings, etc., can differ, so confirm the actual pruning with EXPLAIN.


1. When to partition (and when not to)

The advantages of partitioning the official docs raise:

  • Dramatic query-performance improvement — when frequently used rows concentrate in a few partitions. It substitutes for the upper tree of an index, and the frequently used part is more likely to be in memory.
  • Faster bulk updates/aggregation — if you read most of one partition, a sequential scan is more effective than random access over the whole table.
  • Bulk insert/delete becomes a meta operationDROP TABLE or DETACH PARTITION is orders of magnitude faster than a bulk DELETE, and completely avoids the VACUUM load DELETE produces.
  • You can move cold data to cheaper storage.

And the rule of thumb for when it's effective (official):

These advantages are typically worthwhile only when a table is very large. The exact point at which a table benefits from partitioning depends on the app, but the rule of thumb is that the table size exceeds the database server's physical memory.

An important judgment: for around a few million rows, first exhaust index design and VACUUM. Since partitioning increases complexity (especially the primary-key constraint) and operational cost, introduce it narrowly for "a scale exceeding physical memory" or "a clear time-series discard pattern." YAGNI — once you need it.


2. Three methods: RANGE / LIST / HASH

The official definitions and syntax.

MethodHow it splitsTypical use
RANGESplit by value ranges of the key (non-overlapping)Time series (date, sequence)
LISTSplit by explicitly enumerated key valuesRegion, tenant, category
HASHSplit by modulus and remainderNo natural division, but want even distribution

RANGE: the standard for time series

-- 親テーブル(データは入らない。ルーティングするだけ)
CREATE TABLE measurement (
  city_id  int  NOT NULL,
  logdate  date NOT NULL,
  peaktemp int
) PARTITION BY RANGE (logdate);

-- 月次パーティション。下限は含む・上限は含まない('2026-07-01' は7月側)
CREATE TABLE measurement_y2026m06 PARTITION OF measurement
  FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

-- どの範囲にも入らない行の受け皿(任意・hash では不可)
CREATE TABLE measurement_default PARTITION OF measurement DEFAULT;

The boundary trap (stated officially): "the lower bound FROM is inclusive, the upper bound TO is exclusive." E.g., with ranges 110 and 1020, the value 10 goes into the second. Making the upper bound = the next lower bound on adjacent partitions leaves neither gap nor overlap. No bounds is MINVALUE/MAXVALUE.

LIST: explicit enumeration

CREATE TABLE customers (id bigint, region text, name text)
  PARTITION BY LIST (region);

CREATE TABLE customers_apac PARTITION OF customers FOR VALUES IN ('JP','KR','SG');
CREATE TABLE customers_emea PARTITION OF customers FOR VALUES IN ('DE','FR','UK');
CREATE TABLE customers_other PARTITION OF customers DEFAULT;

HASH: even distribution

CREATE TABLE events (id bigint, payload jsonb) PARTITION BY HASH (id);
-- 4分割(モジュラス4・剰余0〜3)
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);

3. The biggest design constraint: the primary key includes the partition key

This is what surprises people most in introducing partitioning. The official rule:

To create a unique or primary-key constraint on a partitioned table, the partition key must not include an expression or function call, and the constraint's columns must include all partition-key columns.

The reason is clear — each partition's index can only guarantee uniqueness within itself, so "no duplication across different partitions" can only be ensured by the partition structure itself.

-- ❌ これは失敗する:主キー (id) がパーティションキー logdate を含まない
-- CREATE TABLE m (id bigint PRIMARY KEY, logdate date) PARTITION BY RANGE (logdate);

-- ✅ パーティションキーを主キーに含める(複合主キー)
CREATE TABLE m (
  id      bigint GENERATED ALWAYS AS IDENTITY,
  logdate date NOT NULL,
  PRIMARY KEY (id, logdate)        -- パーティションキー logdate を含む
) PARTITION BY RANGE (logdate);

The impact on design: if you want id alone to be globally unique, it's a poor fit for partitioning (and as a foreign-key reference target it becomes the (id, logdate) composite). This is the most important trade-off in deciding "whether to partition." Reconcile it with a design like using a time-ordered UUID like uuidv7() (PG18), ensuring logical uniqueness with the UUID while including the partition column in the primary key.

Other limits (official §5.12.2.3): a BEFORE ROW trigger can't change the destination partition / you can't mix temporary and permanent tables in a partition tree / you can't convert between a regular table and a partitioned table afterward (design from the start) / updating the partition key moves the row to another partition.


4. Partition pruning: always confirm it's working

The source of partitioning's performance is pruning — the optimization that "skips partitions irrelevant to the query" (per the official definition). enable_partition_pruning is default ON.

There are two kinds of pruning.

  • Plan-time pruning: when the value is known at the planning stage, like WHERE logdate >= '2026-06-01'.
  • Run-time pruning: it narrows at run time even when the value is unknown at plan time, such as a PREPARE parameter, a subquery result, or a parameter inside a nested loop.
-- プルーニングが効いていれば、関係するパーティションしか計画に出ない
EXPLAIN (ANALYZE)
SELECT count(*) FROM measurement WHERE logdate >= DATE '2026-06-01';
-- → measurement_y2026m06 など該当パーティションだけが現れ、他は読まれない

Partitions pruned at plan time don't appear in EXPLAIN/EXPLAIN ANALYZE. Conversely, if "all partitions appear in the plan," that's a sign the WHERE doesn't touch the partition key = pruning isn't working. Make a query design that always narrows by the partition key.

(Note: the legacy inheritance-based split uses constraint_exclusion and is plan-time only. Declarative partitioning's pruning works at both plan time and run time, a more powerful mechanism.)


5. Rolling-window operation: the main use for time series

The true value of RANGE partitioning is in operation. A rolling window of "keep only the latest N months and discard old months" is done with attaching/detaching partitions, not DELETE.

-- 古い月をまるごと削除(数百万行でも一瞬。DELETE のような VACUUM 負荷なし)
DROP TABLE measurement_y2025m06;

-- 捨てずにアーカイブしたいなら、独立テーブルとして切り離す
ALTER TABLE measurement DETACH PARTITION measurement_y2025m06;
-- DETACH CONCURRENTLY は親への重いロックを避ける(トランザクション外・DEFAULTパーティションがあると不可)

Add a new month's partition safely

Each month, prepare next month's partition in advance. The basis is creating an empty table with CREATE TABLE ... PARTITION OF (no scan needed since it's empty). If you ATTACH an existing table, attaching a matching CHECK constraint in advance avoids the full-table scan (and long lock) at ingestion, as the official docs say.

-- 既存テーブルを取り込む場合:先に CHECK を付けると ATTACH 時の全行スキャンを省ける
ALTER TABLE measurement_new
  ADD CONSTRAINT ck CHECK (logdate >= '2026-07-01' AND logdate < '2026-08-01');
ALTER TABLE measurement ATTACH PARTITION measurement_new
  FOR VALUES FROM ('2026-07-01') TO ('2026-08-01');
ALTER TABLE measurement_new DROP CONSTRAINT ck;  -- 取り込み後は冗長なので外してよい

It's robust to make the operation an idempotent function and run it monthly with cron (pg_cron, etc.).

-- 指定月のパーティションを「無ければ作る」冪等関数。二重実行しても安全
CREATE OR REPLACE FUNCTION ensure_measurement_partition(target date)
RETURNS void LANGUAGE plpgsql AS $$
DECLARE
  start_d date := date_trunc('month', target)::date;
  end_d   date := (date_trunc('month', target) + interval '1 month')::date;
  part    text := format('measurement_y%sm%s', to_char(start_d,'YYYY'), to_char(start_d,'MM'));
BEGIN
  IF to_regclass(part) IS NULL THEN          -- 既に在れば何もしない=冪等
    EXECUTE format(
      'CREATE TABLE %I PARTITION OF measurement FOR VALUES FROM (%L) TO (%L)',
      part, start_d, end_d);
  END IF;
END;
$$;

-- 毎月、今月と来月分を確保(実行漏れに強い)
SELECT ensure_measurement_partition(current_date);
SELECT ensure_measurement_partition((current_date + interval '1 month')::date);

For observability: if next month's creation is missed, it falls into the DEFAULT partition (if you prepared one), or if there isn't one, the INSERT errors. Operating with monitoring of "do the next 2 months exist" and alerting if missing prevents accidents. An index automatically propagates to all partitions if created on the parent, and is inherited by partitions created later too.


6. partitionwise join / aggregate: why it's default OFF

The optimization that does joins/aggregations between partitions "per corresponding partition" is partitionwise join / aggregate. Both are default OFF.

The reason is the memory cost the official docs state plainly: enabling these means "the number of nodes limited by work_mem can increase linearly in proportion to the partition count. The total memory consumption at run time can increase greatly, and the planning itself becomes substantially more expensive in CPU and memory."

-- パーティション数が少なく、パーティション同士の結合/集約が重いワークロードでのみ検討
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;

It's not "always ON if it's faster." Since many partitions can be counterproductive, EXPLAIN ANALYZE on the target workload and, if needed, enable it per session / specific query.


7. PostgreSQL 18 improvements

PG18 practically improved partitioning's operation and planning (official release notes).

  • Made query planning of many partitions efficient — planning for designs with many partitions got faster.
  • Expanded the cases partitionwise join can apply and reduced memory use — the §6 trade-off is eased.
  • Improved cost estimation for partition queries.
  • VACUUM/ANALYZE can process only the parent partitioned table without processing children.
  • Allows NOT VALID foreign-key constraints on a partitioned table (staged constraint addition becomes possible).
  • Allows a non-btree unique index for the partition key.
  • Unlogged partitioned tables are disallowed (for consistency).

From an upgrade standpoint, the more a design takes many partitions (multi-tenant, etc.), the more it benefits from PG18's planning efficiency.


8. Introduction-decision checklist

  1. Is the scale enough? Does the table exceed physical memory / is there a clear discard pattern (time series)? If not, indexes and VACUUM come first.
  2. Is the partition key decided? Can most queries' WHERE narrow by that key (the premise of pruning)?
  3. Can you tolerate the primary-key constraint? Is a design that includes the partition-key column in the primary key/unique acceptable?
  4. Is the method appropriate? Time series = RANGE, explicit division = LIST, even distribution = HASH.
  5. Did you automate operation? Pre-create next month's, DETACH/DROP old months, with an idempotent function + scheduler.
  6. Did you confirm pruning? EXPLAIN a representative query and check that only the necessary partitions are read.

9. Conclusion

  • Partitioning is worthwhile only with "a scale exceeding physical memory" + "a clear discard pattern." For small tables, indexes and VACUUM first.
  • RANGE/LIST/HASH. RANGE's boundary is lower inclusive, upper exclusive. Old data is instant with DROP/DETACH PARTITION (no VACUUM load either).
  • The primary key includes the partition key — this is the biggest design constraint. Always confirm before introducing.
  • Pruning (default ON) is the source of performance. Narrow WHERE by the partition key and confirm the effect with EXPLAIN.
  • The rolling window is automated with an idempotent function + scheduler. ATTACH avoids a full-table scan with a prior CHECK.
  • partitionwise join/agg is default OFF (memory cost). PG18 improves planning of many partitions, partitionwise join, and cost estimation.

This completes one round of the PostgreSQL core performance-design series. Starting from the performance overview, combine indexes, EXPLAIN, MVCC/VACUUM, and JSONB to design a PostgreSQL that meets production demands without adding dedicated DBs.


References (PostgreSQL 18 official documentation)

友田

友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.

Got a challenge?

From design to implementation and operations — solo × generative AI

Implementation like this article's, end to end from requirements to production. Start with a free 30-minute technical consult and tell me about your situation.

Available for both project-based (contract) and advisory engagements. Start with a free 30-minute consult.

Also worth reading