"The table has hundreds of millions of rows and is slow," "deleting old logs takes hours" — that may be partitioning's turn. Physically split one huge table into multiple child tables by rule, so the planner reads only the necessary splits and you can discard an old split whole in an instant.
But partitioning isn't a "just add it" optimization. It increases design constraints (especially the primary key) and operational effort. This article explains "when and how to partition, and when not to," faithful to the PostgreSQL official documentation, with production-operation procedures. It's a deep dive on the "physical design" layer raised in §9 of the performance overview.
Rules for this article: the partitioning spec, syntax, default settings, limits, and PostgreSQL 18 changes are all based on the PostgreSQL 18 official documentation (as of June 2026). In managed environments, parallel settings, etc., can differ, so confirm the actual pruning with
EXPLAIN.
1. When to partition (and when not to)
The advantages of partitioning the official docs raise:
- Dramatic query-performance improvement — when frequently used rows concentrate in a few partitions. It substitutes for the upper tree of an index, and the frequently used part is more likely to be in memory.
- Faster bulk updates/aggregation — if you read most of one partition, a sequential scan is more effective than random access over the whole table.
- Bulk insert/delete becomes a meta operation —
DROP TABLEorDETACH PARTITIONis orders of magnitude faster than a bulkDELETE, and completely avoids the VACUUM loadDELETEproduces. - You can move cold data to cheaper storage.
And the rule of thumb for when it's effective (official):
These advantages are typically worthwhile only when a table is very large. The exact point at which a table benefits from partitioning depends on the app, but the rule of thumb is that the table size exceeds the database server's physical memory.
An important judgment: for around a few million rows, first exhaust index design and VACUUM. Since partitioning increases complexity (especially the primary-key constraint) and operational cost, introduce it narrowly for "a scale exceeding physical memory" or "a clear time-series discard pattern." YAGNI — once you need it.
2. Three methods: RANGE / LIST / HASH
The official definitions and syntax.
| Method | How it splits | Typical use |
|---|---|---|
| RANGE | Split by value ranges of the key (non-overlapping) | Time series (date, sequence) |
| LIST | Split by explicitly enumerated key values | Region, tenant, category |
| HASH | Split by modulus and remainder | No natural division, but want even distribution |
RANGE: the standard for time series
-- 親テーブル(データは入らない。ルーティングするだけ)
CREATE TABLE measurement (
city_id int NOT NULL,
logdate date NOT NULL,
peaktemp int
) PARTITION BY RANGE (logdate);
-- 月次パーティション。下限は含む・上限は含まない('2026-07-01' は7月側)
CREATE TABLE measurement_y2026m06 PARTITION OF measurement
FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');
-- どの範囲にも入らない行の受け皿(任意・hash では不可)
CREATE TABLE measurement_default PARTITION OF measurement DEFAULT;
The boundary trap (stated officially): "the lower bound FROM is inclusive, the upper bound TO is exclusive." E.g., with ranges 1–10 and 10–20, the value 10 goes into the second. Making the upper bound = the next lower bound on adjacent partitions leaves neither gap nor overlap. No bounds is MINVALUE/MAXVALUE.
LIST: explicit enumeration
CREATE TABLE customers (id bigint, region text, name text)
PARTITION BY LIST (region);
CREATE TABLE customers_apac PARTITION OF customers FOR VALUES IN ('JP','KR','SG');
CREATE TABLE customers_emea PARTITION OF customers FOR VALUES IN ('DE','FR','UK');
CREATE TABLE customers_other PARTITION OF customers DEFAULT;
HASH: even distribution
CREATE TABLE events (id bigint, payload jsonb) PARTITION BY HASH (id);
-- 4分割(モジュラス4・剰余0〜3)
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
3. The biggest design constraint: the primary key includes the partition key
This is what surprises people most in introducing partitioning. The official rule:
To create a unique or primary-key constraint on a partitioned table, the partition key must not include an expression or function call, and the constraint's columns must include all partition-key columns.
The reason is clear — each partition's index can only guarantee uniqueness within itself, so "no duplication across different partitions" can only be ensured by the partition structure itself.
-- ❌ これは失敗する:主キー (id) がパーティションキー logdate を含まない
-- CREATE TABLE m (id bigint PRIMARY KEY, logdate date) PARTITION BY RANGE (logdate);
-- ✅ パーティションキーを主キーに含める(複合主キー)
CREATE TABLE m (
id bigint GENERATED ALWAYS AS IDENTITY,
logdate date NOT NULL,
PRIMARY KEY (id, logdate) -- パーティションキー logdate を含む
) PARTITION BY RANGE (logdate);
The impact on design: if you want
idalone to be globally unique, it's a poor fit for partitioning (and as a foreign-key reference target it becomes the(id, logdate)composite). This is the most important trade-off in deciding "whether to partition." Reconcile it with a design like using a time-ordered UUID likeuuidv7()(PG18), ensuring logical uniqueness with the UUID while including the partition column in the primary key.
Other limits (official §5.12.2.3): a BEFORE ROW trigger can't change the destination partition / you can't mix temporary and permanent tables in a partition tree / you can't convert between a regular table and a partitioned table afterward (design from the start) / updating the partition key moves the row to another partition.
4. Partition pruning: always confirm it's working
The source of partitioning's performance is pruning — the optimization that "skips partitions irrelevant to the query" (per the official definition). enable_partition_pruning is default ON.
There are two kinds of pruning.
- Plan-time pruning: when the value is known at the planning stage, like
WHERE logdate >= '2026-06-01'. - Run-time pruning: it narrows at run time even when the value is unknown at plan time, such as a
PREPAREparameter, a subquery result, or a parameter inside a nested loop.
-- プルーニングが効いていれば、関係するパーティションしか計画に出ない
EXPLAIN (ANALYZE)
SELECT count(*) FROM measurement WHERE logdate >= DATE '2026-06-01';
-- → measurement_y2026m06 など該当パーティションだけが現れ、他は読まれない
Partitions pruned at plan time don't appear in
EXPLAIN/EXPLAIN ANALYZE. Conversely, if "all partitions appear in the plan," that's a sign theWHEREdoesn't touch the partition key = pruning isn't working. Make a query design that always narrows by the partition key.
(Note: the legacy inheritance-based split uses constraint_exclusion and is plan-time only. Declarative partitioning's pruning works at both plan time and run time, a more powerful mechanism.)
5. Rolling-window operation: the main use for time series
The true value of RANGE partitioning is in operation. A rolling window of "keep only the latest N months and discard old months" is done with attaching/detaching partitions, not DELETE.
-- 古い月をまるごと削除(数百万行でも一瞬。DELETE のような VACUUM 負荷なし)
DROP TABLE measurement_y2025m06;
-- 捨てずにアーカイブしたいなら、独立テーブルとして切り離す
ALTER TABLE measurement DETACH PARTITION measurement_y2025m06;
-- DETACH CONCURRENTLY は親への重いロックを避ける(トランザクション外・DEFAULTパーティションがあると不可)
Add a new month's partition safely
Each month, prepare next month's partition in advance. The basis is creating an empty table with CREATE TABLE ... PARTITION OF (no scan needed since it's empty). If you ATTACH an existing table, attaching a matching CHECK constraint in advance avoids the full-table scan (and long lock) at ingestion, as the official docs say.
-- 既存テーブルを取り込む場合:先に CHECK を付けると ATTACH 時の全行スキャンを省ける
ALTER TABLE measurement_new
ADD CONSTRAINT ck CHECK (logdate >= '2026-07-01' AND logdate < '2026-08-01');
ALTER TABLE measurement ATTACH PARTITION measurement_new
FOR VALUES FROM ('2026-07-01') TO ('2026-08-01');
ALTER TABLE measurement_new DROP CONSTRAINT ck; -- 取り込み後は冗長なので外してよい
It's robust to make the operation an idempotent function and run it monthly with cron (pg_cron, etc.).
-- 指定月のパーティションを「無ければ作る」冪等関数。二重実行しても安全
CREATE OR REPLACE FUNCTION ensure_measurement_partition(target date)
RETURNS void LANGUAGE plpgsql AS $$
DECLARE
start_d date := date_trunc('month', target)::date;
end_d date := (date_trunc('month', target) + interval '1 month')::date;
part text := format('measurement_y%sm%s', to_char(start_d,'YYYY'), to_char(start_d,'MM'));
BEGIN
IF to_regclass(part) IS NULL THEN -- 既に在れば何もしない=冪等
EXECUTE format(
'CREATE TABLE %I PARTITION OF measurement FOR VALUES FROM (%L) TO (%L)',
part, start_d, end_d);
END IF;
END;
$$;
-- 毎月、今月と来月分を確保(実行漏れに強い)
SELECT ensure_measurement_partition(current_date);
SELECT ensure_measurement_partition((current_date + interval '1 month')::date);
For observability: if next month's creation is missed, it falls into the
DEFAULTpartition (if you prepared one), or if there isn't one, the INSERT errors. Operating with monitoring of "do the next 2 months exist" and alerting if missing prevents accidents. An index automatically propagates to all partitions if created on the parent, and is inherited by partitions created later too.
6. partitionwise join / aggregate: why it's default OFF
The optimization that does joins/aggregations between partitions "per corresponding partition" is partitionwise join / aggregate. Both are default OFF.
The reason is the memory cost the official docs state plainly: enabling these means "the number of nodes limited by work_mem can increase linearly in proportion to the partition count. The total memory consumption at run time can increase greatly, and the planning itself becomes substantially more expensive in CPU and memory."
-- パーティション数が少なく、パーティション同士の結合/集約が重いワークロードでのみ検討
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;
It's not "always ON if it's faster." Since many partitions can be counterproductive,
EXPLAIN ANALYZEon the target workload and, if needed, enable it per session / specific query.
7. PostgreSQL 18 improvements
PG18 practically improved partitioning's operation and planning (official release notes).
- Made query planning of many partitions efficient — planning for designs with many partitions got faster.
- Expanded the cases partitionwise join can apply and reduced memory use — the §6 trade-off is eased.
- Improved cost estimation for partition queries.
- VACUUM/ANALYZE can process only the parent partitioned table without processing children.
- Allows
NOT VALIDforeign-key constraints on a partitioned table (staged constraint addition becomes possible). - Allows a non-btree unique index for the partition key.
- Unlogged partitioned tables are disallowed (for consistency).
From an upgrade standpoint, the more a design takes many partitions (multi-tenant, etc.), the more it benefits from PG18's planning efficiency.
8. Introduction-decision checklist
- Is the scale enough? Does the table exceed physical memory / is there a clear discard pattern (time series)? If not, indexes and VACUUM come first.
- Is the partition key decided? Can most queries'
WHEREnarrow by that key (the premise of pruning)? - Can you tolerate the primary-key constraint? Is a design that includes the partition-key column in the primary key/unique acceptable?
- Is the method appropriate? Time series = RANGE, explicit division = LIST, even distribution = HASH.
- Did you automate operation? Pre-create next month's, DETACH/DROP old months, with an idempotent function + scheduler.
- Did you confirm pruning?
EXPLAINa representative query and check that only the necessary partitions are read.
9. Conclusion
- Partitioning is worthwhile only with "a scale exceeding physical memory" + "a clear discard pattern." For small tables, indexes and VACUUM first.
- RANGE/LIST/HASH. RANGE's boundary is lower inclusive, upper exclusive. Old data is instant with
DROP/DETACH PARTITION(no VACUUM load either). - The primary key includes the partition key — this is the biggest design constraint. Always confirm before introducing.
- Pruning (default ON) is the source of performance. Narrow
WHEREby the partition key and confirm the effect withEXPLAIN. - The rolling window is automated with an idempotent function + scheduler. ATTACH avoids a full-table scan with a prior CHECK.
- partitionwise join/agg is default OFF (memory cost). PG18 improves planning of many partitions, partitionwise join, and cost estimation.
This completes one round of the PostgreSQL core performance-design series. Starting from the performance overview, combine indexes, EXPLAIN, MVCC/VACUUM, and JSONB to design a PostgreSQL that meets production demands without adding dedicated DBs.