# PostgreSQL zero-downtime schema change (lock-safe DDL, lock_timeout, NOT VALID→VALIDATE, CONCURRENTLY, v18-ready)

> A practical guide to changing the schema of production PostgreSQL with no downtime. It explains, in real code faithful to the official documentation: the mechanism by which an ACCESS EXCLUSIVE lock stops every SELECT, how not to stop the world with lock_timeout, adding columns (PG11+'s fast path), the two-stage NOT VALID→VALIDATE for NOT NULL/constraints, CONCURRENTLY indexes, the shadow-column strategy for type changes, a lock-level cheat sheet, and PostgreSQL 18's NOT NULL NOT VALID.

- Published: 2026-06-21
- Author: 友田 陽大
- Tags: PostgreSQL, アーキテクチャ設計
- URL: https://tomodahinata.com/en/blog/postgresql-zero-downtime-schema-migration-lock-safe-ddl-guide
- Category: PostgreSQL operations & reliability
- Pillar guide: https://tomodahinata.com/en/blog/postgresql-production-operations-guide

## Key points

- Many ALTER TABLEs take an ACCESS EXCLUSIVE lock and block even SELECT. If they enter a lock wait behind a long-running query, all new queries pile up behind them and it effectively becomes a total outage.
- The starting point of the countermeasure is lock_timeout: fail the migration immediately with a short timeout, and retry. Make it give up before stopping the world.
- Adding a column with a constant default needs no rewrite on PG11+ = fast. Avoid a volatile default, a stored generated column, or a type change, which rewrite the whole table (a virtual generated column needs no rewrite).
- Constraints are two-stage: add immediately with NOT VALID → verify with VALIDATE CONSTRAINT under a weak lock (SHARE UPDATE EXCLUSIVE). ADD FOREIGN KEY is SHARE ROW EXCLUSIVE, not ACCESS EXCLUSIVE. Indexes are CONCURRENTLY.
- PostgreSQL 18 lets you attach NOT VALID to a NOT NULL constraint, making a no-downtime NOT NULL addition straightforward. It also adds NOT ENFORCED constraints and NOT VALID foreign keys on partitioned tables.

---

A migration of "just adding one column" took production fully down — if you operate PostgreSQL, it's a story you'll surely hear someday. The cause is **locks.** Many `ALTER TABLE`s take a powerful lock, and moreover, **when made to wait behind a long-running query, they drag all queries behind them to a halt.**

This article is a practical playbook for **changing the schema with no downtime** in production. It explains, faithful to the official documentation, from the lock mechanism through safe procedures per operation to the `lock_timeout` safety device. If a client is judging "is it safe to entrust migrations to this person," this is the watershed. It's a deep dive on [§4 of the production-operations guide](/blog/postgresql-production-operations-guide).

> **Rules for this article**: lock levels, the behavior of each DDL, and PostgreSQL 18's new features are based on the **PostgreSQL 18 official documentation (as of June 2026).** The failure mechanism of "a lock wait stops all subsequent queries" is a **derivation** from the lock-conflict table and the 'wait indefinitely' spec (I note that there isn't a single sentence about it in the official docs).

---

## 1. The true identity of the danger: ACCESS EXCLUSIVE stops even SELECT

PostgreSQL's table locks have 8 levels, and the strongest is **`ACCESS EXCLUSIVE`.** Official:

> ACCESS EXCLUSIVE **conflicts with locks of all modes** (ACCESS SHARE … ACCESS EXCLUSIVE all). This mode guarantees that the holder is **the only transaction accessing the table in any way.**

And the decisive sentence:

> **The only lock that blocks a SELECT (without FOR UPDATE/SHARE) is the ACCESS EXCLUSIVE lock.**

`ALTER TABLE` officially "**acquires ACCESS EXCLUSIVE unless stated otherwise.**" In other words, many schema changes **can stop even reads** while running.

### Why it develops into a "total outage"

This is the essence. When `ALTER TABLE` requests a lock, if there's a transaction `SELECT`ing the table for a long time, the `ALTER` **can't take the lock and waits** (official: "**waits indefinitely** until the conflicting lock is released").

The problem is **behind that waiting `ALTER`.** A newly arriving `SELECT` conflicts with the `ACCESS EXCLUSIVE` the `ALTER` is requesting, so it's **made to wait behind the `ALTER`.** As a result, just one long-running query + one `ALTER` **dams up all new queries** — this is the true identity of "total outage from adding a column."

> * The behavior of "subsequent ones pile up in a wait queue" is a **derivation** from the lock-conflict table and the 'wait indefinitely' spec (there's no direct sentence in the official docs). But it's a reality repeatedly observed in real operations and serves as the basis for the countermeasure.

---

## 2. The starting safety device: lock_timeout

The first step of the countermeasure is to **make the migration "give up before stopping the world."** Use `lock_timeout`. Official:

> **Aborts a statement** that exceeds the specified time while waiting to acquire a lock. … Zero (default) disables the timeout. … Setting it in `postgresql.conf` is **not recommended** (because it affects all sessions).

→ Set a short `lock_timeout` **only when running the migration**, fail immediately if it can't take the lock, and **retry.** This prevents "waiting behind a long-running query and going to total outage."

```sql
-- マイグレーションの定石：短いロックタイムアウトで即失敗 → 後でリトライ
SET lock_timeout = '3s';
SET statement_timeout = '0';      -- 文自体の実行は別管理（lock_timeout は「待ち」だけ）
ALTER TABLE orders ADD COLUMN memo text;
```

```ts
// マイグレーションランナー側：lock_timeout で失敗したら指数バックオフでリトライ
// （混雑の谷を狙って、短いロック窓を捕まえる）
const LOCK_NOT_AVAILABLE = "55P03"; // lock_not_available

export async function runWithLockRetry(sql: string, maxAttempts = 10) {
  for (let attempt = 1; ; attempt++) {
    const tx = await pool.connect();
    try {
      await tx.query("BEGIN");
      await tx.query("SET LOCAL lock_timeout = '3s'");
      await tx.query(sql);                 // 強いロックを要するDDL
      await tx.query("COMMIT");
      return;
    } catch (e: unknown) {
      await tx.query("ROLLBACK").catch(() => {});
      if ((e as { code?: string }).code === LOCK_NOT_AVAILABLE && attempt < maxAttempts) {
        await new Promise((r) => setTimeout(r, Math.min(2 ** attempt * 100, 30_000)));
        continue;                          // ロックが空くタイミングを待って再挑戦
      }
      throw e;
    } finally {
      tx.release();
    }
  }
}
```

> Another preventive measure: **don't leave long-running transactions hanging.** If an `idle in transaction` session keeps holding a lock, the DDL never succeeds even with `lock_timeout` ([MVCC/VACUUM article §8](/blog/postgresql-mvcc-transaction-isolation-vacuum-autovacuum-guide)).

---

## 3. Safe recipes per operation

### 3.1 Adding a column: a constant default is fast, volatile is dangerous

Official: "With a **non-volatile DEFAULT**, `ADD COLUMN` evaluates the default value at statement-execution time and **stores it in the table's metadata.** … It's returned when an existing row is accessed. **No table rewrite is needed**, and the ALTER is very fast even on a huge table."

Conversely, cases where **a rewrite occurs** (official): "Adding a **volatile DEFAULT** (e.g., `clock_timestamp()`), a **stored generated column**, an **identity column**, or a constrained domain type **fully rewrites the table and indexes.** **Adding a virtual generated column needs no rewrite.**"

```sql
-- ✅ 高速：定数デフォルト（書き換えなし・PG11以降）
ALTER TABLE users ADD COLUMN plan text NOT NULL DEFAULT 'free';

-- ❌ 危険：volatile デフォルトは全行書き換え（巨大表で長時間ロック）
-- ALTER TABLE users ADD COLUMN created_at timestamptz DEFAULT clock_timestamp();
-- → 代わりに「null許容で追加 → バッチで埋める → 後でNOT NULL」（§3.2）
```

### 3.2 Adding NOT NULL: avoid the full-table scan

`SET NOT NULL` **scans the whole table** to confirm there are no NULLs (locking during that time). The official docs state the workaround clearly: "If a **valid CHECK constraint** exists that proves 'no NULLs exist,' **the table scan is skipped.**"

PostgreSQL **18** adds an even more straightforward path — **`NOT VALID` can be attached to a `NOT NULL` constraint.**

```sql
-- 旧来の道：CHECK を NOT VALID で足し、VALIDATE してから SET NOT NULL（スキャンを分割）
ALTER TABLE orders ADD CONSTRAINT orders_amount_nn CHECK (amount IS NOT NULL) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT orders_amount_nn;   -- 弱いロックで検証
ALTER TABLE orders ALTER COLUMN amount SET NOT NULL;       -- 検証済みCHECKによりスキャン省略

-- PG18の道：NOT NULL を NOT VALID で追加 → VALIDATE（既存行の検証を切り離せる）
-- ALTER TABLE orders ALTER COLUMN amount SET NOT NULL NOT VALID;
-- ALTER TABLE orders VALIDATE CONSTRAINT ...;
```

### 3.3 Constraints and foreign keys safely: NOT VALID → VALIDATE

When adding a constraint to a large table, locking throughout the verification scan is dangerous. The two stages the official docs provide:

> Attach `NOT VALID` and `ADD CONSTRAINT` can **commit immediately without scanning the table.** Then verify existing rows with `VALIDATE CONSTRAINT`. **The verification doesn't lock out concurrent updates** — `VALIDATE` takes **only a SHARE UPDATE EXCLUSIVE lock.**

```sql
-- 1) 即時：新規行にだけ制約を効かせる（既存行は未検証＝スキャンなし）
ALTER TABLE orders
  ADD CONSTRAINT orders_amount_positive CHECK (amount >= 0) NOT VALID;

-- 2) 後で：既存行を検証（書き込みを止めずに）
ALTER TABLE orders VALIDATE CONSTRAINT orders_amount_positive;
```

Foreign keys too. And, official: "Most `ADD table_constraint`s require ACCESS EXCLUSIVE, but **`ADD FOREIGN KEY` gets by with SHARE ROW EXCLUSIVE.**" Even so, the two stages `NOT VALID` → `VALIDATE` are safe.

```sql
ALTER TABLE orders
  ADD CONSTRAINT orders_user_fk FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT orders_user_fk;
```

### 3.4 Indexes: don't stop writes with CONCURRENTLY

```sql
-- 書き込みを止めずに索引を作る（[索引設計の記事]も参照）
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
```

Cautions (official, see also [index-design article §6](/blog/postgresql-index-design-btree-gin-gist-brin-covering-guide)): slow due to two scans, **cannot run inside a transaction block**, and on failure **an INVALID index remains** (`DROP INDEX CONCURRENTLY` and re-run).

### 3.5 Type change: the shadow-column strategy

`ALTER COLUMN ... TYPE` officially "usually **fully rewrites the table and indexes**" (the exception is text↔varchar). It's dangerous on a huge table, so **add a new column and migrate in stages.**

```sql
-- 1) 新型の列を追加（高速・nullで）
ALTER TABLE events ADD COLUMN payload_jsonb jsonb;
-- 2) アプリを「両方書く（dual-write）」に。既存はバッチで少しずつ変換（lock_timeout＋小バッチ）
--    UPDATE events SET payload_jsonb = payload_text::jsonb WHERE id BETWEEN $1 AND $2;
-- 3) 切替：読みを新列へ。検証後、旧列を DROP
ALTER TABLE events DROP COLUMN payload_text;
```

> This "**add a new column → dual-write → batch conversion → switch → drop the old column**" is **a general-purpose pattern effective for all heavy changes** like type changes, column renames, and normalization. From the MVCC standpoint too, rewriting DDL (`TRUNCATE` or `ALTER` that involves a rewrite) is **not MVCC-safe** (official), so it's worth avoiding it and migrating in stages.

---

## 4. Lock-level cheat sheet

Knowing "which lock this operation takes" lets you avoid danger (official, excerpt).

| Lock | Main operations | Concurrent read | Concurrent write |
| --- | --- | --- | --- |
| **ACCESS EXCLUSIVE** | `DROP TABLE`, `TRUNCATE`, `ALTER COLUMN TYPE`, many `ALTER TABLE`s | ❌ | ❌ |
| **SHARE** (standard index creation) | `CREATE INDEX` (without CONCURRENTLY) | ✅ | ❌ |
| **SHARE ROW EXCLUSIVE** | `ADD FOREIGN KEY`, `CREATE TRIGGER` | ✅ | ❌ |
| **SHARE UPDATE EXCLUSIVE** | `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, `VALIDATE CONSTRAINT`, `SET STATISTICS`, `DETACH PARTITION CONCURRENTLY` | ✅ | ✅ |

**The aim is the bottom-right** — leaning toward `SHARE UPDATE EXCLUSIVE` (stops neither reads nor writes). `CONCURRENTLY` and `NOT VALID`→`VALIDATE` are the tools to **replace** a strong lock with a weak lock.

---

## 5. PostgreSQL 18's no-downtime DDL enhancements

PG18 practically improved online DDL (official release notes).

- **`NOT VALID` can be attached to a `NOT NULL` constraint** (§3.2) — a no-downtime NOT NULL addition becomes straightforward.
- **`NOT NULL` is stored in `pg_constraint`** — NOT NULL constraints can be named, and can also be attached to foreign tables.
- **`NOT VALID` foreign keys are allowed on partitioned tables** — staged constraint addition becomes possible.
- **`NOT ENFORCED` constraints** (a CHECK/foreign key that's "defined but not enforced") — flexibility during a transition period.
- **Virtual generated columns by default** — computed at read time, so **adding one needs no rewrite** (stored needs a rewrite).

---

## 6. Migration checklist

When reviewing a production migration, answer these questions.

1. **Did you set `lock_timeout`?** (fail immediately + retry if it can't take the lock)
2. **Does a table rewrite occur?** (avoid volatile defaults, type changes, stored generated columns)
3. **Did you split the full-table scan?** (`NOT VALID`→`VALIDATE`, NOT NULL via CHECK)
4. **Is the index `CONCURRENTLY`?** (outside a transaction, and cleaning up the INVALID on failure)
5. **Did you stage heavy changes with shadow columns?**
6. **Is a long-running transaction (`idle in transaction`) running?**
7. **Is it one logical change per migration?** (a huge bundle of DDLs has high risk on failure)

---

## 7. Conclusion

- **Many `ALTER TABLE`s are ACCESS EXCLUSIVE** = stop even SELECT. Waiting behind a long-running query develops into a **total outage.**
- **`lock_timeout` + retry** to "make it give up before stopping the world."
- **Add columns with a constant default** (fast), and avoid volatile/type changes (rewrites).
- **Constraints are `NOT VALID`→`VALIDATE`, indexes are `CONCURRENTLY`** — replace a strong lock with a weak lock.
- **Stage heavy changes with shadow columns** (add → dual-write → batch conversion → switch → drop).
- **PG18 makes no-downtime DDL even more straightforward** with `NOT NULL ... NOT VALID`, etc.

Once "don't stop on changes" is solid, the remaining corner of operations is "**protect**" — go to [security hardening (roles, privileges, TLS, SCRAM)](/blog/postgresql-security-hardening-roles-privileges-ssl-scram-guide).

---

### References (PostgreSQL 18 official documentation)

- [ALTER TABLE (lock levels, NOT VALID, ADD COLUMN behavior)](https://www.postgresql.org/docs/18/sql-altertable.html)
- [13.3. Explicit Locking (table lock modes)](https://www.postgresql.org/docs/18/explicit-locking.html)
- [19.11. Client Connection Defaults (lock_timeout)](https://www.postgresql.org/docs/18/runtime-config-client.html)
- [CREATE INDEX (CONCURRENTLY)](https://www.postgresql.org/docs/18/sql-createindex.html)
- [E.4. Release 18 (NOT NULL NOT VALID, etc.)](https://www.postgresql.org/docs/18/release-18.html)
