# DynamoDB Capacity, Cost, and Performance Design Complete Guide (2026 Edition): On-Demand vs. Provisioned, Auto Scaling, Avoiding Hot Partitions, Cost Optimization

> An explanation of the capacity design that decides DynamoDB's pricing and performance, faithful to the AWS official spec. From the break-even of on-demand vs. provisioned, the correct counting of RCU/WCU, the 3000/1000 partition limit and hot-key avoidance, warm throughput, Auto Scaling, to cost optimization via TTL and table classes — summarized from a production viewpoint with real Terraform / AWS SDK v3 code.

- Published: 2026-06-25
- Author: 友田 陽大
- Tags: AWS, DynamoDB, Architecture Design, Serverless, Terraform, TypeScript, Cost Optimization
- URL: https://tomodahinata.com/en/blog/dynamodb-capacity-cost-performance-on-demand-vs-provisioned-guide
- Category: DynamoDB
- Pillar guide: https://tomodahinata.com/en/blog/dynamodb-single-table-design-reliability-idempotency-patterns

## Key points

- There are only 2 billing modes. Spiky/unpredictable → on-demand, steady/predictable → provisioned + reserved. Mutually changeable later (provisioned→on-demand up to 4 times in 24 hours, the reverse anytime)
- On-demand's list price is about 3.46× provisioned (at 100%-utilization equivalent). The break-even is roughly 30% sustained utilization, and reserved capacity (up to 54% off for 1 year / 77% for 3 years) lowers the break-even further
- One partition's limit is 3,000 read / 1,000 write units per second. A skew exceeding this is a hot partition. Distribute with a high-cardinality key design and write sharding. Adaptive Capacity is automatic and free on all tables, but can't exceed the limit
- Cost is decided by 'how you count consumption.' UpdateItem charges by the larger of before/after, a conditional failure still charges the write, and FilterExpression/Scan charge by what you read. Measure with ReturnConsumedCapacity, and trim by keeping items small and using Query instead of Scan (ProjectionExpression cuts transfer size, not read units)
- TTL deletion consumes no WCU and is free. Pre-warm with warm throughput before a launch. Auto Scaling's standard is a 70% target utilization but lags spikes by minutes — absorb them with burst and on-demand

---

What trips you up first with DynamoDB isn't "correctness." It's **"pricing" and "throttling."**

The data model is clean. Idempotency and conditional writes are in place. Yet in production the bill swells to 3× the estimate, or `ProvisionedThroughputExceededException` erupts the moment a sale starts. This is not a code bug but a **capacity-design problem.** DynamoDB's pricing and performance are mostly decided by 3 choices per table: billing mode, capacity, and key design.

This article systematizes only DynamoDB's **capacity, cost, and performance design**, based on my experience running an **AWS-serverless (Lambda + DynamoDB) multi-tenant payment platform** in production. The design of "correctness" like data modeling, idempotency, and transactions I leave to the sister article [DynamoDB Single-Table Design & Production Reliability Patterns Complete Guide](/blog/dynamodb-single-table-design-reliability-idempotency-patterns). Complementary to it, this article narrows to **"how much it costs," "where it clogs," and "how to make it fast and cheap."**

All numbers and limits are checked against the **AWS official documentation (as of June 2026).** Pricing varies by region and time, so always confirm the final amount on the [official pricing page](https://aws.amazon.com/dynamodb/pricing/). All prices in the body are **US East (N. Virginia) / standard table class / as of June 2026.**

---

## 1. There Are Only 2 Billing Modes: On-Demand vs. Provisioned

The starting point of DynamoDB's pricing system is the **throughput mode** you choose per table. This decides "how you're charged" and "how it auto-scales" at once.

| Viewpoint | On-Demand | Provisioned |
| --- | --- | --- |
| Billing unit | Actual requests (RRU / WRU) | Reserved capacity (RCU / WCU, hourly billing) |
| Billing concept | Pay for what you used ($0 at zero traffic) | Charged for the reserved amount even unused |
| Scaling | Fully automatic. Up to 2× the prior peak instantly | Manual or Auto Scaling |
| Capacity planning | Unneeded | Needed (prediction is the premise) |
| Suited load | Spike, unpredictable, new, dev environment | Steady, predictable, high utilization |
| Unit price (per same consumption) | High (about 3.46× provisioned at 100% equivalent) | Cheap (if you can maintain high utilization) |

The official docs state it plainly: **on-demand is "the default and recommended" mode.**

> On-demand mode is the default and recommended throughput option for most DynamoDB workloads.

At the "just get it running" stage, or when traffic is unpredictable, go on-demand without hesitation. Optimize after real measurements accumulate; that is the correct order. Premature optimization (a guesswork capacity setting for provisioned) invites either higher cost or throttling.

### The Essence of On-Demand: "Up to 2× the Prior Peak, Instantly"

On-demand isn't magic. Scaling has clear rules.

- **A new table's initial throughput**: right after creation, it can immediately handle **4,000 writes / 12,000 reads per second.**
- **Up to 2× the prior peak, instantly**: at any time, the table can immediately serve up to 2× the highest traffic it has previously reached. For example, if the prior peak is 50k reads/sec, it can serve up to 100k instantly. Once it has served 100k, that becomes the new peak, and it can next grow to 200k.
- **The 30-minute rule for a surge beyond 2×**: if traffic exceeds 2× the prior peak **within 30 minutes**, requests can be throttled. The official guidance is to spread traffic increases over 30 minutes, or to pre-warm.

So even with on-demand, **events that surge 10× or 100× at once**, like a sale or launch, need care. The countermeasure is the [warm throughput](#5-surviving-a-launch-with-warm-throughput) described later.

Note that on-demand also has a **default per-table quota (40,000 RRU / 40,000 WRU per second).** This is a runaway-prevention guardrail and can be raised by request (on-demand has no per-account throughput quota).
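The scaling rules above can be sketched as a small planning helper. This is a rough model under stated assumptions, not AWS's actual algorithm: it assumes the reachable peak doubles once per 30-minute ramp window, and the function names are hypothetical.

```ts
/** Throughput an on-demand table can absorb instantly: 2x the previous peak, or the new-table baseline. */
function instantOnDemandHeadroom(previousPeak: number, newTableBaseline: number): number {
  return Math.max(newTableBaseline, previousPeak * 2);
}

/**
 * Rough number of 30-minute ramp windows needed to reach targetPerSec without
 * pre-warming, assuming the peak can double once per window (an assumption).
 */
function rampWindowsNeeded(previousPeak: number, targetPerSec: number): number {
  if (previousPeak <= 0 || targetPerSec <= 0) {
    throw new RangeError("rates must be positive");
  }
  let reachable = previousPeak * 2; // available immediately
  let windows = 0;
  while (reachable < targetPerSec) {
    reachable *= 2; // a new peak is established, so the headroom doubles again
    windows++;
  }
  return windows;
}

rampWindowsNeeded(50_000, 100_000); // 0: within 2x of the prior peak
rampWindowsNeeded(50_000, 500_000); // 3: 100k, then 200k, 400k, 800k
```

If the result is greater than zero and the surge will arrive all at once, that is the case where pre-warming is worth considering.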

### The Essence of Provisioned: "Hourly Billing on Reserved Capacity"

With provisioned, **you reserve the per-second read (RCU) and write (WCU) capacity yourself** and are billed hourly on that reserved amount. **You're charged even if you don't use it all**, which is the decisive difference from on-demand. In exchange, the unit price is low, and because the request rate is capped at the reserved ceiling, cost predictability is high.

- Default quotas: per-table 40,000 RCU / 40,000 WCU, per-account 80,000 RCU / 80,000 WCU (all increasable by request). The minimum is 1 RCU / 1 WCU.
- **Capacity decreases have a count limit**: a day starts with 4 "decrease slots," recovering 1 slot per hour (up to 4 slots). You can decrease up to 27 times in 24 hours. **Increases are unlimited.**

This asymmetry of "decreases up to 27 times a day" is also the reason the Auto Scaling described later is cautious about scaling down.

### The Mode Is Mutually Changeable Later (with a Count Limit)

Even if you guess wrong, you can redo it.

- **Provisioned → on-demand**: **up to 4 times** in a 24-hour rolling window.
- **On-demand → provisioned**: **anytime.**

It's also worth understanding how the table behaves at the switch. The first time you switch provisioned→on-demand, the table is scaled so it can immediately serve **at least 4,000 writes / 12,000 reads per second** (or your past provisioned value, if that was higher). In the reverse direction, it's served at a throughput **matched to the prior peak under on-demand**, so when reverting, set the initial provisioned value high enough to absorb the migration.

> **A practical guideline**: start new / PoC / dev environments on on-demand. Observe `ConsumedReadCapacityUnits` / `ConsumedWriteCapacityUnits` in CloudWatch for several weeks to a month in production, and lean only the tables found to be **steadily high-utilization** toward provisioned + reserved. The criterion is the next chapter's break-even.

---

## 2. Master the "Counting" of Capacity and You Master Cost

For both on-demand and provisioned, billing is based on the same **capacity-unit consumption.** If you can't count it accurately, you can't predict either pricing or throttling. The official definition is simple.

**Reads (1 unit = for an item up to 4KB)**

| Read type | Units consumed (up to 4KB) |
| --- | --- |
| Eventually consistent (default) | 0.5 |
| Strongly consistent | 1 |
| Transactional (TransactGetItems) | 2 |

**Writes (1 unit = for an item up to 1KB)**

| Write type | Units consumed (up to 1KB) |
| --- | --- |
| Normal (Put/Update/Delete) | 1 |
| Transactional (TransactWriteItems) | 2 |

Sizes are **rounded up in 4KB units for reads and 1KB units for writes.** Read a 3.5KB item and it's treated as 4KB, 10KB as 12KB. Write 500 bytes and it consumes 1KB's worth.
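The two tables and the rounding rule can be expressed as a pair of small functions. This is a minimal sketch for estimating single-item operations; the function and type names are hypothetical, and real consumption should still be confirmed by measurement.

```ts
type ReadConsistency = "eventual" | "strong" | "transactional";

const READ_BLOCK_BYTES = 4 * 1024; // reads are metered in 4 KB blocks
const WRITE_BLOCK_BYTES = 1024; // writes are metered in 1 KB blocks
const READ_MULTIPLIER: Record<ReadConsistency, number> = {
  eventual: 0.5,
  strong: 1,
  transactional: 2,
};

/** Read units for one item: size rounded up to 4 KB blocks, times the consistency multiplier. */
function readUnitsFor(itemBytes: number, consistency: ReadConsistency): number {
  const blocks = Math.max(1, Math.ceil(itemBytes / READ_BLOCK_BYTES));
  return blocks * READ_MULTIPLIER[consistency];
}

/** Write units for one item: size rounded up to 1 KB blocks, doubled for transactional writes. */
function writeUnitsFor(itemBytes: number, transactional = false): number {
  const blocks = Math.max(1, Math.ceil(itemBytes / WRITE_BLOCK_BYTES));
  return blocks * (transactional ? 2 : 1);
}

readUnitsFor(3.5 * 1024, "strong"); // 1 (treated as 4 KB)
readUnitsFor(10 * 1024, "strong"); // 3 (treated as 12 KB)
readUnitsFor(10 * 1024, "eventual"); // 1.5
writeUnitsFor(500); // 1
```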

### The 4 "Counting Traps" That Swell the Bill

Read the official spec closely and you find billing rules that are easy to miss at first glance. **These aren't bugs but the spec**, and if you don't know them they silently inflate cost.

1. **`UpdateItem` is charged by "the larger of before-update and after-update."** Rewrite just one attribute and it's charged by the whole item's size. Frequent partial updates of a huge item are high-cost.
2. **A conditional write consumes write capacity even on failure.** Even if `ConditionExpression` evaluates to `false`, WCU for the target item's size is charged. Design retries on the premise that a failed idempotency check is billed too.
3. **`FilterExpression` only narrows "after reading." Billing is by what you read.** Even if the filter results in 0 items, it consumes the read units for all items scanned/queried. The filter is not a saving measure.
4. **`Scan` is charged by "the size evaluated," not "the size returned."** A full-table scan charges by reading the whole table even if the return is 1 item. **A Scan on a production hot path is forbidden in principle.**
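To make traps 1, 3, and 4 concrete, here is a rough estimator. It is a simplified sketch: the names are hypothetical, and it ignores per-page rounding in `Scan`/`Query`, so treat the results as order-of-magnitude figures.

```ts
const WCU_BLOCK_BYTES = 1024; // writes are metered in 1 KB blocks
const RCU_BLOCK_BYTES = 4 * 1024; // reads are metered in 4 KB blocks

/** Trap 1: UpdateItem is metered on the larger of the item's size before and after the update. */
function updateItemWriteUnits(bytesBefore: number, bytesAfter: number): number {
  return Math.max(1, Math.ceil(Math.max(bytesBefore, bytesAfter) / WCU_BLOCK_BYTES));
}

/** Traps 3 and 4: Scan and FilterExpression are metered on the bytes evaluated, not the bytes returned. */
function evaluatedReadUnits(bytesEvaluated: number, stronglyConsistent: boolean): number {
  const units = Math.max(1, Math.ceil(bytesEvaluated / RCU_BLOCK_BYTES));
  return stronglyConsistent ? units : units / 2;
}

updateItemWriteUnits(300 * 1024, 300 * 1024 + 10); // 301: a 10-byte change to a 300 KB item
evaluatedReadUnits(1024 * 1024 * 1024, false); // 131072: scanning 1 GiB, even if only one item matches
```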

> A supplement: `Query` sums the sizes of all matching items under the same partition key and rounds the total up in 4KB units. For example, a query returning 1,500 items of 64 bytes each totals 96KB, which is 24 read units (12 if eventually consistent). Even tiny items add up when read in bulk.
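The arithmetic in the note can be checked with a few lines. The `GetItem` comparison is an addition for contrast (each `GetItem` rounds its own item up to 4KB separately), and the function names are hypothetical.

```ts
/** Query sums the sizes of all matched items, then rounds up once to 4 KB blocks. */
function queryReadUnits(itemSizesBytes: number[], stronglyConsistent: boolean): number {
  const total = itemSizesBytes.reduce((sum, size) => sum + size, 0);
  const units = Math.max(1, Math.ceil(total / 4096));
  return stronglyConsistent ? units : units / 2;
}

/** GetItem rounds up every item separately, so each tiny item costs a full block. */
function getItemReadUnits(itemSizesBytes: number[], stronglyConsistent: boolean): number {
  const units = itemSizesBytes.reduce(
    (sum, size) => sum + Math.max(1, Math.ceil(size / 4096)),
    0,
  );
  return stronglyConsistent ? units : units / 2;
}

const tinyItemSizes = Array.from({ length: 1500 }, () => 64);
queryReadUnits(tinyItemSizes, true); // 24
queryReadUnits(tinyItemSizes, false); // 12
getItemReadUnits(tinyItemSizes, true); // 1500
```

So while 24 units for 96KB of tiny items is not free, fetching the same items one by one would cost far more.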

### Don't Count, Measure: `ReturnConsumedCapacity`

Rather than counting in theory, **measurement** is accurate. With AWS SDK for JavaScript v3, just attach `ReturnConsumedCapacity` to a request and it returns the units that operation consumed. Always start cost optimization here.

```ts
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

/** Measure how many units an access pattern actually consumes, against production-like data. */
export async function measureQueryCost(pk: string): Promise<number> {
  const res = await ddb.send(
    new QueryCommand({
      TableName: "AppTable",
      KeyConditionExpression: "PK = :pk",
      ExpressionAttributeValues: { ":pk": pk },
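      // Adding this is all it takes to get the consumed capacity back ("TOTAL" | "INDEXES" | "NONE")
      ReturnConsumedCapacity: "TOTAL",
      // Note: narrowing the returned attributes reduces transfer size, but the read units do not change.
      // The real lever for lowering read cost is keeping the item small.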
      ProjectionExpression: "PK, SK, amount, #s",
      ExpressionAttributeNames: { "#s": "status" },
    }),
  );
  return res.ConsumedCapacity?.CapacityUnits ?? 0;
}
```

An important pitfall here: **narrowing attributes with `ProjectionExpression` does not reduce read units** (official: specifying a subset of attributes to retrieve doesn't affect item-size calculation). Network transfer and bandwidth decrease, but **the means to lower read units — the heart of billing — is "keeping the item itself small."** Offloading a big BLOB to S3 and placing only a reference (the S3 key) in DynamoDB becomes the standard.

---

## 3. Cost Design: The Break-Even of On-Demand vs. Provisioned

This is the core of the article. To "which is cheaper after all?" let me answer quantitatively from list prices.

**Pricing (US East / standard table class / June 2026)**

| Item | On-Demand | Provisioned |
| --- | --- | --- |
| Write | $0.625 / 1M WRU | $0.00065 / WCU·hour |
| Read | $0.125 / 1M RRU | $0.00013 / RCU·hour |
| Data storage (Standard) | $0.25 / GB·month (first 25GB free tier) | Same |
| Data storage (Standard-IA) | $0.10 / GB·month | Same |
| Free tier | — | 25 RCU + 25 WCU + 25 GB |
| Reserved capacity | Not supported | Up to 54% off for 1 year / 77% for 3 years |

### The Unit Price Is "About 3.46×," the Break-Even Is "Roughly 30% Utilization"

Let me convert provisioned's unit price to the same "per 1M consumed units" as on-demand. Use up 1 WCU at 100% and you can handle `3,600 × 730 hours = 2,628,000` writes/month.

- Provisioned (100% utilization): `$0.00065 × 730 ÷ 2.628 ≈ $0.18 / 1M writes`
- On-demand: `$0.625 / 1M writes`

**That is, on-demand's list price is about 3.46× provisioned at 100% utilization.** Reads are the same ratio. Conversely, **provisioned wins only "when you can use up the reserved capacity at high utilization."** The break-even utilization is `1 ÷ 3.46 ≈ 29%`.

Keep this calculation checkable in code and the per-table judgment is done in a moment.

```ts
// US East / standard table class / as of 2026-06. Always re-check the official pricing page.
const ON_DEMAND_PER_MILLION = { read: 0.125, write: 0.625 } as const; // RRU / WRU
const PROVISIONED_PER_UNIT_HOUR = { read: 0.00013, write: 0.00065 } as const; // RCU / WCU
const HOURS_PER_MONTH = 730;
const SECONDS_PER_HOUR = 3600;

type Kind = "read" | "write";

/**
 * Compare the cost "per 1M consumed units" at a sustained utilization (0–1).
 * Key point: on-demand bills what you consume; provisioned bills the capacity you reserved.
 * So the lower the utilization, the more expensive provisioned becomes.
 */
export function comparePerMillion(kind: Kind, utilization: number) {
  if (utilization <= 0 || utilization > 1) {
    throw new RangeError("utilization must be greater than 0 and at most 1");
  }
  const onDemand = ON_DEMAND_PER_MILLION[kind];
  const unitsPerCapacityPerMonth = SECONDS_PER_HOUR * HOURS_PER_MONTH; // at 100% utilization = 2,628,000
  const provisionedMonthlyPerUnit = PROVISIONED_PER_UNIT_HOUR[kind] * HOURS_PER_MONTH;
  const provisioned =
    (provisionedMonthlyPerUnit / (unitsPerCapacityPerMonth * utilization)) * 1_000_000;
  return {
    onDemand: Number(onDemand.toFixed(4)),
    provisioned: Number(provisioned.toFixed(4)),
    cheaper: provisioned < onDemand ? ("provisioned" as const) : ("on-demand" as const),
  };
}

comparePerMillion("write", 0.29); // ≈ { onDemand: 0.625, provisioned: 0.6226, cheaper: "provisioned" }
comparePerMillion("write", 0.7); //  ≈ { onDemand: 0.625, provisioned: 0.2579, cheaper: "provisioned" }
comparePerMillion("write", 0.1); //  ≈ { onDemand: 0.625, provisioned: 1.8056, cheaper: "on-demand" }
```

### "100% Utilization" Can't Be Made in Reality — the Practical Break-Even Is Higher

Many explanations stop here, but **using provisioned at 100% is impossible in reality.** To avoid throttling and leave headroom for bursts, the standard is to put the Auto Scaling **target utilization at 70%.** Then the effective unit price is `$0.18 ÷ 0.70 ≈ $0.26 / 1M`, only about **2.4× cheaper** than on-demand. And this 2.4× presupposes that **traffic is smooth and Auto Scaling can follow.**

Organized, the judgment becomes this.

- **Heavily spiky / unpredictable / new / low utilization (below roughly 30%)** → **on-demand.** The unit price is high, but wasted reservation, throttling, and operational load vanish. In many cases this ends up cheaper.
- **Steady / predictable / high utilization (can maintain around 70%)** → **provisioned + Auto Scaling.** It shows its true value under a smooth load.
- **A non-moving baseline load is always there** → buy just that base with **reserved capacity** (up to 54% off for 1 year / 77% for 3 years), and carve the variable portion into an on-demand table — this **hybrid** tends to be cheapest.
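To see how the shape of the load drives this judgment, the sketch below compares monthly write bills using the list prices from the pricing table above. It is deliberately simplified: it assumes items of up to 1KB, and it assumes provisioned capacity is pinned so that the peak sits at the target utilization (no Auto Scaling following the load down). The function names and example workloads are hypothetical.

```ts
// List prices from the pricing table above (US East, standard class). Re-check before relying on them.
const WRITE_PRICE = { onDemandPerMillion: 0.625, provisionedPerWcuHour: 0.00065 };
const HOURS_IN_MONTH = 730;

/** Monthly on-demand bill for a sustained average write rate. */
function onDemandMonthlyWriteCost(avgWritesPerSec: number): number {
  const writesPerMonth = avgWritesPerSec * 3600 * HOURS_IN_MONTH;
  return (writesPerMonth / 1_000_000) * WRITE_PRICE.onDemandPerMillion;
}

/** Monthly provisioned bill when capacity is sized so the peak sits at the target utilization. */
function provisionedMonthlyWriteCost(peakWritesPerSec: number, targetUtilization = 0.7): number {
  const wcu = Math.ceil(peakWritesPerSec / targetUtilization);
  return wcu * WRITE_PRICE.provisionedPerWcuHour * HOURS_IN_MONTH;
}

// Spiky: average 100 writes/s, peak 500 writes/s.
onDemandMonthlyWriteCost(100); // ≈ 164.25
provisionedMonthlyWriteCost(500); // ≈ 339.27 (715 WCU)

// Smooth: average 400 writes/s, same 500 writes/s peak.
onDemandMonthlyWriteCost(400); // ≈ 657.00
```

With the same peak, the spiky workload is cheaper on on-demand while the smooth one is cheaper on provisioned, which is the pattern the list above describes.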

> **A guardrail against cost runaway**: if you want the peace of mind of staying on-demand but fear an open ceiling, use **on-demand's maximum throughput** (set max RRU/WRU per table/GSI). Requests exceeding the set value are throttled, **structurally preventing a billing explosion from an accident or bug.** The Terraform in Chapter 9 sets it.

---

## 4. The True Nature of Performance: Partitions and Hot Keys

Most throttling happens not from "the whole table's capacity shortage" but from **concentration onto one partition (a hot partition).** This is the heart of DynamoDB performance design.

### One Partition's Limit Is "3,000 Reads / 1,000 Writes"

The official iron rule.

> Every partition in a DynamoDB table is designed to deliver a maximum capacity of 3,000 read units per second and 1,000 write units per second.

Even if the whole table has ample capacity, **if access is skewed to a specific partition key, it throttles at that one partition's limit.** And item size matters. A 20KB item consumes 5 units per strongly-consistent read, so that key reaches the partition limit at **600 times per second.**
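The 600-per-second figure follows directly from the partition limit and the rounding rules in Chapter 2. A minimal sketch for running the same calculation on your own item sizes (the function name is hypothetical):

```ts
const PARTITION_LIMIT = { readUnitsPerSec: 3000, writeUnitsPerSec: 1000 };

/** Highest sustained request rate a single partition key can take before hitting the partition limit. */
function maxRequestsPerSecForKey(
  itemBytes: number,
  op: "strongRead" | "eventualRead" | "write",
): number {
  if (op === "write") {
    const writeUnits = Math.max(1, Math.ceil(itemBytes / 1024));
    return Math.floor(PARTITION_LIMIT.writeUnitsPerSec / writeUnits);
  }
  const blocks = Math.max(1, Math.ceil(itemBytes / 4096));
  const readUnits = op === "strongRead" ? blocks : blocks / 2;
  return Math.floor(PARTITION_LIMIT.readUnitsPerSec / readUnits);
}

maxRequestsPerSecForKey(20 * 1024, "strongRead"); // 600
maxRequestsPerSecForKey(20 * 1024, "eventualRead"); // 1200
maxRequestsPerSecForKey(20 * 1024, "write"); // 50
```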

### The 2 Layers DynamoDB Helps with Automatically: Burst and Adaptive Capacity

Before discussing design, correctly understand the mechanisms AWS absorbs automatically.

- **Burst capacity**: accumulates unused capacity for **up to 5 minutes (300 seconds)** and consumes it for a sudden spike. Short peaks are absorbed by this (so Auto Scaling needn't respond instantly to short-time spikes).
- **Adaptive Capacity**: **automatic, free, always-on on all tables.** Detects skewed access and automatically leans throughput to a hot partition. Further, it **isolates frequently-accessed items into a separate partition**, and in the extreme has a single popular item monopolize one partition, supplying it **up to the partition limit (3,000 RCU / 1,000 WCU).**

What matters is that **Adaptive Capacity works for both on-demand and provisioned**, but **can't exceed the partition limit (3,000/1,000) or the table's total capacity.** That is, if you sustain more than 1,000 writes per second to a single key, it throttles in either mode. Rather than relying entirely on the automatic mechanisms, you need to **distribute by design.**
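As a rough feel for how far the burst bucket described above stretches, the sketch below estimates how long a fully banked bucket covers a constant spike. It is an idealized model with a hypothetical function name: it assumes the table was idle for the whole 300-second window, and burst capacity is best-effort, so don't treat the result as a guarantee.

```ts
const BURST_WINDOW_SECONDS = 300; // unused capacity is retained for up to 5 minutes

/**
 * Idealized estimate of how many seconds a fully banked burst bucket can cover a spike.
 * Assumes the table was idle for the whole window and the spike rate is constant.
 */
function burstCoverageSeconds(provisionedPerSec: number, spikePerSec: number): number {
  if (spikePerSec <= provisionedPerSec) return Infinity; // no burst needed
  const bankedUnits = provisionedPerSec * BURST_WINDOW_SECONDS;
  const drainPerSec = spikePerSec - provisionedPerSec;
  return bankedUnits / drainPerSec;
}

burstCoverageSeconds(100, 400); // 100 seconds
burstCoverageSeconds(100, 3100); // 10 seconds
```

The steeper the spike relative to the baseline, the faster the bucket drains, which is why burst only covers short peaks.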

### The Design Principle: A High-Cardinality Key and "Write Sharding"

The official first principle is "design so access is uniform across all partition keys." Concretely, choose a key with **high cardinality (variety of values) and access spread evenly.** `user_id` and `order_id` are good candidates; `status` (only a few kinds) and "today's date" are bad candidates.

For an unavoidable hot key (e.g. an aggregation table where writes concentrate on "today's portion" by time series), use the **write sharding** the official docs recommend — a technique of attaching a computed suffix to the key to distribute it across multiple partitions.

```ts
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

const SHARD_COUNT = 10; // Spread writes for the current day across 10 shards. Reads gather all shards.

/** Deterministic hash mapping seed to 0..SHARD_COUNT-1 (the same seed always lands on the same shard). */
function shardOf(seed: string): number {
  let h = 0;
  for (let i = 0; i < seed.length; i++) h = (Math.imul(h, 31) + seed.charCodeAt(i)) >>> 0;
  return h % SHARD_COUNT;
}

/** Write: append a shard suffix to the date PK to avoid concentrating on one partition. */
export async function recordEvent(day: string, eventId: string, payload: unknown): Promise<void> {
  const pk = `EVENTS#${day}#${shardOf(eventId)}`; // e.g. "EVENTS#2026-06-25#7"
  await ddb.send(
    new PutCommand({
      TableName: "AppTable",
      Item: { PK: pk, SK: `EVENT#${eventId}`, payload },
    }),
  );
}

/** Read: query all shards in parallel (scatter-gather) and merge the results. */
export async function listEventsForDay(day: string): Promise<unknown[]> {
  const shards = await Promise.all(
    Array.from({ length: SHARD_COUNT }, (_, shard) =>
      ddb.send(
        new QueryCommand({
          TableName: "AppTable",
          KeyConditionExpression: "PK = :pk",
          ExpressionAttributeValues: { ":pk": `EVENTS#${day}#${shard}` },
        }),
      ),
    ),
  );
  return shards.flatMap((r) => r.Items ?? []);
}
```

The trade-off is clear. **Writes are distributed and stop throttling**, but **reads must fan out across all shards, so they're somewhat more expensive and complex.** So apply it only to patterns like aggregation / event collection where **writes concentrate and the extra read cost is tolerable.** "Shard all tables just in case" is a YAGNI violation. For the details of key design, see the [single-table design guide](/blog/dynamodb-single-table-design-reliability-idempotency-patterns).
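One way to pick the shard count is to work backward from the partition write limit. The sketch below is a sizing heuristic, not an AWS formula: the function name is hypothetical, and the 70% headroom default is an assumption to leave slack for uneven hashing.

```ts
const PARTITION_WRITE_LIMIT = 1000; // write units per second per partition

/**
 * Smallest shard count that keeps each shard under the partition write limit.
 * headroom < 1 leaves slack for uneven hashing (0.7 is an assumed default, not an AWS figure).
 */
function minShardCount(peakWritesPerSec: number, itemBytes: number, headroom = 0.7): number {
  const unitsPerWrite = Math.max(1, Math.ceil(itemBytes / 1024));
  const peakUnits = peakWritesPerSec * unitsPerWrite;
  return Math.max(1, Math.ceil(peakUnits / (PARTITION_WRITE_LIMIT * headroom)));
}

minShardCount(500, 512); // 1: no sharding needed
minShardCount(4000, 512); // 6
minShardCount(4000, 2048); // 12
```

Because every extra shard adds one more query to each read, choosing the smallest count that clears the limit keeps the scatter-gather cost down.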

---

## 5. Surviving a "Launch" with Warm Throughput

[Chapter 1](#1-there-are-only-2-billing-modes-on-demand-vs-provisioned) touched on the constraint: "up to 2× the prior peak instantly, beyond that over 30 minutes." If the peak is **known in advance**, as with a sale, new-product launch, or TV exposure, you can pre-heat with **warm throughput.**

> **Warm throughput** is the amount of read/write a table/GSI **can immediately handle at this very moment.** It's on all tables by default (free) and auto-rises per past usage. **Raising it in advance (pre-warming)** lets you receive a surge without throttling from the moment of the spike.

Organizing the points.

- Without changing the billing mode, you can raise the warm throughput of **either/both read and write.**
- Possible on both existing and new tables. With Global Tables version 2019.11.21, it auto-applies to all replicas.
- **A once-raised value can't be lowered.** The pre-warming request itself is charged (the default-value state is free).

A pre-warming example with AWS SDK v3 (an operational script run the night before a launch):

```ts
import { DynamoDBClient, UpdateTableCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

/**
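 * Ahead of a launch, warm the table so it can immediately handle 50,000 reads and 20,000 writes per second.
 * The billing mode (on-demand/provisioned) is not changed. Note that a value cannot be lowered once raised.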
 */
export async function preWarm(tableName: string): Promise<void> {
  await client.send(
    new UpdateTableCommand({
      TableName: tableName,
      WarmThroughput: {
        ReadUnitsPerSecond: 50_000,
        WriteUnitsPerSecond: 20_000,
      },
    }),
  );
}
```

The criterion is simple. **If you know "more than 10× normal traffic comes at a specific date/time" (sale, launch, etc.), pre-warm.** A constantly-smooth service doesn't need it.

---

## 6. Auto Scaling Design: Receive a Smooth Load Cheaply

If you choose provisioned, **DynamoDB Auto Scaling (= AWS Application Auto Scaling)** is nearly mandatory. Pin capacity manually and it's either over-reservation (high cost) or shortage (throttling).

The mechanism and the official numbers:

- **Target tracking**: auto-adjusts reserved capacity with `UpdateTable` so consumed capacity approaches the **target utilization.** The target utilization can be set at **20–90%.** The **standard is 70%.**
- **Scale-up**: fires when consumption exceeds the target for **2 consecutive minutes.**
- **Scale-down**: fires when it stays below the target for **15 consecutive data points** (= cautious about lowering. Prevents the throttle of dropping capacity in a short valley and clogging on the immediately-following spike).
- **Short-time spikes** are absorbed not by a capacity change but by the table's built-in **burst capacity.**
- **A GSI is separate capacity.** If you put Auto Scaling on a table, **always put the same setting on the GSI too** (the official docs strongly recommend). Note too that a new GSI's auto-scaling doesn't work until backfill completes.
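The target-tracking idea can be approximated in a few lines: choose the capacity at which current consumption equals the target utilization, then clamp it to the configured bounds. This is a simplified sketch of the principle, not Application Auto Scaling's actual algorithm (which also involves alarms and cooldowns), and the function name is hypothetical.

```ts
/**
 * Approximation of target tracking: the capacity at which current consumption
 * equals the target utilization, clamped to the configured min/max.
 */
function desiredProvisionedCapacity(
  consumedPerSec: number,
  targetPercent: number,
  minCapacity: number,
  maxCapacity: number,
): number {
  if (targetPercent < 20 || targetPercent > 90) {
    throw new RangeError("target utilization must be between 20 and 90 percent");
  }
  const ideal = Math.ceil((consumedPerSec * 100) / targetPercent);
  return Math.min(maxCapacity, Math.max(minCapacity, ideal));
}

desiredProvisionedCapacity(700, 70, 5, 4000); // 1000
desiredProvisionedCapacity(1, 70, 5, 4000); // 5 (held at the minimum)
desiredProvisionedCapacity(5000, 70, 5, 4000); // 4000 (held at the maximum)
```

The third call shows why `max_capacity` matters: once demand exceeds what the ceiling can serve at the target utilization, the table throttles instead of scaling further.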

A production setup in Terraform (a table + scaling policies for both read and write):

```hcl
resource "aws_dynamodb_table" "app" {
  name         = "AppTable"
  billing_mode = "PROVISIONED"
  hash_key     = "PK"
  range_key    = "SK"

  read_capacity  = 5 # Floor. Auto Scaling raises it with demand
  write_capacity = 5

  attribute {
    name = "PK"
    type = "S"
  }
  attribute {
    name = "SK"
    type = "S"
  }

  # Auto-delete expired items for free (see Chapter 7)
  ttl {
    attribute_name = "expiresAt"
    enabled        = true
  }

  point_in_time_recovery {
    enabled = true
  }
}

# --- Auto Scaling for write capacity ---
resource "aws_appautoscaling_target" "write" {
  service_namespace  = "dynamodb"
  resource_id        = "table/${aws_dynamodb_table.app.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  min_capacity       = 5
  max_capacity       = 4000 # Cap on cost and accidents. Tune to the load forecast
}

resource "aws_appautoscaling_policy" "write" {
  name               = "${aws_dynamodb_table.app.name}-write-target-tracking"
  service_namespace  = aws_appautoscaling_target.write.service_namespace
  resource_id        = aws_appautoscaling_target.write.resource_id
  scalable_dimension = aws_appautoscaling_target.write.scalable_dimension
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBWriteCapacityUtilization"
    }
    target_value = 70.0 # A 70% target utilization is the standard
  }
}

# Define a target/policy for reads the same way (predefined_metric_type is
# "DynamoDBReadCapacityUtilization", scalable_dimension is ...:ReadCapacityUnits).
# If the table has GSIs, always add the same four resources for each GSI too.
```

Here too, face the limits. Auto Scaling entails **a delay of several minutes via a CloudWatch alarm**, and `UpdateTable` itself takes several minutes. **It can't catch a steep spike in time**, so the right answer is to absorb short peaks with burst capacity and unpredictable peaks with on-demand. "Auto Scaling is there so spikes are fine" is a misunderstanding.

---

## 7. Practical Cost-Optimization Techniques

Here is how to turn the understanding so far into concrete measures, in order of effectiveness.

### (1) "Free" Auto-Deletion with TTL

DynamoDB's **TTL** is a mechanism that gives each item an expiry (**a Number attribute of Unix epoch seconds**) and **auto-deletes the expired without consuming write throughput.** The official docs state plainly.

> DynamoDB automatically deletes expired items within a few days of their expiration time, without consuming write throughput.

For **data with a lifespan** like sessions, temporary tokens, caches, and logs, **delete them with TTL (free), not `DeleteItem` (paid)** — the iron rule. It also reduces storage fees. But 2 cautions:

- **Deletion is "within a few days," not instant.** An expired-but-not-yet-deleted item can appear in reads/queries/scans, so **also exclude it on the app side with `FilterExpression`.**
- TTL deletion **flows to Streams as a service deletion** and is removed from LSI/GSI too. **In Global Tables, the origin-region deletion is WCU-free, but the replication deletion to replicas is charged replication WCU/write units.**
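The first caution can be handled with two small helpers: one that computes the TTL attribute value, and one that builds the filter excluding expired-but-not-yet-deleted items. This is a sketch with hypothetical helper names; the returned object uses plain JavaScript values, so it assumes the `lib-dynamodb` document client used elsewhere in this article.

```ts
/** TTL attribute value: Unix epoch time in seconds (not milliseconds). */
function ttlEpochSeconds(now: Date, lifetimeSeconds: number): number {
  return Math.floor(now.getTime() / 1000) + lifetimeSeconds;
}

/** Query/Scan parameters that hide items whose TTL has passed but that are not yet deleted. */
function liveItemsFilter(ttlAttribute: string, now: Date) {
  return {
    // Items without a TTL attribute never expire, so they are kept.
    FilterExpression: "attribute_not_exists(#ttl) OR #ttl > :now",
    ExpressionAttributeNames: { "#ttl": ttlAttribute },
    ExpressionAttributeValues: { ":now": Math.floor(now.getTime() / 1000) },
  };
}

const issuedAt = new Date("2026-06-25T00:00:00Z");
ttlEpochSeconds(issuedAt, 3600); // 1782349200 (one hour after issuedAt)
liveItemsFilter("expiresAt", issuedAt); // spread into QueryCommand / ScanCommand input
```

As Chapter 2 notes, the filter only hides expired items from the result; the read units for them are still consumed until TTL actually removes them.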

### (2) Make "Rarely-Read Data" Cheap with the Standard-IA Table Class

If storage is large but access is rare (audit logs, old order history, etc.), the **Standard-IA table class** lowers the storage unit price from **$0.25 → $0.10 / GB·month** (in exchange, the read/write unit price rises). It works for **storage-dominant, low-throughput tables.**

### (3) Keep Items Small (the Heart of Read Units)

As in [Chapter 2](#2-master-the-counting-of-capacity-and-you-master-cost), read cost is decided by item size. Shorten attribute names (`createdAt`→`ca`, etc.) and offload huge BLOBs to S3, holding only references. **One item is up to 400KB**, but that's an "upper bound," not a "target."

### (4) Sparse GSIs and Minimizing Projected Attributes

A GSI increases storage and write cost by the "projected attributes." Keep the index small and cheap by **projecting only the needed attributes** and **giving the GSI key only to the relevant items** (a sparse index). For the design details, see the [single-table design guide](/blog/dynamodb-single-table-design-reliability-idempotency-patterns).

### (5) Erase Scan from the Design

A `Scan` on a production hot path is the biggest anti-pattern, charged by reading the whole table. **Determine the access patterns first, and make a key design where you can get it with Query (a GSI if needed).** For analytics-purpose full-table processing, don't Scan DynamoDB directly but offload to **export to S3 → Athena/Glue** — cheaper and faster.

### (6) Backup Cost Is a Design Target Too

PITR (continuous backup) is **$0.20 / GB·month**, on-demand backup **$0.10 / GB·month**. Important tables are worth PITR, but **unconditionally on all tables** can be excessive. Select the protection targets.

---

## 8. Observability: Notice Before It Clogs

Capacity design is not "set it and forget it" but **observe and iterate.** The minimum CloudWatch metrics to watch and the alarms to raise:

| Metric | What it indicates | Action |
| --- | --- | --- |
| `ThrottledRequests` / `ReadThrottleEvents` / `WriteThrottleEvents` | Throttling occurring | **Immediate alert.** Capacity shortage or a hot key |
| `ConsumedReadCapacityUnits` / `ConsumedWriteCapacityUnits` | Actual consumption | The basis for the mode/capacity decision |
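| `OnlineIndexConsumedWriteCapacity` | Write capacity consumed while backfilling a new GSI | Detect a capacity shortage during index creation (for steady-state GSI load, view the `Consumed*` metrics with the `GlobalSecondaryIndexName` dimension) |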
| `AccountProvisionedReadCapacityUtilization`, etc. | Approaching the account limit | The judgment for a quota-increase request |

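A minimal throttling alarm (Terraform). `ThrottledRequests` is published per `Operation`, so a table-wide alarm keyed only on `TableName` is easier to build on `WriteThrottleEvents` (add a matching `ReadThrottleEvents` alarm the same way):

```hcl
resource "aws_cloudwatch_metric_alarm" "ddb_throttle" {
  alarm_name          = "${aws_dynamodb_table.app.name}-write-throttle-events"
  namespace           = "AWS/DynamoDB"
  metric_name         = "WriteThrottleEvents"
  dimensions          = { TableName = aws_dynamodb_table.app.name }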
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 1
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.alerts.arn] # to Slack/PagerDuty
}
```

When you don't know which key is throttling, enabling **CloudWatch Contributor Insights for DynamoDB** lets you identify the most-consuming partition key (= the hot key). **Alert on the symptom (throttling), not the cause (code)**, and trace the culprit with Contributor Insights. That routine keeps the platform running.

---

## 9. Summary: The Production Table's IaC (On-Demand Version)

Finally, here's the "when in doubt, use this" production table definition condensing the judgments so far. Start on on-demand, **guard against billing runaway with the maximum throughput**, enable **TTL and PITR**, and evolve to GSIs and provisioned when needed. That's the starting point.

```hcl
resource "aws_dynamodb_table" "app" {
  name         = "AppTable"
  billing_mode = "PAY_PER_REQUEST" # On-demand: the default and recommended mode
  hash_key     = "PK"
  range_key    = "SK"

  attribute {
    name = "PK"
    type = "S"
  }
  attribute {
    name = "SK"
    type = "S"
  }

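  # Guardrail that structurally prevents a billing explosion from an accident or bug.
  # Leave headroom above the expected peak, but don't leave the ceiling open.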
  on_demand_throughput {
    max_read_request_units  = 50000
    max_write_request_units = 20000
  }

  # Auto-delete data with a lifespan for free (no WCU paid for DeleteItem)
  ttl {
    attribute_name = "expiresAt"
    enabled        = true
  }

  point_in_time_recovery {
    enabled = true
  }

  # Deletion protection: prevents accidental deletion of the production table
  deletion_protection_enabled = true

  tags = {
    Environment = "production"
    CostCenter  = "platform"
  }
}
```

---

## FAQ

**Q. On-demand or provisioned, which is cheaper after all?**
A. For a steady load where you can use up the reserved capacity at **sustained high utilization (around 70% as a guide)**, provisioned (+ reserved) is cheaper, and for **spiky, unpredictable, low-utilization**, on-demand is cheaper and easier. At list price, on-demand is about 3.46× provisioned at 100%-utilization equivalent, and the break-even is roughly 30% utilization. Measure first, then lean.

**Q. What is a hot partition? How to prevent it?**
A. A state where access concentrates on a specific partition key, reaching that one partition's limit (**3,000 reads / 1,000 writes units per second**) and throttling. Handle it with a **high-cardinality key design** and, for unavoidable concentration, **write sharding** (attaching a suffix to the key to distribute). Adaptive Capacity mitigates it automatically, but can't exceed the partition limit.

**Q. Must I not use Scan?**
A. In principle, avoid it on a production online path. Scan is charged by **what was evaluated, not what was returned** (effectively the whole table), and is slow and high-cost. Decide the access patterns first, and design so you can get the data with Query/GSI. Full-table analytics goes to S3 export + Athena.

**Q. Is warm-throughput pre-warming mandatory?**
A. Normally unneeded. Only when you know **a surge of more than 10× normal at a specific date/time** (sale, launch, etc.), pre-heat in advance to avoid the "2× the prior peak / 30 minutes" constraint. Note a once-raised value can't be lowered, and the pre-warming operation is charged.

**Q. Can I change the billing mode later?**
A. You can. **Provisioned→on-demand up to 4 times in 24 hours**, **on-demand→provisioned anytime.** The switch takes several minutes, during which it's served at a throughput matched to the immediately-prior capacity.

**Q. Does an item become inaccessible the moment its TTL expires?**
A. No. Deletion happens **within a few days of expiry**, not instantly. An expired-but-not-yet-deleted item can appear in reads, so **also exclude it on the app side with `FilterExpression`.** The deletion itself is **WCU-free.**

---

## In Closing: Cost and Performance Are Decided by "Design"

DynamoDB's pricing and speed are mostly decided not by savings tricks during operation but by **the initial capacity design.**

- **Choose the mode by the load's shape**: spike/unpredictable is on-demand, steady/high-utilization is provisioned + reserved.
- **Measure consumption**: count with `ReturnConsumedCapacity`, and crush the "hidden billing" of UpdateItem, conditional failure, Scan, and FilterExpression.
- **Protect the partition limit by design**: distribute the 3,000/1,000 with high cardinality and write sharding, don't rely entirely on Adaptive Capacity.
- **Use the free weapons**: TTL deletion, burst, Adaptive Capacity, and warm throughput are powerful when used correctly.
- **Observe and iterate**: immediate alert on ThrottledRequests, identify the hot key with Contributor Insights.

The design of correctness (idempotency, atomicity, consistency) is compiled in the [DynamoDB Single-Table Design & Production Reliability Patterns Complete Guide](/blog/dynamodb-single-table-design-reliability-idempotency-patterns), and the application in a real payment foundation in [Designing "zero double charges" in a serverless payment foundation](/blog/dynamodb-payment-reliability-idempotency-zero-downtime). If you want DynamoDB to run **fast, cheap, and safe** in production, I'll work out that design with you, matched to your requirements.