DynamoDB Capacity, Cost, and Performance Design Complete Guide (2026 Edition): On-Demand vs. Provisioned, Auto Scaling, Avoiding Hot Partitions, Cost Optimization

The first thing that trips you up with DynamoDB isn't correctness. It's pricing and throttling.

You designed a clean data model. You added idempotency and conditional writes. Yet in production the bill swells to 3× the estimate, or ProvisionedThroughputExceededException erupts the moment a sale starts. That is not a code bug but a capacity-design problem. DynamoDB's pricing and performance are mostly decided by three choices per table: billing mode, capacity, and key design.

This article covers only DynamoDB's capacity, cost, and performance design, based on my experience running a multi-tenant payment platform on AWS serverless (Lambda + DynamoDB) in production. The design of correctness (data modeling, idempotency, transactions) I leave to the sister article, DynamoDB Single-Table Design & Production Reliability Patterns Complete Guide. This article complements it by narrowing to how much it costs, where it throttles, and how to make it fast and cheap.

All numbers and limits are checked against the AWS official documentation (as of June 2026). Pricing varies by region and time, so always confirm the final amount on the official pricing page. All prices in the body are US East (N. Virginia) / standard table class / as of June 2026.

1. There Are Only 2 Billing Modes: On-Demand vs. Provisioned

The starting point of DynamoDB's pricing system is the throughput mode you choose per table. This decides "how you're charged" and "how it auto-scales" at once.

Viewpoint	On-Demand	Provisioned
Billing unit	Actual requests (RRU / WRU)	Reserved capacity (RCU / WCU, hourly billing)
Billing concept	Pay for what you used ($0 at zero traffic)	Charged for the reserved amount even unused
Scaling	Fully automatic. Up to 2× the prior peak instantly	Manual or Auto Scaling
Capacity planning	Unneeded	Needed (prediction is the premise)
Suited load	Spike, unpredictable, new, dev environment	Steady, predictable, high utilization
Unit price (per same consumption)	High (about 3.46× provisioned at 100% equivalent)	Cheap (if you can maintain high utilization)

The official docs say it plainly: on-demand is "the default and recommended" option.

On-demand mode is the default and recommended throughput option for most DynamoDB workloads.

If you're at the "just get it running" stage, or traffic is impossible to predict, go on-demand without hesitation. Optimize once real measurements have accumulated; that's the correct order. Premature optimization (a guessed capacity setting on provisioned) invites either higher cost or throttling.

The Essence of On-Demand: "Up to 2× the Prior Peak, Instantly"

On-demand isn't magic. Scaling has clear rules.

A new table's initial throughput: right after creation, it can immediately handle 4,000 writes / 12,000 reads per second.
Up to 2× the prior peak, instantly: at any time, the table can immediately serve up to 2× its previous traffic peak. For example, if the peak is 50k reads/sec, up to 100k is available instantly. Once you reach 100k, that becomes the new peak, and next you can grow to 200k.
The 30-minute rule for a surge beyond 2×: if traffic exceeds 2× the prior peak within 30 minutes, it can be throttled. The official guidance is to spread traffic growth over at least 30 minutes, or to pre-warm.

So even with on-demand, events that surge 10× or 100× at once, like a sale or a launch, need care. The countermeasure is warm throughput, described later.

Note that on-demand also has a default per-table quota (40,000 RRU / 40,000 WRU per second). This is a runaway-prevention guardrail and can be raised by request (on-demand has no per-account throughput quota).
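The doubling rule above can be turned into a quick estimate of how long an un-warmed ramp takes. This is a minimal sketch under a simplified model that I am assuming for illustration (each step may reach at most 2× the running peak, and each doubling beyond that is spaced 30 minutes apart). The function names are hypothetical and nothing here calls an AWS API.

```typescript
// Simplified model of the on-demand rule described above (an assumption, not an AWS API):
// at any moment the table can serve up to 2x its previous peak, and growing past
// that should be spread over at least 30 minutes per doubling.
const DOUBLING_INTERVAL_MINUTES = 30;

/** Throughput (requests/sec) that is available instantly for a given previous peak. */
function instantCapacity(previousPeak: number): number {
  return previousPeak * 2;
}

/** How many times the peak must double before `target` fits within 2x the running peak. */
function doublingsNeeded(previousPeak: number, target: number): number {
  if (previousPeak <= 0 || target <= 0) {
    throw new RangeError("previousPeak and target must be positive");
  }
  let peak = previousPeak;
  let steps = 0;
  while (instantCapacity(peak) < target) {
    peak = instantCapacity(peak); // reaching 2x makes that the new peak
    steps++;
  }
  return steps;
}

/** Rough lower bound on ramp time (minutes) when the table is not pre-warmed. */
function minimumRampMinutes(previousPeak: number, target: number): number {
  return doublingsNeeded(previousPeak, target) * DOUBLING_INTERVAL_MINUTES;
}
```

With a prior peak of 50k reads/sec, a target of 100k needs no ramp, while a target of 500k needs three doublings, roughly 90 minutes under this model. If that is longer than your launch allows, that is the signal to pre-warm.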

The Essence of Provisioned: "Hourly Billing on Reserved Capacity"

With provisioned, you reserve the per-second read (RCU) and write (WCU) capacity yourself and are billed hourly on the reserved amount. You're charged even if you don't use it all, which is the decisive difference from on-demand. In exchange, the unit price is cheap, and because the request rate is capped at the reserved ceiling, cost predictability is high.

Default quotas: per-table 40,000 RCU / 40,000 WCU, per-account 80,000 RCU / 80,000 WCU (all increasable by request). The minimum is 1 RCU / 1 WCU.
Capacity decreases have a count limit: each day starts with 4 "decrease slots," and 1 slot recovers per hour (up to 4 slots), so you can decrease up to 27 times in 24 hours. Increases are unlimited.

This asymmetry of "decreases up to 27 times a day" is also the reason the Auto Scaling described later is cautious about scaling down.

The Mode Is Mutually Changeable Later (with a Count Limit)

Even if you guess wrong, you can redo it.

Provisioned → on-demand: up to 4 times in a 24-hour rolling window.
On-demand → provisioned: anytime.

It's also worth understanding what happens at the switch. The first time you switch provisioned→on-demand, the table is scaled so it can immediately serve at least 4,000 writes / 12,000 reads per second (or more, if you had provisioned more in the past). In the reverse direction, the table is served at a throughput matched to its prior on-demand peak, so when reverting, set the initial provisioned value high enough to absorb the migration.

A practical guideline: start new, PoC, and dev environments on on-demand. Observe ConsumedReadCapacityUnits / ConsumedWriteCapacityUnits in CloudWatch for several weeks to a month in production, and move only the tables that prove to be steadily high-utilization toward provisioned + reserved capacity. The criterion is the break-even in Chapter 3.

2. Master the "Counting" of Capacity and You Master Cost

For both on-demand and provisioned, billing is based on the same capacity-unit consumption. If you can't count it accurately, you can't predict pricing or throttling. The official definition is simple.

Reads (1 unit = for an item up to 4KB)

Read type	Units consumed (up to 4KB)
Eventually consistent (default)	0.5
Strongly consistent	1
Transactional (TransactGetItems)	2

Writes (1 unit = for an item up to 1KB)

Write type	Units consumed (up to 1KB)
Normal (Put/Update/Delete)	1
Transactional (TransactWriteItems)	2

Sizes are rounded up in 4KB units for reads and 1KB units for writes. Read a 3.5KB item and it's treated as 4KB, 10KB as 12KB. Write 500 bytes and it consumes 1KB's worth.
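The two tables and the rounding rule fit in a few lines of arithmetic. Here is a minimal sketch of that counting; the function and type names are my own for illustration, and it models single-item operations only.

```typescript
type ReadConsistency = "eventual" | "strong" | "transactional";

const READ_BLOCK_BYTES = 4 * 1024; // reads are rounded up in 4KB steps
const WRITE_BLOCK_BYTES = 1024; // writes are rounded up in 1KB steps

// Units consumed per 4KB block, per the read table above.
const READ_MULTIPLIER: Record<ReadConsistency, number> = {
  eventual: 0.5,
  strong: 1,
  transactional: 2,
};

/** Read units for one item of `itemBytes` bytes. */
function readUnits(itemBytes: number, consistency: ReadConsistency = "eventual"): number {
  const blocks = Math.max(1, Math.ceil(itemBytes / READ_BLOCK_BYTES));
  return blocks * READ_MULTIPLIER[consistency];
}

/** Write units for one item of `itemBytes` bytes (doubled inside a transaction). */
function writeUnits(itemBytes: number, transactional = false): number {
  const blocks = Math.max(1, Math.ceil(itemBytes / WRITE_BLOCK_BYTES));
  return blocks * (transactional ? 2 : 1);
}
```

For instance, a strongly consistent read of a 10KB item is 3 units (12KB after rounding), and a transactional write of a 1,500-byte item is 4 units (2KB, doubled).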

The 4 "Counting Traps" That Swell the Bill

Read the official spec closely and you find billing rules that are easy to miss at first glance. These are not bugs but documented behavior, and if you don't know them they silently inflate cost.

UpdateItem is charged by "the larger of before-update and after-update." Rewrite just one attribute and you're charged for the whole item's size. Frequent partial updates of a huge item are expensive.
A conditional write consumes write capacity even on failure. Even if ConditionExpression evaluates to false, WCU for the target item's size is charged. Design retries on the premise that a failed idempotency check is paid for too.
FilterExpression only narrows results after the read. Billing is by what you read: even if the filter leaves 0 items, you consume read units for every item scanned or queried. The filter is not a saving measure.
Scan is charged by "the size evaluated," not "the size returned." A full-table scan is charged for reading the whole table even if it returns 1 item. As a rule, a Scan on a production hot path is off-limits.

A supplement: Query sums the sizes of all items read under the same partition key as one read, then rounds up in 4KB units. For example, querying 1,500 items of 64 bytes each totals 96,000 bytes (about 94KB), which rounds up to 96KB = 24 read units (12 if eventually consistent). Many small items read in bulk still add up.
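The traps and the Query summing rule can be written down as arithmetic. The following is a sketch with hypothetical helper names; it models the billing rules as described in this chapter and does not call DynamoDB.

```typescript
const KB = 1024;

/** Round a byte count up to whole blocks (at least 1 block). */
function blocks(bytes: number, blockBytes: number): number {
  return Math.max(1, Math.ceil(bytes / blockBytes));
}

/** UpdateItem: charged on the larger of the item's size before and after the update. */
function updateItemWriteUnits(beforeBytes: number, afterBytes: number): number {
  return blocks(Math.max(beforeBytes, afterBytes), KB);
}

/** A conditional write that fails still consumes write units for the existing item. */
function failedConditionalWriteUnits(existingItemBytes: number): number {
  return blocks(existingItemBytes, KB);
}

/**
 * Query/Scan: sizes of all evaluated items are summed, then rounded up to 4KB.
 * Items later dropped by FilterExpression are still part of `evaluatedItemBytes`.
 */
function queryReadUnits(evaluatedItemBytes: number[], stronglyConsistent = false): number {
  const total = evaluatedItemBytes.reduce((sum, b) => sum + b, 0);
  return blocks(total, 4 * KB) * (stronglyConsistent ? 1 : 0.5);
}
```

Shrinking a 10,000-byte item to 200 bytes with UpdateItem still costs 10 write units, and the 1,500 × 64-byte query from the example comes out at 24 units strongly consistent, 12 eventually consistent.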

Don't Count, Measure: `ReturnConsumedCapacity`

Measuring is more accurate than counting in theory. With AWS SDK for JavaScript v3, just attach ReturnConsumedCapacity to a request and the response reports the units that operation consumed. Always start cost optimization here.

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

/** Measure how many units an access pattern actually consumes, using production-like data. */
export async function measureQueryCost(pk: string): Promise<number> {
  const res = await ddb.send(
    new QueryCommand({
      TableName: "AppTable",
      KeyConditionExpression: "PK = :pk",
      ExpressionAttributeValues: { ":pk": pk },
      // Adding this alone returns the consumed capacity ("TOTAL" | "INDEXES" | "NONE").
      ReturnConsumedCapacity: "TOTAL",
      // Note: narrowing the attributes reduces transfer size, but the read units do not change.
      // The real lever for lowering read cost is keeping items small.
      ProjectionExpression: "PK, SK, amount, #s",
      ExpressionAttributeNames: { "#s": "status" },
    }),
  );
  return res.ConsumedCapacity?.CapacityUnits ?? 0;
}

An important pitfall here: narrowing attributes with ProjectionExpression does not reduce read units (official: specifying a subset of attributes to retrieve doesn't affect item-size calculation). Network transfer decreases, but the way to lower read units, the heart of billing, is keeping the item itself small. The standard approach is to offload a big BLOB to S3 and store only a reference (the S3 key) in DynamoDB.

3. Cost Design: The Break-Even of On-Demand vs. Provisioned

This is the core of the article. Let me answer "which is cheaper in the end?" quantitatively, from list prices.

Pricing (US East / standard table class / June 2026)

Item	On-Demand	Provisioned
Write	$0.625 / 1M WRU	$0.00065 / WCU·hour
Read	$0.125 / 1M RRU	$0.00013 / RCU·hour
Data storage (Standard)	$0.25 / GB·month (first 25GB free tier)	Same
Data storage (Standard-IA)	$0.10 / GB·month	Same
Free tier	—	25 RCU + 25 WCU + 25 GB
Reserved capacity	Not supported	Up to 54% off for 1 year / 77% for 3 years

The Unit Price Is "About 3.46×," the Break-Even Is "Roughly 30% Utilization"

Let me convert provisioned's unit price to the same "per 1M consumed units" basis as on-demand. One WCU used at 100% handles 3,600 seconds × 730 hours = 2,628,000 writes/month.

Provisioned (100% utilization): $0.00065 × 730 ÷ 2.628 ≈ $0.18 / 1M writes
On-demand: $0.625 / 1M writes

That is, on-demand's list price is about 3.46× that of provisioned at 100% utilization. Reads have the same ratio. Put the other way, provisioned wins only when you can consume the reserved capacity at high utilization. The break-even utilization is 1 ÷ 3.46 ≈ 29%.

Keep this calculation in code and the per-table decision takes only a moment.

// US East / standard table class / as of 2026-06. Always re-check the official pricing page.
const ON_DEMAND_PER_MILLION = { read: 0.125, write: 0.625 } as const; // RRU / WRU
const PROVISIONED_PER_UNIT_HOUR = { read: 0.00013, write: 0.00065 } as const; // RCU / WCU
const HOURS_PER_MONTH = 730;
const SECONDS_PER_HOUR = 3600;

type Kind = "read" | "write";

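/**
 * Compare the cost per 1M consumed units at a sustained utilization (0–1).
 * Key point: on-demand bills what you consume; provisioned bills the capacity you reserved.
 * So the lower the utilization, the more expensive provisioned becomes.
 */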
export function comparePerMillion(kind: Kind, utilization: number) {
  if (utilization <= 0 || utilization > 1) {
    throw new RangeError("utilization must be greater than 0 and at most 1");
  }
  const onDemand = ON_DEMAND_PER_MILLION[kind];
  const unitsPerCapacityPerMonth = SECONDS_PER_HOUR * HOURS_PER_MONTH; // at 100% utilization = 2,628,000
  const provisionedMonthlyPerUnit = PROVISIONED_PER_UNIT_HOUR[kind] * HOURS_PER_MONTH;
  const provisioned =
    (provisionedMonthlyPerUnit / (unitsPerCapacityPerMonth * utilization)) * 1_000_000;
  return {
    onDemand: Number(onDemand.toFixed(4)),
    provisioned: Number(provisioned.toFixed(4)),
    cheaper: provisioned < onDemand ? ("provisioned" as const) : ("on-demand" as const),
  };
}

comparePerMillion("write", 0.29); // ≈ { onDemand: 0.625, provisioned: 0.6226, cheaper: "provisioned" }
comparePerMillion("write", 0.7); //  ≈ { onDemand: 0.625, provisioned: 0.2579, cheaper: "provisioned" }
comparePerMillion("write", 0.1); //  ≈ { onDemand: 0.625, provisioned: 1.8056, cheaper: "on-demand" }

"100% Utilization" Can't Be Made in Reality — the Practical Break-Even Is Higher

Many explanations stop here, but running provisioned at 100% is impossible in reality. To avoid throttling and leave headroom for bursts, the standard is to set the Auto Scaling target utilization to 70%. Then the effective unit price is $0.18 ÷ 0.70 ≈ $0.26 / 1M, which is only about 2.4× cheaper than on-demand. And even this 2.4× presupposes that traffic is smooth and Auto Scaling can keep up.

In summary, the decision looks like this.

Heavily spiky / unpredictable / new / low utilization (below roughly 30%) → on-demand. The unit price is high, but wasted reservation, throttling, and operational load all vanish. In many cases this ends up cheaper.
Steady / predictable / high utilization (can sustain around 70%) → provisioned + Auto Scaling. It shows its true value under a smooth load.
A baseline load that never goes away → buy just that base with reserved capacity (up to 54% off for 1 year / 77% for 3 years), and carve the variable portion out into an on-demand table. This hybrid tends to be cheapest.
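To see why the shape of the load decides the winner, it helps to price a whole month rather than a single unit. The sketch below compares on-demand against provisioned capacity that is fixed at peak ÷ target utilization. That sizing is my simplifying assumption (no Auto Scaling, no reserved discount), the names are hypothetical, and the prices are the write list prices quoted in this chapter.

```typescript
// List prices quoted in this article (US East, standard class). Re-check before relying on them.
const WRITE_ON_DEMAND_PER_MILLION = 0.625; // USD per 1M WRU
const WRITE_PROVISIONED_PER_WCU_HOUR = 0.00065; // USD per WCU-hour
const HOURS = 730;

interface WriteLoad {
  averagePerSecond: number; // average write units per second over the month
  peakPerSecond: number; // highest sustained write units per second
}

/** Monthly on-demand cost: pay only for consumed units. */
function onDemandMonthly(load: WriteLoad): number {
  const unitsPerMonth = load.averagePerSecond * 3600 * HOURS;
  return (unitsPerMonth / 1_000_000) * WRITE_ON_DEMAND_PER_MILLION;
}

/** Monthly provisioned cost with capacity fixed at peak / target utilization. */
function provisionedMonthly(load: WriteLoad, targetUtilizationPercent = 70): number {
  const wcu = Math.ceil((load.peakPerSecond * 100) / targetUtilizationPercent);
  return wcu * WRITE_PROVISIONED_PER_WCU_HOUR * HOURS;
}
```

With a 1,000/sec peak, a spiky table averaging 100/sec costs about $164 on-demand versus about $678 provisioned, while a smooth table averaging 700/sec costs about $1,150 on-demand versus the same $678 provisioned. The peak is identical; only the average moved.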

A guardrail against cost runaway: if you want the peace of mind of staying on-demand but fear an open-ended bill, use on-demand's maximum throughput (set max RRU/WRU per table/GSI). Requests exceeding the set value are throttled, which structurally prevents a billing explosion from an accident or bug. It is set in the Terraform in Chapter 9.

4. The True Nature of Performance: Partitions and Hot Keys

Most throttling happens not from "the whole table's capacity shortage" but from concentration onto one partition (a hot partition). This is the heart of DynamoDB performance design.

One Partition's Limit Is "3,000 Reads / 1,000 Writes"

The official hard limit:

Every partition in a DynamoDB table is designed to deliver a maximum capacity of 3,000 read units per second and 1,000 write units per second.

Even if the whole table has ample capacity, if access is skewed to a specific partition key, it throttles at that one partition's limit. And item size matters. A 20KB item consumes 5 units per strongly-consistent read, so that key reaches the partition limit at 600 times per second.
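The per-key ceiling follows directly from the partition limit and the item size. A small sketch of that arithmetic, with hypothetical names, assuming the whole partition is available to a single key:

```typescript
const PARTITION_READ_UNITS_PER_SECOND = 3000;
const PARTITION_WRITE_UNITS_PER_SECOND = 1000;

type Operation = "strongRead" | "eventualRead" | "write";

/** Units a single request consumes for an item of the given size. */
function unitsPerRequest(itemBytes: number, op: Operation): number {
  if (op === "write") return Math.max(1, Math.ceil(itemBytes / 1024));
  const readBlocks = Math.max(1, Math.ceil(itemBytes / 4096));
  return op === "strongRead" ? readBlocks : readBlocks * 0.5;
}

/** Highest request rate one partition key can sustain before hitting the partition limit. */
function maxRequestsPerSecondPerKey(itemBytes: number, op: Operation): number {
  const limit =
    op === "write" ? PARTITION_WRITE_UNITS_PER_SECOND : PARTITION_READ_UNITS_PER_SECOND;
  return Math.floor(limit / unitsPerRequest(itemBytes, op));
}
```

For the 20KB item above this gives 600 strongly consistent reads per second, 1,200 eventually consistent reads, and only 50 writes per second on that one key.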

The 2 Layers DynamoDB Helps with Automatically: Burst and Adaptive Capacity

Before discussing design, correctly understand the mechanisms AWS absorbs automatically.

Burst capacity: DynamoDB retains up to 5 minutes (300 seconds) of unused capacity and spends it on a sudden spike. Short peaks are absorbed this way (so Auto Scaling needn't respond instantly to short spikes).
Adaptive Capacity: automatic, free, and always on for all tables. It detects skewed access and automatically shifts throughput to a hot partition. Further, it isolates frequently accessed items into separate partitions, and in the extreme lets a single popular item have a partition to itself, serving it up to the partition limit (3,000 RCU / 1,000 WCU).

What matters is that Adaptive Capacity works for both on-demand and provisioned, but cannot exceed the partition limit (3,000/1,000) or the table's total capacity. That is, sustained writes above 1,000 per second to a single key throttle in any mode. Don't rely entirely on the automatic mechanisms; distribute by design.

The Design Principle: A High-Cardinality Key and "Write Sharding"

The official first principle is "design so access is uniform across all partition keys." Concretely, choose a key with high cardinality (many distinct values) whose access is spread evenly. user_id and order_id are good candidates; status (only a few values) and "today's date" are bad candidates.

For an unavoidable hot key (e.g. a time-series aggregation table where writes concentrate on today's partition), use the write sharding the official docs recommend: attach a computed suffix to the key to distribute it across multiple partitions.

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

const SHARD_COUNT = 10; // Spread writes for "today" across 10 shards. Reads fan out over all shards.

/** Map seed to 0..SHARD_COUNT-1 with a deterministic hash (the same seed always maps to the same shard). */
function shardOf(seed: string): number {
  let h = 0;
  for (let i = 0; i < seed.length; i++) h = (Math.imul(h, 31) + seed.charCodeAt(i)) >>> 0;
  return h % SHARD_COUNT;
}

/** Write: append a shard suffix to the date PK to avoid concentrating on one partition. */
export async function recordEvent(day: string, eventId: string, payload: unknown): Promise<void> {
  const pk = `EVENTS#${day}#${shardOf(eventId)}`; // 例: "EVENTS#2026-06-25#7"
  await ddb.send(
    new PutCommand({
      TableName: "AppTable",
      Item: { PK: pk, SK: `EVENT#${eventId}`, payload },
    }),
  );
}

/** Read: query all shards in parallel (scatter-gather) and merge. Pagination (LastEvaluatedKey) is omitted for brevity. */
export async function listEventsForDay(day: string): Promise<unknown[]> {
  const shards = await Promise.all(
    Array.from({ length: SHARD_COUNT }, (_, shard) =>
      ddb.send(
        new QueryCommand({
          TableName: "AppTable",
          KeyConditionExpression: "PK = :pk",
          ExpressionAttributeValues: { ":pk": `EVENTS#${day}#${shard}` },
        }),
      ),
    ),
  );
  return shards.flatMap((r) => r.Items ?? []);
}

The trade-off is clear. Writes are distributed and stop throttling, but reads must hit every shard, so they are somewhat more expensive and complex. So apply it only to patterns such as aggregation or event collection, where writes concentrate and the read overhead is tolerable. "Shard all tables just in case" is a YAGNI violation. For the details of key design, see the single-table design guide.
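How many shards is enough? A rough sizing follows from the 1,000 write units/sec partition limit. The sketch below is an estimate under assumptions I am adding: the hash spreads writes evenly, each shard key ends up on its own partition (DynamoDB does not guarantee this), and `headroomPercent` is a hypothetical safety margin rather than an AWS setting.

```typescript
const PARTITION_WRITE_LIMIT = 1000; // write units per second per partition

/**
 * Smallest shard count that keeps each shard under the partition write limit,
 * using only `headroomPercent` of that limit to leave room for skew.
 */
function minShardCount(
  peakWritesPerSecond: number,
  itemBytes: number,
  headroomPercent = 70,
): number {
  if (headroomPercent <= 0 || headroomPercent > 100) {
    throw new RangeError("headroomPercent must be in (0, 100]");
  }
  const unitsPerWrite = Math.max(1, Math.ceil(itemBytes / 1024));
  const peakUnits = peakWritesPerSecond * unitsPerWrite;
  const shards = Math.ceil((peakUnits * 100) / (PARTITION_WRITE_LIMIT * headroomPercent));
  return Math.max(1, shards);
}
```

At 5,000 writes/sec of 500-byte items this suggests 8 shards with the default margin, and a table peaking at 200 writes/sec comes out at 1, which is the YAGNI case: no sharding needed.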

5. Surviving a "Launch" with Warm Throughput

Recall the constraint from Chapter 1: up to 2× the prior peak instantly, and beyond that only over 30 minutes. If the peak is known in advance, as with a sale, a new-product launch, or TV exposure, you can pre-heat the table with warm throughput.

Warm throughput is the amount of read/write a table/GSI can immediately handle at this very moment. It's on all tables by default (free) and auto-rises per past usage. Raising it in advance (pre-warming) lets you receive a surge without throttling from the moment of the spike.

The key points:

Without changing the billing mode, you can raise the warm throughput of either/both read and write.
Possible on both existing and new tables. With Global Tables (2019.11.21), it auto-applies to all replicas.
A once-raised value can't be lowered. The pre-warming request itself is charged (the default-value state is free).

A pre-warming example with AWS SDK v3 (an operational script run the night before a launch):

import { DynamoDBClient, UpdateTableCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

/**
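 * Ahead of a launch, warm the table so it can immediately serve 50,000 reads and 20,000 writes per second.
 * The billing mode (on-demand/provisioned) is not changed. Note that the value cannot be lowered once raised.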
 */
export async function preWarm(tableName: string): Promise<void> {
  await client.send(
    new UpdateTableCommand({
      TableName: tableName,
      WarmThroughput: {
        ReadUnitsPerSecond: 50_000,
        WriteUnitsPerSecond: 20_000,
      },
    }),
  );
}

The criterion is simple. If you know "more than 10× normal traffic comes at a specific date/time" (sale, launch, etc.), pre-warm. A constantly-smooth service doesn't need it.

6. Auto Scaling Design: Receive a Smooth Load Cheaply

If you choose provisioned, DynamoDB Auto Scaling (= AWS Application Auto Scaling) is nearly mandatory. Pin capacity manually and you get either over-reservation (high cost) or shortage (throttling).

The mechanism and the official numbers:

Target tracking: auto-adjusts reserved capacity with UpdateTable so consumed capacity approaches the target utilization. The target utilization can be set at 20–90%. The standard is 70%.
Scale-up: fires when consumption exceeds the target for 2 consecutive minutes.
Scale-down: fires when consumption stays below the target for 15 consecutive data points. It is deliberately cautious about lowering, which prevents dropping capacity in a short valley and throttling on the spike right after.
Short-time spikes are absorbed not by a capacity change but by the table's built-in burst capacity.
A GSI has separate capacity. If you put Auto Scaling on a table, always apply the same settings to its GSIs too (the official docs strongly recommend this). Note also that Auto Scaling on a new GSI doesn't take effect until backfill completes.
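The steady-state arithmetic behind target tracking is simple: provision enough that current consumption sits at the target utilization, clamped to the configured bounds. The sketch below shows only that arithmetic. It is not how Application Auto Scaling is implemented (the real service acts through CloudWatch alarms with the delays listed above), and the names are hypothetical.

```typescript
interface ScalingBounds {
  minCapacity: number;
  maxCapacity: number;
}

/**
 * Capacity at which `consumedUnitsPerSecond` equals the target utilization,
 * clamped to [minCapacity, maxCapacity].
 */
function desiredCapacity(
  consumedUnitsPerSecond: number,
  targetUtilizationPercent: number,
  bounds: ScalingBounds,
): number {
  if (targetUtilizationPercent < 20 || targetUtilizationPercent > 90) {
    throw new RangeError("target utilization must be between 20 and 90 percent");
  }
  const raw = Math.ceil((consumedUnitsPerSecond * 100) / targetUtilizationPercent);
  return Math.min(bounds.maxCapacity, Math.max(bounds.minCapacity, raw));
}
```

At a 70% target, 700 consumed units/sec maps to 1,000 provisioned units. Consumption of 5,000 units/sec with a ceiling of 4,000 stays pinned at 4,000, which is the point where throttling starts and the ceiling needs a second look.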

A production setup in Terraform (a table + scaling policies for both read and write):

resource "aws_dynamodb_table" "app" {
  name         = "AppTable"
  billing_mode = "PROVISIONED"
  hash_key     = "PK"
  range_key    = "SK"

  read_capacity  = 5 # Floor. Auto Scaling raises it according to demand
  write_capacity = 5

  attribute {
    name = "PK"
    type = "S"
  }
  attribute {
    name = "SK"
    type = "S"
  }

  # Auto-delete expired items for "free" (see Chapter 7)
  ttl {
    attribute_name = "expiresAt"
    enabled        = true
  }

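  point_in_time_recovery {
    enabled = true
  }

  # Auto Scaling changes capacity outside Terraform; ignore that drift so
  # `terraform apply` does not reset capacity to the floor values above.
  lifecycle {
    ignore_changes = [read_capacity, write_capacity]
  }
}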

# --- Auto Scaling for write capacity ---
resource "aws_appautoscaling_target" "write" {
  service_namespace  = "dynamodb"
  resource_id        = "table/${aws_dynamodb_table.app.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  min_capacity       = 5
  max_capacity       = 4000 # Ceiling for cost and accidents. Adjust to the load forecast
}

resource "aws_appautoscaling_policy" "write" {
  name               = "${aws_dynamodb_table.app.name}-write-target-tracking"
  service_namespace  = aws_appautoscaling_target.write.service_namespace
  resource_id        = aws_appautoscaling_target.write.resource_id
  scalable_dimension = aws_appautoscaling_target.write.scalable_dimension
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBWriteCapacityUtilization"
    }
    target_value = 70.0 # A 70% target utilization is the standard
  }
}

# Define the target/policy for reads the same way (predefined_metric_type is
# "DynamoDBReadCapacityUtilization", scalable_dimension is ...:ReadCapacityUnits).
# If the table has GSIs, always add the same 4 resources for each GSI too.

Here too, know the limits. Auto Scaling reacts with a delay of several minutes because it goes through a CloudWatch alarm, and UpdateTable itself takes several minutes. It cannot keep up with a steep spike, so the right answer is to absorb short peaks with burst capacity and unpredictable peaks with on-demand. "Auto Scaling is there, so spikes are fine" is a misunderstanding.
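How long can burst capacity carry a spike while Auto Scaling catches up? A back-of-the-envelope model: the table banks unused capacity for up to 300 seconds and spends it on whatever exceeds the provisioned rate. This is a deliberately simplified sketch with hypothetical names. It assumes the load sat at a flat baseline for the previous 5 minutes, and burst is best-effort, so treat the result as an optimistic bound.

```typescript
const BURST_WINDOW_SECONDS = 300; // up to 5 minutes of unused capacity is retained

/**
 * Seconds a spike can be absorbed by banked burst capacity.
 * All rates are capacity units per second for the whole table.
 */
function burstAbsorbSeconds(
  provisioned: number,
  baselineConsumed: number,
  spikeConsumed: number,
): number {
  if (spikeConsumed <= provisioned) return Infinity; // no burst needed
  const banked = Math.max(0, provisioned - baselineConsumed) * BURST_WINDOW_SECONDS;
  return banked / (spikeConsumed - provisioned);
}
```

With 1,000 provisioned, a 700 baseline, and a spike to 1,500, the bank of 90,000 units lasts about 180 seconds, which is in the same range as the several-minute Auto Scaling delay. A table already running at 100% has nothing banked and throttles at once.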

7. Practical Cost-Optimization Techniques

Now turn the understanding so far into measures that actually work, in order of effectiveness.

(1) "Free" Auto-Deletion with TTL

DynamoDB's TTL gives each item an expiry (a Number attribute holding Unix epoch seconds) and automatically deletes expired items without consuming write throughput. The official docs state it plainly.

DynamoDB automatically deletes expired items within a few days of their expiration time, without consuming write throughput.

For data with a lifespan, such as sessions, temporary tokens, caches, and logs, the iron rule is to delete with TTL (free), not DeleteItem (paid). It also reduces storage fees. But there are 2 cautions:

Deletion happens "within a few days," not instantly. An expired-but-not-yet-deleted item can still appear in reads, queries, and scans, so also exclude it on the application side with FilterExpression.
A TTL deletion flows to Streams as a service deletion, and the item is removed from LSIs/GSIs too. In Global Tables, the deletion in the origin region is WCU-free, but replicating the deletion to replica regions is charged as replicated writes.
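In application code this comes down to two helpers: one that stamps the expiry when writing, and one that hides expired-but-not-yet-deleted items when reading. A minimal sketch follows. The attribute name `expiresAt` matches the Terraform in this article, the function names are hypothetical, and the filter fragment assumes the DocumentClient used in the earlier examples (plain JavaScript values rather than typed AttributeValues).

```typescript
interface ExpiringItem {
  expiresAt: number; // Unix epoch seconds, the TTL attribute
}

/** Epoch-seconds expiry for an item that should live `lifetimeSeconds` from `now`. */
function ttlEpochSeconds(now: Date, lifetimeSeconds: number): number {
  return Math.floor(now.getTime() / 1000) + lifetimeSeconds;
}

/** In-memory check for items that were already fetched. */
function isLive(item: ExpiringItem, now: Date): boolean {
  return item.expiresAt > Math.floor(now.getTime() / 1000);
}

/** Expression fragment to spread into a Query/Scan input to drop expired items. */
function liveOnlyFilter(now: Date) {
  return {
    FilterExpression: "#exp > :now",
    ExpressionAttributeNames: { "#exp": "expiresAt" },
    ExpressionAttributeValues: { ":now": Math.floor(now.getTime() / 1000) },
  };
}
```

Keep in mind the counting trap from Chapter 2: the filter makes results correct, but the expired items it drops were still read and still billed.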

(2) Make "Rarely-Read Data" Cheap with the Standard-IA Table Class

If storage is large but access is rare (audit logs, old order history, etc.), the Standard-IA table class lowers the storage unit price from $0.25 → $0.10 / GB·month (in exchange, the read/write unit price rises). It works for storage-dominant, low-throughput tables.

(3) Keep Items Small (the Heart of Read Units)

As in Chapter 2, read cost is decided by item size. Shorten attribute names (createdAt→ca, etc.) and offload huge BLOBs to S3, holding only references. One item is up to 400KB, but that's an "upper bound," not a "target."
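Attribute names count toward item size, which is why shortening them helps. The estimator below is a rough sketch for items whose attributes are all strings, where size is the UTF-8 length of each name plus the UTF-8 length of each value. Numbers, binary, and nested types are encoded differently and are not modeled here. The helper names are hypothetical.

```typescript
/** UTF-8 byte length of a string, computed per code point. */
function utf8Length(s: string): number {
  let bytes = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0)!;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

/** Approximate size of an item whose attributes are all strings. */
function estimateStringItemBytes(item: Record<string, string>): number {
  let total = 0;
  for (const [name, value] of Object.entries(item)) {
    total += utf8Length(name) + utf8Length(value);
  }
  return total;
}
```

Renaming `createdAt` to `ca` saves 7 bytes per item. That looks trivial, but across a dozen attributes it can be what keeps an item just under a 1KB or 4KB rounding boundary. For exact numbers, measure with ReturnConsumedCapacity.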

(4) Sparse GSIs and Minimizing Projected Attributes

A GSI adds storage and write cost in proportion to its projected attributes. Keep the index small and cheap by projecting only the attributes you need and by setting the GSI key only on the relevant items (a sparse index). For the design details, see the single-table design guide.
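A sparse index is built at write time: an item appears in the GSI only if it carries the GSI's key attributes. Here is a minimal sketch of an item builder that indexes only pending orders. The `Order` shape, the key formats, and the `GSI1PK` / `GSI1SK` attribute names are all hypothetical, chosen for illustration.

```typescript
interface Order {
  orderId: string;
  tenantId: string;
  status: "PENDING" | "PAID" | "CANCELLED";
  amount: number;
}

/** Build the DynamoDB item. Only PENDING orders get GSI key attributes. */
function toItem(order: Order): Record<string, string | number> {
  const item: Record<string, string | number> = {
    PK: `TENANT#${order.tenantId}`,
    SK: `ORDER#${order.orderId}`,
    status: order.status,
    amount: order.amount,
  };
  if (order.status === "PENDING") {
    // Items without these attributes are simply absent from the GSI,
    // so they add no index storage and no index write cost.
    item.GSI1PK = `PENDING#${order.tenantId}`;
    item.GSI1SK = order.orderId;
  }
  return item;
}
```

When an order moves from PENDING to PAID, removing the two GSI attributes in the same update drops it out of the index, so the index stays as small as the working set.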

(5) Erase Scan from the Design

A Scan on a production hot path is the biggest anti-pattern, because it is charged for reading the whole table. Determine the access patterns first, and design keys so you can fetch with Query (via a GSI if needed). For analytics-style full-table processing, don't Scan DynamoDB directly; export to S3 and query with Athena/Glue, which is cheaper and faster.

(6) Backup Cost Is a Design Target Too

PITR (continuous backup) is $0.20 / GB·month, and on-demand backup is $0.10 / GB·month. Important tables are worth PITR, but enabling it unconditionally on all tables can be excessive. Choose what to protect.
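Putting the storage and PITR prices side by side shows why backup is worth a deliberate decision. The sketch below uses the per-GB prices quoted in this article, ignores the free tier, and assumes the PITR rate is the same for both table classes (an assumption to verify on the pricing page). The names are hypothetical.

```typescript
// Prices quoted in this article (US East). Re-check the official pricing page.
const STORAGE_PER_GB_MONTH = { standard: 0.25, standardIA: 0.1 } as const;
const PITR_PER_GB_MONTH = 0.2;

type TableClass = keyof typeof STORAGE_PER_GB_MONTH;

/** Monthly storage plus optional PITR cost for a table of `tableGb` gigabytes. */
function monthlyStorageCost(tableGb: number, tableClass: TableClass, pitrEnabled: boolean): number {
  const storage = tableGb * STORAGE_PER_GB_MONTH[tableClass];
  const pitr = pitrEnabled ? tableGb * PITR_PER_GB_MONTH : 0;
  return storage + pitr;
}
```

For a 500GB Standard table, storage alone is about $125/month and PITR adds about $100 on top, nearly doubling the line item. On large, low-value tables that is the number to weigh against the recovery guarantee.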

8. Observability: Notice Before It Clogs

Capacity design is not "set it and forget it"; you observe and iterate. The minimum CloudWatch metrics to watch and alarms to set:

Metric	What it indicates	Action
`ThrottledRequests` / `ReadThrottleEvents` / `WriteThrottleEvents`	Throttling occurring	Immediate alert. Capacity shortage or a hot key
`ConsumedReadCapacityUnits` / `ConsumedWriteCapacityUnits`	Actual consumption	The basis for the mode/capacity decision
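`OnlineIndexConsumedWriteCapacity`	Write capacity consumed while a new GSI is being built (backfill)	Watch during GSI creation. For a GSI's steady-state consumption, use `ConsumedWriteCapacityUnits` with the `GlobalSecondaryIndexName` dimension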
`AccountProvisionedReadCapacityUtilization`, etc.	Approaching the account limit	The judgment for a quota-increase request

A minimal alarm against ThrottledRequests (Terraform):

resource "aws_cloudwatch_metric_alarm" "ddb_throttle" {
  alarm_name          = "${aws_dynamodb_table.app.name}-throttled-requests"
  namespace           = "AWS/DynamoDB"
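  metric_name         = "ThrottledRequests"
  # Note: ThrottledRequests is published per operation (TableName + Operation dimensions).
  # If this alarm never receives data, add an Operation dimension or alarm on
  # ReadThrottleEvents / WriteThrottleEvents instead.
  dimensions          = { TableName = aws_dynamodb_table.app.name }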
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 1
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.alerts.arn] # Forward to Slack/PagerDuty
}

When you don't know which key is throttling, enable CloudWatch Contributor Insights for DynamoDB to identify the most-consuming partition key (= the hot key). Alert on the symptom (throttling), not the cause (code), and trace the culprit with Contributor Insights. This routine keeps the platform running.

9. Summary: The Production Table's IaC (On-Demand Version)

Finally, here's the "when in doubt, use this" production table definition, condensing the decisions so far. Start on on-demand, guard against runaway billing with the maximum throughput, enable TTL and PITR, and evolve to GSIs and provisioned when needed. That's the starting point.

resource "aws_dynamodb_table" "app" {
  name         = "AppTable"
  billing_mode = "PAY_PER_REQUEST" # On-demand = the default and recommended mode
  hash_key     = "PK"
  range_key    = "SK"

  attribute {
    name = "PK"
    type = "S"
  }
  attribute {
    name = "SK"
    type = "S"
  }

  # Guardrail that structurally prevents a billing explosion from accidents or bugs.
  # Leave headroom above the expected peak, but don't leave it uncapped.
  on_demand_throughput {
    max_read_request_units  = 50000
    max_write_request_units = 20000
  }

  # Auto-delete data with a lifespan for "free" (no WCU paid for DeleteItem)
  ttl {
    attribute_name = "expiresAt"
    enabled        = true
  }

  point_in_time_recovery {
    enabled = true
  }

  # Deletion protection: stops accidental deletion of a production table
  deletion_protection_enabled = true

  tags = {
    Environment = "production"
    CostCenter  = "platform"
  }
}

FAQ

Q. On-demand or provisioned: which is cheaper in the end? A. For a steady load where you can consume the reserved capacity at sustained high utilization (around 70% as a guide), provisioned (+ reserved) is cheaper; for spiky, unpredictable, or low-utilization loads, on-demand is cheaper and easier. At list price, on-demand is about 3.46× provisioned at 100% utilization, and the break-even is roughly 30% utilization. Measure first, then decide.

Q. What is a hot partition? How do I prevent it? A. It is a state where access concentrates on a specific partition key, hits that one partition's limit (3,000 read units / 1,000 write units per second), and throttles. Handle it with a high-cardinality key design and, for unavoidable concentration, write sharding (attaching a suffix to the key to distribute it). Adaptive Capacity mitigates it automatically, but cannot exceed the partition limit.

Q. Must I avoid Scan? A. As a rule, yes, on a production online path. Scan is charged by what was evaluated, not what was returned (effectively the whole table), and is slow and expensive. Decide the access patterns first, and design so you can fetch with Query/GSI. Full-table analytics belongs in S3 export + Athena.

Q. Is warm-throughput pre-warming mandatory? A. Normally it's unnecessary. Only when you know a surge of more than 10× normal is coming at a specific date/time (sale, launch, etc.), pre-warm in advance to avoid the "2× the prior peak / 30 minutes" constraint. Note that a once-raised value can't be lowered, and the pre-warming operation is charged.

Q. Can I change the billing mode later? A. You can. Provisioned→on-demand up to 4 times in 24 hours, on-demand→provisioned anytime. The switch takes several minutes, during which it's served at a throughput matched to the immediately-prior capacity.

Q. Does an item become inaccessible the moment its TTL expires? A. No. Deletion happens within a few days of expiry, not instantly. An expired-but-not-yet-deleted item can still appear in reads, so also exclude it on the application side with FilterExpression. The deletion itself is WCU-free.

In Closing: Cost and Performance Are Decided by "Design"

DynamoDB's pricing and speed are mostly decided not by savings tricks during operation but by the initial capacity design.

Choose the mode by the load's shape: spike/unpredictable is on-demand, steady/high-utilization is provisioned + reserved.
Measure consumption: measure with ReturnConsumedCapacity, and eliminate the "hidden billing" of UpdateItem, failed conditional writes, Scan, and FilterExpression.
Respect the partition limit by design: spread load under the 3,000/1,000 ceiling with high cardinality and write sharding, and don't rely entirely on Adaptive Capacity.
Use the free weapons: TTL deletion, burst, Adaptive Capacity, and default warm throughput are powerful when used correctly (pre-warming beyond the default is charged).
Observe and iterate: alert immediately on ThrottledRequests, and identify the hot key with Contributor Insights.

The design of correctness (idempotency, atomicity, consistency) is covered in the DynamoDB Single-Table Design & Production Reliability Patterns Complete Guide, and its application to a real payment platform in Designing "zero double charges" in a serverless payment foundation. If you want DynamoDB to run fast, cheap, and safe in production, I'll work out that design with you, matched to your requirements.

Related articles

DynamoDB Single-Table Design & Production Reliability Patterns — The Complete Guide (2026 Edition): Idempotency, Conditional Writes, and Transactions in Real Code

DynamoDB Global Tables × Multi-Region × Disaster Recovery (DR) Complete Guide (2026 Edition): MREC/MRSC Consistency, Conflict Resolution, RTO/RPO Design, PITR, Cost

DynamoDB Security Complete Guide (2026 Edition): IAM Least Privilege, Fine-Grained Access Control (LeadingKeys), Encryption at Rest/in Transit, VPC Endpoints

DynamoDB Streams × Event-Driven Architecture / CDC Complete Guide (2026 Edition): Safely Propagating Change Data with Lambda and EventBridge Pipes

Also worth reading

Vercel cost-optimization guide: understand the Active CPU pricing model and lower your bill

Azure Container Apps vs AWS ECS on Fargate: a thorough serverless-container comparison (scale-to-zero, GPU, cost, migration)

Designing 'zero double charges' in a serverless payment foundation — implementing idempotency, atomicity, and zero-downtime migration with DynamoDB
