# Crushing Lambda cold starts in production: choosing among execution-environment reuse, SnapStart, and provisioned concurrency

> An implementation guide to suppressing AWS Lambda cold starts at production quality. With code grounded in the official AWS specifications and a decision tree, it covers:
>
> - what the INIT phase really is, and the August 2025 INIT billing change
> - connection reuse, package reduction, and Arm64
> - SnapStart (Java/Python/.NET) snapshot restore and its uniqueness pitfall
> - provisioned concurrency with Application Auto Scaling target tracking
> - the Hyperplane ENI for VPC functions

- Published: 2026-06-25
- Author: 友田 陽大
- Tags: AWS, Lambda, Serverless, Performance, Cost optimization
- URL: https://tomodahinata.com/en/blog/aws-lambda-cold-start-snapstart-provisioned-concurrency-performance-guide
- Category: AWS Lambda in production
- Pillar guide: https://tomodahinata.com/en/blog/aws-lambda-production-guide

## Key points

- Cold starts typically hit under 1% of invocations, but they decide the tail (P99). Start with what costs nothing extra: reuse connections outside the handler, shrink the package, and switch to Arm64.
- SnapStart restores a pre-initialized Firecracker snapshot. It supports Java 11+, Python 3.12+, and .NET 8+. Java has no extra charge, while Python/.NET pay cache and restore fees. Node.js and Ruby are not supported.
- SnapStart's biggest trap is uniqueness. Because the snapshot is reused, regenerate random numbers, tokens, and timestamps in an afterRestore hook or inside the handler.
- If two-digit-millisecond responses are required, use provisioned concurrency + Application Auto Scaling target tracking (track ProvisionedConcurrencyUtilization at 70%).
- Decision order: free optimizations first → SnapStart if you run at scale on a supported runtime → provisioned concurrency only for a strict latency SLA. SnapStart and provisioned concurrency can't be used on the same function version.

---

"Lambda, sometimes the very first call is super slow" — what you always hit in production is cold starts. A payment-confirmation API, a form submission the user is waiting on, an internal API with an SLA — when this appears at a **place where P99 latency matters**, it directly hits perceived quality.

This article is an implementation guide to **suppressing AWS Lambda cold starts at production quality.** It starts from the basics that cost nothing. It then uses a decision tree to organize when to play the two bigger cards, **SnapStart** and **provisioned concurrency**, and it covers the VPC pitfall. The big picture of running Lambda in production (execution model, idempotency, observability, security, cost) is in the sister article [AWS Lambda production-operations guide](/blog/aws-lambda-production-guide). This article concentrates on the **single goal of making it fast.**

> **Rules for this article**: specs, parameter names, and defaults are based on the **AWS official documentation (as of June 2026).** SnapStart's supported runtimes, pricing, and scaling limits change over time. Always confirm the latest values in the official docs (see "References" at the end) before rolling out to production.

---

## 0. Mental model: a cold start = "the time to make a new execution environment"

First, pin down what a cold start actually is. Lambda serves each request from an **execution environment**. When no reusable environment is available, it creates a new one. **This preparation time is the cold start.**

- **Cold start = code download + environment setup + INIT (initialization code outside the handler).** AWS documentation says cold starts typically occur in **under 1%** of invocations and last from **under 100 ms to over 1 second.**
- **The problem isn't the average but the tail (P99/P100).** Even if 99% are fast, if the 1% kept waiting is "payment confirmation," the business impact is large.
- **There are three moves**: ① **free optimizations** (connection reuse, package reduction, Arm64) ② **SnapStart** (restore a pre-initialized snapshot) ③ **provisioned concurrency** (keep environments warm in advance).
- **Since August 1, 2025, the INIT phase is billed as well.** Reducing cold starts improves **cost** as well as latency.

This article follows the decision priority exactly: **free first, then SnapStart, provisioned last** (diagrammed in chapter 5).

---

## 1. First measure: visualize cold starts

Measure before optimizing. Paying to keep environments warm on a hunch is the most expensive mistake. A cold start shows up as the `Init Duration` field in the `REPORT` line of CloudWatch Logs.

```text
REPORT RequestId: ...  Duration: 12.34 ms  Billed Duration: 13 ms
  Memory Size: 1024 MB  Max Memory Used: 95 MB  Init Duration: 320.45 ms
```

An invocation whose REPORT line **has** an `Init Duration` field is a cold start. One without it is a warm start. Use Logs Insights to get the ratio and the P99.

```sql
# Cold starts vs. invocations, and the Init Duration distribution (CloudWatch Logs Insights)
filter @type = "REPORT"
| stats count(*) as invocations,
        sum(strcontains(@message, "Init Duration")) as cold_starts,
        avg(@initDuration) as avg_init_ms,
        pct(@initDuration, 99) as p99_init_ms
```
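The same numbers can be computed offline if you have REPORT lines exported as text. Below is a minimal sketch, assuming each REPORT record arrives as a single line in the format shown above. The helper name `cold_start_stats` is hypothetical, and the nearest-rank percentile is an illustrative choice, not an AWS tool.

```python
import math
import re

# Matches the "Init Duration: 320.45 ms" field, which appears only on cold starts.
INIT_RE = re.compile(r"Init Duration: ([0-9.]+) ms")


def cold_start_stats(lines):
    """Return (invocations, cold_starts, p99_init_ms) from raw log lines.

    Only lines starting with "REPORT" count as invocations. The p99 uses the
    nearest-rank method and is None when there were no cold starts.
    """
    invocations = 0
    inits = []
    for line in lines:
        if not line.startswith("REPORT"):
            continue
        invocations += 1
        match = INIT_RE.search(line)
        if match:
            inits.append(float(match.group(1)))
    if not inits:
        return invocations, 0, None
    inits.sort()
    rank = math.ceil(0.99 * len(inits))  # nearest-rank percentile (1-based)
    return invocations, len(inits), inits[rank - 1]
```

Usage: for one REPORT line with `Init Duration: 320.45 ms` and one without, `cold_start_stats(lines)` returns `(2, 1, 320.45)`.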

Enable **X-Ray active tracing** and each cold start appears as an `Initialization` subsegment. This lets you trace which initialization step (SDK, DB connection, config fetch) is heavy. **First identify where optimization will actually pay off**, then choose the move.

---

## 2. Optimizations that work for free: connection reuse, package reduction, Arm64

Before reaching for SnapStart or provisioned concurrency, exhaust **the basics that cost nothing extra.** For many workloads, this alone is enough.

### 2.1 Heavy init once outside the handler (connection reuse)

An execution environment is frozen between invocations and reused with its memory intact. As the AWS docs state, **objects declared outside the handler stay initialized across warm invocations.** Create DB connections and SDK clients and fetch config once **outside the handler**, then reuse them. This cuts the execution time of every warm invocation. Deferring heavy, not-always-needed work to lazy initialization (as with the config below) also keeps the cold-start INIT light.

```ts
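// Good: create connections/clients once outside the handler (during INIT); warm invocations reuse them.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

// Hypothetical placeholder types and helpers for this example; replace with your own.
type MyEvent = { orderId: string };
type AppConfig = { tableName: string };
declare function loadConfig(client: DynamoDBDocumentClient): Promise<AppConfig>;
declare function handleEvent(event: MyEvent, config: AppConfig, client: DynamoDBDocumentClient): Promise<unknown>;

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({})); // created once per execution environment
let configPromise: Promise<AppConfig> | undefined;               // heavy config is lazily initialized

export const handler = async (event: MyEvent) => {
  // Runs only on the first invocation; later warm invocations reuse the same Promise.
  // On failure, clear the cache so the next invocation retries instead of reusing the rejection.
  configPromise ??= loadConfig(ddb).catch((err) => {
    configPromise = undefined;
    throw err;
  });
  const config = await configPromise;
  return handleEvent(event, config, ddb);
};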
```

```python
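# Bad: creating the client inside the handler pays the connection cost and extra execution time on every call (don't do this)
import boto3

def lambda_handler(event, context):
    ddb = boto3.client("dynamodb")  # created on every call; move it outside the handler
    ...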
```
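For contrast, here is a minimal sketch of the good pattern in Python. To keep it runnable without AWS credentials, `make_client` is a hypothetical stand-in for an expensive constructor such as `boto3.client("dynamodb")`, and `load_config` is a hypothetical config loader. The point is only that module-level code runs once per execution environment, while the handler runs on every invocation.

```python
import time

INIT_CALLS = 0  # counts how many times the expensive setup actually ran


def make_client():
    """Hypothetical stand-in for boto3.client("dynamodb") or a DB connection."""
    global INIT_CALLS
    INIT_CALLS += 1
    time.sleep(0.01)  # pretend the connection setup is slow
    return {"created_at": time.monotonic()}


# Good: runs once during INIT; warm invocations reuse the same object.
client = make_client()
_config = None  # heavy config, loaded lazily on first use


def load_config():
    """Hypothetical loader for settings that are not needed during INIT."""
    return {"table": "orders"}


def lambda_handler(event, context):
    global _config
    if _config is None:  # only the first invocation in this environment pays for it
        _config = load_config()
    return {"client_id": id(client), "table": _config["table"]}
```

Calling `lambda_handler` repeatedly in the same process returns the same `client_id`, and `INIT_CALLS` stays at 1.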

### 2.2 Make the package small, narrow the dependencies

Code download is part of the cold start. **The smaller the package, the faster.**

- **With AWS SDK for JavaScript v3, import only the client packages you need** (`@aws-sdk/client-dynamodb`, etc.). Don't pull in the whole SDK.
- **Tree-shake and bundle with esbuild or a similar bundler.** Avoid shipping `node_modules` wholesale.
- **AWS does not recommend layers for Go or Rust.** Both compile to a single executable, and bundling dependencies into it is better for cold starts.
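To find out what is actually inflating a package, a quick breakdown of the deployment .zip helps. Below is a minimal sketch using only the standard library. `package_breakdown` is a hypothetical helper that groups files by top-level directory and ranks the groups by uncompressed size. It treats each `node_modules/<package>` (including scoped `@scope/<package>`) as one entry.

```python
import io
import zipfile
from collections import Counter


def package_breakdown(zip_bytes, top_n=5):
    """Return the largest top-level entries in a deployment .zip by uncompressed size."""
    sizes = Counter()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")
            if parts[0] == "node_modules" and len(parts) > 2:
                # Scoped packages (@scope/name) need one extra path segment.
                depth = 3 if parts[1].startswith("@") and len(parts) > 3 else 2
                key = "/".join(parts[:depth])
            else:
                key = parts[0]
            sizes[key] += info.file_size
    return sizes.most_common(top_n)
```

Usage, with a hypothetical file name: `with open("function.zip", "rb") as f: print(package_breakdown(f.read()))`.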

### 2.3 Switch to Arm64 (Graviton2)

If your code and dependencies are compatible, **Arm64 is almost always a win.** **Its duration price is 20% lower than x86**, and many workloads also run faster. Often the only change is the build target, although native dependencies must be rebuilt for arm64. It helps **cost** more than latency itself, but cold-start work and cost optimization point in the same direction.

### 2.4 Raise memory to "finish fast"

CPU is allocated in proportion to memory, and **1,769 MB corresponds to one vCPU.** If initialization or processing is CPU-bound, raising memory can **shorten both the cold start and the execution time.** Decide by **measuring, not guessing.** The open-source AWS Lambda Power Tuning tool automates the measurement.
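Duration is billed in GB-seconds, so the trade-off is simple arithmetic: doubling memory costs the same if it halves duration. Below is a minimal sketch. The price per GB-second is a parameter you fill in from the pricing page, and no real rate is assumed. The vCPU share is just the linear 1,769 MB ratio above, used as an approximation.

```python
VCPU_MB = 1769  # memory at which a function gets the equivalent of one vCPU


def vcpu_share(memory_mb):
    """Approximate vCPU equivalents for a memory setting (linear in memory)."""
    return memory_mb / VCPU_MB


def duration_cost(memory_mb, duration_ms, price_per_gb_s, invocations=1):
    """GB-seconds times a caller-supplied price.

    Request fees and the rounding of billed duration up to 1 ms are ignored.
    """
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    return gb_seconds * price_per_gb_s
```

For example, suppose going from 1,024 MB to 2,048 MB cuts a CPU-bound 400 ms run to 200 ms. Then `duration_cost` returns the same value for both settings, so the faster one is free. If the run only drops to 300 ms, the larger setting costs 1.5× as much.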

---

## 3. SnapStart: restore a pre-initialized snapshot

When the free optimizations aren't enough — especially when the tail is unacceptable for **an init-heavy runtime like the JVM** — the first move is **SnapStart.**

### 3.1 Mechanism and supported runtimes

When you publish a function version, SnapStart **runs initialization once. It then takes a snapshot of the initialized execution environment (memory and disk state) in the Firecracker microVM, encrypts it, and caches it.** On later cold starts, Lambda skips re-running initialization and **restores from the cached snapshot (Restore)**, which shortens the cold start. Picture a **Restore phase** added to the lifecycle.
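Enabling SnapStart is a configuration change followed by publishing a version. Below is a minimal sketch with the boto3 Lambda client. It uses `update_function_configuration` with `SnapStart={"ApplyOn": "PublishedVersions"}`, the `function_updated_v2` and `published_version_active` waiters, and `publish_version`. The client is passed in, so the helper (a hypothetical name) has no hard dependency on credentials.

```python
def enable_snapstart_and_publish(lambda_client, function_name):
    """Turn on SnapStart and publish a version; returns the new version number.

    lambda_client is a boto3 Lambda client, e.g. boto3.client("lambda").
    """
    # SnapStart only applies to published versions, never to $LATEST.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )
    # Wait for the configuration update to finish before publishing.
    lambda_client.get_waiter("function_updated_v2").wait(FunctionName=function_name)

    # Publishing runs INIT once and takes the snapshot for this version.
    version = lambda_client.publish_version(FunctionName=function_name)["Version"]
    # The new version stays Pending while the snapshot is being created.
    lambda_client.get_waiter("published_version_active").wait(
        FunctionName=function_name, Qualifier=version
    )
    return version
```

Then point an alias such as `live` at the returned version (for example with `update_alias`) and invoke through the alias.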

| Runtime | SnapStart supported | Pricing (official) |
| --- | --- | --- |
| **Java 11 or later** | ✅ | **no extra charge** |
| **Python 3.12 or later** | ✅ | cache fee (min 3 hours) + restore fee |
| **.NET 8 or later** | ✅ | cache fee (min 3 hours) + restore fee |
| Node.js / Ruby / OS-only / container image | ❌ | — |

The key point: **Java has no extra charge**, while **Python/.NET pay to keep the snapshot cached plus a fee per restore.** Some other restrictions apply:

- **SnapStart applies only to published versions (and aliases pointing to them)**, not `$LATEST`.
- It **can't be combined with provisioned concurrency**.
- It can't be used with ephemeral storage (`/tmp`) larger than 512 MB, Amazon EFS, and some other features.
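These constraints can be checked before rollout. Below is a minimal sketch that encodes only the limits named in this article, so the list is not exhaustive. The function name and parameters are hypothetical. Runtime identifiers follow Lambda's naming (`java21`, `python3.12`, `dotnet8`).

```python
import re


def snapstart_blockers(runtime, qualifier, ephemeral_storage_mb=512,
                       uses_efs=False, has_provisioned_concurrency=False,
                       package_type="Zip"):
    """Return reasons SnapStart can't be used; an empty list means no blocker was found."""
    blockers = []
    if package_type != "Zip":
        blockers.append("container images are not supported")
    m = re.fullmatch(r"(java|python|dotnet)(\d+)(?:\.(\d+))?", runtime)
    if not m:
        blockers.append(f"runtime {runtime} is not supported")
    else:
        family, major, minor = m.group(1), int(m.group(2)), int(m.group(3) or 0)
        ok = (
            (family == "java" and major >= 11)
            or (family == "python" and (major, minor) >= (3, 12))
            or (family == "dotnet" and major >= 8)
        )
        if not ok:
            blockers.append(f"runtime {runtime} is below the supported minimum")
    if qualifier == "$LATEST":
        blockers.append("SnapStart applies only to published versions, not $LATEST")
    if has_provisioned_concurrency:
        blockers.append("cannot be combined with provisioned concurrency")
    if ephemeral_storage_mb > 512:
        blockers.append("ephemeral storage (/tmp) above 512 MB is not supported")
    if uses_efs:
        blockers.append("Amazon EFS is not supported")
    return blockers
```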

### 3.2 The biggest trap: uniqueness (the snapshot is reused)

This is the trap everyone **will** step on with SnapStart. The snapshot **freezes the state at initialization time and reuses it**. As a result, a value generated during initialization and meant to be unique **becomes identical in every environment restored from that snapshot.** This applies to random seeds, UUIDs, tokens, and cached timestamps.

The remedy is **runtime hooks.** Re-create "unique, volatile values" after restore (afterRestore).

```java
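// Java (CRaC: org.crac). Recreate the random source in afterRestore.
import java.security.SecureRandom;

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

// The Lambda handleRequest method is omitted to keep the focus on the hooks.
public class Handler implements Resource {
  private SecureRandom random = new SecureRandom();

  public Handler() {
    // Register so the runtime calls beforeCheckpoint/afterRestore around the snapshot
    Core.getGlobalContext().register(this);
  }

  @Override public void beforeCheckpoint(Context<? extends Resource> c) { /* e.g. close connections */ }

  @Override public void afterRestore(Context<? extends Resource> c) {
    // After restore, re-acquire uniqueness/entropy so every environment doesn't share the same value
    this.random = new SecureRandom();
  }
}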
```

```python
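# Python (snapshot-restore-py, bundled with the Python managed runtime)
import secrets

from snapshot_restore_py import register_after_restore

# Generated during INIT, so it is frozen into the snapshot and shared by all restored environments
random_seed = secrets.randbits(64)

@register_after_restore
def reseed():
    # After restore, regenerate randomness and temporary data (re-open DB connections here too)
    global random_seed
    random_seed = secrets.randbits(64)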
```

> **Three cautions from AWS**: ① **uniqueness**: after restore, re-create unique content generated during init. ② **network connections**: connection state isn't guaranteed after restore, so verify and re-establish connections (AWS SDK connections usually recover automatically). ③ **temporary data**: refresh temporary credentials and cached timestamps inside the handler before using them.
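Cautions ① and ③ can also be handled inside the handler. Below is a minimal Python sketch. `fetch_credentials` is a hypothetical placeholder for whatever issues your temporary credentials (for example an STS call), and the 60-second refresh margin is an arbitrary choice. The expiry check reads the clock at invocation time, so credentials frozen in the snapshot are refreshed once they are stale.

```python
import time
import uuid

# Anti-pattern under SnapStart: computed once during INIT, so it is baked into the
# snapshot and every restored environment starts with the same value.
BOOT_ID = str(uuid.uuid4())

_creds = None  # cached temporary credentials: (value, expires_at epoch seconds)


def fetch_credentials():
    """Hypothetical stand-in for an STS call; returns a value and its expiry time."""
    return ("token-" + uuid.uuid4().hex, time.time() + 900)


def get_credentials(now=None):
    """Return cached credentials, refreshing them if missing or close to expiry."""
    global _creds
    now = time.time() if now is None else now
    if _creds is None or _creds[1] - now < 60:  # refresh with a 60-second margin
        _creds = fetch_credentials()
    return _creds[0]


def lambda_handler(event, context):
    # Generate per-request unique values inside the handler, never at INIT.
    request_id = str(uuid.uuid4())
    started_at = time.time()  # read the clock now; a timestamp cached at INIT would be stale
    return {"request_id": request_id, "started_at": started_at, "token": get_credentials()}
```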

### 3.3 When SnapStart works

SnapStart pays off best on **functions that scale out and are invoked frequently**, because the one-time initialization at publish is amortized over many restores. Conversely, on Python/.NET, **a rarely invoked function** may just accrue the cache fee without paying off. That fee is charged for at least 3 hours and for as long as the version stays active. **Java has no extra charge, so SnapStart is the first candidate for cold starts in an init-heavy JVM function.**
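On Python/.NET, whether SnapStart pays off comes down to arithmetic. Cache charges accrue for as long as the version is active, and restore charges scale with the number of cold starts. Below is a minimal sketch, assuming both charges are based on the function's configured memory; confirm this on the pricing page. The prices are inputs you copy from the pricing page, not values assumed here.

```python
def snapstart_monthly_cost(memory_gb, active_hours, restores,
                           cache_price_per_gb_s, restore_price_per_gb):
    """Estimate SnapStart charges for one published Python/.NET version.

    Caching is billed per GB-second while the version is active (minimum 3 hours);
    restores are billed per GB restored. Prices are caller-supplied inputs.
    """
    billed_hours = max(active_hours, 3)  # the 3-hour minimum
    cache = memory_gb * billed_hours * 3600 * cache_price_per_gb_s
    restore = memory_gb * restores * restore_price_per_gb
    return cache + restore
```

Compare the result with what the cold starts cost you today: billed INIT time plus the business impact of the slow tail. Adopt SnapStart only when that cost is larger.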

---

## 4. Provisioned concurrency: keep environments warm in advance

When "**two-digit-millisecond responses are required** and SnapStart isn't enough / isn't a supported runtime," the sure move is **provisioned concurrency.**

### 4.1 Mechanism and where to use it

Provisioned concurrency **initializes a specified number of execution environments in advance and keeps them on standby.** Requests served by these environments structurally never see a cold start, which gives the lowest, most predictable latency. The trade-offs:

- It **costs extra**.
- It can't be set on `$LATEST` (published versions/aliases only).
- It **can't exceed the function's reserved concurrency** when one is set.
- Lambda initializes up to 6,000 environments per minute per function, so a large allocation doesn't come up instantly.
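To pick a starting number, take AWS's concurrency formula (requests per second × average duration in seconds) and divide it by the target utilization. Below is a minimal sketch. The function names are hypothetical, and the 70% default mirrors the target used in this article.

```python
import math


def provisioned_needed(peak_rps, avg_duration_ms, target_utilization=0.7):
    """Environments needed so peak traffic runs at the target utilization.

    Uses concurrency = requests per second x average duration in seconds.
    """
    concurrency = peak_rps * avg_duration_ms / 1000
    # round() guards against float noise such as 10.000000000000002 becoming 11
    return math.ceil(round(concurrency / target_utilization, 6))


def check_against_reserved(provisioned, reserved=None):
    """Provisioned concurrency can't exceed reserved concurrency when one is set."""
    if reserved is not None and provisioned > reserved:
        raise ValueError(f"provisioned {provisioned} exceeds reserved {reserved}")
    return provisioned
```

For example, 200 requests per second at 120 ms average gives a concurrency of 24, so about 35 environments at 70% utilization.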

### 4.2 Track traffic: Application Auto Scaling

A fixed provisioned amount means overpaying during idle periods and running short at the peak. **Application Auto Scaling target tracking** scales it up and down automatically based on utilization.

```hcl
# Terraform: auto-scale provisioned concurrency with target tracking
# Scales in and out to keep utilization (ProvisionedConcurrencyUtilization) at 70%
resource "aws_appautoscaling_target" "lambda" {
  service_namespace  = "lambda"
  resource_id        = "function:orders-api:live"          # alias "live"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  min_capacity       = 2
  max_capacity       = 50
}

resource "aws_appautoscaling_policy" "lambda_target_tracking" {
  name               = "lambda-pc-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.lambda.service_namespace
  resource_id        = aws_appautoscaling_target.lambda.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 0.70 # can be set between 10% and 90%; aims for 70% utilization
    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
  }
}
```

For **predictable peaks** (e.g., a login rush at 9:00 every business day), add **scheduled scaling** to warm up before the peak. Requests **beyond** the provisioned amount spill over to standard on-demand environments, with cold starts. Size `min_capacity` for the minimum load you expect during the peak window, and keep `max_capacity` within reserved concurrency.
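Scheduled scaling can be layered on top of the target-tracking setup. Below is a minimal sketch with the boto3 Application Auto Scaling client's `put_scheduled_action`. It assumes the scalable target is already registered (as in the Terraform above). The 08:45/20:00 times, the `Asia/Tokyo` time zone, and the helper name are illustrative assumptions. Each action only moves the min/max bounds, and target tracking keeps working between them.

```python
def schedule_weekday_peak(aas_client, function_name, alias, peak_min,
                          off_peak_min, max_capacity, timezone="Asia/Tokyo"):
    """Raise the provisioned-concurrency floor before a weekday 9:00 peak, lower it in the evening.

    aas_client is a boto3 Application Auto Scaling client,
    e.g. boto3.client("application-autoscaling").
    """
    resource_id = f"function:{function_name}:{alias}"
    # (name suffix, schedule, new floor); cron fields: minute hour day-of-month month day-of-week year
    actions = [
        ("weekday-warmup", "cron(45 8 ? * MON-FRI *)", peak_min),        # 08:45, before the rush
        ("weekday-cooldown", "cron(0 20 ? * MON-FRI *)", off_peak_min),  # 20:00, back to baseline
    ]
    for suffix, schedule, floor in actions:
        aas_client.put_scheduled_action(
            ServiceNamespace="lambda",
            ScheduledActionName=f"{function_name}-{alias}-{suffix}",
            ResourceId=resource_id,
            ScalableDimension="lambda:function:ProvisionedConcurrency",
            Schedule=schedule,
            Timezone=timezone,
            ScalableTargetAction={"MinCapacity": floor, "MaxCapacity": max_capacity},
        )
    return resource_id
```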

---

## 5. Decision tree: what to use, in what order

Boiling everything above down to one order, the principle is **to try what is cheap and effective first.**

```text
Is the cold-start P99 unacceptable?
├─ No → do nothing (cold starts are <1%. Over-optimization is technical debt)
└─ Yes
   ├─ ① First, free optimizations (common to all runtimes, mandatory)
   │    reuse connections outside the handler / reduce package / Arm64 / optimize memory by measuring
   │    → if this suffices, done (often enough here)
   │
   ├─ ② A supported runtime (Java / Python 3.12+ / .NET 8+)?
   │    Java          → SnapStart (free). The first candidate for init-heavy JVM
   │    Python/.NET   → if at scale, SnapStart (adopt if restore billing < cold-start loss)
   │    → always re-create uniqueness and connections with runtime hooks
   │
   └─ ③ A two-digit-ms SLA is required / SnapStart unsupported (Node.js, Ruby) / SnapStart not enough?
        → Provisioned Concurrency + Application Auto Scaling (target tracking)
        → for predictable peaks, also use scheduled scaling
```
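The same tree can be written as code and used as a checklist in design reviews. Below is a minimal sketch in which the enum and parameter names are hypothetical. It assumes the runtime version already meets the SnapStart minimums when the family is supported. The last branch (Python/.NET where SnapStart doesn't pay off and no strict SLA exists) is an interpretation, since the tree doesn't cover that case explicitly.

```python
from enum import Enum

SNAPSTART_FAMILIES = {"java", "python", "dotnet"}


class Move(Enum):
    NOTHING = "do nothing"
    FREE_ONLY = "free optimizations"
    SNAPSTART = "free optimizations + SnapStart"
    PROVISIONED = "free optimizations + provisioned concurrency"


def choose_move(p99_unacceptable, free_opts_enough, runtime_family,
                strict_ms_sla=False, snapstart_pays_off=True,
                snapstart_meets_target=True):
    """Walk the decision tree; runtime_family is e.g. 'java', 'python', 'dotnet', 'nodejs', 'ruby'."""
    if not p99_unacceptable:
        return Move.NOTHING
    # 1. Free optimizations always come first.
    if free_opts_enough:
        return Move.FREE_ONLY
    # 3. Runtimes without SnapStart (Node.js, Ruby, OS-only) go to provisioned concurrency.
    if runtime_family not in SNAPSTART_FAMILIES:
        return Move.PROVISIONED
    # 2. Java has no extra charge; Python/.NET only when restore billing < cold-start loss.
    if runtime_family == "java" or snapstart_pays_off:
        if snapstart_meets_target:
            return Move.SNAPSTART
        return Move.PROVISIONED  # SnapStart not enough; the two can't be combined
    # SnapStart doesn't pay off: only a strict SLA justifies provisioned concurrency.
    return Move.PROVISIONED if strict_ms_sla else Move.FREE_ONLY
```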

The three moves side by side:

| Means | Latency improvement | Extra charge | Supported runtimes | Main caution |
| --- | --- | --- | --- | --- |
| **Free optimizations** | medium (shorter INIT) | none | all runtimes | do this first; often enough on its own |
| **SnapStart** | large | Java: none / Python, .NET: cache + restore | Java 11+ / Python 3.12+ / .NET 8+ | re-create uniqueness and connections; not on `$LATEST`; can't combine with provisioned |
| **Provisioned concurrency** | largest (pre-initialized) | yes, for as long as it is configured | all runtimes | must fit within reserved concurrency; prevent overpaying with Auto Scaling |

> **Note the incompatibility**: **SnapStart and provisioned concurrency can't be used together on the same function version.** It's an either/or choice. On Java, first check whether SnapStart (no extra charge) is enough, and move to provisioned concurrency only if it isn't.

---

## 6. VPC cold starts: the current state of Hyperplane ENI

The old knowledge that "putting Lambda in a VPC makes cold starts painfully slow" **mostly no longer applies.** The ENI Lambda creates for a VPC is a **Hyperplane ENI**, **shared and reused among functions with the same subnet + security group combination**, and each ENI handles up to 65,000 connections.

The important point is that **ENI creation happens at function create/update time.** When you first attach a VPC, the function may sit in the `Pending` state for a few minutes. This happens **off the invocation path**, so it doesn't add to cold starts in steady-state operation. Two notes:

- **After 14 days without invocations, Lambda reclaims the unused ENI** and sets the function to `Inactive`. The next invocation fails, and the function returns to `Pending` until the ENI is re-created. This is not a problem with steady traffic.
- **Outbound internet access needs a NAT gateway.** Placing the function in a private subnet alone doesn't give it internet access. **VPC endpoints** reach AWS services without NAT and help latency, cost, and security alike.
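To see how much ENI sharing a set of functions gets, group their VPC settings by subnet and security-group combination. Below is a minimal sketch that approximates one Hyperplane ENI per group. Each ENI handles up to 65,000 connections, so very high connection counts may need more, and the count is best read as a lower bound. The input shape is hypothetical and could be filled from each function's `VpcConfig`.

```python
from collections import defaultdict


def hyperplane_eni_groups(functions):
    """Group functions by (subnet, security-group set), the combination Hyperplane ENIs are shared on.

    functions maps a function name to {"subnets": [...], "security_groups": [...]}.
    Returns {(subnet_id, frozenset_of_sg_ids): sorted list of function names}.
    """
    groups = defaultdict(set)
    for name, cfg in functions.items():
        sgs = frozenset(cfg["security_groups"])
        for subnet in cfg["subnets"]:
            groups[(subnet, sgs)].add(name)
    return {key: sorted(names) for key, names in groups.items()}
```

Functions that land in the same group share ENIs. Giving functions a common security group where policy allows keeps the number of groups small.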

So a VPC is no longer something to avoid for cold-start reasons. **With the right design (subnet/SG sharing, endpoints), you can use it as a matter of course.**

---

## 7. Conclusion: a cold-start cheat sheet

- **First measure**: identify P99 and heavy init with `Init Duration` (the REPORT line) and X-Ray's `Initialization`. Don't warm on a guess.
- **Exhaust the free optimizations**: reuse connections outside the handler / reduce package (SDK subclients + tree-shaking) / **Arm64** / optimize memory by measuring. **Often enough here.**
- **SnapStart**: Java/Python3.12+/.NET8+. **Java is free** and the first candidate for an init-heavy JVM. **Always re-create uniqueness and connections in afterRestore.** `$LATEST` not allowed, can't combine with provisioned.
- **Provisioned concurrency**: when a two-digit-ms SLA is required, or SnapStart is unsupported. Prevent overpaying with **Application Auto Scaling target tracking** (`ProvisionedConcurrencyUtilization` at 70%), and use scheduled scaling for predictable peaks.
- **VPC**: fast now thanks to Hyperplane ENI. With correct subnet/SG sharing, VPC endpoints, and NAT design, there's no reason to avoid it.
- **Cost view**: from August 2025, **INIT is also billed.** Reducing cold starts improves latency and cost at the same time.

Cold-start work is not "just turn on provisioned concurrency." It's a sequence that **crushes the tail without paying for what you don't need**: **measure → free optimizations → SnapStart on a supported runtime → provisioned concurrency only when an SLA demands it.** To balance a payment platform's latency requirements against cost, I followed this priority and spent money only where it paid off.

**"I want my own Lambda's P99 latency to meet a production SLA without spending excess cost": I can support you from measurement through choosing the move to auto-scaling in IaC, at the speed of one engineer working with generative AI (Claude Code).** For the overall design of Lambda operations, please also see the [AWS Lambda production-operations guide](/blog/aws-lambda-production-guide).

---

### References (official documentation)

- [Lambda execution environment lifecycle](https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html) — INIT phase, cold starts (<1%, 100ms to over 1s), reuse of objects outside the handler
- [AWS Lambda standardizes billing for the INIT phase](https://aws.amazon.com/blogs/compute/aws-lambda-standardizes-billing-for-init-phase/) — INIT billing unification from August 1, 2025
- [Improving startup performance with Lambda SnapStart](https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html) — supported runtimes (Java11+/Python3.12+/.NET8+), Firecracker snapshot, pricing, limits
- [Handling uniqueness with SnapStart](https://docs.aws.amazon.com/lambda/latest/dg/snapstart-uniqueness.html) — cautions on uniqueness, random numbers, entropy
- [SnapStart runtime hooks (Java/CRaC)](https://docs.aws.amazon.com/lambda/latest/dg/snapstart-runtime-hooks-java.html) / [Python](https://docs.aws.amazon.com/lambda/latest/dg/snapstart-runtime-hooks-python.html) — `beforeCheckpoint`/`afterRestore`, `@register_after_restore`
- [Provisioned concurrency](https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html) — pre-initialization, Application Auto Scaling (scheduled / target tracking, `ProvisionedConcurrencyUtilization`)
- [Lambda concurrency](https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html) — reserved / provisioned, incompatibility with SnapStart
- [Configuring a Lambda function to access resources in a VPC](https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html) — Hyperplane ENI, sharing/reuse, Inactive, NAT/VPC endpoints
- [AWS Lambda Pricing](https://aws.amazon.com/lambda/pricing/) — execution-time / provisioned / Arm pricing
