友田 陽大

Crushing Lambda cold starts in production: choosing among execution-environment reuse, SnapStart, and provisioned concurrency

An implementation guide to suppressing AWS Lambda cold starts at production quality. Using real code that follows the official AWS specifications and a decision tree, it covers what the INIT phase really is and the August 2025 billing unification; connection reuse, package reduction, and Arm64; SnapStart (Java/Python/.NET) snapshot restore and its uniqueness pitfall; provisioned concurrency with Application Auto Scaling target tracking; and VPC Hyperplane ENIs.

Author
友田 陽大

"Sometimes the very first Lambda call is really slow." What you always run into in production is cold starts. When they show up where P99 latency matters (a payment-confirmation API, a form submission the user is waiting on, an internal API with an SLA), they directly hurt perceived quality.

This article is an implementation guide to suppressing AWS Lambda cold starts at production quality. Starting from "the basics that work for free," it uses a decision tree to organize when to play the two cards, SnapStart and provisioned concurrency, and it covers the VPC pitfall. The big picture of running Lambda in production (execution model, idempotency, observability, security, cost) is in the sister article, the AWS Lambda production-operations guide; this article concentrates on the single point of "making it fast."

Rules for this article: specs, parameter names, and defaults are based on the official AWS documentation (as of June 2026). SnapStart's supported runtimes, pricing scheme, and scaling limits get revised, so always confirm the latest values in the official docs (the "References" at the end) before a production rollout.


0. Mental model: a cold start = "the time to make a new execution environment"

First, pin down what a cold start really is. Lambda assigns an execution environment to each request. When there's no reusable environment on hand, it creates a new one; this preparation time is the cold start.

  • Cold start = code download + environment construction + INIT (initialization outside the handler). The official documentation says "typically under 1% of all invocations, taking under 100ms to over 1 second."
  • The problem isn't the average but the tail (P99/P100). Even if 99% are fast, if the 1% kept waiting is "payment confirmation," the business impact is large.
  • There are three moves: ① free optimizations (connection reuse, package reduction, Arm64) ② SnapStart (restore a pre-initialized snapshot) ③ provisioned concurrency (keep environments warm in advance).
  • From August 2025, INIT time is also billed. Reducing cold starts improves not only latency but cost.

The order of this chapter — first free, then SnapStart, finally provisioned — is exactly the priority of the decision-making (diagrammed in chapter 5).
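
To make the tail argument concrete, here is a minimal sketch with made-up numbers: 1,000 invocations, 1.5% of them cold at 800 ms and the rest warm at 20 ms. None of these figures come from AWS; they only illustrate how the mean and median can look healthy while P99 lands squarely on the cold starts.

```python
def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding surprises.
    rank = max(1, -(-(pct * len(ordered)) // 100))
    return ordered[rank - 1]


# Hypothetical workload: 15 cold invocations (800 ms) and 985 warm ones (20 ms).
latencies_ms = [800.0] * 15 + [20.0] * 985

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

The median stays at the warm latency, while P99 jumps to the cold-start latency. That is why the rest of this guide looks at the tail, not the average.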


1. First measure: visualize cold starts

Before optimizing, measure (warming on a guess costs the most). A cold start appears as Init Duration in the REPORT line of CloudWatch Logs.

REPORT RequestId: ...  Duration: 12.34 ms  Billed Duration: 13 ms
  Memory Size: 1024 MB  Max Memory Used: 95 MB  Init Duration: 320.45 ms

An invocation with an Init Duration line = a cold start; one without = a warm start. Get the ratio and P99 with Logs Insights.

# Cold-start ratio and Init Duration distribution (CloudWatch Logs Insights)
filter @type = "REPORT"
| stats count(*) as invocations,
        sum(strcontains(@message, "Init Duration")) as cold_starts,
        avg(@initDuration) as avg_init_ms,
        pct(@initDuration, 99) as p99_init_ms
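
If you export log lines instead of querying them in place, the same ratio can be computed offline. The following is a rough Python sketch, assuming the REPORT line format shown above; the sample lines and the function name summarize_report_lines are made up for illustration.

```python
import re

# Matches the "Init Duration: 320.45 ms" field that only cold-start REPORT lines carry.
INIT_DURATION_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def summarize_report_lines(lines):
    """Count invocations and cold starts from Lambda REPORT log lines."""
    report_lines = [line for line in lines if line.startswith("REPORT")]
    init_ms = []
    for line in report_lines:
        match = INIT_DURATION_RE.search(line)
        if match:
            init_ms.append(float(match.group(1)))
    invocations = len(report_lines)
    return {
        "invocations": invocations,
        "cold_starts": len(init_ms),
        "cold_start_ratio": len(init_ms) / invocations if invocations else 0.0,
        "max_init_ms": max(init_ms, default=None),
    }


# Hypothetical log lines: one cold start followed by three warm invocations.
sample_lines = [
    "START RequestId: a1 Version: 3",
    "REPORT RequestId: a1\tDuration: 12.34 ms\tBilled Duration: 13 ms\tMemory Size: 1024 MB\tMax Memory Used: 95 MB\tInit Duration: 320.45 ms",
    "REPORT RequestId: b2\tDuration: 8.10 ms\tBilled Duration: 9 ms\tMemory Size: 1024 MB\tMax Memory Used: 95 MB",
    "REPORT RequestId: c3\tDuration: 7.95 ms\tBilled Duration: 8 ms\tMemory Size: 1024 MB\tMax Memory Used: 96 MB",
    "REPORT RequestId: d4\tDuration: 9.02 ms\tBilled Duration: 10 ms\tMemory Size: 1024 MB\tMax Memory Used: 96 MB",
]
summary = summarize_report_lines(sample_lines)
print(summary)
```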

Enable X-Ray active tracing, and a cold start shows up as an Initialization subsegment, letting you trace which initialization step (SDK, DB connection, config fetch) is heavy. First identify here "the place where a fix genuinely pays off," then choose the move.


2. Optimizations that work for free: connection reuse, package reduction, Arm64

Before reaching for SnapStart or provisioned concurrency, exhaust the basics that work at no extra charge. For many workloads, this is enough.

2.1 Heavy init once outside the handler (connection reuse)

A frozen execution environment is reused with its memory state intact. As the official documentation also states, objects declared outside the handler stay alive across warm invocations. Create DB connections, SDK clients, and config fetches once outside the handler and reuse them. This lightens the cold-start INIT and also cuts the execution time of warm invocations.

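// Good: create connections and clients once outside the handler (INIT); warm invocations reuse them.
// AppConfig, MyEvent, loadConfig, and handleEvent are application-specific and assumed to be defined elsewhere.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({})); // created once, here
let configPromise: Promise<AppConfig> | undefined;               // heavy config is initialized lazily

export const handler = async (event: MyEvent) => {
  configPromise ??= loadConfig(ddb); // runs on the first call only; later calls reuse the same Promise
  const config = await configPromise;
  return handleEvent(event, config); // named handleEvent to avoid clashing with Node's global `process`
};

# Bad: creating the client inside the handler on every call adds connection cost and execution time (don't do this)
import boto3

def lambda_handler(event, context):
    ddb = boto3.client("dynamodb")  # created on every invocation; move it outside the handler
    ...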
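
For comparison, the good pattern in Python might look like the following sketch. load_config is a hypothetical stand-in for an expensive fetch (it is not an AWS API), and the load_count counter exists only to make the reuse visible.

```python
import time

_config = None   # module scope: survives across warm invocations in the same environment
load_count = 0   # illustration only: counts how often the expensive load actually ran


def load_config():
    """Hypothetical stand-in for an expensive fetch (parameter store, database, ...)."""
    global load_count
    load_count += 1
    return {"table": "orders", "loaded_at": time.time()}


def get_config():
    """Load the config on first use, then return the cached copy."""
    global _config
    if _config is None:
        _config = load_config()
    return _config


def lambda_handler(event, context):
    config = get_config()
    return {"table": config["table"], "order_id": event["order_id"]}


# Two invocations in the same (simulated) execution environment.
first = lambda_handler({"order_id": "A-1"}, None)
second = lambda_handler({"order_id": "A-2"}, None)
print(load_count)
```

Only the first invocation pays for load_config; the second one reuses the cached value, which is the behavior a warm execution environment gives you.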

2.2 Make the package small, narrow the dependencies

Code download is part of the cold start. The smaller the package, the faster.

  • With AWS SDK v3, import only the clients you need (@aws-sdk/client-dynamodb, etc.). Don't bring in the whole SDK.
  • Tree-shake and bundle with esbuild or a similar tool. Avoid shipping node_modules wholesale.
  • Layers are not recommended for Go / Rust (per the official documentation). Bundling dependencies into the executable is better for cold starts.

2.3 Move to Arm64 (Graviton2)

If your code and dependencies are compatible, Arm64 is almost always a win. The duration price is 20% lower than x86, and performance improves for many workloads. Often the only change is the build target. It helps cost more than latency itself, but cold-start measures and cost optimization point in the same direction.

2.4 Raise memory to "finish fast"

CPU is allocated in proportion to memory, and 1,769MB is equivalent to 1 vCPU. If initialization or processing is CPU-bound, raising memory can shorten both the cold start and the execution time. Decide by measuring, not guessing (AWS Lambda Power Tuning, the open-source tool the AWS documentation points to, automates the measurement).
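
The trade-off can be worked through as simple arithmetic. The sketch below assumes CPU scales linearly with memory as stated above and that the duration charge is proportional to memory × duration; it ignores the per-request charge, and the sample durations are invented, not measured.

```python
MB_PER_VCPU = 1769  # from the text: 1,769 MB is equivalent to one vCPU


def approx_vcpus(memory_mb):
    """Approximate vCPU share for a memory setting, assuming linear scaling."""
    return memory_mb / MB_PER_VCPU


def relative_duration_cost(memory_mb, duration_ms, baseline_memory_mb, baseline_duration_ms):
    """Duration cost relative to a baseline, where cost is proportional to memory x duration."""
    return (memory_mb * duration_ms) / (baseline_memory_mb * baseline_duration_ms)


# Hypothetical measurement: 512 MB takes 400 ms; 1024 MB takes 180 ms for the same work.
ratio = relative_duration_cost(1024, 180, 512, 400)
print(f"{approx_vcpus(1024):.2f} vCPU, cost x{ratio}")
```

In this invented case, doubling memory makes the function both faster and about 10% cheaper, because the duration drops by more than half. Whether your function behaves this way is exactly what measurement has to show.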


3. SnapStart: restore a pre-initialized snapshot

When the free optimizations aren't enough — especially when the tail is unacceptable for an init-heavy runtime like the JVM — the first move is SnapStart.

3.1 Mechanism and supported runtimes

SnapStart runs initialization just once, when a function version is published, then takes a Firecracker microVM snapshot (memory + disk) of the initialized execution environment, encrypts it, and caches it. From then on, instead of redoing initialization, Lambda restores from the cached snapshot (Restore), which shortens the cold start. Picture a Restore phase added to the lifecycle.

| Runtime | SnapStart supported | Pricing (official) |
|---|---|---|
| Java 11 or later | Yes | no extra charge |
| Python 3.12 or later | Yes | cache fee (min 3 hours) + restore fee |
| .NET 8 or later | Yes | cache fee (min 3 hours) + restore fee |
| Node.js / Ruby / OS-only / container image | No | n/a |

The point: Java is free, while Python/.NET incur a fee for keeping the snapshot cached plus a fee per restore. Also, SnapStart can be enabled only on a published version or an alias pointing to one ($LATEST is not allowed), can't be used together with provisioned concurrency, and can't be combined with Amazon EFS or with /tmp (ephemeral storage) larger than 512MB.
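
Because SnapStart only takes effect on published versions, turning it on is a configure-then-publish sequence. The following is a minimal boto3-based sketch of that sequence, not a deployment script: the function name and alias in the usage comment are placeholders, and the helper name enable_snapstart_and_publish is made up. The client is passed in so the function can be exercised without AWS access.

```python
def enable_snapstart_and_publish(lambda_client, function_name, alias_name):
    """Enable SnapStart, publish a new version, and point an existing alias at it."""
    # 1. Turn SnapStart on for versions published from now on.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )
    # 2. Wait for the configuration update to finish before publishing.
    lambda_client.get_waiter("function_updated_v2").wait(FunctionName=function_name)
    # 3. Publishing the version is what triggers the snapshot.
    version = lambda_client.publish_version(FunctionName=function_name)["Version"]
    # 4. Route traffic through the alias, never $LATEST.
    lambda_client.update_alias(
        FunctionName=function_name,
        Name=alias_name,
        FunctionVersion=version,
    )
    return version


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   enable_snapstart_and_publish(boto3.client("lambda"), "orders-api", "live")
```

Publishing with SnapStart enabled can take noticeably longer than a normal publish because initialization runs at that point, so it may be worth checking that the new version is active before shifting production traffic to it.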

3.2 The biggest trap: uniqueness (the snapshot is reused)

This is the trap everyone steps on with SnapStart. Because the snapshot freezes the state at initialization time and reuses it, a "supposedly unique" value generated during initialization ends up identical across all restored environments. This applies to random seeds, UUIDs, tokens, and cached timestamps.

The remedy is runtime hooks. Re-create "unique, volatile values" after restore (afterRestore).

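// Java (CRaC: org.crac). Re-create the random source in afterRestore.
import java.security.SecureRandom;

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class Handler implements Resource {
  private SecureRandom random = new SecureRandom();

  // Register with the global CRaC context so the hooks below are invoked.
  public Handler() { Core.getGlobalContext().register(this); }

  @Override public void beforeCheckpoint(Context<? extends Resource> c) { /* e.g. close connections */ }

  @Override public void afterRestore(Context<? extends Resource> c) {
    // After the snapshot is restored, re-acquire uniqueness and entropy (prevents identical values across environments)
    this.random = new SecureRandom();
  }
}

# Python (snapshot-restore-py, bundled with the Python managed runtime)
import secrets

from snapshot_restore_py import register_after_restore

random_seed = secrets.token_bytes(32)  # created during INIT, so it is captured in the snapshot

@register_after_restore
def reseed():
    # After restore, re-create randomness and temporary data (also re-establish DB connections here)
    global random_seed
    random_seed = secrets.token_bytes(32)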

Three cautions the official documentation raises: ① uniqueness: re-create, after restore, any unique content made during init. ② network connections: connection state isn't guaranteed after restore, so verify and re-establish it (AWS SDK connections usually recover automatically). ③ temporary data: refresh temporary credentials and cached timestamps inside the handler before using them.
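
The uniqueness pitfall is easy to reproduce locally. The sketch below is a simulation only: it imitates "snapshot and restore" with copy.deepcopy rather than calling any Lambda API, and the Environment class and restore function are invented for the demonstration.

```python
import copy
import uuid


class Environment:
    """Stands in for an execution environment whose INIT generated a 'unique' ID."""

    def __init__(self):
        self.instance_id = str(uuid.uuid4())  # generated once, at init time

    def after_restore(self):
        # What a runtime hook would do: regenerate the value after restore.
        self.instance_id = str(uuid.uuid4())


def restore(snapshot, run_hook):
    """Simulate restoring a new environment from the frozen snapshot."""
    env = copy.deepcopy(snapshot)
    if run_hook:
        env.after_restore()
    return env


snapshot = Environment()  # "publish": init runs once and the state is frozen

ids_without_hook = {restore(snapshot, run_hook=False).instance_id for _ in range(3)}
ids_with_hook = {restore(snapshot, run_hook=True).instance_id for _ in range(3)}

print(len(ids_without_hook), len(ids_with_hook))
```

Without the hook, all three restored environments share one ID; with it, each gets its own. The same reasoning applies to seeds, tokens, and cached timestamps.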

3.3 When SnapStart works

SnapStart works best on a function that scales out and is invoked frequently, because the one-time init cost at publish is amortized over many restores. Conversely, for a rarely invoked function on Python/.NET, it may only add the cache fee (billed for a minimum of 3 hours and for as long as the snapshot stays cached) without paying off. Java is free, so SnapStart is the first candidate for cold-start measures on an init-heavy JVM function.


4. Provisioned concurrency: keep environments warm in advance

When double-digit-millisecond responses are required and SnapStart isn't enough or doesn't support your runtime, the sure move is provisioned concurrency.

4.1 Mechanism and where to use it

Provisioned concurrency pre-initializes a specified number of execution environments and keeps them on standby. Requests served by those environments skip the cold start entirely, so it yields the lowest, most predictable latency. However, it carries an extra charge, it can't be set on $LATEST (published version or alias only), and it can't exceed the function's reserved concurrency. Lambda allocates up to 6,000 environments per minute per function, so a large setting doesn't come up instantly.
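
Since allocation is not instantaneous, a deployment step may want to wait until the environments are actually ready. Here is a minimal boto3-based sketch under that assumption; the helper name warm_up, the polling interval, and the timeout are arbitrary choices, and the client is injected so the logic can be tested without AWS.

```python
import time


def warm_up(lambda_client, function_name, qualifier, count,
            poll_seconds=5.0, timeout_seconds=300.0):
    """Request provisioned concurrency on a version/alias and wait until it is READY."""
    lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,  # a published version or alias, never $LATEST
        ProvisionedConcurrentExecutions=count,
    )
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = lambda_client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=qualifier
        )["Status"]
        if status == "READY":
            return status
        if status == "FAILED":
            raise RuntimeError(f"provisioned concurrency failed for {function_name}:{qualifier}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   warm_up(boto3.client("lambda"), "orders-api", "live", count=10)
```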

4.2 Track traffic: Application Auto Scaling

Holding provisioned concurrency at a fixed value means overpaying in idle periods and running short at the peak. With Application Auto Scaling target tracking, it scales up and down automatically based on utilization.

# Terraform: auto-scale provisioned concurrency with target tracking
# Scales up and down to hold ProvisionedConcurrencyUtilization at 70%
resource "aws_appautoscaling_target" "lambda" {
  service_namespace  = "lambda"
  resource_id        = "function:orders-api:live"          # alias: live
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  min_capacity       = 2
  max_capacity       = 50
}

resource "aws_appautoscaling_policy" "lambda_target_tracking" {
  name               = "lambda-pc-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.lambda.service_namespace
  resource_id        = aws_appautoscaling_target.lambda.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 0.70 # allowed range is 0.1 to 0.9 (10% to 90%); aim for 70% utilization
    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
  }
}
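
To build intuition for what a 70% target means, the steady-state arithmetic can be sketched as follows. This is a simplification, not how Application Auto Scaling is implemented: the real service reacts to CloudWatch alarms with its own timing and cooldowns. The function name desired_capacity is made up, and the defaults mirror the min/max values in the Terraform above.

```python
def desired_capacity(in_use, target_percent=70, min_capacity=2, max_capacity=50):
    """Capacity at which `in_use` busy environments equal roughly `target_percent` utilization."""
    # Integer ceiling of in_use * 100 / target_percent (avoids float rounding).
    needed = -((-in_use * 100) // target_percent)
    return max(min_capacity, min(max_capacity, needed))


for busy in (0, 7, 35, 36, 100):
    print(busy, "->", desired_capacity(busy))
```

With 35 environments busy, a 70% target calls for 50 provisioned environments; anything beyond that hits max_capacity and spills over to on-demand.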

For predictable peaks (everyone logging in at 9:00 every business day, etc.), also use scheduled scaling to warm up before the peak. Requests beyond the provisioned amount spill over to ordinary on-demand environments (with cold starts), so size min_capacity to the low end of the peak and keep max_capacity within reserved concurrency.
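
A scheduled action for the 9:00 example might be registered along these lines. This is a sketch: the action name, the 8:45 lead time, the time zone, and the capacities are all assumptions chosen for illustration, and it presumes the scalable target has already been registered. The helper only builds the request parameters so they can be inspected before anything is sent.

```python
def business_morning_warmup(resource_id, min_capacity, max_capacity):
    """Build put_scheduled_action parameters that raise capacity before a weekday 9:00 peak."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": "warm-before-9am",           # hypothetical name
        "ResourceId": resource_id,                          # e.g. "function:orders-api:live"
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "Schedule": "cron(45 8 ? * MON-FRI *)",             # 08:45 on business days
        "Timezone": "Asia/Tokyo",                           # assumed time zone
        "ScalableTargetAction": {
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
        },
    }


params = business_morning_warmup("function:orders-api:live", 20, 50)

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("application-autoscaling").put_scheduled_action(**params)
```

A second action after the peak would lower MinCapacity again; target tracking keeps operating within whatever bounds the most recent scheduled action set.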


5. Decision tree: what to use, in what order

Boil the above into a single order. The principle is to try the cheap-and-effective first.

Is the cold-start P99 unacceptable?
├─ No → do nothing (cold starts are <1%. Over-optimization is technical debt)
└─ Yes
   ├─ ① First, free optimizations (common to all runtimes, mandatory)
   │    reuse connections outside the handler / reduce package / Arm64 / optimize memory by measuring
   │    → if this suffices, done (often enough here)
   │
   ├─ ② A supported runtime (Java / Python 3.12+ / .NET 8+)?
   │    Java          → SnapStart (free). The first candidate for init-heavy JVM
   │    Python/.NET   → if at scale, SnapStart (adopt if restore billing < cold-start loss)
   │    → always re-create uniqueness and connections with runtime hooks
   │
   └─ ③ A double-digit-ms SLA is required / SnapStart unsupported (Node.js, Ruby) / SnapStart not enough?
        → Provisioned Concurrency + Application Auto Scaling (target tracking)
        → for predictable peaks, also use scheduled scaling
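
The tree above can also be written down as a small function, which is handy as a checklist in design reviews. This is only a sketch of the article's decision order: the function and parameter names are invented, and runtime version checks (Java 11+, Python 3.12+, .NET 8+) are reduced to a runtime family for brevity.

```python
SNAPSTART_RUNTIME_FAMILIES = ("java", "python", "dotnet")


def choose_strategy(p99_acceptable, free_optimizations_sufficient, runtime_family,
                    needs_double_digit_ms_sla, snapstart_sufficient=True):
    """Return the cheapest move that satisfies the requirement, in the tree's order."""
    if p99_acceptable:
        return "do nothing"
    if free_optimizations_sufficient:
        return "free optimizations"
    snapstart_available = runtime_family in SNAPSTART_RUNTIME_FAMILIES
    if needs_double_digit_ms_sla or not snapstart_available or not snapstart_sufficient:
        return "provisioned concurrency + auto scaling"
    return "snapstart + runtime hooks"


print(choose_strategy(False, False, "java", needs_double_digit_ms_sla=False))
print(choose_strategy(False, False, "nodejs", needs_double_digit_ms_sla=False))
```

The free optimizations are still applied in every branch below the first; the return value names the additional move to adopt on top of them.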

Summarize the three moves in a comparison table.

| Means | Latency improvement | Extra charge | Support | Main caution |
|---|---|---|---|---|
| Free optimizations | medium (INIT shortening) | none | all runtimes | Do this first. Often enough here |
| SnapStart | large | Java free / Python, .NET: restore + cache | Java 11+ / Python 3.12+ / .NET 8+ | Re-create uniqueness and connections. $LATEST not allowed. Can't combine with provisioned concurrency |
| Provisioned concurrency | largest (pre-initialized) | yes (always) | all runtimes | Stay within reserved concurrency. Prevent overpaying with Auto Scaling |

Note the incompatibility: SnapStart and provisioned concurrency can't be used together on the same function version. It's an either/or choice: first check whether free SnapStart is enough on Java, and if it isn't, move to provisioned concurrency.


6. VPC cold starts: the current state of Hyperplane ENI

The old wisdom that "putting Lambda in a VPC makes cold starts painfully slow" mostly no longer applies. The ENI that Lambda creates for a VPC is a Hyperplane ENI: it is shared and reused among functions with the same subnet + security group combination, and each ENI handles up to 65,000 connections.

The important point is that ENI creation happens at function create/update time. On a new VPC attach, the function may temporarily sit in the Pending state for a few minutes, but this happens off the invocation path, so it doesn't add to cold starts in steady operation. Two notes:

  • After 14 days idle, the ENI is reclaimed and the function becomes Inactive (re-created on the next invocation). Not a problem with steady traffic.
  • Outbound internet access needs NAT. Just placing the function in a private subnet doesn't give it a route to the internet. AWS services can be reached without NAT via VPC endpoints, which is advantageous for latency, cost, and security alike.

So a VPC is no longer "something to avoid for cold-start reasons" but something usable normally if you do the design (subnet/SG sharing, endpoints) right.
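
Because the ENI work happens at create/update time, a deployment pipeline can check the function's state before shifting traffic to it. A minimal boto3-based sketch follows, assuming it is acceptable to gate on the State and LastUpdateStatus fields of the function configuration; the helper name is_ready_for_traffic is made up.

```python
def is_ready_for_traffic(lambda_client, function_name):
    """True once the function is Active and its last update (e.g. a VPC attach) has succeeded."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    return (
        config.get("State") == "Active"
        and config.get("LastUpdateStatus") == "Successful"
    )


# Usage (requires boto3 and AWS credentials; the name is a placeholder):
#   import boto3
#   if not is_ready_for_traffic(boto3.client("lambda"), "orders-api"):
#       print("VPC attach still in progress; retry later")
```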


7. Conclusion: a cold-start cheat sheet

  • First measure: identify P99 and heavy init with Init Duration (the REPORT line) and X-Ray's Initialization. Don't warm on a guess.
  • Exhaust the free optimizations: reuse connections outside the handler / reduce package (SDK subclients + tree-shaking) / Arm64 / optimize memory by measuring. Often enough here.
  • SnapStart: Java/Python3.12+/.NET8+. Java is free and the first candidate for an init-heavy JVM. Always re-create uniqueness and connections in afterRestore. $LATEST not allowed, can't combine with provisioned.
  • Provisioned concurrency: when a double-digit-ms SLA is required, or SnapStart is unsupported. Prevent overpaying with Application Auto Scaling target tracking (ProvisionedConcurrencyUtilization at 70%), and use scheduled scaling for predictable peaks.
  • VPC: fast now thanks to Hyperplane ENI. With correct subnet/SG sharing, VPC endpoints, and NAT design, there's no reason to avoid it.
  • Cost view: from August 2025, INIT is also billed. Reducing cold starts improves latency and cost at the same time.

Cold-start measures are not "just warm it with provisioned concurrency." They are a design that crushes the tail without paying for what you don't need: measure → free optimizations → SnapStart if the runtime supports it → provisioned concurrency only when an SLA requires it. To balance a payment platform's latency requirements against cost, I spent money "only where it works," in this priority order.

"I want to fit my Lambda's P99 latency into a production SLA without overspending." From measurement through choosing the move to auto scaling in IaC, I work alongside you at the speed of one person × generative AI (Claude Code). For the overall design of Lambda operations, please also see the AWS Lambda production-operations guide.


References (official documentation)


友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.
