"I overwrite-deployed to production with update-function-code, and a defect hit all users at once" — because Lambda can be deployed in one command, you tend to do shipping without a safety device. For an API handling payment confirmation or user operations, a new version's defect immediately hitting 100% of traffic is unacceptable as production operation.
This article is an implementation guide to deploying AWS Lambda safely, with zero downtime and automatic rollback. Starting from the foundation of versions and aliases, it covers canary releases, automatic rollback, IaC selection (SAM/CDK/Terraform), and keyless CI/CD via OIDC, end to end. It also draws on the shipping decisions from the serverless payment platform (0 double charges in production) that I built as a core developer. The Lambda execution model itself is left to the sister article, AWS Lambda production-operations guide; this article concentrates on the single point of "how to ship safely."
Rules for this article: specs, parameter names, and predefined setting names are based on the official AWS documentation (as of June 2026). CodeDeploy setting names, runtimes, and each tool's specs are subject to revision. Always confirm the latest values in the official docs (the "References" at the end) before rolling out to production.
0. Mental model: separate "the immutable version" from "the moving pointer"
All of safe deployment begins from separating these two.
- Version = an immutable snapshot. Publishing freezes the code and configuration at that point under a numbered version. The number increases monotonically and is never reused, even if the function is deleted and recreated.
- Alias = a movable pointer to a version. It points at a specific version under a name such as `live` or `prod`. Deploying means "safely switching the alias's target to the new version."
- `$LATEST` is mutable. It is overwritten on each `update-function-code`. So don't point production traffic directly at `$LATEST`; always call via an alias.
- Canary = don't move the pointer all at once; move it gradually by weight. Send only 10% to the new version, and if there is no problem, go to 100%. If there is a problem, revert automatically.
This "immutable foundation + moving pointer + gradual switch + automatic rollback" is this article's design.
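To make the separation concrete, here is a toy in-memory model of the idea. It is only an illustration of the mental model, not how Lambda is implemented; the class `FunctionModel` and its method names are made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionModel:
    """Toy model: immutable numbered versions plus movable alias pointers."""
    latest_code: str = ""
    versions: dict = field(default_factory=dict)  # version number -> frozen code
    aliases: dict = field(default_factory=dict)   # alias name -> version number
    _next_version: int = 1

    def update_code(self, code: str) -> None:
        # Only $LATEST changes; published versions are untouched.
        self.latest_code = code

    def publish(self) -> int:
        # Freeze the current $LATEST under a new number that is never reused.
        number = self._next_version
        self._next_version += 1
        self.versions[number] = self.latest_code
        return number

    def point_alias(self, alias: str, version: int) -> None:
        if version not in self.versions:
            raise ValueError(f"version {version} is not published")
        self.aliases[alias] = version

    def resolve(self, alias: str) -> str:
        # What a caller of the alias actually runs.
        return self.versions[self.aliases[alias]]


fn = FunctionModel()
fn.update_code("v1 code")
v1 = fn.publish()
fn.point_alias("live", v1)

fn.update_code("v2 code (buggy)")  # $LATEST moved, "live" did not
v2 = fn.publish()
fn.point_alias("live", v2)         # deploy = move the pointer
fn.point_alias("live", v1)         # rollback = move it back
print(fn.resolve("live"))          # prints "v1 code"
```

In this model a rollback never rebuilds anything: the old version still exists unchanged, so reverting is just re-pointing the alias.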
1. Versions and aliases: call production via an alias
First, build the foundation. Update the code against $LATEST, publish a version once stable, and point the alias at that version. The client calls the alias's ARN.
# 1) Update the code ($LATEST changes; production is not looking at it yet)
aws lambda update-function-code --function-name orders --zip-file fileb://build.zip
# 2) Wait for the update to finish (important: the next operation fails until LastUpdateStatus=Successful)
aws lambda wait function-updated-v2 --function-name orders
# 3) Publish an immutable version (a number is assigned, e.g. 42)
VERSION=$(aws lambda publish-version --function-name orders --query Version --output text)
# 4) Point the alias live at that version; clients call live
aws lambda update-alias --function-name orders --name live --function-version "$VERSION"
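The same four steps can be sketched with boto3. This is a minimal sketch rather than a full deploy script: the function name `publish_and_point_alias` is made up here, and the Lambda client is passed in as a parameter so the flow can be exercised with a stub.

```python
def publish_and_point_alias(client, function_name, zip_bytes, alias="live"):
    """Update code, wait, publish an immutable version, then move the alias.

    `client` is expected to behave like boto3.client("lambda").
    """
    client.update_function_code(FunctionName=function_name, ZipFile=zip_bytes)
    # Block until the update has finished; publishing too early can fail with a 409.
    client.get_waiter("function_updated_v2").wait(FunctionName=function_name)
    version = client.publish_version(FunctionName=function_name)["Version"]
    client.update_alias(
        FunctionName=function_name, Name=alias, FunctionVersion=version
    )
    return version


# Usage (needs boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   with open("build.zip", "rb") as f:
#       publish_and_point_alias(boto3.client("lambda"), "orders", f.read())
```

Passing the client in keeps the ordering (update, wait, publish, re-point) testable without touching AWS.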
Three official specs that bite here:
- An alias ARN is a "qualified ARN." It carries a qualifier (a version number or an alias name), as in `...:function:orders:live`. An unqualified call implicitly runs `$LATEST`, which is a source of accidents in production.
- Provisioned concurrency and SnapStart can be enabled only on a published version or alias (`$LATEST` is not allowed). For latency measures (cold-start optimization) to work, this foundation is a prerequisite.
- Not every config change publishes a version. For example, reserved concurrency doesn't create a version (because it is a function-wide operational setting).
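A small guard can catch unqualified ARNs before they reach production configuration. The sketch below assumes the standard `arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]` layout; the helper names are made up for illustration.

```python
def lambda_qualifier(arn: str):
    """Return the qualifier of a Lambda function ARN, or None if unqualified."""
    parts = arn.split(":")
    # arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
    if len(parts) not in (7, 8) or parts[2] != "lambda" or parts[5] != "function":
        raise ValueError(f"not a Lambda function ARN: {arn}")
    return parts[7] if len(parts) == 8 else None


def assert_safe_for_production(arn: str) -> None:
    """Reject ARNs that would run $LATEST, explicitly or implicitly."""
    qualifier = lambda_qualifier(arn)
    if qualifier is None or qualifier == "$LATEST":
        raise ValueError(f"{arn} resolves to $LATEST; call an alias such as :live")


assert_safe_for_production(
    "arn:aws:lambda:ap-northeast-1:123456789012:function:orders:live"
)  # passes silently
```

A check like this fits in a CI step that lints event-source or API integration settings.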
2. Canary release: flow "only 10%" with a weighted alias
An alias can point at up to 2 published versions and distribute traffic by weight (routing config / AdditionalVersionWeights). This is the heart of a canary release.
# Alias live: 97% to the current version, 3% to the new version (43). Raise the weight if no problems appear
aws lambda update-alias --function-name orders --name live \
--function-version 42 \
--routing-config 'AdditionalVersionWeights={"43"=0.03}'
Know the constraints the AWS documentation imposes (violating them leads to an error or an incident).
- Both versions are published (`$LATEST` is not allowed).
- Both versions' execution roles are identical.
- DLQ configuration is identical (or both absent).
- They are 2 versions of the same function.
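As a sketch of how the weighted alias maps onto the API, the helpers below build the keyword arguments for boto3's `update_alias`. The helper names are hypothetical, and the strict `0 < weight < 1` check is a choice made for this example, not an API rule.

```python
def canary_alias_update(function_name, alias, stable, canary, weight):
    """Build update_alias kwargs that send `weight` of traffic to `canary`."""
    for version in (stable, canary):
        if not str(version).isdigit():
            # Rejects "$LATEST" and alias names: both sides must be published versions.
            raise ValueError(f"{version!r} is not a published version number")
    if str(stable) == str(canary):
        raise ValueError("stable and canary must be different versions")
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must be strictly between 0 and 1")
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": str(stable),
        "RoutingConfig": {"AdditionalVersionWeights": {str(canary): weight}},
    }


def promote_alias_update(function_name, alias, canary):
    """Build update_alias kwargs that send 100% of traffic to `canary`."""
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": str(canary),
        "RoutingConfig": {"AdditionalVersionWeights": {}},  # clear the split
    }


print(canary_alias_update("orders", "live", 42, 43, 0.03)["RoutingConfig"])
```

With a real client this would be applied as `client.update_alias(**canary_alias_update("orders", "live", 42, 43, 0.03))`. Rolling back is the same call with the stable version and an empty weight map.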
Raising and lowering weights by hand isn't realistic, so automate by leaving it to CodeDeploy (next chapter).
Compatibility with provisioned concurrency: to avoid cold starts during a canary, you can provision additional concurrency just while weighted routing is active (the AWS documentation mentions this). For APIs with a latency SLA, combine canary with provisioned concurrency.
3. Canary with automatic rollback: CodeDeploy + SAM
CodeDeploy automatically moves the weighted alias's weight on a predefined schedule and auto-rolls back if a CloudWatch alarm fires. Manual weight adjustment becomes unnecessary.
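The control flow can be pictured with a deliberately simplified simulation. This is a model of the behavior described above, not CodeDeploy's implementation; `simulate_canary` and its parameters are invented for the illustration, and an alarm that fires after the bake window is simply ignored here.

```python
def simulate_canary(alarm_minute=None, canary_percent=10, bake_minutes=5):
    """Simulate one canary deployment.

    alarm_minute: minute at which a monitored alarm enters ALARM, or None.
    Returns (outcome, timeline); timeline is a list of (minute, percent_on_new).
    """
    timeline = [(0, canary_percent)]        # initial shift to the new version
    if alarm_minute is not None and alarm_minute < bake_minutes:
        timeline.append((alarm_minute, 0))  # alarm during the bake: all traffic back to old
        return "rolled_back", timeline
    timeline.append((bake_minutes, 100))    # quiet bake: shift the remainder
    return "succeeded", timeline


print(simulate_canary())                # ('succeeded', [(0, 10), (5, 100)])
print(simulate_canary(alarm_minute=2))  # ('rolled_back', [(0, 10), (2, 0)])
```

The point of the model is the blast radius: in the failing case the new version never receives more than the canary share of traffic.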
3.1 Predefined deployment configurations (the official formal names)
| Kind | Setting name (prefixed with CodeDeployDefault.) | Behavior |
|---|---|---|
| Canary | LambdaCanary10Percent5Minutes / 10Minutes / 15Minutes / 30Minutes | flow 10%, then the remaining 90% all at once after the specified minutes |
| Linear | LambdaLinear10PercentEvery1Minute / Every2Minutes / Every3Minutes / Every10Minutes | increase 10% at a time, gradually |
| All at once | LambdaAllAtOnce | 100% in one go (no canary) |
Name note: only the shortest linear configuration is `Every1Minute` (singular); the rest are plural (`Every2Minutes`). In SAM's `DeploymentPreference.Type`, use the shortened name with the leading `CodeDeployDefault.Lambda` removed (e.g., `Canary10Percent10Minutes`).
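Because the singular/plural detail is easy to get wrong, a tiny helper can generate and validate the names. This sketch only encodes the configurations listed in the table above; the function names are made up, and the list should be rechecked against the official docs.

```python
CANARY_MINUTES = (5, 10, 15, 30)
LINEAR_MINUTES = (1, 2, 3, 10)
PREFIX = "CodeDeployDefault.Lambda"


def predefined_sam_types():
    """SAM DeploymentPreference.Type values matching the table above."""
    names = [f"Canary10Percent{m}Minutes" for m in CANARY_MINUTES]
    # Only the 1-minute linear configuration uses the singular "Minute".
    names += [
        f"Linear10PercentEvery{m}Minute{'' if m == 1 else 's'}"
        for m in LINEAR_MINUTES
    ]
    names.append("AllAtOnce")
    return names


def to_sam_type(codedeploy_name):
    """Strip the CodeDeployDefault.Lambda prefix and validate the result."""
    if not codedeploy_name.startswith(PREFIX):
        raise ValueError(f"missing {PREFIX} prefix: {codedeploy_name}")
    sam_type = codedeploy_name[len(PREFIX):]
    if sam_type not in predefined_sam_types():
        raise ValueError(f"unknown predefined configuration: {codedeploy_name}")
    return sam_type


print(to_sam_type("CodeDeployDefault.LambdaLinear10PercentEvery1Minute"))
# prints "Linear10PercentEvery1Minute"
```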
3.2 With SAM, "verify + auto-rollback" composes in a few lines
Combine SAM's AutoPublishAlias (detects code changes and auto-publishes a version + updates the alias) and DeploymentPreference (canary strategy, alarms, hooks), and safe deploy can be written declaratively.
# template.yaml (AWS SAM): canary + pre/post verification + automatic rollback on alarm
Resources:
  OrdersFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs22.x
      Architectures: [arm64]            # Arm64: roughly 20% lower execution price (if your code is compatible)
      AutoPublishAlias: live            # without this, DeploymentPreference cannot be used
      DeploymentPreference:
        Type: Canary10Percent5Minutes   # send 10% for 5 minutes, then switch the rest if all is well
        Alarms:                         # if any one of these enters ALARM, roll back automatically
          - !Ref OrdersErrorsAlarm
          - !Ref OrdersLatencyP99Alarm
        Hooks:
          PreTraffic: !Ref PreTrafficCheck    # smoke test before the switch
          PostTraffic: !Ref PostTrafficCheck  # integration check after the switch

  # Monitor errors on invocations through the live alias (roll back when the alarm fires)
  OrdersErrorsAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/Lambda
      MetricName: Errors
      Dimensions:
        - { Name: FunctionName, Value: !Ref OrdersFunction }
        - { Name: Resource, Value: !Sub "${OrdersFunction}:live" }
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
Pre/post-traffic hooks are verification Lambdas that CodeDeploy invokes before and after the traffic switch. A hook reports its result back to CodeDeploy with PutLifecycleEventHookExecutionStatus; on failure, the deployment is aborted and rolled back. By convention, hook function names start with CodeDeployHook_ (the AWS managed policy AWSCodeDeployRoleForLambda only allows CodeDeploy to invoke functions with that prefix).
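# Pre-traffic hook: smoke-test the new version before the switch and report pass/fail to CodeDeploy
import boto3

codedeploy = boto3.client("codedeploy")


def run_smoke_tests():
    """Placeholder: replace with real checks that invoke the new version and raise on failure."""
    raise NotImplementedError("implement smoke tests before enabling this hook")


def handler(event, context):
    deployment_id = event["DeploymentId"]
    hook_id = event["LifecycleEventHookExecutionId"]
    status = "Succeeded"
    try:
        run_smoke_tests()  # hit the new version directly (the alias has not switched to it yet)
    except Exception:
        status = "Failed"  # returning Failed here cancels the switch and triggers a rollback
    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=deployment_id,
        lifecycleEventHookExecutionId=hook_id,
        status=status,
    )
    return {"status": status}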
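What the smoke test itself might look like: a minimal sketch that invokes one published version directly through the `Qualifier` parameter and raises if Lambda reports an error. The helper name `smoke_test_version` and the way the hook learns the new version number (for example, an environment variable set by the template) are assumptions of this example.

```python
import json


def smoke_test_version(client, function_name, version, payload):
    """Invoke one published version directly and raise if it reports an error.

    `client` is expected to behave like boto3.client("lambda").
    """
    response = client.invoke(
        FunctionName=function_name,
        Qualifier=str(version),  # bypass the alias and hit only the new version
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()
    # Lambda sets FunctionError when the handler raised or timed out.
    if response.get("FunctionError") or response.get("StatusCode") != 200:
        raise RuntimeError(f"version {version} failed smoke test: {body[:200]!r}")
    return json.loads(body)


# Usage inside a hook (hypothetical event payload and version source):
#   import boto3, os
#   smoke_test_version(boto3.client("lambda"), "orders",
#                      os.environ["NEW_VERSION"], {"healthcheck": True})
```

For a payment API, the payload should exercise a read-only or idempotent path so the smoke test cannot create real charges.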
The first deploy is two-stage: CodeDeploy needs "the old version to switch from," so deploy with only `AutoPublishAlias` the first time, and enable `DeploymentPreference` from the second deploy onward.
4. IaC selection: SAM / CDK / Terraform / Serverless Framework
"Which tool should manage Lambda" is a question teams ask right before committing to one. Choose by how easily safe deploys can be composed and by the team's existing assets.
| Tool | Safe deploy | Strength | Suited team |
|---|---|---|---|
| AWS SAM | ◎ a few lines of DeploymentPreference | serverless-specialized, canary with minimal config | all-in on AWS serverless, want to ship safely fastest |
| AWS CDK | ◎ LambdaDeploymentGroup | type-safe IaC, NodejsFunction's esbuild bundle | want to compose types, completion, and complex structures in code |
| Terraform | ○ compose it yourself | multi-cloud, existing Terraform assets | already on Terraform, also manage non-AWS |
| Serverless Framework | ○ plugin | the ease of YAML | small-scale, quick. But v4 has a license note |
Key points:
- SAM: `AutoPublishAlias` + `DeploymentPreference` (`Type`/`Alarms`/`Hooks`) gives canary + auto-rollback at minimal cost. Because SAM is a CloudFormation extension, the generated resources are also traceable.
- CDK: equivalent safe deploy with `aws-lambda`'s `Function`/`NodejsFunction` (automatic transpile/bundle with esbuild) and `aws-codedeploy`'s `LambdaDeploymentGroup` + `LambdaDeploymentConfig.CANARY_10PERCENT_5MINUTES`. Python/Go bundling lives in alpha modules.
- Terraform: wire together `aws_lambda_function` (`publish = true`) + `aws_lambda_alias` (`routing_config.additional_version_weights`) + `aws_codedeploy_app` (`compute_platform = "Lambda"`) / `aws_codedeploy_deployment_group` yourself. You get full control, but the wiring grows.
- Serverless Framework: the ease of `serverless.yml` is second to none, but v4 requires a paid subscription for "individuals/organizations with over $2M revenue in the most recent fiscal year" (v3 remains free and open source). Depending on org size, factor in the cost.
# Terraform: publish versions + weighted alias routing (the foundation for canary)
resource "aws_lambda_function" "orders" {
  function_name    = "orders"
  role             = aws_iam_role.orders.arn
  handler          = "index.handler"
  runtime          = "nodejs22.x"
  architectures    = ["arm64"]
  filename         = "build.zip"
  source_code_hash = filebase64sha256("build.zip") # redeploy when the archive content changes
  publish          = true                          # publish an immutable version on every change
}

resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.orders.function_name
  function_version = aws_lambda_function.orders.version

  routing_config {
    additional_version_weights = {} # weights are injected by CodeDeploy or by hand during a canary
  }
}
5. CI/CD: deploy "keyless" from GitHub Actions
Storing long-lived AWS access keys in GitHub Secrets is a standing leak risk. The officially recommended approach is OIDC (OpenID Connect): exchange the short-lived JWT that GitHub issues for an AWS IAM role via AssumeRoleWithWebIdentity, and run on temporary credentials. No keys are stored at all.
# .github/workflows/deploy.yml: keyless SAM deploy via OIDC (no long-lived keys in Secrets)
name: deploy
on:
  push: { branches: [main] }
permissions:
  id-token: write   # required to issue the OIDC token
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v6   # recommended: obtain temporary credentials via OIDC
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-deploy
          aws-region: ap-northeast-1
      - uses: aws-actions/setup-sam@v2
      - run: sam build
      - run: sam deploy --no-confirm-changeset --no-fail-on-empty-changeset
On the IAM side, in the trust policy allow the issuer token.actions.githubusercontent.com and audience sts.amazonaws.com, and narrow to "only this branch of this repository" with the sub claim (least privilege). The detailed design of OIDC is in the sister article keyless CI/CD realized with OIDC.
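As an illustration of that trust policy, the sketch below builds the JSON document in Python. The account ID, the repository name `example-org/orders`, and the helper name are placeholders; the `sub` format shown applies to branch-triggered workflows, and workflows that use GitHub environments or tags produce a different `sub` value.

```python
import json


def github_oidc_trust_policy(account_id, repo, branch="main"):
    """Trust policy letting one branch of one repository assume the role."""
    issuer = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{issuer}:aud": "sts.amazonaws.com",
                        # Narrow to a single repository and branch (least privilege).
                        f"{issuer}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


print(json.dumps(github_oidc_trust_policy("123456789012", "example-org/orders"), indent=2))
```

Using `StringEquals` on `sub` rather than a wildcard match keeps forks and other branches from assuming the deploy role.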
6. The zero-downtime pitfall: wait on the function state
Finally, an unglamorous pitfall that nearly everyone hits in production. Lambda updates are asynchronous and follow a state machine.
| State | Meaning | Operability |
|---|---|---|
| Pending | being created/configured (VPC ENI creation, etc.) | can't invoke. Updates also fail |
| Active | running | the only state you can invoke |
| Inactive | reclaimed when idle (VPC is 14 days) | the next invocation fails once → re-created in Pending |
| Failed | failed | delete and recreate |
In addition there's LastUpdateStatus (Successful/Failed/InProgress), and while InProgress, the next UpdateFunctionCode/UpdateFunctionConfiguration/PublishVersion fails (ResourceConflictException / 409). So:
- Keep the order: code update → `wait function-updated-v2` → config update/publish (running them back to back in CI is the typical way to hit a 409).
- A VPC function takes time to become ready (Hyperplane ENI; details in the cold-start article). A test that runs right after a deploy should first confirm the function is Active.
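For cases where the built-in waiter is not convenient (custom timeouts, logging, tests), the same check can be sketched as a polling loop over `get_function_configuration`. The helper name, timeout values, and injectable `sleep` are choices made for this example.

```python
import time


def wait_until_ready(client, function_name, timeout_s=300, poll_s=2, sleep=time.sleep):
    """Poll until State is Active and LastUpdateStatus is Successful.

    `client` is expected to behave like boto3.client("lambda").
    """
    waited = 0
    while True:
        cfg = client.get_function_configuration(FunctionName=function_name)
        state, update = cfg.get("State"), cfg.get("LastUpdateStatus")
        if state == "Failed" or update == "Failed":
            reason = cfg.get("StateReason") or cfg.get("LastUpdateStatusReason")
            raise RuntimeError(f"{function_name} is in a failed state: {reason}")
        if state == "Active" and update == "Successful":
            return cfg  # safe to invoke, update again, or publish
        if waited >= timeout_s:
            raise TimeoutError(f"{function_name} not ready after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s
```

Checking both fields matters: `State` covers creation and idle reclamation, while `LastUpdateStatus` covers the in-flight update that causes the 409.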
# Safe consecutive updates in CI: wait for each step to finish before the next
aws lambda update-function-code --function-name orders --zip-file fileb://build.zip
aws lambda wait function-updated-v2 --function-name orders   # without this wait, the next call returns 409
aws lambda update-function-configuration --function-name orders --environment "Variables={LOG_LEVEL=INFO}"
aws lambda wait function-updated-v2 --function-name orders
aws lambda publish-version --function-name orders
7. Conclusion: a safe-deploy cheat sheet
- Foundation: `$LATEST` is mutable. Create a published version (immutable) plus an alias (pointer), and call production via a qualified ARN (the alias).
- Canary: weighted alias (up to 2 versions, traffic split by weight). The conditions: both versions published, identical execution role, identical DLQ.
- Automation: CodeDeploy's `Canary10Percent5Minutes` and similar, verification with pre/post hooks, and auto-rollback on CloudWatch alarms. With SAM, a few lines of `AutoPublishAlias` + `DeploymentPreference`.
- IaC: SAM for the fastest safe deploy, CDK for type safety, Terraform for multi-cloud or existing assets; Serverless Framework is easy, but v4 is paid above $2M annual revenue.
- CI/CD: keyless with OIDC (`id-token: write` + `AssumeRoleWithWebIdentity`, narrowed to repo/branch with `sub`).
- Pitfall: updates are asynchronous. Wait for `LastUpdateStatus=Successful` (an update during `InProgress` returns 409). A VPC function takes time to become ready.
On the payment platform, I strictly followed the shipping discipline of "immutable version + canary + alarm-driven auto-rollback + keyless CI/CD," which supported 0 double charges in production. Detecting a defect at 10% and reverting automatically before it hits all users: this is the foundation for safely evolving a payment platform that cannot be allowed to stop.
"I want to keep shipping my own Lambda in a form that doesn't stop, doesn't break, and can auto-revert." From designing the canary strategy to making CI/CD keyless and selecting IaC, I work alongside you at the speed of one person × generative AI (Claude Code). An audit of your existing deploy flow is a fine place to start, so feel free to reach out.
References (official documentation)
- Lambda function versions / aliases: immutable versions, qualified/unqualified ARNs
- Implementing canary deployments using alias routing: weighted aliases, `AdditionalVersionWeights`, constraints
- Deployment configurations (CodeDeploy): `LambdaCanary…` / `LambdaLinear…` / `LambdaAllAtOnce`
- Gradual deployments / DeploymentPreference (SAM): `AutoPublishAlias`, `Type`/`Alarms`/`Hooks`
- AWS::Serverless::Function DeploymentPreference: automatic rollback via alarms
- AWS CDK aws-codedeploy LambdaDeploymentGroup: safe deploy in CDK
- Terraform aws_lambda_alias / aws_codedeploy_app: weighted routing, `compute_platform`
- Configuring OpenID Connect in Amazon Web Services (GitHub Docs): `id-token: write`, `AssumeRoleWithWebIdentity`
- Lambda function states: Pending/Active/Inactive/Failed, `LastUpdateStatus`
- Serverless Framework pricing: v4's paid condition ($2M revenue)