友田 陽大

Safe Lambda deployment: versions, aliases, canary releases (CodeDeploy), and SAM/CDK/Terraform selection

An implementation guide to safely deploying AWS Lambda with zero downtime. With real code faithful to the AWS official specs, it covers immutable versions and aliases, weighted aliases and CodeDeploy canary/linear delivery, pre/post-traffic hooks and automatic rollback via CloudWatch alarms, waiting on the function state, selecting SAM/CDK/Terraform/Serverless Framework, and keyless CI/CD via GitHub Actions OIDC.


"I overwrote production with update-function-code, and a defect hit every user at once." Because Lambda can be deployed with a single command, it is easy to ship without any safety mechanism. For an API that handles payment confirmation or user operations, a defect in a new version reaching 100% of traffic immediately is unacceptable in production.

This article is an implementation guide to deploying AWS Lambda safely, with zero downtime and automatic rollback. Starting from the foundation of versions and aliases, it walks end to end through canary releases, automatic rollback, IaC selection (SAM/CDK/Terraform), and keyless CI/CD via OIDC. Along the way it draws on the shipping decisions from the serverless payment platform (0 double charges in production) that I built as a core developer. The Lambda execution model itself is left to the sister article AWS Lambda production-operations guide; this article concentrates on a single point: how to ship safely.

Rules for this article: specs, parameter names, and predefined setting names are based on the AWS official documentation (as of June 2026). CodeDeploy setting names, runtimes, and each tool's specs change over time, so always confirm the latest values in the official docs (the "References" at the end) before production rollout.


0. Mental model: separate "the immutable version" from "the moving pointer"

All of safe deployment begins from separating these two.

  • Version = an immutable snapshot. Publishing freezes the code and configuration at that moment as a numbered version. The number increases monotonically and is not reused even after deletion and re-creation.
  • Alias = a movable pointer to a version. It points at a specific version by a name like live or prod. Deploying means "safely switching the alias's target to the new version."
  • $LATEST is mutable. It's overwritten on each update-function-code. So don't point production traffic directly at $LATEST — always call via an alias.
  • Canary = don't move the pointer all at once; move it gradually by weight. Flow only 10% to the new version, and if no problem, go to 100%. If there's a problem, automatically revert.

This "immutable foundation + moving pointer + gradual switch + automatic rollback" is this article's design.
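The mental model above can be made concrete with a toy, in-memory sketch. It is not how Lambda is implemented; the class and method names (Function, publish, point_alias, resolve) are hypothetical and exist only to show that versions are frozen snapshots while an alias is a pointer that can be moved, optionally with a weight.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Function:
    """Toy model: mutable $LATEST, immutable versions, movable aliases."""
    latest_code: str = ""
    versions: dict = field(default_factory=dict)  # version number -> frozen code
    aliases: dict = field(default_factory=dict)   # alias -> (primary, {canary: weight})
    _next_version: int = 1

    def update_code(self, code: str) -> None:
        # $LATEST is overwritten in place on every update.
        self.latest_code = code

    def publish(self) -> int:
        # Snapshot the current code; the snapshot is never modified afterwards.
        number = self._next_version
        self.versions[number] = self.latest_code
        self._next_version += 1  # numbers only increase and are never reused
        return number

    def point_alias(self, name, primary, canary=None, weight=0.0) -> None:
        # Moving the pointer is the actual "deploy".
        extra = {canary: weight} if canary is not None else {}
        self.aliases[name] = (primary, extra)

    def resolve(self, name, rng=random.random) -> int:
        # Pick the canary version with probability `weight`, else the primary.
        primary, extra = self.aliases[name]
        for version, weight in extra.items():
            if rng() < weight:
                return version
        return primary
```

Publishing twice and pointing live at version 1 with a 10% canary on version 2 sends roughly one call in ten to the new code, and promoting is just another pointer move to version 2.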


1. Versions and aliases: call production via an alias

First, build the foundation. Update the code against $LATEST, publish a version once stable, and point the alias at that version. The client calls the alias's ARN.

# 1) Update the code ($LATEST changes; production is not looking at it yet)
aws lambda update-function-code --function-name orders --zip-file fileb://build.zip

# 2) Wait for the update to complete (important: until LastUpdateStatus=Successful, the next operation fails)
aws lambda wait function-updated-v2 --function-name orders

# 3) Publish an immutable version (a number is assigned, e.g. 42)
VERSION=$(aws lambda publish-version --function-name orders --query Version --output text)

# 4) Point the alias live at that version. Clients call live
aws lambda update-alias --function-name orders --name live --function-version "$VERSION"

Three official specs that bite here:

  • Call an alias through a qualified ARN. The ARN carries a qualifier (a version number or alias name), as in ...:function:orders:live. An unqualified call implicitly runs $LATEST, which is a source of accidents in production.
  • Provisioned concurrency and SnapStart can be enabled only on a published version or alias ($LATEST is not allowed). This foundation is a prerequisite for latency measures such as cold-start optimization.
  • Not every configuration change produces a new version. For example, changing reserved concurrency doesn't create a version, because it is a function-wide operational setting.
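As a guard against the unqualified-ARN accident described above, a caller or a CI check can refuse ARNs that would resolve to $LATEST. The following is a minimal sketch; qualifier_of and assert_production_safe are hypothetical helper names, and the parsing assumes the standard arn:aws:lambda:region:account:function:name[:qualifier] layout.

```python
from typing import Optional


def qualifier_of(function_arn: str) -> Optional[str]:
    """Return the version/alias qualifier of a Lambda function ARN, or None."""
    parts = function_arn.split(":")
    # Layout: arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
    if len(parts) not in (7, 8) or parts[2] != "lambda" or parts[5] != "function":
        raise ValueError(f"not a Lambda function ARN: {function_arn!r}")
    return parts[7] if len(parts) == 8 else None


def assert_production_safe(function_arn: str) -> str:
    """Reject ARNs that would run $LATEST, explicitly or implicitly."""
    qualifier = qualifier_of(function_arn)
    if qualifier is None or qualifier == "$LATEST":
        raise ValueError("production callers must use an alias or version qualifier")
    return function_arn
```

A check like this fits well in configuration validation, where the function ARN used by an API or event source is known before deployment.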

2. Canary release: flow "only 10%" with a weighted alias

An alias can point at up to 2 published versions and distribute traffic by weight (routing config / AdditionalVersionWeights). This is the heart of a canary release.

# Alias live: 97% to the current version (42), 3% to the new version (43). Raise the weight if nothing goes wrong
aws lambda update-alias --function-name orders --name live \
  --function-version 42 \
  --routing-config 'AdditionalVersionWeights={"43"=0.03}'

Know the constraints the official documentation imposes; violating them leads to an error or an accident.

  • Both versions are published ($LATEST not allowed).
  • Both versions' execution roles are identical.
  • DLQ configuration is identical (or both absent).
  • They are 2 versions of the same function.
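The constraints above can be checked before touching the alias. This sketch assumes you already hold two dictionaries shaped like the GetFunctionConfiguration response for each version (the fields FunctionName, Version, Role, and DeadLetterConfig.TargetArn); canary_blockers and routing_config are hypothetical helper names, not part of any SDK.

```python
def _dlq_target(cfg: dict):
    # DeadLetterConfig may be missing or empty when no DLQ is configured.
    return (cfg.get("DeadLetterConfig") or {}).get("TargetArn")


def canary_blockers(current: dict, candidate: dict) -> list:
    """Return the reasons a weighted alias over these two versions would be invalid."""
    problems = []
    if "$LATEST" in (current.get("Version"), candidate.get("Version")):
        problems.append("both versions must be published ($LATEST is not allowed)")
    if current.get("FunctionName") != candidate.get("FunctionName"):
        problems.append("versions belong to different functions")
    if current.get("Role") != candidate.get("Role"):
        problems.append("execution roles differ")
    if _dlq_target(current) != _dlq_target(candidate):
        problems.append("dead-letter queue configuration differs")
    return problems


def routing_config(candidate_version: str, weight: float) -> dict:
    """Build the RoutingConfig value for an update_alias call."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    return {"AdditionalVersionWeights": {candidate_version: weight}}
```

With boto3, the returned dictionary would be passed as the RoutingConfig argument of update_alias once canary_blockers returns an empty list.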

Raising and lowering weights by hand isn't realistic, so automate by leaving it to CodeDeploy (next chapter).

Compatibility with provisioned concurrency: if you want to avoid cold starts during a canary, you can provision extra concurrency just while weighted routing is active (the official documentation mentions this). For APIs with a latency SLA, combine canary releases with provisioned concurrency.


3. Canary with automatic rollback: CodeDeploy + SAM

CodeDeploy automatically moves the weighted alias's weight on a predefined schedule and auto-rolls back if a CloudWatch alarm fires. Manual weight adjustment becomes unnecessary.

3.1 Predefined deployment configurations (the official formal names)

Kind | Setting name (prefixed with CodeDeployDefault.) | Behavior
Canary | LambdaCanary10Percent5Minutes / 10Minutes / 15Minutes / 30Minutes | flow 10%, then the remaining 90% all at once after the specified minutes
Linear | LambdaLinear10PercentEvery1Minute / Every2Minutes / Every3Minutes / Every10Minutes | increase 10% at a time, gradually
All at once | LambdaAllAtOnce | 100% in one go (no canary)

Name note: only the shortest linear is Every1Minute (singular), the rest are plural (Every2Minutes). In SAM's DeploymentPreference.Type, use the shortened name with the leading CodeDeployDefault.Lambda removed (e.g., Canary10Percent10Minutes).
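To see what a given setting name implies, the schedule can be derived from the name itself. This is an illustrative sketch only: traffic_schedule and full_config_name are hypothetical helpers, and the timing model (first increment at minute 0, equal steps afterwards) is my reading of the names, not output from CodeDeploy.

```python
import re

PREFIX = "CodeDeployDefault.Lambda"
_PATTERN = re.compile(
    r"^(?:Canary(\d+)Percent(\d+)Minutes"
    r"|Linear(\d+)PercentEvery(\d+)Minutes?"
    r"|(AllAtOnce))$"
)


def full_config_name(sam_type: str) -> str:
    """Turn the shortened SAM Type into the full CodeDeploy configuration name."""
    return PREFIX + sam_type


def traffic_schedule(sam_type: str) -> list:
    """Return [(minute, percent_on_new_version), ...] implied by the name."""
    match = _PATTERN.match(sam_type)
    if match is None:
        raise ValueError(f"unrecognized deployment type: {sam_type!r}")
    canary_pct, canary_wait, linear_pct, linear_every, all_at_once = match.groups()
    if all_at_once:
        return [(0, 100)]
    if canary_pct:
        # One small step, then everything after the waiting period.
        return [(0, int(canary_pct)), (int(canary_wait), 100)]
    step, every = int(linear_pct), int(linear_every)
    percents = list(range(step, 100, step)) + [100]
    return [(index * every, pct) for index, pct in enumerate(percents)]
```

Under this model, Canary10Percent5Minutes keeps 90% of traffic on the old version for five minutes, while Linear10PercentEvery2Minutes reaches 100% at the 18-minute mark. That difference is the alarm observation window you are choosing.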

3.2 With SAM, "verify + auto-rollback" composes in a few lines

Combine SAM's AutoPublishAlias (detects code changes and auto-publishes a version + updates the alias) and DeploymentPreference (canary strategy, alarms, hooks), and safe deploy can be written declaratively.

# template.yaml (AWS SAM): canary + pre/post verification + automatic rollback on alarm
# (OrdersLatencyP99Alarm, PreTrafficCheck, and PostTrafficCheck are defined elsewhere in the template; omitted here)
Resources:
  OrdersFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs22.x
      Architectures: [arm64]              # Arm64 lowers the execution charge by about 20% (if compatible)
      AutoPublishAlias: live              # DeploymentPreference cannot be used without this
      DeploymentPreference:
        Type: Canary10Percent5Minutes     # flow 10% for 5 minutes, then switch the rest if nothing goes wrong
        Alarms:                           # automatic rollback if any one of these enters ALARM
          - !Ref OrdersErrorsAlarm
          - !Ref OrdersLatencyP99Alarm
        Hooks:
          PreTraffic: !Ref PreTrafficCheck   # smoke test before the switch
          PostTraffic: !Ref PostTrafficCheck # integration verification after the switch

  # Watch the error count on the live alias (roll back when it fires)
  OrdersErrorsAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/Lambda
      MetricName: Errors
      Dimensions:
        - { Name: FunctionName, Value: !Ref OrdersFunction }
        - { Name: Resource, Value: !Sub "${OrdersFunction}:live" }
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold

Pre/post-traffic hooks are verification Lambdas that CodeDeploy invokes before and after the traffic switch. A hook reports its result back to CodeDeploy with PutLifecycleEventHookExecutionStatus; on failure the deployment is aborted and rolled back. By convention, hook function names start with CodeDeployHook_ (the default CodeDeploy role that SAM creates is only allowed to invoke functions with that prefix).

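# Pre-traffic hook: smoke-test the new version before the switch and report pass/fail to CodeDeploy
import boto3

codedeploy = boto3.client("codedeploy")


def run_smoke_tests():
    # Placeholder: call the new version directly (the version the alias does not point at yet)
    # and raise an exception if any check fails.
    pass


def handler(event, context):
    deployment_id = event["DeploymentId"]
    hook_id = event["LifecycleEventHookExecutionId"]
    status = "Succeeded"
    try:
        run_smoke_tests()
    except Exception as exc:
        print(f"smoke test failed: {exc!r}")  # keep the reason in CloudWatch Logs
        status = "Failed"   # returning Failed stops the switch and triggers a rollback
    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=deployment_id, lifecycleEventHookExecutionId=hook_id, status=status,
    )
    return {"status": status}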
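One way the smoke test inside a pre-traffic hook might hit the new version directly is to invoke it with an explicit Qualifier, bypassing the alias. The sketch below is an assumption-laden example: smoke_test_version is a hypothetical helper, the client is passed in (in a real hook it would be boto3.client("lambda")), and how the hook learns the new version number (for example, an environment variable set in the template) is left to you.

```python
import json


def smoke_test_version(lambda_client, function_name: str, version: str, payload: dict) -> dict:
    """Invoke one specific published version and raise if it reports an error."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        Qualifier=version,                  # bypass the alias; hit only this version
        InvocationType="RequestResponse",   # synchronous, so the result can be inspected
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()
    # Lambda sets FunctionError when the handler raised or timed out.
    if response.get("FunctionError"):
        raise RuntimeError(f"version {version} failed: {body[:200]!r}")
    return json.loads(body)
```

Because the function raises on failure, it can be called from inside the hook's try block, and any exception turns into a Failed status for CodeDeploy.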

The first deploy is two-stage: CodeDeploy needs an old version to shift traffic away from, so deploy the first time with only AutoPublishAlias and enable DeploymentPreference from the second deploy onward.


4. IaC selection: SAM / CDK / Terraform / Serverless Framework

"Which tool should we manage Lambda with?" is a recurring selection question. Choose by how easily a safe deploy can be composed and by the team's existing assets.

Tool | Safe deploy | Strength | Suited team
AWS SAM | ◎ a few lines of DeploymentPreference | serverless-specialized, canary with minimal config | all-in on AWS serverless, want to ship safely fastest
AWS CDK | LambdaDeploymentGroup | type-safe IaC, NodejsFunction's esbuild bundle | want to compose types, completion, and complex structures in code
Terraform | ○ compose it yourself | multi-cloud, existing Terraform assets | already on Terraform, also manage non-AWS
Serverless Framework | ○ plugin | the ease of YAML | small-scale, quick. But v4 has a license note

Key points:

  • SAM: with AutoPublishAlias + DeploymentPreference (Type/Alarms/Hooks), canary + auto-rollback at minimal cost. Being a CloudFormation extension, the generated resources are also traceable.
  • CDK: an equivalent safe deploy with aws-lambda's Function or aws-lambda-nodejs's NodejsFunction (automatic transpile and bundle with esbuild), plus aws-codedeploy's LambdaDeploymentGroup and LambdaDeploymentConfig.CANARY_10PERCENT_5MINUTES. Python/Go bundling lives in alpha modules.
  • Terraform: wire it together yourself from aws_lambda_function (publish = true), aws_lambda_alias (routing_config.additional_version_weights), aws_codedeploy_app (compute_platform = "Lambda"), and aws_codedeploy_deployment_group. You get full control, but the wiring grows.
  • Serverless Framework: the ease of serverless.yml is hard to beat, but v4 requires a paid subscription for "individuals/organizations with over $2M revenue in the most recent fiscal year" (v3 remains free and open source). Depending on org size, factor in the cost.

# Terraform: publish versions + weighted alias routing (the foundation for canary)
resource "aws_lambda_function" "orders" {
  function_name    = "orders"
  role             = aws_iam_role.orders.arn
  handler          = "index.handler"
  runtime          = "nodejs22.x"
  architectures    = ["arm64"]
  filename         = "build.zip"
  source_code_hash = filebase64sha256("build.zip") # so a changed zip with the same filename is detected
  publish          = true # publish an immutable version on every change
}

resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.orders.function_name
  function_version = aws_lambda_function.orders.version
  routing_config {
    additional_version_weights = { } # weights are injected by CodeDeploy or by hand during a canary
  }
}

5. CI/CD: deploy "keyless" from GitHub Actions

Storing long-lived AWS access keys in GitHub Secrets is a standing leak risk. The officially recommended approach is OIDC (OpenID Connect): exchange the short-lived JWT that GitHub issues for an AWS IAM role via AssumeRoleWithWebIdentity, and run on temporary credentials. No long-lived keys are stored at all.

# .github/workflows/deploy.yml: keyless SAM deploy via OIDC (no long-lived keys in Secrets)
name: deploy
on:
  push: { branches: [main] }
permissions:
  id-token: write   # required to issue the OIDC token
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v6   # recommended: obtain temporary credentials via OIDC
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-deploy
          aws-region: ap-northeast-1
      - uses: aws-actions/setup-sam@v2
      - run: sam build
      - run: sam deploy --no-confirm-changeset --no-fail-on-empty-changeset

On the IAM side, the role's trust policy allows the issuer token.actions.githubusercontent.com and the audience sts.amazonaws.com, and uses the sub claim to narrow access to "only this branch of this repository" (least privilege). The detailed design of OIDC is in the sister article keyless CI/CD realized with OIDC.
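As an illustration of that trust policy, the following sketch builds the JSON document as a Python dictionary. The function name github_trust_policy and the example account, repository, and branch values are placeholders; it assumes the GitHub OIDC identity provider is already registered in the account.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_trust_policy(account_id: str, repo: str, branch: str) -> dict:
    """Trust policy that lets one branch of one repository assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                # Audience: the token must have been issued for AWS STS.
                f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                # Subject: pin to a single repository and branch.
                f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
            }},
        }],
    }


if __name__ == "__main__":
    # Placeholder values; replace with your own account and repository.
    print(json.dumps(github_trust_policy("123456789012", "example-org/orders", "main"), indent=2))
```

Using StringEquals on sub keeps the match exact; a wildcard match would widen who can assume the role, so it is worth avoiding unless you need it.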


6. The zero-downtime pitfall: wait on the function state

Finally, an unglamorous pitfall that almost everyone hits in production: Lambda updates are asynchronous and follow a state machine.

State | Meaning | Operability
Pending | being created/configured (VPC ENI creation, etc.) | can't be invoked; updates also fail
Active | running | the only state in which the function can be invoked
Inactive | reclaimed when idle (VPC is 14 days) | the next invocation fails once, then the function is re-created via Pending
Failed | failed | delete and recreate

In addition there's LastUpdateStatus (Successful/Failed/InProgress), and while InProgress, the next UpdateFunctionCode/UpdateFunctionConfiguration/PublishVersion fails (ResourceConflictException / 409). So:

  • Keep the order: code update → wait function-updated-v2 → config update/publish. Running these back to back in CI is the typical way to hit a 409.
  • A VPC function takes time to become ready (Hyperplane ENI; details in the cold-start article). A test that runs right after a deploy should confirm the Active state first.

# Safe consecutive updates in CI: wait for each step to complete before moving on
aws lambda update-function-code --function-name orders --zip-file fileb://build.zip
aws lambda wait function-updated-v2 --function-name orders        # ← omit this and you get a 409
aws lambda update-function-configuration --function-name orders --environment "Variables={LOG_LEVEL=INFO}"
aws lambda wait function-updated-v2 --function-name orders
aws lambda publish-version --function-name orders
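When the same wait is needed from Python (for example in a test that runs right after a deploy), the State and LastUpdateStatus fields can be polled directly. This is a minimal sketch: wait_until_ready and FunctionNotReady are hypothetical names, the client is injected (in practice boto3.client("lambda")), and the timeout and interval defaults are arbitrary assumptions.

```python
import time


class FunctionNotReady(Exception):
    """Raised when the function fails or does not become ready in time."""


def wait_until_ready(lambda_client, function_name, timeout=300.0, interval=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until State is Active and LastUpdateStatus is Successful."""
    deadline = clock() + timeout
    while True:
        cfg = lambda_client.get_function_configuration(FunctionName=function_name)
        state, update = cfg.get("State"), cfg.get("LastUpdateStatus")
        if state == "Failed" or update == "Failed":
            raise FunctionNotReady(f"{function_name}: State={state}, LastUpdateStatus={update}")
        if state == "Active" and update == "Successful":
            return cfg  # safe to update, publish, or invoke now
        if clock() >= deadline:
            raise FunctionNotReady(f"{function_name}: timed out waiting for readiness")
        sleep(interval)
```

The injected sleep and clock make the loop testable without real delays; in production code the built-in waiter used in the CLI example does the same job with less code.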

7. Conclusion: a safe-deploy cheat sheet

  • Foundation: $LATEST is mutable. Make a published version (immutable) + an alias (pointer), and call production via a qualified ARN (alias).
  • Canary: weighted alias (up to 2 versions, weight distribution). The conditions are both versions published, identical execution role, identical DLQ.
  • Automation: CodeDeploy's Canary10Percent5Minutes, etc. + verify with pre/post hooks + auto-rollback with CloudWatch alarms. With SAM, a few lines of AutoPublishAlias + DeploymentPreference.
  • IaC: SAM for the fastest safe deploy, CDK for type safety, Terraform for multi-cloud or existing assets, Serverless Framework for ease (but v4 is paid above $2M annual revenue).
  • CI/CD: keyless with OIDC (id-token: write + AssumeRoleWithWebIdentity, narrow repo/branch with sub).
  • Pitfall: updates are asynchronous. Wait for LastUpdateStatus=Successful (an update during InProgress is 409). A VPC function takes time to reflect.

On the payment platform, I strictly followed this shipping discipline of "immutable version + canary + alarm-driven auto-rollback + keyless CI/CD," which supported 0 double charges in production. Detecting a defect at 10% and reverting automatically before it reaches all users is the foundation for safely evolving a payment platform that cannot be allowed to stop.

"I want to keep shipping my own Lambda in a way that doesn't stop, doesn't break, and can revert automatically." From designing the canary strategy to making CI/CD keyless and selecting IaC, I work alongside you at the speed of one person × generative AI (Claude Code). Feel free to reach out, including for an audit of your existing deploy flow.


References (official documentation)
