友田 陽大
ECS on Fargate in production
AWS
ECS
Fargate
CI/CD
Blue/Green
CodeDeploy
GitHub Actions
Deployment

ECS on Fargate CI/CD Complete Guide: Shipping Safely with Native Blue/Green, CodeDeploy, and GitHub Actions (OIDC)

This guide organizes the three ECS on Fargate deployment strategies (rolling, ECS-native Blue/Green, CodeDeploy) and shows a keyless GitHub Actions OIDC pipeline in real code, end to end through the quality gates that guard a production release.

Reading time: 15 min read
Author: 友田 陽大

"When a failure occurs right after you ship a new container to production, how fast can you roll back to the previous version?"—this is the essence of deployment. Speed is secondary. Whether you can instantly revert when it breaks comes first.

On a lumber-distribution SaaS that won a Minister of Economy, Trade and Industry Award, I operated 221 API endpoints in production on an API Gateway → NLB → ALB → ECS on Fargate configuration. The payment foundation has maintained zero double charges in production. To deliver production quality as "one person × generative AI (Claude Code)," the deployment pipeline itself had to be structured so that it is hard to break and easy to revert.

This article organizes the three ECS on Fargate deployment methods so that you can judge when and how to use each. The basics of ECS platform design, networking, and security are left to the pillar article; this piece focuses on deployment itself.


ECS's three deployment methods: grasp the map first

ECS on Fargate offers three choices for how to update a service. Which you choose changes the complexity of the infrastructure and the rollback speed.

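| Method | Deployment controller | Blue/Green | Rollback speed | Required additional resources |
| --- | --- | --- | --- | --- |
| Rolling update | ECS | None | Drop the new version and start the old (tens of seconds or more) | None |
| ECS-native Blue/Green | ECS | Yes (GA in 2025) | Instant switch during bake time | No CodeDeploy resources (an alternate target group and an IAM role for ECS to manage the load balancer are still needed) |
| CodeDeploy Blue/Green | CODEDEPLOY | Yes | Instant switch during bake time | CodeDeploy app, test listener, two target groups |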

The basic way to choose:

  • Rolling update: a simple configuration is enough, you want to keep the deployment mechanism simple, and a brief risk of downtime is acceptable.
  • ECS-native Blue/Green: you want Blue/Green but don't want to add and manage CodeDeploy resources. The default recommendation for new configurations from 2025 onward.
  • CodeDeploy Blue/Green: you already have CodeDeploy assets, you want to do complex validation with Lambda hooks, or you're building on an existing configuration combined with NLB.

① Reviewing rolling updates: min/max and the circuit breaker

ECS's simplest deployment is the rolling update. The deployment controller is ECS, and the deployment strategy is implicitly rolling. According to the official documentation, two parameters determine the behavior.

  • minimumHealthyPercent: the lower bound (%, rounded up) of the number of tasks that must be healthy during a deployment.
  • maximumPercent: the upper bound (%, rounded down) of the number of tasks that may be running during a deployment.

min 100% / max 200% (the pillar article's setting) errs on the safe side. With desiredCount=2, ECS first starts two new-version tasks (4 in total), confirms they are healthy, and then stops the two old-version tasks. The cost temporarily doubles, but availability doesn't drop at all.
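
To make the rounding concrete, here is a minimal Python sketch of the two bounds. The function name `rolling_bounds` is hypothetical and only mirrors the rule stated above (round the lower bound up, round the upper bound down); it is not an ECS API.

```python
def rolling_bounds(desired_count: int, minimum_healthy_percent: int,
                   maximum_percent: int) -> tuple[int, int]:
    """Return (min healthy tasks, max running tasks) during a rolling update."""
    scaled_min = desired_count * minimum_healthy_percent
    scaled_max = desired_count * maximum_percent
    min_healthy = -(-scaled_min // 100)  # integer ceiling: lower bound rounds up
    max_running = scaled_max // 100      # integer floor: upper bound rounds down
    return min_healthy, max_running


if __name__ == "__main__":
    print(rolling_bounds(2, 100, 200))  # (2, 4): the min 100% / max 200% case
    print(rolling_bounds(3, 50, 150))   # (2, 4): 1.5 rounds up, 4.5 rounds down
```

With desiredCount=3 and min 50%, at least two tasks must stay healthy, so ECS can replace only one task at a time even though 50% looks like it allows more.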

Deployment circuit breaker

The deployment circuit breaker prevents the accident where new tasks keep crash-looping and the deployment carries on without anyone noticing.

resource "aws_ecs_service" "app" {
  # ...
  deployment_circuit_breaker {
    enable   = true
    rollback = true # on detected failure, automatically roll back to the previous revision
  }
  # Can also be combined with CloudWatch alarms (failure when either condition is met)
  deployment_controller {
    type = "ECS"
  }
}

With rollback = true, when new tasks repeatedly fail to start or fail health checks and the failure threshold is reached, ECS marks the deployment as failed and automatically reverts to the previous revision. The circuit breaker can also be combined with CloudWatch alarms; with both enabled, the deployment is treated as failed the moment either condition is met.

Version consistency via image digest

ECS resolves a tag to an image digest by default, guaranteeing that all tasks in a service run the same binary (versionConsistency). A mutable tag such as latest is what makes this guarantee necessary: rebuild the image, the contents of latest change, and without digest pinning only some tasks would run a different version. The iron rule is to use the commit SHA as the tag and not depend on latest.
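
As a small illustration of that rule, the following Python sketch builds an image reference and refuses anything that is not a full commit SHA. The helper name `image_ref` and the 40-hex-character check are assumptions made for this example, not part of any AWS tooling.

```python
import re

# A full Git commit SHA-1 is 40 lowercase hex characters (assumption: no short SHAs).
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def image_ref(registry: str, repository: str, commit_sha: str) -> str:
    """Build registry/repository:tag, rejecting mutable tags such as 'latest'."""
    if not _FULL_SHA.fullmatch(commit_sha):
        raise ValueError(f"refusing non-SHA image tag: {commit_sha!r}")
    return f"{registry}/{repository}:{commit_sha}"


if __name__ == "__main__":
    registry = "111122223333.dkr.ecr.ap-northeast-1.amazonaws.com"
    print(image_ref(registry, "web-api", "0123456789abcdef0123456789abcdef01234567"))
```

A guard like this could run in CI before the task definition is rendered, so a stray latest never reaches the service.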


② ECS-native Blue/Green: GA in 2025, no CodeDeploy needed

ECS-native Blue/Green became generally available in July 2025 and reached feature parity with CodeDeploy when canary and linear traffic shifting were added in October 2025. It is the recommendation for new configurations from here on: everything is handled within ECS, with no CodeDeploy to manage separately.

"AWS released native Blue/Green deployments for Amazon ECS: you can now do blue/green deployments directly from ECS without needing AWS CodeDeploy." (AWS Blog)

How it works

Keep the deployment controller as ECS and select BLUE_GREEN as the deployment strategy to enable it.

  1. When a service update begins, start the green task group.
  2. Once green becomes healthy, switch ALB traffic to green.
  3. For the duration of bakeTimeInMinutes, keep both blue (old) and green (new). During this time, you can instantly revert to blue with one click (instant rollback).
  4. Once bake time expires, delete the blue tasks.

The six deployment lifecycle hooks

In ECS-native Blue/Green, six lifecycle hooks are provided where you can insert a Lambda.

| Hook | Timing |
| --- | --- |
| PRE_SCALE_UP | Before scaling up green tasks |
| POST_SCALE_UP | After scaling up green tasks (startup confirmation) |
| TEST_TRAFFIC_SHIFT | Before directing test traffic to green |
| POST_TEST_TRAFFIC_SHIFT | After switching test traffic (smoke test) |
| PRODUCTION_TRAFFIC_SHIFT | Before directing production traffic to green |
| POST_PRODUCTION_TRAFFIC_SHIFT | After switching production traffic (production confirmation) |

A hook's Lambda returns its validation result to ECS. On failure, the deployment is aborted at that point and reverts to blue.
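
A hook function might look like the following Python sketch. It assumes the hook reports its verdict in a `hookStatus` field with the values SUCCEEDED or FAILED, which is my understanding of the AWS documentation and should be checked against the current reference. The test URL and the helper names `http_ok` and `make_handler` are hypothetical.

```python
import urllib.request
from typing import Callable

# Hypothetical URL that reaches the green tasks through the test listener.
TEST_URL = "http://internal-test.example.com:8080/healthz"


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True when the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def make_handler(check: Callable[[], bool]):
    """Build a Lambda handler that maps a boolean check to a hook verdict."""
    def handler(event, context):
        # Assumption: ECS reads the verdict from the "hookStatus" field.
        return {"hookStatus": "SUCCEEDED" if check() else "FAILED"}
    return handler


# Entry point to configure on the Lambda function.
lambda_handler = make_handler(lambda: http_ok(TEST_URL))
```

Passing the check in as a function keeps the handler testable without network access, and the same wrapper can serve several hook stages with different checks.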

Kinds of traffic shift

| Kind | Behavior |
| --- | --- |
| ALL_AT_ONCE | Switch to green all at once (fastest, concentrated risk) |
| CANARY | First a specified % to green, then switch the rest after a wait |
| LINEAR | Shift to green in stages by a fixed % at a time |
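
The difference between canary and linear shifting is easiest to see as a timeline. The Python sketch below only models the behavior described in the table; the function and parameter names are hypothetical rather than ECS API fields, and it assumes the first step happens at minute 0.

```python
def canary_schedule(canary_percent: int, wait_minutes: int) -> list[tuple[int, int]]:
    """Return (minute, % of production traffic on green) for a canary shift."""
    return [(0, canary_percent), (wait_minutes, 100)]


def linear_schedule(step_percent: int, interval_minutes: int) -> list[tuple[int, int]]:
    """Return (minute, % of production traffic on green) for a linear shift."""
    if step_percent <= 0:
        raise ValueError("step_percent must be positive")
    schedule = []
    percent, minute = 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)  # never overshoot 100%
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


if __name__ == "__main__":
    print(canary_schedule(10, 5))   # [(0, 10), (5, 100)]
    print(linear_schedule(25, 3))   # [(0, 25), (3, 50), (6, 75), (9, 100)]
```

Canary concentrates the observation window on a small slice of traffic, while linear spreads the exposure evenly and takes longer to finish.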

Terraform configuration example

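resource "aws_ecs_service" "app" {
  name            = "web-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  deployment_controller {
    type = "ECS" # ECS-native Blue/Green keeps the controller as "ECS"
  }

  deployment_configuration {
    # In the AWS provider, strategy is a string argument of this block (verify against your provider version)
    strategy             = "BLUE_GREEN"
    bake_time_in_minutes = 10 # keep the old version for 10 minutes after the switch so instant rollback stays available

    # Traffic-shift settings (for example CANARY) would also go in this block; omitted here
    # Lifecycle hooks (optional): specifying hooks here lets a Lambda validate each stage
  }

  # The circuit breaker is a top-level block of aws_ecs_service
  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }
  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
    # Blue/Green additionally expects an advanced_configuration block here
    # (alternate target group, production listener rule, IAM role for ECS); omitted in this excerpt
  }
}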

Caveat: because ECS-native Blue/Green is a relatively new feature (GA in 2025), older versions of the Terraform AWS provider may not support the Blue/Green strategy setting in the deployment_configuration block. In that case, a transitional workaround is to configure it via the AWS CLI or console and ignore the difference in Terraform. The CLI configuration is shown below.

Enabling ECS-native Blue/Green via the AWS CLI

If the Terraform provider hasn't caught up, configure it directly with create-service or update-service.

aws ecs update-service \
  --cluster prod \
  --service web-api \
  --deployment-configuration '{
    "strategy": "BLUE_GREEN",
    "bakeTimeInMinutes": 10,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }' \
  --region ap-northeast-1

③ CodeDeploy Blue/Green: for existing assets and advanced validation

If ECS-native Blue/Green suffices, there's no reason to add CodeDeploy. But CodeDeploy is suited to the following cases.

  • You want to unify with existing CodeDeploy assets (other services or approval flows).
  • You want fine-grained, advanced validation in Lambda hooks (integration tests against external APIs, DB-migration confirmation, Chaos Engineering scenarios).
  • You have an existing target-group-switching setup on an NLB configuration (but with NLB, only AllAtOnce).

Required resources

CodeDeploy-based ECS Blue/Green requires the following, per the official documentation.

  1. A load balancer: an ALB in most cases (an NLB also works, but with traffic-shift restrictions)
  2. A production listener (port 443, etc.) and a test listener (optional but recommended: port 8080, etc.), belonging to the same ALB
  3. Two target groups (for blue and for green)
  4. A CodeDeploy app (compute platform ECS) and a deployment group
  5. An AppSpec (task-definition ARN, container name, port, hook Lambda ARNs)
  6. ecsCodeDeployRole (the IAM role for CodeDeploy to operate ECS and the LB)

AppSpec example

version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:ap-northeast-1:111122223333:task-definition/web-api:42"
        LoadBalancerInfo:
          ContainerName: "app"
          ContainerPort: 8080
        PlatformVersion: "LATEST"
        NetworkConfiguration:
          AwsvpcConfiguration:
            Subnets:
              - "subnet-0abc123def456"
              - "subnet-0def789abc123"
            SecurityGroups:
              - "sg-0abcdef1234567890"
            AssignPublicIp: "DISABLED"
Hooks:
  - BeforeInstall: "arn:aws:lambda:ap-northeast-1:111122223333:function:pre-deploy-validation"
  - AfterAllowTestTraffic: "arn:aws:lambda:ap-northeast-1:111122223333:function:smoke-test"
  - BeforeAllowTraffic: "arn:aws:lambda:ap-northeast-1:111122223333:function:final-gate"
  - AfterAllowTraffic: "arn:aws:lambda:ap-northeast-1:111122223333:function:post-deploy-check"

Predefined deployment configurations

| Config name | Behavior |
| --- | --- |
| CodeDeployDefault.ECSAllAtOnce | Switch all at once (usable with both ALB and NLB) |
| CodeDeployDefault.ECSLinear10PercentEvery1Minutes | Shift 10% every minute (complete in 10 minutes) |
| CodeDeployDefault.ECSLinear10PercentEvery3Minutes | Shift 10% every 3 minutes (complete in 30 minutes) |
| CodeDeployDefault.ECSCanary10Percent5Minutes | First confirm 10% for 5 minutes, then the rest all at once |
| CodeDeployDefault.ECSCanary10Percent15Minutes | First confirm 10% for 15 minutes, then the rest all at once |

NLB restriction: when combined with NLB, traffic shifting is restricted to ECSAllAtOnce only.

CodeDeploy configuration in Terraform

# Set the ECS service's deployment controller to CODEDEPLOY
resource "aws_ecs_service" "app" {
  name            = "web-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  deployment_controller {
    type = "CODEDEPLOY"
  }

  # Under CodeDeploy, register the blue TG here; CodeDeploy switches between the two TGs on each deployment
  load_balancer {
    target_group_arn = aws_lb_target_group.blue.arn
    container_name   = "app"
    container_port   = 8080
  }

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }

  # CodeDeploy swaps the TGs, so ignore the resulting plan diff in Terraform
  lifecycle {
    ignore_changes = [task_definition, load_balancer]
  }
}

# IAM role: permissions for CodeDeploy to operate ECS and the LB
resource "aws_iam_role" "ecs_codedeploy" {
  name               = "ecsCodeDeployRole"
  assume_role_policy = data.aws_iam_policy_document.codedeploy_assume.json
}

resource "aws_iam_role_policy_attachment" "ecs_codedeploy" {
  role       = aws_iam_role.ecs_codedeploy.name
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
}

# CodeDeploy app and deployment group
resource "aws_codedeploy_app" "ecs" {
  compute_platform = "ECS"
  name             = "web-api"
}

resource "aws_codedeploy_deployment_group" "ecs" {
  app_name               = aws_codedeploy_app.ecs.name
  deployment_group_name  = "prod"
  service_role_arn       = aws_iam_role.ecs_codedeploy.arn
  deployment_config_name = "CodeDeployDefault.ECSCanary10Percent5Minutes"

  ecs_service {
    cluster_name = aws_ecs_cluster.main.name
    service_name = aws_ecs_service.app.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.https.arn]
      }
      test_traffic_route {
        listener_arns = [aws_lb_listener.test.arn]
      }
      target_group {
        name = aws_lb_target_group.blue.name
      }
      target_group {
        name = aws_lb_target_group.green.name
      }
    }
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
  }

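  # ECS deployments through CodeDeploy must be Blue/Green with traffic control
  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }
    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5 # bake time: blue stays up during this window, so rollback is possible
    }
  }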
}

iam:PassRole is required: also grant the GitHub Actions / CI/CD IAM role permission to PassRole the ecsCodeDeployRole.


④ GitHub Actions (OIDC) pipeline: keyless and safe

Placing long-lived access keys in a deployment pipeline is a 2026 anti-pattern. With keyless CI/CD via GitHub Actions OIDC, the workflow obtains short-lived credentials for an IAM role at run time, so no access key needs to be stored anywhere.

Pipeline overview

git push → trigger GitHub Actions
  → obtain AWS temporary credentials via OIDC
  → quality gates (type check, tests, vulnerability scan)
  → push to ECR with a CommitSHA tag
  → swap the task definition's image to the new tag (register a new revision)
  → deploy to the ECS service (rolling or Blue/Green)

The complete workflow.yml

name: Deploy to ECS Fargate

on:
  push:
    branches:
      - main

permissions:
  id-token: write   # required to obtain the OIDC token
  contents: read

env:
  AWS_REGION: ap-northeast-1
  ECR_REPOSITORY: web-api
  ECS_CLUSTER: prod
  ECS_SERVICE: web-api
  CONTAINER_NAME: app
  TASK_DEFINITION_FAMILY: web-api

jobs:
  quality-gate:
    name: Quality Gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: "22"
          cache: "npm"

      - name: Install dependencies
        run: npm ci

      - name: Type check
        run: npm run type-check

      - name: Unit tests
        run: npm test -- --run

      - name: Build
        run: npm run build

  deploy:
    name: Build & Deploy
    runs-on: ubuntu-latest
    needs: quality-gate  # deploy only after the quality gate passes
    environment: production

    steps:
      - uses: actions/checkout@v4

      # ─── Authenticate to AWS with OIDC (no long-lived keys) ───
      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-deploy
          aws-region: ${{ env.AWS_REGION }}
          # role-session-name records in CloudTrail which job authenticated
          role-session-name: github-${{ github.sha }}

      # ─── Push to ECR ───
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build, tag, and push image to ECR
        id: build-image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}  # pin the version with the commit SHA (no dependence on latest)
        run: |
          # Note: building linux/arm64 on an x86 runner needs QEMU emulation
          # (for example docker/setup-qemu-action) or an arm64 runner.
          docker build \
            --platform linux/arm64 \
            --build-arg BUILD_SHA="$IMAGE_TAG" \
            -t "$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" \
            -t "$ECR_REGISTRY/$ECR_REPOSITORY:latest" \
            .
          docker push "$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"
          # Also update latest (for human reference only; ECS uses the SHA tag)
          docker push "$ECR_REGISTRY/$ECR_REPOSITORY:latest"
          echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> "$GITHUB_OUTPUT"

      # ─── Check the ECR image scan result (fail on HIGH/CRITICAL) ───
      - name: Wait for ECR scan and check results
        env:
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # Note: "|| true" and "|| echo '{}'" let this gate pass when no scan result exists.
          aws ecr wait image-scan-complete \
            --repository-name "$ECR_REPOSITORY" \
            --image-id imageTag="$IMAGE_TAG" \
            --region "$AWS_REGION" || true
          FINDINGS=$(aws ecr describe-image-scan-findings \
            --repository-name "$ECR_REPOSITORY" \
            --image-id imageTag="$IMAGE_TAG" \
            --query 'imageScanFindings.findingSeverityCounts' \
            --output json 2>/dev/null || echo '{}')
          HIGH=$(echo "$FINDINGS" | jq '.HIGH // 0')
          CRITICAL=$(echo "$FINDINGS" | jq '.CRITICAL // 0')
          echo "HIGH=$HIGH CRITICAL=$CRITICAL"
          if [ "$HIGH" -gt 0 ] || [ "$CRITICAL" -gt 0 ]; then
            echo "::error::Image has HIGH or CRITICAL vulnerabilities. Blocking deploy."
            exit 1
          fi

      # ─── Swap the task definition's image to the new tag ───
      - name: Download current task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition $TASK_DEFINITION_FAMILY \
            --query taskDefinition \
            > task-definition.json

      - name: Render new task definition with updated image
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: ${{ env.CONTAINER_NAME }}
          image: ${{ steps.build-image.outputs.image }}

      # ─── Deploy to ECS (rolling or ECS-native Blue/Green) ───
      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true  # wait until the deployment stabilizes (CI fails if it doesn't)

      - name: Notify deployment result
        if: always()
        env:
          STATUS: ${{ job.status }}
          SHA: ${{ github.sha }}
        run: |
          echo "Deploy status: $STATUS | SHA: $SHA"
          # Implement Slack notifications and the like here
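
The scan step in the workflow tolerates a missing scan result. If a stricter gate is wanted, the decision could be moved into a small script that fails closed. The Python sketch below is one way to do that under stated assumptions: it expects the full JSON printed by aws ecr describe-image-scan-findings (without the --query filter) and assumes basic scanning, where a finished scan reports the status COMPLETE. The function name `scan_gate` is hypothetical.

```python
import json

BLOCKING_SEVERITIES = ("CRITICAL", "HIGH")


def scan_gate(describe_output: str) -> tuple[bool, str]:
    """Return (allowed, reason) for the JSON from describe-image-scan-findings."""
    try:
        doc = json.loads(describe_output)
    except json.JSONDecodeError:
        return False, "scan output is not valid JSON"
    if not isinstance(doc, dict):
        return False, "scan output is not a JSON object"
    # Fail closed: anything other than a completed scan blocks the deploy.
    status = (doc.get("imageScanStatus") or {}).get("status")
    if status != "COMPLETE":
        return False, f"scan not complete (status={status})"
    counts = (doc.get("imageScanFindings") or {}).get("findingSeverityCounts") or {}
    blocking = {sev: counts[sev] for sev in BLOCKING_SEVERITIES if counts.get(sev, 0) > 0}
    if blocking:
        return False, f"blocking findings: {blocking}"
    return True, "no HIGH or CRITICAL findings"


if __name__ == "__main__":
    import sys
    allowed, reason = scan_gate(sys.stdin.read())
    print(reason)
    sys.exit(0 if allowed else 1)
```

Piping the CLI output into this script makes an absent or unfinished scan a deploy-blocking failure instead of a silent pass.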

The diff when using CodeDeploy Blue/Green

aws-actions/amazon-ecs-deploy-task-definition also supports CodeDeploy. Passing an AppSpec via the codedeploy-appspec option starts the CodeDeploy deployment and waits for completion.

      - name: Deploy via CodeDeploy Blue/Green
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true
          codedeploy-appspec: appspec.yaml
          codedeploy-application: web-api
          codedeploy-deployment-group: prod

OIDC IAM-role configuration (reference)

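{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:environment:production"
        }
      }
    }
  ]
}

Because the deploy job sets environment: production, GitHub issues the sub claim in the form repo:<org>/<repo>:environment:<name>, so the condition matches on the environment rather than on ref:refs/heads/main; restrict the production environment to the main branch in the repository settings. The principle is to narrow the policy attached to this role to minimal permissions (ECR push, ecs:RegisterTaskDefinition, ecs:UpdateService, iam:PassRole (for the task execution role)), plus the read calls the workflow makes, such as ecs:DescribeTaskDefinition, ecs:DescribeServices, and ecr:DescribeImageScanFindings. For details, see the GitHub Actions OIDC guide.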
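
How the sub condition behaves is worth checking before the first deploy, because a job that targets a GitHub environment receives a sub claim of the form repo:<org>/<repo>:environment:<name> instead of the branch form. The Python sketch below imitates the matching only for illustration: `string_like` is a hypothetical stand-in for IAM's StringLike operator (* matches any run of characters, ? matches one) and `trust_allows` is a simplified model, not the real policy evaluation.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' matches any run of chars, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def trust_allows(claims: dict, sub_pattern: str) -> bool:
    """Simplified model of the trust policy: exact aud plus a StringLike sub."""
    return (claims.get("aud") == "sts.amazonaws.com"
            and string_like(sub_pattern, claims.get("sub", "")))


if __name__ == "__main__":
    branch_job = {"aud": "sts.amazonaws.com",
                  "sub": "repo:your-org/your-repo:ref:refs/heads/main"}
    env_job = {"aud": "sts.amazonaws.com",
               "sub": "repo:your-org/your-repo:environment:production"}
    branch_pattern = "repo:your-org/your-repo:ref:refs/heads/main"
    print(trust_allows(branch_job, branch_pattern))  # True
    print(trust_allows(env_job, branch_pattern))     # False
```

A broad pattern such as repo:your-org/your-repo:* would accept both forms, along with pull-request runs, which is why the narrowest pattern that matches the real job is the safer choice.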


⑤ Pre-deploy quality gates and container-side preparation

No matter how refined your deployment method, if the container itself is broken, none of it means anything. Circuit breakers and Blue/Green are mechanisms to "abort a deploy and revert," not mechanisms to "deliver a healthy image."

Layers of pre-shipping quality gates

| Layer | Content | Timing |
| --- | --- | --- |
| Type check | TypeScript tsc --noEmit / mypy for Python | PR + main push |
| Unit tests | Business logic, validation | PR + main push |
| Container build | Confirm docker build succeeds | main push |
| Vulnerability scan | ECR Enhanced Scanning / Trivy | after push, before deploy |
| Integration tests | Connectivity check in a staging environment (optional) | main push |

Coordination with graceful shutdown

During a deployment, ECS stops tasks. The stop sequence is as follows.

  1. Deregister the task from the ALB target group (during deregistration_delay, protect in-flight requests)
  2. Send SIGTERM to the container
  3. Wait for it to finish during stopTimeout (default 30 seconds, max 120 seconds)
  4. If it doesn't finish, force-terminate with SIGKILL (data-loss risk)

Blue/Green deployments run the same SIGTERM sequence when the old (blue) tasks are deleted. Blue is dropped after bake time ends, and if the application can't handle SIGTERM correctly at that point, in-flight work can still be lost.
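
The handling itself can be small. The following is a minimal Python sketch of the idea, not the Node.js implementation referenced in the pillar article: the handler only sets a flag, and the worker loop finishes the job in progress and then stops taking new ones. The names `shutdown_requested` and `run_worker` are hypothetical.

```python
import signal
import threading

# Set by the signal handler; checked by the worker loop between jobs.
shutdown_requested = threading.Event()


def _on_sigterm(signum, frame):
    """Record the stop request and return quickly; do no heavy work here."""
    shutdown_requested.set()


# Must be registered from the main thread of the process.
signal.signal(signal.SIGTERM, _on_sigterm)


def run_worker(jobs, process):
    """Process jobs until done or SIGTERM arrives; return the unprocessed jobs."""
    pending = list(jobs)
    while pending and not shutdown_requested.is_set():
        process(pending.pop(0))  # the job in progress always runs to completion
    return pending
```

Whatever is left in the returned list should be handed back to a queue or otherwise persisted before the process exits, and the whole drain has to fit inside stopTimeout or SIGKILL cuts it short.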

What's especially important in coordination with health checks is startPeriod. Set a grace period so health checks don't fail during the initialization right after a container starts (establishing DB connections, cache warm-up).

{
  "healthCheck": {
    "command": ["CMD-SHELL", "wget -q -O - http://localhost:8080/healthz || exit 1"],
    "interval": 15,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 30
  },
  "stopTimeout": 60
}
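
As a back-of-the-envelope aid for choosing these numbers, the Python sketch below estimates how long a container that never becomes healthy can run before it is marked unhealthy. The formula is a rough upper bound under a simplifying assumption (each failed attempt costs up to one interval plus one timeout, and failures during startPeriod are not counted); it is not an exact description of ECS scheduling, and the function name is hypothetical.

```python
def seconds_until_unhealthy(interval: int, timeout: int, retries: int,
                            start_period: int) -> int:
    """Rough upper bound on the time before a never-healthy container is flagged."""
    # Assumption: failures inside start_period are ignored, then `retries`
    # consecutive failures are needed, each taking up to interval + timeout.
    return start_period + retries * (interval + timeout)


if __name__ == "__main__":
    # Values from the healthCheck fragment above.
    print(seconds_until_unhealthy(interval=15, timeout=5, retries=3, start_period=30))  # 90
```

If the estimate is much longer than the time you are willing to wait for a broken release to be detected, shortening interval or retries is usually a better lever than shrinking startPeriod.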

The implementation details of graceful shutdown (the Node.js code for a SIGTERM handler, etc.) are described in the pillar article. For production troubleshooting, the ECS troubleshooting article is useful.

The two-layer structure of health checks

An ECS service behind an ALB has two health-check layers, and a deployment depends on both.

  • Container health check (the task definition's healthCheck): ECS judges whether the container is down. If failures continue, ECS replaces the task.
  • ALB target health check (the target group's health_check): the ALB judges whether "it may send requests to this IP." On failure, the ALB removes the task from the target.

During a Blue/Green deployment, production traffic moves to green tasks only after they "pass the ALB target health check." Using a staging or internal-test listener as the test listener lets you do an extra connectivity check before sending production traffic.


Summary: build a state where you can instantly revert when it breaks

I've organized ECS on Fargate's deployment strategies into three methods.

| Method | Recommended scene | Additional resources |
| --- | --- | --- |
| Rolling + circuit breaker | Simple configuration, short downtime acceptable | None |
| ECS-native Blue/Green | The first choice for new configurations from 2025 onward | No CodeDeploy resources (alternate target group and an IAM role for ECS) |
| CodeDeploy Blue/Green | Existing CodeDeploy assets, advanced Lambda-hook validation | Two target groups, test listener, CodeDeploy config |

Whichever method you choose, the essence of "shipping safely" doesn't change.

  1. Solidify quality gates in CI (types, tests, scans)
  2. Pin the image version with a CommitSHA tag (eliminate latest dependence)
  3. Authenticate keyless with OIDC and place no long-lived access keys at all
  4. Handle SIGTERM correctly and end cleanly on every deploy, scale-in, and abort
  5. Enable automatic rollback (the circuit breaker or Blue/Green's bake time)

With this pattern, I operated 221 endpoints in production on API Gateway → NLB → ALB → ECS on Fargate and maintained zero double charges on the payment foundation. For details, see the lumber-distribution SaaS case study.

For Fargate's compute-platform selection (ECS vs Lambda vs App Runner), see the comparison article; for IaC state management and module design, the Terraform article; and for post-deploy observability, the OpenTelemetry × ECS article. They are best read together.

友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.
