# ECS on Fargate CI/CD Complete Guide: Shipping Safely with Native Blue/Green, CodeDeploy, and GitHub Actions (OIDC)

> This guide maps out ECS on Fargate's three deployment strategies (rolling, ECS-native Blue/Green, and CodeDeploy Blue/Green) and shows a keyless GitHub Actions OIDC pipeline in real code, end to end through the quality gates for shipping to production.

- Published: 2026-06-26
- Author: 友田 陽大
- Tags: AWS, ECS, Fargate, CI/CD, Blue/Green, CodeDeploy, GitHub Actions, Deployment
- URL: https://tomodahinata.com/en/blog/aws-ecs-fargate-cicd-blue-green-codedeploy-github-actions-guide
- Category: ECS on Fargate in production
- Pillar guide: https://tomodahinata.com/en/blog/aws-ecs-fargate-production-guide

## Key points

- ECS has three deployment methods: rolling update (the `ECS` deployment type), ECS-native Blue/Green (GA in 2025, no CodeDeploy needed), and CodeDeploy Blue/Green. Consider ECS-native first, and choose CodeDeploy only when you need advanced Lambda-hook validation.
- Enable ECS-native Blue/Green with deployment controller `ECS` and strategy `BLUE_GREEN`. `bakeTimeInMinutes` keeps blue running after the switch to green, so instant rollback stays available. You can insert Lambda validation at six lifecycle hooks.
- CodeDeploy Blue/Green requires a production listener (plus a recommended test listener) on the ALB, two target groups, and an AppSpec. With an NLB, traffic shifting is restricted to `ECSAllAtOnce`.
- With GitHub Actions OIDC, you store no long-lived access keys and obtain temporary credentials via `role-to-assume`. ECR push → task-definition update → ECS deploy can be fully automated.
- A successful deploy rests on both the container-side quality gates (types, tests, vulnerability scans) and a graceful shutdown that receives SIGTERM and exits cleanly within `stopTimeout`.

---

"When a failure occurs right after you ship a new container to production, how fast can you roll back to the previous version?"—this is the essence of deployment. **Speed is secondary. Whether you can instantly revert when it breaks** comes first.

On a lumber-distribution SaaS that won a Minister of Economy, Trade and Industry Award, I operated **221 API endpoints** in production on a configuration of `API Gateway → NLB → ALB → ECS on Fargate`. The payment foundation maintains [zero double charges in production](/case-studies/lumber-industry-dx). To deliver production quality with "one person × generative AI (Claude Code)," it was essential to make the deployment pipeline itself a structure that is **hard to break and easy to revert.**

This article aims to organize ECS on Fargate's three deployment methods and let you judge **when and how to use each.** The basics of ECS-platform design, networking, and security are left to the [pillar article](/blog/aws-ecs-fargate-production-guide); this piece focuses on **"deployment itself."**

---

## ECS's three deployment methods: grasp the map first

ECS on Fargate offers three choices for how to update a service. Which you choose changes the complexity of the infrastructure and the rollback speed.

| Method | Deployment controller | Blue/Green | Rollback speed | Required additional resources |
|------|------------------|-----------|-----------------|-----------------|
| Rolling update | ECS | None | Stop the new version and start the old one again (tens of seconds or more) | None |
| ECS-native Blue/Green | ECS | Yes (GA in 2025) | Instant switch during bake time | No CodeDeploy resources (an alternate target group and an IAM role for ECS to update the load balancer) |
| CodeDeploy Blue/Green | CODEDEPLOY | Yes | Instant switch during bake time | CodeDeploy app, test listener, TG×2 |

**The basic way to choose**:

- **Rolling update**: you want to keep the deployment mechanism simple, and a short downtime risk is acceptable.
- **ECS-native Blue/Green**: you want Blue/Green without running CodeDeploy alongside ECS. **The default recommendation for new configurations from 2025 onward.**
- **CodeDeploy Blue/Green**: you already have CodeDeploy assets, you want to do complex validation with Lambda hooks, or you're building on an existing configuration combined with NLB.

---

## ① Reviewing rolling updates: min/max and the circuit breaker

ECS's simplest deployment is the **rolling update.** The deployment controller is `ECS`, and the deployment strategy is implicitly rolling. The behavior is decided by two parameters ([official](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-ecs.html)).

- **`minimumHealthyPercent`**: the lower bound (%, rounded up) of the number of tasks that must be healthy during a deployment.
- **`maximumPercent`**: the upper bound (%, rounded down) of the number of tasks that may be running during a deployment.

`min 100% / max 200%` (the pillar article's setting) is **the safest setting.** With `desiredCount=2`, ECS first starts two new-version tasks (4 total), confirms they are healthy, and then stops the two old-version tasks. The cost temporarily doubles, but availability never drops.
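As a quick check of that arithmetic, the sketch below computes both bounds for a given `desiredCount`. The function name `rolling_bounds` is made up for illustration; the rounding follows the two definitions above.

```python
def rolling_bounds(desired_count: int, minimum_healthy_percent: int, maximum_percent: int) -> tuple[int, int]:
    """Return (minimum healthy tasks, maximum running tasks) during a rolling update."""
    # minimumHealthyPercent is rounded up, maximumPercent is rounded down.
    min_healthy = -(-desired_count * minimum_healthy_percent // 100)  # integer ceiling
    max_running = desired_count * maximum_percent // 100              # integer floor
    return min_healthy, max_running


print(rolling_bounds(2, 100, 200))  # (2, 4)
print(rolling_bounds(3, 50, 150))   # (2, 4)
```

With `desiredCount=3` and `50% / 150%`, ECS may drop to two healthy tasks and run at most four, which is cheaper than `100% / 200%` but leaves less headroom.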

### Deployment circuit breaker

The **deployment circuit breaker** prevents the accident of continuing to route traffic to new tasks that keep crash-looping without anyone noticing.

```hcl
resource "aws_ecs_service" "app" {
  # ...
  deployment_circuit_breaker {
    enable   = true
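    rollback = true # roll back automatically to the previous revision when a failure is detected
  }
  # Can also be combined with CloudWatch alarms (the deployment fails when either condition is met)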
  deployment_controller {
    type = "ECS"
  }
}
```

With `rollback = true`, ECS treats the deployment as failed and **automatically reverts to the previous revision** once new tasks have failed health checks a prescribed number of times. It can also be combined with CloudWatch alarms; with both enabled, the deployment fails the moment either condition is met.
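The "prescribed number" scales with the size of the service. The sketch below encodes the threshold formula as I understand it from the ECS documentation (half the desired count, rounded up, clamped between 3 and 200). Treat the constants as an assumption to verify against the current docs; the function name is hypothetical.

```python
import math


def circuit_breaker_threshold(desired_count: int) -> int:
    """Failed-task count at which the deployment is marked failed (assumed formula)."""
    # Assumption: 0.5 x desired count, rounded up, clamped to the range [3, 200].
    return min(max(math.ceil(0.5 * desired_count), 3), 200)


for count in (1, 25, 400):
    print(count, circuit_breaker_threshold(count))
```

Under this formula a small service such as `desiredCount=2` still needs three failed tasks before the breaker trips, so a rollback is not instantaneous.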

### Version consistency via image digest

By default, ECS resolves a tag to an image digest, guaranteeing that all tasks in a service run the same binary (`versionConsistency`). Relying on the `latest` tag invites the accident of "I rebuilt, the contents of `latest` changed, and only some tasks run a different version." The iron rule: **use the commit SHA as the tag and don't depend on `latest`.**
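One way to enforce that rule in CI is a small guard that rejects unpinned image references before a task definition is registered. This is a minimal sketch: `is_pinned` is a hypothetical helper, and it assumes "pinned" means either a `sha256` digest or a full 40-character commit SHA tag.

```python
import re

_SHA_TAG = re.compile(r"^[0-9a-f]{40}$")
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image: str) -> bool:
    """True when the image reference is a digest or a full commit-SHA tag."""
    if _DIGEST.search(image):
        return True
    # The tag, if any, follows the last ':' in the final path segment.
    name = image.rsplit("/", 1)[-1]
    if ":" not in name:
        return False  # no tag means the registry default, i.e. latest
    tag = name.rsplit(":", 1)[1]
    return bool(_SHA_TAG.match(tag))


repo = "111122223333.dkr.ecr.ap-northeast-1.amazonaws.com/web-api"
print(is_pinned(f"{repo}:latest"))        # False
print(is_pinned(f"{repo}:{'a' * 40}"))    # True
```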

---

## ② ECS-native Blue/Green: GA in 2025, no CodeDeploy needed

**ECS-native Blue/Green** became generally available in July 2025 and reached feature parity with CodeDeploy when canary and linear traffic shifting were added in October 2025. It is the recommended choice for new configurations: everything is handled within ECS, with no separate CodeDeploy setup to manage.

> AWS released native Blue/Green deployments for Amazon ECS. You can now do blue/green deployments directly from ECS without needing AWS CodeDeploy. ([AWS Blog](https://aws.amazon.com/blogs/aws/accelerate-safe-software-releases-with-new-built-in-blue-green-deployments-in-amazon-ecs/))

### How it works

Keep the deployment controller as `ECS` and select `BLUE_GREEN` as the deployment strategy to enable it.

1. When a service update begins, start the **green task group.**
2. Once green becomes healthy, switch ALB traffic to green.
3. For the duration of **`bakeTimeInMinutes`**, keep both blue (old) and green (new). During this time, you can instantly revert to blue with one click (instant rollback).
4. Once bake time expires, delete the blue tasks.
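The timeline above can be sketched as a small function that answers "is instant rollback still available?" The phase labels are illustrative names of my own, not ECS API values.

```python
def blue_green_phase(minutes_since_shift: float, bake_time_in_minutes: int) -> str:
    """Classify where a deployment stands relative to the production traffic shift."""
    if minutes_since_shift < 0:
        return "PRE_SHIFT"  # production traffic is still on blue
    if minutes_since_shift < bake_time_in_minutes:
        return "BAKE"       # green serves traffic, blue is kept for instant rollback
    return "CLEANUP"        # bake time expired, blue tasks are removed


print(blue_green_phase(3, 10))   # BAKE
print(blue_green_phase(10, 10))  # CLEANUP
```

The practical consequence is that `bakeTimeInMinutes` should be at least as long as it takes your alarms and dashboards to surface a regression.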

### The six deployment lifecycle hooks

In ECS-native Blue/Green, six lifecycle hooks are provided where you can insert a Lambda.

| Hook | Timing |
|--------|----------|
| `PRE_SCALE_UP` | Before scaling up green tasks |
| `POST_SCALE_UP` | After scaling up green tasks (startup confirmation) |
| `TEST_TRAFFIC_SHIFT` | Before directing test traffic to green |
| `POST_TEST_TRAFFIC_SHIFT` | After switching test traffic (smoke test) |
| `PRODUCTION_TRAFFIC_SHIFT` | Before directing production traffic to green |
| `POST_PRODUCTION_TRAFFIC_SHIFT` | After switching production traffic (production confirmation) |

A hook's Lambda returns its validation result to ECS as a `hookStatus` (`SUCCEEDED`, `FAILED`, or `IN_PROGRESS`). On failure, the deployment is aborted at that point and reverts to blue.
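A hook function can be very small. The sketch below is a smoke test that could sit behind a stage such as `POST_TEST_TRAFFIC_SHIFT`. It assumes the response field is named `hookStatus`, as I understand the ECS documentation, and it deliberately does not parse the event because the payload shape is not covered here. `SMOKE_TEST_URL` is a hypothetical environment variable, and the `check` parameter exists only so the logic can be exercised without a network.

```python
import os
import urllib.request


def check_endpoint(url: str, timeout: float = 5.0) -> bool:
    """Return True when the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


def handler(event, context, check=check_endpoint):
    # SMOKE_TEST_URL is a hypothetical variable configured on the Lambda function.
    url = os.environ.get("SMOKE_TEST_URL", "http://localhost:8080/healthz")
    status = "SUCCEEDED" if check(url) else "FAILED"
    return {"hookStatus": status}


print(handler({}, None, check=lambda url: True))  # {'hookStatus': 'SUCCEEDED'}
```

Keeping the hook to a single pass/fail decision makes a failed deployment easy to diagnose from the Lambda logs.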

### Kinds of traffic shift

| Kind | `strategy` value | Behavior |
|------|------|------|
| All at once | `BLUE_GREEN` | Switch to green all at once (fastest, concentrated risk) |
| Canary | `CANARY` | First a specified % to green, then switch the rest after a wait |
| Linear | `LINEAR` | Shift to green in stages by a fixed % at a time |
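To make the difference between the three kinds concrete, the sketch below lists the cumulative share of production traffic on green after each step. The kind labels are illustrative strings rather than API values, and the wait between steps is left out on purpose.

```python
def shift_steps(kind: str, percent: int = 100) -> list[int]:
    """Cumulative percentage of production traffic on green after each step."""
    if kind == "ALL_AT_ONCE":
        return [100]
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    if kind == "CANARY":
        return [percent, 100]  # a small slice first, then everything else
    if kind == "LINEAR":
        return list(range(percent, 100, percent)) + [100]  # equal increments
    raise ValueError(f"unknown kind: {kind}")


print(shift_steps("CANARY", 10))  # [10, 100]
print(shift_steps("LINEAR", 30))  # [30, 60, 90, 100]
```

Canary exposes a bad release to a small slice once, while linear spreads the exposure over many steps at the cost of a longer deployment.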

### Terraform configuration example

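```hcl
resource "aws_ecs_service" "app" {
  name            = "web-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  deployment_controller {
    type = "ECS" # ECS-native Blue/Green keeps the controller as "ECS"
  }

  deployment_configuration {
    strategy             = "BLUE_GREEN"
    bake_time_in_minutes = 10 # keep blue for 10 minutes after the switch so instant rollback stays available

    # Canary and linear shifting use the CANARY / LINEAR strategies (check your provider version for support)
    # Lifecycle hooks (optional): lifecycle_hook blocks let a Lambda validate each stage
  }

  # The circuit breaker is a top-level block of aws_ecs_service
  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }
  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080

    # Blue/Green needs an alternate target group, the production listener rule,
    # and a role that lets ECS update the load balancer. The three references
    # below are placeholders assumed to be defined elsewhere.
    advanced_configuration {
      alternate_target_group_arn = aws_lb_target_group.app_alternate.arn
      production_listener_rule   = aws_lb_listener_rule.production.arn
      role_arn                   = aws_iam_role.ecs_load_balancer.arn
    }
  }
}
```

> **Caveat**: because ECS-native Blue/Green is a relatively new feature (GA in 2025), older Terraform AWS provider versions may not support the Blue/Green arguments of `deployment_configuration` (`strategy`, `bake_time_in_minutes`) or the `advanced_configuration` block. In that case, use a transitional workaround: configure the service via the AWS CLI or console and ignore those attributes in Terraform. CLI configuration is shown below.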

#### Enabling ECS-native Blue/Green via the AWS CLI

If the Terraform provider hasn't caught up, configure it directly with `create-service` or `update-service`.

```bash
aws ecs update-service \
  --cluster prod \
  --service web-api \
  --deployment-configuration '{
    "strategy": "BLUE_GREEN",
    "bakeTimeInMinutes": 10,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }' \
  --region ap-northeast-1
```

---

## ③ CodeDeploy Blue/Green: for existing assets and advanced validation

If ECS-native Blue/Green suffices, there's no reason to add CodeDeploy. But CodeDeploy is suited to the following cases.

- You want to unify with **existing CodeDeploy assets** (other services or approval flows).
- You want to finely insert **advanced validation with Lambda hooks** (integration tests with external APIs, DB-migration confirmation, Chaos Engineering scenarios).
- You have an existing target-group-switching setup on an NLB configuration (but with NLB, only AllAtOnce).

### Required resources

CodeDeploy-based ECS Blue/Green requires the following ([official](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-blue-green.html)).

1. **An ALB or NLB** (with an NLB, traffic shifting is limited to all-at-once)
2. **A production listener** (port 443, etc.) and a **test listener** (optional but recommended: port 8080, etc.), belonging to the same ALB
3. **Two target groups** (for blue and for green)
4. **A CodeDeploy app** (compute platform `ECS`) and a **deployment group**
5. **An AppSpec** (task-definition ARN, container name, port, hook Lambda ARNs)
6. **`ecsCodeDeployRole`** (the IAM role for CodeDeploy to operate ECS and the LB)

### AppSpec example

```yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:ap-northeast-1:111122223333:task-definition/web-api:42"
        LoadBalancerInfo:
          ContainerName: "app"
          ContainerPort: 8080
        PlatformVersion: "LATEST"
        NetworkConfiguration:
          AwsvpcConfiguration:
            Subnets:
              - "subnet-0abc123def456"
              - "subnet-0def789abc123"
            SecurityGroups:
              - "sg-0abcdef1234567890"
            AssignPublicIp: "DISABLED"
Hooks:
  - AfterInstall: "arn:aws:lambda:ap-northeast-1:111122223333:function:pre-deploy-validation"
  - AfterAllowTestTraffic: "arn:aws:lambda:ap-northeast-1:111122223333:function:smoke-test"
  - BeforeAllowTraffic: "arn:aws:lambda:ap-northeast-1:111122223333:function:final-gate"
  - AfterAllowTraffic: "arn:aws:lambda:ap-northeast-1:111122223333:function:post-deploy-check"
```
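CodeDeploy only recognizes a fixed set of hook names for ECS deployments (`BeforeInstall`, `AfterInstall`, `AfterAllowTestTraffic`, `BeforeAllowTraffic`, `AfterAllowTraffic`), and a typo in a hook name is easy to miss in review. The sketch below checks an already-parsed `Hooks` list; `unknown_hooks` is a hypothetical helper, and loading the YAML is left to whichever parser you use.

```python
ECS_APPSPEC_HOOKS = (
    "BeforeInstall",
    "AfterInstall",
    "AfterAllowTestTraffic",
    "BeforeAllowTraffic",
    "AfterAllowTraffic",
)


def unknown_hooks(hooks: list[dict[str, str]]) -> list[str]:
    """Return hook names that are not defined for ECS deployments."""
    return [name for entry in hooks for name in entry if name not in ECS_APPSPEC_HOOKS]


# Placeholder function names stand in for real Lambda ARNs.
hooks = [
    {"AfterAllowTestTraffic": "smoke-test"},
    {"BeforeAllowTraffic": "final-gate"},
    {"AfterAllowTrafic": "post-deploy-check"},  # deliberate typo
]
print(unknown_hooks(hooks))  # ['AfterAllowTrafic']
```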

### Predefined deployment configurations

| Config name | Behavior |
|--------|------|
| `CodeDeployDefault.ECSAllAtOnce` | Switch all at once (usable with both ALB and NLB) |
| `CodeDeployDefault.ECSLinear10PercentEvery1Minutes` | Shift 10% every minute (complete in 10 minutes) |
| `CodeDeployDefault.ECSLinear10PercentEvery3Minutes` | Shift 10% every 3 minutes (complete in 30 minutes) |
| `CodeDeployDefault.ECSCanary10Percent5Minutes` | First confirm 10% for 5 minutes, then the rest all at once |
| `CodeDeployDefault.ECSCanary10Percent15Minutes` | First confirm 10% for 15 minutes, then the rest all at once |

> **NLB restriction**: when combined with NLB, traffic shifting is restricted to `ECSAllAtOnce` only.
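The predefined names encode their own parameters, so the NLB restriction can be checked before a deployment group is created. This is a minimal sketch with hypothetical helper names; it only understands the naming pattern of the five configurations listed above.

```python
import re

_PATTERN = re.compile(
    r"^CodeDeployDefault\.ECS(?:(AllAtOnce)|(Linear|Canary)(\d+)Percent(?:Every)?(\d+)Minutes)$"
)


def parse_config(name: str) -> tuple[str, int, int]:
    """Return (kind, percent per step, minutes between steps)."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized predefined ECS configuration: {name}")
    if match.group(1):
        return ("AllAtOnce", 100, 0)
    return (match.group(2), int(match.group(3)), int(match.group(4)))


def allowed_on(load_balancer_type: str, config_name: str) -> bool:
    """An NLB ("network") only supports all-at-once shifting."""
    kind, _, _ = parse_config(config_name)
    return load_balancer_type == "application" or kind == "AllAtOnce"


print(parse_config("CodeDeployDefault.ECSLinear10PercentEvery3Minutes"))  # ('Linear', 10, 3)
print(allowed_on("network", "CodeDeployDefault.ECSCanary10Percent5Minutes"))  # False
```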

### CodeDeploy configuration in Terraform

```hcl
# Set the ECS service's deployment controller to CODEDEPLOY
resource "aws_ecs_service" "app" {
  name            = "web-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  deployment_controller {
    type = "CODEDEPLOY"
  }

  # The service starts on the blue target group; CodeDeploy swaps between blue and green
  load_balancer {
    target_group_arn = aws_lb_target_group.blue.arn
    container_name   = "app"
    container_port   = 8080
  }

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }

  # CodeDeploy swaps the target group and task definition, so ignore the resulting plan diffs
  lifecycle {
    ignore_changes = [task_definition, load_balancer]
  }
}

# IAM role: permissions for CodeDeploy to operate ECS and the load balancer
resource "aws_iam_role" "ecs_codedeploy" {
  name               = "ecsCodeDeployRole"
  assume_role_policy = data.aws_iam_policy_document.codedeploy_assume.json
}

resource "aws_iam_role_policy_attachment" "ecs_codedeploy" {
  role       = aws_iam_role.ecs_codedeploy.name
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
}

# CodeDeploy application and deployment group
resource "aws_codedeploy_app" "ecs" {
  compute_platform = "ECS"
  name             = "web-api"
}

resource "aws_codedeploy_deployment_group" "ecs" {
  app_name               = aws_codedeploy_app.ecs.name
  deployment_group_name  = "prod"
  service_role_arn       = aws_iam_role.ecs_codedeploy.arn
  deployment_config_name = "CodeDeployDefault.ECSCanary10Percent5Minutes"

  ecs_service {
    cluster_name = aws_ecs_cluster.main.name
    service_name = aws_ecs_service.app.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.https.arn]
      }
      test_traffic_route {
        listener_arns = [aws_lb_listener.test.arn]
      }
      target_group {
        name = aws_lb_target_group.blue.name
      }
      target_group {
        name = aws_lb_target_group.green.name
      }
    }
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
  }

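  # ECS deployment groups must use Blue/Green with traffic control
  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }
    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5 # bake time: blue stays up during this window, so rollback is possible
    }
  }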
}
```

> **`iam:PassRole` is required**: also grant the GitHub Actions / CI/CD IAM role permission to `PassRole` the `ecsCodeDeployRole`.

---

## ④ GitHub Actions (OIDC) pipeline: keyless and safe

Placing long-lived access keys in a deployment pipeline is **a 2026 anti-pattern.** With [keyless CI/CD via GitHub Actions OIDC](/blog/github-actions-oidc-keyless-cicd-aws-gcp-guide), the pipeline only needs temporary credentials for an IAM role.

### Pipeline overview

```text
git push → trigger GitHub Actions
  → obtain AWS temporary credentials via OIDC
  → quality gates (type check, tests, vulnerability scan)
  → push to ECR with a CommitSHA tag
  → swap the task definition's image to the new tag (register a new revision)
  → deploy to the ECS service (rolling or Blue/Green)
```

### The complete workflow.yml

```yaml
name: Deploy to ECS Fargate

on:
  push:
    branches:
      - main

permissions:
  id-token: write   # required to request the OIDC token
  contents: read

env:
  AWS_REGION: ap-northeast-1
  ECR_REPOSITORY: web-api
  ECS_CLUSTER: prod
  ECS_SERVICE: web-api
  CONTAINER_NAME: app
  TASK_DEFINITION_FAMILY: web-api

jobs:
  quality-gate:
    name: Quality Gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: "22"
          cache: "npm"

      - name: Install dependencies
        run: npm ci

      - name: Type check
        run: npm run type-check

      - name: Unit tests
        run: npm test -- --run

      - name: Build
        run: npm run build

  deploy:
    name: Build & Deploy
    runs-on: ubuntu-latest
    needs: quality-gate  # deploy only after the quality gate passes
    environment: production

    steps:
      - uses: actions/checkout@v4

      # ─── Authenticate to AWS with OIDC (no long-lived keys) ───
      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-deploy
          aws-region: ${{ env.AWS_REGION }}
          # role-session-name records in CloudTrail which job authenticated
          role-session-name: github-${{ github.sha }}

      # ─── Push to ECR ───
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      # ubuntu-latest is x86_64, so building linux/arm64 needs QEMU emulation
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Build, tag, and push image to ECR
        id: build-image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}  # pin the version with the commit SHA (no dependence on latest)
        run: |
          # linux/arm64 assumes the task definition sets runtimePlatform.cpuArchitecture to ARM64
          docker build \
            --platform linux/arm64 \
            --build-arg BUILD_SHA="$IMAGE_TAG" \
            -t "$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" \
            -t "$ECR_REGISTRY/$ECR_REPOSITORY:latest" \
            .
          docker push "$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"
          # Also update latest (for humans to check; ECS uses the SHA tag)
          docker push "$ECR_REGISTRY/$ECR_REPOSITORY:latest"
          echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> "$GITHUB_OUTPUT"

      # ─── Check the ECR image scan results (fail on HIGH/CRITICAL) ───
      # Assumes basic scan-on-push is enabled on the repository.
      - name: Wait for ECR scan and check results
        env:
          IMAGE_TAG: ${{ github.sha }}
        run: |
          set -euo pipefail
          # Fail closed: if the scan does not complete or the findings cannot be read, the job fails
          aws ecr wait image-scan-complete \
            --repository-name "$ECR_REPOSITORY" \
            --image-id imageTag="$IMAGE_TAG" \
            --region "$AWS_REGION"
          FINDINGS=$(aws ecr describe-image-scan-findings \
            --repository-name "$ECR_REPOSITORY" \
            --image-id imageTag="$IMAGE_TAG" \
            --region "$AWS_REGION" \
            --query 'imageScanFindings.findingSeverityCounts' \
            --output json)
          HIGH=$(echo "$FINDINGS" | jq '.HIGH // 0')
          CRITICAL=$(echo "$FINDINGS" | jq '.CRITICAL // 0')
          echo "HIGH=$HIGH CRITICAL=$CRITICAL"
          if [ "$HIGH" -gt 0 ] || [ "$CRITICAL" -gt 0 ]; then
            echo "::error::Image has HIGH or CRITICAL vulnerabilities. Blocking deploy."
            exit 1
          fi

      # ─── Swap the task definition's image to the new tag ───
      - name: Download current task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition $TASK_DEFINITION_FAMILY \
            --query taskDefinition \
            > task-definition.json

      - name: Render new task definition with updated image
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: ${{ env.CONTAINER_NAME }}
          image: ${{ steps.build-image.outputs.image }}

      # ─── Deploy to ECS (rolling or ECS-native Blue/Green) ───
      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true  # wait until the deployment stabilizes (CI fails if it does not)

      - name: Notify deployment result
        if: always()
        env:
          STATUS: ${{ job.status }}
          SHA: ${{ github.sha }}
        run: |
          echo "Deploy status: $STATUS | SHA: $SHA"
          # Implement Slack notifications or similar here
```

### The diff when using CodeDeploy Blue/Green

`aws-actions/amazon-ecs-deploy-task-definition` also supports CodeDeploy. Passing an AppSpec via the `codedeploy-appspec` option starts the CodeDeploy deployment and waits for completion.

```yaml
      - name: Deploy via CodeDeploy Blue/Green
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true
          codedeploy-appspec: appspec.yaml
          codedeploy-application: web-api
          codedeploy-deployment-group: prod
```

### OIDC IAM-role configuration (reference)

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
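        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:environment:production"
        }
      }
    }
  ]
}
```

Because the deploy job above sets `environment: production`, GitHub issues the `sub` claim in the form `repo:<org>/<repo>:environment:<name>` instead of the branch ref, so the condition matches on the environment. Narrow the policy attached to this role to the minimum the workflow needs: ECR push and scan-result reads, `ecs:DescribeTaskDefinition`, `ecs:RegisterTaskDefinition`, `ecs:UpdateService`, `ecs:DescribeServices`, and `iam:PassRole` (for the task execution role). For details, see the [GitHub Actions OIDC guide](/blog/github-actions-oidc-keyless-cicd-aws-gcp-guide).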
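A mismatch between the trust policy's `sub` condition and the claim GitHub actually sends is a common cause of `AssumeRoleWithWebIdentity` failures. The sketch below approximates `StringLike` with `fnmatchcase`, which is an assumption: IAM supports only `*` and `?`, while `fnmatch` also interprets `[...]`. The claim values show how a job that declares `environment: production` presents an environment-based subject rather than a branch ref.

```python
from fnmatch import fnmatchcase


def sub_allowed(pattern: str, sub: str) -> bool:
    """Approximate an IAM StringLike match on the OIDC sub claim (case-sensitive)."""
    return fnmatchcase(sub, pattern)


branch_pattern = "repo:your-org/your-repo:ref:refs/heads/main"
env_pattern = "repo:your-org/your-repo:environment:production"

branch_sub = "repo:your-org/your-repo:ref:refs/heads/main"     # job without an environment
env_sub = "repo:your-org/your-repo:environment:production"     # job with environment: production

print(sub_allowed(branch_pattern, branch_sub))  # True
print(sub_allowed(branch_pattern, env_sub))     # False
print(sub_allowed(env_pattern, env_sub))        # True
```

Avoid widening the pattern to `repo:your-org/your-repo:*` just to make authentication pass, because that also admits pull-request and feature-branch runs.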

---

## ⑤ Pre-deploy quality gates and container-side preparation

No matter how refined your deployment method, **if the container itself is broken, none of it means anything.** Circuit breakers and Blue/Green are mechanisms to "abort a deploy and revert," not mechanisms to "deliver a healthy image."

### Layers of pre-shipping quality gates

| Layer | Content | Timing |
|--------|------|-----------|
| Type check | TypeScript `tsc --noEmit` / `mypy` for Python | PR + main push |
| Unit tests | Business logic, validation | PR + main push |
| Container build | Confirm `docker build` succeeds | main push |
| Vulnerability scan | ECR Enhanced Scanning / Trivy | after push, before deploy |
| Integration tests | Connectivity check in a staging environment (optional) | main push |

### Coordination with graceful shutdown

During a deployment, ECS stops tasks. The stop sequence is as follows.

1. **Deregister** the task from the ALB target group (during `deregistration_delay`, protect in-flight requests)
2. Send **`SIGTERM`** to the container
3. Wait for it to finish during **`stopTimeout`** (default 30 seconds, max 120 seconds)
4. If it doesn't finish, force-terminate with **`SIGKILL`** (data-loss risk)

In Blue/Green deployments, the same SIGTERM sequence runs when the old (blue) tasks are deleted. **If the application can't handle SIGTERM correctly, data loss can occur even when blue is removed after bake time ends.**
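The container-side half of that sequence can be sketched as follows. This is a Python illustration of the pattern, not the Node.js handler from the pillar article: the handler only records the request, and the main loop drains in-flight work against a deadline that should be shorter than `stopTimeout`. The class name and the `in_flight` callback are hypothetical.

```python
import signal
import threading
import time


class GracefulShutdown:
    """Tracks a shutdown request and waits for in-flight work to drain."""

    def __init__(self) -> None:
        self.requested = threading.Event()

    def handle_signal(self, signum, frame) -> None:
        # Only record the request; the main loop does the actual cleanup.
        self.requested.set()

    def install(self) -> None:
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle_signal)

    def drain(self, in_flight, deadline_seconds: float, poll_seconds: float = 0.05) -> bool:
        """Wait until in_flight() returns 0 or the deadline passes. True if drained."""
        deadline = time.monotonic() + deadline_seconds
        while in_flight() > 0:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_seconds)
        return True
```

A server would call `install()` at startup, stop accepting new requests once `requested` is set, and then call `drain()` with a deadline a few seconds below `stopTimeout` so it exits before `SIGKILL` arrives.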

What's especially important in coordination with health checks is `startPeriod`. Set a grace period so health checks don't fail during the initialization right after a container starts (establishing DB connections, cache warm-up).

```json
{
  "healthCheck": {
    "command": ["CMD-SHELL", "wget -q -O - http://localhost:8080/healthz || exit 1"],
    "interval": 15,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 30
  },
  "stopTimeout": 60
}
```
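These values have documented ranges, and an out-of-range value is rejected when the task definition is registered. The sketch below checks a container definition against the limits as I understand them from the ECS documentation (`interval` 5 to 300, `timeout` 2 to 60, `retries` 1 to 10, `startPeriod` 0 to 300, and `stopTimeout` up to 120 on Fargate). Treat the numbers as assumptions to confirm; `validate` is a hypothetical helper.

```python
LIMITS = {
    "interval": (5, 300),
    "timeout": (2, 60),
    "retries": (1, 10),
    "startPeriod": (0, 300),
}
FARGATE_MAX_STOP_TIMEOUT = 120


def validate(container: dict) -> list[str]:
    """Return a list of human-readable problems; empty when everything is in range."""
    problems = []
    health = container.get("healthCheck", {})
    for key, (low, high) in LIMITS.items():
        value = health.get(key)
        if value is not None and not low <= value <= high:
            problems.append(f"healthCheck.{key}={value} is outside {low}..{high}")
    stop_timeout = container.get("stopTimeout")
    if stop_timeout is not None and stop_timeout > FARGATE_MAX_STOP_TIMEOUT:
        problems.append(f"stopTimeout={stop_timeout} exceeds {FARGATE_MAX_STOP_TIMEOUT}")
    return problems


container = {
    "healthCheck": {"interval": 15, "timeout": 5, "retries": 3, "startPeriod": 30},
    "stopTimeout": 60,
}
print(validate(container))  # []
```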

The implementation details of graceful shutdown (the Node.js code for a SIGTERM handler, etc.) are described in the [pillar article](/blog/aws-ecs-fargate-production-guide). For production troubleshooting, the [ECS troubleshooting article](/blog/aws-ecs-fargate-troubleshooting-task-stopped-reasons-guide) is useful.

### The two-layer structure of health checks

ECS has two health-check layers, and both affect how a deployment proceeds.

- **Container health check** (the task definition's `healthCheck`): ECS judges whether the container is down. If failures continue, ECS replaces the task.
- **ALB target health check** (the target group's `health_check`): the ALB judges whether "it may send requests to this IP." On failure, the ALB removes the task from the target.

During a Blue/Green deployment, production traffic moves to green tasks only after they "pass the ALB target health check." Using a staging or internal-test listener as the test listener lets you do an extra connectivity check before sending production traffic.

---

## Summary: build a state where you can instantly revert when it breaks

I've organized ECS on Fargate's deployment strategies into three methods.

| Method | Recommended scene | Additional resources |
|------|---------|------------|
| Rolling + circuit breaker | Simple configuration, short downtime acceptable | None |
| ECS-native Blue/Green | **The first choice for new configurations from 2025 onward** | Alternate TG, IAM role for LB updates (no CodeDeploy) |
| CodeDeploy Blue/Green | Existing CodeDeploy assets, advanced Lambda-hook validation | TG×2, test listener, CodeDeploy config |

Whichever method you choose, **the essence of "shipping safely" doesn't change.**

1. **Solidify quality gates in CI** (types, tests, scans)
2. **Pin the image version with a CommitSHA tag** (eliminate `latest` dependence)
3. **Authenticate keyless with OIDC** and place no long-lived access keys at all
4. **Handle SIGTERM correctly** and end cleanly on every deploy, scale-in, and abort
5. **Enable automatic rollback** (the circuit breaker or Blue/Green's bake time)

With this pattern, I operated 221 endpoints in production on `API Gateway → NLB → ALB → ECS on Fargate` and maintained zero double charges on the payment foundation. For details, see the [lumber-distribution SaaS case study](/case-studies/lumber-industry-dx).

For Fargate's compute-platform selection (ECS vs Lambda vs App Runner), see the [comparison article](/blog/aws-ecs-fargate-vs-lambda-vs-app-runner-compute-selection-guide); for IaC state management and module design, the [Terraform article](/blog/terraform-module-design-state-isolation-drift-detection-guide); and for post-deploy observability, [OpenTelemetry × ECS](/blog/aws-observability-opentelemetry-sre-ecs). They are best read together.
