"When a failure occurs right after you ship a new container to production, how fast can you roll back to the previous version?" That question is the essence of deployment. Deployment speed is secondary; being able to revert instantly when something breaks comes first.
On a lumber-distribution SaaS that won a Minister of Economy, Trade and Industry Award, I operated 221 API endpoints in production on a configuration of API Gateway → NLB → ALB → ECS on Fargate. The payment foundation maintains zero double charges in production. To deliver production quality with "one person × generative AI (Claude Code)," it was essential to make the deployment pipeline itself a structure that is hard to break and easy to revert.
This article aims to organize ECS on Fargate's three deployment methods and let you judge when and how to use each. The basics of ECS-platform design, networking, and security are left to the pillar article; this piece focuses on "deployment itself."
ECS's three deployment methods: grasp the map first
ECS on Fargate offers three choices for how to update a service. Which you choose changes the complexity of the infrastructure and the rollback speed.
| Method | Deployment controller | Blue/Green | Rollback speed | Required additional resources |
|---|---|---|---|---|
| Rolling update | ECS | None | Stop the new version and start the old one (tens of seconds or more) | None |
| ECS-native Blue/Green | ECS | Yes (GA in 2025) | Instant switch during bake time | None (ECS-complete) |
| CodeDeploy Blue/Green | CODEDEPLOY | Yes | Instant switch during bake time | CodeDeploy app, test listener, TG×2 |
The basic way to choose:
- Rolling update: a simple configuration is enough, you want to keep the deployment mechanism simple, and a short downtime risk is acceptable.
- ECS-native Blue/Green: you want Blue/Green but don't want to add resources. The default recommendation for new configurations from 2025 onward.
- CodeDeploy Blue/Green: you already have CodeDeploy assets, you want to do complex validation with Lambda hooks, or you're building on an existing configuration combined with NLB.
① Reviewing rolling updates: min/max and the circuit breaker
ECS's simplest deployment is the rolling update. The deployment controller is ECS, and the deployment strategy is implicitly rolling. The behavior is decided by two parameters (official).
- `minimumHealthyPercent`: the lower bound (%, rounded up) of the number of tasks that must be healthy during a deployment.
- `maximumPercent`: the upper bound (%, rounded down) of the number of tasks that may be running during a deployment.
min 100% / max 200% (the pillar article's setting) is the safest side. With desiredCount=2, it first stands up two new-version tasks (4 total), confirms health, and then stops the two old-version tasks. The cost temporarily doubles, but availability doesn't drop at all.
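The rounding rules above can be checked with a few lines of arithmetic. This is a minimal sketch, not ECS code: the function name `rolling_bounds` is hypothetical, and it only reproduces the round-up / round-down behavior described for the two parameters.

```python
def rolling_bounds(desired_count: int, minimum_healthy_percent: int,
                   maximum_percent: int) -> tuple[int, int]:
    """Return (min_healthy_tasks, max_running_tasks) for a rolling update."""
    # Lower bound is rounded up (integer ceiling division).
    min_healthy = -(-desired_count * minimum_healthy_percent // 100)
    # Upper bound is rounded down (integer floor division).
    max_running = desired_count * maximum_percent // 100
    return min_healthy, max_running


# desiredCount=2 with min 100% / max 200%: never fewer than 2 healthy, up to 4 running.
print(rolling_bounds(2, 100, 200))
# desiredCount=3 with min 50% / max 150%: 1.5 rounds up to 2, 4.5 rounds down to 4.
print(rolling_bounds(3, 50, 150))
```

With a small `desiredCount`, the rounding decides whether ECS has any headroom at all: the gap between the two numbers is the room the scheduler has to start new tasks before stopping old ones.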
Deployment circuit breaker
What prevents the accident of routing traffic without noticing that new tasks keep crash-looping is the deployment circuit breaker.
resource "aws_ecs_service" "app" {
  # ...
  deployment_circuit_breaker {
    enable   = true
    rollback = true # Roll back automatically to the previous revision when a failure is detected
  }
  # Can be combined with CloudWatch alarms (the deployment fails when either condition is met)
  deployment_controller {
    type = "ECS"
  }
}
With rollback = true, ECS marks the deployment as failed once new tasks fail to start or fail their health checks a threshold number of times, and automatically reverts to the previous revision. The circuit breaker can also be combined with CloudWatch alarms; with both enabled, the deployment fails as soon as either condition is met.
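How many failures count as "a prescribed number of times" depends on the service size. The sketch below encodes the rule as I understand it from the ECS documentation: half the desired count, rounded up, clamped between 3 and 200. Treat the constants as assumptions to verify against the current documentation; `failure_threshold` is a hypothetical helper name.

```python
MIN_THRESHOLD = 3    # assumed lower clamp (verify against the ECS docs)
MAX_THRESHOLD = 200  # assumed upper clamp (verify against the ECS docs)


def failure_threshold(desired_count: int) -> int:
    """Estimate how many failed task launches trip the circuit breaker."""
    half_rounded_up = -(-desired_count // 2)  # ceil(0.5 * desired_count)
    return max(MIN_THRESHOLD, min(MAX_THRESHOLD, half_rounded_up))


for count in (2, 25, 1000):
    print(count, failure_threshold(count))
```

For a small service such as desiredCount=2, the floor applies, so several failed launches happen before the rollback starts. That wait is part of the real rollback time for rolling updates.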
Version consistency via image digest
ECS resolves a tag to an image digest by default, so all tasks in a service deployment run the same binary (versionConsistency). Relying on the latest tag still invites the accident where a rebuild changes what latest points to and only some tasks end up on a different version. Tag images with the commit SHA and never depend on latest; that is the iron rule.
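One way to enforce the rule is a small guard in CI that rejects mutable tags before a task definition is registered. The sketch below is illustrative only: `assert_pinned`, the deny-list, and the "7 to 40 hex characters" test for a commit SHA are assumptions, not part of any AWS tooling.

```python
import re

MUTABLE_TAGS = {"latest", "stable", "main"}  # hypothetical deny-list


def assert_pinned(image: str) -> str:
    """Return the image reference if it is pinned by digest or commit SHA, else raise."""
    if "@sha256:" in image:
        return image  # digest references are immutable by construction
    _, sep, tag = image.rpartition(":")
    # A "/" after the last ":" means the colon belonged to a registry port, not a tag.
    if not sep or "/" in tag:
        raise ValueError(f"no tag in image reference: {image}")
    if tag in MUTABLE_TAGS or not re.fullmatch(r"[0-9a-f]{7,40}", tag):
        raise ValueError(f"tag is not a commit SHA: {tag}")
    return image


repo = "111122223333.dkr.ecr.ap-northeast-1.amazonaws.com/web-api"
print(assert_pinned(f"{repo}:3f2c1ab9"))
```

Calling it with `f"{repo}:latest"` raises `ValueError`, which is enough to fail a pipeline step.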
② ECS-native Blue/Green: GA in 2025, no CodeDeploy needed
ECS-native Blue/Green became generally available in July 2025 and reached feature parity with CodeDeploy in October 2025, when canary and linear shifts were added. It is the recommendation for new configurations going forward, and it is self-contained within ECS, with no CodeDeploy to manage separately.
AWS released native Blue/Green deployments for Amazon ECS — you can now do blue/green deployments directly from ECS without needing AWS CodeDeploy.(— AWS Blog)
How it works
Keep the deployment controller as ECS and select BLUE_GREEN as the deployment strategy to enable it.
- When a service update begins, start the green task group.
- Once green becomes healthy, switch ALB traffic to green.
- For the duration of `bakeTimeInMinutes`, keep both blue (old) and green (new). During this time, you can instantly revert to blue with one click (instant rollback).
- Once bake time expires, delete the blue tasks.
The six deployment lifecycle hooks
In ECS-native Blue/Green, six lifecycle hooks are provided where you can insert a Lambda.
| Hook | Timing |
|---|---|
| `PRE_SCALE_UP` | Before scaling up green tasks |
| `POST_SCALE_UP` | After scaling up green tasks (startup confirmation) |
| `TEST_TRAFFIC_SHIFT` | Before directing test traffic to green |
| `POST_TEST_TRAFFIC_SHIFT` | After switching test traffic (smoke test) |
| `PRODUCTION_TRAFFIC_SHIFT` | Before directing production traffic to green |
| `POST_PRODUCTION_TRAFFIC_SHIFT` | After switching production traffic (production confirmation) |
A hook's Lambda returns its validation result to ECS. On failure, the deployment is aborted at that point and reverts to blue.
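A hook function can be very small. The following is a sketch of a `POST_TEST_TRAFFIC_SHIFT` smoke test written as a Python Lambda handler. The `hookStatus` key with `SUCCEEDED` / `FAILED` values reflects my reading of the ECS lifecycle-hook documentation and should be verified there; the test-listener URL and the names `smoke_check` and `make_handler` are hypothetical.

```python
import urllib.request


def smoke_check(url: str, timeout: float = 3.0) -> bool:
    """Return True when the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


def make_handler(check):
    """Build a Lambda handler that reports the result of `check` to ECS."""
    def handler(event, context):
        try:
            ok = bool(check())
        except Exception:
            ok = False  # any unexpected error is treated as a failed validation
        # Assumed response shape: ECS reads hookStatus to continue or abort.
        return {"hookStatus": "SUCCEEDED" if ok else "FAILED"}
    return handler


# Hypothetical test-listener URL; replace it with your own.
handler = make_handler(
    lambda: smoke_check("http://internal-test.example.com:8080/healthz")
)
```

Injecting the check as a function keeps the handler testable without network access: `make_handler(lambda: False)({}, None)` returns a failed status.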
Kinds of traffic shift
| Kind | Behavior |
|---|---|
| `ALL_AT_ONCE` | Switch to green all at once (fastest, concentrated risk) |
| `CANARY` | First a specified % to green, then switch the rest after a wait |
| `LINEAR` | Shift to green in stages by a fixed % at a time |
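To compare the three kinds side by side, it helps to print the share of traffic on green over time. This is a toy model, not ECS behavior: `shift_schedule` is a hypothetical function, and the minute offsets assume the first shift happens at minute 0.

```python
def shift_schedule(kind: str, percent: int = 10,
                   interval_minutes: int = 1) -> list[tuple[int, int]]:
    """Return [(minute_offset, percent_of_traffic_on_green), ...]."""
    if kind == "ALL_AT_ONCE":
        return [(0, 100)]
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    if kind == "CANARY":
        # A small first slice, then everything else after the wait.
        return [(0, percent), (interval_minutes, 100)]
    if kind == "LINEAR":
        steps, green, minute = [], 0, 0
        while green < 100:
            green = min(100, green + percent)
            steps.append((minute, green))
            minute += interval_minutes
        return steps
    raise ValueError(f"unknown shift kind: {kind}")


for kind in ("ALL_AT_ONCE", "CANARY", "LINEAR"):
    print(kind, shift_schedule(kind, percent=10, interval_minutes=5))
```

The output makes the trade-off concrete: canary exposes a small slice once and then commits, while linear keeps the exposed share growing in equal steps, which gives alarms more chances to fire before 100%.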
Terraform configuration example
resource "aws_ecs_service" "app" {
  name            = "web-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  deployment_controller {
    type = "ECS" # ECS-native Blue/Green keeps the "ECS" controller
  }

  # Layout note: recent AWS provider versions take strategy as a string attribute
  # and keep the circuit breaker as a service-level block. Verify this against the
  # documentation for the provider version you use.
  deployment_configuration {
    strategy             = "BLUE_GREEN"
    bake_time_in_minutes = 10 # Keep the old version for 10 minutes after the switch to preserve instant rollback
    # Traffic-shift method (for example CANARY) would be set here
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # Lifecycle hooks (optional):
  # specifying hooks lets a Lambda validate each stage

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
  }
}
Caveat: because ECS-native Blue/Green is a relatively new feature (GA in 2025), depending on your Terraform provider version, the `strategy` setting of the `deployment_configuration` block may not yet be supported. In that case you need a transitional workaround: configure it via the AWS CLI or console and ignore it in Terraform. CLI configuration is shown below.
Enabling ECS-native Blue/Green via the AWS CLI
If the Terraform provider hasn't caught up, configure it directly with create-service or update-service.
# In the ECS API, "strategy" is a string and "bakeTimeInMinutes" sits beside it
aws ecs update-service \
  --cluster prod \
  --service web-api \
  --deployment-configuration '{
    "strategy": "BLUE_GREEN",
    "bakeTimeInMinutes": 10,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }' \
  --region ap-northeast-1
③ CodeDeploy Blue/Green: for existing assets and advanced validation
If ECS-native Blue/Green suffices, there's no reason to add CodeDeploy. But CodeDeploy is suited to the following cases.
- You want to unify with existing CodeDeploy assets (other services or approval flows).
- You want to finely insert advanced validation with Lambda hooks (integration tests with external APIs, DB-migration confirmation, Chaos Engineering scenarios).
- You have an existing target-group-switching setup on an NLB configuration (but with NLB, only AllAtOnce).
Required resources
CodeDeploy-based ECS Blue/Green requires the following (official).
- A load balancer, normally an ALB (an NLB also works, but with traffic-shift restrictions)
- A production listener (port 443, etc.) and a test listener (optional but recommended: port 8080, etc.), belonging to the same ALB
- Two target groups (for blue and for green)
- A CodeDeploy app (compute platform `ECS`) and a deployment group
- An AppSpec (task-definition ARN, container name, port, hook Lambda ARNs)
- `ecsCodeDeployRole` (the IAM role for CodeDeploy to operate ECS and the LB)
AppSpec example
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:ap-northeast-1:111122223333:task-definition/web-api:42"
        LoadBalancerInfo:
          ContainerName: "app"
          ContainerPort: 8080
        PlatformVersion: "LATEST"
        NetworkConfiguration:
          AwsvpcConfiguration:
            Subnets:
              - "subnet-0abc123def456"
              - "subnet-0def789abc123"
            SecurityGroups:
              - "sg-0abcdef1234567890"
            AssignPublicIp: "DISABLED"
# Valid ECS hooks: BeforeInstall, AfterInstall, AfterAllowTestTraffic,
# BeforeAllowTraffic, AfterAllowTraffic (there is no BeforeAllowTestTraffic)
Hooks:
  - BeforeInstall: "arn:aws:lambda:ap-northeast-1:111122223333:function:pre-deploy-validation"
  - AfterAllowTestTraffic: "arn:aws:lambda:ap-northeast-1:111122223333:function:smoke-test"
  - BeforeAllowTraffic: "arn:aws:lambda:ap-northeast-1:111122223333:function:final-gate"
  - AfterAllowTraffic: "arn:aws:lambda:ap-northeast-1:111122223333:function:post-deploy-check"
Predefined deployment configurations
| Config name | Behavior |
|---|---|
| `CodeDeployDefault.ECSAllAtOnce` | Switch all at once (usable with both ALB and NLB) |
| `CodeDeployDefault.ECSLinear10PercentEvery1Minutes` | Shift 10% every minute (complete in 10 minutes) |
| `CodeDeployDefault.ECSLinear10PercentEvery3Minutes` | Shift 10% every 3 minutes (complete in 30 minutes) |
| `CodeDeployDefault.ECSCanary10Percent5Minutes` | First confirm 10% for 5 minutes, then the rest all at once |
| `CodeDeployDefault.ECSCanary10Percent15Minutes` | First confirm 10% for 15 minutes, then the rest all at once |
NLB restriction: when combined with NLB, traffic shifting is restricted to `ECSAllAtOnce` only.
CodeDeploy configuration in Terraform
# Set the ECS service's deployment controller to CODEDEPLOY
resource "aws_ecs_service" "app" {
  name            = "web-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  deployment_controller {
    type = "CODEDEPLOY"
  }

  # The service starts on the blue TG; CodeDeploy swaps between the two TGs
  load_balancer {
    target_group_arn = aws_lb_target_group.blue.arn
    container_name   = "app"
    container_port   = 8080
  }

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }

  # CodeDeploy swaps the TG and task definition, so ignore those diffs in terraform plan
  lifecycle {
    ignore_changes = [task_definition, load_balancer]
  }
}

# Trust policy that lets the CodeDeploy service assume the role
data "aws_iam_policy_document" "codedeploy_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["codedeploy.amazonaws.com"]
    }
  }
}

# IAM role: permissions for CodeDeploy to operate ECS and the LB
resource "aws_iam_role" "ecs_codedeploy" {
  name               = "ecsCodeDeployRole"
  assume_role_policy = data.aws_iam_policy_document.codedeploy_assume.json
}

resource "aws_iam_role_policy_attachment" "ecs_codedeploy" {
  role       = aws_iam_role.ecs_codedeploy.name
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
}

# CodeDeploy app and deployment group
resource "aws_codedeploy_app" "ecs" {
  compute_platform = "ECS"
  name             = "web-api"
}

resource "aws_codedeploy_deployment_group" "ecs" {
  app_name               = aws_codedeploy_app.ecs.name
  deployment_group_name  = "prod"
  service_role_arn       = aws_iam_role.ecs_codedeploy.arn
  deployment_config_name = "CodeDeployDefault.ECSCanary10Percent5Minutes"

  # ECS deployments must use Blue/Green with traffic control
  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = aws_ecs_cluster.main.name
    service_name = aws_ecs_service.app.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.https.arn]
      }
      test_traffic_route {
        listener_arns = [aws_lb_listener.test.arn]
      }
      target_group {
        name = aws_lb_target_group.blue.name
      }
      target_group {
        name = aws_lb_target_group.green.name
      }
    }
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }
    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5 # Bake time: blue stays up during this window, so rollback is still possible
    }
  }
}
`iam:PassRole` is required: also grant the GitHub Actions / CI/CD IAM role permission to `PassRole` the `ecsCodeDeployRole`.
④ GitHub Actions (OIDC) pipeline: keyless and safe
Placing long-lived access keys in a deployment pipeline is an anti-pattern in 2026. With keyless CI/CD via GitHub Actions OIDC, the pipeline needs nothing more than temporary credentials for an IAM role.
Pipeline overview
git push → trigger GitHub Actions
→ obtain AWS temporary credentials via OIDC
→ quality gates (type check, tests, vulnerability scan)
→ push to ECR with a CommitSHA tag
→ swap the task definition's image to the new tag (register a new revision)
→ deploy to the ECS service (rolling or Blue/Green)
The complete workflow.yml
name: Deploy to ECS Fargate

on:
  push:
    branches:
      - main

permissions:
  id-token: write # Required to obtain the OIDC token
  contents: read

env:
  AWS_REGION: ap-northeast-1
  ECR_REPOSITORY: web-api
  ECS_CLUSTER: prod
  ECS_SERVICE: web-api
  CONTAINER_NAME: app
  TASK_DEFINITION_FAMILY: web-api

jobs:
  quality-gate:
    name: Quality Gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
          cache: "npm"
      - name: Install dependencies
        run: npm ci
      - name: Type check
        run: npm run type-check
      - name: Unit tests
        run: npm test -- --run
      - name: Build
        run: npm run build

  deploy:
    name: Build & Deploy
    runs-on: ubuntu-latest
    needs: quality-gate # Deploy only after the quality gate passes
    environment: production
    steps:
      - uses: actions/checkout@v4

      # --- Authenticate to AWS with OIDC (no long-lived keys) ---
      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-deploy
          aws-region: ${{ env.AWS_REGION }}
          # role-session-name records in CloudTrail which job authenticated
          role-session-name: github-${{ github.sha }}

      # --- Push to ECR ---
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      # Cross-building linux/arm64 on an x86 runner needs QEMU emulation
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Build, tag, and push image to ECR
        id: build-image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }} # Pin the version with the commit SHA (no dependence on latest)
        run: |
          docker build \
            --platform linux/arm64 \
            --build-arg BUILD_SHA="$IMAGE_TAG" \
            -t "$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" \
            -t "$ECR_REGISTRY/$ECR_REPOSITORY:latest" \
            .
          docker push "$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"
          # Also update latest (for human reference only; ECS uses the SHA tag)
          docker push "$ECR_REGISTRY/$ECR_REPOSITORY:latest"
          echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> "$GITHUB_OUTPUT"

      # --- Check the ECR image scan result (fail on HIGH/CRITICAL) ---
      - name: Wait for ECR scan and check results
        env:
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # NOTE: "|| true" and the '{}' fallback let the job continue when the scan
          # never completes; remove them if the gate should fail closed.
          aws ecr wait image-scan-complete \
            --repository-name "$ECR_REPOSITORY" \
            --image-id imageTag="$IMAGE_TAG" \
            --region "$AWS_REGION" || true
          FINDINGS=$(aws ecr describe-image-scan-findings \
            --repository-name "$ECR_REPOSITORY" \
            --image-id imageTag="$IMAGE_TAG" \
            --query 'imageScanFindings.findingSeverityCounts' \
            --output json 2>/dev/null || echo '{}')
          HIGH=$(echo "$FINDINGS" | jq '.HIGH // 0')
          CRITICAL=$(echo "$FINDINGS" | jq '.CRITICAL // 0')
          echo "HIGH=$HIGH CRITICAL=$CRITICAL"
          if [ "$HIGH" -gt 0 ] || [ "$CRITICAL" -gt 0 ]; then
            echo "::error::Image has HIGH or CRITICAL vulnerabilities. Blocking deploy."
            exit 1
          fi

      # --- Swap the task definition's image to the new tag ---
      - name: Download current task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition "$TASK_DEFINITION_FAMILY" \
            --query taskDefinition \
            > task-definition.json

      - name: Render new task definition with updated image
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: ${{ env.CONTAINER_NAME }}
          image: ${{ steps.build-image.outputs.image }}

      # --- Deploy to ECS (rolling or ECS-native Blue/Green) ---
      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true # Wait until the deployment is stable (CI fails if it fails)

      - name: Notify deployment result
        if: always()
        env:
          STATUS: ${{ job.status }}
          SHA: ${{ github.sha }}
        run: |
          echo "Deploy status: $STATUS | SHA: $SHA"
          # Implement Slack notifications and the like here
The diff when using CodeDeploy Blue/Green
aws-actions/amazon-ecs-deploy-task-definition also supports CodeDeploy. Passing an AppSpec via the codedeploy-appspec option starts the CodeDeploy deployment and waits for completion.
- name: Deploy via CodeDeploy Blue/Green
  uses: aws-actions/amazon-ecs-deploy-task-definition@v2
  with:
    task-definition: ${{ steps.task-def.outputs.task-definition }}
    service: ${{ env.ECS_SERVICE }}
    cluster: ${{ env.ECS_CLUSTER }}
    wait-for-service-stability: true
    codedeploy-appspec: appspec.yaml
    codedeploy-application: web-api
    codedeploy-deployment-group: prod
OIDC IAM-role configuration (reference)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}
The principle is to narrow the policy attached to this role to minimal permissions (ECR push, ecs:RegisterTaskDefinition, ecs:UpdateService, iam:PassRole (for the task execution role)). For details, see the GitHub Actions OIDC guide.
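The `sub` condition in the trust policy is what actually limits which workflows can assume the role, so it is worth seeing how wildcard patterns behave. The sketch below approximates IAM's `StringLike` matching (`*` for any run of characters, `?` for exactly one) against the claims of a GitHub OIDC token. It is a local illustration, not how AWS evaluates policies internally; `string_like` and `trust_allows` are hypothetical names.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def trust_allows(claims: dict, sub_pattern: str,
                 audience: str = "sts.amazonaws.com") -> bool:
    """Mimic the two conditions of the trust policy above."""
    return (claims.get("aud") == audience
            and string_like(sub_pattern, claims.get("sub", "")))


main_push = {"aud": "sts.amazonaws.com",
             "sub": "repo:your-org/your-repo:ref:refs/heads/main"}
feature_push = {"aud": "sts.amazonaws.com",
                "sub": "repo:your-org/your-repo:ref:refs/heads/feature-x"}

strict = "repo:your-org/your-repo:ref:refs/heads/main"
loose = "repo:your-org/*"

print(trust_allows(main_push, strict), trust_allows(feature_push, strict))
print(trust_allows(feature_push, loose))
```

The strict pattern admits only pushes to main, while a pattern such as `repo:your-org/*` would let any branch of any repository in the organization deploy to production.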
⑤ Pre-deploy quality gates and container-side preparation
No matter how refined your deployment method, if the container itself is broken, none of it means anything. Circuit breakers and Blue/Green are mechanisms to "abort a deploy and revert," not mechanisms to "deliver a healthy image."
Layers of pre-shipping quality gates
| Layer | Content | Timing |
|---|---|---|
| Type check | TypeScript tsc --noEmit / mypy for Python | PR + main push |
| Unit tests | Business logic, validation | PR + main push |
| Container build | Confirm docker build succeeds | main push |
| Vulnerability scan | ECR Enhanced Scanning / Trivy | after push, before deploy |
| Integration tests | Connectivity check in a staging environment (optional) | main push |
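For the vulnerability-scan layer, the decision logic is easier to test when it lives in a small function that fails closed: if the scan result is missing or incomplete, the gate blocks instead of passing silently. This is a sketch under assumptions: the input mirrors the JSON shape returned by `aws ecr describe-image-scan-findings` as I understand it, and the accepted status values (`COMPLETE` for basic scanning, `ACTIVE` for enhanced scanning) should be checked against your scanning mode. `scan_gate` is a hypothetical name.

```python
def scan_gate(scan: dict,
              blocking: tuple[str, ...] = ("CRITICAL", "HIGH"),
              ok_statuses: tuple[str, ...] = ("COMPLETE", "ACTIVE")) -> tuple[bool, str]:
    """Return (allowed, reason) for a describe-image-scan-findings style result."""
    status = (scan.get("imageScanStatus") or {}).get("status")
    if status not in ok_statuses:
        # Fail closed: no usable scan result means no deploy.
        return False, f"scan not usable (status={status})"
    counts = (scan.get("imageScanFindings") or {}).get("findingSeverityCounts") or {}
    hits = {sev: counts[sev] for sev in blocking if counts.get(sev, 0) > 0}
    if hits:
        return False, f"blocking findings: {hits}"
    return True, "no blocking findings"


clean = {"imageScanStatus": {"status": "COMPLETE"},
         "imageScanFindings": {"findingSeverityCounts": {"MEDIUM": 2}}}
risky = {"imageScanStatus": {"status": "COMPLETE"},
         "imageScanFindings": {"findingSeverityCounts": {"HIGH": 1, "LOW": 4}}}

for result in (clean, risky, {}):
    print(scan_gate(result))
```

In a pipeline, the script would load the CLI's JSON output, call `scan_gate`, and exit non-zero when the first element is `False`.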
Coordination with graceful shutdown
During a deployment, ECS stops tasks. The flow of stopping is this.
- Deregister the task from the ALB target group (during `deregistration_delay`, protect in-flight requests)
- Send `SIGTERM` to the container
- Wait for it to finish during `stopTimeout` (default 30 seconds, max 120 seconds)
- If it doesn't finish, force-terminate with `SIGKILL` (data-loss risk)
In Blue/Green deployments, the same SIGTERM sequence runs when the old (blue) tasks are deleted. If the application does not handle SIGTERM correctly, data loss can occur even when blue is dropped after bake time ends.
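The shape of a SIGTERM-aware worker is the same in any runtime: record that a stop was requested, finish the unit of work in flight, and take no new work. The sketch below shows that shape in Python for illustration only; it is not the Node.js handler from the pillar article, and `GracefulShutdown`, `run_worker`, `take_job`, and `handle_job` are hypothetical names.

```python
import signal
import threading


class GracefulShutdown:
    """Records that SIGTERM arrived so the main loop can stop taking new work."""

    def __init__(self) -> None:
        self.stop_requested = threading.Event()

    def install(self) -> None:
        # Call once at startup, from the main thread.
        signal.signal(signal.SIGTERM, self.on_signal)

    def on_signal(self, signum, frame) -> None:
        # Only set a flag here; the real cleanup happens in the main loop.
        self.stop_requested.set()


def run_worker(shutdown: GracefulShutdown, take_job, handle_job) -> int:
    """Process jobs until a stop is requested; the in-flight job always finishes."""
    processed = 0
    while not shutdown.stop_requested.is_set():
        job = take_job()
        if job is None:
            break
        handle_job(job)
        processed += 1
    return processed
```

The whole loop, including the slowest single job, has to finish within `stopTimeout`, otherwise ECS follows up with SIGKILL.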
What's especially important in coordination with health checks is startPeriod. Set a grace period so health checks don't fail during the initialization right after a container starts (establishing DB connections, cache warm-up).
{
  "healthCheck": {
    "command": ["CMD-SHELL", "wget -q -O - http://localhost:8080/healthz || exit 1"],
    "interval": 15,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 30
  },
  "stopTimeout": 60
}
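Before committing numbers like these, it can help to sanity-check them and estimate how long a container that never becomes healthy will survive. The sketch below is a rough aid under assumptions: the allowed ranges are what I recall from the ECS task-definition documentation (interval 5 to 300, timeout 2 to 60, retries 1 to 10, startPeriod 0 to 300, stopTimeout up to 120 on Fargate), and the estimate ignores check duration and scheduling jitter. `validate` and `unhealthy_after_seconds` are hypothetical helpers.

```python
LIMITS = {  # assumed ranges in seconds (retries is a count); verify against the ECS docs
    "interval": (5, 300),
    "timeout": (2, 60),
    "retries": (1, 10),
    "startPeriod": (0, 300),
}
MAX_STOP_TIMEOUT = 120  # Fargate upper bound, as stated in the stop sequence above


def validate(container: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the values look sane."""
    problems = []
    health = container.get("healthCheck", {})
    for key, (low, high) in LIMITS.items():
        if key in health and not low <= health[key] <= high:
            problems.append(f"{key}={health[key]} is outside {low}..{high}")
    stop_timeout = container.get("stopTimeout", 30)
    if stop_timeout > MAX_STOP_TIMEOUT:
        problems.append(f"stopTimeout={stop_timeout} exceeds {MAX_STOP_TIMEOUT}")
    return problems


def unhealthy_after_seconds(health: dict) -> int:
    """Rough time until a never-healthy container is marked UNHEALTHY."""
    # Failures during startPeriod are not counted; after it, `retries` failures are needed.
    return (health.get("startPeriod", 0)
            + health.get("interval", 30) * health.get("retries", 3))


container = {
    "healthCheck": {"interval": 15, "timeout": 5, "retries": 3, "startPeriod": 30},
    "stopTimeout": 60,
}
print(validate(container), unhealthy_after_seconds(container["healthCheck"]))
```

With the values above, a broken container is flagged after roughly 75 seconds, which is also the shortest time in which the circuit breaker can count one failed task.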
The implementation details of graceful shutdown (the Node.js code for a SIGTERM handler, etc.) are described in the pillar article. For production troubleshooting, the ECS troubleshooting article is useful.
The two-layer structure of health checks
ECS has two health-check layers, and both are linked.
- Container health check (the task definition's `healthCheck`): ECS judges whether the container is down. If failures continue, ECS replaces the task.
- ALB target health check (the target group's `health_check`): the ALB judges whether "it may send requests to this IP." On failure, the ALB removes the task from the target.
During a Blue/Green deployment, production traffic moves to green tasks only after they "pass the ALB target health check." Using a staging or internal-test listener as the test listener lets you do an extra connectivity check before sending production traffic.
Summary: build a state where you can instantly revert when it breaks
I've organized ECS on Fargate's deployment strategies into three methods.
| Method | Recommended scene | Additional resources |
|---|---|---|
| Rolling + circuit breaker | Simple configuration, short downtime acceptable | None |
| ECS-native Blue/Green | The first choice for new configurations from 2025 onward | None |
| CodeDeploy Blue/Green | Existing CodeDeploy assets, advanced Lambda-hook validation | TG×2, test listener, CodeDeploy config |
Whichever method you choose, the essence of "shipping safely" doesn't change.
- Solidify quality gates in CI (types, tests, scans)
- Pin the image version with a CommitSHA tag (eliminate `latest` dependence)
- Authenticate keyless with OIDC and place no long-lived access keys at all
- Handle SIGTERM correctly and end cleanly on every deploy, scale-in, and abort
- Enable automatic rollback (the circuit breaker or Blue/Green's bake time)
With this pattern, I operated 221 endpoints in production on API Gateway → NLB → ALB → ECS on Fargate and maintained zero double charges on the payment foundation. For details, see the lumber-distribution SaaS case study.
For Fargate's compute-platform selection (ECS vs Lambda vs App Runner), see the comparison article; for IaC state management and module design, the Terraform article; and for post-deploy observability, OpenTelemetry × ECS—reference them together.