# ECS on Fargate Auto Scaling Complete Guide: Designing Target Tracking, Step, and the SQS Backlog Pattern at Production Quality

> Systematizing ECS on Fargate auto scaling. From choosing among target tracking, step, and scheduled, to the custom-metric implementation of worker scaling via SQS backlog-per-task — explained with Terraform and real code.

- Published: 2026-06-26
- Author: 友田 陽大
- Tags: AWS, ECS, Fargate, Auto Scaling, SQS, Application Auto Scaling, Observability, Cost Optimization
- URL: https://tomodahinata.com/en/blog/aws-ecs-fargate-auto-scaling-target-tracking-sqs-worker-guide
- Category: ECS on Fargate in production
- Pillar guide: https://tomodahinata.com/en/blog/aws-ecs-fargate-production-guide

## Key points

- Application Auto Scaling is the layer that changes an ECS service's desiredCount. Getting its three elements right (scalable target, policy, cooldown) is the key to stability
- Target tracking adjusts capacity up and down automatically once you declare a target value. Pairing a short scale_out_cooldown with a long scale_in_cooldown is the standard way to prevent flapping
- CPU utilization is a poor signal for scaling SQS workers. Instead, publish AWS's recommended "backlog per task" (ApproximateNumberOfMessagesVisible ÷ RunningTaskCount) as a custom metric and target-track it
- An idle-zero setup with min_capacity=0 is effective for cost optimization, but decide whether to use it by weighing the startup delay of the first scale-out (tens of seconds) against your tolerable latency
- Always design scale-in together with graceful shutdown (SIGTERM→stopTimeout→SIGKILL). The three-piece set of a SIGTERM handler, stopTimeout, and idempotency is mandatory so that an in-progress task is not lost to SIGKILL

---

"Changing desiredCount by hand has limits. But set up auto-scaling sloppily and conversely it flaps and becomes unstable" — when you bring ECS on Fargate to production, this juncture surely comes.

I have run SQS-driven idempotent workers on Fargate in the [payment foundation (0 double charges in production)](/case-studies/payment-platform-reliability), and operated 221 API endpoints in production atop `API Gateway → NLB → ALB → ECS on Fargate` in a lumber-distribution B2B SaaS. In both, the asymmetric-scaling idea of "increase fast, decrease cautiously" and a metric choice matched to the workload were the core of stability.

This article, as a sequel to the [ECS on Fargate production-operations guide](/blog/aws-ecs-fargate-production-guide), **specializes in auto-scaling design.** For 2 representative workloads — an HTTP service and an SQS worker — I systematize the optimal design for each, with real code.

---

## Why Manual desiredCount Has Limits

Operating by manually adjusting `desired_count` has 3 fundamental limits.

1. **Slow reaction**: by the time a human notices the alert and applies Terraform, the spike is either over or has already breached the SLA.
2. **Forgetting to shrink**: fail to reduce the tasks you increased, and cost silently keeps swelling.
3. **Queue depth is invisible**: messages can pile up in the queue without anyone noticing, because CPU does not move.

Application Auto Scaling lets you declare "change desiredCount when a measured value crosses a threshold" as a policy and hands the adjustment itself to AWS. **All you decide is the target value and the upper and lower bounds.**

---

## The Mechanism: Application Auto Scaling Moves ECS's desiredCount

ECS auto scaling is handled by **AWS Application Auto Scaling**, a standalone service that also manages DynamoDB, Aurora, Lambda, SageMaker, and others through the same API. For ECS, the story comes down to 3 elements.

### The Scalable Target

First register "what to scale." This is the scalable target.

```hcl
resource "aws_appautoscaling_target" "app" {
  service_namespace  = "ecs"                         # namespace for ECS
  scalable_dimension = "ecs:service:DesiredCount"    # the dimension being adjusted is desiredCount
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  min_capacity       = 2   # minimum task count (floor; 0 enables idle-zero)
  max_capacity       = 20  # maximum task count (ceiling)
}
```

The format of `resource_id` is `service/<cluster_name>/<service_name>`. A wrong value is easy to miss: if it ends up pointing at a different service than you intended, `terraform apply` can still succeed while the service you care about never scales. **Make it a habit to run `aws application-autoscaling describe-scalable-targets` after applying and confirm the target was actually registered.**
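A format check like this can also live in a script or test. The following is a minimal sketch; `parseEcsResourceId` is a hypothetical helper name, not part of any AWS SDK, and it only validates the `service/<cluster_name>/<service_name>` shape, not whether the service exists.

```typescript
// Hypothetical helper: validates the resource_id shape used for ECS services.
interface EcsResourceId {
  cluster: string;
  service: string;
}

function parseEcsResourceId(resourceId: string): EcsResourceId | null {
  // Expect exactly three slash-separated parts: "service", cluster, service name.
  const parts = resourceId.split("/");
  if (parts.length !== 3) return null;
  const [prefix, cluster, service] = parts;
  if (prefix !== "service" || !cluster || !service) return null;
  return { cluster, service };
}

console.log(parseEcsResourceId("service/main/app"));  // { cluster: "main", service: "app" }
console.log(parseEcsResourceId("services/main/app")); // null (wrong prefix)
```

A check of this kind only catches malformed strings; the `describe-scalable-targets` confirmation is still what proves the registration.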

### The Scaling Policy

Next define "when and how to move it" in a policy. There are broadly 3 kinds.

| Policy type | Decision criterion | Main use |
|------------|---------|---------|
| **Target tracking** | Maintain a target metric value | Steady services (CPU, request count) |
| **Step scaling** | Change the increase/decrease amount in stages of a CloudWatch alarm | Burst handling, non-linear load |
| **Scheduled scaling** | Change min/max/desired on a time basis | Known peaks (business hours, campaigns) |

### The Cooldown

A cooldown is a wait time that suppresses the next action after a scaling action. **Short for scale-out, long for scale-in** is the standard. It prevents flapping, where the service shrinks right after a spike and then has to scramble back up.

---

## Target Tracking: Declare a Target Value and Leave It

This is the simplest policy type, and it is enough for most HTTP services.

### Predefined Metrics

There are 3 predefined metrics usable against an ECS service ([official](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-configure-auto-scaling.html)).

| predefined_metric_type | What's measured | When to use |
|------------------------|---------|----------|
| `ECSServiceAverageCPUUtilization` | The average CPU utilization (%) of tasks in the service | CPU-bound processing (compute, encoding) |
| `ECSServiceAverageMemoryUtilization` | The average memory utilization (%) of tasks in the service | Memory-bound processing (expanding large data) |
| `ALBRequestCountPerTarget` | Requests per ALB target | Want to follow HTTP traffic linearly |

**How to choose**: track the resource that is the bottleneck. If unsure, measure with Container Insights first. Setting **both a CPU/memory policy and a request-count policy is safer**: scale-out fires as soon as either policy calls for it, while scale-in waits until every policy agrees.
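How several target tracking policies on one scalable target interact can be modeled in a few lines. This is a simplified sketch of the documented behavior (scale out if any policy wants to, scale in only if all do); `PolicyDecision` and `combineDecisions` are illustrative names, not an AWS API.

```typescript
// Simplified model of how multiple target tracking policies are combined.
type PolicyDecision = "scale_out" | "scale_in" | "hold";

function combineDecisions(decisions: PolicyDecision[]): PolicyDecision {
  if (decisions.length === 0) return "hold";
  // A single policy asking for more capacity is enough to scale out.
  if (decisions.some((d) => d === "scale_out")) return "scale_out";
  // Capacity is reduced only when every policy agrees.
  if (decisions.every((d) => d === "scale_in")) return "scale_in";
  return "hold";
}

console.log(combineDecisions(["hold", "scale_out"]));    // "scale_out"
console.log(combineDecisions(["scale_in", "hold"]));     // "hold"
console.log(combineDecisions(["scale_in", "scale_in"])); // "scale_in"
```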

### A Complete Terraform Example (CPU + ALBRequestCountPerTarget)

```hcl
# --- Scalable target (define once; multiple policies can attach to it) ---
resource "aws_appautoscaling_target" "app" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  min_capacity       = 2
  max_capacity       = 20

  depends_on = [aws_ecs_service.app]  # the service must exist first
}

# --- CPU target tracking ---
resource "aws_appautoscaling_policy" "cpu_tt" {
  name               = "cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 60.0  # hold 60%; above 80% leaves no headroom for bursts
    scale_out_cooldown = 30    # scale out quickly (seconds)
    scale_in_cooldown  = 300   # scale in cautiously (seconds)
    disable_scale_in   = false # let scale-in run automatically too (cost control)
  }
}

# --- ALB request count target tracking ---
resource "aws_appautoscaling_policy" "alb_tt" {
  name               = "alb-request-count-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # requires the resource label of the ALB and target group
      resource_label = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.app.arn_suffix}"
    }
    target_value       = 1000  # target 1000 req/min per task
    scale_out_cooldown = 30
    scale_in_cooldown  = 300
  }
}
```

`resource_label` is a specification needed only for `ALBRequestCountPerTarget`. The format is `<load-balancer-arn-suffix>/<target-group-arn-suffix>`, obtainable from the `arn_suffix` attribute of the `aws_lb` / `aws_lb_target_group` resources.
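Outside Terraform, the same label can be derived from the two ARNs. The sketch below assumes standard Elastic Load Balancing ARN shapes; `arnSuffix` and `albResourceLabel` are hypothetical helper names, and the ARNs use placeholder account and resource IDs.

```typescript
// Returns the suffix of an ELB ARN in the form Terraform's arn_suffix uses.
function arnSuffix(arn: string, resourceType: "loadbalancer" | "targetgroup"): string {
  const marker = `:${resourceType}/`;
  const idx = arn.indexOf(marker);
  if (idx === -1) throw new Error(`not a ${resourceType} ARN: ${arn}`);
  const rest = arn.slice(idx + marker.length);
  // The load balancer suffix drops "loadbalancer/"; the target group suffix keeps "targetgroup/".
  return resourceType === "loadbalancer" ? rest : `targetgroup/${rest}`;
}

function albResourceLabel(lbArn: string, tgArn: string): string {
  return `${arnSuffix(lbArn, "loadbalancer")}/${arnSuffix(tgArn, "targetgroup")}`;
}

// Placeholder ARNs for illustration only.
const exampleLbArn =
  "arn:aws:elasticloadbalancing:ap-northeast-1:123456789012:loadbalancer/app/my-alb/778d41231b141a0f";
const exampleTgArn =
  "arn:aws:elasticloadbalancing:ap-northeast-1:123456789012:targetgroup/my-targets/943f017f100becff";

console.log(albResourceLabel(exampleLbArn, exampleTgArn));
// app/my-alb/778d41231b141a0f/targetgroup/my-targets/943f017f100becff
```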

### How to Choose target_value

- **CPU**: 60–70% is common. Set it above 80% and there's no headroom for bursts; scale-out won't make it in time.
- **ALB request count**: measure the requests one task can handle in a local or staging environment, and target 60–70% of that. Don't set it by guesswork — **measurement first.**
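The request-count rule above is simple arithmetic, sketched here. `albTargetValue` and its `headroomRatio` parameter are hypothetical names, and the 1,500 req/min figure is an illustrative input, not a measurement.

```typescript
// Derives an ALBRequestCountPerTarget target from a measured per-task capacity.
function albTargetValue(measuredMaxReqPerMin: number, headroomRatio: number = 0.65): number {
  if (measuredMaxReqPerMin <= 0) {
    throw new Error("measuredMaxReqPerMin must be positive");
  }
  if (headroomRatio <= 0 || headroomRatio > 1) {
    throw new Error("headroomRatio must be in (0, 1]");
  }
  // Target 60-70% of what one task handled under measurement.
  return Math.round(measuredMaxReqPerMin * headroomRatio);
}

console.log(albTargetValue(1500));      // 975
console.log(albTargetValue(1000, 0.6)); // 600
```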

---

## Step Scaling: Change the Increase/Decrease Amount in Stages

For workloads with heavy bursts, or non-linear demand where you want a small increase for a slight overage and a large jump for a big one, **step scaling** fits.

### When to Use It vs. Target Tracking

| Viewpoint | Target tracking | Step scaling |
|------|-------------|----------------|
| Config complexity | Low (just the target value) | High (alarm + step definitions) |
| Control of scale amount | AWS auto-computes | You define the steps |
| Suited case | A steady HTTP service | Burst, non-linear, precise control needed |
| Combination | Standalone OK | Can coexist with target tracking |

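Step scaling works together with CloudWatch alarms. The scale-out policy runs when its alarm enters the ALARM state; scale-in is typically a second policy attached to a separate low-side alarm (for example, CPU below a lower threshold). The example below shows the scale-out half.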

```hcl
# CloudWatch alarm (scale-out trigger)
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "ecs-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = 70.0
  dimensions = {
    ClusterName = aws_ecs_cluster.main.name
    ServiceName = aws_ecs_service.app.name
  }
  alarm_actions = [aws_appautoscaling_policy.step_out.arn]
}

# Step scale-out policy
resource "aws_appautoscaling_policy" "step_out" {
  name               = "step-scale-out"
  policy_type        = "StepScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"  # add/remove a fixed number of tasks (PercentChangeInCapacity and ExactCapacity also exist)
    cooldown                = 60
    metric_aggregation_type = "Average"

    step_adjustment {
      # CPU 70-80%: +2 tasks
      metric_interval_lower_bound = 0
      metric_interval_upper_bound = 10
      scaling_adjustment          = 2
    }
    step_adjustment {
      # CPU 80% and above: +5 tasks (burst handling)
      metric_interval_lower_bound = 10
      scaling_adjustment          = 5
    }
  }
}
```

`metric_interval_lower_bound` and `upper_bound` are specified by the **difference from the alarm's threshold** (the breach amount). "CPU 73% against a 70% threshold" is a difference of +3 — it enters the `0–10` step.
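The interval lookup can be sketched as follows, assuming the documented rule for a scale-out alarm that the lower bound is inclusive and the upper bound exclusive. `StepAdjustment` and `pickStepAdjustment` are illustrative names that mirror the Terraform fields above.

```typescript
// Mirrors a step_adjustment block; bounds are offsets from the alarm threshold.
interface StepAdjustment {
  lowerBound?: number; // inclusive; omitted means negative infinity
  upperBound?: number; // exclusive; omitted means positive infinity
  scalingAdjustment: number;
}

function pickStepAdjustment(
  metricValue: number,
  threshold: number,
  steps: StepAdjustment[],
): number {
  const breach = metricValue - threshold;
  for (const step of steps) {
    const lower = step.lowerBound ?? -Infinity;
    const upper = step.upperBound ?? Infinity;
    if (breach >= lower && breach < upper) return step.scalingAdjustment;
  }
  return 0; // no step matched: no change
}

// The two steps from the Terraform example, with the alarm threshold at 70%.
const exampleSteps: StepAdjustment[] = [
  { lowerBound: 0, upperBound: 10, scalingAdjustment: 2 },
  { lowerBound: 10, scalingAdjustment: 5 },
];

console.log(pickStepAdjustment(73, 70, exampleSteps)); // 2
console.log(pickStepAdjustment(85, 70, exampleSteps)); // 5
```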

---

## Scheduled Scaling: Read Peaks Ahead

When you **know when the load will come** — the 9 AM business start, a monthly campaign, a pre-batch warm-up — get ahead with scheduled scaling.

```hcl
# Scale out at 9:00 on weekdays (JST = UTC+9, so 0:00 UTC)
resource "aws_appautoscaling_scheduled_action" "scale_up_business_hours" {
  name               = "scale-up-business-hours"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension
  schedule           = "cron(0 0 ? * MON-FRI *)"  # UTC 0:00 = JST 9:00

  scalable_target_action {
    min_capacity = 5   # raise the floor for peak hours
    max_capacity = 30  # widen the ceiling for peak hours
  }
}

# Scale down at 21:00 on weekdays (21:00 JST = 12:00 UTC)
resource "aws_appautoscaling_scheduled_action" "scale_down_off_hours" {
  name               = "scale-down-off-hours"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension
  schedule           = "cron(0 12 ? * MON-FRI *)"  # UTC 12:00 = JST 21:00

  scalable_target_action {
    min_capacity = 2   # restore the overnight floor
    max_capacity = 20  # restore the overnight ceiling
  }
}
```

Scheduled scaling and target tracking **can coexist.** Raising `min_capacity` during peak hours to keep the service warm, then letting target tracking fine-tune capacity on top of that, often works well in production.
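The JST-to-UTC conversion in the cron expressions is easy to get wrong by hand, especially for times before 9:00 JST, where the UTC weekday is the previous day. The helper below is a hypothetical sketch (`jstWeekdayCronInUtc` is not an AWS or Terraform function) that reproduces the two schedules above for a Monday-to-Friday JST schedule.

```typescript
// Converts a JST time on weekdays (Mon-Fri) into a UTC cron expression.
function jstWeekdayCronInUtc(hourJst: number, minute: number = 0): string {
  if (!Number.isInteger(hourJst) || hourJst < 0 || hourJst > 23) {
    throw new Error("hourJst must be an integer from 0 to 23");
  }
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new Error("minute must be an integer from 0 to 59");
  }
  const hourUtc = (hourJst + 24 - 9) % 24; // JST is UTC+9
  // Before 9:00 JST the UTC date is still the previous day,
  // so the weekday range shifts back by one.
  const days = hourJst < 9 ? "SUN-THU" : "MON-FRI";
  return `cron(${minute} ${hourUtc} ? * ${days} *)`;
}

console.log(jstWeekdayCronInUtc(9));  // cron(0 0 ? * MON-FRI *)
console.log(jstWeekdayCronInUtc(21)); // cron(0 12 ? * MON-FRI *)
console.log(jstWeekdayCronInUtc(8));  // cron(0 23 ? * SUN-THU *)
```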

---

## Scaling SQS-Driven Workers: The "Backlog Per Task" Pattern

This is the core of the article. SQS workers need a completely different design from an HTTP service.

### Why CPU Won't Do

An SQS worker has the structure "process if there's a message in the queue, wait if not." **Even with an empty queue, the worker stays up and idle**, so CPU utilization is nearly zero. Conversely, even with a huge pile of messages, if the worker does IO-wait-centric processing (external API calls, DB writes, etc.), CPU utilization stays low.

That is, **there's no correlation between CPU and the number of waiting messages.** Scaling an SQS worker by CPU tracking is like deciding the refueling timing by engine RPM rather than the fuel gauge.

### AWS Recommended: Backlog Per Task

The pattern AWS officially recommends is **"Backlog Per Task"** ([Scaling based on Amazon SQS](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html); the concept is in the EC2 Auto Scaling docs, but the same principle applies to ECS Application Auto Scaling).

The formula is simple.

```text
backlog per task = ApproximateNumberOfMessagesVisible ÷ RunningTaskCount
```

Target-track this toward a **target backlog.** The target-backlog setting is back-calculated from the tolerable latency.

### Computing the Target Backlog (Example)

The following shows the calculation method as a hypothetical example. Derive the actual values from measuring your workload.

```text
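Example (illustrative values, not real measurements):
  - Average processing time per message: 5 s
  - Concurrency per task: 1 (single-threaded worker)
  - Messages one task can process per minute: 60 s ÷ 5 s = 12 per minute
  - Tolerable message wait time (maximum latency target): 1 minute

  → target backlog = tolerable latency (s) ÷ processing time per message (s)
                   = 60 s ÷ 5 s
                   = 12

  In other words, configure the policy to track a backlog of at most 12 messages per task.
  With a backlog of 36 messages the service grows to 3 tasks; with 120 messages, to 10 tasks.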
```
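The same arithmetic as code, as a minimal sketch. `targetBacklogPerTask` and `tasksForBacklog` are hypothetical helpers; the second only estimates the steady-state task count implied by the target and does not reproduce how Application Auto Scaling computes its adjustments internally.

```typescript
// Target backlog per task = tolerable latency / processing time per message.
function targetBacklogPerTask(tolerableLatencySec: number, avgProcessingSec: number): number {
  if (tolerableLatencySec <= 0 || avgProcessingSec <= 0) {
    throw new Error("both durations must be positive");
  }
  return tolerableLatencySec / avgProcessingSec;
}

// Task count that brings backlog per task down to the target, clamped to min/max capacity.
function tasksForBacklog(
  visibleMessages: number,
  targetBacklog: number,
  minCapacity: number,
  maxCapacity: number,
): number {
  const needed = Math.ceil(visibleMessages / targetBacklog);
  return Math.min(maxCapacity, Math.max(minCapacity, needed));
}

const exampleTarget = targetBacklogPerTask(60, 5);
console.log(exampleTarget);                               // 12
console.log(tasksForBacklog(36, exampleTarget, 0, 50));   // 3
console.log(tasksForBacklog(120, exampleTarget, 0, 50));  // 10
console.log(tasksForBacklog(1000, exampleTarget, 0, 50)); // 50 (capped at max_capacity)
```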

### Handling the Division-by-Zero Problem

When RunningTaskCount is 0 (idle, shrunk to min_capacity=0), dividing causes a division by zero. In this case, handle it by either **using the message count itself as the target value** or **treating RunningTaskCount as a minimum of 1.** Absorbing it in the custom-metric publishing logic is the safest.
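One edge case is worth checking with this fallback: if no task is running and the queue holds fewer messages than the target value, the published metric stays at or below the target, so nothing pushes the policy to scale out. The sketch below is one possible variant, not the article's implementation. It assumes, as the article does, that a metric above the target triggers scale-out from zero. `backlogPerTaskMetric` is a hypothetical name.

```typescript
// Computes the value to publish as the backlog-per-task metric.
function backlogPerTaskMetric(
  visibleMessages: number,
  runningTasks: number,
  targetBacklog: number,
): number {
  if (runningTasks > 0) return visibleMessages / runningTasks;
  if (visibleMessages === 0) return 0; // idle and empty: stay at zero
  // No task is running but work is waiting: publish a value above the target
  // so that even a small backlog can trigger scale-out.
  return Math.max(visibleMessages, targetBacklog + 1);
}

console.log(backlogPerTaskMetric(36, 3, 12));  // 12
console.log(backlogPerTaskMetric(5, 0, 12));   // 13 (forced above the target)
console.log(backlogPerTaskMetric(100, 0, 12)); // 100
```

Whether this is needed depends on the workload: if a handful of messages may wait until more arrive, the plain fallback is enough.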

### Publishing the Custom Metric

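SQS metrics such as `ApproximateNumberOfMessagesVisible` arrive in CloudWatch automatically, but the ratio to RunningTaskCount (backlog per task) **has to be computed yourself and published as a CloudWatch custom metric.** Note that the same value is exposed on the queue itself as the attribute `ApproximateNumberOfMessages`, which is what the code below reads.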

**A TypeScript (a Lambda run periodically by EventBridge Scheduler) example**:

```ts
import {
  CloudWatchClient,
  PutMetricDataCommand,
} from "@aws-sdk/client-cloudwatch";
import {
  SQSClient,
  GetQueueAttributesCommand,
} from "@aws-sdk/client-sqs";
import {
  ECSClient,
  DescribeServicesCommand,
} from "@aws-sdk/client-ecs";

const cw = new CloudWatchClient({});
const sqs = new SQSClient({});
const ecs = new ECSClient({});

const QUEUE_URL = process.env.QUEUE_URL!;
const CLUSTER = process.env.ECS_CLUSTER!;
const SERVICE = process.env.ECS_SERVICE!;
const NAMESPACE = "Custom/ECS";
const METRIC_NAME = "BacklogPerTask";

export async function handler(): Promise<void> {
  // 1) Get the number of visible messages in SQS
  const sqsRes = await sqs.send(
    new GetQueueAttributesCommand({
      QueueUrl: QUEUE_URL,
      AttributeNames: ["ApproximateNumberOfMessages"],
    }),
  );
  const visibleMessages = parseInt(
    sqsRes.Attributes?.ApproximateNumberOfMessages ?? "0",
    10,
  );

  // 2) Get the number of running ECS tasks
  const ecsRes = await ecs.send(
    new DescribeServicesCommand({ cluster: CLUSTER, services: [SERVICE] }),
  );
  const runningCount = ecsRes.services?.[0]?.runningCount ?? 0;

  // 3) Compute backlog per task (handling division by zero safely)
  //    When runningCount=0, publish the message count as-is
  //    so that scale-out can be triggered
  const backlogPerTask =
    runningCount > 0 ? visibleMessages / runningCount : visibleMessages;

  console.log({ visibleMessages, runningCount, backlogPerTask });

  // 4) Publish to a CloudWatch custom metric
  await cw.send(
    new PutMetricDataCommand({
      Namespace: NAMESPACE,
      MetricData: [
        {
          MetricName: METRIC_NAME,
          Value: backlogPerTask,
          Unit: "Count",
          Dimensions: [
            { Name: "ClusterName", Value: CLUSTER },
            { Name: "ServiceName", Value: SERVICE },
          ],
        },
      ],
    }),
  );
}
```

Run this Lambda **every minute with EventBridge Scheduler.** Standard-resolution custom metrics are stored at 1-minute granularity (high-resolution metrics go down to 1 second, but are not needed here), so a 1-minute schedule is sufficient.

**A Bash (a shell script for manual confirmation / debugging) example**:

```bash
#!/usr/bin/env bash
set -euo pipefail

QUEUE_URL="${QUEUE_URL:?QUEUE_URL not set}"
CLUSTER="${ECS_CLUSTER:?ECS_CLUSTER not set}"
SERVICE="${ECS_SERVICE:?ECS_SERVICE not set}"
REGION="${AWS_REGION:-ap-northeast-1}"

# Number of visible SQS messages
VISIBLE=$(aws sqs get-queue-attributes \
  --queue-url "$QUEUE_URL" \
  --attribute-names ApproximateNumberOfMessages \
  --query 'Attributes.ApproximateNumberOfMessages' \
  --output text \
  --region "$REGION")

# Number of running ECS tasks
RUNNING=$(aws ecs describe-services \
  --cluster "$CLUSTER" \
  --services "$SERVICE" \
  --query 'services[0].runningCount' \
  --output text \
  --region "$REGION")

if [[ "$RUNNING" -gt 0 ]]; then
  BACKLOG=$(echo "scale=2; $VISIBLE / $RUNNING" | bc)
else
  BACKLOG="$VISIBLE"
fi

echo "visible=$VISIBLE running=$RUNNING backlog_per_task=$BACKLOG"

# Publish to CloudWatch
aws cloudwatch put-metric-data \
  --namespace "Custom/ECS" \
  --metric-name "BacklogPerTask" \
  --value "$BACKLOG" \
  --unit "Count" \
  --dimensions "Name=ClusterName,Value=$CLUSTER" "Name=ServiceName,Value=$SERVICE" \
  --region "$REGION"
```

### Terraform: Target Tracking Using the Custom Metric

```hcl
resource "aws_appautoscaling_target" "worker" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.worker.name}"
  min_capacity       = 0   # shrink to zero when idle (cost optimization)
  max_capacity       = 50  # set the ceiling from processing capacity and acceptable cost
}

resource "aws_appautoscaling_policy" "worker_backlog" {
  name               = "sqs-backlog-per-task"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.worker.service_namespace
  resource_id        = aws_appautoscaling_target.worker.resource_id
  scalable_dimension = aws_appautoscaling_target.worker.scalable_dimension

  target_tracking_scaling_policy_configuration {
    customized_metric_specification {
      metric_name = "BacklogPerTask"
      namespace   = "Custom/ECS"
      statistic   = "Average"
      dimensions {
        name  = "ClusterName"
        value = aws_ecs_cluster.main.name
      }
      dimensions {
        name  = "ServiceName"
        value = aws_ecs_service.worker.name
      }
    }
    target_value       = 12    # target backlog from the calculation example above
    scale_out_cooldown = 60    # respond quickly to a queue surge
    scale_in_cooldown  = 300   # leave room to finish processing before shrinking
    disable_scale_in   = false
  }
}
```

### The Startup Delay of min_capacity=0

With `min_capacity=0` (idle-zero), every scale-out from zero tasks pays Fargate's task startup time (ENI allocation, image pull, and app startup combined, roughly **tens of seconds**). No messages are processed during this cold start, so **if the startup time is not negligible against your tolerable latency, consider `min_capacity=1` or more.** For the payment foundation's workers, I kept `min_capacity=1` because of a strict latency SLA.

---

## Anti-Patterns and Troubleshooting

### Flapping (Repeated Increase/Decrease)

**Symptom**: a cycle of scaling in right after scaling out, then scaling out again, doesn't stop.

**Cause**: `scale_in_cooldown` is too short. Scale-in runs before the post-scale-out load settles, and the load rises again.

**Countermeasure**: set `scale_in_cooldown` to several times `scale_out_cooldown` (at least 300 seconds). Temporarily using `disable_scale_in = true` is also effective for flapping diagnosis, but don't use it in production since you lose cost management.
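To spot flapping in a capacity history (for example, desired-count values read from the scaling activity log in chronological order), counting direction reversals is a simple heuristic. This is a sketch with hypothetical names; the default threshold of 2 reversals is an arbitrary assumption to tune for your own window length.

```typescript
// Counts how often the capacity changes direction (up then down, or down then up).
function countDirectionReversals(capacityHistory: number[]): number {
  let reversals = 0;
  let lastDirection = 0; // +1 = grew, -1 = shrank, 0 = no change seen yet
  let previous: number | undefined = undefined;
  for (const current of capacityHistory) {
    if (previous !== undefined && current !== previous) {
      const direction = current > previous ? 1 : -1;
      if (lastDirection !== 0 && direction !== lastDirection) reversals += 1;
      lastDirection = direction;
    }
    previous = current;
  }
  return reversals;
}

// Flags a history as flapping when it reverses more often than the allowed threshold.
function looksLikeFlapping(capacityHistory: number[], maxReversals: number = 2): boolean {
  return countDirectionReversals(capacityHistory) > maxReversals;
}

console.log(countDirectionReversals([2, 4, 2, 4, 2])); // 3
console.log(looksLikeFlapping([2, 4, 2, 4, 2]));       // true
console.log(looksLikeFlapping([2, 4, 6, 6, 4]));       // false (one reversal)
```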

### Forgetting the Health-Check Grace

When scale-out starts a new task and it's registered with the ALB, the health check fails while the app's initialization isn't done. Without setting `health_check_grace_period_seconds`, a starting task is immediately treated as unhealthy and dropped, and **scale-out turns into a death march.**

```hcl
resource "aws_ecs_service" "app" {
  # ...
  health_check_grace_period_seconds = 30  # adjust to the app's startup time
}
```

Also, don't forget to set the task definition's `healthCheck.startPeriod` ([see implementation ① in the pillar article](/blog/aws-ecs-fargate-production-guide)).

### Don't Kill an In-Progress Task on Scale-In

When scale-in occurs, ECS sends SIGTERM to the task. If an SQS worker receives SIGTERM while processing a message and is then killed by SIGKILL once `stopTimeout` (default 30 seconds, max 120 seconds) elapses, **the in-progress message is interrupted and cannot be retried until its visibility timeout expires.**

The correct countermeasure is a three-piece set.

1. **Implement a SIGTERM handler** (stop receiving new messages, and drain the in-progress messages to completion).
2. **Set `stopTimeout` long enough for processing to complete** (max 120 seconds). Size it for the worst case: SIGTERM arriving right after the worker has started a message, for example one that takes 5 seconds to process.
3. **Guarantee idempotency** (don't double-process even if redelivered after a SIGKILL).

```ts
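// SIGTERM handling for an SQS worker (conceptual example).
// receiveMessages, processMessage, and deleteMessage are assumed to be defined
// elsewhere in the worker; they are placeholders, not SDK functions.
let isShuttingDown = false;

process.on("SIGTERM", () => {
  console.log("SIGTERM received: stopping new message consumption");
  isShuttingDown = true;
  // The loop below finishes the message in flight and exits within stopTimeout
});

async function pollMessages(): Promise<void> {
  while (!isShuttingDown) {
    const messages = await receiveMessages();
    for (const msg of messages) {
      // Leave unstarted messages in the queue; they reappear after the visibility timeout
      if (isShuttingDown) break;
      await processMessage(msg);      // idempotent processing
      await deleteMessage(msg);       // delete only after successful completion
    }
  }
  console.log("Worker gracefully stopped");
  process.exit(0);
}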
```
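The third item, idempotency, can be sketched with a claim-before-process check keyed by message ID. This is a minimal in-memory illustration with hypothetical names (`ProcessedMessageStore`, `handleOnce`); it is synchronous for brevity, and a real worker would need a durable shared store with an atomic conditional write, because an in-memory set is lost whenever a task is replaced.

```typescript
// In-memory stand-in for a durable "already processed" store.
class ProcessedMessageStore {
  private readonly claimed = new Set<string>();

  // Returns true only for the first caller to claim this message ID.
  claim(messageId: string): boolean {
    if (this.claimed.has(messageId)) return false;
    this.claimed.add(messageId);
    return true;
  }

  // Releases a claim so that a failed message can be retried on redelivery.
  release(messageId: string): void {
    this.claimed.delete(messageId);
  }
}

// Runs work at most once per message ID; returns false for a duplicate delivery.
function handleOnce(store: ProcessedMessageStore, messageId: string, work: () => void): boolean {
  if (!store.claim(messageId)) return false;
  try {
    work();
    return true;
  } catch (err) {
    store.release(messageId); // allow a retry after a failure
    throw err;
  }
}

const demoStore = new ProcessedMessageStore();
let demoCharges = 0;
console.log(handleOnce(demoStore, "msg-1", () => { demoCharges += 1; })); // true
console.log(handleOnce(demoStore, "msg-1", () => { demoCharges += 1; })); // false (redelivery)
console.log(demoCharges); // 1
```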

For the detailed implementation patterns of idempotent async processing, see the [SQS・Lambda・EventBridge idempotent async-processing guide](/blog/aws-sqs-lambda-eventbridge-idempotent-async-processing-guide), and for the design of circuit breaking / retries, [retry, backoff, circuit breaker](/blog/retry-backoff-circuit-breaker-resilience-patterns-guide).

### Confirm the Scaling Config Is Effective

```bash
# Check the scalable targets
aws application-autoscaling describe-scalable-targets \
  --service-namespace ecs \
  --query 'ScalableTargets[*].{Resource:ResourceId,Min:MinCapacity,Max:MaxCapacity}'

# Scaling activities (recent scaling history)
aws application-autoscaling describe-scaling-activities \
  --service-namespace ecs \
  --resource-id "service/<cluster>/<service>" \
  --max-results 10
```

Review the scaling history periodically, and build monitoring for flapping and unexpected scale-in/out. Putting it on a dashboard together with [OpenTelemetry × ECS observability](/blog/aws-observability-opentelemetry-sre-ecs) lets you quickly notice anomalies in scaling behavior.

---

## Pre-Production-Release Checklist

- [ ] Confirmed the scalable target is correctly registered with `describe-scalable-targets`
- [ ] Decided `min_capacity` and `max_capacity` from business requirements (cost ceiling, SLA)
- [ ] Is it `scale_out_cooldown < scale_in_cooldown` (asymmetric)?
- [ ] Is the CPU target-tracking `target_value` measurement-based (60–70% as a guide)?
- [ ] When using `ALBRequestCountPerTarget`, is `resource_label` set correctly?
- [ ] Does the SQS worker use backlog-per-task rather than CPU tracking?
- [ ] Does the custom-metric-publishing Lambda run every minute, and can you confirm data points on CloudWatch?
- [ ] If `min_capacity=0`, did you check the cold-start delay against the tolerable latency?
- [ ] Did you set `health_check_grace_period_seconds` to prevent forced termination from a startup-time false detection?
- [ ] Did you implement a SIGTERM handler and verify processing can complete within `stopTimeout`?
- [ ] Do you guarantee idempotency so message double-processing doesn't occur even after scale-in?
- [ ] Did you build the scaling-activity logs into the observability dashboard?
- [ ] Did you check the combination with Fargate Spot from a cost-optimization viewpoint in the [cost-optimization guide](/blog/aws-ecs-fargate-cost-optimization-spot-graviton-savings-plans-guide)?

---

## Summary

ECS on Fargate auto-scaling's production-quality foundation is the 3 points "choose the metric correctly," "prevent flapping with asymmetric cooldown," and "link with graceful shutdown."

- **HTTP service**: target tracking (CPU + ALBRequestCountPerTarget) suffices. If there are bursts, add step scaling.
- **SQS worker**: don't use CPU. Publish backlog-per-task as a custom metric, and target-track with a target value back-calculated from the tolerable latency.
- **Common**: scale-in can't be operated safely without the three-piece set of SIGTERM handling, `stopTimeout`, and idempotency.

What I can say from my experience running a payment foundation and a B2B SaaS in production with [one person × generative AI](/case-studies/payment-platform-reliability) is that "use the tools correctly and world-class infrastructure is within reach even for a small team." If you have a consultation on designing, reviewing, or troubleshooting auto-scaling, feel free to reach out.