友田 陽大
ECS on Fargate in production
AWS
ECS
Fargate
Containers
Terraform
Infrastructure
Cost optimization
Observability

AWS ECS on Fargate Production Operation Guide: Designing, Deploying, Costing, and Securing Serverless Containers in Real Code

A production operation guide for ECS on Fargate that stays faithful to the AWS official documentation. Using Terraform, task-definition JSON, and working code, it covers task-size design (the CPU/memory table), awsvpc networking, rolling updates with the deployment circuit breaker, graceful shutdown on SIGTERM, separating the execution role from the task role, and cost optimization with Fargate Spot.

22 min read

"I want to run containers in production. But I don't want to babysit a Kubernetes cluster, and I can't spare time for EC2 patching or scaling either." When a startup or a solo developer assembles a production container platform, this is almost always where they end up. The answer is AWS Fargate.

On a lumber-distribution SaaS that won the Minister of Economy, Trade and Industry Award, I've operated 221 API endpoints in production on an API Gateway → NLB → ALB → ECS on Fargate configuration. The worker fleet of the payment platform (zero double charges in production) also runs on Fargate. Running HTTP services, batch jobs, and event-driven workers on the same mechanism, without touching a single server, is what lets a small team deliver production quality.

This article aims to stay faithful to the AWS official documentation while being easier to follow than the docs themselves, and to show in real code which tool to use in which situation. It covers, end to end, everything needed to go to production: task design, networking, deployment, resilience, security, and cost. For the technology selection itself (ECS or EKS), see the separate article ECS on Fargate vs. EKS: a startup decision framework. This article concentrates on how to build for production once you've chosen ECS on Fargate.


What is Fargate: the difference from the EC2 launch type

Amazon ECS (Elastic Container Service) offers two broad kinds of compute (launch types / capacity) for running containers.

  • EC2 launch type: you prepare a fleet of EC2 instances (container instances) yourself and pack tasks onto them. OS patching, instance scaling, and bin-packing (packing efficiency) management are your responsibility.
  • Fargate: a serverless method where, just by specifying CPU and memory, AWS prepares, patches, and scales the instances behind it. The concept of a server itself disappears.

The official definition is simple.

AWS Fargate is a technology that you can use with Amazon ECS to run containers without having to manage servers or clusters of Amazon EC2 instances. (— AWS Fargate)

The most important single sentence for security is this.

Each Fargate task has its own isolation boundary and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with another task.

That is, each Fargate task has an independent isolation boundary and shares neither the kernel, CPU, memory, nor ENI (Elastic Network Interface) with other tasks. In the EC2 launch type, multiple tasks share one instance's kernel, but Fargate is "task = the minimal isolation unit." If you have multi-tenant or strict isolation requirements, this becomes a decisive advantage.

Comparison table: when Fargate and when EC2

| Aspect | Fargate | EC2 launch type |
|---|---|---|
| Server management | Unnecessary (AWS also handles patching and AMI updates) | You operate the OS/AMI/patching yourself |
| Scaling | You only think about the task count | Two tiers: the instance fleet and the tasks |
| Isolation boundary | Fully isolated per task | Tasks share the instance's kernel |
| Billing unit | vCPU-seconds and memory-seconds of the allocated amount | Instance hours (fixed regardless of utilization) |
| Launch speed | Tens of seconds (including ENI allocation) | Fast if an instance has free capacity |
| GPU / special instances | Not supported | Supported (GPU, Inferentia, etc.) |
| Daemon type (one per host) | Not supported (no concept of a host) | DAEMON scheduling possible |
| Cost under constant high load | Can be expensive at high utilization | Advantageous at high utilization; combine with Savings Plans |

Guideline: when in doubt, choose Fargate, because the cost of managing servers (that is, labor) is usually the biggest cost. Go back to EC2 only when there's a clear reason: you need a GPU, a batch fleet runs constantly above 80% CPU so compute cost dominates, or you need an agent resident once per host (the equivalent of a Kubernetes DaemonSet).


In what scenes to use it: 3 typical workloads

Fargate is not just for web servers. In practice, its strength is handling the following three forms with the same vocabulary (the task definition).

  1. A resident service (Service): an HTTP API or web app running constantly behind an ALB/NLB, made redundant with desiredCount and auto-scaled. This is the most common form.
  2. A scheduled task (batch): daily aggregation, report generation, and data sync run on a cron schedule via EventBridge Scheduler. It runs as a one-off task (RunTask) rather than a service, and billing stops when it finishes.
  3. An event-driven worker: asynchronous processing whose task count scales with the SQS queue length, such as payment webhook processing or image conversion. Idempotency and graceful shutdown are essential here.

On the payment platform, I put both a resident API service and SQS-driven idempotent workers on Fargate, and used idempotency keys to absorb out-of-order and duplicate webhook deliveries. Running all three forms on the same deployment platform dramatically lowers the cognitive load of operations.


The core components: four players and how they relate

ECS has a lot of terminology and is confusing at first, but the essence comes down to four concepts.

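```text
Cluster (logical box: a namespace grouping multiple services)
└── Service (the declaration "always keep N tasks" = a desired-state controller)
    └── Task (one running unit: a set of one or more containers)
        └── Container (your app's image)
        ↑
        Task Definition (the task's blueprint: image, CPU/memory, IAM, logs, environment variables)
```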
  • Task Definition: the immutable "blueprint." Versioned by revision number (my-app:1, my-app:2...). A deploy is registering a new revision and swapping it into the service.
  • Task: the entity launched from a task definition. One task = one ENI (awsvpc mode) = one private IP.
  • Service: a declarative controller of "always keep desiredCount tasks of this task definition, healthy." If a task dies it auto-restarts, and it takes care of registering/deregistering with the ALB.
  • Cluster: a logical boundary that bundles services and tasks. In Fargate, there's no "server" in the cluster; it's close to just a namespace.

This declarative model, where the Service keeps reconciling toward the desired state, is the same idea as a Kubernetes Deployment. That's why the correct usage is to declare the state and let ECS maintain it, rather than starting containers by hand with docker run.


Task-size design: CPU and memory are "fixed combinations"

This is Fargate's biggest pitfall. CPU and memory aren't a free combination; you can only choose from predetermined pairs. The official combinations (Task CPU and memory) are these.

| CPU (whole task) | Selectable memory | Step |
|---|---|---|
| 256 (.25 vCPU) | 512 MiB, 1 GB, 2 GB | Fixed: 3 choices |
| 512 (.5 vCPU) | 1–4 GB | 1 GB |
| 1024 (1 vCPU) | 2–8 GB | 1 GB |
| 2048 (2 vCPU) | 4–16 GB | 1 GB |
| 4096 (4 vCPU) | 8–30 GB | 1 GB |
| 8192 (8 vCPU) | 16–60 GB | 4 GB (platform version 1.4.0+) |
| 16384 (16 vCPU) | 32–120 GB | 8 GB (platform version 1.4.0+) |

CPU can be specified as 1024 (CPU units) or 1 vCPU, and memory as 3072 (MiB) or 3 GB. They're converted to internal units at registration.

How this plays out in practice: even for a workload where 512 MiB of memory would suffice but you want 1 vCPU, choosing 1024 CPU allocates (and bills you for) at least 2 GB of memory. The reverse holds too: if you need 8 GB of memory, at least 1 vCPU comes along with it. So the iron rule is to pick the size only after measuring. Oversize on speculation and you keep paying, per second, for resources you never use.
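The fixed-pair rule is easy to encode so that sizing decisions can be checked in CI. A minimal sketch, transcribing the table above for Linux tasks, with memory in MiB and 1 GB treated as 1024 MiB as ECS does; FARGATE_SIZES, isValidSize, and smallestSize are hypothetical helpers, not part of any AWS SDK.

```typescript
// Fargate task sizes (Linux), transcribed from the table above. Memory is in MiB.
type SizeRule = { cpu: number; memoryMiB: number[] };

const GiB = 1024;

// Inclusive range of GiB values with a fixed step, returned in MiB.
function range(minGiB: number, maxGiB: number, stepGiB: number): number[] {
  const out: number[] = [];
  for (let g = minGiB; g <= maxGiB; g += stepGiB) out.push(g * GiB);
  return out;
}

const FARGATE_SIZES: SizeRule[] = [
  { cpu: 256, memoryMiB: [512, 1 * GiB, 2 * GiB] },
  { cpu: 512, memoryMiB: range(1, 4, 1) },
  { cpu: 1024, memoryMiB: range(2, 8, 1) },
  { cpu: 2048, memoryMiB: range(4, 16, 1) },
  { cpu: 4096, memoryMiB: range(8, 30, 1) },
  { cpu: 8192, memoryMiB: range(16, 60, 4) }, // platform version 1.4.0+
  { cpu: 16384, memoryMiB: range(32, 120, 8) }, // platform version 1.4.0+
];

function isValidSize(cpu: number, memoryMiB: number): boolean {
  return FARGATE_SIZES.some((r) => r.cpu === cpu && r.memoryMiB.includes(memoryMiB));
}

// Smallest pair covering measured needs (CPU units, MiB): the lowest CPU row that fits,
// then the smallest memory step in that row.
function smallestSize(needCpu: number, needMemMiB: number): { cpu: number; memoryMiB: number } | undefined {
  for (const r of FARGATE_SIZES) {
    if (r.cpu < needCpu) continue;
    const mem = r.memoryMiB.find((m) => m >= needMemMiB);
    if (mem !== undefined) return { cpu: r.cpu, memoryMiB: mem };
  }
  return undefined; // larger than the biggest Fargate task
}

// smallestSize(1024, 512)  -> { cpu: 1024, memoryMiB: 2048 }  (1 vCPU drags in 2 GB)
// smallestSize(256, 8192)  -> { cpu: 1024, memoryMiB: 8192 }  (8 GB drags in 1 vCPU)
```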

Ephemeral storage (a temporary disk)

A Fargate task gets 20 GB of ephemeral storage by default, usable for build artifacts, temporary files, and caches. If that's not enough, you can expand it up to 200 GB with the task definition's ephemeralStorage (platform version 1.4.0 and later). It's volatile and disappears when the task ends, so use EFS or S3 for anything that must persist.

The platform version is LATEST (= Linux 1.4.0)

The LATEST Linux platform version is 1.4.0. (— Fargate platform versions)

Platform version 1.4.0 is required for ephemeral-storage expansion, systemControls, UDP on NLB, and similar features. Use LATEST unless you have a specific reason not to. Because new tasks always launch on the latest patched platform revision, it's also the safe default from a security standpoint. ARM64 (Graviton) workloads are supported too: set runtimePlatform.cpuArchitecture to ARM64 (this pays off in the cost optimization described later).


Networking: the correct way to connect awsvpc and ALB

Fargate is fixed to the awsvpc network mode. Because each task has a dedicated ENI and private IP, there's no concept of "mapping a host's port" like in EC2. Misunderstand this and you'll definitely get stuck with ALB integration.

Three important points:

  1. The ALB's target group must use target_type = "ip", not instance. Because tasks are tied to ENIs rather than EC2 instances, they're registered as IP targets (as the official docs state).
  2. The security group attaches to the task's ENI. Minimize it to "only allow ALB's SG → the task's SG (the app's port)." Limit the task's SG inbound to the ALB's SG, and don't open 0.0.0.0/0.
  3. Placement in a private subnet + NAT Gateway is the production standard. You can also place it directly in a public subnet with assignPublicIp=ENABLED, but the attack surface widens. Reach ECR/CloudWatch/Secrets Manager via a VPC endpoint or NAT.

A request flows like this:

```text
Internet → ALB (public subnet) → Target Group (type=ip)
        → Task ENI (private subnet, SG = allow only the ALB's SG) → container:8080
                                                   ↘ NAT GW → ECR / Secrets Manager / CloudWatch
```

If you need service discovery (service-to-service communication), use ECS Service Connect (or Cloud Map) for DNS-name-based name resolution, keeping internal communication loosely coupled without adding an ALB.


Implementation ①: a minimal task definition (JSON)

First, get a feel for what's mandatory with a minimal task definition. JSON has no comments, so the points that matter in production are explained right after it.

```json
{
  "family": "web-api",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  },
  "executionRoleArn": "arn:aws:iam::111122223333:role/web-api-exec",
  "taskRoleArn": "arn:aws:iam::111122223333:role/web-api-task",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "111122223333.dkr.ecr.ap-northeast-1.amazonaws.com/web-api:1a2b3c4",
      "essential": true,
      "user": "10001:10001",
      "readonlyRootFilesystem": true,
      "linuxParameters": { "initProcessEnabled": true },
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
      "environment": [{ "name": "NODE_ENV", "value": "production" }],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:ap-northeast-1:111122223333:secret:prod/db-Ab12Cd"
        }
      ],
      "stopTimeout": 60,
      "healthCheck": {
        "command": ["CMD-SHELL", "wget -q -O - http://localhost:8080/healthz || exit 1"],
        "interval": 15,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 30
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-api",
          "awslogs-region": "ap-northeast-1",
          "awslogs-stream-prefix": "app"
        }
      }
    }
  ]
}
```

Points that pay off in production:

  • Reference the image by an immutable identifier: a commit-SHA tag (as here) or a digest, never a floating tag like latest, which breaks reproducibility. On top of that, ECS by default resolves the tag to a digest (versionConsistency: enabled), guaranteeing that all tasks in the service run the same image.
  • Non-root execution with user plus readonlyRootFilesystem: true. For paths that genuinely need writes, mount a task volume (a named volume with no host path, backed by the task's ephemeral storage) at just those paths; tmpfs mounts are not supported on Fargate. This is basic attack-surface reduction.
  • secrets[].valueFrom injects secrets from Secrets Manager / SSM Parameter Store. Don't write them in plaintext in environment variables (covered later).
  • stopTimeout defaults to 30 seconds on Fargate, with a maximum of 120 seconds. It is the grace period for graceful shutdown (covered later).
  • healthCheck.startPeriod gives a grace period right after startup, so failed checks during initialization don't get the container killed.
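Points like these tend to erode one pull request at a time, so it can help to lint container definitions before registering them. A minimal sketch under stated assumptions: ContainerDef is a hand-picked subset of the ECS container definition fields (not the AWS SDK type), lintContainer is a hypothetical helper, and SECRET_LIKE is only a naming heuristic for spotting secrets that slipped into environment.

```typescript
// Hand-picked subset of an ECS container definition (not the SDK type).
interface ContainerDef {
  name: string;
  image: string;
  user?: string;
  readonlyRootFilesystem?: boolean;
  stopTimeout?: number;
  environment?: { name: string; value: string }[];
  secrets?: { name: string; valueFrom: string }[];
}

// Heuristic: environment variable names that usually carry secrets.
const SECRET_LIKE = /(PASSWORD|SECRET|TOKEN|API_KEY|DATABASE_URL)/i;

function lintContainer(c: ContainerDef): string[] {
  const problems: string[] = [];

  // Inspect only the last path segment so a registry port (host:5000/...) isn't read as a tag.
  const last = c.image.slice(c.image.lastIndexOf("/") + 1);
  const byDigest = last.includes("@sha256:");
  const tag = !byDigest && last.includes(":") ? last.slice(last.lastIndexOf(":") + 1) : undefined;
  if (!byDigest && (tag === undefined || tag === "latest")) {
    problems.push(`${c.name}: image is not pinned to a commit-SHA tag or digest`);
  }

  // "user" may be "uid", "uid:gid", or a name; only the uid part matters here.
  const uid = c.user?.split(":")[0];
  if (uid === undefined || uid === "0" || uid === "root") {
    problems.push(`${c.name}: runs as root; set a non-root "user"`);
  }

  if (c.readonlyRootFilesystem !== true) {
    problems.push(`${c.name}: readonlyRootFilesystem is not true`);
  }

  // Fargate caps stopTimeout at 120 s; 30 s applies when it is unset.
  if ((c.stopTimeout ?? 30) > 120) {
    problems.push(`${c.name}: stopTimeout exceeds the Fargate maximum of 120 s`);
  }

  for (const e of c.environment ?? []) {
    if (SECRET_LIKE.test(e.name)) {
      problems.push(`${c.name}: "${e.name}" looks like a secret; move it to secrets[].valueFrom`);
    }
  }
  return problems;
}
```

Run against the container definition above, it returns an empty list; a definition using web-api:latest, no user, and a DB_PASSWORD environment variable trips several checks at once.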

Implementation ②: a full production service set with Terraform

A task definition alone isn't production. It only becomes hard to break and reproducible once the cluster, service, ALB, security groups, logs, and auto-scaling are declared as IaC. Below is the set in Terraform, narrowed to the key points; it assumes the ALB's security group (aws_security_group.alb) and the task definition resource (aws_ecs_task_definition.app) are defined elsewhere. I leave Terraform module design, state separation, and drift detection to another article and concentrate here on the ECS-specific parts.

```hcl
# --- Cluster: enable Container Insights (the foundation of observability) ---
resource "aws_ecs_cluster" "main" {
  name = "prod"
  setting {
    name  = "containerInsights"
    value = "enhanced" # enhanced observability; recommended for production if the cost is acceptable
  }
}

# --- Task SG: inbound only from the ALB's SG ---
resource "aws_security_group" "task" {
  name_prefix = "web-api-task-"
  vpc_id      = var.vpc_id
  lifecycle { create_before_destroy = true }
}
resource "aws_vpc_security_group_ingress_rule" "from_alb" {
  security_group_id            = aws_security_group.task.id
  referenced_security_group_id = aws_security_group.alb.id # defined elsewhere
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
resource "aws_vpc_security_group_egress_rule" "all_out" {
  security_group_id = aws_security_group.task.id
  ip_protocol       = "-1"
  cidr_ipv4         = "0.0.0.0/0" # out to ECR/Secrets Manager/CloudWatch via NAT
}

# --- ALB target group: Fargate always needs target_type = "ip" ---
resource "aws_lb_target_group" "app" {
  name                 = "web-api"
  port                 = 8080
  protocol             = "HTTP"
  vpc_id               = var.vpc_id
  target_type          = "ip"
  deregistration_delay = 30 # connection draining; the 300 s default is often excessive
  health_check {
    path                = "/healthz"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 15
    timeout             = 5
    matcher             = "200"
  }
}

# --- Service: rolling update + deployment circuit breaker ---
resource "aws_ecs_service" "app" {
  name                   = "web-api"
  cluster                = aws_ecs_cluster.main.id
  task_definition        = aws_ecs_task_definition.app.arn # defined elsewhere
  desired_count          = 2
  launch_type            = "FARGATE" # omit when using a capacity provider strategy; the two are mutually exclusive
  platform_version       = "LATEST"
  enable_execute_command = true # break-glass access via ECS Exec

  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200
  deployment_circuit_breaker {
    enable   = true
    rollback = true # on a detected failure, roll back automatically to the previous revision
  }

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }
  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
  }
  health_check_grace_period_seconds = 30

  # Application Auto Scaling changes desired_count at runtime; don't let `terraform apply` reset it.
  lifecycle {
    ignore_changes = [desired_count]
  }
}
```

The combination desired_count = 2, minimum_healthy_percent = 100, maximum_percent = 200 gives a zero-downtime rolling update: always keep 2 healthy tasks, temporarily scale up to 4 to launch the new version, confirm it's healthy, then drop the old version. The next section explains the mechanics precisely.


Deployment: rolling updates and the deployment circuit breaker

ECS's default deployment is a rolling update (the ECS deployment controller). Its behavior is governed by two parameters (per the official docs).

  • minimumHealthyPercent: the lower bound of the task count that "must be running healthy" during the deploy (%, rounded up). E.g. min 50%, desired 4, and you can stop up to 2 old-version tasks before launching 2 new-version ones.
  • maximumPercent: the upper bound of the task count you "may launch" during the deploy (%, rounded down). E.g. max 200%, desired 4, and you can launch 4 new-version tasks before stopping 4 old-version ones.

min 100% / max 200% is the safest setting: the new version is fully launched before the old one is stopped, with no loss of availability (at the cost of temporarily doubling resources). If cost matters more, min 50% / max 100% swaps versions without adding any resources, at the price of running at reduced capacity during the deploy.
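The two percentages translate into concrete task counts, which is worth computing before you pick them. A minimal sketch; deploymentBounds is a hypothetical helper that applies the rounding rules described above (minimum rounds up, maximum rounds down) and reproduces the official examples.

```typescript
// Task-count bounds ECS derives from the two deployment percentages.
function deploymentBounds(desired: number, minHealthyPct: number, maxPct: number) {
  const minHealthy = Math.ceil((desired * minHealthyPct) / 100); // minimumHealthyPercent rounds up
  const maxRunning = Math.floor((desired * maxPct) / 100); // maximumPercent rounds down
  return {
    minHealthy,
    maxRunning,
    canStopFirst: desired - minHealthy, // old tasks that may stop before replacements are healthy
    canStartFirst: maxRunning - desired, // new tasks that may start before any old task stops
  };
}

// deploymentBounds(4, 50, 200)  -> { minHealthy: 2, maxRunning: 8, canStopFirst: 2, canStartFirst: 4 }
// deploymentBounds(2, 100, 200) -> { minHealthy: 2, maxRunning: 4, canStopFirst: 0, canStartFirst: 2 }
```

With min 100%, canStopFirst is 0, which is exactly why that setting never dips below the desired count.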

The circuit breaker: rolling failures back automatically

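This is the dividing line for production quality. Without the deployment circuit breaker, a rolling update whose new version is stuck in a crash loop keeps trying to launch replacement tasks indefinitely, and nobody may notice. The circuit breaker detects this and fails the deployment.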

Both methods support rolling back to the previous service revision. (official docs; "both methods" are the deployment circuit breaker and CloudWatch alarms)

With deployment_circuit_breaker { enable = true, rollback = true }, once the number of new tasks that fail to launch or fail their health checks reaches a threshold, ECS marks the deployment as failed and automatically reverts to the last healthy revision. If you'd rather base the decision on the app's business metrics (error rate, etc.), you can also link CloudWatch alarms. Enable both and the deployment fails and rolls back as soon as either condition is met.
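The threshold isn't a fixed number. As I read the ECS documentation, it is half the desired count, rounded up, and clamped between 3 and 200. The sketch below just encodes that formula (circuitBreakerThreshold is a hypothetical helper) so you can sanity-check how quickly a small service trips; verify against the current docs before relying on it.

```typescript
// Failure threshold of the deployment circuit breaker, per my reading of the ECS docs:
// ceil(0.5 * desiredCount), clamped to the range [3, 200].
function circuitBreakerThreshold(desiredCount: number): number {
  return Math.min(200, Math.max(3, Math.ceil(desiredCount * 0.5)));
}

// circuitBreakerThreshold(2)   -> 3   (the floor dominates for small services)
// circuitBreakerThreshold(25)  -> 13
// circuitBreakerThreshold(400) -> 200
```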

Version consistency via image digest

By default ECS resolves the tag to an image digest and guarantees all tasks in the service run with the same binary (versionConsistency). It structurally prevents the accident of "rebuilt and the contents of latest changed, so only some tasks are a different thing." Together with the operation of making CommitSHA the tag and not depending on latest, ensure reproducibility.

In 2026, the standard is keyless CI/CD with OIDC (no long-lived access keys stored anywhere). For specifics, see keyless CI/CD with GitHub Actions OIDC. The deployment itself is either registering a new revision and pointing the service at it with aws ecs update-service --task-definition, or using the aws-actions/amazon-ecs-deploy-task-definition GitHub Action. (--force-new-deployment, by contrast, restarts tasks on the current revision.)


Resilience / idempotency: receive SIGTERM and "finish cleanly"

What's most overlooked and most accident-prone on Fargate is graceful shutdown. Tasks stop on every deploy, every scale-in, and every Fargate Spot interruption, and when that happens ECS goes through these steps:

  1. Deregister the task from the ALB target (stop new requests, and wait for in-flight connections during deregistration_delay).
  2. Send SIGTERM to the container.
  3. Wait stopTimeout (30 seconds by default, 120 seconds max).
  4. If it still doesn't finish, force-terminate with SIGKILL.

The SIGTERM signal must be received from within the container to perform any cleanup actions. Failure to process this signal results in the task receiving a SIGKILL signal after the configured stopTimeout and may result in data loss or corruption. (— official)

That is, the app is responsible for catching SIGTERM, handling in-flight requests to completion, and cleanly closing DB connections and queue reception. Neglect this and on every deploy, in-progress processing is killed with SIGKILL, causing data corruption and missed webhooks. In Node.js, you write it like this.

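```typescript
// graceful-shutdown.ts: catch SIGTERM and finish in-flight work before SIGKILL arrives
import http from "node:http";

export function installGracefulShutdown(
  server: http.Server,
  opts: { drainMs: number; onClose: () => Promise<void> },
): void {
  let shuttingDown = false;

  const shutdown = async (signal: NodeJS.Signals): Promise<void> => {
    if (shuttingDown) return; // ignore a repeated signal (idempotent)
    shuttingDown = true;
    console.info({ msg: "shutdown:start", signal });

    // 1) Stop accepting new connections; requests already in progress may finish.
    const httpClosed = new Promise<void>((resolve) =>
      server.close(() => {
        console.info({ msg: "shutdown:http-closed" });
        resolve();
      }),
    );
    server.closeIdleConnections(); // drop idle keep-alive sockets so close() can complete (Node 18.2+)

    // 2) Cap the drain below stopTimeout so we exit before ECS sends SIGKILL.
    const deadline = new Promise<"timeout">((resolve) => setTimeout(() => resolve("timeout"), opts.drainMs));

    // 3) Close external resources (DB pool, queue consumer, ...) only after HTTP has drained,
    //    so in-flight requests don't lose their connections mid-request.
    const work = httpClosed.then(() => opts.onClose()).then(() => "done" as const);

    try {
      const result = await Promise.race([work, deadline]);
      console.info({ msg: "shutdown:done", result });
      process.exit(0);
    } catch (err) {
      console.error({ msg: "shutdown:error", err });
      process.exit(1);
    }
  };

  process.on("SIGTERM", shutdown); // what ECS sends when stopping a task
  process.on("SIGINT", shutdown); // Ctrl-C for local runs
}
```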

The iron rule is to set drainMs shorter than stopTimeout (e.g. drainMs: 50_000 against stopTimeout: 60), so the process exits cleanly with exit(0) on its own before SIGKILL arrives. Setting linuxParameters.initProcessEnabled: true also runs a minimal init process as PID 1, which forwards signals to your app and reaps zombie processes, avoiding the classic PID 1 problems.
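That rule drifts easily: someone lowers stopTimeout in the task definition and forgets drainMs. A startup guard can fail fast instead. A minimal sketch; checkShutdownBudget and its 5-second safety margin are hypothetical choices, and it assumes you pass in the same stopTimeout the task definition uses (for example through an environment variable you define yourself).

```typescript
const FARGATE_MAX_STOP_TIMEOUT_SEC = 120;
const FARGATE_DEFAULT_STOP_TIMEOUT_SEC = 30;

// Throws at startup if the drain budget cannot complete before ECS escalates to SIGKILL.
function checkShutdownBudget(
  drainMs: number,
  stopTimeoutSec: number = FARGATE_DEFAULT_STOP_TIMEOUT_SEC,
  safetyMarginMs = 5_000,
): void {
  if (stopTimeoutSec > FARGATE_MAX_STOP_TIMEOUT_SEC) {
    throw new Error(`stopTimeout ${stopTimeoutSec}s exceeds the Fargate maximum of ${FARGATE_MAX_STOP_TIMEOUT_SEC}s`);
  }
  if (drainMs + safetyMarginMs > stopTimeoutSec * 1000) {
    throw new Error(`drainMs ${drainMs} + ${safetyMarginMs}ms margin must fit inside stopTimeout ${stopTimeoutSec}s`);
  }
}

// checkShutdownBudget(50_000, 60) passes; checkShutdownBudget(50_000) throws,
// because the 30 s default stopTimeout is shorter than the drain.
```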

The relationship with idempotency: an SQS-driven worker must be designed so that even if it's killed midway, a redelivered message is not processed twice. This isn't Fargate-specific but a general distributed-processing concern, and the principles of idempotent async processing apply directly. Graceful shutdown (don't drop work) and idempotency (don't double-process) are two halves of the same guarantee: you need both.
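The core of the idempotency side is: look up the idempotency key first, and record completion only after success. A minimal sketch under loud assumptions: IdempotentProcessor is hypothetical, and its in-memory Map stands in for a durable store such as a DynamoDB conditional write; an in-memory map alone does not survive task restarts. Because completion is recorded after the handler returns, a task killed in between will reprocess the message, so the handler's own side effect (e.g. a payment API call) should also carry the idempotency key.

```typescript
type Message = { idempotencyKey: string; body: string };
type Handler = (body: string) => Promise<string>;

class IdempotentProcessor {
  private readonly done = new Map<string, string>(); // idempotency key -> stored result (stand-in for a durable store)
  private readonly inFlight = new Set<string>(); // keys currently being processed in this task
  private readonly handler: Handler;

  constructor(handler: Handler) {
    this.handler = handler;
  }

  async process(msg: Message): Promise<{ result: string; duplicate: boolean }> {
    const prior = this.done.get(msg.idempotencyKey);
    if (prior !== undefined) return { result: prior, duplicate: true }; // redelivery: replay, don't redo

    if (this.inFlight.has(msg.idempotencyKey)) {
      // Concurrent duplicate: refuse so the queue redelivers it after the first attempt settles.
      throw new Error(`message ${msg.idempotencyKey} is already in flight`);
    }
    this.inFlight.add(msg.idempotencyKey);
    try {
      const result = await this.handler(msg.body);
      // Record only after success: if the task dies before this line, the message is retried.
      this.done.set(msg.idempotencyKey, result);
      return { result, duplicate: false };
    } finally {
      this.inFlight.delete(msg.idempotencyKey);
    }
  }
}
```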


Auto-scaling: follow the measured values

Fargate's horizontal scale is assembled with Application Auto Scaling's Target Tracking. Declare a target value like "keep CPU utilization at 60%," and it increases desiredCount when exceeded and decreases it when below.

```hcl
resource "aws_appautoscaling_target" "app" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 20
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-tt"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension
  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 60.0
    scale_in_cooldown  = 120
    scale_out_cooldown = 30 # scale out fast, scale in cautiously
  }
}
```

Choosing a metric: if the service is CPU-bound, use ECSServiceAverageCPUUtilization; for memory, ECSServiceAverageMemoryUtilization. If you want to follow HTTP traffic directly, ALBRequestCountPerTarget (requests per target) tracks it most closely; that metric also requires a resource_label identifying the ALB and target group. Making scale_out_cooldown short and scale_in_cooldown long is standard: increase immediately, decrease cautiously. This prevents "flapping," where you shrink right after a spike and then scramble to scale out again.
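When choosing min_capacity and max_capacity, a back-of-envelope estimate of where target tracking will settle is useful. A rough sketch under explicit assumptions: load spreads evenly across tasks and the metric scales inversely with task count. AWS doesn't publish target tracking's exact algorithm, so estimateDesiredCount is a hypothetical planning aid, not a reimplementation.

```typescript
// Rough steady-state estimate: tasks needed to bring the per-task metric back to the target,
// clamped to the scalable target's min/max capacity.
function estimateDesiredCount(
  currentTasks: number,
  metricValue: number, // e.g. current average CPU utilization (%)
  targetValue: number, // e.g. target_value = 60
  minCapacity: number,
  maxCapacity: number,
): number {
  const raw = Math.ceil((currentTasks * metricValue) / targetValue);
  return Math.min(maxCapacity, Math.max(minCapacity, raw));
}

// 4 tasks at 90% CPU with a 60% target -> about 6 tasks.
// If the estimate keeps hitting maxCapacity at peak, raise max_capacity (or the task size).
```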


Observability: a state where you can trace a stuck process at a glance

The smaller the team, the more observability is the lifeline. In Fargate, set up the following from the start.

  • Container Insights: auto-collect CPU/memory/network/task count. Set it to enhanced and you get finer per-container metrics.
  • Logs: to CloudWatch Logs with the awslogs driver. Make them JSON structured logs and always pass a correlation ID (request ID). For advanced requirements (multiple destinations, parsing, filtering), use FireLens (Fluent Bit) and route to CloudWatch/S3/OpenSearch, etc.
  • Traces: collect distributed tracing as a sidecar with OpenTelemetry. SRE practice on ECS is detailed in observability with OpenTelemetry × ECS.
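For the logging bullet, the essential shape is one JSON object per line on stdout, with the correlation ID bound once per request. A minimal sketch with no logging library; createLogger is hypothetical, and where requestId comes from (for example an incoming request-ID header) is an assumption left to the caller. The awslogs driver ships each stdout line to CloudWatch Logs as-is, where Logs Insights can filter on the JSON fields.

```typescript
type Level = "debug" | "info" | "warn" | "error";

interface Logger {
  log(level: Level, msg: string, fields?: Record<string, unknown>): string;
  child(fields: Record<string, unknown>): Logger;
}

// Writes one JSON object per line; `base` fields (service, requestId, ...) are attached to every entry.
function createLogger(
  base: Record<string, unknown> = {},
  write: (line: string) => void = (line) => console.log(line),
): Logger {
  return {
    log(level, msg, fields = {}) {
      const line = JSON.stringify({ ts: new Date().toISOString(), level, msg, ...base, ...fields });
      write(line);
      return line;
    },
    // A child logger inherits the parent's fields, e.g. bind the correlation ID once per request.
    child(fields) {
      return createLogger({ ...base, ...fields }, write);
    },
  };
}

// const root = createLogger({ service: "web-api" });
// const reqLog = root.child({ requestId });   // requestId taken from the incoming request
// reqLog.log("info", "request:done", { status: 200, durationMs: 42 });
```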

ECS Exec: "break-glass" into a production container

Without SSH, port opening, or key management, you can enter a running container to investigate with ECS Exec.

in production scenarios, you can use it to gain break-glass access to your containers to debug issues. (— ECS Exec)

```shell
# After enabling enableExecuteCommand on the service/task:
aws ecs execute-command \
  --cluster prod \
  --task <task-id> \
  --container app \
  --interactive \
  --command "/bin/sh"
```

ECS Exec runs on SSM Session Manager: sessions are recorded in CloudTrail, and you can also send the commands and their output to CloudWatch Logs or S3 as audit logs. The task role needs the four ssmmessages actions (CreateControlChannel, CreateDataChannel, OpenControlChannel, OpenDataChannel). With IAM condition keys (ecs:container-name and others) you can apply fine-grained governance, such as denying Exec into production containers only. Being able to record who entered which task and when is auditability that SSH doesn't offer.
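For reference, here is that task-role permission as a plain IAM policy document, built in TypeScript so it can feed whatever IaC or SDK call you use. A sketch: ecsExecTaskRolePolicy is a hypothetical helper; the four actions are the ones listed above, and Resource "*" mirrors the example in the ECS Exec documentation, since as far as I know these ssmmessages actions can't be scoped to a resource ARN.

```typescript
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string | string[];
}
interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// The four actions the SSM agent inside the task needs to open an ECS Exec session.
const ECS_EXEC_ACTIONS = [
  "ssmmessages:CreateControlChannel",
  "ssmmessages:CreateDataChannel",
  "ssmmessages:OpenControlChannel",
  "ssmmessages:OpenDataChannel",
];

// Attach this to the *task role* (taskRoleArn), not the execution role.
function ecsExecTaskRolePolicy(): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: [...ECS_EXEC_ACTIONS], Resource: "*" }],
  };
}

// JSON.stringify(ecsExecTaskRolePolicy(), null, 2) yields the policy JSON to paste or template.
```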


Security: don't confuse the execution role and the task role

This is the point most often gotten wrong in Fargate production. There are two kinds of IAM roles, and they serve completely different purposes.

| Role | Who uses it | What for | Typical permissions |
|---|---|---|---|
| Execution role (executionRoleArn) | The ECS/Fargate agent | Launching the task | Pull the image from ECR, write to CloudWatch Logs, fetch and inject secrets from Secrets Manager/SSM |
| Task role (taskRoleArn) | Your app code | Calling AWS APIs while running | S3 read/write, DynamoDB, SQS send/receive, etc.: the least privilege the app needs |

The official distinction is clear.

The permissions granted in the IAM role are vended to containers running in the task. This role allows your application code to use other AWS services. (— the task role) These permissions aren't accessed by the Amazon ECS container and Fargate agents. For the IAM permissions that Amazon ECS needs to pull container images and run the task, see Amazon ECS task execution IAM role. (— the difference from the execution role)

Principles:

  • Consolidate secret fetching on the execution role (because the resolution of secrets[].valueFrom is done by the agent at launch). If the app directly hits Secrets Manager while running, attach that to the task role.
  • The app's AWS access is the task role. Create a dedicated role per service / task definition, and make it least privilege. "A do-anything role common to all tasks" is the biggest anti-pattern.
  • Because each Fargate task has its own isolation boundary, the EC2-launch-type risk of co-located tasks reaching the same credentials (such as the instance profile) is structurally much harder to hit.

Don't put secrets in plaintext in environment variables

Write a DB password in plaintext in environment and it leaks to everyone who can call DescribeTaskDefinition. Always inject it via secrets from Secrets Manager or SSM Parameter Store, and grant the execution role secretsmanager:GetSecretValue (plus KMS decryption permission if the secret uses a customer managed key), scoped as narrowly as possible. This takes this portfolio's root convention of keeping secrets out of code one step further: keep them out of plaintext configuration too.
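"Scoped as narrowly as possible" means listing the exact secret ARNs rather than a wildcard. A minimal sketch; executionRoleSecretsPolicy is a hypothetical helper whose inputs (secret ARNs, optional KMS key ARN) you supply, and the KMS statement is included only when a customer managed key encrypts the secret.

```typescript
interface SecretsStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

// Least-privilege statements for the *execution role* to resolve secrets[].valueFrom at launch.
function executionRoleSecretsPolicy(
  secretArns: string[],
  kmsKeyArn?: string,
): { Version: "2012-10-17"; Statement: SecretsStatement[] } {
  if (secretArns.length === 0 || secretArns.some((arn) => arn.includes("*"))) {
    throw new Error("list explicit secret ARNs; wildcards defeat least privilege");
  }
  const statements: SecretsStatement[] = [
    { Effect: "Allow", Action: ["secretsmanager:GetSecretValue"], Resource: [...secretArns] },
  ];
  if (kmsKeyArn) {
    // Only needed when the secret is encrypted with a customer managed KMS key.
    statements.push({ Effect: "Allow", Action: ["kms:Decrypt"], Resource: [kmsKeyArn] });
  }
  return { Version: "2012-10-17", Statement: statements };
}
```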

Three more hardening measures

  • Non-root execution (user: "10001:10001") + readonlyRootFilesystem: true. Give write access only to the paths that need it, via a task volume (Fargate doesn't support tmpfs mounts).
  • Image scanning: catch known vulnerabilities before shipping with ECR scanning (enhanced scanning / Amazon Inspector integration).
  • Keep the task definition minimal: privileged mode and sharing the host's namespaces aren't available on Fargate in the first place. Don't add linuxParameters you don't need.

For boundary defense including WAF and defense-in-depth, see AWS WAF defense-in-depth.


Cost optimization: lean usage-based billing toward "only what you used"

Fargate bills per second for the allocated vCPU and memory (with a 1-minute minimum). Unlike EC2, where you buy an instance and amortize it through high utilization, you pay for the allocation itself for as long as the task runs. That makes the direction of optimization clear.

  1. Right-sizing: look at actual usage in Container Insights and trim excessive allocation. As mentioned, CPU/memory are fixed combinations, so choose the minimal pair on the premise that "raise one and its partner also rises."
  2. Lean toward ARM64 (Graviton): just setting cpuArchitecture: ARM64 lets you run comparable performance at a unit price about 20% lower than x86 (your image must be built for arm64, e.g. as a multi-arch image). The more CPU-bound the resident service, the bigger the payoff.
  3. Fargate Spot: run interruption-tolerant workloads (batches, stateless workers, dev environments) at a deep discount. The cost is that when AWS reclaims the capacity, the task is interrupted with a 2-minute warning (delivered as SIGTERM). If you've implemented graceful shutdown, this is an acceptable trade-off.
  4. Compute Savings Plans: commit the baseline of constantly running production services for 1 or 3 years to lower the unit price. Combined with Spot, this gives a layered split: the base on a discounted commitment, bursts on on-demand or Spot.
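Because billing follows the allocation, comparing options is simple arithmetic. A sketch for estimating, not quoting: the functions are hypothetical, prices are inputs you must look up for your region (none are hard-coded here), and discounted assumes a discount applies uniformly to both the vCPU and memory rates.

```typescript
interface Prices {
  perVcpuHour: number;
  perGbHour: number;
}

// One task: billed on allocated vCPU and memory, per second (rounded up) with a 1-minute minimum.
function fargateCost(vcpu: number, memoryGb: number, runSeconds: number, p: Prices): number {
  const billedSeconds = Math.max(60, Math.ceil(runSeconds));
  return (billedSeconds / 3600) * (vcpu * p.perVcpuHour + memoryGb * p.perGbHour);
}

// A service keeping `tasks` tasks running for `hours` hours.
function serviceCost(tasks: number, vcpu: number, memoryGb: number, hours: number, p: Prices): number {
  return tasks * fargateCost(vcpu, memoryGb, hours * 3600, p);
}

// Same price shape at a fractional discount, e.g. 0.2 for the ~20% ARM64 gap cited above.
function discounted(p: Prices, fraction: number): Prices {
  return { perVcpuHour: p.perVcpuHour * (1 - fraction), perGbHour: p.perGbHour * (1 - fraction) };
}

// const x86 = serviceCost(2, 1, 2, 730, regionPrices);            // ~1 month, 2 x (1 vCPU, 2 GB)
// const arm = serviceCost(2, 1, 2, 730, discounted(regionPrices, 0.2));
```

The 1-minute minimum matters mostly for very short batch tasks: a 10-second job is billed as 60 seconds.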

Capacity-provider strategy: safely mix Spot with base and weight

Mix on-demand and Spot with a capacity provider strategy (per the official docs):

  • base: the minimum task count to secure with that provider (settable on only one provider, default 0).
  • weight: after base is satisfied, at what ratio to allocate additional tasks to each provider.
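
```hcl
# "Always secure at least 2 tasks on on-demand; split anything above that Spot:on-demand = 4:1"
# Services use this default only if they set neither launch_type nor their own capacity_provider_strategy.
resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name       = aws_ecs_cluster.main.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 2
    weight            = 1
  }
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    base              = 0
    weight            = 4
  }
}
```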

This procures the scale-out portion cheaply while on-demand protects the availability baseline. On a Spot interruption, the service scheduler automatically tries to launch a replacement task when capacity is available (if Spot capacity is exhausted, it waits until it recovers). Interruptions also show up in EventBridge as ECS Task State Change events with the stop code SpotInterruption, so wire them into your monitoring.
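To see what base and weight mean for a given task count, an idealized split helps. A planning sketch only: splitTasks is hypothetical, and it fills base first, then divides the remainder by weight with largest-remainder rounding. ECS places tasks incrementally and may round differently at small counts.

```typescript
interface ProviderStrategy {
  provider: string;
  base: number;
  weight: number;
}

function splitTasks(total: number, strategy: ProviderStrategy[]): Record<string, number> {
  const counts: Record<string, number> = {};
  let remaining = total;

  // 1) Fill base first (ECS allows a non-zero base on only one provider).
  for (const s of strategy) {
    const n = Math.min(s.base, remaining);
    counts[s.provider] = n;
    remaining -= n;
  }

  // 2) Split the rest in proportion to weight, using largest-remainder rounding.
  const totalWeight = strategy.reduce((sum, s) => sum + s.weight, 0);
  if (remaining === 0 || totalWeight === 0) return counts;

  const shares = strategy.map((s) => ({ s, exact: (remaining * s.weight) / totalWeight }));
  let assigned = 0;
  for (const { s, exact } of shares) {
    const whole = Math.floor(exact);
    counts[s.provider] = (counts[s.provider] ?? 0) + whole;
    assigned += whole;
  }
  const leftovers = [...shares].sort((a, b) => (b.exact % 1) - (a.exact % 1));
  for (const { s } of leftovers.slice(0, remaining - assigned)) {
    counts[s.provider] = (counts[s.provider] ?? 0) + 1;
  }
  return counts;
}

// With the strategy above (FARGATE base 2 / weight 1, FARGATE_SPOT weight 4):
// splitTasks(12, ...) -> { FARGATE: 4, FARGATE_SPOT: 8 }; splitTasks(2, ...) -> { FARGATE: 2, FARGATE_SPOT: 0 }
```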

The overall FinOps thinking (tags, budget alerts, continuous waste reduction) is summarized in AWS startup cost optimization.


Pre-production-release checklist

The items I always confirm before shipping.

  • Task size is based on measured values. Did you choose the smallest fixed CPU/memory pair that fits?
  • platform_version is LATEST (= 1.4.0). Did you consider whether you can set cpuArchitecture to ARM64?
  • awsvpc + ALB target_type=ip. The task SG receives only from the ALB's SG (you don't open 0.0.0.0/0)
  • Private-subnet placement. assign_public_ip=false, outbound via NAT/VPC endpoint
  • Separate the execution role and the task role. The task role is service-dedicated & least privilege
  • Secrets are injected with secrets[].valueFrom. You don't place plaintext in environment
  • Non-root execution + readonlyRootFilesystem. Passes ECR image scanning
  • The image is a CommitSHA tag (not depending on latest). versionConsistency is in effect
  • Rolling update + deployment_circuit_breaker {rollback=true} enabled
  • SIGTERM handling implemented with drainMs < stopTimeout. initProcessEnabled: true
  • startPeriod in healthCheck, health_check_grace_period_seconds on the service
  • Application Auto Scaling (target tracking) with min/max and asymmetric cooldown set
  • Container Insights enabled, structured logs + correlation ID, break-glass possible with enable_execute_command
  • Cost: Spot + capacity provider strategy (base/weight) / layer-splitting with Savings Plans

Summary: Fargate is a tool for "erasing servers and concentrating on production quality"

The essence of ECS on Fargate is erasing the biggest cost, server management (labor), so you can concentrate on production quality itself. This article pinned down four key points for that.

  1. Design: CPU/memory are fixed pairs. Measure and right-size, and connect correctly with awsvpc + ALB (target_type=ip).
  2. Deployment: rolling update + automatic rollback with the circuit breaker. Ensure version consistency with the digest.
  3. Resilience: catch SIGTERM and finish cleanly within stopTimeout. Pair it with idempotency so work is neither dropped nor processed twice.
  4. Safety and cost: separate the execution role and the task role at least privilege. Lean usage-based billing toward "only what you used" with ARM64, Spot, and Savings Plans.

With this pattern, I've operated an award-winning SaaS with 221 endpoints and a payment platform with zero double charges in production, with a small team. Even solo with generative AI, as long as you stick to patterns faithful to the official documentation, you can reproduce world-class robustness. If you want to put your product's container platform into production fast, cheaply, and safely, please get in touch.


友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.
