The Complete Guide to ECS on Fargate Cost Optimization: From the Pricing Model to Graviton, Fargate Spot, and Savings Plans

"Fargate needs no server management, but the bill is hard to predict." That worry is legitimate. With EC2 you can estimate cost as instance hours × instance count, but Fargate bills per second for the vCPU and memory allocated to each task, so three variables (task count, task size, and uptime) are entangled. Seen the other way, those same three variables are exactly the levers you can optimize.

I have run a lumber-distribution SaaS that won the Minister of Economy, Trade and Industry Award (API Gateway → NLB → ALB → ECS on Fargate, 221 endpoints) in production with a small team. You can't improve the cost of a container platform until you can see what you are paying for and how much. This article organizes that visualize-then-reduce process: it starts from an accurate understanding of the pricing model and proceeds in the order right-sizing → ARM64 (Graviton) → Fargate Spot → Savings Plans.

For the full picture of building ECS on Fargate for production, see the pillar article. This article focuses specifically on cost optimization.

Why There's an "Order That Works": The Logic of Optimization

Try optimizations at random and you can't measure their effect. For Fargate, following the order below makes the later steps compound on the earlier ones.

Stage	Technique	Why it works
1	Right-sizing	Cuts excess allocation. Shrinks the base that every later discount applies to
2	ARM64 (Graviton)	Unit price drops at the same task size (the longer a task runs, the bigger the impact)
3	Fargate Spot	Runs interruption-tolerant workloads at a large discount
4	Compute Savings Plans	Cuts the unit price of the always-running baseline through a long-term commitment

The right approach is to remove waste with right-sizing before applying Spot or Savings Plans (SP). If you apply Spot to tasks that are still over-sized, the absolute cost tends to stay high even with a large discount.

Decomposing the Pricing Model: What You're Charged For

The Unit of Billing

Fargate billing is simple (official pricing page).

The vCPU allocated at the task level × seconds (vCPU-second)
The memory allocated at the task level × seconds (GB-second)
Minimum billing time: 1 minute (even if a task finishes in 1 second, 60 seconds' worth)
Ephemeral storage: 20 GB is included at no extra charge; anything above that is billed separately per GB-hour (configurable up to 200 GB)

Utilization is irrelevant to billing. Even if a task allocated 1 vCPU/2 GB actually uses only 5% CPU, you are billed for every second of the full 1 vCPU/2 GB. The first step is to stop thinking in terms of EC2's per-instance-time billing and think in terms of allocated size × seconds.
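The billing rule above can be sketched as a small calculation. This is a minimal sketch, not a pricing tool: the function name and the two hourly rates are hypothetical placeholders, so substitute the values from the official pricing page for your Region and architecture.

```python
import math

def fargate_task_cost(vcpu, memory_gb, runtime_seconds,
                      vcpu_rate_per_hour, gb_rate_per_hour,
                      minimum_seconds=60):
    """Estimate the compute charge for one task run.

    Billing follows the allocated size, not utilization, and every run is
    billed for at least `minimum_seconds`.
    """
    billed_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    hours = billed_seconds / 3600
    return hours * (vcpu * vcpu_rate_per_hour + memory_gb * gb_rate_per_hour)

# Placeholder rates for illustration only (NOT real AWS prices).
VCPU_RATE = 0.05
GB_RATE = 0.005

one_second_run = fargate_task_cost(1, 2, 1, VCPU_RATE, GB_RATE)
one_minute_run = fargate_task_cost(1, 2, 60, VCPU_RATE, GB_RATE)
one_hour_run = fargate_task_cost(1, 2, 3600, VCPU_RATE, GB_RATE)
```

With these placeholder rates, a task that exits after 1 second costs the same as one that runs for a full minute, which is the 1-minute minimum in action.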

CPU/Memory Are Fixed Pairs

A Fargate task size can only be chosen from a fixed set of pairs, not a free combination (official documentation).

Task CPU	Selectable memory	Note
256 (.25 vCPU)	512 MiB / 1 GB / 2 GB	Fixed 3 choices
512 (.5 vCPU)	1–4 GB	1 GB steps
1024 (1 vCPU)	2–8 GB	1 GB steps
2048 (2 vCPU)	4–16 GB	1 GB steps
4096 (4 vCPU)	8–30 GB	1 GB steps
8192 (8 vCPU)	16–60 GB	4 GB steps (PV 1.4.0+)
16384 (16 vCPU)	32–120 GB	8 GB steps (PV 1.4.0+)

The pitfall: even if 512 MiB of memory is enough and you only want 1 vCPU of CPU, the moment you choose 1024 CPU units the memory floor becomes 2 GB. Raising one dimension raises its partner too, so always start from measurement with that constraint in mind.
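The fixed-pair constraint is easy to encode as a lookup. The following is a rough sketch that transcribes the table above; `FARGATE_PAIRS` and `smallest_pair` are names made up for this illustration. It picks the lowest CPU tier that fits and then the smallest memory option in that tier, which is not guaranteed to be the cheapest combination in every case.

```python
def _steps(lo_gb, hi_gb, step_gb=1):
    """Memory options in MiB from lo_gb to hi_gb inclusive."""
    return [gb * 1024 for gb in range(lo_gb, hi_gb + 1, step_gb)]

# CPU units -> selectable memory sizes (MiB), transcribed from the table above.
FARGATE_PAIRS = {
    256: [512, 1024, 2048],
    512: _steps(1, 4),
    1024: _steps(2, 8),
    2048: _steps(4, 16),
    4096: _steps(8, 30),
    8192: _steps(16, 60, 4),
    16384: _steps(32, 120, 8),
}

def smallest_pair(required_cpu_units, required_memory_mib):
    """Return the first (cpu, memory_mib) pair that covers both requirements."""
    for cpu in sorted(FARGATE_PAIRS):
        if cpu < required_cpu_units:
            continue
        for memory in FARGATE_PAIRS[cpu]:
            if memory >= required_memory_mib:
                return cpu, memory
    raise ValueError("no Fargate task size satisfies the requirement")

# Wanting 1 vCPU with only 512 MiB still lands on the 2 GB floor.
pitfall = smallest_pair(1024, 512)
# Needing about 3 GB with little CPU forces a move up to the 512 CPU tier.
memory_bound = smallest_pair(256, 3000)
```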

The Difference in Billing Structure from EC2

Viewpoint	Fargate	EC2 (on-demand)
Billing target	Seconds of allocated vCPU/memory	Instance uptime (regardless of utilization)
Relation to utilization	None (same charge at 5% or 100% use)	None (instance time is fixed)
When idle	Zero once the task is stopped	Billing continues until you stop the instance
Direction of optimization	Minimize allocation size → minimize task count	Bin-packing + RI/SP
Management cost	None (no host management labor)	OS/AMI/patch management required

Fargate's value of zero server-management cost doesn't show up in a simple hourly unit-price comparison. The fair comparison is Total Cost of Ownership (TCO), including labor. Unless you run a large fleet of heavy batch jobs at a constant 80%+ CPU, Fargate usually comes out ahead.

Right-sizing: Measure and Go to the Smallest Pair

Why Do It First

Right-sizing acts as a multiplier on every discount technique. Shrink a task from 2 vCPU/4 GB to 1 vCPU/2 GB and both the Spot discount and the Savings Plans discount apply to a smaller base. If you stack discounts on a task that is still over-sized, the absolute cost stays high.
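The multiplier effect is plain arithmetic. In the sketch below, the unit price and the discount fractions are made-up numbers chosen only to show the structure: whatever discount is applied later, halving the allocated size halves what is left.

```python
def discounted_cost(size_units, unit_price, discount):
    """Cost = allocated size x unit price x (1 - discount fraction)."""
    return size_units * unit_price * (1 - discount)

UNIT_PRICE = 100  # hypothetical price per size unit

# Ratio of right-sized cost (1 unit) to over-sized cost (2 units)
# under several hypothetical discount levels.
ratios = [
    discounted_cost(1, UNIT_PRICE, d) / discounted_cost(2, UNIT_PRICE, d)
    for d in (0.0, 0.3, 0.7)
]
```

Every entry in `ratios` is 0.5: the discount changes the unit price, but the saving from right-sizing survives intact underneath it.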

Measurement Tools

Enable Container Insights (cluster setting containerInsights: enhanced) to get per-task and per-container time series of actual CPU and memory usage. In CloudWatch Metrics, check the 99th-percentile utilization over the past 30 days; the first target pair is a size that adds 20–30% headroom to that value.
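As a rough sketch of the "P99 plus headroom" rule, the helper below takes usage samples you have already exported (for example, memory in MiB per data point) and returns the allocation to aim for. The function names and the nearest-rank percentile method are choices made for this illustration; CloudWatch's own percentile statistic may interpolate differently.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def target_allocation(samples, headroom=0.25, pct=99):
    """P99 usage plus headroom (the text suggests 20-30%, i.e. 0.20-0.30)."""
    return percentile(samples, pct) * (1 + headroom)

# Hypothetical memory samples in MiB: mostly 300, with two spikes.
memory_samples = [300] * 98 + [400, 900]
memory_target_mib = target_allocation(memory_samples, headroom=0.25)
```

The result is a raw number; it still has to be rounded up to a valid CPU/memory pair before it goes into a task definition.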

AWS Compute Optimizer supports Fargate: it analyzes the past 14 days of metrics and reports findings such as "this task is currently over-provisioned" and "the recommendation is XXX CPU/YYY GB." A realistic workflow is to use it as the starting point and have a human make the final decision against the fixed-pair constraint.

Ephemeral Storage Is an Easy-to-Miss Cost

The default 20 GB is included, but the setting is sometimes raised beyond what is needed to hold a Docker build cache or accumulated temporary files. Check ephemeralStorage.sizeInGiB in the task definition and trim it to what is actually needed. As a rule, persist build artifacts to S3 and keep the runtime cache minimal.

ARM64 (Graviton): The Easiest Move to Lower the Unit Price

What Graviton Is

AWS Graviton is an ARM64 processor designed by AWS. On Fargate you can switch to it with a small change to the task definition. AWS claims Graviton improves price-performance by up to about 40%; in this article I cite that strictly as AWS's claimed value. How it actually pans out on your workload has to be measured.

{
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  }
}

In the Fargate price table, ARM64 (Graviton) also has a lower unit price than x86 (X86_64). Because the unit price drops at the same vCPU/memory size just by choosing ARM64, once your image has a multi-arch build, the longer a task runs, the more directly the switch reduces cost.

What Workloads It Works For

HTTP services (Node.js/Go/Python/Java): ARM-optimized runtimes are available, so migration is easy
Data-processing batches: the more CPU-bound the job, the more the unit-price difference pays off
Stateless workers: if no native binaries are mixed into the dependencies, migration is straightforward

Take care with x86-only native binaries (some C libraries, vendor-supplied closed-source binaries, and so on). Go through your dependencies and confirm that an ARM64 build exists for each.

Multi-Arch Build (buildx)

To run on ARM64, you need a Docker image for linux/arm64. With Docker's buildx and BuildKit, you can make a multi-platform image at once.

# Set up QEMU emulation (once per CI environment)
docker run --privileged --rm tonistiigi/binfmt --install all

# Build an image for both linux/amd64 and linux/arm64 and push it to ECR
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag 111122223333.dkr.ecr.ap-northeast-1.amazonaws.com/web-api:${COMMIT_SHA} \
  --push \
  .

Because ECR supports multi-architecture manifests, a single tag is enough: when the Fargate task definition specifies ARM64, the ARM64 image variant is pulled automatically.

A snippet for wiring this into GitHub Actions:

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

- name: Build and push multi-arch image
  uses: docker/build-push-action@v6
  with:
    context: .
    platforms: linux/amd64,linux/arm64
    tags: ${{ env.ECR_REGISTRY }}/web-api:${{ github.sha }}
    push: true
    cache-from: type=gha
    cache-to: type=gha,mode=max

With the cache enabled through cache-from: type=gha, you can greatly limit the extra CI time that a multi-arch build adds.

The Decision Flow for Graviton Migration

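Do any dependencies ship native binaries without ARM64 support?
  → Yes: look for an ARM64 build of that library, or consider a workaround
  → No:  run the linux/arm64 container locally and smoke-test it
           → Tests pass: switch the task definition to ARM64 and validate in staging
              → No problems: cut over production (rolling update, no downtime)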
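For the smoke-test step, one cheap guard is to assert that the container really is the ARM64 variant and not the amd64 one pulled by mistake. This is a minimal sketch; `normalize_arch` and `assert_running_on` are hypothetical helpers, and the alias table covers only the names I am sure `platform.machine()` and ECS commonly use.

```python
import platform

# Map common spellings to one canonical name per architecture.
_ARCH_ALIASES = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "amd64",
    "amd64": "amd64",
}

def normalize_arch(machine):
    """Normalize an architecture string such as 'aarch64' or 'ARM64'."""
    key = machine.lower()
    return _ARCH_ALIASES.get(key, key)

def assert_running_on(expected, machine=None):
    """Raise RuntimeError unless the current (or given) machine matches."""
    actual = normalize_arch(machine or platform.machine())
    if actual != normalize_arch(expected):
        raise RuntimeError(f"expected {expected}, but running on {actual}")
    return actual
```

Calling `assert_running_on("ARM64")` at the top of a smoke test makes a wrong-architecture image fail fast instead of passing quietly.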

Fargate Spot: Run Interruption-Tolerant Workloads at a Large Discount

What Fargate Spot Is

Fargate Spot runs tasks on AWS's spare Fargate capacity. In exchange for the risk of being interrupted with a 2-minute warning when AWS needs the capacity back, you run at a markedly lower price than on-demand (the discount rate fluctuates; AWS and public sources describe it as a large discount).

The interruption mechanism is per the official docs.

When AWS reclaims Spot capacity, an ECS task state change event for the affected task is sent to EventBridge
SIGTERM is sent to the containers in the task
The task is forcibly stopped when the 2-minute warning period ends (a container still running after its stopTimeout receives SIGKILL)
For a service, the scheduler tries to start another task using available capacity

If you have already implemented graceful shutdown (catching SIGTERM and finishing in-flight work), 2 minutes is enough grace for many workloads. The production-operations guide covers the SIGTERM handler in detail; using Spot presupposes that this implementation is already in place.

Workloads Suited / Unsuited to Spot

Workload	Fit for Spot	Reason
Daily batches, periodic reports	Suited	Can recover on the next run even if interrupted
SQS-driven async workers	Suited	Messages are redelivered (assumes idempotent design)
Dev/staging environments	Suited	Low business impact even if interrupted
Load-test tasks	Suited	Cost-efficient for a temporary large run
Real-time HTTP API (desiredCount ≤ 2)	Unsuited	An interruption directly becomes user impact
Stateful DB-migration tasks	Unsuited	An interruption may leave a half-done state
Processing with a committed SLA	Unsuited	Interruption probability makes the guarantee hard to keep

The key prerequisite for running SQS-driven workers on Spot is idempotency. If a task is killed with SIGKILL during a Spot interruption, the in-flight message is redelivered after its visibility timeout expires. Without a design in which processing the same message twice yields the same result, Spot is dangerous.
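One common way to get "same result even if processed twice" is to key every write by the message ID, so that a replay overwrites instead of accumulating. The sketch below uses an in-memory dict as a stand-in for a real datastore; the message shape (`id`, `sku`, `quantity`) and the function names are invented for this illustration.

```python
def apply_order_event(ledger, message):
    """Idempotent write: an upsert keyed by message ID.

    Replaying the same message leaves the ledger unchanged.
    """
    ledger[message["id"]] = {
        "sku": message["sku"],
        "quantity": message["quantity"],
    }
    return ledger

def total_quantity(ledger):
    """Sum the quantities recorded in the ledger."""
    return sum(entry["quantity"] for entry in ledger.values())

ledger = {}
message = {"id": "msg-001", "sku": "cedar-2x4", "quantity": 10}

apply_order_event(ledger, message)
apply_order_event(ledger, message)  # simulated redelivery after an interruption
```

A naive `total += quantity` handler would report 20 after the redelivery. With the keyed upsert, the total stays at 10.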

The Capacity-Provider Strategy: Designing base and weight

To mix Spot and on-demand in an ECS service, use a capacity provider strategy. Two parameters are the core of it.

base: the minimum number of tasks to run on that provider (only one provider in a strategy can set base; default 0)
weight: after base is satisfied, the ratio in which additional tasks are distributed across the providers

resource "aws_ecs_service" "worker" {
  name            = "async-worker"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.worker.arn

  # Do not set launch_type (mutually exclusive with a capacity provider strategy)
  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    # base = 1: always keep at least 1 task on on-demand (fallback when Spot capacity runs out)
    base   = 1
    weight = 1
  }
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    base   = 0
    weight = 4 # after base is satisfied, additional tasks are placed Spot:on-demand = 4:1
  }
}

This strategy means: guarantee the first task on on-demand, and place about 80% of the scale-out beyond it on Spot. Even if Spot capacity runs out or Spot tasks are interrupted, at least one on-demand task survives.
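To build intuition for base and weight, the split can be approximated in a few lines. This is only a model under stated assumptions: `split_tasks` is a hypothetical helper, and the largest-remainder rounding is my choice, so the ECS scheduler's actual placement may differ by a task or so at small counts.

```python
def split_tasks(desired_count, strategy):
    """Approximate how tasks are divided across capacity providers.

    `strategy` is a list of dicts with 'provider', 'base', and 'weight'.
    """
    counts = {item["provider"]: 0 for item in strategy}
    remaining = desired_count

    # Step 1: satisfy base first.
    for item in strategy:
        take = min(item.get("base", 0), remaining)
        counts[item["provider"]] += take
        remaining -= take

    total_weight = sum(item["weight"] for item in strategy)
    if remaining == 0 or total_weight == 0:
        return counts

    # Step 2: split the rest in proportion to weight (integer arithmetic).
    fractions = []
    for item in strategy:
        whole, fraction = divmod(remaining * item["weight"], total_weight)
        counts[item["provider"]] += whole
        fractions.append((fraction, item["provider"]))

    # Step 3: hand leftover tasks to the largest fractional shares.
    leftover = desired_count - sum(counts.values())
    for _, provider in sorted(fractions, reverse=True)[:leftover]:
        counts[provider] += 1
    return counts

WORKER_STRATEGY = [
    {"provider": "FARGATE", "base": 1, "weight": 1},
    {"provider": "FARGATE_SPOT", "base": 0, "weight": 4},
]
at_eleven = split_tasks(11, WORKER_STRATEGY)
at_one = split_tasks(1, WORKER_STRATEGY)
```

Under this model, 11 desired tasks come out as 3 on-demand and 8 Spot, and a single desired task always lands on on-demand.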

Match base to the service's minimum availability requirement. For an HTTP service that needs redundancy (2+ tasks), another pattern is to set base = 2 and put only the burst beyond that on Spot.

Monitoring Spot Interruptions

Spot interruptions can be captured with EventBridge's ECS Task State Change event. Filtering on stopCode: "SpotInterruption" and routing the matches to a CloudWatch alarm or a Slack notification improves operational visibility.

{
  "source": ["aws.ecs"],
  "detail-type": ["ECS Task State Change"],
  "detail": {
    "lastStatus": ["STOPPED"],
    "stopCode": ["SpotInterruption"]
  }
}

Attach a Lambda function to this rule and aggregate the events, and you can track metrics such as the Spot interruption rate over the past 7 days. In a Region or AZ where the interruption rate is high (capacity is tight), consider revising the weights to raise the on-demand ratio.
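The aggregation itself can be quite small. The sketch below counts interruptions per Availability Zone from a batch of event dicts; it assumes each event's detail carries an `availabilityZone` field and falls back to "unknown" when it does not. The function names are hypothetical, and storing or windowing the events (for example, over 7 days) is left out.

```python
from collections import Counter

def is_spot_interruption(event):
    """True if the event matches the EventBridge pattern shown above."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.ecs"
        and event.get("detail-type") == "ECS Task State Change"
        and detail.get("lastStatus") == "STOPPED"
        and detail.get("stopCode") == "SpotInterruption"
    )

def interruptions_by_az(events):
    """Count Spot interruptions per Availability Zone."""
    return Counter(
        event["detail"].get("availabilityZone", "unknown")
        for event in events
        if is_spot_interruption(event)
    )

def _event(stop_code, az):
    """Build a minimal sample event for illustration."""
    return {
        "source": "aws.ecs",
        "detail-type": "ECS Task State Change",
        "detail": {"lastStatus": "STOPPED", "stopCode": stop_code,
                   "availabilityZone": az},
    }

sample_events = [
    _event("SpotInterruption", "ap-northeast-1a"),
    _event("SpotInterruption", "ap-northeast-1a"),
    _event("EssentialContainerExited", "ap-northeast-1c"),
]
per_az = interruptions_by_az(sample_events)
```

To turn these counts into a rate, divide by the number of Spot tasks launched in the same window, which has to come from a separate source.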

Compute Savings Plans: Cutting the Unit Price of the Always-Running Baseline

What Savings Plans Are

Compute Savings Plans give you a lower unit price than on-demand in exchange for committing to a constant amount of compute spend ($/hour) for 1 or 3 years (official). Because Compute Savings Plans cover Fargate, Lambda, and EC2, the discount keeps applying even if the workload moves between them.

Important characteristics:

The commit is "an amount ($/hour)," not bound to a specific Region, service, or task size
Usage exceeding the committed amount falls back to on-demand pricing
A 3-year commit has a larger discount rate than a 1-year commit (AWS's claimed value)
You can choose the payment option (All Upfront / Partial Upfront / No Upfront)

Layering Baseline, On-Demand, and Spot

To use Savings Plans most efficiently, secure the "baseline" of always-running compute with SP, and receive the burst above it with on-demand and Spot.

Cost-layer design (example: production HTTP service)
────────────────────────────────────────
  Peak burst layer        → Fargate Spot (if interruption-tolerant)
  Steady burst layer      → Fargate on-demand
  Always-on base layer    → unit price cut with Compute Savings Plans
────────────────────────────────────────

To size the baseline commitment, check the minimum compute spend ($/hour) over the past 30–90 days in AWS Cost Explorer and commit to 90–95% of it (a 100% commit is high-risk). Usage above the commitment simply falls back to on-demand pricing, so a somewhat conservative commitment is the safe side to err on.
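The rule of thumb above reduces to a one-line calculation. The sketch assumes you have exported hourly compute spend from Cost Explorer into a list; `recommended_commit` is a hypothetical helper, and it does not convert between on-demand and Savings Plans rates, so treat its output as a starting point next to the official recommendation.

```python
def recommended_commit(hourly_spend, coverage=0.9):
    """Return coverage x the minimum observed hourly spend ($/hour).

    `coverage` is the fraction of the baseline to commit (the text
    suggests 0.90-0.95).
    """
    if not hourly_spend:
        raise ValueError("hourly_spend must not be empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    return min(hourly_spend) * coverage

# Hypothetical hourly spend samples in $/hour (not real billing data).
observed = [1.2, 1.0, 1.5, 1.1]
commit = recommended_commit(observed, coverage=0.9)
```

Using the minimum rather than the average keeps the commitment under the always-running floor, so the commitment is always fully used.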

The Difference Between EC2-Type SP and Compute SP

Type	Target	Flexibility
EC2 Instance SP	Only EC2 of a specific family/Region	Low (strongly bound, larger discount)
Compute SP	EC2 + Fargate + Lambda (all Regions)	High (less bound, smaller discount)

If Fargate is the main target, choose Compute Savings Plans. Even if the workload later moves to EC2 or Lambda, the commitment is simply consumed by the new service; nothing goes to waste.

Other Cost Reductions: Logs, Unneeded Resources, Tags

CloudWatch Logs Retention Period

By default, CloudWatch Logs log groups never expire. Send high-volume access logs to them as-is in production and log-storage cost keeps piling up.

resource "aws_cloudwatch_log_group" "ecs_app" {
  name              = "/ecs/web-api"
  retention_in_days = 30  # Adjust to your requirements; avoid the never-expire default
}

Archive logs that need long-term retention to S3 (CloudWatch Logs → Kinesis Firehose → S3) and keep them in a cheaper storage class such as S3 Standard-IA or Glacier Instant Retrieval to cut cost substantially.

Filtering Logs with FireLens

With FireLens (a Fluent Bit sidecar), you can filter and aggregate logs before they are sent to CloudWatch. For example, you can keep health-check and debug-level logs out of CloudWatch to cut cost while still retaining them in S3.

{
  "name": "log-router",
  "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
  "essential": true,
  "firelensConfiguration": {
    "type": "fluentbit"
  }
}

Dropping health-check and DEBUG-level records from the CloudWatch-bound stream with a Fluent Bit filter and routing them to S3 instead is especially effective for production services with heavy access logs.

Stopping Unneeded Tasks/Environments

For dev/staging ECS services, simply setting desiredCount = 0 outside business hours brings the Fargate compute cost to zero.

# Nightly stop (invoke through Lambda from EventBridge Scheduler, or run the CLI directly)
aws ecs update-service \
  --cluster dev \
  --service web-api \
  --desired-count 0

Automating the morning start and nightly stop with EventBridge Scheduler substantially reduces the cost of the dev environment.
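If the schedule is driven from code (for example, a Lambda function that calls update-service), the decision logic can be kept as a pure function that is easy to test. The business hours and the weekday rule below are hypothetical defaults, and `now` is expected to be in your local time zone already.

```python
from datetime import datetime, time

def desired_count_for(now, business_count,
                      start=time(9, 0), end=time(19, 0)):
    """Return the desired task count for a dev/staging service.

    Runs `business_count` tasks on weekdays between `start` and `end`
    (assumed hours), and 0 at all other times.
    """
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_hours = start <= now.time() < end
    return business_count if (is_weekday and in_hours) else 0

monday_morning = desired_count_for(datetime(2024, 1, 8, 10, 0), 2)
monday_night = desired_count_for(datetime(2024, 1, 8, 22, 0), 2)
saturday = desired_count_for(datetime(2024, 1, 6, 10, 0), 2)
```

The caller then passes the returned value as the desired count, so one function serves both the morning start and the nightly stop.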

Cost Allocation by Tags

Tagging ECS services and task definitions (and activating those tags as cost allocation tags in the billing settings) gives you a per-service, per-environment breakdown in AWS Cost Explorer.

resource "aws_ecs_service" "app" {
  # ...
  tags = {
    Environment = "production"
    Service     = "web-api"
    Team        = "platform"
    CostCenter  = "api-platform"
  }
}

Without tag-based allocation you only see Fargate cost as one lump and can't tell which service is expensive. It's important to build cost visibility in from the start, as part of the overall FinOps design.

Cost Comparison Table (Example)

The following is a purely illustrative estimate. The actual pricing differs by Region and time. Confirm the latest values on the AWS official pricing page.

Premise (example): Tokyo Region, 1 vCPU / 2 GB, 720 hours/month (always running), 24 tasks

Pattern	Configuration	Relative cost (illustrative)	Note
Baseline	x86 on-demand × 24 tasks	100 (reference)	All tasks on-demand, x86
Graviton only	ARM64 on-demand × 24 tasks	Lower than baseline (ARM64 unit price is below x86 per AWS pricing)	Just change to cpuArchitecture=ARM64
Spot mix	ARM64 on-demand × 4 + ARM64 Spot × 20	Lower again by the Spot discount	Assumes an interruption-tolerant design
SP + Spot	ARM64 Spot (Spot portion) + Compute SP (base portion)	Further unit-price reduction	Assumes a Savings Plans baseline commitment

Caution: the relative-cost column above shows the structural tendency only and does not guarantee actual figures. Get the concrete Savings Plans discount rate from "Savings Plans recommendations" in AWS Cost Explorer and use it for the commitment decision.

Terraform: A Complete Snippet of the Capacity-Provider Strategy

A configuration that runs the production HTTP API on Graviton with an on-demand base (plus a Spot share for the burst) and the batch worker on Graviton with mostly Spot.

# ── Cluster ───────────────────────────────────────────────
resource "aws_ecs_cluster" "main" {
  name = "prod"
  setting {
    name  = "containerInsights"
    value = "enhanced"
  }
}

# ── Task definition (ARM64 / Graviton) ────────────────────
resource "aws_ecs_task_definition" "api" {
  family                   = "web-api"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "512"
  memory                   = "1024"

  runtime_platform {
    cpu_architecture        = "ARM64"   # run on Graviton
    operating_system_family = "LINUX"
  }

  execution_role_arn = aws_iam_role.exec.arn
  task_role_arn      = aws_iam_role.task.arn

  container_definitions = jsonencode([{
    name      = "app"
    image     = "${aws_ecr_repository.api.repository_url}:${var.image_tag}"
    essential = true
    portMappings = [{ containerPort = 8080, protocol = "tcp" }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.api.name
        "awslogs-region"        = data.aws_region.current.name
        "awslogs-stream-prefix" = "app"
      }
    }
    stopTimeout = 60
  }])
}

# ── Production HTTP service: base=2 guaranteed on on-demand ─
resource "aws_ecs_service" "api" {
  name            = "web-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 4

  # Do not set launch_type (mutually exclusive with capacity_provider_strategy)
  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 2  # always keep at least 2 tasks on on-demand
    weight            = 1
  }
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    base              = 0
    weight            = 1  # additional tasks when desiredCount > 2 are split evenly, Spot:on-demand = 1:1
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.api.arn
    container_name   = "app"
    container_port   = 8080
  }

  health_check_grace_period_seconds = 30
}

# ── Batch worker: almost entirely Spot ────────────────────
resource "aws_ecs_task_definition" "worker" {
  family                   = "async-worker"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "1024"
  memory                   = "2048"

  runtime_platform {
    cpu_architecture        = "ARM64"
    operating_system_family = "LINUX"
  }

  execution_role_arn = aws_iam_role.exec.arn
  task_role_arn      = aws_iam_role.worker_task.arn

  container_definitions = jsonencode([{
    name      = "worker"
    image     = "${aws_ecr_repository.worker.repository_url}:${var.image_tag}"
    essential = true
    stopTimeout = 110  # use as much of Spot's 2-minute SIGTERM warning as possible
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.worker.name
        "awslogs-region"        = data.aws_region.current.name
        "awslogs-stream-prefix" = "worker"
      }
    }
  }])
}

resource "aws_ecs_service" "worker" {
  name            = "async-worker"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.worker.arn
  desired_count   = 2

  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1  # keep at least 1 task on on-demand as a fallback
    weight            = 1
  }
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    base              = 0
    weight            = 4  # place 80% of additional tasks on Spot
  }

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.worker_task.id]
    assign_public_ip = false
  }
}

# ── CloudWatch Logs (set the retention period explicitly) ──
resource "aws_cloudwatch_log_group" "api" {
  name              = "/ecs/web-api"
  retention_in_days = 30
}

resource "aws_cloudwatch_log_group" "worker" {
  name              = "/ecs/async-worker"
  retention_in_days = 14  # shorter for the worker
}

stopTimeout = 110 (the batch worker) is chosen to make the most of Fargate Spot's 2-minute warning. With 110 seconds between SIGTERM and SIGKILL, there is room to finish processing the current SQS message and complete DB writes. The upper bound of stopTimeout is 120 seconds.

Cost-Optimization Checklist

Items to confirm before applying cost optimization to production.

Measurement / Visualization

Is Container Insights enabled (containerInsights set to enhanced)?
Did you confirm the past-30-days P99 CPU utilization / P99 memory utilization per task?
Did you confirm AWS Compute Optimizer's recommendation?
Are Environment/Service/Team tags attached to ECS services / task definitions?
Can you decompose ECS cost per service in AWS Cost Explorer?

Right-sizing

Did you decide the task size with measured value + 20–30% headroom (not by guesswork)?
Did you choose the smallest pair after confirming the CPU/memory fixed-pair constraint?
Did you confirm whether the ephemeralStorage expansion is truly needed (don't specify if the default 20 GB is enough)?

ARM64 (Graviton)

Did you confirm the ARM64 support of dependency libraries / base images?
Did you confirm the linux/arm64 build passes with docker buildx?
Did you confirm the E2E test passes by running the ARM64 image in the staging environment?
Did you change the task definition's runtimePlatform.cpuArchitecture to ARM64?

Fargate Spot

Is the target workload interruption-tolerant (a batch, worker, or dev environment)?
Have you implemented a SIGTERM handler (does processing finish within stopTimeout)?
For an SQS worker, is the idempotency design complete (no double-processing on redelivery)?
Did you set FARGATE's base in the capacity_provider_strategy to secure a fallback?
Did you put the Spot-interruption event (EventBridge) on monitoring?

Compute Savings Plans

Did you confirm Fargate's past-90-days minimum usage ($/hour) in AWS Cost Explorer?
Did you compute the optimal commit amount with the SP recommendation tool (commit 90–95% of the minimum consumption)?
Are you choosing Compute SP, not EC2 Instance SP (Fargate + future flexibility)?
Did you confirm the SP's commit period (1 year / 3 years) matches your business plan?

Others

Did you set the CloudWatch Logs retention period explicitly (avoid the never-expire default)?
Is there a design for S3-archiving logs needing long-term retention?
Did you automate the nightly stop (desiredCount=0) of dev/staging environments?

Summary: Cut Cost Structurally in "the Order That Works"

ECS on Fargate cost optimization is a one-directional progression: measure → right-sizing → ARM64 → Spot → Savings Plans.

Optimization without measurement is not optimization. Look at actual usage with Container Insights and choose the smallest size within the fixed-pair constraint.
Switching to ARM64 (Graviton) pays off more the longer a task runs, once you have a multi-arch build. AWS claims improved price-performance, but confirm it by measurement.
Fargate Spot presupposes that an interruption-tolerant design (graceful shutdown + idempotency) is in place. Use it without that and the cost of failures outweighs the savings.
Compute Savings Plans are a commitment to the always-running baseline. The basic form is to layer: the SP commitment as the base, with on-demand and Spot absorbing the burst.

In the lumber-distribution SaaS, I run API Gateway → NLB → ALB → ECS on Fargate (221 endpoints) with a small team, and I advanced cost optimization hand in hand with the implementation of graceful shutdown and idempotency. FinOps is inseparable from infrastructure design. See also the articles on Auto Scaling and SQS-worker design and on the Fargate-vs-Lambda selection framework.

If you want both low cost and high quality from your container platform, feel free to get in touch.