"I want to run a container on AWS. But I don't know which to choose — Fargate, Lambda, or App Runner." When a startup ships a backend to production, the scene of stalling on this three-way choice happens frequently. The reason for the hesitation is clear: the three services each claim to be "serverless," yet the workloads they target and their constraints are completely different.
At a lumber-distribution B2B SaaS that won the METI Minister's Award, I run 221 API endpoints in production on an API Gateway → NLB → ALB → ECS on Fargate configuration. On the payment platform I have had zero double charges in production, and the batch and event-driven workers all run on Fargate too. From that hands-on experience, I know how costly a wrong choice can be.
This article is a selection guide that answers the question "which should I choose?" head-on. It covers the essence of each service, a comparison table, quick answers per use case, and a decision tree. The implementation details of Fargate are left to the sister article AWS ECS on Fargate production guide; this piece concentrates on the judgment of "which to choose."
The cost of a selection mistake
First, why the choice matters. The typical failure patterns, usually noticed too late, are the following three.
- You built long-running processing on Lambda, but it's cut off at 15 minutes. Frequent in ML inference, large-CSV batches, and video processing. Even splitting it with an SQS queue complicates the design, and you end up rewriting to Fargate.
- You couldn't connect to a private RDS with App Runner. A VPC connector can be added later, but you get stuck if you don't consider the network design from the start.
- A real-time requirement appeared after you started on App Runner, which has no WebSocket support. Fargate can keep a WebSocket connection resident via an NLB, but App Runner offers no such control.
A selection mistake turns into technical debt as a migration cost (re-design, re-implementation, re-deploy). Choosing right at first protects the mid-to-long-term speed of the team.
The essence of the three services
AWS Lambda: an event-driven function, scale-to-zero
Lambda is an event-driven execution model where the function starts only when a request or asynchronous event arrives, and disappears when the processing ends. Zero billing during idle time — this is the core of scale-to-zero.
The constraints are just as clear. The maximum execution time is 15 minutes (900 seconds). The payload limit is 6 MB for a synchronous invocation (via API Gateway/ALB) and 256 KB for an asynchronous one. A function can be deployed as a container image of up to 10 GB, but the execution-time limit doesn't change. Memory is allocated in the range 128 MB–10,240 MB (10 GB), and CPU capacity scales in proportion to the allocated memory. Pricing is metered by number of requests plus GB-seconds (allocated memory × execution time).
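The limits above can be turned into a quick pre-flight check. The following is a minimal sketch: the `Workload` shape and the `lambdaBlockers` function are hypothetical names for illustration, and the numbers are simply the limits quoted in this section, so verify them against the current AWS quotas before relying on them.

```typescript
// Limits as quoted in this section (assumption: still current for your account/region).
const LAMBDA_LIMITS = {
  maxDurationSec: 900,               // 15 minutes
  syncPayloadBytes: 6 * 1024 * 1024, // 6 MB, synchronous invocation
  asyncPayloadBytes: 256 * 1024,     // 256 KB, asynchronous invocation
  maxMemoryMb: 10_240,               // 10 GB
} as const;

// Hypothetical description of a workload, for illustration only.
interface Workload {
  maxDurationSec: number;
  payloadBytes: number;
  invocation: "sync" | "async";
  memoryMb: number;
}

// Returns the reasons a workload does not fit Lambda; an empty array means it fits.
export function lambdaBlockers(w: Workload): string[] {
  const blockers: string[] = [];
  if (w.maxDurationSec > LAMBDA_LIMITS.maxDurationSec) {
    blockers.push("exceeds the 15-minute execution limit");
  }
  const payloadLimit =
    w.invocation === "sync"
      ? LAMBDA_LIMITS.syncPayloadBytes
      : LAMBDA_LIMITS.asyncPayloadBytes;
  if (w.payloadBytes > payloadLimit) {
    blockers.push(`payload exceeds the ${w.invocation} invocation limit`);
  }
  if (w.memoryMb > LAMBDA_LIMITS.maxMemoryMb) {
    blockers.push("needs more than 10,240 MB of memory");
  }
  return blockers;
}
```

An empty result only means Lambda is not ruled out by these hard limits; cold starts and cost still need separate consideration.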
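Cold starts also need consideration. Depending on the runtime and package size, a cold start can add several hundred milliseconds to several seconds. VPC-attached functions used to pay an extra ENI-provisioning penalty on each cold start; since AWS moved ENI creation to function configuration time in 2019, that penalty has largely disappeared. For APIs with strict latency requirements, Provisioned Concurrency becomes necessary, at additional cost.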
Where Lambda shines: traffic is irregular/spiky and on average mostly idle. It finishes within 15 minutes. The payload is small. You want to make the resident cost zero.
AWS App Runner: a fully-managed web app/API, minimal ops
App Runner is a fully-managed service that runs an HTTP/HTTPS web app/API just by handing it a container image (or source repository). The load-balancer configuration, ECS cluster, task definition, service definition, capacity provider — all these concepts are hidden.
Scaling is automatic, based on the number of concurrent requests per instance. It shrinks to the minimum instance count you set, but never to zero: at least one provisioned instance remains, and its memory is billed even while idle. VPC connection is possible using a VPC connector, but fine-grained control of SGs/subnets isn't as flexible as ECS. Pricing is a combination of provisioned memory + active-time vCPU.
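The scaling behavior described here can be approximated with a few lines of arithmetic. This is a simplified sketch, not App Runner's actual algorithm: the field names mirror the `MaxConcurrency`, `MinSize`, and `MaxSize` settings of an App Runner auto scaling configuration, while `estimateInstances` is a hypothetical helper.

```typescript
// Mirrors the three knobs of an App Runner auto scaling configuration.
interface ScalingConfig {
  maxConcurrency: number; // concurrent requests one instance should handle
  minSize: number;        // provisioned floor (billed even when idle)
  maxSize: number;        // upper bound on instances
}

// Rough estimate of the instance count for a given load.
export function estimateInstances(concurrentRequests: number, cfg: ScalingConfig): number {
  const needed = Math.ceil(concurrentRequests / cfg.maxConcurrency);
  // Clamp between the provisioned floor and the configured ceiling.
  return Math.min(cfg.maxSize, Math.max(cfg.minSize, needed));
}
```

With zero traffic the estimate stays at `minSize`, which is the structural reason App Runner never reaches zero idle cost.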
Where App Runner shines: you want to ship an HTTP web app / internal API with the fastest, most minimal ops. You can't spare time to write cluster management or IaC. Prototypes, MVPs, internal tools.
AWS Fargate (ECS): serverless, but long-running, arbitrary protocols, full VPC control
Fargate is the serverless launch type of ECS (Elastic Container Service). You specify only CPU and memory; there are no servers to manage and no maximum execution time. It supports arbitrary TCP/UDP protocols, and with awsvpc mode you can fully control ENIs, SGs, and subnets.
Each task has an independent isolation boundary, not sharing the kernel, CPU, memory, or ENI with other tasks. Task CPU/memory is chosen from fixed pairs (0.25–16 vCPU, up to 120 GB). Pricing is metered by vCPU-seconds + memory-seconds (1-minute minimum).
The big strength is that the same task-definition vocabulary covers not only always-on HTTP services but also batch execution via EventBridge, SQS-queue-driven workers, and long-lived WebSocket servers. Fargate auto-scaling, CI/CD (blue-green), and cost optimization each have dedicated articles.
Where Fargate shines: any situation that needs long-running processing, long-lived WebSocket connections, arbitrary protocols, fine-grained VPC network control, or production-grade security isolation.
Comparison table: the three services from nine viewpoints
| Viewpoint | Lambda | App Runner | Fargate (ECS) |
|---|---|---|---|
| Launch model | event-driven (function) | resident (container) | resident or one-shot (container) |
| Max execution time | 15 min (900s) | no limit | no limit |
| Scale to zero | complete zero (no idle billing) | doesn't go to zero (min billing) | doesn't go to zero (min billing) |
| Supported protocols | HTTP/HTTPS-centric (ALB/API GW) | HTTP/HTTPS only | arbitrary TCP/UDP (WebSocket, gRPC, etc.) |
| VPC network control | VPC-attached execution possible | VPC connector (partial) | full control (SG, subnet, ENI) |
| Concurrency | automatic per request (account limit exists) | automatic, based on concurrent requests | task count (desiredCount + Auto Scaling) |
| Pricing model | requests + GB-seconds | provisioned memory + active vCPU | vCPU-seconds + memory-seconds (1-min minimum) |
| Operational weight | lightest (just write a function) | light (just place a container) | medium (need task definition, service, ALB, IaC) |
| Lock-in | high (the execution model is AWS-proprietary) | medium (App-Runner-specific settings increase) | low (the ECS API is close to the container standard) |
Constraint guideline: Lambda's synchronous payload limit is 6MB (per direction, not the sum of request/response). The execution-time limit (15 min) doesn't change even when using a container image.
The structural differences of the pricing models
Since the specific amounts vary greatly by usage volume, region, and commitment, here I organize only the structure of "when, for what, you're billed."
Lambda: only for what you use
Lambda's pricing has two axes: number of requests and execution time (GB-seconds). When the function is idle, billing is zero. For features that get only a few hundred to a few thousand requests a month, batches that run only at night, or workloads with unpredictable event volume, it's structurally advantageous.
However, using Provisioned Concurrency turns that portion into constant billing. Note that once you start using it to avoid cold starts, the cost model approaches Fargate's.
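The two billing axes can be written down as a small formula. The sketch below is illustrative only: the rates are passed in as parameters rather than hard-coded because actual prices vary by region and architecture, the free tier and Provisioned Concurrency are ignored, and all names (`LambdaUsage`, `LambdaRates`, `lambdaCost`) are hypothetical.

```typescript
interface LambdaUsage {
  requests: number;      // invocations in the billing period
  avgDurationMs: number; // average billed duration per invocation
  memoryMb: number;      // allocated (not used) memory
}

// Rates are inputs on purpose: look them up for your region before using this.
interface LambdaRates {
  perRequest: number;
  perGbSecond: number;
}

// GB-seconds = requests × duration in seconds × allocated memory in GB.
export function lambdaGbSeconds(u: LambdaUsage): number {
  return u.requests * (u.avgDurationMs / 1000) * (u.memoryMb / 1024);
}

export function lambdaCost(u: LambdaUsage, r: LambdaRates): number {
  return u.requests * r.perRequest + lambdaGbSeconds(u) * r.perGbSecond;
}
```

Both terms are multiplied by `requests`, so zero traffic yields zero cost, which is exactly the scale-to-zero property.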
App Runner: even when idle, the min instances are billed
App Runner bills continuously for the memory of provisioned instances, and additionally for vCPU while instances are actively handling requests. The longer the idle time, the worse the cost-performance. Conversely, with steady traffic, not having to manage auto scaling yourself can be worth the price.
Fargate: per-second for the allocated amount
Fargate is billed per second (1-minute minimum) for the allocated vCPUs and memory while the task is running. Whether utilization is 100% or 10%, the bill for the allocated amount doesn't change, so over-provisioning is the biggest cost leak. Optimize with a combination of Graviton (ARM64), Fargate Spot, and Compute Savings Plans (details in the cost-optimization article, and for the overall view of FinOps, AWS cost optimization / FinOps).
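To make the "allocated amount, not utilization" point concrete, here is a minimal sketch of the billing structure. The rates are parameters because real prices depend on region, architecture, and commitment; the type and function names are hypothetical.

```typescript
interface FargateTask {
  vcpu: number;       // allocated vCPU (e.g. 0.5)
  memoryGb: number;   // allocated memory in GB
  runtimeSec: number; // how long the task ran
}

// Rates are inputs on purpose: look them up for your region before using this.
interface FargateRates {
  perVcpuSecond: number;
  perGbSecond: number;
}

// Per-second billing with a one-minute minimum.
export function billedSeconds(runtimeSec: number): number {
  return Math.max(60, Math.ceil(runtimeSec));
}

// Cost depends only on what was allocated and for how long, never on utilization.
export function fargateTaskCost(t: FargateTask, r: FargateRates): number {
  return billedSeconds(t.runtimeSec) * (t.vcpu * r.perVcpuSecond + t.memoryGb * r.perGbSecond);
}
```

Doubling both `vcpu` and `memoryGb` doubles the result regardless of how busy the task is, which is why right-sizing comes before any other optimization.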
Per-use-case instant answers
So you don't hesitate, here are the choices for each representative workload.
HTTP API / web app (always on)
- App Runner: the fastest to ship. No need to write ECS or ALB. If you don't need VPC control and only provide HTTPS, the first candidate.
- Fargate: when you need a DB/KMS inside a private VPC, detailed SGs, or gRPC/WebSocket. The prime choice when production-grade control is needed.
- Lambda: when traffic is extremely irregular and you don't want to pay an always-on cost. But beware of cold starts and payload limits.
cron / scheduled batch
- Lambda: the first candidate for a batch that completes within 15 minutes. The combination with EventBridge Scheduler is simple.
- Fargate: a batch that runs longer than 15 minutes, or one whose memory use exceeds several GB. Launch it as a one-shot task with EventBridge Scheduler + the RunTask API, and billing stops when it completes.
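For the one-shot Fargate batch, the request sent to RunTask looks roughly like the object below. This is a sketch under assumptions: the field names follow the ECS RunTask request as exposed by the AWS SDK for JavaScript v3, while the cluster name, task-definition family, subnet and security-group IDs, container name, and the `buildBatchRunTask` helper are all placeholders invented for illustration.

```typescript
// Subset of the ECS RunTask request shape used for a one-shot Fargate batch.
interface RunTaskInput {
  cluster: string;
  taskDefinition: string;
  launchType: "FARGATE";
  count: number;
  networkConfiguration: {
    awsvpcConfiguration: {
      subnets: string[];
      securityGroups: string[];
      assignPublicIp: "ENABLED" | "DISABLED";
    };
  };
  overrides: { containerOverrides: { name: string; command: string[] }[] };
}

// Builds the request for one batch run; every identifier below is a placeholder.
export function buildBatchRunTask(jobDate: string): RunTaskInput {
  return {
    cluster: "batch-cluster",
    taskDefinition: "nightly-report",
    launchType: "FARGATE",
    count: 1,
    networkConfiguration: {
      awsvpcConfiguration: {
        subnets: ["subnet-PLACEHOLDER"],
        securityGroups: ["sg-PLACEHOLDER"],
        assignPublicIp: "DISABLED", // private subnet assumed
      },
    },
    overrides: {
      // Per-run arguments are injected by overriding the container command.
      containerOverrides: [{ name: "app", command: ["node", "dist/batch.js", "--date", jobDate] }],
    },
  };
}
```

An object of this shape could be passed to `RunTaskCommand` from `@aws-sdk/client-ecs`; when EventBridge Scheduler launches the task, the equivalent settings live on the schedule's target instead of in application code.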
event-driven worker (SQS, SNS, EventBridge)
- Lambda: the standard when triggered by a queue. You can manage the polling settings (batch size, window, max concurrency) with a Lambda event source mapping.
- Fargate: when the processing time exceeds 15 minutes, the payload per message exceeds 6MB, or you want to run the SQS worker as an independent container. The payment platform's webhook processing is this pattern.
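For the Lambda side of this pattern, a queue-triggered handler typically reports which messages failed so that only those are retried. The sketch below assumes the event source mapping has `ReportBatchItemFailures` enabled; the event types are trimmed-down local stand-ins for the real SQS event, and `handleBatch` and `processOne` are hypothetical names.

```typescript
// Minimal local stand-ins for the SQS event delivered to Lambda.
interface SqsRecord {
  messageId: string;
  body: string;
}
interface SqsEvent {
  Records: SqsRecord[];
}
// Partial batch response: only the listed messages return to the queue.
interface SqsBatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

export async function handleBatch(
  event: SqsEvent,
  processOne: (body: string) => Promise<void>,
): Promise<SqsBatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      await processOne(record.body);
    } catch {
      // Report the failure instead of throwing, so successful messages are not redelivered.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

Because failed messages are redelivered, `processOne` still has to be idempotent, whether the worker runs on Lambda or as a Fargate container.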
long-running processing (video transcoding, ML inference, large CSV)
- Fargate: close to the only choice. No max-execution-time limit, task memory up to 120GB, and billing stops after the batch completes.
- Lambda + SQS splitting works as a workaround, but the design complexity and debugging cost rise.
WebSocket residency (real-time, bidirectional communication)
- Fargate: can maintain a WebSocket connection for a long time via an NLB. The decisive point is its support for arbitrary TCP protocols.
- App Runner in principle presupposes an HTTP/HTTPS request-response model, and maintaining a long-lived connection isn't guaranteed.
- Lambda: combined with the API Gateway WebSocket API, you can delegate connection management to API Gateway, but the design becomes centered on asynchronous events and state management gets more complex.
ML inference / AI backend
- Short response time (within 5 seconds), stateless scoring: Lambda (if cold starts are acceptable).
- Inference batch without GPU and with several-GB memory, or long-running inference: Fargate.
- When a GPU (NVIDIA, etc.) is needed: Fargate doesn't support GPUs, so this case falls outside the three-way comparison. Consider the ECS EC2 launch type or SageMaker endpoints instead.
internal tool / admin panel
- App Runner: ship the fastest and start using it right away. Integration with Cognito is also simple.
- When authentication/network control is strict (e.g., can only be published inside a VPC): Fargate.
Migration paths: recovering step by step from a wrong choice
A selection isn't completely irreversible; you can migrate in stages.
Lambda → Fargate
The minimal step is rewriting the Lambda function as an HTTP server packaged in a container image. With the Lambda Web Adapter (made by AWS), you can run an existing server such as Express on Lambda almost as-is, and later run the same container image on Fargate too. However, the execution models differ (event-driven vs. resident), and that difference reaches into the code design: idempotency, graceful shutdown, and connection pooling almost always need to be redesigned.
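Independently of the Lambda Web Adapter, one way to keep this migration cheap is to isolate the business logic from the execution model from the start. The following is a minimal sketch of that idea, not the author's implementation: `handleEcho`, `lambdaHandler`, and `createHttpServer` are hypothetical names, and the Lambda event type is reduced to the single field used here.

```typescript
import { createServer } from "node:http";

// Core logic: knows nothing about Lambda events or HTTP sockets.
export function handleEcho(rawBody: string | undefined): { status: number; body: string } {
  try {
    const input: unknown = JSON.parse(rawBody && rawBody.length > 0 ? rawBody : "{}");
    return { status: 200, body: JSON.stringify({ message: "ok", input }) };
  } catch {
    return { status: 400, body: JSON.stringify({ message: "invalid JSON" }) };
  }
}

// Adapter 1: Lambda behind API Gateway (only event.body is used in this sketch).
export const lambdaHandler = async (event: { body?: string }) => {
  const result = handleEcho(event.body);
  return {
    statusCode: result.status,
    headers: { "Content-Type": "application/json" },
    body: result.body,
  };
};

// Adapter 2: resident HTTP server, e.g. for a Fargate task.
export function createHttpServer() {
  return createServer((req, res) => {
    let raw = "";
    req.on("data", (chunk) => { raw += chunk; });
    req.on("end", () => {
      const result = handleEcho(raw);
      res.writeHead(result.status, { "Content-Type": "application/json" });
      res.end(result.body);
    });
  });
}
```

With this split, moving from Lambda to Fargate means swapping the adapter; the parts that still need rethinking are the ones named above, such as shutdown handling and connection lifetime.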
App Runner → ECS (Fargate)
Since both App Runner and ECS on Fargate are based on a container image, the image itself rarely needs rewriting. The main migration cost is the infrastructure layer: writing the ECS task definition, service, ALB, SG, and IAM roles in IaC. Often you can migrate by leaving the container code untouched and arranging the network control and IAM, so "first prototype with App Runner → productionize with ECS" is a reasonable escalation path. However, if you've embedded App-Runner-specific environment-variable names or config files in the app, expect extra work to absorb those differences.
Decision tree
By answering the following questions in order, you can narrow the choices.
Q1. Does the maximum execution time fit within 15 minutes?
├── No → Fargate (no limit)
└── Yes → go to Q2
Q2. Do you need a protocol other than HTTP/HTTPS (WebSocket, gRPC, UDP, etc.)?
├── Yes → Fargate (arbitrary TCP/UDP)
└── No → go to Q3
Q3. Do you need fine-grained SG control for resources inside a private VPC
(RDS, ElastiCache, etc.), or ENI-attached address management?
├── Yes → Fargate (full VPC control with awsvpc)
└── No → go to Q4
Q4. Is the traffic spiky/irregular, with much idle time expected?
├── Yes → Lambda (scale to zero, zero idle billing)
└── No → go to Q5
Q5. Do you want to skip the effort of writing ALB, ECS cluster, task definition, and IaC,
and ship an HTTP API to production the fastest?
├── Yes → App Runner (minimal ops)
└── No → Fargate (the widest room for future expansion)
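The same tree can be expressed as a small function, which is handy for recording the decision in a design document or a test. This is a direct transcription of Q1 to Q5; the `Answers` field names and `chooseService` are hypothetical.

```typescript
type Service = "Fargate" | "Lambda" | "App Runner";

// One boolean per question in the tree (Q1 to Q5).
interface Answers {
  fitsWithin15Min: boolean;            // Q1
  needsNonHttpProtocol: boolean;       // Q2
  needsFineGrainedVpcControl: boolean; // Q3
  spikyWithLongIdle: boolean;          // Q4
  wantsMinimalOps: boolean;            // Q5
}

export function chooseService(a: Answers): Service {
  if (!a.fitsWithin15Min) return "Fargate";           // Q1: No
  if (a.needsNonHttpProtocol) return "Fargate";       // Q2: Yes
  if (a.needsFineGrainedVpcControl) return "Fargate"; // Q3: Yes
  if (a.spikyWithLongIdle) return "Lambda";           // Q4: Yes
  return a.wantsMinimalOps ? "App Runner" : "Fargate"; // Q5
}
```

The order of the checks matters: hard constraints (Q1 to Q3) are evaluated before cost and convenience preferences (Q4 and Q5).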
Decision-aid notes:
- "When in doubt, Fargate" is roughly correct, but the cost and operational weight both rise. When traffic is small and idle is plentiful, Lambda is structurally advantageous.
- App Runner has the smallest control-plane management cost, but the moment "add a VPC connector later" or "change SG settings in detail" comes up, you start to consider migrating to ECS. If you already know from the start that production-grade control is needed, it's more straightforward to begin with Fargate.
- For the ECS vs. EKS comparison (whether Kubernetes is needed), see the ECS vs EKS startup decision framework.
Concrete configuration examples
To get a feel for the difference among the three, let's compare how much configuration the same "ship a Hello API to production" task requires.
App Runner: apprunner.yaml (minimal)
version: 1.0
runtime: nodejs20
build:
  commands:
    build:
      - npm ci --omit=dev
      - npm run build
run:
  command: node dist/server.js
  network:
    port: 3000
    env: APP_PORT
  env:
    - name: NODE_ENV
      value: production
That's it. App Runner hides the ALB, the ECS service, the SGs, and the IAM roles. If you tie it to a source repository, it auto-deploys on every push. The VPC connector, secrets, and custom domain are added in the console or as Terraform resource settings.
Lambda: handler vs. Fargate: resident HTTP server (contrast)
// Lambda handler: event-driven, started on each invocation
import type { APIGatewayProxyHandlerV2 } from "aws-lambda";

export const handler: APIGatewayProxyHandlerV2 = async (event) => {
  const body = JSON.parse(event.body ?? "{}");
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: "ok", input: body }),
  };
};
// The connection pool is reused across invocations (as long as the container is warm)
// However, execution is cut off at 15 minutes
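// Resident HTTP server on Fargate: stays up and waits for requests after startup
import http from "node:http";
import { installGracefulShutdown } from "./graceful-shutdown.js";

// Assumption: `db` (DB connection pool) and `queue` (SQS consumer) are created
// elsewhere in the app; they are only declared here so the sample type-checks.
declare const db: { end(): Promise<void> };
declare const queue: { close(): Promise<void> };

const server = http.createServer((req, res) => {
  if (req.url === "/healthz") {
    res.writeHead(200).end("ok");
    return;
  }
  let body = "";
  req.on("data", (chunk) => { body += chunk; });
  req.on("end", () => {
    try {
      const input = JSON.parse(body || "{}");
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ message: "ok", input }));
    } catch {
      // Malformed JSON must not crash a long-lived process
      res.writeHead(400, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ message: "invalid JSON" }));
    }
  });
});
server.listen(8080, () => console.info({ msg: "listening", port: 8080 }));

// Handle SIGTERM so the task exits cleanly on deploy (mandatory for production Fargate)
installGracefulShutdown(server, {
  drainMs: 50_000, // shorter than stopTimeout (60s)
  onClose: async () => {
    await db.end();      // close the DB connection pool
    await queue.close(); // stop the SQS consumer
  },
});
// No execution-time limit. WebSocket and gRPC can be handled in the same process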
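The `installGracefulShutdown` helper imported from `./graceful-shutdown.js` is not shown in this article. As a rough idea of what such a helper could look like, here is a hypothetical sketch under the same option names (`drainMs`, `onClose`); it is not the author's actual module, and it ignores details such as idle keep-alive connections.

```typescript
import { createServer, Server } from "node:http";

interface ShutdownOptions {
  drainMs: number;              // upper bound for draining in-flight requests
  onClose: () => Promise<void>; // release pools, consumers, etc.
}

// Hypothetical sketch of a graceful-shutdown helper for a resident server.
export function installGracefulShutdown(server: Server, opts: ShutdownOptions): () => Promise<void> {
  let shuttingDown = false;
  const shutdown = async (): Promise<void> => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    // Safety net: force-exit if draining takes longer than drainMs.
    const timer = setTimeout(() => process.exit(1), opts.drainMs);
    timer.unref();
    // Stop accepting new connections and wait for the server to close.
    await new Promise<void>((resolve) => server.close(() => resolve()));
    await opts.onClose();
    clearTimeout(timer);
  };
  // ECS sends SIGTERM first, then SIGKILL once stopTimeout elapses.
  process.on("SIGTERM", () => {
    void shutdown().then(() => process.exit(0));
  });
  return shutdown;
}

// Usage sketch: the server is created here but not started.
const demoServer = createServer((_req, res) => { res.end("ok"); });
export const shutdown = installGracefulShutdown(demoServer, {
  drainMs: 50_000, // keep below the task definition's stopTimeout
  onClose: async () => { /* close DB pool, stop queue consumer */ },
});
```

The key design point is the ordering: stop accepting traffic, let in-flight requests finish, release resources, and only then exit, all within the window that `stopTimeout` allows.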
On Fargate, because the process is resident, you establish the DB connection pool just once at startup and reuse it across all requests. Lambda can reuse the container on a warm start too, but that's best-effort and not guaranteed. If you need connection pooling as a resident service, Fargate is the prime choice.
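On the Lambda side, the usual way to benefit from warm starts is to create the pool lazily at module scope, outside the handler. The sketch below uses a hypothetical in-memory `Pool` stand-in instead of a real database driver so that it stays self-contained; `createPool`, `getPool`, and `poolCount` are illustrative names.

```typescript
// Hypothetical stand-in for a real DB pool, used for illustration only.
interface Pool {
  id: number;
  query(sql: string): Promise<string>;
}

let poolsCreated = 0;
function createPool(): Pool {
  poolsCreated += 1;
  const id = poolsCreated;
  return { id, query: async (sql: string) => `pool ${id}: ${sql}` };
}

// Module scope survives between invocations while the execution environment is warm.
let pool: Pool | undefined;
function getPool(): Pool {
  if (pool === undefined) pool = createPool();
  return pool;
}

export const handler = async (): Promise<{ statusCode: number; body: string }> => {
  // A warm invocation reuses the existing pool; a cold start creates a new one.
  const result = await getPool().query("SELECT 1");
  return { statusCode: 200, body: result };
};

// Exposed only so the reuse can be observed.
export function poolCount(): number {
  return poolsCreated;
}
```

Within one warm environment the pool is created once, but every concurrent environment holds its own pool, so the total number of database connections grows with Lambda concurrency, unlike a fixed set of Fargate tasks.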
The minimal skeleton of a Fargate task definition (JSON)
For Fargate, the configuration equivalent to App Runner's apprunner.yaml is the task definition. Its sheer size shows the difference in "operational weight" compared with App Runner.
{
  "family": "hello-api",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::ACCOUNT_ID:role/hello-api-exec",
  "taskRoleArn": "arn:aws:iam::ACCOUNT_ID:role/hello-api-task",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "ACCOUNT_ID.dkr.ecr.ap-northeast-1.amazonaws.com/hello-api:abc1234",
      "essential": true,
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:ap-northeast-1:ACCOUNT_ID:secret:prod/db"
        }
      ],
      "stopTimeout": 60,
      "healthCheck": {
        "command": ["CMD-SHELL", "wget -q -O - http://localhost:8080/healthz || exit 1"],
        "interval": 15,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 30
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/hello-api",
          "awslogs-region": "ap-northeast-1",
          "awslogs-stream-prefix": "app"
        }
      }
    }
  ]
}
On top of this, you further need to write the ECS service, ALB, target group, SGs, and IAM policies in Terraform. In exchange for this weight, what you get is production-grade control: full VPC control, a deployment circuit breaker, Fargate Spot mixing, ECS Exec, and more.
Selection examples in a real project
Let me show how I use them differently in my practice, for reference.
At the METI-Minister's-Award-winning lumber-distribution B2B SaaS:
- HTTP services (221 endpoints) → Fargate: because connections to RDS (PostgreSQL)/ElastiCache inside the VPC, JWT middleware for Cognito RS256 verification, and detailed SG control were needed.
- Stripe webhook processing → Fargate (SQS worker): it absorbs duplicate and out-of-order webhooks with an idempotency key. Processing one message, including its DB transaction, fits comfortably within the 15-minute limit, but I chose Fargate to prioritize a stable connection pool and strict graceful shutdown.
- Lightweight notification-email sending → Lambda (via EventBridge Pipes): processing triggered by DynamoDB Streams that completes in a few seconds. Idle time is long, and the benefit of scale-to-zero is large.
In my way of building fast, cheap, and safe as "one person × generative AI (Claude Code)," the accuracy of the initial selection determines the speed of everything downstream. Keeping a human verification gate, and choosing the right layer at the selection stage, is the biggest hedge against later rebuilds.
Summary: the starting point of selection is three questions
The selection of AWS Fargate, Lambda, and App Runner is decided by the following three questions.
- Is the max execution time within 15 minutes? → If No, Fargate is the only choice.
- Do you need a protocol other than HTTP/HTTPS, or fine-grained VPC control? → If Yes, Fargate.
- Is traffic spiky with long idle periods, or is minimal ops the priority? → Lambda for the former, App Runner for the latter.
"When in doubt, Fargate" is easy to change later, but the cost and operational weight rise. "First ship with Lambda/App Runner and migrate when stuck" becomes technical debt as a migration cost. Investing accuracy in the first selection protects the team's long-term speed and stability.
The implementation details of production Fargate (task design, networking, deployment, security, cost optimization) are summarized in the AWS ECS on Fargate production guide. If you want to move the design, selection, and construction of a container platform forward fast, cheap, and safe, feel free to get in touch.