友田 陽大
ECS on Fargate in production
AWS
ECS
Fargate
awsvpc
ALB
Service Connect
VPC
Networking
Terraform
Infrastructure

ECS on Fargate Networking Design Complete Guide: Building awsvpc, ALB/NLB, Service Connect, and VPC Endpoints at Production Quality

A systematic walk through ECS on Fargate networking design in real Terraform code: from the essence of awsvpc through ALB/NLB connection, security-group chaining, and VPC-endpoint isolation, to service-to-service communication with Service Connect.

Published
Reading time
18 min read
Author
友田 陽大

In a production build of ECS on Fargate, the layer you get stuck on most is the networking layer. The task starts but won't connect to the ALB, it can't pull the image from ECR, service-to-service communication is unstable. At the root of these problems is a lack of understanding of awsvpc, the one networking mode Fargate runs in.

On a Minister of Economy, Trade and Industry Award-winning lumber-distribution B2B SaaS, I designed, implemented, and operated in production a configuration of API Gateway → NLB → ALB → ECS on Fargate (221 endpoints, private-subnet operation). In stabilizing this stack, networking design was the biggest differentiating factor.

This article shows, end-to-end, the design decisions and Terraform implementation for building at production quality, from the essence of awsvpc through ALB/NLB connection, security-group chaining, private-subnet design, VPC-endpoint isolation, to service-to-service communication with Service Connect. For the Fargate basics (task definitions, deployment, cost, security), read the ECS on Fargate production-operations guide first. This piece specializes in the networking layer.


The whole picture: where does a request pass through?

First, grasp the actual request path with an architecture diagram.

Internet
      │
      ▼
┌─────────────────────────────────────────────┐
│  Public Subnet (ap-northeast-1a / 1c)       │
│  ┌──────────────┐    ┌──────────────────┐   │
│  │  ALB         │    │  NAT Gateway     │   │
│  │  (SG: alb)   │    │  (exit for the   │   │
│  └──────┬───────┘    │   private        │   │
│         │           │   subnet)        │   │
└─────────│───────────└──────────────────┘───┘
          │                    ▲
          │ target_type=ip     │ outbound traffic
          ▼                    │
┌─────────────────────────────────────────────┐
│  Private Subnet (ap-northeast-1a / 1c)      │
│  ┌──────────────────────────────────────┐   │
│  │  ECS Task (ENI + Private IP)         │   │
│  │  SG: task (from alb-sg:8080 only)   │   │
│  │  ┌────────────┐  ┌────────────────┐ │   │
│  │  │  app:8080  │  │ sidecar(Envoy) │ │   │
│  │  └────────────┘  └────────────────┘ │   │
│  └──────────────────────────────────────┘   │
│                                             │
│  VPC Endpoints (Interface / Gateway)        │
│  ecr.api / ecr.dkr / s3 / logs /           │
│  secretsmanager / ssmmessages              │
└─────────────────────────────────────────────┘
          │
          ▼
  AWS services (ECR / CloudWatch / Secrets Manager)

In the lumber-distribution SaaS, API Gateway → NLB is added even before this (see the lumber-industry-dx case study), separating the responsibilities of external publishing and internal L7 routing. This article targets the ECS networking layer from the ALB onward.


The essence of awsvpc: what does it mean for a task to have an ENI?

Fargate is fixed to networkMode: awsvpc. You can't choose the other networking modes (bridge, host). This is not a constraint but a structural consequence of the per-task isolation boundary Fargate provides.

Each Fargate task has its own isolation boundary and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with another task. (Source: AWS Fargate)

In awsvpc mode:

  • A dedicated ENI (Elastic Network Interface) is assigned per task
  • The ENI is given a private IP address within the subnet
  • There is no host port mapping. There's no concept like EC2's bridge mode of "forward the host's port 80 to the container's port 8080"

This consequence of "no host port mapping" directly bears on the ALB configuration.

Why the ALB must be target_type="ip"

When registering a normal EC2 instance as an ALB target, you use target_type="instance". This sends traffic by instance ID and the EC2-side port forwarding does the rest.

A Fargate task has no concept of a host. The task exists only as a private IP tied to an ENI. So the ALB must register targets by IP address directly. This is the reason for target_type="ip".

resource "aws_lb_target_group" "app" {
  # ...
  target_type = "ip"   # Always this on Fargate. "instance" does not work
}

Leaving it as target_type="instance" either makes ALB target registration fail, or leaves the task running while its target never becomes healthy. It's a classic mistake to get stuck on in production.


Load-balancer connection: how to choose ALB vs NLB

A Fargate service supports ALB, NLB, and GWLB. The practical decision criteria are as follows.

| Aspect | ALB (L7) | NLB (L4) |
| --- | --- | --- |
| Protocol | HTTP / HTTPS | TCP / UDP / TLS |
| Routing | Path-based, host header, HTTP method | IP + port only |
| SSL termination | The ALB handles it | NLB pass-through or TLS termination |
| WebSocket | Supported | Supported (TCP) |
| UDP | Not supported | Supported (platform version 1.4+) |
| Static IP | Not supported (DNS only) | Supported (can assign EIP) |
| Typical use | REST API, web app | gRPC, games, IoT, internal NLB→ALB multi-tier |

In the lumber-distribution SaaS, I adopt a two-tier configuration of NLB → ALB → ECS. It's a pattern of assigning a static IP to the NLB to make it API Gateway's private integration endpoint, and doing HTTP routing on the ALB. For a service that's mainly a REST API, a single ALB tier is usually enough.
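
One way to wire an NLB in front of an ALB is an NLB target group with target_type = "alb". The following is a minimal sketch under that assumption, not necessarily how the SaaS above is wired. The names aws_lb.front, aws_lb_target_group.to_alb, and var.nlb_subnet_ids are hypothetical; aws_lb.app and aws_lb_listener.https refer to the ALB defined in the next section.

```hcl
# Hypothetical internal NLB that fronts the ALB
resource "aws_lb" "front" {
  name               = "prod-nlb-front"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.nlb_subnet_ids   # assumed variable
}

# An NLB target group whose single target is an ALB.
# ALB-type target groups accept TCP only, and the port must match an ALB listener port.
resource "aws_lb_target_group" "to_alb" {
  name        = "nlb-to-alb"
  target_type = "alb"
  protocol    = "TCP"
  port        = 443
  vpc_id      = var.vpc_id

  health_check {
    protocol = "HTTPS"
    path     = "/healthz"
  }
}

# Register the ALB itself (by ARN) as the target
resource "aws_lb_target_group_attachment" "alb" {
  target_group_arn = aws_lb_target_group.to_alb.arn
  target_id        = aws_lb.app.arn
  port             = 443

  # The ALB needs a listener on this port before it can be registered
  depends_on = [aws_lb_listener.https]
}

# TCP pass-through on the NLB: TLS still terminates on the ALB
resource "aws_lb_listener" "front_tcp" {
  load_balancer_arn = aws_lb.front.arn
  port              = 443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.to_alb.arn
  }
}
```

The NLB stays a pure L4 hop here, so path and host routing rules remain on the ALB listener.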

Complete ALB Terraform: LB + target group + SG chain + service

# ── ALB ───────────────────────────────────────────────────────────────────

resource "aws_lb" "app" {
  name               = "prod-alb"
  internal           = false   # Internet-facing. Set true for an internal ALB
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
  security_groups    = [aws_security_group.alb.id]

  # Ship access logs to S3 (a must in production)
  access_logs {
    bucket  = var.alb_log_bucket
    prefix  = "alb/prod"
    enabled = true
  }
}

# ── ALB security group ────────────────────────────────────────────────────

resource "aws_security_group" "alb" {
  name_prefix = "prod-alb-"
  vpc_id      = var.vpc_id
  lifecycle { create_before_destroy = true }
}

# Accept HTTPS from the internet
resource "aws_vpc_security_group_ingress_rule" "alb_https" {
  security_group_id = aws_security_group.alb.id
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_ipv4         = "0.0.0.0/0"
}

resource "aws_vpc_security_group_ingress_rule" "alb_https_v6" {
  security_group_id = aws_security_group.alb.id
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_ipv6         = "::/0"
}

# Outbound from the ALB to the tasks (pairs with the task SG's ingress rule)
resource "aws_vpc_security_group_egress_rule" "alb_to_tasks" {
  security_group_id            = aws_security_group.alb.id
  referenced_security_group_id = aws_security_group.task.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}

# ── Task security group: accept traffic only from the ALB's SG ────────────

resource "aws_security_group" "task" {
  name_prefix = "prod-task-"
  vpc_id      = var.vpc_id
  lifecycle { create_before_destroy = true }
}

# Inbound: app port only, from the ALB's SG. Never open 0.0.0.0/0
resource "aws_vpc_security_group_ingress_rule" "task_from_alb" {
  security_group_id            = aws_security_group.task.id
  referenced_security_group_id = aws_security_group.alb.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}

# Outbound: open to everything (reaches ECR / CloudWatch / Secrets Manager via NAT)
# With VPC endpoints in place, you can narrow this to HTTPS (443) only
resource "aws_vpc_security_group_egress_rule" "task_egress" {
  security_group_id = aws_security_group.task.id
  ip_protocol       = "-1"
  cidr_ipv4         = "0.0.0.0/0"
}

# ── Target group: on Fargate, always target_type = "ip" ───────────────────

resource "aws_lb_target_group" "app" {
  name        = "prod-app"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
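  target_type = "ip"   # ← The crux on Fargate. Never set this to "instance"

  # Connection draining (how long to wait for in-flight requests when a task is replaced)
  # The default of 300 seconds is excessive. Shorten it in line with stopTimeout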
  deregistration_delay = 60

  health_check {
    path                = "/healthz"
    protocol            = "HTTP"
    port                = "traffic-port"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 15
    timeout             = 5
    matcher             = "200"
  }
}

# ── HTTPS listener ────────────────────────────────────────────────────────

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

# HTTP → HTTPS redirect (only reachable if the ALB's SG also allows TCP 80)
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

# ── ECS service ───────────────────────────────────────────────────────────

resource "aws_ecs_service" "app" {
  name             = "web-api"
  cluster          = var.cluster_id
  task_definition  = var.task_definition_arn
  desired_count    = 2
  launch_type      = "FARGATE"
  platform_version = "LATEST"

  # Keep 2 tasks healthy throughout the deploy; start up to 4 tasks before stopping the old version
  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # Place tasks in private subnets. No public IP needed
  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false   # false for private subnet + NAT
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
  }

  # Grace period that prevents health-check false positives right after a deploy
  # Set it together with the container's startPeriod
  health_check_grace_period_seconds = 30

  enable_execute_command = true   # Break-glass access via ECS Exec
}
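
One gap in the listing above: the http_redirect listener sits on port 80, but the ALB's SG only admits TCP 443, so plain-HTTP requests would be dropped before they can be redirected. A minimal sketch of the missing rule, assuming you want the redirect reachable from any IPv4 address (the resource name alb_http is hypothetical):

```hcl
# Let plain HTTP reach the ALB so the redirect listener can answer it
resource "aws_vpc_security_group_ingress_rule" "alb_http" {
  security_group_id = aws_security_group.alb.id
  ip_protocol       = "tcp"
  from_port         = 80
  to_port           = 80
  cidr_ipv4         = "0.0.0.0/0"
}
```

If the ALB also serves IPv6 clients, a matching rule with cidr_ipv6 = "::/0" would be needed as well.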

The roles of health_check_grace_period_seconds and deregistration_delay

These two are often confused, but they're in different phases.

  • health_check_grace_period_seconds: the window right after a task starts during which the ECS service scheduler ignores failing load-balancer health checks. The ALB still probes the target; ECS just doesn't act on the result. This prevents the task being judged unhealthy and force-killed before the app's initialization (establishing DB connections, cache warm-up) completes.
  • deregistration_delay: the time to wait for in-flight processing when deregistering a task from the ALB target. It fires on every deploy, scale-in, and task stop. The default is 300 seconds, but since most APIs finish in seconds, it's excessive. Aligning it with stopTimeout to 30–60 seconds is realistic.
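
The container-side counterparts of these two settings live in the task definition, which this article otherwise takes as given (var.task_definition_arn). A minimal sketch of where stopTimeout and startPeriod would go, with values chosen to line up with the listing above. var.execution_role_arn and var.app_image are assumed variables, and the health-check command assumes curl exists in the image.

```hcl
resource "aws_ecs_task_definition" "app" {
  family                   = "web-api"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = var.execution_role_arn   # assumed variable

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = var.app_image   # assumed variable
      essential = true

      portMappings = [{ containerPort = 8080, protocol = "tcp" }]

      # Seconds between SIGTERM and SIGKILL (Fargate allows up to 120)
      stopTimeout = 60

      # Container-level health check; startPeriod shields slow initialization
      healthCheck = {
        command     = ["CMD-SHELL", "curl -fsS http://localhost:8080/healthz || exit 1"]
        interval    = 15
        timeout     = 5
        retries     = 3
        startPeriod = 30
      }
    }
  ])
}
```

Here stopTimeout (60) mirrors deregistration_delay (60), and startPeriod (30) mirrors health_check_grace_period_seconds (30).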

Security-group chaining: a design that doesn't open 0.0.0.0/0

The most important security design in Fargate networking is security-group chaining.

[Internet]
      │ TCP 443
      ▼
[alb-sg]  ── egress: only task-sg:8080
      │ TCP 8080
      ▼
[task-sg] ── ingress: allow only from alb-sg
      │
[task ENI:8080]

The point is to use SG references (referenced_security_group_id) rather than CIDR ranges. A CIDR-based allow leaves gaps when IPs change, but an SG reference dynamically allows traffic from "resources with that SG attached," so it automatically follows the ALB as it scales out and adds IPs.

You must not open 0.0.0.0/0 on the task's SG inbound. Do so, and even though you're in a private subnet, other resources within the VPC can access it directly.

The same principle applies when there are multiple services communicating with each other. Chain "service A's SG → service B's SG (app port only)." Using Service Connect described later, this SG design can be further organized.


Subnet design: private + NAT vs public direct placement

Private subnet + NAT Gateway (the production standard)

The production standard is the pattern of placing tasks in a private subnet and routing outbound traffic via a NAT Gateway.

Private subnet
  └── ECS task (assign_public_ip = false)
        └── → NAT Gateway (public subnet)
              └── → Internet (ECR / CloudWatch / Secrets Manager)

The benefit is minimizing the attack surface. Because the task has no public IP, it can't be reached directly from the internet. Only requests via the ALB arrive.
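
A minimal sketch of the routing behind this pattern, with one NAT Gateway per AZ so that a single AZ failure doesn't take out all outbound traffic. The two map variables (AZ name to subnet ID, sharing the same keys) are assumptions introduced for the example, and domain = "vpc" on aws_eip assumes AWS provider v5 or later.

```hcl
# Assumed inputs: AZ name => subnet ID, with identical keys in both maps
variable "public_subnet_by_az" {
  type = map(string)
}

variable "private_subnet_by_az" {
  type = map(string)
}

resource "aws_eip" "nat" {
  for_each = var.public_subnet_by_az
  domain   = "vpc"
}

# One NAT Gateway in each AZ's public subnet
resource "aws_nat_gateway" "this" {
  for_each      = var.public_subnet_by_az
  allocation_id = aws_eip.nat[each.key].id
  subnet_id     = each.value
}

# One private route table per AZ
resource "aws_route_table" "private" {
  for_each = var.private_subnet_by_az
  vpc_id   = var.vpc_id
}

# Default route of each private subnet points at the NAT in the same AZ
resource "aws_route" "private_default" {
  for_each               = var.private_subnet_by_az
  route_table_id         = aws_route_table.private[each.key].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this[each.key].id
}

resource "aws_route_table_association" "private" {
  for_each       = var.private_subnet_by_az
  subnet_id      = each.value
  route_table_id = aws_route_table.private[each.key].id
}
```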

Public direct placement (assign_public_ip = true)

In dev environments or prototypes, there's also the choice of placing tasks in a public subnet with assign_public_ip = true. Because the task gets a public IP, it can reach ECR and CloudWatch without a NAT. There's no NAT Gateway cost (data processing fee + hourly charge).

But it's not recommended for production. The task's public IP is directly exposed to the internet, so a single SG misconfiguration becomes an exposure. Also, a public IP only helps if the task's subnet routes to an internet gateway, so every task subnet has to be a public subnet, which complicates the VPC design.

The decision guideline: production is private + NAT, the only choice. If the NAT cost worries you, take the direction of reducing it with the VPC endpoints described later.


VPC endpoints: reduce the NAT, or isolate

When a Fargate task in a private subnet pulls an image from ECR or sends logs to CloudWatch, by default it communicates with public endpoints via the NAT Gateway. Set up VPC endpoints and you can close this communication within the VPC.

The list of VPC endpoints Fargate needs

| Endpoint | Type | Use | Necessity |
| --- | --- | --- | --- |
| com.amazonaws.<region>.ecr.api | Interface | ECR API (fetching image metadata) | Required when using ECR |
| com.amazonaws.<region>.ecr.dkr | Interface | Pulling image layers from ECR (Docker Registry API) | Required when using ECR |
| com.amazonaws.<region>.s3 | Gateway | ECR image layers are stored in S3. ecr.dkr alone is insufficient | Required when using ECR |
| com.amazonaws.<region>.logs | Interface | Sending to CloudWatch Logs via the awslogs driver | Required when using CloudWatch Logs |
| com.amazonaws.<region>.secretsmanager | Interface | Secrets Manager injection via secrets[].valueFrom | Required when injecting secrets |
| com.amazonaws.<region>.ssmmessages | Interface | ECS Exec (via SSM Session Manager) | Required when ECS Exec is enabled |

Caveat: if you set up only the ECR endpoints (ecr.api, ecr.dkr) and omit the S3 gateway endpoint, image-layer pulls still go via the NAT, and fail outright in a subnet that has no NAT. Because ECR image layers are stored in S3, the S3 gateway is required as a set with the ECR endpoints.

The Fargate task's own ECS control-plane communication (ecs, ecs-agent, ecs-telemetry) works without endpoints, but ECR, Secrets Manager, and CloudWatch Logs traffic flows to public endpoints unless you set them up.

Terraform: the full set of VPC endpoints

# ── SG for the Interface endpoints ────────────────────────────────────────

resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "vpc-endpoints-"
  vpc_id      = var.vpc_id
  lifecycle { create_before_destroy = true }
}

# Accept HTTPS from inside the VPC only (var.vpc_cidr; narrow to the private subnet CIDRs if you prefer)
resource "aws_vpc_security_group_ingress_rule" "endpoint_https" {
  security_group_id = aws_security_group.vpc_endpoints.id
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_ipv4         = var.vpc_cidr
}

resource "aws_vpc_security_group_egress_rule" "endpoint_egress" {
  security_group_id = aws_security_group.vpc_endpoints.id
  ip_protocol       = "-1"
  cidr_ipv4         = "0.0.0.0/0"
}

# ── S3 gateway endpoint (required for ECR layer pulls) ────────────────────

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}

# ── Interface endpoints ───────────────────────────────────────────────────

locals {
  interface_endpoints = [
    "ecr.api",
    "ecr.dkr",
    "logs",
    "secretsmanager",
    "ssmmessages",
  ]
}

resource "aws_vpc_endpoint" "interface" {
  for_each = toset(local.interface_endpoints)

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true   # Keep DNS resolution inside the VPC
}

private_dns_enabled = true is important. This resolves the services' public DNS names, such as api.ecr.ap-northeast-1.amazonaws.com, to the endpoint ENIs' private IPs inside the VPC, so you can isolate without changing the app or task definition at all.
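
With the endpoints in place, the task's allow-all egress rule can be narrowed. A minimal sketch, assuming the task talks to nothing but these AWS services over HTTPS. It would replace task_egress rather than sit next to it, and any other destination (databases, other services, external APIs) needs a rule of its own. Both rule names are hypothetical.

```hcl
# HTTPS to the Interface endpoints' ENIs, by SG reference
resource "aws_vpc_security_group_egress_rule" "task_to_endpoints" {
  security_group_id            = aws_security_group.task.id
  referenced_security_group_id = aws_security_group.vpc_endpoints.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 443
}

# HTTPS to S3 through the Gateway endpoint, addressed by its managed prefix list
resource "aws_vpc_security_group_egress_rule" "task_to_s3" {
  security_group_id = aws_security_group.task.id
  prefix_list_id    = aws_vpc_endpoint.s3.prefix_list_id
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
}
```

The S3 rule uses a prefix list because a Gateway endpoint has no ENI or SG of its own to reference.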

Does the NAT Gateway become unnecessary?

Even setting up all the VPC endpoints, you can't necessarily abolish the NAT Gateway completely. If the task calls non-AWS external APIs (Stripe, SendGrid, etc.), you still need the NAT. If your use case is "purely AWS services only," a NAT-less isolated configuration is possible, but it's a rare case.

The realistic decision: the production standard is to keep the NAT Gateway while dropping ECR, CloudWatch, and Secrets Manager traffic internally with VPC endpoints, reducing the NAT's data-processing cost and bandwidth dependence.


Service-to-service communication: Service Connect vs Cloud Map

In a microservices configuration, you need internal communication where service A calls service B. There are several choices.

| Method | Mechanism | Pros | Cons |
| --- | --- | --- | --- |
| Internal ALB | Stand up an internal ALB for each service | Can use the ALB's routing features | More resources, cost, management complexity |
| Cloud Map (DNS) | DNS resolution with a Route 53 private hosted zone | Simple | No retries, timeouts, or metrics |
| Service Connect | Inject Envoy as a sidecar, resolve by logical name | DNS + automatic retries + metrics. ECS-managed | Only reaches ECS services that share the same Service Connect namespace |

Service Connect is recommended. It achieves loose coupling without adding internal ALBs, and Envoy automatically collects connection metrics and publishes them to CloudWatch under the AWS/ECS namespace, which also improves observability.

How Service Connect works

Amazon ECS Service Connect provides management of service-to-service communication as Amazon ECS configuration. It builds both service discovery and a service mesh in Amazon ECS. (Source: ECS Service Connect)

Service Connect has ECS automatically inject an Envoy proxy as a sidecar (you don't need to explicitly add it to the container definition). The client-side task can call by logical name (http://order-api:8080/orders), and Envoy handles name resolution, load balancing, retries, and timeouts.

Configuration in Terraform + task definition

Service Connect is configured in the ECS service's service_connect_configuration block.

# ── Service A (order-api) publishes itself through Service Connect ────────

resource "aws_ecs_service" "order_api" {
  name             = "order-api"
  cluster          = var.cluster_id
  task_definition  = var.order_api_task_definition_arn
  desired_count    = 2
  launch_type      = "FARGATE"
  platform_version = "LATEST"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }

  service_connect_configuration {
    enabled   = true
    namespace = var.cloud_map_namespace_arn   # Created beforehand with aws_service_discovery_http_namespace

    # The endpoint this service publishes
    service {
      port_name      = "http"            # Must match portMappings[].name in the task definition
      discovery_name = "order-api"       # Logical name used as the DNS name
      client_alias {
        port     = 8080
        dns_name = "order-api"           # Other tasks reach the service by this name
      }
    }
  }
}

# ── Service B (inventory-api) is the side that calls order-api ────────────

resource "aws_ecs_service" "inventory_api" {
  name             = "inventory-api"
  cluster          = var.cluster_id
  task_definition  = var.inventory_api_task_definition_arn
  desired_count    = 2
  launch_type      = "FARGATE"
  platform_version = "LATEST"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.task.id]
    assign_public_ip = false
  }

  service_connect_configuration {
    enabled   = true
    namespace = var.cloud_map_namespace_arn

    # The client side omits the service block (it publishes nothing)
    # The auto-injected Envoy brokers traffic to order-api:8080
  }
}

On the task-definition side, you need to give portMappings a name.

{
  "containerDefinitions": [
    {
      "name": "app",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp",
          "name": "http",
          "appProtocol": "http"
        }
      ]
    }
  ]
}

With this, when you throw an HTTP request to http://order-api:8080/orders from inside inventory-api, Envoy acts as an interceptor and automatically handles connection pooling, retries, and timeouts.
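
Both services above receive the namespace as var.cloud_map_namespace_arn, so it has to exist before they are created. A minimal sketch of that prerequisite; the namespace name and the output are hypothetical, and how the ARN is passed between modules is up to your layout.

```hcl
# HTTP namespace that Service Connect registers services into
resource "aws_service_discovery_http_namespace" "internal" {
  name        = "prod-internal"   # hypothetical name
  description = "Service Connect namespace for the production services"
}

# Expose the ARN so it can be wired into var.cloud_map_namespace_arn
output "cloud_map_namespace_arn" {
  value = aws_service_discovery_http_namespace.internal.arn
}
```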

Organize service-to-service SGs with Service Connect

Even when using Service Connect, SG design is still needed. When services communicate within the same cluster, allow the relevant port from the sending task's SG to the receiving task's SG.

# Allow traffic from inventory-api to order-api
resource "aws_vpc_security_group_ingress_rule" "order_from_inventory" {
  security_group_id            = aws_security_group.order_task.id
  referenced_security_group_id = aws_security_group.inventory_task.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
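
The rule above references aws_security_group.order_task and aws_security_group.inventory_task, which the earlier listings don't define (both services there attach the shared aws_security_group.task). A minimal sketch assuming one SG per service; each service's network_configuration would then list its own SG instead of the shared one.

```hcl
resource "aws_security_group" "order_task" {
  name_prefix = "prod-order-task-"
  vpc_id      = var.vpc_id
  lifecycle { create_before_destroy = true }
}

resource "aws_security_group" "inventory_task" {
  name_prefix = "prod-inventory-task-"
  vpc_id      = var.vpc_id
  lifecycle { create_before_destroy = true }
}

# Matching egress on the caller; only needed when its egress is not allow-all
resource "aws_vpc_security_group_egress_rule" "inventory_to_order" {
  security_group_id            = aws_security_group.inventory_task.id
  referenced_security_group_id = aws_security_group.order_task.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
```

Splitting the SGs per service is what makes the chain meaningful: with one shared SG, every service could reach every other service on the allowed port.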

When to use Cloud Map instead

Cloud Map is simple DNS-based service discovery using a Route 53 private hosted zone. It updates A records each time a task in the ECS service starts or stops.

  • If simple DNS resolution alone is enough with no need for retries, timeouts, or metrics, Cloud Map suffices
  • If you want connection observability, circuit breakers, or fine-grained timeout control, choose Service Connect

For a new build, I recommend Service Connect. It has more configuration than Cloud Map, but Envoy's metrics (connection count, error rate, latency) are directly usable for production observation.
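
For comparison, the Cloud Map route could look like the following. This is a sketch of the alternative, not something to run alongside the Service Connect version of order-api; the namespace name prod.local and the resource names are hypothetical.

```hcl
# Private DNS namespace backed by a Route 53 private hosted zone
resource "aws_service_discovery_private_dns_namespace" "internal" {
  name = "prod.local"   # hypothetical name
  vpc  = var.vpc_id
}

# One A record per running task, all returned to the caller
resource "aws_service_discovery_service" "order_api" {
  name = "order-api"

  dns_config {
    namespace_id   = aws_service_discovery_private_dns_namespace.internal.id
    routing_policy = "MULTIVALUE"

    dns_records {
      ttl  = 10
      type = "A"
    }
  }
}

resource "aws_ecs_service" "order_api_dns" {
  name            = "order-api"
  cluster         = var.cluster_id
  task_definition = var.order_api_task_definition_arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.task.id]
  }

  # ECS registers and deregisters each task's private IP here
  service_registries {
    registry_arn = aws_service_discovery_service.order_api.arn
  }
}
```

Callers would then use http://order-api.prod.local:8080 and have to bring their own retries and timeouts, which is the trade-off described above.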


Caveats when using an NLB

When you need L4 communication (gRPC, WebSocket over TCP, UDP) or a static IP, use an NLB. Let me summarize the Fargate + NLB-specific caveats.

resource "aws_lb" "internal" {
  name               = "prod-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids
}

resource "aws_lb_target_group" "nlb_app" {
  name        = "nlb-app"
  port        = 8080
  protocol    = "TCP"
  vpc_id      = var.vpc_id
  target_type = "ip"   # ip is required for Fargate behind an NLB too

  deregistration_delay = 30

  health_check {
    protocol            = "HTTP"
    path                = "/healthz"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 10
  }
}

When using UDP, Fargate platform version 1.4.0 or later is required (for Linux tasks, platform_version = "LATEST" currently resolves to 1.4.0). Earlier platform versions don't support UDP.

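Also, the source address the task sees depends on client IP preservation. For target_type = "ip" with TCP, preservation is off by default, so traffic (health checks included) arrives from the NLB's private IPs and the task's SG must allow the CIDRs of the NLB's subnets. With preservation enabled, the task sees the real client addresses and those must be allowed instead. An NLB created without a security group can't be referenced by SG the way an ALB can, so it's a CIDR allow.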

# Allow the range the NLB's traffic arrives from (NLB without a security group)
resource "aws_vpc_security_group_ingress_rule" "task_from_nlb_cidr" {
  security_group_id = aws_security_group.task.id
  ip_protocol       = "tcp"
  from_port         = 8080
  to_port           = 8080
  cidr_ipv4         = var.vpc_cidr   # A CIDR covering the subnets the NLB sits in
}
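
Network Load Balancers can also be created with a security group attached, which makes an SG reference possible in place of the CIDR rule. A minimal sketch assuming a fresh NLB (security groups have to be attached when the NLB is created) and clients inside the VPC; every resource name here is hypothetical, and the listener is included because a target group without one receives no traffic.

```hcl
resource "aws_security_group" "nlb" {
  name_prefix = "prod-nlb-"
  vpc_id      = var.vpc_id
  lifecycle { create_before_destroy = true }
}

# Clients inside the VPC → NLB
resource "aws_vpc_security_group_ingress_rule" "nlb_from_vpc" {
  security_group_id = aws_security_group.nlb.id
  ip_protocol       = "tcp"
  from_port         = 8080
  to_port           = 8080
  cidr_ipv4         = var.vpc_cidr
}

# NLB → tasks
resource "aws_vpc_security_group_egress_rule" "nlb_to_tasks" {
  security_group_id            = aws_security_group.nlb.id
  referenced_security_group_id = aws_security_group.task.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}

resource "aws_lb" "internal_with_sg" {
  name               = "prod-nlb-sg"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids
  security_groups    = [aws_security_group.nlb.id]
}

resource "aws_lb_listener" "nlb_tcp" {
  load_balancer_arn = aws_lb.internal_with_sg.arn
  port              = 8080
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.nlb_app.arn
  }
}

# Task side: same chaining pattern as with the ALB
resource "aws_vpc_security_group_ingress_rule" "task_from_nlb_sg" {
  security_group_id            = aws_security_group.task.id
  referenced_security_group_id = aws_security_group.nlb.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
```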

To filter traffic with a WAF, you need a combination with an ALB. For details, see the WAF defense-in-depth guide.


Design checklist

Items to always confirm before releasing to production.

awsvpc / ALB basics

  • Is the task definition's networkMode awsvpc (fixed for Fargate, but confirm explicitly)
  • Is the ALB/NLB target group's target_type = "ip"
  • Do the ALB and Fargate task have separate SGs, chained with SG references
  • Is there no 0.0.0.0/0 on the task's SG inbound
  • Is health_check_grace_period_seconds set, accounting for the app's initialization time
  • Is deregistration_delay shortened to align with stopTimeout (the default 300 seconds is excessive)

Subnet / routing

  • Are tasks placed in a private subnet with assign_public_ip = false
  • Does a NAT Gateway exist in each AZ (a single-AZ NAT is a SPOF)
  • Does the private route table have a route to the NAT Gateway

VPC endpoints

  • If using ECR, are the 3-piece set of ecr.api, ecr.dkr, and the S3 gateway in place
  • If using CloudWatch Logs, is there a logs endpoint
  • If using secrets[].valueFrom, is there a secretsmanager endpoint
  • If enabling ECS Exec, is there an ssmmessages endpoint
  • Does the Interface endpoints' SG allow TCP 443 from the private subnet CIDR
  • Is private_dns_enabled = true

Service Connect / service-to-service

  • If there are multiple services, are they resolved with Service Connect or Cloud Map without adding internal ALBs
  • If using Service Connect, is a Cloud Map namespace (HTTP namespace) created
  • Are name and appProtocol set on the task definition's portMappings
  • Is mutual communication allowed between services' SGs (chained with SG references)

Summary

Fargate's networking unfolds entirely from the single point of awsvpc.

  • awsvpc + target_type=ip is the absolute rule to grasp first
  • With SG chaining (ALB SG → task SG), never open 0.0.0.0/0 to the task
  • Private subnet + NAT is the safe side for production. Isolate communication to major AWS services with VPC endpoints to cut cost and attack surface
  • With Service Connect, make service-to-service communication loosely coupled and prevent the proliferation of internal ALBs

The reason I can stably operate a 221-endpoint lumber-distribution SaaS with API Gateway → NLB → ALB → ECS on Fargate is that I'm thorough with this design at each layer. Build the networking layer correctly, and you can clearly separate app-layer problems from infra-layer problems, and the speed of troubleshooting goes up too.

For cost optimization (Spot, Graviton, Savings Plans), see the ECS on Fargate cost-optimization guide; for investigating the cause when a task stops, the ECS on Fargate troubleshooting guide. The full configuration of this award-winning SaaS on this portfolio is introduced in detail in the lumber-industry-dx case study. If you'd like to move forward together on designing and building a Fargate production foundation, please reach out from there.


友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.
