# Azure Container Apps Production Operations Guide: Designing, Scaling, Deploying, Costing, and Securing Serverless Containers, with Real Code

> A production operations guide for Azure Container Apps faithful to the official Microsoft Learn docs. From the configuration of environments, revisions, and replicas, to zero-scaling with KEDA, Ingress (automatic HTTPS, 240-second timeout), graceful shutdown via SIGTERM, managed identities and Key Vault references, and Consumption/Dedicated cost design — systematized with Bicep, Terraform, az CLI, and real code.

- Published: 2026-06-26
- Author: 友田 陽大
- Tags: Azure, Container Apps, Containers, Serverless, KEDA, Infrastructure, Cost Optimization, Observability
- URL: https://tomodahinata.com/en/blog/azure-container-apps-production-guide
- Category: Azure Container Apps in production

## Key points

- Azure Container Apps is a serverless foundation for 'running containers without worrying about server configuration, container orchestration, or deployment details.' It's built on Kubernetes, KEDA, Dapr, and Envoy, but does not expose the Kubernetes API directly
- Production quality rests on four things: 'proper KEDA scale rules (HTTP or event-driven) that avoid the zero-scale trap,' 'understanding Ingress's automatic HTTPS and its 240-second request timeout,' 'graceful shutdown that handles SIGTERM within 30 seconds,' and 'removing credentials from code with managed identity + Key Vault references'
- A revision is an immutable snapshot. Single-revision mode does zero-downtime automatic switchover; multiple-revision mode realizes Blue/Green, canary, and A/B via traffic splitting
- Cost is metered by vCPU-seconds, GiB-seconds, and HTTP requests. Each month, per subscription, 180,000 vCPU-seconds, 360,000 GiB-seconds, and 2 million requests are free, and you aren't billed during zero-scale. When running with minimum replicas, the idle rate applies
- Networking defaults to a workload-profile environment (/27, UDR, NAT Gateway, Private Endpoint support). Choose your cost and hardware via the Consumption/Dedicated/Flex profiles

---

"I want to run containers in production. But I don't want to deal with the hassle of a Kubernetes cluster, and I can't spare time for patching nodes or scaling either" — when you assemble a production container foundation in a startup or as a solo developer, you almost always arrive here. Azure's answer to that is **Azure Container Apps (ACA)**.

I've run serverless containers in production on top of AWS Fargate. Both the [economic-ministry-award-winning lumber-distribution B2B SaaS](/case-studies/lumber-industry-dx) (an `API Gateway → NLB → ALB → ECS` setup with 221 APIs) and the worker fleet of the [payment platform with zero double-charges in production](/case-studies/payment-platform-reliability) run on containers without touching a single server. Azure Container Apps is the Azure version of that philosophy. The crux of "after choosing Fargate on AWS, how do you build for production" ports to ACA almost as-is.

This article aims to be **faithful to the [official Microsoft Learn docs](https://learn.microsoft.com/en-us/azure/container-apps/overview), yet clearer than the official docs**, and to show with real code "where and how to use it." From environment design, scaling, Ingress, revisions, resilience, security, to cost, it covers end-to-end what's needed to ship to production. A direct comparison with AWS Fargate is gathered in a separate article, [Azure Container Apps vs. AWS ECS on Fargate: A Thorough Serverless Container Comparison](/blog/azure-container-apps-vs-aws-ecs-fargate-serverless-container-comparison-guide). This piece concentrates on **"after choosing ACA, how do you build for production."**

---

## What Is Azure Container Apps: The Official Definition

The official definition is simple.

> Azure Container Apps is a serverless platform that allows you to maintain less infrastructure and save costs while running containerized applications. Instead of worrying about server configuration, container orchestration, and deployment details, Container Apps provides all the up-to-date server resources required to keep your applications stable and secure.（— [Azure Container Apps overview](https://learn.microsoft.com/en-us/azure/container-apps/overview)）

In other words, ACA is **a serverless foundation for concentrating only on running containers, without worrying about server configuration, container orchestration, or deployment details**. OS patches, Kubernetes version upgrades, node scaling — the platform side takes care of all of it.

The representative use cases the official docs list are four.

- **Deploying API endpoints** (HTTP APIs, web apps)
- **Hosting background processing jobs** (batch)
- **Event-driven processing** (queue-/event-source-triggered)
- **Running microservices**

### What It's Built On

ACA is not proprietary technology; it's **a managed layer built on proven OSS**. That is reassuring: if you outgrow it, the step down into plain Kubernetes is a continuous one rather than a rewrite.

> Powered by Kubernetes and open-source technologies like [Dapr](https://dapr.io/), [KEDA](https://keda.sh/), and [envoy](https://www.envoyproxy.io/).（— [Comparing Container Apps with other Azure container options](https://learn.microsoft.com/en-us/azure/container-apps/compare-options)）

- **Kubernetes**: the foundation of orchestration. But, as noted below, it doesn't let you touch the API directly.
- **KEDA** (Kubernetes Event-driven Autoscaling): the brain of scaling. Scales on diverse metrics like HTTP, queues, and CPU/memory.
- **Dapr** (Distributed Application Runtime): abstracts microservice service invocation, state management, and Pub/Sub.
- **Envoy**: the HTTP proxy at the edge. Handles TLS termination and routing.

And the most important boundary line is this.

> Azure Container Apps doesn't provide direct access to the underlying Kubernetes APIs. If you require access to the Kubernetes APIs and control plane, you should use [Azure Kubernetes Service](https://learn.microsoft.com/en-us/azure/aks/what-is-aks).（— [compare-options](https://learn.microsoft.com/en-us/azure/container-apps/compare-options)）

**ACA does not expose the Kubernetes API directly.** No CRDs, no kubectl, no Helm. This "concealment" is precisely ACA's value. Without operational Kubernetes knowledge, you get a managed experience based on Kubernetes best practices. Conversely, if you want to swap the `Ingress` controller, run a `DaemonSet`, or install an Operator — i.e., **if you need the Kubernetes control plane itself, choose AKS over ACA**.

---

## When to Use It: Comparison with Other Azure Container Options

Azure has multiple options to "run containers." Let me organize the official [comparison doc](https://learn.microsoft.com/en-us/azure/container-apps/compare-options) into a form usable for decisions.

| Service | In one line | When to choose it |
|---------|------------|------------|
| **Azure Container Apps** | Serverless container / microservices / job foundation | Run general containers, microservices, event-driven workloads, and jobs **without K8s operations**. **When in doubt, here.** |
| **Azure Kubernetes Service (AKS)** | Fully managed Kubernetes | When you want **direct access** to the Kubernetes API and control plane. Any K8s workload. |
| **Azure App Service** | PaaS for web apps | Optimized for websites and web APIs. **If web-centric**, this. |
| **Azure Container Instances (ACI)** | A low-level building block for a single Pod | When you assemble scale, LB, and certificates **yourself**. When you want a lower-level building block. |
| **Azure Functions** | Serverless FaaS | An event-driven **function programming model**. Triggers + bindings. |

The official explanation of the difference between ACI and ACA is excellent.

> Concepts like scale, load balancing, and certificates aren't provided with ACI containers. For example, to scale to five container instances, you create five distinct container instances. Azure Container Apps provide many application-specific concepts on top of containers, including certificates, revisions, scale, and environments.（— [compare-options](https://learn.microsoft.com/en-us/azure/container-apps/compare-options)）

With ACI, scaling to five instances means creating five instances yourself. ACA is a more complete platform that layers application-level concepts (**certificates, revisions, scale, environments**) on top of containers.

> **A correspondence table for those with AWS experience**: `ACA ≈ AWS Fargate / App Runner`, `AKS ≈ EKS`, `ACI ≈ ECS one-off task (RunTask)`, `Azure Functions ≈ AWS Lambda`. The Fargate feeling of "write a task definition and leave it to a Service" is almost the same as ACA's "define a container and leave it to an Environment." For details, go to the [cross-cloud comparison article](/blog/azure-container-apps-vs-aws-ecs-fargate-serverless-container-comparison-guide).

---

## The Core Building Blocks: The Relationship of Four Players

ACA has a lot of terminology, which can be confusing at first. Four concepts are essential.

```text
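Environment (a secure boundary: groups multiple apps/jobs, sharing a VNet and a logging destination)
└── Container App (one service: the unit in which you declare desired state)
    └── Revision (an immutable snapshot of the app = one version)
        └── Replica (a running instance; the count goes up and down with scaling)
            └── Container (your app's image, plus optional sidecar/init containers)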
```

### Environment = A Secure Boundary

> A Container Apps environment is a secure boundary around one or more container apps and jobs. The Container Apps runtime manages each environment by handling OS upgrades, scale operations, failover procedures, and resource balancing.（— [Azure Container Apps environments](https://learn.microsoft.com/en-us/azure/container-apps/environment)）

An environment is a **security boundary**. The runtime handles OS upgrades, scaling, failover, and resource balancing entirely. The docs continue:

> When multiple container apps are in the same environment, they share the same virtual network and write logs to the same logging destination.

**Apps in the same environment share a VNet and a logging destination (Log Analytics).** This is the deciding factor in environment design.

- **Group into a single environment**: when you want to manage related services together, place them in the same VNet, communicate via Dapr service invocation, and share the logging destination.
- **Separate environments**: when you absolutely don't want to share compute resources, or want to isolate by team / use (production vs. test).

In practice, the canonical setup is to make "production" and "staging" separate environments, and within the production environment co-locate multiple microservices (public API, internal worker, admin panel). It's like compressing the AWS "VPC + ECS cluster" into a single `Environment`.

> ⚠️ **The environment's auto-deletion policy**: if an environment is idle for 90 days (no active apps/jobs), or stays in a failed state due to misconfiguration of the VNet/Azure Policy, **the environment is automatically deleted**. Add a rule to your runbook: don't leave test environments lying idle, and keep at least one app active in any environment you intend to keep.

---

## Resource Design: CPU and Memory Are "Fixed Combinations"

This is ACA's (Consumption plan) biggest pitfall. **CPU and memory are not a free combination — you can only choose from predetermined pairs.** The total of all containers in the container app (including sidecars) must match one of the following ([the official combination table](https://learn.microsoft.com/en-us/azure/container-apps/containers#vcpu-and-memory-allocation-requirements)).

| vCPU (cores) | Memory | | vCPU (cores) | Memory |
|------------|-------|--|------------|-------|
| `0.25` | `0.5Gi` | | `2.0` | `4.0Gi` |
| `0.5` | `1.0Gi` | | `2.25` | `4.5Gi` |
| `0.75` | `1.5Gi` | | `2.5` | `5.0Gi` |
| `1.0` | `2.0Gi` | | `2.75` | `5.5Gi` |
| `1.25` | `2.5Gi` | | `3.0` | `6.0Gi` |
| `1.5` | `3.0Gi` | | `3.5` | `7.0Gi` |
| `1.75` | `3.5Gi` | | `4.0` | `8.0Gi` |

The rule is simple: **Memory (GiB) = vCPU × 2**. It's the same idea as Fargate, where choosing `1024` CPU units sets the minimum memory to 2 GB, except that in ACA the ratio is fixed outright. A workload that needs only 512 MiB of memory but 2 cores of CPU ends up paying for 4 GiB of memory as well. So **decide the size after measuring**. If you pick a large size on a guess, you keep getting billed per second for resources you don't use.
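
As a minimal sketch of "measure first, then pick a size," the helper below picks the smallest pair from the table above that covers a measured requirement. The function name `smallest_fitting_size` and the choice to encode the table as a Python set are illustrative assumptions; nothing here comes from an Azure SDK.

```python
# (vCPU, memory in GiB) pairs copied from the table in this article.
ALLOWED_PAIRS = {
    (0.25, 0.5), (0.5, 1.0), (0.75, 1.5), (1.0, 2.0), (1.25, 2.5),
    (1.5, 3.0), (1.75, 3.5), (2.0, 4.0), (2.25, 4.5), (2.5, 5.0),
    (2.75, 5.5), (3.0, 6.0), (3.5, 7.0), (4.0, 8.0),
}


def smallest_fitting_size(needed_vcpu: float, needed_memory_gib: float) -> tuple[float, float]:
    """Return the smallest listed (vCPU, GiB) pair that covers both measured needs."""
    candidates = [
        pair for pair in ALLOWED_PAIRS
        if pair[0] >= needed_vcpu and pair[1] >= needed_memory_gib
    ]
    if not candidates:
        raise ValueError("no listed Consumption size covers this requirement")
    # Pairs grow together, so the tuple minimum is the cheapest fitting size.
    return min(candidates)
```

With this sketch, the "2 cores but only 512 MiB" workload resolves to `(2.0, 4.0)`, which makes the extra memory you will pay for visible before you deploy.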

> **The cap of a Consumption-only environment**: in a legacy *Consumption only* environment, you're limited to **a maximum of 2 cores and 4Gi per app** ([official](https://learn.microsoft.com/en-us/azure/container-apps/containers)). If you want up to 4 cores and 8GiB, or larger hardware, choose the default **workload-profile environment** (covered later).

### Image and Storage Constraints

- **Image**: only `linux/amd64` (x86-64) Linux images. Pullable from any public/private registry.
- **No privileged containers**: privileged mode with host-level access can't be used.
- **Maximum image size**: in a Consumption workload profile, a total of up to **8 GB** per app/job replica.
- **On crash**: `If a container crashes, it automatically restarts.` ([official](https://learn.microsoft.com/en-us/azure/container-apps/containers)) — if a container goes down, it restarts automatically.

---

## Container Definition: Tag Discipline, Sidecars, Init Containers

Containers are defined in the `containers` array of `properties.template`. The first thing to nail down is **tag discipline**.

> Avoid using static tags like `latest` for container images. Using static tags can lead to caching problems and can make your app difficult to troubleshoot. Instead, use unique tags for each deployment, such as a Git hash or date and time to ensure that updates are properly tracked and deployed.（— [Containers in Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/containers)）

**Don't use the `latest` tag in production.** Apply a unique tag with a Git hash or date-time. This is an iron rule not limited to ACA, but in ACA — due to the "revision = immutable snapshot" design — tag uniqueness is directly tied to traceability.
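
One way to make tags unique is to combine the build date with a short Git hash, which is the shape used by the image references in this article. The sketch below is an assumption about how a CI script might do that; `build_image_ref` is a hypothetical helper, not an Azure or Docker API.

```python
from datetime import date


def build_image_ref(registry: str, repository: str, git_sha: str, build_date: date) -> str:
    """Build a unique image reference such as registry/repo:YYYY-MM-DD-<short sha>."""
    short_sha = git_sha[:7]
    if len(short_sha) < 7:
        raise ValueError("expected at least 7 characters of the Git commit hash")
    tag = f"{build_date.isoformat()}-{short_sha}"
    return f"{registry}/{repository}:{tag}"
```

Because the tag changes with every commit, each revision points at exactly one image, and a rollback is a matter of redeploying a known tag.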

### Sidecar and Init Containers

Most apps are a single container, but in advanced scenarios you can define multiple.

- **Sidecar container**: a **tightly-coupled** auxiliary process to the main container (log forwarding via a shared volume, cache updates, etc.). Add it to the `containers` array.
- **Init container**: an initialization process that runs **before** the main container and must succeed before the main container starts (downloading data, migrations, etc.). Define it in the `initContainers` array.

> Most microservices should be "one service = one container app." The official docs state it plainly: `For most microservice scenarios, the best practice is to deploy each service as a separate container app.` Reserve multi-container apps for tightly-coupled cases that genuinely share a lifecycle (the single-responsibility principle).

---

## Scaling: Horizontal Autoscaling with KEDA

This is the heart of ACA.

> To support this scaling behavior, Azure Container Apps uses KEDA (Kubernetes Event-driven Autoscaling). KEDA supports scaling against a variety of metrics like HTTP requests, queue messages, CPU and memory load, and event sources like Azure Service Bus, Azure Event Hubs, Apache Kafka, and Redis.（— [Scaling in Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/scale-app)）

Scaling is a combination of **limits (lower/upper bounds), rules (conditions), and behavior**.

| Scale bound | Default | Min | Max |
|--------------|-------|-----|-----|
| Min replicas / revision | `0` | `0` | `1,000` |
| Max replicas / revision | `10` | `1` | `1,000` |

Scale rules come in three categories.

- **HTTP**: concurrent HTTP request count (`concurrentRequests`, default 10). Computed every 15 seconds as "the request count over the past 15 seconds ÷ 15."
- **TCP**: concurrent TCP connection count (`concurrentConnections`).
- **Custom**: CPU, memory, KEDA scalers (Service Bus / Event Hubs / Kafka / Redis / Queue Storage, etc.).

If you define no rules, the **default scale rule (HTTP, min 0, max 10)** applies.

### Zero-Scale: The Biggest Appeal and the Biggest Trap

ACA's headline feature is **zero-scale**. `Most applications can scale to zero.` ([official](https://learn.microsoft.com/en-us/azure/container-apps/overview)) — drop replicas to 0 when idle, and wake them when a request arrives. You aren't billed during that time.

But there are two important cautions.

> ⚠️ **CPU/memory scaling can't go to zero**: `Applications that scale on CPU or memory load can't scale to zero.` ([official](https://learn.microsoft.com/en-us/azure/container-apps/overview)) This is because, to measure CPU/memory load, a replica must be running in the first place. If you want zero-scale, use an HTTP or event-driven rule.

> 🚨 **The self-destruct of Ingress-disabled + zero-scale**: `Make sure you create a scale rule or set minReplicas to 1 or more if you don't enable ingress. If ingress is disabled and you don't define a minReplicas or a custom scale rule, your container app scales to zero and has no way of starting back up.` ([official](https://learn.microsoft.com/en-us/azure/container-apps/scale-app)) — set **a background worker with Ingress disabled to min replicas 0 and no scale rule**, and once it drops to 0, it can never start back up. Always attach an event-driven rule to a worker, or set `minReplicas` to 1 or more.
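
Both cautions are easy to check mechanically before a deploy. The following is a minimal sketch of such a pre-deploy check. The function name, its parameters, and the decision to flag a CPU/memory-only rule set are assumptions for illustration; this is not how ACA itself validates a template.

```python
CPU_MEMORY_RULE_TYPES = {"cpu", "memory"}


def find_scale_pitfalls(ingress_enabled: bool, min_replicas: int, rule_types: list[str]) -> list[str]:
    """Return human-readable warnings for the two zero-scale traps described above."""
    problems: list[str] = []
    if min_replicas > 0:
        return problems  # neither trap applies when a replica is always running

    if not ingress_enabled and not rule_types:
        problems.append(
            "ingress is disabled, minReplicas is 0, and no scale rule is defined: "
            "the app cannot start back up after scaling to zero"
        )
    if rule_types and all(t in CPU_MEMORY_RULE_TYPES for t in rule_types):
        problems.append(
            "only CPU/memory rules are defined: these cannot scale to zero, "
            "so set minReplicas to 1 or add an HTTP or event-driven rule"
        )
    return problems
```

A worker with `ingress_enabled=False`, `min_replicas=0`, and an `azure-servicebus` rule passes cleanly, while the same worker with no rules is flagged.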

### Scale Behavior: Understand the Algorithm

| Behavior | Value |
|------|-----|
| Polling interval | 30 sec |
| Cooldown period | 300 sec |
| Scale-up stabilization window | 0 sec |
| Scale-down stabilization window | 300 sec |
| Scale-up step | 1, 4, 8, 16, 32, ... (up to max) |
| Scale-down step | 100% of the replicas that should be dropped |
| Scale algorithm | `desiredReplicas = ceil(currentMetricValue / targetMetricValue)` |

This algorithm is **directly tied to capacity planning in practice**. For example, with a Service Bus queue's `messageCount: 5` (5 messages per replica) and a queue length of 50, `ceil(50/5) = 10` replicas become the target. Deciding `messageCount` by working backward from your throughput target is the straightforward approach.
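
The arithmetic above can be sketched in a few lines. This version applies the formula from the table and then clamps the result to the replica limits. It deliberately ignores the polling interval, step sizes, and stabilization windows, so treat it as a capacity-planning aid rather than a model of KEDA's full behavior. The function name and defaults are assumptions.

```python
import math


def desired_replicas(current_metric: float, target_metric: float,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """desiredReplicas = ceil(currentMetricValue / targetMetricValue), clamped to the limits."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_metric / target_metric)
    return max(min_replicas, min(raw, max_replicas))
```

With a queue length of 50 and `messageCount: 5`, `desired_replicas(50, 5, 0, 30)` gives 10. A backlog of 1,000 messages would ask for 200 replicas but is capped at the configured maximum of 30, which is the point at which to revisit `messageCount` or `maxReplicas`.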

> Note: `Vertical scaling isn't supported.` A single replica's CPU/memory is fixed, so load is handled **only by horizontal scaling (replica count)**. It's the same "handle it by count" idea as AWS Fargate's target-tracking scaling.

### Implementing Event-Driven Scaling (Service Bus × Managed Identity)

In production, a queue-driven worker should **authenticate with a managed identity** rather than carry a connection string in the app. A Bicep Service Bus scale-rule example:

```bicep
param location string = resourceGroup().location
param environmentId string                      // resource ID of the Container Apps environment

resource app 'Microsoft.App/containerApps@2025-02-02-preview' = {
  name: 'order-worker'
  location: location
  identity: { type: 'SystemAssigned' }          // enable the system-assigned identity
  properties: {
    managedEnvironmentId: environmentId
    configuration: {
      activeRevisionsMode: 'Single'             // non-HTTP event rules require single-revision mode
      ingress: null                              // a worker has no Ingress
    }
    template: {
      containers: [
        {
          name: 'worker'
          image: 'myregistry.azurecr.io/order-worker:2026-06-26-a1b2c3d'
          resources: { cpu: json('0.5'), memory: '1.0Gi' }
        }
      ]
      scale: {
        minReplicas: 0
        maxReplicas: 30
        rules: [
          {
            name: 'servicebus-orders'
            custom: {
              type: 'azure-servicebus'
              metadata: {
                queueName: 'orders'
                namespace: 'my-sb-namespace'
                messageCount: '5'                // target number of messages handled per replica
              }
              identity: 'system'                 // authenticate with the managed identity (no stored secret)
            }
          }
        ]
      }
    }
  }
}
```

The official docs state it plainly too.

> Where possible, use managed identity authentication to avoid storing secrets within the app.（— [scale-app](https://learn.microsoft.com/en-us/azure/container-apps/scale-app)）

If you grant the `Azure Service Bus Data Receiver` role to the identity the scale rule uses (here `system`, the system-assigned identity), you don't have to store a connection string anywhere. This is the same idea I applied when making the SQS workers idempotent in the payment platform: [**build idempotently on the premise of "at-least-once, out-of-order, can-fail"**](/blog/aws-sqs-lambda-eventbridge-idempotent-async-processing-guide). The platform guarantees scaling; correctness is guaranteed by the structure of the code (the idempotency key).

---

## Ingress: Don't Build HTTPS Yourself

Enable Ingress, and **you don't need to prepare** a load balancer, a public IP, or a certificate yourself.

> When you enable ingress, you don't need to create an Azure Load Balancer, public IP address, or any other Azure resources to enable incoming HTTP requests or TCP (Transmission Control Protocol) traffic.（— [Ingress in Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/ingress-overview)）

### External / Internal and Protocols

- **External**: reachable from the internet via the environment's public IP.
- **Internal**: reachable only from within the same environment. Not exposed to the internet.

In a microservices setup, the security convention is to split them: **public API = External, internal service = Internal**. A two-tier layout, in which the public front end hands work to internal services, narrows the attack surface.

### What HTTP Ingress Provides Automatically

Enable HTTP Ingress, and all of the following come **automatically**.

- TLS termination, HTTP/1.1 and HTTP/2, and support for **WebSocket and gRPC**
- An HTTPS endpoint that always uses TLS 1.2/1.3
- Exposing ports 80/443 (**80 → 443 automatic redirect**)
- A fully qualified domain name (FQDN)
- A fixed request timeout, which the docs state as follows:

> Request time out is 240 seconds（— [ingress-overview](https://learn.microsoft.com/en-us/azure/container-apps/ingress-overview)）

**The HTTP request timeout is 240 seconds.** For processing longer than this (heavy report generation, synchronous processing of a large upload, etc.), design it to **offload to a job or queue** rather than making it wait over HTTP.
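
A common shape for that offloading is "accept, enqueue, answer `202 Accepted`, let the client poll." The sketch below shows only the control flow, using an in-process queue and dictionary as stand-ins. In a real deployment the queue would be an external service (for example Service Bus) and the job state would live in an external store, since replicas are stateless. All names here (`submit_report`, `get_report`, `worker`) are hypothetical.

```python
import queue
import threading
import uuid

jobs = {}                   # job_id -> {"status": ..., "result": ...}; stand-in for an external store
work_queue = queue.Queue()  # stand-in for an external queue


def submit_report(rows: int):
    """HTTP handler body: enqueue the work and answer immediately with 202 Accepted."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "result": None}
    work_queue.put((job_id, rows))
    return 202, {"jobId": job_id, "statusUrl": f"/reports/{job_id}"}


def get_report(job_id: str):
    """HTTP handler body for polling: report the current state of a job."""
    job = jobs.get(job_id)
    if job is None:
        return 404, {"error": "unknown job"}
    return 200, job


def worker() -> None:
    """Background consumer: does the slow work outside any HTTP request."""
    while True:
        job_id, rows = work_queue.get()
        jobs[job_id]["status"] = "running"
        result = sum(range(rows))  # placeholder for the long-running computation
        jobs[job_id] = {"status": "done", "result": result}
        work_queue.task_done()
```

To try it locally, start the consumer with `threading.Thread(target=worker, daemon=True).start()`, call `submit_report(...)`, and poll `get_report(...)` with the returned `jobId`. No single HTTP request then comes near the 240-second limit, however long the work takes.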

### A Security Pitfall: X-Forwarded-For

Ingress passes client metadata via HTTP headers. There's an important caution here.

> `X-Forwarded-For` … Only the rightmost IP is provided by Azure Container Apps. Any other values must be validated by the user to prevent IP spoofing.（— [ingress-overview](https://learn.microsoft.com/en-us/azure/container-apps/ingress-overview)）

**For `X-Forwarded-For`, only the rightmost IP is trustworthy as ACA-originated; the others can be spoofed by the client.** When you implement IP restriction or rate limiting, taking the leftmost IP at face value is easily circumvented. Per the principle that the trust boundary is the server side, **validate external input** — this is the app's responsibility.
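
A minimal sketch of reading the header safely: take only the rightmost entry and confirm it parses as an IP address. This assumes no other proxy that you control sits in front of ACA; if one does, the rightmost entry is that proxy's address and the logic must change. The function name is an assumption.

```python
import ipaddress


def client_ip_from_xff(header_value: str) -> str:
    """Return the rightmost X-Forwarded-For entry, the only one ACA itself appends."""
    rightmost = header_value.split(",")[-1].strip()
    # ip_address raises ValueError for anything that is not a valid IPv4/IPv6 address.
    return str(ipaddress.ip_address(rightmost))
```

A client that sends `X-Forwarded-For: 10.0.0.1` to impersonate an internal address still ends up with its real address appended on the right, so `client_ip_from_xff("10.0.0.1, 203.0.113.7")` returns `203.0.113.7`.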

Other Ingress features: IP restriction, client certificates (mTLS), session affinity (sticky), CORS, inter-revision traffic splitting. Note that **port `36985` is reserved for internal health checks** and can't be used by the app.

---

## Health Probes: startup / liveness / readiness

ACA monitors state with three kinds of probes ([Health probes](https://learn.microsoft.com/en-us/azure/container-apps/health-probes)).

| Probe | Role |
|---------|-----|
| **Startup** | Confirms, during the initial startup phase, that the app started normally |
| **Liveness** | Confirms the app is still alive and responding (failure → restart) |
| **Readiness** | Confirms the replica is ready to receive requests (failure → excluded from traffic) |

The constraints are clear: probes are **TCP or HTTP(S) only**, `exec` probes and gRPC are unsupported, and **one of each type per container**. The success condition for an HTTP probe is a status code **of 200 or above and below 400**.

### A liveness That Looks at Dependencies (Real Code)

The JavaScript example the official docs show is a good pattern where liveness expresses not "I am alive" but "**healthy, including dependencies**."

```javascript
const express = require("express");
const app = express();

// Assumed helper: flip this to true once warm-up (cache priming, connections, etc.) has finished.
let warmedUp = false;
const isWarmedUp = () => warmedUp;

// liveness: return 200 only after checking dependencies such as the database and filesystem
app.get("/liveness", (req, res) => {
  let isSystemStable = false;
  // check for database availability
  // check filesystem structure, etc.
  // set isSystemStable to true if all checks pass
  res.status(isSystemStable ? 200 : 503).end();
});

// readiness: return 200 once the app can accept traffic; 503 while it is still warming up
app.get("/readiness", (req, res) => {
  res.status(isWarmedUp() ? 200 : 503).end();
});
```

### Default Probes and "Slow-Starting Apps"

Enable Ingress, and **default probes (TCP to the Ingress target port)** are added automatically for any probe type you don't define yourself. The important behavior is that, **in single-revision mode, traffic isn't switched until the new revision passes readiness**, which is the foundation of zero-downtime deployment.

For apps that take time to start (JVM warm-up, loading a large model, etc.), relax `initialDelaySeconds` / `periodSeconds` / `failureThreshold` to **prevent them from being restarted before they're ready**.

```json
"probes": [
  {
    "type": "Startup",
    "httpGet": { "path": "/startup", "port": 8080 },
    "initialDelaySeconds": 3,
    "periodSeconds": 3,
    "failureThreshold": 30
  },
  {
    "type": "Liveness",
    "httpGet": { "path": "/liveness", "port": 8080 },
    "periodSeconds": 10,
    "failureThreshold": 3
  },
  {
    "type": "Readiness",
    "httpGet": { "path": "/readiness", "port": 8080 },
    "initialDelaySeconds": 3,
    "periodSeconds": 5,
    "failureThreshold": 48
  }
]
```
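
To judge whether such settings give a slow starter enough room, it helps to turn them into a time budget. The sketch below uses the usual Kubernetes-style approximation (initial delay plus period times failure threshold). It ignores `timeoutSeconds` and scheduling jitter, so read the result as a rough upper bound, and the function name as an assumption.

```python
def probe_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time a container has before a probe is considered failed for good."""
    return initial_delay + period * failure_threshold
```

For the startup probe above, `probe_budget_seconds(3, 3, 30)` is 93 seconds. If a JVM service or model load regularly needs longer than that, raise `failureThreshold` or `periodSeconds` rather than letting the platform restart it mid-startup.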

---

## Revisions and Deployment: Zero-Downtime, Blue/Green, Canary

Change management is done with **revisions**.

> Change management in Azure Container Apps is powered by revisions, which are a snapshot of each version of your container app.（— [Update and deploy changes](https://learn.microsoft.com/en-us/azure/container-apps/revisions)）

A revision is **immutable, versioned, and auto-generated**, and by default retains **100 inactive revisions** as history. A deployment is "create a new revision and switch to it."

### The Two Modes

| Mode | Behavior | Default |
|-------|------|-----|
| **Single** | When the new revision is ready, **automatically switch all traffic** to it. On failure, it stays on the old revision. The old revision is auto-discarded. | ✅ |
| **Multiple** | Make multiple revisions active simultaneously and **split traffic by %**. Used for Blue/Green, A/B, and canary. | — |

### Single Mode's Zero-Downtime

> In single revision mode, Container Apps ensures your app doesn't experience downtime when creating a new revision. The existing active revision isn't deactivated until the new revision is ready.（— [revisions](https://learn.microsoft.com/en-us/azure/container-apps/revisions)）

A new revision is deemed "ready" when **provisioning succeeds + it scales to the old revision's replica count + all replicas pass the startup/readiness probes**. Until then, the old revision keeps receiving 100% of traffic. This is the same safe-side behavior as AWS Fargate's "rolling update + automatic rollback via the deploy circuit breaker" — **it doesn't switch if it fails**.

### Blue/Green and Canary in Multiple Mode

In multiple-revision mode, you can allocate traffic by %. For example, a **canary release** that sends just 10% to the new version:

```bash
# Canary: send 10% to the new revision and keep 90% on the current one
az containerapp ingress traffic set \
  --name my-api --resource-group my-rg \
  --revision-weight my-api--green=10 my-api--blue=90

# If all is well, promote to 100% (the Blue/Green switch)
az containerapp ingress traffic set \
  --name my-api --resource-group my-rg \
  --revision-weight my-api--green=100
```

Further, with **labels** you can assign a stable URL like `https://...---green.<env>.azurecontainerapps.io` to a specific revision, and verify by routing only test users to the new version. A label's advantage is that **the URL doesn't change even when you move it between revisions**.
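
When traffic splits are driven from a script, it is worth validating the weights before calling the CLI. The helper below is a sketch that assumes the weights across revisions must add up to 100 and formats them as `name=weight` pairs in the shape `--revision-weight` takes. The function name is hypothetical, and the sum-to-100 rule is stated here as an assumption rather than quoted from the docs.

```python
def revision_weight_args(weights: dict[str, int]) -> list[str]:
    """Validate a traffic split and format it as `revision=weight` arguments."""
    if not weights:
        raise ValueError("at least one revision is required")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must not be negative")
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights must add up to 100, got {total}")
    return [f"{revision}={weight}" for revision, weight in weights.items()]
```

Calling it with `{"my-api--green": 10, "my-api--blue": 90}` yields the two arguments from the canary command above, while a typo that leaves 5% unassigned fails fast instead of reaching production.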

### revision-scope and application-scope: Beware the Secret Restart

There are two kinds of change, and **whether a new revision is created** differs.

- **revision-scope (`properties.template`)**: container, image, scale rules, etc. → generates a **new revision**.
- **application-scope (`properties.configuration`)**: secret values, revision mode, Ingress, Dapr settings, etc. → does **not** generate a new revision (applied immediately to all revisions).

There's a practical trap here. **Even if you update a secret value, it's not automatically reflected into existing revisions.**

> Secret values (revisions must be restarted before a container recognizes new secret values)（— [revisions](https://learn.microsoft.com/en-us/azure/container-apps/revisions)）

After rotating a secret, it's reflected only once you **restart the revision or deploy a new one**. This differs in behavior from the Key Vault references discussed later (which auto-restart), so document it clearly in your operational runbook.

---

## Graceful Shutdown: SIGTERM and Idempotency

The most important thing for production resilience, yet most often overlooked, is here. A container stops in the following scenes ([Application lifecycle management](https://learn.microsoft.com/en-us/azure/container-apps/application-lifecycle-management)).

- When the app scales in (replica decrease)
- When the app is deleted
- When a revision is deactivated

And the behavior on stop is defined as follows.

> When a shutdown starts, the container host sends a SIGTERM message to your container. The code in the container can respond to this operating system-level message to handle termination. If your application doesn't respond within 30 seconds to the SIGTERM message, then SIGKILL terminates your container.（— [application-lifecycle-management](https://learn.microsoft.com/en-us/azure/container-apps/application-lifecycle-management)）

**If you can't finish cleanly within 30 seconds of receiving `SIGTERM`, you're force-terminated with `SIGKILL`** (the grace period is extendable via `terminationGracePeriodSeconds`, default 30 seconds). This is exactly the same pattern as AWS Fargate's `stopTimeout`. Since processing can be interrupted on every scale-in, **a handler that catches SIGTERM to "stop accepting → finish in-flight work → close connections"** is mandatory.

```javascript
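// Node.js: shut down gracefully on SIGTERM (finish within the 30-second grace period)
// `app`, `pool`, and `queueConsumer` are assumed to be created elsewhere in the service.
const server = app.listen(8080);

let shuttingDown = false;
process.on("SIGTERM", () => {
  if (shuttingDown) return;        // idempotent: ignore a repeated signal
  shuttingDown = true;

  // 1) Fail readiness so the replica is removed from load balancing (stop accepting new work)
  // 2) Let in-flight HTTP requests finish
  server.close(async () => {
    try {
      // 3) Close the DB connection pool and the queue subscription cleanly
      await pool.end();
      await queueConsumer.close();
      process.exit(0);
    } catch (err) {
      console.error("cleanup failed during shutdown", err);
      process.exit(1);
    }
  });

  // Safety net: exit before the grace period runs out instead of waiting for SIGKILL
  setTimeout(() => process.exit(1), 25_000).unref();
});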
```
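
For a queue-driven worker written in Python, the same pattern might look like the sketch below: the signal handler only sets a flag, and the loop finishes the message in hand before exiting. `GracefulShutdown`, `run_worker`, and the `fetch`/`process` callables are hypothetical names standing in for your own queue client.

```python
import signal
import threading


class GracefulShutdown:
    """Records that SIGTERM arrived so the worker loop can stop between messages."""

    def __init__(self) -> None:
        self.requested = threading.Event()

    def handle(self, signum, frame) -> None:
        self.requested.set()  # idempotent: setting the flag twice is harmless

    def install(self) -> None:
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle)


def run_worker(shutdown: GracefulShutdown, fetch, process) -> int:
    """Process messages until shutdown is requested; return how many were handled."""
    handled = 0
    while not shutdown.requested.is_set():
        message = fetch()
        if message is None:
            shutdown.requested.wait(timeout=1.0)  # idle, but wake early on SIGTERM
            continue
        process(message)  # the in-flight message always runs to completion
        handled += 1
    return handled
```

Call `install()` once at startup and then `run_worker(...)`. Because the loop checks the flag only between messages, each `process` call needs to fit comfortably inside the grace period.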

The official docs add a warning:

> Containers restart regularly, so don't expect state to persist inside a container. Instead, use external caches for expensive in-memory cache requirements.（— [application-lifecycle-management](https://learn.microsoft.com/en-us/azure/container-apps/application-lifecycle-management)）

**Don't hold state inside a container.** If you need an in-memory cache, offload it to an external cache like Redis. Design replicas **stateless** on the premise of restarts and count changes. For a queue-driven worker, [**absorb duplicates with an idempotency key**](/blog/dynamodb-payment-reliability-idempotency-zero-downtime) so that it doesn't break even if a message processed halfway under SIGTERM is re-delivered. Scaling is the platform's job; **idempotency is the structure of the code** — enforce this division and neither double-charges nor lost messages occur even with zero-scale.
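
The idempotency-key idea can be sketched as follows. The in-memory set is only a stand-in for a durable store. In production, recording the key and applying the side effect would need to happen atomically (for example with a conditional write), which this sketch does not attempt. The class and method names are assumptions.

```python
class IdempotentProcessor:
    """Runs a handler at most once per idempotency key, absorbing re-deliveries."""

    def __init__(self, handler) -> None:
        self._handler = handler
        self._completed = set()  # stand-in for a durable store of processed keys

    def process(self, idempotency_key: str, payload: dict) -> bool:
        """Return True if the handler ran, False if the message was a duplicate."""
        if idempotency_key in self._completed:
            return False
        self._handler(payload)
        # Recorded only after success, so a failed attempt is retried on re-delivery.
        self._completed.add(idempotency_key)
        return True
```

If a replica is stopped mid-message and the queue delivers the same message again, the second delivery of an already completed key is a no-op, which is what keeps a re-delivery from turning into a double charge.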

---

## Secrets and Managed Identity: Erase Credentials from Code

Production security is decided by "where you don't put secrets." ACA can erase credentials from both code and image with **managed identities** and **Key Vault references**.

### Managed Identity: Hold No Passwords

> A managed identity from Microsoft Entra ID allows your container app to access other Microsoft Entra protected resources. … Your app connects to resources with the managed identity. You don't need to manage credentials in your container app.（— [Managed identities in Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/managed-identity)）

- **System-assigned identity**: integral to the app. Auto-deleted when the app is deleted. One per app.
- **User-assigned identity**: an independent resource. Reusable across multiple apps/resources.

Uses: **pulling images from ACR**, **fetching secrets from Key Vault**, **authenticating scale rules**, **connecting to Azure SQL / Storage / Service Bus**.

The app-side code can be abstracted with `DefaultAzureCredential`. The advantage is that **the same code runs** in local development (the developer's credentials) and in production (the managed identity).

```python
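# Python: reach Key Vault / Blob with a managed identity; no secrets in the code
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Uses `az login` credentials locally and the container's managed identity in production.
# With a user-assigned identity, set AZURE_CLIENT_ID (or pass managed_identity_client_id=...).
credential = DefaultAzureCredential()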
client = SecretClient(vault_url="https://my-vault.vault.azure.net", credential=credential)
db_password = client.get_secret("db-password").value
```

> 🔐 **Least-privilege lifecycle settings**: from API version `2024-02-02-preview` onward, with `identitySettings`' `lifecycle` you can control in which phase — `Init` / `Main` / `All` / `None` — a managed identity can be used. For example, "an identity used only for ACR pull" can be set to `None` so that **it can't be used at all from code inside the container**. Even if the container is compromised, you can narrow the resources accessible — you can enforce the least-privilege principle at the platform level.

### Key Vault References: Don't Write Secrets Directly

> Avoid specifying the value of a secret directly in a production environment. Instead, use a reference to a secret stored in Azure Key Vault.（— [Manage secrets in Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/manage-secrets)）

A secret is an **app-scoped** name/value pair, referenced from an environment variable via `secretRef`. In production, don't hardcode the value — make it a **Key Vault reference**. Grant the managed identity the `Key Vault Secrets User` role, and ACA fetches the value from Key Vault.

```bash
# Inject a Key Vault-referenced secret as an environment variable (authenticated with a user-assigned identity)
az containerapp create \
  --resource-group my-rg --name my-api --environment my-env \
  --image myregistry.azurecr.io/my-api:2026-06-26-a1b2c3d \
  --user-assigned "$UAMI_ID" \
  --secrets "db-password=keyvaultref:https://my-vault.vault.azure.net/secrets/db-password,identityref:$UAMI_ID" \
  --env-vars "DB_PASSWORD=secretref:db-password"
```

Key Vault references support **automatic rotation**.

> When newer versions become available, the app automatically retrieves the latest version within 30 minutes. Any active revisions that reference the secret in an environment variable is automatically restarted to pick up the new value.（— [manage-secrets](https://learn.microsoft.com/en-us/azure/container-apps/manage-secrets)）

If you omit the version from the URI, ACA **fetches the new version within 30 minutes and automatically restarts the referencing revisions** to pick it up (in contrast to hardcoded secrets, which need a manual restart). To pin the value completely, specify the version explicitly in the URI.
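To make the pinned versus unpinned distinction easy to check in review or CI, a small Python helper can inspect the URI. This is only a sketch: `is_version_pinned` is a hypothetical name, it assumes the usual `https://<vault>.vault.azure.net/secrets/<name>[/<version>]` shape, and the version string in the usage note is a made-up placeholder.

```python
from urllib.parse import urlparse


def is_version_pinned(secret_uri: str) -> bool:
    """Return True when a Key Vault secret URI carries an explicit version."""
    parts = [p for p in urlparse(secret_uri).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "secrets":
        raise ValueError(f"not a Key Vault secret URI: {secret_uri}")
    # parts == ["secrets", "<name>"] means "latest"; a third segment is the version.
    return len(parts) >= 3
```

For example, `is_version_pinned("https://my-vault.vault.azure.net/secrets/db-password")` is `False` (follows rotation), while the same URI with `/0123456789abcdef0123456789abcdef` appended is `True` (fixed value).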

---

## Jobs: Batch, Schedule, Event-Driven

Processing that "runs and finishes" rather than a "resident service" is handled by **jobs**.

> Azure Container Apps jobs enable you to run containerized tasks that run for a finite duration and then stop.（— [Jobs in Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/jobs)）

Apps and jobs run in the same environment and share networking and logs. There are three triggers.

- **Manual**: on-demand (CLI, portal, ARM API). One-off processing like data migration.
- **Schedule**: a cron expression (**evaluated in UTC**). Periodic processing like nightly report generation.
- **Event**: event-triggered with a KEDA scaler (a queue message, etc.). Also usable for **self-hosted GitHub Actions Runners / Azure Pipelines agents**.
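Because schedule triggers are evaluated in UTC, a local wall-clock time has to be shifted before it goes into the cron expression. The following Python sketch shows the conversion; `daily_cron_in_utc` is a hypothetical helper, and it assumes a fixed UTC offset with no daylight saving time (such as +9 for JST).

```python
from datetime import datetime, timedelta, timezone


def daily_cron_in_utc(hour: int, minute: int, utc_offset_hours: float) -> str:
    """Convert a daily local time into a five-field cron expression in UTC."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters for a daily schedule.
    local = datetime(2000, 1, 1, hour, minute, tzinfo=local_tz)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"
```

With this, midnight JST becomes `daily_cron_in_utc(0, 0, 9)`, which yields `0 15 * * *` (15:00 UTC on the previous day).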

A job's main settings ([official](https://learn.microsoft.com/en-us/azure/container-apps/jobs)):

| Setting | Meaning |
|------|-----|
| `replicaTimeout` | Maximum seconds to wait for a replica to complete |
| `replicaRetryLimit` | Retry limit for a failed replica (`0` for no retry) |
| `parallelism` | Replicas per execution (often `1`) |
| `replicaCompletionCount` | Replicas that must complete to be deemed successful |

An example of a scheduled job (generate a report every day at midnight UTC, since the cron expression is evaluated in UTC):

```bash
az containerapp job create \
  --name nightly-report --resource-group my-rg --environment my-env \
  --trigger-type "Schedule" --cron-expression "0 0 * * *" \
  --replica-timeout 1800 --replica-retry-limit 1 \
  --replica-completion-count 1 --parallelism 1 \
  --image myregistry.azurecr.io/report:2026-06-26-a1b2c3d \
  --cpu "0.5" --memory "1.0Gi"
```

> ⚠️ **A job's constraints**: `The following features aren't supported: Dapr; Ingress and related features such as custom domains and SSL certificates.` ([official](https://learn.microsoft.com/en-us/azure/container-apps/jobs)) In short, **jobs have neither Ingress nor Dapr**. Because jobs presume retries, make the job body idempotent too, so that processing the same message twice doesn't break anything.

Being able to run "a resident API service (App)," "a periodic batch (Schedule Job)," and "a queue-driven worker (Event Job or App + custom scale)" on **the same environment and the same deployment foundation** dramatically lowers operational cognitive load.

---

## Networking and Plans: Consumption / Dedicated / Flex

### Workload Profile = Choosing Compute Resources

> A workload profile determines the type and amount of compute and memory resources available to container apps deployed in an Azure Container Apps environment.（— [Workload profiles](https://learn.microsoft.com/en-us/azure/container-apps/workload-profiles-overview)）

There are three profiles.

- **Consumption**: serverless. Scales on-demand and **can zero-scale** when idle. Billed for what you use. `0.25–4 vCPU / 0.5–8 GiB`. Best for bursty, unpredictable load.
- **Dedicated**: a reserved dedicated pool. Choose the VM size/type, co-locate multiple apps, and **bill per instance**. Can be cheaper for steady load.
  - **D series (general purpose)** `D4–D32`: 4–32 vCPU / 16–128 GiB
  - **E series (memory-optimized)** `E4–E32`: 4–32 vCPU / 32–256 GiB
  - **GPU (NC series A100)**: for large-scale inference / training
- **Flex (preview)**: a compromise between Consumption's ease and Dedicated's performance. Requires a `/25` subnet and **can't zero-scale**.

When in doubt, **Consumption**. Like Fargate, "server-management cost (labor) is the biggest cost." Once you're at **steady, constant high load** and the unit cost becomes dominant, consider migrating to Dedicated.

### VNet and Subnet

Networking is held by the **environment**, and the subnet requirements change with the environment type ([Networking](https://learn.microsoft.com/en-us/azure/container-apps/networking)).

| Environment type | Supported plans | Main features | Minimum subnet |
|-----------|----------|---------|--------------|
| **Workload profiles (default)** | Consumption, Dedicated | **UDR, NAT Gateway, Private Endpoint** support | `/27` |
| **Consumption only (legacy)** | Consumption | UDR, NAT Gateway, etc. **unsupported** | `/23` |
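To put the minimum subnet sizes in the table into numbers, the following Python sketch counts addresses per prefix. The `10.0.0.0` range is a hypothetical example, and the only deduction modeled is the 5 addresses Azure reserves in every subnet; the addresses Container Apps consumes for its own infrastructure are not modeled here.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet


def usable_addresses(cidr: str) -> int:
    """Addresses left in a subnet after Azure's per-subnet reservation."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AZURE_RESERVED_PER_SUBNET


workload_profiles_min = usable_addresses("10.0.0.0/27")  # 32 total addresses
consumption_only_min = usable_addresses("10.0.0.0/23")   # 512 total addresses
```

A `/27` leaves far fewer addresses than a `/23`, so it is worth sizing the subnet against the expected maximum replica count rather than taking the minimum by default.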

If you have requirements like controlling outbound in production (egress lockdown via Azure Firewall) or limiting internal access with a Private Endpoint, choose a **workload-profile environment + a dedicated subnet**. Note that the subnet is **dedicated to the ACA environment** (can't be shared with other services), and **the network type can't be changed after creation**.

```text
[Internet] → [Application Gateway + WAF] → [Internal ACA Environment (VNet)]
                                                    ├─ public-api (Internal Ingress)
                                                    ├─ order-worker (no Ingress / Service Bus-driven)
                                                    └─ nightly-report (Schedule Job)
                                                              ↓ egress
                                              [UDR → Azure Firewall] → allowed destinations only
```

This is the same idea as WAF's defense-in-depth ([the design of AWS WAF / Cloud Armor](/blog/waf-defense-in-depth-aws-waf-cloud-armor-owasp-guide)), replaced on the Azure side with `Application Gateway + WAF + Internal environment + Firewall egress`.

---

## Cost Design: Metered Billing, Free Tier, the Idle Rate

The Consumption plan's billing has **two components** ([Billing](https://learn.microsoft.com/en-us/azure/container-apps/billing)).

- **Resource consumption**: vCPU-seconds, GiB-seconds (allocated amount × seconds)
- **HTTP requests**: incoming request count

And there's a **free tier per subscription, per month**.

> The following resources are free during each calendar month, per subscription:
> - The first 180,000 vCPU-seconds
> - The first 360,000 GiB-seconds
> - The first 2 million HTTP requests（— [Billing in Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/billing)）

**180,000 vCPU-seconds, 360,000 GiB-seconds, and 2 million requests** are free each month. A 0.5 vCPU / 1 GiB app can run for roughly 100 hours per month on the free tier alone (180,000 vCPU-seconds ÷ 0.5 vCPU = 360,000 seconds, and the memory side gives the same figure: 360,000 GiB-seconds ÷ 1 GiB), so verification and small-scale services that scale to zero can run **essentially free**.
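The free-tier arithmetic can be reproduced with a few lines of Python. This is a sketch under a simplifying assumption: a single replica that is active the whole time, with `free_hours_per_month` as a hypothetical helper name. Whichever grant runs out first sets the limit.

```python
FREE_VCPU_SECONDS = 180_000  # monthly free grant, per subscription
FREE_GIB_SECONDS = 360_000   # monthly free grant, per subscription


def free_hours_per_month(vcpu: float, memory_gib: float) -> float:
    """Hours one active replica can run before either free grant is exhausted."""
    seconds_by_cpu = FREE_VCPU_SECONDS / vcpu
    seconds_by_memory = FREE_GIB_SECONDS / memory_gib
    return min(seconds_by_cpu, seconds_by_memory) / 3600


hours = free_hours_per_month(vcpu=0.5, memory_gib=1.0)
```

For 0.5 vCPU / 1 GiB both grants allow 360,000 seconds, so `hours` is 100.0; doubling the size to 1 vCPU / 2 GiB halves it to 50.0.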

### Understand the Three Billing States

| State | Condition | Billing |
|------|------|------|
| **Zero** | 0 replicas (during zero-scale) | **No billing** |
| **idle (discounted)** | `minReplicas>0` and waiting idle at the minimum count | **Discounted rate** |
| **active (normal)** | Beyond the minimum count, or processing | Normal rate |

The idle determination is strict — it's discounted only when all conditions are met: **all containers started, no HTTP request being processed, less than 0.01 vCPU, and less than 1000 bytes/sec of network**. It's effective for operations like "I want at least one resident, but cheap when idle."
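The idle conditions read naturally as a single predicate. The Python sketch below only models the rule as described above; the platform makes the actual determination, and `ReplicaSample` and `is_idle` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSample:
    """A hypothetical snapshot of one replica's activity."""
    all_containers_started: bool
    in_flight_http_requests: int
    vcpu_usage: float             # cores in use
    network_bytes_per_sec: float  # inbound network traffic


def is_idle(sample: ReplicaSample) -> bool:
    """True only when every idle condition holds at the same time."""
    return (
        sample.all_containers_started
        and sample.in_flight_http_requests == 0
        and sample.vcpu_usage < 0.01
        and sample.network_bytes_per_sec < 1000
    )


quiet = ReplicaSample(True, 0, 0.002, 120.0)
busy_timer = ReplicaSample(True, 0, 0.02, 120.0)  # e.g. a background loop burning CPU
```

Here `is_idle(quiet)` holds but `is_idle(busy_timer)` does not, which suggests that background polling or timers inside the container may be enough to keep a resident replica at the normal rate.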

Other key cost points:

- **Only external requests are billed**. Service-to-service communication within the environment isn't billable, and health probes are free too: `Health probe requests aren't billable.`
- **Jobs are always at the active rate** (no idle). Once execution ends, consumption stops.
- **The managed OpenTelemetry agent runs at no additional compute cost** (covered later).

> The cost-optimization playbook: ① first drop zero-scalable workloads (HTTP/event-driven) to zero. ② for those that must be resident, set `minReplicas` to the minimum to apply the **idle rate**. ③ level out the unit cost for steady high load with **Dedicated**. ④ avoid `latest`, keep images small, speed up startup, and reduce the "coldness" of zero-scale. The thinking is isomorphic to Fargate's [optimization with Spot/Graviton/Savings Plans](/blog/aws-ecs-fargate-cost-optimization-spot-graviton-savings-plans-guide).

---

## Observability: Logs, Metrics, OpenTelemetry

All apps in an environment send logs by default to a **common Log Analytics workspace** ([environment](https://learn.microsoft.com/en-us/azure/container-apps/environment)). What's collected is:

- The container's `stdout`/`stderr` streams
- The app's scale events
- The Dapr sidecar's logs (when enabled)
- System-level metrics and events

ACA provides a **managed OpenTelemetry agent** that can export traces, metrics, and logs over OTLP. As noted earlier, it **runs at no additional compute cost**, so you don't need to stand up a sidecar yourself for observability.

In practice, you can carry over the design as-is: thread a **correlation ID** through structured logs, follow distributed processing with traces, and judge with SLO/error-budget. The principle is the same as [correlating the three pillars with OpenTelemetry](/blog/opentelemetry-observability-production-tracing-metrics-logs); it's accurate to view ACA as providing the "foundation to emit the three pillars" as managed.
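Since the container's `stdout` is collected automatically, threading a correlation ID through structured logs needs very little code. Here is a minimal Python sketch using only the standard library; the logger name `app`, the field names, and `handle_request` are assumptions for illustration, not an ACA-specific API.

```python
import json
import logging
import sys
import uuid
from contextvars import ContextVar
from typing import Optional, TextIO

# Holds the correlation ID for the request currently being handled.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Formats each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def build_logger(stream: TextIO = sys.stdout) -> logging.Logger:
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, incoming_id: Optional[str] = None) -> None:
    # Reuse the caller's ID when present, otherwise start a new one.
    correlation_id.set(incoming_id or str(uuid.uuid4()))
    logger.info("order accepted")
```

Calling `handle_request(build_logger(), "req-42")` writes a single JSON line whose `correlation_id` is `req-42`, which can then be filtered on in Log Analytics alongside trace IDs.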

---

## IaC and CI/CD: Bicep / Terraform / GitHub Actions (OIDC)

In production, define the app with **declarative code** and ship it with **keyless CI/CD**.

### Terraform (azurerm)

If you've operated AWS with Terraform, you can write ACA in the same style with the `azurerm` provider.

```hcl
resource "azurerm_container_app_environment" "main" {
  name                       = "prod-env"
  resource_group_name        = azurerm_resource_group.main.name
  location                   = azurerm_resource_group.main.location
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id
}

resource "azurerm_container_app" "api" {
  name                         = "public-api"
  container_app_environment_id = azurerm_container_app_environment.main.id
  resource_group_name          = azurerm_resource_group.main.name
  revision_mode                = "Single"

  identity { type = "SystemAssigned" }

  ingress {
    external_enabled = true
    target_port      = 8080
    transport        = "auto"
    traffic_weight {
      latest_revision = true
      percentage      = 100
    }
  }

  template {
    min_replicas = 1   # stay resident at the idle rate (avoids zero-scale cold starts)
    max_replicas = 20

    container {
      name   = "api"
      image  = "myregistry.azurecr.io/public-api:2026-06-26-a1b2c3d"
      cpu    = 0.5
      memory = "1.0Gi"

      liveness_probe {
        transport = "HTTP"
        path      = "/liveness"
        port      = 8080
      }
      readiness_probe {
        transport = "HTTP"
        path      = "/readiness"
        port      = 8080
      }
    }

    http_scale_rule {
      name                = "http-rule"
      concurrent_requests = 100
    }
  }
}
```

The thinking on Terraform module design, state isolation, and drift detection is common with AWS ([Terraform module design and drift detection](/blog/terraform-module-design-state-isolation-drift-detection-guide)).

### GitHub Actions × OIDC (Keyless)

Putting a long-lived secret (a service principal's password) in GitHub is debt. With **Microsoft Entra federated credentials (OIDC)**, deploy without storing a single key.

```yaml
name: deploy-aca
on:
  push: { branches: [main] }
permissions:
  id-token: write      # required to issue the OIDC token
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2          # keyless: only client-id/tenant-id/subscription-id
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Deploy revision
        run: |
          TAG="$(date +%Y-%m-%d)-${GITHUB_SHA::7}"   # unique tag (never latest); assumes this image was already pushed
          az containerapp update \
            --name public-api --resource-group my-rg \
            --image myregistry.azurecr.io/public-api:"$TAG"
```

The design philosophy of keyless CI/CD via OIDC is completely isomorphic with AWS/GCP ([throw away keys with GitHub Actions OIDC](/blog/github-actions-oidc-keyless-cicd-aws-gcp-guide)). Azure provides, under the name "federated credentials," the same "trust a short-lived token" mechanism.

> If you just want to try it quickly, `az containerapp up` is handy. From source or an image, it creates the environment, registry, and app all at once. But **production is IaC**. Use `up` for learning and PoC, Bicep/Terraform for production ([YAGNI](https://ja.wikipedia.org/wiki/YAGNI): don't over-build the production setup from the start — run it first, measure, then solidify).

---

## A Pre-Production Checklist

Before shipping ACA to production, here are this article's key points as a confirmation list.

- [ ] **Environment**: separate production / staging environments. Don't leave a verification environment idle for 90 days, or it may be deleted automatically.
- [ ] **Resources**: right-size CPU/memory after measuring (the fixed ratio memory = vCPU×2). No `latest` tags; use unique tags.
- [ ] **Scale**: zero-scale with HTTP/event-driven. **An Ingress-disabled worker requires `minReplicas≥1` or a scale rule** (self-destruct protection). Understand that CPU/memory scaling can't go to zero.
- [ ] **Ingress**: public = External, internal = Internal. **Offload processing over 240 seconds to a job/queue.** Trust only the rightmost of `X-Forwarded-For` and validate.
- [ ] **Probes**: define startup/liveness/readiness. If startup is slow, relax the thresholds.
- [ ] **Deployment**: zero-downtime in single mode, or canary/Blue-Green in multiple mode. Don't forget to **restart the revision after a secret update**.
- [ ] **Resilience**: graceful shutdown that handles `SIGTERM` within 30 seconds. State goes outside the container (Redis, etc.). Make workers **idempotent**.
- [ ] **Security**: erase credentials from code with managed identity + Key Vault references. Least privilege via `identitySettings`' lifecycle. No privileged containers.
- [ ] **Networking**: production is a workload-profile environment + a dedicated subnet (`/27`). Lock down egress with UDR + Firewall.
- [ ] **Cost**: weave the free tier (180K vCPU-seconds, 360K GiB-seconds, 2M req) and the idle rate into the design.
- [ ] **Observability**: Log Analytics + the managed OTel agent. Operate with correlation IDs and SLOs.
- [ ] **IaC/CI/CD**: declarative with Bicep/Terraform. Keyless with GitHub Actions × OIDC (federated credentials).
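The `X-Forwarded-For` item in the Ingress check can be sketched as follows in Python. It assumes the rightmost entry is the one appended by the platform's ingress proxy, while anything to its left may have been supplied by the client; `client_ip_from_xff` is a hypothetical helper name.

```python
import ipaddress
from typing import Optional


def client_ip_from_xff(header_value: str) -> Optional[str]:
    """Return the rightmost X-Forwarded-For entry if it is a valid IP address."""
    entries = [e.strip() for e in header_value.split(",") if e.strip()]
    if not entries:
        return None
    candidate = entries[-1]  # entries further left are treated as untrusted
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        # Reject anything that does not parse as IPv4 or IPv6.
        return None
```

For a header of `203.0.113.9, 198.51.100.7` this returns `198.51.100.7`; a malformed rightmost entry yields `None`, so the caller can reject or log the request instead of trusting it.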

---

## Summary: The Essentials of Fargate Carry Over to ACA

Azure Container Apps is a serverless container foundation for **"erasing the biggest cost (labor) of server management and concentrating on production quality itself."** On top of proven OSS — Kubernetes, KEDA, Dapr, Envoy — it layers, as managed, the app-operation concepts of certificates, revisions, scale, and environments.

And production quality is, in the end, decided by **the structure of the code and the design** — KEDA's scale rules and the zero-scale trap, Ingress's 240-second boundary and X-Forwarded-For validation, graceful shutdown that catches SIGTERM and idempotency, and erasing credentials with managed identity + Key Vault references. These are astonishingly isomorphic with the crux I've honed running AWS Fargate in production. Even when the cloud changes, the principle of "build unbreakable, never-stopping, cheap, and safe" doesn't change.

With one person × generative AI, on AWS or Azure, I'll help you productionize a container foundation fast, cheaply, and safely. For production builds on serverless containers, or consultation on migrating from AWS / a multi-cloud setup, feel free to reach out from [Contact](/contact).
