# The complete Azure Container Apps autoscaling guide: scale-to-zero and event-driven with KEDA (HTTP, queue, CPU)

> A thorough, code-first explanation of KEDA autoscaling in Azure Container Apps. Following the official Microsoft Learn documentation, it covers, in Bicep and az CLI: scale rules for HTTP, TCP, and custom sources (CPU, memory, Service Bus, Event Hubs, Kafka, Redis); scale-to-zero design and the self-destruct trap; capacity planning from the scaling algorithm; managed-identity authentication; and event-driven jobs (KEDA ScaledJob).

- Published: 2026-06-26
- Author: 友田 陽大
- Tags: Azure, Container Apps, KEDA, Serverless, Autoscaling, Cost optimization, Reliability, Architecture design
- URL: https://tomodahinata.com/en/blog/azure-container-apps-keda-autoscaling-scale-to-zero-event-driven-guide
- Category: Azure Container Apps in production
- Pillar guide: https://tomodahinata.com/en/blog/azure-container-apps-production-guide

## Key points

- Azure Container Apps scales horizontally with KEDA (Kubernetes Event-driven Autoscaling), increasing/decreasing replicas on event sources like HTTP, TCP, CPU/memory, Service Bus/Event Hubs/Kafka/Redis.
- Scale-to-zero is the biggest attraction but has two traps: ① an app that scales on CPU/memory load can't go to zero; ② a worker with ingress disabled, min replicas 0, and no rule can never wake up once it drops to 0.
- Plan capacity by working back from the algorithm `desiredReplicas = ceil(currentMetricValue / targetMetricValue)`. For queue-driven workers, the target number of messages one replica handles is the central design parameter.
- For KEDA scaler authentication, prefer managed identity over secrets and remove connection strings from the app. For non-HTTP event rules, set activeRevisionsMode to single.
- Defaults: polling interval 30 seconds, cooldown 300 seconds, scale-up steps 1→4→8→16→32, and scale-down after a 300-second stabilization window. Knowing these lets you predict behavior during surges and drops.

---

Autoscaling is an area where "I thought I configured it" accidents happen easily: **a scaled-to-zero worker that never wakes up**, **a replica count that can't keep up with a traffic surge**, **a connection string hardcoded into the app**. All are preventable if you understand KEDA's behavior accurately.

This article, faithful to [Microsoft Learn's official scaling documentation](https://learn.microsoft.com/en-us/azure/container-apps/scale-app), explains Azure Container Apps (ACA) autoscaling **down to the capacity-planning formula.** On AWS, I've [run SQS-driven idempotent workers in production on Fargate](/blog/aws-ecs-fargate-auto-scaling-target-tracking-sqs-worker-guide) and built [a payment platform with 0 double charges in production](/case-studies/payment-platform-reliability). Scaling is guaranteed by the platform, correctness by the code's structure (idempotency) — this division is the same on Azure. For ACA's overall production operation, see the [Azure Container Apps production-operations guide](/blog/azure-container-apps-production-guide).

---

## What KEDA is: the brain of ACA's scaling

> To support this scaling behavior, Azure Container Apps uses KEDA (Kubernetes Event-driven Autoscaling). KEDA supports scaling against a variety of metrics like HTTP requests, queue messages, CPU and memory load, and event sources like Azure Service Bus, Azure Event Hubs, Apache Kafka, and Redis. (— [Scaling in Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/scale-app))

ACA scales **horizontally** by adding or removing replicas (running instances of a revision), and **KEDA** makes that decision. Scaling is defined by three elements.

- **Limits**: min/max replica count (default 0–10, settable up to 1,000)
- **Rules**: when to increase/decrease (HTTP / TCP / custom)
- **Behavior**: time characteristics like polling, cooldown, and steps

> Note: `Vertical scaling isn't supported.` You can't make a single replica bigger to absorb load; load is handled **by replica count (horizontally) only.**

---

## Three kinds of scale rules: using them differently

| Category | Scale basis | Representative use | Scale-to-zero |
|---------|------------|------------|:-----------:|
| **HTTP** | Concurrent HTTP request count (`concurrentRequests`, default 10) | Web API, web app | ◎ |
| **TCP** | Concurrent TCP connection count (`concurrentConnections`) | TCP protocols like Redis | ◎ |
| **Custom (CPU/memory)** | CPU/memory load | Compute-centric services | **✕** |
| **Custom (event)** | KEDA scaler (Service Bus, etc.) | Queue-driven workers | ◎ |

> When you define multiple rules, `the container app begins to scale once the first condition of any rule is met.` In other words, **any single rule whose condition is met can trigger scaling.**

### HTTP scaling (the most common)

Every 15 seconds, ACA computes the concurrent request count as the number of requests in the past 15 seconds divided by 15. With az CLI, the rule fits in a single command:

```bash
az containerapp create \
  --name web-api --resource-group my-rg --environment my-env \
  --image myregistry.azurecr.io/web-api:2026-06-26-a1b2c3d \
  --min-replicas 1 --max-replicas 20 \
  --scale-rule-name http-rule --scale-rule-type http \
  --scale-rule-http-concurrency 100   # target: 100 concurrent requests per replica
```

Measure with a load test how many concurrent requests one replica handles comfortably, and use that value for `--scale-rule-http-concurrency`. Guessing too high overloads replicas before scale-out kicks in; guessing too low over-scales and inflates cost.
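To make the arithmetic concrete, here is a minimal sketch of how a request count could translate into a replica count. It mirrors the documented averaging (requests in the last 15 seconds divided by 15) and assumes the platform then applies `ceil(concurrent / target)` clamped to the min/max limits. The function names `ceil_div` and `http_replicas` are illustrative and not part of any Azure tooling.

```bash
# Sketch only: an approximation of the HTTP scale rule, not the platform's code.

# ceil_div A B -> smallest integer >= A/B (non-negative integers only)
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# http_replicas REQUESTS_IN_15S TARGET_CONCURRENCY MIN_REPLICAS MAX_REPLICAS
http_replicas() {
  concurrent=$(ceil_div "$1" 15)           # average concurrent requests, rounded up
  wanted=$(ceil_div "$concurrent" "$2")    # replicas needed at the target concurrency
  [ "$wanted" -lt "$3" ] && wanted=$3      # never below min replicas
  [ "$wanted" -gt "$4" ] && wanted=$4      # never above max replicas
  echo "$wanted"
}

# 18,000 requests in 15 s = 1,200 concurrent; at 100 per replica that is 12 replicas.
http_replicas 18000 100 1 20
```

With the `web-api` settings above (target 100, min 1, max 20), the estimate stays at 1 while idle and is capped at 20 no matter how large the burst is, which is a reminder to size `--max-replicas` from the peak you measured.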

---

## Scale-to-zero: the biggest attraction and two traps

ACA's headline feature is **scale-to-zero**: replicas drop to 0 when idle, and nothing is billed during that time. But there are **two traps** you need to know about to avoid an outage.

### Trap ①: CPU/memory scaling can't go to zero

> Applications that scale on CPU or memory load can't scale to zero. (— [overview](https://learn.microsoft.com/en-us/azure/container-apps/overview))

The reason is simple: measuring CPU or memory load requires a running replica. **If you want to drop to zero, use an HTTP or event-driven rule.** A CPU/memory rule assumes at least one replica is always running.

### Trap ②: the self-destruct of an ingress-disabled worker

This is the most common accident.

> Make sure you create a scale rule or set minReplicas to 1 or more if you don't enable ingress. If ingress is disabled and you don't define a minReplicas or a custom scale rule, your container app scales to zero and has no way of starting back up. (— [scale-app](https://learn.microsoft.com/en-us/azure/container-apps/scale-app))

With a **background worker that has ingress disabled**, setting min replicas to 0 and defining no scale rule means it **can never wake up after dropping to 0**, because nothing exists to trigger a start. There are two countermeasures:

1. **Attach an event-driven rule** (wakes when a message arrives in the queue) ← recommended
2. **`minReplicas` 1 or more** (keep it resident)

```bash
# ✅ Correct: an event-driven worker is woken from 0 by its scale rule
az containerapp create \
  --name worker --resource-group my-rg --environment my-env \
  --image myregistry.azurecr.io/worker:2026-06-26-a1b2c3d \
  --min-replicas 0 --max-replicas 30 \
  --scale-rule-name sb-rule --scale-rule-type azure-servicebus \
  --scale-rule-metadata "queueName=orders" "namespace=my-sb" "messageCount=5" \
  --scale-rule-identity system   # authenticate with the managed identity
```
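The condition from the quoted warning can be written as a small pre-deployment check. This is a sketch under the assumption that you already know three facts about the app: whether ingress is enabled, the `minReplicas` value, and how many scale rules are defined (for example, read from `az containerapp show` output). The function name `can_wake` is hypothetical.

```bash
# can_wake INGRESS_ENABLED(true|false) MIN_REPLICAS RULE_COUNT
# Prints "ok" when something can start the app from zero, otherwise "stranded".
can_wake() {
  if [ "$1" = "true" ] || [ "$2" -ge 1 ] || [ "$3" -ge 1 ]; then
    echo "ok"
  else
    echo "stranded"
  fi
}

can_wake false 0 0   # stranded: no ingress, min replicas 0, no scale rule
can_wake false 0 1   # ok: an event-driven rule can wake the worker
can_wake false 1 0   # ok: one replica always stays resident
```

A check like this fits well in a CI step that validates the deployment template, so the misconfiguration is caught before the worker goes quiet in production.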

---

## Capacity planning: work back using the algorithm

The scaling behavior and algorithm are documented, so you can **plan capacity directly from them.**

| Behavior | Value |
|------|-----|
| Polling interval | 30 seconds (not applied to HTTP/TCP) |
| Cooldown period | 300 seconds (after the last event, until dropping to min replicas) |
| Scale-up stabilization window | 0 seconds |
| Scale-down stabilization window | 300 seconds |
| Scale-up steps | 1, 4, 8, 16, 32, ... (up to the max) |
| Scale-down step | 100% of the replicas that need to shut down |
| Algorithm | `desiredReplicas = ceil(currentMetricValue / targetMetricValue)` |

### Worked example: the replica count for a queue-driven worker

With a Service Bus queue at `messageCount: 5` (the target one replica handles = 5 messages), if the queue length is 50:

```text
desiredReplicas = ceil(50 / 5) = 10 replicas
```

To work back from a throughput target: if processing one message takes 2 seconds and 100 messages arrive per second at peak, the required throughput is 100 msg/s. One replica handles only 0.5 msg/s (2 seconds per message), so in theory 100 / 0.5 = **200 replicas** are needed, and `maxReplicas` must allow for that. A smaller `messageCount` adds replicas sooner (responsiveness ↑, cost ↑); a larger one adds them more gradually (responsiveness ↓, cost ↓). **Tune the SLA-vs-cost trade-off with `messageCount`.**
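Both calculations above fit in a few lines of integer arithmetic. The sketch below is only a planning aid: the function names are made up for illustration, and the per-message time is given in milliseconds so the shell can stay with integers.

```bash
# replicas_for_queue QUEUE_LENGTH MESSAGE_COUNT
# ceil(queue length / target messages per replica), as in the published algorithm.
replicas_for_queue() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# replicas_for_throughput ARRIVALS_PER_SECOND MS_PER_MESSAGE
# One replica clears 1000 / MS_PER_MESSAGE messages per second,
# so the requirement is ceil(arrivals * ms / 1000).
replicas_for_throughput() {
  echo $(( ($1 * $2 + 999) / 1000 ))
}

replicas_for_queue 50 5            # prints 10
replicas_for_throughput 100 2000   # prints 200
```

Combining the two: with `messageCount=5`, the algorithm asks for 200 replicas only once the backlog is close to 1,000 messages. If that backlog is too deep for your SLA, lower `messageCount`; if 200 replicas is over budget, raise it or speed up per-message processing.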

### How to read surges and drops

- **Surge**: replicas increase in steps, `1 → 4 → 8 → 16 → 32 ...` (immediately, since the scale-up stabilization window is 0 seconds).
- **Drop**: **scale-down happens only after the 300-second stabilization window has passed.** The count is held through a momentary dip, which prevents flapping (rapid oscillation between scaling up and down).
- **To zero**: replicas drop to 0 after the **300-second cooldown** from the last event.

This "up immediately, down cautiously" asymmetry is how KEDA **protects availability while keeping cost down.** It's the same philosophy as the scale-in cooldown in AWS Application Auto Scaling.
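To get a feel for a surge, the following sketch lists the replica counts a scale-up would pass through on the way to a desired count. It assumes the step sequence from the table (1, 4, then doubling) is applied one step at a time and that the final step lands exactly on the target; how long each step takes is not modeled. `scale_up_path` is an illustrative name only.

```bash
# scale_up_path TARGET
# Prints the replica counts visited from 0 up to TARGET, assuming the
# step sequence 1, 4, 8, 16, 32, ... from the behavior table.
scale_up_path() {
  target=$1
  step=1
  path=""
  while [ "$step" -lt "$target" ]; do
    path="$path$step "
    if [ "$step" -eq 1 ]; then
      step=4                      # the documented second step
    else
      step=$(( step * 2 ))        # 4 -> 8 -> 16 -> 32 ...
    fi
  done
  echo "$path$target"
}

scale_up_path 20   # prints: 1 4 8 16 20
```

Reaching 20 replicas takes five steps under these assumptions, so a worker that must absorb a sudden burst benefits from a non-zero `minReplicas` or a smaller per-replica target that triggers scale-out earlier.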

---

## Event-driven scaling: porting KEDA scalers

You can port any ScaledObject-based [KEDA scaler](https://keda.sh/docs/latest/scalers/) to an ACA scale rule: copy the `type` and `metadata` from the KEDA `ScaledObject` trigger spec into the rule's `custom.type` and `custom.metadata`.

```bicep
scale: {
  minReplicas: 0
  maxReplicas: 50
  rules: [
    {
      name: 'eventhub-rule'
      custom: {
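        type: 'azure-eventhub'         // KEDA scaler type
        metadata: {
          eventHubName: 'telemetry'
          consumerGroup: '$Default'
          unprocessedEventThreshold: '64'   // target unprocessed events per replica
          // A complete rule also needs the namespace and checkpoint-storage
          // metadata described in the KEDA azure-eventhub scaler docs.
        }
        identity: 'system'             // authenticate with the managed identity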
      }
    }
  ]
}
```

Examples of supported event sources: **Azure Service Bus / Event Hubs / Storage Queue / Apache Kafka / Redis** and many more.

> Important: `Set the properties.configuration.activeRevisionsMode property of the container app to single when using non-HTTP event scale rules.` — **for non-HTTP event rules, set `activeRevisionsMode` to `single`.**

---

## Authentication: managed identity over secrets

A KEDA scaler needs credentials to read its event source. ACA supports two methods, and the official recommendation is to **prefer managed identity.**

> Where possible, use managed identity authentication to avoid storing secrets within the app. (— [scale-app](https://learn.microsoft.com/en-us/azure/container-apps/scale-app))

### Managed identity (recommended)

Just specify the `identity` property on the scale rule. No connection string is stored anywhere.

```bicep
rules: [
  {
    name: 'azure-queue'
    custom: {
      type: 'azure-queue'
      metadata: {
        accountName: 'mystorage'
        queueName: 'jobs'
        queueLength: '1'
      }
      identity: 'system'   // or the resource ID of a user-assigned identity
    }
  }
]
```

Then grant that managed identity the data-plane role on the target resource (e.g., `Azure Service Bus Data Receiver`, `Storage Queue Data Reader`) and you're done. **Connection strings no longer need to be managed or rotated**, and that leakage surface disappears.
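The role grant itself can be scripted. The sketch below wraps `az role assignment create` in a dry-run helper so it only prints the command unless `DRY_RUN=0` is set. The helper names `run` and `grant_scaler_role` are hypothetical, and the principal ID and scope are placeholders you would replace with the app identity's principal ID and the target resource's ID.

```bash
# run CMD...: print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"   # preview only; quoting of multi-word arguments is not shown
  fi
}

# grant_scaler_role PRINCIPAL_ID ROLE_NAME SCOPE
grant_scaler_role() {
  run az role assignment create \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --role "$2" \
    --scope "$3"
}

# Placeholders: substitute real values before running with DRY_RUN=0.
grant_scaler_role "<principal-id>" "Storage Queue Data Reader" "<storage-account-resource-id>"
```

Scoping the assignment to the single storage account or Service Bus namespace the scaler reads, rather than the resource group, keeps the identity's permissions as narrow as the scale rule needs.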

### Secrets (when unavoidable)

Place the connection string in the app's `secrets` and reference it in the scale rule's `auth` array. Combined with a Key Vault reference, you can avoid hardcoding the value (details in [secret management](/blog/azure-container-apps-production-guide)).

---

## Scaling jobs: ScaledJob

[Event-driven jobs](/blog/azure-container-apps-jobs-batch-scheduled-event-driven-guide) use the same KEDA scaler, but **the usage differs.**

> In an app, each replica continuously processes events, and a scaling rule determines the number of replicas to run to meet demand. In event-driven jobs, each job execution typically processes a single event, and a scaling rule determines the number of job executions to run. (— [Jobs](https://learn.microsoft.com/en-us/azure/container-apps/jobs))

- **App**: one replica **continuously** processes events. The scale rule decides the **number of replicas.**
- **Job**: one execution processes **one event** and finishes. The scale rule decides the **number of executions.**

The rule of thumb: choose a job when each event needs a fresh, dedicated instance or processing takes a long time; choose an app when the process should stay resident and handle a continuous flow.

---

## Scaling pitfalls and known limitations

Known limitations listed in the official docs ([scale-app](https://learn.microsoft.com/en-us/azure/container-apps/scale-app)):

- **No vertical scaling** (count only).
- **The replica count is a target, not a guarantee** (`Replica quantities are a target amount, not a guarantee.`).
- **When managing state with Dapr actors, scale-to-zero is unsupported** (Dapr uses virtual actors, whose in-memory representation isn't tied to their identity or lifetime).
- **In multiple-revision mode, adding a new scale trigger creates a new revision.** The old revision remains with the old rules.

### Diagnosing when scaling doesn't work

If the app doesn't seem to scale, check the system logs for `Error fetching scaler metrics`. It indicates that **the scaler can't reach its signal source (DB, Event Hub, another app).** Check the VNet, DNS, firewall, and permissions ([troubleshooting guide](/blog/azure-container-apps-troubleshooting-revision-failed-exit-code-137-probes-guide)).

---

## Production design: scaling × idempotency

The key to making autoscaling production-quality is to **design scaling and correctness separately.**

- **Scaling is the platform**: KEDA decides the count. You just measure and set the target per replica: `messageCount` or HTTP concurrency.
- **Correctness is the code's structure**: make workers **idempotent** so nothing breaks when a message is redelivered after scale-in ([exit within 30 seconds on SIGTERM](/blog/azure-container-apps-production-guide)) or a retry. Processing the same message twice must not double-charge; [absorb duplicates with an idempotency key](/blog/aws-sqs-lambda-eventbridge-idempotent-async-processing-guide).

Keep this division strict, and **even with scale-to-zero, messages are neither dropped nor executed twice.** This is the core of the design that achieved 0 double charges in production on the payment platform.
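As a toy illustration of the idempotency-key idea, the sketch below uses one directory per message ID as the idempotency record. It relies on `mkdir` being atomic, so only the first delivery of an ID wins the claim. The local filesystem stands in for a durable store here, and `handle_once` and `STATE_DIR` are names invented for the example.

```bash
# Sketch only: a local directory stands in for a durable idempotency store.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"

# handle_once MESSAGE_ID
# Prints "processed" on the first delivery of an ID and "duplicate" afterwards.
handle_once() {
  if mkdir "$STATE_DIR/$1" 2>/dev/null; then
    # The side effect (for example, charging a card) would run here, once per ID.
    echo "processed"
  else
    echo "duplicate"
  fi
}

handle_once order-42   # processed
handle_once order-42   # duplicate (redelivery after scale-in or a retry)
```

A production version would record the key in the same transaction as the side effect; claiming first, as this sketch does, would lose the message if the worker crashed between the claim and the work.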

---

## Conclusion

ACA's autoscaling is driven by **KEDA**, scaling horizontally on HTTP, TCP, CPU/memory, and many event sources. Four points make it work in production:

1. **Avoid the two scale-to-zero traps** (CPU/memory can't be 0, the self-destruct of an ingress-disabled worker).
2. **Work back capacity with the algorithm** (`messageCount` is the SLA-vs-cost adjustment knob).
3. **Prefer managed identity for authentication** and remove connection strings from the app.
4. **Scaling × idempotency**: scaling is KEDA's job, correctness is the code's structure.

For help with anything from autoscaling design and tuning to making queue-driven workers idempotent, get in touch via [contact](/contact). For production operation overall, see the [Azure Container Apps production-operations guide](/blog/azure-container-apps-production-guide).
