Autoscaling is where "I thought I had configured it" accidents happen most easily. A worker that scales to zero and never wakes up, a replica count that can't keep up with a traffic surge, a connection string hardcoded into the app: all of these are preventable if you understand KEDA's behavior accurately.
This article follows Microsoft Learn's official scaling documentation and explains Azure Container Apps (ACA) autoscaling all the way down to the capacity-planning formula. On AWS, I have run SQS-driven idempotent workers in production on Fargate and built a payment platform with zero double charges in production. The platform guarantees scaling; the structure of the code (idempotency) guarantees correctness. That division of labor is the same on Azure. For ACA production operation as a whole, see the Azure Container Apps production-operations guide.
What KEDA is: the brain of ACA's scaling
To support this scaling behavior, Azure Container Apps uses KEDA (Kubernetes Event-driven Autoscaling). KEDA supports scaling against a variety of metrics like HTTP requests, queue messages, CPU and memory load, and event sources like Azure Service Bus, Azure Event Hubs, Apache Kafka, and Redis. (— Scaling in Azure Container Apps)
ACA scales horizontally by increasing or decreasing the number of replicas (running instances of a revision), and KEDA makes that decision. Scaling is defined by a combination of three elements.
- Limits: min/max replica count (default 0–10, settable up to 1,000)
- Rules: when to increase/decrease (HTTP / TCP / custom)
- Behavior: time characteristics like polling, cooldown, and steps
Note:
Vertical scaling isn't supported. You can't make a single replica bigger in response to load; load is handled by replica count (horizontally) only.
Three kinds of scale rules and when to use each
| Category | Scale basis | Representative use | Scale-to-zero |
|---|---|---|---|
| HTTP | Concurrent HTTP request count (concurrentRequests, default 10) | Web API, web app | ◎ |
| TCP | Concurrent TCP connection count (concurrentConnections) | TCP protocols like Redis | ◎ |
| Custom (CPU/memory) | CPU/memory load | Compute-centric services | ✕ |
| Custom (event) | KEDA scaler (Service Bus, etc.) | Queue-driven workers | ◎ |
When you define multiple rules, the container app begins to scale once the first condition of any rule is met. In other words, scaling starts the moment any one condition is satisfied.
HTTP scaling (the most common)
Every 15 seconds, ACA computes the concurrent request count as "request count over the past 15 seconds ÷ 15." With the az CLI, it is a single command:
az containerapp create \
  --name web-api --resource-group my-rg --environment my-env \
  --image myregistry.azurecr.io/web-api:2026-06-26-a1b2c3d \
  --min-replicas 1 --max-replicas 20 \
  --scale-rule-name http-rule --scale-rule-type http \
  --scale-rule-http-concurrency 100  # target 100 concurrent requests per replica
Use a load test to measure how many concurrent requests one replica can handle comfortably, and set --scale-rule-http-concurrency to that value. Guessing too high causes stalls under load; guessing too low over-scales and inflates cost.
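As a rough illustration of that arithmetic, the sketch below turns a 15-second request count into a concurrency figure and applies KEDA's published formula, desiredReplicas = ceil(currentMetricValue / targetMetricValue). The function names and the clamping to the min/max limits are assumptions made for illustration, not ACA's actual implementation.

```python
import math

WINDOW_SECONDS = 15  # HTTP rules are evaluated over 15-second windows


def concurrent_requests(requests_in_window: int) -> float:
    """Requests seen in the last 15 seconds divided by 15."""
    return requests_in_window / WINDOW_SECONDS


def desired_http_replicas(requests_in_window: int, target_concurrency: int,
                          min_replicas: int, max_replicas: int) -> int:
    """Apply ceil(current / target), then clamp to the configured limits (assumed)."""
    current = concurrent_requests(requests_in_window)
    wanted = math.ceil(current / target_concurrency)
    return max(min_replicas, min(max_replicas, wanted))


# 6,000 requests in the window -> 400 concurrent; target is 100 per replica
print(desired_http_replicas(6000, 100, min_replicas=1, max_replicas=20))  # 4
```

Running the same function with the numbers from your own load test gives a quick sanity check on whether max replicas leaves enough headroom for peak traffic.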
Scale-to-zero: the biggest attraction and two traps
ACA's headline feature is scale-to-zero: replicas drop to 0 when the app is idle, and nothing is billed during that time. There are two traps, though, and not knowing them leads to accidents.
Trap ①: CPU/memory scaling can't go to zero
Applications that scale on CPU or memory load can't scale to zero. (— overview)
The reason is that measuring CPU or memory load requires a running replica in the first place. If you want to drop to zero, use an HTTP or event-driven rule. A CPU/memory rule presupposes that at least one replica is always running.
Trap ②: the self-destruct of an ingress-disabled worker
This is the most common accident.
Make sure you create a scale rule or set minReplicas to 1 or more if you don't enable ingress. If ingress is disabled and you don't define a minReplicas or a custom scale rule, your container app scales to zero and has no way of starting back up. (— scale-app)
With a background worker that has ingress disabled, setting min replicas 0 and no scale rule means it can never wake up after dropping to 0 (because there's no request as a trigger to wake it). The countermeasure is one of two:
- Attach an event-driven rule (wakes when a message arrives in the queue) ← recommended
- Set minReplicas to 1 or more (keep it resident)
# ✅ Correct: an event-driven worker is woken from 0 by its scale rule
az containerapp create \
  --name worker --resource-group my-rg --environment my-env \
  --image myregistry.azurecr.io/worker:2026-06-26-a1b2c3d \
  --min-replicas 0 --max-replicas 30 \
  --scale-rule-name sb-rule --scale-rule-type azure-servicebus \
  --scale-rule-metadata "queueName=orders" "namespace=my-sb" "messageCount=5" \
  --scale-rule-identity system  # authenticate with managed identity
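Both traps can be checked mechanically before deployment. The following is a minimal sketch of such a check; `ScaleConfig` and its fields are a hypothetical, simplified model of an app's settings, not an ACA API or schema.

```python
from dataclasses import dataclass, field


@dataclass
class ScaleConfig:
    """Hypothetical, simplified view of a container app's scale settings."""
    ingress_enabled: bool
    min_replicas: int
    # Rule types attached to the app, e.g. ["http"], ["cpu"], ["azure-servicebus"]
    rule_types: list = field(default_factory=list)


def scale_to_zero_problems(cfg: ScaleConfig) -> list:
    """Return a warning for each scale-to-zero trap the config falls into."""
    problems = []
    if cfg.min_replicas == 0:
        # Trap 1: CPU/memory rules need a running replica to measure load.
        if any(t in ("cpu", "memory") for t in cfg.rule_types):
            problems.append("cpu/memory rule cannot scale to zero")
        # Trap 2: no ingress and no rule means nothing can wake the app.
        if not cfg.ingress_enabled and not cfg.rule_types:
            problems.append("ingress disabled, minReplicas 0, no scale rule: app cannot restart")
    return problems


# A worker with ingress disabled, minReplicas 0, and no rule falls into trap 2
print(scale_to_zero_problems(ScaleConfig(ingress_enabled=False, min_replicas=0)))
```

A check like this could run in CI against whatever representation of the app configuration you already keep in source control.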
Capacity planning: work back using the algorithm
KEDA's scaling algorithm is published and directly connects to capacity planning.
| Behavior | Value |
|---|---|
| Polling interval | 30 seconds (not applied to HTTP/TCP) |
| Cooldown period | 300 seconds (after the last event, until dropping to min replicas) |
| Scale-up stabilization window | 0 seconds |
| Scale-down stabilization window | 300 seconds |
| Scale-up steps | 1, 4, 8, 16, 32, ... (up to the max) |
| Scale-down step | 100% of the replicas that need to shut down |
| Algorithm | desiredReplicas = ceil(currentMetricValue / targetMetricValue) |
Worked example: the replica count for a queue-driven worker
With a Service Bus queue at messageCount: 5 (the target one replica handles = 5 messages), if the queue length is 50:
desiredReplicas = ceil(50 / 5) = 10 replicas
To work back from a throughput target: if one message takes 2 seconds to process and 100 messages arrive per second at peak, the required throughput is 100 msg/s. One replica handles only 0.5 msg/s (2 seconds per message), so in theory 200 replicas are needed, and maxReplicas must be at least that high. A smaller messageCount adds replicas faster (responsiveness ↑, cost ↑); a larger one adds them more gently (responsiveness ↓, cost ↓). Use messageCount to tune the SLA-versus-cost trade-off.
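The two calculations above can be written down as a small sketch. The function names are illustrative, and the throughput estimate assumes each replica processes one message at a time at a constant rate, which real workers only approximate.

```python
import math


def desired_replicas(queue_length: int, message_count: int) -> int:
    """KEDA's published formula: ceil(currentMetricValue / targetMetricValue)."""
    return math.ceil(queue_length / message_count)


def replicas_for_throughput(arrival_rate_per_s: float, seconds_per_message: float) -> int:
    """Replicas needed so combined throughput matches the arrival rate
    (assumes one message at a time per replica)."""
    return math.ceil(arrival_rate_per_s * seconds_per_message)


print(desired_replicas(50, 5))            # 10
needed = replicas_for_throughput(100, 2)  # 100 msg/s at 2 s per message
print(needed)                             # 200
# Backlog at which the formula asks for `needed` replicas when messageCount=5
print(needed * 5)                         # 1000
```

The last line shows why messageCount matters: with a target of 5, the queue has to reach about 1,000 messages before the formula requests all 200 replicas, so a smaller target reaches full capacity at a shorter backlog.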
How to read surges and drops
- Surge: replicas increase in steps, 1 → 4 → 8 → 16 → 32 ... (immediately, since the scale-up stabilization window is 0 seconds).
- Drop: scale-down occurs only after the 300-second stabilization window has passed. The count is held through a momentary trough, which prevents flapping (oscillating up and down).
- To zero: replicas drop to 0 after the 300-second cooldown from the last event.
This "up immediately, down cautiously" asymmetry is how KEDA's design protects availability while keeping cost down. It is the same philosophy as the scale-in cooldown in AWS Application Auto Scaling.
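To get a feel for how many scale-up steps a surge takes, here is a small sketch. It assumes the published sequence 1, 4, 8, 16, 32, ... describes successive replica counts (as the arrows above read it) and that each evaluation advances one step; the function name is illustrative and real timing depends on the platform.

```python
def scale_up_path(target: int, max_replicas: int, start: int = 0) -> list:
    """Replica counts visited while scaling up from `start` toward `target`,
    capped at `max_replicas`, following the steps 1, 4, 8, 16, 32, ..."""
    goal = min(target, max_replicas)
    path = []
    current = start
    step = 1
    while current < goal:
        # Advance to the first step value above the current count.
        while step <= current:
            step = 4 if step == 1 else step * 2
        # Never overshoot the desired (or maximum) replica count.
        current = min(step, goal)
        path.append(current)
    return path


print(scale_up_path(target=30, max_replicas=50))  # [1, 4, 8, 16, 30]
```

Under these assumptions, reaching 30 replicas from zero takes five steps, which is worth keeping in mind when a cold start also has to pull the image.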
Event-driven scaling: porting KEDA scalers
You can port any ScaledObject-based KEDA scaler to an ACA scale rule: copy the type and metadata from the trigger in KEDA's ScaledObject spec to custom.type and custom.metadata in ACA.
scale: {
  minReplicas: 0
  maxReplicas: 50
  rules: [
    {
      name: 'eventhub-rule'
      custom: {
        type: 'azure-eventhub' // KEDA scaler type
        metadata: {
          eventHubName: 'telemetry'
          consumerGroup: '$Default'
          unprocessedEventThreshold: '64' // target number of unprocessed events per replica
        }
        identity: 'system' // authenticate with managed identity
      }
    }
  ]
}
Examples of supported event sources: Azure Service Bus / Event Hubs / Storage Queue / Apache Kafka / Redis and many more.
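The type-and-metadata copy described in this section is mechanical enough to script. Below is a minimal sketch that maps one trigger entry from a ScaledObject (represented as a plain dict) to the shape of an ACA custom rule; the function name is hypothetical, and converting metadata values to strings is an assumption based on the quoted values in the example above.

```python
from typing import Optional


def trigger_to_aca_rule(name: str, trigger: dict, identity: Optional[str] = None) -> dict:
    """Map one entry of a ScaledObject's spec.triggers to an ACA custom rule."""
    custom = {
        "type": trigger["type"],
        # Metadata values are written as strings, as in the rule shown above.
        "metadata": {k: str(v) for k, v in trigger.get("metadata", {}).items()},
    }
    if identity is not None:
        custom["identity"] = identity
    return {"name": name, "custom": custom}


keda_trigger = {
    "type": "azure-eventhub",
    "metadata": {"consumerGroup": "$Default", "unprocessedEventThreshold": 64},
}
rule = trigger_to_aca_rule("eventhub-rule", keda_trigger, identity="system")
print(rule["custom"]["metadata"]["unprocessedEventThreshold"])  # 64
```

The resulting dict mirrors the rule structure used in the template, so it can be serialized into whatever deployment format you use.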
Important:
Set the properties.configuration.activeRevisionsMode property of the container app to single when using non-HTTP event scale rules.
Authentication: managed identity over secrets
A KEDA scaler needs authentication credentials. ACA supports two methods, but preferring managed identity is the official recommendation.
Where possible, use managed identity authentication to avoid storing secrets within the app. (— scale-app)
Managed identity (recommended)
Just specify the identity property on the scale rule. The connection string is not stored anywhere.
rules: [
  {
    name: 'azure-queue'
    custom: {
      type: 'azure-queue'
      metadata: {
        accountName: 'mystorage'
        queueName: 'jobs'
        queueLength: '1'
      }
      identity: 'system' // or the resource ID of a user-assigned identity
    }
  }
]
Then grant that managed identity the target resource's data-permission role (e.g., Azure Service Bus Data Receiver, Storage Queue Data Reader) and you're done. Managing/rotating connection strings becomes unnecessary, and the leakage surface disappears.
Secrets (when unavoidable)
Place the connection string in the app's secrets and reference it in the scale rule's auth array. Combined with a Key Vault reference, you can avoid hardcoding the value (details in secret management).
Scaling jobs: ScaledJob
Event-driven jobs use the same KEDA scaler, but the usage differs.
In an app, each replica continuously processes events, and a scaling rule determines the number of replicas to run to meet demand. In event-driven jobs, each job execution typically processes a single event, and a scaling rule determines the number of job executions to run. (— Jobs)
- App: one replica continuously processes events. The scale rule decides the number of replicas.
- Job: one execution processes one event and finishes. The scale rule decides the number of executions.
The rule of thumb: choose a job if each event needs a fresh instance with dedicated resources or if processing takes a long time; choose an app if the process should stay resident and keep consuming events.
Scaling pitfalls and known limitations
The known limitations the official docs list (scale-app):
- No vertical scaling (count only).
- The replica count is a target, not a guarantee ("Replica quantities are a target amount, not a guarantee.").
- When you manage state with Dapr actors, scale-to-zero is unsupported (Dapr uses virtual actors, whose in-memory representation isn't tied to their identity or lifetime).
- In multiple-revision mode, adding a new scale trigger creates a new revision. The old revision remains with the old rules.
Diagnosing when scaling doesn't work
If you feel it "doesn't scale," check the system log for Error fetching scaler metrics. This is a sign that the scaling signal source (DB, Event Hub, another app) can't be reached. Check the VNet, DNS, firewall, and permissions (troubleshooting guide).
Production design: scaling × idempotency
The key to making autoscaling production-quality is to design scaling and correctness separately.
- Scaling is the platform's job: KEDA decides the count. You only measure and set the per-replica target (messageCount or HTTP concurrency).
- Correctness is the code's structure: make workers idempotent so nothing breaks when a message is redelivered after scale-in (exit within 30 seconds of SIGTERM) or after a retry. Processing the same message twice must not double-charge; absorb duplicates with an idempotency key.
If you enforce this division consistently, neither dropped messages nor double executions occur, even with scale-to-zero. This is the core of the design that achieved 0 double charges in production on the payment platform.
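As a minimal sketch of what "absorb duplicates with an idempotency key" can look like, the toy worker below skips a message whose key it has already handled. The class, field names, and in-memory set are assumptions for illustration, not the implementation used on the payment platform; in production the key would live in a durable store and be recorded atomically with the side effect, for example under a unique constraint in the same database transaction.

```python
class PaymentWorker:
    """Toy worker: an in-memory set stands in for a durable idempotency store."""

    def __init__(self) -> None:
        self.processed_keys = set()
        self.charges = []

    def handle(self, message: dict) -> bool:
        """Charge once per idempotency key; return False for a duplicate."""
        key = message["idempotency_key"]
        if key in self.processed_keys:
            return False  # redelivery or retry: skip the side effect
        self.charges.append((message["customer"], message["amount"]))
        self.processed_keys.add(key)
        return True


worker = PaymentWorker()
msg = {"idempotency_key": "order-1001", "customer": "c-42", "amount": 1200}
print(worker.handle(msg))   # True
print(worker.handle(msg))   # False
print(len(worker.charges))  # 1
```

Because the duplicate is rejected by the worker itself, it does not matter whether the second delivery came from a scale-in, a retry, or a lock timeout.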
Conclusion
ACA's autoscaling is driven by KEDA and scales horizontally on HTTP, TCP, CPU/memory, and many event sources. There are four key points for making it work in production:
- Avoid the two scale-to-zero traps (CPU/memory can't be 0, the self-destruct of an ingress-disabled worker).
- Work back capacity from the algorithm (messageCount is the SLA-versus-cost tuning knob).
- Prefer managed identity for authentication and remove the connection string from the app.
- Scaling × idempotency: scaling is KEDA's job, correctness is the code's structure.
For help with anything from autoscaling design and tuning to making queue-driven workers idempotent, get in touch via the contact page. For production operation as a whole, see the Azure Container Apps production-operations guide.