The complete Azure Container Apps autoscaling guide: scale-to-zero and event-driven with KEDA (HTTP, queue, CPU)

Autoscaling is where "I thought I configured it" accidents happen easily. A worker scaled to zero that never wakes up, a replica count that can't keep up with a traffic surge, a connection string hardcoded into the app: all are preventable if you understand KEDA's behavior accurately.

This article follows Microsoft Learn's official scaling documentation and explains Azure Container Apps (ACA) autoscaling, down to the capacity-planning formula. On AWS, I've run SQS-driven idempotent workers in production on Fargate and built a payment platform with zero double charges in production. The platform guarantees scaling; the code's structure (idempotency) guarantees correctness. That division is the same on Azure. For ACA's overall production operation, see the Azure Container Apps production-operations guide.

What KEDA is: the brain of ACA's scaling

To support this scaling behavior, Azure Container Apps uses KEDA (Kubernetes Event-driven Autoscaling). KEDA supports scaling against a variety of metrics like HTTP requests, queue messages, CPU and memory load, and event sources like Azure Service Bus, Azure Event Hubs, Apache Kafka, and Redis. (— Scaling in Azure Container Apps)

ACA scales horizontally by increasing or decreasing the number of replicas (a revision's running instances), and KEDA makes that decision. Scaling is configured through three elements.

Limits: min/max replica count (default 0–10, settable up to 1,000)
Rules: when to increase/decrease (HTTP / TCP / custom)
Behavior: time characteristics like polling, cooldown, and steps

Note: Vertical scaling isn't supported. You can't make a single replica bigger on demand; load is handled only by replica count (horizontally).

Three kinds of scale rules: using them differently

Category	Scale basis	Representative use	Scale-to-zero
HTTP	Concurrent HTTP request count (`concurrentRequests`, default 10)	Web API, web app	◎
TCP	Concurrent TCP connection count (`concurrentConnections`)	TCP protocols like Redis	◎
Custom (CPU/memory)	CPU/memory load	Compute-centric services	✕
Custom (event)	KEDA scaler (Service Bus, etc.)	Queue-driven workers	◎

When you define multiple rules, the container app begins to scale once the first condition of any rule is met. A single satisfied rule is enough; the rules don't all have to fire.

HTTP scaling (the most common)

Every 15 seconds, it computes the concurrent request count as "request count over the past 15 seconds ÷ 15." With az CLI, one line:

az containerapp create \
  --name web-api --resource-group my-rg --environment my-env \
  --image myregistry.azurecr.io/web-api:2026-06-26-a1b2c3d \
  --min-replicas 1 --max-replicas 20 \
  --scale-rule-name http-rule --scale-rule-type http \
  --scale-rule-http-concurrency 100   # target: 100 concurrent requests per replica

Use a load test to measure how many concurrent requests one replica handles comfortably, and set http-concurrency to that value. A value guessed too high overloads replicas and causes stalls; a value too low over-scales and inflates cost.
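The arithmetic behind this setting can be sketched in a few lines of Python. The first function mirrors the "requests over the past 15 seconds ÷ 15" metric described above. The second is an assumption of this sketch rather than anything ACA does: it uses Little's law (concurrency ≈ throughput × latency) to turn load-test results into a starting value for http-concurrency. The function names and sample numbers are hypothetical.

```python
def concurrent_requests(requests_in_window: int, window_seconds: int = 15) -> float:
    """Concurrency metric as described above: requests in the window / window length."""
    return requests_in_window / window_seconds


def concurrency_from_load_test(sustained_rps: float, latency_seconds: float) -> float:
    """Little's law estimate (an assumption of this sketch, not an ACA formula):
    average in-flight requests = throughput * latency."""
    return sustained_rps * latency_seconds


if __name__ == "__main__":
    # Hypothetical traffic: 4,500 requests arrived in the last 15-second window.
    print(concurrent_requests(4500))
    # Hypothetical load test: one replica sustains 400 req/s at 0.25 s latency.
    print(concurrency_from_load_test(400, 0.25))
```

Treat the Little's law figure as a first guess to confirm under load, not as a replacement for measuring where a replica starts to degrade.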

Scale-to-zero: the biggest attraction and two traps

ACA's headline feature is scale-to-zero: replicas drop to 0 when idle, and nothing is billed during that time. But two traps cause accidents if you don't know about them.

Trap ①: CPU/memory scaling can't go to zero

Applications that scale on CPU or memory load can't scale to zero. (— overview)

This is because measuring CPU or memory load requires a running replica in the first place. If you want to drop to zero, use an HTTP or event-driven rule. A CPU/memory rule presumes that at least one replica is always running.

Trap ②: the self-destruct of an ingress-disabled worker

This is the most common accident.

Make sure you create a scale rule or set minReplicas to 1 or more if you don't enable ingress. If ingress is disabled and you don't define a minReplicas or a custom scale rule, your container app scales to zero and has no way of starting back up. (— scale-app)

If a background worker with ingress disabled has min replicas set to 0 and no scale rule, it can never wake up after dropping to 0, because there is no request to act as a trigger. There are two countermeasures:

Attach an event-driven rule (wakes when a message arrives in the queue) ← recommended
minReplicas 1 or more (keep it resident)

# ✅ Correct: an event-driven worker is woken from 0 by its scale rule
az containerapp create \
  --name worker --resource-group my-rg --environment my-env \
  --image myregistry.azurecr.io/worker:2026-06-26-a1b2c3d \
  --min-replicas 0 --max-replicas 30 \
  --scale-rule-name sb-rule --scale-rule-type azure-servicebus \
  --scale-rule-metadata "queueName=orders" "namespace=my-sb" "messageCount=5" \
  --scale-rule-identity system   # authenticate with managed identity
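Both traps can be caught before deployment with a small pre-flight check. The following is a minimal Python sketch, not an ACA API: the function name, its parameters, and the rule-type strings "cpu" and "memory" are assumptions made for illustration. It flags only the two situations described above.

```python
from typing import List


def scale_config_problems(ingress_enabled: bool, min_replicas: int,
                          rule_types: List[str]) -> List[str]:
    """Return human-readable warnings for the two scale-to-zero traps."""
    problems = []

    # Trap 2: nothing can wake the app once it reaches zero replicas.
    if not ingress_enabled and min_replicas == 0 and not rule_types:
        problems.append(
            "ingress disabled, minReplicas 0, no scale rule: "
            "the app cannot start back up after scaling to zero"
        )

    # Trap 1: only CPU/memory rules are defined, yet zero replicas is requested.
    resource_only = bool(rule_types) and all(t in ("cpu", "memory") for t in rule_types)
    if min_replicas == 0 and resource_only:
        problems.append(
            "only CPU/memory rules: these cannot scale to zero; "
            "set minReplicas >= 1 or add an HTTP/event rule"
        )
    return problems


if __name__ == "__main__":
    # The worker defined above: ingress off, min 0, Service Bus rule -> no warnings.
    print(scale_config_problems(False, 0, ["azure-servicebus"]))
    # The self-destructing worker: ingress off, min 0, no rule.
    print(scale_config_problems(False, 0, []))
```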

Capacity planning: work back using the algorithm

KEDA's scaling algorithm is published and directly connects to capacity planning.

Behavior	Value
Polling interval	30 seconds (not applied to HTTP/TCP)
Cooldown period	300 seconds (after the last event, until dropping to min replicas)
Scale-up stabilization window	0 seconds
Scale-down stabilization window	300 seconds
Scale-up steps	1, 4, 8, 16, 32, ... (up to the max)
Scale-down step	100% of the replicas to drop
Algorithm	`desiredReplicas = ceil(currentMetricValue / targetMetricValue)`

Worked example: the replica count for a queue-driven worker

With a Service Bus queue at messageCount: 5 (the target one replica handles = 5 messages), if the queue length is 50:

desiredReplicas = ceil(50 / 5) = 10 replicas
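The same calculation as a minimal Python sketch. The formula is the one from the table above; clamping the result to the min/max replica limits is an assumption of this sketch about how the limits apply, and the function name is hypothetical.

```python
import math


def desired_replicas(current_metric: float, target_metric: float,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """ceil(currentMetricValue / targetMetricValue), clamped to the replica limits."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Queue length 50, messageCount 5, limits 0-30 as in the worker example.
    print(desired_replicas(50, 5, min_replicas=0, max_replicas=30))
    # A backlog of 500 messages would ask for 100 replicas but is capped at the max.
    print(desired_replicas(500, 5, min_replicas=0, max_replicas=30))
```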

To work back from a throughput target: if one message takes 2 seconds to process and 100 messages arrive per second at peak, the required throughput is 100 msg/s. One replica handles only 0.5 msg/s (2 s per message), so in theory 100 / 0.5 = 200 replicas are needed, assuming each replica processes one message at a time. A smaller messageCount adds replicas sooner (higher responsiveness, higher cost); a larger one adds them more gradually (lower responsiveness, lower cost). Use messageCount to tune the SLA-versus-cost trade-off.
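The throughput work-back can be checked with a short Python sketch. It assumes each replica processes one message at a time at a constant rate, which is a simplification; both function names are hypothetical. The second function shows what happens when the max replica limit sits below the theoretical requirement: the backlog grows at the difference between arrival and processing rates.

```python
import math


def replicas_for_throughput(arrival_rate_per_s: float, seconds_per_message: float) -> int:
    """Replicas needed so total processing rate matches the arrival rate."""
    return math.ceil(arrival_rate_per_s * seconds_per_message)


def backlog_growth_per_s(arrival_rate_per_s: float, seconds_per_message: float,
                         replicas: int) -> float:
    """Messages per second accumulating in the queue (0 if capacity is sufficient)."""
    capacity = replicas / seconds_per_message
    return max(0.0, arrival_rate_per_s - capacity)


if __name__ == "__main__":
    # 100 msg/s at peak, 2 s per message.
    print(replicas_for_throughput(100, 2))
    # With max replicas at 30, capacity is 15 msg/s, so the queue keeps growing.
    print(backlog_growth_per_s(100, 2, replicas=30))
```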

How to read surges and drops

Surge: the count steps up 1 → 4 → 8 → 16 → 32 ... and reacts immediately, since the scale-up stabilization window is 0 seconds.
Drop: scale-down happens only after the 300-second stabilization window has passed. The count is held through a momentary dip, which prevents flapping (rapid oscillation between scaling up and down).
To zero: it drops to 0 after waiting the 300-second cooldown from the last event.
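To get a feel for how many steps a surge needs, the step sequence can be walked in Python. This is a rough sketch under a stated assumption: that each scaling evaluation may raise the count to the next value in the published sequence (1, 4, 8, 16, 32, ...) until the desired count or the max is reached. The real timing between steps depends on the platform, so the sketch reports only the sequence, not durations. The function name is hypothetical.

```python
from typing import List


def scale_up_path(target: int, max_replicas: int) -> List[int]:
    """Replica counts visited on the way from 0 up to min(target, max_replicas)."""
    goal = min(target, max_replicas)
    if goal <= 0:
        return []
    path = []
    cap = 1
    while True:
        current = min(cap, goal)
        path.append(current)
        if current >= goal:
            return path
        # Published step sequence: 1, 4, then doubling (8, 16, 32, ...).
        cap = 4 if cap == 1 else cap * 2


if __name__ == "__main__":
    print(scale_up_path(10, 30))     # the queue example: 10 desired replicas
    print(scale_up_path(200, 1000))  # the throughput example: 200 desired replicas
```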

This asymmetry of "up immediately, down cautiously" is how KEDA's design protects availability while keeping cost down. It's the same philosophy as the scale-in cooldown in AWS Application Auto Scaling.

Event-driven scaling: porting KEDA scalers

You can port any KEDA scaler to an ACA scale rule. Just copy KEDA's ScaledObject spec type and metadata to ACA's custom.type and custom.metadata.

scale: {
  minReplicas: 0
  maxReplicas: 50
  rules: [
    {
      name: 'eventhub-rule'
      custom: {
        type: 'azure-eventhub'         // KEDA scaler type
        metadata: {
          eventHubName: 'telemetry'
          consumerGroup: '$Default'
          unprocessedEventThreshold: '64'   // target number of unprocessed events per replica
        }
        identity: 'system'             // authenticate with managed identity
      }
    }
  ]
}
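The porting step is mechanical enough to sketch as a Python function. This is an illustration under assumptions, not a tool that ships with ACA or KEDA: the function name is hypothetical, the input is one entry of a ScaledObject's triggers list loaded as a dict, and the output mirrors the rule shape used in the Bicep above. Metadata values are converted to strings to match the quoted values in that example.

```python
from typing import Any, Dict, Optional


def keda_trigger_to_aca_rule(name: str, trigger: Dict[str, Any],
                             identity: Optional[str] = None) -> Dict[str, Any]:
    """Copy a KEDA trigger's type and metadata into an ACA custom scale rule."""
    custom: Dict[str, Any] = {
        "type": trigger["type"],
        # Keep every metadata key as-is; only the values are stringified.
        "metadata": {k: str(v) for k, v in trigger.get("metadata", {}).items()},
    }
    if identity is not None:
        custom["identity"] = identity  # 'system' or a user-assigned identity resource ID
    return {"name": name, "custom": custom}


if __name__ == "__main__":
    trigger = {
        "type": "azure-eventhub",
        "metadata": {"consumerGroup": "$Default", "unprocessedEventThreshold": 64},
    }
    print(keda_trigger_to_aca_rule("eventhub-rule", trigger, identity="system"))
```

Authentication is deliberately left out of the mapping beyond the identity field; a trigger that relies on a KEDA TriggerAuthentication needs its credentials moved to ACA secrets or a managed identity separately.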

Examples of supported event sources: Azure Service Bus / Event Hubs / Storage Queue / Apache Kafka / Redis and many more.

Important: Set the properties.configuration.activeRevisionsMode property of the container app to single when using non-HTTP event scale rules.

Authentication: managed identity over secrets

A KEDA scaler needs authentication credentials. ACA supports two methods, but preferring managed identity is the official recommendation.

Where possible, use managed identity authentication to avoid storing secrets within the app. (— scale-app)

Managed identity (recommended)

Just specify the identity property on the scale rule. You store the connection string nowhere.

rules: [
  {
    name: 'azure-queue'
    custom: {
      type: 'azure-queue'
      metadata: {
        accountName: 'mystorage'
        queueName: 'jobs'
        queueLength: '1'
      }
      identity: 'system'   // or the resource ID of a user-assigned identity
    }
  }
]

Then grant that managed identity a data-plane role on the target resource (e.g., Azure Service Bus Data Receiver, Storage Queue Data Reader), and you're done. There are no connection strings to manage or rotate, which removes that leakage surface.

Secrets (when unavoidable)

Place the connection string in the app's secrets and reference it in the scale rule's auth array. Combined with a Key Vault reference, you can avoid hardcoding the value (details in secret management).

Scaling jobs: ScaledJob

Event-driven jobs use the same KEDA scaler, but the usage differs.

In an app, each replica continuously processes events, and a scaling rule determines the number of replicas to run to meet demand. In event-driven jobs, each job execution typically processes a single event, and a scaling rule determines the number of job executions to run. (— Jobs)

App: each replica continuously processes events. The scale rule decides the number of replicas.
Job: one execution processes one event and finishes. The scale rule decides the number of executions.

The rule of thumb: choose a job if each event needs a fresh, dedicated instance or if processing takes a long time; choose an app if the process should stay resident and keep consuming events.

Scaling pitfalls and known limitations

The known limitations the official docs list (scale-app):

No vertical scaling (count only).
Replica quantities are a target amount, not a guarantee.
When you use Dapr actors to manage state, scale-to-zero is unsupported (Dapr's virtual actors have an in-memory representation that isn't tied to their identity or lifetime).
In multiple-revision mode, adding a new scale trigger creates a new revision. The old revision remains with the old rules.

Diagnosing when scaling doesn't work

If the app doesn't seem to scale, check the system logs for "Error fetching scaler metrics". This message indicates that the scaler can't reach its signal source (a database, Event Hubs, another app). Check the VNet, DNS, firewall, and permissions (troubleshooting guide).

Production design: scaling × idempotency

The key to making autoscaling production-quality is to design scaling and correctness separately.

Scaling belongs to the platform: KEDA decides the count. You only measure and set the per-replica target, that is, messageCount or HTTP concurrency.
Correctness belongs to the code's structure: make workers idempotent so nothing breaks when a message is redelivered, whether because of a scale-in (the worker must exit within 30 seconds of SIGTERM) or a retry. Processing the same message twice must not double-charge; absorb duplicates with an idempotency key.

Enforce this division thoroughly and, even with scale-to-zero, neither dropped messages nor double executions occur. This is the core of the design that achieved zero double charges in production on the payment platform.
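The idempotency-key idea can be sketched in Python as follows. This is a minimal illustration, not the payment platform's implementation: the class and method names are hypothetical, and the in-memory dict stands in for a durable store. In production the key would need to be recorded atomically in a database (for example, behind a unique constraint) so that it survives replica restarts and concurrent deliveries.

```python
from typing import Any, Callable, Dict


class IdempotentProcessor:
    """Runs the handler at most once per idempotency key and replays the stored result."""

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self._handler = handler
        # Stand-in for a durable store keyed by idempotency key (assumption of this sketch).
        self._results: Dict[str, Any] = {}

    def process(self, idempotency_key: str, payload: Any) -> Any:
        if idempotency_key in self._results:
            # Redelivery or retry: return the earlier result without re-running the handler.
            return self._results[idempotency_key]
        result = self._handler(payload)
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    charges = []

    def charge(amount: int) -> str:
        charges.append(amount)
        return "charged %d" % amount

    processor = IdempotentProcessor(charge)
    processor.process("order-1", 500)
    processor.process("order-1", 500)  # same message redelivered after a scale-in
    print(charges)                     # the charge ran only once
```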

Conclusion

ACA's autoscaling is driven by KEDA and scales horizontally on HTTP, TCP, CPU/memory, and many event sources. Four points are key to making it work in production:

Avoid the two scale-to-zero traps (CPU/memory can't be 0, the self-destruct of an ingress-disabled worker).
Work back capacity with the algorithm (messageCount is the SLA-vs-cost adjustment knob).
Prefer managed identity for authentication and erase the connection string from the app.
Scaling × idempotency — scaling is KEDA, correctness is the code's structure.

For help with anything from autoscaling design and tuning to making queue-driven workers idempotent, get in touch via the contact page. For production operation overall, see the Azure Container Apps production-operations guide.

Related articles

Azure Container Apps Production Operations Guide: Designing, Scaling, Deploying, Costing, and Securing Serverless Containers, with Real Code

Azure Container Apps CI/CD guide: deploy safely and automatically with GitHub Actions, OIDC keyless, Bicep, and Blue/Green revisions

Azure Container Apps Jobs implementation guide: production design for batch, schedule (cron), and event-driven

Azure Container Apps network-design guide: VNet integration, internal environment, Private Endpoint, WAF, and egress lockdown

Also worth reading

Cloud Run concurrency, autoscaling, billing model, and cost optimization: conquering scale-to-zero and cold starts in real code

Vercel cost-optimization guide: understand the Active CPU pricing model and lower your bill

DynamoDB Capacity, Cost, and Performance Design Complete Guide (2026 Edition): On-Demand vs. Provisioned, Auto Scaling, Avoiding Hot Partitions, Cost Optimization