友田 陽大
Azure Container Apps in production
Azure
Container Apps
Jobs
Batch
KEDA
Reliability
Idempotency
Architecture design

Azure Container Apps Jobs implementation guide: production design for batch, schedule (cron), and event-driven

An implementation guide to designing Azure Container Apps Jobs at production quality. Using az CLI/ARM examples that stay faithful to the official Microsoft Learn docs, it covers the three trigger types (Manual, Schedule, Event), cron expressions (UTC), the replicaTimeout, retry, parallelism, and completion-count settings, KEDA event-driven jobs, self-hosted CI runners, idempotent design, and monitoring of execution history.

Published
Reading time
8 min read
Author
友田 陽大

"I want to generate a report overnight," "I want to handle processing piled up in a queue," "I want to run my own CI Runner" — processing that runs and finishes rather than a resident service is Azure Container Apps Jobs' domain. Though handled on the same platform as resident apps, a design mistake causes double execution, dropped work, and infinite retries.

This article explains the production design of Container Apps Jobs, faithful to the Microsoft Learn Jobs documentation. I've run SQS-driven idempotent batches in production on AWS. The principle "make retries the normal case = build idempotently" is the same on Azure. For ACA as a whole, see the Azure Container Apps production-operations guide.


What a job is: a task that runs and finishes

Azure Container Apps jobs enable you to run containerized tasks that run for a finite duration and then stop. (— Jobs in Azure Container Apps)

Apps (resident services) and jobs run in the same environment and share network and logs. The difference is "whether it finishes."

| | App | Job |
| --- | --- | --- |
| Nature | Runs continuously | Runs for a finite duration and stops |
| On failure | Auto-restarts if the container drops | A non-zero exit is a failure. Retry configurable |
| Example | HTTP API, web app, resident worker | Nightly batch, data migration, processing one queue item |

App vs job (official examples)

| What you want to do | Choose |
| --- | --- |
| An HTTP server returning web content/API | App (HTTP scale rule) |
| Generate a report every night | Job (Schedule trigger + cron) |
| Continuously process a Service Bus queue | App (custom scale rule) |
| Process one item / a small batch from a queue and stop | Job (Event trigger) |
| Background processing that starts on demand and finishes | Job (Manual trigger) |
| A self-hosted CI Runner | Job (Event trigger) |

The three triggers

A job's trigger type determines how the job is started. (— Jobs)

  • Manual: on demand (CLI, portal, ARM API).
  • Schedule: periodic with a cron expression.
  • Event: triggered by an event via a KEDA scaler.

Manual: on-demand execution

Suited to one-off processing such as data migration. Create the job once and start it when needed.

az containerapp job create \
  --name migrate-job --resource-group my-rg --environment my-env \
  --trigger-type "Manual" \
  --replica-timeout 1800 --replica-retry-limit 0 \
  --replica-completion-count 1 --parallelism 1 \
  --image myregistry.azurecr.io/migrate:2026-06-26-a1b2c3d \
  --cpu "0.5" --memory "1.0Gi"

# Start the job (settings can also be overridden at start time)
az containerapp job start --name migrate-job --resource-group my-rg

You can override settings at start time (for example, to run the same job on different input by changing env vars or the start command). Note, however, that when you override a configuration, the job's entire template configuration is replaced, so include every setting the execution needs.

Schedule: cron (UTC)

For example, generating a report every day at midnight.

az containerapp job create \
  --name nightly-report --resource-group my-rg --environment my-env \
  --trigger-type "Schedule" --cron-expression "0 0 * * *" \
  --replica-timeout 1800 --replica-retry-limit 1 \
  --replica-completion-count 1 --parallelism 1 \
  --image myregistry.azurecr.io/report:2026-06-26-a1b2c3d \
  --cpu "0.5" --memory "1.0Gi"

Cron-expression examples (standard 5 fields):

| Expression | Meaning |
| --- | --- |
| */5 * * * * | Every 5 minutes |
| 0 */2 * * * | Every 2 hours |
| 0 0 * * * | Daily at midnight |
| 0 0 * * 0 | Every Sunday at midnight |
| 0 0 1 * * | The 1st of every month at midnight |

Important: Cron expressions in scheduled jobs are evaluated in Coordinated Universal Time (UTC). For "every day at 2 a.m. JST," write 0 17 * * * (17:00 UTC on the previous day). Timezone slips are a classic source of incidents, so take care.
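The JST-to-UTC conversion is easy to get wrong by hand, so it can help to compute it. A minimal Python sketch, assuming a fixed-offset timezone such as JST (UTC+9) and a once-a-day schedule; the helper name daily_cron_in_utc is hypothetical and not part of any Azure tooling:

```python
from datetime import datetime, timedelta, timezone

# JST is a fixed UTC+9 offset with no daylight saving time.
JST = timezone(timedelta(hours=9))


def daily_cron_in_utc(local_hour: int, local_minute: int = 0,
                      tz: timezone = JST) -> str:
    """Return a 5-field UTC cron expression for a daily run at a local time."""
    # For a fixed-offset zone any calendar date gives the same answer,
    # so an arbitrary date is used only to do the conversion.
    local = datetime(2026, 1, 15, local_hour, local_minute, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(daily_cron_in_utc(2))      # 02:00 JST
print(daily_cron_in_utc(9, 30))  # 09:30 JST
```

This sketch only covers daily schedules. A schedule pinned to a day of week or day of month also needs the day shifted when the conversion crosses midnight, and zones with daylight saving time cannot be captured by one UTC expression year-round.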

Event: KEDA event-driven

The job starts when a message arrives in a queue. The basic model is one execution per event.

az containerapp job create \
  --name queue-job --resource-group my-rg --environment my-env \
  --trigger-type "Event" \
  --replica-timeout 1800 \
  --image myregistry.azurecr.io/queue-job:2026-06-26-a1b2c3d \
  --cpu "0.5" --memory "1.0Gi" \
  --min-executions 0 --max-executions 10 \
  --scale-rule-name "queue" --scale-rule-type "azure-queue" \
  --scale-rule-metadata "accountName=mystorage" "queueName=myqueue" "queueLength=1" \
  --scale-rule-auth "connection=connection-string-secret" \
  --secrets "connection-string-secret=<QUEUE_CONNECTION_STRING>"

The difference in how apps and jobs use KEDA is detailed in the scaling guide, but the gist is this: for apps a scale rule decides the replica count, and for jobs it decides the execution count. If each event needs a new instance with dedicated resources, or the processing is long-running, a job fits.
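As a rough illustration of "process one item and stop," the job body can be a short-lived program whose exit code tells the platform whether the execution succeeded (a non-zero exit counts as a failure). The sketch below is an assumption about how such an entry point might look; receive and handle are hypothetical callables standing in for a real queue client and the business logic:

```python
import sys
from typing import Callable, Optional


def run_once(receive: Callable[[], Optional[str]],
             handle: Callable[[str], None]) -> int:
    """Process at most one message and return a process exit code."""
    message = receive()
    if message is None:
        # Nothing to do: finishing cleanly still counts as success.
        return 0
    try:
        handle(message)
    except Exception as exc:
        # A non-zero exit marks the replica as failed, so the retry policy applies.
        print(f"processing failed: {exc}", file=sys.stderr)
        return 1
    return 0


# In-memory stand-ins for a real queue and handler (illustrative only).
pending = ["order-42"]
handled = []
exit_code = run_once(lambda: pending.pop() if pending else None, handled.append)
print(exit_code, handled)
```

In a real container the entry point would end with sys.exit(run_once(...)) so that the exit code actually reaches the platform.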


Job settings: the four key properties

| Setting | Property | Meaning |
| --- | --- | --- |
| Max wait for completion | replicaTimeout | Max seconds to wait for replica completion. Cut off if exceeded |
| Retry limit | replicaRetryLimit | Number of retries for a failed replica. 0 = no retry |
| Parallelism | parallelism | Replicas per execution (often 1) |
| Completion count | replicaCompletionCount | Completed replicas needed to count as success (≤ parallelism) |

The replicaTimeout setting takes precedence if it expires before all retries occur. Even with "3 retries," if the timeout comes first, the execution is cut off. Set a timeout sufficiently longer than the expected processing time.

Parallel batch (split processing)

To split a large amount of data and process it in parallel, raise parallelism and replicaCompletionCount.

az containerapp job create \
  --name batch-job --resource-group my-rg --environment my-env \
  --trigger-type "Schedule" --cron-expression "0 0 * * *" \
  --replica-timeout 1800 --replica-retry-limit 3 \
  --parallelism 5 --replica-completion-count 5 \
  --image myregistry.azurecr.io/batch:2026-06-26-a1b2c3d \
  --cpu "0.5" --memory "1.0Gi"

Each replica processes its assigned range (allocated via env vars or a queue), and the execution succeeds when all 5 succeed. If large-scale parallelism is needed, you can use it like an AWS Batch equivalent.
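One simple way to give each replica its range is to split the item count into near-equal contiguous slices. A minimal sketch, assuming each replica has some way to learn its own zero-based index (for example, by claiming one from a queue); shard_range and that index are illustrative assumptions, not something Container Apps provides:

```python
def shard_range(total_items: int, shard_count: int, shard_index: int) -> range:
    """Return the contiguous slice of item positions owned by one shard."""
    if shard_count <= 0 or not 0 <= shard_index < shard_count:
        raise ValueError("shard_index must satisfy 0 <= shard_index < shard_count")
    # Every shard gets `base` items; the first `extra` shards get one more.
    base, extra = divmod(total_items, shard_count)
    start = shard_index * base + min(shard_index, extra)
    size = base + (1 if shard_index < extra else 0)
    return range(start, start + size)


# 12 items across 5 replicas (matching --parallelism 5).
for index in range(5):
    owned = shard_range(12, 5, index)
    print(index, list(owned))
```

Because the slices are disjoint and cover every position exactly once, a retried replica reprocesses only its own slice, which keeps the blast radius of a retry small.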


Idempotency: make retries the normal case

A job presupposes retries. A failed replica is retried up to replicaRetryLimit, and with event-driven jobs the same message can be redelivered. So:

⚠️ Build the job body to be idempotent. Even if the same input (message, date, ID) is processed twice, the result must be the same as processing it once.

If payments or billing are involved, record "whether processed" with an idempotency key (message ID, order ID) and skip the second time. This is the core of the design that achieved 0 double charges in production in a payments platform, and the design of idempotent async processing applies as-is. "Doesn't break on retry" = "you can treat retries as the normal case" = you can confidently enable auto-retry in production.
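The "record whether processed, skip the second time" pattern can be sketched with a table keyed by the idempotency key. This example uses SQLite from the Python standard library purely as a stand-in for whatever store the job really uses; the table name, open_store, process_once, and the handler are all hypothetical:

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # The primary key makes a second insert of the same key fail.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (idempotency_key TEXT PRIMARY KEY)"
    )
    return conn


def process_once(conn: sqlite3.Connection, key: str,
                 handler: Callable[[], None]) -> bool:
    """Run handler at most once per key. True if it ran, False if skipped."""
    with conn:  # commits on success, rolls back if an exception escapes
        try:
            conn.execute(
                "INSERT INTO processed (idempotency_key) VALUES (?)", (key,)
            )
        except sqlite3.IntegrityError:
            return False  # key already recorded: duplicate delivery or retry
        handler()  # if this raises, the key insert is rolled back too
    return True


conn = open_store()
charges = []
print(process_once(conn, "order-1001", lambda: charges.append("order-1001")))
print(process_once(conn, "order-1001", lambda: charges.append("order-1001")))
print(charges)
```

The key insert and any database writes the handler makes on the same connection commit together, so a failure partway through leaves no marker and the retry runs again. Side effects outside that transaction (a call to an external payment API, for instance) are not rolled back and would need their own idempotency key on the receiving side.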


CI Runner: the staple of event-driven jobs

A powerful use of event-driven jobs is a self-hosted CI Runner.

A self-hosted GitHub Actions runner or Azure Pipelines agent that runs when a new job is queued in a workflow or pipeline. (— Jobs)

When a job is queued in a workflow, KEDA detects it and starts one execution of the runner container, which disappears when it finishes. Because it scales only when needed and keeps no resident runner, this configuration is strong on both cost efficiency and security (each runner is disposable).


Job constraints and monitoring

Constraints: no Ingress, no Dapr

The following features aren't supported: Dapr; Ingress and related features such as custom domains and SSL certificates. (— Jobs)

A job has no Ingress (it can't be reached from outside) and can't use Dapr. Put HTTP-receiving processing in an app, and lean toward an app if you need Dapr service invocation. Note that when a job calls another app at startup, sidecar containers (such as the Envoy proxy) are guaranteed to be ready before the main job container begins execution, so you don't need to add connection-failure retries for app-to-app calls at startup.

Monitoring: execution history and logs

# Status of the most recent executions
az containerapp job execution list --name my-job --resource-group my-rg

The execution history for scheduled and event-based jobs is limited to the most recent 100 successful and failed job executions. For auditing or detailed output beyond that, query the environment's Log Analytics (observability design). If long-term retention or alerts are needed, build "notify when failed executions exceed a threshold" with Log Analytics + Azure Monitor alerts.
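For a quick spot check before alerts are wired up, the JSON printed by the execution list command (with --output json) can be filtered for failures. A small sketch, assuming each entry carries its state at properties.status with values such as "Succeeded" and "Failed"; that shape is an assumption here and should be verified against your own CLI output. The function names and sample data are illustrative:

```python
import json
from typing import Any


def failed_executions(executions: list[dict[str, Any]]) -> list[str]:
    """Return the names of executions whose status is "Failed"."""
    return [
        item.get("name", "<unnamed>")
        for item in executions
        if item.get("properties", {}).get("status") == "Failed"
    ]


def should_alert(executions: list[dict[str, Any]], threshold: int) -> bool:
    """True when the number of failed executions exceeds the threshold."""
    return len(failed_executions(executions)) > threshold


# Sample data shaped like the assumed CLI output (names are made up).
sample = json.loads("""
[
  {"name": "my-job-aaa", "properties": {"status": "Succeeded"}},
  {"name": "my-job-bbb", "properties": {"status": "Failed"}},
  {"name": "my-job-ccc", "properties": {"status": "Failed"}}
]
""")
print(failed_executions(sample))
print(should_alert(sample, threshold=1))
```

This only sees what the CLI returns, so it inherits the 100-execution limit. It complements Log Analytics queries and Azure Monitor alerts for long-term monitoring rather than replacing them.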


Design checklist

  • Correctly split "runs and finishes" processing into a Job and "resident" into an App.
  • Choose the trigger: one-off = Manual, periodic = Schedule (cron is UTC!), event = Event.
  • Build idempotently (no double execution on retry/redelivery). Payments require an idempotency key.
  • replicaTimeout sufficiently longer than the expected processing time. replicaRetryLimit per your retry policy (0 disables it).
  • Make the image tag unique per commit SHA (no latest).
  • Design on the premise that Ingress/Dapr can't be used. HTTP receiving and Dapr go to apps.
  • Execution history is limited to the most recent 100. Handle auditing/alerts with Log Analytics + Azure Monitor.

Summary

Container Apps Jobs is a feature that handles finite-duration tasks on the same platform as resident apps. There are three triggers: Manual, Schedule (cron, UTC), and Event (KEDA). The keys to production quality are correct trigger selection, UTC-aware cron, and above all idempotent design (making retries the normal case). You can safely run batch, periodic processing, queue-driven work, and CI runners with the same vocabulary.

For designing and making idempotent batch, periodic, and event-driven jobs, contact me. For production operations as a whole, see the Azure Container Apps production-operations guide.


友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.
