Azure Container Apps Jobs implementation guide: production design for batch, scheduled (cron), and event-driven jobs

"I want to generate a report overnight," "I want to work through messages piled up in a queue," "I want to run my own CI Runner": work that runs and then finishes, rather than a resident service, is the domain of Azure Container Apps Jobs. Jobs run on the same platform as resident apps, but a design mistake leads to double execution, dropped work, or infinite retries.

This article explains the production design of Container Apps Jobs, faithful to the Microsoft Learn Jobs documentation. I've run SQS-driven idempotent batches in production on AWS. The principle "make retries the normal case = build idempotently" is the same on Azure. For ACA as a whole, see the Azure Container Apps production-operations guide.

What a job is: a task that runs and finishes

Azure Container Apps jobs enable you to run containerized tasks that run for a finite duration and then stop. (— Jobs in Azure Container Apps)

Apps (resident services) and jobs run in the same environment and share network and logs. The difference is "whether it finishes."

	App	Job
Nature	Runs continuously	Runs for a finite duration and stops
On failure	Auto-restarts if the container drops	A non-zero exit is a failure. Retry configurable
Example	HTTP API, web app, resident worker	Nightly batch, data migration, processing one queue item
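The "non-zero exit is a failure" row can be made concrete with a small entry-point sketch. It is only an illustration under stated assumptions: run_job and generate_report are hypothetical names, not part of any Azure SDK, and the real work would replace the placeholder body.

```python
import sys
from typing import Callable


def run_job(task: Callable[[], None]) -> int:
    """Run a finite task once and translate the outcome into a process exit code."""
    try:
        task()
    except Exception as exc:  # any unhandled error should mark this replica as failed
        print(f"job failed: {exc}", file=sys.stderr)
        return 1  # non-zero exit: the platform counts the replica as failed
    return 0  # zero exit: the replica completed successfully


def generate_report() -> None:
    # Hypothetical placeholder for the real work (report, migration, one queue item).
    print("report generated")


# In the container entry point you would end with:
#     sys.exit(run_job(generate_report))
```

Keeping the exit code as the single success signal means the retry setting on the job, not the code, decides what happens after a failure.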

App vs job (official examples)

What you want to do	Choose
An HTTP server returning web content/API	App (HTTP scale rule)
Generate a report every night	Job (Schedule trigger + cron)
Continuously process a Service Bus queue	App (custom scale rule)
Process one item / a small batch from a queue and stop	Job (Event trigger)
Background processing that starts on demand and finishes	Job (Manual trigger)
A self-hosted CI Runner	Job (Event trigger)

The three triggers

A job's trigger type determines how the job is started. (— Jobs)

Manual: on demand (CLI, portal, ARM API).
Schedule: periodic with a cron expression.
Event: triggered by an event via a KEDA scaler.

Manual: on-demand execution

Use this for one-off processing such as a data migration. Create the job once and start it whenever needed.

az containerapp job create \
  --name migrate-job --resource-group my-rg --environment my-env \
  --trigger-type "Manual" \
  --replica-timeout 1800 --replica-retry-limit 0 \
  --replica-completion-count 1 --parallelism 1 \
  --image myregistry.azurecr.io/migrate:2026-06-26-a1b2c3d \
  --cpu "0.5" --memory "1.0Gi"

# Start the job (settings can also be overridden at start time)
az containerapp job start --name migrate-job --resource-group my-rg

You can override settings at start time, for example to run the same job on different input by changing env vars or the start command. Note the documentation's caveat: "When you override a configuration, the job's entire template configuration is replaced." Include every setting the execution needs, not only the ones you want to change.

Schedule: cron (UTC)

Use this for recurring work such as generating a report every midnight.

az containerapp job create \
  --name nightly-report --resource-group my-rg --environment my-env \
  --trigger-type "Schedule" --cron-expression "0 0 * * *" \
  --replica-timeout 1800 --replica-retry-limit 1 \
  --replica-completion-count 1 --parallelism 1 \
  --image myregistry.azurecr.io/report:2026-06-26-a1b2c3d \
  --cpu "0.5" --memory "1.0Gi"

Cron-expression examples (standard 5 fields):

Expression	Meaning
`*/5 * * * *`	Every 5 minutes
`0 */2 * * *`	Every 2 hours
`0 0 * * *`	Daily at midnight
`0 0 * * 0`	Every Sunday at midnight
`0 0 1 * *`	The 1st of every month at midnight

Important: "Cron expressions in scheduled jobs are evaluated in Coordinated Universal Time (UTC)." For "every day at 2am (JST)," write 0 17 * * *, because 02:00 JST is 17:00 UTC on the previous day. Timezone slippage is a classic cause of incidents, so double-check the conversion.
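The conversion can be checked in code rather than by hand. The following is a minimal sketch, assuming a daily schedule and a fixed UTC offset (which holds for JST, but not for zones with daylight saving time); daily_cron_in_utc is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def daily_cron_in_utc(local_hour: int, local_minute: int, utc_offset_hours: float) -> str:
    """Return a 5-field UTC cron expression for a job that runs daily at a local time."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters for a daily schedule.
    local_time = datetime(2000, 1, 2, local_hour, local_minute, tzinfo=local_tz)
    utc_time = local_time.astimezone(timezone.utc)
    return f"{utc_time.minute} {utc_time.hour} * * *"


print(daily_cron_in_utc(2, 0, 9))  # 02:00 JST (UTC+9) -> 0 17 * * *
```

Weekly or monthly schedules need more care than this sketch provides, because the day-of-week or day-of-month field can also shift when the UTC time falls on the previous day.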

Event: KEDA event-driven

The job starts when a message arrives in a queue. The basic model is one execution per event.

az containerapp job create \
  --name queue-job --resource-group my-rg --environment my-env \
  --trigger-type "Event" \
  --replica-timeout 1800 \
  --image myregistry.azurecr.io/queue-job:2026-06-26-a1b2c3d \
  --cpu "0.5" --memory "1.0Gi" \
  --min-executions 0 --max-executions 10 \
  --scale-rule-name "queue" --scale-rule-type "azure-queue" \
  --scale-rule-metadata "accountName=mystorage" "queueName=myqueue" "queueLength=1" \
  --scale-rule-auth "connection=connection-string-secret" \
  --secrets "connection-string-secret=<QUEUE_CONNECTION_STRING>"

The scaling guide covers the differences in how apps and jobs use KEDA in detail, but the gist is this: a scale rule decides the replica count for an app and the execution count for a job. If each event needs a fresh instance with dedicated resources, or the processing is long-running, a job fits.

Job settings: the four key properties

Setting	Property	Meaning
Max wait for completion	`replicaTimeout`	Max seconds to wait for replica completion. Cut off if exceeded
Retry limit	`replicaRetryLimit`	Number of retries for a failed replica. `0` = no retry
Parallelism	`parallelism`	Replicas per execution (often `1`)
Completion count	`replicaCompletionCount`	Completed replicas needed to count as success (≤ parallelism)

The documentation notes: "The replicaTimeout setting takes precedence if it expires before all retries occur." In other words, the timeout wins over retries: even with 3 retries configured, the replica is cut off if the timeout is reached first. Set the timeout sufficiently longer than the expected processing time.

Parallel batch (split processing)

To split a large amount of data and process it in parallel, raise parallelism and replicaCompletionCount.

az containerapp job create \
  --name batch-job --resource-group my-rg --environment my-env \
  --trigger-type "Schedule" --cron-expression "0 0 * * *" \
  --replica-timeout 1800 --replica-retry-limit 3 \
  --parallelism 5 --replica-completion-count 5 \
  --image myregistry.azurecr.io/batch:2026-06-26-a1b2c3d \
  --cpu "0.5" --memory "1.0Gi"

Each replica processes its assigned range (allocated via env vars or a queue), and the execution succeeds when all 5 replicas succeed. When you need large-scale parallelism, this lets jobs fill roughly the role that AWS Batch plays on AWS.
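One way to define "its assigned range" is a pure function from a shard index to a slice of the input. This is a sketch only: shard_range is a hypothetical helper, and how each replica learns its own index (env var, queue message, or a claim table) is deliberately left outside the example.

```python
def shard_range(total_items: int, shard_count: int, shard_index: int) -> range:
    """Return the contiguous slice of [0, total_items) owned by one replica."""
    if not 0 <= shard_index < shard_count:
        raise ValueError("shard_index must be in [0, shard_count)")
    base, extra = divmod(total_items, shard_count)
    # The first `extra` shards each take one additional item.
    start = shard_index * base + min(shard_index, extra)
    size = base + (1 if shard_index < extra else 0)
    return range(start, start + size)


# 103 items over 5 replicas: shard sizes are 21, 21, 21, 20, 20.
for i in range(5):
    r = shard_range(103, 5, i)
    print(i, r.start, r.stop)
```

Because the ranges are disjoint and cover every item, a retried replica reprocesses only its own slice, which keeps the idempotency requirement local to each shard.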

Idempotency: make retries the normal case

A job presupposes retries. A failed replica is retried according to replicaRetryLimit, and with event-driven jobs the same message can be redelivered. Therefore:

⚠️ Build the job body to be idempotent. Even if the same input (message, date, ID) is processed twice, the outcome must be the same as processing it once.

If payments or billing are involved, record "already processed" under an idempotency key (message ID, order ID) and skip the second attempt. This was the core of the design that achieved zero double charges in production on a payments platform, and the design of idempotent async processing applies here unchanged. "Doesn't break on retry" means you can treat retries as the normal case, which in turn means you can confidently enable auto-retry in production.
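The "record the key, skip the second time" idea can be sketched with a unique constraint. This is an illustration, not the production design referred to above: SQLite stands in for whatever durable store you use, and the table names, open_store, and process_once are all hypothetical. It also assumes the side effect (the charge row) lives in the same database as the key, so both commit in one transaction.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) tables used by this sketch."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (idempotency_key TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS charges (order_id TEXT, amount INTEGER)")
    conn.commit()
    return conn


def process_once(conn: sqlite3.Connection, key: str, order_id: str, amount: int) -> bool:
    """Apply the charge only if this idempotency key has not been seen before."""
    try:
        with conn:  # one transaction: commit on success, roll back on any exception
            # The PRIMARY KEY rejects a second insert of the same key.
            conn.execute("INSERT INTO processed (idempotency_key) VALUES (?)", (key,))
            conn.execute(
                "INSERT INTO charges (order_id, amount) VALUES (?, ?)", (order_id, amount)
            )
    except sqlite3.IntegrityError:
        return False  # duplicate delivery or retry: skip without side effects
    return True


conn = open_store()
print(process_once(conn, "msg-1", "order-1", 500))  # first delivery: True
print(process_once(conn, "msg-1", "order-1", 500))  # redelivery: False
```

When the side effect is a call to an external payment API rather than a row in the same database, this atomicity no longer holds, and the key would typically be passed to that API as well.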

CI Runner: the staple of event-driven jobs

A powerful use of event-driven jobs is a self-hosted CI Runner.

A self-hosted GitHub Actions runner or Azure Pipelines agent that runs when a new job is queued in a workflow or pipeline. (— Jobs)

When a job is queued in a workflow, KEDA detects it and starts one execution of the Runner container, which disappears when the work is done. Because you scale only when needed instead of keeping a resident Runner, the configuration is strong on both cost efficiency and security (every Runner is disposable).

Job constraints and monitoring

Constraints: no Ingress, no Dapr

The following features aren't supported: Dapr; Ingress and related features such as custom domains and SSL certificates. (— Jobs)

A job has no Ingress, so it can't be called from outside, and it can't use Dapr. Put HTTP-receiving processing in an app, and lean toward an app if you need Dapr service invocation. For the case where a job calls another app at startup, the documentation states that sidecar containers (such as the Envoy proxy) are guaranteed to be ready before the main job container begins execution, so you don't need connection-failure retries just to cover the proxy not being ready yet.

Monitoring: execution history and logs

# Status of recent executions
az containerapp job execution list --name my-job --resource-group my-rg

According to the documentation, "The execution history for scheduled and event-based jobs is limited to the most recent 100 successful and failed job executions." For auditing or detailed output beyond that, query the environment's Log Analytics workspace (see observability design). If you need long-term retention or alerting, build a "notify when failed executions exceed a threshold" rule with Log Analytics and Azure Monitor alerts.

Design checklist

Split the work correctly: "runs and finishes" goes to a Job, "resident" goes to an App.
Choose the trigger: one-off = Manual, periodic = Schedule (cron is UTC!), event = Event.
Build idempotently so that a retry or redelivery never produces a second effect. Payments require an idempotency key.
Set replicaTimeout sufficiently longer than the expected processing time, and replicaRetryLimit according to your retry policy (0 disables retries).
Give every image a unique tag that includes the commit SHA (never latest).
Design on the premise that Ingress and Dapr are unavailable. Route HTTP receiving and Dapr use to apps.
Execution history keeps only the most recent 100 executions. Use Log Analytics + Azure Monitor for auditing and alerts.

Summary

Container Apps Jobs handles finite-duration tasks on the same platform as resident apps. There are three triggers: Manual, Schedule (cron, UTC), and Event (KEDA). The keys to production quality are correct trigger selection, UTC-aware cron, and above all idempotent design (making retries the normal case). With those in place, you can safely run batch, periodic, queue-driven, and CI Runner workloads with the same vocabulary.

If you need help designing batch, periodic, or event-driven jobs and making them idempotent, contact me. For production operations as a whole, see the Azure Container Apps production-operations guide.

Related articles

Azure Container Apps Production Operations Guide: Designing, Scaling, Deploying, Costing, and Securing Serverless Containers, with Real Code

Azure Container Apps CI/CD guide: deploy safely and automatically with GitHub Actions, OIDC keyless, Bicep, and Blue/Green revisions

The complete Azure Container Apps autoscaling guide: scale-to-zero and event-driven with KEDA (HTTP, queue, CPU)

Azure Container Apps network-design guide: VNet integration, internal environment, Private Endpoint, WAF, and egress lockdown

Also worth reading

Cloud Run Jobs and Cloud Workflows: designing long-running batch and parallel processing to be idempotent and resumable

AWS Lambda production-operation guide: firm up the execution model, idempotency, observability, security, and cost with the official spec

Making marshmallow Production-Quality: Performance Optimization, Testing, and Error Design