"Process a video," "convert tens of thousands in a batch," "bulk-classify with an LLM" — holding such processing in synchronous HTTP always breaks down. It exceeds the request timeout (Cloud Run is max 60 minutes), processing can't be stopped even when the client disconnects, and retries cause multiple execution. The right answer is to "separate reception (Service) and execution (Job/Workflow)."
On the broadcaster platform, I built a telop-typo-detection pipeline — heavy processing that extracts subtitles from video and matches OCR (image) and speech recognition (utterance) to detect typos — in exactly this form. The HTTP API just starts the job and returns a reception ID, and the heavy processing runs in parallel with Cloud Run Jobs + Cloud Workflows. Parallelizing OCR and speech recognition achieved sequential 18 min → parallel 13 min (about 30% shorter), and I designed the long-running job to be idempotent and resumable by numbering segment IDs deterministically.
This article reproduces that design faithfully to the official documentation. For the production operation of Services themselves, see the Cloud Run production-operations guide.
First, classify: Services / Jobs / Workflows / Worker Pools
Just putting the processing in the right box decides 80% of the design.
| Box | Nature | Where to use |
|---|---|---|
| Services | Handle synchronous HTTP | API, web app, webhook |
| Jobs | Run and stop when finished | Batch, DB migration, bulk conversion, long-running processing |
| Workflows | Orchestration of multiple steps | Bundle jobs/services/APIs by order, parallel, condition |
| Worker Pools | Resident background | Pub/Sub pull, Kafka consumer |
"Heavy → Job," "bundling multiple → Workflow" — do this primary classification first.
Cloud Run Jobs: handle in parallel by task splitting
A job can split one job into up to 10,000 tasks and run them in parallel. The main flags are —
| Flag | Meaning | Default/limit |
|---|---|---|
--tasks | Number of tasks (how many splits of the work) | Default 1, max 10,000 |
--parallelism | Number of tasks to run simultaneously | Default is as parallel as possible |
--max-retries | Retry count for a failed task | Default 3, max 10 |
--task-timeout | Max execution time of one task | Default 10 min, max 168 hours (7 days) |
Each task knows "which number it is in the whole" via environment variables.
CLOUD_RUN_TASK_INDEX: this task's number (0totask count - 1)CLOUD_RUN_TASK_COUNT: the total task count of the whole jobCLOUD_RUN_TASK_ATTEMPT: this task's attempt count (increases on retry)
A task succeeds with exit code 0. On failure it's auto-retried up to --max-retries, and when a task exhausts its retries, the entire job execution fails.
Sharding: decide the assigned range with INDEX
The typical pattern of "process 10,000 items in parallel with 100 tasks." Each task computes its assigned slice deterministically from INDEX.
import os
task_index = int(os.environ["CLOUD_RUN_TASK_INDEX"]) # 0..count-1
task_count = int(os.environ["CLOUD_RUN_TASK_COUNT"]) # 例: 100
items = load_all_item_ids() # 全件(外部ストアから取得)
# 自分の担当だけを処理(重複なく・漏れなく全体をカバー)
my_items = items[task_index::task_count]
for item_id in my_items:
process_one(item_id) # ← この1件が冪等であることが命(後述)
# 100タスク・最大20並列・各タスク最大1時間でデプロイして実行
gcloud run jobs deploy batch-convert \
--image asia-northeast1-docker.pkg.dev/PROJECT_ID/app/batch:${GIT_SHA} \
--region asia-northeast1 \
--tasks 100 --parallelism 20 \
--max-retries 3 --task-timeout 3600s \
--service-account batch@PROJECT_ID.iam.gserviceaccount.com
gcloud run jobs execute batch-convert --region asia-northeast1 --wait
Idempotency and resumability: the heart of production quality
Design a job on the premise that it can always be retried (task failure, SIGTERM, re-execution). So unless it's "the result doesn't change even if the same processing runs twice (idempotent)," data breaks. This is the same principle I enforced on the payment platform with 0 double charges in production.
Two standard ways to make it idempotent:
- Deterministic ID numbering: globally uniquify the task-local ID using
INDEX. The ID is stable across parallel execution and re-execution.
# セグメント単位に「グローバルID = INDEX × stride + local_id」で決定的に採番。
# 並列処理しても後段マージでIDが衝突せず、再実行でも同じIDに収束する。
STRIDE = 100_000
def global_id(local_id: int) -> int:
return task_index * STRIDE + local_id
- Make writes idempotent: make the output a form that can be "overwritten/ignored if it exists." For object storage, overwrite to a deterministic object name; for a DB, UPSERT + a unique constraint.
# 出力先を決定的なパスにする → 再実行は同じ場所を上書きするだけ(重複生成しない)
output_path = f"results/{global_id(local_id)}.json"
storage_client.bucket("outputs").blob(output_path).upload_from_string(payload)
In my telop-typo detection, I numbered, per segment,
global telop ID = segment_index × stride + local_iddeterministically, so the final result converges uniquely under any of parallel processing, partial re-execution, and retries. Not "carefully controlling retries" but structurally guaranteeing "it doesn't break even when retried" — this is the essence of resilience. For the general theory of idempotent async processing, SQS/Lambda idempotent processing is also a reference.
Triggering: scheduled execution and event-driven
Cloud Scheduler (cron)
Trigger periodic batches from Cloud Scheduler. Since running a job hits the Cloud Run Admin API's jobs:run, use OAuth auth (--oauth-service-account-email) (separate from the OIDC for service calls).
# 毎日午前2時にジョブを起動。SAには roles/run.invoker が必要。
gcloud scheduler jobs create http nightly-batch \
--location asia-northeast1 \
--schedule="0 2 * * *" \
--uri="https://run.googleapis.com/v2/projects/PROJECT_ID/locations/asia-northeast1/jobs/batch-convert:run" \
--http-method POST \
--oauth-service-account-email scheduler@PROJECT_ID.iam.gserviceaccount.com
Eventarc (event-driven)
"Process when a file is uploaded" goes to Eventarc. It receives events like GCS finalization to start a service/job.
# GCSへのアップロード(finalized)を受けてスキャナサービスを起動
gcloud eventarc triggers create scan-on-upload \
--location asia-northeast1 \
--destination-run-service malware-scanner \
--event-filters "type=google.cloud.storage.object.v1.finalized" \
--event-filters "bucket=uploads-raw" \
--service-account eventarc@PROJECT_ID.iam.gserviceaccount.com
My malware scanner received GCS uploads with Eventarc, passed them to ClamAV (Cloud Run), and stream-inspected up to 10 GiB without buffering, sorting into clean/quarantine buckets. It's idempotent to retries via the atomicity of
File.move, and safely ignores zero-length, in-progress, and deleted files. A typical case of event-driven × idempotent.
Cloud Workflows: bundle multiple steps
"After job A, run B in parallel, aggregate the results and call C" — Cloud Workflows handles such orchestration. Declare steps in YAML/JSON, billed per step execution (free when idle). It supports HTTP calls and Google Cloud connectors (including Cloud Run), and can hold state and wait for up to 1 year.
# workflow.yaml — OCRと音声認識を並列実行し、結果を照合する(テロップ誤字検出の骨子)
main:
params: [input]
steps:
- extract_subtitles:
call: http.post
args:
url: ${"https://api-xxxxx.a.run.app/extract"}
auth: { type: OIDC } # 認証付きCloud Runサービスを安全に呼ぶ
body: { video: ${input.video} }
result: frames
# ★ OCRと音声認識は相互依存しない → 並列実行で時間短縮
- analyze_in_parallel:
parallel:
shared: [ocr_result, asr_result]
branches:
- ocr_branch:
steps:
- run_ocr:
call: http.post
args:
url: ${"https://ocr-xxxxx.a.run.app/run"}
auth: { type: OIDC }
body: { frames: ${frames.body} }
result: ocr_result
- asr_branch:
steps:
- run_asr:
call: http.post
args:
url: ${"https://asr-xxxxx.a.run.app/run"}
auth: { type: OIDC }
body: { video: ${input.video} }
result: asr_result
- reconcile:
call: http.post
args:
url: ${"https://api-xxxxx.a.run.app/reconcile"}
auth: { type: OIDC }
body: { ocr: ${ocr_result.body}, asr: ${asr_result.body} }
result: report
- done:
return: ${report.body}
Retries and error handling
For unstable external calls, layer retries with exponential backoff declaratively.
- call_flaky_service:
try:
call: http.post
args:
url: ${"https://flaky-xxxxx.a.run.app/run"}
auth: { type: OIDC }
result: r
retry:
predicate: ${http.default_retry_predicate} # 5xx等で再試行
max_retries: 5
backoff: { initial_delay: 1, max_delay: 60, multiplier: 2 }
except:
as: e
steps:
- handle:
call: sys.log
args: { text: ${"failed: " + e.message}, severity: ERROR }
Make it fast with parallel (parallel) and hard to break with try/retry/except — these are Workflows' two big values.
When Jobs / when Workflows / when Worker Pools
- Handle a single heavy process in parallel → Jobs (shard with
--tasks). - Bundle multiple processes by order, parallel, condition / wait a long time → Workflows (orchestrator).
- Continuously pull a queue → Worker Pools (Pub/Sub pull, Kafka).
- Many production systems are a combination (my pipeline is "Workflows orchestrates the whole, each heavy process is a Job/Service, the entrance is Eventarc").
Production-rollout checklist
- Separated processing unsuited to HTTP from Services into Jobs/Workflows
- Tasks split the assigned range deterministically with
CLOUD_RUN_TASK_INDEX - Each task's unit processing is idempotent (deterministic ID + UPSERT/overwrite)
- Designed on the premise of retries (
--max-retries/ Workflows'retry) - Set
--task-timeoutto match the processing (max 7 days) - Trigger with Cloud Scheduler (cron, OAuth) or Eventarc (event)
- Multiple steps with Workflows (parallel with
parallel, recover withtry/except) - Progress/results are observable (structured logs, metrics, a progress store if needed)
Conclusion: heavy processing is "separate, make idempotent, and bundle"
Cloud Run production design starts from "putting processing unsuited to HTTP in the right box." Separate heavy processing into Jobs, handle it in parallel by task splitting, make it idempotent and resumable with deterministic IDs, and bundle order, parallelism, and resilience with Workflows. With this, video processing and large-scale batch alike run without stopping, without breaking, and fast.
I could build "completion, resumption, idempotency" into a broadcaster's production workflow because of this decomposition. For the overall design, the Cloud Run production-operations guide; for CI/CD, the Cloud Run CI/CD guide; for cost, the concurrency/billing guide. If you're troubled by the design of a batch/async platform, I accompany you through implementation.