Google Cloud Run Production-Operations Guide: Container Contract, Concurrency, Auto-Scale, Deploy, Cost, and Security in Real Code

"I want to run containers in production, but I can't spare the time for Kubernetes node management or patching." If you are building a container foundation on GCP for a startup or a small team, you almost always end up here. The answer is Google Cloud Run.

I built an in-house AI platform for a major domestic broadcaster on GCP, with Terraform as IaC, and ran it in production (case study). The FastAPI API group, broadcast-quality speech synthesis, an OCR × speech-recognition pipeline that detects typos in telops (on-screen captions), and a ClamAV malware scanner for uploaded material all run as Cloud Run services and jobs. Data lives in Cloud SQL, Memorystore, Firestore, and Cloud Storage. Cloud Armor guards the entrance, and CI/CD is keyless through Workload Identity Federation. The production Region keeps one instance always warm, while the secondary Region scales to zero for DR. None of it needs dedicated VMs or Kubernetes.

This article stays faithful to the official Cloud Run documentation while aiming to be easier to follow. It uses real code to show which scenario calls for which feature, and how to use it. It covers, end to end, what you need to ship to production: the container contract, resource design, concurrency, scaling, deployment, resilience, security, and cost.

Technology selection itself (Cloud Run vs. GKE vs. App Engine) is covered in the GCP container technology-selection guide. The deep dive on concurrency, billing, and cost optimization has its own article, the Cloud Run auto-scale, billing, and cost-optimization guide. This article concentrates on how to build for production once you have chosen Cloud Run.

What Cloud Run Is: The Official Definition

The official definition is simple.

Cloud Run is a fully managed application platform for running your code, function, or container on top of Google's highly scalable infrastructure. (What is Cloud Run)

In other words, Cloud Run is a serverless platform that lets you concentrate on running containers. Server configuration, OS patching, orchestration, and scaling are left entirely to the platform. The key features from the official docs:

The deploy unit is always a container image. You can build it yourself, or hand over source code (Go / Node.js / Python / Java / .NET / Ruby, etc.) and let buildpacks containerize it automatically.
Any language or binary runs. As the official docs say, "You can deploy code written in any programming language on Cloud Run if you can build a container image from it."
It offers three resource types.

Type	Role	Typical use
Services	Receive requests at a stable HTTPS endpoint, auto-scaling with traffic	REST/GraphQL API, web app, webhook receiver
Jobs	Run, finish, and stop. Manual/scheduled start, parallel tasks	Batch, DB migration, long-running bulk processing
Worker Pools	Resident background processing	A Pub/Sub pull subscriber, a Kafka consumer

This article mainly covers Services, and the second half explains when to use Jobs and Worker Pools. In my project, HTTP APIs ran on Services, while heavy, long-running processing such as telop-typo detection and malware scanning ran on Jobs.

When to Use It: An At-a-Glance Decision Axis (Details in the Technology-Selection Guide)

GCP offers several ways to run containers. I leave the detailed comparison to the technology-selection guide; here is just the first-cut decision axis.

Service	In one phrase	When to choose
Cloud Run	A serverless container / microservice foundation	Run stateless containers without K8s operation. When in doubt, here.
Cloud Run functions (formerly Cloud Functions)	An event-driven FaaS	A function responding to a single trigger (HTTP/Pub/Sub/Storage, etc.). Runs on the Cloud Run foundation.
GKE / GKE Autopilot	Managed Kubernetes	K8s-specific features like DaemonSet, CRD, Operator, service mesh are needed.
App Engine	A legacy PaaS	For existing apps. For new development, Cloud Run is recommended (see below).
Compute Engine	A VM	The workload can't be containerized, or you need OS-level control or an always-on GPU.

The official docs (the App Engine migration guide) are explicit about new development:

For new Google Cloud users, we recommend using Cloud Run as the preferred alternative over App Engine. (Compare App Engine and Cloud Run)

When in doubt, choose Cloud Run. That is the default answer on GCP in 2026.

The Container Contract (Runtime Contract): The 5 Promises to Uphold

A container deployed to Cloud Run must satisfy the container runtime contract. If you break it, you get "it works locally but won't start in production." Here are the five most important promises.

1. Listen on `$PORT` and `0.0.0.0`

The ingress container within an instance must listen for requests on 0.0.0.0 on the port to which requests are sent. (Container runtime contract)

Cloud Run passes the port in the environment variable PORT (default 8080). If you listen on localhost/127.0.0.1, the server is unreachable from outside the container and startup fails. Always read $PORT and listen on 0.0.0.0.

# FastAPI (uvicorn): read PORT and listen on 0.0.0.0.
import os
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def root():
    return {"ok": True}

if __name__ == "__main__":
    # Default 8080. Cloud Run injects PORT, so always read it from the environment.
    port = int(os.environ.get("PORT", "8080"))
    uvicorn.run(app, host="0.0.0.0", port=port)

// Node.js (Express): likewise, read PORT and listen on 0.0.0.0.
import express from "express";

const app = express();
app.get("/", (_req, res) => res.json({ ok: true }));

const port = Number(process.env.PORT ?? 8080);
app.listen(port, "0.0.0.0", () => console.log(`listening on ${port}`));

2. Be Stateless

Instances are added, removed, and destroyed at any time. Don't keep state (sessions, counters, files being uploaded) in an instance's memory or on its local disk. Put it in an external store such as Cloud SQL, Memorystore, Firestore, or Cloud Storage.

3. The File System Is In-Memory

the in-memory filesystem ... writing too much data can crash the instance.

The writable file system lives in memory, so everything you write consumes the instance's memory. If you write a large temporary file, the instance crashes with an out-of-memory error. Keep temporary data small, or stream it to and from Cloud Storage. The malware scanner described later scans streams without buffering for exactly this reason.
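To make "stream instead of buffering" concrete, here is a minimal Python sketch that hashes an object chunk by chunk. Only one chunk is held in memory at a time, rather than a full copy under /tmp. The function name, the 1 MiB chunk size, and SHA-256 as the per-chunk work are illustrative assumptions, not the scanner's actual code.

```python
import hashlib
from typing import BinaryIO, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read keeps memory use flat (assumed value)


def sha256_streaming(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Tuple[str, int]:
    """Hash a binary stream chunk by chunk instead of writing it to /tmp first.

    Only one chunk is in memory at a time, so a multi-GiB object never
    lands in the instance's in-memory filesystem.
    Returns (hex digest, total bytes read).
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total
```

Any binary file-like object works. With the google-cloud-storage client, for example, `bucket.blob(name).open("rb")` returns such a reader, so the same loop can process a large object without touching the in-memory filesystem.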

4. Receive SIGTERM and Clean Up Within 10 Seconds

Before shutting down an instance, Cloud Run sends a SIGTERM signal to all the containers in an instance, indicating the start of a 10 second period before the actual shutdown occurs, at which point Cloud Run sends a SIGKILL signal. (Container runtime contract)

Instances are shut down on every scale-in, deploy, and revision switch. When SIGTERM arrives, you have 10 seconds to finish in-progress requests, close connections, and flush buffers. The code is in the graceful shutdown section.

5. Return a Response Within the Timeout

If the response doesn't complete within the request timeout (default 300 seconds), the client gets a 504. Don't run long processing inside a synchronous HTTP request; move it to Jobs or Workflows.

The First Deploy: From Source or From a Container

The quickest path is a source deploy. You don't even need a Dockerfile, because buildpacks take care of it.

# Deploy straight from source (buildpacks containerize it → Artifact Registry → Cloud Run)
gcloud run deploy api \
  --source . \
  --region asia-northeast1 \
  --no-allow-unauthenticated   # start with authentication required (see below)

# Deploy a self-built image (recommended for production: more reproducible)
gcloud run deploy api \
  --image asia-northeast1-docker.pkg.dev/PROJECT_ID/repo/api:GIT_SHA \
  --region asia-northeast1 \
  --no-allow-unauthenticated

Make --no-allow-unauthenticated a habit from the start. --allow-unauthenticated means "publish to the entire internet without authentication." Require authentication for in-house tools and service-to-service calls, and explicitly open only what truly must be public. Unauthenticated endpoints also raise cost, because junk requests get billed too.

In production, the standard is to build the image in CI (Cloud Build / GitHub Actions) and deploy a tagged image to Cloud Run. In my project, I also split responsibilities to prevent drift: Terraform owns the infrastructure configuration, and Cloud Build owns the image and the latest environment settings. For making the CI side keyless, see the Workload Identity Federation article.

Resource Design: Understand the "Combinations" of CPU and Memory

You set CPU and memory independently, but the allowed memory range depends on the CPU value (Configure CPU limits).

vCPU	Memory range
0.08	up to 512 MiB
0.5	up to 1 GiB
1	up to 4 GiB
2	up to 8 GiB
4	2–16 GiB
6	4–24 GiB
8	4–32 GiB

vCPU ranges from 0.08 to 8. Below 1, any value in 0.001 steps is allowed (e.g. 0.25). At 1 and above, only the integers 1, 2, 4, 6, and 8 are allowed.
The principle is to start small and right-size from metrics. If you oversize from the start, the excess goes straight onto your bill.

gcloud run deploy api \
  --image IMAGE_URL --region asia-northeast1 \
  --cpu 1 --memory 512Mi \
  --cpu-boost           # boost CPU only during startup to speed up cold starts
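Because the allowed memory depends on the vCPU value, a pre-deploy check in CI can reject invalid pairs before `gcloud run deploy` does. The sketch below encodes only the table above. Treating fractional values below 0.5 like the 0.08 row, and 0.5 to 0.999 like the 0.5 row, is my own simplification, so confirm edge cases against Configure CPU limits.

```python
from typing import Tuple

GIB = 1024  # all memory values below are in MiB

# (min MiB, max MiB) per whole-vCPU value, transcribed from the table above.
# The table lists no lower bound for 1 and 2 vCPU, so 0 stands in for "none listed".
WHOLE_CPU_LIMITS = {
    1: (0, 4 * GIB),
    2: (0, 8 * GIB),
    4: (2 * GIB, 16 * GIB),
    6: (4 * GIB, 24 * GIB),
    8: (4 * GIB, 32 * GIB),
}


def memory_range_mib(cpu: float) -> Tuple[int, int]:
    """Return the allowed (min, max) memory in MiB for a vCPU value."""
    if 0.08 <= cpu < 1:
        # Fractional vCPU must be a multiple of 0.001.
        if abs(cpu * 1000 - round(cpu * 1000)) > 1e-6:
            raise ValueError("fractional vCPU must be a multiple of 0.001")
        # Assumption: below 0.5 follows the 0.08 row, 0.5 to 0.999 the 0.5 row.
        return (0, 512) if cpu < 0.5 else (0, 1 * GIB)
    if cpu in WHOLE_CPU_LIMITS:  # 1.0 matches key 1, and so on
        return WHOLE_CPU_LIMITS[cpu]
    raise ValueError(f"unsupported vCPU value: {cpu}")


def is_valid_combination(cpu: float, memory_mib: int) -> bool:
    """True if the (vCPU, memory) pair falls inside the table's limits."""
    try:
        low, high = memory_range_mib(cpu)
    except ValueError:
        return False
    return low <= memory_mib <= high
```

For the deploy above, `is_valid_combination(1, 512)` holds. A request such as 4 vCPU with 1 GiB would be flagged before it reaches Cloud Run.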

Startup CPU boost

--cpu-boost temporarily raises CPU only while an instance starts. For example, a 1 vCPU service gets the equivalent of 2 vCPU during startup. It effectively shortens cold starts for apps with heavy JVM, Node, or Python initialization. For its extra cost, it is one of the most effective standard settings.

Execution Environments: gen1 and gen2

Cloud Run has 2 generations of execution environments (About execution environments).

	gen1	gen2
Foundation	gVisor	microVM
Cold start	Fast	Somewhat slower for some services
Linux compatibility	Emulates many syscalls (some unsupported)	Full Linux compatibility
Network file system	Not supported	Supported (NFS, etc.)
CPU/network performance	Standard	Fast
Memory lower bound	Below 512 MiB possible	512 MiB or above

By default, no execution environment is specified and the platform selects one automatically.
Choose gen1 for lightweight APIs where cold-start speed comes first. Choose gen2 for full Linux compatibility, NFS, VPC egress, or CPU-intensive workloads.
Jobs and Worker Pools are always gen2.

gcloud run deploy api --execution-environment gen2 --region asia-northeast1 ...

Concurrency: The Number of Requests One Instance Handles Simultaneously

Cloud Run's most important parameter is concurrency: the maximum number of requests one instance handles simultaneously.

The maximum concurrency ... is 80 (Console) / 80 times the number of vCPUs (CLI/Terraform). The maximum value is 1000. (About concurrency)

The default is 80 (with gcloud/Terraform, the default is 80 × the vCPU count). The maximum is 1000.
The lower the concurrency, the more instances the same load needs. That means more cold starts and usually higher cost.
The official docs state plainly that "concurrency 1 significantly degrades scale performance (many instances will have to start up to handle a spike)." In short, the service copes poorly with spikes.
Tune it this way. If each request uses a lot of CPU or memory, lower the concurrency. If requests mostly wait on I/O (DB, external APIs), raise it to pack more requests into each instance.
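A back-of-the-envelope model shows why concurrency is the central dial. By Little's law, the average number of in-flight requests is requests per second × latency. Dividing that by what one instance holds at the default 60% utilization target gives a rough instance count. This sketch assumes a steady, concurrency-bound load and ignores CPU. It is not the real autoscaler.

```python
import math


def estimate_instances(rps: float, latency_s: float, concurrency: int,
                       target_utilization: float = 0.6) -> int:
    """Rough steady-state instance count for a concurrency-bound service.

    in_flight    = rps * latency_s                (Little's law)
    per_instance = concurrency * target_utilization
    """
    if not 1 <= concurrency <= 1000:
        raise ValueError("Cloud Run concurrency must be between 1 and 1000")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    in_flight = rps * latency_s
    per_instance = concurrency * target_utilization
    return math.ceil(in_flight / per_instance)
```

With 200 requests/s at 250 ms latency, `estimate_instances(200, 0.25, 80)` gives 2, while `estimate_instances(200, 0.25, 1)` gives 84. Under concurrency 1, every one of those 84 instances is a potential cold start during a spike.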

gcloud run deploy api --concurrency 80 --region asia-northeast1 ...

The dedicated article on concurrency, auto-scale, and billing explains how concurrency drives scaling and billing, down to unit-cost calculations. Here, just remember that concurrency is the central dial for both performance and cost.

Auto-Scale: Scale-to-Zero and Minimum/Maximum Instances

Cloud Run scales down to zero instances (scale to zero) when no requests arrive and scales out automatically when they do. The autoscaler works like this:

The autoscaler ... targets ... 60% CPU utilization / 60% concurrency utilization by default. (About instance autoscaling)

By default, it adjusts the instance count to target 60% utilization, for both CPU utilization and concurrency utilization.
After handling requests, it keeps idle instances for up to 15 minutes (10 minutes for GPU) to reduce cold starts.
Minimum instances keep instances warm to eliminate cold starts. Maximum instances (default 100) cap the cost of a runaway.

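# Keep one instance warm at the production entry point to eliminate cold starts,
# and cap at 10 instances as a cost safety valve, even during spikes.
# (Comments can't follow a trailing backslash, so they sit above the command.)
gcloud run deploy api \
  --min-instances 1 \
  --max-instances 10 \
  --region asia-northeast1 ...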

In my project, I used an asymmetric configuration. The production Region stayed warm with min-instances=1, while the secondary Region for DR ran with min-instances=0 (scale to zero). That kept normal-time cost down while preserving resilience during a failure. You don't need to keep every Region warm.
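The min/max settings act as a clamp on whatever instance count the autoscaler wants. This hedged sketch (hypothetical names) applies that clamp and flags when demand exceeds the cap. Past the cap, extra requests have to wait or may be rejected, so max-instances is a cost ceiling, not free capacity.

```python
from dataclasses import dataclass


@dataclass
class ScalePlan:
    instances: int  # instances that would actually run
    capped: bool    # True when demand exceeds max_instances


def apply_bounds(demand: int, min_instances: int, max_instances: int) -> ScalePlan:
    """Clamp an estimated instance demand to the min/max instance settings."""
    if not 0 <= min_instances <= max_instances:
        raise ValueError("require 0 <= min_instances <= max_instances")
    instances = min(max(demand, min_instances), max_instances)
    return ScalePlan(instances=instances, capped=demand > max_instances)
```

With no traffic, the production Region (min 1, max 10) still holds one warm instance, while the DR Region (min 0) drops to zero. A spike that wants 25 instances is held at 10 and reported as capped.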

Scale design (cold-start countermeasures, how to choose min/max, the meaning of the 60% target) is covered in depth in the auto-scale article.

Request Timeout: Long-Running Processing Goes to Jobs

Default timeout: 5 minutes (300 seconds). Maximum timeout: 60 minutes (3,600 seconds). (Request timeout)

Default 300 seconds, max 60 minutes. You can also specify a duration like --timeout 1m20s.
Don't run anything that may exceed this (video processing, large batches, long LLM inference) inside a synchronous HTTP request. Even if the client disconnects, the processing can't be stopped, and a client retry runs it a second time.

The right answer is to separate accepting the work (Service) from executing it (Job/Workflow).

# Move long-running processing to Cloud Run Jobs (decoupled from HTTP).
# 10 tasks, with at most 5 running in parallel.
gcloud run jobs create telop-ocr \
  --image IMAGE_URL --region asia-northeast1 \
  --tasks 10 --parallelism 5 \
  --max-retries 3 --task-timeout 3600s
gcloud run jobs execute telop-ocr --region asia-northeast1
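Inside the job, each task decides which part of the work is its own. Cloud Run Jobs exposes a task's position through the CLOUD_RUN_TASK_INDEX (0-based) and CLOUD_RUN_TASK_COUNT environment variables. The sketch below strides over a work list with them. The work list itself is a hypothetical input (for example, segments read from a shared manifest) that every task must see in the same order.

```python
import os
from typing import List, Sequence


def items_for_this_task(items: Sequence[str]) -> List[str]:
    """Return the slice of items this Cloud Run Jobs task should process.

    The defaults (index 0 of 1) let the same code run locally on everything.
    """
    index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    if not 0 <= index < count:
        raise ValueError(f"bad task index {index} for task count {count}")
    # Striding by the task count gives each task a disjoint, deterministic subset,
    # so a retried task reprocesses exactly the same items.
    return list(items[index::count])
```

With `--tasks 10`, task 3 would take items 3, 13, 23, and so on. A retry of that task sees the same slice, which pairs naturally with idempotent writes.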

My telop-typo-detection pipeline took exactly this form. The HTTP API only starts a job and immediately returns a request ID. The heavy OCR and speech-recognition work runs in parallel on Cloud Run Jobs + Cloud Workflows. Progress reaches the UI in near real time through Firestore snapshot listeners + SSE. Processing time dropped from 18 minutes sequential to 13 minutes parallel (about 30% less). Always design long-running processing to be idempotent and resumable: number segment IDs deterministically so that a re-run converges to the same result.
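Deterministic segment IDs can be derived from what a segment is, rather than from when it was processed. In this sketch, the fixed-length windows, the millisecond boundaries, and the truncated SHA-256 are my assumptions. The point is only that a re-run produces identical IDs, so results written keyed by ID overwrite instead of duplicating.

```python
import hashlib
from typing import Dict, List


def segment_id(source_uri: str, start_ms: int, end_ms: int) -> str:
    """Derive a segment ID from the segment's identity, never from run state."""
    key = f"{source_uri}|{start_ms}|{end_ms}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def split_segments(source_uri: str, duration_ms: int, window_ms: int) -> List[Dict]:
    """Cut media into fixed windows whose IDs are reproducible across re-runs."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    segments = []
    for start in range(0, duration_ms, window_ms):
        end = min(start + window_ms, duration_ms)  # the last window may be shorter
        segments.append({
            "id": segment_id(source_uri, start, end),
            "start_ms": start,
            "end_ms": end,
        })
    return segments
```

If a job dies halfway and is re-executed, it regenerates the same IDs. It can then skip segments whose results already exist, or overwrite them harmlessly.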

Health Checks: startup and liveness

Cloud Run has 2 kinds of probes (Configure health checks).

Startup probe: determines when startup is complete, and no traffic is routed until it succeeds. By default, a new service gets a TCP probe on the container port (timeoutSeconds: 240 / periodSeconds: 240 / failureThreshold: 1).
Liveness probe: keeps checking after startup, and a failure restarts the container. If the probe doesn't succeed within failureThreshold × periodSeconds, the container receives SIGKILL and a new instance starts.

For an HTTP probe, 2XX/3XX means success and anything else means failure. Implement /healthz in the app and keep it light, returning only "am I alive." If every probe runs a heavy check against dependencies, probes back up and can trigger cascading restarts.

from fastapi import FastAPI
app = FastAPI()

@app.get("/healthz")
def liveness():
    # Liveness reports only, and cheaply, whether this process can respond.
    # Dependency checks (DB/Redis) are left out so a dependency outage can't cause a restart loop.
    return {"status": "ok"}

A Terraform configuration example is in the IaC section below. For a slow-starting app, widen the startup probe's failure_threshold × period_seconds until startup has enough grace, because the probe gives up once that window passes without success.
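Since the probe gives up after roughly failure_threshold × period_seconds, the threshold can be derived from a measured startup time. The 1.5× safety factor below is an arbitrary assumption. With a 33 s startup and a 5 s period, it yields 10, matching the Terraform example later.

```python
import math


def startup_failure_threshold(expected_startup_s: float, period_s: int,
                              safety_factor: float = 1.5) -> int:
    """Smallest failure_threshold whose window covers the expected startup time.

    Window ~= failure_threshold * period_seconds; the safety factor adds headroom
    for slow image pulls or cold caches (its value is an assumption).
    """
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    return max(1, math.ceil(expected_startup_s * safety_factor / period_s))
```

Measure the real startup time from logs or metrics first. Guessing low just moves the failure from your code to the probe.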

Graceful Shutdown: SIGTERM and Idempotency

Per the container contract, SIGKILL follows 10 seconds after SIGTERM. Within those 10 seconds, finish in-progress requests, close connections, and flush buffers.

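# FastAPI (uvicorn): uvicorn catches SIGTERM and runs the lifespan shutdown, where we clean up.
# asyncpg serves as an example pool here; DATABASE_URL is an assumed environment variable.
import logging
import os
from contextlib import asynccontextmanager

import asyncpg
from fastapi import FastAPI

log = logging.getLogger("app")

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: acquire pools and clients
    app.state.pool = await asyncpg.create_pool(dsn=os.environ["DATABASE_URL"])
    yield
    # Shutdown (runs after uvicorn receives SIGTERM): close resources reliably
    log.info("draining: closing pool within the 10s grace window")
    await app.state.pool.close()

app = FastAPI(lifespan=lifespan)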

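// Node.js: on SIGTERM, stop accepting new connections, wait for in-flight requests, then exit.
// `app` is the Express app from the earlier example (this replaces its app.listen call),
// and `pool` is assumed to be a pg Pool.
const server = app.listen(Number(process.env.PORT ?? 8080), "0.0.0.0");

process.on("SIGTERM", () => {
  console.log("SIGTERM received: draining connections");
  // Safety net: exit on our own before Cloud Run sends SIGKILL at 10 s.
  setTimeout(() => process.exit(1), 9000).unref();
  server.close(async () => {
    await pool.end();          // close the DB pool
    process.exit(0);           // leave within the 10 s window
  });
});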

What really matters here is idempotency. Work that was partway done at SIGTERM may be retried on another instance and executed more than once. I apply on Cloud Run the same principle I followed in a payment platform that recorded zero double charges in production. There, an idempotency key plus a unique constraint made duplicate execution structurally impossible, so the same operation yields the same result no matter how many times it arrives. Closing things carefully in the SIGTERM handler is not enough on its own. You are safe only once the processing itself is idempotent. For designing idempotent asynchronous processing, see also SQS/Lambda idempotent processing and Transactional Outbox (a different cloud, but the same principle).
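The idempotency-key-plus-unique-constraint idea fits in a few lines. This sketch uses Python's built-in sqlite3 as a stand-in for Cloud SQL (on PostgreSQL you would typically use INSERT ... ON CONFLICT). The table and column names are hypothetical. The unique constraint, not an application-level "have I seen this?" check, is what guarantees a duplicate delivery cannot create a second row, even when two instances race.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS charges ("
        " id INTEGER PRIMARY KEY,"
        " idempotency_key TEXT NOT NULL UNIQUE,"  # the constraint does the real work
        " account TEXT NOT NULL,"
        " amount INTEGER NOT NULL)"               # minor currency units
    )


def charge_once(conn: sqlite3.Connection, key: str, account: str, amount: int) -> int:
    """Record a charge at most once per idempotency key; returns the charge row id."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            cur = conn.execute(
                "INSERT INTO charges (idempotency_key, account, amount) VALUES (?, ?, ?)",
                (key, account, amount),
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # Duplicate key: an earlier execution already committed; return its result.
        row = conn.execute(
            "SELECT id FROM charges WHERE idempotency_key = ?", (key,)
        ).fetchone()
        return row[0]
```

If the side effect is an external API call rather than a row, pass the same key to that API as well. The database alone can't undo a call that has already left the process.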

Revisions and Deploy: Blue/Green, Canary, Instant Rollback

Cloud Run deployments are built on revisions. A revision is an immutable snapshot of code and configuration, and you can split traffic across revisions by percentage.

Deploy Without Sending Traffic → Verify with a Tagged URL

# Deploy a new revision without sending it traffic; only a tagged URL is issued.
gcloud run deploy api \
  --image IMAGE_URL --region asia-northeast1 \
  --no-traffic --tag green
# → https://green---api-xxxxx.a.run.app lets you verify in isolation from production traffic

Canary → Blue/Green (Staged Switch)

# Send only 5% to the new (latest) revision as a canary; the current revision handles the other 95%.
gcloud run services update-traffic api --region asia-northeast1 \
  --to-revisions LATEST=5

# If metrics stay healthy, raise it in stages and finally go to 100% (Blue/Green switch)
gcloud run services update-traffic api --region asia-northeast1 --to-latest
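The staged switch can be scripted. The hypothetical helper below only builds the gcloud command line for each stage (by default 5% → 25% → 50% → --to-latest) and executes nothing. The percentages are illustrative, and the metric check between stages is left to you.

```python
from typing import List, Sequence


def canary_commands(service: str, region: str,
                    steps: Sequence[int] = (5, 25, 50)) -> List[List[str]]:
    """Build gcloud argv lists for a staged canary rollout (nothing is executed)."""
    if any(not 0 < p < 100 for p in steps):
        raise ValueError("canary percentages must be between 1 and 99")
    if any(a >= b for a, b in zip(steps, steps[1:])):
        raise ValueError("canary percentages must strictly increase")
    base = ["gcloud", "run", "services", "update-traffic", service, "--region", region]
    commands = [base + ["--to-revisions", f"LATEST={p}"] for p in steps]
    commands.append(base + ["--to-latest"])  # final stage: 100% to the latest revision
    return commands
```

A driver could loop `for cmd in canary_commands("api", "asia-northeast1"): subprocess.run(cmd, check=True)`, pausing to check error rate and latency before each next stage, and fall back to the rollback command below on a regression.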

Instant Rollback

# If something goes wrong, just send 100% back to the healthy old revision. No rebuild needed.
gcloud run services update-traffic api --region asia-northeast1 \
  --to-revisions api-00021-abc=100

Cloud Run's strength is that a rollback is just "send 100% of traffic back to an old revision": no redeploy and no image rebuild. Note the official caution that the switch isn't instantaneous, because in-progress requests finish on the original revision. If session affinity is enabled, be aware that it also affects how returning users are routed.

Security: Least-Privilege Service Accounts and Secret Management

Assign a Dedicated Service Account Per Service

Set this first in Cloud Run. If you specify nothing, the service runs as the default Compute Engine service account, which in many projects holds the overly broad Editor role.

We strongly recommend that you disable the automatic role grant by enforcing the iam.automaticIamGrantsForDefaultServiceAccounts organization policy constraint. (Service identity)

The right answer is to create a least-privilege, user-managed service account for each service and assign it with --service-account.

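# Create an SA dedicated to this service and grant only what it needs (least privilege)
gcloud iam service-accounts create api-runtime --display-name "api runtime"

# Example: let this SA read only the db-password secret
# (a per-secret binding is narrower than granting secretAccessor on the whole project)
gcloud secrets add-iam-policy-binding db-password \
  --member "serviceAccount:api-runtime@PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/secretmanager.secretAccessor"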

gcloud run deploy api --region asia-northeast1 \
  --service-account api-runtime@PROJECT_ID.iam.gserviceaccount.com ...

On the broadcaster platform, each service had a dedicated SA. Cloud SQL ran with IAM authentication, mandatory TLS (ENCRYPTED_ONLY), and a private IP, which kept credentials out of both the code and the network as far as possible.

Secrets from Secret Manager: Environment Variables vs Volume

Don't put secrets (API keys, DB passwords) in the image or as plaintext environment variables; inject them from Secret Manager (Configure secrets). There are two injection methods, and they behave differently.

Method	Value resolution	Suited use
Environment variable	Fixed at instance startup. Doesn't change while running	A secret whose version you want to pin (specify a concrete version, not `latest`)
Volume mount	Always fetches the latest version (as a file)	A rotating secret (follows the new value on the next read)

# Inject as an environment variable (pinned version). The SA needs roles/secretmanager.secretAccessor.
gcloud run deploy api --region asia-northeast1 \
  --set-secrets "DB_PASSWORD=db-password:3"

# Mount as a volume (always the latest version, suited to rotation)
gcloud run deploy api --region asia-northeast1 \
  --set-secrets "/etc/secrets/db/password=db-password:latest"
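A volume mount only helps with rotation if the app re-reads the file. The minimal sketch below assumes the mount path from the command above, with a hypothetical DB_PASSWORD environment variable as a fallback. It reads the file each time the secret is needed, instead of caching it at startup. Stripping surrounding whitespace is my assumption about how the secret value was stored.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(path: str = "/etc/secrets/db/password",
                env_fallback: Optional[str] = "DB_PASSWORD") -> str:
    """Return a secret, preferring the mounted file and re-reading it on every call."""
    secret_file = Path(path)
    if secret_file.is_file():
        # Read at use time: after a rotation, the mounted file can hold a new version.
        return secret_file.read_text(encoding="utf-8").strip()
    if env_fallback and env_fallback in os.environ:
        # Environment variables are fixed when the instance starts.
        return os.environ[env_fallback]
    raise FileNotFoundError(f"secret not found at {path}")
```

If reading per request is too frequent, a short time-based cache is a reasonable middle ground, as long as it expires well within your rotation window.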

Harden the Entrance: Auth + Cloud Armor

Require authentication for service-to-service calls and in-house tools (--no-allow-unauthenticated). Grant the caller's SA roles/run.invoker and call with an ID token.
For a public endpoint, put an external HTTP(S) load balancer + Cloud Armor in front, applying a WAF (OWASP rules), rate limiting, and adaptive DDoS protection. On my platform, Cloud Armor (OWASP CRS 3.3 + adaptive DDoS protection + rate limiting) sat at the entrance. The WAF was first fully enabled in stg to weed out false positives before production. For the defense-in-depth philosophy, see also the WAF defense-in-depth guide.
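On the calling side, each request needs a Google-signed ID token whose audience is the receiving service's URL. The sketch below uses google-auth's `google.oauth2.id_token.fetch_id_token`, which on Cloud Run obtains the token through the metadata server. The token fetcher is injectable, so the URL and header handling can be tested without credentials. The service URL used below is a placeholder.

```python
import urllib.parse
import urllib.request
from typing import Callable


def fetch_id_token_via_google_auth(audience: str) -> str:
    """Fetch a Google-signed ID token for `audience` (requires google-auth)."""
    # Imported lazily so the request builder below works without google-auth installed.
    import google.auth.transport.requests
    import google.oauth2.id_token
    return google.oauth2.id_token.fetch_id_token(
        google.auth.transport.requests.Request(), audience
    )


def authorized_request(url: str,
                       fetch_token: Callable[[str], str] = fetch_id_token_via_google_auth,
                       ) -> urllib.request.Request:
    """Build a request to a private Cloud Run service carrying a Bearer ID token."""
    parts = urllib.parse.urlsplit(url)
    audience = f"{parts.scheme}://{parts.netloc}"  # the receiving service's root URL
    token = fetch_token(audience)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

In production, `urllib.request.urlopen(authorized_request("https://api-xxxxx.a.run.app/items"))` would succeed only if the caller's SA holds roles/run.invoker on the target service.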

Networking: Make Direct VPC egress the Default

There are two ways to reach resources inside a VPC (Cloud SQL's private IP, Memorystore, an internal API) from Cloud Run, and the official docs recommend the newer one (Networking best practices).

Method	Characteristics
Direct VPC egress (recommended, GA)	No connector VM. No idle billing, low latency, high throughput. Needs subnet IP space
Serverless VPC Access connector	The older method. You pay for and operate always-on connector VMs

# Direct VPC egress: go straight into the VPC without a connector
gcloud run deploy api --region asia-northeast1 \
  --network projects/PROJECT_ID/global/networks/my-vpc \
  --subnet projects/PROJECT_ID/regions/asia-northeast1/subnetworks/run-subnet \
  --vpc-egress private-ranges-only

For new builds, choose Direct VPC egress without hesitation. With no always-on connector, there is no idle billing, so it also wins on cost. Ingress control, IAM auth, Cloud SQL private-IP connections, and defense in depth with Cloud Armor are covered in detail in the networking and security guide.

Jobs and Worker Pools: Where to Place Processing Unsuited to HTTP

Once you accept that Services are for synchronous HTTP, the production design falls into place. Task splitting, idempotency, resumability, and orchestration with Cloud Workflows are covered systematically in the dedicated Cloud Run Jobs and Cloud Workflows guide.

Cloud Run Jobs: processing that runs, finishes, and stops, such as DB migrations, periodic batches, and long-running bulk processing. Set parallelism with --tasks/--parallelism and retries with --max-retries. Start on a cron schedule with Cloud Scheduler, or on events with Eventarc.
Worker Pools: always-on background processing, such as a Pub/Sub pull subscriber or a Kafka consumer. These are workloads that keep running without receiving HTTP requests.

# Example: trigger the material malware scan from Eventarc (GCS events)
gcloud eventarc triggers create scan-on-upload \
  --location asia-northeast1 \
  --destination-run-service malware-scanner \
  --event-filters "type=google.cloud.storage.object.v1.finalized" \
  --event-filters "bucket=uploads-raw" \
  --service-account eventarc-invoker@PROJECT_ID.iam.gserviceaccount.com

My platform's malware scanner received GCS upload events via Eventarc and passed each object to ClamAV (on Cloud Run). It stream-scanned material of up to 10 GiB without buffering and sorted it into clean and quarantine buckets. The atomicity of File.move made it idempotent against retries, and it safely ignored zero-length, still-uploading, and deleted objects. Decouple heavy processing from HTTP, and make it idempotent: these two points are the backbone of running Cloud Run in production.
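The "safely ignore" part can be expressed as a pure filter on the event payload. This sketch assumes the event data is the Cloud Storage object resource with its `bucket`, `name`, and `size` fields (in the JSON API, `size` is a string). The clean and quarantine bucket names are hypothetical. Deleted objects are not visible in the payload, so the download step still has to treat "not found" as a skip.

```python
from typing import Optional, Tuple

CLEAN_BUCKET = "uploads-clean"            # hypothetical name
QUARANTINE_BUCKET = "uploads-quarantine"  # hypothetical name


def scan_target(event_data: dict) -> Optional[Tuple[str, str]]:
    """Return (bucket, object_name) to scan, or None when the event should be skipped."""
    bucket = event_data.get("bucket")
    name = event_data.get("name")
    if not bucket or not name:
        return None  # malformed or unexpected payload
    if name.endswith("/"):
        return None  # "folder" placeholder objects carry no content
    if int(event_data.get("size") or 0) == 0:
        return None  # zero-length (or size-less) objects: nothing to scan
    return bucket, name


def destination_for(is_clean: bool) -> str:
    """Pick the bucket an object is moved to after scanning."""
    return CLEAN_BUCKET if is_clean else QUARANTINE_BUCKET
```

Because the filter and the routing are pure functions of the event, a redelivered event yields the same decision. Combined with an atomic move, that is what makes retries harmless.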

IaC: Build It Declaratively with Terraform

In production, build declaratively with Terraform (google_cloud_run_v2_service) rather than running gcloud by hand. All the settings covered so far (concurrency, scale, timeout, probes, SA, billing mode, execution environment) come together in one place.

resource "google_cloud_run_v2_service" "api" {
  name     = "api"
  location = "asia-northeast1"
  ingress  = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER" # entry restricted to traffic via the LB + Cloud Armor

  template {
    service_account                  = google_service_account.api_runtime.email
    max_instance_request_concurrency = 80
    timeout                          = "300s"
    execution_environment            = "EXECUTION_ENVIRONMENT_GEN2"

    scaling {
      min_instance_count = 1   # keep the production entry point warm
      max_instance_count = 10  # cost safety valve
    }

    containers {
      image = "asia-northeast1-docker.pkg.dev/${var.project_id}/repo/api:${var.image_tag}"
      ports { container_port = 8080 }

      resources {
        limits            = { cpu = "1", memory = "512Mi" }
        cpu_idle          = true   # true = request billing (CPU stopped when idle) / false = instance billing
        startup_cpu_boost = true   # speeds up cold starts
      }

      startup_probe {
        tcp_socket { port = 8080 }
        failure_threshold = 10     # give slow-starting apps headroom
        period_seconds    = 5
        timeout_seconds   = 3
      }
      liveness_probe {
        http_get { path = "/healthz" }
        period_seconds = 10
      }

      # Secrets injected from Secret Manager (pinned version)
      env {
        name = "DB_PASSWORD"
        value_source {
          secret_key_ref {
            secret  = google_secret_manager_secret.db_password.secret_id
            version = "3"
          }
        }
      }
    }

    # Direct VPC egress to in-VPC resources (e.g. Cloud SQL private IP)
    vpc_access {
      network_interfaces {
        subnetwork = google_compute_subnetwork.run.id
      }
      egress = "PRIVATE_RANGES_ONLY"
    }
  }

  # 100% to the latest revision (for a canary, split this into multiple traffic blocks)
  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}

cpu_idle = true corresponds to request billing (CPU stopped when idle), and false to instance billing (CPU always allocated). This choice has a large effect on cost (detailed in the billing article).

Keyless CI/CD (Workload Identity Federation) is covered in detail in a separate article, Make GitHub Actions keyless. In my project, I codified the entire GCP setup in about 71 Terraform modules and kept the stg and prod state separate.

Observability: Just Emit Structured Logs to Standard Output

Cloud Run sends standard output and standard error straight to Cloud Logging. If the app writes structured JSON logs to stdout/stderr instead of to files, logs are collected without any extra agent.

import json, logging, sys

class JsonFormatter(logging.Formatter):
    def format(self, record):
        # Cloud Logging interprets severity and trace; attach trace for request correlation.
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
        }
        # Emit the trace key only when one was passed (e.g. logging extra={"trace": ...}).
        trace = getattr(record, "trace", None)
        if trace:
            entry["logging.googleapis.com/trace"] = trace
        return json.dumps(entry)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
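For the trace field to correlate with a request, it must take the form projects/PROJECT_ID/traces/TRACE_ID. Cloud Run passes the incoming trace in the X-Cloud-Trace-Context header as TRACE_ID/SPAN_ID;o=OPTIONS. The helper below is a sketch that converts one form into the other, and the project ID in the example is a placeholder. Pass the result as `extra={"trace": ...}` so the formatter picks it up via `record.trace`.

```python
import re
from typing import Optional

# TRACE_ID is 32 hex characters; the /SPAN_ID and ;o=OPTIONS parts are optional.
_TRACE_HEADER = re.compile(r"^([0-9a-fA-F]{32})(?:/(\d+))?(?:;o=\d)?$")


def trace_field(header_value: Optional[str], project_id: str) -> Optional[str]:
    """Convert an X-Cloud-Trace-Context value into Cloud Logging's trace field."""
    if not header_value:
        return None
    match = _TRACE_HEADER.match(header_value.strip())
    if not match:
        return None  # unknown format: log without a trace rather than a wrong one
    return f"projects/{project_id}/traces/{match.group(1)}"
```

In a request handler, `log.info("scan done", extra={"trace": trace_field(request.headers.get("x-cloud-trace-context"), "my-project")})` would tie that log line to the request in Cloud Logging.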

Metrics (request count, latency, instance count, CPU/memory utilization) appear in Cloud Monitoring automatically. Build your SLOs and alerts on them.
Instrument traces with OpenTelemetry and send them to Cloud Trace / Cloud Monitoring. My malware scanner also sent its scan results to Cloud Monitoring via OpenTelemetry. For the thinking behind observability, see the OpenTelemetry practical guide.

Pre-Production Checklist

Summary: The Crux of Serverless Containers Is Portable

Cloud Run is a serverless foundation that lets you concentrate on running containers. Production quality comes not from special magic but from upholding the contract. Listen on $PORT, close carefully on SIGTERM, and hold no state. Decouple heavy processing into jobs, control cost with concurrency and scale, and harden with least privilege and secret management.

These fundamentals are common to all serverless containers and hold unchanged on AWS Fargate or Azure Container Apps. I have run a broadcaster platform in production on GCP Cloud Run, and a payment platform and a lumber-distribution DX project on AWS Fargate. Even when the cloud changes, the design principles for running containers in production so they are unbreakable, cheap, and safe carry over.

If you're torn on technology selection, continue to the GCP container technology-selection guide. If you want to tune cost, read the concurrency, auto-scale, and billing guide.

Related articles

Cloud Run concurrency, autoscaling, billing model, and cost optimization: conquering scale-to-zero and cold starts in real code

Cloud Run CI/CD: keyless, Blue/Green, and canary in real code with Cloud Build / GitHub Actions × Workload Identity

Cloud Run Jobs and Cloud Workflows: designing long-running batch and parallel processing to be idempotent and resumable

Cloud Run networking and security: defense in depth with Ingress control, IAM auth, Direct VPC egress, and Cloud Armor

Also worth reading

AWS ECS on Fargate Production Operation Guide: Designing, Deploying, Costing, and Securing Serverless Containers in Real Code

Azure Container Apps Production Operations Guide: Designing, Scaling, Deploying, Costing, and Securing Serverless Containers, with Real Code

Vercel production-operation guide: use it not as a front-end-only host but as a 'full-compute platform'
