友田 陽大
Google Cloud Run in production
GCP
Cloud Run
CI/CD
DevOps
Workload Identity
Security
Infrastructure
Terraform

Cloud Run CI/CD: keyless, Blue/Green, and canary in real code with Cloud Build / GitHub Actions × Workload Identity

An implementation guide to production-quality continuous deployment on Cloud Run, with real code in cloudbuild.yaml, GitHub Actions, and gcloud. It covers Artifact Registry, choosing between Cloud Build and GitHub Actions (keyless via Workload Identity Federation), verifying a new revision with --no-traffic and a tag URL before moving through canary, Blue/Green, and instant rollback, separating DB migrations into a job, and dividing responsibilities with Terraform.

Published
Reading time
7 min read
Author
友田 陽大

"Deployment is scary" is the feeling you most want to avoid on a production container platform. That fear has two sources: not being able to revert, and not knowing what changed. Cloud Run CI/CD can remove both by design. Because revisions are immutable, you can revert instantly without rebuilding, and if you separate responsibilities, what changed is always clear.

While operating a broadcaster platform on GCP, I ran an always-on internal platform with the following configuration: stg and prod were separated in Cloud Build; Terraform owned "infrastructure" while Cloud Build owned "the image and latest env"; DB migrations were carved out into a dedicated job; and CI/CD was keyless via Workload Identity Federation. This article reproduces that design in real code, following the official Google Cloud documentation.

For the full picture of production operation see the Cloud Run production-operations guide, and for the design of long-running jobs themselves the Jobs / Workflows guide.


Design principle: separate the three responsibilities

Accidents in Cloud Run CI/CD usually happen when responsibilities are mixed. Draw the boundaries first.

| Responsibility | What carries it | Source of truth |
| --- | --- | --- |
| App contents | The container image (build in CI → Artifact Registry) | Git (commit SHA = image tag) |
| Infrastructure | Service, SA, VPC, scaling settings (Terraform) | Terraform state |
| Which revision to route to | Traffic allocation (immutable revisions) | Cloud Run's traffic setting |

This separation works because, once "image tag = commit SHA" holds, you can trace exactly which commit is running in production, and infrastructure changes (Terraform) never mix with app changes (image). Don't deploy the latest tag: it is mutable, so you lose track of what is running.
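The "image tag = commit SHA" rule can be sketched as a small shell helper. This is a minimal illustration, not the author's tooling: image_ref is a hypothetical function, and the region, the app repository, and the api image name simply mirror the examples used in this article.

```shell
#!/bin/sh
# Sketch: derive the image reference from a commit SHA.
# image_ref is a hypothetical helper; REGION, "app" and "api" follow this article's examples.

REGION="${REGION:-asia-northeast1}"

# image_ref PROJECT_ID GIT_SHA -> prints the full Artifact Registry reference
image_ref() {
  project_id="$1"
  git_sha="$2"
  if [ -z "$project_id" ] || [ -z "$git_sha" ]; then
    echo "usage: image_ref PROJECT_ID GIT_SHA" >&2
    return 1
  fi
  printf '%s-docker.pkg.dev/%s/app/api:%s\n' "$REGION" "$project_id" "$git_sha"
}

# Example, run inside a Git checkout:
#   image_ref my-project "$(git rev-parse --short HEAD)"
```

Because the tag is computed from Git rather than typed by hand, the running image can always be traced back to a commit.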


Artifact Registry: where images live

Images go in Artifact Registry (the successor to Container Registry). First create the repository.

gcloud artifacts repositories create app \
  --repository-format=docker \
  --location=asia-northeast1 \
  --description="app container images"
# Image URL format: asia-northeast1-docker.pkg.dev/PROJECT_ID/app/api:GIT_SHA

Path A: Cloud Build (self-contained, GCP-native)

If you want everything within GCP, Cloud Build. Declare "build → push → deploy" in cloudbuild.yaml.

# cloudbuild.yaml: started by a push trigger. $SHORT_SHA is injected by Cloud Build.
steps:
  # 1. Build (tag the image with the commit SHA)
  - name: "gcr.io/cloud-builders/docker"
    args:
      ["build", "-t",
       "${_REGION}-docker.pkg.dev/$PROJECT_ID/app/api:$SHORT_SHA", "."]
  # 2. Push to Artifact Registry
  - name: "gcr.io/cloud-builders/docker"
    args:
      ["push",
       "${_REGION}-docker.pkg.dev/$PROJECT_ID/app/api:$SHORT_SHA"]
  # 3. Deploy without routing traffic (verify via the tag URL, then promote)
  - name: "gcr.io/google.com/cloudsdktool/cloud-sdk"
    entrypoint: gcloud
    args:
      ["run", "deploy", "api",
       "--image", "${_REGION}-docker.pkg.dev/$PROJECT_ID/app/api:$SHORT_SHA",
       "--region", "${_REGION}",
       "--no-traffic", "--tag", "sha-$SHORT_SHA"]
images:
  - "${_REGION}-docker.pkg.dev/$PROJECT_ID/app/api:$SHORT_SHA"
substitutions:
  _REGION: asia-northeast1
options:
  logging: CLOUD_LOGGING_ONLY

Connect a push trigger to the GitHub repository, and every commit automatically runs build and deploy (without routing traffic).

gcloud builds triggers create github \
  --repo-name=app --repo-owner=YOUR_ORG \
  --branch-pattern="^main$" \
  --build-config=cloudbuild.yaml

Path B: GitHub Actions × Workload Identity (keyless)

If your existing CI is GitHub Actions, this path is the natural fit. It authenticates to GCP with Workload Identity Federation (WIF), so no service-account key is ever issued.

# .github/workflows/deploy.yml
name: deploy
on:
  push:
    branches: [main]

permissions:
  contents: read
  id-token: write   # Without this, GitHub does not issue an OIDC token and authentication fails

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Keyless auth: the pool and provider are set up in WIF beforehand (see the link below)
      - id: auth
        uses: google-github-actions/auth@v3
        with:
          # Note: full path containing the project NUMBER, not the project ID.
          workload_identity_provider: "projects/123456789/locations/global/workloadIdentityPools/github/providers/app-repo"
          service_account: "deployer@PROJECT_ID.iam.gserviceaccount.com"

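      # Note: this step assumes the image for this commit SHA has already been
      # built and pushed to Artifact Registry; no build step is shown here.
      - uses: google-github-actions/deploy-cloudrun@v3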
        with:
          service: api
          region: asia-northeast1
          image: asia-northeast1-docker.pkg.dev/PROJECT_ID/app/api:${{ github.sha }}
          flags: "--no-traffic --tag=sha-${{ github.sha }}"

The WIF pool/provider setup (for example, allowing only your own repository with an attribute condition) is not repeated in this article. The key points of that setup, always matching on assertion.repository and never wildcarding sub, are collected in the dedicated article on making GitHub Actions keyless (DRY). Give the deploy SA only the minimum privileges: roles/run.developer, Artifact Registry read, and roles/iam.serviceAccountUser on the runtime SA.
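The three minimum privileges listed above could be granted roughly as follows. This is a sketch under assumptions: the deployer SA name and the app repository come from this article's examples, the runtime SA is passed in by the caller, and DRY_RUN, run, and grant_deployer_roles are hypothetical helpers. By default it only prints the gcloud commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: least-privilege bindings for the deploy service account.
# DRY_RUN, run() and grant_deployer_roles are hypothetical helpers.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, anything else = execute them

run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# grant_deployer_roles PROJECT_ID RUNTIME_SA_EMAIL
grant_deployer_roles() {
  project_id="$1"
  runtime_sa="$2"
  deployer="serviceAccount:deployer@${project_id}.iam.gserviceaccount.com"

  # Deploy and update Cloud Run services in the project.
  run gcloud projects add-iam-policy-binding "$project_id" \
    --member="$deployer" --role=roles/run.developer

  # Read images from the "app" repository only, not project-wide.
  run gcloud artifacts repositories add-iam-policy-binding app \
    --location=asia-northeast1 --project="$project_id" \
    --member="$deployer" --role=roles/artifactregistry.reader

  # Act as the runtime service account, scoped to that single account.
  run gcloud iam service-accounts add-iam-policy-binding "$runtime_sa" \
    --project="$project_id" \
    --member="$deployer" --role=roles/iam.serviceAccountUser
}
```

Binding roles/iam.serviceAccountUser on the runtime SA itself, rather than on the project, keeps the deployer from impersonating any other service account.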


Safe shipping: verify → canary → Blue/Green → instant rollback

The key is to stop CI at "deploy without routing traffic." Promote after a human (or an automated check) verifies. Precisely because revisions are immutable, this staged control works safely.

# 1. Verify in isolation via the tag URL (production traffic is unaffected)
#    → smoke-test https://sha-abc123---api-xxxxx.a.run.app
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  https://sha-abc123---api-xxxxx.a.run.app/healthz

# 2. If healthy, send 5% as a canary
gcloud run services update-traffic api --region asia-northeast1 \
  --to-tags sha-abc123=5

# 3. Raise in stages while watching error rate and latency (5 → 25 → 50%)
gcloud run services update-traffic api --region asia-northeast1 \
  --to-tags sha-abc123=50

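# 4. If all is well, go to 100% (the Blue/Green switch). Targeting the tag instead of
#    --to-latest promotes the revision you verified, even if a newer one was deployed since.
gcloud run services update-traffic api --region asia-northeast1 \
  --to-tags sha-abc123=100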

# ── On detecting an anomaly, roll back to the old revision immediately (no rebuild needed) ──
gcloud run services update-traffic api --region asia-northeast1 \
  --to-revisions api-00021-prev=100

Rollback completing by "just sending 100% back to the old revision" is Cloud Run's greatest safety device: no rebuilding the image, no redoing the deployment. Build this into your standard CI/CD procedure, and even a nighttime incident can be back to normal in tens of seconds.
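The staged flow above lends itself to scripting. The following is a minimal sketch rather than a production script: check_health is a hypothetical hook that always reports healthy here and would need to be replaced by a real query against your monitoring, and DRY_RUN, run, and promote are likewise illustrative names. The percentages follow the 5 → 25 → 50 → 100 progression described above.

```shell
#!/bin/sh
# Sketch: staged promotion with a health gate and automatic rollback.
# check_health, run() and promote are hypothetical helpers.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, anything else = execute them
REGION="${REGION:-asia-northeast1}"
SERVICE="${SERVICE:-api}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Return 0 while the canary looks healthy. Placeholder: always healthy.
check_health() { return 0; }

# promote TAG PREVIOUS_REVISION
promote() {
  tag="$1"
  previous="$2"
  for percent in 5 25 50 100; do
    run gcloud run services update-traffic "$SERVICE" --region "$REGION" \
      --to-tags "${tag}=${percent}"
    # In a real run, wait and observe metrics here before deciding.
    if ! check_health "$tag" "$percent"; then
      # Send everything back to the known-good revision; no rebuild needed.
      run gcloud run services update-traffic "$SERVICE" --region "$REGION" \
        --to-revisions "${previous}=100"
      return 1
    fi
  done
}

# Example: promote sha-abc123 api-00021-prev
```

Passing the previous revision name explicitly means the rollback target is decided before the rollout starts, not looked up during an incident.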


Separate DB migrations from deployment

Schema changes are the most accident-prone part of a release. Mixing the app rollout and the migration into one step leaves you with an "old code, new schema" inconsistency when you roll back. The fix is to carve the migration out into a dedicated Cloud Run Job and apply it in forward- and backward-compatible stages.

# Prepare a migration-only job and run it independently of the deployment
gcloud run jobs deploy db-migrate \
  --image asia-northeast1-docker.pkg.dev/PROJECT_ID/app/migrate:${GIT_SHA} \
  --region asia-northeast1 \
  --service-account migrator@PROJECT_ID.iam.gserviceaccount.com \
  --max-retries 0           # Do not retry migrations casually
gcloud run jobs execute db-migrate --region asia-northeast1 --wait

Make zero-downtime schema changes a multi-stage release: "① add a compatible column → ② deploy code that supports both old and new → ③ backfill → ④ code that removes old references → ⑤ drop the old column." For the design details see zero-downtime schema migration (the principles are the same on Cloud SQL/PostgreSQL). For building out the job itself, go to the Jobs / Workflows guide.
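The ordering of the first two stages (compatible schema change first, then code that tolerates both schemas) can be enforced with a simple gate. This is a sketch under assumptions: the db-migrate job and api service names follow this article, while DRY_RUN, run, and migrate_then_deploy are hypothetical helpers. It assumes the job was already deployed with the migration image for this commit.

```shell
#!/bin/sh
# Sketch: run the additive migration first; deploy code only if it succeeded.
# DRY_RUN, run() and migrate_then_deploy are hypothetical helpers.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, anything else = execute them
REGION="${REGION:-asia-northeast1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# migrate_then_deploy IMAGE SHORT_SHA
migrate_then_deploy() {
  image="$1"
  short_sha="$2"

  # Stage 1: additive, backward-compatible schema change.
  # --wait blocks until the execution has completed.
  run gcloud run jobs execute db-migrate --region "$REGION" --wait || return 1

  # Stage 2: code that works with both old and new schema, still without traffic.
  run gcloud run deploy api --image "$image" --region "$REGION" \
    --no-traffic --tag "sha-${short_sha}"
}
```

If the migration step fails, the function returns before any new revision exists, so there is nothing to roll back on the Cloud Run side.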


Cloud Build or GitHub Actions: which to choose

| | Cloud Build | GitHub Actions |
| --- | --- | --- |
| Authentication | Natively easy since it's inside GCP | Keyless with WIF (setup required) |
| Ecosystem | Optimized for GCP | Broad (easy to integrate lint/test/other clouds) |
| Suited team | GCP-centric, wants infra leaned on Cloud Build too | Already standardized on GitHub Actions |
| Build environment | Managed, parallel, caching | Runners (self-hosted possible) |

The right answer is "lean toward your team's existing CI." Both can build the same shipping flow of --no-traffic + tag verification → canary → Blue/Green. In my project I used a combined configuration: the build/deploy core was consolidated in Cloud Build, while CodeQL, dependency updates, and tests ran on the GitHub side.


Production-rollout checklist

  • Image tag is the commit SHA (don't use latest)
  • Responsibility split: Terraform = infra / image = app
  • CI/CD is keyless with WIF (add id-token: write)
  • The deploy SA has minimum privileges (run.developer + AR read + serviceAccountUser)
  • CI stops at --no-traffic + --tag. Promote after verification
  • Script the staged shipping of canary → Blue/Green
  • Put the instant-rollback procedure (100% to the old revision) in the runbook
  • DB migrations split into a dedicated job and applied in stages
  • Pass production-equivalent verification (including the WAF) in stg first
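
The first checklist item can be checked mechanically before any deploy. A minimal sketch, assuming tags are lowercase hexadecimal Git SHAs of 7 to 40 characters; is_pinned_image is a hypothetical helper, not part of gcloud.

```shell
#!/bin/sh
# Sketch: refuse image references that are not pinned to a commit SHA.
# is_pinned_image is a hypothetical helper.

# is_pinned_image IMAGE_REF -> exit status 0 if the tag looks like a Git SHA
is_pinned_image() {
  ref="$1"
  tag="${ref##*:}"                      # text after the last colon
  [ "$tag" != "$ref" ] || return 1      # no colon at all: untagged reference
  printf '%s' "$tag" | grep -Eq '^[0-9a-f]{7,40}$'
}

# Example guard in a deploy script:
#   is_pinned_image "$IMAGE" || { echo "image must be tagged with a commit SHA" >&2; exit 1; }
```

A reference ending in :latest, or one with no tag, fails the check, so it never reaches the deploy step.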

Conclusion: make deployment a "not scary" task

Cloud Run CI/CD can remove the fear of "can't revert, don't know what changed" by design, through responsibility separation (image/infra/traffic) and staged shipping on immutable revisions. Going keyless (WIF) removes the long-lived keys that could leak, and separating migrations prevents code/schema inconsistency. With this in place, even a small team can carry out production deployments matter-of-factly.

For the overall design, see the Cloud Run production-operations guide; for cost, the concurrency/billing guide; and for long-running processing, the Jobs / Workflows guide. If you need hands-on support building out GCP CI/CD or going keyless, I can help based on real operational experience.


友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.

I can take on the implementation from this article as an engagement

GCP / Cloud Run container platforms, from design to production and cost optimization

Building container platforms on Cloud Run (services + jobs), migration from AWS/on-prem, keyless CI/CD via Workload Identity, defense-in-depth with Cloud Armor and least privilege, and cost optimization of concurrency and the billing model. With experience building and operating a broadcaster platform on GCP with IaC, I deliver fast, cheap, and secure.

Available for both project-based (contract) and advisory engagements. Start with a free 30-minute consult.
