Vercel production-operation guide: use it not as a front-end-only host but as a 'full-compute platform'

"Vercel is where you put the front end (Next.js static sites), right? The backend lives on AWS." As of 2026, more than half of this mental model is wrong. Today's Vercel is a full-compute application platform: it runs backend frameworks such as Express, FastAPI, NestJS, and Hono as-is with zero config, and the platform itself provides databases, object storage, a WAF, queues, sandboxes, and even an AI gateway.

I run multiple Next.js products in production on Vercel, including this portfolio site (Next.js 16). From that experience, I can say that teams treating Vercel only as a "convenient deploy target" are giving up a lot in cost, observability, resilience, and security.

This article is a map of production operation. It aims to stay faithful to the official Vercel documentation while being clearer than the docs about which feature to use in which situation. It first updates your knowledge to the correct 2026 mental model, then covers compute, rendering/caching, deployment, security, data, cost, and observability end to end. Deep dives on each theme are split into individual articles (this piece is the pillar, the entry point to the cluster).

First, update to 2026's mental model (correcting common misunderstandings)

The "old Vercel image" that lingers in LLMs and older articles is now outdated enough to cause wrong billing and architecture decisions. Let's fix that first.

Old understanding (discard)	The correct understanding of 2026 (official)
Use Edge Functions for low latency	Edge has compatibility issues. The recommendation is to run ordinary Node.js with Fluid Compute (the default) in the same region at the same price. Middleware and Edge Functions internally run on Vercel Functions
Middleware is Edge-only (Web API only)	Middleware can be written in full Node.js (Fluid Compute). Furthermore, "Routing Middleware" is a framework-independent Vercel product
The DB is Vercel Postgres / Vercel KV	Both are discontinued. Integrate Neon (Postgres), Upstash (Redis), etc., via the Vercel Marketplace
The function timeout is 60-90 seconds	Default 300 seconds (all plans). Pro/Ent is 800 seconds (GA), extended 1800 seconds (beta)
Billing is wall-clock GB-seconds	Active CPU billing. Billed only for the time the CPU was actually used, I/O wait is non-billed
The runtime is Node 18	Node.js 24 LTS is the default. Node 18 is deprecated
ISR is Next.js-only	ISR works in SvelteKit, Nuxt, and Astro too
Vercel is a static/front-end-only host	A full-compute platform. Runs Express/FastAPI/NestJS/Hono, etc., with zero config

Each row of this table will be backed up with real code in the subsequent chapters and individual articles. Discard the premise "Vercel = a place to put the front-end" — that's the starting point of this article.

The big picture of the platform: six layers

The shortcut to using Vercel fully in production is to think of its features in layers. Each layer links to its own deep-dive article.

Layer	Main components	In one phrase	Deep dive
Compute	Vercel Functions / Fluid Compute / Cron / Workflows	Runs code. The default is Fluid Compute	Functions / Fluid Compute guide
Rendering/caching	static / ISR / CDN Cache / Runtime Cache / Cache Components(PPR)	Returns fast. Use the four cache layers properly	Caching / ISR / Cache Components guide
Deployment/delivery	Git integration / preview / Promote / Instant Rollback / Rolling Releases	Ship safely, revert instantly	Deployment / CI/CD / rollback guide
Security	Firewall / WAF / BotID / DDoS mitigation / environment variables / OIDC	Protect the entrance	Firewall / WAF / BotID guide
Data/storage	Blob / Edge Config / Marketplace(Neon, Upstash)	Place state	Storage / Blob / Edge Config guide
Cost/operation	Active CPU billing / observability / Speed Insights	Make it cheap and traceable	Cost / Active CPU optimization guide

Below, this piece shows "the points to grasp first in production" of each layer, and leaves the details to each link.

Compute: understand the default Fluid Compute

What Fluid Compute is

Conventional serverless was "1 request = 1 instance": a cold start was paid whenever a new instance had to spin up, and each instance sat idle, wasted, while its single request waited on I/O. Fluid Compute changes this to a model where one instance processes multiple requests concurrently. In the words of the official documentation:

Fluid compute offers a blend of serverless flexibility and server-like capabilities. (Fluid compute documentation)

There are five points.

ON by default: Fluid Compute is enabled by default for new projects created after 2025-04-23.
Optimized concurrency (in-function concurrency): one function instance processes multiple invocations. It is especially effective for I/O-bound workloads such as AI (embedding retrieval, vector-DB queries, external-API calls). Available in the Node.js and Python runtimes.
Background processing (waitUntil): after returning the response to the user, the function can continue post-processing such as log shipping and analytics.
Cold-start reduction: on Node.js 20+, bytecode caching skips recompilation, and production deployments also pre-warm functions.
Error isolation: even if an unhandled exception or unhandled rejection occurs in Node.js, Fluid logs the error and stops the process without taking other in-flight requests down with it.

Implication for design: under Fluid Compute, multiple requests share the same process and therefore the same global state. Request-specific state placed in module scope leaks across requests. Only things that are safe to share between requests, such as a DB connection pool or config, belong in global scope. This ties directly to the security section later.
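To make the leak concrete, here is a minimal sketch of the failure mode and one way to avoid it. It uses only Node.js's built-in AsyncLocalStorage; the handler names (handleUnsafe, handleSafe) and the simulated I/O delay are hypothetical illustrations, not Vercel APIs.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// UNSAFE: module scope is shared by every request the instance handles concurrently.
let currentUserUnsafe: string | null = null;

// SAFER: request-scoped storage; each request only sees the value it was started with.
const requestContext = new AsyncLocalStorage<{ userId: string }>();

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical handler that stores per-request state in a module-scope variable.
async function handleUnsafe(userId: string, ioDelayMs: number): Promise<string | null> {
  currentUserUnsafe = userId;
  await sleep(ioDelayMs); // simulated I/O wait; other requests run on the same instance meanwhile
  return currentUserUnsafe; // may now hold another request's user
}

// Hypothetical handler that keeps per-request state in AsyncLocalStorage.
async function handleSafe(userId: string, ioDelayMs: number): Promise<string> {
  return requestContext.run({ userId }, async () => {
    await sleep(ioDelayMs);
    return requestContext.getStore()!.userId;
  });
}
```

If handleUnsafe("alice", 20) and handleUnsafe("bob", 5) overlap, the first call ends up returning bob's ID, while the handleSafe versions each return their own. Passing the state explicitly as function arguments works just as well; the point is only that it must not live in module scope.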

Enabling (existing project)

Toggle it in the dashboard's Project → Settings → Functions, or declare it in the config file.

// vercel.json: use this when you want to enable Fluid for a specific environment or per deployment
{
  "$schema": "https://openapi.vercel.sh/vercel.json",
  "fluid": true
}

Limits to grasp (official numbers)

These numbers are the premises of production design (from the Functions Limits documentation).

Item	Hobby	Pro / Enterprise
Default timeout	300 sec	300 sec
Max timeout	300 sec	800 sec (GA) / extended 1800 sec (beta, per-function setting)
Memory	2 GB / 1 vCPU	up to 4 GB / 2 vCPU
Concurrent scale	up to 30,000	30,000 (Pro) / 100,000+ (Ent)
Bundle size (uncompressed)	250 MB (Python is 500 MB)	same as left
Request/response body	4.5 MB (over it, 413 `FUNCTION_PAYLOAD_TOO_LARGE`)	same as left
File descriptors	1,024 (including the runtime's usage, shared in concurrent execution)	same as left
Default region	`iad1` (changeable)	multiple regions possible (Pro is up to 3)

When a timeout isn't enough: for processing that must pause and resume while holding state for minutes to months, move it to Vercel Workflows (no upper limit on execution time) instead of extending the function's timeout. The iron rule: don't hold heavy synchronous processing in an HTTP function.

Real code for runtimes, streaming, waitUntil, and Cron is covered in the Functions / Fluid Compute guide.

Rendering and caching: use the four layers properly

No single kind of cache makes an app fast. Vercel provides four layers, and you choose among them by requirement.

Layer	What it caches	How to set	When to use
Static files	Build output (JS/CSS/images/fonts)	automatic (no config)	All deploys. Persists across deploys with hashed filenames
ISR	Page (HTML + data)	The framework's API (`revalidate`, etc.)	Pages with scheduled updates (minutes to hours). When you want persistence, rollback resistance, and request collapsing
CDN Cache	A function's HTTP response	`Cache-Control` / `CDN-Cache-Control` header	Caching an ISR-unsupported framework or API response per region
Runtime Cache	Data inside a function (fetch results, DB queries, computed values)	Runtime Cache API (tag invalidation)	When you want to reuse individual data pieces, not the whole response

Key points of ISR (the official behavior)

ISR (Incremental Static Regeneration) is the stale-while-revalidate pattern: return the cached version immediately while regenerating in the background. On Vercel's CDN, the framework's ISR automatically gets the following optimizations (from the ISR documentation).

31-day persistent storage: the ISR cache persists in the Function region. Independent per deploy.
300ms global purge: when you revalidate, the cache of all regions is updated within 300ms. It swaps HTML and data atomically.
Request collapsing: even if simultaneous accesses come to the same uncached path, the function is called only once per region. Origin protection during a traffic spike.
Instant-rollback resistance: since the cache of a past deploy isn't purged, you don't lose the generated content even when you roll back.
Stale maintenance on failure: if revalidation fails (network error / a status other than 200, 301, 302, 307, 308, 404, 410 / function error), it keeps returning the existing cache and retries with a 30-second TTL.
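Three of these behaviors (stale-while-revalidate, request collapsing, and keeping the stale entry on failure) can be sketched as a small in-memory cache. This is only an illustration of the pattern under simplified assumptions, not how Vercel implements ISR: the SwrCache class, its render callback, and the injectable clock are all hypothetical, and persistence, global purge, and retry TTLs are left out.

```typescript
type Entry<T> = { value: T; storedAt: number };

class SwrCache<T> {
  private entries = new Map<string, Entry<T>>();
  private inFlight = new Map<string, Promise<T>>();
  private render: (key: string) => Promise<T>;
  private revalidateMs: number;
  private now: () => number;

  constructor(
    render: (key: string) => Promise<T>,
    revalidateMs: number,
    now: () => number = Date.now,
  ) {
    this.render = render;
    this.revalidateMs = revalidateMs;
    this.now = now;
  }

  // Request collapsing: concurrent regenerations of one key share a single render call.
  private regenerate(key: string): Promise<T> {
    const pending = this.inFlight.get(key);
    if (pending) return pending;
    const promise = this.render(key)
      .then((value) => {
        this.entries.set(key, { value, storedAt: this.now() });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, promise);
    return promise;
  }

  async get(key: string): Promise<T> {
    const entry = this.entries.get(key);
    if (!entry) return this.regenerate(key); // uncached: the caller waits for the render
    if (this.now() - entry.storedAt >= this.revalidateMs) {
      // Stale: answer from cache now and refresh in the background.
      // If the render fails, the existing entry is kept as-is.
      this.regenerate(key).catch(() => {});
    }
    return entry.value;
  }
}
```

With a 60-second window, two simultaneous requests for an uncached path trigger one render, and the first request after the window expires still gets the old value while the refresh runs behind it.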

The supported frameworks are Next.js, SvelteKit, Nuxt, Astro, and Gatsby (DSG). For Next.js, in the App Router you just export revalidate from a route segment.

// app/products/[id]/page.tsx: regenerate in the background every 60 seconds (stale-while-revalidate)
// getProduct and ProductView are app-specific and not shown here.
export const revalidate = 60;

export default async function ProductPage({
  params,
}: {
  params: Promise<{ id: string }>;
}) {
  const { id } = await params;
  const product = await getProduct(id); // runs at build time or on revalidation
  return <ProductView product={product} />;
}

On-demand revalidation (for example, reflecting an inventory-update webhook instantly), Cache Components (PPR), and when to use each of the three Cache-Control headers are covered in the Caching / ISR / Cache Components guide.

A minimal example of putting a function response on the CDN

Even for an API that doesn't use ISR, returning Cache-Control gets the response cached on the CDN. Vercel has three headers that control the browser, downstream CDNs, and the Vercel CDN separately.

// app/api/rates/route.ts
export async function GET() {
  return Response.json(await getRates(), {
    headers: {
      "Cache-Control": "max-age=10",             // browser: 10 seconds
      "CDN-Cache-Control": "max-age=60",         // downstream CDNs: 60 seconds
      "Vercel-CDN-Cache-Control": "max-age=3600" // Vercel CDN: 3600 seconds
    },
  });
}

The cacheable conditions are strict: GET/HEAD, no Authorization header, no Set-Cookie, no private/no-store, non-streaming and 10MB or less (streaming is 20MB). You can confirm the result with the x-vercel-cache header (HIT/MISS/STALE/PRERENDER).
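The header-related conditions above can be written down as a small predicate, which is handy in a unit test that guards an API route against accidentally becoming uncacheable. This is a sketch of the listed rules only, not Vercel's actual logic: isCdnCacheable is a hypothetical helper, it inspects only the Cache-Control header for private/no-store, and it does not check the size limits.

```typescript
// Hypothetical helper: mirrors the header-based cacheability conditions listed above.
function isCdnCacheable(
  method: string,
  requestHeaders: Headers,
  responseHeaders: Headers,
): boolean {
  // Only GET and HEAD responses are candidates.
  if (method !== "GET" && method !== "HEAD") return false;
  // A request carrying credentials is never cached.
  if (requestHeaders.has("authorization")) return false;
  // A response that sets cookies is never cached.
  if (responseHeaders.has("set-cookie")) return false;
  // private / no-store opt the response out of shared caches.
  const directives = (responseHeaders.get("cache-control") ?? "")
    .toLowerCase()
    .split(",")
    .map((directive) => directive.trim());
  if (directives.includes("private") || directives.includes("no-store")) return false;
  return true;
}

const cacheable = isCdnCacheable(
  "GET",
  new Headers(),
  new Headers({ "cache-control": "max-age=60" }),
);
console.log(cacheable); // true
```

In practice the x-vercel-cache response header remains the source of truth; a predicate like this only catches obvious regressions (a new Set-Cookie, an added Authorization header) before deploy.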

Deployment and delivery: four safety valves

Vercel's deployment experience isn't just "git push and it goes to production." Build in four safety valves from the start to prevent production incidents.

Preview URL: an independent URL equivalent to production is auto-generated per branch/PR. Do review and QA here.
Production Promote: promote a deploy verified in preview to production. vercel promote or the dashboard.
Instant Rollback: a production deploy is an immutable snapshot. If a problem comes out, you can instantly return to a past deploy.
Rolling Releases: instead of shipping a new deploy to 100% of traffic at once, send it to, say, 5% first, compare metrics such as Speed Insights, and ramp up gradually to 100% if nothing is wrong (Pro: 1 project; Enterprise: custom).

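# Preview -> verify -> promote to production -> roll back immediately if there is a problem
vercel deploy                 # preview deployment
vercel promote <deployment>   # promote to production (starts the canary automatically if Rolling Releases is enabled)
vercel rollback <deployment>  # instantly return production to an earlier deployment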

Use Skew Protection together with Rolling Releases: if the page version and the backend API version diverge between the canary and the current deployment, things break. Skew Protection guarantees that a user served by either deployment keeps talking to the matching version of the backend.

CI/CD (GitHub Actions, --prebuilt, OIDC keyless), the staged design of Rolling Releases, verification with vcrrForceCanary, and automation with the REST API are detailed in the Deployment / CI/CD / rollback guide.

Configure with vercel.ts (type-safe) or vercel.json

2026's recommendation is vercel.ts: install @vercel/config and export a typed config. Unlike vercel.json, it is executed at build time, so you can assemble the config dynamically from environment variables or API calls (use only one of the two files).

// vercel.ts
import { routes, type VercelConfig } from "@vercel/config/v1";

export const config: VercelConfig = {
  framework: "nextjs",
  buildCommand: "npm run build",
  rewrites: [routes.rewrite("/api/(.*)", "https://backend.example.com/$1")],
  redirects: [routes.redirect("/old-docs", "/docs", { permanent: true })],
  headers: [
    routes.cacheControl("/static/(.*)", {
      public: true,
      maxAge: "1 week",
      immutable: true,
    }),
  ],
  crons: [{ path: "/api/cleanup", schedule: "0 0 * * *" }],
};

Security: protect the entrance, secrets, and execution in layers

Much of Vercel's security takes effect before you write any application code.

The entrance: Firewall / WAF / BotID / DDoS

DDoS mitigation is automatic on all plans.
Vercel WAF provides custom rules (allow/deny/challenge/log/rate limit), IP blocking, and managed rulesets (Enterprise). Config changes propagate globally in 300ms, and a problematic change can be rolled back instantly.
BotID is an "invisible CAPTCHA" (Kasada). It adds bot detection, with no user interaction, to high-value routes such as checkout, signup, and APIs. Basic is free on all plans; Deep Analysis costs $1 per 1,000 checkBotId() calls on Pro.

// app/api/checkout/route.ts: verify BotID on the server side (handleCheckout is app-specific and not shown)
import { checkBotId } from "botid/server";

export async function POST(request: Request) {
  const verification = await checkBotId();
  if (verification.isBot) {
    return new Response("Forbidden", { status: 403 });
  }
  return handleCheckout(request);
}

WAF custom rules, rate limiting, Attack Challenge Mode, BotID client instrumentation, and the vercel firewall CLI are handled in the Firewall / WAF / BotID guide.

Secrets: environment variables and OIDC

Don't write secrets in code; put them in environment variables. Variables with the NEXT_PUBLIC_ prefix are exposed to the browser, so never use that prefix for an API key. You can sync variables to a local .env file with vercel env pull. For external clouds (AWS, etc.), don't store long-lived access keys; keyless OIDC token federation is the 2026 standard.

vercel env pull .env.local      # pull environment variables into a local file (the Development environment unless another is selected)
vercel env add STRIPE_SECRET_KEY production
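One cheap guard against the NEXT_PUBLIC_ mistake is a startup or CI check that flags browser-exposed variables whose names look like secrets. The sketch below is a naming heuristic only: findExposedSecrets and the SECRET_HINTS list are hypothetical, and a clean result does not prove that nothing sensitive is exposed.

```typescript
// Hypothetical name fragments that suggest a value should stay server-side.
const SECRET_HINTS = ["SECRET", "TOKEN", "PASSWORD", "PRIVATE", "API_KEY"];

// Returns the names of NEXT_PUBLIC_ variables that look like secrets.
function findExposedSecrets(env: Record<string, string | undefined>): string[] {
  return Object.keys(env)
    .filter((name) => name.startsWith("NEXT_PUBLIC_"))
    .filter((name) => SECRET_HINTS.some((hint) => name.toUpperCase().includes(hint)))
    .sort();
}

const flagged = findExposedSecrets({
  NEXT_PUBLIC_SITE_URL: "https://example.com",
  NEXT_PUBLIC_STRIPE_SECRET_KEY: "sk_live_placeholder", // wrong: shipped to the browser
  STRIPE_SECRET_KEY: "sk_live_placeholder",             // fine: server-only
});
console.log(flagged); // ["NEXT_PUBLIC_STRIPE_SECRET_KEY"]
```

Calling findExposedSecrets(process.env) in a build script and failing the build on a non-empty result is one way to wire this in.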

The pitfall in Fluid Compute (repeated): instances are shared by multiple requests. Don't cache request-specific tokens or user information in module-scope global variables; doing so invites cross-tenant data leaks.

Data and storage: where to place state

Vercel Postgres and Vercel KV are discontinued; today you choose a storage option by purpose.

Purpose	Option	Characteristics
Objects (images, video, PDF, etc.)	Vercel Blob	public/private, S3 backend (11 nines durability), the `@vercel/blob` SDK, CDN delivery
Ultra-low-latency reads (flags, redirects, IP blocks)	Edge Config	Global reads with P99 under 15 ms (often under 1 ms). For data that is rarely updated but read at high frequency
Relational DB	Marketplace: Neon (Postgres)	Serverless Postgres. Integrate with `vercel integration`, environment-variable auto-injection
KV / cache / rate limiting	Marketplace: Upstash (Redis)	Serverless Redis

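// Vercel Blob: store as private and serve through a function
// (id and pdfBuffer come from the surrounding application code)
import { put } from "@vercel/blob";

const blob = await put(`invoices/${id}.pdf`, pdfBuffer, {
  access: "private",         // 'public' is also available (cannot be changed after creation)
  addRandomSuffix: true,     // avoids accidental overwrites and URL collisions
});

// Edge Config: read a feature flag at ultra-low latency (Middleware / functions)
// (this fragment belongs inside a handler where `request` is in scope)
import { get } from "@vercel/edge-config";

const maintenance = await get<boolean>("maintenance_mode");
if (maintenance) {
  return Response.redirect(new URL("/maintenance", request.url));
}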

Blob's client/server upload, conditional write (ifMatch), multipart, Edge Config's write operation, and the flow of Marketplace integration are handled in the Storage / Blob / Edge Config guide.

Cost: understanding Active CPU billing changes the design

Vercel Functions billing has three axes and differs from the conventional "wall-clock GB-seconds" model.

Axis	Billing target	Important property
Active CPU	The time the code actually used the CPU	The I/O wait of DB queries and AI calls isn't billed (CPU billing is paused)
Provisioned Memory	Allocated memory × instance running time (GB-hr)	Billing continues even during I/O wait. Until the last in-flight request finishes
Invocations	The number of incoming requests	One by one regardless of success/failure. Pro is $0.60/million

Hobby has a free tier of 4 hours of Active CPU, 360 GB-hr of memory, and 1 million invocations per month. The official sample calculation: in São Paulo (CPU $0.221/h, memory $0.0183/GB-h), an invocation with 4 GB of memory, 4 seconds of CPU, and a 10-second instance lifetime costs:

CPU: (4 / 3600) × $0.221 = $0.0002456
Memory: (4 × 10 / 3600) × $0.0183 = $0.0002033
Total: $0.0004489 / invocation
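The same arithmetic as a small function makes it easy to plug in your own region and workload. The rates are the São Paulo sample values quoted above; the type and function names are hypothetical, and like the sample the result excludes the per-invocation charge.

```typescript
interface RegionRates {
  cpuPerHour: number;      // USD per Active CPU hour
  memoryPerGbHour: number; // USD per provisioned GB-hour
}

interface InvocationProfile {
  memoryGb: number;         // provisioned memory
  activeCpuSeconds: number; // time the code actually used the CPU
  instanceSeconds: number;  // time the instance stayed alive for this invocation
}

// CPU and memory cost of one invocation (the Invocations axis is not included).
function costPerInvocation(rates: RegionRates, profile: InvocationProfile) {
  const cpu = (profile.activeCpuSeconds / 3600) * rates.cpuPerHour;
  const memory =
    ((profile.memoryGb * profile.instanceSeconds) / 3600) * rates.memoryPerGbHour;
  return { cpu, memory, total: cpu + memory };
}

const saoPaulo: RegionRates = { cpuPerHour: 0.221, memoryPerGbHour: 0.0183 };
const sample = costPerInvocation(saoPaulo, {
  memoryGb: 4,
  activeCpuSeconds: 4,
  instanceSeconds: 10,
});
console.log(sample.total.toFixed(7)); // "0.0004489"
```

Lowering activeCpuSeconds while keeping instanceSeconds fixed shows how an I/O-heavy handler shifts most of its cost onto the memory axis.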

The point where the design changes: an I/O-centric (AI/external-API/DB) app often becomes cheaper than before thanks to Active CPU billing and Fluid's concurrency. Conversely, CPU-bound processing like image processing consumes a lot of Active CPU. That's exactly why "separate heavy CPU processing from the function (turn it into a job/workflow)" is effective.

Per-region unit prices (Tokyo hnd1, Osaka kix1 are CPU $0.202/h, memory $0.0167/GB-h), optimization of maxDuration/memory, and cost monitoring are detailed in the Cost / Active CPU optimization guide.

Observability and resilience: create a "traceable, non-crashing" state in production

Observability / Speed Insights: track function execution time, error rate, and Core Web Vitals on the dashboard. The canary comparison of Rolling Releases is also here.
Resilience: Fluid's cross-region / AZ failover (if one zone goes down, automatic transfer to another AZ in the same region → a neighboring region) and the deploy's Instant Rollback are the two pillars.
Idempotency: design the waitUntil post-processing and webhook reception to be idempotent on the premise of "at least once" (don't double-charge / double-send even if the same event arrives twice).
Graceful shutdown: Fluid sends a signal before termination. Implement the completion of in-flight requests and connection closing.
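The idempotency point can be sketched as a processor that claims an event ID before doing the work and releases the claim if the work fails. This is a minimal single-instance illustration: IdempotentProcessor and WebhookEvent are hypothetical names, and the in-memory Set stands in for what would need to be a durable store with an atomic insert (for example a unique constraint in Postgres or a conditional set in Redis), because instances do not share memory.

```typescript
type WebhookEvent = { id: string; type: string; amount: number };

class IdempotentProcessor {
  // Assumption: in production this is a durable, shared store, not process memory.
  private seen = new Set<string>();
  private handler: (event: WebhookEvent) => Promise<void>;

  constructor(handler: (event: WebhookEvent) => Promise<void>) {
    this.handler = handler;
  }

  async process(event: WebhookEvent): Promise<"processed" | "duplicate"> {
    if (this.seen.has(event.id)) return "duplicate";
    // Claim the ID before awaiting so a concurrent redelivery is rejected.
    this.seen.add(event.id);
    try {
      await this.handler(event);
      return "processed";
    } catch (error) {
      // Release the claim so a later retry of the same event can succeed.
      this.seen.delete(event.id);
      throw error;
    }
  }
}
```

Delivering the same event twice, even concurrently, runs the handler once; a failed attempt leaves the event eligible for redelivery, which matches the "at least once" premise.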

These follow directly from the design principles that achieved "0 double charges in payments" (payment idempotency design).

2026's new primitives (you lose by not knowing)

Beyond hosting, Vercel keeps adding building blocks for the app itself.

Product	Status	What it's for
AI Gateway	GA (2025-08)	A unified API to multiple AI providers. Observability, fallback, zero data retention. Switch with a `"provider/model"` string
Workflows (WDK)	—	Durable workflows of minutes to months. pause/resume, retry, crash resistance
Queues	Public beta	At-least-once durable event streaming (on Fluid Compute)
Sandbox	GA (2026-01)	A microVM that runs untrusted code (AI-generated, etc.) in isolation
BotID	GA (2025-06)	An invisible CAPTCHA (described above)
Sign in with Vercel	GA (2025-11)	An OAuth provider for third-party apps
Vercel Agent	Public beta	AI code review, production-incident investigation

If you build an AI feature, consider a "provider/model" string via the AI Gateway as the default before reaching for a provider-specific SDK (Vercel AI SDK production implementation).

Pre-production-release checklist

These are the items I actually confirm when I ship a new Vercel project to production.

Summary: design Vercel as a "platform"

Vercel in 2026 is not a place to put the front-end but a full-compute platform that bundles compute, data, delivery, and security. Production quality is decided not by "it deployed" but by whether you build in these five points from the start:

Understand Fluid Compute's concurrency and error isolation, and don't pollute global state
Choose the four cache layers by requirements
Ship safely with preview / Promote / Instant Rollback / Rolling Releases
Protect the entrance with WAF/BotID, and separate the secret and execution boundaries
Design cost and architecture on the premise of Active CPU billing

The real code for each theme is gathered in this cluster's individual articles. Start by checking, one at a time, which cache layer and which billing axis your project relies on.

This article is based on the Vercel official documentation (Fluid Compute / Functions / ISR / CDN Cache / Rolling Releases / WAF / BotID / Blob / Edge Config, as of June 2026), reconstructed with the addition of practical-operation judgment axes. Since specs and prices get updated, confirm the latest values on each official page when adopting in production.

Related articles

Run a backend on Vercel: operate Express, Hono, FastAPI, and NestJS in production with zero config

Vercel caching-strategy guide: using the 4 layers of ISR, CDN Cache, Runtime Cache, and Cache Components (PPR)

Vercel cost-optimization guide: understand the Active CPU pricing model and lower your bill

Vercel deployment & CI/CD guide: preview, Promote, Instant Rollback, and Rolling Releases at production quality

Also worth reading

Google Cloud Run Production-Operations Guide: Container Contract, Concurrency, Auto-Scale, Deploy, Cost, and Security in Real Code

Azure Container Apps Production Operations Guide: Designing, Scaling, Deploying, Costing, and Securing Serverless Containers, with Real Code

Cloud Run concurrency, autoscaling, billing model, and cost optimization: conquering scale-to-zero and cold starts in real code