"Vercel is a place to put the front-end (Next.js static sites), right? The backend is on AWS." As of 2026, more than half of that mental model is wrong. Today's Vercel is a full-compute application platform. It runs backend frameworks such as Express, FastAPI, NestJS, and Hono as-is with zero config. It also offers databases, object storage, a WAF, queues, sandboxes, and even an AI gateway as part of the platform.
I run multiple Next.js products in production on Vercel, including this portfolio site (Next.js 16). From that experience, I can say that teams using Vercel only as a "convenient deploy target" are leaving a lot on the table in cost, observability, resilience, and security.
This article is a map of production operation. It aims to stay faithful to Vercel's official documentation while being clearer than the docs about which feature to use in which situation. It first updates your knowledge to the correct 2026 mental model, then covers compute, rendering/caching, deployment, security, data, cost, and observability end to end. Each theme's deep dive is split into its own article (this piece is the pillar: the entry point to the cluster).
First, update your mental model for 2026 (correcting common misconceptions)
The "old Vercel" picture that lingers in LLMs and older articles is now outdated enough to lead to wrong billing and architecture decisions. Let's fix that first.
| Old understanding (discard) | Correct understanding in 2026 (per the official docs) |
|---|---|
| Use Edge Functions for low latency | The Edge runtime has compatibility limitations. The recommendation is to run regular Node.js on Fluid Compute (the default), in the same region and at the same price. Middleware and Edge Functions themselves run on Vercel Functions internally |
| Middleware is Edge-only (Web API only) | Middleware can be written in full Node.js (Fluid Compute). Furthermore, "Routing Middleware" is a framework-independent Vercel product |
| The DB is Vercel Postgres / Vercel KV | Both are discontinued. Integrate Neon (Postgres), Upstash (Redis), etc., via the Vercel Marketplace |
| The function timeout is 60-90 seconds | Default 300 seconds (all plans). Pro/Ent is 800 seconds (GA), extended 1800 seconds (beta) |
| Billing is wall-clock GB-seconds | Active CPU billing. CPU is billed only while it is actually in use; I/O wait isn't billed as CPU time |
| The runtime is Node 18 | Node.js 24 LTS is the default. Node 18 is deprecated |
| ISR is Next.js-only | ISR works in SvelteKit, Nuxt, and Astro too |
| Vercel is a static/front-end-only host | A full-compute platform. Runs Express/FastAPI/NestJS/Hono, etc., with zero config |
Each row of this table is backed up with real code in the following sections and in the individual articles. Drop the premise "Vercel = a place to put the front-end"; that is where this article starts.
The big picture of the platform: six layers
The shortcut to using Vercel fully in production is to think of its features in layers. Each layer's deep dive is in a separate article.
| Layer | Main components | In a phrase | Deep dive |
|---|---|---|---|
| Compute | Vercel Functions / Fluid Compute / Cron / Workflows | Runs code. The default is Fluid Compute | Functions / Fluid Compute guide |
| Rendering/caching | static / ISR / CDN Cache / Runtime Cache / Cache Components(PPR) | Responds fast. Choose among the four cache layers | Caching / ISR / Cache Components guide |
| Deployment/delivery | Git integration / preview / Promote / Instant Rollback / Rolling Releases | Ship safely, revert instantly | Deployment / CI/CD / rollback guide |
| Security | Firewall / WAF / BotID / DDoS mitigation / environment variables / OIDC | Protect the entrance | Firewall / WAF / BotID guide |
| Data/storage | Blob / Edge Config / Marketplace(Neon, Upstash) | Where state lives | Storage / Blob / Edge Config guide |
| Cost/operation | Active CPU billing / observability / Speed Insights | Make it cheap and traceable | Cost / Active CPU optimization guide |
Below, this piece covers what to grasp first for production in each layer and leaves the details to the linked articles.
Compute: understand Fluid Compute, the default
What Fluid Compute is
Conventional serverless followed a "1 request = 1 instance" model. It incurred a cold start whenever a new instance had to spin up, and a whole instance sat idle during I/O wait. Fluid Compute switches to a model where one instance processes multiple requests concurrently. In the official words:
"Fluid compute offers a blend of serverless flexibility and server-like capabilities." (Fluid compute docs)
Five points stand out.
- ON by default: Fluid Compute is enabled by default for new projects created after 2025-04-23.
- Optimized concurrency (in-function concurrency): one function instance handles multiple invocations. It's especially effective for I/O-bound workloads such as AI ("embedding retrieval, vector-DB queries, external-API calls"). It is available in the Node.js and Python runtimes.
- Background processing (waitUntil): after the response has been returned to the user, the function can continue post-processing such as sending logs and analytics.
- Cold-start reduction: on Node.js 20+, bytecode caching skips recompilation, and in production functions are also pre-warmed.
- Error isolation: if an unhandled exception or unhandled rejection occurs in Node.js, Fluid logs the error and stops the process without dragging down other in-flight requests on the same instance.
Design implication: under Fluid Compute, multiple requests share the same process and its global state. Request-specific state placed in module scope leaks across requests. Only things that are safe to share between requests, such as a DB connection pool or configuration, belong at global scope. This ties directly into the security section later.
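Below is a minimal sketch of the difference. It assumes Node.js's AsyncLocalStorage from node:async_hooks, which is one common way to carry per-request context and not a Vercel-specific API. Two overlapping calls simulate concurrent requests on one instance. The handler names, user IDs, and delays are hypothetical.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// UNSAFE under Fluid Compute: module scope is shared by every request on the instance.
let currentUser: string | undefined;

async function handleUnsafe(userId: string, ioMs: number): Promise<string> {
  currentUser = userId;
  await sleep(ioMs); // simulated I/O wait: another request can run and overwrite currentUser
  return currentUser!; // may now be a different request's user
}

// Safe: per-request context that follows only this request's async call chain.
const requestContext = new AsyncLocalStorage<{ userId: string }>();

async function handleSafe(userId: string, ioMs: number): Promise<string> {
  return requestContext.run({ userId }, async () => {
    await sleep(ioMs);
    return requestContext.getStore()!.userId; // always this request's user
  });
}

// Still fine at module scope: things that are safe to share, e.g. frozen config or a DB pool.
```

Suppose "alice" waits 20 ms and "bob" waits 5 ms, and both run concurrently. The unsafe handler returns "bob" for Alice's request, because Bob overwrote the shared variable during Alice's await. The AsyncLocalStorage version returns each caller's own ID.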
Enabling (existing project)
Toggle it in the dashboard's Project → Settings → Functions, or declare it in the config file.
To enable Fluid for a specific environment or per deployment, add it to vercel.json:
```json
{
  "$schema": "https://openapi.vercel.sh/vercel.json",
  "fluid": true
}
```
Limits to grasp (official numbers)
These numbers are baseline assumptions for production design (from the Functions Limits page).
| Item | Hobby | Pro / Enterprise |
|---|---|---|
| Default timeout | 300 sec | 300 sec |
| Max timeout | 300 sec | 800 sec (GA) / extended 1800 sec (beta, per-function setting) |
| Memory | 2 GB / 1 vCPU | up to 4 GB / 2 vCPU |
| Concurrency (auto-scaling) | Up to 30,000 | 30,000 (Pro) / 100,000+ (Enterprise) |
| Bundle size (uncompressed) | 250 MB (500 MB for Python) | Same |
| Request/response body | 4.5 MB (above that, 413 FUNCTION_PAYLOAD_TOO_LARGE) | Same |
| File descriptors | 1,024 (includes the runtime's own usage; shared across concurrent executions) | Same |
| Default region | iad1 (changeable) | multiple regions possible (Pro is up to 3) |
When the timeout isn't enough: some processing must pause and resume while holding state for minutes to months. Don't extend the function timeout for it. Move it to Vercel Workflows instead (no upper limit on execution time). The iron rule: don't run heavy synchronous processing inside an HTTP function.
Working code for runtimes, streaming, waitUntil, and Cron is in the Functions / Fluid Compute guide.
Rendering and caching: choose among four layers
Speed doesn't come from a single kind of cache. Vercel provides four layers; choose based on your requirements.
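The carve-out can be sketched as an accept-and-offload handler. The HTTP function only records the work and answers 202 Accepted with a pointer to it, and something else does the heavy processing. JobQueue, acceptJob, and the status URL shape are hypothetical. The in-memory queue stands in for Vercel Workflows, Queues, or another durable job system.

```typescript
type JobStatus = "queued" | "running" | "done" | "failed";
interface Job {
  id: string;
  payload: unknown;
  status: JobStatus;
}

// In-memory stand-in (hypothetical). Production needs a durable, shared system,
// not process memory, so jobs survive instance shutdown.
class JobQueue {
  private jobs = new Map<string, Job>();
  private seq = 0;

  enqueue(payload: unknown): Job {
    const job: Job = { id: `job_${++this.seq}`, payload, status: "queued" };
    this.jobs.set(job.id, job);
    return job;
  }

  get(id: string): Job | undefined {
    return this.jobs.get(id);
  }
}

// The HTTP handler does only cheap work: enqueue and return a pointer to the job.
function acceptJob(queue: JobQueue, payload: unknown): Response {
  const job = queue.enqueue(payload);
  const statusUrl = `/api/jobs/${job.id}`; // hypothetical status endpoint
  return new Response(JSON.stringify({ jobId: job.id, statusUrl }), {
    status: 202,
    headers: { "Content-Type": "application/json", Location: statusUrl },
  });
}
```

With this design, the request's duration no longer depends on how heavy the job is. Clients poll the status URL, or receive a callback, instead of holding an HTTP connection open.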
| Layer | What it caches | How to set | When to use |
|---|---|---|---|
| Static files | Build output (JS/CSS/images/fonts) | automatic (no config) | All deploys. Persists across deploys with hashed filenames |
| ISR | Page (HTML + data) | The framework's API (revalidate, etc.) | Pages with scheduled updates (minutes to hours). When you want persistence, rollback resistance, and request collapsing |
| CDN Cache | A function's HTTP response | Cache-Control / CDN-Cache-Control headers | Caching responses per region for APIs, or for frameworks without ISR support |
| Runtime Cache | Data inside a function (fetch results, DB queries, computed values) | Runtime Cache API (tag invalidation) | When you want to reuse individual data pieces, not the whole response |
Key points of ISR (the official behavior)
ISR (Incremental Static Regeneration) implements the stale-while-revalidate pattern: return the cached copy immediately while regenerating in the background. When served through Vercel's CDN, a framework's ISR automatically gets the following optimizations (from the ISR docs).
- 31-day persistent storage: the ISR cache persists in the function's region, and each deployment has its own cache.
- 300ms global purge: when you revalidate, the cache of all regions is updated within 300ms. It swaps HTML and data atomically.
- Request collapsing: even if simultaneous accesses come to the same uncached path, the function is called only once per region. Origin protection during a traffic spike.
- Rollback safety: because a past deployment's cache isn't purged, you don't lose generated content when you roll back.
- Stale maintenance on failure: if revalidation fails (network error / a status other than 200, 301, 302, 307, 308, 404, 410 / function error), it keeps returning the existing cache and retries with a 30-second TTL.
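To make these behaviors concrete, here is a toy, synchronous model of stale-while-revalidate with collapsing and stale-on-failure. It is an illustration under simplifying assumptions, not how Vercel implements ISR. It models one region and takes a caller-supplied clock in seconds. A flushBackground() call stands in for the platform's background work, and there is no 30-second retry TTL. IsrModel and its methods are hypothetical.

```typescript
type CacheState = "HIT" | "STALE" | "MISS";
interface Entry {
  html: string;
  generatedAt: number; // seconds
}

class IsrModel {
  private cache = new Map<string, Entry>();
  private pending = new Set<string>(); // paths queued for background regeneration
  regenerations = 0; // how many times render() was attempted

  constructor(
    private revalidateSec: number,
    private render: (path: string) => string, // hypothetical page renderer; may throw
  ) {}

  request(path: string, nowSec: number): { html: string; state: CacheState } {
    const entry = this.cache.get(path);
    if (!entry) {
      // MISS: the first visitor waits for a render.
      return { html: this.runRender(path, nowSec), state: "MISS" };
    }
    if (nowSec - entry.generatedAt < this.revalidateSec) {
      return { html: entry.html, state: "HIT" };
    }
    // STALE: serve the old copy now and queue a background regeneration.
    // The Set collapses many stale hits on the same path into one render.
    this.pending.add(path);
    return { html: entry.html, state: "STALE" };
  }

  // Stand-in for the work the platform does after the response is sent.
  flushBackground(nowSec: number): void {
    for (const path of this.pending) {
      try {
        this.runRender(path, nowSec);
      } catch {
        // Regeneration failed: keep serving the existing (stale) entry.
      }
    }
    this.pending.clear();
  }

  private runRender(path: string, nowSec: number): string {
    this.regenerations++;
    const html = this.render(path); // on throw, the cache is left untouched
    this.cache.set(path, { html, generatedAt: nowSec });
    return html;
  }
}
```

In this model, two stale hits at the same moment produce a single render after the flush. A render that throws leaves the previous HTML in place.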
The supported frameworks are Next.js, SvelteKit, Nuxt, Astro, and Gatsby (DSG). For Next.js, in the App Router you just export revalidate from a route segment.
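```tsx
// app/products/[id]/page.tsx
// The page is fresh for 60 seconds. After that, the next request still gets the cached
// page while it is regenerated in the background (stale-while-revalidate).
// getProduct and ProductView are this app's own data fetcher and component (not shown).
export const revalidate = 60;

export default async function ProductPage({
  params,
}: {
  params: Promise<{ id: string }>;
}) {
  const { id } = await params;
  const product = await getProduct(id); // runs on the first render (build or first request) and on each revalidation
  return <ProductView product={product} />;
}
```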
On-demand revalidation (e.g. reflecting an inventory-update webhook instantly), Cache Components (PPR), and when to use each of the three Cache-Control headers are covered in the Caching / ISR / Cache Components guide.
A minimal example of putting a function response on the CDN
Even an API that doesn't use ISR gets cached on the CDN if it returns the appropriate cache headers. Vercel supports three headers that control the browser, downstream CDNs, and Vercel's CDN separately.
```ts
// app/api/rates/route.ts
// getRates() is this app's own data source (not shown).
export async function GET() {
  return Response.json(await getRates(), {
    headers: {
      "Cache-Control": "max-age=10", // browser: 10 s
      "CDN-Cache-Control": "max-age=60", // downstream CDNs: 60 s
      "Vercel-CDN-Cache-Control": "max-age=3600", // Vercel CDN: 3600 s
    },
  });
}
```
The cacheability conditions are strict. The method must be GET or HEAD, and the request must have no Authorization header. The response must have no Set-Cookie header and no private or no-store directive. The body must be at most 10 MB (20 MB for streaming responses). You can confirm the result with the x-vercel-cache response header (HIT/MISS/STALE/PRERENDER).
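As a reading aid, the disqualifying conditions above can be written as a pre-flight check. This is a simplified approximation, not Vercel's actual caching logic. It assumes lower-cased header names and treats MB as MiB. It also does not check whether any cache-duration header is present at all. The ResponseShape type and the function name are hypothetical.

```typescript
interface ResponseShape {
  method: string;
  requestHeaders: Record<string, string>; // lower-cased names
  responseHeaders: Record<string, string>; // lower-cased names
  bodyBytes: number;
  streaming: boolean;
}

const MIB = 1024 * 1024;

// Returns the first disqualifying reason found, mirroring the list in the text.
function isLikelyCdnCacheable(r: ResponseShape): { ok: boolean; reason?: string } {
  if (!["GET", "HEAD"].includes(r.method.toUpperCase())) return { ok: false, reason: "method" };
  if ("authorization" in r.requestHeaders) return { ok: false, reason: "authorization header" };
  if ("set-cookie" in r.responseHeaders) return { ok: false, reason: "set-cookie" };
  const cacheControl = (r.responseHeaders["cache-control"] ?? "").toLowerCase();
  if (/\b(private|no-store)\b/.test(cacheControl)) return { ok: false, reason: "private/no-store" };
  const limit = r.streaming ? 20 * MIB : 10 * MIB;
  if (r.bodyBytes > limit) return { ok: false, reason: "body too large" };
  return { ok: true };
}
```

A check like this fits naturally in a unit test for route handlers. It catches, for example, a stray Set-Cookie that silently turns every request into a MISS.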
Deployment and delivery: four safety valves
Vercel's deployment experience is more than "git push and it goes to production." Build in these four safety valves from the start to prevent production incidents.
- Preview URL: an independent URL equivalent to production is auto-generated per branch/PR. Do review and QA here.
- Production Promote: promote a deployment verified in preview to production, via vercel promote or the dashboard.
- Instant Rollback: a production deployment is an immutable snapshot. If a problem appears, you can instantly return to a previous deployment.
- Rolling Releases: instead of shipping a new deployment to 100% at once, route only, say, 5% of traffic to it. Compare metrics such as Speed Insights, and ramp up gradually to 100% if there's no problem (Pro: 1 project; Enterprise: custom).
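Vercel handles canary assignment for Rolling Releases itself. The sketch below only illustrates why sticky, deterministic assignment matters, and why it pairs with Skew Protection. It hashes a stable visitor ID with 32-bit FNV-1a into 100 buckets. The function names and the source of the visitor ID are hypothetical.

```typescript
// 32-bit FNV-1a over UTF-16 code units (identical to the byte version for ASCII IDs).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // as unsigned 32-bit
}

function bucket(visitorId: string): number {
  return fnv1a32(visitorId) % 100; // 0..99
}

// Same visitor -> same bucket -> same deployment on every request (sticky).
// Raising the percentage only ever moves visitors from "current" to "canary".
function pickDeployment(visitorId: string, canaryPercent: number): "canary" | "current" {
  return bucket(visitorId) < canaryPercent ? "canary" : "current";
}
```

Stickiness is the point here. If a visitor bounced between versions on every request, canary metrics would mix both versions. The page/API version mismatch that Skew Protection guards against would also become far more likely.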
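```bash
# Preview → verify → promote to production → roll back immediately if something goes wrong
vercel deploy                  # create a preview deployment
vercel promote <deployment>    # promote to production (with Rolling Releases enabled, this starts the canary automatically)
vercel rollback <deployment>   # instantly return production to an earlier deployment
```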
Use Skew Protection together with Rolling Releases. If the page version and the backend API version diverge between the canary and the current deployment, things break. Skew Protection ensures that a client served by either deployment keeps talking to the backend of that same deployment.
CI/CD (GitHub Actions, --prebuilt, OIDC keyless), the staged design of Rolling Releases, verification with vcrrForceCanary, and automation with the REST API are detailed in the Deployment / CI/CD / rollback guide.
Configure with vercel.ts (type-safe) or vercel.json
The 2026 recommendation is vercel.ts: install @vercel/config and export a typed config. Unlike vercel.json, it is executed at build time, so you can assemble the config dynamically from environment variables or API calls. Use one or the other, not both.
```ts
// vercel.ts
import { routes, type VercelConfig } from "@vercel/config/v1";
export const config: VercelConfig = {
  framework: "nextjs",
  buildCommand: "npm run build",
  rewrites: [routes.rewrite("/api/(.*)", "https://backend.example.com/$1")],
  redirects: [routes.redirect("/old-docs", "/docs", { permanent: true })],
  headers: [
    routes.cacheControl("/static/(.*)", {
      public: true,
      maxAge: "1 week",
      immutable: true,
    }),
  ],
  crons: [{ path: "/api/cleanup", schedule: "0 0 * * *" }],
};
```
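The crons entry above calls /api/cleanup on a schedule. Vercel's cron documentation describes sending the value of a CRON_SECRET environment variable as an "Authorization: Bearer" header. The handler can use it to reject anyone else who hits the URL. Below is a minimal sketch under that assumption; the cleanup work itself is left out.

```typescript
import { timingSafeEqual } from "node:crypto";

// True only when the Authorization header is exactly "Bearer <secret>".
export function isAuthorizedCron(authHeader: string | null, secret: string | undefined): boolean {
  if (!secret || !authHeader) return false; // fail closed if the secret isn't configured
  const expected = Buffer.from(`Bearer ${secret}`);
  const actual = Buffer.from(authHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

// app/api/cleanup/route.ts (the path matches the crons entry above)
export async function GET(request: Request): Promise<Response> {
  if (!isAuthorizedCron(request.headers.get("authorization"), process.env.CRON_SECRET)) {
    return new Response("Unauthorized", { status: 401 });
  }
  // Hypothetical cleanup work would go here.
  return Response.json({ ok: true });
}
```

The constant-time comparison avoids leaking how many leading characters of a guess were right.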
Security: protect the entrance, secrets, and execution in layers
Most of Vercel's security takes effect before your application code even runs.
The entrance: Firewall / WAF / BotID / DDoS
- DDoS mitigation is automatic on all plans.
- Vercel WAF provides custom rules (allow/deny/challenge/log/rate limit), IP blocking, and managed rulesets (Enterprise). Config changes propagate globally within 300ms and can be rolled back instantly if there's a problem.
- BotID is an "invisible CAPTCHA" powered by Kasada. It adds bot detection, with no user interaction, to high-value routes such as checkout, signup, and APIs. Basic is free on all plans; on Pro, Deep Analysis costs $1 per 1,000 checkBotId() calls.
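```ts
// app/api/checkout/route.ts: verify BotID on the server
// Assumes BotID client instrumentation is set up for this route;
// handleCheckout is this app's own checkout logic (not shown).
import { checkBotId } from "botid/server";

export async function POST(request: Request) {
  const verification = await checkBotId();
  if (verification.isBot) {
    return new Response("Forbidden", { status: 403 });
  }
  return handleCheckout(request);
}
```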
WAF custom rules, rate limiting, Attack Challenge Mode, BotID client instrumentation, and the vercel firewall CLI are handled in the Firewall / WAF / BotID guide.
Secrets: environment variables and OIDC
Don't write secrets in code; put them in environment variables. Variables prefixed with NEXT_PUBLIC_ are exposed to the browser, so never use that prefix for an API key. vercel env pull syncs variables to a local .env file. For external clouds (AWS, etc.), don't store long-lived access keys; OIDC token federation (keyless) is the 2026 standard.
```bash
vercel env pull .env.local                    # pull the project's environment variables (Development by default) into a local file
vercel env add STRIPE_SECRET_KEY production   # add a secret to the Production environment (prompts for the value)
```
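A cheap guard against the NEXT_PUBLIC_ mistake is a name-based check run in CI before the build. This is a heuristic sketch, not a Vercel feature. The list of secret-sounding words in the regex is an assumption and will miss oddly named secrets. The function name is hypothetical.

```typescript
// Words that suggest a variable holds a secret (heuristic; extend for your naming scheme).
const SECRET_HINTS = /(SECRET|PRIVATE|PASSWORD|TOKEN|API_KEY|ACCESS_KEY)/i;
const PUBLIC_PREFIX = "NEXT_PUBLIC_";

// Returns the NEXT_PUBLIC_ variable names that look like secrets, sorted.
function findSuspiciousPublicVars(env: Record<string, string | undefined>): string[] {
  return Object.keys(env)
    .filter((name) => name.startsWith(PUBLIC_PREFIX))
    .filter((name) => SECRET_HINTS.test(name.slice(PUBLIC_PREFIX.length)))
    .sort();
}
```

In CI you could call it with process.env and fail the job when the returned list is non-empty. Publishable keys, such as a Stripe publishable key, are intentionally not flagged.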
The Fluid Compute pitfall, again: instances are shared by multiple requests. Don't cache request-specific tokens or user information in module-scope variables; that invites cross-tenant data leaks.
Data and storage: where to place state
Vercel Postgres and Vercel KV are discontinued; today you choose a store by purpose.
| Purpose | Option | Characteristics |
|---|---|---|
| Objects (images, video, PDF, etc.) | Vercel Blob | public/private, S3 backend (11 nines durability), the @vercel/blob SDK, CDN delivery |
| Ultra-low-latency reads (flags, redirects, IP blocks) | Edge Config | Global reads under 15 ms at P99 (often under 1 ms). For rarely updated data that's read very frequently |
| Relational DB | Marketplace: Neon (Postgres) | Serverless Postgres. Add it via vercel integration; environment variables are injected automatically |
| KV / cache / rate limiting | Marketplace: Upstash (Redis) | Serverless Redis |
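```ts
// Vercel Blob: store privately and serve through a function
// id and pdfBuffer come from the surrounding handler (not shown).
import { put } from "@vercel/blob";

const blob = await put(`invoices/${id}.pdf`, pdfBuffer, {
  access: "private", // 'public' is also available (cannot be changed after creation)
  addRandomSuffix: true, // avoids accidental overwrites and URL collisions
});
```
```ts
// Edge Config: read a feature flag with very low latency (in Middleware or a function)
import { get } from "@vercel/edge-config";

export default async function middleware(request: Request) {
  const maintenance = await get<boolean>("maintenance_mode");
  // Skip /maintenance itself to avoid a redirect loop.
  if (maintenance && new URL(request.url).pathname !== "/maintenance") {
    return Response.redirect(new URL("/maintenance", request.url));
  }
  // Otherwise no response is returned and the request proceeds.
}
```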
Blob client/server uploads, conditional writes (ifMatch), multipart uploads, Edge Config writes, and the Marketplace integration flow are covered in the Storage / Blob / Edge Config guide.
Cost: understanding Active CPU billing changes the design
Vercel Functions are billed on three axes, unlike the conventional wall-clock GB-seconds model.
| Axis | Billing target | Important property |
|---|---|---|
| Active CPU | The time the code actually used the CPU | The I/O wait of DB queries and AI calls isn't billed (CPU billing is paused) |
| Provisioned Memory | Allocated memory × instance running time (GB-hr) | Keeps accruing during I/O wait, until the last in-flight request finishes |
| Invocations | Number of incoming requests | Every request counts, whether it succeeds or fails. Pro: $0.60 per million |
Hobby includes 4 hours of Active CPU, 360 GB-hr of memory, and 1 million invocations per month for free. The official worked example: in São Paulo (CPU $0.221/h, memory $0.0183/GB-h), an invocation with 4 GB of memory, 4 seconds of CPU time, and a 10-second instance lifetime costs:
- CPU: (4 / 3600) × $0.221 = $0.0002456
- Memory: (4 × 10 / 3600) × $0.0183 = $0.0002033
- Total: $0.0004489 / invocation
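The same arithmetic can be written as a small function, so you can plug in your own memory size, CPU time, and lifetime. The prices are the ones quoted in this article and will change. São Paulo comes from the example; Tokyo and Osaka come from the per-region note below. Using gru1 as the São Paulo region ID is an assumption worth verifying. The invocation fee is excluded, as in the official example.

```typescript
interface RegionPrices {
  cpuPerHour: number; // USD per Active CPU hour
  memoryPerGbHour: number; // USD per provisioned GB-hour
}

const PRICES = {
  gru1: { cpuPerHour: 0.221, memoryPerGbHour: 0.0183 }, // São Paulo (from the worked example)
  hnd1: { cpuPerHour: 0.202, memoryPerGbHour: 0.0167 }, // Tokyo
  kix1: { cpuPerHour: 0.202, memoryPerGbHour: 0.0167 }, // Osaka
};

// Active CPU + Provisioned Memory cost of one invocation (invocation fee excluded).
function invocationCost(
  prices: RegionPrices,
  memoryGb: number,
  activeCpuSeconds: number,
  instanceLifetimeSeconds: number,
): { cpu: number; memory: number; total: number } {
  const cpu = (activeCpuSeconds / 3600) * prices.cpuPerHour;
  const memory = ((memoryGb * instanceLifetimeSeconds) / 3600) * prices.memoryPerGbHour;
  return { cpu, memory, total: cpu + memory };
}
```

invocationCost(PRICES.gru1, 4, 4, 10) reproduces the figures above. Provisioned Memory accrues per instance, so invocations that overlap on one Fluid instance effectively share that memory line. This is rough reasoning, not an official formula.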
Where this changes the design: I/O-heavy apps (AI, external APIs, DB) often end up cheaper than before, thanks to Active CPU billing and Fluid's concurrency. Conversely, CPU-bound work such as image processing consumes a lot of Active CPU. That is exactly why moving heavy CPU work out of the function, into a job or workflow, pays off.
Per-region unit prices (Tokyo hnd1, Osaka kix1 are CPU $0.202/h, memory $0.0167/GB-h), optimization of maxDuration/memory, and cost monitoring are detailed in the Cost / Active CPU optimization guide.
Observability and resilience: make production traceable and hard to break
- Observability / Speed Insights: track function execution time, error rate, and Core Web Vitals on the dashboard. The canary comparison of Rolling Releases is also here.
- Resilience: the two pillars are Fluid's cross-region/AZ failover and Instant Rollback of deployments. If a zone goes down, traffic automatically moves to another AZ in the same region, then to a neighboring region.
- Idempotency: design waitUntil post-processing and webhook handling to be idempotent, assuming at-least-once delivery. There should be no double charges or double sends even if the same event arrives twice.
- Graceful shutdown: Fluid sends a signal before terminating an instance. Finish in-flight requests and close connections when you receive it.
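Here is a minimal sketch for the graceful-shutdown point. It tracks in-flight requests, stops accepting new ones once shutdown starts, and releases resources only after the count reaches zero. The tracker is hypothetical. Treating the pre-termination signal as SIGTERM is an assumption to confirm against the Fluid docs.

```typescript
// Tracks in-flight requests so a shutdown handler can wait for them to finish.
class InFlightTracker {
  private active = 0;
  private drainCallbacks: Array<() => void> = [];
  shuttingDown = false;

  begin(): boolean {
    if (this.shuttingDown) return false; // refuse new work once shutdown has started
    this.active++;
    return true;
  }

  end(): void {
    this.active = Math.max(0, this.active - 1);
    if (this.active === 0) this.flushDrain();
  }

  // Stop accepting work and run cb once every in-flight request has ended.
  shutdown(cb: () => void): void {
    this.shuttingDown = true;
    this.drainCallbacks.push(cb);
    if (this.active === 0) this.flushDrain();
  }

  private flushDrain(): void {
    const callbacks = this.drainCallbacks;
    this.drainCallbacks = [];
    for (const cb of callbacks) cb();
  }
}

const tracker = new InFlightTracker();
// Assumption: the pre-termination signal is SIGTERM.
process.once("SIGTERM", () => {
  tracker.shutdown(() => {
    // Close hypothetical resources here (DB pool, queue client).
    console.log("drained; closing connections");
  });
});
```

Request handlers would call tracker.begin() on entry and tracker.end() in a finally block.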
These practices follow the same design principles that achieved zero double charges in payments (payment idempotency design).
New primitives in 2026 (costly to overlook)
Beyond hosting, Vercel keeps adding building blocks for the application itself.
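The idempotency rule can be sketched as a handler keyed by the event ID. The first delivery claims the ID and performs the side effect; redeliveries return the recorded result without repeating it. The store interface and handler are hypothetical, and the in-memory store is for tests only. In production, the claim must be atomic in a store shared across instances, for example a Redis SET NX through Upstash or a database unique constraint.

```typescript
interface WebhookEvent {
  id: string;
  amountCents: number;
}

// Contract for the dedupe store. In production: shared across instances, atomic claim().
interface IdempotencyStore {
  claim(key: string): boolean; // true only for the first caller per key
  release(key: string): void; // undo a claim when processing failed
  saveResult(key: string, result: string): void;
  getResult(key: string): string | undefined;
}

// Single-process stand-in, suitable only for tests.
class InMemoryIdempotencyStore implements IdempotencyStore {
  private claimed = new Set<string>();
  private results = new Map<string, string>();
  claim(key: string): boolean {
    if (this.claimed.has(key)) return false;
    this.claimed.add(key);
    return true;
  }
  release(key: string): void {
    this.claimed.delete(key);
  }
  saveResult(key: string, result: string): void {
    this.results.set(key, result);
  }
  getResult(key: string): string | undefined {
    return this.results.get(key);
  }
}

function makeWebhookHandler(store: IdempotencyStore, charge: (amountCents: number) => string) {
  return (event: WebhookEvent): { duplicate: boolean; result?: string } => {
    if (!store.claim(event.id)) {
      // Redelivery: skip the side effect and return what the first attempt recorded
      // (undefined if that attempt is still in progress).
      return { duplicate: true, result: store.getResult(event.id) };
    }
    let result: string;
    try {
      result = charge(event.amountCents); // runs once per successfully processed event ID
    } catch (err) {
      store.release(event.id); // let a later redelivery retry
      throw err;
    }
    store.saveResult(event.id, result);
    return { duplicate: false, result };
  };
}
```

Releasing the claim on failure keeps a transient error from permanently swallowing an event. A side effect that can partially succeed before throwing still needs its own idempotency key at the provider.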
| Product | Status | What it's for |
|---|---|---|
| AI Gateway | GA (2025-08) | A unified API to multiple AI providers. Observability, fallback, zero data retention. Switch with a "provider/model" string |
| Workflows (WDK) | — | Durable workflows of minutes to months. pause/resume, retry, crash resistance |
| Queues | Public beta | At-least-once durable event streaming (on Fluid Compute) |
| Sandbox | GA (2026-01) | A microVM that runs untrusted code (AI-generated, etc.) in isolation |
| BotID | GA (2025-06) | An invisible CAPTCHA (described above) |
| Sign in with Vercel | GA (2025-11) | An OAuth provider for third-party apps |
| Vercel Agent | Public beta | AI code review, production-incident investigation |
When building an AI feature, consider routing calls through the AI Gateway with a "provider/model" string as the default, before reaching for a provider-specific SDK (Vercel AI SDK production implementation).
Pre-production-release checklist
These are the items I actually confirm when I ship a new Vercel project to production.
- Fluid Compute is enabled, and you don't place request-specific data in global state
- You make each function's maxDuration and memory explicit for its purpose (don't just leave the 300-second default)
- You move long-running or CPU-heavy processing out of functions into Workflows/Queues/jobs
- You choose the cache strategy from the four layers (static/ISR/CDN/Runtime) and confirm HITs via x-vercel-cache
- The preview → Promote → (Rolling Releases) → Instant Rollback procedure is established
- If you use Rolling Releases, you enable Skew Protection
- You protect high-value routes with WAF / BotID, and DDoS mitigation works automatically
- Secrets live in environment variables, no confidential data goes into NEXT_PUBLIC_ variables, and external clouds are accessed keyless via OIDC
- You choose storage by purpose (Blob / Edge Config / Marketplace)
- You estimate cost on the premise of Active CPU billing, and monitor actual values with Observability
- Webhooks/post-processing are idempotent, and you implement graceful shutdown
Summary: design Vercel as a "platform"
Vercel in 2026 is not a place to put the front-end but a full-compute platform that bundles compute, data, delivery, and security. Production quality isn't determined by whether you managed to deploy. It depends on whether you build in these five points from the start:
- Understand Fluid Compute's concurrency and error isolation, and don't pollute global state
- Choose the four cache layers by requirements
- Ship safely with preview / Promote / Instant Rollback / Rolling Releases
- Protect the entrance with WAF/BotID, and separate the secret and execution boundaries
- Design cost and architecture on the premise of Active CPU billing
Working code for each theme is in this cluster's individual articles. Start by checking your own project's cache layer and billing axis, one at a time.
This article is based on the Vercel official documentation (Fluid Compute / Functions / ISR / CDN Cache / Rolling Releases / WAF / BotID / Blob / Edge Config, as of June 2026), reconstructed with the addition of practical-operation judgment axes. Since specs and prices get updated, confirm the latest values on each official page when adopting in production.