With Vercel Functions, the easy part ends at "put a file in api/ and it works." The hard part is production quality: will data leak between concurrent requests, will cost balloon while waiting on I/O, will post-response work be dropped, will Cron endpoints be left open to anyone? This article stays faithful to the official specs of Vercel Functions and Fluid Compute and shows, in real code, how to write functions that don't fall over, are traceable, and are cheap to run.
For the big picture (layers other than compute), see the Vercel production-operations guide, and for billing details, the Active CPU optimization guide. This piece concentrates on "how to write Functions."
Fluid Compute: multiple requests on one instance
Why it gets faster and cheaper
Traditional serverless meant "1 request = 1 instance (microVM)." Under that model:
- a cold start can occur on every request
- while the function is waiting for a DB or AI response, the instance occupies one request and idles
Fluid Compute has one function instance process multiple invocations concurrently. In official words, "optimized concurrency." Since the same instance can process another request during an I/O wait, cold starts decrease, the total number of instances needed decreases, and cost drops. It's especially effective for I/O-bound processing like AI (embedding, vector search, external APIs).
// app/api/recommend/route.ts
// Typical I/O-bound work. Under Fluid Compute, the same instance can process
// other requests during the wait time of each await.
export async function POST(request: Request) {
  const { userId } = await request.json();
  // Both calls are external I/O (almost no CPU use, so Active CPU billing does not grow)
  const [embedding, profile] = await Promise.all([
    fetchEmbedding(userId), // AI API
    db.users.findById(userId), // DB
  ]);
  const items = await vectorSearch(embedding); // vector DB
  return Response.json({ items, profile });
}
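To make the effect concrete, here is a minimal simulation. It is not Vercel's implementation: `fakeIoRequest` and `measureConcurrent` are hypothetical names, and the "request" is just a timer standing in for a DB or AI call. Because the event loop is free during each wait, ten overlapping requests in one process take roughly as long as one.

```typescript
// Hypothetical stand-in for an I/O-bound handler: it only waits and uses no CPU.
function fakeIoRequest(id: number, waitMs: number): Promise<number> {
  return new Promise((resolve) => setTimeout(() => resolve(id), waitMs));
}

// Run `count` simulated requests concurrently in one process and measure wall time.
async function measureConcurrent(count: number, waitMs: number) {
  const startedAt = Date.now();
  const ids = await Promise.all(
    Array.from({ length: count }, (_, i) => fakeIoRequest(i, waitMs))
  );
  return { ids, elapsedMs: Date.now() - startedAt };
}

measureConcurrent(10, 50).then(({ ids, elapsedMs }) => {
  // Handled one at a time, ten 50 ms waits would take about 500 ms.
  console.log(`handled ${ids.length} requests in ~${elapsedMs} ms`);
});
```

The same reasoning is why CPU-heavy handlers benefit less: they keep the event loop busy, so other requests cannot make progress in the meantime.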
The biggest trap: shared global state
The essence of Fluid Compute is "multiple requests share the same process = global state." This is a performance advantage and at the same time the most common cause of security bugs.
// ❌ Dangerous: request-specific data placed in module scope
let currentUser: User | null = null; // shared by every request!
export async function GET(request: Request) {
  currentUser = await authenticate(request); // race: another request can overwrite this
  return Response.json(await getDashboard(currentUser)); // someone else's data can get mixed in
}
// ✅ Safe: keep request-specific data inside function scope
export async function GET(request: Request) {
  const user = await authenticate(request); // local variable
  return Response.json(await getDashboard(user));
}
// ✅ Only request-independent things may live in globals
// (DB connection pool, config, compiled schemas, etc.)
const pool = createPool(process.env.DATABASE_URL!);
The discipline: don't put user-, token-, or tenant-derived values in a module-scope let or a mutable object. Share only things that are the same, and safe, for every request. This is the same discipline as in any Node.js server implementation, but it is the one most often overlooked by teams migrating from traditional serverless.
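The race is easy to reproduce locally. The sketch below is a simulation under stated assumptions: `fakeAuthenticate`, `unsafeHandler`, `safeHandler`, and the timings are all hypothetical, chosen so that one request overwrites the shared variable while another is still awaiting. The leak appears whenever there is an await between writing the global and reading it back.

```typescript
type User = { id: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for authenticate(): resolves to the user after a delay.
async function fakeAuthenticate(id: string, delayMs: number): Promise<User> {
  await sleep(delayMs);
  return { id };
}

let currentUser: User | null = null; // module scope: shared by all in-flight requests

// Unsafe: writes the shared variable, awaits, then reads it back.
async function unsafeHandler(id: string, authMs: number, workMs: number): Promise<string> {
  currentUser = await fakeAuthenticate(id, authMs);
  await sleep(workMs); // simulated DB wait; other requests run during this gap
  return currentUser?.id ?? "nobody";
}

// Safe: the user lives in a local variable that no other request can touch.
async function safeHandler(id: string, authMs: number, workMs: number): Promise<string> {
  const user = await fakeAuthenticate(id, authMs);
  await sleep(workMs);
  return user.id;
}

async function demoRace() {
  // alice authenticates first but finishes last, so bob overwrites currentUser meanwhile.
  const unsafe = await Promise.all([unsafeHandler("alice", 5, 40), unsafeHandler("bob", 15, 5)]);
  const safe = await Promise.all([safeHandler("alice", 5, 40), safeHandler("bob", 15, 5)]);
  return { unsafe, safe };
}

demoRace().then(({ unsafe, safe }) => console.log({ unsafe, safe }));
```

With these timings the unsafe version answers alice's request with bob's id, while the safe version returns each caller's own id.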
Error isolation
With Fluid Compute, even when an uncaught exception or unhandled rejection occurs in Node.js, the runtime logs the error and lets the other in-flight requests complete before stopping the process. One broken request doesn't take down the requests sharing its instance. The error is not swallowed, though, so you are still expected to handle exceptions properly within each request.
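One way to handle exceptions within each request is a small wrapper around the handler. This is a sketch, not a Vercel API: `withErrorBoundary`, the `Handler` type, and `failingHandler` are hypothetical names introduced for illustration.

```typescript
type Handler = (request: Request) => Promise<Response>;

// Wrap a handler so that a thrown error becomes a 500 response for that request only.
function withErrorBoundary(handler: Handler): Handler {
  return async (request) => {
    try {
      return await handler(request);
    } catch (error) {
      // Log enough context to trace the failure, without leaking details to the client.
      console.error("request failed", { url: request.url, error });
      return new Response(JSON.stringify({ error: "Internal Server Error" }), {
        status: 500,
        headers: { "Content-Type": "application/json" },
      });
    }
  };
}

// Usage sketch: `failingHandler` is a hypothetical handler that always throws.
const failingHandler: Handler = async () => {
  throw new Error("database unavailable");
};
const GET = withErrorBoundary(failingHandler);
```

In a route file the wrapped handler would be exported (for example as GET), so the failure is converted to a response before it can become an uncaught exception.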
Choosing a runtime
Fluid Compute runs on the following runtimes.
| Runtime | Optimized concurrency | Where to use |
|---|---|---|
| Node.js 24 LTS (default) | ✅ | Most apps. Full Node.js API. Node 18 is deprecated |
| Python (3.13/3.14) | ✅ | FastAPI, etc. Data/ML adjacent |
| Edge | — | Lightweight, ultra-low latency. But compatibility issues (new ones basically recommend Fluid/Node) |
| Bun | — | Bun-native processing |
| Rust | — | Parts needing CPU intensity / low latency |
The 2026 guideline: not "Edge because I want it fast," but first Fluid Compute (Node.js). Edge and Middleware run internally on Vercel Functions, and Fluid lets you use ordinary Node.js in the same region at the same price.
Make maxDuration and memory explicit
The default timeout is 300 seconds on all plans. Pro/Ent can set up to 800 seconds (GA), and the extended 1800 seconds is in beta (a per-function setting). Rather than relying on the default, set the value explicitly for each use case.
// app/api/report/route.ts
export const maxDuration = 60; // this function may run for at most 60 seconds (unit: seconds)
export const runtime = "nodejs"; // the default; stating it makes the intent clear
export async function GET() {
  return Response.json(await buildHeavyReport());
}
The setting priority is function code > vercel.json > dashboard > Fluid default. The value written in code takes top priority. Memory is up to 4GB/2vCPU on Pro/Ent, 2GB/1vCPU on Hobby.
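The precedence can be written down as a small lookup. This is only an illustration of the ordering described above, not how Vercel resolves settings internally; `resolveMaxDuration` and `DurationSources` are hypothetical names.

```typescript
type DurationSources = {
  code?: number; // `export const maxDuration` in the route file
  vercelJson?: number; // value from vercel.json
  dashboard?: number; // project setting in the dashboard
};

const FLUID_DEFAULT_SECONDS = 300; // default stated in this article

// Return the effective maxDuration and which source supplied it.
function resolveMaxDuration(sources: DurationSources): { seconds: number; from: string } {
  if (sources.code !== undefined) return { seconds: sources.code, from: "code" };
  if (sources.vercelJson !== undefined) return { seconds: sources.vercelJson, from: "vercel.json" };
  if (sources.dashboard !== undefined) return { seconds: sources.dashboard, from: "dashboard" };
  return { seconds: FLUID_DEFAULT_SECONDS, from: "default" };
}

// The value in code wins even when the other sources are set.
console.log(resolveMaxDuration({ code: 60, vercelJson: 120, dashboard: 300 }));
// With nothing in code or vercel.json, the dashboard value applies.
console.log(resolveMaxDuration({ dashboard: 120 }));
```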
Streaming: return the first byte fast
For LLM generation or sequential report output, returning bit by bit without waiting for completion transforms the UX. You can implement it with the standard ReadableStream.
// app/api/stream/route.ts: stream text incrementally
export async function GET() {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      // Assumes generateChunks() returns an AsyncIterable<string>, so each chunk
      // is sent as soon as it is produced instead of after all chunks are ready.
      for await (const chunk of generateChunks()) {
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-store", // do not cache streams
    },
  });
}
Edge runtime constraint: on Edge, you must start sending the response within 25 seconds or you lose the ability to stream; once started, you can keep streaming for up to 300 seconds. For an AI chat UI, the Vercel AI SDK lets you handle this kind of streaming type-safely.
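A simple way to stay inside that window is to emit a first chunk before any slow work begins. The following is a sketch using only standard web APIs; `slowSteps` and `streamWithEarlyStart` are hypothetical names, and the "started" marker is an arbitrary choice that your client would need to tolerate.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical slow producer: each step takes a while before yielding text.
async function* slowSteps(): AsyncGenerator<string> {
  for (const step of ["step 1\n", "step 2\n"]) {
    await sleep(20); // stands in for slow upstream work
    yield step;
  }
}

// Build a stream that emits a first chunk right away, then the slow chunks.
function streamWithEarlyStart(source: AsyncIterable<string>): ReadableStream {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      controller.enqueue(encoder.encode("started\n")); // first chunk is queued immediately
      try {
        for await (const chunk of source) {
          controller.enqueue(encoder.encode(chunk));
        }
        controller.close();
      } catch (error) {
        controller.error(error); // surface producer failures to the reader
      }
    },
  });
}

function GET(): Response {
  return new Response(streamWithEarlyStart(slowSteps()), {
    headers: { "Content-Type": "text/plain; charset=utf-8", "Cache-Control": "no-store" },
  });
}
```

The try/catch matters here: without it, a failure in the producer would leave the stream open and the client waiting until the timeout.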
waitUntil: post-response background processing
"I want to return to the user immediately, but I definitely want to do logging, analytics, webhook forwarding, and cache updates" — this standard requirement is waitUntil. It keeps the instance alive after returning the response and completes the post-processing.
import { waitUntil } from "@vercel/functions";
export async function POST(request: Request) {
  const event = await request.json();
  const result = await processOrder(event); // work the user waits for
  // Return the response right away; post-processing continues in the background
  waitUntil(
    Promise.allSettled([
      logToAnalytics(event), // analytics
      sendSlackNotification(result), // notification
      revalidateRelatedCaches(result), // cache update
    ])
  );
  return Response.json({ ok: true, orderId: result.id });
}
Pair it with idempotency: design waitUntil post-processing and webhook handling to be idempotent, on the premise of at-least-once delivery, so that no double notification or double charge occurs even if the same event arrives twice. Designing the idempotency key follows the same principle as in the payment idempotency guide. Using Promise.allSettled, so that one failed post-processing task doesn't take down the others, is another key point.
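A minimal sketch of the idempotency-key idea follows. Everything here is an assumption for illustration: `runOnce`, `notifyOnce`, and the key format are hypothetical, and the in-memory Set only works within a single instance. In production the claimed keys would have to live in a shared durable store, such as a database table with a unique constraint.

```typescript
// In-memory record of handled keys. For illustration only: under Fluid Compute each
// instance has its own memory, so production code needs a shared durable store.
const handledKeys = new Set<string>();

type OrderEvent = { id: string; type: string };

// Run `effect` at most once per idempotency key; report whether it ran.
async function runOnce(key: string, effect: () => Promise<void>): Promise<boolean> {
  if (handledKeys.has(key)) return false; // duplicate delivery: skip
  handledKeys.add(key); // claim the key before awaiting so concurrent duplicates are skipped
  try {
    await effect();
    return true;
  } catch (error) {
    handledKeys.delete(key); // release the key so a later retry can run the effect
    throw error;
  }
}

// Hypothetical side effect that must not be duplicated.
const sentNotifications: string[] = [];
async function notifyOnce(event: OrderEvent): Promise<boolean> {
  return runOnce(`notify:${event.type}:${event.id}`, async () => {
    sentNotifications.push(event.id);
  });
}
```

Deriving the key from the event's own id and type, rather than from a timestamp, is what lets a redelivered event map to the same key.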
Cron Jobs: scheduled execution safely
Use Cron for periodic work such as backups, notifications, and subscription-quantity updates. Vercel triggers each job by sending an HTTP GET request to the configured path on the production deployment URL.
Definition
// vercel.json
{
  "$schema": "https://openapi.vercel.sh/vercel.json",
  "crons": [
    { "path": "/api/cron/cleanup", "schedule": "0 0 * * *" },
    { "path": "/api/cron/digest", "schedule": "0 9 * * 1" }
  ]
}
Notes on cron expressions: the timezone is always UTC. Aliases like MON/JAN aren't supported. You can't specify "day (DoM)" and "weekday (DoW)" simultaneously (set one to *).
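These rules are easy to check before deploying. The helper below is a hypothetical pre-deploy check, not part of any Vercel tooling: `checkCronExpression` covers only the constraints listed above (it does not validate value ranges), and `toUtcHour` handles whole-hour offsets only.

```typescript
// Check a cron expression against the constraints listed above.
// Returns a list of problems; an empty list means none of these rules were violated.
function checkCronExpression(expression: string): string[] {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    return [`expected 5 fields (minute hour day-of-month month day-of-week), got ${fields.length}`];
  }
  const problems: string[] = [];
  const [, , dayOfMonth = "*", month = "*", dayOfWeek = "*"] = fields;
  // Named aliases such as MON or JAN are not supported; use numbers instead.
  if (/[a-z]/i.test(month) || /[a-z]/i.test(dayOfWeek)) {
    problems.push("named aliases like MON or JAN are not supported");
  }
  // Day-of-month and day-of-week cannot both be restricted.
  if (dayOfMonth !== "*" && dayOfWeek !== "*") {
    problems.push("set either day-of-month or day-of-week to *");
  }
  return problems;
}

// Convert a local hour to the UTC hour to write in the schedule.
// Note that the calendar day can shift as well; this helper returns the hour only.
function toUtcHour(localHour: number, utcOffsetHours: number): number {
  return (((localHour - utcOffsetHours) % 24) + 24) % 24;
}

console.log(checkCronExpression("0 9 * * MON")); // one problem: alias
console.log(toUtcHour(9, 9)); // 09:00 at UTC+9 is hour 0 in UTC
```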
Always protect it (CRON_SECRET)
A Cron path is a public URL. Since anyone can hit it, authorize requests with CRON_SECRET. When CRON_SECRET is set in the environment variables, Vercel attaches Authorization: Bearer <CRON_SECRET> to each Cron-triggered request.
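// app/api/cron/cleanup/route.ts
export async function GET(request: Request) {
  // ① Authorize with the secret token (reject unauthorized external invocations)
  const secret = process.env.CRON_SECRET;
  const auth = request.headers.get("authorization");
  // Fail closed when the secret is unset; otherwise "Bearer undefined" would match
  if (!secret || auth !== `Bearer ${secret}`) {
    return new Response("Unauthorized", { status: 401 });
  }
  // ② When multiple Crons share the same path, identify which schedule fired
  const schedule = request.headers.get("x-vercel-cron-schedule"); // e.g. "0 0 * * *"
  // ③ Be idempotent: safe even if invoked twice at the same time
  const deleted = await deleteExpiredSessions();
  return Response.json({ ok: true, schedule, deleted });
}
// Generating the environment variable (locally):
// generate with `openssl rand -hex 32`, then register with `vercel env add CRON_SECRET production`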
A Cron-trigger request has User-Agent: vercel-cron/1.0, so you can verify it together if needed.
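If you do verify both, it helps to pull the check into a pure function that is easy to unit test. This is a sketch: `isAuthorizedCron` is a hypothetical helper, and matching on the "vercel-cron/" prefix rather than the exact version string is an assumption made here, not something the platform documents in this article.

```typescript
// Decide whether a request looks like a genuine Cron trigger.
// `secret` is passed in so the function stays pure and easy to test.
function isAuthorizedCron(headers: Headers, secret: string | undefined): boolean {
  if (!secret) return false; // fail closed when CRON_SECRET is not configured
  if (headers.get("authorization") !== `Bearer ${secret}`) return false;
  // Secondary check only: a User-Agent can be forged, so it never replaces the secret.
  const userAgent = headers.get("user-agent") ?? "";
  return userAgent.startsWith("vercel-cron/");
}

// Usage sketch with hand-built headers.
const sampleHeaders = new Headers({
  authorization: "Bearer s3cret",
  "user-agent": "vercel-cron/1.0",
});
console.log(isAuthorizedCron(sampleHeaders, "s3cret"));
```

In a route handler this would be called as isAuthorizedCron(request.headers, process.env.CRON_SECRET) before doing any work.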
Split off heavy Cron work: a Cron function is also subject to the maxDuration constraint. For bulk processing that runs longer than a few minutes, it is safer to have the Cron only kick off the job and hand the actual work to Workflows or a queue.
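The "just kick" pattern might look like the sketch below. `JobQueue` is a hypothetical interface standing in for whatever queue or workflow client you use, and `kickCleanup` and the batch size are illustrative choices rather than recommended values.

```typescript
// Hypothetical queue interface; substitute whatever queue or workflow client you use.
interface JobQueue {
  enqueue(job: { kind: string; ids: string[] }): Promise<void>;
}

// Split items into fixed-size batches.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// The Cron handler's whole job: take the pending ids, enqueue them in batches, return quickly.
async function kickCleanup(queue: JobQueue, expiredIds: string[], batchSize = 100): Promise<number> {
  const batches = toBatches(expiredIds, batchSize);
  await Promise.all(batches.map((ids) => queue.enqueue({ kind: "cleanup", ids })));
  return batches.length; // number of jobs handed off
}
```

Each batch then runs as its own unit of work with its own time budget, so the Cron invocation itself finishes well inside maxDuration.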
Graceful shutdown
Fluid Compute sends a signal before terminating an instance on scale-in or deployment. Use it to let in-flight requests finish, close connections, and flush buffers, so that nothing is dropped during a deployment.
// Connection cleanup example (register once, at module scope)
process.on("SIGTERM", async () => {
  await pool.end(); // close the DB connection pool
  await flushTelemetry(); // send out any buffered telemetry
});
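If one cleanup step hangs, the handler above never reaches the next one. A possible refinement, sketched here under assumptions, is to run the steps in parallel with a deadline. `runCleanup` and `CleanupTask` are hypothetical names, the 5-second budget in the comment is an arbitrary example, and running steps in parallel is a design choice that only fits steps that don't depend on each other.

```typescript
type CleanupTask = { name: string; run: () => Promise<void> };

// Run every cleanup task, but stop waiting once `timeoutMs` has elapsed.
// Returns the names of tasks that failed or did not finish in time.
async function runCleanup(tasks: CleanupTask[], timeoutMs: number): Promise<string[]> {
  const unfinished = new Set(tasks.map((task) => task.name));
  const work = Promise.allSettled(
    tasks.map(async (task) => {
      await task.run();
      unfinished.delete(task.name); // only reached when the task succeeded
    })
  );
  let timer!: ReturnType<typeof setTimeout>; // assigned synchronously by the executor below
  const deadline = new Promise<void>((resolve) => {
    timer = setTimeout(resolve, timeoutMs);
  });
  await Promise.race([work, deadline]);
  clearTimeout(timer); // do not keep the process alive just for the timer
  return Array.from(unfinished);
}

// Registration sketch (Node.js), with `pool` and `flushTelemetry` as in the example above:
// process.on("SIGTERM", () => {
//   void runCleanup(
//     [
//       { name: "db-pool", run: () => pool.end() },
//       { name: "telemetry", run: () => flushTelemetry() },
//     ],
//     5000
//   );
// });
```

Logging the returned names gives you a trace of which cleanup steps were cut short.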
File descriptors are limited to 1,024, shared across concurrent executions. Leak connections and you get "too many open files." Use a connection pool and close connections when done; this matters even more under Fluid's concurrency.
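Where a client library has no pool of its own, a small concurrency limiter can cap how many connections one instance opens at a time. This is a generic sketch, not a Vercel feature: `createLimiter` is a hypothetical helper, and a suitable limit depends on what else in the instance is holding descriptors.

```typescript
// Create a limiter that lets at most `limit` tasks run at the same time.
function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Wait until a finishing task hands its slot over to us.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active += 1;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter; `active` stays the same
      } else {
        active -= 1;
      }
    }
  };
}

// Usage sketch: at most 2 of these simulated connections are open at once.
const limited = createLimiter(2);
const simulatedCall = (n: number) =>
  limited(() => new Promise<number>((resolve) => setTimeout(() => resolve(n), 10)));
Promise.all([1, 2, 3, 4].map(simulatedCall)).then((results) => console.log(results));
```

Because the limiter lives at module scope, it is shared by all concurrent requests on the instance, which is exactly the scope at which the descriptor limit applies.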
Production checklist (Functions)
- No request-specific data in module-scope globals
- Globals are only request-independent things like a DB pool or config
- Make maxDuration and memory explicit per use
- Separate heavy/long processing into Workflows, a queue, or a job
- Streaming is no-store, and for Edge start responding within 25 seconds
- Post-processing is idempotent with waitUntil + Promise.allSettled
- Always protect Cron with CRON_SECRET, mind UTC, idempotency, and DoM/DoW exclusivity
- Pool connections and close on SIGTERM, no FD leak
Conclusion
Fluid Compute reconciles "serverless ease" and "server efficiency," but in exchange it demands the discipline of shared global state.
- Confine request-specific data to function scope (most important, directly tied to security)
- The more I/O-bound, the cheaper with concurrency and Active CPU billing
- Post-processing with waitUntil, fast initial response with streaming
- Always protect Cron with CRON_SECRET, idempotently
- Separate long-running processing to Workflows
Next, go to the caching/ISR/Cache Components guide, which speeds up the response you return.
This article is based on the official documentation of Vercel Functions / Fluid Compute / Cron Jobs (as of June 2026). The spec and limits get updated, so confirm the latest values in the official docs when adopting in production.