Vercel Functions × Fluid Compute implementation guide: concurrency, streaming, waitUntil, and Cron at production quality

Vercel Functions are easy up to the point of "put a file in api/ and it works." The hard part is production quality: will data leak across concurrent requests, will cost balloon during I/O waits, will post-response work be dropped, will Cron endpoints be left open to anyone? This article follows the official specifications of Vercel Functions and Fluid Compute and shows, in real code, how to build functions that stay up, stay traceable, and stay cheap.

For the big picture (layers other than compute), see the Vercel production-operations guide, and for billing details, the Active CPU optimization guide. This piece concentrates on "how to write Functions."

Fluid Compute: multiple requests on one instance

Why it gets faster and cheaper

Traditional serverless meant "1 request = 1 instance (microVM)." Under that model:

a cold start can occur on every request
while the function is waiting for a DB or AI response, the instance occupies one request and idles

Fluid Compute lets one function instance process multiple invocations concurrently; the official term is "optimized concurrency." Because the same instance can serve another request during an I/O wait, cold starts become rarer, fewer instances are needed in total, and cost drops. It is especially effective for I/O-bound work such as AI workloads (embeddings, vector search, external APIs).

// app/api/recommend/route.ts
// A typical I/O-bound handler. Under Fluid Compute, the same instance can
// process other requests while this one is waiting on an await.
// (fetchEmbedding, db, and vectorSearch are application helpers assumed to exist.)
export async function POST(request: Request) {
  const { userId } = await request.json();

  // Both calls are external I/O (almost no CPU, so Active CPU billing barely grows)
  const [embedding, profile] = await Promise.all([
    fetchEmbedding(userId),   // AI API
    db.users.findById(userId) // DB
  ]);

  const items = await vectorSearch(embedding); // vector DB
  return Response.json({ items, profile });
}
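To make the effect concrete, here is a minimal simulation in plain TypeScript. It is not Vercel's scheduler: fakeIo, handle, and the counters are hypothetical names introduced for illustration. It only shows that while one handler is parked on an await, the same process can hold other requests in flight.

```typescript
// Simulation only: a single Node.js process overlapping the I/O waits of
// several requests. `fakeIo` stands in for a DB or AI API call.
let inFlight = 0;
let peakInFlight = 0;

function fakeIo(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function handle(requestId: number): Promise<number> {
  inFlight += 1;
  peakInFlight = Math.max(peakInFlight, inFlight);
  await fakeIo(20); // the CPU is idle here, so other requests can start
  inFlight -= 1;
  return requestId;
}

// Three "requests" arrive at nearly the same time on one instance.
async function simulate(): Promise<{ handled: number[]; peak: number }> {
  const handled = await Promise.all([handle(1), handle(2), handle(3)]);
  return { handled, peak: peakInFlight };
}
```

All three requests are in flight at once on one process, which is the situation the traditional one-request-per-instance model would have needed three instances for.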

The biggest trap: shared global state

The essence of Fluid Compute is that multiple requests share one process, and therefore its global state. That is a performance advantage and, at the same time, the most common source of security bugs.

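// ❌ Dangerous: request-specific data in module scope
let currentUser: User | null = null; // shared by every request on this instance!

export async function GET(request: Request) {
  currentUser = await authenticate(request);
  // Any await between the write and the read lets another request overwrite
  // currentUser (recordAccess is a hypothetical helper)
  await recordAccess(request);
  return Response.json(await getDashboard(currentUser)); // may return someone else's data
}

// ✅ Safe: keep request-specific data inside function scope
export async function GET(request: Request) {
  const user = await authenticate(request); // local variable
  return Response.json(await getDashboard(user));
}

// ✅ Only request-independent things belong in module scope
//    (DB connection pool, config, compiled schemas, and so on)
const pool = createPool(process.env.DATABASE_URL!);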

The discipline: never put values derived from a user, token, or tenant in a module-scope let or mutable object. Share only things that are identical and safe for every request. This is the same discipline as any Node.js server implementation, but developers migrating from traditional serverless are the ones most likely to overlook it.
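The race is easy to reproduce locally. The following sketch simulates two overlapping requests in one process; authenticate, sleep, and both handlers are hypothetical stand-ins written for this illustration, not code from a real application.

```typescript
// Simulation of the race described above. `authenticate` is a hypothetical
// stand-in that resolves after a delay, like a real session lookup.
type User = { name: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function authenticate(name: string): Promise<User> {
  await sleep(5);
  return { name };
}

// Unsafe: request-specific state lives in module scope.
let currentUser: User | null = null;

async function unsafeHandler(name: string, workMs: number): Promise<string> {
  currentUser = await authenticate(name);
  await sleep(workMs); // another request can overwrite currentUser here
  return currentUser!.name;
}

// Safe: the user never leaves the function's own scope.
async function safeHandler(name: string, workMs: number): Promise<string> {
  const user = await authenticate(name);
  await sleep(workMs);
  return user.name;
}

async function demo(): Promise<{ unsafe: string[]; safe: string[] }> {
  // "alice" does slow work, so "bob" overwrites the global before she reads it.
  const unsafe = await Promise.all([unsafeHandler("alice", 30), unsafeHandler("bob", 1)]);
  const safe = await Promise.all([safeHandler("alice", 30), safeHandler("bob", 1)]);
  return { unsafe, safe };
}
```

In the unsafe run, both callers end up with the same name, so one of them receives another user's identity. The safe run keeps each name with its own request.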

Error isolation

Under Fluid Compute, when an uncaught exception or unhandled rejection occurs in Node.js, the error is logged and the other in-flight requests are allowed to complete before the process stops. One broken request does not take down the requests sharing its instance. The error is not swallowed, however, so each request is still expected to handle its own exceptions properly.
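One way to handle exceptions within each request is a small wrapper around the handler. This is a sketch under assumptions: withErrorBoundary and Handler are hypothetical names, not part of any Vercel API, and the log shape is only an example.

```typescript
// Sketch: a per-request error boundary. `withErrorBoundary` and `Handler`
// are hypothetical names, not part of any Vercel API.
type Handler = (request: Request) => Promise<Response>;

function withErrorBoundary(handler: Handler): Handler {
  return async (request) => {
    try {
      return await handler(request);
    } catch (error) {
      // Log with enough context to trace the failing request.
      console.error("request failed", { url: request.url, error });
      return Response.json({ error: "Internal Server Error" }, { status: 500 });
    }
  };
}

// In a route file this would be exported, e.g. `export const GET = ...`.
const GET = withErrorBoundary(async () => {
  throw new Error("downstream service unavailable");
});
```

The failure becomes a 500 response for that one caller instead of an uncaught exception that stops the process.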

Choosing a runtime

Fluid Compute runs on the following runtimes.

Runtime	Optimized concurrency	Where to use
Node.js 24 LTS (default)	✅	Most apps. Full Node.js API. Node 18 is deprecated
Python (3.13/3.14)	✅	FastAPI, etc. Data/ML adjacent
Edge	-	Lightweight, ultra-low latency, but with compatibility constraints (for new projects, Fluid Compute on Node.js is generally recommended)
Bun	-	Bun-native processing
Rust	-	CPU-intensive or latency-critical components

The 2026 guideline: instead of reaching for Edge "because I want it fast," start with Fluid Compute (Node.js). Edge and Middleware run internally on Vercel Functions, and Fluid gives you ordinary Node.js in the same region at the same price.

Make maxDuration and memory explicit

The default timeout is 300 seconds on all plans. Pro and Enterprise can raise it to 800 seconds (GA), and an extended 1800 seconds is in beta (set per function). Rather than relying on the default, set the value explicitly for each use case.

// app/api/report/route.ts
export const maxDuration = 60; // this function may run for at most 60 seconds
export const runtime = "nodejs"; // the default; stating it makes the intent clear

export async function GET() {
  return Response.json(await buildHeavyReport());
}

Settings are resolved in the order function code > vercel.json > dashboard > Fluid default, so a value written in code always wins. Memory goes up to 4 GB / 2 vCPU on Pro and Enterprise, and 2 GB / 1 vCPU on Hobby.
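The priority order stated above can be written as a simple fallback chain. This only illustrates the rule; Vercel resolves the value internally, and DurationSources and resolveMaxDuration are hypothetical names.

```typescript
// Sketch of the precedence rule only; Vercel resolves this internally.
// `DurationSources` and `resolveMaxDuration` are hypothetical names.
type DurationSources = {
  functionCode?: number; // export const maxDuration = ...
  vercelJson?: number;   // value set in vercel.json
  dashboard?: number;    // project setting in the dashboard
};

const FLUID_DEFAULT_SECONDS = 300; // default quoted in this article

function resolveMaxDuration(sources: DurationSources): number {
  // The first defined value wins, in priority order.
  return (
    sources.functionCode ??
    sources.vercelJson ??
    sources.dashboard ??
    FLUID_DEFAULT_SECONDS
  );
}
```

For example, a function that exports maxDuration = 60 keeps 60 seconds even if vercel.json or the dashboard specify a larger value.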

Streaming: return the first byte fast

For LLM generation or reports that are produced sequentially, sending output piece by piece instead of waiting for completion transforms the UX. The standard ReadableStream is all you need.

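// app/api/stream/route.ts: stream text incrementally
export async function GET() {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        // generateChunks() is assumed to return an async iterable, so each
        // chunk is sent as soon as it is produced
        for await (const chunk of generateChunks()) {
          controller.enqueue(encoder.encode(chunk));
        }
        controller.close();
      } catch (error) {
        controller.error(error); // surface generator failures to the client
      }
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-store", // do not cache streamed responses
    },
  });
}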

Edge runtime constraint: on Edge, the function must begin sending the response within 25 seconds or it loses the ability to stream; once started, it can keep streaming for up to 300 seconds. For an AI chat UI, the Vercel AI SDK handles this kind of streaming in a type-safe way.
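As a variation on the streaming handler above, the sketch below produces one chunk per pull, so a slow client applies backpressure to the generator, and then reads the body chunk by chunk the way a fetch client would. toStreamResponse, fakeTokens, and readAll are hypothetical names; fakeTokens stands in for a real model or report generator.

```typescript
// Sketch: turn an async iterable into a streamed Response, one chunk per
// pull. `toStreamResponse` and `fakeTokens` are hypothetical names.
function toStreamResponse(source: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = source[Symbol.asyncIterator]();

  const stream = new ReadableStream<Uint8Array>({
    // pull() runs only when the consumer wants more data (backpressure).
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
    // If the client disconnects, stop the generator as well.
    async cancel() {
      await iterator.return?.();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8", "Cache-Control": "no-store" },
  });
}

async function* fakeTokens(): AsyncGenerator<string> {
  for (const token of ["Hello", ", ", "world"]) {
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated latency
    yield token;
  }
}

// Read the body chunk by chunk, as a browser or fetch client would.
async function readAll(response: Response): Promise<string[]> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  const chunks: string[] = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(decoder.decode(value, { stream: true }));
  }
  return chunks;
}
```

Because the first token is enqueued as soon as it exists, the response starts long before the last token is generated.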

waitUntil: post-response background processing

"I want to respond to the user immediately, but logging, analytics, webhook forwarding, and cache updates must still happen." waitUntil covers this common requirement: it keeps the instance alive after the response is returned until the post-processing completes.

import { waitUntil } from "@vercel/functions";

export async function POST(request: Request) {
  const event = await request.json();

  const result = await processOrder(event); // work the user waits for

  // Return the response right away; post-processing continues in the background
  waitUntil(
    Promise.allSettled([
      logToAnalytics(event),          // analytics
      sendSlackNotification(result),  // notification
      revalidateRelatedCaches(result) // cache update
    ])
  );

  return Response.json({ ok: true, orderId: result.id });
}

Pair it with idempotency: design waitUntil post-processing and webhook handlers to be idempotent, assuming at-least-once delivery, so that the same event arriving twice never causes a double notification or a double charge. Idempotency keys follow the same principle described in the payment idempotency guide. Wrapping the tasks in Promise.allSettled also matters, because it keeps one failed task from taking the others down with it.
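A minimal sketch of the idempotency-key idea follows. KeyStore, InMemoryKeyStore, notifyOnce, and the key format are all hypothetical. The in-memory store is for illustration only: instances do not share memory, so a real deployment would need a shared store such as a unique database column.

```typescript
// Sketch: at-least-once delivery made safe with an idempotency key.
type OrderEvent = { id: string; amount: number };

interface KeyStore {
  // Resolves to true only for the first caller that claims the key.
  claim(key: string): Promise<boolean>;
}

// Illustration only. Instances do not share memory, so production code
// needs a shared store behind the same interface.
class InMemoryKeyStore implements KeyStore {
  private readonly keys = new Set<string>();

  async claim(key: string): Promise<boolean> {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

async function notifyOnce(
  store: KeyStore,
  event: OrderEvent,
  send: (message: string) => Promise<void>,
): Promise<boolean> {
  const key = `notify:${event.id}`; // hypothetical key format
  if (!(await store.claim(key))) return false; // duplicate delivery: skip
  await send(`order ${event.id}: ${event.amount}`);
  return true;
}
```

Claiming the key before sending favors "never notify twice" over "always notify"; a design that must not lose notifications would release the key when send fails.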

Cron Jobs: scheduled execution safely

Backups, notifications, subscription-quantity updates: periodic work like this runs on Cron. Vercel triggers a job by sending an HTTP GET to the production deployment URL.

Definition

// vercel.json
{
  "$schema": "https://openapi.vercel.sh/vercel.json",
  "crons": [
    { "path": "/api/cron/cleanup", "schedule": "0 0 * * *" },
    { "path": "/api/cron/digest",  "schedule": "0 9 * * 1" }
  ]
}

Notes on cron expressions: the timezone is always UTC. Aliases such as MON or JAN are not supported. Day of month (DoM) and day of week (DoW) cannot both be specified; set one of them to *.
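These notes can be checked mechanically before deploying. The helper below is a hypothetical lint, not a Vercel API. It covers only the rules listed above and does not validate numeric ranges.

```typescript
// Sketch: lint a cron expression against the notes above.
// `checkCronExpression` is a hypothetical helper, not a Vercel API, and it
// does not validate numeric ranges.
function checkCronExpression(expression: string): string[] {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    return [`expected 5 fields, got ${fields.length}`];
  }

  const problems: string[] = [];
  const [, , dayOfMonth, , dayOfWeek] = fields;

  if (fields.some((field) => /[a-z]/i.test(field))) {
    problems.push("named aliases such as MON or JAN are not supported");
  }
  if (dayOfMonth !== "*" && dayOfWeek !== "*") {
    problems.push("set either day-of-month or day-of-week to *");
  }
  return problems;
}

// Timezone reminder: 09:00 JST (UTC+9) is 00:00 UTC, so a job meant for
// 09:00 JST every day is written as "0 0 * * *".
```

An empty result means the expression passes these three checks; anything else lists what to fix.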

Always protect it (CRON_SECRET)

A Cron path is a public URL that anyone can hit, so authorize requests with CRON_SECRET. When CRON_SECRET is set as an environment variable, Vercel attaches Authorization: Bearer <CRON_SECRET> to each Cron-triggered request.

// app/api/cron/cleanup/route.ts
export async function GET(request: Request) {
  // ① Authorize with the secret token (rejects triggers from outside).
  //    Fail closed if CRON_SECRET is unset, so "Bearer undefined" never matches.
  const secret = process.env.CRON_SECRET;
  const auth = request.headers.get("authorization");
  if (!secret || auth !== `Bearer ${secret}`) {
    return new Response("Unauthorized", { status: 401 });
  }

  // ② When several Cron entries share one path, identify which schedule fired
  const schedule = request.headers.get("x-vercel-cron-schedule"); // e.g. "0 0 * * *"

  // ③ Idempotent: safe even if triggered twice for the same time slot
  const deleted = await deleteExpiredSessions();

  return Response.json({ ok: true, schedule, deleted });
}

// Generating the environment variable (locally):
// generate with `openssl rand -hex 32`, then register with `vercel env add CRON_SECRET production`

Cron-triggered requests carry User-Agent: vercel-cron/1.0, so you can check that as well if needed.
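If the check is shared by several Cron routes, it can be pulled into a helper. The sketch below is optional hardening and an assumption on top of the article: isAuthorizedCron is a hypothetical name, and the constant-time comparison via Node's crypto.timingSafeEqual goes beyond the plain string comparison shown above.

```typescript
import { timingSafeEqual } from "node:crypto";

// Sketch: authorization check for a Cron route. `isAuthorizedCron` is a
// hypothetical helper. The constant-time comparison is optional hardening.
function isAuthorizedCron(headers: Headers, secret: string | undefined): boolean {
  // Fail closed: with no secret configured, nothing is authorized.
  if (!secret) return false;

  const expected = Buffer.from(`Bearer ${secret}`);
  const received = Buffer.from(headers.get("authorization") ?? "");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

In a route handler this would be called as isAuthorizedCron(request.headers, process.env.CRON_SECRET), returning 401 when it is false.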

Split off heavy Cron work: Cron functions are subject to maxDuration like any other function. For bulk processing that takes more than a few minutes, the safe design is for the Cron to only kick off the job and hand the actual work to Workflows or a queue.
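The "just kick" pattern might look like the sketch below. Enqueue is a hypothetical interface standing in for whatever queue or workflow trigger the project uses, and the batch size of 100 is an arbitrary example.

```typescript
// Sketch: the Cron handler only plans and enqueues work. `Enqueue` is a
// hypothetical interface for whatever queue or workflow trigger is in use.
type Enqueue = (batch: string[]) => Promise<void>;

function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

async function kickCleanup(ids: string[], enqueue: Enqueue): Promise<number> {
  const batches = toBatches(ids, 100); // 100 per batch is an arbitrary example
  // Enqueueing is quick, so the Cron function returns long before the
  // actual work finishes elsewhere.
  await Promise.all(batches.map((batch) => enqueue(batch)));
  return batches.length;
}
```

Each batch is then processed by a separate worker invocation with its own time budget, and the workers should be idempotent for the same reasons as the Cron handler itself.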

Graceful shutdown

Fluid Compute sends a signal before an instance is terminated for scale-in or deployment. Using it to finish in-flight requests, close connections, and flush buffers prevents dropped work during deployments.

// Connection cleanup example (register once, at module scope)
process.on("SIGTERM", async () => {
  await pool.end();        // close the DB connection pool
  await flushTelemetry();  // flush buffered telemetry
});
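When several modules need cleanup, a small registry keeps the SIGTERM handler in one place. This is a sketch with hypothetical helper names (onShutdown, runShutdown). It runs every hook exactly once and lets one failing hook not block the others.

```typescript
// Sketch: run cleanup hooks once, even if shutdown is requested twice.
// `onShutdown` and `runShutdown` are hypothetical helper names.
type Hook = () => Promise<void> | void;

const hooks: Hook[] = [];
let shutdownPromise: Promise<void> | null = null;

function onShutdown(hook: Hook): void {
  hooks.push(hook);
}

function runShutdown(): Promise<void> {
  // Reuse the first run so hooks never execute twice. allSettled keeps one
  // failing hook from preventing the others from finishing.
  shutdownPromise ??= Promise.allSettled(hooks.map(async (hook) => hook())).then(
    () => undefined,
  );
  return shutdownPromise;
}

process.on("SIGTERM", () => {
  void runShutdown();
});
```

Modules would then register their own cleanup, for example onShutdown(() => pool.end()), instead of each adding a separate signal listener.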

File descriptors are capped at 1,024, shared across concurrent executions. Leak connections and you hit "too many open files." Use a connection pool and close it when done; this matters all the more under Fluid's concurrency.

Production checklist (Functions)

No request-specific data in module-scope globals
Globals are only request-independent things like a DB pool or config
Make maxDuration and memory explicit per use
Separate heavy/long processing into Workflows, a queue, or a job
Streamed responses use no-store; on Edge, start responding within 25 seconds
Post-processing uses waitUntil + Promise.allSettled and is idempotent
Always protect Cron with CRON_SECRET; mind UTC, idempotency, and DoM/DoW exclusivity
Pool connections and close them on SIGTERM; no FD leaks

Conclusion

Fluid Compute reconciles "serverless ease" with "server efficiency," but in exchange it demands discipline around shared global state.

Confine request-specific data to function scope (most important, directly tied to security)
The more I/O-bound the workload, the more concurrency and Active CPU billing cut its cost
Post-processing with waitUntil, fast initial response with streaming
Always protect Cron with CRON_SECRET, idempotently
Separate long-running processing to Workflows

Next, go to the caching/ISR/Cache Components guide, which speeds up the response you return.

This article is based on the official documentation of Vercel Functions / Fluid Compute / Cron Jobs (as of June 2026). The spec and limits get updated, so confirm the latest values in the official docs when adopting in production.

