# Vercel Functions × Fluid Compute implementation guide: concurrency, streaming, waitUntil, and Cron at production quality

> A Vercel Functions implementation guide faithful to the official docs. With real code, it systematizes Fluid Compute's optimized concurrency (the default) and the global-state trap, the Node.js/Python/Bun/Rust runtimes, streaming, post-response processing with waitUntil, maxDuration/memory settings, protecting Cron Jobs with CRON_SECRET, graceful shutdown, and idempotency.

- Published: 2026-06-28
- Author: 友田 陽大
- Tags: Vercel, Fluid Compute, Serverless, Next.js, TypeScript, Observability, Cost optimization
- URL: https://tomodahinata.com/en/blog/vercel-functions-fluid-compute-streaming-cron-guide
- Category: Vercel in production
- Pillar guide: https://tomodahinata.com/en/blog/vercel-production-platform-guide

## Key points

- Fluid Compute processes multiple requests concurrently on one instance. This reduces cold starts and cost, but because concurrent requests share the process (and its global state), putting request-specific state in module scope leaks data between requests. Share only request-independent resources such as a DB connection pool.
- The runtimes are Node.js 24 LTS (default), Python, Edge, Bun, and Rust. Optimized concurrency works for Node.js and Python. The more I/O-bound the work, the greater the benefit of concurrency.
- waitUntil runs background processing after the response is sent (logging, analytics, webhook forwarding). It is the standard production pattern for finishing post-processing without delaying the response to the user.
- Streaming is implemented with a ReadableStream. On the Edge runtime, a response must start within 25 seconds and can then stream for up to 300 seconds. Offload long-running processing to Workflows.
- Vercel triggers Cron Jobs with an HTTP GET. Always protect them with CRON_SECRET and the Authorization header, and use the x-vercel-cron-schedule header to distinguish multiple jobs. The timezone is always UTC; design jobs to be idempotent.

---

Getting Vercel Functions to "put a file in `api/` and it works" is easy. What's hard is **production quality**: data must not mix across concurrent requests, cost must not balloon during I/O waits, post-processing must not be dropped, and Cron endpoints must not be left open to anyone. Staying faithful to the official specs of [Vercel Functions](https://vercel.com/docs/functions) and [Fluid Compute](https://vercel.com/docs/fluid-compute), this article shows in real code how to build **functions that don't fall over, are traceable, and are cheap.**

For the big picture (layers other than compute), see the [Vercel production-operations guide](/blog/vercel-production-platform-guide), and for billing details, the [Active CPU optimization guide](/blog/vercel-cost-active-cpu-pricing-optimization-guide). This piece concentrates on "how to write Functions."

---

## Fluid Compute: multiple requests on one instance

### Why it gets faster and cheaper

Traditional serverless was "1 request = 1 instance (microVM)." With that model,

- a **cold start** can occur on every request
- while the function is **waiting** for a DB or AI response, the instance is tied up by that one request and sits idle

With Fluid Compute, **one function instance processes multiple invocations concurrently**; the official docs call this "optimized concurrency." Because the same instance can handle another request while one is waiting on I/O, **cold starts decrease, fewer instances are needed in total, and cost drops.** It's especially effective for I/O-bound work like AI (embeddings, vector search, external APIs).

```ts
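// app/api/recommend/route.ts
// A typical I/O-bound handler. With Fluid Compute, while this request is
// waiting on an `await`, the same instance can process other requests.
// fetchEmbedding, db, and vectorSearch are app-specific helpers (not shown).
export async function POST(request: Request) {
  const { userId } = await request.json();

  // Both calls are external I/O: they use almost no CPU, so waiting on them
  // adds little to Active CPU billing.
  const [embedding, profile] = await Promise.all([
    fetchEmbedding(userId),   // AI API
    db.users.findById(userId) // DB
  ]);

  const items = await vectorSearch(embedding); // vector DB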
  return Response.json({ items, profile });
}
```
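To make the "another request during the `await`" point concrete, here is a toy model in plain TypeScript. It is not Vercel's scheduler: `fakeIo`, `handleRequest`, and `simulateConcurrentRequests` are hypothetical names, and a timer stands in for a DB or AI call. It only shows that a single Node.js process can start request B while request A is still waiting.

```typescript
// Toy model of optimized concurrency in one Node.js process (not Vercel's scheduler).
// fakeIo stands in for a DB or AI API call: it waits on a timer and uses no CPU.
export function fakeIo(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// A hypothetical request handler that records when it starts and finishes.
export async function handleRequest(id: string, ioMs: number, trace: string[]): Promise<void> {
  trace.push(`${id}:start`);
  await fakeIo(ioMs); // while this request waits, the event loop can run others
  trace.push(`${id}:end`);
}

// Two requests in flight at the same time on the same process.
export async function simulateConcurrentRequests(): Promise<string[]> {
  const trace: string[] = [];
  await Promise.all([
    handleRequest("A", 20, trace), // slower I/O
    handleRequest("B", 10, trace), // faster I/O
  ]);
  return trace;
}
```

Running `simulateConcurrentRequests()` records `A:start`, `B:start`, `B:end`, `A:end`: B began and even finished while A was parked on I/O. A CPU-heavy handler would not leave such gaps, which is why the benefit grows with how I/O-bound the work is.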

### The biggest trap: shared global state

At its core, Fluid Compute means **multiple requests share the same process, and therefore the same global state.** That is a performance advantage and, at the same time, **the most common source of security bugs.**

```ts
// ❌ Dangerous: request-specific data in module scope
let currentUser: User | null = null; // shared by every request in the process!

export async function GET(request: Request) {
  currentUser = await authenticate(request); // a concurrent request can overwrite this
  return Response.json(await getDashboard(currentUser)); // may return another user's data
}
```

```ts
// ✅ Safe: keep request-specific data in function scope
export async function GET(request: Request) {
  const user = await authenticate(request); // local variable, separate per request
  return Response.json(await getDashboard(user));
}

// ✅ Only request-independent values belong in module scope
//    (DB connection pool, config, compiled schemas, etc.)
const pool = createPool(process.env.DATABASE_URL!);
```

> **The discipline**: never put **user-, token-, or tenant-derived values** in a module-scope `let` or mutable object. Share only things that are identical and safe for every request. This is the same discipline as writing a long-running Node.js server, but developers migrating from traditional serverless overlook it most often.
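The race can be reproduced outside Vercel. The sketch below (plain Node.js; the handler names and the `sleep` helper are hypothetical) runs two overlapping "requests" against a module-scope variable, then does the same with Node's `AsyncLocalStorage`. That API is not something this article otherwise relies on; it is one way to get request-scoped context without threading a parameter through every call.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// ❌ Module-scope state: one variable shared by every request in the process.
let currentUser: string | null = null;

export async function unsafeHandler(user: string, ioMs: number): Promise<string | null> {
  currentUser = user; // this request "authenticates"
  await sleep(ioMs);  // another request can overwrite currentUser meanwhile
  return currentUser; // may now hold someone else's user
}

// ✅ AsyncLocalStorage: each run() gets its own store that follows its async chain.
const requestContext = new AsyncLocalStorage<{ user: string }>();

export function safeHandler(user: string, ioMs: number): Promise<string | undefined> {
  return requestContext.run({ user }, async () => {
    await sleep(ioMs);
    return requestContext.getStore()?.user; // still this request's user
  });
}
```

Awaiting `Promise.all([unsafeHandler("alice", 20), unsafeHandler("bob", 5)])` yields `["bob", "bob"]`: alice's request ends up reading bob's identity. The `AsyncLocalStorage` version returns `["alice", "bob"]`. Passing the user as a plain argument, as in the ✅ example, remains the simplest option.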

### Error isolation

When an **uncaught exception or unhandled rejection** occurs in Node.js, Fluid Compute logs the error and **lets the other in-flight requests complete before** stopping the process, so one broken request doesn't take down the requests sharing its instance. The error isn't swallowed, though, and the process still stops, so you should **handle exceptions within each request.**
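One way to follow that advice is to wrap each route handler so a failure becomes an error response for that request alone. `withErrorBoundary` below is a hypothetical helper, not part of Vercel's API; it assumes the Web `Request`/`Response` globals available in the Node.js runtime.

```typescript
// Hypothetical helper: turn an exception in one request into a 500 for that
// request only, instead of letting it escape as an uncaught error that stops
// the process shared with other in-flight requests.
export type RouteHandler = (request: Request) => Promise<Response>;

export function withErrorBoundary(handler: RouteHandler): RouteHandler {
  return async (request) => {
    try {
      // `return await` (not plain `return`) so a rejected promise lands in catch
      return await handler(request);
    } catch (err) {
      console.error("request failed", { url: request.url, err });
      return new Response(JSON.stringify({ error: "Internal Server Error" }), {
        status: 500,
        headers: { "Content-Type": "application/json" },
      });
    }
  };
}
```

A route would then export `export const GET = withErrorBoundary(async (request) => { ... })`. The `return await` matters: with a plain `return`, the rejected promise would be handed back to the caller and bypass the `catch`.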

---

## Choosing a runtime

Fluid Compute runs on these runtimes ([runtimes](https://vercel.com/docs/functions/runtimes)).

| Runtime | Optimized concurrency | Where to use |
|---|---|---|
| **Node.js 24 LTS (default)** | ✅ | Most apps. Full Node.js API. Node 18 is deprecated |
| **Python** (3.13/3.14) | ✅ | FastAPI, etc. Data/ML adjacent |
| **Edge** | — | Lightweight, very low latency, but limited compatibility (**for new projects, Fluid/Node.js is generally recommended**) |
| **Bun** | — | Bun-native processing |
| **Rust** | — | CPU-intensive or latency-critical parts |

> The 2026 guideline: **don't pick Edge just because you want speed; start with Fluid Compute (Node.js).** Edge and Middleware run on Vercel Functions internally, and Fluid lets you use ordinary Node.js in the same region at the same price.

### Make maxDuration and memory explicit

The default timeout is **300 seconds on all plans.** Pro and Enterprise can raise it to 800 seconds (GA), and an extended 1,800 seconds is in beta (configured per function). **Don't rely on the default**; set it explicitly for each use case.

```ts
// app/api/report/route.ts
export const maxDuration = 60; // this function may run for at most 60 (seconds)
export const runtime = "nodejs"; // the default; stating it explicitly documents intent

export async function GET() {
  return Response.json(await buildHeavyReport());
}
```

Settings are resolved in the order **function code > vercel.json > dashboard > Fluid default**, so a value written in code takes top priority. Memory goes up to 4 GB / 2 vCPU on Pro/Enterprise and 2 GB / 1 vCPU on Hobby.
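Because the platform stops a function that exceeds `maxDuration`, it can help to give the work an internal budget slightly below that limit so the handler still gets to return a clean error. The sketch below is one way to do that. `withDeadline` and `DeadlineExceededError` are hypothetical names, and the 5-second margin in the note after it is an arbitrary assumption.

```typescript
// Hypothetical helper: race work against an internal budget so the handler can
// answer with a clean error before the platform's maxDuration cuts it off.
export class DeadlineExceededError extends Error {
  constructor(ms: number) {
    super(`work exceeded its ${ms} ms budget`);
    this.name = "DeadlineExceededError";
  }
}

export async function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new DeadlineExceededError(ms)), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer); // stop the timer once either side settles
  }
}
```

In the report route, `await withDeadline(buildHeavyReport(), (maxDuration - 5) * 1000)` inside a `try` would let you map `DeadlineExceededError` to a 503. `Promise.race` does not cancel the losing promise. The underlying work keeps running unless it accepts an `AbortSignal` or similar.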

---

## Streaming: return the first byte fast

For LLM generation or incremental report output, **sending data as it is produced instead of waiting for completion** transforms the UX. You can implement it with the standard `ReadableStream`.

```ts
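// app/api/stream/route.ts: stream text as it is generated
// generateChunks() is an app-specific async generator (not shown) that yields
// string chunks as they become available.
export async function GET() {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        // `for await` forwards each chunk as soon as it is produced; awaiting a
        // fully built array first would hold everything back until the end.
        for await (const chunk of generateChunks()) {
          controller.enqueue(encoder.encode(chunk));
        }
        controller.close();
      } catch (err) {
        controller.error(err); // surface failures to the client instead of hanging
      }
    },
  });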

  return new Response(stream, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-store", // don't cache streamed responses
    },
  });
}
```

> **Edge runtime constraint**: on Edge, you must **start sending the response within 25 seconds** to keep streaming; after that, you can continue streaming for **up to 300 seconds** ([limits](https://vercel.com/docs/functions/limitations)). For an AI chat UI, the [Vercel AI SDK](/blog/vercel-ai-sdk-production-llm-apps-streaming-tools-rag) lets you handle this kind of streaming type-safely.
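A pull-based variant of the streaming route is sketched below, assuming the chunks come from an async iterable. `pull()` is only called when the consumer is ready for more data, which gives backpressure. `cancel()` runs when the reader cancels the stream (typically how a client disconnect surfaces), so the generator can stop and clean up. `iteratorToStream` is a hypothetical helper name.

```typescript
// Sketch: a ReadableStream that pulls from an async iterator on demand.
// pull() runs only when the consumer wants more data (backpressure);
// cancel() runs when the stream is cancelled, e.g. after a client disconnect.
export function iteratorToStream(source: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = source[Symbol.asyncIterator]();
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      try {
        const result = await iterator.next();
        if (result.done) {
          controller.close();
        } else {
          controller.enqueue(encoder.encode(result.value));
        }
      } catch (err) {
        controller.error(err); // propagate generator failures to the reader
      }
    },
    async cancel() {
      await iterator.return?.(); // lets a generator run its finally blocks
    },
  });
}
```

In the route, `return new Response(iteratorToStream(generateChunks()), { headers })` would replace the hand-written `start()` loop while keeping the same headers.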

---

## waitUntil: post-response background processing

"Respond to the user immediately, but make sure logging, analytics, webhook forwarding, and cache updates still happen": `waitUntil` is built for exactly this common requirement. It keeps the instance alive **after** the response is returned until the post-processing completes.

```ts
import { waitUntil } from "@vercel/functions";

export async function POST(request: Request) {
  const event = await request.json();

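  const result = await processOrder(event); // work the user waits for

  // Respond immediately; post-processing continues in the background.
  // processOrder and the three tasks below are app-specific helpers (not shown).
  waitUntil(
    Promise.allSettled([
      logToAnalytics(event),          // analytics
      sendSlackNotification(result),  // notification
      revalidateRelatedCaches(result) // cache refresh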
    ])
  );

  return Response.json({ ok: true, orderId: result.id });
}
```

> **Pair it with idempotency**: design `waitUntil` post-processing and webhook handlers to be **idempotent**, assuming at-least-once delivery, so that a duplicate event never causes a double notification or double charge. Idempotency keys follow the same principle as in the [payment idempotency guide](/blog/stripe-payments-production-guide-webhooks-idempotency-subscriptions). Also note the use of `Promise.allSettled`, so that one failing task doesn't take down the others.
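A minimal sketch of the idempotency-key idea follows. The `IdempotencyStore` interface and the `runOnce` helper are hypothetical. The in-memory store only deduplicates within one instance: under Fluid Compute, deliveries of the same event can land on different instances, so a production store must be shared (for example a database table with a unique constraint on the key).

```typescript
// Minimal idempotency-key sketch. InMemoryIdempotencyStore only deduplicates
// within one instance; a production store must be shared across instances.
export interface IdempotencyStore {
  claim(key: string): Promise<boolean>; // true if newly claimed, false if seen
  release(key: string): Promise<void>;  // undo a claim so the task can be retried
}

export class InMemoryIdempotencyStore implements IdempotencyStore {
  private readonly seen = new Set<string>();

  async claim(key: string): Promise<boolean> {
    if (this.seen.has(key)) return false;
    this.seen.add(key); // check-and-add with no await in between
    return true;
  }

  async release(key: string): Promise<void> {
    this.seen.delete(key);
  }
}

// Run `task` once per key; if it throws, the key is released so a retry can run.
export async function runOnce(
  store: IdempotencyStore,
  key: string,
  task: () => Promise<void>,
): Promise<"ran" | "skipped"> {
  if (!(await store.claim(key))) return "skipped";
  try {
    await task();
  } catch (err) {
    await store.release(key);
    throw err;
  }
  return "ran";
}
```

A post-processing task could then be wrapped as `runOnce(store, \`slack:${result.id}\`, () => sendSlackNotification(result))` inside the `Promise.allSettled` array. Releasing the key on failure keeps retries possible. Without it, a failed first attempt would block every later retry.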

---

## Cron Jobs: scheduled execution safely

Backups, notifications, subscription-quantity updates: run periodic work like this with Cron. Vercel triggers a job by sending an **HTTP GET** request to its path on the production deployment ([Cron Jobs](https://vercel.com/docs/cron-jobs)).

### Definition

Register the jobs in `vercel.json`:

```json
{
  "$schema": "https://openapi.vercel.sh/vercel.json",
  "crons": [
    { "path": "/api/cron/cleanup", "schedule": "0 0 * * *" },
    { "path": "/api/cron/digest",  "schedule": "0 9 * * 1" }
  ]
}
```

Notes on cron expressions: **the timezone is always UTC.** Named aliases like `MON`/`JAN` aren't supported. **You can't set both day of month (DoM) and day of week (DoW)**; one of them must be `*`.
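These rules are easy to check before deploying. `checkVercelCron` below is a hypothetical lint helper, not Vercel's own validator. It only checks the field count, named aliases in the month and weekday fields, and DoM/DoW exclusivity.

```typescript
// Hypothetical lint helper for the rules above (NOT Vercel's validator).
// Returns a list of problems; an empty list means none of these rules fired.
export function checkVercelCron(expr: string): string[] {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) {
    return [`expected 5 fields (minute hour day-of-month month day-of-week), got ${fields.length}`];
  }
  const [, , dayOfMonth = "", month = "", dayOfWeek = ""] = fields;
  const problems: string[] = [];
  if (/[a-z]/i.test(month) || /[a-z]/i.test(dayOfWeek)) {
    problems.push("named aliases such as MON or JAN are not supported");
  }
  if (dayOfMonth !== "*" && dayOfWeek !== "*") {
    problems.push("day of month and day of week cannot both be set; make one of them '*'");
  }
  return problems;
}
```

UTC still has to be handled by hand. A job meant to run at 09:00 in Japan (UTC+9) is `0 0 * * *`, and `0 9 * * 1` in the example above fires on Mondays at 09:00 UTC, which is 18:00 in Japan.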

### Always protect it (CRON_SECRET)

A Cron path is a public URL, so **anyone can hit it**; authorize requests with `CRON_SECRET`. When you set `CRON_SECRET` as an environment variable, Vercel sends `Authorization: Bearer <CRON_SECRET>` with every Cron invocation.

```ts
// app/api/cron/cleanup/route.ts
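export async function GET(request: Request) {
  // ① Authorize with the secret token to reject unauthorized external calls.
  //    Fail closed when CRON_SECRET is unset; otherwise the expected value would
  //    be the literal string "Bearer undefined", which anyone could send.
  const secret = process.env.CRON_SECRET;
  const auth = request.headers.get("authorization");
  if (!secret || auth !== `Bearer ${secret}`) {
    return new Response("Unauthorized", { status: 401 });
  }

  // ② If several Cron jobs share this path, tell which schedule fired
  const schedule = request.headers.get("x-vercel-cron-schedule"); // e.g. "0 0 * * *"

  // ③ Idempotent: safe even if triggered twice for the same time slot.
  //    deleteExpiredSessions is an app-specific helper (not shown).
  const deleted = await deleteExpiredSessions();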

  return Response.json({ ok: true, schedule, deleted });
}
```

```shell
# Generate a random secret locally
openssl rand -hex 32
# Register it for the production environment (the CLI prompts for the value)
vercel env add CRON_SECRET production
```

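Cron-triggered requests carry `User-Agent: vercel-cron/1.0`, which you can check as well if needed. Any client can set that header, so treat it as a supplement to `CRON_SECRET`, never a replacement.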
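The plain `!==` comparison is the simplest check. A constant-time comparison additionally avoids leaking information through response timing, and failing closed guards against an unset `CRON_SECRET`. The sketch below uses Node's `crypto.timingSafeEqual`; `isAuthorizedCron` is a hypothetical helper name.

```typescript
import { Buffer } from "node:buffer";
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper: constant-time check of the Cron Authorization header.
// Fails closed when CRON_SECRET is unset or the header is missing.
export function isAuthorizedCron(authHeader: string | null, secret: string | undefined): boolean {
  if (!secret || !authHeader) return false;
  const expected = Buffer.from(`Bearer ${secret}`);
  const actual = Buffer.from(authHeader);
  // timingSafeEqual throws if lengths differ, so compare lengths first;
  // this reveals only the length, not the secret's content.
  if (actual.length !== expected.length) return false;
  return timingSafeEqual(actual, expected);
}
```

In the route, the first check would become `if (!isAuthorizedCron(request.headers.get("authorization"), process.env.CRON_SECRET)) return new Response("Unauthorized", { status: 401 });`.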

> **Separate heavy Cron work**: a Cron function is also bound by `maxDuration`. For bulk processing that takes more than a few minutes, it's safer to have the Cron job just kick things off and hand the actual work to **[Workflows](https://vercel.com/docs/workflows) / a queue.**
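Until the work is moved to Workflows or a queue, one interim pattern is to process in bounded batches and stop before the time budget runs out, returning a cursor that the caller persists so the next run resumes. This is a different technique from the hand-off described above. `drainWithBudget`, `processBatch`, and the cursor format are hypothetical, and the budget should sit comfortably below `maxDuration`.

```typescript
// Hypothetical sketch: process work in batches until done or until the time
// budget is spent, returning a cursor so the next run can resume.
export interface BatchResult {
  processed: number;
  nextCursor: string | null; // null when there is nothing left
}

export async function drainWithBudget(
  startCursor: string | null,
  processBatch: (cursor: string | null) => Promise<BatchResult>,
  budgetMs: number,
  now: () => number = Date.now, // injectable clock, handy for tests
): Promise<{ cursor: string | null; processed: number; finished: boolean }> {
  const deadline = now() + budgetMs;
  let cursor = startCursor;
  let processed = 0;
  do {
    const result = await processBatch(cursor);
    processed += result.processed;
    cursor = result.nextCursor;
    if (cursor === null) return { cursor, processed, finished: true };
  } while (now() < deadline); // start another batch only while budget remains
  return { cursor, processed, finished: false };
}
```

When `finished` is false, the handler stores `cursor` somewhere durable and the next scheduled run picks up from there. A batch that starts just before the deadline can still run over, so leave headroom for one batch. Each batch must also be idempotent, because a batch cut off mid-way will be processed again.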

---

## Graceful shutdown

Before terminating an instance on scale-in or deployment, Fluid Compute sends it a signal. Handling that signal by **letting in-flight requests finish, closing connections, and flushing buffers** prevents work from being dropped during deployments.

```ts
// Connection cleanup example (register once, at module scope)
process.on("SIGTERM", async () => {
  await pool.end();        // close the DB connection pool
  await flushTelemetry();  // flush buffered telemetry
});
```
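The handler above can be hardened a little: a signal may arrive more than once, one failing cleanup step shouldn't skip the others, and waiting forever on a stuck connection helps nobody. `makeShutdown` below is a hypothetical helper sketching those three guards. The 5-second timeout in the usage comment is an arbitrary assumption, not a Vercel limit.

```typescript
// Hypothetical helper: run cleanup tasks once, tolerate individual failures,
// and stop waiting after timeoutMs.
export function makeShutdown(
  tasks: Array<() => Promise<void>>,
  timeoutMs: number,
): () => Promise<void> {
  let running: Promise<void> | null = null;

  return () => {
    // Signals can arrive more than once; reuse the first shutdown attempt.
    if (running === null) {
      running = (async () => {
        let timer: ReturnType<typeof setTimeout> | undefined;
        const timeout = new Promise<void>((resolve) => {
          timer = setTimeout(resolve, timeoutMs);
        });
        // Promise.resolve().then(task) also catches tasks that throw synchronously.
        const cleanup = Promise.allSettled(tasks.map((task) => Promise.resolve().then(task))).then(
          (results) => {
            for (const r of results) {
              if (r.status === "rejected") console.error("cleanup step failed", r.reason);
            }
          },
        );
        await Promise.race([cleanup, timeout]);
        clearTimeout(timer);
      })();
    }
    return running;
  };
}

// Usage, registered once at module scope (pool and flushTelemetry as above):
// const shutdown = makeShutdown([() => pool.end(), () => flushTelemetry()], 5_000);
// process.on("SIGTERM", () => void shutdown());
```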

File descriptors are limited to **1,024, shared across concurrent executions.** Leak connections and you hit "too many open files." **Reuse connections through a pool and close them when done**; this matters even more under Fluid's concurrency.

---

## Production checklist (Functions)

- [ ] **No request-specific data** in module-scope globals
- [ ] Globals are only **request-independent** things like a DB pool or config
- [ ] **Make `maxDuration` and memory explicit** per use
- [ ] **Separate heavy/long processing into Workflows, a queue, or a job**
- [ ] Streaming is `no-store`, and for Edge start responding within 25 seconds
- [ ] Post-processing is **idempotent** with `waitUntil` + `Promise.allSettled`
- [ ] **Always protect** Cron with `CRON_SECRET`, mind UTC, idempotency, and DoM/DoW exclusivity
- [ ] Pool connections and close on `SIGTERM`, no FD leak

---

## Conclusion

Fluid Compute reconciles "serverless ease" with "server efficiency," but in exchange it demands discipline around **shared global state.**

1. **Confine request-specific data to function scope** (most important, directly tied to security)
2. The more I/O-bound the workload, **the more you save with concurrency and Active CPU billing**
3. **Post-processing with `waitUntil`, fast initial response with streaming**
4. **Always protect Cron with CRON_SECRET, idempotently**
5. **Separate long-running processing to Workflows**

Next, read the [caching/ISR/Cache Components guide](/blog/vercel-caching-isr-cache-components-ppr-guide), which covers speeding up the responses you return.

> This article is based on the official documentation of [Vercel Functions](https://vercel.com/docs/functions) / [Fluid Compute](https://vercel.com/docs/fluid-compute) / [Cron Jobs](https://vercel.com/docs/cron-jobs) (as of June 2026). Specs and limits change, so check the latest values in the official docs before adopting anything in production.
