友田 陽大
Vercel in production
Vercel
Cost optimization
Fluid Compute
Serverless
Observability
Architecture design
TypeScript

Vercel cost-optimization guide: understand the Active CPU pricing model and lower your bill

A cost-optimization guide faithful to Vercel's official docs. It explains, with real figures, the three billing axes of Fluid Compute: Active CPU (billed only for CPU execution time, not I/O wait), Provisioned Memory (GB-hr), and Invocations. It also covers region-specific unit prices, the official formula, the free tier (Hobby: 4 CPU hours / 360 GB-hr / 1M Invocations), and reduction measures via caching, maxDuration, memory, concurrency, and job separation.

Reading time
7 min read
Author
友田 陽大

"Vercel is expensive at scale" is what happens when you design around a misunderstanding of the billing model. As of 2026, Vercel Functions use Active CPU billing, which is fundamentally different from traditional serverless billing (wall-clock GB-seconds). Once you understand the model correctly, you can make design decisions that change the bill substantially for the same app. This article summarizes how cost arises and how to lower it, staying faithful to the official specs and real figures in Fluid compute pricing.

For the full picture, see the Vercel production-operations guide.


First, understand the three billing axes

Vercel Functions (Fluid Compute) billing is the sum of three axes.

Axis | What's billed | Key property
Active CPU | The time your code actually uses the CPU (ms) | Billing pauses during I/O wait. Priced per CPU-hr
Provisioned Memory | Allocated memory × instance uptime | Billing continues during I/O wait. Priced per GB-hr
Invocations | Number of incoming requests | One per request, success or failure. Pro: $0.60 per 1M

How Active CPU changes the thinking

The official explanation itself is the core of the design.

You are only billed during actual code execution and not during I/O operations (database queries, AI model calls, etc.) ... Pauses billing when your code is waiting for external services. (Source: Fluid compute pricing)

In other words, CPU billing stops while the function is waiting for a DB or AI response. For example, a function that spends 100 ms on data processing and 400 ms waiting for a DB query is billed for only 100 ms of Active CPU.

Two consequences follow.

  1. I/O-bound apps (AI, external APIs, DB-centric) get cheaper, because wait time isn't billed.
  2. CPU-bound processing (image processing, cryptography, large JSON conversion) consumes a lot of Active CPU, so moving heavy CPU work out of the function pays off directly.

However, Provisioned Memory is billed during I/O wait too. Because it is charged as memory × instance lifetime, over-allocated memory and needlessly long instance lifetimes push up the memory charge. Active CPU stops during waits but memory does not, which is easy to overlook.
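
As a minimal sketch of this distinction, the function below splits one request's timeline into the part that counts toward Active CPU and the part that counts toward Provisioned Memory. The names (`RequestTimeline`, `billableTime`) and the 2 GB memory size are illustrative assumptions, not part of any Vercel API, and the model assumes the instance serves only this one request, so its lifetime equals CPU time plus wait time.

```typescript
// Illustrative model only; these names are not part of any Vercel API.
interface RequestTimeline {
  cpuMs: number;    // time the code actually runs on the CPU
  waitMs: number;   // time spent waiting on I/O (DB, AI model, external API)
  memoryGb: number; // memory provisioned for the instance
}

interface BillableTime {
  activeCpuMs: number;     // counts toward Active CPU
  memoryGbSeconds: number; // counts toward Provisioned Memory
}

// Assumes the instance handles only this one request,
// so its lifetime is cpuMs + waitMs.
function billableTime(t: RequestTimeline): BillableTime {
  const lifetimeMs = t.cpuMs + t.waitMs;
  return {
    activeCpuMs: t.cpuMs,
    memoryGbSeconds: (t.memoryGb * lifetimeMs) / 1000,
  };
}

// The example from the text (100 ms processing, 400 ms DB wait),
// with an assumed 2 GB of memory.
const example = billableTime({ cpuMs: 100, waitMs: 400, memoryGb: 2 });
console.log(example.activeCpuMs, example.memoryGbSeconds); // 100 1
```

The CPU axis sees only the 100 ms of processing, while the memory axis sees the full 500 ms the instance was alive.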


The free tier and region-specific unit prices (real figures)

Hobby free tier (monthly)

Resource | Free tier
Active CPU | 4 hours
Provisioned Memory | 360 GB-hr
Invocations | 1 million

Pro uses on-demand billing, and a monthly Pro usage credit can offset it.

Region-specific unit prices (excerpt)

Unit prices change by region. Here's an excerpt of areas around Japan and major regions (per-hour CPU / per-GB-hr memory).

Region | Active CPU (per hour) | Provisioned Memory (per GB-hr)
Tokyo hnd1 | $0.202 | $0.0167
Osaka kix1 | $0.202 | $0.0167
Washington D.C. iad1 (default) | $0.128 | $0.0106
Portland pdx1 | $0.128 | $0.0106
Singapore sin1 | $0.160 | $0.0133
Frankfurt fra1 | $0.184 | $0.0152

Region choice is also a cost lever: the default, iad1, is in the cheapest band. If your latency requirements allow, weigh a region's unit price against the performance gain of running near your users. Because distance to the database also affects total latency and data transfer, the baseline is to keep the function and the database close together.
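
To make the regional difference concrete, here is a small sketch that prices the same monthly workload in each region from the excerpt. The unit prices are copied from this article and should be re-checked against the current official values. The workload (100 Active CPU hours and 1,000 GB-hr) and the names `RegionPrice` and `costByRegion` are hypothetical, and Invocations and plan credits are left out.

```typescript
// Unit prices in USD, copied from the excerpt in this article.
interface RegionPrice {
  region: string;
  cpuPerHour: number;
  memoryPerGbHour: number;
}

const regionPrices: RegionPrice[] = [
  { region: "hnd1", cpuPerHour: 0.202, memoryPerGbHour: 0.0167 },
  { region: "kix1", cpuPerHour: 0.202, memoryPerGbHour: 0.0167 },
  { region: "iad1", cpuPerHour: 0.128, memoryPerGbHour: 0.0106 },
  { region: "pdx1", cpuPerHour: 0.128, memoryPerGbHour: 0.0106 },
  { region: "sin1", cpuPerHour: 0.16, memoryPerGbHour: 0.0133 },
  { region: "fra1", cpuPerHour: 0.184, memoryPerGbHour: 0.0152 },
];

// Compute cost only (Active CPU + Provisioned Memory) for one month.
function costByRegion(cpuHours: number, gbHours: number): { region: string; usd: number }[] {
  return regionPrices.map((p) => ({
    region: p.region,
    usd: cpuHours * p.cpuPerHour + gbHours * p.memoryPerGbHour,
  }));
}

// Hypothetical workload: 100 Active CPU hours and 1,000 GB-hr per month.
costByRegion(100, 1000).forEach((c) => {
  console.log(c.region, c.usd.toFixed(2));
});
```

For this assumed workload, Tokyo comes to about $36.90 and iad1 to about $23.40, so the same code costs roughly 58% more in Tokyo before latency is considered.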

The official calculation example

In São Paulo (CPU $0.221/h, memory $0.0183/GB-h), one invocation with 4GB memory, 4 seconds of Active CPU, and a 10-second instance lifetime:

  • CPU: (4 / 3600) × $0.221 = $0.0002456
  • Memory: (4GB × 10 / 3600) × $0.0183 = $0.0002033
  • Total: $0.0004489 / invocation

What this formula shows is that memory allocation (4 GB) and instance lifetime (10 seconds) drive the memory charge, while the Active CPU charge covers only the 4 seconds the CPU was actually in use.
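
The same formula can be written as a small estimator, which is handy for trying other memory sizes or lifetimes. This is a sketch under the figures quoted above. `estimateInvocationCost` and its types are illustrative names, and the per-invocation charge is not included.

```typescript
interface InvocationUsage {
  activeCpuSeconds: number; // seconds the CPU was actually in use
  memoryGb: number;         // provisioned memory
  instanceSeconds: number;  // instance lifetime
}

interface UnitPrice {
  cpuPerHour: number;      // USD per Active CPU hour
  memoryPerGbHour: number; // USD per GB-hr of Provisioned Memory
}

// Applies the formula from the calculation example:
// CPU    = (cpu seconds / 3600) × CPU price
// Memory = (GB × lifetime seconds / 3600) × memory price
function estimateInvocationCost(u: InvocationUsage, price: UnitPrice) {
  const cpu = (u.activeCpuSeconds / 3600) * price.cpuPerHour;
  const memory = ((u.memoryGb * u.instanceSeconds) / 3600) * price.memoryPerGbHour;
  return { cpu, memory, total: cpu + memory };
}

// São Paulo figures from the example above.
const saoPaulo: UnitPrice = { cpuPerHour: 0.221, memoryPerGbHour: 0.0183 };
const cost = estimateInvocationCost(
  { activeCpuSeconds: 4, memoryGb: 4, instanceSeconds: 10 },
  saoPaulo,
);
console.log(cost.cpu.toFixed(7), cost.memory.toFixed(7), cost.total.toFixed(7));
// 0.0002456 0.0002033 0.0004489
```

Halving `memoryGb` to 2 in this sketch halves the memory term and leaves the CPU term untouched, which is the lever measure ③ relies on.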


Reduction measures, in order of impact

Rather than cutting blindly, work through the measures in order of greatest impact.

① Eliminate the Invocation itself with caching (the biggest effect)

The most effective measure is to not call the function at all. On a cache hit, all three axes (Active CPU, memory, and Invocations) are zero.

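// Cache the function response on the CDN: from the second request on, the function does not run.
// getCatalog() is the app's own data loader (not shown here).
export async function GET() {
  return Response.json(await getCatalog(), {
    headers: { "Cache-Control": "public, s-maxage=300, stale-while-revalidate=600" },
  });
}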
  • Use ISR for pages, CDN Cache for APIs, and Runtime Cache for pieces of data (see caching strategy).
  • Measure HITs with the x-vercel-cache response header. A HIT is evidence that the function was not called and therefore not billed.
  • During a spike, ISR's request coalescing bundles function calls for the same path into one per region, which suppresses Invocations.
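
One way to turn the x-vercel-cache check into a number is to sample responses and compute a hit ratio. The sketch below is a simplification: it counts only the value HIT as a cache hit, although the header can carry other values such as MISS or STALE. `HeaderReader`, `cacheStatus`, `hitRatio`, and `expectedInvocations` are illustrative names, and `HeaderReader` is a minimal structural type that the `headers` object of a fetch response also satisfies.

```typescript
// Minimal structural type: satisfied by fetch's Headers and by simple test doubles.
interface HeaderReader {
  get(name: string): string | null;
}

// Reads the cache status of one response; "UNKNOWN" when the header is absent.
function cacheStatus(headers: HeaderReader): string {
  const value = headers.get("x-vercel-cache");
  return value === null ? "UNKNOWN" : value.toUpperCase();
}

// Share of sampled responses that were cache hits (only "HIT" is counted).
function hitRatio(statuses: string[]): number {
  if (statuses.length === 0) return 0;
  return statuses.filter((s) => s === "HIT").length / statuses.length;
}

// Rough estimate: assumes every non-HIT request reaches the function.
function expectedInvocations(requests: number, ratio: number): number {
  return Math.round(requests * (1 - ratio));
}

// Sampled statuses (hypothetical values).
const sampled = ["HIT", "HIT", "MISS", "HIT"];
const ratio = hitRatio(sampled);
console.log(ratio, expectedInvocations(1000000, ratio)); // 0.75 250000
```

In this assumed sample, a 75% hit ratio would cut one million requests to roughly 250,000 function invocations.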

② Separate heavy CPU processing from the function

Move processing that eats Active CPU (image/video conversion, large aggregations, cryptography) off the synchronous path of the user request.

  • For work that can run after the response, use waitUntil (see the Functions guide).
  • For long-running, stateful work, move it to Workflows or queues.
  • This makes the function the user waits on I/O-centric, and its Active CPU drops.
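
The deferral pattern can be sketched as follows. To keep the example self-contained, `waitUntil` is passed in as a parameter. In a real Vercel Function you would pass the `waitUntil` exported by `@vercel/functions`, which accepts a promise in the same way. `handleOrder`, `buildReport`, and `saveReport` are hypothetical names standing in for your own code.

```typescript
// Same shape as waitUntil from "@vercel/functions"; injected here so the
// sketch runs without that package.
type WaitUntil = (task: Promise<unknown>) => void;

interface Order {
  id: string;
  amounts: number[];
}

interface Reply {
  status: number;
  body: { accepted: string };
}

// Hypothetical CPU-heavy step (stands in for conversion, aggregation, etc.).
function buildReport(order: Order): number {
  return order.amounts.reduce((sum, n) => sum + n, 0);
}

function handleOrder(
  order: Order,
  waitUntil: WaitUntil,
  saveReport: (id: string, total: number) => void,
): Reply {
  // Schedule the heavy step; it starts after this function has returned.
  waitUntil(Promise.resolve().then(() => saveReport(order.id, buildReport(order))));
  // Respond immediately: the user does not wait for buildReport.
  return { status: 202, body: { accepted: order.id } };
}
```

Deferring shortens what the user waits for, but the deferred work still consumes CPU time when it runs, so genuinely heavy jobs are better moved to a queue or Workflow as described above.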

③ Match maxDuration and memory to the use

  • Memory is billed during I/O wait too. 4 GB is excessive for an I/O-centric function that barely uses the CPU, so lower it to match the workload.
  • With an over-generous maxDuration, a hung function can stay alive up to the limit and keep accruing memory charges. Set a reasonable explicit limit.
// A light, I/O-centric function: modest memory, short timeout
export const maxDuration = 15; // don't let it linger
// Set memory in the dashboard/settings to match the use (Hobby 2 GB / Pro up to 4 GB)

④ Reduce the total number of instances with Fluid's concurrency

Fluid Compute's optimized concurrency lets the same instance handle another request during I/O wait, so fewer instances are needed in total and the total memory charge decreases. This works automatically, but it presupposes that you do not put request-specific data in global state, so that concurrent processing is safe (see the Functions guide).
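
The global-state hazard is easiest to see in a small simulation. In the sketch below, two requests share one instance and interleave at an await point. All names are hypothetical, and `ioWait` stands in for a DB query or AI call by simply yielding to other work.

```typescript
// Stand-in for an I/O wait: yields so another request can run on the same instance.
function ioWait(): Promise<void> {
  return Promise.resolve();
}

// Unsafe: request-specific data stored in module-level (global) state.
let currentUser = "";

async function unsafeHandler(user: string): Promise<string> {
  currentUser = user;
  await ioWait(); // a concurrent request may overwrite currentUser here
  return "hello " + currentUser;
}

// Safe: request-specific data stays in parameters and local variables.
async function safeHandler(user: string): Promise<string> {
  const requestUser = user;
  await ioWait();
  return "hello " + requestUser;
}

// Runs two requests concurrently against each handler.
async function demo(): Promise<{ unsafe: string[]; safe: string[] }> {
  const unsafe = await Promise.all([unsafeHandler("alice"), unsafeHandler("bob")]);
  const safe = await Promise.all([safeHandler("alice"), safeHandler("bob")]);
  return { unsafe, safe };
}

demo().then((result) => {
  console.log(result.unsafe); // [ 'hello bob', 'hello bob' ]
  console.log(result.safe);   // [ 'hello alice', 'hello bob' ]
});
```

The unsafe handler greets alice as bob because the second request overwrote the shared variable during the first request's wait. Module-level state is still fine for data that is not tied to a request, such as a shared client or connection pool.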

⑤ Look at the other billing drivers

  • BotID Deep Analysis is billed per checkBotId() call ($1 per 1,000 calls). Limit it to high-value routes (see Firewall/BotID).
  • Blob is billed for storage (GB-month, averaged over 15-minute snapshots) plus data transfer. Use client uploads to avoid transfer charges, and treat blobs as immutable so caching stays effective (see storage).
  • Image optimization and Fast Data Transfer: check their breakdown in Observability as well.

Monitor cost

To avoid being surprised when the bill arrives, continuously monitor actual values in Observability.

  • Look at the breakdown of each function's Active CPU / memory / Invocation.
  • Identify which routes mass-produce Invocations (= candidates to cache).
  • Identify which routes eat Active CPU (= candidates to separate/optimize).
  • Detect budget overruns with Spend Management (spend caps, alerts).
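
If you copy per-route numbers out of Observability, a few lines are enough to rank the candidates described in this list. The record shape `RouteUsage` and the sample figures below are hypothetical. This is not an Observability export format, only a sketch of the triage step.

```typescript
// Hypothetical per-route usage record (not an official export format).
interface RouteUsage {
  route: string;
  invocations: number;
  activeCpuMs: number;
}

// Top n routes by the chosen metric, highest first.
function topBy(usage: RouteUsage[], key: "invocations" | "activeCpuMs", n: number): RouteUsage[] {
  return usage.slice().sort((a, b) => b[key] - a[key]).slice(0, n);
}

// Average Active CPU per invocation, in milliseconds.
function avgCpuMs(u: RouteUsage): number {
  return u.invocations === 0 ? 0 : u.activeCpuMs / u.invocations;
}

// Sample figures, invented for illustration.
const sample: RouteUsage[] = [
  { route: "/api/catalog", invocations: 900000, activeCpuMs: 9000000 },
  { route: "/api/report", invocations: 2000, activeCpuMs: 12000000 },
  { route: "/api/profile", invocations: 50000, activeCpuMs: 1000000 },
];

const cacheCandidate = topBy(sample, "invocations", 1)[0]; // most Invocations
const cpuCandidate = topBy(sample, "activeCpuMs", 1)[0];   // most Active CPU
console.log(cacheCandidate.route, cpuCandidate.route, avgCpuMs(cpuCandidate));
// /api/catalog /api/report 6000
```

In this invented data, /api/catalog is the caching candidate (many cheap calls), while /api/report is the candidate for separation (few calls at about 6 seconds of CPU each).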

Measure first, then optimize: rather than raising or lowering memory by guesswork, identify heavy routes in Observability, trim them with ② and ③, then re-measure. This matches the principles of observability and SRE: tracing back from the symptom (bill, latency) to the cause (a specific route) is the proven path.


Production checklist (cost)

  • Design on the premise of Active CPU billing (I/O wait is not billed)
  • Understand that memory is billed during I/O wait too, and avoid over-allocation
  • Eliminate Invocations with caching, and measure HIT with x-vercel-cache
  • Separate heavy CPU processing to waitUntil/Workflows/jobs
  • Set maxDuration and memory explicitly to match the use
  • Don't pollute global state so Fluid's concurrency works
  • Check the breakdown of other billing drivers like BotID/Blob/image optimization
  • Continuously monitor and set limits with Observability + Spend Management

Summary

Vercel's cost is decided not by "scale" but by "design."

  1. Active CPU billing = I/O wait is free. The more I/O-centric, the cheaper
  2. Memory is billed during wait time too = over-allocation and long-lived instances are the enemy
  3. Reduction is in the order ① caching → ② separating CPU processing → ③ maxDuration/memory → ④ concurrency
  4. Measure → optimize (identify heavy routes in Observability)
  5. Look at all billing axes — region unit prices, BotID, Blob transfer, etc.

Make the billing model your ally, and Vercel, far from being "expensive at scale," becomes one of the most cost-efficient platforms for I/O-centric modern apps. For implementation, start from the Vercel production-operations guide in this series.

This article is based on the Fluid compute pricing / Functions Limits official documentation (as of June 2026). Prices, free tiers, and region unit prices fluctuate, so estimate with the latest official values at production adoption.


友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.
