The complaint that "Vercel is expensive at scale" usually comes from designing against a misunderstood billing model. As of 2026, Vercel Functions use Active CPU billing, which is fundamentally different from traditional serverless billing (wall-clock GB-seconds). Once the model is understood correctly, design decisions alone can change the bill substantially for the same app. This article explains how the cost arises and how to lower it, staying faithful to the official specs and the actual figures in the Fluid compute pricing documentation.
For the full picture, see the Vercel production-operations guide.
First, understand the three billing axes
Vercel Functions (Fluid Compute) billing is the sum of three axes.
| Axis | What's billed | The decisive property |
|---|---|---|
| Active CPU | The time your code actually used the CPU (ms) | Billing pauses during I/O wait. Per CPU-hr |
| Provisioned Memory | Allocated memory × instance uptime | Billing continues during I/O wait too. Per GB-hr |
| Invocations | Number of incoming requests | One each, success or failure. Pro $0.60/1M |
How Active CPU changes the thinking
The official explanation itself is the core of the design.
You are only billed during actual code execution and not during I/O operations (database queries, like AI model calls, etc.) ... Pauses billing when your code is waiting for external services. (— Fluid compute pricing)
In other words — while the function is waiting for a DB or AI response, CPU billing stops. For example, a function that takes "100ms for data processing and 400ms waiting for a DB query" is billed for only 100ms of Active CPU.
Two consequences follow.
- I/O-bound apps (AI, external APIs, DB-centric) get cheaper. Because wait time isn't billed.
- CPU-bound processing (image processing, cryptography, large JSON conversion) consumes a lot of Active CPU. So "separate heavy CPU processing from the function" works directly.
However, Provisioned Memory is billed during I/O wait too. Because it is "memory × instance lifetime," over-allocated memory and needlessly long instance lifetimes push memory billing up. Active CPU billing pauses, but memory billing does not, and this is easy to overlook.
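The 100 ms / 400 ms example can be written out as a small sketch. `RequestProfile` and `billedDurations` are hypothetical names used only for illustration, and the model assumes a single request occupies the instance for its whole lifetime (no shared concurrency).

```typescript
// Hypothetical model of one request: CPU work plus time spent waiting on I/O.
interface RequestProfile {
  cpuMs: number;    // time the code actually runs on the CPU
  ioWaitMs: number; // time spent waiting for a DB, API, or AI response
}

// Returns how long each axis would be billed for under the model above.
// Assumption: one request per instance, so instance lifetime = cpuMs + ioWaitMs.
function billedDurations(p: RequestProfile): { activeCpuMs: number; memoryMs: number } {
  return {
    activeCpuMs: p.cpuMs,           // CPU billing pauses during I/O wait
    memoryMs: p.cpuMs + p.ioWaitMs, // memory keeps accruing while waiting
  };
}

const dbBound = billedDurations({ cpuMs: 100, ioWaitMs: 400 });
console.log(dbBound); // { activeCpuMs: 100, memoryMs: 500 }
```

The gap between the two numbers is the point: the wait is free on the CPU axis but not on the memory axis.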
The free tier and region-specific unit prices (real figures)
Hobby free tier (monthly)
| Resource | Free tier |
|---|---|
| Active CPU | 4 hours |
| Provisioned Memory | 360 GB-hr |
| Invocations | 1 million |
Pro uses on-demand billing, and a monthly Pro usage credit can offset it.
Region-specific unit prices (excerpt)
Unit prices change by region. Here's an excerpt of areas around Japan and major regions (per-hour CPU / per-GB-hr memory).
| Region | Active CPU (/h) | Provisioned Memory (/GB-h) |
|---|---|---|
| Tokyo hnd1 | $0.202 | $0.0167 |
| Osaka kix1 | $0.202 | $0.0167 |
| Washington D.C. iad1 (default) | $0.128 | $0.0106 |
| Portland pdx1 | $0.128 | $0.0106 |
| Singapore sin1 | $0.160 | $0.0133 |
| Frankfurt fra1 | $0.184 | $0.0152 |
Region selection is also a cost decision: the default `iad1` is in the cheapest band. If latency requirements allow, weigh a region's unit price against the performance gain of running near your users. Because the distance to the DB also affects total latency and transfer, the baseline is to keep the function and the DB close together.
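To make the regional gap concrete, the unit prices in the excerpt table can be compared against the default region. This is a minimal sketch: the prices are copied from the table, while `REGION_PRICES` and `relativeToDefault` are hypothetical names.

```typescript
// Unit prices from the excerpt table (USD per CPU-hour, USD per GB-hour).
interface UnitPrice { cpuPerHour: number; memPerGbHour: number; }

const IAD1: UnitPrice = { cpuPerHour: 0.128, memPerGbHour: 0.0106 };

const REGION_PRICES: Record<string, UnitPrice> = {
  hnd1: { cpuPerHour: 0.202, memPerGbHour: 0.0167 },
  kix1: { cpuPerHour: 0.202, memPerGbHour: 0.0167 },
  iad1: IAD1,
  pdx1: { cpuPerHour: 0.128, memPerGbHour: 0.0106 },
  sin1: { cpuPerHour: 0.160, memPerGbHour: 0.0133 },
  fra1: { cpuPerHour: 0.184, memPerGbHour: 0.0152 },
};

// Price multiplier of a region relative to the default iad1, per axis.
function relativeToDefault(region: string): { cpu: number; memory: number } {
  const target = REGION_PRICES[region];
  if (!target) throw new Error(`unknown region: ${region}`);
  return {
    cpu: target.cpuPerHour / IAD1.cpuPerHour,
    memory: target.memPerGbHour / IAD1.memPerGbHour,
  };
}

console.log(relativeToDefault("hnd1").cpu.toFixed(2)); // prints 1.58
```

On these figures, the same workload costs roughly 1.6 times as much in Tokyo as in `iad1` on both axes, which is the premium to weigh against lower latency.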
The official calculation example
In São Paulo (CPU $0.221/h, memory $0.0183/GB-h), one invocation with 4GB memory, 4 seconds of Active CPU, and a 10-second instance lifetime:
- CPU: (4 / 3600) × $0.221 = $0.0002456
- Memory: (4GB × 10 / 3600) × $0.0183 = $0.0002033
- Total: $0.0004489 / invocation
What this formula shows is that memory billing is driven by the allocation (4 GB) and the instance lifetime (10 seconds), while CPU billing covers only the 4 seconds in which the CPU was actually in use.
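The same arithmetic can be wrapped in a small estimator. This sketch reproduces the São Paulo figures above; `Pricing`, `Invocation`, and `costPerInvocation` are hypothetical names, and the per-request Invocation fee is left out, as it is in the example.

```typescript
interface Pricing { cpuPerHour: number; memPerGbHour: number; }
interface Invocation { memoryGb: number; activeCpuSec: number; lifetimeSec: number; }

// Cost of one invocation in USD: Active CPU term plus Provisioned Memory term.
function costPerInvocation(p: Pricing, inv: Invocation): { cpu: number; memory: number; total: number } {
  const cpu = (inv.activeCpuSec / 3600) * p.cpuPerHour;
  const memory = ((inv.memoryGb * inv.lifetimeSec) / 3600) * p.memPerGbHour;
  return { cpu, memory, total: cpu + memory };
}

const saoPaulo: Pricing = { cpuPerHour: 0.221, memPerGbHour: 0.0183 };

const official = costPerInvocation(saoPaulo, { memoryGb: 4, activeCpuSec: 4, lifetimeSec: 10 });
console.log(official.total.toFixed(7)); // prints 0.0004489

// Same workload with 1 GB instead of 4 GB: only the memory term shrinks.
const slim = costPerInvocation(saoPaulo, { memoryGb: 1, activeCpuSec: 4, lifetimeSec: 10 });
console.log(slim.memory.toFixed(7)); // prints 0.0000508
```

Plugging in your own memory size and measured durations shows quickly which of the two terms dominates for a given route.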
Reduction measures, in order of impact
Rather than cutting blindly, work through these in order of greatest impact.
① Eliminate the Invocation itself with caching (the biggest effect)
The most effective measure is to not call the function at all. On a cache hit, all three axes (Active CPU, memory, and Invocations) are zero.
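// Cache the function response on the CDN → from the second request on, the function does not run
// getCatalog() is the app's own data loader (not shown here)
export async function GET() {
  return Response.json(await getCatalog(), {
    headers: { "Cache-Control": "public, s-maxage=300, stale-while-revalidate=600" },
  });
}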
- Use ISR for pages, the CDN cache for APIs, and the Runtime Cache for pieces of data (caching strategy).
- Measure `HIT` with `x-vercel-cache`. This is the proof of "not called = not billed."
- ISR's request coalescing bundles function calls to the same path into one per region during a spike, suppressing Invocations.
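Measuring the hit rate can be as simple as tallying sampled header values. This is a minimal sketch that assumes the values were collected beforehand (for example with `response.headers.get("x-vercel-cache")` on a `fetch` response). `cacheHitRate` is a hypothetical helper, and counting only `HIT` as a hit is a deliberate simplification.

```typescript
// Fraction of sampled responses served as a cache HIT.
// `statuses` holds x-vercel-cache header values, one per sampled request.
function cacheHitRate(statuses: string[]): number {
  if (statuses.length === 0) return 0;
  const hits = statuses.filter((s) => s.toUpperCase() === "HIT").length;
  return hits / statuses.length;
}

// Illustrative sample, not real measurements.
const sample = ["HIT", "HIT", "MISS", "HIT", "STALE"];
console.log(cacheHitRate(sample)); // 0.6
```

A route with high traffic and a low hit rate is where a longer `s-maxage` or ISR is most likely to pay off.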
② Separate heavy CPU processing from the function
Move processing that eats Active CPU (image/video conversion, large aggregations, cryptography) off the synchronous path of the user request.
- For work that can be post-processed, use `waitUntil` (runs after the response; Functions guide).
- For long-running, stateful work, move it to Workflows / queues.
- With this, the function the user waits on becomes I/O-centric, and its Active CPU drops.
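The respond-first, process-later pattern might look like the sketch below. On Vercel the scheduler passed in would be `waitUntil` from `@vercel/functions`; here it is injected so the example runs standalone. `makeUploadHandler` and `heavyWork` are hypothetical names.

```typescript
// Shape of a waitUntil-style scheduler: it accepts a promise and keeps the
// instance alive until that promise settles, without delaying the response.
type WaitUntil = (task: Promise<unknown>) => void;

// Builds a handler that responds immediately and defers the heavy step.
// `heavyWork` stands in for post-processing such as thumbnail generation.
function makeUploadHandler(waitUntil: WaitUntil, heavyWork: (id: string) => Promise<void>) {
  return async function handle(uploadId: string): Promise<{ status: number; body: string }> {
    waitUntil(heavyWork(uploadId)); // scheduled, not awaited
    return { status: 202, body: JSON.stringify({ accepted: uploadId }) };
  };
}

// Local stand-in for the platform scheduler so the sketch runs outside Vercel.
const pending: Promise<unknown>[] = [];
const handle = makeUploadHandler(
  (task) => { pending.push(task); },
  async () => { /* heavy work would run here */ },
);
```

Work handed to `waitUntil` still runs on the function instance, so this shortens the path the user waits on rather than removing the CPU time itself. Genuinely heavy jobs are the case for Workflows or queues.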
③ Match maxDuration and memory to the use
- Memory is billed during I/O wait too. 4 GB is excessive for an I/O-centric function that barely uses the CPU. Size it to the workload.
- An oversized `maxDuration` risks a hung function staying alive up to the limit and running up memory billing. Set a reasonable explicit limit.
// Light, I/O-centric function: modest memory, short timeout
export const maxDuration = 15; // seconds; don't let it linger
// Set memory in the dashboard/settings to match the workload (Hobby 2 GB / Pro up to 4 GB)
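One complementary way to keep a stalled upstream call from holding an instance until `maxDuration` is to put a shorter deadline on the call itself. This is a minimal sketch: `withDeadline` is a hypothetical helper, and the 5-second value and URL in the usage comment are placeholders.

```typescript
// Runs `work` with an AbortSignal that fires after `ms` milliseconds.
// `work` must pass the signal on (e.g. to fetch) for the abort to take effect.
async function withDeadline<T>(ms: number, work: (signal: AbortSignal) => Promise<T>): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await work(controller.signal);
  } finally {
    clearTimeout(timer); // do not leave the timer running after completion
  }
}

// Usage inside a route handler (placeholder URL):
// const res = await withDeadline(5_000, (signal) =>
//   fetch("https://example.com/api", { signal }));
```

With a deadline like this, `maxDuration` becomes a backstop rather than the only thing that ends a hung request.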
④ Reduce the total number of instances with Fluid's concurrency
Fluid Compute's optimized concurrency lets a single instance handle another request while one is waiting on I/O, so fewer instances are needed in total and total memory billing falls. This works automatically, but it presupposes the discipline of not keeping request-specific data in global state, so that concurrent processing is safe (Functions guide).
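Why global state is the hazard can be shown with two handlers that interleave during an I/O wait. This is a simulation sketch, not Vercel-specific code: `ioWait`, `unsafeHandler`, `safeHandler`, and `demo` are hypothetical names, and the timer merely stands in for a DB or API call.

```typescript
// Simulated I/O wait; on a shared instance another request can run during it.
const ioWait = () => new Promise<void>((resolve) => setTimeout(resolve, 10));

// UNSAFE: request-specific data in module scope is shared by concurrent requests.
let currentUser = "";
async function unsafeHandler(user: string): Promise<string> {
  currentUser = user;
  await ioWait(); // a second request can overwrite currentUser here
  return `hello ${currentUser}`;
}

// SAFE: the data stays in the function's own scope.
async function safeHandler(user: string): Promise<string> {
  await ioWait();
  return `hello ${user}`;
}

async function demo(): Promise<{ unsafe: string[]; safe: string[] }> {
  const unsafe = await Promise.all([unsafeHandler("alice"), unsafeHandler("bob")]);
  const safe = await Promise.all([safeHandler("alice"), safeHandler("bob")]);
  return { unsafe, safe };
}

demo().then(console.log);
// unsafe: both responses say "hello bob"; safe: "hello alice" and "hello bob"
```

Module-level caches of shared, read-only data remain fine; the problem is only state that belongs to one request.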
⑤ Look at the other billing drivers
- BotID Deep Analysis is billed per `checkBotId()` call ($1/1000). Limit it to high-value routes (Firewall/BotID).
- Blob is billed for storage (GB-month, averaged over 15-minute snapshots) plus data transfer. Avoid transfer billing with client uploads, and make caching effective by treating blobs as immutable (storage).
- Image optimization and Fast Data Transfer: also check their breakdown in Observability.
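Limiting BotID checks to high-value routes could be sketched as a simple path gate plus a cost estimate at the $1/1000 rate quoted above. The route prefixes and the names `PROTECTED_PREFIXES`, `needsDeepAnalysis`, and `botIdCostUsd` are hypothetical; the actual `checkBotId()` call would sit behind the gate.

```typescript
// Hypothetical list of routes valuable enough to justify a paid check.
const PROTECTED_PREFIXES = ["/api/checkout", "/api/signup"];

// True when the request path falls under a protected prefix.
function needsDeepAnalysis(pathname: string): boolean {
  return PROTECTED_PREFIXES.some((prefix) => pathname.startsWith(prefix));
}

// Estimated Deep Analysis cost in USD at $1 per 1,000 calls.
function botIdCostUsd(calls: number): number {
  return calls / 1000;
}

console.log(needsDeepAnalysis("/api/checkout/confirm")); // true
console.log(needsDeepAnalysis("/api/catalog"));          // false
console.log(botIdCostUsd(250_000));                      // 250
```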
Monitor cost
To avoid "being surprised when the bill comes," continuously monitor real values in Observability.
- Look at the breakdown of each function's Active CPU / memory / Invocation.
- Identify which routes mass-produce Invocations (= candidates to cache).
- Identify which routes eat Active CPU (= candidates to separate/optimize).
- Detect budget overruns with Spend Management (spend caps, alerts).
Measure first, then optimize: rather than raising or lowering memory by guesswork, identify heavy routes in Observability, trim them with ② and ③, then re-measure. This matches the principles of observability/SRE: tracing back from the symptom (bill, latency) to the cause (a specific route) is the reliable path.
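Once per-route figures are in hand, picking candidates is a sort. This sketch assumes the numbers have been read off Observability into a plain array. The `RouteUsage` shape, the `topBy` helper, and all values are illustrative, not a real export format or real measurements.

```typescript
// Hypothetical per-route usage record.
interface RouteUsage { route: string; invocations: number; activeCpuMs: number; }

// Returns the n routes with the highest value for the given metric.
function topBy(usage: RouteUsage[], key: "invocations" | "activeCpuMs", n: number): string[] {
  return [...usage]
    .sort((a, b) => b[key] - a[key])
    .slice(0, n)
    .map((u) => u.route);
}

// Illustrative numbers only.
const usage: RouteUsage[] = [
  { route: "/api/catalog", invocations: 900_000, activeCpuMs: 4_500_000 },
  { route: "/api/resize", invocations: 20_000, activeCpuMs: 30_000_000 },
  { route: "/api/profile", invocations: 150_000, activeCpuMs: 1_200_000 },
];

console.log(topBy(usage, "invocations", 1)); // [ '/api/catalog' ] -> caching candidate
console.log(topBy(usage, "activeCpuMs", 1)); // [ '/api/resize' ]  -> candidate to offload
```

The two rankings usually point at different routes, which is why caching and CPU separation are treated as separate measures.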
Production checklist (cost)
- Designed on the premise of Active CPU billing (I/O wait not billed)
- Understand that memory is billed during I/O wait too, and avoid over-allocation
- Eliminate Invocations with caching, and measure `HIT` with `x-vercel-cache`
- Separate heavy CPU processing to `waitUntil`/Workflows/jobs
- State `maxDuration` and memory matched to the use
- Don't pollute global state so Fluid's concurrency works
- Check the breakdown of other billing drivers like BotID/Blob/image optimization
- Continuously monitor and set limits with Observability + Spend Management
Summary
Vercel's cost is decided not by "scale" but by "design."
- Active CPU billing = I/O wait is free. The more I/O-centric, the cheaper
- Memory is billed during wait time too = over-allocation and long-lived instances are the enemy
- Reduction is in the order ① caching → ② separating CPU processing → ③ maxDuration/memory → ④ concurrency
- Measure → optimize (identify heavy routes in Observability)
- Look at all billing axes — region unit prices, BotID, Blob transfer, etc.
Make the billing model your ally and Vercel, far from "expensive at scale," becomes one of the most cost-efficient platforms for I/O-centric modern apps. Start the implementation from this cluster's Vercel production-operations guide.
This article is based on the Fluid compute pricing / Functions Limits official documentation (as of June 2026). Prices, free tiers, and region unit prices fluctuate, so estimate with the latest official values at production adoption.