Vercel cost-optimization guide: understand the Active CPU pricing model and lower your bill

A cost-optimization guide that follows Vercel's official documentation. Using real figures, it explains Fluid Compute's Active CPU billing (you pay for CPU execution time, not I/O wait), the three billing axes (Active CPU, Provisioned Memory in GB-hr, and Invocations), region-specific unit prices, the official formula, the Hobby free tier (4 CPU hours / 360 GB-hr / 1M Invocations), and ways to cut cost through caching, maxDuration, memory sizing, concurrency, and job separation.

June 28, 2026

7 min read

友田陽大

"Vercel gets expensive at scale" is what happens when you design around a misunderstanding of the billing model. As of 2026, Vercel Functions use Active CPU billing, which differs fundamentally from traditional serverless pricing (wall-clock GB-seconds). Once you understand the model, you can make design decisions that change the bill substantially for the same app. This article explains how the cost arises and how to lower it, following the official Fluid compute pricing specification and its real figures.

For the full picture, see the Vercel production-operations guide.

First, understand the three billing axes

Vercel Functions (Fluid Compute) billing is the sum of three axes.

Axis	What's billed	The decisive property
Active CPU	The time your code actually used the CPU (ms)	Billing pauses during I/O wait. Per CPU-hr
Provisioned Memory	Allocated memory × instance uptime	Billing continues during I/O wait too. Per GB-hr
Invocations	Number of incoming requests	One each, success or failure. Pro $0.60/1M

How Active CPU changes the thinking

The official explanation itself is the core of the design.

You are only billed during actual code execution and not during I/O operations (database queries, AI model calls, etc.) ... Pauses billing when your code is waiting for external services. (Source: Fluid compute pricing)

Put simply, CPU billing stops while the function waits for a DB or AI response. For example, a function that spends 100ms processing data and 400ms waiting on a DB query is billed for only 100ms of Active CPU.

Two consequences follow.

I/O-bound apps (AI, external APIs, DB-centric) get cheaper, because wait time isn't billed.
CPU-bound work (image processing, cryptography, large JSON conversion) consumes a lot of Active CPU, so moving heavy CPU processing out of the function pays off directly.

Provisioned Memory, however, is billed during I/O wait as well. Because it is calculated as memory × instance lifetime, over-allocated memory and needlessly long instance lifetimes push memory billing up. Active CPU billing pauses, but memory billing does not, and this is easy to overlook.
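
The difference between the two axes can be sketched in a few lines. This is a simplified model, not Vercel's metering logic: it assumes one request per instance with no concurrency, and the type names, the `billedUnits` helper, and the 2GB memory size are all illustrative assumptions.

```typescript
// Simplified model (an assumption): one request, one instance, no concurrency.
interface RequestProfile {
  cpuMs: number;    // time the code actually spends on the CPU
  ioWaitMs: number; // time spent waiting on a DB, API, or AI model
  memoryGb: number; // memory allocated to the instance
}

interface BilledUnits {
  activeCpuMs: number;     // billed Active CPU time
  memoryGbSeconds: number; // billed Provisioned Memory (GB x seconds alive)
}

function billedUnits(p: RequestProfile): BilledUnits {
  const lifetimeMs = p.cpuMs + p.ioWaitMs;
  return {
    activeCpuMs: p.cpuMs, // I/O wait is excluded
    memoryGbSeconds: p.memoryGb * (lifetimeMs / 1000), // I/O wait is included
  };
}

// The example from the text: 100 ms of processing, 400 ms waiting on a DB query.
const ioBound = billedUnits({ cpuMs: 100, ioWaitMs: 400, memoryGb: 2 });
// A CPU-bound request with the same 500 ms wall-clock time.
const cpuBound = billedUnits({ cpuMs: 450, ioWaitMs: 50, memoryGb: 2 });

console.log(ioBound, cpuBound);
```

Both requests hold memory for the same 500ms, so their memory usage is identical; only the Active CPU figure differs (100ms versus 450ms).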

The free tier and region-specific unit prices (real figures)

Hobby free tier (monthly)

Resource	Free tier
Active CPU	4 hours
Provisioned Memory	360 GB-hr
Invocations	1 million

Pro uses on-demand billing, and a monthly Pro usage credit can offset it.

Region-specific unit prices (excerpt)

Unit prices vary by region. The excerpt below covers the regions around Japan and several major regions (Active CPU per hour, Provisioned Memory per GB-hr).

Region	Active CPU (per hour)	Provisioned Memory (per GB-hr)
Tokyo `hnd1`	$0.202	$0.0167
Osaka `kix1`	$0.202	$0.0167
Washington D.C. `iad1` (default)	$0.128	$0.0106
Portland `pdx1`	$0.128	$0.0106
Singapore `sin1`	$0.160	$0.0133
Frankfurt `fra1`	$0.184	$0.0152

Region choice is itself a cost lever: the default iad1 sits in the cheapest price band. Where latency requirements allow, weigh a region's unit price against the performance gain of running near your users. Distance to the DB also affects total latency and data transfer, so the baseline is to keep the function and the DB close together.

The official calculation example

In São Paulo (CPU $0.221/h, memory $0.0183/GB-h), one invocation with 4GB memory, 4 seconds of Active CPU, and a 10-second instance lifetime:

CPU: (4 / 3600) × $0.221 = $0.0002456
Memory: (4GB × 10 / 3600) × $0.0183 = $0.0002033
Total: $0.0004489 / invocation

The formula shows that memory allocation (4GB) and instance lifetime (10 seconds) drive the memory charge, while the Active CPU charge covers only the 4 seconds the CPU was actually in use.
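
The same arithmetic can be written as a small estimator, which makes it easy to try other regions and workloads. This is a sketch under assumptions: the unit prices are copied from this article, the `RATES` table, its key names, and `costPerInvocation` are hypothetical, and the per-invocation charge is left out just as in the example above.

```typescript
// Unit prices in USD, copied from this article; check the current official values.
interface RegionRates {
  cpuPerHour: number;      // Active CPU, per CPU-hour
  memoryPerGbHour: number; // Provisioned Memory, per GB-hour
}

const RATES = {
  iad1: { cpuPerHour: 0.128, memoryPerGbHour: 0.0106 },
  hnd1: { cpuPerHour: 0.202, memoryPerGbHour: 0.0167 },
  saoPaulo: { cpuPerHour: 0.221, memoryPerGbHour: 0.0183 }, // key name is illustrative
};

// Compute cost of one invocation (the Invocations charge is not included).
function costPerInvocation(
  rates: RegionRates,
  activeCpuSeconds: number,
  memoryGb: number,
  lifetimeSeconds: number,
): { cpu: number; memory: number; total: number } {
  const cpu = (activeCpuSeconds / 3600) * rates.cpuPerHour;
  const memory = ((memoryGb * lifetimeSeconds) / 3600) * rates.memoryPerGbHour;
  return { cpu, memory, total: cpu + memory };
}

// The calculation example: 4 s of Active CPU, 4 GB memory, 10 s lifetime.
const example = costPerInvocation(RATES.saoPaulo, 4, 4, 10);
console.log(example.total.toFixed(7)); // "0.0004489"

// The same workload in two other regions, for comparison.
const inTokyo = costPerInvocation(RATES.hnd1, 4, 4, 10);
const inWashington = costPerInvocation(RATES.iad1, 4, 4, 10);
console.log(inTokyo.total > inWashington.total); // true
```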

Reduction measures: in effective order

Rather than cutting blindly, work through the measures in order of impact.

① Eliminate the Invocation itself with caching (the biggest effect)

The most effective measure is to not call the function at all. On a cache hit, all three axes (Active CPU, memory, and Invocations) cost nothing.

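// Cache the function response on the CDN; later requests don't run the function
// getCatalog() is the app's own data loader (not shown here)
export async function GET() {
  return Response.json(await getCatalog(), {
    headers: { "Cache-Control": "public, s-maxage=300, stale-while-revalidate=600" },
  });
}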

Cache pages with ISR, APIs with CDN Cache, and individual pieces of data with Runtime Cache (see the caching-strategy guide).
Measure HITs with the x-vercel-cache response header. A HIT is proof that the function was not called and therefore not billed.
During a spike, ISR's request coalescing collapses function calls to the same path into one per region, which holds Invocations down.
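
To turn the x-vercel-cache measurement into a number, a hit ratio can be computed from a sample of header values. This is a minimal sketch: `cacheHitRatio`, `remainingInvocations`, and the sample data are hypothetical, and counting only `HIT` as "function not called" is a deliberately conservative simplification.

```typescript
// Share of sampled responses served from cache, given their x-vercel-cache values.
// Only "HIT" is counted; every other value is treated conservatively as
// "may have reached the function" (a simplification for this sketch).
function cacheHitRatio(headerValues: string[]): number {
  if (headerValues.length === 0) return 0;
  const hits = headerValues.filter((v) => v.trim().toUpperCase() === "HIT").length;
  return hits / headerValues.length;
}

// Rough estimate of how many requests still reach the function.
function remainingInvocations(totalRequests: number, hitRatio: number): number {
  return Math.round(totalRequests * (1 - hitRatio));
}

const sample = ["HIT", "HIT", "MISS", "HIT", "STALE"]; // hypothetical sample
const ratio = cacheHitRatio(sample);
console.log(ratio, remainingInvocations(1000000, ratio));
```

With this sample, three of five responses are hits, so roughly 400,000 of 1,000,000 requests would still invoke the function.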

② Separate heavy CPU processing from the function

Move processing that eats Active CPU (image/video conversion, large aggregations, cryptography) off the synchronous path of the user request.

For work that can run after the response, use waitUntil (see the Functions guide).
For long-running or stateful work, move it to Workflows or queues.
The function the user waits on then becomes I/O-centric, and its Active CPU drops.
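
The shape of the "respond first, process later" pattern can be sketched as follows. To keep the example self-contained, the deferral function is injected as a parameter; on Vercel that role would be played by waitUntil. The `Defer` type, `handleOrder`, and `recordAnalytics` are hypothetical names, not part of any Vercel API.

```typescript
// A function that keeps a promise running after the response has been returned.
// On Vercel this role is played by waitUntil; it is injected here (an assumption
// of this sketch) so the example has no external dependencies.
type Defer = (work: Promise<unknown>) => void;

interface OrderResult {
  status: number;
  body: { orderId: string };
}

// Hypothetical post-processing step (for example analytics or notifications).
async function recordAnalytics(orderId: string): Promise<void> {
  // ...slow work the user does not need to wait for
  void orderId;
}

function handleOrder(orderId: string, defer: Defer): OrderResult {
  // Hand the slow work over instead of awaiting it on the response path.
  defer(recordAnalytics(orderId));
  // Respond immediately; the deferred work continues in the background.
  return { status: 202, body: { orderId } };
}

// Usage with a stand-in defer that just collects the pending work.
const pendingWork: Promise<unknown>[] = [];
const orderResult = handleOrder("order-1", (work) => {
  pendingWork.push(work);
});
console.log(orderResult.status, pendingWork.length);
```

Deferring changes when the work runs relative to the response, not how much CPU the work itself needs, so it fits best for tasks that are slow rather than CPU-heavy.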

③ Match maxDuration and memory to the use

Memory is billed during I/O wait too. 4GB is excessive for an I/O-centric function that barely uses the CPU, so size memory to the workload.
An oversized maxDuration lets a hung function stay alive until the limit and keep accruing memory charges. Set an explicit, reasonable limit.

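// Light, I/O-centric function: modest memory, short timeout
export const maxDuration = 15; // seconds; don't let a hung function linger
// Set memory in the dashboard/settings to match the workload (Hobby 2GB / Pro up to 4GB)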
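
The effect of these two settings on a worst case can be estimated with the memory half of the pricing formula. This sketch assumes a hung instance stays alive for the full maxDuration; `hungInstanceMemoryCost` and the 4GB / 300-second comparison values are illustrative, and the rate is the iad1 figure from this article.

```typescript
// Memory cost of one instance that hangs until maxDuration (simplified model).
function hungInstanceMemoryCost(
  memoryGb: number,
  maxDurationSeconds: number,
  memoryRatePerGbHour: number,
): number {
  return ((memoryGb * maxDurationSeconds) / 3600) * memoryRatePerGbHour;
}

const IAD1_MEMORY_RATE = 0.0106; // USD per GB-hr, from the table in this article

const tight = hungInstanceMemoryCost(2, 15, IAD1_MEMORY_RATE);  // 2 GB, 15 s limit
const loose = hungInstanceMemoryCost(4, 300, IAD1_MEMORY_RATE); // 4 GB, 300 s limit

console.log((loose / tight).toFixed(0)); // "40"
```

Each hang is cheap in isolation, but under these assumptions the loosely configured function pays 40 times more memory per hang than the tightly configured one.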

④ Reduce the total number of instances with Fluid's concurrency

Fluid Compute's optimized concurrency lets a single instance handle another request while the first one is waiting on I/O, so fewer instances are needed in total and memory billing falls with them. This happens automatically, but it depends on one discipline: never keep request-specific data in global state, so that concurrent processing stays safe (see the Functions guide).
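
Why global state is dangerous under concurrency can be shown with two handlers that interleave on one instance. This is a minimal sketch: `unsafeHandler`, `safeHandler`, `fakeIo`, and `peekSharedState` are hypothetical, and `fakeIo` merely stands in for a real DB or API call.

```typescript
// Module scope is shared by every request the instance handles concurrently.
let currentUserId = ""; // UNSAFE: request-specific data kept in global state

// Stand-in for a database or API call (hypothetical helper).
async function fakeIo(): Promise<void> {
  await Promise.resolve();
}

// Unsafe: another request can overwrite currentUserId while this one awaits.
async function unsafeHandler(userId: string): Promise<string> {
  currentUserId = userId;
  await fakeIo();
  return `hello ${currentUserId}`; // may greet the wrong user
}

// Safe: request data lives only in parameters and local variables.
async function safeHandler(userId: string): Promise<string> {
  const id = userId;
  await fakeIo();
  return `hello ${id}`;
}

function peekSharedState(): string {
  return currentUserId;
}

// Two requests start on the same instance before either finishes its I/O.
const first = unsafeHandler("alice");
const second = unsafeHandler("bob");
console.log(peekSharedState()); // "bob": alice's request now sees bob's data
void first;
void second;
void safeHandler;
```

The safe variant needs no locking: keeping per-request data out of module scope is enough for the instance to serve overlapping requests correctly.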

⑤ Look at the other billing drivers

BotID Deep Analysis is billed per checkBotId() call ($1 per 1,000 calls). Limit it to high-value routes (Firewall/BotID).
Blob is billed for storage (GB-month, averaged over 15-minute snapshots) plus data transfer. Use client uploads to avoid transfer charges, and treat blobs as immutable so caching stays effective (storage).
Image optimization and Fast Data Transfer are billed as well; check their breakdown in Observability.

Monitor cost

To avoid "being surprised when the bill comes," continuously monitor real values in Observability.

Look at the breakdown of each function's Active CPU / memory / Invocation.
Identify which routes mass-produce Invocations (= candidates to cache).
Identify which routes eat Active CPU (= candidates to separate/optimize).
Detect budget overruns with Spend Management (spend caps, alerts).

Measure first, then optimize: instead of raising or lowering memory by guesswork, identify the heavy routes in Observability, trim them with ② and ③, and measure again. This matches the core principle of observability and SRE: trace from the symptom (the bill, latency) back to the cause (a specific route).
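
The triage described above can be sketched as a small classifier over per-route usage figures. Everything here is an assumption for illustration: the `RouteUsage` shape is not an Observability export format, the routes and numbers are made up, and the thresholds should be tuned against your own data.

```typescript
interface RouteUsage {
  route: string;
  invocations: number;      // per period, as read from your usage data
  activeCpuSeconds: number; // total Active CPU in the same period
}

type Suggestion = "cache" | "offload-cpu" | "ok";

// Thresholds are illustrative assumptions, not recommended values.
const INVOCATION_THRESHOLD = 1000000;
const CPU_MS_PER_INVOCATION_THRESHOLD = 200;

function suggest(u: RouteUsage): Suggestion {
  const cpuMsPerInvocation =
    u.invocations === 0 ? 0 : (u.activeCpuSeconds * 1000) / u.invocations;
  // CPU-heavy per call: candidate to separate or optimize.
  if (cpuMsPerInvocation >= CPU_MS_PER_INVOCATION_THRESHOLD) return "offload-cpu";
  // Called very often but cheap per call: candidate to cache.
  if (u.invocations >= INVOCATION_THRESHOLD) return "cache";
  return "ok";
}

const routeUsage: RouteUsage[] = [
  { route: "/api/catalog", invocations: 5000000, activeCpuSeconds: 50000 }, // 10 ms per call
  { route: "/api/thumbnail", invocations: 20000, activeCpuSeconds: 16000 }, // 800 ms per call
  { route: "/api/health", invocations: 40000, activeCpuSeconds: 40 },       // 1 ms per call
];

for (const u of routeUsage) console.log(u.route, suggest(u));
```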

Production checklist (cost)

Design on the premise of Active CPU billing (I/O wait is not billed)
Understand that memory is billed during I/O wait too, and avoid over-allocation
Eliminate Invocations with caching, and measure HIT with x-vercel-cache
Separate heavy CPU processing to waitUntil/Workflows/jobs
State maxDuration and memory matched to the use
Don't pollute global state so Fluid's concurrency works
Check the breakdown of other billing drivers like BotID/Blob/image optimization
Continuously monitor and set limits with Observability + Spend Management

Summary

Vercel's cost is decided not by "scale" but by "design."

Active CPU billing = I/O wait is free. The more I/O-centric, the cheaper
Memory is billed during wait time too = over-allocation and long-lived instances are the enemy
Reduction is in the order ① caching → ② separating CPU processing → ③ maxDuration/memory → ④ concurrency
Measure → optimize (identify heavy routes in Observability)
Look at all billing axes — region unit prices, BotID, Blob transfer, etc.

Make the billing model work for you and Vercel, far from being "expensive at scale," becomes one of the most cost-efficient platforms for I/O-centric modern apps. For implementation, start with the Vercel production-operations guide in this series.

This article is based on the official Fluid compute pricing and Functions Limits documentation (as of June 2026). Prices, free tiers, and regional unit prices change, so re-estimate with the latest official values before adopting this in production.
