# Vercel cost-optimization guide: understand the Active CPU pricing model and lower your bill

> A cost-optimization guide faithful to Vercel's official docs. Using real figures, it explains Fluid Compute's Active CPU billing (billed only for CPU execution time, not I/O wait), the three billing axes (Active CPU, Provisioned Memory in GB-hr, and Invocations), region-specific unit prices, the official formula, the free tier (Hobby: 4 CPU hours / 360 GB-hr / 1M Invocations), and reduction measures via caching, maxDuration, memory, concurrency, and job separation.

- Published: 2026-06-28
- Author: 友田 陽大
- Tags: Vercel, cost optimization, Fluid Compute, serverless, observability, architecture design, TypeScript
- URL: https://tomodahinata.com/en/blog/vercel-cost-active-cpu-pricing-optimization-guide
- Category: Vercel in production
- Pillar guide: https://tomodahinata.com/en/blog/vercel-production-platform-guide

## Key points

- Vercel Functions use Active CPU billing: you're billed only for the milliseconds your code actually spends on the CPU, and I/O waits for DB queries or AI calls aren't billed. This differs fundamentally from traditional wall-clock GB-second billing, and the more I/O-bound the app, the cheaper it runs.
- Billing has three axes: Active CPU (CPU-hr, paused during I/O wait) / Provisioned Memory (GB-hr, accrues for the instance's whole lifetime, I/O wait included) / Invocations (request count, Pro $0.60/1M). The key point is that memory is billed during wait time too.
- The Hobby free tier is 4 hours of Active CPU, 360 GB-hr of Provisioned Memory, and 1M Invocations/month. Unit prices are region-specific: Tokyo hnd1/Osaka kix1 are CPU $0.202/h, memory $0.0167/GB-h; iad1 is $0.128/$0.0106.
- In order of impact, the reduction measures are: ① eliminate the Invocation itself with caching (confirm HITs with `x-vercel-cache`) ② move heavy CPU processing out of the function (Workflows/jobs) ③ trim over-allocation by matching maxDuration/memory to actual use ④ reduce the total number of instances with Fluid's concurrency.
- I/O-bound processing benefits greatly from Active CPU billing and Fluid concurrency. Conversely, CPU-bound processing like image processing consumes a lot of Active CPU, so it's a target for separation and optimization.

---

"Vercel is expensive at scale" usually happens **when the design rests on a misunderstanding of the billing model.** In 2026, Vercel Functions use **Active CPU billing**, which is fundamentally different from traditional serverless billing (wall-clock GB-seconds). Understand the model correctly and **the same app can end up with a very different bill** depending on your design decisions. This article explains how the cost is built up and how to lower it, staying faithful to the official specs and real figures in [Fluid compute pricing](https://vercel.com/docs/functions/usage-and-pricing).

For the full picture, see the [Vercel production-operations guide](/blog/vercel-production-platform-guide).

---

## First, understand the three billing axes

Vercel Functions (Fluid Compute) billing is **the sum of three axes.**

| Axis | What's billed | The decisive property |
|---|---|---|
| **Active CPU** | The time your code **actually used the CPU** (ms) | **Billing pauses during I/O wait.** Per CPU-hr |
| **Provisioned Memory** | Allocated memory × instance uptime | **Billing continues during I/O wait too.** Per GB-hr |
| **Invocations** | Number of incoming requests | Every request counts, whether it succeeds or fails. Pro $0.60/1M |

### How Active CPU changes the thinking

The official documentation states the principle that drives the whole design:

> You are only billed during actual code execution and not during I/O operations (database queries, like AI model calls, etc.) ... Pauses billing when your code is waiting for external services. (— [Fluid compute pricing](https://vercel.com/docs/functions/usage-and-pricing))

In other words, **while the function is waiting for a DB or AI response, CPU billing stops.** For example, a function that spends 100ms processing data and 400ms waiting for a DB query is **billed for only 100ms of Active CPU.**

Two consequences follow.

1. **I/O-bound apps (AI, external APIs, DB-centric) get cheaper**, because wait time isn't billed.
2. **CPU-bound processing (image processing, cryptography, large JSON conversion) consumes a lot of Active CPU**, so moving heavy CPU processing out of the function pays off directly.

> However, **Provisioned Memory is billed during I/O wait too.** Because it is "memory × instance lifetime," **over-allocated memory** and **needlessly long instance lifetimes** push up memory billing. Active CPU billing stops during a wait; memory billing does not, and this is easy to overlook.
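To make the difference concrete, here is a minimal sketch that models one request as a timeline of CPU and I/O segments and reports what each axis would count. It assumes, as a simplification, that the instance lives exactly as long as this single request and serves nothing else; real instances can linger or be shared by concurrent requests, so treat the numbers as illustrative. `Segment` and `billedUsage` are hypothetical names, and the 2GB allocation is an assumption.

```ts
// A request timeline: CPU work and I/O waits, in milliseconds
type Segment = { kind: "cpu" | "io"; ms: number };

interface BilledUsage {
  activeCpuMs: number;     // only CPU segments count toward Active CPU
  wallClockMs: number;     // what traditional GB-second billing would measure
  memoryGbSeconds: number; // memory accrues for the whole lifetime, I/O included
}

// Simplification (assumption): the instance lives exactly as long as this one request
function billedUsage(segments: Segment[], memoryGb: number): BilledUsage {
  let activeCpuMs = 0;
  let wallClockMs = 0;
  for (const s of segments) {
    wallClockMs += s.ms;
    if (s.kind === "cpu") activeCpuMs += s.ms;
  }
  return { activeCpuMs, wallClockMs, memoryGbSeconds: memoryGb * (wallClockMs / 1000) };
}

// The example from the text: 100ms of processing, 400ms waiting on a DB query
const usage = billedUsage(
  [
    { kind: "cpu", ms: 100 },
    { kind: "io", ms: 400 },
  ],
  2, // assumed 2GB allocation
);
console.log(usage); // { activeCpuMs: 100, wallClockMs: 500, memoryGbSeconds: 1 }
```

Active CPU sees only 100ms, but memory is charged for all 500ms; the allocation size and the lifetime are the only levers on that second number.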

---

## The free tier and region-specific unit prices (real figures)

### Hobby free tier (monthly)

| Resource | Free tier |
|---|---|
| Active CPU | 4 hours |
| Provisioned Memory | 360 GB-hr |
| Invocations | 1 million |

Pro uses on-demand billing, and a monthly Pro usage credit can offset it.
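Each axis has its own limit, so a month's usage should be checked against all three independently. A minimal sketch, with hypothetical usage figures; the limits are copied from the table above and should be re-checked against the current official page.

```ts
// Monthly Hobby free tier, from the table above
const HOBBY_FREE_TIER = {
  activeCpuHours: 4,
  provisionedMemoryGbHours: 360,
  invocations: 1_000_000,
} as const;

interface MonthlyUsage {
  activeCpuHours: number;
  provisionedMemoryGbHours: number;
  invocations: number;
}

// Returns the axes that exceed the free tier (an empty array means everything fits)
function hobbyOverages(usage: MonthlyUsage): (keyof MonthlyUsage)[] {
  const keys: (keyof MonthlyUsage)[] = ["activeCpuHours", "provisionedMemoryGbHours", "invocations"];
  return keys.filter((k) => usage[k] > HOBBY_FREE_TIER[k]);
}

// Hypothetical month: light on CPU, but memory over the limit
const over = hobbyOverages({ activeCpuHours: 1.5, provisionedMemoryGbHours: 420, invocations: 300_000 });
console.log(over); // [ 'provisionedMemoryGbHours' ]
```

A month like this one shows why Active CPU alone is not the whole story: an I/O-heavy app can stay far under 4 CPU hours and still run out of memory GB-hr.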

### Region-specific unit prices (excerpt)

Unit prices **change by region.** Here's an excerpt of areas around Japan and major regions (per-hour CPU / per-GB-hr memory).

| Region | Active CPU (/h) | Provisioned Memory (/GB-h) |
|---|---|---|
| Tokyo `hnd1` | $0.202 | $0.0167 |
| Osaka `kix1` | $0.202 | $0.0167 |
| Washington D.C. `iad1` (default) | $0.128 | $0.0106 |
| Portland `pdx1` | $0.128 | $0.0106 |
| Singapore `sin1` | $0.160 | $0.0133 |
| Frankfurt `fra1` | $0.184 | $0.0152 |

> **Region selection is also a cost decision**: the default `iad1` is in the cheapest band. If latency requirements allow, weigh the unit price of a region near your users against its performance benefit. Distance to the DB also affects total latency and transfer, so the baseline rule is to **place the function close to the DB**.
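To see how much the region alone moves the bill, here is a minimal sketch that prices the same monthly workload in different regions. The unit prices are copied from the excerpt table above; the workload (10 CPU-hr, 100 GB-hr) is hypothetical, and Invocations are left out because the table only lists CPU and memory prices.

```ts
// Unit prices from the excerpt table above (USD); re-check the official page before relying on them
const REGION_PRICES = {
  hnd1: { cpuPerHour: 0.202, memPerGbHour: 0.0167 },
  kix1: { cpuPerHour: 0.202, memPerGbHour: 0.0167 },
  iad1: { cpuPerHour: 0.128, memPerGbHour: 0.0106 },
  pdx1: { cpuPerHour: 0.128, memPerGbHour: 0.0106 },
  sin1: { cpuPerHour: 0.16, memPerGbHour: 0.0133 },
  fra1: { cpuPerHour: 0.184, memPerGbHour: 0.0152 },
} as const;

type Region = keyof typeof REGION_PRICES;

// CPU + memory cost of the same monthly workload in a given region (Invocations excluded)
function monthlyComputeCost(region: Region, activeCpuHours: number, memoryGbHours: number): number {
  const p = REGION_PRICES[region];
  return activeCpuHours * p.cpuPerHour + memoryGbHours * p.memPerGbHour;
}

// Hypothetical workload: 10 CPU-hr and 100 GB-hr per month
for (const r of ["hnd1", "iad1"] as const) {
  console.log(r, monthlyComputeCost(r, 10, 100).toFixed(2));
}
// hnd1 3.69
// iad1 2.34
```

The gap is proportional, so it only matters once usage is large; latency to users and to the DB usually decides the region first.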

### The official calculation example

In São Paulo (CPU $0.221/h, memory $0.0183/GB-h), one invocation with **4GB memory, 4 seconds of Active CPU, and a 10-second instance lifetime:**

- CPU: (4 / 3600) × $0.221 = **$0.0002456**
- Memory: (4GB × 10 / 3600) × $0.0183 = **$0.0002033**
- Total: **$0.0004489 / invocation**

What this formula shows is that **memory billing is driven by allocation (4GB) × instance lifetime (10 seconds)**, while **the CPU term counts only the 4 seconds the CPU was actually busy.**
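The official formula translates directly into code. A minimal sketch that reproduces the São Paulo example above; `costPerInvocation` is a hypothetical helper name, and the per-Invocation fee is deliberately left out, as in the official example.

```ts
interface InvocationParams {
  activeCpuSeconds: number;
  memoryGb: number;
  instanceLifetimeSeconds: number;
  cpuPricePerHour: number;
  memoryPricePerGbHour: number;
}

// Per-invocation cost using the official formula (CPU + memory; Invocation fee not included)
function costPerInvocation(p: InvocationParams): { cpu: number; memory: number; total: number } {
  const cpu = (p.activeCpuSeconds / 3600) * p.cpuPricePerHour;
  const memory = ((p.memoryGb * p.instanceLifetimeSeconds) / 3600) * p.memoryPricePerGbHour;
  return { cpu, memory, total: cpu + memory };
}

// The São Paulo example: 4GB, 4s of Active CPU, 10s instance lifetime
const saoPaulo = costPerInvocation({
  activeCpuSeconds: 4,
  memoryGb: 4,
  instanceLifetimeSeconds: 10,
  cpuPricePerHour: 0.221,
  memoryPricePerGbHour: 0.0183,
});
console.log(saoPaulo.total.toFixed(7)); // 0.0004489
```

Halving `memoryGb` to 2 halves the memory term while leaving the CPU term untouched, which is exactly the lever that matching memory to actual use pulls.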

---

## Reduction measures: in effective order

Rather than cutting costs blindly, work through the measures **in order of impact.**

### ① Eliminate the Invocation itself with caching (the biggest effect)

The most effective measure is to **not call the function at all.** On a cache hit, all three axes (Active CPU, memory, and Invocations) are **zero** for that request.

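```ts
// Cache the function response on Vercel's CDN: while s-maxage (300s) is fresh,
// repeat requests are served from the cache and the function does not run
import { getCatalog } from "@/lib/catalog"; // hypothetical app-specific data loader

export async function GET() {
  return Response.json(await getCatalog(), {
    headers: { "Cache-Control": "public, s-maxage=300, stale-while-revalidate=600" },
  });
}
```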

- Use **ISR** for pages, **CDN Cache** for APIs, and **Runtime Cache** for data fetches ([caching strategy](/blog/vercel-caching-isr-cache-components-ppr-guide)).
- **Verify `HIT` via the `x-vercel-cache` response header.** A HIT is proof of "not called = not billed."
- ISR's **request coalescing** bundles function calls to the same path into one per region during a spike, suppressing Invocations.
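To check a route's hit ratio, a minimal sketch: request the URL several times, record the `x-vercel-cache` header, and count the HITs. It counts only `HIT` as a confirmed cache hit and treats any other value (for example `MISS`) or a missing header as a possible invocation. `probeCache` and `hitRatio` are hypothetical helpers, and the URL in the usage comment is a placeholder.

```ts
// Request a URL repeatedly and record the x-vercel-cache header of each response
async function probeCache(url: string, times = 5): Promise<(string | null)[]> {
  const values: (string | null)[] = [];
  for (let i = 0; i < times; i++) {
    const res = await fetch(url);
    values.push(res.headers.get("x-vercel-cache"));
    await res.arrayBuffer(); // drain the body so the connection can be reused
  }
  return values;
}

// Share of responses served as a cache HIT (i.e., the function was not invoked)
function hitRatio(values: (string | null)[]): number {
  if (values.length === 0) return 0;
  const hits = values.filter((v) => v?.toUpperCase() === "HIT").length;
  return hits / values.length;
}

// Usage (placeholder URL):
// probeCache("https://your-app.vercel.app/api/catalog").then((v) => console.log(v, hitRatio(v)));
```

Expect the first request after a deploy or expiry to be a `MISS`; a ratio that stays low on a route you meant to cache usually points to a missing or overridden `Cache-Control` header.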

### ② Separate heavy CPU processing from the function

Move processing that eats Active CPU (image/video conversion, large aggregations, cryptography) off the synchronous path of the user request.

- For work that can run after the response is sent, use **`waitUntil`** ([Functions guide](/blog/vercel-functions-fluid-compute-streaming-cron-guide)).
- For long-running or stateful work, hand it off to **Workflows / queues.**
- With this, the function the user waits on becomes I/O-centric and its Active CPU drops. Note that `waitUntil` work still runs in the same invocation and is still billed; it mainly shortens the user's wait.
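The hand-off pattern can be sketched in a few lines: the request path only validates and enqueues, then answers `202 Accepted`, and a separate worker does the CPU-heavy part later. This is a minimal sketch under stated assumptions: `JobQueue`, `InMemoryQueue`, `makeResizeHandler`, and the `resize-image` job type are all hypothetical stand-ins for whatever Workflows or queue product you actually use, and the in-memory queue is for local testing only.

```ts
// Hypothetical queue interface: a stand-in for Workflows, a queue service, or a job runner
interface Job {
  type: string;
  payload: unknown;
}
interface JobQueue {
  enqueue(job: Job): Promise<void>;
}

// In-memory stand-in for local testing only (not durable)
class InMemoryQueue implements JobQueue {
  readonly jobs: Job[] = [];
  async enqueue(job: Job): Promise<void> {
    this.jobs.push(job);
  }
}

const json = (data: unknown, status: number) =>
  new Response(JSON.stringify(data), { status, headers: { "content-type": "application/json" } });

// The request path only validates input and enqueues (I/O), then answers 202 Accepted.
// The CPU-heavy resize runs later in a worker that consumes the queue, off the user's request.
function makeResizeHandler(queue: JobQueue) {
  return async (req: Request): Promise<Response> => {
    const body = (await req.json()) as { imageUrl?: string };
    if (!body.imageUrl) return json({ error: "imageUrl is required" }, 400);
    await queue.enqueue({ type: "resize-image", payload: { imageUrl: body.imageUrl } });
    return json({ status: "queued" }, 202);
  };
}

// Wiring in a route file (hypothetical): export const POST = makeResizeHandler(realQueue);
```

Taking the queue as a parameter keeps the handler testable and makes the boundary explicit: nothing CPU-heavy can sneak back into the request path without changing the handler itself.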

### ③ Match maxDuration and memory to the use

- **Memory is billed during I/O wait too.** 4GB is excessive for an I/O-centric function that barely uses the CPU. Lower it to match actual use.
- **An overly generous `maxDuration`** lets a hung function stay alive until the limit, accruing memory billing the whole time. Set an explicit, reasonable limit.

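```ts
// Lightweight, I/O-centric function: modest memory, short timeout
export const maxDuration = 15; // seconds; don't keep hung invocations alive
// Set memory to match the use in the dashboard/settings (Hobby 2GB / Pro up to 4GB)
```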
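The cost of a loose limit can be estimated with the same memory formula. A minimal sketch, assuming the worst case where each hung request keeps an otherwise idle instance alive until `maxDuration` (it ignores other requests that Fluid concurrency might share the instance with). The 1,000 hung requests per month, the 300s "loose" limit, and the helper name are hypothetical; the price is the Tokyo `hnd1` memory price from the table above.

```ts
// Worst case (assumption): a hung request keeps an otherwise idle instance alive until maxDuration
function hungInvocationMemoryCost(memoryGb: number, maxDurationSeconds: number, pricePerGbHour: number): number {
  return ((memoryGb * maxDurationSeconds) / 3600) * pricePerGbHour;
}

const HND1_MEMORY_PRICE = 0.0167; // USD per GB-hr, from the region table
const hungPerMonth = 1_000; // hypothetical

const loose = hungPerMonth * hungInvocationMemoryCost(4, 300, HND1_MEMORY_PRICE); // 4GB, 300s limit
const tight = hungPerMonth * hungInvocationMemoryCost(1, 15, HND1_MEMORY_PRICE); // 1GB, 15s limit
console.log(loose.toFixed(2), tight.toFixed(2)); // 5.57 0.07
```

The absolute amounts are small here, but the ratio (80×) is what scales: the memory size and the timeout multiply each other.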

### ④ Reduce the total number of instances with Fluid's concurrency

Fluid Compute's **optimized concurrency** lets one instance handle other requests while it waits on I/O, which reduces **the total number of instances needed, and with it the total memory billing.** This happens automatically, but it depends on the discipline of **not storing request-specific data in global state**, so that concurrent processing stays safe ([Functions guide](/blog/vercel-functions-fluid-compute-streaming-cron-guide)).
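Why global state breaks under concurrency is easy to reproduce locally. A minimal sketch that simulates two overlapping requests on one instance, with timers standing in for I/O waits; the handler names and user IDs are hypothetical. Module-level variables are shared by every request the instance is handling at the same time.

```ts
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// UNSAFE under concurrency: request-specific data in module scope
let currentUserId: string | undefined;
async function unsafeHandler(userId: string, ioWaitMs: number): Promise<string> {
  currentUserId = userId;
  await sleep(ioWaitMs); // during this I/O wait, another request can run on the same instance
  return `profile of ${currentUserId}`;
}

// SAFE: request data stays in the handler's local scope
async function safeHandler(userId: string, ioWaitMs: number): Promise<string> {
  await sleep(ioWaitMs);
  return `profile of ${userId}`;
}

// Two overlapping requests on one instance: alice's I/O takes longer than bob's
const unsafeRun = Promise.all([unsafeHandler("alice", 20), unsafeHandler("bob", 5)]);
const safeRun = Promise.all([safeHandler("alice", 20), safeHandler("bob", 5)]);
unsafeRun.then(console.log); // [ 'profile of bob', 'profile of bob' ]  (alice receives bob's data)
safeRun.then(console.log); // [ 'profile of alice', 'profile of bob' ]
```

The same leak applies to per-request caches, auth context, or counters kept at module scope; pass such data through function arguments or a request-scoped object instead.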

### ⑤ Look at the other billing drivers

- **BotID Deep Analysis** is billed per `checkBotId()` call ($1/1000). Limit it to high-value routes ([Firewall/BotID](/blog/vercel-firewall-waf-botid-ddos-security-guide)).
- **Blob** bills storage (GB-month, averaged over 15-minute snapshots) plus data transfer. **Avoid extra transfer billing by using client uploads**, and keep objects immutable so caching stays effective ([storage](/blog/vercel-storage-blob-edge-config-marketplace-guide)).
- **Image optimization and Fast Data Transfer** are billed too; check their breakdown in Observability as well.
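Scoping `checkBotId()` to high-value routes is easy to justify with arithmetic. A minimal sketch that models only the Deep Analysis per-call fee quoted above ($1 per 1,000 calls); the traffic numbers are hypothetical.

```ts
const BOTID_DEEP_ANALYSIS_PER_1000 = 1; // USD per 1,000 checkBotId() calls, per the text above

// Monthly Deep Analysis cost if checkBotId() runs on the given number of requests
function botIdCost(checkedRequests: number): number {
  return (checkedRequests / 1000) * BOTID_DEEP_ANALYSIS_PER_1000;
}

// Hypothetical traffic: 2M requests site-wide, 40k of them to checkout/sign-up
console.log(botIdCost(2_000_000)); // 2000 (every route checked)
console.log(botIdCost(40_000)); // 40 (only high-value routes)
```

Because the fee is per call rather than per protected route, the bill tracks traffic on whatever routes call it, which is why the choice of routes matters more than the feature itself.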

---

## Monitor cost

To avoid "being surprised when the bill comes," **continuously monitor real values in Observability.**

- Look at the breakdown of each function's **Active CPU / memory / Invocation.**
- Identify which routes generate the most Invocations (= candidates for caching).
- Identify which routes consume the most Active CPU (= candidates for separation/optimization).
- Detect budget overruns with **Spend Management** (spend caps, alerts).
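Turning the per-route numbers into a shortlist is a simple ranking. A minimal sketch, assuming you have copied or exported per-route Invocations and Active CPU totals from Observability into the hypothetical `RouteUsage` shape below (the real export format may differ); the routes and figures are made up for illustration.

```ts
// One row per route for the period (shape assumed, not an official export format)
interface RouteUsage {
  route: string;
  invocations: number;
  activeCpuMs: number; // total Active CPU for the period
}

// Top-N routes by Invocations (caching candidates) and by Active CPU (separation candidates)
function topCandidates(rows: RouteUsage[], n = 3) {
  const by = (key: "invocations" | "activeCpuMs") =>
    [...rows].sort((a, b) => b[key] - a[key]).slice(0, n).map((r) => r.route);
  return { cacheCandidates: by("invocations"), separationCandidates: by("activeCpuMs") };
}

// Hypothetical numbers for illustration
const rows: RouteUsage[] = [
  { route: "/api/catalog", invocations: 900_000, activeCpuMs: 1_800_000 },
  { route: "/api/thumbnail", invocations: 40_000, activeCpuMs: 9_600_000 },
  { route: "/api/checkout", invocations: 12_000, activeCpuMs: 600_000 },
];
const picks = topCandidates(rows, 1);
console.log(picks); // { cacheCandidates: [ '/api/catalog' ], separationCandidates: [ '/api/thumbnail' ] }
```

The two lists often differ: a chatty, cheap route is a caching problem, while a rarely called but CPU-heavy route is a separation problem.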

> **Measure first, then optimize**: rather than raising or lowering memory by guesswork, **identify heavy routes in Observability → trim them with ② and ③ → re-measure.** This follows the same [principles of observability/SRE](/blog/opentelemetry-observability-production-tracing-metrics-logs): tracing back from the symptom (bill, latency) to the cause (a specific route) is the most reliable approach.

---

## Production checklist (cost)

- [ ] Designed on the premise of **Active CPU billing** (I/O wait not billed)
- [ ] Understand that **memory is billed during I/O wait too**, and avoid over-allocation
- [ ] **Eliminate Invocations** with caching, and measure `HIT` with `x-vercel-cache`
- [ ] **Move heavy CPU processing to `waitUntil`/Workflows/jobs**
- [ ] **State `maxDuration` and memory** matched to the use
- [ ] **Don't pollute global state** so Fluid's concurrency works
- [ ] Check the breakdown of **other billing drivers** like BotID/Blob/image optimization
- [ ] Continuously monitor and set limits with **Observability + Spend Management**

---

## Summary

Vercel's cost is decided not by "scale" but by "**design.**"

1. **Active CPU billing** = I/O wait is free. The more I/O-centric, the cheaper
2. **Memory is billed during wait time too** = over-allocation and long-lived instances are the enemy
3. Reduction is in the order **① caching → ② separating CPU processing → ③ maxDuration/memory → ④ concurrency**
4. **Measure → optimize** (identify heavy routes in Observability)
5. Look at **all billing axes** — region unit prices, BotID, Blob transfer, etc.

Make the billing model your ally and Vercel, far from "expensive at scale," becomes one of **the most cost-efficient platforms for I/O-centric modern apps.** Start the implementation from this cluster's [Vercel production-operations guide](/blog/vercel-production-platform-guide).

> This article is based on the [Fluid compute pricing](https://vercel.com/docs/functions/usage-and-pricing) / [Functions Limits](https://vercel.com/docs/functions/limitations) official documentation (as of June 2026). Prices, free tiers, and region unit prices fluctuate, so estimate with the latest official values at production adoption.
