# Rate Limiting That 'Actually Works' in Next.js — Why In-Memory Breaks in Serverless, and Distributed-Store Design

> Because Vercel/Lambda instances are disposable and run concurrently, rate limiting that counts in process memory lets requests straight through. Using real Next.js code (middleware and route handlers), we explain a design that implements an atomic sliding window on a distributed store such as Upstash Redis.

- Published: 2026-06-28
- Author: 友田 陽大
- Tags: Next.js, Architecture design, Security, TypeScript
- URL: https://tomodahinata.com/en/blog/nextjs-serverless-rate-limiting-vercel-guide
- Category: Application-layer security
- Pillar guide: https://tomodahinata.com/en/blog/nextjs-supabase-application-security-guide

## Key points

- Serverless rate limiting built on an in-process `Map` always breaks. Vercel/Lambda function instances are disposable, run concurrently, and are replaced on every cold start, so the counter is never shared. Put the state in an external shared store, no exceptions
- A fixed window has the boundary-burst problem: around the window changeover it can let through up to 2× the limit. Count smoothly with a sliding window
- To count correctly under concurrent requests, read → increment → write must be one atomic operation that can't be split. Use Redis atomic commands (`INCR`), Lua scripts, or a distributed rate limiter such as @upstash/ratelimit (a plain pipeline only batches round trips and is not atomic on its own)
- Key by IP, route, and user, and take the client IP only from headers set by a proxy you trust, never from a raw `x-forwarded-for`. On rejection, returning 429 + Retry-After is standard etiquette. This is the countermeasure against the abuse and cost runaway that OWASP API4:2023 (Unrestricted Resource Consumption) describes
- Honestly, app-layer rate limiting suppresses brute force, OTP abuse, scraping, and cost-type DoS, but it's no substitute for L3/L4 volumetric DDoS. That's the edge/WAF's domain — use both together

---

Let me state the conclusion first. **In serverless Next.js (Vercel / AWS Lambda), rate limiting implemented by "counting in an in-process `Map`" will reliably break in production.** Function instances are disposable, several of them run at the same time, and a fresh one comes up on each cold start, so the counter is never shared. It works perfectly locally, passes tests, and looks fine in a demo, which is exactly what makes it troublesome: at production scale it silently lets requests through.

This isn't a "Next.js is bad" or "don't use serverless" story. Serverless is the right choice, and rate limiting can be implemented on it. The problem is a fundamental design mismatch: **rate limiting is "processing that holds state (a counter)," yet serverless assumes "holding no state (stateless)."** This article explains from the architecture why in-memory counting breaks, and then, grounded in real code and published primary sources, designs the rest: fixed versus sliding windows, why the increment must be atomic, key design, the correct way to get the client IP, and finally `429 + Retry-After`. AI-generated code almost always gets this area wrong, so it should help both those who write it and those who review it.

---

## 1. Why you need rate limiting — suppressing 4 abuses

Rate limiting caps how many requests a single client can make within a given time window. Let's look at what it protects against, threat by threat.

| Threat | What happens | How rate limiting works |
|---|---|---|
| **Login brute force** | Password brute force, credential stuffing | Narrow the number of attempts per IP/account, making brute force impractical |
| **OTP / email abuse** | Rapid-fire sending of auth codes and password resets | Limit the number of sends, suppressing SMS/email costs and inbox spam |
| **Scraping / API abuse** | Mass retrieval of public endpoints, inventory-monitoring bots | Suppress data leakage and load, protecting the legitimate user's experience |
| **Cost-type DoS** | Rapid-firing heavy processing (AI inference, image generation, external APIs) to explode the bill | Limit higher-unit-cost processing more strictly, stopping the bill's runaway |

The last one, "cost-type DoS," matters especially in the serverless era. In a pay-per-use cloud, the attacker **doesn't even need to take the server down.** Simply hitting a heavy endpoint continuously sends your bill through the ceiling. This is exactly the risk OWASP lists in its API Security Top 10 as **[API4:2023 Unrestricted Resource Consumption](https://owasp.org/API-Security/editions/2023/en/0xa4-unrestricted-resource-consumption/).** OWASP's point is that serving a request consumes not only bandwidth, CPU, memory, and storage but also **money** (SMS, third-party APIs, and so on), and if that consumption has no cap, both availability and cost become an attack surface.

Rate limiting is **the most cost-effective horizontal control** against these four abuses. It applies uniformly across the app and, once implemented correctly, doesn't rely on someone remembering to handle each case. Its place within Next.js app-layer security as a whole is mapped in the [Next.js × Supabase Application Security Complete Guide](/blog/nextjs-supabase-application-security-guide), so refer there for the overall picture. This article digs into rate limiting specifically, focusing on the serverless-specific pitfalls.

---

## 2. Why an in-memory `Map` breaks in serverless

Search the web and this kind of code turns up in droves. Ask an AI to "implement rate limiting in Next.js," and there's a fairly high chance it hands back something like this.

```ts
// ❌ Broken: rate limiting with the counter in an in-process Map
const hits = new Map<string, { count: number; resetAt: number }>();

export function rateLimit(ip: string, limit = 10, windowMs = 60_000): boolean {
  const now = Date.now();
  const rec = hits.get(ip);

  if (!rec || now > rec.resetAt) {
    hits.set(ip, { count: 1, resetAt: now + windowMs });
    return true; // allow
  }
  if (rec.count >= limit) return false; // reject
  rec.count += 1;
  return true;
}
```

In `next dev`'s single process it works perfectly, so you conclude "it works" and ship to production. But in serverless, there are three reasons this `hits` map is **unreliable.**

### 2-1. Instances are disposable (stateless)

Vercel and Lambda functions start up to process requests and, after a while, **are discarded.** Process memory lives only as long as the function instance, so when the instance goes away, the whole `hits` map goes with it. If the next request is handled by a new instance, the counter restarts from `0`. The attacker can reset the cap just by "waiting for the instance to be swapped out." The platform and framework documentation likewise tell you to design on the **premise that serverless/edge functions are stateless** ([Next.js docs](https://nextjs.org/docs)).

### 2-2. Multiple instances run at the same time (horizontal scaling)

This is the most damaging of the three. As load rises, the platform **scales the function horizontally** and starts tens or hundreds of instances at the same time. Each instance has **its own `hits` map** and knows nothing of the others' counts.

```text
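Actual behavior: intended limit=10/min, but each instance allows 10

  attacker ──┬─→ instance A (its own Map: allows 10)
             ├─→ instance B (a separate Map: 10 more)
             ├─→ instance C (a separate Map: 10 more)
             └─→ …N instances → effective limit = 10 × N (effectively unlimited)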
```

Because the load balancer spreads requests across instances, the total number of requests the attacker can get through **grows in proportion to the number of instances.** In other words, the more the app scales, the looser the rate limit becomes: the exact opposite of the intent.
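To make the 10 × N effect concrete, here is a small single-process simulation (an illustration, not production code). Each simulated instance gets its own `Map` with the same logic as the broken `rateLimit()` above, and a round-robin loop stands in for the load balancer. The names `makeInstance` and `simulateScaledOut` are hypothetical.

```typescript
// Illustration only: a single-process simulation of horizontal scaling.
// Each simulated "instance" owns a private Map, exactly like the broken rateLimit() above.
type Rec = { count: number; resetAt: number };

function makeInstance(limit: number, windowMs: number) {
  const hits = new Map<string, Rec>(); // private to this instance, invisible to the others
  return (ip: string, now: number): boolean => {
    const rec = hits.get(ip);
    if (!rec || now > rec.resetAt) {
      hits.set(ip, { count: 1, resetAt: now + windowMs });
      return true;
    }
    if (rec.count >= limit) return false;
    rec.count += 1;
    return true;
  };
}

// Sends `attempts` requests from one IP, all inside one 60 s window,
// spread round-robin across `instances` independent instances.
function simulateScaledOut(instances: number, attempts: number, limit = 10): number {
  const pool = Array.from({ length: instances }, () => makeInstance(limit, 60_000));
  const now = Date.now();
  let allowed = 0;
  for (let i = 0; i < attempts; i++) {
    const instance = pool[i % instances]!;
    if (instance("203.0.113.5", now)) allowed++;
  }
  return allowed;
}

console.log(simulateScaledOut(1, 100)); // 10
console.log(simulateScaledOut(5, 100)); // 50
```

With one instance the cap holds at 10; with five instances the same code lets 50 through, and nothing in any single instance looks wrong.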

### 2-3. Reset on cold start

With no traffic for a while, the function goes dormant and **cold-starts** on the next request. At that point the process is recreated and the in-memory counter is reset. The more intermittent the traffic, the more often the counter resets, and the "window" loses its meaning.

**Conclusion: rate limiting inherently needs "state shared across multiple instances."** As long as that state is in process memory, it's fundamentally impossible to count correctly in serverless. The only correct answer is to put the state **outside the process** — in a single shared store visible to all instances.

> **"It worked locally" is not a counterargument.** The danger of an in-memory implementation is that it looks normal throughout development, testing, and small-scale demos. It breaks only "when it scales in production," so the incident surfaces at the worst possible moment (in the middle of an attack). Verification has to be designed **assuming multiple instances**, not a single process.

---

## 3. Fixed window vs sliding window — the boundary-burst problem

Once you've decided to move the state to a shared store, next is "how to count." A naive implementation is a **fixed window**, but this has an easy-to-overlook flaw.

### 3-1. The boundary burst of a fixed window

A fixed window resets the counter at second :00 of every minute. It's simple to implement, but it can **let through up to 2× the cap around the window changeover.**

```text
limit = 10/min, fixed window:

  12:00:00 ───────────── 12:00:59 │ 12:01:00 ───────────── 12:01:59
                          ↑10      │ ↑10
                  10 at 12:00:59   │  10 at 12:01:00
                  ───────────────────────────
                  20 pass in well under 60 seconds
```

Send 10 requests at 12:00:59 and 10 more at 12:01:00, and **20 pass within about one second.** Each window individually respects the cap, yet a burst straddling the boundary breaks it: this is the **boundary-burst problem.** Attackers who want to concentrate requests into a short time, as in brute force or cost-type DoS, aim at exactly this boundary.
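The arithmetic of the burst is easy to check with a tiny model. This sketch keys a fixed-window counter by `Math.floor(t / windowMs)` (a common way to build fixed windows, assumed here) and uses a `Map` only so the example is self-contained; `makeFixedWindow` and `boundaryBurst` are hypothetical names.

```typescript
// Illustration only: a fixed-window counter keyed by window id = floor(t / windowMs).
// In production the counter would live in a shared store; a Map keeps the demo self-contained.
function makeFixedWindow(limit: number, windowMs: number) {
  const counts = new Map<number, number>(); // window id -> requests seen in that window
  return (t: number): boolean => {
    const windowId = Math.floor(t / windowMs);
    const n = (counts.get(windowId) ?? 0) + 1;
    counts.set(windowId, n);
    return n <= limit;
  };
}

// 10 requests at 12:00:59 and 10 more at 12:01:00 (UTC), limit 10/min.
function boundaryBurst(): number {
  const allow = makeFixedWindow(10, 60_000);
  const t0 = Date.UTC(2026, 0, 1, 12, 0, 59);
  const t1 = Date.UTC(2026, 0, 1, 12, 1, 0);
  let passed = 0;
  for (let i = 0; i < 10; i++) if (allow(t0)) passed++;
  for (let i = 0; i < 10; i++) if (allow(t1)) passed++;
  return passed;
}

console.log(boundaryBurst()); // 20: each window stays within 10, yet 20 pass in about one second
```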

### 3-2. Count smoothly with a sliding window

A **sliding window** always looks at "from this very moment, the past 60 seconds." Since there's no fixed division, a boundary burst doesn't occur.

```text
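Sliding window: on every request, count "the previous 60 seconds" ending at that moment

  ……[━━━━━━ count requests in the previous 60 seconds ━━━━━━]→ now
       this window slides continuously with time (no fixed boundaries)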
```

The strictest implementation records each request's timestamp in a sorted set (Redis's `ZSET`) and, on every request, counts how many fall within the past 60 seconds. It's accurate, but memory per key grows with the number of requests in the window. The `slidingWindow` provided by `@upstash/ratelimit` **approximates this by weighting the previous fixed window's count and adding the current window's count**, achieving both memory efficiency and smoothness. For many apps, this approximation is enough.
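The exact (log) variant can be sketched as three steps: trim, count, add. The model below keeps the log in memory purely to show the algorithm; `makeSlidingLog` and `slidingBurst` are hypothetical names, and keeping the log in process memory would of course reintroduce the problem from Section 2. Replaying the same 10 + 10 burst around 12:01:00 shows the second wave being rejected.

```typescript
// Illustration only: a sliding-window *log* modeled in memory (trim → count → add).
// In production these steps must run atomically in the shared store (e.g. a Lua script over a ZSET).
function makeSlidingLog(limit: number, windowMs: number) {
  const log: number[] = []; // timestamps (ms) of accepted requests, oldest first
  return (now: number): boolean => {
    // 1) trim: drop entries that have left the window (now - windowMs, now]
    while (log.length > 0 && log[0]! <= now - windowMs) log.shift();
    // 2) count: reject if the window is already full
    if (log.length >= limit) return false;
    // 3) add: record this request (rejected requests are not recorded in this sketch)
    log.push(now);
    return true;
  };
}

function slidingBurst(): { atBoundary: number; aMinuteLater: boolean } {
  const allow = makeSlidingLog(10, 60_000);
  const t0 = Date.UTC(2026, 0, 1, 12, 0, 59); // 12:00:59
  const t1 = Date.UTC(2026, 0, 1, 12, 1, 0); // 12:01:00
  let atBoundary = 0;
  for (let i = 0; i < 10; i++) if (allow(t0)) atBoundary++;
  for (let i = 0; i < 10; i++) if (allow(t1)) atBoundary++;
  // At 12:02:00 the 12:00:59 entries are older than 60 s, so capacity frees up again
  const aMinuteLater = allow(Date.UTC(2026, 0, 1, 12, 2, 0));
  return { atBoundary, aMinuteLater };
}

console.log(slidingBurst()); // { atBoundary: 10, aMinuteLater: true }
```

Whether to record rejected attempts too is a design choice; recording them punishes clients that keep hammering, at the cost of extra writes.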

| Algorithm | Pros | Cons | Suited use |
|---|---|---|---|
| Fixed window | Simple, light implementation | Passes 2× the cap with a boundary burst | A loose limit prioritizing lightness over strictness |
| Sliding window (log) | Strictly accurate | Consumes memory proportional to the number of requests | Places needing strictness, like billing/OTP |
| Sliding window (approximate) | Smooth & memory-efficient | Tolerates a slight error | The default for the majority of API/login limits |

When unsure, make the **sliding window (approximate)** the default. Reserve the fixed window for situations where you can tolerate up to 2× passing at the boundary.
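The approximate variant fits in a few lines. This sketch shows the general weighted-window technique with two counters per key (the previous and the current fixed window); `approxSlidingCount` and `approxAllow` are hypothetical names, and the library's exact rounding may differ.

```typescript
// Sketch of the approximate sliding window: weight the previous fixed window's count by the
// fraction of it that still overlaps the sliding window, then add the current window's count.
// The general technique only; @upstash/ratelimit's exact rounding and comparison may differ.
function approxSlidingCount(prevCount: number, currCount: number, now: number, windowMs: number): number {
  const elapsed = now % windowMs; // ms since the current fixed window started
  const prevWeight = 1 - elapsed / windowMs; // share of the previous window still "in view"
  return prevCount * prevWeight + currCount;
}

// Allow only if counting this request would keep the estimate within the limit.
function approxAllow(prevCount: number, currCount: number, now: number, windowMs: number, limit: number): boolean {
  return approxSlidingCount(prevCount, currCount, now, windowMs) + 1 <= limit;
}

const minute = 60_000;
// 10 requests landed in the previous minute; 1 s into the new one: 10 * 59/60 ≈ 9.83, +1 > 10 → reject
console.log(approxAllow(10, 0, Date.UTC(2026, 0, 1, 12, 1, 1), minute, 10)); // false
// Halfway through: 10 * 0.5 + 4 = 9, plus this request = 10 → allow
console.log(approxAllow(10, 4, Date.UTC(2026, 0, 1, 12, 1, 30), minute, 10)); // true
```

Applied to the 12:00:59 / 12:01:00 burst, the previous minute's ten requests still weigh almost fully at the start of the new minute, so the second wave is rejected, while memory stays at two counters per key.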

---

## 4. The atomic increment — don't break the count under concurrent requests

Even after moving to a shared store and choosing a sliding window, there's still a hole. **Concurrency.**

The core of rate limiting is a read-modify-write: "read the current value → increment → write it back." If these steps can be interleaved with other requests (**non-atomic**), requests arriving at the same time **all read the same old value**, each judges "still under the cap," and all of them pass. This is a **race condition.**

```text
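limit=10, current value=9, 5 requests arrive at once (non-atomic implementation)

  req1: GET→9  req2: GET→9  req3: GET→9  req4: GET→9  req5: GET→9
   every one decides "9 < 10, so OK"
  → SET 10 runs 5 times; only 1 should have passed, but 5 pass (requests up to the 14th get through)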
```
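The interleaving in the diagram is easy to reproduce. The sketch below uses a hypothetical `FakeStore` whose every call costs a simulated round trip (a 5 ms timer), then fires five concurrent checks at a counter already sitting at 9 with a limit of 10. It illustrates the race only; it is not a Redis client, and `incr` merely mimics the all-in-one behavior of Redis's `INCR`.

```typescript
// Hypothetical in-memory stand-in for a remote store; every call costs one simulated round trip,
// so concurrent callers interleave the way they would against a real network store.
class FakeStore {
  private data = new Map<string, number>();
  private roundTrip(): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, 5));
  }
  async get(key: string): Promise<number> {
    await this.roundTrip();
    return this.data.get(key) ?? 0;
  }
  async set(key: string, value: number): Promise<void> {
    await this.roundTrip();
    this.data.set(key, value);
  }
  // INCR-like: the read and the write happen together on the "server", with nothing in between
  async incr(key: string): Promise<number> {
    await this.roundTrip();
    const next = (this.data.get(key) ?? 0) + 1;
    this.data.set(key, next);
    return next;
  }
}

// ❌ Non-atomic: GET and SET are separate round trips, so others can read the old value in between
async function nonAtomicAllow(store: FakeStore, key: string, limit: number): Promise<boolean> {
  const current = await store.get(key);
  if (current >= limit) return false;
  await store.set(key, current + 1);
  return true;
}

// ✅ Atomic: INCR hands each caller a distinct post-increment value
async function atomicAllow(store: FakeStore, key: string, limit: number): Promise<boolean> {
  return (await store.incr(key)) <= limit;
}

// Counter already at 9 with limit 10; five requests arrive at once. Returns how many passed.
async function burst(mode: "nonAtomic" | "atomic"): Promise<number> {
  const store = new FakeStore();
  const key = "rl:ip:203.0.113.5";
  await store.set(key, 9);
  const check = mode === "atomic" ? atomicAllow : nonAtomicAllow;
  const results = await Promise.all(Array.from({ length: 5 }, () => check(store, key, 10)));
  return results.filter(Boolean).length;
}

void (async () => {
  console.log(await burst("nonAtomic")); // 5
  console.log(await burst("atomic")); // 1
})();
```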

In serverless, because multiple instances hit the same key at the same time, this race is **the norm, not an exception.** To prevent it, the read-modify-write must run as **a single, indivisible operation.** There are two ways to achieve that.

### 4-1. Redis's atomic commands / Lua

Because Redis executes commands one at a time, a single command like `INCR` is atomic. To combine several steps into one unit, use a **Lua script** so that "read, increment, set TTL, decide" runs together on the server. Redis runs the whole script without interleaving other clients' commands, so no race can occur.

```ts
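// Atomic rate limiting in Lua (read, increment, set TTL on the first hit, and decide as one unit)
// KEYS[1] = rate-limit key / ARGV[1] = limit / ARGV[2] = window in seconds
const FIXED_WINDOW_LUA = `
  local current = redis.call('INCR', KEYS[1])
  if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])  -- set the expiry only on the first hit
  end
  if current > tonumber(ARGV[1]) then
    return {0, current}                     -- reject
  end
  return {1, current}                        -- allow
`;
// ↑ This makes a *fixed* window atomic. A sliding window does ZREMRANGEBYSCORE → ZCARD → ZADD
//   on a ZSET inside the same Lua script; if you must hand-write it, that is the shape,
//   but as the next section explains, reach for @upstash/ratelimit first.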
```
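If you do hand-write the sliding-window log, the shape described in the comment above looks roughly like this. It is a sketch under stated assumptions: the `EvalClient` interface and `slidingLogLimit` helper are hypothetical, the `(script, keys, args)` argument order matches `@upstash/redis`'s `eval` as far as I know but should be checked against the client you actually use, and timestamps come from the caller's clock.

```typescript
// Sliding-window *log* as one Lua script over a Redis sorted set: trim → count → add, atomically.
// KEYS[1] = key, ARGV[1] = now (ms), ARGV[2] = window (ms), ARGV[3] = limit, ARGV[4] = unique member
const SLIDING_LOG_LUA = `
  local key    = KEYS[1]
  local now    = tonumber(ARGV[1])
  local window = tonumber(ARGV[2])
  local limit  = tonumber(ARGV[3])
  redis.call('ZREMRANGEBYSCORE', key, 0, now - window)  -- drop entries at or before now - window
  local count = redis.call('ZCARD', key)
  if count >= limit then
    return {0, count}                                     -- reject (not recorded)
  end
  redis.call('ZADD', key, now, ARGV[4])                   -- record this request
  redis.call('PEXPIRE', key, window)                      -- idle keys expire on their own
  return {1, count + 1}
`;

// Assumed client shape: script, key list, argument list. Other clients (e.g. ioredis)
// use a different eval signature, so adapt this wrapper accordingly.
interface EvalClient {
  eval(script: string, keys: string[], args: (string | number)[]): Promise<unknown>;
}

async function slidingLogLimit(client: EvalClient, key: string, limit: number, windowMs: number) {
  // Caller's clock: instances must keep their clocks in sync, or the window drifts between them
  const now = Date.now();
  // Unique per request; two requests in the same millisecond must not collapse into one member
  const member = `${now}-${Math.random().toString(36).slice(2)}`;
  const reply = await client.eval(SLIDING_LOG_LUA, [key], [now, windowMs, limit, member]);
  const [ok, count] = reply as [number, number];
  return { success: ok === 1, count };
}
```

The unique member and the clock-drift caveat are exactly the kind of landmines that make a maintained library the safer default.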

### 4-2. A distributed rate limiter (recommended)

Hand-writing Lua has many landmines: TTL edge cases, clock drift between instances, the sliding-window approximation, and more. The right answer is to **avoid reinventing the wheel and use a battle-tested library.** `@upstash/ratelimit` runs atomic scripts internally and pairs with Upstash's HTTP-based Redis, which serverless runtimes (Vercel Edge / Lambda) can call without holding a persistent connection. Its biggest advantage is that you **don't have to guarantee atomicity, the sliding window, and distributed counting yourself.**

```ts
// lib/ratelimit.ts: an atomic sliding window backed by a distributed store
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
import "server-only"; // this counting logic is server-only; keep it out of client bundles

// A single shared store (Redis) visible to every function instance; nothing lives in process memory
export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL / _TOKEN from env
  limiter: Ratelimit.slidingWindow(10, "60 s"), // up to 10 in the previous 60 s (approximate)
  analytics: true,
  prefix: "rl", // key namespace
});
```

The values `Redis.fromEnv()` reads are secrets. Put `UPSTASH_REDIS_REST_URL` / `UPSTASH_REDIS_REST_TOKEN` **only in server-side env, and never give them a `NEXT_PUBLIC_` prefix** (that prefix inlines the value into the client bundle). Once the token is exposed, an attacker can talk to the store directly and read, reset, or sidestep the counters.
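One cheap way to enforce this is a startup guard that refuses to run when the credentials are missing or were given a `NEXT_PUBLIC_` name. This is a hypothetical helper (`assertRateLimitEnv`), and the name check is a heuristic keyed on `UPSTASH`/`REDIS`; adjust it to your own variable names.

```typescript
// Startup guard (a sketch; assumes config comes from process.env-style key/value pairs).
// Fails fast if the store credentials are missing or were named with a NEXT_PUBLIC_ prefix.
function assertRateLimitEnv(env: Record<string, string | undefined>): { url: string; token: string } {
  const exposed = Object.keys(env).filter(
    (name) => name.startsWith("NEXT_PUBLIC_") && /UPSTASH|REDIS/i.test(name),
  );
  if (exposed.length > 0) {
    throw new Error(`Rate-limit store secrets must not use NEXT_PUBLIC_: ${exposed.join(", ")}`);
  }
  const url = env["UPSTASH_REDIS_REST_URL"];
  const token = env["UPSTASH_REDIS_REST_TOKEN"];
  if (!url || !token) {
    throw new Error("UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN must be set (server-side only)");
  }
  return { url, token };
}

// Usage (e.g. in lib/ratelimit.ts before constructing the client):
// const { url, token } = assertRateLimitEnv(process.env);
// const redis = new Redis({ url, token });
```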

---

## 5. Key design — decide the granularity of IP, route, and user

**Key design** decides who gets counted and at what granularity. Get the granularity wrong and the limit is either too loose to take effect or so strict that it causes collateral damage.

| Key axis | Example | Suited use | Caution |
|---|---|---|---|
| **Per IP** | `rl:ip:203.0.113.5` | Not-logged-in paths (login attempts, public APIs) | Under NAT/proxy it can sweep in legitimate users |
| **Per route** | `rl:login:…` / `rl:ai:…` | Change the cap per endpoint | Heavier processing stricter, lighter processing looser |
| **Per user** | `rl:user:<uid>` | Post-login abuse, cost-type DoS | Possible only when authenticated (use the verified ID value) |

In practice, the standard play is to **combine multiple axes.** For example, "limit login attempts both per IP and per account," "limit higher-unit-cost processing like AI inference strictly per user." Prepare a separate `Ratelimit` instance per route and match the cap to the endpoint's "heaviness."

```ts
// Each route gets its own cap according to its "weight" (the heavier the work, the stricter)
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
const redis = Redis.fromEnv();

export const loginLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(5, "60 s"), // login is strict: 5/min
  prefix: "rl:login",
});

export const aiLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(20, "1 h"), // expensive AI inference: 20/hour
  prefix: "rl:ai",
});
```

> **Beware collateral damage (false positives).** If you limit strictly by IP alone, then wherever **many legitimate users share one IP** (behind a company, school, or mobile-carrier NAT), unrelated people get locked out. After authentication, make per-user the main axis and keep per-IP as a defense line for not-logged-in paths; this split balances effectiveness against collateral damage.
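Combining axes, as described above for login, can be sketched with a small helper that checks several limiters and rejects if any one is exhausted. `Limiter` and `limitAll` are hypothetical names; the interface only assumes the `success` and `reset` fields that `@upstash/ratelimit`'s `limit()` result includes, so real `Ratelimit` instances should fit it structurally.

```typescript
// Minimal limiter shape: only `success` and `reset` (ms epoch) are assumed from the result.
interface Limiter {
  limit(identifier: string): Promise<{ success: boolean; reset: number }>;
}

// Checks every axis and rejects if any one is exhausted. All axes are consumed even when one
// fails (a deliberate simplification); `reset` reports the latest reset among failing axes.
async function limitAll(checks: Array<[Limiter, string]>): Promise<{ success: boolean; reset: number }> {
  const results = await Promise.all(checks.map(([limiter, id]) => limiter.limit(id)));
  const failed = results.filter((r) => !r.success);
  return {
    success: failed.length === 0,
    reset: Math.max(0, ...failed.map((r) => r.reset)), // 0 when nothing failed
  };
}

// Usage sketch for the login route (hypothetical identifiers; normalize the email first):
// const { success, reset } = await limitAll([
//   [loginLimiter, `ip:${ip}`],
//   [loginLimiter, `account:${email.trim().toLowerCase()}`],
// ]);
```

Using the latest reset among failing axes keeps `Retry-After` honest: retrying earlier would only hit the slower axis again.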

---

## 6. The correct way to get the client IP — the `x-forwarded-for` trap

If you count per IP, you need to correctly get "the client's real IP." This is the most error-prone point security-wise.

In serverless, a request reaches the function through the platform's proxy / load balancer, and the original client IP is recorded in the `x-forwarded-for` header. The problem: **the client can set that header too.** If the attacker **sends `x-forwarded-for: 1.2.3.4` themselves**, they can change the rate-limit key on every request and evade the cap.

```ts
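// ❌ Dangerous: trusting x-forwarded-for unconditionally (the attacker rewrites it to get a fresh key every time)
const naiveIp = req.headers.get("x-forwarded-for") ?? "anonymous";

// ❌ Dangerous: taking the leftmost entry (that is exactly the part the client can write)
const leftmostIp = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim();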
```

`x-forwarded-for` is a comma-separated chain such as `client, proxy1, proxy2`: each proxy **appends** the address of the peer it received the request from. Anything to the left of what your own proxies appended arrived from outside and could be forged, so **only entries appended by proxies you trust are trustworthy.**

**The correct way to get it depends on the platform.**

- **Vercel** sets dedicated headers that clients can't override: **`x-real-ip`** (and `x-vercel-forwarded-for`). On Vercel, using these **platform-guaranteed headers** is safer than parsing the raw `x-forwarded-for` yourself.
- **Behind your own proxy (Nginx/ALB, etc.)**, know **the number of trusted proxies (trusted hops)** and take the **Nth entry from the right** of `x-forwarded-for`, which is the address the outermost trusted proxy observed. That way you never read the fake values the client injected on the left.
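To see why only the right-hand entries can be trusted, it helps to trace how the header grows. The sketch below assumes a hypothetical topology (attacker → CDN → load balancer → app, so two trusted hops) and uses documentation IP ranges; `appendHop` models what each proxy does, and `pickFromRight` applies the Nth-from-the-right rule.

```typescript
// Illustration only: how X-Forwarded-For grows as a request passes through proxies.
// Assumed topology: attacker → CDN → load balancer → app (two trusted hops).
function appendHop(xff: string | null, observedPeer: string): string {
  // Each proxy appends the address of the peer that connected to it
  return xff ? `${xff}, ${observedPeer}` : observedPeer;
}

// The Nth entry from the right, where N = number of trusted proxies in front of the app
function pickFromRight(xff: string, trustedHops: number): string | undefined {
  const parts = xff.split(",").map((s) => s.trim()).filter(Boolean);
  return parts[parts.length - trustedHops];
}

function demo() {
  let header: string | null = "1.2.3.4"; // forged by the attacker before sending
  header = appendHop(header, "198.51.100.7"); // CDN saw the attacker's real address
  header = appendHop(header, "203.0.113.10"); // load balancer saw the CDN's address
  return {
    header, // "1.2.3.4, 198.51.100.7, 203.0.113.10"
    leftmost: header.split(",")[0]!.trim(), // "1.2.3.4": attacker-controlled
    trusted: pickFromRight(header, 2), // "198.51.100.7": what the CDN observed
  };
}

console.log(demo());
```

Change the forged value and the leftmost entry changes with it, while the second entry from the right stays pinned to the address the CDN actually saw.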

```ts
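// lib/client-ip.ts: take the IP only from headers set by trusted proxies
import "server-only";

/**
 * Resolves the client IP safely.
 * On Vercel, trust the platform-set x-real-ip (clients can't override it).
 * Behind your own proxies, set trustedHops to match your topology and take the Nth entry from the right.
 */
export function getClientIp(req: Request, trustedHops = 1): string {
  // 1) Prefer the header the platform guarantees. Only valid where the platform/proxy overwrites
  //    x-real-ip (Vercel does); elsewhere a client could send its own x-real-ip.
  const real = req.headers.get("x-real-ip");
  if (real) return real;

  // 2) From x-forwarded-for, take the trustedHops-th entry counting from the right
  //    (right-hand entries were appended by trusted proxies; the leftmost can be forged by the client)
  const xff = req.headers.get("x-forwarded-for");
  if (xff) {
    const parts = xff.split(",").map((s) => s.trim()).filter(Boolean);
    const candidate = parts[parts.length - trustedHops];
    if (candidate) return candidate;
  }

  // 3) Nothing usable: treat as anonymous (this becomes a shared key, so apply the strictest cap)
  return "anonymous";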
}
```

The principle in one line: **"derive the IP only from headers set by a proxy you trust."** Which headers you may trust is decided by your own deployment (Vercel or not, and whether Cloudflare or an ALB sits in front). Leaving this ambiguous and reading the raw `x-forwarded-for` (especially its leftmost entry) is the most common way limits get evaded.

---

## 7. Implementation — middleware and route handler

Now that the parts are in place, wire them into Next.js. There are two placements, each suited to a different purpose.

### 7-1. Reject "broadly and early" with middleware

`middleware.ts` runs in front of all requests, so it's suited to **cross-cutting limits** (overall abuse from the same IP, protecting a specific path subtree). The advantage is rejecting **before** reaching heavy processing.

```ts
// middleware.ts: cross-cutting rate limit in front of the app, returning 429 + Retry-After
import { NextResponse, type NextFetchEvent, type NextRequest } from "next/server";
import { ratelimit } from "@/lib/ratelimit";
import { getClientIp } from "@/lib/client-ip";

export async function middleware(request: NextRequest, event: NextFetchEvent) {
  const ip = getClientIp(request);

  // Include the route in the identifier so each endpoint is counted separately
  const { success, limit, remaining, reset, pending } = await ratelimit.limit(
    `${ip}:${request.nextUrl.pathname}`,
  );
  // With analytics enabled, let the background write finish after the response is sent
  event.waitUntil(pending);

  // Always return the limit, remaining count, and seconds until reset (so clients can self-restrain)
  const secondsToReset = Math.max(0, Math.ceil((reset - Date.now()) / 1000));
  const headers = new Headers();
  headers.set("RateLimit-Limit", String(limit));
  headers.set("RateLimit-Remaining", String(Math.max(0, remaining)));
  headers.set("RateLimit-Reset", String(secondsToReset));

  if (!success) {
    headers.set("Retry-After", String(secondsToReset));
    return NextResponse.json(
      { error: "Too Many Requests" },
      { status: 429, headers }, // 429, with Retry-After saying how many seconds until a retry is allowed
    );
  }

  const res = NextResponse.next();
  headers.forEach((v, k) => res.headers.set(k, v));
  return res;
}

// Narrow what is protected (exclude static assets etc. to avoid wasted Redis calls)
export const config = {
  matcher: ["/api/:path*", "/login"],
};
```

### 7-2. Narrow "pinpoint" with a route handler

A cap specific to a particular endpoint (5 login attempts/min, 20 AI inferences/hour, etc.) is applied individually at the route handler's entrance. Use a per-route `Ratelimit` instance.

```ts
// app/api/login/route.ts: apply a strict, login-specific cap
import { loginLimiter } from "@/lib/ratelimit";
import { getClientIp } from "@/lib/client-ip";
import { z } from "zod";

const Body = z.object({ email: z.string().email(), password: z.string().min(1) });

export async function POST(req: Request) {
  // 1) Rate-limit first (reject before entering expensive authentication)
  const ip = getClientIp(req);
  const { success, reset } = await loginLimiter.limit(`ip:${ip}`);
  if (!success) {
    const retry = Math.max(0, Math.ceil((reset - Date.now()) / 1000));
    return Response.json(
      { error: "Too many attempts. Please wait a while and try again." },
      { status: 429, headers: { "Retry-After": String(retry) } },
    );
  }

  // 2) Input validation (always narrow external input at the boundary with Zod)
  //    A malformed JSON body becomes null here, so it yields 400 instead of an unhandled 500
  const parsed = Body.safeParse(await req.json().catch(() => null));
  if (!parsed.success) return Response.json({ error: "invalid" }, { status: 400 });

  // 3) Main processing (authentication). Keep messages from leaking which check succeeded or failed…
  //    To suppress brute force further, add a per-account limit on top of the per-IP one
}
```

The point is the **order.** Put rate limiting **before** input validation and, above all, before expensive processing such as authentication. Never letting the attacker trigger high-cost work is the crux of cost-type DoS countermeasures.

> **`Retry-After` is etiquette and a courtesy.** When you return 429, adding `Retry-After` (in seconds or as an HTTP date) lets well-behaved clients (your own front end, legitimate bots, retry mechanisms) **wait the right amount.** That cuts wasteful retries, easing server load and improving the client experience. If you also always return the remaining count via the `RateLimit-*` headers, clients can throttle themselves before they hit the cap.
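On the client side, honoring the header can be as small as the sketch below. `retryAfterMs` and `fetchWithRetryAfter` are hypothetical helpers; `Retry-After` may carry either a number of seconds or an HTTP date (RFC 9110 allows both), and the fallback delay and cap are arbitrary assumptions.

```typescript
// Client-side sketch: honor Retry-After, which may be delay-seconds or an HTTP-date.
function retryAfterMs(value: string | null, now: number = Date.now()): number | null {
  if (value === null) return null;
  const v = value.trim();
  if (/^\d+$/.test(v)) return Number(v) * 1000; // delay-seconds form, e.g. "120"
  const at = Date.parse(v); // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  return Number.isNaN(at) ? null : Math.max(0, at - now);
}

// Retries on 429, waiting as long as the server asked (capped), at most maxRetries times.
// The 1 s fallback and the 30 s cap are arbitrary assumptions; tune them per API.
async function fetchWithRetryAfter(
  url: string,
  init?: RequestInit,
  maxRetries = 2,
  maxWaitMs = 30_000,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const wait = retryAfterMs(res.headers.get("Retry-After")) ?? 1_000;
    await new Promise((resolve) => setTimeout(resolve, Math.min(wait, maxWaitMs)));
  }
}
```

Retrying is only safe for requests that can be repeated; placing the limiter before any other processing, as recommended above, is what makes a 429 a clean "nothing happened" signal.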

In real operation, horizontal controls like rate limiting, Origin verification, and CSP are often **consolidated into middleware.** CSRF protection via Origin verification is carved out into [Server Actions CSRF / Origin Protection](/blog/nextjs-csrf-origin-protection-server-actions-guide), and hardening of CSP and security headers into [CSP Nonce and Security-Header Design](/blog/nextjs-security-headers-csp-nonce-middleware-guide). These are typical horizontal controls of "write once and it takes effect on all requests," belonging to the same layer as rate limiting.

---

## 8. The honest scope — app-layer rate limiting is no substitute for DDoS

Let me emphasize here. **This article's rate limiting is an "application-layer (L7)" abuse countermeasure, and doesn't prevent volumetric DDoS (L3/L4).**

| Attack type | Example | Layer it protects | Preventable with this article's method? |
|---|---|---|---|
| **L7 abuse** | Login brute force, OTP rapid-fire, API abuse, cost-type DoS | App layer (route/middleware) | ✅ This is the target |
| **L3/L4 volumetric DDoS** | SYN flood, UDP flood, bandwidth saturation | Network / edge | ❌ The edge/WAF's domain |
| **L7 volumetric DDoS** | Saturate the function with a huge HTTP flood | Edge / WAF + app | △ Absorb at the edge, complement in the app |

The reason is simple. For app-layer rate limiting to decide "reject," the request must first **reach your function and hit Redis at least once.** If attack traffic arrives at millions of requests per second, **the work of counting the rejections itself** becomes overloaded, and Redis costs spike too. Volumetric DDoS has to be absorbed **before** it reaches the function, at the network edge.

So the correct stance is **using both together.**

- **Edge / WAF**: absorb volumetric DDoS, known malicious IPs, bots, and L7 floods **in front of the function.**
- **App-layer rate limiting (this article)**: precisely narrow, **near the business logic**, "legitimate-looking but abusive" requests that passed through the edge — brute force, OTP abuse, cost-type DoS.

Either one alone is insufficient, and neither substitutes for the other. **"We're safe from DDoS because we added rate limiting" is wrong.** App-layer limiting complements the edge's defense rather than replacing it; make this boundary clear both to whoever commissions the work and to the team. For deciding how much to handle in-house and where to bring in an expert (separating "automatable horizontal controls" from "vertical risks only design can protect against, such as authorization/IDOR"), see [The Scope Where a Security Audit Becomes Necessary](/blog/nextjs-supabase-security-audit-scope-when-needed-guide).

---

## 9. Pre-production checklist

Whether outsourced or AI-made, confirm at least the following before shipping rate limiting to production.

- [ ] **State isn't in process memory** (counting with `new Map()` / a module-scope variable is not allowed). It's in a shared store (Redis, etc.)
- [ ] The count is **shared across all function instances** (verified on the premise of multiple instances)
- [ ] The read-modify-write is **atomic** (Lua or a battle-tested library. Not a naive GET→SET)
- [ ] Uses a **sliding window**, avoiding the fixed window's boundary burst (especially on paths needing strictness)
- [ ] The **key design** is appropriately split by IP, route, and user, with heavier processing having a stricter cap
- [ ] The client IP is obtained **only via a trusted proxy** (the raw `x-forwarded-for` is never trusted blindly)
- [ ] On rejection, returns **429 + `Retry-After`**, and conveys the remaining count with `RateLimit-*`
- [ ] Places rate limiting **before heavy processing (authentication, AI inference, external APIs)**
- [ ] The rate-limit store's credentials are **not exposed with `NEXT_PUBLIC_`** (env only)
- [ ] A **separate edge/WAF DDoS countermeasure is used alongside it**, with no assumption that "the app layer alone can prevent DDoS"

For whoever commissions the work, the most effective check is a single question: **"Where do you store the rate-limit counter?"** If the answer is "in memory" or "in a `Map`," it's very likely not taking effect in serverless. A good developer will answer right away: "atomically, in a shared store like Redis."

---

## 10. Summary: put the state outside the process, and rate limiting takes effect correctly

Let me organize the key points.

- Because serverless (Vercel / Lambda) functions are **disposable, multiple-concurrent, and initialized on cold start**, **an in-process `Map` rate limit will reliably break under production scale.** Working locally is not a counterargument.
- The correct answer is to **put the state outside the process (a shared store).** Consolidate the counter in a single Redis, etc., visible to all instances.
- **A fixed window passes 2× the cap with a boundary burst.** In most cases, make the **sliding window (approximate is enough)** the default.
- To count correctly under concurrent requests, an **atomic read-modify-write** is mandatory. Rather than hand-writing Lua, use a **battle-tested distributed rate limiter** like `@upstash/ratelimit`.
- Design the key by **IP, route, and user**, and obtain the IP **only via a trusted proxy.** On rejection, return **429 + `Retry-After`.** This is the countermeasure against the abuse and cost runaway that OWASP **API4:2023 (Unrestricted Resource Consumption)** points to.
- **Honestly, app-layer rate limiting is no substitute for L3/L4 volumetric DDoS protection.** That's the edge/WAF's domain; **use both together.** No product or technique makes you safe merely by being installed.

Horizontal controls like "rate limiting that actually works in serverless" are exactly what my open-source project [Aegis](/aegis) supports. It provides drop-in hardening for **controls that apply uniformly across the app** (headers/CSP, rate limiting, CSRF, typed env) and visualizes the current state with `npx @aegiskit/cli scan`. To be honest, though, Aegis helps implement horizontal controls and, for vertical risks like authorization/IDOR, goes only as far as **detecting and warning**; it is **not magic that makes you "completely safe."** If you need a design review of your rate limiting or hardening of an entire serverless Next.js app, I take that on through a [security audit](/aegis/audit). I have also designed, in production, the reliability layer (retries, idempotency, flow control) for the pay-per-use, high-load payment paths of an [environment-sector serverless payment platform](/case-studies/payment-platform-reliability).

Building fast with AI is the right call in itself. **Making what you built quickly take effect correctly, without leaks,** is the next step; if you need help with that design or verification, feel free to get in touch.

---

## Frequently Asked Questions (FAQ)

**Q. Can I use something other than Upstash?**
A. Sure. The only requirement is "a shared store visible to all function instances that can count atomically." Candidates include Vercel KV, your own Redis/Valkey, and Memcached (using its atomic `incr` or CAS). What matters is not the product name but that the state lives **outside process memory and the read-modify-write is atomic.** `@upstash/ratelimit` is popular because it's easy to call over HTTP from serverless, but the essentials are sharing and atomicity.

**Q. Is a fixed window no good?**
A. It's not "no good" but "depends on whether you can tolerate the boundary burst." For a loose overall limit, a fixed window does little real harm. On the other hand, for paths facing "attacks that want to concentrate in a short time," like login attempts, OTP sending, or billing processing, 2× passing at the boundary can't be ignored. When unsure, going with a sliding window is safe.

**Q. Can rate limiting prevent DDoS?**
A. No. As in Section 8, app-layer rate limiting takes effect on L7 abuse (brute force, OTP abuse, cost-type DoS), but L3/L4 volumetric DDoS is a domain to absorb at the edge/WAF in front of the function. Both are needed, and one doesn't substitute for the other. The understanding "rate limiting = DDoS countermeasure" is dangerous.

**Q. Should I place it in middleware or a route handler?**
A. Using both is the standard play. Middleware runs in front of all requests, so it's suited to "cross-cutting limits" and "rejecting before reaching heavy processing." A route handler is suited to endpoint-specific caps like "login 5/min, AI inference 20/hour." Split by role: the cross-cutting defense line in middleware, the precise limit in the route handler.

**Q. What's the correct way to get the client IP?**
A. It depends on your deployment. On Vercel, trust the `x-real-ip` header the platform sets, which clients can't override. If your own proxy (Nginx/ALB) sits in front, know the number of trusted proxies and take the Nth entry from the right of `x-forwarded-for`. The common principle is **"derive the IP only from headers set by a proxy you trust."** Trust the raw `x-forwarded-for` blindly (for example, by taking its leftmost entry) and an attacker can spoof the header to change the key and evade the cap.

**Q. Should I change the key before and after authentication?**
A. Yes. For not-logged-in paths (login attempts, public APIs), per-IP becomes the main axis, but beware collateral damage under NAT. After authentication, making the verified user ID the main axis is accurate and avoids collateral damage too. Especially for cost-type DoS (rapid-firing heavy processing), narrowing per user is the most effective.

---

## References

- [OWASP API4:2023 — Unrestricted Resource Consumption](https://owasp.org/API-Security/editions/2023/en/0xa4-unrestricted-resource-consumption/)
- [OWASP Application Security Verification Standard (ASVS)](https://owasp.org/www-project-application-security-verification-standard/)
- [OWASP Top 10 (Web)](https://owasp.org/www-project-top-ten/)
- [Next.js Documentation](https://nextjs.org/docs)
