Rate Limiting That 'Actually Works' in Next.js — Why In-Memory Breaks in Serverless, and Distributed-Store Design

Let me state the conclusion first. In serverless Next.js (Vercel / AWS Lambda), rate limiting implemented by "counting in an in-process Map" will reliably break in production. Function instances are disposable, several of them run at the same time, and each cold start brings up a fresh one, so the counter is never shared. It works perfectly locally, passes tests, and looks fine in a demo. That is exactly what makes it dangerous: at production scale it silently lets excess traffic through.

This isn't a "Next.js is bad" or "don't use serverless" story. Serverless is a sound choice, and rate limiting can be implemented on it. The problem is a fundamental design mismatch: rate limiting is stateful processing (it holds a counter), while serverless assumes stateless execution. This article explains, from the architecture, why in-memory counting breaks, and then works through the design with real code and published primary sources: the difference between a fixed window and a sliding window, why the increment must be atomic, key design, how to obtain the client IP correctly, and returning 429 with Retry-After. AI-generated code frequently gets this wrong, so it should help both those who write rate limiting and those who review it.

1. Why you need rate limiting — suppressing 4 abuses

Rate limiting caps how many requests one client can make within a fixed time period. Let's look at what it protects against, threat by threat.

Threat	What happens	How rate limiting helps
Login brute force	Password brute force, credential stuffing	Caps attempts per IP/account, making brute force impractical
OTP / email abuse	Rapid-fire sending of auth codes and password resets	Caps the number of sends, containing SMS/email costs and inbox spam
Scraping / API abuse	Mass retrieval from public endpoints, inventory-monitoring bots	Reduces data exfiltration and load, protecting legitimate users' experience
Cost-type DoS	Rapid-firing expensive operations (AI inference, image generation, external APIs) to inflate the bill	Applies stricter limits to more expensive operations, stopping runaway bills

The last one, cost-type DoS, matters especially in the serverless era. In a pay-per-use cloud, an attacker doesn't even need to take the server down: simply hammering a heavy endpoint is enough to send your bill through the ceiling. This is the risk OWASP lists in the API Security Top 10 as API4:2023 Unrestricted Resource Consumption. OWASP's point is that handling a request consumes not only bandwidth, CPU, memory, and storage but also money (SMS, third-party APIs), and if you put no cap on that consumption, both availability and cost become an attack surface.

Rate limiting is the most cost-effective horizontal control against these four abuses. It applies uniformly across the app, and once implemented correctly, nobody has to think about it case by case. Where it sits within Next.js application-layer security as a whole is mapped out in the Next.js × Supabase Application Security Complete Guide; refer there for the overall picture. This article digs into rate limiting specifically, focusing on the serverless-specific pitfalls.

2. Why an in-memory `Map` breaks in serverless

Search the web and code like this turns up everywhere. Ask an AI to "implement rate limiting in Next.js," and there's a fairly high chance this is what comes back.

// ❌ Broken: rate limiting with the counter kept in an in-process Map
const hits = new Map<string, { count: number; resetAt: number }>();

export function rateLimit(ip: string, limit = 10, windowMs = 60_000): boolean {
  const now = Date.now();
  const rec = hits.get(ip);

  if (!rec || now > rec.resetAt) {
    hits.set(ip, { count: 1, resetAt: now + windowMs });
    return true; // allow
  }
  if (rec.count >= limit) return false; // reject
  rec.count += 1;
  return true;
}

In next dev's single process, it works perfectly. So you think "it works" and ship to production. But in serverless, there are 3 reasons this hits map is unreliable.

2-1. Instances are disposable (stateless)

Vercel and Lambda functions start up to handle requests and are discarded after some time. Process memory lives only as long as the function instance, so when the instance disappears, the hits map disappears with it. If the next request lands on a new instance, the counter restarts from 0, and an attacker can reset the cap simply by waiting for the instance to be replaced. Next.js's official documentation likewise says to design on the premise that serverless/edge functions are stateless (Next.js docs).

2-2. Multiple instances run at the same time (horizontal scaling)

This is the most damaging. As load rises, the platform scales the function horizontally, running tens or hundreds of instances at once. Each instance has its own private hits map and knows nothing of the others' counts.

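Actual behavior: intended limit=10/min, but each instance allows 10

  Attacker ──┬─→ Instance A (its own Map: allows up to 10)
             ├─→ Instance B (a separate Map: 10 more)
             ├─→ Instance C (a separate Map: 10 more)
             └─→ … N instances → effective limit = 10 × N (effectively unlimited)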

Because the load balancer spreads requests across instances, the total number of requests an attacker can get through grows in proportion to the instance count. In other words, the more the app scales, the looser the rate limit becomes: the exact opposite of the intent.
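To see the multiplication concretely, here is a small single-process simulation (hypothetical, for illustration only). Each `Instance` wraps the same logic as the broken `rateLimit()` above in its own private `Map`, and requests from one IP are spread round-robin across instances, roughly as a load balancer might.

```typescript
// Simulation only: each "instance" owns a private Map, exactly like the broken code above.
type Rec = { count: number; resetAt: number };

class Instance {
  private hits = new Map<string, Rec>();

  // Same fixed-window logic as rateLimit(), but the state lives in this instance only.
  allow(ip: string, limit: number, windowMs: number, now: number): boolean {
    const rec = this.hits.get(ip);
    if (!rec || now > rec.resetAt) {
      this.hits.set(ip, { count: 1, resetAt: now + windowMs });
      return true;
    }
    if (rec.count >= limit) return false;
    rec.count += 1;
    return true;
  }
}

// Spread `requests` from one IP round-robin over `instanceCount` instances,
// all within the same window (now = 0), and count how many were allowed.
function simulate(instanceCount: number, requests: number, limit = 10): number {
  const instances = Array.from({ length: instanceCount }, () => new Instance());
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    const instance = instances[i % instanceCount];
    if (instance.allow("203.0.113.5", limit, 60_000, 0)) allowed++;
  }
  return allowed;
}

console.log(simulate(1, 100)); // 10: the intended cap
console.log(simulate(5, 100)); // 50: 10 × 5 instances
```

Real platforms don't distribute traffic perfectly evenly, but the effective cap still grows with every instance that receives the attacker's requests.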

2-3. Reset on cold start

With no access for a while, the function goes dormant and cold-starts on the next request. At this time the process is recreated, and the in-memory counter is initialized. The more intermittent the traffic, the more frequently the counter resets, losing the meaning of the "window."

Conclusion: rate limiting inherently needs state shared across multiple instances. As long as that state lives in process memory, counting correctly in serverless is fundamentally impossible. The only correct answer is to move the state out of the process, into a single shared store visible to every instance.

"It worked locally" is not a counterargument. What makes an in-memory implementation so dangerous is that it looks normal throughout development, testing, and small-scale demos, and breaks only when it scales in production. So the failure surfaces at the worst possible moment: in the middle of an attack. Verification must be designed around multiple instances, not a single process.

3. Fixed window vs sliding window — the boundary-burst problem

Once you've decided to move the state to a shared store, next is "how to count." A naive implementation is a fixed window, but this has an easy-to-overlook flaw.

3-1. The boundary burst of a fixed window

A fixed window resets the counter at the top of every minute (second :00). It's simple to implement, but at the changeover between windows it can let through twice the cap.

limit = 10/min, fixed window:

  12:00:00 ───────────── 12:00:59 │ 12:01:00 ───────────── 12:01:59
                          ↑10 reqs │ ↑10 reqs
                  10 at 12:00:59   │  10 at 12:01:00
                  ───────────────────────────
                  20 requests get through in well under 60 seconds

Send 10 requests at 12:00:59 and 10 more at 12:01:00, and 20 get through in about one second. Each window individually respects the cap, yet a burst straddling the boundary breaks it: this is the boundary-burst problem. Attackers who want to concentrate requests in a short time, as in brute force or cost-type DoS, target exactly this boundary.
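The boundary burst is easy to reproduce with a minimal in-process fixed-window counter. This is a sketch for illustration, not production code; times are milliseconds measured from 12:00:00.

```typescript
// Minimal fixed-window counter: the window index is floor(t / windowMs),
// and the count starts over whenever the index changes.
function makeFixedWindow(limit: number, windowMs: number) {
  const counts = new Map<string, { windowIndex: number; count: number }>();
  return (key: string, t: number): boolean => {
    const windowIndex = Math.floor(t / windowMs);
    const rec = counts.get(key);
    if (!rec || rec.windowIndex !== windowIndex) {
      counts.set(key, { windowIndex, count: 1 }); // new window: counter resets
      return true;
    }
    if (rec.count >= limit) return false;
    rec.count += 1;
    return true;
  };
}

const allow = makeFixedWindow(10, 60_000);
let passed = 0;
for (let i = 0; i < 10; i++) if (allow("203.0.113.5", 59_000)) passed++; // 12:00:59
for (let i = 0; i < 10; i++) if (allow("203.0.113.5", 60_000)) passed++; // 12:01:00
console.log(passed); // 20: twice the cap within about one second
```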

3-2. Count smoothly with a sliding window

A sliding window always looks at "from this very moment, the past 60 seconds." Since there's no fixed division, a boundary burst doesn't occur.

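Sliding window: on every request, count the "previous 60 seconds" measured back from that moment

  ……[━━━━━━ count the requests in the previous 60 seconds ━━━━━━]→ now
       this window slides continuously with time (no fixed divisions)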

The strictest implementation records each request's timestamp in a sorted set (a Redis ZSET) and, on every request, counts how many fall within the past 60 seconds. It's accurate, but memory per key grows with the number of requests in the window. The slidingWindow provided by @upstash/ratelimit approximates this with a weighted average of the current and previous windows, achieving both memory efficiency and smoothness. For many apps, this approximation is enough.
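The weighted-average idea can be sketched in a few lines. This is a single-process illustration of the general technique, assuming the previous window's count is weighted by the fraction of it that still overlaps the sliding window; it is not @upstash/ratelimit's source, whose details (rounding, storage layout) may differ.

```typescript
// Approximate sliding window: keep counts for the current and previous fixed
// windows, and weight the previous count by how much of it still overlaps
// the window that ends at `t`.
function makeApproxSlidingWindow(limit: number, windowMs: number) {
  const state = new Map<string, { windowIndex: number; current: number; previous: number }>();
  return (key: string, t: number): boolean => {
    const windowIndex = Math.floor(t / windowMs);
    let s = state.get(key);
    if (!s) {
      s = { windowIndex, current: 0, previous: 0 };
      state.set(key, s);
    } else if (s.windowIndex !== windowIndex) {
      // Roll over; the old count only matters if it was the immediately preceding window.
      s.previous = s.windowIndex === windowIndex - 1 ? s.current : 0;
      s.current = 0;
      s.windowIndex = windowIndex;
    }
    const elapsed = (t % windowMs) / windowMs; // fraction of the current window that has passed
    const estimated = s.previous * (1 - elapsed) + s.current;
    if (estimated >= limit) return false;
    s.current += 1;
    return true;
  };
}

// The boundary burst from 3-1: 10 requests at 12:00:59, then 10 at 12:01:00.
const allowSliding = makeApproxSlidingWindow(10, 60_000);
let passedSliding = 0;
for (let i = 0; i < 10; i++) if (allowSliding("203.0.113.5", 59_000)) passedSliding++;
for (let i = 0; i < 10; i++) if (allowSliding("203.0.113.5", 60_000)) passedSliding++;
console.log(passedSliding); // 10: the second burst is rejected
```

Halfway through the next window (t = 90 000 ms), the previous window counts at half weight, so only about 5 more requests are admitted; capacity recovers gradually instead of all at once.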

Algorithm	Pros	Cons	Suited use
Fixed window	Simple, lightweight implementation	Can pass 2× the cap in a boundary burst	Loose limits where lightness matters more than strictness
Sliding window (log)	Strictly accurate	Memory grows with the number of requests in the window	Paths that need strictness, such as billing/OTP
Sliding window (approximate)	Smooth and memory-efficient	Tolerates a small error	The default for most API/login limits

When unsure, make the sliding window (approximate) the default. Limit the fixed window to situations where you can tolerate "2× passing at the boundary."

4. The atomic increment — don't break the count under concurrent requests

Even after moving to a shared store and choosing a sliding window, there's still a hole. Concurrency.

The core of rate limiting is a read-modify-write: read the current value, increment it, write it back. If these steps are separate (non-atomic), multiple requests arriving at the same time all read the same stale value, all conclude they're "still under the cap," and all get through. This is a race condition.

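limit=10, current value=9, and 5 requests arrive at once (non-atomic implementation)

  req1: GET→9  req2: GET→9  req3: GET→9  req4: GET→9  req5: GET→9
   every request decides "9 < 10, so OK"
  → SET 10 runs 5 times; only 1 should have passed, but all 5 do (14 requests get through in total)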
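The interleaving in this diagram can be replayed deterministically with a toy store, a hypothetical single-process stand-in for Redis. The non-atomic version performs every GET before any SET lands; the atomic version uses one increment-and-return step per request, which is the guarantee Redis gives INCR.

```typescript
// Toy key-value store standing in for Redis (single process, illustration only).
class ToyStore {
  private data = new Map<string, number>();
  get(key: string): number {
    return this.data.get(key) ?? 0;
  }
  set(key: string, value: number): void {
    this.data.set(key, value);
  }
  // Indivisible in this toy: nothing can run between the read and the write.
  incr(key: string): number {
    const next = this.get(key) + 1;
    this.data.set(key, next);
    return next;
  }
}

// Non-atomic: every concurrent request does its GET before any SET lands.
function nonAtomicBurst(store: ToyStore, key: string, limit: number, n: number): number {
  const seen = Array.from({ length: n }, () => store.get(key)); // all read the same stale value
  let allowed = 0;
  for (const value of seen) {
    if (value < limit) {
      store.set(key, value + 1); // the writes overwrite each other
      allowed++;
    }
  }
  return allowed;
}

// Atomic: increment first, then decide on the value returned (as INCR-then-compare does).
function atomicBurst(store: ToyStore, key: string, limit: number, n: number): number {
  let allowed = 0;
  for (let i = 0; i < n; i++) {
    if (store.incr(key) <= limit) allowed++;
  }
  return allowed;
}

const racy = new ToyStore();
racy.set("rl:ip:203.0.113.5", 9);
console.log(nonAtomicBurst(racy, "rl:ip:203.0.113.5", 10, 5)); // 5 (should have been 1)

const safe = new ToyStore();
safe.set("rl:ip:203.0.113.5", 9);
console.log(atomicBurst(safe, "rl:ip:203.0.113.5", 10, 5)); // 1
```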

In serverless, multiple instances hit the same key at the same time, so this race is the norm, not the exception. To prevent it, the read-modify-write must run as a single, indivisible operation. There are two ways to achieve that.

4-1. Redis's atomic commands / Lua

Because Redis executes commands one at a time, a single command such as INCR is atomic. When several steps must act as one unit, use a Lua script to run "read, increment, set TTL, decide" together on the server side. The script runs as a whole without other commands interleaving, so no race occurs.

// Atomic rate limiting in Lua (read, increment, set the TTL on the first hit only, and decide, as one unit)
// KEYS[1] = rate-limit key / ARGV[1] = limit / ARGV[2] = window in seconds
const FIXED_WINDOW_LUA = `
  local current = redis.call('INCR', KEYS[1])
  if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])  -- set the expiry on the first hit only
  end
  if current > tonumber(ARGV[1]) then
    return {0, current}                     -- reject
  end
  return {1, current}                        -- allow
`;
// ↑ This makes a *fixed* window atomic. A sliding window instead combines ZSET
//   range-delete → ZCARD → ZADD in one Lua script; before hand-writing that,
//   consider @upstash/ratelimit (see 4-2).
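For reference, here is a sketch of that sliding-window log: a Lua script that runs range-delete → ZCARD → ZADD as one unit, plus an in-memory model of the same three steps that can be unit-tested without Redis. Both are illustrations under stated assumptions: the caller passes the current time in milliseconds and a unique member per request (for example a timestamp plus a random ID), and the script has not been verified here against a live Redis. Because `now` comes from the caller, clock drift between instances affects accuracy, one of the landmines of hand-written Lua.

```typescript
// Sliding-window log as a Lua script (sketch; run via EVAL / EVALSHA).
// KEYS[1] = key / ARGV[1] = now (ms) / ARGV[2] = window (ms) / ARGV[3] = limit / ARGV[4] = unique member
export const SLIDING_LOG_LUA = `
  local key = KEYS[1]
  local now = tonumber(ARGV[1])
  local window = tonumber(ARGV[2])
  local limit = tonumber(ARGV[3])
  redis.call('ZREMRANGEBYSCORE', key, 0, now - window) -- drop entries that left the window
  local count = redis.call('ZCARD', key)
  if count >= limit then
    return {0, count}                                   -- reject
  end
  redis.call('ZADD', key, now, ARGV[4])                 -- record this request
  redis.call('PEXPIRE', key, window)                    -- idle keys expire on their own
  return {1, count + 1}                                 -- allow
`;

// In-memory model of the same three steps (single process, for testing the logic only).
export function makeSlidingLog(limit: number, windowMs: number) {
  const logs = new Map<string, number[]>();
  return (key: string, now: number): boolean => {
    // ZREMRANGEBYSCORE key 0 (now - windowMs): keep only timestamps newer than that
    const log = (logs.get(key) ?? []).filter((ts) => ts > now - windowMs);
    logs.set(key, log);
    // ZCARD: how many requests are still inside the window
    if (log.length >= limit) return false;
    // ZADD: record this request
    log.push(now);
    return true;
  };
}

const allowLog = makeSlidingLog(10, 60_000);
let passedLog = 0;
for (let i = 0; i < 10; i++) if (allowLog("203.0.113.5", 59_000)) passedLog++;
for (let i = 0; i < 10; i++) if (allowLog("203.0.113.5", 60_000)) passedLog++;
console.log(passedLog); // 10
```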

4-2. A distributed rate limiter (recommended)

Hand-written Lua has many landmines: TTL boundaries, clock drift, the sliding approximation, and more. Rather than reinventing the wheel, use a battle-tested library. @upstash/ratelimit uses atomic scripts internally and works with Upstash's HTTP-based Redis, which can be called from serverless environments (Vercel Edge / Lambda). The biggest advantage is that you don't have to guarantee atomicity, the sliding window, and distributed counting yourself.

// lib/ratelimit.ts: an atomic sliding window in a distributed store
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
import "server-only"; // this counting logic is server-only; keep it out of client bundles

// One shared store (Redis) visible to every function instance; nothing is kept in process memory
export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL / _TOKEN from env
  limiter: Ratelimit.slidingWindow(10, "60 s"), // up to 10 in the previous 60 seconds (approximate)
  analytics: true,
  prefix: "rl", // key namespace
});

The values Redis.fromEnv() reads are secrets. Put UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN only in server-side env, and never give them a NEXT_PUBLIC_ prefix. If the credentials or the counting logic reach the client, an attacker can observe the cap itself and evade it.
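One cheap safeguard is to check the environment at startup or in CI. The helper below is hypothetical (it is not part of @upstash/redis): it reports a missing credential and any NEXT_PUBLIC_-prefixed copy of one, since Next.js inlines NEXT_PUBLIC_ variables into the client bundle.

```typescript
// Hypothetical startup/CI guard for the Upstash credentials.
const UPSTASH_SECRETS = ["UPSTASH_REDIS_REST_URL", "UPSTASH_REDIS_REST_TOKEN"] as const;

export function checkRedisEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of UPSTASH_SECRETS) {
    // The server needs the real variable...
    if (!env[name]) problems.push(`${name} is missing`);
    // ...and a NEXT_PUBLIC_ copy would be shipped to every browser.
    if (env[`NEXT_PUBLIC_${name}`] !== undefined) {
      problems.push(`NEXT_PUBLIC_${name} must not exist`);
    }
  }
  return problems;
}
```

For example, a build script could call `checkRedisEnv(process.env)` and fail the build if the returned list is non-empty.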

5. Key design — decide the granularity of IP, route, and user

Key design decides whose requests are counted and at what granularity. Get the granularity wrong, and the limit is either too loose to have any effect or so strict that it causes collateral damage.

Key axis	Example	Suited use	Caution
Per IP	`rl:ip:203.0.113.5`	Not-logged-in paths (login attempts, public APIs)	Under NAT/proxy it can sweep in legitimate users
Per route	`rl:login:…` / `rl:ai:…`	Change the cap per endpoint	Heavier processing stricter, lighter processing looser
Per user	`rl:user:<uid>`	Post-login abuse, cost-type DoS	Possible only when authenticated (use the verified ID value)

In practice, the standard approach is to combine several axes: for example, limit login attempts both per IP and per account, and limit expensive operations such as AI inference strictly per user. Prepare a separate Ratelimit instance per route and match each cap to the endpoint's cost.

// Give each route its own cap according to its cost (stricter for heavier work)
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
const redis = Redis.fromEnv();

export const loginLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(5, "60 s"), // login is strict: 5/min
  prefix: "rl:login",
});

export const aiLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(20, "1 h"), // expensive AI inference: 20/hour
  prefix: "rl:ai",
});

Beware of collateral damage (false positives). If you limit strictly by IP alone, then wherever many legitimate users share one IP behind a corporate, school, or mobile-carrier NAT, unrelated people get locked out. After authentication, make the user ID the main axis, and keep per-IP limits as a defense line for unauthenticated paths; this split balances effectiveness against collateral damage.
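A minimal sketch of that split, using a hypothetical `rateLimitKeys` helper: authenticated requests are keyed by the verified user ID; unauthenticated ones by IP, plus the targeted account on login. Trimming and lowercasing the account name is an assumption about your identifiers, made so that `Alice@Example.com` and `alice@example.com` share one counter.

```typescript
// Hypothetical helper: choose rate-limit identifiers by authentication state.
type Identity = {
  ip: string; // resolved from a trusted-proxy header only
  userId?: string; // verified user ID, present only after authentication
  account?: string; // the account being targeted, e.g. the email on a login attempt
};

export function rateLimitKeys(namespace: string, id: Identity): string[] {
  // After authentication, the verified user ID is the main axis (no NAT collateral damage).
  if (id.userId) return [`${namespace}:user:${id.userId}`];

  // Before authentication, count per IP...
  const keys = [`${namespace}:ip:${id.ip}`];
  // ...and, for login, also per targeted account so rotating IPs doesn't reset the count.
  if (id.account) keys.push(`${namespace}:account:${id.account.trim().toLowerCase()}`);
  return keys;
}
```

Each key would then go to the matching limiter's `limit()` call (for example via `Promise.all`), rejecting the request if any result has `success: false`.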

6. The correct way to get the client IP — the `x-forwarded-for` trap

If you count per IP, you need to correctly get "the client's real IP." This is the most error-prone point security-wise.

In serverless, a request reaches the function through the platform's proxy / load balancer, and the original client IP is recorded in the x-forwarded-for header. The problem is that the client can forge this header. If an attacker sends their own x-forwarded-for: 1.2.3.4 with a different value each time, every request gets a fresh rate-limit key and the cap is evaded.

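// ❌ Dangerous: trusting x-forwarded-for unconditionally (an attacker can forge it and change the key at will)
const ip = req.headers.get("x-forwarded-for") ?? "anonymous";

// ❌ Dangerous: taking the first (leftmost) of multiple values (the client controls the left end)
const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim();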

x-forwarded-for is a comma-separated chain such as client, proxy1, proxy2, built up as each proxy appends the address of the peer it received the request from. Whatever the client sends itself sits at the left end, so only the entries appended by proxies you trust are trustworthy.
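A worked example with hypothetical addresses: the client sends its own `x-forwarded-for: 1.2.3.4`, and one trusted proxy appends the address the connection actually came from. `appendHop` and `nthFromRight` are illustrative helpers, not a platform API.

```typescript
// What a proxy does: append the address of the peer it received the request from.
function appendHop(xff: string | null, observedPeer: string): string {
  return xff ? `${xff}, ${observedPeer}` : observedPeer;
}

// Take the Nth entry counting from the right (N = number of trusted proxies).
function nthFromRight(xff: string, trustedHops: number): string | undefined {
  const parts = xff.split(",").map((s) => s.trim()).filter(Boolean);
  return parts[parts.length - trustedHops]; // undefined if there are fewer entries than hops
}

// The client forged "1.2.3.4"; the trusted proxy saw the connection from 203.0.113.5.
const header = appendHop("1.2.3.4", "203.0.113.5");
console.log(header.split(",")[0].trim()); // "1.2.3.4"     <- attacker-controlled
console.log(nthFromRight(header, 1)); // "203.0.113.5" <- appended by the trusted proxy
```

With two trusted hops (say a CDN in front of an ALB), the ALB appends the CDN's address, so the client's address is the second entry from the right.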

The correct way to get it depends on the platform.

Vercel sets a dedicated header, x-real-ip (and x-vercel-forwarded-for), that clients cannot tamper with. On Vercel, these platform-guaranteed headers are safer than picking an entry out of the raw x-forwarded-for yourself.
Behind your own proxies (Nginx/ALB, etc.), know the number of trusted proxies (trusted hops) and take the Nth entry from the right of x-forwarded-for, which is the address recorded by the trusted proxy closest to the client. That way you never read the fake values the client injected at the left.

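// lib/client-ip.ts: take the IP only via trusted proxies
import "server-only";

/**
 * Resolve the client IP safely.
 * On Vercel, trust the platform-set x-real-ip (clients cannot tamper with it there).
 * Behind your own proxies, set trustedHops to match the environment and take the Nth entry from the right.
 * Assumption: Vercel's system env var VERCEL is "1" on Vercel deployments; elsewhere
 * x-real-ip may come straight from the client, so it is ignored unless trustXRealIp is set.
 */
export function getClientIp(
  req: Request,
  trustedHops = 1,
  trustXRealIp = process.env.VERCEL === "1",
): string {
  // 1) Prefer the platform-guaranteed header, but only where it is actually guaranteed
  const real = req.headers.get("x-real-ip");
  if (trustXRealIp && real) return real;

  // 2) For x-forwarded-for, take the trustedHops-th entry counting from the right
  //    (the right end is what trusted proxies recorded; the left end can be forged by the client)
  const xff = req.headers.get("x-forwarded-for");
  if (xff) {
    const parts = xff.split(",").map((s) => s.trim()).filter(Boolean);
    const idx = parts.length - trustedHops;
    if (idx >= 0 && parts[idx]) return parts[idx];
  }

  // 3) If nothing qualifies, treat the request as anonymous
  //    (everyone lands on one shared key, so apply the strictest cap)
  return "anonymous";
}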

The principle in one line: never base the IP on anything other than a header set by a proxy you trust. Which headers you may trust is determined by your own deployment configuration (whether you're on Vercel, whether Cloudflare or an ALB sits in front). Taking a value from the raw x-forwarded-for (typically its leftmost entry) without settling this is the most common evasion point.

7. Implementation — middleware and route handler

Now that the parts are in place, build them into Next.js. There are 2 placements, used by purpose.

7-1. Reject "broadly and early" with middleware

middleware.ts runs in front of every request its matcher covers, so it suits cross-cutting limits (overall abuse from one IP, protecting a whole path subtree). Its advantage is rejecting requests before they reach heavy processing.

// middleware.ts: cross-cutting rate limiting in front of the app, returning 429 + Retry-After
import { NextResponse, type NextRequest, type NextFetchEvent } from "next/server";
import { ratelimit } from "@/lib/ratelimit";
import { getClientIp } from "@/lib/client-ip";

export async function middleware(request: NextRequest, event: NextFetchEvent) {
  const ip = getClientIp(request);

  // Include the route in the key so each endpoint is counted separately.
  // Note: dynamic paths (/api/items/1, /api/items/2, ...) each get their own bucket; normalize them if that matters.
  const { success, limit, remaining, reset, pending } = await ratelimit.limit(
    `${ip}:${request.nextUrl.pathname}`,
  );
  // Let background work (e.g. analytics writes) finish without delaying the response
  event.waitUntil(pending);

  // Always return the cap, remaining count, and seconds until reset (so clients can throttle themselves)
  const secondsToReset = Math.max(0, Math.ceil((reset - Date.now()) / 1000));
  const headers = new Headers();
  headers.set("RateLimit-Limit", String(limit));
  headers.set("RateLimit-Remaining", String(Math.max(0, remaining)));
  headers.set("RateLimit-Reset", String(secondsToReset));

  if (!success) {
    headers.set("Retry-After", String(Math.max(1, secondsToReset)));
    return NextResponse.json(
      { error: "Too Many Requests" },
      { status: 429, headers }, // ← 429; Retry-After says how many seconds until a retry may succeed
    );
  }

  const res = NextResponse.next();
  headers.forEach((v, k) => res.headers.set(k, v));
  return res;
}

// Narrow what is protected (exclude static assets etc. to avoid needless Redis calls)
export const config = {
  matcher: ["/api/:path*", "/login"],
};
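The header arithmetic is easy to get subtly wrong (negative values once the reset time has passed, or a Retry-After of 0), so it can help to pull it into a pure function and unit-test it. This is a hypothetical helper; it assumes, as with @upstash/ratelimit's result, that `reset` is a Unix timestamp in milliseconds, and it deliberately sends a Retry-After of at least 1 second.

```typescript
// Shape of a limiter result; `reset` is a Unix timestamp in milliseconds.
type LimitResult = { success: boolean; limit: number; remaining: number; reset: number };

// Hypothetical helper: compute RateLimit-* and Retry-After values from a limiter result.
export function rateLimitHeaders(r: LimitResult, nowMs: number): Record<string, string> {
  // Seconds until the window resets, never negative.
  const secondsToReset = Math.max(0, Math.ceil((r.reset - nowMs) / 1000));
  const headers: Record<string, string> = {
    "RateLimit-Limit": String(r.limit),
    "RateLimit-Remaining": String(Math.max(0, r.remaining)),
    "RateLimit-Reset": String(secondsToReset),
  };
  // On rejection, ask the client to wait at least one second.
  if (!r.success) headers["Retry-After"] = String(Math.max(1, secondsToReset));
  return headers;
}
```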

7-2. Narrow "pinpoint" with a route handler

A cap specific to a particular endpoint (5 login attempts/min, 20 AI inferences/hour, etc.) is applied individually at the route handler's entrance. Use a per-route Ratelimit instance.

// app/api/login/route.ts: apply a strict cap individually to the login path
import { loginLimiter } from "@/lib/ratelimit";
import { getClientIp } from "@/lib/client-ip";
import { z } from "zod";

const Body = z.object({ email: z.string().email(), password: z.string().min(1) });

export async function POST(req: Request) {
  // 1) Rate limit first (reject before entering expensive authentication)
  const ip = getClientIp(req);
  const { success, reset } = await loginLimiter.limit(`ip:${ip}`);
  if (!success) {
    const retry = Math.max(1, Math.ceil((reset - Date.now()) / 1000));
    return Response.json(
      { error: "Too many attempts. Please wait a while and try again." },
      { status: 429, headers: { "Retry-After": String(retry) } },
    );
  }

  // 2) Input validation (always narrow external input at the boundary with Zod).
  //    req.json() throws on malformed JSON, so map that to null and let Zod reject it with 400.
  const parsed = Body.safeParse(await req.json().catch(() => null));
  if (!parsed.success) return Response.json({ error: "invalid" }, { status: 400 });

  // 3) Main processing (authentication). Keep messages from revealing what succeeded or failed…
  // To suppress brute force further, add a per-account limit alongside the per-IP one
}

The point is the order: put rate limiting in front of input parsing and authentication. Never let an attacker trigger expensive processing; that is the crux of defending against cost-type DoS.

Retry-After is a courtesy to clients. When you return 429, adding Retry-After (in seconds or as an HTTP date) lets well-behaved clients (your own front end, legitimate bots, retry mechanisms) wait the right amount of time. That cuts wasteful retries and improves both server load and client experience. Also return the remaining count in the RateLimit-* family of headers (defined in an IETF draft), so clients can throttle themselves before hitting the cap.
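On the client side, honoring Retry-After means handling both forms RFC 9110 allows: delay-seconds and an HTTP-date. A minimal sketch with a hypothetical function name; note that `Date.parse` is lenient, so a stricter client may want to validate the date format explicitly.

```typescript
// Convert a Retry-After header value into a wait time in milliseconds.
// Returns undefined when the header is absent or unparseable.
export function retryAfterMs(value: string | null, nowMs: number): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  // delay-seconds form, e.g. "120"
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return undefined;
  return Math.max(0, date - nowMs); // a date in the past means "retry now"
}
```

A fetch wrapper could call this on a 429 response (`retryAfterMs(res.headers.get("Retry-After"), Date.now())`) and sleep for the result before retrying, falling back to its own backoff when it returns undefined.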

In real-world operation, horizontal controls such as rate limiting, Origin verification, and CSP are often consolidated in middleware. CSRF protection via Origin verification is covered separately in Server Actions CSRF / Origin Protection, and CSP and security-header hardening in CSP Nonce and Security-Header Design. These are typical "write once, apply to every request" horizontal controls, in the same layer as rate limiting.

8. The honest scope — app-layer rate limiting is no substitute for DDoS

Let me emphasize here. This article's rate limiting is an "application-layer (L7)" abuse countermeasure, and doesn't prevent volumetric DDoS (L3/L4).

Attack type	Example	Layer it protects	Preventable with this article's method?
L7 abuse	Login brute force, OTP rapid-fire, API abuse, cost-type DoS	App layer (route/middleware)	✅ This is the target
L3/L4 volumetric DDoS	SYN flood, UDP flood, bandwidth saturation	Network / edge	❌ The edge/WAF's domain
L7 volumetric DDoS	Saturate the function with a huge HTTP flood	Edge / WAF + app	△ Absorb at the edge, complement in the app

The reason is simple: for app-layer rate limiting to decide to reject, the request has to reach your function and make at least one Redis call. If attack traffic arrives at millions of requests per second, the work of counting and rejecting becomes the overload, and the Redis bill spikes too. Volumetric DDoS has to be absorbed before it reaches the function, at the network edge.

So the correct stance is using both together.

Edge / WAF: absorb volumetric DDoS, known malicious IPs, bots, and L7 floods in front of the function.
App-layer rate limiting (this article): precisely narrow, near the business logic, "legitimate-looking but abusive" requests that passed through the edge — brute force, OTP abuse, cost-type DoS.

Neither is sufficient on its own, and neither substitutes for the other. "We added rate limiting, so DDoS is handled" is wrong. App-layer limiting complements the edge's defense rather than replacing it; make this boundary clear both to whoever commissioned the work and to the team. How to decide what to handle in-house and what to entrust to a specialist, distinguishing automatable horizontal controls from vertical risks that only design can protect against (authorization/IDOR, etc.), is covered in The Scope Where a Security Audit Becomes Necessary.

9. Pre-production checklist

Whether outsourced or AI-made, before shipping rate limiting to production, confirm at minimum just this.

If you are commissioning the work, the most effective single question is: "Where do you store the rate-limit counter?" If the answer is "in memory" or "in a Map," there's a high chance the limit isn't taking effect in serverless. A good developer will answer immediately: "atomically, in a shared store such as Redis."

10. Summary: put the state outside the process, and rate limiting takes effect correctly

Let me organize the key points.

Because serverless (Vercel / Lambda) functions are disposable, multiple-concurrent, and initialized on cold start, an in-process Map rate limit will reliably break under production scale. Working locally is not a counterargument.
The correct answer is to put the state outside the process, in a shared store: consolidate the counter in a single store such as Redis that every instance can see.
A fixed window can pass 2× the cap in a boundary burst. In most cases, make the sliding window (the approximate version is enough) the default.
To count correctly under concurrent requests, an atomic read-modify-write is mandatory. Rather than hand-writing Lua, use a battle-tested distributed rate limiter like @upstash/ratelimit.
Design the key by IP, route, and user, and obtain the IP only via a trusted proxy. On rejection, return 429 + Retry-After. This is the countermeasure against the abuse and cost runaway that OWASP API4:2023 (Unrestricted Resource Consumption) points to.
Honestly, app-layer rate limiting is no substitute for defense against L3/L4 volumetric DDoS. That is the edge/WAF's domain, so use both. No product or technique makes you safe merely by being installed.

Implementing horizontal controls like this, rate limiting that works correctly in serverless, is what my open-source project Aegis supports. It adds drop-in hardening for controls that apply uniformly across the app (headers/CSP, rate limiting, CSRF, typed env) and shows the current state via npx @aegiskit/cli scan. To be honest about its limits: Aegis helps implement horizontal controls and can detect and warn about vertical risks such as authorization/IDOR, but it is not magic that makes you "completely safe." If you need a design review of your rate limiting or hardening of a whole serverless Next.js app, I take that on as a security audit. I have personally designed and operated the reliability layer (retries, idempotency, flow control) on the pay-per-use, high-load payment paths of a serverless payment platform in the environmental sector.

Building fast with AI is the right move in itself. If you need design or verification work to make what you built quickly actually take effect, without leaks, feel free to get in touch.

Frequently Asked Questions (FAQ)

Q. Can I use something other than Upstash? A. Sure. The requirement is only "a shared store visible to all function instances, where you can count atomically." Candidates include Vercel KV, your own Redis/Valkey, Memcached (using CAS), etc. What matters is not the product name but that it's outside process memory and the read-modify-write is atomic. @upstash/ratelimit is widely used for the ease of being hit over HTTP from serverless, but the essence is shared + atomicity.
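To illustrate what "atomic via CAS" means, here is a toy store with memcached-style check-and-set semantics. It is a hypothetical in-memory class, not a memcached client; real memcached also needs `add` to create a missing key, which this toy skips by treating a missing key as value 0, version 0. A writer whose read has gone stale is refused and simply re-reads and retries.

```typescript
// Toy store with memcached-style "gets" / "cas" semantics (illustration only).
type Versioned = { value: number; version: number };

class CasStore {
  private data = new Map<string, Versioned>();

  // Like "gets": return the value together with its version (the CAS token).
  gets(key: string): Versioned {
    return this.data.get(key) ?? { value: 0, version: 0 }; // toy: missing key = 0, version 0
  }

  // Like "cas": write only if nobody has written since `version` was read.
  cas(key: string, value: number, version: number): boolean {
    if (this.gets(key).version !== version) return false;
    this.data.set(key, { value, version: version + 1 });
    return true;
  }
}

// Atomic increment built from a CAS retry loop: losing a race just means "re-read and retry".
function casIncrement(store: CasStore, key: string, maxRetries = 10): number {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const { value, version } = store.gets(key);
    if (store.cas(key, value + 1, version)) return value + 1;
  }
  throw new Error(`gave up after ${maxRetries} contended attempts on ${key}`);
}
```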

Q. Is a fixed window no good? A. It's not "no good" but "depends on whether you can tolerate the boundary burst." For a loose overall limit, a fixed window does little real harm. On the other hand, for paths facing "attacks that want to concentrate in a short time," like login attempts, OTP sending, or billing processing, 2× passing at the boundary can't be ignored. When unsure, going with a sliding window is safe.

Q. Can rate limiting prevent DDoS? A. No. As in Section 8, app-layer rate limiting takes effect on L7 abuse (brute force, OTP abuse, cost-type DoS), but L3/L4 volumetric DDoS is a domain to absorb at the edge/WAF in front of the function. Both are needed, and one doesn't substitute for the other. The understanding "rate limiting = DDoS countermeasure" is dangerous.

Q. Should I place it in middleware or a route handler? A. Using both is the standard play. Middleware runs in front of all requests, so it's suited to "cross-cutting limits" and "rejecting before reaching heavy processing." A route handler is suited to endpoint-specific caps like "login 5/min, AI inference 20/hour." Split by role: the cross-cutting defense line in middleware, the precise limit in the route handler.

Q. What's the correct way to get the client IP? A. It depends on the deployment configuration. On Vercel, trust x-real-ip, which clients cannot tamper with. If your own proxy (Nginx/ALB) is in front, know the number of trusted proxies and take the Nth entry from the right of x-forwarded-for. The common principle is: never base the IP on anything other than a header set by a trusted proxy. If you take the raw x-forwarded-for at face value (for example its leftmost entry), an attacker can forge the header to change the key and evade the cap.

Q. Should I change the key before and after authentication? A. Yes. For not-logged-in paths (login attempts, public APIs), per-IP becomes the main axis, but beware collateral damage under NAT. After authentication, making the verified user ID the main axis is accurate and avoids collateral damage too. Especially for cost-type DoS (rapid-firing heavy processing), narrowing per user is the most effective.


References

Next.js × Supabase Application Security Complete Guide — Protecting Authorization and RLS with Vulnerability Detection and Defense in Depth

Vulnerability assessment of AI-generated code (vibe coding) [2026 edition] — a practical procedure to crush, before release, the vulnerabilities that generative AI multiplies

CSRF / Origin protection for Next.js Server Actions — what's protected by default, and what you should add

Next.js Environment Variables and Secret-Leak Countermeasures — The NEXT_PUBLIC_ Trap, and the Typed env Boundary

Also worth reading

Vercel environment-variable / secret management guide: 3 environments, the NEXT_PUBLIC_ trap, OIDC keyless, and a type-safe boundary

Vercel Routing Middleware implementation guide: auth gates, personalization, A/B, and redirects before the cache

Building your own auth hub that bundles multiple AI tools: BFF × OIDC × back-channel logout (PKCE required, PII encryption, audit logs)
