Build an internal AI platform and you almost always hit the same wall: every time a tool is added, authentication, permissions, and logout multiply separately.
A speech-synthesis tool, a telop (on-screen caption) proofreading tool, a generative-AI review-support tool: it is healthy for each to grow as an independent app. But users want to log in once and use them all, and operators want to lock out a departing employee in one place and change permissions per tool. Satisfying both demands at once is the role of an auth hub (BFF: Backend for Frontend).
This article explains, at the implementation level, the design of the auth hub I built for an internal AI platform for a major domestic broadcaster. For confidentiality, proper nouns, domains, and project IDs are hidden, and the code is anonymized while preserving its structure, but the design judgments and mechanisms are all from the real product.
Ground rule for this article: the single source of truth is the real code. I write not about how things should be in general, but only about design decisions that I actually built and that run in production. Conversely, I leave out business figures such as user counts or cost reductions, because they would require the client's real data.
1. Why "don't make each tool implement authentication"
The first and most important design judgment is this. Don't make the tool side hold any authentication logic. Fully consolidate identity responsibility into the BFF (auth hub).
When each tool implements its own OAuth client, the following problems arise.
- Quality variance: one tool uses PKCE, another forgets state verification. The weakest tool then sets the security level of the whole platform.
- Logout doesn't propagate: log out of tool A and tool B's session survives. A departing employee can't be locked out immediately.
- Permission changes aren't reflected: remove a permission in the admin screen and each tool keeps running on the old token.
So the configuration I adopted is "make the BFF the sole identity provider (IdP)." The big picture is like this.
[employee browser]
│ ① log in to the BFF with Google Workspace SSO (NextAuth.js v5 / Identity Platform)
▼
[BFF (auth hub)]── OIDC provider ──┐
│ ② /api/oauth/authorize (PKCE S256 required)
│ ③ /api/oauth/token → issue a short-lived JWT per tool
│ (aud = tool ID, lifetime 10 min)
│ ④ /api/auth/tool-callback (pass the token via an auto-POST form)
▼
[each AI tool (speech synthesis / telop proofreading / review support …)]
▲ ⑤ authenticate by verifying the received JWT's aud / signature / revocation
│
[BFF]── back channel ──> each tool
⑥ asynchronously deliver logout / permission changes with an HMAC signature
The point is that all a tool knows is how to verify a short-lived token addressed to itself. The user DB, SSO, passwords, MFA: the substance of identity all stays inside the BFF. The tools stay loosely coupled and can still follow company-wide login and logout.
2. OIDC authorization endpoint: don't make PKCE S256 "optional"
The BFF behaves as a small OIDC provider. The crux of the authorization endpoint (/api/oauth/authorize) is to make PKCE (Proof Key for Code Exchange) mandatory.
PKCE was originally specified for public clients such as mobile apps and SPAs, but I make it mandatory even for internal server-side web apps. The reason is simple: I want to guarantee, with no room for a configuration gap, that even if the authorization code leaks through some path, an attacker without code_verifier can't exchange it for a token.
    // app/api/oauth/authorize/route.ts (anonymized excerpt)
    export async function GET(req: NextRequest) {
      const params = AuthorizeQuery.parse(/* validated with Zod */);
      // PKCE is not "use it if present" but "reject if absent"
      if (params.code_challenge_method !== "S256" || !params.code_challenge) {
        return oauthError("invalid_request", "PKCE S256 is required");
      }
      // Allow only registered tools (aud = client_id is checked against the DB)
      const tool = await registeredTools.findByAudience(params.client_id);
      if (!tool || !tool.isActive) {
        return oauthError("unauthorized_client");
      }
      // redirect_uri must exactly match the whitelist registered with the tool
      if (!tool.allowedRedirectUris.includes(params.redirect_uri)) {
        return oauthError("invalid_request", "redirect_uri mismatch");
      }
      // The authorization code is single-use, short-lived, and stored in the DB (together with code_challenge)
      const code = await authorizationCodes.issue({
        sub: session.userId, // session: the signed-in BFF session (lookup omitted from this excerpt)
        aud: params.client_id,
        codeChallenge: params.code_challenge,
        nonce: params.nonce,
        expiresInSec: 600,
      });
      return redirectWithCode(params.redirect_uri, code, params.state);
    }
There are 3 design judgments at work here.
- client_id (= audience) is DB-registered. A stray tool can't request a token.
- redirect_uri must exactly match the registered whitelist. This shuts out open redirectors.
- The authorization code is stored in the DB as single-use and short-lived, with code_challenge tied to it. The token endpoint that follows matches it against code_verifier.
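For reference, the tool (client) side of this exchange can be sketched as follows. This is a minimal illustration of the S256 transform defined in RFC 7636, not code from the product; the helper names createPkcePair and s256 are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

// S256 transform from RFC 7636: BASE64URL(SHA-256(ASCII(code_verifier))), without padding.
function s256(codeVerifier: string): string {
  return createHash("sha256").update(codeVerifier, "ascii").digest("base64url");
}

// Hypothetical client-side helper: creates a fresh verifier/challenge pair per login attempt.
function createPkcePair(): { codeVerifier: string; codeChallenge: string } {
  // 32 random bytes encode to 43 base64url characters, the minimum length RFC 7636 allows.
  const codeVerifier = randomBytes(32).toString("base64url");
  return { codeVerifier, codeChallenge: s256(codeVerifier) };
}
```

The tool would send codeChallenge (with code_challenge_method=S256) to the authorize endpoint and keep codeVerifier private until the token request.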
3. Token endpoint: lifetimes "short and by purpose"
The token endpoint (/api/oauth/token) verifies the authorization code and issues a tool-dedicated short-lived JWT. The design decisions are as follows.
| Token | Lifetime | Role |
|---|---|---|
| Access token | 10 min | authorize API calls. Short-lived to minimize the damage window on leak |
| ID token | 1 hour | assert the user identity (sub / aud / roles) |
| Authorization code | 10 min, single-use | dedicated to code→token exchange |
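To make the "10 min, single-use" row concrete, here is a minimal in-memory sketch of a code store. It is an illustration under assumptions: the product stores codes in a DB, and the class name, field names, and injectable clock are hypothetical.

```typescript
type StoredCode = { sub: string; aud: string; codeChallenge: string; expiresAtMs: number };

class InMemoryAuthorizationCodes {
  private readonly codes = new Map<string, StoredCode>();
  private readonly now: () => number;

  // The clock is injectable so that expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  issue(code: string, data: Omit<StoredCode, "expiresAtMs">, ttlSec: number = 600): void {
    this.codes.set(code, { ...data, expiresAtMs: this.now() + ttlSec * 1000 });
  }

  // The code is deleted before the expiry check, so a replayed code always fails,
  // whether or not the first exchange succeeded.
  consume(code: string): StoredCode | null {
    const stored = this.codes.get(code);
    if (!stored) return null;
    this.codes.delete(code);
    return stored.expiresAtMs > this.now() ? stored : null;
  }
}
```

In a real database the lookup and delete would have to be one atomic operation (for example a single conditional delete or update) so that two concurrent exchanges cannot both succeed.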
    // app/api/oauth/token/route.ts (anonymized excerpt)
    const ACCESS_TOKEN_EXPIRY_SEC = 600; // 10 minutes
    const ID_TOKEN_EXPIRY_SEC = 3600; // 1 hour
    export async function POST(req: NextRequest) {
      const body = await parseForm(req);
      if (body.grant_type !== "authorization_code") {
        return oauthError("unsupported_grant_type");
      }
      // Reject requests that lack the code or the PKCE verifier before touching the DB
      if (!body.code || !body.code_verifier) return oauthError("invalid_request");
      const auth = await authorizationCodes.consume(body.code); // single-use: invalidated once consumed
      if (!auth || auth.expired) return oauthError("invalid_grant");
      // PKCE verification: S256(code_verifier) === stored code_challenge
      const challenge = base64url(sha256(body.code_verifier));
      if (!timingSafeEqual(challenge, auth.codeChallenge)) {
        return oauthError("invalid_grant", "PKCE verification failed");
      }
      // Resolve the effective role for the tool (Redis cache, described later)
      const role = await roleResolver.getRoleForAudience(auth.sub, auth.aud);
      if (role === "NONE") return oauthError("access_denied");
      const claims = {
        sub: auth.sub,
        aud: auth.aud, // always addressed to a single tool
        sid: auth.sid, // device-level session ID
        sid_gen: auth.sidGeneration, // revocation generation
        ent_ver: auth.entitlementsVersion, // permission version
        roles: [role],
      };
      return json({
        access_token: signJwt(claims, ACCESS_TOKEN_EXPIRY_SEC),
        id_token: signJwt({ ...claims, nonce: auth.nonce }, ID_TOKEN_EXPIRY_SEC),
        token_type: "Bearer",
        expires_in: ACCESS_TOKEN_EXPIRY_SEC,
      });
    }
Putting sid (the device-level session ID), sid_gen (the revocation generation), and ent_ver (the permission version) in the claims lays the groundwork for the later stages: these claims enable logging out a single device and detecting permission changes.
On the choice of algorithm: this product uses a symmetric key (HMAC / HS256) between internal services within the same trust boundary, because the issuer and the verifiers are under the same operation and the shared secret can be distributed safely with Secret Manager. If token verification is ever entrusted to a third party (e.g., an external partner's tool), switch to an asymmetric key (RS256 + JWKS publication), so that verifiers hold only a public key and cannot mint tokens. The principle is that the key scheme is decided by who verifies.
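As an illustration of what HS256 issuance and tool-side verification involve, here is a minimal sketch using only node:crypto. It is not the product's signJwt; the function names, the explicit nowSec parameter, and the claim subset are assumptions, and a production system would more likely rely on a maintained JWT library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type Claims = { sub: string; aud: string; exp: number };

const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

function decodeJson(part: string): any {
  try {
    return JSON.parse(Buffer.from(part, "base64url").toString("utf8"));
  } catch {
    return null;
  }
}

// Issuer side (BFF): sign header.payload with the shared secret.
function signHs256(payload: { sub: string; aud: string }, ttlSec: number, secret: string, nowSec: number): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify({ ...payload, exp: nowSec + ttlSec }));
  const sig = createHmac("sha256", secret).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${sig}`;
}

// Verifier side (tool): check algorithm, signature, audience, and expiry.
function verifyHs256(token: string, secret: string, expectedAud: string, nowSec: number): Claims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [headerB64, bodyB64, sigB64] = parts;
  const header = decodeJson(headerB64);
  const claims = decodeJson(bodyB64);
  // Pin the algorithm instead of trusting whatever the token header announces.
  if (!header || !claims || header.alg !== "HS256") return null;
  const expected = createHmac("sha256", secret).update(`${headerB64}.${bodyB64}`).digest();
  const actual = Buffer.from(sigB64, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
  // Reject tokens addressed to another tool, and expired tokens.
  if (claims.aud !== expectedAud) return null;
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) return null;
  return claims as Claims;
}
```

The aud check is what keeps a token issued for one tool from being replayed against another, which is why each token is addressed to a single tool.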
4. Passing the token: don't put it in the URL
This part is plain but effective. The textbook OIDC flow returns the authorization code in the query string, but the final token must not be put in the URL when it is handed to the tool, because URLs linger in browser history, the Referer header, proxy logs, and access logs.
So tool-callback passes the token with an auto-submitting HTML form (POST). In addition, to detect tampering with the redirect destination's origin, it bundles an HMAC signature of that origin.
    // app/api/auth/tool-callback/route.ts (anonymized excerpt)
    // Sign redirectBase (the return origin) so that tampering can be detected
    const redirectBaseSig = createHmac("sha256", SHARED_SECRET)
      .update(redirectBase)
      .digest("hex");
    return new Response(
      `<!doctype html><html><body onload="document.forms[0].submit()">
        <form method="POST" action="${escapeHtml(redirect)}">
          <input type="hidden" name="token" value="${escapeHtml(token)}">
          <input type="hidden" name="redirectBase" value="${escapeHtml(redirectBase)}">
          <input type="hidden" name="redirectBaseSig" value="${redirectBaseSig}">
          <input type="hidden" name="sid" value="${escapeHtml(sid)}">
        </form>
      </body></html>`,
      { headers: { "content-type": "text/html; charset=utf-8" } },
    );
This form has a side benefit too: it isn't bound by URL length limits, so even rich permission info fits in the form body without trouble. The receiving tool recomputes redirectBaseSig and discards the token if the signature doesn't match.
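The receiving side of that check might look like the sketch below. It assumes the same hex-encoded HMAC-SHA256 as the excerpt above; the function names verifyRedirectBase and safeEqual are hypothetical, and how each tool obtains the shared secret is left out.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Constant-time string comparison. Hashing both sides first yields equal-length
// buffers, so timingSafeEqual never throws and the input length is not revealed.
function safeEqual(a: string, b: string): boolean {
  const hashA = createHash("sha256").update(a).digest();
  const hashB = createHash("sha256").update(b).digest();
  return timingSafeEqual(hashA, hashB);
}

// Tool side: recompute the signature over the posted redirectBase and compare.
function verifyRedirectBase(redirectBase: string, redirectBaseSig: string, sharedSecret: string): boolean {
  const expected = createHmac("sha256", sharedSecret).update(redirectBase).digest("hex");
  return safeEqual(expected, redirectBaseSig);
}
```

A tool would call verifyRedirectBase on the posted form fields and drop the token without creating a session when it returns false.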
5. Back-channel logout: "lock out company-wide" while staying loosely coupled
The true value of an auth hub is tested at logout. When a user logs out, or an admin revokes a permission, the sessions in all tools must be reliably invalidated, yet I want to keep the tools loosely coupled. I resolve this tension with back-channel events that are signed, retried, and deduplicated.
    // lib/backchannel/dispatch.ts (anonymized excerpt)
    async function dispatch(event: BackchannelEvent, tool: RegisteredTool) {
      const body = JSON.stringify(event); // e.g. { type: "SID_REVOKED", sid, ts }
      const sig = createHmac("sha256", tool.sharedSecret)
        .update(`${event.ts}.${body}`) // sign together with the timestamp (replay protection)
        .digest("hex");
      // The unique constraint (eventId, toolId) guarantees the same event is not processed twice
      const log = await backchannelLog.create({ eventId: event.id, toolId: tool.id });
      // Initial attempt plus up to 3 retries with increasing backoff (give up on permanent failure and alert)
      for (const delayMs of [0, 1000, 5000, 30000]) {
        if (delayMs) await sleep(delayMs);
        try {
          const res = await fetch(tool.backchannelEventUrl, {
            method: "POST",
            headers: { "content-type": "application/json", "x-signature": sig },
            body,
          });
          if (res.ok) return backchannelLog.markDelivered(log.id);
        } catch {
          // A network error is treated like a non-2xx response: fall through to the next retry
        }
      }
      await backchannelLog.markFailed(log.id); // detected by monitoring, then recovered manually
    }
There are 3 design cruxes.
- HMAC signature + timestamp: the tool can verify that the event really came from the BFF and reject replays.
- The (eventId, toolId) unique constraint: even if a network resend occurs, the tool side doesn't process the same event twice (idempotent).
- DB tracking + backoff retry: temporary failures are absorbed automatically, and only permanent failures are escalated to an alert.
On receiving SID_REVOKED, the tool side just marks the relevant session as REVOKED. The tool can take part in company-wide logout without knowing the BFF's user model.
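A tool-side receiver for these events could be sketched like this. It mirrors the signing scheme in the dispatch excerpt (HMAC-SHA256 over ts.body, hex-encoded), but everything else is assumed for illustration: ts is treated as milliseconds, the 5-minute freshness window is an arbitrary choice, and the in-memory sets stand in for the tool's own session store.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type BackchannelEvent = { id: string; type: string; sid: string; ts: number };

const MAX_SKEW_MS = 5 * 60 * 1000; // assumed freshness window for replay rejection
const seenEventIds = new Set<string>(); // stand-in for a persistent dedup table
const revokedSids = new Set<string>(); // stand-in for the tool's session store

function handleBackchannel(
  rawBody: string,
  signatureHex: string,
  secret: string,
  nowMs: number,
): "applied" | "duplicate" | "rejected" {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return "rejected";
  }
  const event = parsed as Partial<BackchannelEvent> | null;
  if (!event || typeof event.id !== "string" || typeof event.ts !== "number") return "rejected";

  // Recompute the signature over the raw body exactly as received.
  const expected = createHmac("sha256", secret).update(`${event.ts}.${rawBody}`).digest();
  const actual = Buffer.from(signatureHex, "hex");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return "rejected";

  // A valid signature on an old timestamp is treated as a replay.
  if (Math.abs(nowMs - event.ts) > MAX_SKEW_MS) return "rejected";

  // Idempotency: a resend of an already-applied event is acknowledged but not re-applied.
  if (seenEventIds.has(event.id)) return "duplicate";
  seenEventIds.add(event.id);

  if (event.type === "SID_REVOKED" && typeof event.sid === "string") revokedSids.add(event.sid);
  return "applied";
}
```

Returning a success status for "duplicate" as well as "applied" would let the sender stop retrying, while "rejected" would map to an error status.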
This is also where the sid_gen / ent_ver claims in the token pay off. Even before a back-channel event arrives, the tool can detect a permission change early: at verification time it asks the BFF whether the ent_ver it holds is stale (or compares it against a cached latest version). Combining push (back channel) with pull (version matching) prevents missed revocations.
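The pull side of this check can be reduced to a small comparison. The sketch below assumes that sid_gen and ent_ver are integer counters that only ever increase, which the article does not state; the type and function names are hypothetical, and how the tool learns the latest values (a BFF query or a cache) is left out.

```typescript
// Version claims carried in the token (names follow the claims above).
type TokenVersions = { sid_gen: number; ent_ver: number };

// Latest values known to the tool, e.g. fetched from the BFF or a short-lived cache.
type LatestVersions = { sidGeneration: number; entitlementsVersion: number };

type Freshness = "fresh" | "session_revoked" | "entitlements_changed";

// Assumption: both versions are monotonically increasing counters, so a token
// holding a smaller number than the latest one was issued before the change.
function checkFreshness(token: TokenVersions, latest: LatestVersions): Freshness {
  if (token.sid_gen < latest.sidGeneration) return "session_revoked";
  if (token.ent_ver < latest.entitlementsVersion) return "entitlements_changed";
  return "fresh";
}
```

A tool could end the session on "session_revoked" and send the user back through the authorize flow on "entitlements_changed" to pick up a token with the current role.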
6. Cache role resolution — but provide a "bypass"
Resolving the effective role a user has for a tool is surprisingly heavy, because it traverses Google Workspace group memberships and the permission tables of external user groups. Doing all of this on every token issuance makes login slow.
So I cache the role in Redis with a 30-minute TTL. But right after a permission change I don't want to return the old value, so I always provide a path that intentionally bypasses the cache.
    // lib/auth/role-resolver.ts (anonymized excerpt)
    async function getRoleForAudience(userId: string, aud: string): Promise<Role> {
      const key = `tool-role:${userId}:${aud}`;
      const cached = await cache.get<Role>(key);
      if (cached) return cached;
      const role = await resolveFromGroups(userId, aud); // heavy: traverses groups
      await cache.set(key, role, 30 * 60);
      return role;
    }
    // Right after a permission change, use this to get the current value and refresh the cache as well
    async function getRoleForAudienceBypassCache(userId: string, aud: string) {
      const role = await resolveFromGroups(userId, aud);
      await cache.set(`tool-role:${userId}:${aud}`, role, 30 * 60);
      return role;
    }
A cache is both a tool for speed and a risk that an old permission lingers. If you introduce a cache, always design the invalidation/bypass path together with it. This is a universal principle that goes beyond authorization.
Note also that the cache layer automatically falls back to in-memory so that login doesn't stop even if Redis goes down. The judgment here is to place the availability of login above the availability of the middleware it depends on.
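The fallback idea can be sketched as a wrapper that tries the primary cache and drops to an in-memory TTL map when the primary throws. This is a simplified illustration, not the product's cache layer: the interface is synchronous to keep the example short (a real Redis client is asynchronous), and all names are hypothetical.

```typescript
interface RoleCache {
  get(key: string): string | undefined;
  set(key: string, value: string, ttlSec: number): void;
}

// In-memory TTL cache with an injectable clock.
class MemoryTtlCache implements RoleCache {
  private readonly entries = new Map<string, { value: string; expiresAtMs: number }>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAtMs <= this.now()) {
      this.entries.delete(key); // expired entries are evicted lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlSec: number): void {
    this.entries.set(key, { value, expiresAtMs: this.now() + ttlSec * 1000 });
  }
}

// Uses the primary cache while it works and the fallback whenever it throws.
class FallbackCache implements RoleCache {
  private readonly primary: RoleCache;
  private readonly fallback: RoleCache;

  constructor(primary: RoleCache, fallback: RoleCache) {
    this.primary = primary;
    this.fallback = fallback;
  }

  get(key: string): string | undefined {
    try {
      return this.primary.get(key);
    } catch {
      return this.fallback.get(key);
    }
  }

  set(key: string, value: string, ttlSec: number): void {
    try {
      this.primary.set(key, value, ttlSec);
    } catch {
      this.fallback.set(key, value, ttlSec);
    }
  }
}
```

One trade-off worth noting: an in-memory fallback is local to each server instance, so a bypass or invalidation on one instance does not reach the others, which is another reason to keep the TTL short and the version check in the token.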
7. Hold PII encrypted — and make it coexist with "search"
In a broadcaster's internal controls, the handling of users' personal information (email, phone, name) is strictly scrutinized. If it is stored in plaintext, all PII spills the moment a DB dump leaks. So I encrypt it at storage time with AES-256-GCM.
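For readers unfamiliar with AES-256-GCM in Node.js, a minimal sketch of such an encrypt/decrypt pair follows. The storage layout (12-byte IV, then 16-byte tag, then ciphertext, base64-encoded) is an assumption made for this example, not the product's format, and key management is out of scope.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_LENGTH = 12; // recommended nonce size for GCM
const TAG_LENGTH = 16; // default GCM authentication tag size

// Encrypts with a fresh random IV and returns base64(IV | tag | ciphertext).
function aesGcmEncrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_LENGTH);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Decrypts and authenticates; throws if the key is wrong or the data was modified.
function aesGcmDecrypt(sealed: string, key: Buffer): string {
  const raw = Buffer.from(sealed, "base64");
  const iv = raw.subarray(0, IV_LENGTH);
  const tag = raw.subarray(IV_LENGTH, IV_LENGTH + TAG_LENGTH);
  const ciphertext = raw.subarray(IV_LENGTH + TAG_LENGTH);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because the IV is random, encrypting the same email twice produces different ciphertexts, so the encrypted column cannot be used for lookups.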
The problem is that encrypted data can't be searched. The admin screen needs partial-match search, such as finding an external user by part of an email address. I solve this by looking up HMAC tokens, without decrypting anything.
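    // lib/crypto/pii.ts (anonymized excerpt)
    // Store: the value itself with AES-256-GCM, plus deterministic HMAC tokens in separate columns for search
    function sealEmail(email: string) {
      const normalized = email.toLowerCase(); // normalize once so hash and n-gram tokens agree
      return {
        emailEncrypted: aesGcmEncrypt(email, KEY), // reversible, for display (stored with IV + tag)
        emailHash: hmacSha256(normalized, HMAC_KEY), // for exact-match search
        searchTokens: ngram(normalized, 3).map((g) => hmacSha256(g, HMAC_KEY)), // for partial-match search
      };
    }
    // Search: HMAC the input the same way and look it up in the indexed token column (zero decryption)
    async function searchByEmailFragment(fragment: string) {
      // Split the fragment into the same 3-grams used at write time. Rows match on any shared token,
      // so the caller keeps only the users that match every token.
      const tokens = ngram(fragment.toLowerCase(), 3).map((g) => hmacSha256(g, HMAC_KEY));
      return db.externalUserSearchToken.findMany({ where: { token: { in: tokens } } });
    }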
The point is that nothing in the DB is decrypted for searching. Since plaintext PII isn't expanded into memory at query execution, the leakage surface via logs or core dumps also shrinks. For comparisons between secret values (matching shared secrets, etc.), always use timingSafeEqual (a constant-time comparison applied after hashing both sides with SHA-256) to avoid length leaks and timing attacks.
Furthermore, sensitive operations such as session creation, revocation, and update are recorded in an append-only audit log. Being able to trace later who revoked whose device session, when, and why is the evidence that internal controls require.
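To show how n-gram tokens support partial matching end to end, here is a self-contained in-memory sketch. The BlindIndex class and its Map-based storage are hypothetical stand-ins for the token table; only the idea (HMAC each 3-gram, require every query token to match) comes from the text above.

```typescript
import { createHmac } from "node:crypto";

// Distinct lowercase n-grams of the input; empty when the input is shorter than n.
function ngrams(text: string, n: number): string[] {
  const normalized = text.toLowerCase();
  const grams = new Set<string>();
  for (let i = 0; i + n <= normalized.length; i++) grams.add(normalized.slice(i, i + n));
  return Array.from(grams);
}

function blindToken(gram: string, key: string): string {
  return createHmac("sha256", key).update(gram).digest("hex");
}

// In-memory stand-in for a token table: HMAC token -> user IDs that contain it.
class BlindIndex {
  private readonly byToken = new Map<string, Set<string>>();
  private readonly key: string;

  constructor(key: string) {
    this.key = key;
  }

  add(userId: string, email: string): void {
    for (const gram of ngrams(email, 3)) {
      const token = blindToken(gram, this.key);
      const ids = this.byToken.get(token) ?? new Set<string>();
      ids.add(userId);
      this.byToken.set(token, ids);
    }
  }

  // Returns users whose email contains every 3-gram of the fragment.
  // Fragments shorter than 3 characters produce no tokens and return nothing.
  search(fragment: string): string[] {
    const tokens = ngrams(fragment, 3).map((g) => blindToken(g, this.key));
    if (tokens.length === 0) return [];
    const [first, ...rest] = tokens.map((t) => this.byToken.get(t) ?? new Set<string>());
    return Array.from(first).filter((id) => rest.every((ids) => ids.has(id))).sort();
  }
}
```

Matching every 3-gram is a necessary condition rather than a proof of a contiguous match, so the result is best treated as a candidate set. The scheme also reveals which records share a 3-gram to anyone who can read the token table, which is the price of searching without decryption.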
8. First-line defense at the edge: middleware and WAF
Finally, the layer in front of the app. The BFF's middleware runs on the Edge Runtime (under the constraint that Node.js modules can't be used) and does IP whitelisting + session-cookie verification. An unauthenticated request gets a 401 for an API route and a 307 redirect to the login screen for a page.
    // middleware.ts (anonymized excerpt)
    export function middleware(req: NextRequest) {
      if (isPublicPath(req.nextUrl.pathname)) return NextResponse.next();
      if (!isAllowedIp(req)) return new NextResponse(null, { status: 403 });
      const session = readSessionCookie(req);
      if (!session) {
        return req.nextUrl.pathname.startsWith("/api/")
          ? NextResponse.json({ error: "AUTH_REQUIRED" }, { status: 401 })
          : NextResponse.redirect(new URL("/login", req.url));
      }
      return NextResponse.next();
    }
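Since the Edge Runtime rules out Node.js modules, an IP allowlist check has to be written in plain JavaScript. The sketch below is one way to do an IPv4 CIDR match with arithmetic only. It is an assumption-laden variation, not the product's isAllowedIp: it takes the client IP as a string, ignores IPv6, and leaves open how the IP is obtained from the request (which depends on the trusted proxy setup).

```typescript
// Parses dotted-quad IPv4 into an unsigned 32-bit number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when ip falls inside cidr (e.g. "10.0.0.0/8"); a bare address means /32.
function isInCidr(ip: string, cidr: string): boolean {
  const [range, bitsText = "32"] = cidr.split("/");
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const rangeInt = ipv4ToInt(range);
  if (ipInt === null || rangeInt === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  // Two addresses share a /bits prefix when they fall in the same block of 2^(32 - bits) addresses.
  const blockSize = Math.pow(2, 32 - bits);
  return Math.floor(ipInt / blockSize) === Math.floor(rangeInt / blockSize);
}

function isAllowedIp(ip: string, allowlist: readonly string[]): boolean {
  return allowlist.some((cidr) => isInCidr(ip, cidr));
}
```

Plain division is used instead of bitwise operators because JavaScript bitwise operations work on signed 32-bit integers, which makes masks for addresses above 127.255.255.255 easy to get wrong.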
In front of it sits Cloud Armor (OWASP CRS 3.3 + adaptive DDoS protection + rate limiting). What matters here is the operational practice: run the WAF in the staging environment with full blocking enabled. OAuth callbacks and auth cookies are easily false-flagged by the WAF's regex rules, so before shipping to production, exercise all rules with real traffic in staging and eliminate the false positives before promoting. The operational flow itself structurally prevents the accident of enabling the WAF in production and suddenly killing the auth flow.
9. Conclusion: an auth hub is not a "feature" but a "contract"
The philosophy running through this auth hub was to consolidate identity responsibility into a single point and hand the tools only the minimum contract.
- All a tool knows is "how to verify a short-lived JWT addressed to itself" and "how to receive back-channel events."
- Login, logout, permission changes, PII, and audit are all closed within the BFF.
- The strength of security is decided by the hub's strength, not the weakest tool.
What's good about this structure is that the cost of adding a tool drops. When adding a new AI tool, authentication comes down to three steps: register the tool with the BFF, verify the short-lived JWT, and receive the back channel. The same pattern can be rolled out horizontally as a platform. In fact, on top of this hub, I've loaded tools of differing nature (speech synthesis, telop typo detection, generative-AI review) and integrated them into a single SSO experience.
What an enterprise asks of an external developer, taken to the limit, is whether the work can be entrusted to them without an accident happening. Making PKCE mandatory, making tokens short-lived, not putting tokens in the URL, reliably propagating logout, encrypting PII, leaving audit logs: the accumulation of each of these judgments is the answer to that question.
The code in this article is all anonymized and reconstructed, but the design judgments are from the real product. For consultations on the same kind of "internal platform bundling multiple tools" or "enterprise-grade authentication foundation," please reach out via the services page or contact.