# Claude API Production Implementation Guide: Designing Prompt Caching, Tool Use, Structured Output, and Agents

> The definitive guide to implementing production-quality AI features with the Claude API and Vercel AI SDK v6. It covers structured output, tool use, streaming, agents, prompt caching, cost optimization, observability, and security, using real code that follows the official documentation, along with model specification and fallback via the AI Gateway.

- Published: 2026-06-22
- Author: 友田 陽大
- Tags: Claude, Anthropic, AI SDK, TypeScript, LLM, Agents, Structured Output, Streaming, Cost Optimization
- URL: https://tomodahinata.com/en/blog/claude-api-ai-sdk-v6-production-ai-features
- Category: Generative AI, LLMs & RAG
- Pillar guide: https://tomodahinata.com/en/blog/vercel-ai-sdk-production-llm-apps-streaming-tools-rag

## Key points

- Production quality comes from five unglamorous principles: bind output with a schema, minimize tool permissions, make streaming cancelable, measure cost up front and cache, and pass output through a verification gate
- "All Opus" and "all Haiku" are both wrong. Default to Sonnet, send only the hard steps to Opus, and send classification and extraction to Haiku
- Bind structured output type-safely with the AI SDK's generateObject + Zod, and validate the AI's output at the boundary as external input
- Always cap an agent's step count with stopWhen: stepCountIs(n), and guard irreversible side effects with an approval gate
- Prompt caching is an exact prefix match. Put fixed content first and variable content last, and measure the effect with cacheReadInputTokens

---

"I called an LLM in a PoC and it worked. But the moment I shipped to production, the JSON broke and it fell over."

"I assembled an agent that uses tools, and it melted tokens in an infinite loop."

"I made a streaming UI, but it can't be interrupted or canceled, and the user stares at a frozen screen."

"I can't read the cost. I first notice 'it's expensive' when the month-end bill comes."

Both companies considering outsourcing AI-feature development and developers building it themselves get stuck here. "Just calling an LLM" takes a day. **Raising it to production quality is the real work.**

I am a core engineer of a B2B SaaS that won the Minister of Economy, Trade and Industry Award, and I have built an in-house AI platform for a major domestic broadcaster (5 AI services, an auth hub, speech synthesis, OCR × speech-recognition typo detection, generative-AI review assistance). What worked there was not a flashy model but the unglamorous production techniques of **structured output, safe tool design, cancelable streaming, prompt caching, and a verification gate.**

This article shows these techniques in real TypeScript code, **faithful to the official Anthropic documentation (docs.anthropic.com / platform.claude.com) and the official Vercel AI SDK v6 documentation (ai-sdk.dev / vercel.com/docs).** I cite the relevant official URLs at the end of each section.

> **The map of this article**
> 1. Model selection (Opus 4.8 / Sonnet 4.6 / Haiku 4.5 / Fable 5)
> 2. Basics: AI SDK v6's `generateText` / `streamText` and model specification with the AI Gateway
> 3. Structured output: type-safe with `generateObject` / `streamObject` + Zod
> 4. Tool use: `tool()` definition and turning it into a multi-step agent
> 5. Streaming UX: Route Handler → `useChat`, cancel/interrupt
> 6. Cost / performance optimization: prompt caching, routing, measurement
> 7. Reliability / observability: retries, guardrails, hallucination suppression
> 8. Security: API-key concealment, prompt injection, PII

---

## First, Grasp This: "One Person × Generative AI" Is Fast and Cheap Because It Doesn't Skip Production Techniques

Use generative AI (Claude Code) as an accelerator, and even one person can implement enterprise AI features quickly and cheaply. But that isn't because "you can write code fast." It's because **you automate the verification gates and stick to production-quality patterns.**

- Always **bind the output with a schema** and validate at the boundary (don't let broken JSON flow downstream)
- **Minimize permissions** on tools and isolate side effects (keep the blast radius of a runaway small)
- Make streaming **cancelable** (keep users from abandoning a frozen screen)
- **Measure cost up front** and cut it with caching (don't be shocked by the bill)
- Pass the output through a **verification gate run by a human or another model** (keep hallucinations out of production)

The sections below turn these five principles into concrete code.

---

## Model Selection: Intelligence vs Cost vs Latency

The first design decision is model selection. Based on Anthropic's official model list, the table below summarizes which model to use for what as of June 2026. **"All Opus" and "all Haiku" are both wrong.** The optimal choice differs per use case.

| Model | Model ID | Context | Assumed use | Decision axis |
|--------|---------|------------|---------|--------|
| **Claude Opus 4.8** | `claude-opus-4-8` | 1M | Long-running autonomous agents, complex code generation, high-difficulty knowledge work | Highest intelligence. The core of hard tasks |
| **Claude Sonnet 4.6** | `claude-sonnet-4-6` | 1M | The default for most apps. RAG, summarization, dialogue, tool use | The best balance of speed and intelligence |
| **Claude Haiku 4.5** | `claude-haiku-4-5` | 200K | Classification, extraction, routing, preprocessing, large batches | Fastest and cheapest. Simple tasks |
| **Claude Fable 5** | `claude-fable-5` | 1M | The hardest reasoning, ultra-long-term agents | Highest performance. Priced above Opus |

How to think about allocation in practice:

- **The default is Sonnet 4.6.** When in doubt, start here. Many B2B features (RAG answers, in-house search, shaping) get sufficient quality from Sonnet.
- **Send only the hard steps to Opus 4.8.** Examples are bulk implementation from a spec, deep refactoring, and long-running autonomous loops. It costs more, so invest where fixing an error would cost even more.
- **Send front-line and high-volume processing to Haiku 4.5.** Simple tasks like "which category is this inquiry?" or "extract the date from this document" go to the cheap, fast Haiku. Haiku also works well for an agent's **sub-agents.**
- **Use Fable 5 only when you explicitly choose it.** The tokenizer changed, so the same content consumes about 30% more tokens, and the price exceeds Opus. The `thinking` parameter also behaves differently (always on), so treat it as dedicated to "the hardest problems." It is not something to pick "just in case, because it's the latest."
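
The allocation rules above can be written down as a small routing table. The following is a minimal sketch, not an official API: the `TaskKind` categories and the `pickModel` helper are hypothetical names introduced for illustration, and the model strings are the AI Gateway slugs used elsewhere in this article.

```typescript
// Hypothetical task categories; adapt them to your own workload.
type TaskKind =
  | "classification"
  | "extraction"
  | "rag"
  | "summarization"
  | "dialogue"
  | "deep-refactor"
  | "autonomous-agent";

// AI Gateway slugs as written in this article.
const MODEL_BY_TIER = {
  fast: "anthropic/claude-haiku-4.5",
  default: "anthropic/claude-sonnet-4.6",
  hard: "anthropic/claude-opus-4.8",
} as const;

type Tier = keyof typeof MODEL_BY_TIER;

// Simple tasks go to the fast tier, hard steps to the hard tier,
// and everything else falls back to the default tier.
function tierFor(task: TaskKind): Tier {
  switch (task) {
    case "classification":
    case "extraction":
      return "fast";
    case "deep-refactor":
    case "autonomous-agent":
      return "hard";
    default:
      return "default";
  }
}

function pickModel(task: TaskKind): string {
  return MODEL_BY_TIER[tierFor(task)];
}
```

Keeping the mapping in one place makes the "default to Sonnet" rule explicit: a new task kind lands on the default tier unless someone deliberately promotes or demotes it.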

> Tip: you can tune intelligence not only by swapping the model but also with the **`effort` parameter** (`low`/`medium`/`high`/`xhigh`/`max`). With Opus 4.8, `high` is the default, and `xhigh` is recommended for coding/agents. Via the AI SDK, you can pass it as a model-specific option in `providerOptions.anthropic`.

> Source: [Models overview](https://platform.claude.com/docs/en/about-claude/models/overview) / [Effort](https://platform.claude.com/docs/en/build-with-claude/effort)

---

## Basics: Call a Model with AI SDK v6 and the AI Gateway

The foundation of implementation is the Vercel AI SDK v6 (the `ai` package). **How you specify the model governs the production design**, so first get this right.

### Recommended: The AI Gateway's `"provider/model"` String

Go through the **Vercel AI Gateway** (GA'd in August 2025) and you can specify a model with just a `"provider/model"` string like `"anthropic/claude-opus-4.8"`. No import of a provider-specific package (`@ai-sdk/anthropic`) is needed.

Reasons to use the AI Gateway (the official advantages):

- **Multiple providers with 1 key.** Access Anthropic / Bedrock / Vertex, etc. with a single `AI_GATEWAY_API_KEY`.
- **Automatic fallback.** If one provider goes down, the gateway automatically retries through another route.
- **No markup on tokens.** You pay the same unit price as with direct use.
- **Spend visibility.** Monitor cost across providers.

```ts
// app/api/summary/route.ts
import { generateText } from "ai";

export async function POST(req: Request) {
  const { article } = (await req.json()) as { article: string };

  const { text, usage, finishReason } = await generateText({
    // "provider/model" string; no import of @ai-sdk/anthropic is needed
    model: "anthropic/claude-sonnet-4.6",
    system:
      "You summarize technical articles. Summarize facts only, as three bullet points in Japanese.",
    prompt: `Summarize the following article:\n\n${article}`,
  });

  return Response.json({ text, usage, finishReason });
}
```

Authentication just reads the environment variable `AI_GATEWAY_API_KEY` (an OIDC token also works on a Vercel deploy). **Never writing the API key in code** is a hard prerequisite (details in the security section below).

> Note: the AI Gateway's model slug is officially written with dot separators (`anthropic/claude-opus-4.8`). When you call the Anthropic API **directly** (with `@ai-sdk/anthropic` or the Anthropic SDK), the model ID uses the primary hyphen-separated notation (`claude-opus-4-8` / `claude-sonnet-4-6` / `claude-haiku-4-5`). Keep in mind that the notation depends on the route.

### Make Streaming the Default

For requests with long inputs, long outputs, or high `max_tokens`, **make streaming the default** to avoid HTTP timeouts. The server side is `streamText`:

```ts
import { streamText } from "ai";

const result = streamText({
  model: "anthropic/claude-sonnet-4.6",
  system: "Answer in concise, accurate Japanese.",
  prompt: "Explain React rendering optimization in three lines.",
});

for await (const textPart of result.textStream) {
  process.stdout.write(textPart);
}
```

### When You Want to Explicitly Use a Specific Provider

Use the provider-specific package `@ai-sdk/anthropic` only when you have an explicit requirement, such as calling Anthropic directly or controlling prompt caching in fine detail.

```ts
import { createAnthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";

const anthropic = createAnthropic({
  apiKey: process.env.ANTHROPIC_API_KEY, // only when calling Anthropic directly
});

const { text } = await generateText({
  model: anthropic("claude-sonnet-4-6"), // primary notation (hyphen-separated)
  prompt: "...",
});
```

> Source: [AI Gateway](https://vercel.com/docs/ai-gateway) / [AI Gateway: Text Generation](https://vercel.com/docs/ai-gateway/getting-started/text) / [AI SDK Core: generateText/streamText](https://ai-sdk.dev/docs/ai-sdk-core/generating-text)

---

## Structured Output: Don't Flow Broken JSON to Production

The first thing that pays off in a production AI feature is **structured output.** If you merely ask an LLM to "return it as JSON," it may wrap the JSON in explanatory text or drop fields. AI SDK v6 binds the output to a schema (Zod), and **the SDK handles parsing and validation as well.**

### `generateObject`: Extraction, Classification, Shaping

An example of extracting structured data from an inquiry email. Pass Zod to `schema` and the result's `object` is returned typed.

```ts
import { generateObject } from "ai";
import { z } from "zod";

const LeadSchema = z.object({
  name: z.string().describe("Full name of the person inquiring"),
  email: z.string().describe("Contact email address"),
  plan: z.enum(["lite", "standard", "enterprise"]).describe("Desired plan"),
  interests: z.array(z.string()).describe("Features of interest"),
  demoRequested: z.boolean().describe("Whether a demo was requested"),
});

const { object, usage } = await generateObject({
  model: "anthropic/claude-haiku-4.5", // the inexpensive Haiku is enough for extraction
  schema: LeadSchema,
  prompt:
    'Extract the information from this inquiry: "This is Taro Tanaka (tanaka@example.com). ' +
    "We're considering Enterprise and are interested in API integration and SSO. We'd like a demo.\"",
});

// object is guaranteed to match the LeadSchema type
console.log(object.plan); // "enterprise"
```

> Some Zod constraints (`min`/`max`/`minLength`, etc.) aren't supported in Claude's schema, but the AI SDK validates them on the client side. Validating at the boundary is the core of the design: **treat the AI's output as "external input" too.** Don't trust it; always bind it with a schema.
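
To make "treat the AI's output as external input" concrete without any library, here is a dependency-free sketch of the same boundary check. It is only an illustration: in the article's setup Zod and `generateObject` do this work, and the `parseLead` helper, the reduced `Lead` shape, and the error labels are hypothetical.

```typescript
type Plan = "lite" | "standard" | "enterprise";

interface Lead {
  name: string;
  email: string;
  plan: Plan;
  demoRequested: boolean;
}

type ParseResult =
  | { ok: true; value: Lead }
  | { ok: false; errors: string[] };

const PLANS: readonly string[] = ["lite", "standard", "enterprise"];

// Accepts `unknown` on purpose: nothing about the model's output is trusted
// until every field has been checked.
function parseLead(raw: unknown): ParseResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["root: not an object"] };
  }
  const r = raw as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof r.name !== "string" || r.name.length === 0) errors.push("name");
  if (typeof r.email !== "string" || !r.email.includes("@")) errors.push("email");
  if (typeof r.plan !== "string" || !PLANS.includes(r.plan)) errors.push("plan");
  if (typeof r.demoRequested !== "boolean") errors.push("demoRequested");

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: {
      name: r.name as string,
      email: r.email as string,
      plan: r.plan as Plan,
      demoRequested: r.demoRequested as boolean,
    },
  };
}
```

Returning a result object instead of throwing lets the caller decide whether to retry the generation, fall back, or surface an error, without broken data ever reaching downstream code.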

### Classification Tasks: Fix the Choices with enum

Classification like "which department to route this inquiry to" is most robust by fixing the choices with `enum`. Send it to Haiku to curb cost.

```ts
const { object } = await generateObject({
  model: "anthropic/claude-haiku-4.5",
  schema: z.object({
    category: z.enum(["technical_support", "sales", "billing", "other"]),
    urgency: z.enum(["low", "medium", "high"]),
  }),
  prompt: `Classify the following inquiry: ${inquiry}`,
});
```

### `streamObject`: Partial Display While Generating

When generating a long report or list, with `streamObject` you can receive **partial objects sequentially.** You can produce a "filling-in" experience in the UI.

```ts
import { streamObject } from "ai";
import { z } from "zod";

const { partialObjectStream } = streamObject({
  model: "anthropic/claude-sonnet-4.6",
  schema: z.object({
    title: z.string(),
    sections: z.array(z.object({ heading: z.string(), body: z.string() })),
  }),
  prompt: "Generate an outline for a blog post announcing a new product.",
});

for await (const partial of partialObjectStream) {
  // partial is the partially generated object (typed as a deep partial)
  render(partial);
}
```

> Source: [AI SDK Core: Generating Structured Data](https://ai-sdk.dev/docs/ai-sdk-core/generating-structured-data)

---

## Tool Use: Turning It into an Agent and the "Don't Let It Run Away" Design

Giving an LLM access to external APIs, DBs, and computation is **tool use (tool calling).** In AI SDK v6, you define it with the `tool()` helper, bind input with Zod, and write the actual processing in `execute`.

### Tool Definition

```ts
import { tool } from "ai";
import { z } from "zod";

const getOrderStatus = tool({
  description:
    "Get the delivery status from an order ID. Use this when the user asks about the status of an order.",
  inputSchema: z.object({
    orderId: z.string().describe("Order ID (e.g. ORD-12345)"),
  }),
  execute: async ({ orderId }) => {
    // Least privilege: call only the read-only lookup API
    const status = await db.orders.findStatus(orderId);
    return { orderId, status }; // the return value goes into the model's context
  },
});
```

The key to a tool's `description` is to **make "when to call it" explicit.** Writing not only "what it does" but also "use this when the user asks about X" improves how accurately recent Claude models decide whether to call it.

### Multi-Step = Turning It into an Agent (`stopWhen`)

`stopWhen` is what enables the loop of "look at a tool's result, then call the next tool." **Always cap the step count** with `stepCountIs(n)`. This is the most important guard against a runaway (an infinite loop that burns through tokens).

```ts
import { generateText, stepCountIs, tool } from "ai";

const { text, steps } = await generateText({
  model: "anthropic/claude-sonnet-4.6",
  tools: { getOrderStatus, searchKnowledgeBase },
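  // Always set a cap. It is the first line of defense against infinite loops and cost blowups
  stopWhen: stepCountIs(5),
  system:
    "You are a customer support agent. Verify facts with tools before answering.",
  prompt: "When will order ORD-12345 arrive?",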
});
```

### Safe Tool Design (Permission Minimization, Side-Effect Isolation)

The LLM requests a tool's execution, but **your code actually runs it.** That is exactly why the shape of a tool determines safety.

- **Separate read and write.** `searchKnowledgeBase` (read) may auto-execute, but require human approval for `sendEmail` / `issueRefund` (irreversible side effects).
- **Gate irreversible operations.** A robust design is to stop at the step boundary set by the AI SDK's `stopWhen`, inspect `steps` for calls to side-effect tools, and proceed with them only after showing an approval UI.
- **Re-validate the input inside `execute`.** Zod can constrain the type, but **authorization** checks such as "does this `orderId` belong to the user?" must be done separately inside `execute`. Don't query the DB with a value just because the LLM produced it.

```ts
const issueRefund = tool({
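  description: "Execute a refund. Requires the amount and the order ID.",
  inputSchema: z.object({
    orderId: z.string(),
    amount: z.number(),
  }),
  execute: async ({ orderId, amount }, { abortSignal }) => {
    // Authorization: re-check that the user who invoked this tool owns the order
    // (currentUserId is assumed to come from the server-side session, never from the model)
    await assertOwnership(currentUserId, orderId);
    // Side effects run only through the isolated service layer
    return await refundService.execute(orderId, amount, { abortSignal });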
  },
});
```
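
The approval gate from the list above can be sketched as a pure function over the recorded steps. This is an assumption-laden illustration: `StepLike` and `ToolCallLike` are simplified, hypothetical shapes that keep only the fields the check needs (they are not the AI SDK's own types), and `pendingApprovals` is a name invented here.

```typescript
// Simplified, hypothetical shapes: only the fields this sketch needs.
interface ToolCallLike {
  toolName: string;
  input: unknown;
}

interface StepLike {
  toolCalls: ToolCallLike[];
}

// Tools with irreversible side effects, as named in this article.
const SIDE_EFFECT_TOOLS: ReadonlySet<string> = new Set(["sendEmail", "issueRefund"]);

// Returns every requested call that must wait for human approval.
// An empty array means the run touched only read-only tools.
function pendingApprovals(steps: StepLike[]): ToolCallLike[] {
  return steps
    .flatMap((step) => step.toolCalls)
    .filter((call) => SIDE_EFFECT_TOOLS.has(call.toolName));
}
```

The caller would show the returned calls in an approval UI and execute them only after the user confirms. Maintaining the side-effect list as an explicit allow-to-gate set keeps the decision out of the model's hands.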

> When you want to switch the model or `toolChoice` per step, use `prepareStep` (e.g. force a tool call with `toolChoice: 'required'` only on the first step).

> Source: [AI SDK Core: Tools and Tool Calling](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling) / [Tool use overview](https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview)

---

## Streaming UX: Cancelable, Accessible Sequential Display

Streaming UX is what removes the feeling of being kept waiting. Combine a Route Handler on the server with `useChat` on the client.

### Server: Route Handler

Convert the UI messages from the client to model messages with `convertToModelMessages`, and return the `streamText` result with `toUIMessageStreamResponse()`.

```ts
// app/api/chat/route.ts
import { convertToModelMessages, streamText, type UIMessage } from "ai";

export async function POST(req: Request) {
  const { messages } = (await req.json()) as { messages: UIMessage[] };

  const result = streamText({
    model: "anthropic/claude-sonnet-4.6",
    system: "You are a helpful assistant. Answer concisely.",
    messages: convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse();
}
```

### Client: `useChat` (Including Interrupt and Cancel)

`@ai-sdk/react`'s `useChat` manages the messages, input, state (`status`), and interrupt (`stop`). **Always wire `status` and `stop` into the UI.** This is the dividing line for production quality.

```tsx
"use client";

import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";
import { useState } from "react";

export function Chat() {
  const [input, setInput] = useState("");
  const { messages, sendMessage, status, stop } = useChat({
    transport: new DefaultChatTransport({ api: "/api/chat" }),
  });

  const isBusy = status === "submitted" || status === "streaming";

  return (
    <div>
      {/* aria-live announces the appended response to screen readers */}
      <div aria-live="polite">
        {messages.map((m) => (
          <article key={m.id}>
            <strong>{m.role === "user" ? "You" : "AI"}</strong>
            {m.parts.map((part, i) =>
              part.type === "text" ? <span key={i}>{part.text}</span> : null,
            )}
          </article>
        ))}
      </div>

      <form
        onSubmit={(e) => {
          e.preventDefault();
          if (!input.trim() || isBusy) return;
          sendMessage({ text: input });
          setInput("");
        }}
      >
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          aria-label="Enter a message"
        />
        {isBusy ? (
          // Switch to a cancel button while generating
          <button type="button" onClick={() => stop()}>
            Stop
          </button>
        ) : (
          <button type="submit">Send</button>
        )}
      </form>
    </div>
  );
}
```

The accessibility points:

- Attach `aria-live="polite"` to the response region and convey the sequential appending to the screen reader.
- During generation (`streaming`), **switch the send button to a stop button** and make it interruptible with `stop()`.
- Handle the `status === "error"` branch and offer the user a way to retry.

> Source: [AI SDK UI: Chatbot](https://ai-sdk.dev/docs/ai-sdk-ui/chatbot) / [Streaming](https://platform.claude.com/docs/en/build-with-claude/streaming)

---

## Cost / Performance Optimization: Prompt Caching and Routing

This section tackles "I can't read the cost." The two most effective levers are **prompt caching** and **model routing/fallback.**

### Prompt Caching: Reusing the Prefix

Prompt caching is **an exact prefix match.** Because it renders in the order `tools` → `system` → `messages`, **place stable content (a fixed system prompt, knowledge) at the front, and variable content (the user's question) at the back.** Insert `Date.now()` or a request ID at the front and everything after it is cache-invalidated.
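
Because the match is on the exact prefix, the fixed part has to come out byte-identical on every request. The sketch below shows one way to keep it stable, assuming the knowledge base is held as a JSON value: keys are serialized in sorted order, and per-request values stay in the user message. `stableStringify` and `buildPrompt` are hypothetical helpers, not part of any SDK.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serializes with sorted object keys so that logically equal values
// always produce the same string, regardless of insertion order.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  const entries = Object.keys(value)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
  return `{${entries.join(",")}}`;
}

// Fixed content goes into the system prompt (the cacheable prefix);
// anything that changes per request goes into the user message.
function buildPrompt(
  knowledge: Json,
  question: string,
  requestId: string,
): { system: string; user: string } {
  return {
    system: `Answer using only this knowledge base:\n${stableStringify(knowledge)}`,
    user: `[request ${requestId}] ${question}`,
  };
}
```

With this layout, two requests that differ only in the question or the request ID share the same `system` string, so the prefix stays eligible for a cache hit.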

To use Anthropic's caching in the AI SDK, attach `providerOptions.anthropic.cacheControl` to the target content.

```ts
import { generateText } from "ai";

const { text, providerMetadata } = await generateText({
  model: "anthropic/claude-sonnet-4.6",
  messages: [
    {
      role: "system",
      content: LARGE_KNOWLEDGE_BASE, // fixed context of several thousand tokens
      providerOptions: {
        anthropic: { cacheControl: { type: "ephemeral" } }, // cache breakpoint
      },
    },
    { role: "user", content: userQuestion }, // the variable part goes last, with no marker
  ],
});

// Always measure whether the cache is working
console.log(providerMetadata?.anthropic);
// cacheCreationInputTokens (writes) / cacheReadInputTokens (reads)
```

Via the AI Gateway, you can also have it auto-apply a per-provider caching strategy with `providerOptions.gateway.caching: 'auto'` (handy for providers like Anthropic that need an explicit cache marker).

```ts
const result = streamText({
  model: "anthropic/claude-sonnet-4.6",
  messages,
  providerOptions: { gateway: { caching: "auto" } },
});
```

A rough guide to the economics: a cache read costs about 0.1× the base input unit price, and a cache write about 1.25× (5-minute TTL). **If you use the same prefix 2 or more times, it's almost certainly a win.** If `cacheReadInputTokens` is always 0, suspect a "silent invalidation" such as a `Date.now()` value slipping into the prefix or a non-deterministic JSON key order.
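
The break-even claim can be checked with a few lines of arithmetic. This is a simplified model under stated assumptions: it uses the approximate multipliers quoted above (1.25× write, 0.1× read), counts only the shared prefix tokens, and assumes every call after the first lands inside the TTL. Costs are expressed in "base input token equivalents" rather than currency, and the function names are made up for this sketch.

```typescript
// Approximate multipliers relative to the base input price, as quoted above.
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Every call pays full price for the prefix.
function prefixCostWithoutCache(prefixTokens: number, calls: number): number {
  return prefixTokens * Math.max(calls, 0);
}

// The first call writes the cache; each later call reads it.
// Assumes all later calls arrive before the cache entry expires.
function prefixCostWithCache(prefixTokens: number, calls: number): number {
  if (calls <= 0) return 0;
  return prefixTokens * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1));
}

// Fraction of the uncached prefix cost that caching saves (negative means a loss).
function cacheSavingsRatio(prefixTokens: number, calls: number): number {
  const base = prefixCostWithoutCache(prefixTokens, calls);
  if (base === 0) return 0;
  return 1 - prefixCostWithCache(prefixTokens, calls) / base;
}
```

Under this model a single call is a loss (1.25× versus 1×), while two calls already cost 1.35× versus 2×, which is why reusing a prefix even twice tends to pay off.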

### Routing and Fallback (AI Gateway)

Control both availability and cost with `order` / `only` / `sort` of `providerOptions.gateway`.

```ts
const result = await generateText({
  model: "anthropic/claude-sonnet-4.6",
  prompt,
  providerOptions: {
    gateway: {
      order: ["bedrock", "anthropic"], // prefer Bedrock, fall back to Anthropic
      sort: "cost", // try the cheapest provider first ('ttft'/'tps' are also available)
    },
  },
});
```

### Measuring Token Usage and the Way to Think About Batches

- **Record usage every time.** Emit `usage` / `totalUsage` to logs and aggregate per route and per model. This is the only way to avoid being shocked by the bill.
- **Curb unnecessary regeneration.** Cache results for identical inputs on the app side (to the extent determinism allows). Not calling the LLM at all is the biggest cost reduction.
- **Run latency-insensitive bulk work in batch or in parallel.** Send independent tasks such as classification and extraction to the cheap Haiku in parallel. Anthropic's Message Batches API is asynchronous and costs 50% of the standard price.
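
As a sketch of the "aggregate per route and per model" step, the function below folds logged usage records into running totals. The `UsageRecord` shape and the `route|model` key format are assumptions made for this example; in practice the records would come from whatever structured logs you already emit.

```typescript
interface UsageRecord {
  route: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface UsageTotal {
  calls: number;
  inputTokens: number;
  outputTokens: number;
}

// Groups records by "route|model" and sums call counts and token counts.
function aggregateUsage(records: UsageRecord[]): Map<string, UsageTotal> {
  const totals = new Map<string, UsageTotal>();
  for (const record of records) {
    const key = `${record.route}|${record.model}`;
    const total = totals.get(key) ?? { calls: 0, inputTokens: 0, outputTokens: 0 };
    total.calls += 1;
    total.inputTokens += record.inputTokens;
    total.outputTokens += record.outputTokens;
    totals.set(key, total);
  }
  return totals;
}
```

Multiplying each total by the model's unit price gives a per-route cost estimate well before the month-end bill arrives.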

> Source: [Prompt caching](https://platform.claude.com/docs/en/build-with-claude/prompt-caching) / [AI Gateway: Provider Options](https://vercel.com/docs/ai-gateway/models-and-providers/provider-options)

---

## Reliability / Observability: Design on the Premise of Failure

In production, build **on the assumption that failure will happen.** An LLM is probabilistic, external APIs go down, and you will hit rate limits.

### Retries, Timeouts, Rate Limits

- The Anthropic SDK / AI SDK **auto-retries 429/5xx with exponential backoff** (with a default retry count). Rely on that, and set a separate app-specific cap on top of it.
- Pass `abortSignal` to a long-running tool to propagate the upstream timeout/cancel (see the `issueRefund` above).
- A rate limit (429) has a `retry-after` header, which the SDK reads and waits on. On a burst, the realistic solution is to **fall back to Haiku** or interpose queuing.
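
An app-specific cap on top of the SDK's built-in retries could look like the following. This is a sketch of a retry policy, not the SDK's internal logic: the `RetryPolicy` fields and `nextDelayMs` are hypothetical, jitter is left out to keep the function deterministic, and a `retry-after` longer than the cap is treated as "give up" so the caller can fall back or queue instead of retrying early.

```typescript
interface RetryPolicy {
  maxAttempts: number; // total attempts allowed, including the first
  baseDelayMs: number; // delay after the first failure
  maxDelayMs: number;  // app-specific upper bound on any single wait
}

// Only rate limits and server errors are worth retrying.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// `attempt` is the number of attempts already made (1 after the first failure).
// Returns the wait in milliseconds, or null when the caller should stop retrying.
function nextDelayMs(
  attempt: number,
  status: number,
  policy: RetryPolicy,
  retryAfterSeconds?: number,
): number | null {
  if (!isRetryable(status) || attempt >= policy.maxAttempts) return null;
  if (retryAfterSeconds !== undefined) {
    const requested = retryAfterSeconds * 1000;
    // Honor the server's hint, but give up rather than retry earlier than asked.
    return requested <= policy.maxDelayMs ? requested : null;
  }
  return Math.min(policy.baseDelayMs * 2 ** (attempt - 1), policy.maxDelayMs);
}
```

A `null` result is the signal to switch strategy, for example by falling back to Haiku or placing the request on a queue as described above. A production version would usually add random jitter to the computed delay.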

### Observability: Usage, Latency, Failure Rate

At minimum, emit the following to structured logs. Visualize not "is it working" but **"how much, how fast, and how often it's failing."**

```ts
const started = performance.now();
const { text, usage, finishReason } = await generateText({
  model: "anthropic/claude-sonnet-4.6",
  prompt,
});

logger.info("llm.call", {
  route: "summary",
  model: "anthropic/claude-sonnet-4.6",
  inputTokens: usage.inputTokens,
  outputTokens: usage.outputTokens,
  finishReason, // "stop" / "length" / "tool-calls", etc.
  latencyMs: Math.round(performance.now() - started),
});
```

The AI Gateway itself also provides cross-provider observability of usage, latency, and spend.

### Guardrails and Hallucination Suppression (Verification Gate)

Don't ship an LLM's output to production as-is. Always put a **verification gate** in between.

1. **Bind with a schema** (the structured output above). If the shape is broken, don't let it go downstream.
2. **Have it fetch facts with tools.** State in the system prompt "don't answer from your own knowledge; call `search` first" to suppress unfounded assertions.
3. **Verify important output in a separate step.** Separate generation from verification (generation casts a wide net, and verification filters in a separate pass). Institutionalize a step where "a human or another model confirms," as in code review.

The review assistance and typo detection I built for the broadcaster took exactly this form: "**generation by AI, final confirmation by the verification gate.**" Use AI as an accelerator and assure quality with verification by a human (and another model). This is the crux of being both "fast and cheap" and "safe."

> Source: [Errors / rate limits](https://platform.claude.com/docs/en/api/errors) / [Tool use overview](https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview)

---

## Security: API Keys, Prompt Injection, PII

Finally, here are the security requirements you must always confirm, even when you are considering outsourcing.

### Concealing the API Key

- **Don't write the key in code.** Put `AI_GATEWAY_API_KEY` / `ANTHROPIC_API_KEY` in an environment variable or a secret manager. Never emit it to the repository, the client bundle, or logs.
- **Don't call the LLM directly from the browser.** Always go through a server Route Handler (don't expose the key to the client). OIDC-token authentication is also an option on a Vercel deploy.

### Prompt-Injection Countermeasures

The principle is to **treat externally-sourced text (user input, fetched web pages, tool results) as data, not instructions.**

- **Permission separation**: don't execute a tool-mediated operation on the LLM's judgment alone; protect irreversible side effects with an approval gate + server-side authorization (the `issueRefund` above).
- **Make the trust boundary explicit**: don't conflate the system prompt (the operator's instructions) and user/fetched content (untrusted data). Even if the latter says "ignore the prior instructions," if the side-effect tool is protected by server-side authorization, the blast radius is limited.
- **Validate the output**: build defense in depth so that even if an injection succeeds, the schema, authorization, and verification gate stop it.
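
One small piece of "making the trust boundary explicit" is to wrap externally sourced text in a clearly labeled data block before it reaches the model. The sketch below is an illustration only: the `untrusted_content` tag name and the `wrapUntrusted` helper are hypothetical conventions, and wrapping reduces confusion between data and instructions but does not replace the authorization and approval gates above.

```typescript
// Removes any copy of the wrapper tag from the content so that untrusted text
// cannot close the data block early and continue as "instructions".
const WRAPPER_TAG_PATTERN = /<\/?untrusted_content\b[^>]*>/gi;

function wrapUntrusted(source: string, content: string): string {
  // Keep the source label to a conservative character set.
  const safeSource = source.replace(/[^a-z0-9_-]/gi, "");
  const escaped = content.replace(WRAPPER_TAG_PATTERN, "[removed tag]");
  return `<untrusted_content source="${safeSource}">\n${escaped}\n</untrusted_content>`;
}
```

The system prompt would then state that anything inside such a block is data to analyze and never an instruction to follow. Even if the model is still fooled, the blast radius stays limited as long as side-effect tools are protected server-side.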

### Handling PII

- Don't keep PII from the inquiry form in logs beyond what is necessary. The log example above likewise emits **only token counts and metadata**, not the message body or PII.
- Decide how long to retain personal data in light of regulation (GDPR / Act on the Protection of Personal Information). Don't store secrets in a memory feature or similar.
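
The "token counts and metadata only" rule is easiest to enforce with an allowlist at the logging boundary. Here is a minimal sketch, assuming a hypothetical `LlmCallRecord` shape that carries the prompt and completion alongside the metrics. `redactEmails` is a simple pattern-based fallback for free-text fields such as error messages and will not catch every form of PII.

```typescript
interface LlmCallRecord {
  route: string;
  model: string;
  prompt: string;      // may contain PII: never logged
  completion: string;  // may contain PII: never logged
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

type SafeLogFields = Omit<LlmCallRecord, "prompt" | "completion">;

// Allowlist: copy only the fields known to be safe, instead of deleting
// the ones known to be risky.
function toSafeLogFields(record: LlmCallRecord): SafeLogFields {
  return {
    route: record.route,
    model: record.model,
    inputTokens: record.inputTokens,
    outputTokens: record.outputTokens,
    latencyMs: record.latencyMs,
  };
}

const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Best-effort masking for free text that has to be logged (e.g. error messages).
function redactEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, "[redacted-email]");
}
```

An allowlist fails safe: a field added to the record later stays out of the logs until someone explicitly decides it belongs there.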

> Source: [Tool use overview](https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview) / [AI Gateway: Authentication](https://vercel.com/docs/ai-gateway/authentication-and-byok/authentication)

---

## FAQ

**Q1. Is the AI Gateway mandatory? Won't the specific package (`@ai-sdk/anthropic`) do?**
It's not mandatory. But v6's default recommendation is the AI-Gateway-mediated `"provider/model"` string specification. You get multiple providers with 1 key, automatic fallback, and spend visibility, with no markup on the token unit price. The practical distinction is to use the specific package **only when you have an explicit requirement** like fine prompt-caching specification.

**Q2. How do I distinguish `generateObject` vs `streamObject` vs `generateText` + `Output`?**
If a structured result is the main goal, as in extraction, classification, or shaping, `generateObject` / `streamObject` is the straightforward choice. If instead you want structured output as part of text generation (combined with tool use, for example), you can combine `Output.object()` with `generateText` / `streamText`. Both bind the output type-safely with a Zod schema.

**Q3. I'm worried about an agent running away and burning through tokens.**
**Always cap the max step count** with `stopWhen: stepCountIs(n)`. In addition, stop irreversible-side-effect tools with an approval gate, and log `usage` to detect abnormal consumption. From Opus 4.7 on, you can also use the API-native **Task Budgets** (conveying the remaining tokens to the model).

**Q4. Should I just make everything Opus 4.8?**
No. The cost and latency don't add up. The optimal allocation is the default Sonnet 4.6, simple tasks (classification, extraction, preprocessing) to Haiku 4.5, only the hard steps to Opus 4.8. You can also adjust intelligence/cost with the `effort` parameter.

**Q5. The output is occasionally factually wrong (hallucination).**
Model selection alone doesn't solve it. Stack verification gates: ① bind with a schema, ② have it fetch facts with tools ("don't answer from your own knowledge; call search"), ③ verify important output in a separate step. Dividing the work into generation by AI and final confirmation by a human or another model is what works.

---

## Summary: Production Quality Is the Accumulation of "Unglamorous Techniques"

The wall between "just calling an LLM" and production quality is crossed by accumulating unglamorous but effective techniques: structured output, safe tool design, cancelable streaming, prompt caching, and a verification gate. Claude API × AI SDK v6 is a combination that lets you implement these straightforwardly along the official patterns.

As a core engineer of a B2B SaaS that won the Minister of Economy, Trade and Industry Award, and as the builder of an in-house AI platform for a major domestic broadcaster (5 AI services, an auth hub, speech synthesis, OCR × speech-recognition typo detection, generative-AI review assistance), I have operated this "generation by AI, quality by the verification gate" design in the field. Because I work as **one person × generative AI (Claude Code)** without skipping production techniques, I can deliver AI features quickly, cheaply, and safely.

If you're considering AI-feature development, taking a PoC to production, or integrating AI into an existing system, feel free to get in touch.

[Contact here](/contact)

---

## References (Anthropic / Vercel AI SDK Official)

- [Anthropic — Models overview](https://platform.claude.com/docs/en/about-claude/models/overview)
- [Anthropic — Tool use overview](https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview)
- [Anthropic — Prompt caching](https://platform.claude.com/docs/en/build-with-claude/prompt-caching)
- [Anthropic — Streaming](https://platform.claude.com/docs/en/build-with-claude/streaming)
- [Anthropic — Effort](https://platform.claude.com/docs/en/build-with-claude/effort)
- [Vercel — AI Gateway](https://vercel.com/docs/ai-gateway)
- [Vercel — AI Gateway: Provider Options](https://vercel.com/docs/ai-gateway/models-and-providers/provider-options)
- [AI SDK Core — Generating Text](https://ai-sdk.dev/docs/ai-sdk-core/generating-text)
- [AI SDK Core — Generating Structured Data](https://ai-sdk.dev/docs/ai-sdk-core/generating-structured-data)
- [AI SDK Core — Tools and Tool Calling](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling)
- [AI SDK UI — Chatbot (useChat)](https://ai-sdk.dev/docs/ai-sdk-ui/chatbot)
- [AI SDK Providers — Anthropic](https://ai-sdk.dev/providers/ai-sdk-providers/anthropic)