# LLM structured output built with Pydantic: implementing JSON Schema generation, validation, and a self-healing loop with the raw API

> Following the official Pydantic v2 documentation, this article uses working code to explain provider-independent LLM structured output: generating tool schemas with model_json_schema, guiding the model with Field(description/examples), validating with model_validate_json, a self-healing re-prompt loop built on ValidationError.errors, and partial validation with TypeAdapter's experimental_allow_partial.

- Published: 2026-06-26
- Author: 友田 陽大
- Tags: Python, Pydantic, LLM, Structured output, Type safety, Validation
- URL: https://tomodahinata.com/en/blog/pydantic-llm-structured-output-json-schema-validation-guide
- Category: Pydantic & type-safe validation
- Pillar guide: https://tomodahinata.com/en/blog/pydantic-v2-production-validation-type-safety

## Key points

- A Pydantic model plays two roles from one source of truth: schema generation (sending) and response validation (receiving). LLM structured output comes down to this round trip.
- Generate the JSON Schema for a tool/function definition with model_json_schema(), and raise the LLM's extraction accuracy with Field(description=...) and examples. The schema's descriptions are themselves the instructions.
- Validate the response with model_validate_json. strict=True is rigorous, but LLMs often return numbers as strings, which multiplies retries, so the default lax mode is the practical choice at this boundary.
- Recover from structural breakage with a self-healing loop that formats ValidationError.errors(include_url=False) and feeds it back to the LLM. Always cap retries.
- Partial validation of streamed output is limited to TypeAdapter's experimental_allow_partial. It does not work on BaseModel and does not propagate through nested models, so wrap the output in TypeAdapter(list[TypedDict]).

---

## **Introduction: an LLM's output is also "unvalidated external input"**

Extract the amount and line items from an invoice image. Classify an inquiry email by urgency, category, and summary. Turn meeting minutes into structured to-dos. Much of the practical application of LLMs comes down to **extracting fixed-shape data from free text.** And here a problem familiar to backend engineers reappears: **an LLM's output is just unvalidated external input.**

In the [PydanticAI practical guide](/blog/pydantic-ai-agent-framework-production-guide), I covered solving this problem with a framework. This article works **one layer below**: implementing structured output **with nothing but the raw Anthropic / OpenAI APIs and Pydantic**. Why learn the raw API? You may want to embed structured output into an existing codebase with minimal dependencies, precisely control provider-specific features (such as prompt caching), or simply understand exactly what is happening. For all of these, knowing what sits inside the abstraction pays off.

Pydantic's role boils down to a single principle: **a Pydantic model both sends the schema and validates the response.** From one `BaseModel` you ① generate the JSON Schema passed to the LLM (sending) and ② validate the returned JSON (receiving). This article implements that round trip following the [official documentation](https://pydantic.dev/docs/validation/latest/concepts/json_schema/), with a bit more explanation along the way.

> 💡 **For the TypeScript crowd**: implementing the same philosophy with Zod is handled in [reliability design of structured output](/blog/structured-output-reliability-constrained-decoding-semantic-validation) and [the discipline of TypeScript type safety](/blog/typescript-type-safety-discipline-zod-nevererror-no-any). This article is the Python / Pydantic version of those.

---

## **1. One model plays two roles**

First, declare the shape of the data you want to extract with `BaseModel`. This becomes the **Single Source of Truth.**

```python
from pydantic import BaseModel, Field


class Invoice(BaseModel):
    vendor_name: str = Field(description="Name of the issuing company")
    total_amount: int = Field(description="Total amount including tax (JPY, integer)", ge=0)
    due_date: str = Field(description="Payment due date in YYYY-MM-DD format")
    line_items: list[str] = Field(description="List of line-item names")
```

From this one class, you can generate **① the schema for sending.**

```python
schema = Invoice.model_json_schema()
# Abridged ("title" keys omitted):
# {
#   "type": "object",
#   "properties": {
#     "vendor_name": {"type": "string", "description": "Name of the issuing company"},
#     "total_amount": {"type": "integer", "minimum": 0, "description": "..."},
#     ...
#   },
#   "required": ["vendor_name", "total_amount", "due_date", "line_items"]
# }
```

And with the same class, you can also do **② the validation for receiving.**

```python
raw = '{"vendor_name":"Acme","total_amount":50000,"due_date":"2026-07-31","line_items":["Design fee"]}'
invoice = Invoice.model_validate_json(raw)  # validate and get a typed object
```

**Why is this superior?**
The most common bug in LLM integration is **a divergence between "the schema told to the LLM" and "the type the code expects."** Manage the schema as a hand-written dict and write validation in a separate function, and you'll fix one and forget the other — a typical accident born of a DRY violation. Make the Pydantic model the source of truth, and both schema and validation are **derived from the same definition**, so they can't structurally diverge. Add one field and the sending schema and the receiving validation **update at the same time.**
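To make "update at the same time" concrete, here is a small sketch assuming a hypothetical `currency` field is added later. Subclassing only keeps the example short; in real code you would add the field to `Invoice` directly.

```python
from pydantic import BaseModel, Field, ValidationError


class Invoice(BaseModel):
    vendor_name: str = Field(description="Name of the issuing company")
    total_amount: int = Field(description="Total amount including tax (JPY, integer)", ge=0)


# Hypothetical new requirement: one added field changes both directions at once
class InvoiceWithCurrency(Invoice):
    currency: str = Field(description="ISO 4217 currency code, e.g. JPY")


# Sending side: the new field appears in the schema and in "required"
schema = InvoiceWithCurrency.model_json_schema()
print(schema["required"])  # ['vendor_name', 'total_amount', 'currency']

# Receiving side: a response that omits the new field is now rejected
try:
    InvoiceWithCurrency.model_validate({"vendor_name": "Acme", "total_amount": 50000})
except ValidationError as e:
    print(e.errors(include_url=False)[0]["loc"])  # ('currency',)
```

No hand-written dict had to be edited; both sides followed the class definition.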

---

## **2. Guide the LLM with the schema: `description` and `examples` are critical**

How accurately the LLM returns schema-conforming data depends heavily on **the quality of the descriptions embedded in the schema.** `Field(description=...)` is not mere documentation; it is **the extraction instruction the LLM itself reads.**

```python
from typing import Literal
from pydantic import BaseModel, Field


class SupportTicket(BaseModel):
    """A structured user inquiry."""  # the docstring becomes the schema's description

    category: Literal["bug", "billing", "feature_request", "other"] = Field(
        description="Inquiry category. Choose other when unsure."
    )
    urgency: int = Field(
        description="Urgency from 1 (low) to 5 (high). Use 5 if a service outage is mentioned.",
        ge=1, le=5,
    )
    summary: str = Field(
        description="A one-sentence summary of the inquiry.",
        examples=["An error on the payment page prevents login"],
    )
```

As the official documentation confirms, `description` is copied **as-is** into the generated JSON Schema, and `examples` are emitted into the schema as well.

```python
SupportTicket.model_json_schema()
# urgency → {"type": "integer", "minimum": 1, "maximum": 5, "description": "Urgency from 1 ..."}
# summary → {"type": "string", "description": "...", "examples": ["An error on the payment page ..."]}
```

A `Literal` type becomes an **enum** in the schema, structurally narrowing the values the LLM can choose from. `examples` act as **few-shot** samples for the LLM and help reduce variance in the output format.
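A quick way to check what the LLM will actually see is to inspect the generated schema. This sketch uses a trimmed version of `SupportTicket` (two fields only) and prints the docstring-derived description, the `Literal`-derived enum, and the `examples`.

```python
from typing import Literal

from pydantic import BaseModel, Field


class SupportTicket(BaseModel):
    """A structured user inquiry."""

    category: Literal["bug", "billing", "feature_request", "other"] = Field(
        description="Inquiry category. Choose other when unsure."
    )
    summary: str = Field(
        description="A one-sentence summary of the inquiry.",
        examples=["An error on the payment page prevents login"],
    )


schema = SupportTicket.model_json_schema()
print(schema["description"])                     # A structured user inquiry.
print(schema["properties"]["category"]["enum"])  # ['bug', 'billing', 'feature_request', 'other']
print(schema["properties"]["summary"]["examples"])
```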

> ⚠️ **Pitfalls with `by_alias` and `$ref`**: `model_json_schema()` defaults to **`by_alias=True`**, so a field declared with `Field(alias=...)` appears in the schema under its **alias.** The LLM will answer using that alias, so the validation side must accept it (model validation uses aliases by default). Also, nested models are expressed in the schema with `$defs` + `$ref`, but **some LLM providers' strict structured-output modes handle `$ref` poorly.** In that case you need a preprocessing step that inlines (flattens) the schema; Pydantic does not do this for you. To emit OpenAPI-style reference paths instead, pass `ref_template="#/components/schemas/{model}"`.
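Since Pydantic will not inline `$ref`s for you, here is a minimal sketch of such a preprocessing step. `inline_refs` is a hypothetical helper, not a Pydantic API; it assumes only local `#/$defs/...` references and a non-recursive model. Whether your provider actually needs this is something to confirm in its own documentation.

```python
from typing import Any

from pydantic import BaseModel, Field


class Address(BaseModel):
    city: str = Field(description="City name")


class Customer(BaseModel):
    # by_alias=True is the default, so the schema key is "customerName"
    customer_name: str = Field(alias="customerName")
    address: Address  # emitted as {"$ref": "#/$defs/Address"}


def inline_refs(schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: replace local "#/$defs/..." refs with their definitions.

    Assumes a non-recursive model; a self-referencing model would recurse forever.
    """
    defs = schema.get("$defs", {})

    def resolve(node: Any) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/$defs/"):
                inlined = resolve(defs[ref.removeprefix("#/$defs/")])
                # Keep sibling keywords (e.g. a description placed next to the $ref)
                siblings = {k: resolve(v) for k, v in node.items() if k != "$ref"}
                return {**inlined, **siblings}
            # Drop the now-unused $defs block while copying everything else
            return {k: resolve(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(schema)


flat = inline_refs(Customer.model_json_schema())
print(list(flat["properties"]))              # ['customerName', 'address']
print("$defs" in flat, "$ref" in str(flat))  # False False
```

Validation still uses the original `Customer` model, so the alias round-trips: the LLM answers with `customerName` and `Customer.model_validate` accepts it.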

---

## **3. Pass it to the provider: Anthropic's tool use as an example**

The most reliable way to hand the generated schema to the LLM is to use it as **the input schema of a tool (function calling).** With the Anthropic Messages API, define a tool and force the model to call it.

```python
import anthropic
from pydantic import BaseModel, Field


class Invoice(BaseModel):
    vendor_name: str = Field(description="Name of the issuing company")
    total_amount: int = Field(description="Total amount including tax (JPY)", ge=0)


client = anthropic.Anthropic()  # the API key is read from the environment (never hard-code it)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[
        {
            "name": "save_invoice",
            "description": "Save the extracted invoice data.",
            "input_schema": Invoice.model_json_schema(),  # ← the key line
        }
    ],
    tool_choice={"type": "tool", "name": "save_invoice"},  # force this tool to be called
    messages=[{"role": "user", "content": "Invoice text: ..."}],
)

# Take the input (a dict) from the tool_use block and validate it
tool_use = next(b for b in message.content if b.type == "tool_use")
invoice = Invoice.model_validate(tool_use.input)  # ← trust it only after validation
print(invoice.total_amount)  # guaranteed to be an int
```

Two points matter: **pass `model_json_schema()` as `input_schema`**, and **force that tool's call with `tool_choice`.** Together they make the model return structured data that follows the schema instead of free text. Because `tool_use.input` is already a dict, validate it with `model_validate`; use `model_validate_json` when you have a raw JSON string (chapter 1).

> 💡 **The same idea applies to OpenAI**: with OpenAI's Structured Outputs (a JSON Schema passed via `response_format`), the round trip is unchanged: pass the schema generated by `Model.model_json_schema()` and validate the returned JSON with `model_validate_json`. **The pattern holds even when the provider changes**, and that is the value of understanding the raw API. For Anthropic / Claude API details, see the [Claude API practical guide](/blog/claude-api-ai-sdk-v6-production-ai-features).
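For reference, a minimal sketch of the OpenAI side, assuming the Chat Completions `response_format` shape `{"type": "json_schema", "json_schema": {...}}`. `openai_response_format` is a hypothetical helper; `extra="forbid"` is used because OpenAI's strict mode expects `additionalProperties: false`. Check OpenAI's documentation for which JSON Schema keywords strict mode supports before relying on constraints such as `ge`.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class Invoice(BaseModel):
    # extra="forbid" makes the schema say "additionalProperties": false
    model_config = ConfigDict(extra="forbid")

    vendor_name: str = Field(description="Name of the issuing company")
    total_amount: int = Field(description="Total amount including tax (JPY)")


def openai_response_format(model: type[BaseModel], name: str) -> dict[str, Any]:
    """Hypothetical helper: wrap a Pydantic schema as a response_format payload."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": model.model_json_schema()},
    }


response_format = openai_response_format(Invoice, "invoice")
# Pass this as response_format=... to the Chat Completions API, then validate
# the returned message content with Invoice.model_validate_json(...)
```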

---

## **4. Validate the response: when to use `strict`**

To validate the returned data, use `model_validate` (from a dict) or `model_validate_json` (from a JSON string). As shown in chapter 1, `model_validate_json` parses and validates a JSON string **in a single pass**, which is more efficient than `json.loads` followed by `model_validate`.
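One consequence of the single-pass design is that malformed JSON and a wrong shape surface as the same `ValidationError`, so one `except` clause covers both. A small sketch with a response cut off mid-object (as can happen when `max_tokens` is reached):

```python
from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    vendor_name: str
    total_amount: int


# A response truncated mid-object
truncated = '{"vendor_name": "Acme", "total_am'

try:
    Invoice.model_validate_json(truncated)
except ValidationError as e:
    # The JSON parse failure arrives as a ValidationError with type "json_invalid"
    print(e.errors(include_url=False)[0]["type"])  # json_invalid
```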

That leaves one practical decision: **whether to use `strict=True`.**

```python
# lax (default): if the LLM returns "50000" as a string, it is coerced to int 50000
Invoice.model_validate_json(raw)

# strict: requires an exact type match, so "50000" (a string) is rejected
Invoice.model_validate_json(raw, strict=True)
```

> ⚠️ **Against an LLM, `strict` multiplies retries**: even when the schema says `integer`, LLMs often return numbers as **strings** (`"50000"`). With `strict=True` at the boundary, every such output becomes a validation error, retries fire frequently, and cost and latency balloon. **For LLM output, relying on type coercion in the default lax mode is the practical choice.** Where you absolutely must not accept a string as a number, make just those fields strict with `Field(strict=True)`; that split is the sensible compromise (for details on strict mode and type coercion, see chapter 4 of the [Pydantic v2 practical guide](/blog/pydantic-v2-production-validation-type-safety)).
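A minimal sketch of that split. `line_count` is a hypothetical field used only to show a pinpoint `Field(strict=True)`; `total_amount` keeps the default lax behavior.

```python
from pydantic import BaseModel, Field, ValidationError


class Invoice(BaseModel):
    vendor_name: str
    total_amount: int = Field(ge=0)       # lax: the string "50000" is coerced to 50000
    line_count: int = Field(strict=True)  # hypothetical field: a string is refused


lenient = Invoice.model_validate_json(
    '{"vendor_name": "Acme", "total_amount": "50000", "line_count": 2}'
)
print(lenient.total_amount)  # 50000

try:
    Invoice.model_validate_json('{"vendor_name": "Acme", "total_amount": 50000, "line_count": "2"}')
except ValidationError as e:
    print(e.errors(include_url=False)[0]["loc"])  # ('line_count',)
```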

---

## **5. Self-healing loop: feed validation errors back to the LLM**

When validation fails, simply raising an exception and giving up is a naive way to integrate an LLM. If you **turn the detailed error information held by `ValidationError` into the next prompt**, the model can often fix its own output. This is the heart of the self-healing loop.

`ValidationError.errors()` returns a structured list describing what went wrong and where.

```python
from pydantic import ValidationError

# Invoice is the four-field model from chapter 1
try:
    Invoice.model_validate({"vendor_name": "Acme", "total_amount": "very high"})
except ValidationError as e:
    for err in e.errors(include_url=False):
        print(err["loc"], err["type"], err["msg"])
    # ('total_amount',) int_parsing Input should be a valid integer, ...
    # ('due_date',) missing Field required
    # ('line_items',) missing Field required

Each error has `type` (`int_parsing`, `missing`, etc.), `loc` (location), `msg` (a human-readable explanation), and `input` (the value actually received). This structure can be fed back to the LLM almost as-is.

```python
import json
from pydantic import BaseModel, ValidationError


def extract_with_retry(client, prompt: str, model: type[BaseModel], max_retries: int = 2):
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(max_retries + 1):
        # call_llm (not shown) wraps chapter 3's forced tool call and returns the tool input as a JSON string
        raw = call_llm(client, messages, model)
        try:
            return model.model_validate_json(raw)
        except ValidationError as e:
            if attempt == max_retries:
                raise  # cap reached: stop retrying
            # Drop the URLs to save tokens, and send the structured errors back
            feedback = e.errors(include_url=False, include_input=True)
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": "Your output failed validation. Fix it and output it again:\n"
                           f"{json.dumps(feedback, ensure_ascii=False, default=str)}",
            })
    raise RuntimeError("unreachable")
```
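The loop relies on a `call_llm` helper that the article leaves undefined. Here is one possible sketch, assuming chapter 3's forced-tool-call pattern with the Anthropic Messages API; the tool naming scheme and the choice to re-serialize `tool_use.input` into a JSON string are illustrative assumptions, not a fixed API. `client` is expected to be an `anthropic.Anthropic()` instance.

```python
import json
from typing import Any

from pydantic import BaseModel


def call_llm(client: Any, messages: list[dict[str, Any]], model: type[BaseModel]) -> str:
    """Hypothetical helper matching the loop's call_llm(client, messages, model).

    Forces one tool call whose input_schema is the model's JSON Schema and
    returns the tool input re-serialized as a JSON string.
    """
    tool_name = f"save_{model.__name__.lower()}"
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[{
            "name": tool_name,
            "description": f"Save the extracted {model.__name__} data.",
            "input_schema": model.model_json_schema(),
        }],
        tool_choice={"type": "tool", "name": tool_name},  # force this tool
        messages=messages,
    )
    tool_use = next(b for b in response.content if b.type == "tool_use")
    # tool_use.input is a dict; dump it so model_validate_json can parse it
    return json.dumps(tool_use.input, ensure_ascii=False)
```

With both pieces in place, a call might look like `extract_with_retry(anthropic.Anthropic(), "Invoice text: ...", Invoice)`.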

**Why is this superior?**
`ValidationError` collects **all errors found in one validation pass** rather than raising per field. So a single feedback message can say "the amount is invalid and the due date is missing," and the LLM has a good chance of fixing everything in one retry. `include_url=False` drops the `errors.pydantic.dev/...` URL attached to each error, which also saves feedback tokens. **Pydantic's error structure becomes the correction signal for the LLM as-is**; this is the design that links syntactic validation to re-prompting.
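If dumping the full error dicts as JSON feels too verbose, the same information can be condensed into one line per error. A sketch, assuming a hypothetical `format_errors` helper whose output you would put in the re-prompt in place of the `json.dumps(...)` string:

```python
from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    vendor_name: str
    total_amount: int
    due_date: str


def format_errors(e: ValidationError) -> str:
    """Hypothetical helper: one compact line per error for the re-prompt."""
    lines = []
    for err in e.errors(include_url=False):
        path = ".".join(str(part) for part in err["loc"]) or "(root)"
        line = f"- {path}: {err['msg']}"
        if err["type"] != "missing":  # for a missing field, input is the whole object
            line += f" (got {err['input']!r})"
        lines.append(line)
    return "\n".join(lines)


try:
    Invoice.model_validate({"vendor_name": "Acme", "total_amount": "very high"})
except ValidationError as e:
    print(e.error_count())  # 2
    print(format_errors(e))
```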

> ⚠️ **A retry cap is essential**: the self-healing loop is powerful, but without a cap, cost and latency become unbounded. Always set `max_retries`; if outputs still fail at the cap, the right move is to **revisit the schema or the `description`s** (chapter 2). Retries are insurance against occasional fluctuations, not a routine remedy for design flaws.

---

## **6. Streaming partial validation: `TypeAdapter`-only `experimental_allow_partial`**

In a chat UI you may want to validate incomplete JSON incrementally while it is still being generated. Pydantic has an experimental **partial validation** feature that **returns only the portion that can be validated, even from JSON cut off midway.**

```python
from pydantic import TypeAdapter
from typing_extensions import TypedDict  # Pydantic rejects typing.TypedDict on Python < 3.12


class Item(TypedDict):
    a: int
    b: float


adapter = TypeAdapter(list[Item])

# Even from JSON cut off midway (the second item's "b" never arrived), completed items are returned
adapter.validate_json('[{"a": 1, "b": 2.0}, {"a": 1', experimental_allow_partial=True)
# → [{'a': 1, 'b': 2.0}]  ← the incomplete second item is dropped
```

However, this feature has **serious constraints**, which the official documentation states explicitly:

- **`experimental_allow_partial` is limited to `TypeAdapter` methods.** It **can't be used** in `BaseModel.model_validate_json` (*"You can only pass `experimental_allow_partial` to TypeAdapter methods"*).
- Supported types are `list` / `set` / `dict` / `TypedDict` (non-required fields), etc. **It doesn't propagate to a nested collection that passes through a `BaseModel`.**
- *"It's an experimental feature and should be considered a proof of concept."* Validation errors in the last, possibly truncated, part of the input are ignored, so an incomplete trailing element is silently dropped.

> ⚠️ **The crux of the design**: if you want partial validation while streaming, **wrap the output in `TypeAdapter(list[YourTypedDict])`, not in a `BaseModel`.** `BaseModel.model_validate_json(..., experimental_allow_partial=True)` **does not work**, a point that is easy to get wrong. If you use PydanticAI, streaming validation of structured output (`stream_output`) is provided **by the framework** (chapter 6 of the [PydanticAI practical guide](/blog/pydantic-ai-agent-framework-production-guide)). Building partial validation on top of the raw API takes real effort, so **if you need it, consider PydanticAI.**

---

## **7. How much to build yourself and where to leave it to a framework**

So far we have looked at a raw-API implementation, but in practice there are three levels to choose from. **Choose based on your requirements.**

| Approach | What it does | Best suited for |
| --- | --- | --- |
| **Raw API + Pydantic (this article)** | build schema generation, validation, retry yourself | minimal dependency, fine control, embedding into existing code |
| **`instructor` (third-party)** | just pass `response_model=Model` for validation + auto-retry | quick validated extraction. Multi-provider support |
| **PydanticAI** | agents, tools, DI, observability, durable execution | full-fledged agents / long-running workflows |

`instructor` is a third-party library (not an official Pydantic project) that thinly wraps this article's pattern: model → schema → validation → retry.

```python
# With instructor, this article's round trip shrinks to a few lines (third-party library)
import instructor

# Invoice: the model from chapter 1
client = instructor.from_provider("anthropic/claude-sonnet-4-6")
invoice = client.create(response_model=Invoice, messages=[{"role": "user", "content": "..."}])
# Internally it generates the JSON Schema, validates, and retries automatically on failure
```

**The criterion is simple.** For one-shot extraction or classification, the raw API + Pydantic or `instructor` is enough. If tool chaining, state, human approval, or long-running execution is involved, use PydanticAI. The goal, **turning a "smart guess" into a component you can ship to production**, is the same; only the weight of the tooling differs.

---

## **Conclusion: take LLM output inside the type system**

The key to making LLM structured output robust is not special magic. **Accept that an LLM's output is also unvalidated external input, and make the Pydantic model the single source of truth for both schema and validation**: it all comes down to this boundary design. To recap the key points:

1. **One `BaseModel` plays two roles**: send with `model_json_schema()`, receive with `model_validate_json()`.
2. **Guide the LLM with `Field(description=...)`, `examples`, and `Literal`.** The descriptions directly decide extraction accuracy (mind the handling of `by_alias` and `$ref`).
3. **Trust the response only after validating it.** Since `strict` increases retries against an LLM, the default lax is realistic at the boundary.
4. **Convert `ValidationError.errors(include_url=False)` into a re-prompt** and build a self-healing loop. Always cap retries.
5. **Partial validation is limited to `TypeAdapter` + `experimental_allow_partial`.** It does not work on `BaseModel`. If you genuinely need it, consider PydanticAI.
6. **Choose between building it yourself, `instructor`, and PydanticAI according to the weight of your requirements.**

What separates "a working LLM feature" from "an LLM feature trustworthy in production" is whether you can **take the output inside the type system.** Pydantic is the gatekeeper standing at that intake.

For primary sources, I recommend re-reading the following official pages with this article's perspective in mind.

- [JSON Schema](https://pydantic.dev/docs/validation/latest/concepts/json_schema/)
- [Serialization / Validation](https://pydantic.dev/docs/validation/latest/concepts/serialization/)
- [Partial Validation (experimental)](https://pydantic.dev/docs/validation/latest/concepts/experimental/)
- [Validation Errors](https://pydantic.dev/docs/validation/latest/errors/validation_errors/)

---

### **Consulting on LLM structured-extraction pipelines**

The author has run long-running AI jobs and structured extraction at **production quality** on the internal AI platform of a major Japanese broadcaster. Reliably extracting validated structured data from unstructured documents such as invoices, contracts, inquiries, and meeting minutes depends on accumulated work in schema design, validation boundaries, self-healing, and observability. I build **LLM structured-extraction, classification, and RAG pipelines** with Pydantic, PydanticAI, and the Claude API, quickly and to a high standard with the help of generative AI. Feel free to reach out.
