Introduction: escaping the "string hell" of LLM apps
Left alone, an application that embeds an LLM turns into string hell. You send a string called a prompt, get back a string of free text, nervously parse it with regex or json.loads, and if the shape is off, it crashes at runtime. Testing comes down to "looks about right," and to find out what happened in production you grep the logs and guess. It is a regression to the "world without types" that backend design spent 20 years leaving behind.
PydanticAI is an agent framework that answers this problem with Pydantic's philosophy. It is built by the team behind Pydantic itself, so its design rests on one consistent principle: never trust data that comes from outside the system boundary. LLM output is just more unvalidated external input. So declare its shape with BaseModel, validate it at the boundary, and pass it inward as a typed object. PydanticAI does for LLM responses what FastAPI does for HTTP requests.
At the time of writing, the latest release is PydanticAI 2.0 (released June 23, 2026; requires Python 3.10+). The major version overhauled the API, so it differs in important ways from older tutorials online (result_type → output_type, instructions now recommended over system_prompt, and so on; covered below). This article follows the official documentation and uses the correct 2.0 API throughout.
💡 The running theme of this blog: this portfolio is built around "building fast, cheap, and safe as one person working with generative AI." PydanticAI is exactly the tool for using AI while a human keeps control of the verification gate. Validate LLM output with types, isolate tools as deterministic code, and track behavior with observability: that is the design that turns AI from a "smart guess" into "a part that ships in production." For the TypeScript counterpart, see the Vercel AI SDK production guide; for producing structured output with the raw API instead of PydanticAI, see the LLM-structured-output guide built with Pydantic.
1. The minimal agent: run it in 5 lines
First, install it. Choose either the full package or the slim package (pydantic-ai-slim) with only the providers you need.
```bash
# Full package (all providers bundled)
pip install pydantic-ai
# Slim package plus only the providers you need (recommended)
pip install "pydantic-ai-slim[anthropic]"
```
The minimal agent is just this.
```python
from pydantic_ai import Agent

agent = Agent(
    "anthropic:claude-sonnet-4-6",
    instructions="Answer concisely, in one sentence.",
)

result = agent.run_sync('What is the origin of "hello world"?')
print(result.output)
```
Specify the model in "provider:model-name" format ("anthropic:claude-sonnet-4-6", "openai:gpt-5.2", "google:gemini-3-flash-preview", etc.). There are three ways to run an agent:
| Method | Use |
|---|---|
| agent.run_sync(...) | synchronous execution (scripts, batch jobs) |
| await agent.run(...) | asynchronous execution (servers such as FastAPI) |
| async with agent.run_stream(...) as result: | streaming (section 6) |
💡 Use instructions (not system_prompt): v2 recommends instructions. The two differ in how they treat conversation history. When you pass message_history, system_prompt also resends the system prompts contained in that history, while instructions sends only the current agent's instructions. That avoids prompts piling up in multi-turn conversations, so choose instructions unless you have a specific reason not to.
2. Structured output: make the LLM a "typed function" with output_type
This is PydanticAI's core. If you pass a BaseModel to output_type, the LLM's response is automatically validated as that model, and result.output becomes a typed object.
```python
from pydantic import BaseModel
from pydantic_ai import Agent


class CityLocation(BaseModel):
    city: str
    country: str


agent = Agent("anthropic:claude-sonnet-4-6", output_type=CityLocation)
result = agent.run_sync("Where were the 2012 Olympics held?")
print(result.output)       # city='London' country='United Kingdom'
print(result.output.city)  # London ← typed as str, so editor completion works
```
result.output is of type CityLocation. Editor completion works, mypy / Pyright type-check it, and downstream code can be written on the guarantee that "city is always str." The LLM behaves just like a typed pure function.
H3: Output modes — ToolOutput / NativeOutput / PromptedOutput
By default, PydanticAI extracts structured output through the model's tool-calling (function-calling) feature, which is the most portable approach. To allow several output types, pass them as a list.
```python
from pydantic import BaseModel
from pydantic_ai import Agent, NativeOutput, ToolOutput


class Fruit(BaseModel):
    name: str
    color: str


class Vehicle(BaseModel):
    name: str
    wheels: int


# Default (tool calling). You can also give each output tool an explicit name.
agent = Agent(
    "anthropic:claude-sonnet-4-6",
    output_type=[
        ToolOutput(Fruit, name="return_fruit"),
        ToolOutput(Vehicle, name="return_vehicle"),
    ],
)

# If the model supports native structured output, NativeOutput works too
native_agent = Agent(
    "anthropic:claude-sonnet-4-6",
    output_type=NativeOutput([Fruit, Vehicle], name="fruit_or_vehicle"),
)
```
| Marker | Mechanism | When to use |
|---|---|---|
| ToolOutput (default) | structured output via tool calling | portability first; the default choice, works on almost every model |
| NativeOutput | the model's native structured-output feature | strictest schema compliance on models that support it |
| PromptedOutput | JSON requested through the prompt | fallback for models with neither tool calling nor native support |
Why is this superior?
Designing around "return free text and parse it hard later" leaves you at the mercy of the LLM's whims every time: an extra preamble, a code fence, a slightly different key name. output_type imposes the schema on the model and validates the response with Pydantic, so a malformed shape turns into a controlled "validation error → retry" flow. The output's shape becomes a contract, and that contract lives on as the BaseModel's source code. That is the way out of string hell.
3. Tools: hand "what it can do" to the LLM type-safely
The means by which an agent acts on the outside world is a tool. In PydanticAI, just attach a decorator to an ordinary Python function. From the function's type annotations and docstring, the JSON schema passed to the LLM is auto-generated.
```python
import random

from pydantic_ai import Agent, RunContext

agent = Agent(
    "anthropic:claude-sonnet-4-6",
    deps_type=str,  # covered in section 4; here it injects the player's name
    instructions="You host a dice game. The player wins if the roll matches their guess.",
)


@agent.tool_plain
def roll_dice() -> str:
    """Roll a six-sided die and return the result."""
    return str(random.randint(1, 6))


@agent.tool
def get_player_name(ctx: RunContext[str]) -> str:
    """Get the player's name."""
    return ctx.deps
```
The difference between the two decorators is whether it receives a context.
- @agent.tool: takes ctx: RunContext[...] as its first argument and can access dependencies (DB connections, API keys, user information, etc.).
- @agent.tool_plain: a plain tool that needs no context.
Every argument other than ctx becomes part of the tool's input schema as-is. PydanticAI also parses the docstring (Google, NumPy, or Sphinx format) and copies each argument's description into the schema.
```python
@agent.tool_plain(docstring_format="google", require_parameter_descriptions=True)
def search_products(keyword: str, max_results: int = 10) -> list[str]:
    """Search products.

    Args:
        keyword: Search keyword.
        max_results: Maximum number of results to return.
    """
    ...
```
H3: ModelRetry — request a "redo" from the tool to the LLM
When a tool decides its input is invalid, it can raise ModelRetry instead of an ordinary exception that would crash the run; this asks the LLM to correct itself.
```python
from pydantic_ai import Agent, ModelRetry

agent = Agent("anthropic:claude-sonnet-4-6")


@agent.tool_plain
def lookup_user(user_id: str) -> str:
    """Look up a user by id."""
    if not user_id.startswith("usr_"):
        # Don't crash; tell the LLM to pass the argument again in the correct format
        raise ModelRetry("user_id must start with 'usr_'.")
    return f"Information for user {user_id}"
```
Why is this superior?
A tool is deterministic code; the LLM is ambiguous judgment. Keeping the two apart is what makes the design robust. Confine anything that must be certain, such as inventory lookups, payments, and DB writes, to tools (ordinary Python), and leave the LLM to decide only when to call them and with which arguments. ModelRetry lets the conversation repair, on its own, any mismatch that arises at that boundary. In effect, PydanticAI turns the "separation of deterministic code and probabilistic judgment" discussed in the AI-agent tool-use design guide into a framework feature.
4. Dependency injection: inject side effects and make them testable
If a tool touches a DB or an external API, you must not hardcode that dependency. PydanticAI has dependency injection (DI) via deps_type. If you know FastAPI's Depends, the philosophy is the same.
```python
from dataclasses import dataclass

import httpx
from pydantic_ai import Agent, RunContext


@dataclass
class Deps:
    """External dependencies the agent needs. Bundling them in a dataclass is the usual pattern."""

    api_key: str
    http_client: httpx.AsyncClient


agent = Agent("anthropic:claude-sonnet-4-6", deps_type=Deps)


@agent.tool
async def fetch_weather(ctx: RunContext[Deps], city: str) -> str:
    """Get the weather for the given city."""
    resp = await ctx.deps.http_client.get(
        "https://api.example.com/weather",
        params={"city": city, "key": ctx.deps.api_key},
    )
    return resp.text


async def main(client: httpx.AsyncClient) -> None:
    deps = Deps(api_key="...", http_client=client)
    result = await agent.run("What's the weather in Tokyo?", deps=deps)
    print(result.output)
```
Declare the type with deps_type=Deps, and pass the instance at runtime with agent.run(..., deps=deps). Inside the tool, you can access it type-safely with ctx.deps.
Why is this superior?
The real value of DI is testability. In production you inject a real httpx.AsyncClient; in tests, a mock. On top of that, agent.override(deps=...) swaps a dependency only for the duration of a test. You can verify a tool's logic without calling the LLM, or run the whole flow with the deterministic test model ("test"): AI-dependent code becomes testable without the AI. This is exactly the "build the verification path first" principle from CLAUDE.md.
5. Output validators and self-repair: build verification into the conversation loop
Some checks are semantic, and the Pydantic validation behind output_type cannot express them: "is the SQL the LLM generated actually executable?", "is the proposed date a business day?". Put such checks in an @agent.output_validator, and on failure raise ModelRetry so the LLM regenerates its answer.
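```python
from typing import Protocol

from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry, RunContext


class QueryError(Exception):
    """Placeholder for your DB driver's query error type."""


class DatabaseConn(Protocol):
    """Placeholder for a DB connection: anything with an async execute(sql) method."""

    async def execute(self, sql: str) -> None: ...


class SqlQuery(BaseModel):
    sql_query: str


agent = Agent(
    "anthropic:claude-sonnet-4-6",
    output_type=SqlQuery,
    deps_type=DatabaseConn,  # e.g. a DB connection
)


@agent.output_validator
async def validate_sql(ctx: RunContext[DatabaseConn], output: SqlQuery) -> SqlQuery:
    try:
        # Use EXPLAIN to check only that the query is executable (nothing is run)
        await ctx.deps.execute(f"EXPLAIN {output.sql_query}")
    except QueryError as e:
        # Send the failure, error message included, back to the LLM so it produces a fix
        raise ModelRetry(f"Invalid query: {e}") from e
    return output
```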
This loop is powerful. Pydantic's syntactic validation (types, required fields, constraints) and output_validator's semantic validation (is it correct for the business?) form two stages, and a failure at either stage goes back into the conversation via ModelRetry. The result is a self-repairing agent that keeps redoing its work until it produces correct output (within the retry limit), assembled declaratively.
⚠️ Cap the retries: limit the retry count with Agent(..., retries=2). Unlimited retries make cost and latency unbounded. Frequent retries signal a flaw in the prompt or schema, so prioritize fixing the root cause, for example by conveying each field's intent to the LLM with Field(description=...) (covered in detail in the LLM-structured-output guide built with Pydantic).
6. Streaming: flow partial results while validating
In a chat UI, "return after everything is done" is too slow. run_stream can stream partial results while validating the structured output.
```python
from pydantic_ai import Agent


async def stream_profile(agent: Agent, user_input: str) -> None:
    async with agent.run_stream(user_input) as result:
        # Validated partial objects arrive one after another
        async for profile in result.stream_output():
            print(profile)
            # {'name': 'Ben'} → {'name': 'Ben', 'dob': date(1990, 1, 28)} → ...
```
To stream plain text only, use result.stream_text(); to receive the raw ModelResponse objects, debounced, use result.stream_responses(debounce_by=0.01).
⚠️ Beware of side effects on partial results: during streaming, still-incomplete objects are also passed to output_validator. Side effects such as DB writes should happen only on the completed output, so check the ctx.partial_output flag and skip side effects (and any checks that need the full object) on partial results. Neglecting this leads to accidents where unconfirmed data pollutes an external system.
7. Observability: see "AI's behavior" at a glance with Logfire
The biggest hurdle in an LLM app is debugging. "Why was this tool called?", "Which retry fixed it?", "Where am I wasting tokens?": print debugging cannot answer these. PydanticAI integrates with Logfire, made by the same Pydantic team, and instruments all of its behavior on top of OpenTelemetry.
```python
import logfire
from pydantic_ai import Agent

logfire.configure()
logfire.instrument_pydantic_ai()  # this alone instruments every agent

agent = Agent("anthropic:claude-sonnet-4-6")
result = agent.run_sync("...")
print(result.usage)  # RunUsage(input_tokens=62, output_tokens=1, requests=1)
```
With just two lines (configure + instrument_pydantic_ai), each agent execution, tool call, retry, and token usage is visualized as a trace. From the result.usage attribute you can directly get the token count and request count, which becomes the foundation of cost monitoring. If you want to see even the raw HTTP requests, add logfire.instrument_httpx(capture_all=True).
Why is this superior?
PydanticAI's instrumentation follows OpenTelemetry's GenAI semantic conventions, so the data can flow to OTel backends other than Logfire (Grafana, Datadog, etc.). AI execution therefore slots naturally into the "correlate the three pillars" design from my OpenTelemetry observability guide. Being able to trace a stalled process at a glance is not a nice-to-have when running AI agents in production; it is a precondition. Seeing which agent run or tool call is consuming time and tokens shortens debugging and surfaces cost anomalies early.
8. Production resilience: "continue from where it stopped" with durable execution
An agent calls external APIs many times, chains tools, and sometimes waits for human approval (human-in-the-loop). Long-running workflows like these will eventually fail partway through: API rate limits, network drops, process restarts. Starting over from scratch every time is unacceptable for both cost and UX.
PydanticAI provides durable execution via integration with Temporal / DBOS / Prefect / Restate. Just by wrapping an existing Agent in a dedicated class, progress is persisted and you can resume from where you left off across a failure.
```python
from pydantic_ai import Agent
from pydantic_ai.durable_exec.temporal import TemporalAgent

# name is required (it identifies the workflow/activities)
agent = Agent(
    "anthropic:claude-sonnet-4-6",
    instructions="...",
    name="geography",
)

temporal_agent = TemporalAgent(agent)  # runs inside a Temporal workflow
```
With DBOS, state is checkpointed to a database.
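```python
from pydantic_ai.durable_exec.dbos import DBOSAgent

# Wraps the named `agent` defined above
dbos_agent = DBOSAgent(agent)


async def main() -> None:
    result = await dbos_agent.run("What is the capital of Mexico?")
    print(result.output)
```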
| Backend | Nature | Suited scene |
|---|---|---|
| Temporal | a workflow engine. Powerful retry/timers | complex long-running orchestration |
| DBOS | a DB-checkpoint method. Lightweight | when you want to lean the state on an existing DB |
| Prefect / Restate | data-pipeline / durable-RPC oriented | to match each foundation |
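To make "resume from where it stopped" concrete, here is a toy, stdlib-only illustration of checkpointing. It is not how Temporal, DBOS, Prefect, or Restate are implemented (they add event histories, deterministic replay, timers, and much more); run_durably and the step functions are hypothetical.

```python
import json
import tempfile
from collections.abc import Callable
from pathlib import Path


def run_durably(steps: list[tuple[str, Callable[[], str]]], checkpoint: Path) -> dict[str, str]:
    """Run named steps in order, persisting each result before moving on.

    On a re-run, steps already recorded in the checkpoint file are skipped,
    so work resumes where it stopped instead of starting over.
    """
    done: dict[str, str] = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for name, step in steps:
        if name in done:
            continue  # completed in an earlier attempt
        done[name] = step()
        checkpoint.write_text(json.dumps(done))
    return done


calls: list[str] = []
crash = {"armed": True}


def fetch() -> str:
    calls.append("fetch")
    return "raw data"


def summarize() -> str:
    calls.append("summarize")
    if crash["armed"]:
        crash["armed"] = False
        raise ConnectionError("simulated network drop")
    return "summary"


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.json"
    steps = [("fetch", fetch), ("summarize", summarize)]
    try:
        run_durably(steps, ckpt)
    except ConnectionError:
        pass  # the "process" died mid-run
    final = run_durably(steps, ckpt)  # the retry skips fetch

print(final)  # {'fetch': 'raw data', 'summarize': 'summary'}
print(calls)  # ['fetch', 'summarize', 'summarize']
```

The expensive first step ran only once; only the step that failed was repeated.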
⚠️ Always set name=: an agent wrapped for durable execution requires name= (it becomes the workflow identifier). TemporalAgent also has backend-specific constraints, such as having to be defined at module top level. When adopting it, always check the target backend's documentation.
Why does this work?
For the internal AI platform I built for a broadcaster (program-production support), I ensured resilience by moving long AI jobs out to Cloud Workflows / Cloud Run Jobs. PydanticAI's durable execution meets that same requirement, running long jobs in a form that survives failures, at the agent layer. Think of it as the AI-agent version of offloading work too heavy for FastAPI's BackgroundTasks to a job platform (see the FastAPI production-operation guide).
Conclusion: change AI into "a part that ships in production"
PydanticAI lifts an LLM app from a "smart guess" to "a production system that is type-safe, observable, and recovers even when it crashes." To recap the key points:
- Build a minimal agent with Agent + instructions, and validate the output as a typed object with output_type=BaseModel (choosing among ToolOutput / NativeOutput / PromptedOutput as appropriate).
- Define tools with @agent.tool / @agent.tool_plain, and generate their schemas automatically from type annotations and docstrings. Self-repair mismatches with ModelRetry.
- Inject side effects with deps_type to make them testable (swap in mocks with override).
- Build semantic validation into the conversation loop with @agent.output_validator + ModelRetry, and cap the retries.
- Stream while validating with run_stream + stream_output (controlling side effects with partial_output).
- Plug into OpenTelemetry via Logfire (instrument_pydantic_ai) and see all behavior and cost at a glance.
- Make long-running workflows fault-tolerant with durable execution (Temporal, DBOS, etc.).
At the root of PydanticAI is, after all, the same discipline as Pydantic itself — "validate data coming from outside (including LLM output) at the boundary before passing it inside." This consistency is exactly what bridges AI to production reliability.
As official primary sources, I recommend re-reading the following from this article's viewpoint.
- PydanticAI Overview
- Agents
- Output (structured output, streaming)
- Tools
- Dependencies
- Durable Execution
- Logfire integration
Consultation on type-safe AI-agent development
The author has designed and operated backends that embed generative AI at production quality, including an internal AI platform for a major domestic broadcaster. Validating LLM output with types, separating tools into deterministic code, tracking behavior with observability, and assembling failure-resistant workflows with durable execution: I use generative AI to implement, quickly and at high quality, designs that go beyond merely running AI to making it meet the reliability the business depends on. Feel free to contact me about building AI agents, RAG, or structured-extraction pipelines with PydanticAI / FastAPI.