Introduction: why you should relearn "Pydantic v2" now
The design philosophy of a robust backend can be condensed into a single sentence — "never trust data coming from outside the system boundary." HTTP request bodies, external API responses, environment variables, message-queue payloads. These are all "unvalidated data with no type guarantee," and the moment you let them pass straight inside the application, they morph into KeyError, AttributeError, and in the worst case a security hole.
Pydantic is the gatekeeper standing at this boundary. Now that FastAPI has become the de facto standard framework, boundary validation in Python has nearly converged on Pydantic. But there's a problem. Many of the articles on the web, Stack Overflow answers, and the code that generative AI outputs are still written in the v1-series legacy style (@validator, .dict(), class Config).
Pydantic v2, officially released at the end of June 2023, was not a mere version bump. The core validation engine was rewritten in Rust as pydantic-core and split into a separate package. Along with a performance gain that the official documentation describes as "much faster vs v1," the API was renamed around the model_* prefix. Code written with v1 knowledge often still runs, but it relies on a deprecated style and accumulates as technical debt.
This article is not yet another introduction. It stays faithful to the official documentation (pydantic.dev/docs/validation/latest) while aiming to be one level clearer, and it uses concrete code to break through the following walls you are likely to face in practice.
- "I've written with @validator, but I don't understand what changed with v2's @field_validator / @model_validator"
- "The use distinction of Field()'s constraints, alias, and default_factory is vague"
- "Where should I write validation spanning multiple fields, like password confirmation?"
- ".dict() doesn't work. What's the difference from model_dump(mode='json')?"
- "strict mode and type coercion: which should I choose in production?"
- "I've been reading environment variables all over with os.environ['...'], but I want to consolidate them type-safely"
The author designed and implemented the backend of a B2B SaaS that won the METI Minister's Award, built on Python 3.11 / Flask / SQLAlchemy 2.0 / PostgreSQL 16, and operated it in production with a strict layer separation of Router → UseCase → Repository → Model. That project used Marshmallow 3 for boundary validation, but the discipline of "always validate external input at the boundary before passing it inward" is identical to the one in this article. In a FastAPI-based stack, Pydantic plays exactly that role. This article organizes that boundary-design knowledge and backs it with the official Pydantic v2 documentation.
💡 This article is part of a series on Python backend design. Read the web-framework layer in FastAPI Production-Operation Guide and the persistence layer in SQLAlchemy 2.0 Practical Guide together, and you can survey consistent type-safe design from the boundary to the DB.
1. BaseModel and Field: declaratively define "the shape of correct data"
Pydantic's starting point is inheriting BaseModel. Just write type annotations on class attributes, and it becomes the single source of truth for schema, validation, and serialization.
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str = "Jane Doe"  # has a default value, so the field is optional


# Validate and build from a dict (type coercion applies: the string "42" becomes int 42)
user = User.model_validate({"id": "42"})
print(user.id)    # 42 (converted to int)
print(user.name)  # Jane Doe
Validation also runs when you call the constructor directly, as in User(id="42"), but for building objects from external input (dict / JSON), model_validate() / model_validate_json() is the standard approach. It makes the boundary intent explicit in the code: data becomes a typed object only after validation.
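As a minimal sketch of that boundary, the snippet below validates a JSON string with model_validate_json() and, for a bad payload, collects the structured error report instead of letting the exception escape. The payloads are made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str = "Jane Doe"


# Valid JSON: parsed and validated in one step.
user = User.model_validate_json('{"id": 42, "name": "alice"}')

# Invalid input: keep the structured error report rather than crashing.
try:
    User.model_validate({"id": "not-a-number"})
    errors = []
except ValidationError as exc:
    errors = exc.errors()

print(user)
print(errors[0]["loc"], errors[0]["type"])
```

Each entry in errors() carries the field location and a machine-readable error type, which is usually enough to build a 4xx response body.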
H3: declare constraints, aliases, and defaults with Field()
Type annotations alone can't express business constraints like "a positive integer" or "3–30 characters." What handles that is Field().
from typing import Annotated

from pydantic import BaseModel, Field


class Product(BaseModel):
    # Annotated pattern (recommended in v2): keeps the type and its constraints separate and readable
    name: Annotated[str, Field(min_length=1, max_length=120)]
    price: Annotated[int, Field(gt=0)]  # positive integers only
    discount_rate: Annotated[float, Field(ge=0, le=1)]  # 0.0 to 1.0
    sku: Annotated[str, Field(pattern=r"^[A-Z]{3}-\d{4}$")]
    # Tags get a fresh empty list each time (sidesteps the mutable-default trap)
    tags: list[str] = Field(default_factory=list)
The main constraint parameters are, per the official documentation, as follows.
| Parameter | Meaning | Applicable type |
|---|---|---|
| gt / ge / lt / le | Greater than / at least / less than / at most | Numbers |
| min_length / max_length | Min/max length | Strings, collections |
| pattern | Regex match | Strings |
| default | A static default value | All |
| default_factory | A callable that generates the default | All |
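⚠️ The mutable-default trap: in plain Python, a mutable default such as [] on a function argument or class attribute is the classic bug in which every call or instance shares the same list object. Pydantic guards against this by copying a mutable default such as tags: list[str] = [] for each instance, but it is still clearer to always use default_factory=list / default_factory=dict for the default of a list or dict.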
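The following is a small sketch, using a trimmed-down version of the Product model above, of how these declarations behave at runtime: each instance gets its own list, and every violated constraint is reported per field.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: Annotated[str, Field(min_length=1, max_length=120)]
    price: Annotated[int, Field(gt=0)]
    tags: list[str] = Field(default_factory=list)


a = Product(name="Keyboard", price=100)
b = Product(name="Mouse", price=50)
a.tags.append("sale")  # only a's list changes

try:
    Product(name="", price=0)
    violations = []
except ValidationError as exc:
    # One entry per failing field, each with a machine-readable type.
    violations = sorted((e["loc"][0], e["type"]) for e in exc.errors())

print(a.tags, b.tags)
print(violations)
```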
H3: separate "external naming" and "internal naming" with alias
The external API is camelCase, and you want to unify the internal code in snake_case — a common requirement. Field(alias=...) handles this translation.
from pydantic import BaseModel, ConfigDict, Field


class ApiPayload(BaseModel):
    # The input JSON uses "userName", but internally we want to work with user_name
    model_config = ConfigDict(populate_by_name=True)

    user_name: str = Field(alias="userName")
    is_active: bool = Field(alias="isActive")


# Validate using the external camelCase keys
payload = ApiPayload.model_validate({"userName": "alice", "isActive": True})
print(payload.user_name)  # alice (snake_case internally)
alias is used at validation time, and at serialization time when you pass by_alias=True to model_dump() / model_dump_json(). If you want different names at validation time and serialization time, specify validation_alias / serialization_alias individually. With populate_by_name=True, the value can be supplied by either the alias or the field name, which helps backward compatibility during a migration period.
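A minimal sketch of these alias variants follows. The created_at field and its "created" / "createdAt" names are hypothetical, added only to show validation_alias and serialization_alias side by side.

```python
from pydantic import BaseModel, ConfigDict, Field


class ApiPayload(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    user_name: str = Field(alias="userName")
    # Hypothetical field: read "created" on input, write "createdAt" on output.
    created_at: str = Field(validation_alias="created", serialization_alias="createdAt")


payload = ApiPayload.model_validate({"userName": "alice", "created": "2024-01-01"})

# populate_by_name=True also accepts the internal field name.
by_field_name = ApiPayload(user_name="bob", created="2024-01-02")

print(payload.model_dump())               # internal snake_case names
print(payload.model_dump(by_alias=True))  # external names
```

Note that the external names appear in the output only when by_alias=True is passed.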
Why is this superior?
alias confines the translation layer to the boundary, so the external schema's naming convention doesn't erode the quality of the application's internal code. Even if the external API suddenly renames userName to userId, the only place to fix is the single Field(alias=...) line. In CLAUDE.md's terms, this is "ETC (Easy To Change)" in practice: the impact of a change stays localized at the boundary.
2. Validators: verify business rules that can't be expressed with types
Field()'s constraints only cover "static rules of a single field." For dynamic rules such as "normalize the email address," or rules spanning multiple fields such as "the password and confirmation password match," use validator decorators.
H3: @field_validator: validate/transform a single field
@field_validator receives a specific field's value and returns the validated or transformed value. In v2, combining it with @classmethod is canonical.
from pydantic import BaseModel, field_validator


class SignupForm(BaseModel):
    email: str
    age: int

    @field_validator("email", mode="after")
    @classmethod
    def normalize_email(cls, value: str) -> str:
        # mode="after": runs after Pydantic's internal validation, so value is guaranteed to be a str
        return value.strip().lower()

    @field_validator("age", mode="after")
    @classmethod
    def must_be_adult(cls, value: int) -> int:
        if value < 18:
            raise ValueError("must be at least 18 years old")
        return value
Choosing the right mode is the key. The table below follows the official definitions.
| mode | Execution timing | Value received | Main use |
|---|---|---|---|
| "after" (default) | After Pydantic's internal validation | A type-guaranteed value | Type-safe validation/normalization (the first choice) |
| "before" | Before internal validation/type coercion | The raw input (Any) | Pre-shaping the input form (e.g., wrapping a single value in a list) |
| "wrap" | Control before/after validation yourself | Any + handler | The most flexible, for exception catching, fallback, etc. |
mode="before" is effective for pre-processing that shapes "the miscellaneous forms coming from a DB or form" into the proper form.
from typing import Any

from pydantic import BaseModel, field_validator


class Article(BaseModel):
    tags: list[str]

    @field_validator("tags", mode="before")
    @classmethod
    def ensure_list(cls, value: Any) -> Any:
        # Also accept a single string such as "python,rust" as a list
        if isinstance(value, str):
            return [t.strip() for t in value.split(",")]
        return value
💡 Prefer mode="after": the official documentation positions the after validator as "generally more type-safe." Because before receives the input as Any with no type guarantee, limit it to the necessary pre-processing and place most of the validation logic in after.
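For the "wrap" mode listed in the table, a minimal sketch might look like the following. The Page model and its fallback value are hypothetical; the point is only that the validator decides when to call handler and what to do when it fails.

```python
from typing import Any

from pydantic import (
    BaseModel,
    ValidationError,
    ValidatorFunctionWrapHandler,
    field_validator,
)

DEFAULT_SIZE = 20  # hypothetical fallback for illustration


class Page(BaseModel):
    size: int = DEFAULT_SIZE

    @field_validator("size", mode="wrap")
    @classmethod
    def fallback_on_invalid(cls, value: Any, handler: ValidatorFunctionWrapHandler) -> int:
        try:
            # Run Pydantic's own int validation (including lax coercion).
            return handler(value)
        except ValidationError:
            # Swallow the failure and fall back instead of rejecting the request.
            return DEFAULT_SIZE


print(Page(size="50").size)
print(Page(size="abc").size)
```

Silently falling back hides bad input, so this pattern fits lenient query parameters better than business-critical fields.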
H3: @model_validator: validation spanning multiple fields
To validate a relationship between fields, like "the password and confirmation password match," use @model_validator. With mode="after", define it as an instance method and return the validated self.
from typing import Self  # available in Python 3.11+

from pydantic import BaseModel, model_validator


class PasswordChange(BaseModel):
    password: str
    password_repeat: str

    @model_validator(mode="after")
    def check_passwords_match(self) -> Self:
        # At this point password / password_repeat have already passed type validation
        if self.password != self.password_repeat:
            raise ValueError("passwords do not match")
        return self
On the other hand, mode="before" receives the whole raw input (dict) before the model is instantiated. It's suited to an input guard like "forbid the existence of a specific key."
from typing import Any

from pydantic import BaseModel, model_validator


class Account(BaseModel):
    username: str

    @model_validator(mode="before")
    @classmethod
    def forbid_raw_card_number(cls, data: Any) -> Any:
        # Reject immediately if a raw credit card number has slipped into the input
        if isinstance(data, dict) and "card_number" in data:
            raise ValueError("card_number must not be included directly")
        return data
Why is this superior?
If cross-field validation is scattered as hand-written if statements in the router or service layer, validation logic mixes into business logic and breaks SRP (single responsibility). Consolidated into @model_validator, "the invariant this model represents" is complete within the model definition. The guarantee that a PasswordChange instance exists only if the passwords match is upheld at the code level, and all downstream code can rely on that premise.
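The invariant argument can be sketched as follows. apply_change is a hypothetical downstream function; it never re-checks the passwords because it can only receive an instance that already passed the model validator. The return annotation is written as a string so the sketch does not depend on typing.Self.

```python
from pydantic import BaseModel, ValidationError, model_validator


class PasswordChange(BaseModel):
    password: str
    password_repeat: str

    @model_validator(mode="after")
    def check_passwords_match(self) -> "PasswordChange":
        if self.password != self.password_repeat:
            raise ValueError("passwords do not match")
        return self


def apply_change(change: PasswordChange) -> str:
    # Hypothetical downstream code: no re-check, the model guarantees the match.
    return f"password updated ({len(change.password)} chars)"


ok = apply_change(PasswordChange(password="s3cret!", password_repeat="s3cret!"))

try:
    PasswordChange(password="s3cret!", password_repeat="typo")
    error_type, message = "", ""
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
    message = exc.errors()[0]["msg"]

print(ok)
print(error_type, message)
```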
3. Serialization: safely return a typed object to "the outward shape"
Once data has been validated into an object, you next need to turn it back into response JSON or a form suitable for DB storage. In Pydantic v2, model_dump() / model_dump_json() handle this (v1's .dict() / .json() are deprecated).
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    password: str


user = User(id=1, name="alice", password="secret")

# A dict of Python objects (values such as tuple stay as Python types)
user.model_dump()  # {'id': 1, 'name': 'alice', 'password': 'secret'}

# A JSON string (datetime and similar values are converted to JSON-compatible types such as ISO strings)
user.model_dump_json()  # '{"id":1,"name":"alice","password":"secret"}'
The difference between model_dump(mode='json') and mode='python' (the default) appears frequently in practice. mode='python' keeps tuple and datetime as Python types, while mode='json' converts to JSON-compatible types (lists, ISO strings, etc.). You can organize it by thinking of model_dump_json() as the latter directly turned into a JSON string.
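The User model above has no field where the two modes differ, so here is a minimal sketch with a hypothetical Measurement model that holds a tuple and a datetime.

```python
from datetime import datetime

from pydantic import BaseModel


class Measurement(BaseModel):  # hypothetical model for illustration
    point: tuple[int, int]
    taken_at: datetime


m = Measurement(point=(1, 2), taken_at=datetime(2024, 1, 1, 12, 0, 0))

as_python = m.model_dump()                 # Python types preserved
as_json_types = m.model_dump(mode="json")  # JSON-compatible types
as_json_text = m.model_dump_json()         # a JSON string

print(as_python)
print(as_json_types)
print(as_json_text)
```

mode="json" is handy when another library does the final JSON encoding but needs plain lists and strings as input.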
The main control parameters are as follows.
| Parameter | Effect | Typical use |
|---|---|---|
| exclude={'password'} | Exclude the specified field | Remove confidential info from the response |
| include={'id', 'name'} | Output only the specified fields | Partial exposure |
| by_alias=True | Output by alias, not the field name | Return to an external camelCase API |
| exclude_none=True | Exclude fields whose value is None | A sparse response |
| exclude_unset=True | Exclude fields not explicitly passed | PATCH diff updates |
⚠️ Preventing confidential-info leakage: accidentally including a password hash or token in a response is a typical incident. Writing model_dump(exclude={"password"}) at every call site invites omissions, so it is more robust to use field_serializer (covered later) or, better still, to define a separate outward-only response model.
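As a sketch of the "separate response model" idea, plus the exclude_unset row of the table: UserResponse and UserPatch are hypothetical names, and from_attributes=True is used here so the response model can be built from the internal object.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class User(BaseModel):
    id: int
    name: str
    password: str


class UserResponse(BaseModel):
    # Outward-only model: it has no password field, so it cannot leak one.
    model_config = ConfigDict(from_attributes=True)

    id: int
    name: str


class UserPatch(BaseModel):  # hypothetical PATCH payload
    name: Optional[str] = None
    nickname: Optional[str] = None


user = User(id=1, name="alice", password="secret")
public = UserResponse.model_validate(user).model_dump()

# Only the keys the client actually sent survive exclude_unset=True.
patch = UserPatch.model_validate({"name": "bob"})
changes = patch.model_dump(exclude_unset=True)

print(public)
print(changes)
```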
H3: transform the output with field_serializer
To customize the output format of a specific field, use @field_serializer.
from datetime import datetime

from pydantic import BaseModel, field_serializer


class Event(BaseModel):
    name: str
    starts_at: datetime

    @field_serializer("starts_at")
    def serialize_starts_at(self, value: datetime) -> str:
        # Return Unix epoch seconds to match the frontend's display convention
        return str(int(value.timestamp()))
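A short usage sketch of the same Event model, with a timezone-aware example date chosen only so the epoch value is deterministic: the serializer changes the output, while the attribute on the instance stays a datetime.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, field_serializer


class Event(BaseModel):
    name: str
    starts_at: datetime

    @field_serializer("starts_at")
    def serialize_starts_at(self, value: datetime) -> str:
        return str(int(value.timestamp()))


event = Event(name="launch", starts_at=datetime(2024, 1, 1, tzinfo=timezone.utc))

print(type(event.starts_at).__name__)  # the instance still holds a datetime
print(event.model_dump())
print(event.model_dump_json())
```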
H3: include a derived value in serialization with computed_field
When you want to include "a value computed from other fields" in the output, stack @computed_field with @property.
from pydantic import BaseModel, computed_field


class Box(BaseModel):
    width: float
    height: float
    depth: float

    @computed_field
    @property
    def volume(self) -> float:
        return self.width * self.height * self.depth


box = Box(width=2, height=3, depth=4)
box.model_dump()  # {'width': 2.0, 'height': 3.0, 'depth': 4.0, 'volume': 24.0}
A value declared with computed_field is included in model_dump()'s output and in the serialization-mode JSON Schema (marked readOnly: True). As the official documentation explicitly states, Pydantic applies no additional validation logic to a computed_field. It is purely a mechanism for "outputting a derived value."
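A minimal sketch of both points, reusing the Box model: the schema entry is read from model_json_schema(mode="serialization"), and a "volume" key in the input is not used to set the computed value (under the default configuration it is ignored as an extra key).

```python
from pydantic import BaseModel, computed_field


class Box(BaseModel):
    width: float
    height: float
    depth: float

    @computed_field
    @property
    def volume(self) -> float:
        return self.width * self.height * self.depth


schema = Box.model_json_schema(mode="serialization")
volume_schema = schema["properties"]["volume"]

# The bogus "volume" in the input does not override the derived value.
box = Box.model_validate({"width": 1, "height": 2, "depth": 3, "volume": 999})

print(volume_schema)
print(box.volume)
```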
Why is this superior?
Compute volume on the caller side each time, and the same formula scatters across multiple places, a DRY violation. computed_field confines that knowledge to the single place called the model and exposes it consistently as a serialization result. The data and its derivation logic cohere, and the reason for change consolidates to a single point.
4. strict mode and type coercion: the trade-off between safety and convenience
Pydantic by default does type coercion (the lax mode). It converts the string "123" to int 123, and "true" to bool True — this is the source of the convenience. But there are situations where this "cleverness" backfires.
from pydantic import BaseModel


class Order(BaseModel):
    quantity: int


# lax (default): the string is silently converted to int
Order.model_validate({"quantity": "5"})  # quantity=5 (it passes)
For a field where type strictness directly connects to business risk, like a payment amount or inventory count, this implicit conversion is a hotbed for bugs. strict mode disables type coercion and requires an exact type match.
from pydantic import BaseModel, ConfigDict, Field

# ① Strict for a single call (Order is the model from the previous example)
Order.model_validate({"quantity": "5"}, strict=True)
# raises ValidationError: a str is not accepted as an int


# ② Strict for a single field
class StrictOrder(BaseModel):
    quantity: int = Field(strict=True)
    note: str  # this one stays lax


# ③ Strict for the whole model
class FullyStrictOrder(BaseModel):
    model_config = ConfigDict(strict=True)

    quantity: int
    amount: int
Let me organize strict's behavior.
| Input | lax (default) | strict |
|---|---|---|
| {"quantity": "5"} (string) | Converted to 5 | ValidationError |
| {"quantity": 5} (integer) | 5 | 5 |
| {"is_active": "true"} | Converted to True | ValidationError |
💡 Where to use strict: for input received from humans or loose clients at the API's outermost edge, it is realistic to stay lax, prioritize convenience, and leave the conversion to int to Pydantic. For service-internal domain models, and for places where implicit conversion can cause an incident, such as amounts and quantities, set strict=True per field. This split is a practical compromise that balances convenience and safety. Note that when validating JSON, even strict mode still converts values for which JSON has no native type, such as a date-time string.
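The JSON-mode note can be sketched as follows, with a hypothetical Payment model that is strict as a whole. The same date-time string is rejected when it arrives as a Python dict value but accepted when it arrives inside JSON text, while a quoted number is rejected in both cases.

```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict, ValidationError


class Payment(BaseModel):  # hypothetical model for illustration
    model_config = ConfigDict(strict=True)

    amount: int
    paid_at: datetime


def accepts(validate, data) -> bool:
    """Return True when validation succeeds, False on ValidationError."""
    try:
        validate(data)
        return True
    except ValidationError:
        return False


python_str_datetime = accepts(
    Payment.model_validate, {"amount": 100, "paid_at": "2024-01-01T00:00:00"}
)
json_str_datetime = accepts(
    Payment.model_validate_json, '{"amount": 100, "paid_at": "2024-01-01T00:00:00"}'
)
json_quoted_amount = accepts(
    Payment.model_validate_json, '{"amount": "100", "paid_at": "2024-01-01T00:00:00"}'
)

print(python_str_datetime, json_str_datetime, json_quoted_amount)
```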
5. Configuration management: realize 12-factor type-safely with pydantic-settings
Code that reads environment variables each time with os.environ["DATABASE_URL"] carries a triple burden: every value is typed as str, nothing guarantees the variable exists, and default values end up scattered. pydantic-settings consolidates configuration into a single typed model and auto-loads it from environment variables.
⚠️ Note it's a separate package: in v2, BaseSettings was moved out of the main library into a separate package. pip install pydantic-settings is needed, and the import is from pydantic_settings import ... (from pydantic import BaseSettings is the v1 way of writing and doesn't work in v2).
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Read .env and map fields to environment variables carrying the APP_ prefix
    model_config = SettingsConfigDict(
        env_file=".env",
        env_prefix="APP_",
        case_sensitive=False,
    )

    database_url: str  # APP_DATABASE_URL (required)
    debug: bool = False  # APP_DEBUG (type coercion turns "1" into True)
    allowed_hosts: list[str] = Field(default_factory=list)  # parsed as JSON
    max_connections: int = Field(default=10, gt=0)


# Create once at application startup. If a required setting is missing, this fails immediately.
settings = Settings()
The points are, per the official spec, as follows.
- A missing required field becomes a ValidationError at startup, so you can detect a configuration mistake before deploy rather than "noticing it for the first time in production."
- debug: bool converts an environment-variable string like "1" / "true" to bool by type coercion.
- Complex types like list / dict parse the environment variable as JSON (APP_ALLOWED_HOSTS='["a.com","b.com"]').
- Prevent name collisions with env_prefix, and with env_nested_delimiter (e.g., __) you can express nested configuration in FOO__BAR form.
Why is this superior?
By consolidating configuration into a typed model, the application can only start in a state where "the configuration is complete" (Fail Fast). settings.max_connections is statically known to be int, and a typo like settings.databse_url is detected by the type checker. This is the standard way to realize the 12-factor App principle "store config in the environment" while keeping type safety and never hardcoding secrets. Secrets aren't written in code but flow into this model via environment variables, which is completely consistent with CLAUDE.md's security principles too.
6. v1 → v2 migration: a quick-reference of the changes
Encounter an existing v1 codebase, or v1-style code that generative AI tends to output, and you can mechanically replace it with the following correspondence table. Let me organize the main renames listed in the official migration guide.
| v1 (old) | v2 (new) | Category |
|---|---|---|
| @validator | @field_validator | Single-field validation |
| @root_validator | @model_validator | Whole-model / cross-field validation |
| .dict() | .model_dump() | Serialize to dict |
| .json() | .model_dump_json() | Serialize to JSON string |
| .copy() | .model_copy() | Duplicate an instance |
| .construct() | .model_construct() | Generation without validation |
| .parse_obj() | .model_validate() | Validate-generate from dict / object |
| .parse_raw() | .model_validate_json() | Validate-generate from JSON string |
| class Config: | model_config = ConfigDict(...) | Model configuration |
| .update_forward_refs() | .model_rebuild() | Resolve forward references |
| __fields__ | model_fields | Reference field metadata |
| from pydantic import BaseSettings | from pydantic_settings import BaseSettings | Settings (separated into a package) |
| .from_orm(obj) | .model_validate(obj, ...) (from_attributes=True) | Generation from an ORM object |
💡 The crux of migration: v2's methods consistently carry the model_ prefix. This is a design decision to avoid name collisions with user-defined fields, as in "the User model wants to have a business method named dict()." When converting @validator → @field_validator, don't forget to add @classmethod and make mode= explicit. For bulk conversion, you can also use the officially provided migration-support tool (bump-pydantic).
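As a rough sketch of what a migrated model looks like, the code below is written in v2 style with the v1 spelling noted in comments. UserRow is a hypothetical dataclass standing in for an ORM row, and strip_name is an illustrative validator.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ConfigDict, field_validator


@dataclass
class UserRow:  # hypothetical stand-in for an ORM row
    id: int
    name: str


class UserOut(BaseModel):
    # v1: class Config: orm_mode = True
    model_config = ConfigDict(from_attributes=True)

    id: int
    name: str

    # v1: @validator("name")
    @field_validator("name", mode="after")
    @classmethod
    def strip_name(cls, value: str) -> str:
        return value.strip()


user = UserOut.model_validate(UserRow(id=1, name=" alice "))  # v1: UserOut.from_orm(...)
data = user.model_dump()                                       # v1: user.dict()
clone = user.model_copy(update={"name": "bob"})                # v1: user.copy(update=...)

print(data)
print(clone)
```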
H3: TypeAdapter that validates non-model types
A convenient mechanism that didn't exist in v1 is TypeAdapter. It can validate/serialize, on the spot, a type like list[int] or dict that doesn't warrant defining a BaseModel.
from pydantic import TypeAdapter

# Validate list[int] without defining a BaseModel
adapter = TypeAdapter(list[int])
adapter.validate_python(["1", "2", "3"])  # [1, 2, 3] (each element is coerced)
adapter.validate_json("[1, 2, 3]")        # [1, 2, 3]
adapter.dump_json([1, 2, 3])              # b'[1,2,3]' (note that it returns bytes)
In cases like an external API returning "an array of user objects" at the top level, where the root element is not a model, TypeAdapter(list[User]) shows its power.
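A minimal sketch of that top-level-array case, with a made-up response body. When an element is invalid, the error location includes the list index, which makes it easy to tell which item failed.

```python
from pydantic import BaseModel, TypeAdapter, ValidationError


class User(BaseModel):
    id: int
    name: str


users_adapter = TypeAdapter(list[User])

# A top-level JSON array, as an external API might return it.
users = users_adapter.validate_json('[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]')

try:
    users_adapter.validate_python([{"id": 1, "name": "alice"}, {"name": "bob"}])
    bad_loc = ()
except ValidationError as exc:
    bad_loc = exc.errors()[0]["loc"]  # (list index, field name)

print([u.name for u in users])
print(bad_loc)
```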
Conclusion: make boundary validation "part of the type system"
Pydantic v2 is a modern validation library with the Rust-made pydantic-core at its core, deeply integrated with type annotations. Let me restate this article's key points.
- With BaseModel + Field(), declaratively define "the shape of correct data," and boundary-validate with model_validate().
- Consolidate business rules into @field_validator (single) / @model_validator (cross-field) and guarantee the model's invariants.
- Safely control the outward shape with model_dump() / model_dump_json() / field_serializer / computed_field.
- Apply strict mode to places where accidents aren't permitted, like amount/quantity, using convenience and safety by distinction.
- With pydantic-settings, consolidate configuration type-safely, achieving both Fail Fast and not hardcoding secrets.
- With the v1 → v2 quick-reference, mechanically migrate to the model_-prefix API, and validate non-model types too with TypeAdapter.
The difference between "code that works" and "code you can operate for 10 years" lies in the accumulation of boundary design of where and how you dam up untrustworthy data. Pydantic is the best tool to declaratively express that boundary as part of the type system.
For further exploration, I recommend re-reading the following from the official documentation, with this article's design viewpoint in mind.
- Models
- Fields
- Validators
- Serialization
- Configuration
- Strict Mode
- Settings Management
- Migration Guide
Consultation on type-safe backend design
The author has implemented and operated the discipline explained here of "always validate external input at the system boundary" in the production environment of a B2B SaaS that won the METI Minister's Award (as boundary validation with Marshmallow 3). In a FastAPI-based stack, Pydantic v2 plays that role. I build, fast and at high quality leveraging generative AI, foundations directly tied to business reliability — type-safe input validation, configuration management, API schema design, and boundary defense of external integration. Feel free to consult us about backend development with Python and making existing systems type-safe.