Introduction: Pydantic is "fast," but you can still lose that speed
Pydantic v2 became dramatically faster than v1 by rewriting pydantic-core, its core validation engine, in Rust. In many applications, validation is no longer a bottleneck. That is exactly why optimization has to start with measurement. Reaching for model_construct in an admin-panel API that handles only a few requests per second is a typical premature optimization (a YAGNI violation): it sacrifices readability and safety and gains nothing in return.
This article targets the cases where measurement has shown that validation really is on the hot path.
- High-throughput APIs: validating thousands of requests per second at the boundary
- Batch/ETL processing: bulk-ingesting hundreds of thousands of records
- Huge JSON payloads: parsing external API responses and large arrays
- Startup-sensitive environments: serverless cold starts, where the cost of building models is felt
This article stays faithful to Pydantic's official performance guide but goes one step further, using real code to lay out which optimization to use, why it works, and where it belongs. Pydantic's basics (BaseModel / Field / validators / serialization) are covered in the Pydantic v2 practical guide. As its sequel, this article narrows the focus to the practical techniques for the next stage: once you can write it correctly, write it fast.
⚠️ Don't take numbers at face value: the web is full of claims like "N× faster," but performance depends heavily on the schema, the data, and the hardware. The only multipliers cited in this article are benchmarks stated explicitly in the official documentation; everything else is described qualitatively as "tends to be faster." Adopt a technique only after confirming it on your own workload with timeit or pytest-benchmark.
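As a starting point for that kind of measurement, here is a minimal timeit sketch. The Event model, the PAYLOAD sample, and the best_of helper are hypothetical names introduced only for illustration; substitute your own model and representative data before drawing any conclusion.

```python
import timeit

from pydantic import BaseModel


class Event(BaseModel):
    # Hypothetical model; replace it with the one you suspect is hot.
    id: int
    name: str


# Hypothetical sample input; real measurements need representative data.
PAYLOAD = {"id": 1, "name": "signup"}


def validate_one() -> Event:
    return Event.model_validate(PAYLOAD)


def best_of(func, number: int = 10_000, repeat: int = 5) -> float:
    """Return the best observed time per call, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


if __name__ == "__main__":
    print(f"model_validate: {best_of(validate_one) * 1e6:.2f} µs per call")
```

Taking the minimum over several repeats reduces noise from other processes. The absolute number matters less than how it changes before and after an optimization.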
1. Generate TypeAdapter "once" and reuse it
TypeAdapter is a convenient way to validate and serialize types such as list[int] or dict[str, User] directly, without defining a BaseModel. But there is a pitfall: every time a TypeAdapter is created, it builds a new validator and serializer internally. That is by no means a cheap operation.
from pydantic import TypeAdapter
# ❌ Anti-pattern: a new TypeAdapter is built every time the function is called
def parse_ids(raw: bytes) -> list[int]:
    adapter = TypeAdapter(list[int])  # pays the schema-build cost on every call
    return adapter.validate_json(raw)
The official documentation clearly warns about this point.
Each time a TypeAdapter is instantiated, it will construct a new validator and serializer. If you're using a TypeAdapter in a function, it will be instantiated each time the function is called. Instead, instantiate it once, and reuse it.
The correct approach is to create it once at module scope and reuse it.
from pydantic import TypeAdapter
# ✅ Built once at module scope, then reused
_IDS_ADAPTER = TypeAdapter(list[int])
def parse_ids(raw: bytes) -> list[int]:
    return _IDS_ADAPTER.validate_json(raw)
Why does this work?
Building the schema means analyzing the shape of the type and assembling a validator, and doing that once is enough. Repeating it per request is as wasteful as throwing away a compilation result every time. When the adapter is reused, only the cost of validation itself remains on the hot path. BaseModel already follows this principle: Model.model_validate(...) reuses the validator built at class-definition time, so the problem does not arise there. It only becomes a problem when a TypeAdapter is created inside a function.
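When the type is only known at call time, a module-level constant is not always practical. One possible workaround, sketched below, is to memoize adapters by type. Note that adapter_for is a hypothetical helper and not a Pydantic API, and it assumes the type passed in is hashable (which holds for aliases such as list[int]).

```python
from functools import cache
from typing import Any

from pydantic import TypeAdapter


@cache
def adapter_for(tp: Any) -> TypeAdapter:
    """Build the TypeAdapter for `tp` on first use, then return the cached one.

    Hypothetical helper, not part of Pydantic. Requires `tp` to be hashable.
    """
    return TypeAdapter(tp)


def parse_ids(raw: bytes) -> list[int]:
    # The schema is built only on the first call; later calls reuse it.
    return adapter_for(list[int]).validate_json(raw)
```

The cache is unbounded, so this only makes sense when the set of types is small and fixed. For a single known type, the module-level constant remains the simpler choice.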
2. Fuse-parse JSON with model_validate_json
When validating JSON that arrives from outside, it is tempting to write this.
import json
from pydantic import BaseModel
class Event(BaseModel):
    id: int
    name: str
# ❌ Double work: parse the JSON in Python first, then validate
raw = '{"id": 1, "name": "signup"}'
event = Event.model_validate(json.loads(raw))
Written this way, the work is split into separate steps: ① parse the JSON string in Python → ② build the result as a dict of Python objects → ③ validate that dict. Pydantic v2 has a dedicated method that performs ①②③ in a single pass on the Rust side.
# ✅ Fused parsing: parsing and validation happen together inside pydantic-core
event = Event.model_validate_json(raw)
The official documentation explains the difference between the two as follows.
On model_validate(json.loads(...)), the JSON is parsed in Python, then converted to a dict, then it's validated internally. On the other hand, model_validate_json() already performs the validation internally.
In other words, because model_validate_json validates directly without building the intermediate dict, it pays off especially for large payloads. TypeAdapter has the equivalent validate_json (used in the example in chapter 1).
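For a large array of records, the two ideas combine naturally: a reused TypeAdapter plus validate_json. The following is a small sketch; Event, _EVENTS_ADAPTER, and parse_events are illustrative names, and the inline bytes stand in for a real response body.

```python
from pydantic import BaseModel, TypeAdapter


class Event(BaseModel):
    id: int
    name: str


# Built once at module scope, as described in chapter 1.
_EVENTS_ADAPTER = TypeAdapter(list[Event])


def parse_events(raw: bytes) -> list[Event]:
    # Parsing and validation both happen inside pydantic-core;
    # no intermediate list of dicts is created in Python.
    return _EVENTS_ADAPTER.validate_json(raw)


events = parse_events(b'[{"id": 1, "name": "signup"}, {"id": 2, "name": "login"}]')
```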
⚠️ The only exception: before/wrap validators. The official documentation notes that if the model has a before or wrap validator, the benefit of model_validate_json's fused parsing currently shrinks and it may even be slower (an area where future improvements on the pydantic-core side are anticipated). Choosing validators is covered in detail in chapter 5.
The same holds in the reverse direction (serialization): going through a Python dict and json.dumps is slower than producing JSON directly with model_dump_json(). Note that BaseModel.model_dump_json() returns str, whereas TypeAdapter.dump_json() returns bytes (which can be written straight to a socket or file, but needs .decode() before string concatenation).
3. Make a Union a "discriminated union"
If a field that can take several types is written as a plain Union, Pydantic cannot tell in advance which member the input matches, so it has to attempt validation against the members (in the default smart mode, potentially every one of them). The more members there are and the larger each model is, the more this "brute-force" cost grows.
from typing import Literal, Union
from pydantic import BaseModel, Field
class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows_per_day: int
class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks_per_day: int
class Owner(BaseModel):
    # ✅ With a discriminator, pet_type alone selects the right member
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")
Owner.model_validate({"pet": {"pet_type": "cat", "meows_per_day": 30}})
# → pet=Cat(...): settled without attempting Dog validation
Give each member a common discriminator field (a Literal type) and name it in Field(discriminator=...). Pydantic can then determine which member to validate from the discriminator value alone. The official recommendation is clear.
In general, we recommend using discriminated unions. They are both more performant and more predictable than untagged unions.
When the name of the discriminator field differs between members, or when you want to dispatch on the input's type ("a model if it's a dict, an int if it's an int"), pass a discriminator function to Discriminator and label each member with Tag.
from typing import Annotated, Any, Literal, Optional, Union
from pydantic import BaseModel, Discriminator, Tag
class ApplePie(BaseModel):
    fruit: Literal["apple"]
class PumpkinPie(BaseModel):
    filling: Literal["pumpkin"]  # the discriminating key's name differs from ApplePie's
def discriminate(v: Any) -> Optional[str]:
    # Input may be a raw dict (validation) or a model instance (serialization)
    if isinstance(v, dict):
        return v.get("fruit", v.get("filling"))
    return getattr(v, "fruit", getattr(v, "filling", None))
class Dinner(BaseModel):
    dessert: Annotated[
        Union[
            Annotated[ApplePie, Tag("apple")],
            Annotated[PumpkinPie, Tag("pumpkin")],
        ],
        Discriminator(discriminate),
    ]
Why does this work?
A discriminated union turns validation from "brute force" into O(1) dispatch. It improves error messages as well as performance: when an untagged Union fails, it lists the errors of every candidate ("the input matched none of the members"), whereas a discriminated union can point precisely at the problem ("pet_type is 'cat', but meows_per_day is invalid"). It is one of the few "free-ride" optimizations that deliver speed and diagnosability together. The deeper design of discriminated unions is covered in the Pydantic advanced-types / custom-validators practical guide.
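The difference in diagnostics can be observed directly. The sketch below validates the same bad input against an untagged and a tagged version of the union and collects the error locations; UntaggedOwner, TaggedOwner, and error_locations are hypothetical names used only for this comparison.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows_per_day: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks_per_day: int


class UntaggedOwner(BaseModel):
    pet: Union[Cat, Dog]


class TaggedOwner(BaseModel):
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


def error_locations(model: type[BaseModel], data: dict) -> list[tuple]:
    """Return the `loc` of every validation error raised for `data`."""
    try:
        model.model_validate(data)
    except ValidationError as exc:
        return [err["loc"] for err in exc.errors()]
    return []


bad = {"pet": {"pet_type": "cat", "meows_per_day": "many"}}
untagged_locs = error_locations(UntaggedOwner, bad)  # errors for Cat and for Dog
tagged_locs = error_locations(TaggedOwner, bad)      # only the Cat field that failed
```

Printing the two lists side by side on your own models is a quick way to see how much noise the untagged form adds to an error report.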
4. Write type hints "concretely"
Pydantic converts type annotations directly into a validation strategy. An abstract type therefore costs extra checks, while a concrete type gets a direct, fast path.
H3: list / dict over Sequence / Mapping
from collections.abc import Sequence
from pydantic import BaseModel
class Slow(BaseModel):
    items: Sequence[int]  # ❌ could be a list or a tuple → several types are tried
class Fast(BaseModel):
    items: list[int]  # ✅ known to be a list → dedicated fast path
The official explanation is as follows.
When using Sequence, Pydantic calls isinstance(value, Sequence) to check if the value is a sequence. Also, Pydantic will try to validate against different types of sequences, like list and tuple. If you know the value is a list or tuple, use list or tuple instead of Sequence.
The same reasoning applies to Mapping vs dict. If you know the value is a list, write list: it is a zero-cost optimization that also helps readability.
H3: TypedDict over a nested model
For a pure data structure where "I want to validate but don't need behavior (methods or properties)," you can use TypedDict instead of a nested BaseModel.
from typing_extensions import TypedDict  # Pydantic requires this instead of typing.TypedDict on Python < 3.12
from pydantic import BaseModel
class AddressTD(TypedDict):
    city: str
    zipcode: str
class User(BaseModel):
    name: str
    address: AddressTD  # ✅ lighter than nesting a BaseModel
The official documentation gives a concrete number.
With a simple benchmark, TypedDict is about ~2.5x faster than nested models.
A BaseModel instance carries real functionality (model_dump, computed_field, methods, and so on), so creating one has overhead. If the child elements don't need any of that, slimming them down to a TypedDict is the standard move.
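What you give up is worth seeing concretely: the nested value is still validated, but it comes back as a plain dict rather than a model instance. The sketch below assumes typing_extensions is available (it is installed as a Pydantic dependency); the sample data is made up.

```python
from typing_extensions import TypedDict

from pydantic import BaseModel, ValidationError


class AddressTD(TypedDict):
    city: str
    zipcode: str


class User(BaseModel):
    name: str
    address: AddressTD


user = User.model_validate(
    {"name": "alice", "address": {"city": "Tokyo", "zipcode": "100-0001"}}
)

# Access is by key, not by attribute: the nested value is a plain dict.
city = user.address["city"]

# Validation still applies to the TypedDict's keys.
missing_loc: tuple = ()
try:
    User.model_validate({"name": "bob", "address": {"city": "Osaka"}})
except ValidationError as exc:
    missing_loc = exc.errors()[0]["loc"]
```

Code that previously relied on attribute access or on methods of the nested model has to change accordingly, which is part of the cost of this optimization.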
H3: FailFast if you don't need all errors
When validating a sequence, if failing as soon as a single item is broken is acceptable, FailFast stops validation at the first error.
from typing import Annotated
from pydantic import FailFast, TypeAdapter
_ADAPTER = TypeAdapter(Annotated[list[int], FailFast()])
_ADAPTER.validate_python([1, "x", 3])  # raises ValidationError at "x"; 3 is never validated
⚠️ Trade-off: as the official documentation notes, with FailFast you won't get validation errors for the remaining items once one fails, so you are trading visibility for performance. Don't use it for form validation, where you want to return every error to the user; limit it to cases such as batch ingestion, where a row is discarded as soon as a single broken item is found.
5. How to choose validators: avoid wrap and leave it to the core
Custom validators are powerful, but performance differs greatly by mode. The most flexible mode, wrap (where you control what happens before and after validation yourself), is also the heaviest. The official documentation puts it this way:
Wrap validators are generally slower than other validators. This is because they require that data is materialized in Python during validation.
"Materializing in Python" means paying to lift data that could have stayed on the Rust side into Python objects so it can be handed to your function. On a hot path, that cost is not negligible.
from typing import Annotated, Any
from pydantic import BaseModel, BeforeValidator
# ❌ A before validator redoing type coercion that pydantic-core already does by default
def to_int(v: Any) -> int:
    return int(v)
class Slow(BaseModel):
    count: Annotated[int, BeforeValidator(to_int)]
# ✅ Leave conversions like "123" → 123 to the core: faster, and fused parsing keeps its benefit
class Fast(BaseModel):
    count: int
The order of preference is as follows.
- First, consider whether pydantic-core's standard functionality is enough (type coercion, Field constraints, discriminated unions).
- If it is not, use a light after validator (checks or normalization on a value whose type is already guaranteed).
- Use before only when the input format has to be reshaped ahead of validation.
- Use wrap only when control on both sides of validation, such as catching exceptions or falling back, is absolutely required.
As mentioned in chapter 2, before and wrap validators also eat into the benefit of model_validate_json's fused parsing. "Don't rewrite in Python what the core can do" is the cost principle of validator design. For detailed guidance on when to use each validator, see the Pydantic advanced-types / custom-validators practical guide.
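The first two levels of that preference order might look like the sketch below: a range check expressed as a Field constraint, and a small after validator for a rule that has no built-in constraint. Account, RESERVED, and not_reserved are hypothetical names chosen for illustration.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, Field, ValidationError

RESERVED = frozenset({"admin", "root"})  # hypothetical business rule


def not_reserved(v: str) -> str:
    # Runs after core validation, so `v` is already guaranteed to be a str.
    if v.lower() in RESERVED:
        raise ValueError("reserved name")
    return v


class Account(BaseModel):
    # Level 1: constraints the core can check by itself.
    quota: int = Field(gt=0, le=1000)
    # Level 2: a light `after` validator on top of core constraints.
    username: Annotated[str, Field(min_length=3), AfterValidator(not_reserved)]


account = Account.model_validate_json('{"quota": 5, "username": "alice"}')
```

Neither field uses a before or wrap validator, so the model stays within the case where model_validate_json is expected to keep its advantage.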
6. Scenes where you can skip validation: model_construct and Any
H3: model_construct for validated data
When the data's source is already validated and trustworthy (e.g., just re-packing a row read from your own DB back into a model), validation is pure overhead. model_construct() completely skips validation and generates an instance.
# Use only with trusted (already validated) data
user = User.model_construct(id=1, name="alice")  # no validation runs
But the official warning is strongly worded.
model_construct() does not do any validation, meaning it can create models which are invalid. You should only ever use the model_construct() method with data which has already been validated.
⚠️ Misuse leads to accidents: a model created with model_construct can hold an invalid state (values pass through even when their types don't match). Furthermore, extra='forbid' is not enforced either (extra keys are silently ignored). Never use it at the boundary (external input); restrict it to reconstructing data that is guaranteed, internally, to have been validated already. Note that the official documentation also states that "in V2 the performance difference between validation and model_construct() has narrowed considerably," so keep in mind that the gain from giving up safety is often smaller than you expect.
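Both failure modes are easy to reproduce. The sketch below defines its own small User model with extra="forbid" purely for demonstration; the variable names are illustrative.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class User(BaseModel):
    model_config = ConfigDict(extra="forbid")

    id: int
    name: str


# Normal construction validates, so the bad value is rejected.
try:
    User(id="not-an-int", name="alice")
    rejected = False
except ValidationError:
    rejected = True

# model_construct() stores whatever it is given, with no checks at all.
broken = User.model_construct(id="not-an-int", name="alice")

# extra="forbid" is not enforced either: no error is raised for the unknown key.
quiet = User.model_construct(id=1, name="alice", unexpected=True)
```

The broken instance looks like any other User to downstream code, which is why the failure tends to surface far from where the bad data entered.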
H3: Any if you really pass anything
For a field that needs no validation at all (like holding arbitrary JSON as-is), making it Any makes Pydantic skip that field's validation.
from typing import Any
from pydantic import BaseModel
class Webhook(BaseModel):
    event_id: str
    payload: Any  # contents are not validated (to be validated against a typed model later)
But this is a deliberate decision to give up type safety at exactly one point. At the stage where payload is actually used, the right thing to do is to validate it again with an appropriate model or TypeAdapter.
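One way to structure that deferred validation is sketched below. SignupPayload and handle_signup are hypothetical; in a real system the payload model would typically be selected from the event type.

```python
from typing import Any

from pydantic import BaseModel


class Webhook(BaseModel):
    event_id: str
    payload: Any  # accepted as-is at the boundary


class SignupPayload(BaseModel):
    # Hypothetical payload shape, used only for illustration.
    user_id: int
    plan: str


def handle_signup(webhook: Webhook) -> SignupPayload:
    # Type safety is restored here, at the point where the payload is used.
    return SignupPayload.model_validate(webhook.payload)


hook = Webhook.model_validate_json(
    '{"event_id": "evt_1", "payload": {"user_id": 42, "plan": "pro"}}'
)
signup = handle_signup(hook)
```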
7. Startup cost and Config: defer_build / cache_strings / validate_default
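Finally, three settings that are configured on the model rather than at each call site. defer_build in particular acts at startup and model-construction time rather than per request, which matters for serverless cold starts and for apps that define many models.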
| Setting | Default | Effect | Where to use |
|---|---|---|---|
| defer_build=True | False | Defer the model's validator/serializer construction until the first validation | When the model is only used nested inside other models, or you don't want to build every model at startup |
| cache_strings | True | Cache strings during validation and avoid creating new objects | Data in which the same strings appear frequently (enabled by default; generally leave it alone) |
| validate_default=True | False | Also validate default values | Leaving it off is faster, since defaults are not validated |
from pydantic import BaseModel, ConfigDict
class Nested(BaseModel):
    # Build is deferred: nothing is constructed for this model on its own
    # until it is first used (for example, from a parent model)
    model_config = ConfigDict(defer_build=True)
    value: int
On defer_build, the official ConfigDict API reference says the following.
Whether to defer model validator and serializer construction until the first model validation. ... This can be useful to avoid the overhead of building models which are only used nested within other models.
cache_strings is enabled by default and is described as "caches strings to avoid constructing new Python objects. This significantly improves validation performance, while increasing memory usage slightly." Leaving it at the default is fine. Keeping validate_default at its default of False likewise avoids validating default values unnecessarily.
💡 Note: defer_build, cache_strings, and validate_default are documented in the configuration (ConfigDict) API reference rather than being the focus of the official performance page (although cache_strings and "Any isn't validated" do also appear on the performance page). Don't overstate them as "always recommended by the official documentation for performance"; adopt them after measuring the effect on your own workload.
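The behavior behind validate_default and defer_build can be checked with a few lines. In this sketch, Lenient, Checked, and Deferred are hypothetical models, and the wrong-typed default is deliberate, to make the difference visible.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Lenient(BaseModel):
    # validate_default is False by default: the default is used unchecked.
    retries: int = "three"  # deliberately the wrong type, for illustration


class Checked(BaseModel):
    model_config = ConfigDict(validate_default=True)
    retries: int = "three"


class Deferred(BaseModel):
    # Validator/serializer construction is postponed until first use.
    model_config = ConfigDict(defer_build=True)
    value: int


lenient_value = Lenient().retries  # the unvalidated default comes through

try:
    Checked()
    default_rejected = False
except ValidationError:
    default_rejected = True

deferred = Deferred(value="1")  # first use triggers the deferred build
```

The Lenient case shows the other side of the speed gain: with validate_default left off, a wrong default is never caught by Pydantic, so it is worth covering defaults with tests or a type checker.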
Conclusion: optimize in the order "measure → official standard → re-measure"
Performance optimization in Pydantic v2 is not about clever hacks; it comes down to applying the officially documented practices correctly, on the hot paths where they matter. Here are the key points of this article once more.
- Create a TypeAdapter once and reuse it (don't re-create it inside a function).
- Fuse-parse JSON with model_validate_json (the exception being models with a before/wrap validator).
- Make a Union a discriminated union: the first choice, which the official documentation explicitly calls "more performant and more predictable."
- Make type hints concrete: list/dict over Sequence/Mapping, TypedDict for pure data (about 2.5x in the official benchmark), FailFast if you don't need all errors.
- Avoid wrap/before validators and don't rewrite in Python what the core can do.
- model_construct for validated data, Any for fields that need no validation, but only in limited places and with the safety trade-off understood.
- defer_build for startup cost, cache_strings for strings (enabled by default), and leave validate_default at False.
The most important principle is to always measure before and after optimizing. Use pytest-benchmark or timeit to identify the hot path, and confirm with numbers that the optimization you applied actually worked. This is exactly the same discipline as PostgreSQL tuning (see the PostgreSQL performance-tuning practical guide).
As official primary sources, I recommend re-reading the following from this article's viewpoint.
Consultation on high-throughput Python backends
The author designed and implemented the backend of a METI-Minister's-Award-winning B2B SaaS with Python / Flask / SQLAlchemy 2.0 / PostgreSQL 16, and has run the large data volumes of a multi-stage distribution flow in production. Validating types thoroughly at the boundary without sacrificing speed ties directly to both the reliability and the cost efficiency of a business. For high-throughput APIs using FastAPI / Pydantic v2, validation pipelines for batch/ETL, and cold-start optimization in serverless, I deliver measurement-based, down-to-earth performance improvements quickly and at high quality, leveraging generative AI. Please feel free to get in touch.