# Pydantic v2 performance optimization: use the Rust core to the fullest and speed up hot-path validation

> Staying faithful to the official Pydantic v2 documentation, this article explains, with real code, practical techniques for speeding up the validation hot path: reusing TypeAdapter, fused parsing with model_validate_json, discriminated unions, concrete type hints (list/TypedDict), avoiding wrap validators, model_construct, and defer_build/cache_strings.

- Published: 2026-06-26
- Author: 友田 陽大
- Tags: Python, Pydantic, performance, type safety, validation, architecture design
- URL: https://tomodahinata.com/en/blog/pydantic-v2-performance-optimization-guide
- Category: Pydantic & type-safe validation
- Pillar guide: https://tomodahinata.com/en/blog/pydantic-v2-production-validation-type-safety

## Key points

- Pydantic is fast by default thanks to its Rust core (pydantic-core), but careless usage gives that speed away. Limit optimization to hot paths (high-throughput APIs, batch validation, huge payloads).
- Create a TypeAdapter once and reuse it instead of re-creating it inside a function, and parse JSON in one fused step with model_validate_json (faster than json.loads followed by model_validate).
- Turn a Union into a discriminated union. The official documentation explicitly calls this first choice 'more performant and more predictable than untagged unions.'
- Make type hints concrete: prefer list/dict over Sequence/Mapping, and note that TypedDict is about 2.5x faster than a nested model (official benchmark). Wrap validators are slow because the data has to be 'materialized in Python.'
- For already-validated data, model_construct skips validation; defer_build trims startup cost; cache_strings speeds up string handling. But measure first, then optimize.

---

## **Introduction: Pydantic is "fast," but you can throw that speed away**

Pydantic v2 became dramatically faster than v1 by **rewriting its core validation engine, `pydantic-core`, in Rust.** In many applications validation is no longer a bottleneck. **That is exactly why optimization starts with measurement.** Reaching for `model_construct` in an admin-panel API that handles only a few requests per second is classic premature optimization (a YAGNI violation): it sacrifices readability and safety and gains nothing.

This article targets the cases where measurement has shown that **validation really is a hot path:**

- **High-throughput APIs**: validating thousands of requests per second at the boundary
- **Batch/ETL processing**: bulk-ingesting hundreds of thousands of records
- **Huge JSON payloads**: parsing external API responses and large arrays
- **Environments where startup time matters**: serverless (cold start), where the model-construction cost hits

This article stays faithful to Pydantic's [official performance guide](https://pydantic.dev/docs/validation/latest/concepts/performance/) while aiming to be one step clearer: it organizes, with real code, **which optimization to use, why it works, and where to apply it.** Pydantic's basics (`BaseModel` / `Field` / validators / serialization) are covered in the [Pydantic v2 practical guide](/blog/pydantic-v2-production-validation-type-safety). As its sequel, this article narrows in on the practical techniques for **writing it fast once you can write it correctly.**

> ⚠️ **Don't take numbers at face value**: the web is flooded with claims like "N× faster," but performance depends heavily on the schema, data, and hardware. The multipliers cited in this article come only from **benchmarks that the official documentation states explicitly;** everything else is described qualitatively as "tends to be faster." Adopt a technique only **after confirming it with `timeit` / `pytest-benchmark` on your own workload.**

---

## **1. Generate `TypeAdapter` "once" and reuse it**

`TypeAdapter` is a convenient way to validate and serialize a type such as `list[int]` or `dict[str, User]` on the spot, without defining a `BaseModel`. But **there's a pitfall:** every time a `TypeAdapter` is instantiated, it **builds a new** validator and serializer internally, and that is by no means cheap.

```python
from pydantic import TypeAdapter

# ❌ Anti-pattern: the TypeAdapter is rebuilt every time the function is called
def parse_ids(raw: bytes) -> list[int]:
    adapter = TypeAdapter(list[int])  # pays the schema-construction cost on every call
    return adapter.validate_json(raw)
```

The official documentation clearly warns about this point.

> *Each time a `TypeAdapter` is instantiated, it will construct a new validator and serializer. If you're using a `TypeAdapter` in a function, it will be instantiated each time the function is called. Instead, instantiate it once, and reuse it.*

The correct approach is to **create it once at module scope and reuse it.**

```python
from pydantic import TypeAdapter

# ✅ Built only once at module scope (and reused by every call)
_IDS_ADAPTER = TypeAdapter(list[int])


def parse_ids(raw: bytes) -> list[int]:
    return _IDS_ADAPTER.validate_json(raw)
```

**Why does this work?**
Schema construction analyzes the shape of the type and assembles a validator, so doing it once is enough. Repeating it per request is as wasteful as throwing away a compiled result every time. With reuse, only **the cost of validation itself** remains on the hot path. The same principle holds for `BaseModel`: `Model.model_validate(...)` reuses the validator built at class-definition time, so the problem doesn't arise there. It only appears **when you instantiate a `TypeAdapter` inside a function.**
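
To check this on your own machine, a minimal `timeit` sketch along the following lines may help. The function names `parse_rebuilding` and `parse_reusing` and the sample payload are made up for illustration, and the absolute numbers will differ by schema and hardware.

```python
import timeit

from pydantic import TypeAdapter

RAW = b"[1, 2, 3, 4, 5]"  # sample payload (an assumption for this sketch)

# Built once at import time and shared by every call.
_IDS_ADAPTER = TypeAdapter(list[int])


def parse_rebuilding(raw: bytes) -> list[int]:
    # Rebuilds the validator and serializer on every call.
    return TypeAdapter(list[int]).validate_json(raw)


def parse_reusing(raw: bytes) -> list[int]:
    # Only the validation itself runs here.
    return _IDS_ADAPTER.validate_json(raw)


if __name__ == "__main__":
    n = 500
    rebuilt = timeit.timeit(lambda: parse_rebuilding(RAW), number=n)
    reused = timeit.timeit(lambda: parse_reusing(RAW), number=n)
    print(f"rebuild per call: {rebuilt / n * 1e6:.1f} µs")
    print(f"reuse:            {reused / n * 1e6:.1f} µs")
```

Both functions return the same result; only the per-call cost differs, which is the point of the comparison.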

---

## **2. Fuse-parse JSON with `model_validate_json`**

When validating JSON that arrives from outside, it's tempting to write this:

```python
import json
from pydantic import BaseModel


class Event(BaseModel):
    id: int
    name: str


# ❌ Double work: parse the JSON in Python first, then validate
raw = '{"id": 1, "name": "signup"}'
event = Event.model_validate(json.loads(raw))
```

Written this way, the work splits into three steps: **① parse the JSON string in Python → ② build the result as a dict of Python objects → ③ validate it.** Pydantic v2 has a dedicated method that does all three **in a single pass** on the Rust side.

```python
# ✅ Fused parsing: pydantic-core parses and validates in one go
event = Event.model_validate_json(raw)
```

The official documentation explains the difference as follows.

> *On `model_validate(json.loads(...))`, the JSON is parsed in Python, then converted to a dict, then it's validated internally. On the other hand, `model_validate_json()` already performs the validation internally.*

In other words, `model_validate_json` **validates directly without building an intermediate dict,** which pays off especially on large payloads. `TypeAdapter` has an equivalent `validate_json` (used in the example in chapter 1).

> ⚠️ **The one exception: `before` / `wrap` validators**: the official documentation notes that if the model has a `before` or `wrap` validator, the benefit of fused parsing in `model_validate_json` currently shrinks and it can even end up slower (an area where future improvements in pydantic-core are expected). Choosing validators is covered in chapter 5.

**The reverse direction (serialization) works the same way.** Going through a Python dict and `json.dumps` is slower than producing JSON directly with `model_dump_json()`. Note that `BaseModel.model_dump_json()` returns `str`, whereas **`TypeAdapter.dump_json()` returns `bytes`** (it can be written straight to a network socket or file, but string concatenation requires `.decode()`).
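
As a small illustration of the serialization side, the sketch below reuses the `Event` model from above and shows the two return types next to the slower dict route. The variable names and the `_EVENTS_ADAPTER` adapter are assumptions introduced for this example.

```python
import json

from pydantic import BaseModel, TypeAdapter


class Event(BaseModel):
    id: int
    name: str


events = [Event(id=1, name="signup"), Event(id=2, name="login")]

# Slower route: build Python dicts first, then encode them with json.dumps.
via_dict = json.dumps([e.model_dump() for e in events])

# Direct route: pydantic-core writes the JSON itself.
single = events[0].model_dump_json()  # str

_EVENTS_ADAPTER = TypeAdapter(list[Event])
many = _EVENTS_ADAPTER.dump_json(events)  # bytes

print(type(single).__name__, type(many).__name__)
print(json.loads(many) == json.loads(via_dict))
```

If the bytes need to be embedded in a larger string, `many.decode()` converts them; when writing to a socket or a binary file, the bytes can be used as they are.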

---

## **3. Make a Union a "discriminated union"**

If you declare a field that can take several types as a plain `Union`, Pydantic **has to try the members one after another because it can't tell in advance which one matches** (the default smart mode). The more members there are and the larger each model is, the more this brute-force cost grows.

```python
from typing import Literal, Union
from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows_per_day: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks_per_day: int


class Owner(BaseModel):
    # ✅ With a discriminator, Pydantic reads pet_type and picks the right member directly
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


Owner.model_validate({"pet": {"pet_type": "cat", "meows_per_day": 30}})
# → pet=Cat(...): settled without ever trying to validate Dog
```

Give each member **a shared discriminator field (a `Literal` type)** and name it with `Field(discriminator=...)`. Pydantic can then **determine the member to validate unambiguously** just by looking at that field's value. The official recommendation is clear.

> *In general, we recommend using discriminated unions. They are both more performant and more predictable than untagged unions.*

When the discriminator field's **name differs** between members, or when you want to **dispatch by type** (for example, "a model if it's a dict, an int if it's an int"), pass a discriminator function to `Discriminator` and label each member with `Tag`.

```python
from typing import Annotated, Any, Literal, Optional, Union
from pydantic import BaseModel, Discriminator, Tag


class ApplePie(BaseModel):
    fruit: Literal["apple"]


class PumpkinPie(BaseModel):
    filling: Literal["pumpkin"]  # the discriminator key is named differently from ApplePie's


def discriminate(v: Any) -> Optional[str]:
    if isinstance(v, dict):
        return v.get("fruit", v.get("filling"))
    return getattr(v, "fruit", getattr(v, "filling", None))


class Dinner(BaseModel):
    dessert: Annotated[
        Union[
            Annotated[ApplePie, Tag("apple")],
            Annotated[PumpkinPie, Tag("pumpkin")],
        ],
        Discriminator(discriminate),
    ]
```

**Why does this work?**
A discriminated union turns validation from brute force into **O(1) dispatch.** It improves not only performance but also **the error messages:** when an untagged union fails, it lists the errors of every candidate because no member matched, whereas a discriminated union can report precisely that "`pet_type='cat'` but `meows_per_day` is invalid." It's one of the few free-lunch optimizations that gives you speed and diagnosability at the same time. The deeper design of discriminated unions is covered in the [Pydantic advanced-types / custom-validators practical guide](/blog/pydantic-custom-types-annotated-validators-advanced-guide).
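
The difference in error reporting can be observed with a sketch like the one below. `UntaggedOwner`, `TaggedOwner`, and the helper `error_locations` are names invented for this example; the `Cat` and `Dog` models are the ones defined earlier.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows_per_day: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks_per_day: int


class UntaggedOwner(BaseModel):
    pet: Union[Cat, Dog]  # plain union: every member is a candidate


class TaggedOwner(BaseModel):
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


def error_locations(model: type[BaseModel], data: dict) -> list[tuple]:
    """Return the `loc` of every validation error, or [] if the data is valid."""
    try:
        model.model_validate(data)
    except ValidationError as exc:
        return [err["loc"] for err in exc.errors()]
    return []


# A cat whose meows_per_day cannot be parsed as an int.
bad = {"pet": {"pet_type": "cat", "meows_per_day": "many"}}

untagged = error_locations(UntaggedOwner, bad)
tagged = error_locations(TaggedOwner, bad)

print(len(untagged), untagged)  # errors from both Cat and Dog
print(len(tagged), tagged)      # only the error inside the selected member
```

The untagged version reports problems for `Dog` as well, even though the input was clearly meant to be a cat; the discriminated version reports only the field that is actually wrong.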

---

## **4. Write type hints "concretely"**

Pydantic translates type annotations directly into a validation strategy. **An abstract type forces extra checks at runtime,** while **a concrete type lets the core take a dedicated fast path.**

### **`list` / `dict` over `Sequence` / `Mapping`**

```python
from collections.abc import Sequence
from pydantic import BaseModel


class Slow(BaseModel):
    items: Sequence[int]  # ❌ could be a list or a tuple → several types are tried


class Fast(BaseModel):
    items: list[int]      # ✅ known to be a list → dedicated fast path
```

The official documentation explains it like this.

> *When using `Sequence`, Pydantic calls `isinstance(value, Sequence)` to check if the value is a sequence. Also, Pydantic will try to validate against different types of sequences, like `list` and `tuple`. If you know the value is a `list` or `tuple`, use `list` or `tuple` instead of `Sequence`.*

The same reasoning applies to `Mapping` vs `dict`. **If you know the value is a `list`, write `list`:** it's a zero-cost optimization that also helps readability.

### **`TypedDict` over a nested model**

For a pure data structure that you want validated but that needs no behavior (methods or properties), you can use a `TypedDict` instead of a nested `BaseModel`.

```python
from typing_extensions import TypedDict  # Pydantic requires this instead of typing.TypedDict on Python < 3.12
from pydantic import BaseModel


class AddressTD(TypedDict):
    city: str
    zipcode: str


class User(BaseModel):
    name: str
    address: AddressTD  # ✅ lighter than nesting a BaseModel
```

The official documentation gives a concrete number.

> *With a simple benchmark, `TypedDict` is about ~2.5x faster than nested models.*

A `BaseModel` carries full instance functionality (`model_dump`, `computed_field`, methods, and so on), so creating one has overhead. **If the child elements don't need any of that, slimming them down to a `TypedDict`** is the standard move.
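
What "no instance functionality" means in practice can be seen in a side-by-side sketch such as the following. `AddressModel`, `UserWithTypedDict`, and `UserWithModel` are illustrative names, and `TypedDict` is imported from `typing_extensions` so the example also runs on Python versions before 3.12.

```python
from typing_extensions import TypedDict

from pydantic import BaseModel


class AddressTD(TypedDict):
    city: str
    zipcode: str


class AddressModel(BaseModel):
    city: str
    zipcode: str


class UserWithTypedDict(BaseModel):
    name: str
    address: AddressTD


class UserWithModel(BaseModel):
    name: str
    address: AddressModel


raw = '{"name": "alice", "address": {"city": "Tokyo", "zipcode": "100-0001"}}'

light = UserWithTypedDict.model_validate_json(raw)
heavy = UserWithModel.model_validate_json(raw)

# The TypedDict field is validated but stays a plain dict: no instance, no methods.
print(type(light.address).__name__, light.address["city"])
# The nested model is a full BaseModel instance with attribute access.
print(type(heavy.address).__name__, heavy.address.city)
```

The validation rules are identical in both cases; the trade-off is key access on a dict versus attribute access and methods on a model instance.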

### **`FailFast` if you don't need all errors**

When it's acceptable for sequence validation to "**fail immediately as soon as one item is broken**," `FailFast` lets you stop at the first error.

```python
from typing import Annotated
from pydantic import FailFast, TypeAdapter

_ADAPTER = TypeAdapter(Annotated[list[int], FailFast()])
_ADAPTER.validate_python([1, "x", 3])  # raises ValidationError at "x" right away (3 is never validated)
```

> ⚠️ **Trade-off**: as the official documentation notes, with `FailFast` you won't get validation errors for the rest of the items once one fails, so you trade visibility for performance. Don't use it for form validation, where you want to return every error to the user; limit it to cases such as batch ingestion where a single broken row is enough to discard the input.

---

## **5. How to choose validators: avoid `wrap` and leave it to the core**

Custom validators are powerful, but **performance varies greatly by mode.** `wrap`, the most flexible mode (you control what happens before and after validation yourself), is also the heaviest.

> *Wrap validators are generally slower than other validators. This is because they require that data is materialized in Python during validation.*

"Materialized in Python" means that data which could have been handled entirely on the Rust side is deliberately lifted into Python objects and handed over. On a hot path, that cost is not negligible.

```python
from typing import Annotated, Any
from pydantic import BaseModel, BeforeValidator


# ❌ Re-implementing in a before validator the type coercion that pydantic-core already does
def to_int(v: Any) -> int:
    return int(v)


class Slow(BaseModel):
    count: Annotated[int, BeforeValidator(to_int)]


# ✅ Leave conversions like "123" → 123 to the core: it's fast and keeps the benefit of fused parsing
class Fast(BaseModel):
    count: int
```

**The order of preference is as follows.**

1. **First check whether the standard functionality of `pydantic-core` is enough** (type coercion, `Field` constraints, discriminated unions).
2. If it isn't, use a light `after` validator (validation or normalization of a value whose type is already guaranteed).
3. Use `before` only when the input format needs pre-shaping.
4. **Use `wrap` only when you truly need control on both sides of validation, such as catching exceptions or falling back.**

As noted in chapter 2, `before` / `wrap` also eat into the benefit of fused parsing in `model_validate_json`. **"Don't rewrite in Python what the core can do"** is the cost principle of validator design. For a detailed guide to choosing between validators, see the [Pydantic advanced-types / custom-validators practical guide](/blog/pydantic-custom-types-annotated-validators-advanced-guide).
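
As a rough illustration of the first two steps in that order, the sketch below keeps a numeric constraint inside the core via `Field` and adds a light `after` validator only for a normalization that has no built-in equivalent. The `Shipment` model and the `strip_separators` function are hypothetical names chosen for this example.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, Field


def strip_separators(value: str) -> str:
    # Runs after the core has already guaranteed that `value` is a str.
    return value.replace("-", "").replace(" ", "")


class Shipment(BaseModel):
    # Step 1: range constraints expressed through Field are enforced inside pydantic-core.
    quantity: int = Field(ge=1, le=1_000)
    # Step 2: domain-specific normalization goes into a light `after` validator.
    zipcode: Annotated[str, AfterValidator(strip_separators)]


shipment = Shipment.model_validate_json('{"quantity": "3", "zipcode": "100-0001"}')
print(shipment)
```

No `before` or `wrap` validator is involved here, so the model can still be fed through `model_validate_json` without giving up the fused path discussed in chapter 2.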

---

## **6. Situations where you can skip validation: `model_construct` and `Any`**

### **`model_construct` for validated data**

When the data comes from a source that is **already validated and trustworthy** (for example, re-packing a row read from your own DB into a model), validation is pure overhead. `model_construct()` **skips validation entirely** and creates an instance.

```python
# Use only with trusted (already validated) data
# (assumes a User model that has `id` and `name` fields)
user = User.model_construct(id=1, name="alice")  # no validation runs
```

But the official warning is strongly worded.

> *`model_construct()` does not do any validation, meaning it can create models which are invalid. You should only ever use the `model_construct()` method with data which has already been validated.*

> ⚠️ **Misuse causes accidents**: a model created with `model_construct` **can hold an invalid state** (values pass through even when the type doesn't match). `extra='forbid'` is **not enforced** either (extra keys are silently ignored). Never use it at a boundary (external input); restrict it to rebuilding data that you can guarantee internally has been validated. The official documentation also states *"in V2 the performance difference between validation and `model_construct()` has narrowed considerably,"* so keep in mind that **the gain from giving up safety is often smaller than you expect.**
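
How an invalid state slips through can be reproduced with a few lines. The `User` model below (with `extra="forbid"`) and the sample values are assumptions made for this sketch.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class User(BaseModel):
    model_config = ConfigDict(extra="forbid")

    id: int
    name: str


# Normal validation rejects both the wrong type and the unknown key.
try:
    User.model_validate({"id": "not-a-number", "name": "alice", "role": "admin"})
    rejected = False
except ValidationError:
    rejected = True

# model_construct accepts the same values without complaint.
broken = User.model_construct(id="not-a-number", name="alice", role="admin")

print(rejected)                 # validation raised
print(repr(broken.id))          # the str is stored as-is in an `int` field
print(hasattr(broken, "role"))  # the extra key was dropped, not rejected
```

Nothing in `broken` signals that it was never validated, which is why the call belongs only on paths where the data's validity is already guaranteed.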

### **`Any` when you really do accept anything**

For a field that needs no validation at all (such as holding arbitrary JSON as-is), declaring it as `Any` makes Pydantic skip validation for that field.

```python
from typing import Any
from pydantic import BaseModel


class Webhook(BaseModel):
    event_id: str
    payload: Any  # contents are not validated (to be re-validated against a typed model later)
```

But this is a decision to intentionally give up type safety at exactly one point. When you actually use `payload`, the right move is to re-validate it with an appropriate model or `TypeAdapter`.
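
One way to structure that deferred re-validation is sketched below. The `OrderPaid` payload model and the `handle` function are hypothetical; a real webhook would likely choose the payload model based on an event type.

```python
from typing import Any

from pydantic import BaseModel


class Webhook(BaseModel):
    event_id: str
    payload: Any  # passes through the boundary unvalidated


class OrderPaid(BaseModel):  # hypothetical payload shape
    order_id: int
    amount: int


def handle(raw: bytes) -> OrderPaid:
    # Cheap envelope validation first: payload is accepted as-is.
    hook = Webhook.model_validate_json(raw)
    # Typed validation happens only at the point where the payload is used.
    return OrderPaid.model_validate(hook.payload)


order = handle(b'{"event_id": "evt_1", "payload": {"order_id": 7, "amount": 1200}}')
print(order)
```

The envelope stays cheap to validate, and consumers that never touch `payload` never pay for validating it.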

---

## **7. Startup cost and Config: `defer_build` / `cache_strings` / `validate_default`**

Finally, three settings that act not per request but **at startup/construction time.** They matter for serverless cold starts and for apps that hold many models.

| Setting | Default | Effect | Where to use |
| --- | --- | --- | --- |
| `defer_build=True` | `False` | **Defers** construction of the model's validator/serializer **until the first validation** | Models used only nested inside other models; when you don't want every model built at startup |
| `cache_strings` | `True` | **Caches strings during validation to avoid creating new objects** | Data in which the same strings recur often (on by default; normally leave it alone) |
| `validate_default=True` | `False` | Validates default values as well | **Leaving it off** is faster because defaults aren't re-validated |

```python
from pydantic import BaseModel, ConfigDict


class Nested(BaseModel):
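    # Construction of this model's own validator/serializer is deferred until its first validation;
    # handy for a model that is only ever nested inside a parent model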
    model_config = ConfigDict(defer_build=True)
    value: int
```

On `defer_build`, the official ConfigDict API reference states:

> *Whether to defer model validator and serializer construction until the first model validation. ... This can be useful to avoid the overhead of building models which are only used nested within other models.*

`cache_strings` is **enabled by default** and is described as *"caches strings to avoid constructing new Python objects. This significantly improves validation performance, while increasing memory usage slightly."* Leaving it at the default is fine. Keeping `validate_default` at its **default of `False`** likewise avoids unnecessary re-validation.
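
Put together, the three settings might appear in a pair of models like the following sketch. `LineItem` and `Order` are illustrative names, and `cache_strings` and `validate_default` are spelled out with their default values only to show where they live.

```python
from pydantic import BaseModel, ConfigDict


class LineItem(BaseModel):
    # Mostly used inside Order, so building its own validator is deferred.
    model_config = ConfigDict(defer_build=True)

    sku: str
    quantity: int


class Order(BaseModel):
    # Both values below are the defaults; writing them out changes nothing.
    model_config = ConfigDict(cache_strings=True, validate_default=False)

    order_id: int
    currency: str = "JPY"  # not re-validated because validate_default is False
    items: list[LineItem]


order = Order.model_validate_json(
    '{"order_id": 1, "items": [{"sku": "A-1", "quantity": 2}]}'
)
print(order.items[0].quantity, order.currency)
```

Validating `LineItem` directly still works; the deferred build simply happens on that first direct use instead of at class-definition time.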

> 💡 **Note**: `defer_build` / `cache_strings` / `validate_default` are documented **in the configuration (ConfigDict) API reference rather than on the dedicated performance page of the official documentation** (`cache_strings` and "`Any` isn't validated" also appear on the performance page). Don't overstate them as "officially always recommended for performance"; adopt them after **measuring the effect on your own workload.**

---

## **Conclusion: optimize in the order "measure → official standard → re-measure"**

Performance optimization in Pydantic v2 isn't about exotic hacks; it comes down to **correctly applying the officially backed standard practices to the hot paths where they matter.** Here are the key points of this article once more.

1. **Create `TypeAdapter` once and reuse it** (don't re-create it inside a function).
2. **Parse JSON in one fused step with `model_validate_json`** (the exception is a model with a `before`/`wrap` validator).
3. **Turn a Union into a discriminated union,** the first choice that the official documentation explicitly calls "more performant and more predictable."
4. **Make type hints concrete**: `list`/`dict` over `Sequence`/`Mapping`, `TypedDict` for pure data (about 2.5x in the official benchmark), `FailFast` if you don't need every error.
5. **Avoid `wrap`/`before` validators** and don't rewrite in Python what the core can do.
6. **`model_construct` for validated data,** `Any` for fields that need no validation, both used sparingly and with the safety trade-off understood.
7. **`defer_build` for startup cost,** `cache_strings` for strings (enabled by default), and leave `validate_default` at `False`.

The most important principle is to **always measure before and after** optimizing. Use `pytest-benchmark` or `timeit` to identify the hot path and confirm with numbers that the optimization you applied **actually worked.** This is exactly the same discipline as PostgreSQL tuning (see the [PostgreSQL performance-tuning practical guide](/blog/postgresql-performance-tuning-production-guide)).
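
A before/after measurement does not need much machinery. The sketch below compares the two JSON routes from chapter 2 with the standard library's `timeit`; the `Batch` model, the synthetic payload, and the helper `best_of` are assumptions for illustration, and the payload should be replaced with a sample of real traffic.

```python
import json
import timeit

from pydantic import BaseModel


class Event(BaseModel):
    id: int
    name: str


class Batch(BaseModel):
    events: list[Event]


# Synthetic payload; swap in representative data when measuring for real.
RAW = json.dumps({"events": [{"id": i, "name": f"event-{i}"} for i in range(1_000)]})


def two_step(raw: str) -> Batch:
    return Batch.model_validate(json.loads(raw))


def fused(raw: str) -> Batch:
    return Batch.model_validate_json(raw)


def best_of(func, raw: str, repeat: int = 5, number: int = 20) -> float:
    """Best per-call time in milliseconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: func(raw), repeat=repeat, number=number)
    return min(timings) / number * 1e3


if __name__ == "__main__":
    for func in (two_step, fused):
        print(f"{func.__name__:>8}: {best_of(func, RAW):.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes; for results tracked over time in a test suite, `pytest-benchmark` is the more convenient tool.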

As official primary sources, I recommend re-reading the following from this article's viewpoint.

- [Performance](https://pydantic.dev/docs/validation/latest/concepts/performance/)
- [Unions](https://pydantic.dev/docs/validation/latest/concepts/unions/)
- [Models (`model_construct`)](https://pydantic.dev/docs/validation/latest/concepts/models/)
- [TypeAdapter](https://pydantic.dev/docs/validation/latest/api/pydantic/type_adapter/)

---

### **Consultation on high-throughput Python backends**

The author designed and implemented the backend of a B2B SaaS that won the METI Minister's Award, using **Python / Flask / SQLAlchemy 2.0 / PostgreSQL 16,** and has operated the large data volumes of a multi-stage distribution flow in production. Rigorous type validation at the boundary **without sacrificing speed** ties directly to both business reliability and cost efficiency. Whether it's high-throughput APIs on FastAPI / Pydantic v2, validation pipelines for batch/ETL, or cold-start optimization in serverless, I deliver **measurement-based, down-to-earth performance improvements** quickly and at high quality, leveraging generative AI. Please feel free to get in touch.
