Introduction: the question isn't "which is strongest" but "what do you need"
"Which Python data model should I use in the end? Pydantic? dataclass?" — this is often asked, but the question itself is slightly off. The five choices solve different problems. Comparing dataclasses and Pydantic by "which is superior" is like comparing a screwdriver and an impact wrench.
This article compares the five options (dataclasses, TypedDict, attrs, msgspec, and Pydantic) fairly, based on each project's official primary sources. It is not a sales pitch for Pydantic; being able to say "Pydantic isn't needed here" is itself a mark of a reliable engineer. The comparison with marshmallow is handled in marshmallow vs Pydantic, so this article limits itself to the Python standard library and the speed-specialized libraries.
First, I present the single question that runs through selection — "does that data come from an untrusted external source?" If it does, you need a library that validates values at runtime. If it doesn't (trusted data generated/managed within the code), validation is an unnecessary cost. Read on with this axis in mind.
1. The big picture: the five in one table
First, an overview of the conclusion. Each cell is based on each project's official information.
| Axis | Pydantic v2 | dataclasses | TypedDict | attrs | msgspec |
|---|---|---|---|---|---|
| Runtime validation | ✅ Core feature | ❌ None | ❌ None (static only) | △ Opt-in | ✅ At decode |
| Serialization (JSON, etc.) | ✅ Built-in | △ asdict only (no JSON) | ❌ (it's a dict) | ❌ (separate cattrs) | ✅ JSON/msgpack/YAML/TOML |
| JSON Schema generation | ✅ | ❌ | ❌ | ❌ | ✅ |
| Performance positioning | "Fastest-class" / Rust core | No validation = no overhead | Same | None claimed | Speed-specialized (claimed fastest-class) |
| Settings management | ✅ pydantic-settings | ❌ | ❌ | ❌ | ❌ |
| Standard / third-party | Third-party | Standard | Standard | Third-party | Third-party |
| Main use | API, boundary, settings, LLM output | Internal struct | Static typing for dicts | Flexible class generation | High-throughput serialization |
The point is the first 2 rows. The presence of "runtime validation" and "serialization" mostly decides selection. Below, I confirm each one with official information.
2. dataclasses: a standard-library struct (doesn't validate)
Python's standard @dataclass auto-generates __init__ / __repr__ / __eq__, etc. The biggest advantage is writing a structured record with zero dependencies.
from dataclasses import dataclass

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0
However, type annotations aren't validated at runtime. The official documentation states plainly — "(except for two exceptions) @dataclass does not examine the type specified in the annotation at all."
# The annotations say str/float/int, but nothing is validated, so a "broken" instance is created without complaint
item = InventoryItem(name=123, unit_price="free", quantity_on_hand=None)  # no error is raised
You can convert to dict/tuple with asdict / astuple, but there's no JSON serialization or JSON Schema generation.
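Because `asdict` returns plain built-in containers, JSON output is still possible by handing the result to the standard `json` module yourself. The following is a minimal sketch that reuses the `InventoryItem` class from above; the sample values are made up for illustration.

```python
import json
from dataclasses import asdict, astuple, dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


item = InventoryItem(name="widget", unit_price=3.5, quantity_on_hand=10)

# asdict / astuple convert the instance to plain built-in containers.
as_dict = asdict(item)
as_tuple = astuple(item)

# JSON is a separate, manual step through the json module.
payload = json.dumps(as_dict)

# Loading it back performs no type checking: the dict is unpacked as-is.
restored = InventoryItem(**json.loads(payload))
print(payload)
print(restored == item)
```

This works only while every field is a JSON-compatible type; anything richer (a `datetime`, for example) needs hand-written conversion in both directions.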
💡 When to choose dataclasses: when the data comes from a trusted internal source (a settings object your code generates, an intermediate computation result, a DTO, etc.) and you need neither validation nor serialization. It needs only the standard library and adds no dependencies, which makes it the choice most faithful to KISS and YAGNI.
3. TypedDict: add "static types" to a dict
TypedDict gives type hints to a dict's keys and values. But nothing happens at runtime. Per the official words — "at runtime a TypedDict instance is just a dict" and "this expectation is not checked at runtime and is enforced only by a type checker."
from typing import TypedDict

class Point2D(TypedDict):
    x: int
    y: int
    label: str

p: Point2D = {"x": 1, "y": 2, "label": "A"}  # At runtime this is just a dict; only mypy looks at the types
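To make the "nothing happens at runtime" point concrete, here is a small sketch using only the standard library. The wrong-typed value is deliberately chosen for illustration; a static type checker would flag it, which is why the line carries a `# type: ignore` comment.

```python
from typing import TypedDict


class Point2D(TypedDict):
    x: int
    y: int
    label: str


# A type checker would reject the value for "x", but nothing objects at runtime.
bad: Point2D = {"x": "not a number", "y": 2, "label": "A"}  # type: ignore

is_plain_dict = type(bad) is dict  # the object is an ordinary dict
stored_value = bad["x"]            # the wrong-typed value was stored untouched

print(is_plain_dict, repr(stored_value))
```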
💡 When to choose TypedDict: when you want only static type checking (mypy/Pyright) on a dict payload from JSON. Without adding classes or changing runtime behavior, you get editor completion and static checking. Note that Pydantic can validate a TypedDict, so the combination "TypedDict for static typing, Pydantic for validation at the boundary" is also possible (as touched on in the performance-optimization guide, Pydantic's official docs also note that "TypedDict is about 2.5× faster than a nested model").
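One way to combine the two is Pydantic's `TypeAdapter`, which can validate against a `TypedDict` without defining a model class. This is a minimal sketch assuming Pydantic v2 is installed; the name `point_adapter` and the sample inputs are illustrative only. `TypedDict` is imported from `typing_extensions` here because Pydantic requires that variant on Python versions before 3.12.

```python
from pydantic import TypeAdapter, ValidationError
from typing_extensions import TypedDict  # required by Pydantic on Python < 3.12


class Point2D(TypedDict):
    x: int
    y: int
    label: str


point_adapter = TypeAdapter(Point2D)

# Valid input comes back as a plain dict ("1" is coerced to int in the default lax mode).
point = point_adapter.validate_python({"x": "1", "y": 2, "label": "A"})

# Invalid input is rejected at the boundary.
try:
    point_adapter.validate_python({"x": "abc", "y": 2, "label": "A"})
    rejected = False
except ValidationError:
    rejected = True

print(point, rejected)
```

The rest of the code keeps working with ordinary dicts; only the boundary pays for validation.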
4. pydantic.dataclasses: add validation to dataclass syntax
"I want just validation while keeping the dataclass writing feel" — the middle solution is pydantic.dataclasses. Replacing the standard @dataclass with the Pydantic version makes validation and type coercion take effect.
from datetime import datetime
from typing import Optional
from pydantic.dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str = "John Doe"
    signup_ts: Optional[datetime] = None

user = User(id="42", signup_ts="2032-06-21T12:00")
# id="42" is coerced to 42, and signup_ts is parsed into a datetime
The official docs explain that "if you don't want to use BaseModel, you get the same data validation with a standard dataclass," while also stating that "a Pydantic dataclass is not a replacement for a Pydantic model." Some model_config settings and BaseModel methods such as model_dump are not available as-is.
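Since `model_dump` is not there, serialization of a Pydantic dataclass has to go through another route. The sketch below assumes Pydantic v2 and shows two options: a `TypeAdapter` built for the class, and the standard library's `dataclasses.asdict`. Variable names such as `adapter` and `rejected` are illustrative.

```python
from dataclasses import asdict
from datetime import datetime
from typing import Optional

from pydantic import TypeAdapter, ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str = "John Doe"
    signup_ts: Optional[datetime] = None


user = User(id="42", signup_ts="2032-06-21T12:00")

# Invalid input raises pydantic's ValidationError at construction time.
try:
    User(id="not-a-number")
    rejected = False
except ValidationError:
    rejected = True

# There is no user.model_dump(); a TypeAdapter offers comparable dump methods.
adapter = TypeAdapter(User)
as_python = adapter.dump_python(user)
as_json = adapter.dump_json(user)  # returns bytes

# The class is still a dataclass, so the standard helper works too.
as_plain_dict = asdict(user)

print(as_python)
print(as_json)
```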
💡 When to choose pydantic.dataclasses: when you want to introduce validation into existing dataclass-based code with minimal change, or when you like dataclass syntax but want external-input validation. If you are building a new API boundary in earnest, the BaseModel covered in section 7 has more complete features.
5. attrs: a mature class builder (opt-in validation, separate serialization)
attrs is a mature library that's arguably the origin of dataclass, controlling class generation powerfully and flexibly (slots, converters, validators, etc.). Validation is opt-in — only fields with an explicitly attached validator are validated.
from attrs import define, field
import attrs

@define
class Color:
    value = field(validator=attrs.validators.instance_of(int))

    @value.validator
    def _fits_byte(self, attribute, value):
        if not 0 <= value < 256:
            raise ValueError("value must be in the range 0-255")
What's important is that attrs is not a serialization library. As the official docs state, "attrs is not a full-fledged serialization library … see the sister project cattrs," so conversion to and from JSON is handled by cattrs (separation of concerns). JSON Schema generation is not built in either.
💡 When to choose attrs: when you want fine-grained control of class generation (slot optimization, converters, complex __init__ logic) and want to intentionally separate serialization (pair it with cattrs). It offers more freedom than Pydantic over how the class itself is built, but you have to assemble validation, schema, and ecosystem yourself.
6. msgspec: validation + serialization all-in on speed
msgspec is a library specialized in the speed of serialization and validation. Define a msgspec.Struct, and encoding to and decoding from JSON, MessagePack, YAML, and TOML, along with type validation at decode time, run extremely fast.
import msgspec

class User(msgspec.Struct):
    name: str
    groups: set[str] = set()
    email: str | None = None

msgspec.json.encode(User("alice", groups={"admin"}))
# b'{"name":"alice","groups":["admin"],"email":null}'

msgspec.json.decode(b'{"name":"bob","groups":[123]}', type=User)
# msgspec.ValidationError: Expected `str`, got `int` - at `$.groups[0]`
msgspec also has JSON Schema generation (msgspec.json.schema). On performance, the official docs cite bold numbers from the project's own benchmarks: "the JSON/MessagePack implementation is among the fastest in Python," "msgspec decodes & validates faster than orjson merely decodes," "structs are 5–60× faster on common operations."
⚠️ The numbers are vendor claims: these multipliers are claims by msgspec's own benchmark, not neutrally measured by a third party. Measure on your own workload before adopting. Meanwhile, Pydantic's official docs say at the top of the performance page "in many cases Pydantic isn't a bottleneck." Confirming with measurement whether "speed is truly the bottleneck" comes first (see the Pydantic performance-optimization guide).
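Measuring on your own workload does not need much machinery. The harness below is a rough sketch using the standard `timeit` module; the payload shape, the `best_time` helper, and the labels are assumptions for illustration, and msgspec is registered only if it happens to be installed.

```python
import json
import timeit


def best_time(decode, payload: bytes, number: int = 50, repeat: int = 3) -> float:
    """Return the best per-call time in seconds for decode(payload)."""
    runs = timeit.repeat(lambda: decode(payload), number=number, repeat=repeat)
    return min(runs) / number


# Hypothetical payload; replace it with a sample captured from your real traffic.
payload = json.dumps(
    [{"name": f"user{i}", "groups": ["admin", "dev"], "email": None} for i in range(1000)]
).encode()

candidates = {"stdlib json": json.loads}

# Register other decoders only if they are installed, e.g. msgspec.
try:
    import msgspec

    candidates["msgspec (untyped)"] = msgspec.json.decode
except ImportError:
    pass

for label, decode in candidates.items():
    print(f"{label}: {best_time(decode, payload) * 1e6:.1f} µs per call")
```

This compares untyped decoding only. To measure validation cost as well, time the typed path you would actually ship (for example a decoder given your own `Struct` or model type) against the same payload.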
💡 When to choose msgspec: when serialization plus validation is a measured bottleneck in a high-throughput API, a message queue, or the ingestion of huge JSON. If your project can take on a speed-specialized third-party dependency, it is a powerful option.
7. Pydantic: a balance of validation, schema, and ecosystem
Pydantic combines runtime validation, JSON Schema generation, pydantic-settings, and a vast ecosystem into one — it's "the boundary default," so to speak. The core pydantic-core is in Rust, and the official docs position it as "one of the fastest data-validation libraries in Python."
from pydantic import BaseModel, Field

class User(BaseModel):
    id: int
    name: str = Field(min_length=1)
    email: str

User.model_validate({"id": "42", "name": "alice", "email": "a@example.com"})
# Validation, type coercion, (if needed) JSON Schema generation, and serialization in one stop
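The one-stop flow can be spelled out step by step. The sketch below assumes Pydantic v2 and reuses the `User` model; the invalid input and the `failed_fields` variable are illustrative additions.

```python
from pydantic import BaseModel, Field, ValidationError


class User(BaseModel):
    id: int
    name: str = Field(min_length=1)
    email: str


user = User.model_validate({"id": "42", "name": "alice", "email": "a@example.com"})

# Serialization to a dict or to a JSON string.
as_dict = user.model_dump()
as_json = user.model_dump_json()

# JSON Schema derived from the same class definition.
schema = User.model_json_schema()

# Invalid input: collect the names of the fields that failed.
try:
    User.model_validate({"id": "abc", "name": "", "email": "a@example.com"})
    failed_fields = []
except ValidationError as exc:
    failed_fields = [err["loc"][0] for err in exc.errors()]

print(as_dict)
print(as_json)
print(sorted(failed_fields))
```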
Pydantic's true strength is in the ecosystem more than a single feature. FastAPI, SQLModel, Django Ninja, LangChain, and PydanticAI build on Pydantic, and about 8,000 packages on PyPI depend on Pydantic. "A validated model directly becomes the API schema, the settings, the LLM's structured output" — this end-to-end flow is value found nowhere else.
💡 When to choose Pydantic: the boundary handling external input (HTTP API, external API responses, settings, LLM output). When you want to handle validation, serialization, schema, and settings with consistent types. If you use FastAPI, it's the de facto standard. "If in doubt, Pydantic" holds because of this broad coverage.
8. Decision flowchart
Finally, the selection boils down to a single flow.
1. Do you need to validate values at runtime? (Does the data come from an untrusted external source?)
   - No → go to 2.
   - Yes → go to 3.
2. (No validation needed) Want static typing while keeping it a dict?
   - Yes → TypedDict
   - No (want a class) → dataclasses (if you need fine-grained control, attrs)
3. (Validation needed) Is serialization speed a measured bottleneck?
   - Yes → msgspec
   - No → go to 4.
4. Want the ecosystem of JSON Schema, settings management, FastAPI, etc.?
   - Yes → Pydantic (the default for a new API boundary)
   - Just want to keep dataclass syntax → pydantic.dataclasses
   - Want class-generation freedom + separated serialization → attrs + cattrs
| Situation | Recommendation |
|---|---|
| Build an API with FastAPI | Pydantic (de facto standard) |
| Settings / secret management | pydantic-settings (guide) |
| LLM structured output | Pydantic / PydanticAI (guide) |
| Internal DTO needing no validation | dataclasses |
| Static typing for dicts | TypedDict |
| High throughput where serialization is the bottleneck | msgspec |
| Fine-grained control of class generation | attrs + cattrs |
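For readers who prefer code to diagrams, the four questions of the flow can be written down as a small function. This is only a sketch of the decision order; the function name and every parameter name are hypothetical, and real selection will weigh more factors than five booleans.

```python
def recommend(
    needs_validation: bool,
    keep_as_dict: bool = False,
    speed_is_measured_bottleneck: bool = False,
    wants_ecosystem: bool = True,
    keep_dataclass_syntax: bool = False,
) -> str:
    """Walk the four questions of the flow and return a recommendation."""
    # Question 1: does the data come from an untrusted external source?
    if not needs_validation:
        # Question 2: static typing on a dict, or a class?
        return "TypedDict" if keep_as_dict else "dataclasses"
    # Question 3: is serialization speed a measured bottleneck?
    if speed_is_measured_bottleneck:
        return "msgspec"
    # Question 4: ecosystem, dataclass syntax, or class-generation freedom?
    if wants_ecosystem:
        return "Pydantic"
    if keep_dataclass_syntax:
        return "pydantic.dataclasses"
    return "attrs + cattrs"


print(recommend(needs_validation=False))
print(recommend(needs_validation=True))
print(recommend(needs_validation=True, speed_is_measured_bottleneck=True))
```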
Conclusion: choose the tool to fit the problem
The five choices aren't competitors but tools that solve problems at different layers. Restating this article's points.
- The first question of selection is "do you validate external input." dataclasses and TypedDict don't validate (stated officially).
- dataclasses is a standard struct, TypedDict is static typing for dicts; both are for trusted internal data.
- attrs is opt-in validation + separated serialization (cattrs); msgspec is speed-specialized (but the multipliers are vendor claims).
- Pydantic is the boundary default with a balance of validation, schema, settings, and ecosystem.
- If in doubt, follow the flowchart: narrow in the order need for validation → speed bottleneck → need for ecosystem.
Rather than searching for "the strongest library," discerning where the data in front of you comes from and what you want to guarantee leads to a maintainable and cost-efficient design. With that understood, you can confidently explain both when you choose Pydantic and when you deliberately get by with a standard dataclass.
Each project's primary sources:
- Pydantic / pydantic.dataclasses
- dataclasses (Python standard)
- TypedDict (Python standard)
- attrs / msgspec
Consultation on technology selection and data-modeling design
The author has led technology selection, deciding which tool to use where, in multiple production systems, including the B2B SaaS that won the METI Minister's Award. Choosing a data model means weighing performance, maintainability, the team's learning cost, and the ecosystem. Leveraging generative AI for speed and quality, I help with library selection that neither overshoots nor undershoots the requirements, and with the type-safe architecture design built on it. Feel free to consult me about Python backend technology selection.