# Pydantic vs dataclasses vs TypedDict vs attrs vs msgspec: a Python data-modeling selection guide (2026)

> A fair comparison of Python's five data-modeling options, based on official documentation. dataclasses and TypedDict have no runtime validation, attrs offers opt-in validation without serialization, msgspec is specialized for speed, and Pydantic balances validation, schema generation, and ecosystem. The guide compares them on runtime validation, serialization, JSON Schema, performance, and ecosystem, and closes with a decision flow to help you pick the right fit for your project.

- Published: 2026-06-26
- Author: 友田 陽大
- Tags: Python, Pydantic, type safety, data modeling, performance, architecture design
- URL: https://tomodahinata.com/en/blog/pydantic-vs-dataclasses-typeddict-attrs-msgspec-comparison-guide
- Category: Pydantic & type-safe validation
- Pillar guide: https://tomodahinata.com/en/blog/pydantic-v2-production-validation-type-safety

## Key points

- The first selection question is whether you validate values at runtime. dataclasses and TypedDict do not (their official docs say so). To handle external input at a boundary, you need a validating library.
- dataclasses is a stdlib struct with no validation; TypedDict is static typing for dicts (a plain dict at runtime). Both are for organizing trusted internal data.
- attrs offers opt-in validation and leaves serialization to a separate project (cattrs); msgspec is speed-specialized and does serialization and validation fast (its own docs claim structs are 5-60x faster on common operations).
- Pydantic is "the boundary default," with runtime validation, JSON Schema, pydantic-settings, and a vast ecosystem that includes FastAPI. Its official docs also say it is not a bottleneck in many cases.
- If in doubt, follow the flowchart: no validation needed → dataclass/TypedDict, speed is the bottleneck → msgspec, validation + schema + ecosystem → Pydantic, fine-grained control + separated serialization → attrs+cattrs.

---

## **Introduction: the question isn't "which is strongest" but "what do you need"**

"Which Python data model should I use in the end? Pydantic? dataclass?" — this is often asked, but the question itself is slightly off. **The five choices solve different problems.** Comparing `dataclasses` and Pydantic by "which is superior" is like comparing a screwdriver and an impact wrench.

This article compares the five (`dataclasses`, `TypedDict`, `attrs`, `msgspec`, and **Pydantic**) **fairly, based on official primary sources.** It isn't meant to sell Pydantic. If anything, being able to say "Pydantic isn't needed here" is a sign of a reliable engineer. The comparison with `marshmallow` is covered in [marshmallow vs Pydantic](/blog/marshmallow-vs-pydantic-comparison-guide), so this article narrows its scope to **the Python standard library and speed-specialized libraries.**

First, here is **the single question** that runs through the whole selection: **"does the data come from an untrusted external source?"** If it does, you need a library that validates values at runtime. If it doesn't (trusted data generated and managed within your own code), validation is an unnecessary cost. Keep this axis in mind as you read on.

---

## **1. The big picture: the five in one table**

First, an overview of the conclusion. Each cell is based on each project's official information.

| Axis | **Pydantic v2** | **dataclasses** | **TypedDict** | **attrs** | **msgspec** |
| --- | --- | --- | --- | --- | --- |
| Runtime validation | ✅ Core feature | ❌ None | ❌ None (static only) | △ Opt-in | ✅ At decode |
| Serialization (JSON, etc.) | ✅ Built-in | △ `asdict` only (no JSON) | ❌ (it's a dict) | ❌ (separate cattrs) | ✅ JSON/msgpack/YAML/TOML |
| JSON Schema generation | ✅ | ❌ | ❌ | ❌ | ✅ |
| Performance positioning | "Fastest-class" / Rust core | No validation = no overhead | Same | None claimed | **Speed-specialized** (claimed fastest-class) |
| Settings management | ✅ pydantic-settings | ❌ | ❌ | ❌ | ❌ |
| Standard / third-party | Third-party | **Standard** | **Standard** | Third-party | Third-party |
| Main use | API, boundary, settings, LLM output | Internal struct | Static typing for dicts | Flexible class generation | High-throughput serialization |

The **first two rows** matter most: whether a tool offers runtime validation and serialization largely decides the choice. The sections below check each claim against official documentation.

---

## **2. `dataclasses`: a standard-library struct (doesn't validate)**

Python's standard `@dataclass` auto-generates `__init__` / `__repr__` / `__eq__`, etc. The biggest advantage is writing a structured record with **zero dependencies.**

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0
```

However, **type annotations aren't validated at runtime.** The official documentation states it plainly: *"With two exceptions described below, nothing in `@dataclass` examines the type specified in the variable annotation."*

```python
# The annotations say str/float/int, but nothing is validated, so a "broken" instance is created without complaint
item = InventoryItem(name=123, unit_price="free", quantity_on_hand=None)  # no error is raised
```

You can convert to dict/tuple with `asdict` / `astuple`, but there's **no JSON serialization or JSON Schema generation.**
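
As a minimal sketch of what "no JSON serialization" means in practice: `asdict` and `astuple` stop at plain Python containers, and getting to JSON is a separate step through the standard `json` module. The sample values are illustrative.

```python
import json
from dataclasses import asdict, astuple, dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


item = InventoryItem(name="widget", unit_price=3.5, quantity_on_hand=10)

# asdict/astuple only convert to plain Python containers.
as_dict = asdict(item)
as_tuple = astuple(item)

# JSON encoding is a separate, manual step via the standard json module.
payload = json.dumps(as_dict)

# Decoding gives back a dict; rebuilding the dataclass performs no type checks.
restored = InventoryItem(**json.loads(payload))
```

The round trip works here only because the input was already well-formed. Nothing in it would reject a payload whose fields have the wrong types.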

> 💡 **When to choose `dataclasses`**: when the data source is **trusted internal** (a settings object your code generates, an intermediate computation result, a DTO, etc.) and you need neither validation nor serialization. It completes with the standard library alone and doesn't add dependencies — the choice most faithful to KISS and YAGNI.

---

## **3. `TypedDict`: add "static types" to a dict**

`TypedDict` adds type hints to a dict's keys and values. But **nothing happens at runtime.** In the words of the official docs: *"At runtime it is a plain dict"* and *"This expectation is not checked at runtime but is only enforced by type checkers."*

```python
from typing import TypedDict


class Point2D(TypedDict):
    x: int
    y: int
    label: str


p: Point2D = {"x": 1, "y": 2, "label": "A"}  # At runtime this is just a dict; only a static checker such as mypy sees the types
```
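
The runtime behavior can be checked directly. The following sketch uses only the standard library; the `bad` value is a deliberately wrong payload made up for illustration.

```python
from typing import TypedDict


class Point2D(TypedDict):
    x: int
    y: int
    label: str


# Calling the TypedDict class builds an ordinary dict.
p = Point2D(x=1, y=2, label="A")
is_plain_dict = type(p) is dict

# A static checker would flag the wrong types and the missing key;
# the interpreter accepts the assignment silently.
bad: Point2D = {"x": "not an int", "y": None}  # type: ignore
```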

> 💡 **When to choose `TypedDict`**: when you want **only static type checking (mypy/Pyright)** on a dict payload, such as one decoded from JSON. You get editor completion and static checking without adding classes or changing runtime behavior. Note that Pydantic can **use a `TypedDict` as a validation target**, so the combination "`TypedDict` for static typing, Pydantic for validation at the boundary" is also possible. As covered in the [performance-optimization guide](/blog/pydantic-v2-performance-optimization-guide), Pydantic's official docs also note that `TypedDict` is about 2.5× faster than a nested model.
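
One way to combine the two is Pydantic's `TypeAdapter`. This is a sketch under assumptions rather than a recommended layout: it assumes Pydantic v2 is installed, and it imports `TypedDict` from `typing_extensions` because Pydantic asks for that variant on Python versions before 3.12.

```python
from pydantic import TypeAdapter, ValidationError
from typing_extensions import TypedDict  # required by Pydantic on Python < 3.12


class Point2D(TypedDict):
    x: int
    y: int
    label: str


point_adapter = TypeAdapter(Point2D)

# Valid input: the result is still a plain dict, with "1" coerced to 1.
point = point_adapter.validate_python({"x": "1", "y": 2, "label": "A"})

# Invalid input is rejected at the boundary.
try:
    point_adapter.validate_python({"x": "oops", "y": 2, "label": "A"})
    rejected = False
except ValidationError:
    rejected = True
```

Internal code keeps working with ordinary dicts, and only the boundary pays for validation.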

---

## **4. `pydantic.dataclasses`: add validation to dataclass syntax**

For those who want validation while keeping the feel of writing a dataclass, `pydantic.dataclasses` is the middle ground. Swapping the standard `@dataclass` for the Pydantic version turns on **validation and type coercion.**

```python
from datetime import datetime
from typing import Optional
from pydantic.dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str = "John Doe"
    signup_ts: Optional[datetime] = None


user = User(id="42", signup_ts="2032-06-21T12:00")
# id="42" is coerced to 42, and signup_ts is parsed into a datetime
```

The official docs explain that *"if you don't want to use Pydantic's `BaseModel` you can instead get the same data validation on standard dataclasses,"* while also stating that *"Pydantic dataclasses are not a replacement for Pydantic models."* Some `model_config` options and `BaseModel` methods (`model_dump`, etc.) are not available as-is.
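
The sketch below illustrates both halves of that statement, assuming Pydantic v2: invalid input raises `ValidationError`, and because there is no `model_dump` on the instance, serialization goes through either the stdlib `dataclasses.asdict` or a `TypeAdapter`. These are two possible routes, not the only ones.

```python
import dataclasses
from datetime import datetime
from typing import Optional

from pydantic import TypeAdapter, ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str = "John Doe"
    signup_ts: Optional[datetime] = None


user = User(id="42", signup_ts="2032-06-21T12:00")

# Invalid input raises pydantic.ValidationError, unlike a standard dataclass.
try:
    User(id="not a number")
    raised = False
except ValidationError:
    raised = True

# Two ways to serialize without BaseModel methods:
as_dict = dataclasses.asdict(user)            # stdlib helper still works
as_json = TypeAdapter(User).dump_json(user)   # bytes, using Pydantic's serializer
```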

> 💡 **When to choose `pydantic.dataclasses`**: when you want to introduce validation into existing dataclass-based code with minimal change, or when you like dataclass syntax but need to validate external input. If you are building a new API boundary in earnest, the `BaseModel` covered in section 7 offers a more complete feature set.

---

## **5. `attrs`: a mature class builder (opt-in validation, separate serialization)**

`attrs` is a mature library, arguably the predecessor of the standard `dataclasses`, that gives powerful and flexible control over class generation (`slots`, converters, validators, etc.). **Validation is opt-in**: only fields with an explicitly attached validator are validated.

```python
from attrs import define, field
import attrs


@define
class Color:
    value = field(validator=attrs.validators.instance_of(int))

    @value.validator
    def _fits_byte(self, attribute, value):
        if not 0 <= value < 256:
            raise ValueError("value must be in the range 0-255")
```

Importantly, **`attrs` is not a serialization library.** As the official docs put it, *"attrs is not a full-fledged serialization library … see the sister project cattrs,"* so conversion to and from JSON is left to **cattrs** (separation of concerns). JSON Schema generation is not built in either.

> 💡 **When to choose `attrs`**: when you want fine-grained control of class generation (slot optimization, converters, complex `__init__` logic) and want to intentionally separate serialization (pair with cattrs). It has more freedom over "how to make the class" itself than Pydantic, while you have to assemble validation, schema, and ecosystem yourself.

---

## **6. `msgspec`: validation + serialization all-in on speed**

`msgspec` is a library **specialized in fast serialization and validation.** Define a `msgspec.Struct`, and conversion to and from JSON, MessagePack, YAML, and TOML, along with **type validation at decode time**, runs extremely fast.

```python
import msgspec


class User(msgspec.Struct):
    name: str
    groups: set[str] = set()
    email: str | None = None


msgspec.json.encode(User("alice", groups={"admin"}))
# b'{"name":"alice","groups":["admin"],"email":null}'

msgspec.json.decode(b'{"name":"bob","groups":[123]}', type=User)
# msgspec.ValidationError: Expected `str`, got `int` - at `$.groups[0]`
```

msgspec can also generate JSON Schema (`msgspec.json.schema`). On performance, the official docs cite bold numbers **from the project's own benchmarks**: *"the JSON/MessagePack implementation is among the fastest in Python," "msgspec decodes & validates faster than orjson merely decodes," "structs are 5–60× faster on common operations."*

> ⚠️ **The numbers are vendor claims**: these multipliers come from msgspec's own benchmarks, not from neutral third-party measurements. Measure on your own workload before adopting it. Meanwhile, Pydantic's official docs say at the top of the performance page that *"in many cases Pydantic isn't a bottleneck."* The first step is to **confirm by measurement whether speed is truly the bottleneck** (see the [Pydantic performance-optimization guide](/blog/pydantic-v2-performance-optimization-guide)).
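
One way to measure on your own workload is a small `timeit` harness. The sketch below is stdlib-only and times just a baseline (`dataclasses.asdict` + `json.dumps`). The `Event` shape, the payload size, and the helper name `best_seconds` are all assumptions for illustration; you would register one encoder per candidate library and use data that resembles your real traffic.

```python
import json
import timeit
from dataclasses import asdict, dataclass


@dataclass
class Event:
    id: int
    name: str
    tags: list[str]


# Hypothetical payload; replace with data shaped like your real workload.
events = [Event(i, f"event-{i}", ["a", "b"]) for i in range(500)]


def encode_stdlib() -> str:
    """Baseline: dataclasses.asdict + json.dumps."""
    return json.dumps([asdict(e) for e in events])


def best_seconds(fn, repeat: int = 3, number: int = 10) -> float:
    """Best-of-N average time per call, in seconds."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


# Register one encoder per candidate library here before comparing.
candidates = {"stdlib json": encode_stdlib}
results = {label: best_seconds(fn) for label, fn in candidates.items()}

for label, seconds in results.items():
    print(f"{label}: {seconds * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes. It still says nothing about end-to-end latency, so a profile of the real request path remains the deciding evidence.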

> 💡 **When to choose `msgspec`**: when **serialization + validation is a measured bottleneck** in a high-throughput API, a message queue, or huge-JSON ingestion. If adding a speed-specialized third-party dependency is acceptable, it is a powerful option.

---

## **7. `Pydantic`: a balance of validation, schema, and ecosystem**

Pydantic combines **runtime validation, JSON Schema generation, `pydantic-settings`, and a vast ecosystem** in one place; it's "the boundary default," so to speak. Its core, `pydantic-core`, is written in Rust, and the official docs position it as *"one of the fastest data-validation libraries in Python."*

```python
from pydantic import BaseModel, Field


class User(BaseModel):
    id: int
    name: str = Field(min_length=1)
    email: str


User.model_validate({"id": "42", "name": "alice", "email": "a@example.com"})
# Validation, type coercion, and (if needed) JSON Schema generation and serialization in one place
```
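
The "one place" idea can be sketched with the same model, assuming Pydantic v2. The invalid payload is made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class User(BaseModel):
    id: int
    name: str = Field(min_length=1)
    email: str


user = User.model_validate({"id": "42", "name": "alice", "email": "a@example.com"})

as_json = user.model_dump_json()    # serialization to a JSON string
schema = User.model_json_schema()   # JSON Schema as a dict

# Invalid input: every failing field is reported in a single exception.
try:
    User.model_validate({"id": "abc", "name": "", "email": "a@example.com"})
    failed_fields = []
except ValidationError as exc:
    failed_fields = [err["loc"][0] for err in exc.errors()]
```

Here `failed_fields` collects the names of the fields that failed (`id` and `name` for this payload), which is the kind of structured error an API boundary can return to the caller.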

Pydantic's true strength is in the **ecosystem** more than a single feature. FastAPI, SQLModel, Django Ninja, LangChain, and PydanticAI build on Pydantic, and about 8,000 packages on PyPI depend on Pydantic. "A validated model directly becomes the API schema, the settings, the LLM's structured output" — this end-to-end flow is value found nowhere else.

> 💡 **When to choose Pydantic**: the **boundary** handling external input (HTTP API, external API responses, settings, LLM output). When you want to handle validation, serialization, schema, and settings with consistent types. If you use FastAPI, it's the de facto standard. "If in doubt, Pydantic" holds because of this broad coverage.

---

## **8. Decision flowchart**

Finally, here is the selection process as a single flow.

1. **Do you need to validate values at runtime?** (Does the data come from an untrusted external source?)
   - **No** → go to 2.
   - **Yes** → go to 3.
2. (No validation needed) **Want static typing while keeping it a dict?**
   - Yes → **`TypedDict`**
   - No (want a class) → **`dataclasses`** (if you need fine-grained control, `attrs`)
3. (Validation needed) **Is serialization speed a measured bottleneck?**
   - Yes → **`msgspec`**
   - No → go to 4.
4. **Want the ecosystem of JSON Schema, settings management, FastAPI, etc.?**
   - Yes → **Pydantic** (the default for a new API boundary)
   - Just want to keep dataclass syntax → **`pydantic.dataclasses`**
   - Want class-generation freedom + separated serialization → **`attrs` + cattrs**
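
The steps above can also be written as a small function. This is only an illustrative encoding of the flow; the function name `recommend` and its parameters are hypothetical, and real decisions will weigh more factors than five booleans.

```python
def recommend(
    needs_runtime_validation: bool,
    wants_plain_dict: bool = False,
    serialization_is_measured_bottleneck: bool = False,
    wants_ecosystem: bool = True,
    keep_dataclass_syntax: bool = False,
) -> str:
    """Return the option suggested by the four-step flow."""
    # Steps 1-2: trusted internal data needs no validation.
    if not needs_runtime_validation:
        return "TypedDict" if wants_plain_dict else "dataclasses"
    # Step 3: only a measured bottleneck justifies a speed-specialized tool.
    if serialization_is_measured_bottleneck:
        return "msgspec"
    # Step 4: choose by ecosystem needs.
    if wants_ecosystem:
        return "Pydantic"
    if keep_dataclass_syntax:
        return "pydantic.dataclasses"
    return "attrs + cattrs"


print(recommend(needs_runtime_validation=False, wants_plain_dict=True))
print(recommend(needs_runtime_validation=True))
```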

| Situation | Recommendation |
| --- | --- |
| Build an API with FastAPI | **Pydantic** (de facto standard) |
| Settings / secret management | **pydantic-settings** ([guide](/blog/pydantic-settings-configuration-management-secrets-guide)) |
| LLM structured output | **Pydantic / PydanticAI** ([guide](/blog/pydantic-ai-agent-framework-production-guide)) |
| Internal DTO needing no validation | **dataclasses** |
| Static typing for dicts | **TypedDict** |
| High throughput where serialization is the bottleneck | **msgspec** |
| Fine-grained control of class generation | **attrs + cattrs** |

---

## **Conclusion: choose the tool to fit the problem**

The five options aren't competitors but tools that **solve problems at different layers.** To restate this article's points:

1. The first question of selection is **"do you validate external input."** `dataclasses` and `TypedDict` **don't validate** (stated officially).
2. **`dataclasses`** is a standard struct, **`TypedDict`** is static typing for dicts — both for trusted internal data.
3. **`attrs`** is opt-in validation + separated serialization (cattrs); **`msgspec`** is speed-specialized (but the multipliers are vendor claims).
4. **Pydantic** is **the boundary default** with a balance of validation, schema, settings, and ecosystem.
5. If in doubt, the **flowchart** — narrow in the order: need for validation → speed bottleneck → need for ecosystem.

Rather than searching for "the strongest library," work out **where the data in front of you comes from and what you want to guarantee**; that leads to a maintainable, cost-efficient design. With that understood, you can confidently explain both why you choose Pydantic and why you deliberately make do with a standard `dataclass`.

Each project's primary sources:

- [Pydantic](https://pydantic.dev/docs/validation/latest/) / [pydantic.dataclasses](https://pydantic.dev/docs/validation/latest/concepts/dataclasses/)
- [dataclasses (Python standard)](https://docs.python.org/3/library/dataclasses.html)
- [TypedDict (Python standard)](https://docs.python.org/3/library/typing.html#typing.TypedDict)
- [attrs](https://www.attrs.org/en/stable/) / [msgspec](https://jcristharif.com/msgspec/)

---

### **Consultation on technology selection and data-modeling design**

The author has led technology selection (deciding which tool to use where) in multiple production systems, including a B2B SaaS that won the METI Minister's Award. Data-modeling selection is a decision that weighs performance, maintainability, the team's learning cost, and the ecosystem. Leveraging generative AI for speed and quality, I help with **library selection that is neither excessive nor insufficient** for your requirements, and with the type-safe architecture design built on it. Feel free to consult me about Python backend technology selection.
