Python Data Types Complete Guide: The 'Right Use' of Numbers, Strings, and Collections, and Designs That Don't Break in Production

Introduction: a data type is not a "list to memorize" but a "vocabulary of design"

Search "Python data types" and you usually land on a summary table: int / float / str / list / dict … That's fine as a starting point. But what separates engineers in production work is not knowing "what types exist" but having a decision framework for "when, why, and which type to choose."

Why does using float for money calculations cause production incidents? Why does an in test on a list not scale? Why does a function with an empty list as its default argument carry over data from earlier calls? All of these bugs are preventable once you understand the type's internal behavior. Conversely, if you write dynamically typed Python while leaving this vague, the bugs slip past tests and surface only in production.

This article builds on the range covered by the widely read Real Python article "Basic Data Types in Python" (numbers, strings, booleans, None, and collections) and goes on to the implementation knowledge that pays off in production. Specifically, it covers:

CPython's object model (why confusing is and == hurts)
float's IEEE 754 trap and the right answer for handling money (Decimal / integer smallest unit)
Choosing collections based on complexity (Big-O)
Type hints that give static safety to a dynamic language, and "type design" via dataclass / Enum / TypedDict
Validation at the system boundary (Pydantic / marshmallow)

I have designed and implemented the backend of a Minister of Economy, Trade and Industry Award-winning B2B SaaS in Python / Flask / SQLAlchemy / PostgreSQL, and led the payment-reliability layer on a serverless payment platform that achieved zero double charges in production. The principles that run through this article, "don't hold money in float" and "make invalid states unrepresentable," all come from that production experience. I wrote it to be read not as a tour of syntax but as a set of design decisions you can rely on.

💡 Target versions: this article assumes Python 3.12 / 3.13 (most of the description is valid on 3.10+). For version-dependent features, I note the version they were introduced in.

0. The starting point of everything: in Python, "everything is an object"

Start the data-type discussion from a list of types and you'll get stuck somewhere. The right starting point is Python's object model. Spend five minutes on it and everything later (mutability, copying, is vs. ==) falls into place.

In Python, integers, strings, functions, and classes are all "objects." And every object has three attributes.

Identity: a unique ID in memory. Obtained with id(), unchanging while alive.
Type: what the object is. Obtained with type().
Value: the contents.

What's decisively important here is that a variable is not a box that holds a "value" but a "name tag (reference) attached to an object." x = 1 is not "put 1 into the box x" but the operation "attach the name tag x to the object 1."

a = [1, 2, 3]
b = a            # b is just another name tag on the same list as a (not a copy)

b.append(4)
print(a)         # → [1, 2, 3, 4]   ← a changes too! It's the same object

print(a is b)    # → True           ← the same object (same id)
print(id(a) == id(b))  # → True

This model of "assignment is sharing a reference, not copying" is the single biggest cause of bugs around Python's mutable objects. Conversely, understand this and all the pitfalls described later become visible as "obvious consequences."

💡 For people who write JavaScript / Java: it feels close to "everything, including primitives, is a reference type." But int and str are immutable, so a shared object can never be modified in place; that is why sharing them is safe.
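The difference between rebinding a name and mutating a shared object can be seen side by side. This is a minimal sketch; the variable names are illustrative only.

```python
# Immutable object: "changing" the value rebinds the name to a new object.
x = 1000
y = x              # y is a second name tag on the same int object
x += 1             # int is immutable, so x is rebound to a new object
print(x, y)        # → 1001 1000   (y still refers to the original object)

# Mutable object: an in-place operation is visible through every name.
a = [1, 2]
b = a
a += [3]           # for a list, += mutates in place; no rebinding happens
print(b)           # → [1, 2, 3]
print(a is b)      # → True
```

The same operator (+=) therefore has two different effects depending on whether the type is mutable, which is why the object model has to come before the list of types.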

1. Numeric types: `int` / `float` / `Decimal` / `Fraction` / `complex`

1-1. `int`: arbitrary-precision integers that don't overflow

Many languages' integers are fixed-width (32-bit / 64-bit) and overflow when they exceed the limit. Python's int is different. It's an arbitrary-precision integer that grows as large as memory allows.

2 ** 100        # → 1267650600228229401496703205376  no overflow
factorial = 1
for i in range(1, 101):
    factorial *= i
len(str(factorial))   # → 158  (100! has 158 digits)

This is a big advantage: the class of security vulnerabilities caused by C-style integer overflow cannot arise from Python's int itself. The cost is speed and memory, which rarely matters in ordinary application development.

Literals can change base, and _ can be used as a digit separator (Python 3.6+, PEP 515). Use it actively for readability.

0b1010        # binary → 10
0o17          # octal → 15
0xFF          # hexadecimal → 255
1_000_000     # → 1000000   (underscores are ignored; they're there for readability)

int also supports bitwise operations (& | ^ ~ << >>) and bit inspection ((255).bit_length() → 8; (7).bit_count() → 3, available from Python 3.10), which is enough for flag management and low-level processing.
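As a rough illustration of flag management with plain ints, the sketch below sets, tests, clears, and toggles bits. The flag names (READ, WRITE, EXECUTE) are hypothetical and not taken from any particular API.

```python
# Hypothetical permission flags, one bit each.
READ = 0b001
WRITE = 0b010
EXECUTE = 0b100

perms = READ | WRITE               # set two flags      → 0b011
can_write = bool(perms & WRITE)    # test a flag        → True
perms &= ~WRITE                    # clear a flag       → 0b001
perms ^= EXECUTE                   # toggle a flag      → 0b101

print(can_write)                   # → True
print(bin(perms))                  # → 0b101
print(perms.bit_length())          # → 3
```

For anything beyond a quick sketch, the standard library's enum.IntFlag gives the same operations with named, printable members.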

1-2. `float`: fast but "not exact"

float is a 64-bit IEEE 754 double-precision floating-point number. The CPU handles it directly so it's fast, but it can't exactly represent decimal fractions. This isn't a Python bug but the mathematical fact that 0.1 can't be represented in finite digits in binary.

0.1 + 0.2          # → 0.30000000000000004   (not 0.3!)
0.1 + 0.2 == 0.3   # → False

This behavior that always surprises beginners becomes a fatal bug as-is in domains where exactness is a requirement, like finance, billing, and inventory. When comparing floats, compare with a tolerance, not strict equality (Python 3.5+, PEP 485).

import math
math.isclose(0.1 + 0.2, 0.3)   # → True   (compares using relative and absolute tolerances)

And another trap is round(). Python's round() is not the "round half up" you learned in school but banker's rounding (round half to even). It's the correct behavior for suppressing statistical bias, but without knowing it you get confused: "why doesn't it round up?"

round(0.5)    # → 0   (0.5 goes to the even number 0)
round(1.5)    # → 2
round(2.5)    # → 2   (2.5 goes to the even number 2, not 3)

Remember the special values too. There are float('inf') (infinity) and float('nan') (not a number), and nan isn't even equal to itself (nan == nan is False)—a property that tends to break conditional branches. Always use math.isnan() to test for nan.
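The behavior of the special values is easy to check directly. The following is a small sketch; the readings list is made-up sample data.

```python
import math

nan = float("nan")
inf = float("inf")

print(nan == nan)            # → False  (nan is never equal to anything)
print(math.isnan(nan))       # → True   (the reliable test)
print(math.isinf(-inf))      # → True

# nan makes every ordering comparison False, so filter it out explicitly.
readings = [1.5, nan, 2.5]   # hypothetical sensor data
clean = [r for r in readings if not math.isnan(r)]
print(clean)                 # → [1.5, 2.5]
```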

1-3. [Most important] Don't hold money in `float` — `Decimal` is the right answer

This is the most practically relevant section in this article. The moment you handle amounts, currency, tax rates, or billing in float, your system is fated to "drift."

# Anti-pattern: accumulating amounts in float
total = 0.0
for _ in range(10):
    total += 0.1
print(total)        # → 0.9999999999999999   (not 1.0)

A 1-yen drift, piled up over 100,000 payments, stops adding up in accounting and becomes a breeding ground for double charges and missed refunds. The right answer is decimal.Decimal. It exactly holds decimal numbers and lets you explicitly control the rounding mode.

from decimal import Decimal, ROUND_HALF_UP

# (1) Always construct from a string (constructing from a float inherits its error)
price = Decimal("19.99")
tax_rate = Decimal("0.10")

tax = (price * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)   # 1.9990 → 2.00
total = price + tax
print(total)        # → 21.99   (exact)

# (2) Constructing from a float bakes the error in; avoid it
Decimal(0.1)        # → Decimal('0.1000000000000000055511151231257827021181583404541015625')
Decimal("0.1")      # → Decimal('0.1')   ← constructing from a string is the rule

In practice, there are two more choices at the design level.

Method	Internal representation	Suited case	Caveat
`Decimal`	Arbitrary-precision decimal floating point	Forex, tax calculation, complex rounding	Be strict about type conversion on DB round-trips
Integer smallest unit	An int count of the smallest currency unit (yen, sen, cents)	High-frequency, high-performance payment processing	Agree on one team-wide convention for converting to display units (e.g., dividing cents by 100)

For example, the design of "always hold amounts as int of cents (the smallest currency unit), and convert to currency only at the moment of display" is the standard adopted by many payment systems including Stripe. On the platform where I led the payment-reliability layer, I guarded balance updates with atomic transactions and idempotency keys, and completely eliminated float from amount representation, achieving zero double charges in production. "Don't hold money in float" is not a slogan but a rule written in the blood of the front line.
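To make the "integer smallest unit" approach concrete, here is a minimal sketch that splits an amount without losing a cent and formats it only at display time. The helpers split_evenly and format_usd are hypothetical names for illustration, not part of any payment library.

```python
def split_evenly(total_cents: int, parts: int) -> list[int]:
    """Split an integer amount so that the shares always sum to the total."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, remainder = divmod(total_cents, parts)
    # The first `remainder` shares each receive one extra minor unit.
    return [base + 1 if i < remainder else base for i in range(parts)]


def format_usd(cents: int) -> str:
    """Convert to a display string only at the edge of the system."""
    sign = "-" if cents < 0 else ""
    dollars, rest = divmod(abs(cents), 100)
    return f"{sign}${dollars}.{rest:02d}"


shares = split_evenly(1000, 3)
print(shares)               # → [334, 333, 333]
print(sum(shares))          # → 1000   (nothing is lost to rounding)
print(format_usd(1999))     # → $19.99
```

Because every intermediate value is an int, there is no rounding step that could drift; the only decision left is who receives the leftover minor units, and that decision is explicit in the code.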

💡 Fraction as a choice: if you want to exactly hold a rational number as "numerator/denominator," fractions.Fraction("1/3") works. Fraction(1, 3) * 3 == 1 (zero error). It shines in cumulative calculations of probabilities and ratios.
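A short comparison shows where Fraction stays exact and float does not. This is only a sketch of the idea.

```python
from fractions import Fraction

third = Fraction(1, 3)
print(third * 3 == 1)                  # → True   (no error at all)
print(Fraction("1/3") == third)        # → True   (string form is accepted)

# The same accumulation in float and in Fraction
print(0.1 + 0.1 + 0.1 == 0.3)          # → False
tenth = Fraction(1, 10)
print(tenth + tenth + tenth == Fraction(3, 10))   # → True
print(float(tenth * 3))                # → 0.3    (convert only for display)
```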

1-4. `complex`: complex numbers for scientific computing

Python has complex numbers built into the language. The imaginary unit is j. Used in signal processing, electrical circuits, Fourier transforms, etc.

z = 3 + 4j
z.real          # → 3.0
z.imag          # → 4.0
abs(z)          # → 5.0   (the absolute value, i.e. the magnitude, of the complex number)

1-5. `bool`: actually a subclass of `int` (a hidden trap)

True / False are booleans, but in Python bool is a subclass of int, with True == 1 and False == 0. This is convenient but breeds silent bugs.

True + True         # → 2          (bool is an int, so arithmetic works)
sum([True, False, True])  # → 2    (an idiom for counting the True values in an iterable)

isinstance(True, int)     # → True  ← this is the pitfall

The last line is the problem. Even if you intend isinstance(x, int) to "let through only integers," True / False slip through. When writing integer validation at a boundary, judge strictly with type(x) is int, or leave it to a validator like Pydantic that separates bool from int.
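A strict check along those lines might look like the following. require_int is a hypothetical helper written for this example.

```python
def require_int(value: object, name: str = "value") -> int:
    """Accept only real ints; reject bool even though it subclasses int."""
    if type(value) is not int:
        raise TypeError(f"{name} must be int, got {type(value).__name__}")
    return value


print(require_int(3))              # → 3
try:
    require_int(True, "quantity")
except TypeError as exc:
    print(exc)                     # → quantity must be int, got bool
```

Note that type(value) is not int also rejects every other int subclass (for example IntEnum members), so use it only where that strictness is actually wanted.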

"Truthiness" is an important concept too. The condition of an if is evaluated even if it isn't bool, and empty collections, 0, "", and None are treated as falsy.

items = []
if not items:           # an empty list is falsy → Pythonic
    print("empty")

# Anti-pattern: if len(items) == 0:  ← verbose
# Anti-pattern: if items == []:      ← brittle because it depends on the type

⚠️ The truthiness pitfall: judging "was a value passed" with if value: mis-judges even legitimate values like 0, "", or an empty list as "unspecified." When you want to distinguish "unspecified," use the if value is None: described later.

2. Strings `str`: an immutable sequence of Unicode code points

2-1. The essence of str: immutable, Unicode, a sequence

str is an immutable sequence of Unicode code points. Three keywords explain everything.

Immutable: a string once made can't be changed. s[0] = "X" is an error. "Changing" is always creating a new string.
Unicode: len("こんにちは") is 5 (the character count, not the byte count). Be careful with emoji and combining characters, but basically per code point.
Sequence: indexable, sliceable, iterable.

s = "Python"
s[0]            # → 'P'
s[-1]           # → 'n'
s[1:4]          # → 'yth'   (slice; returns a new str and leaves the original intact)
s[::-1]         # → 'nohtyP' (the idiom for reversing)
len(s)          # → 6

String literals are varied. What to grasp in practice is the following.

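name = "友田"
# f-string (3.6+): the most recommended way to format strings
greeting = f"Hello, {name}"
# f-string = debugging (3.8+): prints the expression and its value together
value = 42
print(f"{value=}")          # → value=42

raw = r"C:\Users\path"      # raw string: \ is not treated as an escape (essential for regular expressions and paths)
multi = """Multiple lines
can be written as-is"""       # triple quotes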

2-2. Why you must not "concatenate in a loop with +="

Because strings are immutable, += in a loop rebuilds a new string each time, becoming O(n²) at worst. The right answer is str.join().

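words = ["fast", "clear", "portable"]   # sample input

# Anti-pattern: can become O(n²)
result = ""
for word in words:
    result += word          # creates a new object each time

# Correct: O(n), and more readable too
result = "".join(words)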

💡 CPython has an implementation that optimizes +=, but it's implementation-dependent and not guaranteed. join is fast as a spec and clear in intent. "Choosing the right idiom" is a matter not only of performance but of readability and portability.

Organize the main methods into "search, transform, split, join, judge" and they're easy to memorize and apply. strip() / lower() / upper() / replace() / split() / startswith() / endswith() / find() / format(). For case-insensitive comparison, the right answer is casefold() (a stricter Unicode folding), not lower().
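The difference between lower() and casefold() only shows up with certain characters; the German ß is the usual example.

```python
a = "Straße"
b = "STRASSE"

print(a.lower())                       # → straße   (ß is left unchanged)
print(a.casefold())                    # → strasse  (ß is folded to "ss")
print(a.lower() == b.lower())          # → False
print(a.casefold() == b.casefold())    # → True
```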

2-3. `str` and `bytes`: the boundary of text and binary

This is a wall you always hit when handling networks, files, or crypto.

str: text humans read (Unicode code points).
bytes: a raw byte sequence machines handle (an immutable sequence of 0–255). Written with b"...".

Conversion between the two goes through an explicit encoding. str → bytes is encode(), bytes → str is decode(). 90% of mojibake is caused by leaving the encoding implicit here. Always specify utf-8.

text = "日本語"
data = text.encode("utf-8")   # → b'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e' (9 bytes)
data.decode("utf-8")          # → '日本語'

len(text)                     # → 3   (number of characters)
len(data)                     # → 9   (number of bytes; in UTF-8 each of these characters takes 3 bytes)

If you need a mutable byte sequence, use bytearray; to peek at a byte sequence without a memory copy, memoryview. In a backend handling large binaries, this distinction affects memory efficiency.
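A small sketch of both types: bytearray is modified in place, and a memoryview slice writes through to the same buffer without copying it.

```python
buf = bytearray(b"hello world")
buf[0] = ord("H")                  # in-place mutation (bytes would raise TypeError)

view = memoryview(buf)[6:]         # zero-copy window onto the tail of buf
view[0] = ord("W")                 # writes through to the underlying bytearray

print(bytes(buf))                  # → b'Hello World'
print(bytes(view))                 # → b'World'
view.release()                     # release the buffer when done with the view
```

While a view is alive the underlying bytearray cannot be resized, which is why the sketch releases it explicitly.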

3. `None`: the sole existence representing "no value"

None is a special object representing "no value," "unset," "not applicable," with type NoneType, and only one of it exists in the entire program (a singleton). That's exactly why the iron rule for comparing None is to use is, not ==.

result = None

if result is None:        # Correct: compare by identity
    ...

# Anti-pattern: == can misbehave for types that override __eq__
# if result == None:

Functions that return None (no matching record in the DB, etc.) are common. In type hints, express "an X, or None if absent" as X | None (Python 3.10+; before that, Optional[X]). This is a contract that forces a None check on the caller, and it lets static analysis eliminate the most common runtime error of all: AttributeError: 'NoneType' object has no attribute ...

def find_user(user_id: int) -> "User | None":
    ...

user = find_user(1)
user.name             # mypy / pyright warn that this may be None → prevents the accident
if user is not None:
    user.name         # safe here

💡 The sentinel pattern (advanced): in an API where None itself can be a legitimate value (e.g., "the key exists but the value is None"), None can't double as the mark of "unspecified." In that case, create a dedicated sentinel object such as _MISSING = object() and test for it with if value is _MISSING:. dict.pop(key[, default]) has to draw the same line: with no default it raises KeyError, while an explicit default of None is returned as-is.
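A minimal sketch of the sentinel pattern follows. get_setting and the settings dict are hypothetical; only the _MISSING = object() idiom comes from the text above.

```python
_MISSING = object()   # unique marker; nothing else can be identical to it


def get_setting(settings: dict, key: str, default=_MISSING):
    """Return settings[key]; raise KeyError unless a default was supplied."""
    if key in settings:
        return settings[key]          # may legitimately be None
    if default is _MISSING:
        raise KeyError(key)
    return default


settings = {"timeout": None}
print(get_setting(settings, "timeout"))         # → None  (stored value)
print(get_setting(settings, "retries", None))   # → None  (explicit default)
try:
    get_setting(settings, "retries")
except KeyError as exc:
    print("missing:", exc)                      # → missing: 'retries'
```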

4. Collections: choose by mutability, order, hashability, and complexity

This is the domain where design skill shows most. Don't choose list / tuple / dict / set "by feel"—judge by four axes.

Mutable or immutable: do you change the contents after making it.
Does it preserve order: does the ordering have meaning.
Hashable: can it be a dict key or a set element (in practice, the built-in immutable types are hashable and the mutable ones are not).
Complexity (Big-O): does that operation scale.

4-1. `list`: a mutable dynamic array

Ordered, mutable, allows duplicates—the most general-purpose collection. Internally a dynamic array, so appending to the tail is fast (amortized O(1)), but inserting/removing at the head is slow (O(n)).

nums = [3, 1, 4, 1, 5]
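nums.append(9)          # append to tail: amortized O(1)
nums.insert(0, 2)       # insert at head: O(n) (shifts every element)
nums.sort()             # in-place sort: O(n log n)
9 in nums               # membership test: O(n) ← slow when the list is large

squares = [x * x for x in range(5)]   # list comprehension: fast and readable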

💡 If head operations are frequent, collections.deque: a double-ended queue. appendleft / popleft are O(1). Code using list.pop(0) for a FIFO queue or sliding window gets dramatically faster just by switching to deque. The Real Python intro article doesn't touch it, but it's a frequent optimization on the front line.
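Both uses mentioned above, a FIFO queue and a sliding window, can be sketched in a few lines. The job names and readings are sample data.

```python
from collections import deque

# FIFO queue: popleft() is O(1), unlike list.pop(0), which shifts every element.
jobs = deque(["job-1", "job-2"])
jobs.append("job-3")
first = jobs.popleft()
print(first, list(jobs))     # → job-1 ['job-2', 'job-3']

# Sliding window: with maxlen set, the oldest item is dropped automatically.
window = deque(maxlen=3)
for reading in [1, 2, 3, 4, 5]:
    window.append(reading)
print(list(window))          # → [3, 4, 5]
```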

4-2. `tuple`: immutable and lightweight, so it can be a "key"

A tuple is like an immutable list, but its use is clearly different. Use it to represent "a set of data that doesn't change / mustn't be changed." Being immutable, it's hashable and can be a dict key or a set element.

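point = (35.6895, 139.6917)     # latitude and longitude: a fixed pair with meaning
# point[0] = 0  ← TypeError (immutable, so safe)

# Multiple return values are actually a tuple
def divmod_(a, b):
    return a // b, a % b        # the tuple (quotient, remainder)
q, r = divmod_(17, 5)           # unpacking assignment

cache = {(35.68, 139.69): "Tokyo"}   # a tuple as a dict key (not possible with a list)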

When you want to "declare read-only via the type" or "give the values meaning as a unit, like coordinates or a key," choose tuple over list. This alone makes the intent clear, and any accidental modification fails immediately with a TypeError instead of silently corrupting data.

4-3. `dict`: a key-value mapping (preserves insertion order)

dict is an average-O(1) mapping from key to value. Since Python 3.7, insertion-order preservation is guaranteed as a language spec (in 3.6 it was a CPython implementation detail). Keys must be hashable (= immutable types).

user = {"id": 1, "name": "友田", "role": "engineer"}

user.get("email")               # None if the key is missing (no KeyError)
user.get("email", "not set")    # get with a default
user.setdefault("tags", []).append("python")  # create if missing, then operate on it

# Comprehension and merge (the | operator, 3.9+)
squared = {k: v * v for k, v in {"a": 2, "b": 3}.items()}
merged = {"a": 1} | {"b": 2}    # → {'a': 1, 'b': 2}

.get() / .setdefault() to avoid KeyError, the aggregation standard collections.Counter, and collections.defaultdict that auto-generates on a missing key make front-line dict operations one notch cleaner.

from collections import Counter, defaultdict

Counter("mississippi")          # → Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})

groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)   # the 'a' / 'b' keys are created automatically

4-4. `set` / `frozenset`: deduplication and fast membership

A set is an unordered, non-duplicating collection whose elements are hashable. Its biggest value is that membership tests are O(1) and set operations can be written at the language level.

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

a & b           # intersection (common elements) → {3, 4}
a | b           # union → {1, 2, 3, 4, 5, 6}
a - b           # difference → {1, 2}
a ^ b           # symmetric difference → {1, 2, 5, 6}

# Deduplication idiom
unique = list(set([1, 1, 2, 2, 3]))   # → [1, 2, 3] (note that order is not guaranteed)

3 in a          # membership: O(1) ← decisively different from list's O(n)

Code that "repeats x in some_list on a large amount of data" turns O(n²) into O(n) just by changing some_list to a set. This is one of the most cost-effective optimizations on the front line. frozenset is the immutable version and can be a dict key or a set element.
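Two common variations are sketched below: deduplicating while keeping first-seen order (which a set cannot promise), and replacing a list lookup inside a loop with a set lookup. The event names and IDs are made up.

```python
# Order-preserving deduplication: dict keys are unique and keep insertion order.
events = ["login", "view", "login", "purchase", "view"]
unique_in_order = list(dict.fromkeys(events))
print(unique_in_order)          # → ['login', 'view', 'purchase']

# Membership inside a loop: build the set once, then each test is O(1) on average.
blocked_ids = {102, 205, 307}   # hypothetical blocklist
incoming = [101, 102, 103, 205]
allowed = [i for i in incoming if i not in blocked_ids]
print(allowed)                  # → [101, 103]
```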

4-5. Complexity quick reference (this is the core of "which to use")

The average complexity of major operations. Type selection ultimately consolidates into this table.

Operation	list	deque	dict	set
Append to tail	Amortized O(1)	O(1)	—	—
Append to head	O(n)	O(1)	—	—
Index access	O(1)	O(n)	—	—
Key / element search (in)	O(n)	O(n)	O(1)	O(1)
Get value by key	—	—	O(1)	—
Remove element (arbitrary)	O(n)	O(n)	O(1)	O(1)

The decision guideline is simple. "For ordered iteration, list; for both-end operations, deque; for lookup by key, dict; for existence tests and deduplication, set; for an unchanging set, tuple."

5. The top 3 production bugs caused by mutability (this is where you differentiate)

This is the section converting data-type knowledge into accident prevention. The bulk of dynamically-typed Python bugs consolidate into these three.

5-1. Mutable default arguments (Python's worst trap)

A function's default argument is evaluated only once at function-definition time and shared across calls. Make a mutable object the default, and the result of the previous call leaks into the next.

# Anti-pattern: an empty list as the default
def add_item(item, basket=[]):
    basket.append(item)
    return basket

add_item("apple")     # → ['apple']
add_item("banana")    # → ['apple', 'banana']  ← the previous 'apple' is still there!

The right answer is the None sentinel. Make "create a new list per call" explicit.

def add_item(item, basket=None):
    if basket is None:
        basket = []
    basket.append(item)
    return basket

Type hints don't catch this trap. Code review and linters do: pylint reports it as dangerous-default-value and flake8-bugbear as B006. Even so, being able to write "mutable default → None sentinel" by reflex is a professional minimum.

5-2. Shared references and "shallow copies"

Where Section 0's "assignment isn't a copy" bares its fangs is copying. copy() and slicing are shallow copies—they duplicate the first level, but leave nested elements shared.

import copy

original = [[1, 2], [3, 4]]
shallow = original[:]              # shallow copy
shallow[0].append(99)
print(original)                   # → [[1, 2, 99], [3, 4]]  ← the inner lists are shared!

deep = copy.deepcopy(original)    # deep copy: duplicates recursively; fully independent

In scenes like "duplicate a config dict and change only part of it" or "reuse a test fixture," this silently contaminates data. If there's nesting, deepcopy, or design with immutable types (tuple / frozenset) in the first place is safe.
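The config-dict scenario can be reproduced in a few lines. The defaults dict here is invented sample data.

```python
import copy

defaults = {"db": {"host": "localhost", "port": 5432}, "debug": False}

# Shallow copy: the top level is new, but the nested "db" dict is shared.
staging = defaults.copy()
staging["db"]["port"] = 6543
print(defaults["db"]["port"])     # → 6543   (the defaults were silently changed)

defaults["db"]["port"] = 5432     # restore for the second half of the demo

# Deep copy: nested containers are duplicated too.
testing = copy.deepcopy(defaults)
testing["db"]["port"] = 6543
print(defaults["db"]["port"])     # → 5432   (the defaults are untouched)
```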

5-3. Trying to make an unhashable type a key

A dict key or a set element must be hashable. list / dict / set are mutable so unhashable, and int / str / tuple (if all contents are hashable) are hashable.

{[1, 2]: "x"}              # TypeError: unhashable type: 'list'
{(1, 2): "x"}              # OK: a tuple is hashable
{(1, [2]): "x"}            # TypeError: a tuple containing a list is not hashable

Turning this property around, "declaring a 'value that should be immutable' via the type with hashability" is an advanced design. Hold coordinates and keys as tuple, and a set of constants as frozenset—the type itself becomes documentation that "this must not be changed."
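As one illustration, a frozenset works as a dict key when the key is an unordered group of values. The cache and tag names below are hypothetical.

```python
# A cache keyed by an unordered set of tags.
cache: dict[frozenset[str], str] = {}
cache[frozenset({"python", "typing"})] = "result-1"

# Element order does not matter: equal frozensets hash equally.
print(cache[frozenset({"typing", "python"})])   # → result-1

ALLOWED_ROLES = frozenset({"admin", "editor"})  # a constant set nobody can mutate
try:
    ALLOWED_ROLES.add("guest")                  # frozenset has no add()
except AttributeError:
    print("cannot modify a frozenset")
print("admin" in ALLOWED_ROLES)                 # → True
```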

Bonus: `is` and `==`, and small-integer caching

== compares value (calls __eq__).
is compares identity (whether it's the same object).

Use is only for None / True / False / sentinels, and never for value comparison. The reason is CPython's "small-integer caching." Because CPython reuses ints from -5 to 256, an implementation-dependent trap like the following arises.

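# int("...") builds each value at runtime; equal literals in one code block may be merged by the compiler
a = int("256"); b = int("256")
a is b          # → True on CPython   (the same cached object)

a = int("257"); b = int("257")
a is b          # → False on CPython  (separate objects; an implementation detail, not a guarantee)
a == b          # → True   ← value comparison is always correct. Use this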

Code relying on a is b being True breaks the moment the number exceeds 256. Mechanically keep "equality is ==, identity is is."

6. Checking the type: `isinstance()` rather than `type()`, and duck typing

There are two ways to check the type at runtime.

type(42) is int            # strictly int? (rejects subclasses)
isinstance(42, int)        # int, or one of its subclasses?
isinstance(x, (int, float))  # any one of several candidates

As a principle, use isinstance(). It respects inheritance and abstract base classes, so it's more flexible and correct. But, as seen in Section 1-5, use type(x) is int only when you need strict judgment like "reject bool and let through only int."

Even more Pythonic is duck typing—the idea of judging "not what type it is, but whether it has the needed behavior." Using the abstract base classes of collections.abc, you can judge "is it iterable" or "is it a mapping" without binding to a concrete type.

from collections.abc import Iterable, Mapping

def total(values):
    if not isinstance(values, Iterable):   # a list, a set, or a generator are all OK
        raise TypeError("an iterable object is required")
    return sum(values)

This is exactly the principle "depend on the abstraction (protocol), not the concrete type," and it keeps the design easy to extend.
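When collections.abc has no ready-made abstraction for the behavior you need, typing.Protocol (3.8+) lets you declare one. The sketch below assumes a made-up SupportsClose protocol and TempResource class.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsClose(Protocol):
    """Anything with a close() method, regardless of its base class."""
    def close(self) -> None: ...


class TempResource:               # note: does not inherit from SupportsClose
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def shutdown(resource: SupportsClose) -> None:
    resource.close()


r = TempResource()
shutdown(r)
print(r.closed)                          # → True
print(isinstance(r, SupportsClose))      # → True  (checks that close exists)
print(isinstance(42, SupportsClose))     # → False
```

The runtime isinstance check only verifies that the method exists, not its signature; the full check is done by a static type checker.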

7. Giving "static safety" to a dynamic language: type hints

Python is dynamically typed, but with type hints (PEP 484) you can annotate types, and static analyzers like mypy / pyright detect bugs before runtime. Type hints aren't enforced at runtime, but in modern production Python they're effectively mandatory.

def greet(name: str, times: int = 1) -> str:
    return f"Hello, {name}! " * times

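# Built-in types are generic as-is from 3.9 (PEP 585); the X | None syntax needs 3.10+ (PEP 604)
def first(items: list[int]) -> int | None:
    return items[0] if items else None

from typing import Final, Literal
MAX_RETRIES: Final = 3                          # reassignment is rejected by the type checker
def set_mode(mode: Literal["r", "w", "a"]) -> None: ...   # restrict the allowed values via the type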

In my own projects, I enforce three rules: ban Any and its equivalents (which amount to giving up on types), pin down types at the boundary, and make type checking mandatory in CI. Even in a dynamic language, this gets you close to a "make invalid states unrepresentable" design. The practice of pushing the same philosophy to its limit in TypeScript is covered in The discipline of TypeScript type safety (Zod, NeverError, no-any). The languages differ, but the principle "validate at the boundary, defend the inside with types" is the same.

8. Beyond standard types, "design your own type"

The difference between world-class code and ordinary code shows in whether you "use built-in types as-is" or "design a type suited to the domain." Stop expressing everything with dict, and let the intent speak through the type.

8-1. `dataclass`: the first choice for structured data

@dataclass (3.7+) auto-generates __init__ / __repr__ / __eq__, eliminating boilerplate. frozen=True makes it immutable, and slots=True (3.10+) improves memory efficiency and speed.

from dataclasses import dataclass, field

@dataclass(frozen=True, slots=True)
class Money:
    amount: int          # held in the smallest currency unit (e.g., cents)
    currency: str = "JPY"

@dataclass
class Order:
    id: int
    items: list[str] = field(default_factory=list)   # ← the correct way to write a mutable default

m = Money(1999)          # immutable, so it is hashable and safe to share

Note field(default_factory=list). This is the official practice by which dataclass correctly solves Section 5-1's mutable-default problem.

8-2. `Enum`: stop "string constants"

Hold states or kinds as raw strings, and a typo like "acitve" doesn't surface until runtime. With Enum (StrEnum is 3.11+), make the possible values a closed set and you can exclude invalid values via the type.

from enum import StrEnum

class OrderStatus(StrEnum):
    PENDING = "pending"
    PAID = "paid"
    SHIPPED = "shipped"

status = OrderStatus.PAID
status == "paid"         # → True (a StrEnum member is also a str)
# OrderStatus("unknown") → ValueError (rejects invalid values immediately)

8-3. `NamedTuple` / `TypedDict`: lightweight typing

NamedTuple: immutable, tuple-compatible, when you want to name the fields.
TypedDict: when you want to declare "the shape of a dict" via the type (ideal for typing API responses).

from typing import NamedTuple, TypedDict

class Point(NamedTuple):
    x: float
    y: float

class UserDict(TypedDict):
    id: int
    name: str
    email: str | None

Just replacing "code that passes dict around" with dataclass / TypedDict, IDE completion works, typos disappear, and refactoring becomes safe. This is a direct investment in maintainability.

9. At the system boundary, "validate" the type: Pydantic / marshmallow

Finally, connect the knowledge so far to production architecture. Type hints are "a shield protecting internal code," but they aren't enforced at runtime. So data coming from outside the system boundary—HTTP requests, external-API responses, environment variables, message queues—must always be validated at runtime. "Don't trust data coming from outside"—this is the first principle of a secure backend.

What stands at this boundary is Pydantic v2 and marshmallow.

from pydantic import BaseModel, EmailStr, Field   # EmailStr needs the optional email-validator package

class CreateUser(BaseModel):
    name: str = Field(min_length=1, max_length=50)
    email: EmailStr
    age: int = Field(ge=0, le=150)

# Passing an invalid dict (external input) is stopped with a ValidationError
user = CreateUser.model_validate({"name": "友田", "email": "a@example.com", "age": 30})

By now the connection should be clear. Pydantic / marshmallow are tools that turn the "types" covered in this article into runtime contracts. The range of an int, the length of a str, None tolerance (Optional), nested structure: declare all of them, validate them, and let only trustworthy data through to the inside. Keeping a dynamic language's flexibility while gaining static-language-grade robustness at the boundary is where modern Python backend design ends up.

Type-first boundary validation → Pydantic v2 practical guide
ORM / framework-independent serialization/validation → marshmallow practical guide
Input validation at the web-framework layer → FastAPI production-operations guide and FastAPI request validation
Type safety at the persistence layer → SQLAlchemy 2.0 practical guide
Designing money and idempotency → Idempotency design to prevent double charges in payments

Summary: design data types as "constraints"

Python's data types are not a list to memorize but "a vocabulary for expressing correctness, speed, and safety through the structure of code." Take this article's points home as decision axes.

Everything is an object. A variable is a reference. Mutability, is/==, and copying behavior all follow from that.
Don't hold money in float. Decimal (created from a string) or an integer smallest unit. This is the front-line iron rule.
Choose collections by complexity. For existence tests and deduplication, set / dict (O(1)), not list.
Prevent mutability bugs by reflex. Mutable defaults get the None sentinel, nesting gets deepcopy or immutable types.
Design types. Make invalid states unrepresentable with dataclass / Enum / TypedDict, and validate the boundary with Pydantic / marshmallow.

It's not "dynamic typing, so it's OK to be sloppy." Precisely because it's dynamic typing, a deep understanding of and discipline toward types separate production quality. I have practiced this principle on Python / Flask / SQLAlchemy backends and on a payment foundation that achieved zero double charges in production. The mindset of "designing data types as constraints" is what eliminates test-evading bugs in advance and builds a change-resistant codebase.

Frequently Asked Questions (FAQ)

Q. How many Python data types do I need to memorize in the end?

What's frequent in practice is about 10: numbers (int / float / Decimal), strings (str / bytes), boolean (bool), None, and collections (list / tuple / dict / set). First grasp their "mutability, order, hashability, complexity," and you can handle 90% of situations.

Q. How do I use `list` and `tuple` appropriately?

If you'll change it, list; for a set you don't (mustn't) change, tuple. A tuple is immutable so hashable, and can be a dict key or a set element. Coordinates, multiple return values, and fixed records are natural as tuple; collections you add to/remove from are natural as list.

Q. Why doesn't `0.1 + 0.2` become `0.3`? Is it a bug?

It's not a bug. float is binary floating-point (IEEE 754) and can't exactly represent 0.1 in finite digits. Compare with math.isclose(), and use Decimal or an integer smallest unit for money calculations.

Q. Should I use `is` or `==`?

Whether values are equal is ==, whether it's the same object is is. Use is only for judging None / True / False / sentinels, not for comparing numbers or strings (relying on CPython's small-integer / string caching breaks).

Q. Do type hints take effect at runtime? Is there a point in writing them?

They aren't enforced at runtime (they're ignored), but mypy / pyright detect bugs before runtime, so they're effectively mandatory in production code. If you want to validate external input at runtime, combine with Pydantic or marshmallow.
