友田 陽大

marshmallow Practical Guide: Robustly Designing Python Object Serialization / Validation at the Boundary (v4-Compatible)

Following the official marshmallow documentation (v4.3), this guide explains, from a practical standpoint, bidirectional serialization with Schema/fields, boundary validation with load(), @validates/@validates_schema, Nested, the safe use of load_only/dump_only, marshmallow-sqlalchemy integration, the 3 → 4 migration, and how to choose between marshmallow and Pydantic.

Author
友田 陽大

Introduction: why dare to use marshmallow now

The design philosophy of a robust backend can be condensed into one sentence: "Never trust data coming from outside the system boundary." An HTTP request body, an external API's response, form input, a message-queue payload: all of these are unvalidated data with no type guarantee. Pass them straight into the application and they surface as KeyError or AttributeError, and in the worst case as mass assignment (illegal privilege escalation) or data leakage.

marshmallow is the "bidirectional gatekeeper" standing at this boundary. If Pydantic is "a type-first model," marshmallow has matured since 2013 as a dedicated serialization / deserialization library independent of any ORM / framework. In the Flask + SQLAlchemy stack, it's still the de facto standard. Its essence condenses into just 2 directions.

  • load() (deserialization): validate untrustworthy external input (dict / JSON), normalize it, and convert it to a form usable internally. If validation fails, dam it with a ValidationError.
  • dump() (serialization): shape an internal Python object (an ORM model, etc.) into a safe outward representation (dict / JSON).

The author designed and implemented, in Python / Flask / SQLAlchemy / PostgreSQL, the backend of a B2B SaaS that won the Minister of Economy, Trade and Industry Award, and operated it in production with strict layer separation (Router → UseCase → Repository → Model). That project's boundary validation was handled by marshmallow. This article organizes that hands-on knowledge, staying faithful to the latest official marshmallow documentation (marshmallow.readthedocs.io) while aiming to be one level easier to follow.

💡 The version covered in this article: it assumes marshmallow 4.3.0 (the stable version as of April 2026). Much of the code found in web articles or produced by generative AI is still written in the legacy 3.x style (missing= / default= / Schema.context). This article uses the canonical v4 style throughout, and a 3 → 4 migration cheat sheet is provided near the end.

💡 This article is part of a series on Python backend design. The type-first paired option is the Pydantic v2 practical guide, the web-framework layer is the FastAPI production operation guide, and the persistence layer is the SQLAlchemy 2.0 practical guide; reading them together gives a view of a consistent design from the boundary to the DB.


1. Schema and fields: declaratively define bidirectional conversion

marshmallow's starting point is inheriting the Schema class. Just line up fields descriptors as class attributes, and that becomes the single source of truth for validation, serialization, and deserialization.

from datetime import datetime
from marshmallow import Schema, fields


class UserSchema(Schema):
    name = fields.Str()
    email = fields.Email()
    created_at = fields.DateTime()


user = {"name": "友田", "email": "tomoda@example.com", "created_at": datetime.now()}
schema = UserSchema()

schema.dump(user)   # → dict: {'name': '友田', 'email': 'tomoda@example.com', 'created_at': '2026-06-26T...'}
schema.dumps(user)  # → JSON string: '{"name": "\u53cb\u7530", "email": "tomoda@example.com", ...}' (pass ensure_ascii=False to keep 友田 unescaped)

dump() returns a Python dict, and dumps() a JSON string (the trailing s is the s of string). A JSON-incompatible type like datetime is auto-converted to an ISO 8601 string by fields.DateTime() — this is dump's job of "internal type → outward representation."

Note that instead of a class definition, you can also generate a schema at runtime with Schema.from_dict(). It's useful when you want to assemble a schema dynamically from configuration values.

UserSchema = Schema.from_dict(
    {"name": fields.Str(required=True), "email": fields.Email(required=True)}
)

Handle collections with many=True

For bulk conversion of multiple objects, specify many=True. You can specify it at schema generation or at the dump()/load() call.

UserSchema(many=True).dump(users)      # ① per schema instance
UserSchema().dump(users, many=True)    # ② per call

Why is this superior? The knowledge of "which items, in what type, with what conversion, go in and out" lives in one place, the Schema. Because response-shaping logic isn't scattered across view functions and the service layer, a change to the output spec is localized to the schema definition. This is DRY and ETC (Easy To Change) in practice.


2. load(): validate and dam the boundary's invalid data

The inverse of dump() is load(). It validates untrustworthy external input and returns a normalized dict; on failure, it raises a ValidationError. This is marshmallow's most important feature.

from marshmallow import ValidationError

try:
    data = UserSchema().load({"name": "友田", "email": "not-an-email"})
except ValidationError as err:
    print(err.messages)    # {'email': ['Not a valid email address.']}
    print(err.valid_data)  # {'name': '友田'}  ← only the fields that passed validation

A ValidationError holds 2 pieces of information.

  • err.messages: a dict of error messages keyed by field name. You can make it the body of a 422 response as-is.
  • err.valid_data: a dict containing only the fields that passed validation. Usable for partial processing.

With many=True, errors are keyed by the element's index, so it's clear at a glance which element of the array failed, on which field, and why.

# {1: {'email': ['Not a valid email address.']},
#  3: {'name': ['Missing data for required field.']}}

required / allow_none / load_default / dump_default

Declare whether an input is required, optional, or defaulted with the field's arguments. In v4, missing= / default= are gone, replaced by load_default (the default at input) and dump_default (the default at output); the new names were introduced in 3.13 and the old ones removed in 4.0. It's an excellent change: the name itself now says that the meaning differs between input and output.

import uuid
from datetime import datetime
from marshmallow import Schema, fields


class AccountSchema(Schema):
    # required=True: a missing value is rejected (default message "Missing data for required field.", overridden here)
    email = fields.Email(required=True, error_messages={"required": "Email address is required."})

    # Explicitly allow None (None is rejected by default)
    nickname = fields.Str(allow_none=True)

    # If the value is absent on load, fill in this default (a callable is also accepted)
    id = fields.UUID(load_default=uuid.uuid4)

    # If the value is absent on dump, emit this default
    role = fields.Str(dump_default="member")

data_key: separate external naming from internal naming

External API is camelCase, internal code is snake_case — a common collision. data_key confines this translation to the boundary.

class UserSchema(Schema):
    user_name = fields.Str(data_key="userName")
    email = fields.Email(data_key="emailAddress")

# load:  {"userName": "友田", "emailAddress": "a@b.com"} → {'user_name': '友田', 'email': 'a@b.com'}
# dump:  internal snake_case → back to outward camelCase

partial: partial validation for PATCH

Update APIs often need to validate only the fields that were actually sent. partial=True skips the required check on every field, and partial=("name",) waives it for the listed fields only.

UserSchema().load({"email": "a@b.com"}, partial=("name",))  # waive the required check on name

Why is this superior? The discipline of "data becomes an internal value only after validation" is visible in the code as the load() call itself. Anything that passed the boundary has been validated, and all downstream processing can rely on that. It keeps hand-written checks such as if not data.get("email"): ... out of the business logic, protecting SRP (single responsibility).


3. The Schema as a security boundary: 3 safety devices

marshmallow's true value is that the schema's structure controls what is let in and what is let out. The typical vulnerabilities OWASP warns about are prevented by the structure of the code rather than by operational vigilance.

dump_only: a field you "don't let the client write"

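If you accept server-decided values such as id / created_at / role straight from request.json, you get a mass-assignment vulnerability: an attacker simply sends {"role": "admin"}. A field with dump_only=True is output-only and is never written by load(). On load it is treated as an unknown key, so under the default unknown=RAISE the request is rejected with "Unknown field.", and under EXCLUDE the value is silently dropped. Avoid combining dump_only with unknown=INCLUDE, which would pass the value through unvalidated.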

class UserSchema(Schema):
    id = fields.Int(dump_only=True)              # output only; load() can never write it
    role = fields.Str(dump_only=True)            # privileges are decided by the server
    created_at = fields.DateTime(dump_only=True)
    email = fields.Email(required=True)          # accepted by load()

load_only: a field you "absolutely don't put in the response"

Accidentally including a password or token in a response is a typical information-leakage accident. A field with load_only=True becomes input-only and is never included in dump()'s output.

from marshmallow import Schema, fields, validate


class SignupSchema(Schema):
    email = fields.Email(required=True)
    password = fields.Str(load_only=True, required=True, validate=validate.Length(min=12))
    # password is accepted by load() but never emitted by dump(), so leakage is prevented structurally

unknown=RAISE: reject unknown keys (the default)

By default, marshmallow rejects unknown fields with a ValidationError (unknown=RAISE). This fail-safe default is the polar opposite of a defenseless expansion like Model(**request.json). The behavior can be switched explicitly.

from marshmallow import Schema, fields, EXCLUDE, INCLUDE, RAISE


class StrictSchema(Schema):
    class Meta:
        unknown = RAISE   # default: unknown keys are an error (safest)
        # EXCLUDE → silently drop unknown keys / INCLUDE → pass them through as-is

    name = fields.Str()


# Can also be overridden per call
StrictSchema().load(payload, unknown=EXCLUDE)

Option | Treatment of unknown keys | Main use
--- | --- | ---
RAISE (default) | Raises a ValidationError | Strict input validation (first choice)
EXCLUDE | Silently discards | When you want to ignore an external API's extra keys
INCLUDE | Passes without validation | When passing through schema-less items (caution needed)

Why is this superior? Rely on "noticing in review" for security and something will eventually leak. dump_only / load_only / unknown=RAISE declare the safety constraint in the schema definition itself, making "carelessly writable" and "carelessly emitted" structurally impossible. This is a concrete implementation of the security principle "validate and sanitize all external input at the boundary, and apply least privilege."


4. Validation: the three layers of fields → @validates → @validates_schema

Type validation alone can't express business rules like "18 or older" or "the end time is after the start time." marshmallow provides validation in three layers according to responsibility.

The first layer: the validate= argument (a single-field static rule)

The most lightweight validation is the method of passing a marshmallow.validate validator to validate=.

from marshmallow import Schema, fields, validate


class UserSchema(Schema):
    name = fields.Str(validate=validate.Length(min=1, max=120))
    age = fields.Int(validate=validate.Range(min=18, max=120))
    permission = fields.Str(validate=validate.OneOf(["read", "write", "admin"]))
    sku = fields.Str(validate=validate.Regexp(r"^[A-Z]{3}-\d{4}$"))
    # multiple validators can be combined in a list
    slug = fields.Str(validate=[validate.Length(min=3), validate.Regexp(r"^[a-z0-9-]+$")])

The main built-in validators are, per the official docs, Length / Range / OneOf / Regexp / Email / Equal / ContainsOnly, and so on.

⚠️ A v4 breaking change: in 3.x you could use a function that returns False, like validate=lambda x: x == "ok", but in v4 a validator must always raise a ValidationError. The form of returning False stops working.

The second layer: @validates (custom validation per field)

Logic that can't be expressed with the built-ins is written with the @validates decorator. In v4 you can pass multiple field names, and the method receives data_key.

from marshmallow import Schema, fields, validates, ValidationError


class ItemSchema(Schema):
    quantity = fields.Integer(required=True)
    reserved = fields.Integer(required=True)

    @validates("quantity", "reserved")   # validate several fields with one method (v4)
    def validate_non_negative(self, value, data_key):
        if value < 0:
            # data_key lets the message say which field the error belongs to
            raise ValidationError(f"{data_key} must be 0 or greater.")

The third layer: @validates_schema (inter-field relationship validation)

An invariant spanning multiple fields, like "the end time is after the start time" or "the discounted price is below the list price," is consolidated in @validates_schema.

from marshmallow import Schema, fields, validates_schema, ValidationError


class BookingSchema(Schema):
    start_at = fields.DateTime(required=True)
    end_at = fields.DateTime(required=True)

    @validates_schema
    def validate_period(self, data, **kwargs):
        # By this point start_at / end_at have passed type validation (see skip_on_field_errors below)
        if data["start_at"] >= data["end_at"]:
            # The second argument attaches the error to a specific field
            raise ValidationError("The end time must be later than the start time.", "end_at")

When you want to assign errors to multiple fields, pass a dict keyed by field name.

    @validates_schema
    def validate_bounds(self, data, **kwargs):
        errors = {}
        if data["field_b"] <= data["field_a"]:
            errors["field_b"] = ["field_b must be greater than field_a."]
        if errors:
            raise ValidationError(errors)

💡 skip_on_field_errors defaults to True: @validates_schema is skipped if individual field validation has already failed (the default since v3.0). This prevents the accident of a schema validator raising a KeyError even though data["start_at"] doesn't exist (it failed type validation). A schema-wide error with no field name is stored in the _schema key of err.messages.

Why is this three-layer split superior? Put validation in "a service-layer if statement for now" and business logic and validation get mixed, making both testing and reuse difficult. marshmallow's three layers separate responsibility clearly: static rules in validate=, field-specific logic in @validates, and inter-field invariants in @validates_schema. The guarantee "data that BookingSchema().load() returned has a valid period" is established entirely inside the schema, and downstream code can rely on it (SRP applied thoroughly).


5. Pre- and post-processing: map to the domain with @pre_load / @post_load

Being able to insert hooks before and after validation is another of marshmallow's strengths. Processing flows in the order @pre_load → field deserialization and validation (including @validates) → @validates_schema → @post_load.

@pre_load: normalize input before validation

Normalization like "strip leading/trailing whitespace" or "lowercase the email" should be done before validation.

from marshmallow import Schema, fields, pre_load


class UserSchema(Schema):
    email = fields.Email(required=True)

    @pre_load
    def normalize(self, data, **kwargs):
        if isinstance(data.get("email"), str):
            data["email"] = data["email"].strip().lower()
        return data

@post_load: turn the validated dict into a domain object

load() returns a dict by default, but with @post_load you can convert the validated data into an instance of your domain class. This realizes the design of "once you cross the boundary, it's no longer a dict but a typed object."

from dataclasses import dataclass
from marshmallow import Schema, fields, post_load


@dataclass
class User:
    name: str
    email: str


class UserSchema(Schema):
    name = fields.Str(required=True)
    email = fields.Email(required=True)

    @post_load
    def make_user(self, data, **kwargs):
        return User(**data)


UserSchema().load({"name": "友田", "email": "a@b.com"})  # → User(name='友田', email='a@b.com')

@post_dump: wrap the output in an envelope

When you want to wrap an API response in a common envelope like {"result": ...} / {"results": [...]}, use @post_dump. To receive many, specify pass_collection=True in v4 (renamed from 3.x's pass_many).

from marshmallow import Schema, post_dump


class EnvelopeSchema(Schema):
    @post_dump(pass_collection=True)
    def wrap(self, data, many, **kwargs):
        key = "results" if many else "result"
        return {key: data}

💡 A shortcut in the latest version (4.3.0): in marshmallow 4.3.0, pre_load / post_load arguments were added to fields.Field, letting you declare per-field pre/post processing without a decorator. It's useful in cases where you want to shape only a specific field rather than the whole schema (for details, see the official changelog).


6. Nesting: assemble composite data structures

Real-world data isn't flat. With fields.Nested, you can nest schemas.

class AuthorSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)


class BookSchema(Schema):
    title = fields.Str(required=True)
    author = fields.Nested(AuthorSchema)                       # a single nested object
    reviewers = fields.List(fields.Nested(AuthorSchema))       # a collection of nested objects

only / exclude: use only part of the nesting

In a list API, "the author is just the name" is often enough. You can narrow only part of the nesting with only / exclude. You can also specify multiple levels with dot notation.

class BookListSchema(Schema):
    title = fields.Str()
    author = fields.Nested(AuthorSchema(only=("name",)))       # only the author's name


class SiteSchema(Schema):
    book = fields.Nested(BookListSchema)

# Extract only a field two levels down
SiteSchema(only=("book.author.name",)).dump(site)

fields.Pluck: flatten the nesting into 1 attribute

When you want not the whole author object but only the author's name (or, with many=True, an array of names), fields.Pluck is the shortest route.

class BookSchema(Schema):
    title = fields.Str()
    author = fields.Pluck(AuthorSchema, "name")   # → {"title": "...", "author": "友田"}

Resolve circular / self-references with lambda

Mutually referencing schemas (author ⇄ book) and self-references (employee → manager) run into definition-order and cycle problems; wrapping the nested schema in a lambda defers its evaluation and avoids them. You can also pass the class name as a string, which helps avoid circular imports.

class UserSchema(Schema):
    name = fields.Str()
    # Self-reference: exclude employer beyond the first level to prevent infinite recursion
    employer = fields.Nested(lambda: UserSchema(exclude=("employer",)))

Why is this superior? From the single definition of the same AuthorSchema, just by switching only / exclude / Pluck you can make a "detail view," "list view," and "embedded view." You don't need to add a model per view, avoiding definition duplication (a DRY violation). This is the core of why marshmallow is said to be strong at "presentation-layer serialization."


7. Practical application: connect to the ORM with marshmallow-sqlalchemy

In the Flask + SQLAlchemy stack, using marshmallow-sqlalchemy lets you auto-generate a schema from a SQLAlchemy model, drastically reducing boilerplate.

from marshmallow_sqlalchemy import SQLAlchemyAutoSchema, auto_field


class AuthorSchema(SQLAlchemyAutoSchema):
    class Meta:
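        model = Author              # generate fields automatically from this model's columns
        load_instance = True        # load() returns an Author instance instead of a dict
        include_relationships = True # include relationships in the output as well
        include_fk = True           # include foreign-key columns as well

    # Declare individually, with auto_field, only the columns whose auto-generation you want to override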
    email = auto_field(required=True)

load_instance = True is the crux. Without writing @post_load yourself, load() returns a validated ORM instance, so you can session.add() it as-is.

A Flask endpoint: protect the "entrance" and "exit" with one schema

Combine the elements so far and an API endpoint becomes astonishingly concise and robust. load() handles the input boundary and dump() the output boundary, and validation errors are returned with 422.

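from flask import jsonify, request
from marshmallow import ValidationError

# app (the Flask application) and db (the SQLAlchemy session holder) are assumed to be defined elsewhere


@app.post("/authors")
def create_author():
    schema = AuthorSchema()
    try:
        # ① Entrance: only an ORM instance that passed unknown-key rejection, type validation, and business-rule validation remains
        author = schema.load(request.get_json(), session=db.session)
    except ValidationError as err:
        # ② Return errors with their structure intact as a 422 (err.messages is a dict of field name → messages)
        return jsonify(errors=err.messages), 422

    db.session.add(author)
    db.session.commit()

    # ③ Exit: thanks to dump_only / load_only, only the safely shaped representation leaves the system
    return jsonify(schema.dump(author)), 201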

Why is this superior? From the view function, all of validation, type conversion, and shaping disappears, and what remains is just the essential processing of "save." The safety of input (mass-assignment prevention) and the safety of output (confidentiality-leak prevention) are delegated to the declaration that is the schema definition. This is the design principle of "separate I/O, validation, and business logic by layer" itself.


8. marshmallow 3 → 4 migration: a cheat sheet of the changes you'll hit in production

When you encounter existing 3.x code, or the 3.x style that generative AI tends to output, you can replace it mechanically using the following mapping table. These are the major breaking changes, based on the official upgrade guide.

3.x (old) | 4.x (new) | Kind
--- | --- | ---
fields.Str(missing=...) | fields.Str(load_default=...) | The default at input
fields.Int(default=...) | fields.Int(dump_default=...) | The default at output
Direct use of fields.Number() / fields.Mapping() / fields.Field() | fields.Integer() / Float() / Decimal() / fields.Dict() | Banning instantiation of abstract base classes
validate= returns False | raise a ValidationError | The validator's return value
@post_dump(pass_many=True) | @post_dump(pass_collection=True) | The decorator argument name
class Meta: fields = (...) / additional | Explicitly declare fields | Abolition of implicit field generation
schema.context = {...} | contextvars.ContextVar / experimental.Context | Context passing
Defining @validates("name") multiple times individually | @validates("name", "nickname") + the method receives data_key | Multiple-field support
class MyField(fields.Field) | class MyField(fields.Field[T]) | Genericizing a custom field
marshmallow.utils.from_iso_date etc. | The standard library (date.fromisoformat etc.) | Removal of date utilities
_bind_to_schema(self, field_name, schema) | _bind_to_schema(self, field_name, parent) | The custom field's argument name

💡 The crux of migration: the most frequent changes are missing / default → load_default / dump_default and pass_many → pass_collection. The surest approach is to surface them mechanically with grep -rn "missing=\|default=\|pass_many=\|\.context" . (run from the project root). Direct use of fields.Number() is also a good opportunity to make the intended type (integer or decimal) explicit.


9. marshmallow or Pydantic: the axis of selection

The two are often compared, but they are not mutually exclusive: choose by role. Their design starting points are fundamentally different.

Aspect | marshmallow | Pydantic v2
--- | --- | ---
Schema definition | Schema class + fields descriptors (explicit) | Type annotations + BaseModel (type-first)
Main focus | Bidirectional serialization / deserialization (presentation) | Type-driven domain model & validation
Speed | Pure Python implementation | Fast with the Rust pydantic-core
Ecosystem | Flask / SQLAlchemy (marshmallow-sqlalchemy) is mature | Integrated with FastAPI, outputs JSON Schema by default
Multiple views of the same data | Easy with only / exclude / Pluck | Tends to define a separate model per view
Type-checker integration | Slightly weak, being descriptor-based | Powerful, directly tied to annotations

The selection guideline is clear.

  • Choose marshmallow: you have existing Flask / SQLAlchemy assets, you want to flexibly make multiple representations of the same data (list / detail / admin), or you want to explicitly separate serialization / validation logic from the ORM model.
  • Choose Pydantic v2: you're building newly with FastAPI, you want to maximize IDE completion and static analysis type-first, speed is a requirement at high QPS, or you want to auto-generate JSON Schema.

The author adopts marshmallow in the award-winning B2B SaaS built with Flask/SQLAlchemy, and Pydantic in FastAPI-based new projects. The discipline of "don't trust outside the boundary" is completely identical in both, and whichever you choose, the essence doesn't change. The Pydantic-side design is detailed in the Pydantic v2 practical guide.


Conclusion: elevate serialization and validation into "boundary design"

marshmallow is a mature serialization / deserialization library independent of any ORM or framework. Let me re-list this article's key points.

  1. With Schema + fields, declare bidirectional conversion, output with dump(), and validate the boundary's input with load().
  2. With dump_only / load_only / unknown=RAISE, prevent mass assignment and confidentiality leakage with the schema's structure.
  3. Separate validation responsibility in three layers: validate= (static) → @validates (field-specific) → @validates_schema (inter-field).
  4. Normalize with @pre_load, turn into a domain object with @post_load, and assemble composite structures with fields.Nested / Pluck.
  5. Connect to the ORM with marshmallow-sqlalchemy, protecting the API's entrance and exit simultaneously with one schema.
  6. The 3 → 4 migration centers on replacing with load_default / dump_default / pass_collection. Handle it mechanically with the cheat sheet.

The difference between "working code" and "code you can operate for 10 years" lies in the accumulation of boundary design — where and how you dam untrustworthy data, and how you safely emit internal values outward. marshmallow is a proven tool that expresses that boundary as a declarative schema.

For further exploration, I recommend re-reading the official documentation (marshmallow.readthedocs.io) with this article's design viewpoint in mind.


Consultation on type-safe backend design

The author has implemented and operated the discipline explained here, "always validate external input at the system boundary, and safely shape and return internal values," as marshmallow-based boundary validation in the production environment of a B2B SaaS that won the Minister of Economy, Trade and Industry Award. In a FastAPI-based stack, Pydantic v2 plays that role. Using generative AI, I quickly build to a high standard the foundations that a business's reliability rests on: type-safe input validation, response shaping, mass-assignment countermeasures, and boundary defense for ORM integration. Feel free to consult me about Python backend development or making an existing system type-safe.

友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.
