Introduction: why dare to use marshmallow now
The design philosophy of a robust backend can be condensed into one sentence: "Never trust data coming from outside the system boundary." An HTTP request body, an external API's response, form input, a message-queue payload: all of these are unvalidated data with no type guarantee. The moment you pass them straight into the application, they surface as KeyError or AttributeError and, in the worst case, as mass assignment (illegal privilege escalation) or data leakage.
marshmallow is the "bidirectional gatekeeper" standing at this boundary. Where Pydantic is a type-first modeling library, marshmallow has matured since 2013 as a dedicated serialization / deserialization library independent of any ORM or framework. In the Flask + SQLAlchemy stack, it is still the de facto standard. Its essence comes down to just two directions.
- load() (deserialization): validate untrustworthy external input (dict / JSON), normalize it, and convert it to a form usable internally. If validation fails, stop it with a ValidationError.
- dump() (serialization): shape an internal Python object (an ORM model, etc.) into a safe outward representation (dict / JSON).
The author designed and implemented the backend of a B2B SaaS that won the Minister of Economy, Trade and Industry Award, using Python / Flask / SQLAlchemy / PostgreSQL, and operated it in production with the strict layer separation Router → UseCase → Repository → Model. Boundary validation in that project was handled by marshmallow. This article organizes that field-tested knowledge, staying faithful to the latest official marshmallow documentation (marshmallow.readthedocs.io) while aiming to be one level easier to understand.
💡 The version covered in this article: it assumes marshmallow 4.3.0 (the stable version as of April 2026). Much of the code in web articles, and much of what generative AI outputs, is still written in the 3.x-family legacy style (missing= / default= / Schema.context). This article is written in the v4 canonical style, and a 3 → 4 migration cheat sheet is provided at the end of the article.
💡 This article is part of a series on Python backend design. The type-first paired option is the Pydantic v2 practical guide, the web-framework layer is the FastAPI production operation guide, and the persistence layer is the SQLAlchemy 2.0 practical guide; reading them together gives a view of a consistent design from the boundary to the DB.
1. Schema and fields: declaratively define bidirectional conversion
marshmallow's starting point is inheriting the Schema class. Just line up fields descriptors as class attributes, and that becomes the single source of truth for validation, serialization, and deserialization.
from datetime import datetime
from marshmallow import Schema, fields
class UserSchema(Schema):
    name = fields.Str()
    email = fields.Email()
    created_at = fields.DateTime()
user = {"name": "友田", "email": "tomoda@example.com", "created_at": datetime.now()}
schema = UserSchema()
schema.dump(user)   # → dict: {'name': '友田', 'email': 'tomoda@example.com', 'created_at': '2026-06-26T...'}
# → JSON string. The standard json module escapes non-ASCII by default ("\u53cb\u7530");
#   pass ensure_ascii=False to dumps() if you want the raw characters.
schema.dumps(user)
dump() returns a Python dict, and dumps() a JSON string (the trailing s is the s of string). A JSON-incompatible type like datetime is auto-converted to an ISO 8601 string by fields.DateTime() — this is dump's job of "internal type → outward representation."
Note that instead of a class definition, you can also generate a schema at runtime with Schema.from_dict(). It's useful when you want to assemble a schema dynamically from configuration values.
UserSchema = Schema.from_dict(
{"name": fields.Str(required=True), "email": fields.Email(required=True)}
)
Handle collections with many=True
For bulk conversion of multiple objects, specify many=True. You can specify it at schema generation or at the dump()/load() call.
UserSchema(many=True).dump(users)    # ① per schema instance
UserSchema().dump(users, many=True)  # ② per call
Why is this superior?
The knowledge of "which items, in what type, with what conversion, go in and out" lives in one place: the Schema. Because response-shaping logic isn't scattered across view functions and the service layer, a change to the output spec stays localized to the schema definition. This is DRY and ETC (Easy To Change) in practice.
2. load(): validate and dam the boundary's invalid data
The inverse of dump() is load(). It validates untrustworthy external input and returns a normalized dict; on failure, it raises a ValidationError. This is marshmallow's most important feature.
from marshmallow import ValidationError
try:
    data = UserSchema().load({"name": "友田", "email": "not-an-email"})
except ValidationError as err:
    print(err.messages)    # {'email': ['Not a valid email address.']}
    print(err.valid_data)  # {'name': '友田'} ← only the fields that passed validation
A ValidationError holds 2 pieces of information.
- err.messages: a dict of error messages keyed by field name. You can use it as the body of a 422 response as-is.
- err.valid_data: a dict containing only the fields that passed validation. Usable for partial processing.
With many=True, errors return keyed by the element's index, so "the which-th element of the array, which field, and why it failed" is clear at a glance.
# {1: {'email': ['Not a valid email address.']},
# 3: {'name': ['Missing data for required field.']}}
required / allow_none / load_default / dump_default
Declare whether an input is required, optional, or defaulted with the field's arguments. In v4, missing= / default= were removed in favor of load_default (the default at input) / dump_default (the default at output). The names now make it obvious that the meaning differs between input and output.
import uuid
from datetime import datetime
from marshmallow import Schema, fields
class AccountSchema(Schema):
    # required=True: a missing value fails (default message: "Missing data for required field.")
    email = fields.Email(required=True, error_messages={"required": "Email address is required."})
    # Explicitly allow None (None is not allowed by default)
    nickname = fields.Str(allow_none=True)
    # Fill in this default when the value is absent at load time (a callable is also accepted)
    id = fields.UUID(load_default=uuid.uuid4)
    # Output this default when the value is absent at dump time
    role = fields.Str(dump_default="member")
data_key: separate external naming from internal naming
External API is camelCase, internal code is snake_case — a common collision. data_key confines this translation to the boundary.
class UserSchema(Schema):
    user_name = fields.Str(data_key="userName")
    email = fields.Email(data_key="emailAddress")
# load: {"userName": "友田", "emailAddress": "a@b.com"} → {'user_name': '友田', 'email': 'a@b.com'}
# dump: internal snake_case → back to outward camelCase
partial: partial validation for PATCH
In update-style APIs, you often want to validate only the fields that were actually sent. partial=True skips required for every field, and partial=("name",) skips it for the named fields only. Fields that are present are still validated as usual.
UserSchema().load({"email": "a@b.com"}, partial=("name",))  # waive the required check for name
Why is this superior?
The discipline of "data becomes an internal type only after validation" shows up in the code as the function call load(). Data that has passed the boundary is guaranteed by the code to be validated, and all downstream processing can rely on that premise. This keeps hand-written validation such as if not data.get("email"): ... from creeping into the business logic, protecting SRP (single responsibility).
3. The Schema as a security boundary: 3 safety devices
marshmallow's real value is that you can control "what to let inside and what not to let outside" through the structure of the schema. The typical vulnerabilities OWASP warns about are prevented by the structure of the code rather than by operational vigilance.
① dump_only: a field you "don't let the client write"
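If you accept values the server should decide, such as id / created_at / role, straight from request.json, you get a mass-assignment vulnerability in which an attacker sends {"role": "admin"}. A field with dump_only=True is output-only: load() never deserializes it. If a client sends it anyway, it is treated as an unknown key, so under the default unknown=RAISE the request is rejected with a ValidationError, and under unknown=EXCLUDE the key is silently dropped. Under unknown=INCLUDE it would pass through unvalidated, so avoid INCLUDE on schemas that rely on dump_only.
class UserSchema(Schema):
    id = fields.Int(dump_only=True)        # output only; load() never accepts it as input
    role = fields.Str(dump_only=True)      # privileges are decided by the server
    created_at = fields.DateTime(dump_only=True)
    email = fields.Email(required=True)    # this one is accepted by load()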
② load_only: a field you "absolutely don't put in the response"
Accidentally including a password or token in a response is a typical information-leakage accident. A field with load_only=True becomes input-only and is never included in dump()'s output.
from marshmallow import Schema, fields, validate
class SignupSchema(Schema):
    email = fields.Email(required=True)
    password = fields.Str(load_only=True, required=True, validate=validate.Length(min=12))
# password is accepted by load() but never emitted by dump() → leakage is prevented structurally
③ unknown=RAISE: reject unknown keys (the default)
marshmallow by default rejects unknown fields with a ValidationError (unknown=RAISE). This is a safe-side-leaning design, the polar opposite of a defenseless expansion like Model(**request.json). The behavior can be switched explicitly.
from marshmallow import Schema, fields, EXCLUDE, INCLUDE, RAISE
class StrictSchema(Schema):
    class Meta:
        unknown = RAISE  # default: unknown keys are an error (safest)
        # EXCLUDE → silently drop unknown keys / INCLUDE → pass them through as-is
    name = fields.Str()
# Can also be overridden per call
StrictSchema().load(payload, unknown=EXCLUDE)
| Option | Treatment of unknown keys | Main use |
|---|---|---|
| RAISE (default) | Raises a ValidationError | Strict input validation (first choice) |
| EXCLUDE | Silently discards | When you want to ignore an external API's extra keys |
| INCLUDE | Passes without validation | When passing through schema-less items (caution needed) |
Why is this superior?
If security relies on "someone will notice in review," it will leak someday. dump_only / load_only / unknown=RAISE declare the safety constraint in the schema definition itself, making "carelessly writable" and "carelessly emitted" structurally impossible. This is a concrete implementation of the security principle "validate and sanitize all external input at the boundary, and apply least privilege."
4. Validation: the three layers of fields → @validates → @validates_schema
Type validation alone can't express business rules like "18 or older" or "the end time is after the start time." marshmallow provides validation in three layers according to responsibility.
The first layer: the validate= argument (a single-field static rule)
The most lightweight validation is the method of passing a marshmallow.validate validator to validate=.
from marshmallow import Schema, fields, validate
class UserSchema(Schema):
    name = fields.Str(validate=validate.Length(min=1, max=120))
    age = fields.Int(validate=validate.Range(min=18, max=120))
    permission = fields.Str(validate=validate.OneOf(["read", "write", "admin"]))
    sku = fields.Str(validate=validate.Regexp(r"^[A-Z]{3}-\d{4}$"))
    # Multiple validators can be combined in a list
    slug = fields.Str(validate=[validate.Length(min=3), validate.Regexp(r"^[a-z0-9-]+$")])
The main built-in validators are, per the official docs, Length / Range / OneOf / Regexp / Email / Equal / ContainsOnly, and so on.
⚠️ A v4 breaking change: in 3.x you could use a function that returns False, like validate=lambda x: x == "ok", but in v4 a validator must always raise a ValidationError. The form that returns False no longer works.
The second layer: @validates (custom validation per field)
Logic that can't be expressed with the built-ins is written with the @validates decorator. In v4 you can pass multiple field names, and the method receives data_key.
from marshmallow import Schema, fields, validates, ValidationError
class ItemSchema(Schema):
    quantity = fields.Integer(required=True)
    reserved = fields.Integer(required=True)
    @validates("quantity", "reserved")  # validate several fields with one method (v4)
    def validate_non_negative(self, value, data_key):
        if value < 0:
            # data_key lets the message say which field the error belongs to
            raise ValidationError(f"{data_key} must be 0 or greater.")
The third layer: @validates_schema (inter-field relationship validation)
An invariant spanning multiple fields, like "the end time is after the start time" or "the discounted price is below the list price," is consolidated in @validates_schema.
from marshmallow import Schema, fields, validates_schema, ValidationError
class BookingSchema(Schema):
    start_at = fields.DateTime(required=True)
    end_at = fields.DateTime(required=True)
    @validates_schema
    def validate_period(self, data, **kwargs):
        # By this point start_at / end_at have passed type validation (see skip_on_field_errors below)
        if data["start_at"] >= data["end_at"]:
            # The second argument attaches the error to a specific field
            raise ValidationError("The end time must be after the start time.", "end_at")
When you want to assign errors to multiple fields, pass a dict keyed by field name.
@validates_schema
def validate_bounds(self, data, **kwargs):
    errors = {}
    if data["field_b"] <= data["field_a"]:
        errors["field_b"] = ["field_b must be greater than field_a."]
    if errors:
        raise ValidationError(errors)
💡 skip_on_field_errors defaults to True: @validates_schema is skipped if individual field validation has already failed (the default since v3.0). This prevents the accident of a schema validator raising a KeyError because data["start_at"] doesn't exist (it failed type validation). A schema-wide error with no field name is stored under the _schema key of err.messages.
Why is this three-layer split superior?
If you write validation as "a service-layer if statement for now," business logic and validation mix, and testing and reuse become difficult. marshmallow's three layers separate responsibility clearly: static rules in validate=, field-specific logic in @validates, and inter-field invariants in @validates_schema. The guarantee "data that came out of BookingSchema().load() has a valid period" is established entirely inside the schema, and downstream code can rely on that premise (SRP taken to its conclusion).
5. Pre- and post-processing: map to the domain with @pre_load / @post_load
Being able to insert hooks before and after validation is another of marshmallow's strengths. Processing flows in the order @pre_load → field deserialization and validation (including @validates) → @validates_schema → @post_load.
@pre_load: normalize input before validation
Normalization like "strip leading/trailing whitespace" or "lowercase the email" should be done before validation.
from marshmallow import Schema, fields, pre_load
class UserSchema(Schema):
    email = fields.Email(required=True)
    @pre_load
    def normalize(self, data, **kwargs):
        data = dict(data)  # copy so the caller's payload is not mutated
        if isinstance(data.get("email"), str):
            data["email"] = data["email"].strip().lower()
        return data
@post_load: turn the validated dict into a domain object
load() returns a dict by default, but with @post_load you can convert the validated data into an instance of your domain class. This realizes the design of "once you cross the boundary, it's no longer a dict but a typed object."
from dataclasses import dataclass
from marshmallow import Schema, fields, post_load
@dataclass
class User:
    name: str
    email: str
class UserSchema(Schema):
    name = fields.Str(required=True)
    email = fields.Email(required=True)
    @post_load
    def make_user(self, data, **kwargs):
        return User(**data)
UserSchema().load({"name": "友田", "email": "a@b.com"})  # → User(name='友田', email='a@b.com')
@post_dump: wrap the output in an envelope
When you want to wrap an API response in a common envelope like {"result": ...} / {"results": [...]}, use @post_dump. The many flag is always passed to the hook as a keyword argument; to have the hook receive the whole collection at once instead of running once per item, specify pass_collection=True in v4 (renamed from 3.x's pass_many).
from marshmallow import Schema, post_dump
class EnvelopeSchema(Schema):
    @post_dump(pass_collection=True)
    def wrap(self, data, many, **kwargs):
        key = "results" if many else "result"
        return {key: data}
💡 A shortcut in the latest version (4.3.0): in marshmallow 4.3.0, pre_load / post_load arguments were added to fields.Field, letting you declare per-field pre/post processing without a decorator. It's useful in cases where you want to shape only a specific field rather than the whole schema (for details, see the official changelog).
6. Nesting: assemble composite data structures
Real-world data isn't flat. With fields.Nested, you can nest schemas.
class AuthorSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)
class BookSchema(Schema):
    title = fields.Str(required=True)
    author = fields.Nested(AuthorSchema)                   # a single nested object
    reviewers = fields.List(fields.Nested(AuthorSchema))   # a collection of nested objects
only / exclude: use only part of the nesting
In a list API, "the author is just the name" is often enough. You can narrow only part of the nesting with only / exclude. You can also specify multiple levels with dot notation.
class BookListSchema(Schema):
    title = fields.Str()
    author = fields.Nested(AuthorSchema(only=("name",)))  # author: name only
class SiteSchema(Schema):
    book = fields.Nested(BookListSchema)
# Extract only a field two levels down
SiteSchema(only=("book.author.name",)).dump(site)
fields.Pluck: flatten the nesting into 1 attribute
When you want "not the author object, but only the author's name" (or, with many=True, an array of names), fields.Pluck is the shortest route.
class BookSchema(Schema):
    title = fields.Str()
    author = fields.Pluck(AuthorSchema, "name")  # → {"title": "...", "author": "友田"}
Resolve circular / self-references with lambda
Mutually referencing schemas (author ⇄ book) or a self-reference (employee → manager) avoid definition-order / cycle problems by lazily evaluating with lambda. There's also a way of passing the class name as a string, effective for avoiding circular imports.
class UserSchema(Schema):
    name = fields.Str()
    # Self-reference: exclude employer on the nested side to prevent infinite recursion
    employer = fields.Nested(lambda: UserSchema(exclude=("employer",)))
Why is this superior?
From the single definition of the same AuthorSchema, just by switching only / exclude / Pluck you can make a "detail view," "list view," and "embedded view." You don't need to add a model per view, avoiding definition duplication (a DRY violation). This is the core of why marshmallow is said to be strong at "presentation-layer serialization."
7. Practical application: connect to the ORM with marshmallow-sqlalchemy
In the Flask + SQLAlchemy stack, using marshmallow-sqlalchemy lets you auto-generate a schema from a SQLAlchemy model, drastically reducing boilerplate.
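from marshmallow_sqlalchemy import SQLAlchemyAutoSchema, auto_field
# Author is assumed to be a SQLAlchemy model defined elsewhere
class AuthorSchema(SQLAlchemyAutoSchema):
    class Meta:
        model = Author                # generate fields from this model's columns
        load_instance = True          # load() returns an Author instance instead of a dict
        include_relationships = True  # include relationships in the output
        include_fk = True             # include foreign-key columns as well
    # Declare individually, with auto_field, only the columns whose generated field you want to override
    email = auto_field(required=True)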
load_instance = True is the crux. Without writing @post_load yourself, load() returns a validated ORM instance, so you can session.add() it as-is.
A Flask endpoint: protect the "entrance" and "exit" with one schema
Combine the elements so far and an API endpoint becomes astonishingly concise and robust. load() handles the input boundary and dump() the output boundary, and validation errors are returned with 422.
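from flask import jsonify, request
from marshmallow import ValidationError
# app, db, and AuthorSchema are assumed to be defined elsewhere
@app.post("/authors")
def create_author():
    schema = AuthorSchema()
    try:
        # ① Entrance: only an ORM instance that passed unknown-key rejection, type validation, and business rules remains
        author = schema.load(request.get_json(), session=db.session)
    except ValidationError as err:
        # ② Return the error, still structured, with 422 (err.messages is a dict of field name → messages)
        return jsonify(errors=err.messages), 422
    db.session.add(author)
    db.session.commit()
    # ③ Exit: thanks to dump_only / load_only, only the safely shaped representation goes out
    return jsonify(schema.dump(author)), 201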
Why is this superior?
Validation, type conversion, and shaping all disappear from the view function; what remains is only the essential processing of "save." The safety of input (mass-assignment prevention) and the safety of output (confidentiality-leak prevention) are delegated to a declaration, the schema definition. This is exactly the design principle of "separate I/O, validation, and business logic by layer."
8. marshmallow 3 → 4 migration: a cheat sheet of the changes you'll hit in production
When you encounter existing 3.x code, or the 3.x style that generative AI tends to output, you can replace it mechanically using the following correspondence table. These are the major breaking changes based on the official upgrade guide.
| 3.x (old) | 4.x (new) | Kind |
|---|---|---|
| fields.Str(missing=...) | fields.Str(load_default=...) | The default at input |
| fields.Int(default=...) | fields.Int(dump_default=...) | The default at output |
| Direct use of fields.Number() / fields.Mapping() / fields.Field() | fields.Integer() / Float() / Decimal() / fields.Dict() / fields.Raw() | Banning instantiation of abstract base classes |
| validate= returns False | raise a ValidationError | The validator's return value |
| @post_dump(pass_many=True) | @post_dump(pass_collection=True) | The decorator argument name |
| Undeclared names in class Meta: fields = (...) / additional | Explicitly declare fields | Abolition of implicit field generation |
| schema.context = {...} | contextvars.ContextVar / experimental.Context | Context passing |
| Defining @validates("name") multiple times individually | @validates("name", "nickname") + the method receives data_key | Multiple-field support |
| class MyField(fields.Field) | class MyField(fields.Field[T]) | Genericizing a custom field |
| marshmallow.utils.from_iso_date etc. | The standard library (date.fromisoformat etc.) | Removal of date utilities |
| _bind_to_schema(self, field_name, schema) | _bind_to_schema(self, field_name, parent) | The custom field's argument name |
💡 The crux of migration: the most frequent changes are missing / default → load_default / dump_default and pass_many → pass_collection. The surest approach is to surface them mechanically with grep -rn "missing=\|default=\|pass_many=\|\.context" . (note that default= also matches the already-migrated load_default= / dump_default=, so review the hits). Direct use of fields.Number() is also a good chance to make the intended type (integer or decimal) explicit.
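If grep is not available, or you want to skip the already-migrated names, the same scan can be approximated in pure Python. This is a rough sketch: the pattern and the find_legacy_lines helper are assumptions, and it will still flag unrelated default= arguments (for example in dataclasses), so treat the hits as candidates for review.

```python
import re

# Match missing= / default= / pass_many= only when not preceded by a letter or
# underscore, so load_default= and dump_default= are not reported.
LEGACY_PATTERN = re.compile(
    r"(?<![A-Za-z_])(missing|default|pass_many)=|\.context\b"
)


def find_legacy_lines(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for lines that look like 3.x style."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if LEGACY_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical snippet mixing legacy and migrated code.
sample = "\n".join(
    [
        'name = fields.Str(missing="anon")',
        "age = fields.Int(load_default=0)",
        'role = fields.Str(default="member")',
        "@post_dump(pass_many=True)",
        'tenant = self.context["tenant"]',
    ]
)

hits = find_legacy_lines(sample)
for number, line in hits:
    print(number, line)
```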
9. marshmallow or Pydantic: the axis of selection
The two are often compared, but they are not mutually exclusive; you choose between them by role. Their design starting points are fundamentally different.
| Aspect | marshmallow | Pydantic v2 |
|---|---|---|
| Schema definition | Schema class + fields descriptors (explicit) | Type annotations + BaseModel (type-first) |
| Main focus | Bidirectional serialization / deserialization (presentation) | Type-driven domain model & validation |
| Speed | Pure Python implementation | Fast with the Rust pydantic-core |
| Ecosystem | Flask / SQLAlchemy (marshmallow-sqlalchemy) is mature | Integrated with FastAPI, outputs JSON Schema by default |
| Multiple views of the same data | Easy with only / exclude / Pluck | Tends to define a separate model per view |
| Type-checker integration | Slightly weak, being descriptor-based | Powerful, directly tied to annotations |
The selection guideline is clear.
- Choose marshmallow: you have existing Flask / SQLAlchemy assets, you want to flexibly make multiple representations of the same data (list / detail / admin), or you want to explicitly separate serialization / validation logic from the ORM model.
- Choose Pydantic v2: you're building newly with FastAPI, you want to maximize IDE completion and static analysis type-first, speed is a requirement at high QPS, or you want to auto-generate JSON Schema.
The author adopts marshmallow in the award-winning B2B SaaS built with Flask/SQLAlchemy, and Pydantic in FastAPI-based new projects. The discipline of "don't trust outside the boundary" is completely identical in both, and whichever you choose, the essence doesn't change. The Pydantic-side design is detailed in the Pydantic v2 practical guide.
Conclusion: elevate serialization and validation into "boundary design"
marshmallow is a mature serialization / deserialization library independent of any ORM or framework. Let me re-list this article's key points.
- With Schema + fields, declare bidirectional conversion, output with dump(), and validate the boundary's input with load().
- With dump_only / load_only / unknown=RAISE, prevent mass assignment and confidentiality leakage with the schema's structure.
- Separate validation responsibility in three layers: validate= (static) → @validates (field-specific) → @validates_schema (inter-field).
- Normalize with @pre_load, turn into a domain object with @post_load, and assemble composite structures with fields.Nested / Pluck.
- Connect to the ORM with marshmallow-sqlalchemy, protecting the API's entrance and exit simultaneously with one schema.
- The 3 → 4 migration centers on replacing with load_default / dump_default / pass_collection. Handle it mechanically with the cheat sheet.
The difference between "working code" and "code you can operate for 10 years" lies in the accumulation of boundary design — where and how you dam untrustworthy data, and how you safely emit internal values outward. marshmallow is a proven tool that expresses that boundary as a declarative schema.
For further exploration, I recommend re-reading the following of the official documentation with this article's design viewpoint in mind.
- Quickstart
- Nesting Schemas
- Custom Fields
- Extending Schemas (pre/post processing, schema validation)
- Upgrading to newer releases (3→4 migration)
- marshmallow-sqlalchemy
Consultation on type-safe backend design
The author has implemented and operated the discipline explained here, "always validate external input at the system boundary, and safely shape and return internal values," as boundary validation with marshmallow in the production environment of a B2B SaaS that won the Minister of Economy, Trade and Industry Award. In a FastAPI-based stack, Pydantic v2 handles that role. With the help of generative AI, I quickly build high-quality foundations that bear directly on a business's reliability: type-safe input validation, response shaping, mass-assignment countermeasures, and boundary defense for ORM integration. For backend development in Python or for making existing systems type-safe, feel free to consult me.