友田 陽大
Flask in production
Python
Flask
Observability
Logging
Error handling
Production operations
Backend

Flask Error Handling, Logging, and Observability Guide (3.1 line): JSON Error Design, Structured Logs, Request IDs, Sentry, and Health Checks

A systematic guide to production error handling and observability on the Flask 3.1 line. It covers errorhandler / abort, custom HTTPException subclasses and handler resolution order, HTTPException→JSON conversion with a common error envelope, structured logs via dictConfig, request-ID correlation, Sentry integration with PII scrubbing, and health checks for ALB/ECS, all with working code that follows the official documentation.

Published
Reading time
23 min read
Author
友田 陽大

Introduction: the scariest thing in production is the "silent exception"

In a production Flask app, the first thing to break is usually not a feature but how a failure looks. A user reports "I get an error," but the response is a bare 500 Internal Server Error HTML page and the logs carry no correlation information, so you cannot tell which request, for which user, failed at which stage. This silence turns a 5-minute investigation into a 2-hour one.

In production mode, Flask's default on an exception is, per the official documentation, to display a very simple page and log the exception to the logger. That is enough for learning, but a production REST API should not rely on it as-is. API clients (frontend, mobile, other services) need machine-readable, structured JSON errors rather than an HTML page, and the person investigating needs structured logs correlated by a request ID.

This article is a spoke of the Flask production-operations guide (the pillar): it takes that guide's §6, "Error Handling and Logging," and develops it to production quality. It covers the following 3 layers.

  1. Error handling: errorhandler / abort / custom HTTPException, resolution order, JSON error design for APIs.
  2. Logging: dictConfig, app.logger, structured logs, request-ID correlation.
  3. Observability: error aggregation via Sentry, PII scrubbing, health checks, and the bridge to tracing.

I designed and implemented the backend of a B2B SaaS that won a Minister of Economy, Trade and Industry Award, built with Python / Flask / SQLAlchemy / PostgreSQL, and operated it in production on ALB → ECS (Fargate) with structured logs and error tracking. This article covers only the design that real-world operation needed to make failures traceable in 5 minutes.

💡 Versions covered in this article: the Flask 3.1 line. Error handling builds directly on werkzeug.exceptions, and logging on the standard library's logging module. The code is based on patterns from the official documentation (flask.palletsprojects.com stable).


1. Default error behavior: why an API must "design its own errors"

First, understand exactly what happens when Flask is left at its defaults. The official documentation says: "when the application is running in production mode and an exception is raised, Flask displays a very simple page and records the exception to the logger."

This is a reasonable default for a web page. But for an API there are two problems.

  • The response is HTML: a 500 page with Content-Type: text/html is returned, so a client expecting JSON fails to parse it. The client can't mechanically distinguish the kind of error (validation failure, resource absence, or internal server error).
  • The debug-mode trap: in development with debug=True, the 500 handler isn't used and the interactive debugger is displayed instead. Exposing this in production is a serious vulnerability leading to arbitrary code execution (always disable DEBUG in production).

⚠️ DEBUG=True in production is strictly forbidden. The official docs state clearly that "in debug mode, the 'Internal Server Error' handler is not used, and the interactive debugger is shown instead." The debugger is Werkzeug's console and can execute arbitrary Python on the server. Expose it in production and you're compromised instantly. For handling DEBUG, see the configuration chapter of the production-operations guide.

The conclusion is simple. In an API, don't let an exception turn into an HTML 500 as-is; decide for yourself which exception is returned with which HTTP status and in what JSON shape. That is the theme from here on.
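As a quick check of the default behavior described above, the following minimal sketch uses Flask's test client. The route /boom and the helper probe are hypothetical names for illustration, and the sketch assumes debug and testing modes are off (otherwise the exception propagates instead of becoming a 500 page).

```python
from flask import Flask

app = Flask(__name__)  # no error handlers registered: pure defaults


@app.route("/boom")
def boom():
    # Simulate an unexpected bug inside a view
    raise RuntimeError("unexpected failure")


def probe(path):
    """Return (status, mimetype) for a GET request against the app."""
    response = app.test_client().get(path)
    return response.status_code, response.mimetype


crash_result = probe("/boom")       # unhandled exception
missing_result = probe("/missing")  # unknown URL
```

Both results come back with the text/html mimetype, which is exactly what a JSON-only client cannot parse.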


2. The basics of error handlers: errorhandler / register_error_handler / abort

2.1 Two ways to register a handler

Processing for exceptions and status codes is registered as a handler function. There are two ways to write it, with the same functionality.

from werkzeug.exceptions import BadRequest

# Option A: decorator (local and easy to read)
@app.errorhandler(BadRequest)
def handle_bad_request(e):
    return "bad request!", 400

# Option B: register_error_handler (easy to register in bulk inside a factory)
app.register_error_handler(400, handle_bad_request)

There are two important official facts here.

  • HTTPException subclasses and HTTP codes are interchangeable at registration. Since BadRequest.code == 400, @app.errorhandler(BadRequest) and @app.errorhandler(400) mean the same thing.
  • The handler's return status code is not set automatically. The official docs caution that "the status code of the response is not set to the handler's code. Make sure to specify the appropriate HTTP status code when returning a response from a handler." That is, you must explicitly specify the status in the second return value like return "bad request!", 400, or it returns with 200.

⚠️ Forgetting the status code is a frequent bug. If a handler registered with @app.errorhandler(404) returns only jsonify(error="not found"), the HTTP status becomes 200. Always attach the code (for example, 404) as the second element of an error handler's return value. Registering handlers in an application-factory setup (calling register_error_handler inside create_app) follows the factory chapter of the production-operations guide.
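The status-code pitfall is easy to reproduce. The sketch below registers one deliberately buggy handler and one correct handler; the route /secret and the variable names are hypothetical.

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)


@app.errorhandler(404)
def not_found_buggy(e):
    # BUG (intentional): no status given, so the response goes out as 200
    return jsonify(error="not found")


@app.errorhandler(403)
def forbidden(e):
    # Correct: status passed explicitly as the second return value
    return jsonify(error="forbidden"), 403


@app.route("/secret")
def secret():
    abort(403)


client = app.test_client()
buggy_status = client.get("/no-such-page").status_code
correct_status = client.get("/secret").status_code
```

A test that asserts on the status code of each error path catches this class of bug before it reaches a client.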

2.2 abort(code, description): explicitly interrupt from a view

When you want to "return a 404 here" partway through a view function, use abort(). abort() raises the corresponding HTTPException and delegates processing to the registered handler. The official pattern:

from flask import abort, render_template, request


@app.route("/profile")
def user_profile():
    username = request.args.get("username")   # read from the query string
    if username is None:
        abort(400)                            # missing argument → 400
    user = get_user(username=username)
    if user is None:
        abort(404)                            # no such user → 404
    return render_template("profile.html", user=user)


@app.errorhandler(404)
def page_not_found(e):
    return render_template("404.html"), 404

You can override the description with the second argument like abort(404, description="Resource not found"). This is useful in the JSON errors described later for telling the client a specific reason.

💡 Early-aborting in a view like "400 if there's no username," "404 if the user doesn't exist" is a form of guard clause (early return). It avoids deep nesting and lets you line up the normal-path logic straight below. Fetching request.args and passing request-scope values via g are covered in detail in the application/request context explainer.

2.3 Custom HTTPException subclasses: define domain-specific errors

When you need a status code that Werkzeug ships no exception class for (e.g., 507 Insufficient Storage) or a domain-specific error, subclass HTTPException. The official pattern:

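import werkzeug.exceptions


class InsufficientStorage(werkzeug.exceptions.HTTPException):
    code = 507
    description = "Not enough storage space."


def handle_507(e):
    # Return the status explicitly (see §2.1)
    return e.description, 507


app.register_error_handler(InsufficientStorage, handle_507)

# Inside a view, raise it like any other exception
raise InsufficientStorage()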

Setting code and description as class attributes is all it takes to get a custom exception you can simply raise. Because the exception carries a business-meaningful name (InsufficientStorage), the view function becomes easier to read.
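A runnable sketch of the same pattern, returning JSON from the handler. The /upload route and its always-failing quota check are hypothetical and only there to trigger the exception.

```python
from flask import Flask, jsonify
from werkzeug.exceptions import HTTPException


class InsufficientStorage(HTTPException):
    code = 507
    description = "Not enough storage space."


def handle_507(e):
    # The handler's return value decides the status, so pass it explicitly
    return jsonify(code=e.code, description=e.description), e.code


app = Flask(__name__)
app.register_error_handler(InsufficientStorage, handle_507)


@app.route("/upload", methods=["POST"])
def upload():
    # Hypothetical quota check that always fails in this sketch
    raise InsufficientStorage()


response = app.test_client().post("/upload")
body = response.get_json()
```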


3. Resolution order: which handler is chosen (code → class hierarchy → most specific)

When you've registered multiple handlers, "which is chosen" is the core that determines production behavior. Let me quote the official rule directly.

When Flask catches an exception while handling a request, it is first looked up by code. If there's no handler for that code, Flask looks up the error by class hierarchy, and the most specific handler is chosen. If no handler is registered at all, HTTPException subclasses show a generic message about their code, and other exceptions are converted to a generic "500 Internal Server Error."

Let me organize this in a table.

Stage | Matched against | Example
① Code match | A numeric code such as @app.errorhandler(404) | abort(404) → the code-404 handler
② Class hierarchy | If no code matches, walk the exception's class hierarchy | BadRequest (a subclass of HTTPException) → the HTTPException handler
③ Most specific | Within the hierarchy, prefer the closest (most specific) handler | If both Exception and HTTPException handlers exist, the latter handles HTTPException subclasses
④ When unregistered | HTTPException gets a generic message; others become 500 | An unexpected exception → 500 Internal Server Error

There are two practical implications of this rule.

  • Broad handlers (HTTPException / Exception) can coexist with narrow handlers (404). Since the narrow one is preferred, you can build a two-tier setup: an error you want to craft individually (404) gets a dedicated handler, and everything else is converted to JSON by an umbrella handler.
  • The Exception handler doesn't steal HTTPException. The official docs state clearly—"if you register handlers for both HTTPException and Exception, the HTTPException handler is more specific, so the Exception handler doesn't handle HTTPException subclasses." This property enables the next chapter's design of "catching only unexpected exceptions as 500."
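The resolution order can be observed directly. In this sketch each handler returns a label naming itself; the routes and the handled_by helper are hypothetical.

```python
from flask import Flask, abort
from werkzeug.exceptions import HTTPException

app = Flask(__name__)


@app.errorhandler(404)
def specific_404(e):
    return "specific-404", 404


@app.errorhandler(HTTPException)
def umbrella_http(e):
    return "umbrella-http", e.code


@app.errorhandler(Exception)
def umbrella_exception(e):
    return "umbrella-exception", 500


@app.route("/forbidden")
def forbidden():
    abort(403)  # an HTTPException with no code-specific handler


@app.route("/crash")
def crash():
    raise ValueError("bug")  # not an HTTPException


def handled_by(path):
    """Return the label of whichever handler produced the response."""
    return app.test_client().get(path).get_data(as_text=True)
```

Requesting an unknown URL, /forbidden, and /crash should land on the 404 handler, the HTTPException handler, and the Exception handler respectively.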

3.1 Blueprints and resolution order: only 404 is the exception

When you register handlers per Blueprint, for errors under that Blueprint, the Blueprint's handler takes priority over the app-wide handler. But there's one serious exception. In the official words:

A handler registered on a Blueprint takes priority over one registered globally. However, a Blueprint cannot handle 404 routing errors, because the 404 occurs at the routing stage, before the Blueprint can be determined.

⚠️ "Trying to catch a 404 in a Blueprint but failing to" is a typical stumbling point. A nonexistent URL becomes a 404 at the routing layer before reaching any Blueprint. Therefore, register the 404 handler at the app level (@app.errorhandler(404)). A Blueprint-level handler only works for "post-arrival errors" like abort(403) within that Blueprint. The Blueprint configuration itself is covered in the production-operations guide.
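A minimal sketch that shows both sides of this rule. The Blueprint name, URL prefix, and route are hypothetical.

```python
from flask import Blueprint, Flask, abort

bp = Blueprint("api", __name__, url_prefix="/api")


@bp.errorhandler(404)
def bp_not_found(e):
    return "bp-404", 404


@bp.route("/items/<int:item_id>")
def get_item(item_id):
    abort(404)  # raised inside the Blueprint's own view


app = Flask(__name__)


@app.errorhandler(404)
def app_not_found(e):
    return "app-404", 404


app.register_blueprint(bp)

client = app.test_client()
inside = client.get("/api/items/1").get_data(as_text=True)    # abort(404) in a view
unrouted = client.get("/api/unknown").get_data(as_text=True)  # no matching route
```

The abort(404) raised inside the Blueprint view is handled by the Blueprint's handler, while the unmatched URL under the same prefix falls through to the app-level handler.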


4. JSON error design: a "consistent error envelope" for APIs

This is the heart of API design. The goal is for every error to be returned as JSON of the same shape, so the client can handle all errors with a single parsing path.

4.1 Turn every HTTPException into JSON

What abort(404) or abort(400) returns is HTML by default. What converts this to JSON in bulk is the HTTPException handler shown by the official docs.

from flask import json
from werkzeug.exceptions import HTTPException


@app.errorhandler(HTTPException)
def handle_exception(e):
    """Return every HTTP error as JSON."""
    # Start from the response the HTTPException already carries and swap only the body for JSON
    response = e.get_response()
    response.data = json.dumps({
        "code": e.code,
        "name": e.name,
        "description": e.description,
    })
    response.content_type = "application/json"
    return response

The response e.get_response() returns already has the status code correctly set, so you don't need to attach , e.code here (the §2.1 caution is about returning a tuple, not a response object). Putting code / name / description in the body makes the description from abort(404, description="...") reach the client as-is.

4.2 Turn unexpected exceptions into 500 JSON too (isinstance pass-through)

The HTTPException handler handles "HTTP errors Flask knows." But unexpected exceptions like KeyError or ZeroDivisionError are not HTTPException. If you want to return these as JSON too, add an Exception handler. The official pattern:

from flask import render_template
from werkzeug.exceptions import HTTPException


@app.errorhandler(Exception)
def handle_exception(e):
    # Pass HTTP errors through instead of converting them to 500
    if isinstance(e, HTTPException):
        return e
    # Treat only unexpected exceptions as 500
    return render_template("500_generic.html", e=e), 500

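The key here is the pass-through if isinstance(e, HTTPException): return e. As seen in §3, when an HTTPException handler is also registered it is more specific, so an HTTPException never reaches the Exception handler. The guard matters when no HTTPException handler is registered (or one is later removed): without it, the Exception handler would turn every 404 or 400 into a 500. Note that return e makes Flask render the exception's own default response; it does not re-dispatch to another handler. For an API, return JSON instead of render_template.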

from flask import jsonify
from werkzeug.exceptions import HTTPException


@app.errorhandler(Exception)
def handle_unexpected_exception(e):
    if isinstance(e, HTTPException):
        return e
    # For unexpected exceptions, return 500 in the common envelope without leaking details
    app.logger.exception("unhandled exception")   # the stack trace goes to the log
    return jsonify(code=500, name="Internal Server Error",
                   description="An unexpected error occurred."), 500

⚠️ Don't put internal info in an error response. Returning a stack trace, SQL statement, or file path to the client is information leakage. Record the details on the log side with app.logger.exception(), and return only a generic message to the client. The principle of not exposing secrets or PII is consistent with the security implementation guide.

4.3 The identity of an unexpected exception: InternalServerError and original_exception

When an uncaught exception is passed to the 500 handler, the e passed is not the original exception itself. An important official fact:

Since Flask 1.1.0, this error handler is always passed an instance of InternalServerError, not the original uncaught exception itself. The original error can be referenced via e.original_exception.

from flask import jsonify
from werkzeug.exceptions import InternalServerError


@app.errorhandler(InternalServerError)
def handle_500(e):
    # The original exception (KeyError, etc.); None when the 500 was raised directly, e.g. abort(500)
    original = getattr(e, "original_exception", None)
    app.logger.error("internal error: %s", original)
    return jsonify(code=500, name="Internal Server Error"), 500

In addition, as touched on in §1, in debug mode the 500 handler isn't used and the debugger appears, so you need to verify the 500 handler's behavior with DEBUG=False.
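The difference between an uncaught exception and an explicit abort(500) can be seen in a small sketch. ListHandler and the captured list are hypothetical helpers that keep log messages in memory so the result is easy to inspect; a real app would rely on its configured handlers.

```python
import logging

from flask import Flask, abort, jsonify
from werkzeug.exceptions import InternalServerError

app = Flask(__name__)
captured = []  # in-memory sink, for illustration only


class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())


app.logger.addHandler(ListHandler())


@app.errorhandler(InternalServerError)
def handle_500(e):
    original = getattr(e, "original_exception", None)
    if original is None:
        app.logger.error("explicit 500 raised via abort()")
    else:
        # Passing the exception as exc_info attaches its traceback to the record
        app.logger.error("unhandled %s", type(original).__name__, exc_info=original)
    return jsonify(code=500, name="Internal Server Error"), 500


@app.route("/crash")
def crash():
    return {}["missing"]  # raises KeyError


@app.route("/explicit")
def explicit():
    abort(500)


client = app.test_client()
crash_status = client.get("/crash").status_code
explicit_status = client.get("/explicit").status_code
```

Both requests return the same generic 500 body, while the log distinguishes the uncaught KeyError from the deliberate abort(500).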

4.4 An app-specific error type: the InvalidAPIUsage pattern

When you want to "return a business error (out of stock, usage-limit exceeded, etc.) with a message and an optional payload," instead of inheriting from HTTPException, the official pattern of making an API error type inheriting from plain Exception is convenient.

from flask import jsonify


class InvalidAPIUsage(Exception):
    status_code = 400

    def __init__(self, message, status_code=None, payload=None):
        super().__init__()
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self):
        rv = dict(self.payload or ())
        rv["message"] = self.message
        return rv


@app.errorhandler(InvalidAPIUsage)
def invalid_api_usage(e):
    return jsonify(e.to_dict()), e.status_code

From a view, you can express a business-rule violation in one line.

@app.route("/api/orders", methods=["POST"])
def create_order():
    if not has_stock():
        raise InvalidAPIUsage("Insufficient stock.", status_code=409,
                              payload={"reason": "out_of_stock"})
    ...

Since you can add arbitrary keys to payload, you can attach machine-readable info (like reason) the client can branch on. Making status_code variable lets you express both 400s and 409s with the same type.

4.5 Align marshmallow's ValidationError to the same envelope

If you use marshmallow for input validation, you should integrate its ValidationError into the same error envelope too. In the companion Designing a production REST API with marshmallow × Flask × SQLAlchemy, I showed a handler that converts err.messages to 422 (Unprocessable Entity) with @app.errorhandler(ValidationError). The production form is to consolidate this and this article's handlers into a single registration function.

# errors.py: gather every error handler in one place
from flask import Flask, jsonify
from marshmallow import ValidationError
from werkzeug.exceptions import HTTPException

# InvalidAPIUsage is the class defined in §4.4; import it from wherever you keep it.


def register_error_handlers(app: Flask) -> None:
    @app.errorhandler(ValidationError)
    def handle_validation_error(e: ValidationError):
        # Validation failure → 422, with per-field messages
        return jsonify(code=422, name="Unprocessable Entity",
                       errors=e.messages), 422

    @app.errorhandler(HTTPException)
    def handle_http_exception(e: HTTPException):
        # Known HTTP errors → JSON (abort(404, description=...) and the like)
        return jsonify(code=e.code, name=e.name, description=e.description), e.code or 500

    @app.errorhandler(InvalidAPIUsage)
    def handle_invalid_api_usage(e: InvalidAPIUsage):
        # Business-rule violation → arbitrary status plus payload
        return jsonify(code=e.status_code, **e.to_dict()), e.status_code

    @app.errorhandler(Exception)
    def handle_unexpected(e: Exception):
        if isinstance(e, HTTPException):
            return e   # known HTTP errors pass through unchanged
        app.logger.exception("unhandled exception")
        return jsonify(code=500, name="Internal Server Error",
                       description="An unexpected error occurred."), 500

With this, an API's errors become, regardless of origin (HTTP, validation, business rule, unexpected), consistent JSON holding code / name as common keys. The client can branch mechanically on code, and read origin-specific info like errors (422) or reason (business) as needed.

Error origin | Exception | Status | Main keys
Input validation | ValidationError (marshmallow) | 422 | errors (per field)
Known HTTP error | HTTPException / abort() | 4xx/5xx | description
Business-rule violation | InvalidAPIUsage | Arbitrary (409 etc.) | message / reason
Unexpected | All other Exception | 500 | description (generic)

💡 Calling register_error_handlers(app) from create_app() keeps handler registration centrally managed as part of the factory. There is no need to scatter try / except across routes, which keeps the code DRY and easy to test.
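Centralized handlers also make the envelope itself testable. The following is a sketch of such a contract check using only Flask; the routes and the envelope_ok helper are hypothetical, and the marshmallow and InvalidAPIUsage handlers are left out to keep it self-contained.

```python
from flask import Flask, abort, jsonify
from werkzeug.exceptions import HTTPException


def create_app():
    app = Flask(__name__)

    @app.errorhandler(HTTPException)
    def handle_http(e):
        return jsonify(code=e.code, name=e.name, description=e.description), e.code

    @app.errorhandler(Exception)
    def handle_unexpected(e):
        if isinstance(e, HTTPException):
            return e
        app.logger.exception("unhandled exception")
        return jsonify(code=500, name="Internal Server Error",
                       description="An unexpected error occurred."), 500

    @app.route("/conflict")
    def conflict():
        abort(409)

    @app.route("/crash")
    def crash():
        raise RuntimeError("bug")

    return app


def envelope_ok(response):
    """True if the body is a JSON object whose 'code' matches the HTTP status."""
    body = response.get_json(silent=True)
    return (
        response.mimetype == "application/json"
        and isinstance(body, dict)
        and body.get("code") == response.status_code
        and "name" in body
    )


client = create_app().test_client()
results = {path: envelope_ok(client.get(path))
           for path in ("/conflict", "/crash", "/missing")}
```

Running one check like this over every error path gives early warning when a new handler drifts away from the common keys.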


5. Logging basics: dictConfig "before app creation"

Once errors are returned as JSON, the next step is logs for investigation. Flask's logging is the standard library's logging module itself, so there is no new API to learn, but the timing of configuration has one Flask-specific hard rule.

5.1 The most important rule: configure before app creation

The most important official caution—"if app.logger is accessed before logging configuration, a default handler is added. If possible, configure logging before creating the application object."

The canonical form is logging.config.dictConfig. Here's the official template as-is.

from logging.config import dictConfig

from flask import Flask

dictConfig({
    "version": 1,
    "formatters": {"default": {
        "format": "[%(asctime)s] %(levelname)s in %(module)s: %(message)s",
    }},
    "handlers": {"wsgi": {
        "class": "logging.StreamHandler",
        "stream": "ext://flask.logging.wsgi_errors_stream",
        "formatter": "default",
    }},
    "root": {"level": "INFO", "handlers": ["wsgi"]},
})

app = Flask(__name__)   # ← create the app AFTER dictConfig

stream: ext://flask.logging.wsgi_errors_stream is a Flask-specific designation that writes to the WSGI server's error stream (environ["wsgi.errors"]) during request handling and to sys.stderr otherwise.

⚠️ Writing dictConfig inside create_app (after Flask(__name__)) can be too late. In a factory configuration, run dictConfig before calling create_app, or place the logging configuration at module import time (top level, before the create_app definition). Design the factory's startup order in alignment with the factory chapter of the production-operations guide.
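One way to arrange the startup order in a factory layout is sketched below. The names LOGGING and configure_logging are hypothetical, and disable_existing_loggers is an addition here (it keeps loggers that imported libraries have already created).

```python
import logging
from logging.config import dictConfig

from flask import Flask
from flask.logging import default_handler

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,  # keep loggers created by imported libraries
    "formatters": {"default": {
        "format": "[%(asctime)s] %(levelname)s in %(module)s: %(message)s",
    }},
    "handlers": {"wsgi": {
        "class": "logging.StreamHandler",
        "stream": "ext://flask.logging.wsgi_errors_stream",
        "formatter": "default",
    }},
    "root": {"level": "INFO", "handlers": ["wsgi"]},
}


def configure_logging():
    dictConfig(LOGGING)


def create_app():
    app = Flask(__name__)
    # The first access to app.logger happens after dictConfig has already run
    app.logger.info("app created")
    return app


configure_logging()   # 1) logging first
app = create_app()    # 2) then the application object

effective_level = logging.getLevelName(app.logger.getEffectiveLevel())
has_default_handler = default_handler in app.logger.handlers
```

Because the root logger already has a handler when app.logger is first touched, Flask does not attach its default handler, and app.logger inherits the INFO level from the root.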

5.2 How to use app.logger and log levels

Use app.logger for log output. Using %s-style lazy formatting is the standard practice.

app.logger.debug("detailed diagnostic information")
app.logger.info("%s logged in successfully", user.username)
app.logger.warning("authentication failed %d times", attempts)
app.logger.error("failed to connect to the DB: %s", err)

⚠️ The log-level pitfall. Python's root logger defaults to WARNING, and loggers without their own level inherit it. Records below the effective level are dropped entirely, which is the cause of most "I added app.logger.info(...) but nothing comes out" incidents. Explicitly set dictConfig's root.level (or the relevant logger's level) to INFO, or to DEBUG if you need more detail.
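The effect of the level is easy to see with the standard library alone. ListHandler, emit_all, and the logger names are hypothetical helpers for this sketch.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


def emit_all(level):
    """Log one INFO and one WARNING record through a logger set to `level`."""
    handler = ListHandler()
    logger = logging.getLogger(f"demo.level.{logging.getLevelName(level)}")
    logger.setLevel(level)
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.addHandler(handler)
    logger.info("user logged in")
    logger.warning("login failed 3 times")
    logger.removeHandler(handler)
    return handler.messages


at_warning = emit_all(logging.WARNING)  # INFO record is dropped
at_info = emit_all(logging.INFO)        # both records are emitted
```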

5.3 The behavior of the default handler, and removing it

If you configure no logging at all, Flask adds one StreamHandler to app.logger, writing to environ["wsgi.errors"] (usually sys.stderr) during requests and to sys.stderr otherwise. When you want to attach your own handler but the default remains, you get double output, so explicitly remove it.

from flask.logging import default_handler

app.logger.removeHandler(default_handler)

Note that Werkzeug (the dev server) logs requests/responses to the werkzeug logger. In production, access logs are often handled on the Gunicorn / reverse-proxy side, and that design is covered in the deployment guide.


6. Structured logs and request IDs: get logs into a "correlatable" state

A line log like [2026-06-20 10:00:00] INFO in views: order created is fine for a human reading line by line, but poorly suited to mechanical search and correlation in a log aggregation platform (CloudWatch Logs / Datadog / Loki, etc.). In production, add two enhancements: (1) inject request context and (2) structure the log as JSON.

6.1 Inject request context into logs (RequestFormatter)

The official docs show a custom Formatter that injects request info (URL, connection source) into log records. The point is to determine the presence of a request context with has_request_context() (startup and background logs come out outside a request).

import logging

from flask import has_request_context, request
from flask.logging import default_handler


class RequestFormatter(logging.Formatter):
    def format(self, record):
        if has_request_context():
            record.url = request.url
            record.remote_addr = request.remote_addr
        else:
            record.url = None
            record.remote_addr = None
        return super().format(record)


formatter = RequestFormatter(
    "[%(asctime)s] %(remote_addr)s requested %(url)s\n"
    "%(levelname)s in %(module)s: %(message)s"
)
default_handler.setFormatter(formatter)

Going through has_request_context() makes the same formatter safely usable both during a request and in the background. For details on has_request_context() and context lifetime, see the context explainer.

6.2 Issue a request ID (correlation ID) and run it through all logs

What's decisive in failure investigation is the request ID (correlation ID). Assign a unique ID per request, and include the same ID in all log lines during that request, and you can search "what happened in this request" with a single ID. Issue it in before_request, put it on g, and inject it into log records.

import uuid

from flask import g, has_request_context, request


@app.before_request
def assign_request_id():
    # Reuse X-Request-ID if an upstream proxy or gateway already set it; otherwise generate one
    g.request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))


@app.after_request
def echo_request_id(response):
    # Echo the ID on the response so clients and upstream systems can correlate
    if "request_id" in g:
        response.headers["X-Request-ID"] = g.request_id
    return response

Build this into RequestFormatter.

class RequestFormatter(logging.Formatter):
    def format(self, record):
        if has_request_context():
            record.request_id = g.get("request_id", "-")
            record.url = request.url
            record.remote_addr = request.remote_addr
        else:
            record.request_id = "-"
            record.url = None
            record.remote_addr = None
        return super().format(record)

💡 g is independent per request. Because it is context-local, request IDs don't get mixed up even when multiple Gunicorn workers or threads process requests concurrently. How g works (a context-scoped global built on contextvars) is detailed in the context explainer. Reusing an ID attached by an upstream reverse proxy also fits with the deployment guide; note that an ALB itself adds X-Amzn-Trace-Id rather than X-Request-ID, so that is the header to read if the ALB is your only upstream.
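Since the incoming X-Request-ID is client-controlled and ends up in every log line, one possible refinement is to accept it only when it looks well-formed. This is a sketch under an assumed ID format (1 to 64 characters of letters, digits, '.', '_' or '-'); the pattern, the pick_request_id helper, and the /ping route are hypothetical.

```python
import re
import uuid

from flask import Flask, g, request

app = Flask(__name__)

# Assumption: a valid ID is 1-64 characters of letters, digits, '.', '_' or '-'
_SAFE_ID = re.compile(r"[A-Za-z0-9._-]{1,64}")


def pick_request_id(incoming):
    """Reuse a well-formed upstream ID; otherwise generate a fresh UUID4."""
    if incoming and _SAFE_ID.fullmatch(incoming):
        return incoming
    return str(uuid.uuid4())


@app.before_request
def assign_request_id():
    g.request_id = pick_request_id(request.headers.get("X-Request-ID"))


@app.after_request
def echo_request_id(response):
    if "request_id" in g:
        response.headers["X-Request-ID"] = g.request_id
    return response


@app.route("/ping")
def ping():
    return "pong"


client = app.test_client()
kept = client.get("/ping", headers={"X-Request-ID": "abc-123"}).headers["X-Request-ID"]
replaced = client.get("/ping", headers={"X-Request-ID": "not valid!"}).headers["X-Request-ID"]
```

A well-formed ID is echoed back unchanged, while a malformed one is replaced by a generated UUID.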

6.3 JSON structured logs: make them searchable in a log aggregation platform

To search and aggregate field-by-field in CloudWatch Logs Insights or Datadog, make the log itself a JSON line. Because the request ID, level, and module become structured fields, queries like "all logs where request_id = ..." or "level = ERROR in time order" can be written instantly.

import json
import logging

from flask import g, has_request_context, request


class JsonFormatter(logging.Formatter):
    def format(self, record):
        log = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "module": record.module,
            "message": record.getMessage(),
        }
        if has_request_context():
            log["request_id"] = g.get("request_id", "-")
            log["method"] = request.method
            log["path"] = request.path
            log["remote_addr"] = request.remote_addr
        # If there is an exception, include the stack trace as a field too
        if record.exc_info:
            log["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(log, ensure_ascii=False)

💡 In production, don't write logs to a file; emit them to stdout/stderr. In containers (ECS/Kubernetes), following the 12-factor principle, the app writes logs as a stream to standard output and leaves collection to the platform (CloudWatch Logs / Fluent Bit, etc.). A logging.StreamHandler plus JsonFormatter is sufficient; StreamHandler writes to sys.stderr by default, so pass sys.stdout explicitly if you want stdout.

Note that the standard library also has logging.handlers.RotatingFileHandler (a file handler that rotates by size), but this is not in Flask's logging docs example—it's a standard-library feature. It's an option in a configuration that writes directly to a file on a VM, but in containers, stdout is recommended as above.

To use this in a real deployment, register JsonFormatter under dictConfig's formatters and reference it from a handler. Logs then flow into the aggregation platform in structured form, on a path separate from §4's JSON error responses.
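A minimal sketch of that wiring, using dictConfig's "()" key to name the formatter factory. The JsonFormatter here is a reduced version without the request fields so that the example runs without Flask, and the logger name demo.json is hypothetical.

```python
import json
import logging
from logging.config import dictConfig


class JsonFormatter(logging.Formatter):
    """Reduced formatter: no request fields, so it runs outside Flask."""

    def format(self, record):
        log = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "module": record.module,
            "message": record.getMessage(),
        }
        if record.exc_info:
            log["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(log, ensure_ascii=False)


dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # "()" names a factory callable; a dotted import path string also works
        "json": {"()": JsonFormatter},
    },
    "handlers": {"stdout": {
        "class": "logging.StreamHandler",
        "stream": "ext://sys.stdout",  # explicit stdout instead of the stderr default
        "formatter": "json",
    }},
    "loggers": {
        "demo.json": {"level": "INFO", "handlers": ["stdout"], "propagate": False},
    },
})

logger = logging.getLogger("demo.json")
logger.info("order created: %s", "A-100")  # emitted as one JSON line on stdout
```

In an application you would attach the handler to root (as in §5.1) rather than to a named demo logger.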


7. Error aggregation (Sentry) and PII scrubbing

Aggregated logs alone do not tell you when an error occurred, which error it was, how many times it fired, or in which release it increased, nor do they notify you. For that you need an error-tracking platform. The representative one is Sentry, which connects to Flask through its official integration.

pip install "sentry-sdk[flask]"

import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

sentry_sdk.init(
    dsn="https://<key>@<org>.ingest.sentry.io/<project>",   # inject from an environment variable
    integrations=[FlaskIntegration()],
    traces_sample_rate=0.1,      # sample a fraction of traces (cost control)
    send_default_pii=False,      # do not send PII by default (see below)
    environment="production",
)

Just by adding FlaskIntegration, uncaught exceptions are automatically sent to Sentry, aggregated and notified by stack trace, request info, and release. Logs equivalent to app.logger.error(...) can be ingested too.

7.1 Most important: scrub PII and secrets

This is the point to be most careful about in production. The moment personal information (PII) or secrets flow into an error-tracking foundation, that foundation itself becomes a leakage path.

  • Explicitly set send_default_pii=False so Sentry doesn't automatically grab user info, cookies, or bodies.
  • Mask contacts, passwords, and tokens contained in the request body with a before-send hook.
  • What you need for correlation is not "who" but "the same person," so hash the user identifier before using it.

def scrub(event, hint):
    # Drop sensitive headers
    headers = event.get("request", {}).get("headers", {})
    for key in ("Authorization", "Cookie", "X-Api-Key"):
        headers.pop(key, None)
    return event


sentry_sdk.init(
    dsn="...",
    integrations=[FlaskIntegration()],
    send_default_pii=False,
    before_send=scrub,   # scrub PII/secrets just before sending
)

⚠️ This is consistent with this site's own security policy too—don't leave PII like contact-form data in logs or the error tracker. Don't put personal names or contacts in logs, error events, or span attributes. The handling of secrets is detailed in the security implementation guide, and avoiding PII contamination of telemetry in the OpenTelemetry observability guide.
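The three points above (drop sensitive headers, mask body fields, hash the user identifier) can be sketched with the standard library alone. The event layout used here (request.headers and request.data as dictionaries) is an assumption about the payload shape, and the field names, MASK value, and pseudonymize helper are hypothetical; check them against the events your SDK version actually produces.

```python
import hashlib
import hmac

SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}
SENSITIVE_FIELDS = {"password", "token", "email", "phone"}  # assumed field names
MASK = "[scrubbed]"


def pseudonymize(user_id, secret):
    """Keyed hash: the same user always maps to the same token, not back to the ID."""
    return hmac.new(secret, str(user_id).encode(), hashlib.sha256).hexdigest()[:16]


def scrub(event, hint):
    request = event.get("request") or {}
    headers = request.get("headers") or {}
    # Compare case-insensitively so "authorization" and "Authorization" both match
    for key in list(headers):
        if key.lower() in SENSITIVE_HEADERS:
            del headers[key]
    body = request.get("data")
    if isinstance(body, dict):
        for field in body:
            if field.lower() in SENSITIVE_FIELDS:
                body[field] = MASK
    return event


sample = {"request": {
    "headers": {"authorization": "Bearer abc", "Accept": "application/json"},
    "data": {"email": "a@example.com", "quantity": 2},
}}
cleaned = scrub(sample, hint=None)
```

Keeping the scrubber a pure function over a dictionary means it can be unit-tested without a Sentry client.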

7.2 Notify errors by email via SMTP (for small scale)

For a small setup without Sentry, another option is the standard library's SMTPHandler, which emails records at ERROR and above. The usual practice is to skip sending while debugging (if not app.debug).

import logging
from logging.handlers import SMTPHandler

if not app.debug:
    mail_handler = SMTPHandler(
        mailhost="127.0.0.1",
        fromaddr="server-error@example.com",
        toaddrs=["ops@example.com"],
        subject="Application Error",
    )
    mail_handler.setLevel(logging.ERROR)   # notify only at ERROR and above
    app.logger.addHandler(mail_handler)

Email is noisy and tends to break down at scale, so for serious operation, an aggregation foundation like Sentry is recommended.


8. Health checks, and the bridge to tracing

The last piece of observability is the health check, which tells the load balancer "is the app alive, is it ready to accept." In an ALB → ECS (Fargate) configuration, this drives automatic detachment and restart of abnormal tasks.

8.1 Split liveness and readiness

The production form is to split the two kinds of checks into separate endpoints. Mixing them invites an overreaction where a temporary DB hiccup kills the whole app.

Kind | Endpoint | Meaning | On-failure behavior
liveness | /health | Is the process alive (don't look at dependencies) | Restart the task
readiness | /ready | Can it accept traffic (confirm dependencies like the DB) | Temporarily detach from the LB

from flask import jsonify
from sqlalchemy import text

from .extensions import db


@app.get("/health")
def health():
    # liveness: touch no dependencies; report only that the process is alive (light and fast)
    return jsonify(status="ok"), 200


@app.get("/ready")
def ready():
    # readiness: send a minimal query to the DB to decide whether traffic can be accepted
    try:
        db.session.execute(text("SELECT 1"))
    except Exception:
        app.logger.exception("readiness check failed: db unreachable")
        return jsonify(status="unavailable", checks={"db": "fail"}), 503
    return jsonify(status="ok", checks={"db": "ok"}), 200

💡 The hard rule is that liveness never touches dependencies. If /health checks the DB, a momentary DB stall can make the ALB mark the task unhealthy and get it replaced, which hinders recovery. Use liveness only for failures that a restart fixes (process-level faults), and readiness for failures that waiting fixes (a temporary dependency hiccup); this division prevents needless restart loops. The ALB health-check path setting and how it maps to the ECS task definition are covered in the deployment guide.

8.2 The bridge to tracing and metrics

Once you've arranged logs, errors, and health checks, what you next want is a trace that follows "how one request passed through multiple services" as a line. This article's structured logs and request IDs are the entry point to it.

  • This article's request_id develops into distributed tracing's trace_id. Introduce OpenTelemetry, and instead of issuing the request ID yourself, you can automatically correlate across multiple services with standard context propagation.
  • Put trace_id / span_id in logs, and the 3-signal correlation of "notice with metrics → locate with traces → read why with logs" holds.

The overall design of observability that correlates the 3 signals (traces, metrics, logs), along with instrumentation, sampling, and PII-scrubbing details, is consolidated in the OpenTelemetry production-observability guide. The operational side, who does what and records what when a failure actually occurs, is covered in the incident-response / postmortem / on-call guide. This article's JSON errors and request IDs are the foundation from which that investigation starts.


Summary: production Flask "doesn't stay silent"

Flask's default is smart, but you can't use it as-is for a production API. Only when you design your errors, make logs correlatable, and aggregate failures does the "running but unintelligible" state disappear. To restate this article's points:

  1. Don't allow the default 500 HTML. In an API, design errors yourself so they return as JSON. DEBUG=True in production is a serious vulnerability that exposes the debugger.
  2. Handle errors declaratively with errorhandler / abort / custom HTTPException, and understand the resolution order (code → class hierarchy → most specific). Note that a Blueprint can't catch a 404 raised at the routing level.
  3. Unify JSON errors into one consistent envelope. Consolidate HTTPException→JSON, Exception (with an isinstance pass-through), InvalidAPIUsage, and marshmallow's ValidationError→422 into a single register_error_handlers.
  4. Identify the real cause of an unexpected exception through InternalServerError.original_exception, and record internal details in the logs without leaking them to the client.
  5. Run dictConfig before creating the app. Configure logging before you touch app.logger, or a default handler gets attached. Beware the trap of the default log level, WARNING.
  6. With RequestFormatter + request ID (g + before_request) + JSON structured logs, reach a state where a single ID lets you search across everything in your log aggregation platform. In containers, log to stdout, not to a file.
  7. Aggregate errors with Sentry (sentry-sdk[flask]), and scrub PII / secrets with send_default_pii=False + before_send.
  8. Split /health (liveness) from /ready (readiness, DB connectivity) so that ALB/ECS auto-recovery is driven correctly. The request_id is the bridge to tracing.

Production logging / error-design checklist

  • Are all API errors returned as JSON (no HTML 500 leaking)?
  • Is DEBUG reliably disabled in production (no debugger exposed)?
  • Do error handlers explicitly specify the HTTP status in their return values (never returning 200)?
  • Is the 404 handler registered at the app level (a Blueprint can't catch routing 404s)?
  • Is the error envelope consistent across all origins, including ValidationError→422?
  • Are the details of unexpected exceptions (stack trace, SQL, paths) kept out of the client response?
  • Is dictConfig run before app creation?
  • Is the log level set explicitly (for example, INFO) so that the logs you expect actually appear?
  • Is a request ID issued and attached to every log line and to the response header?
  • In containers, are logs emitted to stdout/stderr (JSON-structured) rather than written to a file?
  • Are PII / secrets kept out of Sentry and similar services (send_default_pii=False + scrub)?
  • Are /health (liveness, doesn't hit dependencies) and /ready (readiness, DB check) split?

Error handling, logging, and observability are not features but an investment that determines how fast you yourself can move when a failure occurs. Flask's thinness gives you the freedom, and the responsibility, to design this layer yourself. For the whole picture, see the Flask production-operations guide; for input-boundary design, the marshmallow × Flask × SQLAlchemy guide; and for three-signal observability, the OpenTelemetry guide.

