Introduction: the scariest thing in production is the "silent exception"
In a production Flask app, the first thing that breaks is not a feature but "how a failure looks." A report comes from a user: "I get an error." But the response is a flavorless 500 Internal Server Error HTML, the logs have no correlation info, and which request, for which user, at which stage it fell over—none of it connects. This "silence" turns a 5-minute investigation into a 2-hour hell.
In production mode, Flask has the smart default of, on an exception, "displaying a very simple page and recording the exception to the logger" (this is the official statement). It's sufficient for learning, but for production operation of a REST API, you must not use this default as-is. What an API's clients (frontend, mobile, other services) need is not an HTML page but machine-readable structured JSON errors, and what the investigator needs is structured logs correlated by a request ID.
This article is a spoke that digs §6 "Error Handling and Logging" of the Flask production-operations guide (pillar) down to production quality. It covers the following 3 layers.
- Error handling: errorhandler / abort / custom HTTPException, resolution order, JSON error design for APIs.
- Logging: dictConfig, app.logger, structured logs, request-ID correlation.
- Observability: error aggregation via Sentry, PII scrubbing, health checks, and the bridge to tracing.
I have designed and implemented the backend of a Minister of Economy, Trade and Industry Award-winning B2B SaaS in Python / Flask / SQLAlchemy / PostgreSQL, and operated it in production on ALB → ECS (Fargate) with structured logs and error tracking. What I show here is only the design that this real-world operation needed to make failures "traceable in 5 minutes."
💡 Versions covered in this article: assuming the Flask 3.1 line. Error handling builds directly on werkzeug.exceptions, logging on the standard logging module. The code is based on patterns from the official documentation (flask.palletsprojects.com stable).
1. Default error behavior: why an API must "design its own errors"
First, grasp precisely what happens when Flask does nothing. In the official docs' own words: "when the application is running in production mode and an exception is raised, Flask displays a very simple page and records the exception to the logger."
This is a reasonable default as a web page. But for an API there are two problems.
- The response is HTML: a 500 page with Content-Type: text/html is returned, so a client expecting JSON fails to parse it. The client can't mechanically distinguish the kind of error (validation failure, resource absence, or internal server error).
- The debug-mode trap: in development with debug=True, the 500 handler isn't used and the interactive debugger is displayed instead. Exposing this in production is a serious vulnerability leading to arbitrary code execution (always disable DEBUG in production).
⚠️ DEBUG=True in production is strictly forbidden. The official docs state clearly that "in debug mode, the 'Internal Server Error' handler is not used, and the interactive debugger is shown instead." The debugger is Werkzeug's console and can execute arbitrary Python on the server. Expose it in production and you're compromised instantly. For handling DEBUG, see the configuration chapter of the production-operations guide.
The conclusion is simple. In an API, don't allow an exception to become a 500 HTML as-is; design yourself "which exception, with which HTTP status, in what JSON to return." That's the theme from this chapter on.
2. The basics of error handlers: errorhandler / register_error_handler / abort
2.1 Two ways to register a handler
Processing for exceptions and status codes is registered as a handler function. There are two ways to write it, with the same functionality.
from werkzeug.exceptions import BadRequest

# Option A: decorator (local and easy to read)
@app.errorhandler(BadRequest)
def handle_bad_request(e):
    return "bad request!", 400

# Option B: register_error_handler (easy to register in bulk inside a factory)
app.register_error_handler(400, handle_bad_request)
There are two important official facts here.
- HTTPException subclasses and HTTP codes are interchangeable at registration. Since BadRequest.code == 400, @app.errorhandler(BadRequest) and @app.errorhandler(400) mean the same thing.
- The handler's return status code is not set automatically. The official docs caution that "the status code of the response is not set to the handler's code. Make sure to specify the appropriate HTTP status code when returning a response from a handler." That is, you must explicitly specify the status in the second return value like return "bad request!", 400, or it returns with 200.
⚠️ "Forgetting the status code" is a frequent bug. If inside @app.errorhandler(404) you write only return jsonify(error="not found"), the HTTP status becomes 200. Always attach the code, like , 404, to an error handler's return value. Registration in an application-factory configuration (calling register_error_handler inside create_app) aligns with the factory chapter of the production-operations guide.
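The pitfall is easy to reproduce with Flask's test client. The following is a minimal sketch, not code from the official docs: the route /secret and the handler names are made up for illustration, and the 404 handler omits its status on purpose.

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)


@app.errorhandler(404)
def not_found(e):
    # Deliberately wrong: no status in the return value, so Flask sends 200.
    return jsonify(error="not found")


@app.errorhandler(403)
def forbidden(e):
    # Correct: the status is passed explicitly as the second tuple item.
    return jsonify(error="forbidden"), 403


@app.route("/secret")  # hypothetical route, only here to trigger a 403
def secret():
    abort(403)


client = app.test_client()
missing = client.get("/no-such-page")
denied = client.get("/secret")
print(missing.status_code, denied.status_code)
```

A check like this in the test suite catches the mistake before a client ever sees an "error" arriving with a success status.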
2.2 abort(code, description): explicitly interrupt from a view
When you want to "return a 404 here" partway through a view function, use abort(). abort() raises the corresponding HTTPException and delegates processing to the registered handler. The official pattern:
from flask import abort, render_template, request

@app.route("/profile")
def user_profile():
    username = request.args.get("username")  # read from the query string
    if username is None:
        abort(400)  # invalid argument → 400
    user = get_user(username=username)
    if user is None:
        abort(404)  # no such user → 404
    return render_template("profile.html", user=user)

@app.errorhandler(404)
def page_not_found(e):
    return render_template("404.html"), 404
You can override the description with the second argument like abort(404, description="Resource not found"). This is useful in the JSON errors described later for telling the client a specific reason.
💡 Aborting early in a view, as in "400 if there's no username" and "404 if the user doesn't exist," is a form of guard clause (early return). It avoids deep nesting and lets you line up the normal-path logic straight below. Fetching request.args and passing request-scope values via g are covered in detail in the application/request context explainer.
2.3 Custom HTTPException subclasses: define domain-specific errors
When you want to define a status not in the standard (e.g., 507 Insufficient Storage) or a domain-specific error, inherit from HTTPException. The official pattern:
import werkzeug.exceptions

class InsufficientStorage(werkzeug.exceptions.HTTPException):
    code = 507
    description = "Not enough storage space."

def handle_507(e):
    # Set the status explicitly (see §2.1)
    return e.description, 507

app.register_error_handler(InsufficientStorage, handle_507)

# Inside a view function:
raise InsufficientStorage()
Just by giving code and description as class attributes, you can make a custom exception you only need to raise. Because you can write the code with a business-meaningful name (InsufficientStorage), the readability of the view function goes up.
3. Resolution order: which handler is chosen (code → class hierarchy → most specific)
When you've registered multiple handlers, "which is chosen" is the core that determines production behavior. Let me quote the official rule directly.
When Flask catches an exception while handling a request, it is first looked up by code. If there's no handler for that code, Flask looks up the error by class hierarchy, and the most specific handler is chosen. If no handler is registered at all, HTTPException subclasses show a generic message about their code, and other exceptions are converted to a generic "500 Internal Server Error."
Let me organize this in a table.
| Stage | Matched against | Example |
|---|---|---|
| ① Code match | A numeric code like @app.errorhandler(404) | abort(404) → the code-404 handler |
| ② Class hierarchy | If no code match, traverse the exception class | BadRequest (= a subclass of HTTPException) → the HTTPException handler |
| ③ Most specific | On the hierarchy, prefer the closer (more specific) handler | If both Exception and HTTPException exist, the latter for HTTPException exceptions |
| ④ When unregistered | HTTPException gets a generic message, others go to 500 | An unexpected exception → 500 Internal Server Error |
There are two practical implications of this rule.
- Broad handlers (HTTPException / Exception) can coexist with narrow handlers (404). Since the narrow one is preferred, you can build a two-tier setup: "an error you want to craft individually (404) gets a dedicated handler, and everything else is JSON-ified by an umbrella handler."
- The Exception handler doesn't steal HTTPException. The official docs state clearly: "if you register handlers for both HTTPException and Exception, the HTTPException handler is more specific, so the Exception handler doesn't handle HTTPException subclasses." This property enables the next chapter's design of "catching only unexpected exceptions as 500."
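To see the rule in action, here is a small sketch that registers all three tiers and asks the test client which handler answered. The routes /forbidden and /boom and the "handler" key are illustrative assumptions, not part of the official docs.

```python
from flask import Flask, abort, jsonify
from werkzeug.exceptions import HTTPException

app = Flask(__name__)


@app.errorhandler(404)
def handle_404(e):
    # Narrow handler: matched by code first.
    return jsonify(handler="404"), 404


@app.errorhandler(HTTPException)
def handle_http(e):
    # Umbrella for every other HTTP error.
    return jsonify(handler="HTTPException"), e.code


@app.errorhandler(Exception)
def handle_any(e):
    # Only non-HTTP exceptions end up here.
    return jsonify(handler="Exception"), 500


@app.route("/forbidden")  # hypothetical route
def forbidden():
    abort(403)


@app.route("/boom")  # hypothetical route
def boom():
    raise KeyError("missing key")


client = app.test_client()
chosen = {
    path: client.get(path).get_json()["handler"]
    for path in ("/nope", "/forbidden", "/boom")
}
print(chosen)
```

The unknown URL is answered by the 404 handler, the abort(403) by the HTTPException umbrella, and the KeyError by the Exception handler, regardless of the order in which the three were registered.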
3.1 Blueprints and resolution order: only 404 is the exception
When you register handlers per Blueprint, for errors under that Blueprint, the Blueprint's handler takes priority over the app-wide handler. But there's one serious exception. In the official words:
A handler registered on a Blueprint takes priority over one registered globally. However, a Blueprint cannot handle 404 routing errors. A 404 occurs at the routing stage, before which Blueprint is determined.
⚠️ "Trying to catch a 404 in a Blueprint but failing to" is a typical stumbling point. A nonexistent URL becomes a 404 at the routing layer before reaching any Blueprint. Therefore, register the 404 handler at the app level (@app.errorhandler(404)). A Blueprint-level handler only works for "post-arrival errors" like abort(403) within that Blueprint. The Blueprint configuration itself is covered in the production-operations guide.
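The difference between a routing 404 and a post-arrival error can be checked directly. This is a sketch under assumptions: the admin blueprint, its routes, and the "scope" key are invented for the demonstration.

```python
from flask import Blueprint, Flask, abort, jsonify

admin = Blueprint("admin", __name__, url_prefix="/admin")  # hypothetical blueprint


@admin.errorhandler(403)
def admin_forbidden(e):
    return jsonify(scope="blueprint", code=403), 403


@admin.errorhandler(404)
def admin_not_found(e):
    # Reached only when a view inside this blueprint calls abort(404).
    return jsonify(scope="blueprint", code=404), 404


@admin.route("/panel")
def panel():
    abort(403)


@admin.route("/item")
def item():
    abort(404)


app = Flask(__name__)
app.register_blueprint(admin)


@app.errorhandler(404)
def app_not_found(e):
    return jsonify(scope="app", code=404), 404


client = app.test_client()
routing_404 = client.get("/admin/does-not-exist").get_json()
view_404 = client.get("/admin/item").get_json()
view_403 = client.get("/admin/panel").get_json()
print(routing_404["scope"], view_404["scope"], view_403["scope"])
```

Even though the unknown URL starts with /admin, no blueprint has been selected yet when routing fails, so the app-level handler answers. The abort(404) raised inside a blueprint view is a post-arrival error and is handled by the blueprint.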
4. JSON error design: a "consistent error envelope" for APIs
From here is the heart of API design. The goal is that all errors are returned in the same shape of JSON. The client can handle errors with just one kind of parsing.
4.1 Turn every HTTPException into JSON
What abort(404) or abort(400) returns is HTML by default. What converts this to JSON in bulk is the HTTPException handler shown by the official docs.
from flask import json
from werkzeug.exceptions import HTTPException

@app.errorhandler(HTTPException)
def handle_exception(e):
    """Return JSON instead of HTML for all HTTP errors."""
    # Start from the response the HTTPException carries and swap only the body for JSON
    response = e.get_response()
    response.data = json.dumps({
        "code": e.code,
        "name": e.name,
        "description": e.description,
    })
    response.content_type = "application/json"
    return response
The response e.get_response() returns already has the status code correctly set, so you don't need to attach , e.code here (the §2.1 caution is about returning a tuple, not a response object). Putting code / name / description in the body makes the description from abort(404, description="...") reach the client as-is.
4.2 Turn unexpected exceptions into 500 JSON too (isinstance pass-through)
The HTTPException handler handles "HTTP errors Flask knows." But unexpected exceptions like KeyError or ZeroDivisionError are not HTTPException. If you want to return these as JSON too, add an Exception handler. The official pattern:
from flask import render_template
from werkzeug.exceptions import HTTPException

@app.errorhandler(Exception)
def handle_exception(e):
    # Pass HTTP errors through unchanged; they are not unexpected failures
    if isinstance(e, HTTPException):
        return e
    # Treat only unexpected exceptions as 500
    return render_template("500_generic.html", e=e), 500
Here the pass-through if isinstance(e, HTTPException): return e is the point. As seen in §3, the HTTPException handler is more specific, so an HTTPException normally doesn't reach the Exception handler while both are registered. The guard covers the case where the HTTPException handler is missing or later removed: the exception then does arrive here, and return e sends it back as its own response rather than converting, say, a 404 into a 500. For an API, return JSON instead of render_template.
from flask import jsonify
from werkzeug.exceptions import HTTPException

@app.errorhandler(Exception)
def handle_unexpected_exception(e):
    if isinstance(e, HTTPException):
        return e
    # For unexpected exceptions, return 500 in the consistent envelope without leaking details
    app.logger.exception("unhandled exception")  # the stack trace goes to the log
    return jsonify(code=500, name="Internal Server Error",
                   description="An unexpected error occurred."), 500
⚠️ Don't put internal info in an error response. Returning a stack trace, SQL statement, or file path to the client is information leakage. Record the details on the log side with app.logger.exception(), and return only a generic message to the client. The principle of not exposing secrets or PII is consistent with the security implementation guide.
4.3 The identity of an unexpected exception: InternalServerError and original_exception
When an uncaught exception is passed to the 500 handler, the e passed is not the original exception itself. An important official fact:
Since Flask 1.1.0, this error handler is always passed an instance of InternalServerError, not the original uncaught exception itself. The original error can be referenced via e.original_exception.
from flask import jsonify
from werkzeug.exceptions import InternalServerError

@app.errorhandler(InternalServerError)
def handle_500(e):
    original = e.original_exception  # retrieve the original exception (KeyError etc.)
    app.logger.error("internal error: %s", original)
    return jsonify(code=500, name="Internal Server Error"), 500
In addition, as touched on in §1, in debug mode the 500 handler isn't used and the debugger appears, so you need to verify the 500 handler's behavior with DEBUG=False.
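The wrapping can be observed with the test client. In this sketch the /boom route and the seen list are assumptions added purely to record what the handler received. The app is left at its defaults (debug and testing both off), because otherwise the exception would propagate instead of reaching the handler.

```python
from flask import Flask, jsonify
from werkzeug.exceptions import InternalServerError

app = Flask(__name__)
seen = []  # records (wrapper type, original type) for illustration only


@app.errorhandler(InternalServerError)
def handle_500(e):
    seen.append((type(e).__name__, type(e.original_exception).__name__))
    return jsonify(code=500, name="Internal Server Error"), 500


@app.route("/boom")  # hypothetical route
def boom():
    return {}["missing"]  # raises KeyError


response = app.test_client().get("/boom")
print(response.status_code, seen)
```

When a view calls abort(500) directly there is no wrapped error, and e.original_exception is None, so a handler that logs the original exception should allow for that case.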
4.4 An app-specific error type: the InvalidAPIUsage pattern
When you want to "return a business error (out of stock, usage-limit exceeded, etc.) with a message and an optional payload," instead of inheriting from HTTPException, the official pattern of making an API error type inheriting from plain Exception is convenient.
from flask import jsonify

class InvalidAPIUsage(Exception):
    status_code = 400  # default; can be overridden per instance

    def __init__(self, message, status_code=None, payload=None):
        super().__init__()
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self):
        # Merge the optional payload with the message
        rv = dict(self.payload or ())
        rv["message"] = self.message
        return rv

@app.errorhandler(InvalidAPIUsage)
def invalid_api_usage(e):
    return jsonify(e.to_dict()), e.status_code
From a view, you can express a business-rule violation in one line.
@app.route("/api/orders", methods=["POST"])
def create_order():
    if not has_stock():
        raise InvalidAPIUsage("Insufficient stock.", status_code=409,
                              payload={"reason": "out_of_stock"})
    ...
Since you can add arbitrary keys to payload, you can attach machine-readable info (like reason) the client can branch on. Making status_code variable lets you express both 400s and 409s with the same type.
4.5 Align marshmallow's ValidationError to the same envelope
If you use marshmallow for input validation, you should integrate its ValidationError into the same error envelope too. In the companion Designing a production REST API with marshmallow × Flask × SQLAlchemy, I showed a handler that converts err.messages to 422 (Unprocessable Entity) with @app.errorhandler(ValidationError). The production form is to consolidate this and this article's handlers into a single registration function.
# errors.py: gather all error handlers in one place
# (InvalidAPIUsage is the class from §4.4; import it from wherever you define it)
from flask import Flask, jsonify
from marshmallow import ValidationError
from werkzeug.exceptions import HTTPException

def register_error_handlers(app: Flask) -> None:
    @app.errorhandler(ValidationError)
    def handle_validation_error(e: ValidationError):
        # Validation failure → 422, with per-field messages
        return jsonify(code=422, name="Unprocessable Entity",
                       errors=e.messages), 422

    @app.errorhandler(HTTPException)
    def handle_http_exception(e: HTTPException):
        # Known HTTP error → JSON (abort(404, description=...) etc.)
        return jsonify(code=e.code, name=e.name, description=e.description), e.code or 500

    @app.errorhandler(InvalidAPIUsage)
    def handle_invalid_api_usage(e: InvalidAPIUsage):
        # Business-rule violation → arbitrary status plus payload
        return jsonify(code=e.status_code, **e.to_dict()), e.status_code

    @app.errorhandler(Exception)
    def handle_unexpected(e: Exception):
        if isinstance(e, HTTPException):
            return e  # pass known HTTP errors through unchanged
        app.logger.exception("unhandled exception")
        return jsonify(code=500, name="Internal Server Error",
                       description="An unexpected error occurred."), 500
With this, an API's errors become, regardless of origin (HTTP, validation, business rule, unexpected), consistent JSON with code as the common key; HTTP, validation, and unexpected errors also carry name. The client can branch mechanically on code, and read origin-specific info like errors (422) or reason (business) as needed.
| Error origin | Exception | Status | Main keys |
|---|---|---|---|
| Input validation | ValidationError (marshmallow) | 422 | errors (per field) |
| Known HTTP error | HTTPException / abort() | 4xx/5xx | description |
| Business-rule violation | InvalidAPIUsage | Arbitrary (409 etc.) | message / reason |
| Unexpected | All other Exception | 500 | description (generic) |
💡 By calling this register_error_handlers(app) from create_app(), handler registration is centrally managed as part of the factory. Don't scatter try / except across each route; this is how DRY and testability coexist.
5. Logging basics: dictConfig "before app creation"
Once you can return errors as JSON, next is logs for investigation. Flask's logging is the standard logging itself. You don't need to learn a new API, but the timing of configuration has a Flask-specific iron rule.
5.1 The most important iron rule: configure before app creation
The most important official caution—"if app.logger is accessed before logging configuration, a default handler is added. If possible, configure logging before creating the application object."
The canonical form is logging.config.dictConfig. Here's the official template as-is.
from logging.config import dictConfig

from flask import Flask

dictConfig({
    "version": 1,
    "formatters": {"default": {
        "format": "[%(asctime)s] %(levelname)s in %(module)s: %(message)s",
    }},
    "handlers": {"wsgi": {
        "class": "logging.StreamHandler",
        "stream": "ext://flask.logging.wsgi_errors_stream",
        "formatter": "default",
    }},
    "root": {"level": "INFO", "handlers": ["wsgi"]},
})

app = Flask(__name__)  # ← create the app AFTER dictConfig
stream: ext://flask.logging.wsgi_errors_stream is a Flask-specific designation that writes to the WSGI server's error stream (environ["wsgi.errors"]) during request handling and to sys.stderr otherwise.
⚠️ Writing dictConfig inside create_app (after Flask(__name__)) can be too late. In a factory configuration, run dictConfig before calling create_app, or place the logging configuration at module import time (top level, before the create_app definition). Design the factory's startup order in alignment with the factory chapter of the production-operations guide.
5.2 How to use app.logger and log levels
Use app.logger for log output. Using %s-style lazy formatting is the standard practice.
app.logger.debug("detailed diagnostic information")
app.logger.info("%s logged in successfully", user.username)
app.logger.warning("authentication failed %d times", attempts)
app.logger.error("failed to connect to the DB: %s", err)
⚠️ The log-level pitfall. With no configuration, the effective level is usually WARNING (the root logger's default). Records below the effective level are not emitted at all. Most cases of "I put in app.logger.info(...) but nothing comes out" are this. Explicitly set dictConfig's root.level (or the relevant logger's level) to INFO, or to DEBUG when you need more detail.
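The effect can be reproduced with the standard library alone. In this sketch the logger names demo and demo.views are placeholders, and the parent is set to WARNING explicitly to mirror the root logger's default without touching the real root logger.

```python
import io
import logging

buffer = io.StringIO()
parent = logging.getLogger("demo")  # placeholder name standing in for the root
parent.addHandler(logging.StreamHandler(buffer))
parent.propagate = False  # keep the demo output away from the real root logger
parent.setLevel(logging.WARNING)  # same threshold as the root default

child = logging.getLogger("demo.views")  # level NOTSET: inherits from "demo"
child.info("dropped: below the effective level")
inherited_level = logging.getLevelName(child.getEffectiveLevel())

parent.setLevel(logging.INFO)  # lower the threshold, as root.level does in dictConfig
child.info("emitted: the threshold is now INFO")

lines = buffer.getvalue().splitlines()
print(inherited_level, lines)
```

Only the second message reaches the handler. The first is discarded before any handler or formatter sees it, which is why changing the format or the handler never fixes a "missing INFO logs" problem.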
5.3 The behavior of the default handler, and removing it
If you configure no logging at all, Flask adds one StreamHandler to app.logger, writing to environ["wsgi.errors"] (usually sys.stderr) during requests and to sys.stderr otherwise. When you want to attach your own handler but the default remains, you get double output, so explicitly remove it.
from flask.logging import default_handler
app.logger.removeHandler(default_handler)
Note that Werkzeug (the dev server) logs requests/responses to the werkzeug logger. In production, access logs are often handled on the Gunicorn / reverse-proxy side, and that design is covered in the deployment guide.
6. Structured logs and request IDs: get logs into a "correlatable" state
A line log like [2026-06-20 10:00:00] INFO in views: order created is enough for a human reading line by line, but unsuited for mechanically searching and correlating in an aggregation foundation (CloudWatch Logs / Datadog / Loki, etc.). In production, add two enhancements—(1) injecting request context and (2) JSON structuring.
6.1 Inject request context into logs (RequestFormatter)
The official docs show a custom Formatter that injects request info (URL, connection source) into log records. The point is to determine the presence of a request context with has_request_context() (startup and background logs come out outside a request).
import logging

from flask import has_request_context, request
from flask.logging import default_handler

class RequestFormatter(logging.Formatter):
    def format(self, record):
        if has_request_context():
            record.url = request.url
            record.remote_addr = request.remote_addr
        else:
            # Startup and background logs have no request
            record.url = None
            record.remote_addr = None
        return super().format(record)

formatter = RequestFormatter(
    "[%(asctime)s] %(remote_addr)s requested %(url)s\n"
    "%(levelname)s in %(module)s: %(message)s"
)
default_handler.setFormatter(formatter)
Going through has_request_context() makes the same formatter safely usable both during a request and in the background. For details on has_request_context() and context lifetime, see the context explainer.
6.2 Issue a request ID (correlation ID) and run it through all logs
What's decisive in failure investigation is the request ID (correlation ID). Assign a unique ID per request, and include the same ID in all log lines during that request, and you can search "what happened in this request" with a single ID. Issue it in before_request, put it on g, and inject it into log records.
import uuid

from flask import g, has_request_context, request

@app.before_request
def assign_request_id():
    # Reuse X-Request-ID if an upstream (ALB etc.) set it; otherwise issue a new one
    g.request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))

@app.after_request
def echo_request_id(response):
    # Echo it on the response so the client/upstream can match it up
    if "request_id" in g:
        response.headers["X-Request-ID"] = g.request_id
    return response
Build this into RequestFormatter.
class RequestFormatter(logging.Formatter):
    def format(self, record):
        if has_request_context():
            record.request_id = g.get("request_id", "-")
            record.url = request.url
            record.remote_addr = request.remote_addr
        else:
            record.request_id = "-"
            record.url = None
            record.remote_addr = None
        return super().format(record)
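Wired together, the hooks and the formatter can be verified end to end with the test client. This is a compact sketch under assumptions: the /orders route, the in-memory buffer, the shortened RequestIdFormatter, and the fixed ID req-123 exist only for the demonstration.

```python
import io
import logging
import uuid

from flask import Flask, g, has_request_context, request
from flask.logging import default_handler


class RequestIdFormatter(logging.Formatter):
    """Shortened variant of RequestFormatter that injects only the request ID."""

    def format(self, record):
        in_request = has_request_context()
        record.request_id = g.get("request_id", "-") if in_request else "-"
        return super().format(record)


buffer = io.StringIO()  # stands in for stdout so the output can be inspected
handler = logging.StreamHandler(buffer)
handler.setFormatter(RequestIdFormatter("%(request_id)s %(message)s"))

app = Flask(__name__)
app.logger.removeHandler(default_handler)  # avoid double output (see §5.3)
app.logger.addHandler(handler)
app.logger.setLevel(logging.INFO)
app.logger.propagate = False


@app.before_request
def assign_request_id():
    g.request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))


@app.after_request
def echo_request_id(response):
    if "request_id" in g:
        response.headers["X-Request-ID"] = g.request_id
    return response


@app.route("/orders")  # hypothetical route
def orders():
    app.logger.info("order created")
    return {"ok": True}


response = app.test_client().get("/orders", headers={"X-Request-ID": "req-123"})
app.logger.info("outside any request")
print(response.headers["X-Request-ID"], buffer.getvalue().splitlines())
```

The log line written during the request carries req-123, the same value is echoed in the response header, and the line written outside a request falls back to "-".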
💡 g is independent per request. Because it's implemented as a context-local, even if Gunicorn's multiple workers/threads process concurrently, request IDs don't get mixed up. The identity of g (a "context-internal global" based on contextvars) is detailed in the context explainer. The design of inheriting the ID attached by an upstream ALB or reverse proxy also meshes with the deployment guide.
6.3 JSON structured logs: make them searchable in an aggregation foundation
To search and aggregate field-by-field in CloudWatch Logs Insights or Datadog, make the log itself a JSON line. Because the request ID, level, and module become structured fields, queries like "all logs where request_id = ..." or "level = ERROR in time order" can be written instantly.
import json
import logging

from flask import g, has_request_context, request

class JsonFormatter(logging.Formatter):
    def format(self, record):
        log = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "module": record.module,
            "message": record.getMessage(),
        }
        if has_request_context():
            log["request_id"] = g.get("request_id", "-")
            log["method"] = request.method
            log["path"] = request.path
            log["remote_addr"] = request.remote_addr
        # If there is an exception, include the stack trace as a field too
        if record.exc_info:
            log["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(log, ensure_ascii=False)
💡 In production, don't write logs to a file; emit them to stdout/stderr. In containers (ECS/Kubernetes), per the 12-factor principle, the app emits logs as a stream to standard output, and leaves collection to the platform (CloudWatch Logs / Fluent Bit, etc.). A logging.StreamHandler (stdout) + JsonFormatter is sufficient.
Note that the standard library also has logging.handlers.RotatingFileHandler (a file handler that rotates by size), but this is not in Flask's logging docs example; it's a standard-library feature. It's an option in a configuration that writes directly to a file on a VM, but in containers, stdout is recommended as above.
Building it into real operation takes the form of defining, in dictConfig's handlers, a formatter pointing to JsonFormatter. With this, logs flow into the aggregation foundation structured, on a path separate from §4's JSON error responses.
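One way to express that wiring is dictConfig's "()" key, which makes the configurator build the formatter by calling a factory. The sketch below is an assumption-laden illustration: JsonLineFormatter is a trimmed stand-in for JsonFormatter without the Flask fields so that it runs on the standard library alone, and the names json, stdout, and orders are arbitrary.

```python
import json
import logging
from logging.config import dictConfig


class JsonLineFormatter(logging.Formatter):
    """Trimmed stand-in for the article's JsonFormatter (no request fields)."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "module": record.module,
            "message": record.getMessage(),
        }, ensure_ascii=False)


LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # "()" tells dictConfig to build the formatter by calling this factory.
        "json": {"()": JsonLineFormatter},
    },
    "handlers": {
        "stdout": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",
            "formatter": "json",
        },
    },
    "root": {"level": "INFO", "handlers": ["stdout"]},
}

dictConfig(LOGGING)  # in a real app, run this before Flask(__name__) is created
logging.getLogger("orders").info("order %s created", 42)
```

Because the handler sits on the root logger, app.logger and library loggers that propagate to the root share the same JSON output without further per-logger setup.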
7. Error aggregation (Sentry) and PII scrubbing
Even aggregating logs, to aggregate and notify "when, which error, how many times, in which release it increased," you need an error-tracking foundation. The representative one is Sentry. Connect it to Flask with the official integration.
pip install "sentry-sdk[flask]"
import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

sentry_sdk.init(
    dsn="https://<key>@<org>.ingest.sentry.io/<project>",  # inject from an environment variable
    integrations=[FlaskIntegration()],
    traces_sample_rate=0.1,  # sample only a fraction of traces (cost control)
    send_default_pii=False,  # don't send PII by default (see below)
    environment="production",
)
Just by adding FlaskIntegration, uncaught exceptions are automatically sent to Sentry, aggregated and notified by stack trace, request info, and release. Logs equivalent to app.logger.error(...) can be ingested too.
7.1 Most important: scrub PII and secrets
This is the point to be most careful about in production. The moment personal information (PII) or secrets flow into an error-tracking foundation, that foundation itself becomes a leakage path.
- Explicitly set send_default_pii=False so Sentry doesn't automatically grab user info, cookies, or bodies.
- Mask contacts, passwords, and tokens contained in the request body with a before-send hook.
- What you need for correlation is not "who" but "the same person," so hash the user identifier before using it.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}

def scrub(event, hint):
    # Drop sensitive headers (compared case-insensitively)
    headers = event.get("request", {}).get("headers", {})
    for key in list(headers):
        if key.lower() in SENSITIVE_HEADERS:
            del headers[key]
    return event

sentry_sdk.init(
    dsn="...",
    integrations=[FlaskIntegration()],
    send_default_pii=False,
    before_send=scrub,  # scrub PII/secrets right before sending
)
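For the third point above (correlate on "the same person" rather than "who"), a keyed hash is one possible approach. This is a sketch, not a Sentry feature: the function name pseudonymize, the 16-character truncation, and the placeholder secret are all assumptions.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret: bytes) -> str:
    """Return a stable token for a user identifier, using a keyed hash."""
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability in dashboards


SECRET = b"load-me-from-the-environment"  # placeholder, not a real key
token_a = pseudonymize("user-42", SECRET)
token_b = pseudonymize("user-42", SECRET)
token_c = pseudonymize("user-43", SECRET)
print(token_a == token_b, token_a != token_c)
```

The same user always maps to the same token, so events can still be grouped per user. A keyed hash (HMAC) rather than a bare SHA-256 is chosen here so that someone holding only the tokens cannot confirm a guessed user ID without the secret.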
⚠️ This is consistent with this site's own security policy too—don't leave PII like contact-form data in logs or the error tracker. Don't put personal names or contacts in logs, error events, or span attributes. The handling of secrets is detailed in the security implementation guide, and avoiding PII contamination of telemetry in the OpenTelemetry observability guide.
7.2 Notify errors by email via SMTP (for small scale)
In a small configuration without Sentry, another option is to use the standard library's SMTPHandler to email ERROR and above. The usual practice is not to send mail while debugging (if not app.debug).
import logging
from logging.handlers import SMTPHandler

if not app.debug:
    mail_handler = SMTPHandler(
        mailhost="127.0.0.1",
        fromaddr="server-error@example.com",
        toaddrs=["ops@example.com"],
        subject="Application Error",
    )
    mail_handler.setLevel(logging.ERROR)  # notify only for ERROR and above
    app.logger.addHandler(mail_handler)
Email is noisy and tends to break down at scale, so for serious operation, an aggregation foundation like Sentry is recommended.
8. Health checks, and the bridge to tracing
The last piece of observability is the health check, which tells the load balancer "is the app alive, is it ready to accept." In an ALB → ECS (Fargate) configuration, this drives automatic detachment and restart of abnormal tasks.
8.1 Split liveness and readiness
The production form is to split the two kinds of checks into separate endpoints. Mixing them invites an overreaction where a temporary DB hiccup kills the whole app.
| Kind | Endpoint | Meaning | On-failure behavior |
|---|---|---|---|
| liveness | /health | Is the process alive (don't look at dependencies) | Restart the task |
| readiness | /ready | Can it accept traffic (confirm dependencies like the DB) | Temporarily detach from the LB |
from flask import jsonify
from sqlalchemy import text

from .extensions import db

@app.get("/health")
def health():
    # liveness: touch no dependencies; report only that the process is alive (light and fast)
    return jsonify(status="ok"), 200

@app.get("/ready")
def ready():
    # readiness: send a minimal query to the DB to decide whether traffic can be accepted
    try:
        db.session.execute(text("SELECT 1"))
    except Exception:
        app.logger.exception("readiness check failed: db unreachable")
        return jsonify(status="unavailable", checks={"db": "fail"}), 503
    return jsonify(status="ok", checks={"db": "ok"}), 200
💡 The iron rule is that liveness doesn't hit dependencies. If /health confirms the DB, the ALB may judge the app's task "dead" and restart it just because the DB clogged momentarily, hindering recovery. Receive only "failures fixed by a restart (process abnormalities)" with liveness, and "failures fixed by waiting (a temporary dependency hiccup)" with readiness. This division prevents needless restart loops. The ALB health-check path setting and its correspondence to the ECS task definition are covered in the deployment guide.
8.2 The bridge to tracing and metrics
Once you've arranged logs, errors, and health checks, what you next want is a trace that follows "how one request passed through multiple services" as a line. This article's structured logs and request IDs are the entry point to it.
- This article's request_id develops into distributed tracing's trace_id. Introduce OpenTelemetry, and instead of issuing the request ID yourself, you can automatically correlate across multiple services with standard context propagation.
- Put trace_id / span_id in logs, and the 3-signal correlation of "notice with metrics → locate with traces → read why with logs" holds.
The overall design of this "correlating the 3 signals (traces, metrics, logs)" observability, instrumentation, sampling, and PII-scrubbing details are consolidated in the OpenTelemetry production-observability guide. And the operational side of "who moves how and records what" when a failure actually occurs is covered in the incident-response / postmortem / on-call guide. This article's JSON errors and request IDs are the foundation that becomes the starting point of that investigation.
Summary: production Flask "doesn't stay silent"
Flask's default is smart, but you can't use it as-is for a production API. Only by designing errors, making logs correlatable, and aggregating failures does "running but unintelligible" disappear. Let me restate this article's points.
- Don't allow the default 500 HTML. In an API, design errors to return as JSON yourself. DEBUG=True in production is a serious vulnerability exposing the debugger.
- Handle errors declaratively with errorhandler / abort / custom HTTPException, and understand the resolution order (code → class hierarchy → most specific). Note that a Blueprint can't catch a routing 404.
- Unify JSON errors into a consistent envelope. Consolidate HTTPException→JSON, Exception (isinstance pass-through), InvalidAPIUsage, and marshmallow's ValidationError→422 into one register_error_handlers.
- Grasp the identity of an unexpected exception with InternalServerError.original_exception, and record internal info to logs without leaking it to the client.
- Run dictConfig before app creation. Configure before touching app.logger, or a default handler gets attached. Beware the log-level default WARNING trap.
- With RequestFormatter + request ID (g + before_request) + JSON structured logs, get into a state where a single ID retrieves every related log line from the aggregation foundation. In containers, stdout, not a file.
- Aggregate errors with Sentry (sentry-sdk[flask]), and scrub PII / secrets with send_default_pii=False + before_send.
- Split /health (liveness) and /ready (readiness, DB connectivity) and correctly drive ALB/ECS auto-recovery. The request_id bridges to tracing.
Production logging / error-design checklist
- Are all API errors returned as JSON (no HTML 500 leaking)?
- Is DEBUG reliably disabled in production (no debugger exposed)?
- Do error handlers' return values explicitly specify the HTTP status (not returning 200)?
- Is the 404 handler registered at the app level (a Blueprint can't catch a routing 404)?
- Is the error envelope consistent across all origins, including ValidationError→422?
- Are unexpected exceptions' details (stack trace, SQL, paths) kept out of the client response?
- Is dictConfig run before app creation?
- Is the log level explicitly set to INFO (or DEBUG where needed), and do the logs that should come out actually come out?
- Is a request ID issued and correlated into all log lines and the response header?
- In containers, are logs emitted to stdout/stderr (JSON-structured), not written to a file?
- Are PII / secrets kept out of Sentry etc. (send_default_pii=False + scrub)?
- Are /health (liveness, doesn't hit dependencies) and /ready (readiness, DB check) split?
Error handling, logging, and observability are not features but an investment that decides "your own speed when a failure occurs." Flask's thinness means the freedom—and responsibility—to design this layer yourself. For the whole picture, go to the Flask production-operations guide; for input-boundary design, the marshmallow × Flask × SQLAlchemy guide; and for 3-signal observability, the OpenTelemetry guide.