When someone says "the API is slow in production," can you immediately say which operation is slow, by how much, and why? If your process is to grep logs and guess, you lose time on every incident. Observability means being able to trace a stalled or slow operation at a glance, with data rather than guesses.
This article is the observability chapter of the Go Echo production-operations guide. We collect traces and metrics with OpenTelemetry (OTel) and correlate them with slog structured logs. Platform-wide observability design is covered in the OpenTelemetry practical guide; here we focus on an implementation that works on Echo v5 today.
Rules for this article: Echo's API follows the official documentation (v5, as of June 2026). Important: the once-standard otelecho (go.opentelemetry.io/contrib/.../labstack/echo/otelecho) is deprecated and assumes Echo v4. This article instead uses custom middleware that calls the OTel SDK directly and does not depend on otelecho (the reasoning is in chapter 1). The OTel SDK changes between releases, so confirm the latest API in the official docs.
0. The three pillars: "correlate" traces, metrics, and logs
Observability is built from three signals. The value is in correlating them, not collecting them separately.
- Traces: a breakdown of "which process, in what order, and how long" one request took. Effective for pinpointing the culprit of latency.
- Metrics: aggregate values (request count, error rate, duration distribution). Effective for trends and alerts.
- Logs: the detail of individual events. Effective for the context of the cause.
When you tie these together by trace_id, you get a single investigative line: "notice the error rate rising in metrics → identify the slow span in traces for that time window → jump to the logs by that span's trace_id and read the cause." This is the goal of the article.
1. Why custom middleware instead of otelecho
otelecho has long been the standard middleware, but as of June 2026 it has two problems.
- Deprecated: the package itself has been deprecated.
- Assumes Echo v4: in v5 the handler signature changed to func(c *echo.Context) error, so instrumentation written for v4 does not fit as-is.
OpenTelemetry's core SDK (go.opentelemetry.io/otel) is framework-independent. Writing a thin Echo middleware that calls the OTel SDK directly therefore keeps you unaffected by Echo versions, keeps you clear of the deprecation, and leaves you with code you can read end to end. In terms of ETC (ease of change) it is the robust choice, and the code is only a few dozen lines.
2. Trace-instrumentation middleware: propagation is everything
The crux of distributed tracing is context propagation. You Extract the parent trace context from the incoming request's headers, start a span, and always carry that ctx downstream (DB / external API). If the chain is cut anywhere, the trace becomes fragmented.
import (
	"net/http"

	"github.com/labstack/echo/v5"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute" // used by the metrics middleware in chapter 4
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/propagation"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
	"go.opentelemetry.io/otel/trace"
)

func OTelTracing(service string) echo.MiddlewareFunc {
	tracer := otel.Tracer(service)
	propagator := otel.GetTextMapPropagator()
	return func(next echo.HandlerFunc) echo.HandlerFunc {
		return func(c *echo.Context) error {
			req := c.Request()
			// (1) Extract the parent trace context from the incoming headers (W3C traceparent, etc.)
			ctx := propagator.Extract(req.Context(), propagation.HeaderCarrier(req.Header))
			// (2) Start the span. Name it after the route pattern to keep cardinality low
			route := c.Path() // "/users/:id" (the pattern, not the actual value)
			ctx, span := tracer.Start(ctx, req.Method+" "+route,
				trace.WithSpanKind(trace.SpanKindServer),
				trace.WithAttributes(
					semconv.HTTPRequestMethodKey.String(req.Method),
					semconv.HTTPRouteKey.String(route),
				),
			)
			defer span.End()
			// (3) Carry ctx through to everything downstream (handler → DB → external API). Most important step
			c.SetRequest(req.WithContext(ctx))
			err := next(c)
			// (4) Record the outcome on the span
			status := c.Response().Status
			span.SetAttributes(semconv.HTTPResponseStatusCodeKey.Int(status))
			if err != nil {
				// Use the error text as the description; the status text could still read "OK" here
				span.RecordError(err)
				span.SetStatus(codes.Error, err.Error())
			} else if status >= 500 {
				span.SetStatus(codes.Error, http.StatusText(status))
			}
			return err
		}
	}
}
Key design points:
- The span name is c.Path() (the route pattern). If you name it with an actual value like /users/42, each ID is treated as a separate span name and cardinality explodes. Normalize to /users/:id.
- c.SetRequest(req.WithContext(ctx)) is the heart of propagation. Forget it and the span is not carried on c.Request().Context() inside the handler, so the child spans of the DB query and the external API won't connect to the parent.
- Place it inside Recover so that panics are also recorded to the span via the centralized error handler.
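To make step (1) concrete, here is a stdlib-only sketch of what a propagator reads from the W3C traceparent header, assuming the version-00 layout version-traceid-parentid-flags. The TraceParent type and parseTraceparent function are hypothetical names for illustration; in the middleware above the real work is done by propagator.Extract, which is more thorough than this.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the four fields of a version-00 W3C traceparent header.
// The type and function names here are hypothetical, for illustration only.
type TraceParent struct {
	Version string
	TraceID string // 32 lowercase hex characters, shared by every span in the trace
	SpanID  string // 16 lowercase hex characters: the caller's span, which becomes our parent
	Sampled bool   // lowest bit of the flags field
}

// parseTraceparent parses "version-traceid-parentid-flags".
func parseTraceparent(h string) (TraceParent, error) {
	parts := strings.Split(strings.TrimSpace(h), "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: want 4 dash-separated fields")
	}
	wantLen := []int{2, 32, 16, 2}
	for i, p := range parts {
		if len(p) != wantLen[i] || p != strings.ToLower(p) {
			return TraceParent{}, fmt.Errorf("traceparent: malformed field %d", i)
		}
		if _, err := hex.DecodeString(p); err != nil {
			return TraceParent{}, fmt.Errorf("traceparent: field %d is not hex: %w", i, err)
		}
	}
	if parts[1] == strings.Repeat("0", 32) || parts[2] == strings.Repeat("0", 16) {
		return TraceParent{}, errors.New("traceparent: all-zero IDs are invalid")
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated as hex above
	return TraceParent{
		Version: parts[0],
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&0x01 == 0x01,
	}, nil
}

func main() {
	tp, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(tp.TraceID, tp.SpanID, tp.Sampled, err)
}
```

The trace ID is what ties every span of one request together, and the span ID in the header is the caller's span, which is why a server span started from the extracted ctx hangs off the upstream caller.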
3. Child spans: thread DB and external HTTP into one line
If you pass down the ctx created in the middleware, the lower-level processes hang off the parent as child spans. This visualizes "the API is fast but the DB is slow."
// DB: with pgx, use otelpgx for automatic instrumentation, or create a child span manually
func (r *UserRepo) FindByID(ctx context.Context, id string) (*User, error) {
	ctx, span := otel.Tracer("repo").Start(ctx, "UserRepo.FindByID") // child span from the parent ctx
	defer span.End()
	// ... r.pool.Query(ctx, ...) ← the DB span attaches to the parent via ctx
}

// External HTTP: otelhttp is framework-independent, so it works as-is
import "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"

client := &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
// client.Do(req.WithContext(ctx)) ← automatically propagates traceparent to the destination
otelhttp (outbound HTTP instrumentation) just wraps the net/http transport, so it does not depend on Echo's version. pgx also has instrumentation libraries like otelpgx. "Use framework-independent instrumentation, and replace instrumentation that is tightly coupled to the framework (otelecho) with your own" is the robust policy for the transition period.
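The idea of wrapping the net/http transport can be illustrated without any OTel dependency. The sketch below is not otelhttp's implementation: it uses a hypothetical context key as a stand-in for the span context and a stub base transport instead of a real network call, purely to show how an http.RoundTripper wrapper can copy trace context from ctx into an outbound header.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// traceparentKey is a hypothetical context key standing in for the OTel span context.
type traceparentKey struct{}

func withTraceparent(ctx context.Context, tp string) context.Context {
	return context.WithValue(ctx, traceparentKey{}, tp)
}

// propagatingTransport wraps another RoundTripper and injects the header.
type propagatingTransport struct {
	base http.RoundTripper
}

func (t propagatingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if tp, ok := req.Context().Value(traceparentKey{}).(string); ok && tp != "" {
		req = req.Clone(req.Context()) // a RoundTripper must not mutate the caller's request
		req.Header.Set("traceparent", tp)
	}
	return t.base.RoundTrip(req)
}

// roundTripFunc adapts a function to http.RoundTripper (used for the stub below).
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// captureOutboundHeader sends one request through the wrapper and returns the
// traceparent header that the (stubbed) destination would have received.
func captureOutboundHeader(tp string) (string, error) {
	var seen string
	stub := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("traceparent")
		return httptest.NewRecorder().Result(), nil // empty 200 response, no network
	})
	client := &http.Client{Transport: propagatingTransport{base: stub}}

	ctx := context.Background()
	if tp != "" {
		ctx = withTraceparent(ctx, tp)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://backend.invalid/users/42", nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return seen, nil
}

func main() {
	got, err := captureOutboundHeader("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(got, err)
}
```

Because the wrapper only touches http.RoundTripper, nothing in it knows about Echo. That is the property that lets this kind of instrumentation survive a framework upgrade.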
4. Metrics: RED in a minimal setup
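Base your metrics on RED (Rate, Errors, Duration). The minimal setup is a Histogram of request duration and an UpDownCounter of in-flight requests; request rate and error rate can be derived from the histogram's count per route and status. Implement them as a second middleware in the same style as the tracing one.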
import (
	"time"

	"go.opentelemetry.io/otel/metric"
)

func OTelMetrics(service string) echo.MiddlewareFunc {
	meter := otel.Meter(service)
	duration, _ := meter.Float64Histogram("http.server.request.duration",
		metric.WithUnit("s"), metric.WithDescription("HTTP request duration"))
	inflight, _ := meter.Int64UpDownCounter("http.server.active_requests")
	return func(next echo.HandlerFunc) echo.HandlerFunc {
		return func(c *echo.Context) error {
			ctx := c.Request().Context()
			start := time.Now()
			inflight.Add(ctx, 1)
			defer inflight.Add(ctx, -1)
			err := next(c)
			// Limit attributes to route pattern, method, and status code (cardinality control).
			// Keys follow the same HTTP semantic conventions as the span attributes in chapter 2.
			attrs := metric.WithAttributes(
				attribute.String("http.route", c.Path()),
				attribute.String("http.request.method", c.Request().Method),
				attribute.Int("http.response.status_code", c.Response().Status),
			)
			duration.Record(ctx, time.Since(start).Seconds(), attrs)
			return err
		}
	}
}
The cardinality trap: putting a user ID or raw URL in a metric's attributes multiplies the number of time series, which drives up storage and cost. Strictly limit attributes to low-cardinality values such as the route pattern, method, and status. For SLO / error-budget design, see the observability / SRE practice.
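If per-code series are still more than your dashboards need, one way to reduce them further is to collapse the status code into its class before using it as an attribute. The helper below is a hypothetical sketch (statusClass is not part of Echo or OTel), and the attribute key you attach the result to is your own choice.

```go
package main

import "fmt"

// statusClass maps an HTTP status code to its class ("2xx", "4xx", ...).
// Anything outside 100-599 is reported as "unknown" so that a bad value
// cannot create an unbounded number of attribute values.
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "unknown"
	}
	return fmt.Sprintf("%dxx", code/100)
}

func main() {
	for _, code := range []int{200, 204, 404, 503, 0} {
		fmt.Println(code, statusClass(code))
	}
}
```

With this, a route contributes at most six status values per method instead of one per distinct code. The trade-off is that 404 and 429 become indistinguishable in metrics, so the detail has to come from traces and logs.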
5. Log correlation: put trace_id on slog
The last piece is correlating logs and traces. Put trace_id/span_id on the v5-standard slog and you can jump from one log line to the trace. Wire in a helper that pulls the span context out of ctx.
// Convert the span context in ctx into slog attributes
func traceAttrs(ctx context.Context) []slog.Attr {
	sc := trace.SpanContextFromContext(ctx)
	if !sc.IsValid() {
		return nil
	}
	return []slog.Attr{
		slog.String("trace_id", sc.TraceID().String()),
		slog.String("span_id", sc.SpanID().String()),
	}
}

// Emit correlated logs from RequestLogger's LogValuesFunc.
// logger is assumed to be the application's *slog.Logger.
e.Use(middleware.RequestLoggerWithConfig(middleware.RequestLoggerConfig{
	LogStatus: true, LogURI: true, LogError: true, LogLatency: true, HandleError: true,
	LogValuesFunc: func(c *echo.Context, v middleware.RequestLoggerValues) error {
		ctx := c.Request().Context()
		attrs := append(traceAttrs(ctx),
			slog.String("uri", v.URI),
			slog.Int("status", v.Status),
			slog.Duration("latency", v.Latency),
		)
		level := slog.LevelInfo
		if v.Error != nil {
			level = slog.LevelError
			attrs = append(attrs, slog.String("err", v.Error.Error()))
		}
		logger.LogAttrs(ctx, level, "REQUEST", attrs...)
		return nil
	},
}))
With this, the investigation that threads through the three pillars holds: notice a spike in the error rate via metrics → find the slow span in the traces for that time window → pull the logs by that trace_id. Register RequestLogger inside (after) the OTel middleware so that logs are emitted with the span context already attached.
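The same correlation can also be applied to every log line, not only the request log, by wrapping the slog.Handler so that it reads the trace ID from ctx itself. The sketch below is stdlib-only and uses a hypothetical context key (traceIDKey) as a stand-in; in the real application the Handle method would call trace.SpanContextFromContext(ctx) as traceAttrs does. The names traceHandler and logLine are illustrative.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// traceIDKey is a hypothetical stand-in for the OTel span context stored in ctx.
type traceIDKey struct{}

// traceHandler decorates another slog.Handler and adds trace_id when ctx carries one.
type traceHandler struct {
	inner slog.Handler
}

func (h traceHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return h.inner.Enabled(ctx, l)
}

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok && id != "" {
		r = r.Clone() // avoid sharing attribute storage with the caller's record
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.inner.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the result so the decoration is not lost.
func (h traceHandler) WithAttrs(as []slog.Attr) slog.Handler {
	return traceHandler{inner: h.inner.WithAttrs(as)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{inner: h.inner.WithGroup(name)}
}

// logLine emits one JSON log line and returns it decoded, for demonstration.
func logLine(traceID, msg string) (map[string]any, error) {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{inner: slog.NewJSONHandler(&buf, nil)})
	ctx := context.Background()
	if traceID != "" {
		ctx = context.WithValue(ctx, traceIDKey{}, traceID)
	}
	logger.InfoContext(ctx, msg, "status", 200)
	out := map[string]any{}
	err := json.Unmarshal(buf.Bytes(), &out)
	return out, err
}

func main() {
	out, err := logLine("4bf92f3577b34da6a3ce929d0e0e4736", "REQUEST")
	fmt.Println(out["trace_id"], out["msg"], err)
}
```

The design choice here is where the correlation lives: a helper like traceAttrs is explicit at each call site, while a handler wrapper makes it automatic for any code that logs with the request's ctx.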
6. Export: send to the collection backend with OTLP
The instrumented signals are sent to the collection backend (OpenTelemetry Collector → Grafana Tempo / Jaeger / Datadog / each cloud) via OTLP (OpenTelemetry Protocol). Configure the TracerProvider/MeterProvider at app startup and flush on graceful shutdown.
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func initTracing(ctx context.Context, service string) (func(context.Context) error, error) {
	exp, err := otlptracegrpc.New(ctx) // destination comes from the OTEL_EXPORTER_OTLP_ENDPOINT environment variable
	if err != nil {
		return nil, err
	}
	res, _ := resource.New(ctx, resource.WithAttributes(semconv.ServiceName(service)))
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp), // batch export (performance and cost)
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.1))), // 10% sampling
		sdktrace.WithResource(res),
	)
	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.TraceContext{}) // W3C propagation
	return tp.Shutdown, nil // call from a defer in main to flush unsent spans
}

// In main: initialize at startup, flush at exit
shutdown, err := initTracing(ctx, "user-api")
if err != nil { /* ... */ }
defer shutdown(context.Background()) // sends any unsent spans on graceful shutdown
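Passing context.Background() to the shutdown function means the flush has no deadline, so an unreachable backend could delay process exit. A common refinement is to bound the flush with a timeout. The sketch below is stdlib-only: flushWithTimeout, stuckShutdown, okShutdown, and flushLimit are hypothetical names, with stuckShutdown standing in for a shutdown function that cannot reach its backend. The 50 ms limit keeps the demo fast; a few seconds is more typical in production.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// flushLimit is deliberately short for the demo.
const flushLimit = 50 * time.Millisecond

// flushWithTimeout calls a shutdown function (for example the one returned by
// initTracing) with a context that expires after limit.
func flushWithTimeout(shutdown func(context.Context) error, limit time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return shutdown(ctx)
}

// stuckShutdown simulates an exporter that never finishes until ctx expires.
func stuckShutdown(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// okShutdown simulates a flush that completes immediately.
func okShutdown(ctx context.Context) error { return ctx.Err() }

func main() {
	fmt.Println(flushWithTimeout(okShutdown, flushLimit))
	err := flushWithTimeout(stuckShutdown, flushLimit)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

In main this would replace the bare defer with something like defer flushWithTimeout(shutdown, 5*time.Second), assuming the shutdown function honors context cancellation.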
Cost optimization: in production, hold trace volume and cost down with sampling (e.g., TraceIDRatioBased(0.1) for 10%). "Tail sampling," which prioritizes keeping errors and slow traces, is done on the Collector side. Make the destination configurable via the OTEL_EXPORTER_OTLP_ENDPOINT environment variable and don't bake the endpoint into code.
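Ratio-based head sampling works because the decision is a pure function of the trace ID: every service that sees the same trace ID reaches the same verdict, so a trace is kept or dropped as a whole. The sketch below is a simplified illustration of that idea and should not be read as the SDK's exact algorithm; shouldSample is a hypothetical function.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// shouldSample decides deterministically from the trace ID whether to keep a trace.
// It treats the low 8 bytes of the ID as a number in [0, 2^63) and keeps the trace
// when that number falls below ratio * 2^63.
func shouldSample(traceIDHex string, ratio float64) (bool, error) {
	raw, err := hex.DecodeString(traceIDHex)
	if err != nil || len(raw) != 16 {
		return false, fmt.Errorf("trace ID must be 32 hex characters")
	}
	if ratio >= 1 {
		return true, nil
	}
	if ratio <= 0 {
		return false, nil
	}
	x := binary.BigEndian.Uint64(raw[8:16]) >> 1 // drop one bit so x < 2^63
	bound := uint64(ratio * (1 << 63))
	return x < bound, nil
}

func main() {
	for _, id := range []string{
		"4bf92f3577b34da60000000000000000",
		"4bf92f3577b34da6ffffffffffffffff",
	} {
		keep, err := shouldSample(id, 0.1)
		fmt.Println(id, keep, err)
	}
}
```

This also shows the limit of head sampling: the decision is made before the outcome is known, so it cannot prefer errors or slow requests. That is the gap tail sampling on the Collector fills.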
Conclusion: 7 principles for bringing Echo observability to production quality
- Correlate the three pillars by trace_id so one request can be traced end-to-end.
- otelecho is deprecated and assumes v4. Custom middleware that uses the OTel SDK directly is version-independent and robust (ETC).
- For traces, propagation is everything. Extract → span → c.SetRequest(req.WithContext(ctx)) threads the context through downstream.
- Normalize span names and attributes to the route pattern to prevent cardinality explosion.
- Metrics are RED (duration Histogram + in-flight UpDownCounter) in a minimal setup.
- Put trace_id on slog so you can jump between logs and traces in both directions.
- Export with OTLP, optimize cost with sampling, and flush on shutdown.
Observability is not "emitting logs" but "making guesswork zero during an incident." On Echo v5, thinly instrumenting the OTel SDK yourself without relying on deprecated tools turns out to be robust, cheap, and understandable. For cross-platform observability go to the OpenTelemetry practical guide, and for the full picture of Echo go to the production-operations guide.