structured-logging

Chemiseblanc's avatarfrom Chemiseblanc

Guide for writing effective log messages using wide events / canonical log lines. Use when writing logging code, adding instrumentation, improving observability, or reviewing log statements. Teaches high-cardinality, high-dimensionality structured logging that enables debugging.

0stars🔀0forks📁View on GitHub🕐Updated Jan 5, 2026

When & Why to Use This Skill

This Claude skill provides a comprehensive guide for implementing structured logging using wide events and canonical log lines. It helps developers and SREs transition from fragmented, low-context logs to high-cardinality, high-dimensionality data structures, significantly enhancing system observability and simplifying the debugging of complex distributed systems.

Use Cases

  • Designing a robust logging strategy for new microservices to ensure all request context is captured in a single, searchable wide event.
  • Refactoring legacy 'string-concatenation' logs into structured JSON formats to enable advanced SQL-based log analysis and performance auditing.
  • Troubleshooting intermittent production errors by correlating high-cardinality fields like user_id and request_id across multiple service boundaries.
  • Improving incident response times by providing SRE teams with rich, contextual data for every failed request, including business logic and infrastructure state.
namestructured-logging
descriptionGuide for writing effective log messages using wide events / canonical log lines. Use when writing logging code, adding instrumentation, improving observability, or reviewing log statements. Teaches high-cardinality, high-dimensionality structured logging that enables debugging.

Structured Logging

Core Principle

Emit one wide event per request per service, not scattered log statements.

  • Build context throughout the request lifecycle
  • Emit once at the end (in a finally block)
  • Optimize for querying, not writing
  • Include high-cardinality fields (user_id, request_id) that enable debugging
  • Include high-dimensionality (50+ fields) capturing full context

Anti-patterns

Avoid these common mistakes:

// String concatenation - loses structure
log("User " + userId + " payment failed: " + error)

// Scattered statements - 17 lines for one request
log("Starting request")
log("Validating token")
log("Token valid")
log("Fetching user")
log("User found")
log("Processing payment")
log("Payment failed")

// Low cardinality only - can't debug specific users
log({level: "error", message: "Payment failed"})

// Missing business context - no debugging power
log({user_id: "123", error: "failed"})

Wide Event Structure

One comprehensive event per request:

{
  "timestamp": "2025-01-15T10:23:45.612Z",
  "level": "error",

  "request_id": "req_8bf7ec2d",
  "trace_id": "abc123",
  "span_id": "span_456",

  "service": "checkout-service",
  "version": "2.4.1",
  "deployment_id": "deploy_789",
  "region": "us-east-1",

  "method": "POST",
  "path": "/api/checkout",
  "status_code": 500,
  "duration_ms": 1247,

  "user": {
    "id": "user_456",
    "subscription": "premium",
    "account_age_days": 847,
    "lifetime_value_cents": 284700
  },

  "cart": {
    "id": "cart_xyz",
    "item_count": 3,
    "total_cents": 15999,
    "coupon_applied": "SAVE20"
  },

  "payment": {
    "method": "card",
    "provider": "stripe",
    "latency_ms": 1089,
    "attempt": 3
  },

  "error": {
    "type": "PaymentError",
    "code": "card_declined",
    "message": "Card declined by issuer",
    "retriable": false,
    "stripe_decline_code": "insufficient_funds"
  },

  "feature_flags": {
    "new_checkout_flow": true,
    "express_payment": false
  },

  "timing": {
    "db_query_ms": 45,
    "external_api_ms": 1089,
    "db_queries_count": 3,
    "cache_hit": false
  }
}

For the comprehensive field reference, see references/wide-event-fields.md.

Log Levels

Use log levels to indicate severity and required action:

Level When to Use Examples
ERROR Request failed, needs investigation Unhandled exception, 5xx response, data corruption
WARN Degraded but recovered Retry succeeded, fallback used, rate limit approaching, deprecated API called
INFO Normal request completion The canonical log line for successful requests
DEBUG Detailed diagnostics Cache lookups, query plans, intermediate state (usually sampled out in production)

The wide event's level field reflects the worst outcome of the request:

  • Request succeeded normally: info
  • Request succeeded after retry/fallback: warn
  • Request failed: error

Examples

Bad: Scattered Logs

log.info("Incoming request", {path: "/checkout"})
log.debug("Fetching user")
log.debug("User found", {user_id: "123"})
log.debug("Fetching cart")
log.info("Processing payment")
log.warn("Payment attempt 1 failed, retrying")
log.warn("Payment attempt 2 failed, retrying")
log.error("Payment failed after 3 attempts")

Good: Single Wide Event

log.error({
  request_id: "req_abc",
  path: "/checkout",
  method: "POST",
  status_code: 500,
  duration_ms: 3200,
  user: {id: "123", subscription: "premium"},
  cart: {item_count: 3, total_cents: 15999},
  payment: {
    provider: "stripe",
    attempts: 3,
    latency_ms: 2800
  },
  error: {
    type: "PaymentError",
    code: "card_declined",
    retriable: false
  }
})

Bad: Missing Context

log.error("Database query failed", {error: "timeout"})

Good: Full Context

log.error({
  request_id: "req_xyz",
  user: {id: "user_789", subscription: "enterprise"},
  operation: "fetch_orders",
  database: {
    host: "db-prod-01",
    query_duration_ms: 30000,
    timeout_ms: 30000,
    table: "orders",
    rows_scanned: 1500000
  },
  error: {
    type: "DatabaseError",
    code: "QUERY_TIMEOUT",
    message: "Query exceeded 30s timeout",
    retriable: true,
    suggestion: "Add index on orders.user_id or paginate results"
  }
})

Implementation Pattern

Build the event throughout the request, emit once at the end:

function loggingMiddleware(handler):
    return function(request):
        start_time = now()

        // Initialize wide event
        event = {
            request_id: request.id,
            trace_id: request.trace_id,
            timestamp: start_time,
            method: request.method,
            path: request.path,
            service: SERVICE_NAME,
            version: SERVICE_VERSION
        }

        // Make event available to handlers
        request.wide_event = event

        try:
            response = handler(request)
            event.status_code = response.status
            event.level = "info"
            return response

        catch error:
            event.status_code = 500
            event.level = "error"
            event.error = {
                type: error.name,
                message: error.message,
                code: error.code,
                retriable: error.retriable ?? false
            }
            throw error

        finally:
            event.duration_ms = now() - start_time
            logger.log(event)

In handlers, enrich with business context:

function checkoutHandler(request):
    event = request.wide_event

    // Add user context
    user = getUser(request)
    event.user = {
        id: user.id,
        subscription: user.tier,
        account_age_days: daysSince(user.created_at),
        lifetime_value_cents: user.ltv
    }

    // Add business context
    cart = getCart(user.id)
    event.cart = {
        id: cart.id,
        item_count: cart.items.length,
        total_cents: cart.total
    }

    // Track timing for external calls
    payment_start = now()
    result = processPayment(cart)
    event.payment = {
        provider: "stripe",
        latency_ms: now() - payment_start,
        attempt: result.attempt_number
    }

    return response(result)

Querying Power

Wide events enable queries that scattered logs cannot:

-- Find all checkout failures for premium users with new checkout flow
SELECT * FROM logs
WHERE path = '/checkout'
  AND status_code >= 500
  AND user.subscription = 'premium'
  AND feature_flags.new_checkout_flow = true
  AND timestamp > now() - interval '1 hour'

-- Analyze payment failures by error code and user tier
SELECT
  error.code,
  user.subscription,
  count(*) as failures,
  avg(payment.latency_ms) as avg_latency
FROM logs
WHERE error.type = 'PaymentError'
GROUP BY error.code, user.subscription

-- Find slow requests for enterprise customers
SELECT * FROM logs
WHERE duration_ms > 2000
  AND user.subscription = 'enterprise'
ORDER BY duration_ms DESC