application-metrics

from PierreZ

Guide for instrumenting applications with metrics. Use when adding observability, monitoring, metrics, counters, gauges, or instrumentation to code. Covers API endpoints, databases, queues, caching, and locks.

Updated Jan 6, 2026

When & Why to Use This Skill

This Claude skill guides developers through instrumenting applications with metrics: operational counters, resource utilization gauges, and latency tracking for components such as APIs, databases, and message queues. It also sets a naming convention and flags common anti-patterns such as high-cardinality labels, so that system health and business activity stay visible and actionable for SRE and DevOps teams.

Use Cases

  • API Performance Optimization: Implementing request tracking, error rate monitoring, and latency percentiles (p95, p99) to identify and resolve REST/GraphQL bottlenecks.
  • Database Health Monitoring: Tracking connection pool utilization, slow query counts, and transaction rollback rates to prevent database-related application failures.
  • Distributed System Reliability: Monitoring message queue depths, consumer lag, and dead-letter queue sizes to ensure asynchronous data processing remains healthy.
  • Standardizing Observability: Establishing a unified metric naming convention across microservices to improve dashboard consistency and alerting accuracy.
  • Resource & Cache Management: Measuring cache hit/miss ratios and memory eviction rates to optimize infrastructure costs and application speed.
name: application-metrics
description: Guide for instrumenting applications with metrics. Use when adding
allowed-tools: Read, Grep, Edit, Write

Application Metrics Instrumentation

Practical patterns for adding observability to applications.

Five Metric Types

| Type | Purpose | Example |
| --- | --- | --- |
| Operational Counters | Track discrete events (success/failure) | api.requests.success_total |
| Resource Utilization | Current capacity usage (gauges) | db.connections.active |
| Performance/Latency | Speed with explicit units | api.request.duration_ms |
| Data Volume | Information flow rates | queue.messages.bytes_total |
| Business Logic | Domain-specific value | orders.completed_total |
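
As a rough illustration of how the five types differ in code, here is a minimal sketch using a hypothetical in-memory `Metrics` registry. The class and its method names (`incr`, `set_gauge`, `observe`) are assumptions made for this example, not the API of any particular metrics library; only the metric names come from the table above.

```python
from collections import defaultdict


class Metrics:
    """Hypothetical in-memory registry, for illustration only."""

    def __init__(self):
        self.counters = defaultdict(float)  # totals that only ever increase
        self.gauges = {}                    # last observed value, can go up or down
        self.samples = defaultdict(list)    # raw observations, e.g. durations

    def incr(self, name, value=1):
        self.counters[name] += value

    def set_gauge(self, name, value):
        self.gauges[name] = value

    def observe(self, name, value):
        self.samples[name].append(value)


metrics = Metrics()

# Operational counter: one discrete event happened.
metrics.incr("api.requests.success_total")
# Resource utilization: current usage at this instant.
metrics.set_gauge("db.connections.active", 7)
# Performance/latency: the unit (ms) is part of the name.
metrics.observe("api.request.duration_ms", 12.5)
# Data volume: bytes flowing through a queue.
metrics.incr("queue.messages.bytes_total", 2048)
# Business logic: a domain event.
metrics.incr("orders.completed_total")
```

Counters accumulate, gauges overwrite, and latency samples are kept individually so that percentiles can be derived later.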

Naming Convention

<system>.<component>.<operation>.<metric_type>

Examples:

  • myapp.api.users.requests_total
  • myapp.db.queries.duration_ms
  • myapp.cache.items.hit_total
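
One way to keep names consistent is to build them through a single helper. The sketch below is hypothetical: the function `metric_name` and the lowercase snake_case rule for each segment are assumptions of this example, since the convention above only fixes the order of the four segments.

```python
import re

# Assumption: every segment is lowercase snake_case. The naming convention
# itself only specifies the segment order.
_SEGMENT = re.compile(r"[a-z][a-z0-9_]*")


def metric_name(system, component, operation, metric_type):
    """Join the four segments as <system>.<component>.<operation>.<metric_type>."""
    parts = [system, component, operation, metric_type]
    for part in parts:
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid metric name segment: {part!r}")
    return ".".join(parts)


name = metric_name("myapp", "api", "users", "requests_total")
```

Rejecting malformed segments at construction time keeps typos and ad hoc names out of dashboards.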

Component Checklists

API Endpoints

  • Request count by endpoint and method
  • Response time (p50, p95, p99)
  • Error rate by status code
  • Authentication failures
  • Request/response payload sizes
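
The first three items can be sketched with a decorator. This is a minimal illustration under stated assumptions: handlers return a `(status, body)` tuple, an unhandled exception is recorded as status 500, and the names `instrumented`, `request_counts`, `durations_ms`, and `percentile` are hypothetical helpers rather than part of any web framework.

```python
import math
import time
from collections import Counter, defaultdict
from functools import wraps

# Hypothetical in-memory stores; a real service would use its metrics client.
request_counts = Counter()        # (endpoint, method, status) -> count
durations_ms = defaultdict(list)  # (endpoint, method) -> latency samples in ms


def instrumented(endpoint, method):
    """Count every call and time it, including calls that raise."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = 500  # recorded if the handler raises before returning
            try:
                status, body = handler(*args, **kwargs)
                return status, body
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                request_counts[(endpoint, method, status)] += 1
                durations_ms[(endpoint, method)].append(elapsed_ms)
        return wrapper
    return decorator


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@instrumented("/users", "GET")
def list_users():
    return 200, ["alice", "bob"]


list_users()
p95 = percentile(durations_ms[("/users", "GET")], 95)
```

The endpoint label is the route (`/users`), not the raw request path, which keeps the number of distinct label values bounded.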

Database

  • Connection pool (active, idle, waiting)
  • Query duration by operation type
  • Slow query count (threshold-based)
  • Error count by type (timeout, constraint, connection)
  • Transaction commit/rollback rates
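
Query duration, slow-query counts, and errors by type can be captured around each query with a context manager. The sketch below is hypothetical: `timed_query`, the 100 ms threshold, and the metric names are assumptions for illustration, and the `clock` parameter exists only so the timing can be tested deterministically.

```python
import time
from collections import Counter
from contextlib import contextmanager

SLOW_QUERY_THRESHOLD_MS = 100.0  # assumed threshold; tune per workload

db_counters = Counter()  # (metric name, label) -> count
db_durations_ms = []     # (operation, elapsed ms) samples


@contextmanager
def timed_query(operation, clock=time.perf_counter):
    """Record duration, slow queries, and errors for one query."""
    start = clock()
    try:
        yield
    except Exception as exc:
        # The exception class name is a bounded label (timeout, constraint, ...).
        db_counters[("db.queries.errors_total", type(exc).__name__)] += 1
        raise
    finally:
        elapsed_ms = (clock() - start) * 1000.0
        db_durations_ms.append((operation, elapsed_ms))
        if elapsed_ms > SLOW_QUERY_THRESHOLD_MS:
            db_counters[("db.queries.slow_total", operation)] += 1


with timed_query("select"):
    pass  # run the actual query here
```

The operation label stays at the level of `select`, `insert`, and so on; raw SQL text would create unbounded cardinality.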

Message Queues

  • Messages produced/consumed per topic
  • Queue depth (current backlog)
  • Processing latency (end-to-end)
  • Consumer lag
  • Dead letter queue size
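
For offset-based logs, consumer lag is commonly computed as the newest offset minus the consumer's committed offset, per partition. The sketch below assumes that model (for example, a Kafka-style broker); the function names and the dictionary inputs are hypothetical, and fetching the offsets from a real broker is left out.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: newest offset minus committed offset, floored at 0.

    Partitions with no committed offset are treated as starting from 0.
    """
    return {
        partition: max(0, end - committed_offsets.get(partition, 0))
        for partition, end in end_offsets.items()
    }


def end_to_end_latency_ms(produced_at_s, processed_at_s):
    """Time from production to completed processing, in milliseconds."""
    return (processed_at_s - produced_at_s) * 1000.0


lag = consumer_lag(end_offsets={0: 120, 1: 80}, committed_offsets={0: 100, 1: 80})
total_backlog = sum(lag.values())
```

Summing the per-partition values gives a single backlog gauge per topic, while the per-partition breakdown helps locate a stuck consumer.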

Caching

  • Hit/miss ratio
  • Eviction count and reason
  • Cache size (entries and bytes)
  • TTL expiration rate
  • Connection pool status
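
Hit/miss ratio, eviction count, and cache size can all be recorded inside the cache wrapper itself. The sketch below is a hypothetical `InstrumentedCache` with least-recently-used eviction; the LRU policy and the `"capacity"` eviction reason are assumptions of this example, not something the checklist prescribes.

```python
from collections import Counter, OrderedDict


class InstrumentedCache:
    """Small LRU cache that counts hits, misses, and evictions by reason."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = Counter()  # reason -> count

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop least recently used entry
            self.evictions["capacity"] += 1

    def size(self):
        return len(self._data)

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = InstrumentedCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # hit
cache.get("z")     # miss
cache.put("c", 3)  # evicts "b", the least recently used key
```

Exporting `hits` and `misses` as separate counters, rather than only the ratio, lets the monitoring backend compute the ratio over any time window.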

Locks/Synchronization

  • Acquisition time
  • Contention count (failed acquisitions)
  • Hold duration
  • Timeout count
  • Deadlock occurrences
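
Acquisition time, hold duration, and timeout count can be measured by wrapping the lock. The sketch below wraps Python's `threading.Lock`; the `InstrumentedLock` class and its attribute names are hypothetical, and deadlock detection is out of scope here.

```python
import threading
import time
from collections import Counter


class InstrumentedLock:
    """Wraps threading.Lock and records wait time, hold time, and timeouts."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self.counters = Counter()
        self.wait_ms = []  # time spent waiting for successful acquisitions
        self.hold_ms = []  # time between acquire and release
        self._acquired_at = None

    def acquire(self, timeout=-1):
        """Acquire the lock; timeout=-1 waits indefinitely."""
        start = time.perf_counter()
        acquired = self._lock.acquire(timeout=timeout)
        now = time.perf_counter()
        if acquired:
            # Safe to mutate shared lists here: this thread holds the lock.
            self.wait_ms.append((now - start) * 1000.0)
            self._acquired_at = now
        else:
            # Simplification: this increment is not itself synchronized.
            self.counters["timeout_total"] += 1
        return acquired

    def release(self):
        self.hold_ms.append((time.perf_counter() - self._acquired_at) * 1000.0)
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()


orders_lock = InstrumentedLock("orders")
with orders_lock:
    pass  # critical section
```

A growing gap between wait time and hold time usually points to contention rather than slow critical sections.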

Anti-patterns to Avoid

  1. Unbounded label cardinality - Never use user IDs, session tokens, or request IDs as labels
  2. Missing failure paths - Always instrument errors alongside successes
  3. No heartbeat metric - Add a constant gauge (e.g., app.up = 1) to verify instrumentation works
  4. Inconsistent naming - Stick to one convention across the codebase
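
To illustrate items 1 and 3, the sketch below collapses ID-like path segments into a placeholder before using a path as a label, and sets a constant heartbeat gauge. The `route_label` helper, the `:id` placeholder, and the rule that numeric and UUID-shaped segments are identifiers are all assumptions of this example.

```python
import re

# Assumption: purely numeric or UUID-shaped segments are identifiers.
_ID_SEGMENT = re.compile(
    r"\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def route_label(path):
    """Replace ID-like path segments so the label has bounded cardinality."""
    segments = [
        ":id" if _ID_SEGMENT.fullmatch(segment) else segment
        for segment in path.split("/")
    ]
    return "/".join(segments)


# Hypothetical gauge store with a constant heartbeat, as in anti-pattern 3.
gauges = {"app.up": 1}

label = route_label("/users/12345/orders/987")
```

With this normalization, every user's order page maps to the same label value, so the number of time series depends on the number of routes rather than the number of users.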

Full Reference

For detailed examples, patterns, and rationale, fetch the complete guide: https://pierrezemb.fr/posts/practical-guide-to-application-metrics/
