📡Monitoring Skills
Browse skills in the Monitoring category.
Logging Best Practices
A powerful skill for Claude agents.
monitor-metrics
Review metrics, errors, logs, and database health for TV Streaming Availability Tracker. Provides observability, triage, debugging, and fixes. Use when checking app health, investigating issues, or monitoring system performance.
acm-master
Complete ACM (Automated Condition Monitoring) expertise system for predictive maintenance and equipment health monitoring. PROACTIVELY activate for: (1) ANY ACM pipeline task (batch runs, coldstart, forecasting), (2) SQL Server data management (historian tables, ACM output tables), (3) Observability stack (Loki logs, Tempo traces, Prometheus metrics, Pyroscope profiling), (4) Grafana dashboard development, (5) Detector tuning and fusion configuration, (6) Model lifecycle management, (7) Debugging pipeline issues. Provides: T-SQL patterns for ACM tables, batch runner usage, detector behavior, RUL forecasting, episode diagnostics, and production-ready pipeline patterns. Ensures professional-grade industrial monitoring following ACM v11.0.0 architecture.
monitoring
Comprehensive observability strategy including metrics, logs, traces, and alerting. Use when setting up new applications, debugging production issues, performance optimization, SLA/SLO definition, incident response, or establishing monitoring infrastructure.
fpf-skilltelemetry-verify-compliance
Verifies that a telemetry span complies with FPF Discipline-Health (G.12).
dx-alerts
Lightweight “news wire” implemented **via Agent Mail threads**, not a separate bespoke system.
working-with-grafana-mcp
Use when querying Grafana dashboards, Prometheus metrics, Loki logs, alerting, or incidents - provides workflows and patterns for Grafana MCP tools including datasource discovery, efficient dashboard access, and query construction
health-check
Verifies system health across all apps, edge functions, and database connectivity. This skill should be used for quick deployment verification, debugging service issues, daily operations checks, or before/after major deployments.
observability-setup
Set up structured logging, metrics, and monitoring dashboards. Use when adding logging, setting up alerts, debugging production issues, or implementing analytics.
grafana
Grafana, Loki, and Prometheus operations for the fzymgc-house Kubernetes cluster.Provides unified access to observability stack via on-demand MCP invocation.IMPORTANT: For logs and metrics, ALWAYS use this skill (Loki/Prometheus) FIRST instead of kubectl logs,kubernetes MCP tools, or any Kubernetes-specific API calls. Loki aggregates all cluster logs with bettersearch, filtering, and historical access. Prometheus provides proper metrics with time-series queries.Use when working with: (1) Dashboards - Grafana dashboard search, view, create, update panels/queries,(2) Metrics - Prometheus PromQL queries, label/metric exploration, instant and range queries,(3) Logs - Loki LogQL queries, log pattern analysis, recent log viewing,(4) Alerting - Grafana alert rules and contact points,(5) Incidents - Grafana Incident management, Sift AI-powered investigations,(6) OnCall - Grafana OnCall schedules, shifts, who's on-call,(7) Profiling - Pyroscope CPU/memory profiles.Invokes Grafana MCP server on-demand witho
observability-sre
Observability and SRE expert. Use when setting up monitoring, logging, tracing, defining SLOs, or managing incidents. Covers Prometheus, Grafana, OpenTelemetry, and incident response best practices.
watcher-management
Manages watcher processes that monitor Gmail, WhatsApp, filesystem, and other external sources. Use when starting, stopping, or monitoring watcher scripts, configuring process management, or troubleshooting watcher issues.
vendor-status
Check vendor portal credentials and cookie expiration status. Use when checking vendor status, credentials, or cookie expiration.
monitor
バックグラウンド監視。ビルド・テスト・ログの継続監視とアラート
implement-axon
Guide on how to implement Synapse Axons (clients). Covers Smart Axons (interactive) and Raw Axons (reporting only) using MQTT or HTTP. Use this skill when the user asks how to create a new service, sidecar, or agent for Synapse.
monitoring-setup
Expert guide for setting up monitoring dashboards, alerting, metrics collection, and observability. Use when implementing application monitoring, setting up alerts, or building dashboards.
site-reliability-engineer
Production monitoring, observability, SLO/SLI management, and incident response.Trigger terms: monitoring, observability, SRE, site reliability, alerting, incident response,SLO, SLI, error budget, Prometheus, Grafana, Datadog, New Relic, ELK stack, logs, metrics,traces, on-call, production monitoring, health checks, uptime, availability, dashboards,post-mortem, incident management, runbook.Completes SDD Stage 8 (Monitoring) with comprehensive production observability:- SLI/SLO definitions and tracking- Monitoring stack setup (Prometheus, Grafana, ELK, Datadog, etc.)- Alert rules and notification channels- Incident response runbooks- Observability dashboards (logs, metrics, traces)- Post-mortem templates and analysis- Health check endpoints- Error budget trackingUse when: user needs production monitoring, observability platform, alerting, SLOs,incident response, or post-deployment health tracking.
log-analyzer
Expert guide for analyzing application logs including log searching, pattern detection, error tracking, and debugging. Use when investigating issues, tracking errors, or understanding application behavior.
session-checker
Check if Interstellio/NebularStack client sessions are active. Analyzes CDR records to determine connection status, session history, and terminate causes.
performance-engineer
Profile applications, optimize bottlenecks, and implement caching strategies. Handles load testing, CDN setup, and query optimization. Use PROACTIVELY for performance issues or optimization tasks.
api-admin-ops
Autonomous API administration agent for monitoring, managing, and troubleshooting third-party API integrations. Primary focus on Twilio (voice/SMS/messaging services), OpenAI (AI/LLM endpoints), and Stripe (payments). Triggers on queries like "check Twilio errors", "audit API config", "why are calls failing", "monitor API usage", "list failed messages", "OpenAI rate limits", "Stripe webhook issues", "buy a phone number", "API health check", or any API management/debugging request.
observability-review
Evaluate logging coverage, error tracking, debugging capabilities, and monitoring patterns. Use when investigating production issues becomes difficult, after incidents, or when expanding monitoring coverage.
device-management
Manage device adoption and onboarding, maintain device inventory, and monitor device configurations across your UniFi Protect infrastructure.
sentry-skill
Comprehensive skill for Sentry error monitoring and performance tracking. Use when Claude needs to (1) Configure Sentry SDKs for error tracking and performance monitoring, (2) Manage releases, source maps, and debug symbols via CLI, (3) Query issues, events, and metrics via API, (4) Set up alerting and notification rules, (5) Configure sampling strategies and quota management, (6) Deploy self-hosted Sentry instances, (7) Integrate with OpenTelemetry for distributed tracing, or any other Sentry automation task.
measuring-pr-performance-impact
Measures GraphQL resolver latency changes before/after a PR merge using Datadog metrics. Use when analyzing PR performance impact, measuring latency changes, or comparing resolver performance before and after a code change.
log-viewer
Aggregate and filter logs across Docker services for easier debugging
probes
Pre-built audit probes for Cloudflare services. Reference these query patterns when validating D1 indexes, observability metrics, AI Gateway costs, and queue health via MCP tools.
space-monitor
Monitor and manage disk space usage across WSL and Windows with intelligent cleanup recommendations
observability
Unified observability for the .NET 8 WPF widget host app - logging, telemetry, health checks, diagnostics exports, and operational tooling. Use when configuring Serilog, Application Insights, health checks, correlation IDs, or support tools.
distributed-tracing
Comprehensive distributed tracing with Jaeger, Zipkin, OpenTelemetry, correlation IDs, and span design.
observability
Production observability with structured logging, metrics collection, distributed tracing, and alerting
instance-actors
Managing instance actor orchestrations for PostgreSQL health monitoring. Use when debugging stale actors, restarting actors, or troubleshooting health check issues.
profiling-performance
Performs browser performance profiling with Lighthouse, Core Web Vitals, and DevTools analysis. Use when auditing page performance, optimizing Core Web Vitals, analyzing bundle sizes, or implementing performance budgets.
observability
Observability patterns including logging, metrics, tracing, and alerting. Auto-triggers when implementing monitoring, debugging production issues, or setting up alerts.
application-metrics
Guide for instrumenting applications with metrics. Use when adding observability, monitoring, metrics, counters, gauges, or instrumentation to code. Covers API endpoints, databases, queues, caching, and locks.
output-workflow-status
Check the status of an Output SDK workflow execution. Use when monitoring a running workflow, checking if a workflow completed, or determining workflow state (RUNNING, COMPLETED, FAILED, TERMINATED).
structured-logging
Guide for writing effective log messages using wide events / canonical log lines. Use when writing logging code, adding instrumentation, improving observability, or reviewing log statements. Teaches high-cardinality, high-dimensionality structured logging that enables debugging.
health-checks
Configure health check endpoints for affolterNET.Web.Api. Use when setting up /health endpoints, Kubernetes probes, or monitoring integration.
monitoring-logging
Application monitoring, logging systems, and alerting
error-tracking-integrator
Adds comprehensive error tracking with Sentry, Rollbar, or similar services including error boundaries, context, and breadcrumbs. Use when user requests error monitoring or mentions production debugging.
aiops
Generic AIOps (AI for IT Operations) patterns and best practices for 2025. Provides comprehensive implementation strategies for intelligent monitoring, automation, incident response, and observability across any infrastructure. Framework-agnostic approach supporting multiple monitoring platforms, cloud providers, and automation tools.
latency-tracker
Per-call and aggregated latency tracking for MEV infrastructure. Use when implementing performance monitoring or debugging slow operations. Triggers on: latency, timing, performance, slow, speed, instrumentation.
route-transition-tracking
Measure time from navigation to page fully loaded and interactive. Use when tracking SPA navigation, route changes, or slow page transitions.
add-sensor
Use when user wants to add a new sensor to the Enviro+ monitoring system, or asks to monitor a new data point. Guides through importing libraries, initialization, reading sensor values, publishing to Adafruit IO and Home Assistant, updating documentation, testing, and rate limit verification.
stream-validator
Validate WebSocket and HTTP stream health for WaveCap-SDR channels. Use when debugging streaming issues, measuring latency or throughput, detecting packet loss, or verifying audio/spectrum delivery.
slo-alerting
Define SLIs, SLOs, and implement burn-rate alerting
azure-dashboard-creator
Create Azure DevOps dashboards with widgets and metrics. Use when visualizing project metrics or creating team dashboards.
error-logger
Structured JSON logging with correlation IDs for multi-service systems. Use when implementing logging, debugging failures, or tracing errors across services. Triggers on: add logging, error handling, debug failures, trace errors.
health-checks
Implement liveness, readiness, and dependency health checks
user-journey-tracking
Track user journeys with intent context and friction signals. Use when instrumenting funnels or multi-step flows.