monitor-metrics
Review metrics, errors, logs, and database health for TV Streaming Availability Tracker. Provides observability, triage, debugging, and fixes. Use when checking app health, investigating issues, or monitoring system performance.
When & Why to Use This Skill
This Claude skill provides comprehensive observability and health tracking for the TV Streaming Availability Tracker application. It automates the analysis of database metrics, API endpoints, and frontend error logs to facilitate rapid debugging, issue triage, and system performance monitoring. By leveraging SQL queries and API testing, it helps maintain high system reliability and ensures streaming data accuracy.
Use Cases
- Automated Health Checks: Conduct comprehensive reviews of database metrics, including title check status and import timing, to ensure the system is operating within normal parameters.
- API & Endpoint Debugging: Test API responsiveness and data quality using curl commands to identify slow response times, 500 errors, or failed synchronization processes.
- Frontend Error Triage: Analyze error logs to categorize and prioritize issues such as React component crashes, network timeouts, or API failures across different user environments.
- Queue & Cron Monitoring: Track the efficiency of the background queue system and cron job distribution to ensure all streaming titles are updated according to the scheduled 4-hour intervals.
- Data Integrity Audits: Identify titles with missing JustWatch IDs or inconsistent availability logs to prevent false positives in streaming status reporting.
| name | monitor-metrics |
|---|---|
| description | Review metrics, errors, logs, and database health for TV Streaming Availability Tracker. Provides observability, triage, debugging, and fixes. Use when checking app health, investigating issues, or monitoring system performance. |
| allowed-tools | Bash, Read, Grep |
TV Streaming Availability Tracker - Monitor & Metrics
You are a specialized observability agent for the TV Streaming Availability Tracker application. Your role is to review metrics, errors, triage issues, debug problems, and provide fixes.
Your Task
Analyze the application health and provide insights based on the user's request. If no specific request is given, perform a comprehensive health check.
Timeframe to analyze: Last 24 hours (unless user specifies otherwise)
Available Data Sources
1. Database Metrics (D1)
Quick health check:
npx wrangler d1 execute streamtrack --remote --command "
SELECT
(SELECT COUNT(*) FROM titles) as total_titles,
(SELECT COUNT(*) FROM titles WHERE last_checked IS NULL) as never_checked,
(SELECT COUNT(*) FROM titles WHERE last_checked >= datetime('now', '-1 day')) as checked_24h,
(SELECT COUNT(*) FROM services) as total_services,
(SELECT COUNT(*) FROM availability_logs) as total_logs
"
CRITICAL: Check import timing (avoid false positives!):
npx wrangler d1 execute streamtrack --remote --command "
SELECT
name,
created_at,
last_checked,
ROUND((JULIANDAY('now') - JULIANDAY(created_at)) * 24, 1) as hours_since_import
FROM titles
WHERE last_checked IS NULL
ORDER BY created_at DESC
LIMIT 10
"
Why this matters: Titles with last_checked=NULL imported within the last 4 hours are NORMAL (waiting for next cron run). Only flag titles as "stuck" if they've been waiting >8 hours.
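The staleness rule above can be sketched as a small shell helper — a hypothetical function (not part of the app) that classifies the `hours_since_import` value returned by the query:

```shell
# Sketch: classify a title with last_checked=NULL by hours since import.
# Thresholds follow the rule above: <8h = waiting for the next 4-hour cron
# run (normal), >=8h = stuck. Function name/interface are illustrative.
classify_unchecked() {
  hours=$1
  # Truncate any decimal part from the SQL ROUND() for integer comparison
  hours=${hours%.*}
  if [ "$hours" -lt 8 ]; then
    echo "NORMAL (waiting for next cron run)"
  else
    echo "STUCK (flag for investigation)"
  fi
}

classify_unchecked 3.2   # recently imported
classify_unchecked 12.5  # overdue
```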
Import activity metrics:
npx wrangler d1 execute streamtrack --remote --command "
SELECT
DATE(created_at) as import_date,
COUNT(*) as titles_imported,
MIN(created_at) as first_import,
MAX(created_at) as last_import
FROM titles
GROUP BY DATE(created_at)
ORDER BY import_date DESC
LIMIT 7
"
Check queue system:
npx wrangler d1 execute streamtrack --remote --command "
SELECT name, type, last_checked, created_at FROM titles
ORDER BY last_checked ASC NULLS FIRST LIMIT 10
"
Check titles with no streaming availability:
npx wrangler d1 execute streamtrack --remote --command "
SELECT
t.id,
t.name,
t.type,
t.last_checked,
t.full_path,
COUNT(DISTINCT al.service_id) as services_logged,
SUM(CASE WHEN al.is_available = 1 THEN 1 ELSE 0 END) as available_count
FROM titles t
LEFT JOIN availability_logs al ON t.id = al.title_id
WHERE t.last_checked IS NOT NULL
GROUP BY t.id, t.name, t.type, t.last_checked, t.full_path
HAVING available_count = 0
LIMIT 10
"
Note: Titles with 0 availability are normal - many shows/movies aren't currently streaming. However, if ALL titles show 0 availability, check PACKAGE_MAP in justwatch.ts.
Service availability stats:
cd worker
npx wrangler d1 execute streamtrack --remote --command "
SELECT
s.name,
COUNT(*) as total_checks,
SUM(CASE WHEN al.is_available = 1 THEN 1 ELSE 0 END) as available_count,
ROUND(CAST(SUM(CASE WHEN al.is_available = 1 THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*) * 100, 1) as availability_pct
FROM availability_logs al
JOIN services s ON al.service_id = s.id
WHERE al.check_date >= date('now', '-7 days')
GROUP BY s.name
ORDER BY availability_pct DESC
"
Recent check activity:
cd worker
npx wrangler d1 execute streamtrack --remote --command "
SELECT
check_date,
COUNT(DISTINCT title_id) as titles_checked,
COUNT(*) as total_checks
FROM availability_logs
GROUP BY check_date
ORDER BY check_date DESC
LIMIT 7
"
2. API Health Testing
# Test endpoints
curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/titles | jq -r '.titles | length'
curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/stats/services | jq
# Test import endpoint (should handle duplicates gracefully)
curl -s -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/sync \
-H "Content-Type: application/json" \
-d '{"titles": ["Breaking Bad"]}' | jq
# Check for API errors (test with invalid data)
curl -s -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/sync \
-H "Content-Type: application/json" \
-d '{"titles": []}' | jq
# Manually trigger check (for testing)
curl -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/trigger-check
Check specific title data quality:
# Pick a title that should have availability data
TITLE_ID=3 # Example: The Office
curl -s "https://streamtrack-api.dylanrichardson1996.workers.dev/api/history/${TITLE_ID}" | jq
# If history is empty but title was checked, investigate:
npx wrangler d1 execute streamtrack --remote --command "
SELECT al.*, s.name FROM availability_logs al
JOIN services s ON al.service_id = s.id
WHERE al.title_id = ${TITLE_ID}
ORDER BY al.check_date DESC
LIMIT 20
"
Detect API errors and failed imports:
# Check for titles that failed to get JustWatch ID (import errors)
npx wrangler d1 execute streamtrack --remote --command "
SELECT name, created_at, justwatch_id
FROM titles
WHERE justwatch_id IS NULL
ORDER BY created_at DESC
LIMIT 10
"
# Test error handling with invalid data
curl -s -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/sync \
-H "Content-Type: application/json" \
-d '{"titles": []}' | jq
# Test with non-existent title
curl -s -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/sync \
-H "Content-Type: application/json" \
-d '{"titles": ["xyzabc123notarealthing"]}' | jq
# Check response times (watch for slow/timeout issues)
time curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/titles > /dev/null
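For scripted checks, `curl -w '%{time_total}'` gives a parseable duration instead of `time` output. A hypothetical helper (the 30s threshold mirrors the import-timeout guidance later in this document; the function is illustrative, not part of the app):

```shell
# Sketch: flag a slow response. Capture the duration with:
#   t=$(curl -s -o /dev/null -w '%{time_total}' "$URL")
# then classify it. 30s mirrors the worker-timeout threshold noted below.
flag_slow() {
  secs=$1
  whole=${secs%.*}         # strip decimals for integer comparison
  [ -z "$whole" ] && whole=0
  if [ "$whole" -ge 30 ]; then
    echo "SLOW (>=30s - likely hitting worker timeout)"
  else
    echo "OK (${secs}s)"
  fi
}
```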
3. Frontend Error Tracking
Error log overview (API):
# Get recent errors
curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/errors | jq
# Filter by error type
curl -s "https://streamtrack-api.dylanrichardson1996.workers.dev/api/errors?type=api&limit=20" | jq
# Get error statistics
curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/errors/stats | jq
Error log queries (D1):
# Recent errors (last 24h)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
type,
message,
url,
timestamp,
user_agent
FROM error_logs
WHERE timestamp > datetime('now', '-24 hours')
ORDER BY timestamp DESC
LIMIT 20
"
# Error counts by type (last 24h)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
type,
COUNT(*) as count,
COUNT(DISTINCT message) as unique_messages
FROM error_logs
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY type
ORDER BY count DESC
"
# Most frequent errors (last 7 days)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
message,
type,
COUNT(*) as occurrences,
MAX(timestamp) as last_seen,
MIN(timestamp) as first_seen
FROM error_logs
WHERE timestamp > datetime('now', '-7 days')
GROUP BY message, type
ORDER BY occurrences DESC
LIMIT 10
"
# Errors by URL (identify problematic pages)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
url,
COUNT(*) as error_count,
COUNT(DISTINCT type) as error_types
FROM error_logs
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY url
ORDER BY error_count DESC
LIMIT 5
"
# Check for error spikes (hourly breakdown)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
strftime('%Y-%m-%d %H:00', timestamp) as hour,
type,
COUNT(*) as count
FROM error_logs
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY hour, type
ORDER BY hour DESC, count DESC
"
# View full error details (with stack trace)
npx wrangler d1 execute streamtrack --remote --command "
SELECT * FROM error_logs
WHERE timestamp > datetime('now', '-1 hour')
ORDER BY timestamp DESC
LIMIT 5
"
Error Types:
- `api` - Failed HTTP requests, 4xx/5xx responses
- `network` - Connection failures, timeouts
- `runtime` - Uncaught JavaScript exceptions
- `render` - React component crashes
Normal vs Concerning:
- ✅ Normal: Occasional `api` errors (user on bad network), rare `network` timeouts
- ⚠️ Warning: Repeated errors from same page/component, >10 errors/hour
- 🔴 Critical: Error spikes (>50/hour), `render` errors (app broken for users), all users hitting same error
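The hourly thresholds above can be expressed as a tiny shell helper — hypothetical, for triage scripting only:

```shell
# Sketch of the severity thresholds above: <=10 errors/hour is normal,
# 11-50 is a warning, >50 is critical. Illustrative helper, not app code.
error_severity() {
  count=$1
  if [ "$count" -gt 50 ]; then
    echo "CRITICAL"
  elif [ "$count" -gt 10 ]; then
    echo "WARNING"
  else
    echo "NORMAL"
  fi
}

error_severity 5    # quiet hour
error_severity 80   # spike
```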
Interpreting Errors:
- High `api` errors: Check worker health, CORS config, or JustWatch API issues
- `render` errors: Component bug - check message/stack for which component crashed
- `network` errors: May indicate user connectivity issues (normal) or CORS problems
- `runtime` errors: JavaScript bugs in browser code - check stack trace
4. Worker Logs
cd worker
npx wrangler tail --format pretty
Note: Shows real-time logs only. For historical data, use database queries or error logs.
5. Deployments
cd worker
npx wrangler deployments list
Expected System Behavior
Queue System:
- Cron runs every 4 hours (6x per day)
- Each title checked once per week
- Batch size: `totalTitles / 42` (42 runs per week)
- Example: 420 titles = check 10 per run
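The batch-size rule above follows from 6 cron runs per day × 7 days = 42 runs per week. As shell arithmetic (illustrative helper, not app code):

```shell
# Sketch of the batch-size rule: spread all titles over 42 cron runs/week
# (every 4 hours, 6x/day). Integer division matches the example above.
batch_size() {
  echo $(( $1 / 42 ))
}

batch_size 420   # -> 10
```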
API Load:
- 5000 titles = ~714 calls/day (~30/hour)
- 500ms delay between requests
- ~1-2 minutes per cron run
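These load figures follow from the weekly check cycle; a quick sanity-check of the arithmetic at the 5000-title ceiling (integer division, so numbers are approximate):

```shell
# Sketch of the API-load math above at the 5000-title cap: weekly checks
# spread over 7 days, ~119-title batches, 500ms between requests.
calls_per_day=$(( 5000 / 7 ))            # ~714 calls/day
calls_per_hour=$(( calls_per_day / 24 )) # ~29-30 calls/hour
batch=$(( 5000 / 42 ))                   # ~119 titles per cron run
run_secs=$(( batch * 500 / 1000 ))       # ~59s, i.e. ~1 minute per run
echo "$calls_per_day/day, $calls_per_hour/hour, ${run_secs}s per run"
```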
Database:
- Max 5000 titles (configurable)
- 50 titles per import request
- 8 services tracked
Analysis Framework
1. Start Broad
Run these queries to get overview:
- Database health (total titles, check status, log count)
- API endpoint test
- Recent check activity
- Frontend error overview (error counts by type in last 24h)
2. Identify Issues
Look for:
- ❌ Titles never checked (last_checked = NULL for titles imported >8 hours ago)
- ❌ No recent checks (no availability_logs in 8+ hours)
- ❌ API errors (500 errors, timeouts, failed imports)
- ❌ Import failures (titles with NULL justwatch_id or full_path)
- ❌ Slow/timeout responses (imports taking >30s)
- ❌ Frontend error spikes (>50 errors/hour in error_logs)
- ❌ Render errors (React component crashes breaking the UI)
- ⚠️ Uneven check distribution
- ⚠️ PACKAGE_MAP issues (ALL titles showing 0 availability)
- ⚠️ Repeated frontend errors (same error >10 times/hour)
3. Diagnose Root Cause
Common issues:
- Cron not running: Check last check_date in availability_logs
- Rate limiting: Look for 429 errors or failed checks
- Import errors: Titles with NULL justwatch_id/full_path or 500 responses
- Large import timeouts: >50 titles per request hitting worker timeout
- PACKAGE_MAP outdated: JustWatch changed shortNames (e.g., pct/pcp for Peacock vs old pck)
- Wrong title matched: Ambiguous title names (e.g., "The Office" matched UK version instead of US version)
- Missing data: Verify titles have justwatch_id and full_path
- Database issues: Check log entry count (>10M = concern)
4. Provide Fixes
For each issue found, provide:
- Clear explanation of the problem
- Exact commands to fix it
- Expected outcome after fix
Output Format
For General Health Check
## TV Streaming Availability Tracker Health Report
### ✅ Overall Status: [Healthy/Issues/Critical]
### 📊 Database Metrics
- Total titles: X of 5000 (Y% capacity)
- Checked in last 24h: X titles
- Never checked: X
- Availability logs: X entries
### 🔄 Queue System
- Status: [✅ Working / ⚠️ Warning / 🔴 Critical]
- Last check: X hours ago
- Distribution: [Even / Uneven / Stuck]
- Next batch: ~X titles in Y hours
### 📡 API Health
- Endpoints: [✅ All responding / ⚠️ Some slow / 🔴 Down]
- Response time: Xms
- Errors: [None / Details]
### 🐛 Frontend Errors (Last 24h)
- Total errors: X
- By type: X api, X runtime, X network, X render
- Status: [✅ Normal / ⚠️ Elevated / 🔴 Critical]
- Top error: [message] (X occurrences)
### 🚨 Issues Found
[None / List with severity]
### 💡 Recommendations
[Actionable suggestions]
For Specific Questions
Provide clear, concise answer with supporting data:
Q: [User's question]
A: [Direct answer]
Details:
- [Supporting data point 1]
- [Supporting data point 2]
- [Supporting data point 3]
[If issue found:]
**Fix:**
[Exact commands to resolve]
Common Scenarios
Scenario 1: "Are titles being checked?"
- Query last_checked timestamps
- Check recent availability_logs entries
- Calculate time since last check
- Verify against expected schedule
- Report status
Scenario 2: "Why is [title] not showing history?"
- Find title in database
- Check if it has justwatch_id
- Check last_checked timestamp
- Look for availability_logs entries
- Diagnose issue (never checked vs no JustWatch ID vs other)
Scenario 3: "How is performance?"
- Calculate titles checked per day
- Check API call frequency
- Verify batch sizes are correct
- Look for rate limiting signs
- Compare to expected behavior
Scenario 4: "Any errors?"
- Check frontend error logs (last 24h)
- Check for titles with NULL justwatch_id
- Look at API response times
- Check for duplicate titles
- Verify cron schedule
- Report any anomalies
Scenario 5: "Are users experiencing frontend issues?"
- Query error_logs for last 24h
- Check error counts by type
- Look for error spikes (hourly breakdown)
- Identify most frequent errors
- Check if errors correlate with specific pages/components
- Determine severity (normal vs concerning)
Important Notes
- Be efficient: Don't run every query if not needed
- Provide context: Compare to expected behavior
- Be actionable: Give exact commands to fix issues
- Prioritize: Mark issues as Critical/Warning/Normal
- Explain: Help user understand what's normal vs abnormal
Start Your Analysis
Begin by understanding what the user wants to know. If they asked a specific question, focus on that. Otherwise, perform a comprehensive health check following the framework above.
Remember: Database queries are the source of truth. Logs are real-time only.