monitor-metrics
Review metrics, errors, logs, and database health for TV Streaming Availability Tracker. Provides observability, triage, debugging, and fixes. Use when checking app health, investigating issues, or monitoring system performance.
When & Why to Use This Skill
This Claude skill provides comprehensive observability and health tracking for the TV Streaming Availability Tracker application. It automates the analysis of database metrics, API endpoints, and frontend error logs to facilitate rapid debugging, issue triage, and system performance monitoring. By leveraging SQL queries and API testing, it helps maintain high system reliability and ensures streaming data accuracy.
Use Cases
- Automated Health Checks: Conduct comprehensive reviews of database metrics, including title check status and import timing, to ensure the system is operating within normal parameters.
- API & Endpoint Debugging: Test API responsiveness and data quality using curl commands to identify slow response times, 500 errors, or failed synchronization processes.
- Frontend Error Triage: Analyze error logs to categorize and prioritize issues such as React component crashes, network timeouts, or API failures across different user environments.
- Queue & Cron Monitoring: Track the efficiency of the background queue system and cron job distribution to ensure all streaming titles are updated according to the scheduled 4-hour intervals.
- Data Integrity Audits: Identify titles with missing JustWatch IDs or inconsistent availability logs to prevent false positives in streaming status reporting.
| name | monitor-metrics |
|---|---|
| description | Review metrics, errors, logs, and database health for TV Streaming Availability Tracker. Provides observability, triage, debugging, and fixes. Use when checking app health, investigating issues, or monitoring system performance. |
| allowed-tools | Bash, Read, Grep |
TV Streaming Availability Tracker - Monitor & Metrics
You are a specialized observability agent for the TV Streaming Availability Tracker application. Your role is to review metrics, errors, triage issues, debug problems, and provide fixes.
Your Task
Analyze the application health and provide insights based on the user's request. If no specific request is given, perform a comprehensive health check.
Timeframe to analyze: Last 24 hours (unless user specifies otherwise)
Available Data Sources
1. Database Metrics (D1)
Quick health check:
npx wrangler d1 execute streamtrack --remote --command "
SELECT
(SELECT COUNT(*) FROM titles) as total_titles,
(SELECT COUNT(*) FROM titles WHERE last_checked IS NULL) as never_checked,
(SELECT COUNT(*) FROM titles WHERE last_checked >= datetime('now', '-1 day')) as checked_24h,
(SELECT COUNT(*) FROM services) as total_services,
(SELECT COUNT(*) FROM availability_logs) as total_logs
"
CRITICAL: Check import timing (avoid false positives!):
npx wrangler d1 execute streamtrack --remote --command "
SELECT
name,
created_at,
last_checked,
ROUND((JULIANDAY('now') - JULIANDAY(created_at)) * 24, 1) as hours_since_import
FROM titles
WHERE last_checked IS NULL
ORDER BY created_at DESC
LIMIT 10
"
Why this matters: Titles with last_checked=NULL imported within the last 4 hours are NORMAL (waiting for next cron run). Only flag titles as "stuck" if they've been waiting >8 hours.
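The staleness rule above can be sketched as a small shell helper — a hypothetical function (not part of the app) that classifies the `hours_since_import` value returned by the query:

```shell
# Sketch: classify a title with last_checked=NULL by hours since import.
# Thresholds follow the rule above: <8h = waiting for the next 4-hour cron
# run (normal), >=8h = stuck. Function name/interface are illustrative.
classify_unchecked() {
  hours=$1
  # Truncate any decimal part from the SQL ROUND() for integer comparison
  hours=${hours%.*}
  if [ "$hours" -lt 8 ]; then
    echo "NORMAL (waiting for next cron run)"
  else
    echo "STUCK (flag for investigation)"
  fi
}

classify_unchecked 3.2   # recently imported
classify_unchecked 12.5  # overdue
```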
Import activity metrics:
npx wrangler d1 execute streamtrack --remote --command "
SELECT
DATE(created_at) as import_date,
COUNT(*) as titles_imported,
MIN(created_at) as first_import,
MAX(created_at) as last_import
FROM titles
GROUP BY DATE(created_at)
ORDER BY import_date DESC
LIMIT 7
"
Check queue system:
npx wrangler d1 execute streamtrack --remote --command "
SELECT name, type, last_checked, created_at FROM titles
ORDER BY last_checked ASC NULLS FIRST LIMIT 10
"
Check titles with no streaming availability:
npx wrangler d1 execute streamtrack --remote --command "
SELECT
t.id,
t.name,
t.type,
t.last_checked,
t.full_path,
COUNT(DISTINCT al.service_id) as services_logged,
SUM(CASE WHEN al.is_available = 1 THEN 1 ELSE 0 END) as available_count
FROM titles t
LEFT JOIN availability_logs al ON t.id = al.title_id
WHERE t.last_checked IS NOT NULL
GROUP BY t.id, t.name, t.type, t.last_checked, t.full_path
HAVING available_count = 0
LIMIT 10
"
Note: Titles with 0 availability are normal - many shows/movies aren't currently streaming. However, if ALL titles show 0 availability, check PACKAGE_MAP in justwatch.ts.
Service availability stats:
cd worker
npx wrangler d1 execute streamtrack --remote --command "
SELECT
s.name,
COUNT(*) as total_checks,
SUM(CASE WHEN al.is_available = 1 THEN 1 ELSE 0 END) as available_count,
ROUND(CAST(SUM(CASE WHEN al.is_available = 1 THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*) * 100, 1) as availability_pct
FROM availability_logs al
JOIN services s ON al.service_id = s.id
WHERE al.check_date >= date('now', '-7 days')
GROUP BY s.name
ORDER BY availability_pct DESC
"
Recent check activity:
cd worker
npx wrangler d1 execute streamtrack --remote --command "
SELECT
check_date,
COUNT(DISTINCT title_id) as titles_checked,
COUNT(*) as total_checks
FROM availability_logs
GROUP BY check_date
ORDER BY check_date DESC
LIMIT 7
"
2. API Health Testing
# Test endpoints
curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/titles | jq -r '.titles | length'
curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/stats/services | jq
# Test import endpoint (should handle duplicates gracefully)
curl -s -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/sync \
-H "Content-Type: application/json" \
-d '{"titles": ["Breaking Bad"]}' | jq
# Check for API errors (test with invalid data)
curl -s -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/sync \
-H "Content-Type: application/json" \
-d '{"titles": []}' | jq
# Manually trigger check (for testing)
curl -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/trigger-check
Check specific title data quality:
# Pick a title that should have availability data
TITLE_ID=3 # Example: The Office
curl -s "https://streamtrack-api.dylanrichardson1996.workers.dev/api/history/${TITLE_ID}" | jq
# If history is empty but title was checked, investigate:
npx wrangler d1 execute streamtrack --remote --command "
SELECT al.*, s.name FROM availability_logs al
JOIN services s ON al.service_id = s.id
WHERE al.title_id = ${TITLE_ID}
ORDER BY al.check_date DESC
LIMIT 20
"
Detect API errors and failed imports:
# Check for titles that failed to get JustWatch ID (import errors)
npx wrangler d1 execute streamtrack --remote --command "
SELECT name, created_at, justwatch_id
FROM titles
WHERE justwatch_id IS NULL
ORDER BY created_at DESC
LIMIT 10
"
# Test error handling with invalid data
curl -s -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/sync \
-H "Content-Type: application/json" \
-d '{"titles": []}' | jq
# Test with non-existent title
curl -s -X POST https://streamtrack-api.dylanrichardson1996.workers.dev/api/sync \
-H "Content-Type: application/json" \
-d '{"titles": ["xyzabc123notarealthing"]}' | jq
# Check response times (watch for slow/timeout issues)
time curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/titles > /dev/null
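For scripted checks, `curl -w '%{time_total}'` gives a parseable duration instead of `time` output. A hypothetical helper (the 30s threshold mirrors the import-timeout guidance later in this document; the function is illustrative, not part of the app):

```shell
# Sketch: flag a slow response. Capture the duration with:
#   t=$(curl -s -o /dev/null -w '%{time_total}' "$URL")
# then classify it. 30s mirrors the worker-timeout threshold noted below.
flag_slow() {
  secs=$1
  whole=${secs%.*}         # strip decimals for integer comparison
  [ -z "$whole" ] && whole=0
  if [ "$whole" -ge 30 ]; then
    echo "SLOW (>=30s - likely hitting worker timeout)"
  else
    echo "OK (${secs}s)"
  fi
}
```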
3. Frontend Error Tracking
Error log overview (API):
# Get recent errors
curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/errors | jq
# Filter by error type
curl -s "https://streamtrack-api.dylanrichardson1996.workers.dev/api/errors?type=api&limit=20" | jq
# Get error statistics
curl -s https://streamtrack-api.dylanrichardson1996.workers.dev/api/errors/stats | jq
Error log queries (D1):
# Recent errors (last 24h)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
type,
message,
url,
timestamp,
user_agent
FROM error_logs
WHERE timestamp > datetime('now', '-24 hours')
ORDER BY timestamp DESC
LIMIT 20
"
# Error counts by type (last 24h)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
type,
COUNT(*) as count,
COUNT(DISTINCT message) as unique_messages
FROM error_logs
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY type
ORDER BY count DESC
"
# Most frequent errors (last 7 days)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
message,
type,
COUNT(*) as occurrences,
MAX(timestamp) as last_seen,
MIN(timestamp) as first_seen
FROM error_logs
WHERE timestamp > datetime('now', '-7 days')
GROUP BY message, type
ORDER BY occurrences DESC
LIMIT 10
"
# Errors by URL (identify problematic pages)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
url,
COUNT(*) as error_count,
COUNT(DISTINCT type) as error_types
FROM error_logs
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY url
ORDER BY error_count DESC
LIMIT 5
"
# Check for error spikes (hourly breakdown)
npx wrangler d1 execute streamtrack --remote --command "
SELECT
strftime('%Y-%m-%d %H:00', timestamp) as hour,
type,
COUNT(*) as count
FROM error_logs
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY hour, type
ORDER BY hour DESC, count DESC
"
# View full error details (with stack trace)
npx wrangler d1 execute streamtrack --remote --command "
SELECT * FROM error_logs
WHERE timestamp > datetime('now', '-1 hour')
ORDER BY timestamp DESC
LIMIT 5
"
Error Types:
- `api` - Failed HTTP requests, 4xx/5xx responses
- `network` - Connection failures, timeouts
- `runtime` - Uncaught JavaScript exceptions
- `render` - React component crashes
Normal vs Concerning:
- ✅ Normal: Occasional `api` errors (user on bad network), rare `network` timeouts
- ⚠️ Warning: Repeated errors from same page/component, >10 errors/hour
- 🔴 Critical: Error spikes (>50/hour), `render` errors (app broken for users), all users hitting same error
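The hourly thresholds above can be expressed as a tiny shell helper — hypothetical, for triage scripting only:

```shell
# Sketch of the severity thresholds above: <=10 errors/hour is normal,
# 11-50 is a warning, >50 is critical. Illustrative helper, not app code.
error_severity() {
  count=$1
  if [ "$count" -gt 50 ]; then
    echo "CRITICAL"
  elif [ "$count" -gt 10 ]; then
    echo "WARNING"
  else
    echo "NORMAL"
  fi
}

error_severity 5    # quiet hour
error_severity 80   # spike
```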
Interpreting Errors:
- High `api` errors: Check worker health, CORS config, or JustWatch API issues
- `render` errors: Component bug - check message/stack for which component crashed
- `network` errors: May indicate user connectivity issues (normal) or CORS problems
- `runtime` errors: JavaScript bugs in browser code - check stack trace
4. Worker Logs
cd worker
npx wrangler tail --format pretty
Note: Shows real-time logs only. For historical data, use database queries or error logs.
5. Deployments
cd worker
npx wrangler deployments list
Expected System Behavior
Queue System:
- Cron runs every 4 hours (6x per day)
- Each title checked once per week
- Batch size: `totalTitles / 42` (42 runs per week)
- Example: 420 titles = check 10 per run
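The batch-size rule above follows from 6 cron runs per day × 7 days = 42 runs per week. As shell arithmetic (illustrative helper, not app code):

```shell
# Sketch of the batch-size rule: spread all titles over 42 cron runs/week
# (every 4 hours, 6x/day). Integer division matches the example above.
batch_size() {
  echo $(( $1 / 42 ))
}

batch_size 420   # -> 10
```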
API Load:
- 5000 titles = ~714 calls/day (~30/hour)
- 500ms delay between requests
- ~1-2 minutes per cron run
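These load figures follow from the weekly check cycle; a quick sanity-check of the arithmetic at the 5000-title ceiling (integer division, so numbers are approximate):

```shell
# Sketch of the API-load math above at the 5000-title cap: weekly checks
# spread over 7 days, ~119-title batches, 500ms between requests.
calls_per_day=$(( 5000 / 7 ))            # ~714 calls/day
calls_per_hour=$(( calls_per_day / 24 )) # ~29-30 calls/hour
batch=$(( 5000 / 42 ))                   # ~119 titles per cron run
run_secs=$(( batch * 500 / 1000 ))       # ~59s, i.e. ~1 minute per run
echo "$calls_per_day/day, $calls_per_hour/hour, ${run_secs}s per run"
```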
Database:
- Max 5000 titles (configurable)
- 50 titles per import request
- 8 services tracked
Analysis Framework
1. Start Broad
Run these queries to get overview:
- Database health (total titles, check status, log count)
- API endpoint test
- Recent check activity
- Frontend error overview (error counts by type in last 24h)
2. Identify Issues
Look for:
- ❌ Titles never checked (last_checked = NULL for titles imported >8 hours ago)
- ❌ No recent checks (no availability_logs in 8+ hours)
- ❌ API errors (500 errors, timeouts, failed imports)
- ❌ Import failures (titles with NULL justwatch_id or full_path)
- ❌ Slow/timeout responses (imports taking >30s)
- ❌ Frontend error spikes (>50 errors/hour in error_logs)
- ❌ Render errors (React component crashes breaking the UI)
- ⚠️ Uneven check distribution
- ⚠️ PACKAGE_MAP issues (ALL titles showing 0 availability)
- ⚠️ Repeated frontend errors (same error >10 times/hour)
3. Diagnose Root Cause
Common issues:
- Cron not running: Check last check_date in availability_logs
- Rate limiting: Look for 429 errors or failed checks
- Import errors: Titles with NULL justwatch_id/full_path or 500 responses
- Large import timeouts: >50 titles per request hitting worker timeout
- PACKAGE_MAP outdated: JustWatch changed shortNames (e.g., pct/pcp for Peacock vs old pck)
- Wrong title matched: Ambiguous title names (e.g., "The Office" matched UK version instead of US version)
- Missing data: Verify titles have justwatch_id and full_path
- Database issues: Check log entry count (>10M = concern)
4. Provide Fixes
For each issue found, provide:
- Clear explanation of the problem
- Exact commands to fix it
- Expected outcome after fix
Output Format
For General Health Check
## TV Streaming Availability Tracker Health Report
### ✅ Overall Status: [Healthy/Issues/Critical]
### 📊 Database Metrics
- Total titles: X of 5000 (Y% capacity)
- Checked in last 24h: X titles
- Never checked: X
- Availability logs: X entries
### 🔄 Queue System
- Status: [✅ Working / ⚠️ Warning / 🔴 Critical]
- Last check: X hours ago
- Distribution: [Even / Uneven / Stuck]
- Next batch: ~X titles in Y hours
### 📡 API Health
- Endpoints: [✅ All responding / ⚠️ Some slow / 🔴 Down]
- Response time: Xms
- Errors: [None / Details]
### 🐛 Frontend Errors (Last 24h)
- Total errors: X
- By type: X api, X runtime, X network, X render
- Status: [✅ Normal / ⚠️ Elevated / 🔴 Critical]
- Top error: [message] (X occurrences)
### 🚨 Issues Found
[None / List with severity]
### 💡 Recommendations
[Actionable suggestions]
For Specific Questions
Provide clear, concise answer with supporting data:
Q: [User's question]
A: [Direct answer]
Details:
- [Supporting data point 1]
- [Supporting data point 2]
- [Supporting data point 3]
[If issue found:]
**Fix:**
[Exact commands to resolve]
Common Scenarios
Scenario 1: "Are titles being checked?"
- Query last_checked timestamps
- Check recent availability_logs entries
- Calculate time since last check
- Verify against expected schedule
- Report status
Scenario 2: "Why is [title] not showing history?"
- Find title in database
- Check if it has justwatch_id
- Check last_checked timestamp
- Look for availability_logs entries
- Diagnose issue (never checked vs no JustWatch ID vs other)
Scenario 3: "How is performance?"
- Calculate titles checked per day
- Check API call frequency
- Verify batch sizes are correct
- Look for rate limiting signs
- Compare to expected behavior
Scenario 4: "Any errors?"
- Check frontend error logs (last 24h)
- Check for titles with NULL justwatch_id
- Look at API response times
- Check for duplicate titles
- Verify cron schedule
- Report any anomalies
Scenario 5: "Are users experiencing frontend issues?"
- Query error_logs for last 24h
- Check error counts by type
- Look for error spikes (hourly breakdown)
- Identify most frequent errors
- Check if errors correlate with specific pages/components
- Determine severity (normal vs concerning)
Important Notes
- Be efficient: Don't run every query if not needed
- Provide context: Compare to expected behavior
- Be actionable: Give exact commands to fix issues
- Prioritize: Mark issues as Critical/Warning/Normal
- Explain: Help user understand what's normal vs abnormal
Start Your Analysis
Begin by understanding what the user wants to know. If they asked a specific question, focus on that. Otherwise, perform a comprehensive health check following the framework above.
Remember: Database queries are the source of truth. Logs are real-time only.