# log-analyze
Parse and analyze system and application logs. Use when the user says "find errors in logs", "analyze logs", "check journalctl", "what's in the logs", "debug from logs", or asks to investigate log files.
## When & Why to Use This Skill
This Claude skill parses and analyzes system and application logs to identify system errors, application bugs, and infrastructure bottlenecks. Using standard CLI tools such as journalctl and grep, it filters large log files to pinpoint root causes and produce actionable findings for developers and SREs.
## Use Cases
- System Troubleshooting: Rapidly investigate system-level failures by analyzing journalctl and syslog for kernel panics, OOM kills, or service crashes.
- Web Server Debugging: Analyze Nginx or Apache error logs to identify 4xx/5xx status code trends and resolve configuration or connectivity issues.
- Database Health Checks: Parse PostgreSQL or MySQL logs to find slow queries, connection timeouts, or permission errors affecting application performance.
- Security Auditing: Scan /var/log/auth.log or other security logs to detect brute-force attacks, unauthorized access attempts, or privilege escalation (see the sketch after this list).
- Resource Monitoring: Identify disk space exhaustion (ENOSPC) or memory leaks by correlating application logs with system resource events over specific time ranges.
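As an example of the security-auditing case, a quick brute-force check against the SSH auth log might look like the sketch below. The path and message format assume a Debian/Ubuntu-style sshd (use /var/log/secure on RHEL-family systems), and GNU grep is assumed for the `-P` flag.

```bash
# Count failed SSH login attempts per source IP, most attempts first
grep "Failed password" /var/log/auth.log \
  | grep -oP 'from \K[\d.]+' \
  | sort | uniq -c | sort -rn | head
```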
| name | log-analyze |
|---|---|
| description | Parse and analyze system and application logs. Use when the user says "find errors in logs", "analyze logs", "check journalctl", "what's in the logs", "debug from logs", or asks to investigate log files. |
| allowed-tools | Bash, Read, Grep |
# Log Analysis
Parse and analyze logs to identify errors, patterns, and issues.
## Instructions
- Identify log source (journalctl, file, application)
- Establish time range of interest
- Filter for relevant entries (errors, specific service)
- Identify patterns and root causes
- Summarize findings with evidence (see the sketch below for the full flow)
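A minimal sketch of that flow for a systemd service; the unit name and time window below are placeholders to substitute:

```bash
SERVICE=myapp           # placeholder unit name
SINCE="2 hours ago"     # placeholder time range of interest

# Filter: error-priority entries for the service in the window (messages only)
journalctl -u "$SERVICE" -p err --since "$SINCE" -o cat > /tmp/errors.txt

# Quantify: which messages repeat, and how often
sort /tmp/errors.txt | uniq -c | sort -rn | head -10

# Evidence for the summary: total error count
wc -l < /tmp/errors.txt
```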
## Common log sources

```
# Systemd journal
journalctl -u <service> --since "1 hour ago"
journalctl -p err --since today
journalctl -b    # current boot

# System logs
/var/log/syslog
/var/log/messages
/var/log/auth.log

# Application logs
/var/log/nginx/error.log
/var/log/apache2/error.log
/var/log/postgresql/
```
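journalctl handles time ranges natively via --since/--until; for plain log files, a time window can be approximated with string comparison. This is a sketch that assumes classic syslog timestamps ("Jan 15 14:05:01 ..."), which sort lexicographically within a single day; adjust the strings to whatever format the file actually uses.

```bash
# Keep only lines between 14:00 and 15:00 on Jan 15
awk '$0 >= "Jan 15 14:00" && $0 < "Jan 15 15:00"' /var/log/syslog
```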
## Journalctl patterns

```bash
# Errors only
journalctl -p err -b

# Specific service with context
journalctl -u nginx --since "2024-01-01" --until "2024-01-02"

# Follow live
journalctl -f -u myapp

# Kernel messages
journalctl -k

# JSON output for parsing
journalctl -o json -u myapp | jq .

# Disk usage
journalctl --disk-usage
```
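Building on the JSON output above, one way to summarize severity is to count entries per priority. PRIORITY is a standard journal field (0 = emerg through 7 = debug); jq is assumed to be installed, and "myapp" is a placeholder unit name.

```bash
# Count journal entries per syslog priority for the last hour
journalctl -u myapp --since "1 hour ago" -o json \
  | jq -r '.PRIORITY' | sort | uniq -c | sort -n
```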
## Analysis patterns

```bash
# Count errors by type
grep -oP 'ERROR: \K[^:]+' app.log | sort | uniq -c | sort -rn

# Find IPs with most errors
grep "error" access.log | grep -oP '\d+\.\d+\.\d+\.\d+' | sort | uniq -c | sort -rn

# Time distribution of errors
grep "ERROR" app.log | grep -oP '^\d{4}-\d{2}-\d{2} \d{2}' | uniq -c

# Errors around a specific time
grep -A5 -B5 "15:30:" error.log
```
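For web server access logs, status-code trends can be extracted the same way. The sketch below assumes the default "combined" access log format, where field 9 is the HTTP status and field 7 is the request path; adjust field numbers for custom formats.

```bash
# Top request paths returning 5xx responses
awk '$9 ~ /^5/ {print $7}' /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head
```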
## Common error patterns
| Pattern | Indicates |
|---|---|
| OOM, "Killed" | Out of memory |
| ENOSPC | Disk full |
| ECONNREFUSED | Service not running/listening |
| ETIMEDOUT | Network/firewall issue |
| Permission denied | File permissions or SELinux |
| "too many open files" | ulimit exhausted |
## Output format

```markdown
## Summary
[Brief description of what was found]

## Errors Found
- [timestamp] [error message] (occurred N times)

## Root Cause Analysis
[Explanation of likely cause]

## Recommendations
1. [Action to fix]
2. [Preventive measure]
```
## Rules
- MUST establish time range before analyzing
- MUST quantify error frequency (not just "found errors")
- MUST provide specific log excerpts as evidence
- Never expose sensitive data from logs (passwords, tokens); a redaction helper is sketched after these rules
- Always check for time correlation between errors
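A minimal redaction sketch for quoting excerpts safely, assuming GNU sed (for the case-insensitive `I` flag) and an illustrative, not exhaustive, list of secret-bearing keys:

```bash
# Mask values of common secret-bearing keys before quoting log lines
redact() {
  sed -E 's/((password|passwd|token|secret|api[_-]?key)[=: ]+)[^[:space:]]+/\1REDACTED/Ig'
}

grep "ERROR" app.log | redact
```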