tracing-root-causes

from doanchienthangdev

AI agent performs systematic root cause analysis using 5 Whys, Fishbone diagrams, and evidence-based investigation. Use when debugging, conducting post-mortems, or investigating incidents.

Updated Jan 8, 2026

When & Why to Use This Skill

This Claude skill facilitates systematic Root Cause Analysis (RCA) through structured frameworks like the 5 Whys and Fishbone (Ishikawa) diagrams. It empowers teams to move beyond immediate symptoms to identify technical, systemic, and process-oriented root causes, ensuring long-term resolution and preventing the recurrence of critical incidents.

Use Cases

  • Post-Mortem Investigations: Conducting deep-dive sessions after system outages or service degradations to document the timeline and cause chain.
  • Complex Debugging: Investigating intermittent or 'heisenbug' software defects by gathering evidence and mapping proximate causes to fundamental logic errors.
  • Operational Process Improvement: Identifying systemic gaps in testing, monitoring, or code review processes that allow defects to reach production.
  • Evidence-Based Reporting: Generating structured RCA reports with actionable recommendations for stakeholders and engineering teams.

Tracing Root Causes

Quick Start

  1. Identify Symptom - Document the observable problem
  2. Gather Evidence - Collect logs, metrics, and traces around the incident
  3. Apply 5 Whys - Ask "Why?" iteratively until a fundamental cause is found
  4. Map Categories - Use a Fishbone diagram to explore all cause categories
  5. Document Findings - Create an RCA report with action items

Features

Feature | Description | Guide
Cause Hierarchy | Symptom -> Proximate -> Root -> Systemic | Fix at deepest level possible
5 Whys | Iterative "Why?" questioning | Typically 5 iterations to root cause
Fishbone Diagram | Category-based cause exploration | Code, Data, Config, Infra, External, Process
Evidence Gathering | Logs, metrics, traces, reproduction | Timestamp, source, reliability rating
RCA Report | Structured documentation | Timeline, cause chain, action items
Systemic Factors | Why wasn't this caught earlier? | Testing, monitoring, process gaps
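
The Evidence Gathering row names three attributes per item: timestamp, source, and reliability rating. A minimal sketch of how such records might be captured and turned into a timeline follows. The `Evidence` class, the three-level `Reliability` scale, and `build_timeline` are hypothetical names chosen for illustration, and the sample observations are invented; the skill itself does not prescribe a data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import IntEnum


class Reliability(IntEnum):
    """Hypothetical three-level rating; the skill does not define a scale."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Evidence:
    timestamp: datetime
    source: str        # e.g. "app log", "metrics", "trace"
    observation: str
    reliability: Reliability


def build_timeline(items, minimum=Reliability.MEDIUM):
    """Keep evidence rated at or above `minimum`, sorted oldest first."""
    kept = [e for e in items if e.reliability >= minimum]
    return sorted(kept, key=lambda e: e.timestamp)


def at(hour, minute):
    """Helper for the invented sample data below (arbitrary date, UTC)."""
    return datetime(2024, 3, 1, hour, minute, tzinfo=timezone.utc)


# Invented sample data, for illustration only.
evidence = [
    Evidence(at(10, 5), "metrics", "Memory usage at 98%", Reliability.HIGH),
    Evidence(at(10, 7), "user report", "Site feels slow", Reliability.LOW),
    Evidence(at(10, 2), "app log", "Connection pool exhausted", Reliability.HIGH),
]

timeline = build_timeline(evidence)
for e in timeline:
    print(e.timestamp.isoformat(), e.source, e.observation)
```

Filtering by reliability before sorting keeps low-confidence reports out of the timeline without discarding them; passing `minimum=Reliability.LOW` brings them back when they are needed.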

Common Patterns

# 5 Whys Example
Problem: Website down for 2 hours

Why #1: Why down? -> Server out of memory
Why #2: Why out of memory? -> Connections unbounded
Why #3: Why unbounded? -> Not released after use
Why #4: Why not released? -> Early return skipped the cleanup code (no finally block)
Why #5: Why not caught? -> No test for cleanup path

Root Causes:
1. Technical: Missing cleanup execution
2. Systemic: Missing test coverage
3. Process: Code review missed pattern
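
One way to keep a 5 Whys chain reviewable is to record each question and answer as data instead of free text. The sketch below replays a condensed version of the example above. `WhyChain` and its methods are hypothetical names for illustration, not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class WhyChain:
    """An ordered list of (question, answer) pairs for one problem."""
    problem: str
    steps: list = field(default_factory=list)

    def ask(self, question, answer):
        """Record one "Why?" iteration and return self to allow chaining."""
        self.steps.append((question, answer))
        return self

    def deepest_cause(self):
        """Return the answer to the last "Why?", or None if nothing was asked."""
        return self.steps[-1][1] if self.steps else None

    def render(self):
        """Format the chain in the same layout as the example above."""
        lines = [f"Problem: {self.problem}"]
        for i, (question, answer) in enumerate(self.steps, start=1):
            lines.append(f"Why #{i}: {question} -> {answer}")
        return "\n".join(lines)


chain = (
    WhyChain("Website down for 2 hours")
    .ask("Why down?", "Server out of memory")
    .ask("Why out of memory?", "Connections unbounded")
    .ask("Why unbounded?", "Not released after use")
    .ask("Why not released?", "Early return skipped cleanup")
    .ask("Why not caught?", "No test for cleanup path")
)

print(chain.render())
print("Deepest cause:", chain.deepest_cause())
```

The chain does not enforce exactly five steps, since five is described as typical and not as a hard limit.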

# Fishbone Categories (Software)
CODE:       Logic errors, race conditions, memory leaks
DATA:       Invalid input, corrupt data, schema mismatch
CONFIG:     Wrong settings, env mismatch, feature flags
INFRA:      Resource exhaustion, network, scaling
EXTERNAL:   Third-party APIs, dependencies, attacks
PROCESS:    Missing tests, review gaps, monitoring blind spots
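
The six categories above can double as a checklist: grouping candidate causes by category makes it visible which branches of the fishbone have not been explored yet. A minimal sketch, assuming candidate causes arrive as `(category, cause)` pairs; `build_fishbone` and `unexplored` are hypothetical helper names.

```python
# Category names taken from the list above.
CATEGORIES = ("CODE", "DATA", "CONFIG", "INFRA", "EXTERNAL", "PROCESS")


def build_fishbone(candidates):
    """Group (category, cause) pairs under the six categories."""
    bones = {name: [] for name in CATEGORIES}
    for category, cause in candidates:
        key = category.upper()
        if key not in bones:
            raise ValueError(f"Unknown category: {category}")
        bones[key].append(cause)
    return bones


def unexplored(bones):
    """Return categories that have no candidate cause recorded yet."""
    return [name for name, causes in bones.items() if not causes]


# Candidate causes loosely based on the 5 Whys example; illustrative only.
bones = build_fishbone([
    ("code", "Connections not released on early return"),
    ("infra", "Server ran out of memory"),
    ("process", "No test for the cleanup path"),
])

print(unexplored(bones))  # ['DATA', 'CONFIG', 'EXTERNAL']
```

An empty category is not proof that it played no role, only a prompt to check it before closing the investigation.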

# Cause Hierarchy
SYMPTOM: "App crashed"
    |
PROXIMATE CAUSE: "Out of memory"
    |
CONTRIBUTING FACTOR: "No memory limits"
    |
ROOT CAUSE: "Memory leak in event handlers"
    |
SYSTEMIC FACTOR: "No memory monitoring"

PRINCIPLE: Fix symptoms = problem returns
           Fix root cause = this problem prevented
           Fix systemic = class of problems prevented
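
The hierarchy can be read as an ordering, with "fix at the deepest level possible" meaning "pick the highest-ranked level for which a cause has been identified". The sketch below encodes that reading. The `Level` enum, its numeric ranks, and `deepest_fix_target` are assumptions made for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels from the hierarchy above; the numeric ranks are an assumption."""
    SYMPTOM = 1
    PROXIMATE = 2
    CONTRIBUTING = 3
    ROOT = 4
    SYSTEMIC = 5


def deepest_fix_target(findings):
    """Return the (level, cause) pair for the deepest level identified so far."""
    if not findings:
        raise ValueError("No findings recorded")
    level = max(findings)
    return level, findings[level]


# The example hierarchy from above, keyed by level.
findings = {
    Level.SYMPTOM: "App crashed",
    Level.PROXIMATE: "Out of memory",
    Level.CONTRIBUTING: "No memory limits",
    Level.ROOT: "Memory leak in event handlers",
    Level.SYSTEMIC: "No memory monitoring",
}

level, cause = deepest_fix_target(findings)
print(level.name, "->", cause)  # SYSTEMIC -> No memory monitoring

# Early in an investigation only the shallow levels may be known.
partial = {Level.SYMPTOM: "App crashed", Level.PROXIMATE: "Out of memory"}
print(deepest_fix_target(partial)[0].name)  # PROXIMATE
```

A result of `PROXIMATE` or shallower is a signal that the investigation is not finished, not a recommendation to fix at that level.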

Best Practices

Do | Avoid
Gather evidence before forming hypotheses | Jumping to conclusions
Use structured methods consistently | Ad-hoc investigation
Involve multiple perspectives | Single viewpoint
Look for systemic factors | Just fixing immediate cause
Create actionable recommendations | Vague "be more careful"
Verify fixes prevent recurrence | Assuming fix works
Share learnings across team | Siloing knowledge
Investigate near-misses too | Only investigating failures

Related Skills

  • debugging-systematically - Four-phase debugging process
  • solving-problems - 5-phase problem-solving framework
  • thinking-sequentially - Numbered thought chains
  • verifying-before-completion - Ensure fix completeness