# smith-validation
Hypothesis testing, root cause analysis, and debugging techniques. Use when debugging, testing hypotheses, validating solutions, proving correctness, or performing root cause analysis on failures.
## When & Why to Use This Skill
This Claude skill provides a rigorous, scientific framework for software debugging and root cause analysis. It integrates advanced methodologies such as hypothesis testing, the 5 Whys, and Delta Debugging to help developers move beyond surface-level symptoms to find actionable systemic causes. By leveraging techniques like Git Bisect and Spectrum-Based Fault Localization (SBFL), it streamlines the process of identifying, isolating, and validating fixes for complex software failures.
## Use Cases
- Root Cause Analysis (RCA): Applying the '5 Whys' technique to drill down from a technical symptom to the underlying systemic or configuration failure.
- Regression Hunting: Using automated Git Bisect to perform a binary search through commit history to identify the exact code change that introduced a bug.
- Hypothesis Testing: Utilizing 'Strong Inference' to devise and test multiple competing hypotheses simultaneously, rapidly narrowing down the cause of intermittent failures.
- Input Minimization: Implementing Delta Debugging to reduce large, crashing datasets or complex configurations into the smallest possible reproducible test case.
- Logic Verification: Using Rubber Duck debugging and the Feynman Technique to explain code step-by-step, revealing logic gaps and hidden defects through simplified explanation.
## Verification Techniques
- Scope: Hypothesis testing, root cause analysis, and verification
- Load if: Bug reported, test failure, proving correctness, root cause analysis
- Prerequisites: @smith-guidance/SKILL.md
Foundation: Based on the Study phase of Deming's PDSA cycle and Popper's falsification principle - understanding WHY something works or doesn't, not just IF it works.
When to use: Debugging, testing hypotheses, validating solutions, proving correctness.
## Hypothesis Testing

### Strong Inference
Rapid progress through multiple competing hypotheses:
- Devise multiple hypotheses - Not just one, but several alternatives
- Design crucial experiments - Tests that exclude one or more hypotheses
- Execute experiments - Run tests to eliminate hypotheses
- Iterate - Refine remaining hypotheses, repeat
Key insight: Science advances fastest when we actively try to disprove hypotheses, not confirm them.
For debugging:
- Bug: "Login fails intermittently"
- H1: Session storage full
- H2: Race condition in token refresh
- H3: Network timeout on auth server
- Crucial test: Check if failures correlate with session count (tests H1)
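The elimination loop above can be sketched in a few lines. The hypothesis strings and experiment predicates are illustrative stand-ins for real observations about the intermittent-login bug:

```python
# Sketch of a Strong Inference loop. Each "crucial experiment" is a predicate
# that returns False for a hypothesis it rules out; hypotheses surviving
# every experiment remain candidates for the next round.

def strong_inference(hypotheses, experiments):
    """Eliminate competing hypotheses; return the survivors."""
    surviving = set(hypotheses)
    for experiment in experiments:
        surviving = {h for h in surviving if experiment(h)}
        if len(surviving) <= 1:
            break  # one (or zero) hypotheses left: stop experimenting
    return surviving

# Toy data for the intermittent-login bug (illustrative only):
hypotheses = ["session storage full", "race in token refresh", "auth timeout"]

# "Failures do NOT correlate with session count" eliminates H1;
# "failures do NOT cluster around network latency spikes" eliminates H3.
experiments = [
    lambda h: h != "session storage full",  # session-count correlation test
    lambda h: h != "auth timeout",          # latency correlation test
]

print(strong_inference(hypotheses, experiments))  # {'race in token refresh'}
```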
### Falsification Principle (Popper)
A theory is scientific only if it can be proven false:
- Design tests that could disprove your hypothesis
- Seek evidence that contradicts, not confirms
- One counterexample disproves a universal claim
Anti-pattern: Only running tests you expect to pass.
Good practice: Actively try to break your own code.
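Falsification in practice means searching for inputs that break a claimed invariant instead of confirming it on friendly inputs. A minimal harness, with a deliberately false claim (that `str.lower()` and `str.casefold()` always agree):

```python
# Falsification harness: hunt for a counterexample to a universal claim.
# One counterexample is enough to disprove it.

def find_counterexample(claim, candidates):
    """Return the first input that falsifies the claim, or None."""
    for x in candidates:
        if not claim(x):
            return x
    return None

# Claim: lower() and casefold() agree for all strings. They don't:
# casefold() maps the German sharp s to "ss", lower() leaves it as-is.
claim = lambda s: s.lower() == s.casefold()
candidates = ["Hello", "WORLD", "Straße", "ÅNGSTRÖM"]

print(find_counterexample(claim, candidates))  # Straße
```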
## Root Cause Analysis

### 5 Whys (Toyota)
Root cause analysis through iterative questioning:
- State the problem
- Ask "Why did this happen?"
- Repeat for each answer (typically 5 times)
- Stop when you reach an actionable root cause
Example:
- Bug: Users logged out unexpectedly
- Why? Session expired
- Why? Token refresh failed
- Why? Refresh endpoint returned 401
- Why? Clock skew between servers
- Root cause: NTP not configured on auth server
Caution: Don't stop at symptoms. "Why?" should reach systemic causes.
## Explanation Techniques

### Rubber Duck Debugging
Explain code line-by-line aloud; when explanation doesn't match code, you've found the bug.
For AI agents: When stuck, explain the problem step-by-step before proposing solutions.
### Feynman Technique
Explain simply to reveal gaps: Choose concept → Explain to child → Identify gaps → Review.
If you can't explain it simply, you don't understand it well enough.
## Systematic Isolation

### Delta Debugging
Minimize failing input: split in half, test each, recurse on failing half until minimal.
Use when: Large input crashes, many files break tests, config changes fail.
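The halving recipe above can be sketched directly. Note this is a simplification: Zeller's full ddmin algorithm also handles failures that require elements from both halves, while this sketch simply stops there. The crash predicate stands in for "run the program and check whether it still fails":

```python
# Simplified Delta Debugging: split the failing input in half, keep whichever
# half still reproduces the failure, and recurse until no half fails alone.

def minimize(data, fails):
    """Shrink `data` while the predicate `fails` still holds for it."""
    while len(data) > 1:
        mid = len(data) // 2
        left, right = data[:mid], data[mid:]
        if fails(left):
            data = left
        elif fails(right):
            data = right
        else:
            break  # failure needs elements from both halves: stop here
    return data

# Toy crash predicate (hypothetical): the input "crashes" iff it contains 7.
crashes = lambda chunk: 7 in chunk

print(minimize(list(range(16)), crashes))  # [7]
```

Sixteen elements shrink to one in four tests per level rather than one linear scan per element, which is why this pays off on large crashing datasets.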
### Scientific Debugging (TRAFFIC)
Track → Reproduce → Automate → Find origins → Focus → Isolate → Correct
Work backward: Failure → Propagation → Infection → Defect.
## Version Control Debugging

### Git Bisect
Binary search through commit history:
Usage:

```shell
git bisect start
git bisect bad                # mark the current commit as broken
git bisect good abc1234       # mark a known-good commit
# git now checks out a midpoint commit; test it, then mark it:
git bisect good               # or `git bisect bad`
git bisect reset              # done: return to the original HEAD
```

Mark the current commit as bad and a known-good commit as good; git then checks out midpoints for you to test and mark (good/bad) until it reports the first bad commit.
Automated:

```shell
git bisect run ./test.sh
```

Exit codes: 0 = good, 1-127 = bad (except 125), 125 = skip an untestable commit.
Complexity: O(log n) - roughly 7 tests cover a 100-commit range.
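That exit-code contract is the whole interface a bisect test script must honor. A sketch of the mapping as a small helper (the build/test commands in the trailing comment are illustrative, not part of git):

```python
# Exit-code contract for `git bisect run`: the script's exit status tells
# bisect how to mark the checked-out commit.
#   0        -> good
#   1-127    -> bad (except 125)
#   125      -> cannot test this commit (e.g. broken build): skip it

def bisect_exit_code(build_ok: bool, test_ok: bool) -> int:
    """Translate build/test outcomes into git bisect run's exit codes."""
    if not build_ok:
        return 125              # untestable commit: ask bisect to skip
    return 0 if test_ok else 1  # 0 = good, 1 = bad

# A real script would end with something like (commands are illustrative):
#   import subprocess, sys
#   build_ok = subprocess.run(["make", "-s"]).returncode == 0
#   test_ok  = build_ok and subprocess.run(["pytest", "-q"]).returncode == 0
#   sys.exit(bisect_exit_code(build_ok, test_ok))
```

Returning 125 for broken builds matters: marking an unbuildable commit "bad" would mislead the binary search.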
When to use:
- Regression appeared, unknown when
- Automated test can detect the bug
- Need to find exact commit that broke something
## Coverage-Based Localization

### Spectrum-Based Fault Localization (SBFL)
Use test coverage data to locate bugs:
Concept: Statements executed by failing tests but not passing tests are more suspicious.
Ochiai Formula (most effective):

```
suspiciousness(s) = failed(s) / sqrt(total_failed * (failed(s) + passed(s)))
```
Practical application:
- Run test suite with coverage
- Note which tests fail
- Rank statements by how often they appear in failing vs passing tests
- Inspect highest-ranked statements first
For AI agents: When multiple tests fail, identify code paths common to failures but not successes.
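The Ochiai ranking can be computed directly from per-statement coverage counts. The statement labels and counts below are toy data, not output of any real coverage tool:

```python
import math

# Ochiai suspiciousness. For a statement s: failed(s) is the number of
# failing tests that execute s, passed(s) the number of passing tests that
# do, and total_failed the number of failing tests in the whole suite.

def ochiai(failed_s, passed_s, total_failed):
    denom = math.sqrt(total_failed * (failed_s + passed_s))
    return failed_s / denom if denom else 0.0

# Toy coverage data: a suite with 2 failing tests.
coverage = {
    "line 10": (2, 0),  # executed by both failing tests, no passing tests
    "line 11": (2, 5),  # executed by failing AND many passing tests
    "line 12": (0, 5),  # only passing tests reach it
}
ranking = sorted(coverage, key=lambda s: ochiai(*coverage[s], 2), reverse=True)
print(ranking)  # most suspicious first: ['line 10', 'line 11', 'line 12']
```

A statement covered only by failing tests scores 1.0; one also exercised by many passing tests is discounted, which matches the intuition stated above.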
## ACTION (Recency Zone)
When debugging or validating:
- Use Strong Inference: devise multiple hypotheses before testing
- Apply 5 Whys to find root cause, not symptoms
- Use Git Bisect for regressions (binary search ~7 commits for 100-commit range)
- Run tests with coverage; inspect code paths common to failures
## Claude Code Plugin Integration
When pr-review-toolkit is available:
- silent-failure-hunter agent: Detects silent failures, inadequate error handling
- Analyzes catch blocks, fallback behavior, missing logging
- Trigger: "Check for silent failures" or use Task tool
## Ralph Loop Integration
Debugging = Ralph iteration: hypothesis → test → eliminate → iterate until <promise>ROOT CAUSE FOUND</promise>.
See @smith-ralph/SKILL.md for full patterns.
- @smith-guidance/SKILL.md - Anti-sycophancy, HHH framework, exploration workflow
- @smith-analysis/SKILL.md - Reasoning patterns, problem decomposition
- @smith-clarity/SKILL.md - Cognitive guards, logic fallacies