smith-validation

from tianjianjiang

Hypothesis testing, root cause analysis, and debugging techniques. Use when debugging, testing hypotheses, validating solutions, proving correctness, or performing root cause analysis on failures.


When & Why to Use This Skill

This Claude skill provides a rigorous, scientific framework for software debugging and root cause analysis. It integrates advanced methodologies such as hypothesis testing, the 5 Whys, and Delta Debugging to help developers move beyond surface-level symptoms to find actionable systemic causes. By leveraging techniques like Git Bisect and Spectrum-Based Fault Localization (SBFL), it streamlines the process of identifying, isolating, and validating fixes for complex software failures.

Use Cases

  • Root Cause Analysis (RCA): Applying the '5 Whys' technique to drill down from a technical symptom to the underlying systemic or configuration failure.
  • Regression Hunting: Using automated Git Bisect to perform a binary search through commit history to identify the exact code change that introduced a bug.
  • Hypothesis Testing: Utilizing 'Strong Inference' to devise and test multiple competing hypotheses simultaneously, rapidly narrowing down the cause of intermittent failures.
  • Input Minimization: Implementing Delta Debugging to reduce large, crashing datasets or complex configurations into the smallest possible reproducible test case.
  • Logic Verification: Using Rubber Duck debugging and the Feynman Technique to explain code step-by-step, revealing logic gaps and hidden defects through simplified explanation.

Verification Techniques

  • Scope: Hypothesis testing, root cause analysis, and verification
  • Load if: Bug reported, test failure, proving correctness, root cause analysis
  • Prerequisites: @smith-guidance/SKILL.md

Foundation: Based on the Study phase of Deming's PDSA cycle and Popper's falsification principle - understanding WHY something works or doesn't, not just IF it works.

When to use: Debugging, testing hypotheses, validating solutions, proving correctness.

Hypothesis Testing

Strong Inference

Rapid progress through multiple competing hypotheses:

  1. Devise multiple hypotheses - Not just one, but several alternatives
  2. Design crucial experiments - Tests that exclude one or more hypotheses
  3. Execute experiments - Run tests to eliminate hypotheses
  4. Iterate - Refine remaining hypotheses, repeat

Key insight: Science advances fastest when we actively try to disprove hypotheses, not confirm them.

For debugging:

  • Bug: "Login fails intermittently"
  • H1: Session storage full
  • H2: Race condition in token refresh
  • H3: Network timeout on auth server
  • Crucial test: Check if failures correlate with session count (tests H1)
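
As a sketch of that crucial test (the observation log and its numbers are hypothetical), correlate session count with login outcome:

# Hypothetical log of (active_session_count, login_succeeded) samples
# gathered around the intermittent failures.
observations = [(12, True), (9480, False), (31, True), (9512, False), (55, True)]

failures = [n for n, ok in observations if not ok]
successes = [n for n, ok in observations if ok]
print(f"mean sessions at failure: {sum(failures) / len(failures):.0f}")
print(f"mean sessions at success: {sum(successes) / len(successes):.0f}")
# A large gap supports H1 (session storage full); no gap eliminates H1
# and shifts attention to H2/H3.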

Falsification Principle (Popper)

A theory is scientific only if it can be proven false:

  • Design tests that could disprove your hypothesis
  • Seek evidence that contradicts, not confirms
  • One counterexample disproves a universal claim

  • Anti-pattern: Only running tests you expect to pass
  • Good practice: Actively try to break your own code
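
One concrete way to hunt for counterexamples is property-based testing; a minimal sketch using Python's hypothesis library (dedupe_sorted is a made-up subject under test):

from hypothesis import given, strategies as st

def dedupe_sorted(xs):
    # Made-up subject under test: sort and drop duplicates.
    return sorted(set(xs))

# State a universal claim; hypothesis searches for an input that falsifies it.
@given(st.lists(st.integers()))
def test_output_is_sorted_and_unique(xs):
    out = dedupe_sorted(xs)
    assert out == sorted(out)
    assert len(out) == len(set(out))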

Root Cause Analysis

5 Whys (Toyota)

Root cause analysis through iterative questioning:

  1. State the problem
  2. Ask "Why did this happen?"
  3. Repeat for each answer (typically 5 times)
  4. Stop when you reach an actionable root cause

Example:

  • Bug: Users logged out unexpectedly
  • Why? Session expired
  • Why? Token refresh failed
  • Why? Refresh endpoint returned 401
  • Why? Clock skew between servers
  • Root cause: NTP not configured on auth server

Caution: Don't stop at symptoms. "Why?" should reach systemic causes.

Explanation Techniques

Rubber Duck Debugging

Explain code line-by-line aloud; when explanation doesn't match code, you've found the bug.

For AI agents: When stuck, explain the problem step-by-step before proposing solutions.

Feynman Technique

Explain simply to reveal gaps: Choose concept → Explain to child → Identify gaps → Review.

If you can't explain it simply, you don't understand it well enough.

Systematic Isolation

Delta Debugging

Minimize failing input: split in half, test each, recurse on failing half until minimal.

Use when: Large input crashes, many files break tests, config changes fail.
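
A minimal sketch of the halving step (real ddmin also raises granularity when neither half fails alone; still_fails is a caller-supplied predicate that reruns the test):

def minimize(failing_input, still_fails):
    # Repeatedly discard the half that isn't needed to reproduce the failure.
    while len(failing_input) > 1:
        mid = len(failing_input) // 2
        first, second = failing_input[:mid], failing_input[mid:]
        if still_fails(first):
            failing_input = first
        elif still_fails(second):
            failing_input = second
        else:
            # Failure needs pieces of both halves; full ddmin would split
            # into smaller chunks here instead of giving up.
            break
    return failing_input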

Scientific Debugging (TRAFFIC)

Track → Reproduce → Automate → Find origins → Focus → Isolate → Correct

Work backward: Failure → Propagation → Infection → Defect.

Version Control Debugging

Git Bisect

Binary search through commit history:

Usage:

git bisect start
git bisect bad                  # mark the current checkout as bad
git bisect good abc1234         # mark a known-good commit
git bisect good                 # (or bad) after testing each checkout
git bisect reset                # done: return to the original HEAD

Mark the current commit as bad and a known-good commit as good; Git then checks out the midpoint. Test each checkout, mark it good or bad, and repeat until the first bad commit is identified.

Automated:

git bisect run ./test.sh

Exit codes: 0 = good, 125 = skip this commit, any other code from 1-127 = bad
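
For example, a hypothetical probe script honoring those exit codes (the make and pytest invocations are assumptions about the project):

#!/usr/bin/env python3
import subprocess
import sys

# Exit 125 so bisect skips commits that don't even build.
if subprocess.run(["make", "-s"]).returncode != 0:
    sys.exit(125)

# Exit 0 (good) if the regression test passes, 1 (bad) otherwise.
test = subprocess.run(["pytest", "tests/test_login.py", "-q"])
sys.exit(0 if test.returncode == 0 else 1)

Invoked as: git bisect run python3 probe.py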

Complexity: O(log n) - about 7 tests for a 100-commit range

When to use:

  • Regression appeared, unknown when
  • Automated test can detect the bug
  • Need to find exact commit that broke something

Coverage-Based Localization

Spectrum-Based Fault Localization (SBFL)

Use test coverage data to locate bugs:

Concept: Statements executed by failing tests but not passing tests are more suspicious.

Ochiai Formula (among the most effective in empirical studies):

suspiciousness(s) = failed(s) / sqrt(total_failed * (failed(s) + passed(s)))
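
A minimal sketch of scoring and ranking with this formula (the per-statement coverage counters failed_cov and passed_cov are assumed inputs):

from math import sqrt

def rank_by_ochiai(failed_cov, passed_cov, total_failed):
    # failed_cov[s] and passed_cov.get(s, 0): how many failing / passing
    # tests executed statement s.
    scores = {}
    for s, ef in failed_cov.items():
        denom = sqrt(total_failed * (ef + passed_cov.get(s, 0)))
        scores[s] = ef / denom if denom else 0.0
    # Highest suspiciousness first: inspect these statements first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)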

Practical application:

  1. Run test suite with coverage
  2. Note which tests fail
  3. Rank statements by how often they appear in failing vs passing tests
  4. Inspect highest-ranked statements first

For AI agents: When multiple tests fail, identify code paths common to failures but not successes.

ACTION (Recency Zone)

When debugging or validating:

  1. Use Strong Inference: devise multiple hypotheses before testing
  2. Apply 5 Whys to find root cause, not symptoms
  3. Use Git Bisect for regressions (binary search: ~7 tests for a 100-commit range)
  4. Run tests with coverage; inspect code paths common to failures

Claude Code Plugin Integration

When pr-review-toolkit is available:

  • silent-failure-hunter agent: Detects silent failures, inadequate error handling
  • Analyzes catch blocks, fallback behavior, missing logging
  • Trigger: "Check for silent failures" or use Task tool

Ralph Loop Integration

Debugging = Ralph iteration: hypothesis → test → eliminate → iterate until <promise>ROOT CAUSE FOUND</promise>.

See @smith-ralph/SKILL.md for full patterns.

Related Skills

  • @smith-guidance/SKILL.md - Anti-sycophancy, HHH framework, exploration workflow
  • @smith-analysis/SKILL.md - Reasoning patterns, problem decomposition
  • @smith-clarity/SKILL.md - Cognitive guards, logic fallacies