# fact-checking
Use when reviewing code changes, auditing documentation accuracy, validating technical claims before merge, or user says "verify claims", "factcheck", "audit documentation", "validate comments", "are these claims accurate".
## When & Why to Use This Skill
This Claude skill acts as a rigorous technical auditor and scientific skeptic, designed to validate technical claims within codebases and documentation. It systematically extracts claims from comments, docstrings, and PRs, providing evidence-backed verdicts to ensure technical accuracy, prevent documentation rot, and maintain high-quality technical standards.
## Use Cases
- Pull Request Auditing: Automatically verify technical assertions such as 'thread-safe', 'O(n) complexity', or 'XSS-safe' during the code review process.
- Documentation Accuracy Checks: Audit README files, API documentation, and technical guides to ensure code examples and claims match the actual implementation.
- Legacy Code Refactoring: Identify 'stale' or misleading comments in older codebases that no longer reflect the current logic or architectural state.
- Compliance and Quality Assurance: Validate technical claims against external standards (like RFCs or security benchmarks) to ensure documentation meets regulatory or organizational requirements.
- Automated Glossary Generation: Extract and verify key technical terms and facts from configuration files to create centralized knowledge bases for AI agents.
## Invariant Principles
- Claims are hypotheses - Every claim requires empirical evidence before verdict
- Evidence before verdict - No verdict without traceable, citable proof
- User controls scope - User selects scope and approves all fixes
- Deduplicate findings - Check AgentDB before verifying; store after
- Learn from trajectories - Store verification trajectories in ReasoningBank
## Inputs/Outputs
| Input | Required | Description |
|---|---|---|
| scope | Yes | branch changes, uncommitted, or full repo |
| modes | No | Missing Facts, Extraneous Info, Clarity (default: all) |
| autonomous | No | Skip prompts, use defaults |

| Output | Type | Description |
|---|---|---|
| verification_report | Inline | Summary, findings, bibliography |
| implementation_plan | Inline | Fixes for refuted/stale claims |
| glossary | Inline | Key facts (Clarity Mode) |
| state_checkpoint | File | .fact-checking/state.json |
## Workflow
### Phase 0: Configuration
Present three optional modes (default: all enabled):
- Missing Facts Detection - gaps where claims lack critical context
- Extraneous Info Detection - redundant/LLM-style over-commenting
- Clarity Mode - generate glossaries for AI config files
If autonomous mode is detected ("Mode: AUTONOMOUS"), enable all modes automatically.
### Phase 1: Scope Selection
| Option | Method |
|---|---|
| A. Branch changes | git diff $(git merge-base HEAD main)...HEAD --name-only + unstaged |
| B. Uncommitted | git diff --name-only + git diff --cached --name-only |
| C. Full repo | All code/doc patterns |
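Under the hood, each scope option maps to the git invocations in the table above. A minimal sketch (the function name and return shape are illustrative, not part of the skill):

```typescript
type Scope = "branch" | "uncommitted" | "full";

// Return the git commands used to enumerate candidate files for a scope.
function fileListCommands(scope: Scope, baseBranch: string = "main"): string[] {
  switch (scope) {
    case "branch":
      // Changes since the merge-base with the base branch, plus unstaged edits.
      return [
        `git diff $(git merge-base HEAD ${baseBranch})...HEAD --name-only`,
        "git diff --name-only",
      ];
    case "uncommitted":
      // Unstaged and staged changes.
      return ["git diff --name-only", "git diff --cached --name-only"];
    case "full":
      // All tracked files; code/doc patterns are filtered downstream.
      return ["git ls-files"];
  }
}
```

The full-repo option uses `git ls-files` here as a stand-in for "all code/doc patterns"; any file-pattern walker would do.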
### Phase 2: Claim Extraction
Sources:
| Source | Patterns |
|---|---|
| Comments | //, #, /* */, """, ''', <!-- -->, -- |
| Docstrings | Function/class/module documentation |
| Markdown | README, CHANGELOG, docs/*.md |
| Commits | git log --format=%B for branch commits |
| PR descriptions | Via gh pr view |
| Naming | validateX, safeX, isX, ensureX |
Categories:
| Category | Examples | Agent |
|---|---|---|
| Technical | "O(n log n)", "matches RFC 5322", "handles UTF-8" | CorrectnessAgent |
| Behavior | "returns null when...", "throws if...", "never blocks" | CorrectnessAgent |
| Security | "sanitized", "XSS-safe", "bcrypt hashed", "no injection" | SecurityAgent |
| Concurrency | "thread-safe", "reentrant", "atomic", "lock-free" | ConcurrencyAgent |
| Performance | "O(n)", "cached 5m", "lazy-loaded", benchmarks | PerformanceAgent |
| Invariant/state | "never null after init", "always sorted", "immutable" | CorrectnessAgent |
| Side effects | "pure function", "idempotent", "no side effects" | CorrectnessAgent |
| Dependencies | "requires Node 18+", "compatible with Postgres 14" | ConfigurationAgent |
| Configuration | "defaults to 30s", "env var X controls Y" | ConfigurationAgent |
| Historical | "workaround for Chrome bug", "fixes #123" | HistoricalAgent |
| TODO/FIXME | Referenced issues, "temporary" hacks | HistoricalAgent |
| Examples | Code examples in docs/README | DocumentationAgent |
| Test coverage | "covered by tests in test_foo.py" | DocumentationAgent |
| External refs | URLs, RFC citations, spec references | DocumentationAgent |
Also flag: Ambiguous, Misleading, Jargon-heavy
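A keyword-based extractor over the comment sources above might look like the following sketch. The category regexes are illustrative assumptions drawn from the examples in the table; real extraction would also parse docstrings, markdown, commits, and PR descriptions:

```typescript
interface Claim { line: number; text: string; category: string; }

// Keyword patterns per category; examples mirror the table above.
const CATEGORY_KEYWORDS: [string, RegExp][] = [
  ["security", /\b(sanitized|xss-safe|bcrypt|injection)\b/i],
  ["concurrency", /\b(thread-safe|reentrant|atomic|lock-free)\b/i],
  ["performance", /\bO\([^)]+\)|\bcached\b|\blazy-loaded\b/i],
  ["behavior", /\b(returns|throws|never blocks)\b/i],
];

// Scan '//' and '#' single-line comments for claim-like statements.
function extractClaims(source: string): Claim[] {
  const claims: Claim[] = [];
  source.split("\n").forEach((raw, i) => {
    const m = raw.match(/(?:\/\/|#)\s*(.+)$/);
    if (!m) return;
    for (const [category, re] of CATEGORY_KEYWORDS) {
      if (re.test(m[1])) {
        claims.push({ line: i + 1, text: m[1].trim(), category });
        break; // first matching category wins in this sketch
      }
    }
  });
  return claims;
}
```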
### Phase 3: Triage
Display grouped by category with depth recommendations:
```
## Claims Found: 23
### Security (4 claims)
1. [MEDIUM] src/auth.ts:34 - "passwords hashed with bcrypt"
2. [DEEP] src/db.ts:89 - "SQL injection safe via parameterization"
...
Adjust depths? (Enter numbers to change, or 'continue')
```
Depth Definitions:
| Depth | Approach | When to Use |
|---|---|---|
| Shallow | Read code, reason about behavior | Simple, self-evident claims |
| Medium | Trace execution paths, analyze control flow | Most claims |
| Deep | Execute tests, run benchmarks, instrument | Critical/numeric claims |
ARH pattern for responses: DIRECT_ANSWER (accept, proceed), RESEARCH_REQUEST (dispatch analysis), UNKNOWN (analyze, regenerate), SKIP (use defaults).
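The depth recommendation could be a simple heuristic like this sketch; the keyword choices are assumptions for illustration, not part of the skill spec:

```typescript
type Depth = "shallow" | "medium" | "deep";

// Recommend a verification depth for a claim, per the table above.
function recommendDepth(category: string, text: string): Depth {
  // Critical or numeric claims (complexity bounds, injection safety,
  // timing figures) warrant execution-level verification.
  if (/\bO\([^)]+\)|injection|\d+\s*(ms|s|%)/.test(text)) return "deep";
  // Historical notes are usually self-evident from git history.
  if (category === "historical") return "shallow";
  return "medium"; // default: trace execution paths
}
```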
### Phase 4: Parallel Verification
```typescript
// Before verifying: check AgentDB for an existing finding
const existing = await agentdb.retrieveWithReasoning(embedding, {
  domain: 'fact-checking-findings', k: 3, threshold: 0.92
});
if (existing.memories[0]?.similarity > 0.92) return existing.memories[0].pattern;

// After verifying: store the new finding
await agentdb.insertPattern({
  type: 'verification-finding',
  domain: 'fact-checking-findings',
  pattern_data: { claim, location, verdict, evidence, sources }
});
```
Spawn category agents via swarm-orchestration (hierarchical topology):
- SecurityAgent, CorrectnessAgent, PerformanceAgent
- ConcurrencyAgent, DocumentationAgent, HistoricalAgent, ConfigurationAgent
### Phase 5: Verdicts
| Verdict | Meaning | Evidence Required |
|---|---|---|
| Verified | Claim is accurate | test output, code trace, docs, benchmark |
| Refuted | Claim is false | failing test, contradicting code |
| Incomplete | True but missing context | base verified + missing elements |
| Inconclusive | Cannot determine | document attempts, why insufficient |
| Ambiguous | Wording unclear | multiple interpretations explained |
| Misleading | Technically true, implies falsehood | what reader assumes vs reality |
| Jargon-heavy | Too technical for audience | unexplained terms, accessible version |
| Stale | Was true, no longer applies | when true, what changed, current state |
| Extraneous | Unnecessary/redundant | value analysis shows no added info |
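One way to enforce "evidence before verdict" is to encode the taxonomy above in types, as in this hypothetical sketch (the `Finding` field names are illustrative):

```typescript
// The nine verdicts from the table above.
type Verdict =
  | "verified" | "refuted" | "incomplete" | "inconclusive"
  | "ambiguous" | "misleading" | "jargon-heavy" | "stale" | "extraneous";

interface Finding {
  claim: string;
  location: string;   // file:line
  verdict: Verdict;
  evidence: string[]; // traceable, citable proof
}

// "Evidence before verdict": refuse to construct a finding without proof.
function makeFinding(claim: string, location: string, verdict: Verdict, evidence: string[]): Finding {
  if (evidence.length === 0) {
    throw new Error(`no evidence for verdict '${verdict}' on: ${claim}`);
  }
  return { claim, location, verdict, evidence };
}
```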
### Phase 6: Report
Sections: Header, Summary, Findings by Category, Bibliography, Implementation Plan
Bibliography Formats:
| Type | Format |
|---|---|
| Code trace | file:lines - finding |
| Test | command - result |
| Web source | Title - URL - "excerpt" |
| Git history | commit/issue - finding |
| Documentation | Docs: source section - URL |
| Benchmark | Benchmark: method - results |
| Paper/RFC | Citation - section - URL |
### Phase 6.5: Clarity Mode (if enabled)
Generate glossaries/key facts from verified claims (confidence > 0.7).
Targets: CLAUDE.md, GEMINI.md, AGENTS.md, *_AGENT.md, *_AI.md
Glossary Entry: - **[Term]**: [1-2 sentence definition]. [Usage context.]
Key Fact Categories: Architecture, Behavior, Integration, Error Handling, Performance
Update existing sections or append before --- separators.
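Rendering glossary entries could look like this sketch; only the > 0.7 confidence threshold and the entry format come from the skill, while the `VerifiedFact` shape is an assumption:

```typescript
interface VerifiedFact {
  term: string;
  definition: string; // 1-2 sentence definition
  context: string;    // usage context
  confidence: number; // verification confidence, 0..1
}

// Render glossary entries in the "- **[Term]**: ..." format above,
// keeping only sufficiently confident verified facts.
function renderGlossary(facts: VerifiedFact[]): string {
  return facts
    .filter((f) => f.confidence > 0.7)
    .map((f) => `- **${f.term}**: ${f.definition}. ${f.context}`)
    .join("\n");
}
```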
### Phase 7: Learning
Store trajectories in ReasoningBank:
```typescript
await reasoningBank.insertPattern({
  type: 'verification-trajectory',
  domain: 'fact-checking-learning',
  pattern: { claimText, claimType, depthUsed, verdict, timeSpent, evidenceQuality }
});
```
Applications: depth prediction, strategy selection, ordering optimization, false positive reduction.
### Phase 8: Fixes
- Present implementation plan for non-verified claims
- Show proposed change, ask approval
- Apply approved fixes
- Offer re-verification
## Interruption Handling
Checkpoint to .fact-checking/state.json after each claim:
```json
{
  "scope": "branch",
  "claims": [...],
  "completed": [0, 1, 2],
  "pending": [3, 4, 5],
  "findings": {...},
  "bibliography": [...]
}
```
Offer resume on next invocation.
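Persisting and resuming the checkpoint can be sketched as follows; apart from the `.fact-checking/state.json` path, the helper names and the trimmed `Checkpoint` shape are illustrative:

```typescript
import * as fs from "fs";
import * as path from "path";

// Subset of the checkpoint fields shown above.
interface Checkpoint {
  scope: string;
  completed: number[];
  pending: number[];
}

const STATE_PATH = ".fact-checking/state.json";

// Write the checkpoint after each claim, creating the state dir on first use.
function saveCheckpoint(cp: Checkpoint, file: string = STATE_PATH): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(cp, null, 2));
}

// null signals "no resumable state": start a fresh run.
function loadCheckpoint(file: string = STATE_PATH): Checkpoint | null {
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, "utf8")) as Checkpoint;
}
```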
## Anti-Patterns
### Skipping Claims
- No claim is "trivial" - verify individually
- No batching similar claims without individual verification
### Applying Fixes Without Approval
- No auto-correcting comments
- Each fix requires explicit user approval
### Ignoring AgentDB
- ALWAYS check before verifying
- ALWAYS store findings after verification
## Example Walkthrough
Phase 1: Scope selection → User selects "A. Branch changes"
Phase 2: Extract claims → Found 8 claims in 5 files
Phase 3: Triage display with depths:
```
### Security (2 claims)
1. [MEDIUM] src/auth/password.ts:34 - "passwords hashed with bcrypt"
2. [DEEP] src/auth/session.ts:78 - "session tokens cryptographically random"
...
```
Phase 4: Verification (showing claim 1):
- Read src/auth/password.ts:34-60
- Found: import { hash } from 'bcryptjs'
- Found: await hash(password, 12) - confirmed cost factor 12 meets OWASP
- Verdict: VERIFIED
- Evidence: bcryptjs.hash() with cost factor 12 confirmed
- Sources: [1] Code trace, [2] OWASP Password Storage
Phase 6: Report excerpt:
```markdown
# Fact-Checking Report
**Scope:** Branch feature/auth-refactor (12 commits)
**Verified:** 5 | **Refuted:** 1 | **Stale:** 1 | **Inconclusive:** 1

## Bibliography
[1] Code trace: src/auth/password.ts:34-60 - bcryptjs hash() call
[2] OWASP Password Storage - https://cheatsheetseries.owasp.org/...

## Implementation Plan
1. [ ] src/cache/store.ts:23 - TTL is 60s not 300s, update comment
```
If ANY implementation plan item remains unchecked: STOP and fix.