code-verification

from benjaminshoemaker

Multi-agent code verification workflow using a main agent and sub-agent loop. Use when verifying code against requirements, acceptance criteria, or quality standards. Triggers on requests to verify, validate, or check code against specifications, checklists, or instructions.

When & Why to Use This Skill

This Claude skill provides a multi-agent workflow for automated code verification and quality assurance. Using a main-agent/sub-agent loop, it systematically validates code against requirements, acceptance criteria, and quality standards. The skill includes automated fix attempts, regression testing, and browser-based UI verification via Playwright, helping ensure that implementations are reliable, accessible, and compliant with their technical specifications.

Use Cases

  • Automated Requirements Validation: Verify that new feature implementations strictly adhere to complex acceptance criteria and functional specifications through structured, itemized checklists.
  • End-to-End Web UI Testing: Use Playwright integration to perform deep inspections of DOM elements, visual appearance, accessibility (ARIA) standards, and network performance on web applications.
  • Iterative Bug Fixing and Self-Healing: Automatically detect code failures or linting errors and execute iterative fix attempts with built-in regression checks to ensure stability without manual intervention.
  • Code Quality and Compliance Audits: Scan codebases to confirm that all functions have docstrings, no unused imports remain, and test coverage stays high across files.
  • Pre-deployment Verification: Generate comprehensive verification reports that document passed tests, failed attempts, and audit trails before merging code into production.

name: code-verification
description: Multi-agent code verification workflow using a main agent and sub-agent loop. Use when verifying code against requirements, acceptance criteria, or quality standards. Triggers on requests to verify, validate, or check code against specifications, checklists, or instructions.

Code Verification Skill

Verify code against requirements using a main agent / sub-agent loop with structured feedback and automatic retry.

Workflow Overview

1. Parse verification instructions into testable items
2. For each instruction:
   a. Pre-flight: Confirm instruction is testable
   b. Sub-agent: Verify if instruction is met
   c. If failed: Main agent attempts fix
   d. Repeat b-c up to 5 times or until success
   e. Update checklist with result
3. Generate verification report
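
A minimal sketch of this loop in Python, with verify and fix as stand-ins for the sub-agent check (step 2b) and the main-agent fix (step 2c); the richer exit conditions of Step 5 are reduced here to pass-or-attempts-exhausted:

# Skeleton of the verification loop; `verify` and `fix` are placeholders
# for the sub-agent and main-agent actions detailed in the steps below.
def run_verification(item_ids, verify, fix, max_attempts=5):
    results = {}
    for item_id in item_ids:
        finding = verify(item_id)                 # sub-agent check
        attempts = 0
        while finding["status"] == "FAIL" and attempts < max_attempts:
            fix(item_id, finding)                 # main agent attempts a fix
            finding = verify(item_id)             # sub-agent re-checks
            attempts += 1
        results[item_id] = finding["status"]
    return results

# Toy run: one instruction that passes on the first check.
results = run_verification(
    ["V-001"],
    verify=lambda item_id: {"status": "PASS"},
    fix=lambda item_id, finding: None,
)
assert results == {"V-001": "PASS"}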

Step 1: Parse Verification Instructions

Extract each verification instruction into a discrete, testable item (a data-structure sketch follows this list):

  • ID: Unique identifier (e.g., V-001)
  • Instruction: The requirement text
  • Test approach: How to verify (file inspection, run tests, lint, type check, etc.)
  • Files involved: Which files to examine
  • Requires Browser: Whether the instruction needs Playwright MCP verification
    • Auto-detect from keywords: UI, render, display, visible, hidden, show, hide, click, hover, focus, blur, scroll, DOM, element, component, layout, responsive, style, CSS, color, font, screenshot, visual, appearance, console, error, warning, log, network, request, response, accessibility, a11y, ARIA, animation, transition, loading, performance
    • Mark as: browser: true or browser: false
  • Browser Verification Type (if browser: true):
    • DOM_INSPECTION - Element presence, visibility, content via accessibility tree snapshots
    • SCREENSHOT - Visual appearance, layout verification
    • CONSOLE - Browser console errors, warnings, logs
    • NETWORK - API requests, responses, status codes (via network interception)
    • PERFORMANCE - Load times, Core Web Vitals (via tracing)
    • ACCESSIBILITY - ARIA attributes, semantic HTML, accessibility tree analysis
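
As an illustration, a parsed item could be represented like this. A sketch only: the field names mirror the list above, the keyword set is abbreviated, and the example instruction, file path, and selector semantics are hypothetical.

from dataclasses import dataclass, field

# Abbreviated subset of the auto-detect keywords listed above.
BROWSER_KEYWORDS = {
    "ui", "render", "display", "visible", "click", "hover", "dom", "element",
    "layout", "css", "screenshot", "console", "network", "aria",
    "accessibility", "animation", "performance",
}

@dataclass
class VerificationItem:
    id: str                          # unique identifier, e.g. "V-001"
    instruction: str                 # the requirement text
    test_approach: str               # how to verify
    files: list[str] = field(default_factory=list)
    browser: bool = False            # requires Playwright MCP verification
    browser_type: str | None = None  # DOM_INSPECTION, SCREENSHOT, ...

def needs_browser(instruction: str) -> bool:
    """Keyword-based auto-detection of browser verification."""
    return any(word.strip(".,;:") in BROWSER_KEYWORDS
               for word in instruction.lower().split())

item = VerificationItem(
    id="V-001",
    instruction="The error banner is visible after a failed login",
    test_approach="DOM inspection via accessibility tree snapshot",
    files=["src/components/LoginForm.tsx"],
)
item.browser = needs_browser(item.instruction)   # True: "visible" matches
item.browser_type = "DOM_INSPECTION" if item.browser else None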

Step 2: Pre-flight Validation

Before the verification loop, confirm each instruction is testable:

  • Instruction is specific and unambiguous
  • Success criteria are clear
  • Required files/resources exist

Flag untestable instructions immediately rather than attempting verification.

Browser-Specific Pre-Flight

For instructions with browser: true:

  1. Check Playwright MCP availability

    • If unavailable, mark instruction as BLOCKED with reason: "Playwright MCP not available"
    • Suggest: "Ensure Playwright MCP server is running (npx @playwright/mcp@latest)"
  2. Verify dev server is running (see the sketch after this list)

    • Check if configured dev server URL responds (e.g., http://localhost:3000)
    • If not running, attempt to start using configured command (e.g., npm run dev)
    • Wait for configured startup time before proceeding
    • If unable to start, mark as BLOCKED: "Dev server not accessible at {URL}"
  3. Confirm target route exists

    • Navigate to the page specified in the instruction using browser_navigate
    • If 404 or error, mark as BLOCKED: "Target route not found: {route}"
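
A minimal sketch of the dev-server portion of this pre-flight, assuming a configured URL, start command, and startup wait (the values below are hypothetical defaults):

import subprocess
import time
import urllib.request
from urllib.error import HTTPError, URLError

DEV_SERVER_URL = "http://localhost:3000"   # hypothetical configured URL
START_COMMAND = ["npm", "run", "dev"]      # hypothetical configured command
STARTUP_WAIT_SECONDS = 15                  # hypothetical configured startup time

def server_responds(url: str) -> bool:
    """True if the dev server answers at all, even with an error status."""
    try:
        urllib.request.urlopen(url, timeout=5)
        return True
    except HTTPError:
        return True                        # got an HTTP response: server is up
    except URLError:
        return False                       # connection refused / unreachable

def ensure_dev_server() -> str | None:
    """Return None when reachable, else the BLOCKED reason."""
    if server_responds(DEV_SERVER_URL):
        return None
    subprocess.Popen(START_COMMAND)        # attempt to start the server
    time.sleep(STARTUP_WAIT_SECONDS)       # wait for configured startup time
    if server_responds(DEV_SERVER_URL):
        return None
    return f"Dev server not accessible at {DEV_SERVER_URL}"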

Step 3: Sub-Agent Verification Protocol

Spawn a sub-agent to verify each instruction. The sub-agent MUST return structured output:

VERIFICATION RESULT
-------------------
Instruction ID: [ID]
Status: PASS | FAIL | BLOCKED
Location: [file:line or "N/A"]
Severity: BLOCKING | MINOR
Finding: [What was found]
Expected: [What was expected]
Suggested Fix: [Specific fix recommendation]

Sub-agent rules:

  • Check ONLY the specific instruction assigned
  • Do not attempt fixes—report findings only
  • Be precise about location (file, line number, function name)
  • Distinguish between blocking failures and minor issues
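
Because the block is line-oriented key/value text, the main agent can parse it mechanically. A sketch, assuming the sub-agent returns exactly the fields shown above:

def parse_result(block: str) -> dict[str, str]:
    """Parse the key/value lines of a VERIFICATION RESULT block."""
    fields = {}
    for line in block.splitlines():
        if ":" not in line:
            continue                       # skip the header and rule lines
        key, _, value = line.partition(":")
        fields[key.strip().lower().replace(" ", "_")] = value.strip()
    return fields

raw = """VERIFICATION RESULT
-------------------
Instruction ID: V-001
Status: FAIL
Location: utils.py:45
Severity: BLOCKING
Finding: load_config has no docstring
Expected: Every public function has a docstring
Suggested Fix: Add a docstring describing the return value"""

parsed = parse_result(raw)
assert parsed["status"] == "FAIL" and parsed["location"] == "utils.py:45"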

Browser-Enhanced Verification Output

For instructions with browser: true, the sub-agent MUST use Playwright MCP and return:

BROWSER VERIFICATION RESULT
---------------------------
Instruction ID: [ID]
Status: PASS | FAIL | BLOCKED
Type: DOM | VISUAL | CONSOLE | NETWORK | PERFORMANCE | ACCESSIBILITY
URL: [URL] | Viewport: [width]x[height]

Finding: [What was observed]
Expected: [What was expected]

Details: [Type-specific information]
  - DOM: selector, found, visible, content
  - Visual: screenshot path, description
  - Console: errors, warnings, logs
  - Network: endpoint, method, status, response summary
  - Performance: load time, LCP, FID, CLS
  - Accessibility: ARIA, semantic HTML, contrast, keyboard nav

Suggested Fix: [Specific fix recommendation]

Browser Sub-Agent Rules

In addition to standard sub-agent rules, browser verification sub-agents MUST (see the sketch after this list):

  • Start with an accessibility tree snapshot (browser_snapshot) of the initial state
  • Use stable selectors (prefer data-testid over complex CSS paths, or use accessibility tree element refs)
  • Wait for dynamic content to load before inspecting (browser_wait_for_text or browser_wait)
  • Capture console output before and after actions
  • Take screenshots (browser_screenshot) when verifying visual appearance
  • Test at default viewport unless criterion specifies responsive/mobile (use browser_resize to change)
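
The skill performs these checks through Playwright MCP tools (browser_navigate, browser_snapshot, browser_screenshot). As a rough equivalent for illustration, here is the same sequence using the standalone Playwright Python library; the URL, route, and data-testid selector are hypothetical.

# Illustration only: equivalent check sequence via the Playwright library.
from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeout

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    console_messages = []                                 # capture console output
    page.on("console", lambda msg: console_messages.append(msg))

    page.goto("http://localhost:3000/login")              # navigate to target route
    initial_tree = page.accessibility.snapshot()          # initial accessibility tree

    banner = page.locator("[data-testid=error-banner]")   # stable selector
    try:
        banner.wait_for(state="visible", timeout=5000)    # wait for dynamic content
        visible = True
    except PlaywrightTimeout:
        visible = False

    page.screenshot(path="v001-error-banner.png")         # visual evidence
    errors = [m for m in console_messages if m.type == "error"]
    browser.close()

print("PASS" if visible and not errors else "FAIL")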

Step 4: Main Agent Fix Protocol

When sub-agent reports FAIL:

  1. Review the finding - Understand what failed and why
  2. Check fix history - Do not repeat a previously attempted fix
  3. Apply targeted fix - Make the minimum change to address the issue
  4. Log the attempt - Record what was changed

Fix attempt tracking

Maintain a fix log per instruction:

FIX LOG: [Instruction ID]
--------------------------
Attempt 1: [Description of change] → [Result]
Attempt 2: [Description of change] → [Result]
...

Strategy escalation

  • Attempts 1-2: Direct fix based on sub-agent suggestion
  • Attempt 3: Try alternative approach
  • Attempts 4-5: Broaden scope, consider architectural changes

If the same failure pattern repeats twice, explicitly try a different strategy.
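
A sketch of the bookkeeping that makes both rules enforceable, assuming fix descriptions and findings are recorded as short strings:

fix_log: dict[str, list[tuple[str, str]]] = {}   # instruction ID -> [(change, result)]

def record_attempt(item_id: str, change: str, result: str) -> None:
    """Append an attempt to the per-instruction fix log."""
    fix_log.setdefault(item_id, []).append((change, result))

def already_tried(item_id: str, change: str) -> bool:
    """Never repeat a previously attempted fix."""
    return any(change == prior for prior, _ in fix_log.get(item_id, []))

def strategy_for(attempt: int) -> str:
    """Escalate as attempts accumulate (attempt numbers are 1-based)."""
    if attempt <= 2:
        return "direct fix from sub-agent suggestion"
    if attempt == 3:
        return "alternative approach"
    return "broaden scope, consider architectural changes"

def failure_repeats(item_id: str) -> bool:
    """True when the last two attempts produced the same result,
    signaling that a different strategy is required."""
    results = [result for _, result in fix_log.get(item_id, [])]
    return len(results) >= 2 and results[-1] == results[-2]

record_attempt("V-002", "added null check in parser", "FAIL: same TypeError")
record_attempt("V-002", "wrapped call in try/except", "FAIL: same TypeError")
assert failure_repeats("V-002")
assert not already_tried("V-002", "narrow the except clause")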

Browser-Specific Fix Strategies

Failure Type → Common Fixes
DOM/Visibility → Conditional rendering, CSS display/visibility, z-index, prop passing
Console errors → JS exceptions, missing mocks, env vars, CORS
Network → Endpoint URLs, auth headers, payload format, CORS config
Visual → CSS cascade, responsive breakpoints, font loading
Performance → Bundle size, image optimization, lazy loading, render-blocking
Accessibility → ARIA attributes, color contrast, heading hierarchy, keyboard handlers

Step 5: Exit Conditions

Exit the verification loop when ANY condition is met (a guard-function sketch follows the table):

Condition → Action
Sub-agent reports PASS → ✅ Check off instruction
5 attempts exhausted → ❌ Mark failed with notes
Same failure 3+ times → ⚠️ Exit early, flag for review
Fix introduces regression → ⚠️ Revert, flag for review
Issue is MINOR severity → ⚠️ Note and continue
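
Expressed as a guard function (a sketch; the precedence among simultaneously-true conditions is an assumption):

def should_exit(status: str, severity: str, attempts: int,
                failure_streak: int, regression: bool) -> str | None:
    """Return the action that ends the loop, or None to keep iterating."""
    if status == "PASS":
        return "check off instruction"
    if severity == "MINOR":
        return "note and continue"
    if regression:
        return "revert fix, flag for review"
    if failure_streak >= 3:
        return "exit early, flag for review"
    if attempts >= 5:
        return "mark failed with notes"
    return None

assert should_exit("PASS", "BLOCKING", 1, 0, False) == "check off instruction"
assert should_exit("FAIL", "BLOCKING", 2, 1, False) is None   # keep trying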

Step 6: Regression Check

After each fix attempt, verify:

  • The targeted instruction (primary check)
  • Any previously-passing related instructions (regression check)

If a fix breaks something else, revert and note the conflict.
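
A sketch of that re-check, where verify stands in for dispatching the sub-agent and returns True on PASS:

def regression_check(fixed_id: str, passing: dict[str, bool], verify) -> list[str]:
    """Re-run previously passing checks after a fix; return IDs that regressed."""
    return [item_id for item_id, was_passing in passing.items()
            if item_id != fixed_id and was_passing and not verify(item_id)]

# Toy usage: V-001 had passed; the fix applied for V-002 broke it.
regressed = regression_check("V-002", {"V-001": True, "V-003": False},
                             verify=lambda item_id: False)
assert regressed == ["V-001"]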

Browser Regression Checks

After each browser-related fix, verify there are no regressions in console errors, visual appearance, performance metrics, or accessibility. If a regression is detected, capture before/after state and log it in the fix history.

Step 7: Generate Verification Report

After all instructions are processed:

VERIFICATION REPORT
===================
Total Instructions: [N]
Passed: [N] ✅
Failed: [N] ❌
Needs Review: [N] ⚠️

DETAILS
-------
[V-001] ✅ [Instruction summary]
[V-002] ❌ [Instruction summary]
  - Failed after 5 attempts
  - Last error: [description]
  - Attempts: [brief log]
[V-003] ⚠️ [Instruction summary]
  - Flagged: Repeated same failure pattern
  - Recommendation: [suggestion]

AUDIT TRAIL
-----------
[Timestamp] V-001: Verified PASS on first check
[Timestamp] V-002: Attempt 1 - Changed X → FAIL
[Timestamp] V-002: Attempt 2 - Changed Y → FAIL
...

BROWSER VERIFICATION (if applicable)
------------------------------------
Browser Checks: [passed]/[total] | Blocked: [N]
Playwright: Available | Unavailable
Dev Server: [URL] | Not Running

Issues Found:
- [V-XXX] {type}: {description}

Screenshots: [list of captured files]
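
The summary header can be rendered mechanically from per-instruction statuses. A sketch (the REVIEW status name is an assumption):

STATUS_MARKS = {"PASS": "✅", "FAIL": "❌", "REVIEW": "⚠️"}

def render_summary(results: dict[str, str]) -> str:
    """Render the report header from {instruction ID: status}."""
    counts = {s: sum(1 for v in results.values() if v == s) for s in STATUS_MARKS}
    return "\n".join([
        "VERIFICATION REPORT",
        "===================",
        f"Total Instructions: {len(results)}",
        f"Passed: {counts['PASS']} ✅",
        f"Failed: {counts['FAIL']} ❌",
        f"Needs Review: {counts['REVIEW']} ⚠️",
    ])

print(render_summary({"V-001": "PASS", "V-002": "FAIL", "V-003": "REVIEW"}))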

Example

Given a checklist:

[ ] All functions have docstrings
[ ] No unused imports
[ ] Tests pass with >80% coverage

Workflow execution:

  1. Parse into V-001, V-002, V-003
  2. Pre-flight confirms all are testable
  3. Sub-agent checks V-001 → FAIL (missing docstring in utils.py:45)
  4. Main agent adds docstring
  5. Sub-agent re-checks → PASS
  6. Continue to V-002...
  7. Final report shows 3/3 passed

Key Principles

  • Structured feedback: Sub-agent always returns actionable, located findings
  • No repeated fixes: Track what was tried to avoid loops
  • Early exit: Don't burn attempts on unfixable issues
  • Regression awareness: Fixes shouldn't break other things
  • Audit everything: The journey matters for debugging