skill-refinement
Feedback-driven skill improvement through tool outcome analysis. Collects execution data and surfaces insights for skill refinement. Use this skill when you want to:
- Understand how skills are performing ("show skill feedback", "how are skills doing")
- Get insights on skill effectiveness ("skill insights", "what skills need improvement")
- Identify skills that need improvement ("which skills have errors")
- Analyze tool usage patterns ("what tools are failing", "error hotspots")
- Set up feedback collection ("enable feedback", "setup feedback tracking")
When & Why to Use This Skill
The Feedback-Driven Skill Refinement tool is an observability layer for improving agent performance through automated tool outcome analysis. It systematically collects execution data, attributes outcomes to specific skills, and generates actionable insights to reduce error rates and improve the reliability of Claude's capabilities. By surfacing error hotspots and success patterns, it lets developers iteratively refine skill definitions and tool-use logic based on real-world usage data.
Use Cases
- Identifying Error Hotspots: Automatically detect which specific tools or skills are failing most frequently to prioritize debugging efforts.
- Optimizing Skill Definitions: Use performance recommendations to update SKILL.md files with better guidance, trigger phrases, or constraints based on observed failures.
- Performance Benchmarking: Track the success rates of different agent skills over time to ensure updates or changes improve overall system reliability.
- Root Cause Analysis: Query the feedback database to understand the context (transcript and tool inputs) behind specific execution errors.
- Automated Feedback Loops: Set up a continuous improvement workflow where agent performance data is used to refine the semantic attribution of tools to skills.
Feedback-Driven Skill Refinement
Collects PostToolUse feedback, attributes outcomes to skills semantically, and surfaces actionable insights for improving skills.
Quick Start
# Set up feedback collection (one time)
voyager feedback setup
# Use Claude Code normally - feedback is collected automatically
# View insights
voyager feedback insights
# View insights for a specific skill
voyager feedback insights --skill session-brain --errors
CLIs
feedback-setup / voyager feedback setup
Initialize feedback collection by:
- Creating the feedback database at `.claude/voyager/feedback.db`
- Installing a PostToolUse hook at `.claude/hooks/post_tool_use_feedback.py` (a simplified sketch of such a hook follows the options below)
- Updating `.claude/settings.local.json` with the hook configuration
Options:
- `--dry-run` / `-n`: Show what would be done without making changes
- `--reset`: Delete existing feedback data and start fresh
- `--db PATH`: Use a custom database path
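For orientation, here is a heavily simplified sketch of what a PostToolUse feedback hook can look like. It is not the installed hook (that lives at `.claude/hooks/post_tool_use_feedback.py`); the stdin payload field names are assumptions that mirror the `tool_executions` schema described below.

```python
#!/usr/bin/env python3
"""Illustrative sketch only; see .claude/hooks/post_tool_use_feedback.py for the real hook."""
import json
import sqlite3
import sys

DB_PATH = ".claude/voyager/feedback.db"

def main() -> None:
    # The PostToolUse payload arrives as JSON on stdin; the field names
    # below mirror the tool_executions schema and are assumptions, not
    # a documented contract.
    event = json.load(sys.stdin)
    response = event.get("tool_response")
    if not isinstance(response, dict):
        response = {"output": response}

    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "INSERT INTO tool_executions "
        "(session_id, tool_name, tool_input, tool_response, success, error_message) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            event.get("session_id"),
            event.get("tool_name"),
            json.dumps(event.get("tool_input")),
            json.dumps(response),
            0 if response.get("error") else 1,
            response.get("error"),
        ),
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    main()
```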
skill-insights / voyager feedback insights
Analyze collected feedback and generate improvement recommendations.
Options:
- `--skill SKILL` / `-s SKILL`: Filter insights for a specific skill
- `--errors` / `-e`: Show common errors
- `--json`: Output results as JSON
- `--db PATH`: Use a custom database path
How Skill Attribution Works
The system uses a cascade of strategies to attribute tool executions to skills without hardcoded mappings (a code sketch of the cascade follows the list):
Transcript Context (most accurate)
- Checks if Claude read a SKILL.md file in this session
- If yes, attributes subsequent tool uses to that skill
Learned Associations (fast)
- Looks up similar tool+context patterns from past sessions
- Improves over time as more feedback is collected
ColBERT Index Query (semantic, if available)
- Queries the skill retrieval index with tool context
- Works when the `find-skill` command is available
LLM Inference (comprehensive, disabled by default in hooks)
- Asks an LLM to identify the skill from context
- Slowest but most comprehensive fallback
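A minimal sketch of the cascade logic under stated assumptions: `learned` stands in for the `learned_associations` table, and `query_index` / `infer_llm` are hypothetical callables representing the ColBERT query and the LLM fallback. None of these names are the real API.

```python
import re
from typing import Callable, Optional

def attribute_skill(
    tool_name: str,
    transcript: str,
    learned: dict[str, str],
    query_index: Optional[Callable[[str], Optional[str]]] = None,
    infer_llm: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Try each attribution strategy in order, cheapest and most reliable first."""
    # 1. Transcript context: did Claude read a SKILL.md in this session?
    match = re.search(r"skills/([\w-]+)/SKILL\.md", transcript)
    if match:
        return match.group(1)

    # 2. Learned associations: similar tool+context patterns from past sessions
    if tool_name in learned:  # the real lookup keys on "tool|extension|command"
        return learned[tool_name]

    # 3. ColBERT index query: semantic fallback, only when find-skill is available
    if query_index is not None:
        skill = query_index(tool_name)
        if skill:
            return skill

    # 4. LLM inference: slowest, most comprehensive, disabled by default in hooks
    if infer_llm is not None:
        return infer_llm(tool_name)

    return None

# e.g. a past association answers when the transcript shows no SKILL.md read:
print(attribute_skill("Bash", "", {"Bash": "session-brain"}))  # -> session-brain
```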
Storage
- Feedback Database: `.claude/voyager/feedback.db` (SQLite)
- Hook Script: `.claude/hooks/post_tool_use_feedback.py`
Database Schema
tool_executions: Per-tool execution logs
- session_id, tool_name, tool_input, tool_response
- success, error_message, duration_ms
- skill_used (attributed skill)
- timestamp
session_summaries: Per-session aggregates
- tools_used, skills_detected
- total/successful/failed calls
- task_completed, completion_feedback
learned_associations: Tool context → skill mappings
- context_key (tool|extension|command)
- skill_id, confidence, hit_count
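For concreteness, the three tables above might be created roughly as follows. This is a sketch: the column types, keys, and defaults are assumptions, with reference.md as the source of truth.

```python
import sqlite3

conn = sqlite3.connect(".claude/voyager/feedback.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS tool_executions (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id    TEXT,
    tool_name     TEXT,
    tool_input    TEXT,      -- JSON blob
    tool_response TEXT,      -- JSON blob
    success       INTEGER,   -- 1 = ok, 0 = failed
    error_message TEXT,
    duration_ms   INTEGER,
    skill_used    TEXT,      -- attributed skill
    timestamp     TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS session_summaries (
    session_id          TEXT PRIMARY KEY,
    tools_used          TEXT,   -- JSON list
    skills_detected     TEXT,   -- JSON list
    total_calls         INTEGER,
    successful_calls    INTEGER,
    failed_calls        INTEGER,
    task_completed      INTEGER,
    completion_feedback TEXT
);

CREATE TABLE IF NOT EXISTS learned_associations (
    context_key TEXT PRIMARY KEY,  -- "tool|extension|command"
    skill_id    TEXT,
    confidence  REAL,
    hit_count   INTEGER DEFAULT 0
);
""")
conn.close()
```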
Insights Output
The insights command shows:
- Summary: Total executions, sessions, skills detected
- Skill Performance: Success rate and error counts per skill
- Tool Usage: Which tools are used most, failure rates
- Common Errors: Recurring error patterns
- Recommendations: Actionable suggestions like:
- "Low success rate - update SKILL.md with better guidance"
- "Recurring error (5x): file not found..."
- "Low usage - add more trigger phrases"
Workflow for Improving Skills
- Run `voyager feedback insights --errors` to see problem areas
- Check a specific skill with `voyager feedback insights --skill NAME`
- Review the recommendations
- Update SKILL.md or reference.md based on observed failures
See Also
- `reference.md` - Technical reference for implementation details
- `skills/skill-retrieval/` - Skill indexing for semantic attribution
- `skills/skill-factory/` - Creating new skills from observed patterns