skill-refinement
Feedback-driven skill improvement through tool outcome analysis. Collects execution data and surfaces insights for skill refinement. Use this skill when you want to:
- Understand how skills are performing ("show skill feedback", "how are skills doing")
- Get insights on skill effectiveness ("skill insights", "what skills need improvement")
- Identify skills that need improvement ("which skills have errors")
- Analyze tool usage patterns ("what tools are failing", "error hotspots")
- Set up feedback collection ("enable feedback", "setup feedback tracking")
When & Why to Use This Skill
The Feedback-Driven Skill Refinement tool is an observability layer for improving agent performance through automated tool outcome analysis. It systematically collects execution data, attributes outcomes to specific skills, and generates actionable insights to reduce error rates and improve the reliability of Claude's capabilities. By surfacing error hotspots and success patterns, it lets developers iteratively refine skill definitions and tool-use logic based on real-world usage data.
Use Cases
- Identifying Error Hotspots: Automatically detect which specific tools or skills are failing most frequently to prioritize debugging efforts.
- Optimizing Skill Definitions: Use performance recommendations to update SKILL.md files with better guidance, trigger phrases, or constraints based on observed failures.
- Performance Benchmarking: Track the success rates of different agent skills over time to ensure updates or changes improve overall system reliability.
- Root Cause Analysis: Query the feedback database to understand the context (transcript and tool inputs) behind specific execution errors.
- Automated Feedback Loops: Set up a continuous improvement workflow where agent performance data is used to refine the semantic attribution of tools to skills.
Feedback-Driven Skill Refinement
Collects PostToolUse feedback, attributes outcomes to skills semantically, and surfaces actionable insights for improving skills.
Quick Start
# Set up feedback collection (one time)
voyager feedback setup
# Use Claude Code normally - feedback is collected automatically
# View insights
voyager feedback insights
# View insights for a specific skill
voyager feedback insights --skill session-brain --errors
CLIs
feedback-setup / voyager feedback setup
Initialize feedback collection by:
- Creating the feedback database at `.claude/voyager/feedback.db`
- Installing a PostToolUse hook at `.claude/hooks/post_tool_use_feedback.py` (a simplified sketch of such a hook follows the options below)
- Updating `.claude/settings.local.json` with the hook configuration
Options:
- `--dry-run` / `-n`: Show what would be done without making changes
- `--reset`: Delete existing feedback data and start fresh
- `--db PATH`: Use a custom database path
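For orientation, here is a heavily simplified sketch of what a PostToolUse feedback hook can look like. It is not the installed hook (that lives at `.claude/hooks/post_tool_use_feedback.py`); the stdin payload field names are assumptions that mirror the `tool_executions` schema described below.

```python
#!/usr/bin/env python3
"""Illustrative sketch only; see .claude/hooks/post_tool_use_feedback.py for the real hook."""
import json
import sqlite3
import sys

DB_PATH = ".claude/voyager/feedback.db"

def main() -> None:
    # The PostToolUse payload arrives as JSON on stdin; the field names
    # below mirror the tool_executions schema and are assumptions, not
    # a documented contract.
    event = json.load(sys.stdin)
    response = event.get("tool_response")
    if not isinstance(response, dict):
        response = {"output": response}

    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "INSERT INTO tool_executions "
        "(session_id, tool_name, tool_input, tool_response, success, error_message) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            event.get("session_id"),
            event.get("tool_name"),
            json.dumps(event.get("tool_input")),
            json.dumps(response),
            0 if response.get("error") else 1,
            response.get("error"),
        ),
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    main()
```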
skill-insights / voyager feedback insights
Analyze collected feedback and generate improvement recommendations.
Options:
- `--skill SKILL` / `-s SKILL`: Filter insights for a specific skill
- `--errors` / `-e`: Show common errors
- `--json`: Output results as JSON
- `--db PATH`: Use a custom database path
How Skill Attribution Works
The system uses a cascade of strategies to attribute tool executions to skills without hardcoded mappings (a code sketch of the cascade follows the list):
Transcript Context (most accurate)
- Checks if Claude read a SKILL.md file in this session
- If yes, attributes subsequent tool uses to that skill
Learned Associations (fast)
- Looks up similar tool+context patterns from past sessions
- Improves over time as more feedback is collected
ColBERT Index Query (semantic, if available)
- Queries the skill retrieval index with tool context
- Works when the `find-skill` command is available
LLM Inference (comprehensive, disabled by default in hooks)
- Asks an LLM to identify the skill from context
- Slowest but most comprehensive fallback
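A minimal sketch of the cascade logic under stated assumptions: `learned` stands in for the `learned_associations` table, and `query_index` / `infer_llm` are hypothetical callables representing the ColBERT query and the LLM fallback. None of these names are the real API.

```python
import re
from typing import Callable, Optional

def attribute_skill(
    tool_name: str,
    transcript: str,
    learned: dict[str, str],
    query_index: Optional[Callable[[str], Optional[str]]] = None,
    infer_llm: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Try each attribution strategy in order, cheapest and most reliable first."""
    # 1. Transcript context: did Claude read a SKILL.md in this session?
    match = re.search(r"skills/([\w-]+)/SKILL\.md", transcript)
    if match:
        return match.group(1)

    # 2. Learned associations: similar tool+context patterns from past sessions
    if tool_name in learned:  # the real lookup keys on "tool|extension|command"
        return learned[tool_name]

    # 3. ColBERT index query: semantic fallback, only when find-skill is available
    if query_index is not None:
        skill = query_index(tool_name)
        if skill:
            return skill

    # 4. LLM inference: slowest, most comprehensive, disabled by default in hooks
    if infer_llm is not None:
        return infer_llm(tool_name)

    return None

# e.g. a past association answers when the transcript shows no SKILL.md read:
print(attribute_skill("Bash", "", {"Bash": "session-brain"}))  # -> session-brain
```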
Storage
- Feedback Database: `.claude/voyager/feedback.db` (SQLite)
- Hook Script: `.claude/hooks/post_tool_use_feedback.py`
Database Schema
tool_executions: Per-tool execution logs
- session_id, tool_name, tool_input, tool_response
- success, error_message, duration_ms
- skill_used (attributed skill)
- timestamp
session_summaries: Per-session aggregates
- tools_used, skills_detected
- total/successful/failed calls
- task_completed, completion_feedback
learned_associations: Tool context → skill mappings
- context_key (tool|extension|command)
- skill_id, confidence, hit_count
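For concreteness, the three tables above might be created roughly as follows. This is a sketch: the column types, keys, and defaults are assumptions, with reference.md as the source of truth.

```python
import sqlite3

conn = sqlite3.connect(".claude/voyager/feedback.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS tool_executions (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id    TEXT,
    tool_name     TEXT,
    tool_input    TEXT,      -- JSON blob
    tool_response TEXT,      -- JSON blob
    success       INTEGER,   -- 1 = ok, 0 = failed
    error_message TEXT,
    duration_ms   INTEGER,
    skill_used    TEXT,      -- attributed skill
    timestamp     TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS session_summaries (
    session_id          TEXT PRIMARY KEY,
    tools_used          TEXT,   -- JSON list
    skills_detected     TEXT,   -- JSON list
    total_calls         INTEGER,
    successful_calls    INTEGER,
    failed_calls        INTEGER,
    task_completed      INTEGER,
    completion_feedback TEXT
);

CREATE TABLE IF NOT EXISTS learned_associations (
    context_key TEXT PRIMARY KEY,  -- "tool|extension|command"
    skill_id    TEXT,
    confidence  REAL,
    hit_count   INTEGER DEFAULT 0
);
""")
conn.close()
```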
Insights Output
The insights command shows:
- Summary: Total executions, sessions, skills detected
- Skill Performance: Success rate and error counts per skill
- Tool Usage: Which tools are used most, failure rates
- Common Errors: Recurring error patterns
- Recommendations: Actionable suggestions like:
- "Low success rate - update SKILL.md with better guidance"
- "Recurring error (5x): file not found..."
- "Low usage - add more trigger phrases"
Workflow for Improving Skills
- Run `voyager feedback insights --errors` to see problem areas
- Check a specific skill with `voyager feedback insights --skill NAME`
- Review the recommendations
- Update SKILL.md or reference.md based on observed failures
See Also
- `reference.md` - Technical reference for implementation details
- `skills/skill-retrieval/` - Skill indexing for semantic attribution
- `skills/skill-factory/` - Creating new skills from observed patterns