fpf-skillmetric-score-usability
Calculates the SkillUsabilityScore (U.Metric) for Zero-Shot Enactment.
When & Why to Use This Skill
This Claude skill calculates the SkillUsabilityScore (U.Metric), a metric for zero-shot enactment: how seamlessly an agent can find and execute a skill without prior training or clarification. It combines two factors, discovery and compliance, into a score on a standardized 1-5 ordinal scale, which makes it useful for developers focused on agentic performance and UX.
Use Cases
- Benchmarking AI Agent Performance: Quantify how effectively an agent discovers and executes specific skills in a zero-shot environment to compare different model versions.
- UX Research for AI Interactions: Measure the 'friction' level in user-agent interactions to identify whether failures are due to discovery issues or execution errors.
- Automated Quality Assurance: Integrate usability scoring into the evaluation phase of the development lifecycle to ensure new skills meet high standards of zero-shot usability.
| name | fpf-skill:metric-score-usability |
|---|---|
| description | Calculates the SkillUsabilityScore (U.Metric) for Zero-Shot Enactment. |
| version | 0.1.0 |
| allowed_tools | [] |
metric/score-usability Kernel
Context
This skill calculates a single composite metric: SkillUsabilityScore. As defined in the experiment, it is an ordinal 1-5 scale measuring "Zero-Shot Enactment". The rubric below assigns only the values 1, 3, and 5.
Instructions
Calculate the score based on the following rubric:
1. Factors
Evaluate two inputs:
- Discovery (D): Did the user find the skill without clarification? (Boolean)
- Compliance (C): Did the interface invocation succeed without error? (Boolean)
2. Scoring Logic
- 5 (Perfect): D=True AND C=True.
- 3 (Friction): D=False (help was required) OR C=False (a retry was needed), but the task eventually succeeded.
- 1 (Failure): The task could not be completed even with help.
3. Output
Return the integer score (1, 3, or 5).
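The rubric above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the function name `skill_usability_score` and its parameter names are hypothetical. The rubric lists two inputs, but telling a 3 from a 1 also requires knowing whether the task eventually succeeded, so the sketch assumes a third boolean, `completed`, and assumes that `completed=False` always yields 1.

```python
from typing import Literal


def skill_usability_score(
    discovered: bool,   # D: skill was found without clarification
    complied: bool,     # C: interface invocation succeeded without error
    completed: bool,    # assumption: task eventually succeeded (with or without help)
) -> Literal[1, 3, 5]:
    """Map the rubric's factors to the ordinal score 1, 3, or 5."""
    if not completed:
        # Failure: could not complete even with help.
        return 1
    if discovered and complied:
        # Perfect: found and invoked cleanly on the first attempt.
        return 5
    # Friction: help or a retry was needed, but the task succeeded.
    return 3


if __name__ == "__main__":
    print(skill_usability_score(discovered=True, complied=False, completed=True))
```

Because the scale is ordinal, the values rank outcomes but the gaps between them carry no numeric meaning, so comparisons across runs are probably better made on score counts or medians than on averages.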