confidence-evaluator

by davjdk

Evaluate requirement clarity and completeness using ISO/IEC/IEEE 29148:2018 criteria. Use when user asks to implement features, fix bugs, or make changes. Automatically invoked when confidence_policy is enabled in ai-settings.json.


When & Why to Use This Skill

The Confidence Evaluator is a professional-grade requirement analysis tool that assesses the clarity, completeness, and feasibility of software tasks using the ISO/IEC/IEEE 29148:2018 standard. It functions as a sophisticated guardrail for AI agents, calculating a structured confidence score to determine if a prompt provides sufficient detail for successful execution. By identifying ambiguities and missing constraints before coding begins, it significantly reduces errors and improves the reliability of AI-driven development workflows.

Use Cases

  • Feature Implementation Gatekeeping: Automatically evaluate new feature requests to ensure all inputs, outputs, and success criteria are clearly defined before the agent starts writing code.
  • Bug Fix Clarity Assessment: Analyze bug reports to confirm they are unambiguous and provide enough context for a verifiable fix, preventing wasted effort on poorly defined issues.
  • Architectural Change Validation: Use ISO-standard criteria to check if proposed system modifications are consistent with the existing project structure and technically feasible.
  • Autonomous Workflow Optimization: Enable the 'confidence_policy' in AI settings to force the agent to ask clarifying questions whenever a user's instruction falls below a specific quality threshold.
name: confidence-evaluator
description: Evaluate requirement clarity and completeness using ISO/IEC/IEEE 29148:2018 criteria. Use when user asks to implement features, fix bugs, or make changes. Automatically invoked when confidence_policy is enabled in ai-settings.json.

Confidence Evaluator Skill

You are evaluating the clarity and completeness of a user requirement against the ISO/IEC/IEEE 29148:2018 standard.

When to Use This Skill

  • User requests implementation of a feature
  • User asks for bug fixes
  • User proposes architectural changes
  • Any task requiring code modifications

Configuration

Read the threshold from .ai/ai-settings.json:

{
  "framework": {
    "confidence_policy": true,
    "confidence_threshold": 85
  }
}

If confidence_policy is false, skip evaluation entirely.
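
For concreteness, here is a minimal Python sketch of this lookup. The helper name load_confidence_settings and the fallback threshold of 85 are illustrative assumptions, not part of the skill:

import json
from pathlib import Path

def load_confidence_settings(path: str = ".ai/ai-settings.json"):
    """Return (enabled, threshold) from the framework block of ai-settings.json."""
    framework = json.loads(Path(path).read_text()).get("framework", {})
    return (
        framework.get("confidence_policy", False),
        framework.get("confidence_threshold", 85),  # fallback of 85 is an assumption
    )

enabled, threshold = load_confidence_settings()
# When confidence_policy is false, skip the evaluation entirely.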

Evaluation Process

Step 1: Calculate Intuitive Estimate

Give your subjective confidence (0-100) based on:

  • How well you understand the requirement
  • Whether you can identify all necessary changes
  • Clarity of success criteria
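
For example, "add a logout button to the navbar that clears the session" would likely merit a high estimate, while "make the app faster" would not.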

Step 2: Calculate Structured Score (maximum 100 points)

Evaluate the requirement against these criteria:

Requirements Category (60 points)

| Criterion | Weight | Evaluation Question |
|---|---|---|
| Unambiguous formulation | 20 | Is there only one way to interpret this? |
| Completeness (input/output/constraints) | 20 | Are all inputs, outputs, and constraints defined? |
| Verifiable result | 15 | Can completion be objectively measured? |
| Consistency with project | 10 | Is it consistent with existing requirements? |
| Rationale (source) | 5 | Is the reason for this requirement stated? |
| Technical feasibility | 5 | Is it achievable within constraints? |

Formatting Category (40 points)

| Criterion | Weight | Evaluation Question |
|---|---|---|
| Structured prompt | 10 | Is it logically organized? |
| Explicit tasks | 7 | Does it use explicit directive keywords such as "must" or "shall" ("должен" in Russian)? |
| Result examples | 4 | Are concrete examples provided? |
| Decomposable | 4 | Can it be broken into subtasks? |
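
One way to make the rubric concrete is to encode it as data. The sketch below is illustrative, not part of the skill: the criterion keys are hypothetical identifiers, and the weights are copied verbatim from the tables above.

# Hypothetical encoding of the rubric; weights as listed in the tables above.
REQUIREMENTS_WEIGHTS = {
    "unambiguous_formulation": 20,
    "completeness": 20,
    "verifiable_result": 15,
    "consistency_with_project": 10,
    "rationale": 5,
    "technical_feasibility": 5,
}
FORMATTING_WEIGHTS = {
    "structured_prompt": 10,
    "explicit_tasks": 7,
    "result_examples": 4,
    "decomposable": 4,
}

def category_score(awarded: dict[str, int], weights: dict[str, int]) -> int:
    """Sum the points awarded per criterion, capping each at its weight."""
    return sum(min(awarded.get(name, 0), cap) for name, cap in weights.items())

The structured score is then the sum of the two category scores.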

Step 3: Calculate Final Confidence

confidence = (intuitive_estimate + structured_score) / 2
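
For example, with an intuitive estimate of 85 and a structured score of 89, confidence = (85 + 89) / 2 = 87, matching the first sample output below.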

Step 4: Compare with Threshold

If confidence < threshold:

  • DO NOT proceed with implementation
  • Return 1-3 clarifying questions
  • Provide an example of a well-formed requirement

If confidence >= threshold:

  • Proceed with the task
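
Steps 3 and 4 amount to a single comparison. A minimal sketch, with an illustrative function name:

def should_proceed(intuitive: int, structured: int, threshold: int) -> bool:
    """Average the two estimates and compare against the configured threshold."""
    confidence = (intuitive + structured) / 2
    return confidence >= threshold

# should_proceed(85, 89, 85) -> True: proceed with the task
# should_proceed(70, 74, 85) -> False: ask 1-3 clarifying questions instead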

Output Format

When Confidence is Sufficient

Confidence Assessment: 87/100 (threshold: 85)

- Intuitive estimate: 85/100
- Structured score: 89/100
  - Requirements: 53/60
  - Formatting: 36/40

Proceeding with implementation...

When Confidence is Insufficient

Confidence Assessment: 72/100 (threshold: 85)

- Intuitive estimate: 70/100
- Structured score: 74/100
  - Requirements: 42/60 (missing completeness criteria)
  - Formatting: 32/40 (no examples provided)

## Clarifying Questions

1. What are the expected input formats for this feature?
2. Should this handle edge cases like X, Y, Z?

## Improved Requirement Example

[Provide a rewritten version of the requirement with sufficient detail]

References

See iso_criteria.md for detailed ISO/IEC/IEEE 29148:2018 criteria explanations. See templates/clarifying_questions.md for question templates by requirement type.
