# Example Annotator

Mark up real documents with explicit scoring rationale to calibrate your rubrics.

## When to Use

- After you have a rubric, before rolling it out
- Calibrating new evaluators to your standards
- Building a library of exemplar documents
- The step most people skip (don't skip it)

## The Prompt

```
You are helping me annotate example documents to calibrate a rubric. Good rubrics need marked-up examples showing what each score looks like in practice.

Here's how this will work:

1. You'll ask me for the rubric we're calibrating against
2. I'll paste an example document
3. You'll score it dimension by dimension, quoting specific passages
4. You'll ask me if I agree with each score—if I disagree, we'll discuss why
5. We'll produce an annotated version with margin-style comments explaining each score
6. We'll repeat with 2-3 more examples until we have a calibrated set

The goal is to create training examples that show:
- What a high score actually looks like (with quotes)
- What a low score looks like (with quotes)
- The boundary cases that are hard to score

First: please paste the rubric you want to calibrate. Include the dimensions and what each score level (1-5) means.
```

## Why This Matters

Good rubrics need marked-up examples showing what each score looks like in practice. Without calibrated examples:
- Different evaluators give the same document very different scores
- "What does a 3 really mean?" remains unanswered
- The rubric becomes theater, not training

## The Calibration Goal

Create training examples that show:
1. What a high score actually looks like (with quotes)
2. What a low score looks like (with quotes)
3. The boundary cases that are hard to score
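
One way to keep these annotated examples in a reusable form is to store each score together with the quote and comment that justify it. The sketch below is only an illustration: the class names, the field names, and the unweighted average are assumptions, not part of the prompt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimensionAnnotation:
    dimension: str   # rubric dimension name (hypothetical, e.g. "clarity")
    score: int       # 1-5, matching the rubric's score levels
    quote: str       # passage from the document that supports the score
    rationale: str   # margin-style comment explaining the score


@dataclass
class AnnotatedExample:
    title: str
    annotations: List[DimensionAnnotation] = field(default_factory=list)

    def overall(self) -> float:
        """Unweighted mean of the dimension scores (an assumption; a rubric may weight them)."""
        if not self.annotations:
            raise ValueError("no annotations recorded yet")
        return sum(a.score for a in self.annotations) / len(self.annotations)

    def render(self) -> str:
        """Plain-text view: one score line with its quote, then the comment."""
        lines = [f"# {self.title}"]
        for a in self.annotations:
            lines.append(f'[{a.dimension}: {a.score}/5] "{a.quote}"')
            lines.append(f"    comment: {a.rationale}")
        return "\n".join(lines)


# Hypothetical usage with made-up dimensions and quotes.
example = AnnotatedExample("Design doc A")
example.annotations.append(
    DimensionAnnotation("clarity", 4, "The service retries three times.", "States the behavior precisely.")
)
example.annotations.append(
    DimensionAnnotation("evidence", 2, "This should be fast enough.", "Claim has no measurement behind it.")
)
print(example.render())
```

Keeping the quote next to the score makes it easier for a later evaluator to see why a passage earned its number.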

## Recommended Calibration Set

- 2-3 examples per artifact type
- At least one high scorer (4+)
- At least one low scorer (2 or below)
- At least one boundary case (the hard-to-score ones)
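
The checklist above can be turned into a quick sanity check on a draft calibration set. This is a minimal sketch under stated assumptions: the function name, the `(overall_score, is_boundary_case)` tuple shape, and the manually set boundary flag are all hypothetical. It flags only sets that are too small, since extra examples do no harm.

```python
from typing import List, Tuple

HIGH_MIN = 4  # "high scorer (4+)"
LOW_MAX = 2   # "low scorer (2 or below)"


def check_calibration_set(examples: List[Tuple[float, bool]]) -> List[str]:
    """Return the unmet criteria for one artifact type's calibration set.

    Each example is (overall_score, is_boundary_case). The boundary flag is
    set by hand; this sketch does not try to detect hard-to-score cases.
    """
    problems = []
    if len(examples) < 2:
        problems.append("fewer than 2 examples")
    if not any(score >= HIGH_MIN for score, _ in examples):
        problems.append("no high scorer (4+)")
    if not any(score <= LOW_MAX for score, _ in examples):
        problems.append("no low scorer (2 or below)")
    if not any(is_boundary for _, is_boundary in examples):
        problems.append("no boundary case")
    return problems


# Hypothetical set: one high, one low, one boundary example.
draft_set = [(4.5, False), (1.5, False), (3.0, True)]
print(check_calibration_set(draft_set))
```

An empty list means the set covers all four items; anything else names what is still missing.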
