prediction-tracking
Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.
When & Why to Use This Skill
The Prediction Tracking skill is a specialized framework designed to systematically record, monitor, and evaluate the accuracy of AI-related forecasts and claims. By capturing critical metadata—including author confidence, specific timeframes, and success metrics—it enables users to transform vague industry hype into measurable data. This tool is essential for researchers and strategists who need to assess the reliability of industry voices, identify forecasting biases, and maintain a high-integrity knowledge base of AI progress.
Use Cases
- Expert Accountability Tracking: Monitor and score the historical accuracy of predictions made by AI industry leaders and critics to identify who provides the most reliable insights.
- Forecasting Calibration Analysis: Evaluate whether specific predictors are consistently overconfident or underconfident, helping to adjust expectations for future technological milestones.
- AI Milestone Documentation: Build a structured repository of technological claims to verify if specific capabilities (like AGI or specific benchmark scores) were achieved within stated timeframes.
- Strategic Trend Research: Use historical 'verified' vs 'falsified' data to inform investment or development strategies based on proven technological trajectories rather than speculation.
| name | prediction-tracking |
|---|---|
| description | Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain. |
Prediction Tracking Skill
Track predictions made by AI researchers and critics, and evaluate their accuracy over time.
Prediction Recording
When recording a new prediction, capture:
Required Fields
- text: The prediction as stated
- author: Who made it
- madeAt: When it was made
- timeframe: When they expect it to happen
- topic: What area of AI
- confidence: How confident they seemed
Optional Fields
- sourceUrl: Where the prediction was made
- targetDate: Specific date if mentioned
- conditions: Any caveats or conditions
- metrics: How to measure success
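As a concrete illustration, a record with these fields could be typed roughly as follows. This is a minimal TypeScript sketch: the interface name, the example values, and the comments' suggested formats (ISO dates, confidence labels) are assumptions, not part of the skill.

```typescript
// Minimal sketch of a prediction record using the fields listed above.
// The interface name and example values are illustrative.
interface PredictionRecord {
  // Required fields
  text: string;        // the prediction as stated
  author: string;      // who made it
  madeAt: string;      // when it was made (e.g. an ISO 8601 date)
  timeframe: string;   // when they expect it to happen
  topic: string;       // what area of AI (e.g. "reasoning", "agents")
  confidence: string;  // how confident they seemed (e.g. "high")
  // Optional fields
  sourceUrl?: string;  // where the prediction was made
  targetDate?: string; // specific date if mentioned
  conditions?: string; // any caveats or conditions
  metrics?: string;    // how to measure success
}

// Hypothetical example record
const example: PredictionRecord = {
  text: "Frontier models will exceed 90% on benchmark X within two years",
  author: "Example Author",
  madeAt: "2024-06-01",
  timeframe: "within two years",
  topic: "reasoning",
  confidence: "high",
  metrics: "score above 90% on benchmark X",
};
```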
Evaluation Status
When evaluating predictions, assign one of:
verified
Clearly came true as stated.
- The predicted capability/event occurred
- Within the stated timeframe
- Substantially as described
falsified
Clearly did not come true.
- Timeframe passed without occurrence
- Contradictory evidence emerged
- Author retracted or modified claim
partially-verified
Partially accurate.
- Some aspects came true, others didn't
- Capability exists but weaker than claimed
- Timeframe was off but direction correct
too-early
Not enough time has passed.
- Still within stated timeframe
- No definitive evidence either way
unfalsifiable
Cannot be objectively assessed.
- Too vague to measure
- No clear success criteria
- Moved goalposts
ambiguous
The wording supports more than one reasonable reading, so no single evaluation applies.
- Multiple interpretations possible
- Success criteria unclear
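For reference, the six statuses above can be captured as a single union type. This is a sketch; the type name is not defined by the skill.

```typescript
// The six evaluation statuses described above.
type EvaluationStatus =
  | "verified"            // clearly came true as stated
  | "falsified"           // clearly did not come true
  | "partially-verified"  // some aspects true, others not
  | "too-early"           // still within the stated timeframe
  | "unfalsifiable"       // cannot be objectively assessed
  | "ambiguous";          // wording supports multiple readings
```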
Evaluation Process
For each prediction being evaluated:
1. Restate the prediction
What exactly was claimed?
2. Identify timeframe
Has enough time passed to evaluate?
3. Gather evidence
What has happened since?
- Relevant releases or announcements
- Benchmark results
- Real-world deployments
- Counter-evidence
4. Assess status
Which evaluation status applies?
5. Score accuracy
If verifiable, rate 0.0-1.0 (see the sketch after this list):
- 1.0: Exactly as predicted
- 0.7-0.9: Substantially correct
- 0.4-0.6: Partially correct
- 0.1-0.3: Mostly wrong
- 0.0: Completely wrong
6. Note lessons
What does this tell us about:
- The author's forecasting ability
- The topic's predictability
- Common prediction pitfalls
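The score bands in step 5 could be mapped to labels as sketched below. The function name is illustrative, and the cutoffs used to cover values between the listed bands (e.g. 0.95 or 0.35) are assumptions.

```typescript
// Maps an accuracy score in [0.0, 1.0] to the band described in step 5.
// Boundaries for scores that fall between the listed bands are assumptions.
function accuracyBand(score: number): string {
  if (score < 0 || score > 1) {
    throw new Error("accuracy score must be between 0.0 and 1.0");
  }
  if (score === 1.0) return "exactly as predicted";
  if (score >= 0.7) return "substantially correct";
  if (score >= 0.4) return "partially correct";
  if (score >= 0.1) return "mostly wrong";
  return "completely wrong";
}
```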
Output Format
For evaluation:
```json
{
  "evaluations": [
    {
      "predictionId": "id",
      "status": "verified",
      "accuracyScore": 0.85,
      "evidence": "Description of evidence",
      "notes": "Additional context",
      "evaluatedAt": "timestamp"
    }
  ]
}
```
For accuracy statistics:
```json
{
  "author": "Author name",
  "totalPredictions": 15,
  "verified": 5,
  "falsified": 3,
  "partiallyVerified": 2,
  "pending": 4,
  "unfalsifiable": 1,
  "averageAccuracy": 0.62,
  "topicBreakdown": {
    "reasoning": { "predictions": 5, "accuracy": 0.7 },
    "agents": { "predictions": 3, "accuracy": 0.4 }
  },
  "calibration": "Assessment of how well-calibrated they are"
}
```
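One way these statistics could be aggregated from per-prediction evaluations is sketched below. The type and function names are illustrative, and it assumes `pending` counts predictions still marked `too-early`.

```typescript
// Aggregates per-prediction evaluations into the statistics shape above.
// Assumes "pending" counts too-early predictions; names are illustrative.
type Status =
  | "verified" | "falsified" | "partially-verified"
  | "too-early" | "unfalsifiable" | "ambiguous";

interface Evaluation {
  status: Status;
  accuracyScore?: number; // 0.0-1.0, present only when the prediction is scoreable
  topic: string;
}

function summarizeAuthor(author: string, evals: Evaluation[]) {
  const count = (s: Status) => evals.filter(e => e.status === s).length;
  const mean = (xs: number[]) =>
    xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;
  const scores = (items: Evaluation[]) =>
    items.map(e => e.accuracyScore).filter((x): x is number => x !== undefined);

  // Per-topic prediction counts and mean accuracy over scored predictions.
  const topicBreakdown: Record<string, { predictions: number; accuracy: number }> = {};
  for (const topic of Array.from(new Set(evals.map(e => e.topic)))) {
    const inTopic = evals.filter(e => e.topic === topic);
    topicBreakdown[topic] = {
      predictions: inTopic.length,
      accuracy: mean(scores(inTopic)),
    };
  }

  return {
    author,
    totalPredictions: evals.length,
    verified: count("verified"),
    falsified: count("falsified"),
    partiallyVerified: count("partially-verified"),
    pending: count("too-early"),
    unfalsifiable: count("unfalsifiable"),
    averageAccuracy: mean(scores(evals)),
    topicBreakdown,
  };
}
```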
Calibration Assessment
Evaluate whether predictors are well-calibrated:
Well-Calibrated
- High-confidence predictions usually come true
- Low-confidence predictions have mixed results
- Acknowledges uncertainty appropriately
Overconfident
- High-confidence predictions often fail
- Rarely expresses uncertainty
- Doesn't update on evidence
Underconfident
- Low-confidence predictions often come true
- Hedges even on likely outcomes
- Too conservative
Inconsistent
- Confidence doesn't correlate with accuracy
- No consistent relationship between stated confidence and actual accuracy
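A rough automated check for these categories might look like the sketch below. The recorded confidence levels, the 0.5 and 0.7 thresholds, and the function name are all assumptions rather than part of the skill.

```typescript
// Rough calibration check over resolved predictions.
// Confidence levels and thresholds here are illustrative assumptions.
interface ResolvedPrediction {
  confidence: "low" | "medium" | "high"; // as recorded when the prediction was made
  cameTrue: boolean;                     // verified or substantially verified
}

function assessCalibration(resolved: ResolvedPrediction[]): string {
  const hitRate = (level: ResolvedPrediction["confidence"]) => {
    const group = resolved.filter(r => r.confidence === level);
    return group.length
      ? group.filter(r => r.cameTrue).length / group.length
      : null;
  };
  const high = hitRate("high");
  const low = hitRate("low");

  // High-confidence claims that often fail suggest overconfidence.
  if (high !== null && high < 0.5) return "overconfident";
  // Heavily hedged claims that usually come true suggest underconfidence.
  if (low !== null && low > 0.7) return "underconfident";
  // Confidence that does not track outcomes suggests inconsistency.
  if (high !== null && low !== null && high <= low) return "inconsistent";
  return "well-calibrated";
}
```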
Tracking Notable Predictors
Keep running assessments of key voices:
| Predictor | Predictions | Accuracy | Calibration | Notes |
|---|---|---|---|---|
| Sam Altman | 20 | 55% | Overconfident | Timeline optimism |
| Gary Marcus | 15 | 70% | Well-calibrated | Conservative |
| Dario Amodei | 12 | 65% | Slightly over | Safety-focused |
Red Flags
Watch for prediction patterns that suggest bias:
- Always bullish regardless of topic
- Never acknowledges failed predictions
- Moves goalposts when wrong
- Predictions align suspiciously with financial interests
- Vague enough to claim credit for anything