experiment-analyzer

from majiayu000

Analyze completed growth experiment results, validate hypotheses, generate insights, and suggest follow-up experiments. Use when experiments are completed, when the user asks about results or learnings, or when discussing what to do next based on experiment outcomes.


When & Why to Use This Skill

This Claude skill automates the end-to-end analysis of growth experiments, enabling teams to validate hypotheses, calculate metric shifts, and generate actionable insights. It streamlines the transition from raw data to strategic follow-up actions by providing structured reports, ICE score retrospectives, and data-driven recommendations for future testing cycles.

Use Cases

  • Automated Experiment Reporting: Generate comprehensive markdown reports immediately after an A/B test or growth experiment is completed, covering statistical significance and primary metric changes.
  • Hypothesis Validation: Compare original 'if-then' assumptions with actual user behavior to determine if the underlying product logic was correct or requires a pivot.
  • ICE Score Retrospective: Evaluate the accuracy of initial Impact, Confidence, and Ease predictions against real-world results to improve the team's future prioritization accuracy.
  • Strategic Follow-up Planning: Receive specific recommendations for next steps, such as scaling winning features to 100% of users, optimizing inconclusive results, or investigating the root causes of losses.
  • Cross-Experiment Pattern Recognition: Analyze a portfolio of experiments to identify which funnel stages (Acquisition, Retention, etc.) or user segments are yielding the highest growth ROI.
name: experiment-analyzer
description: Analyze completed growth experiment results, validate hypotheses, generate insights, and suggest follow-up experiments. Use when experiments are completed, when the user asks about results or learnings, or when discussing what to do next based on experiment outcomes.
allowed-tools: Read, Write, Grep, Glob

Experiment Analyzer Skill

Analyze completed growth experiments, extract insights, and drive continuous learning.

When to Activate

This skill should activate when:

  • User marks experiment as "completed"
  • User asks "what did we learn?"
  • User mentions "results", "outcomes", or "analysis"
  • User asks "what should we do next?"
  • User wants to compare multiple experiments
  • User asks about experiment success rates

Analysis Framework

1. Result Classification

Win (Positive + Significant)

  • Result is better than baseline
  • Statistical significance ≥ 95%
  • Change is meaningful (usually ≥5%)

Loss (Negative + Significant)

  • Result is worse than baseline
  • Statistical significance ≥ 95%
  • Change is meaningful

Inconclusive

  • Statistical significance < 95%
  • Not enough data to make a decision
  • Sample size may be insufficient

Neutral

  • Minimal change (< ±2%)
  • No meaningful impact either way
  • May indicate the hypothesis was off
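
The skill takes the significance figure as an input rather than computing it, but for conversion-rate metrics a two-proportion z-test is one common way to produce it. The sketch below assumes that test, and the visitor/conversion counts are purely hypothetical:

```python
# Sketch: estimating statistical significance for a conversion-rate metric.
# Assumes a two-proportion z-test; other metric types need other tests.
from statistics import NormalDist

def significance_pct(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return the two-sided confidence (as a %) that the two rates differ."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (1 - p_value) * 100

# Hypothetical counts: control converted 480/10,000, variant 552/10,000
print(round(significance_pct(480, 10_000, 552, 10_000), 1))
```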

2. Hypothesis Validation

Compare original hypothesis to results:

Hypothesis Components:

  • Proposed change → Was it implemented as planned?
  • Target audience → Did we reach the right users?
  • Expected outcome → Did we hit the target?
  • Rationale → Was our reasoning correct?

Validation Questions:

  • Did we achieve the expected outcome? (Yes/No/Partially)
  • Was the underlying assumption correct?
  • What surprised us?
  • What would we do differently?

3. ICE Score Retrospective

Compare predicted vs actual:

Impact Score Validation:

  • Predicted Impact: [original score]
  • Actual Impact: [calculate based on results]
  • Delta: [difference]
  • Learning: Was our impact prediction accurate?

Confidence Score Validation:

  • Predicted Confidence: [original score]
  • Outcome: [win/loss/inconclusive]
  • Learning: Was our confidence justified?

Ease Score Validation:

  • Predicted Ease: [original score]
  • Actual Time: [if tracked]
  • Learning: Was implementation as easy as expected?
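
A minimal sketch of this retrospective, assuming the ICE scores live in a simple dict and that actual impact can be crudely proxied by the observed % change (both are assumptions; the skill does not fix a schema or scale):

```python
# Sketch of an ICE retrospective. The dict layout and the mapping from
# metric change to an "actual impact" score are assumptions.
def ice_retrospective(predicted: dict, change_pct: float, outcome: str) -> dict:
    # Crude assumption: clamp the observed % change onto a 1-10 impact scale.
    actual_impact = max(1, min(10, round(abs(change_pct))))
    return {
        "impact": {
            "predicted": predicted["impact"],
            "actual": actual_impact,
            "delta": actual_impact - predicted["impact"],
        },
        "confidence": {
            "predicted": predicted["confidence"],
            "outcome": outcome,  # win / loss / inconclusive / neutral
        },
        "ease": {"predicted": predicted["ease"]},  # compare to actual effort if tracked
    }

print(ice_retrospective({"impact": 7, "confidence": 6, "ease": 8}, change_pct=4.2, outcome="win"))
```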

4. Insight Generation

Key Questions:

  • What worked? Specific elements that drove success
  • What didn't work? Elements that failed or harmed metrics
  • What was surprising? Unexpected findings
  • What patterns emerge? Connections to other experiments
  • What new questions arise? Areas to investigate further

Secondary Metrics:

  • Review all secondary metrics tracked
  • Look for unintended positive effects
  • Watch for negative side effects
  • Consider holistic impact

5. Follow-up Experiment Suggestions

Based on the outcome, suggest 2-3 follow-up experiments:

For Wins:

  • Scale: Roll out to 100% of users
  • Amplify: Make the winning element more prominent
  • Extend: Apply pattern to related areas
  • Optimize: Test variations to improve further

For Losses:

  • Pivot: Try alternative approach to same problem
  • Investigate: Run research to understand why
  • Revert: Document and move on
  • Learn: Apply learnings to future experiments

For Inconclusive:

  • Re-run: Increase sample size or duration
  • Simplify: Test smaller version to isolate variable
  • Segment: Test with specific user segments
  • Refine: Adjust hypothesis based on early signals

Analysis Process

Step 1: Load and Validate

1. Read experiment JSON from completed/archived folder
2. Verify results data exists:
   - Primary metric
   - Baseline value
   - Result value
   - Statistical significance
   - Sample size
   - Duration
3. Check if hypothesis is documented
4. Review ICE scores
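
A minimal loading-and-validation sketch for this step; the folder path and JSON field names are assumptions inferred from the checklist above:

```python
import json
from pathlib import Path

# Assumed result fields, based on the checklist above.
REQUIRED_RESULT_FIELDS = [
    "primary_metric", "baseline", "result",
    "significance", "sample_size", "duration_days",
]

def load_experiment(path: Path) -> dict:
    """Load an experiment JSON and flag anything the analysis needs but cannot find."""
    experiment = json.loads(path.read_text())
    results = experiment.get("results", {})
    missing = [f for f in REQUIRED_RESULT_FIELDS if f not in results]
    if missing:
        raise ValueError(f"{path.name} is missing result fields: {missing}")
    if not experiment.get("hypothesis"):
        print(f"Warning: {path.name} has no documented hypothesis")
    if not experiment.get("ice"):
        print(f"Warning: {path.name} has no ICE scores to validate")
    return experiment

# Hypothetical location; adjust to wherever completed experiments are stored.
experiment = load_experiment(Path("experiments/completed/exp-042.json"))
```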

Step 2: Calculate Key Metrics

Change Percentage = ((Result - Baseline) / Baseline) × 100

Result Classification (apply rules in order):
- IF change% > 2% AND significance >= 95% → Win
- IF change% < -2% AND significance >= 95% → Loss
- IF significance < 95% → Inconclusive
- IF abs(change%) < 2% → Neutral
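
The same logic as code, using the thresholds from this section. Applying the rules top-to-bottom is an interpretation, so an under-powered result is classified inconclusive even when the change is small:

```python
def classify(baseline: float, result: float, significance_pct: float) -> tuple[float, str]:
    """Return (% change, result class) using the 95% / 2% thresholds, checked in order."""
    change_pct = (result - baseline) / baseline * 100
    if significance_pct < 95:
        label = "inconclusive"
    elif abs(change_pct) < 2:
        label = "neutral"
    elif change_pct > 0:
        label = "win"
    else:
        label = "loss"
    return change_pct, label

print(classify(baseline=4.8, result=5.52, significance_pct=97.9))  # hypothetical numbers
```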

Step 3: Generate Insights

1. Classify result (Win/Loss/Inconclusive/Neutral)
2. Validate hypothesis against results
3. Review ICE score predictions
4. Extract key learnings
5. Identify surprising findings
6. Check secondary metrics
7. Look for patterns across related experiments

Step 4: Create Follow-up Ideas

1. Based on result type, brainstorm 2-3 follow-ups
2. For each follow-up:
   - Draft hypothesis
   - Explain rationale (reference current learnings)
   - Suggest category
   - Provide preliminary ICE estimate
3. Prioritize follow-ups by potential impact

Step 5: Generate Report

1. Create markdown analysis report
2. Include:
   - Summary (result classification, key numbers)
   - Hypothesis validation
   - ICE score retrospective
   - Key insights (bulleted list)
   - Secondary metrics review
   - Recommendations
   - Follow-up experiment ideas
3. Save to experiments/archive/[id]_analysis.md
4. Update experiment JSON with learnings
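
A sketch of the write-out in this step; the archive path follows the step above, while the learnings field and the JSON file name are assumptions:

```python
import json
from pathlib import Path

def save_analysis(experiment: dict, report_markdown: str, learnings: list[str]) -> Path:
    """Write the markdown report to the archive and record learnings on the experiment."""
    archive = Path("experiments/archive")
    archive.mkdir(parents=True, exist_ok=True)

    report_path = archive / f"{experiment['id']}_analysis.md"
    report_path.write_text(report_markdown)

    # Assumed field: append learnings back onto the experiment record.
    experiment.setdefault("learnings", []).extend(learnings)
    (archive / f"{experiment['id']}.json").write_text(json.dumps(experiment, indent=2))
    return report_path
```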

Analysis Output Template

# Experiment Analysis: [Title]

**Date:** [Analysis date]
**Experiment ID:** [id]
**Status:** [Win/Loss/Inconclusive/Neutral] ✓/✗/?/○

## Summary

- **Primary Metric:** [metric name]
- **Baseline:** [baseline value]
- **Result:** [result value]
- **Change:** [+/-X%]
- **Statistical Significance:** [XX%]
- **Sample Size:** [count]
- **Duration:** [days]

## Hypothesis Validation

### Original Hypothesis
[Full hypothesis statement]

### Validation
- **Expected Outcome:** [what we expected]
- **Actual Outcome:** [what happened]
- **Hypothesis Validated:** [Yes/No/Partially]

**Analysis:**
[Explanation of whether and why hypothesis was validated]

## ICE Score Retrospective

| Component | Predicted | Actual/Assessment | Accuracy |
|-----------|-----------|------------------|----------|
| Impact | [score] | [calculate from results] | [good/overestimated/underestimated] |
| Confidence | [score] | [based on outcome] | [justified/overconfident/underconfident] |
| Ease | [score] | [based on actual effort] | [accurate/harder/easier] |

**Learnings for Future Scoring:**
- [What we learned about predicting impact]
- [What we learned about confidence]
- [What we learned about ease]

## Key Insights

1. **[Primary insight]** - [Explanation with data]
2. **[Secondary insight]** - [Explanation]
3. **[Surprising finding]** - [What we didn't expect]

## Secondary Metrics

| Metric | Change | Interpretation |
|--------|--------|----------------|
| [metric 1] | [+/-X%] | [Good/Bad/Neutral] |
| [metric 2] | [+/-X%] | [Good/Bad/Neutral] |

**Side Effects:**
- Positive: [Any unexpected positive impacts]
- Negative: [Any unexpected negative impacts]

## Recommendations

### Immediate Actions
- [ ] [Action item 1]
- [ ] [Action item 2]

### Strategic Implications
[Broader implications for product/growth strategy]

## Follow-up Experiment Ideas

### 1. [Experiment Title]
**Category:** [category]

**Hypothesis:**
[Full hypothesis following template]

**Rationale:**
[Why this follow-up based on current learnings]

**Preliminary ICE:**
- Impact: [score] - [reasoning]
- Confidence: [score] - [reasoning]
- Ease: [score] - [reasoning]
- **Total: [score]**

---

### 2. [Experiment Title]
[Repeat format]

---

### 3. [Experiment Title]
[Repeat format]

## Related Experiments

[List any related experiments and their outcomes for pattern recognition]

## Notes

[Any additional context, edge cases, or considerations]

Cross-Experiment Analysis

When user asks to analyze multiple experiments:

Metrics to Calculate:

  • Success Rate: % of wins out of completed experiments
  • Category Performance: Which funnel stages have best win rate?
  • ICE Score Accuracy: How well do high-ICE experiments perform?
  • Average Impact: What's the typical metric improvement?
  • Cycle Time: Average days from backlog → completed
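
A roll-up sketch for these metrics; the status, outcome, category, and change_pct fields are assumed, consistent with the earlier sketches:

```python
from collections import defaultdict
from statistics import mean

def portfolio_summary(experiments: list[dict]) -> dict:
    """Aggregate win rate and average impact, overall and per category."""
    # Assumed fields: status, outcome, category, change_pct (not fixed by the skill).
    completed = [e for e in experiments if e.get("status") == "completed"]
    wins = [e for e in completed if e.get("outcome") == "win"]
    by_category = defaultdict(list)
    for e in completed:
        by_category[e.get("category", "uncategorized")].append(e)
    return {
        "total": len(experiments),
        "completed": len(completed),
        "win_rate_pct": 100 * len(wins) / len(completed) if completed else 0,
        "avg_change_pct": mean(e.get("change_pct", 0.0) for e in completed) if completed else 0,
        "by_category": {
            cat: {
                "experiments": len(group),
                "win_rate_pct": 100 * sum(e.get("outcome") == "win" for e in group) / len(group),
                "avg_change_pct": mean(e.get("change_pct", 0.0) for e in group),
            }
            for cat, group in by_category.items()
        },
    }
```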

Pattern Recognition:

  • Which types of experiments succeed most?
  • Which audience segments respond best?
  • Which testing methods are most reliable?
  • What confidence levels actually predict success?

Portfolio View:

# Experiment Portfolio Analysis

## Overview
- Total Experiments: [count]
- Completed: [count]
- Win Rate: [X%]
- Average Change: [+X%]

## By Category
| Category | Experiments | Win Rate | Avg Impact |
|----------|-------------|----------|------------|
| Acquisition | [count] | [X%] | [+X%] |
| Activation | [count] | [X%] | [+X%] |
| Retention | [count] | [X%] | [+X%] |
| Revenue | [count] | [X%] | [+X%] |
| Referral | [count] | [X%] | [+X%] |

## ICE Score Performance
- Experiments with ICE > 500: [X% win rate]
- Experiments with ICE 300-500: [X% win rate]
- Experiments with ICE < 300: [X% win rate]

**Learning:** [Are high ICE scores actually better predictors?]

## Top Performers
1. [Experiment] - [+X%] change
2. [Experiment] - [+X%] change
3. [Experiment] - [+X%] change

## Key Patterns
- [Pattern 1 discovered across experiments]
- [Pattern 2]
- [Pattern 3]

## Recommendations
[Strategic recommendations based on portfolio analysis]

Integration Points

  • Automatically trigger when /experiment-update sets status to "completed"
  • Work with ICE scorer skill to validate predictions
  • Inform hypothesis generator with learnings
  • Feed into metrics calculator for portfolio analysis

Continuous Improvement

After each analysis:

  • Store learnings in a knowledge base
  • Update ICE scoring calibration
  • Refine hypothesis templates
  • Build pattern library
  • Improve follow-up suggestions