heterogeneity-analysis
Assess and interpret between-study heterogeneity in meta-analysis using I², Q statistic, tau², and prediction intervals. Use when users need to evaluate consistency across studies, understand sources of variation, or decide if pooling is appropriate.
When & Why to Use This Skill
This Claude skill provides a comprehensive framework for assessing and interpreting between-study heterogeneity in meta-analyses. It guides researchers through the key statistical measures (I², Cochran's Q, tau², and prediction intervals) to evaluate consistency across studies, understand sources of variation, and decide whether pooling the data is justified in an evidence synthesis.
Use Cases
- Evaluating whether multiple clinical trials or studies are statistically consistent enough to justify a combined meta-analysis.
- Interpreting complex statistical outputs like I² and tau² to distinguish between sampling error and true variation in effect sizes.
- Generating executable R code using the 'metafor' package to calculate heterogeneity statistics and prediction intervals.
- Visualizing inconsistency between studies with specialized plots such as forest plots, Baujat plots, and GOSH plots to identify outliers and influential studies.
- Assessing the clinical implications of heterogeneity by calculating prediction intervals for new study settings.
- Deciding between fixed-effects and random-effects models based on the magnitude and nature of observed variability.
| name | heterogeneity-analysis |
|---|---|
| description | Assess and interpret between-study heterogeneity in meta-analysis using I², Q statistic, tau², and prediction intervals. Use when users need to evaluate consistency across studies, understand sources of variation, or decide if pooling is appropriate. |
| license | Apache-2.0 |
| compatibility | Requires R with metafor package |
| author | meta-agent |
| version | "1.0.0" |
| category | statistics |
| domain | evidence-synthesis |
| difficulty | intermediate |
| estimated-time | "12 minutes" |
| prerequisites | meta-analysis-fundamentals |
Heterogeneity Analysis
This skill teaches assessment and interpretation of between-study heterogeneity, a critical component of meta-analysis quality.
Overview
Heterogeneity refers to variation in the true effects across studies beyond what we would expect from sampling error alone. High heterogeneity raises the question of whether a single pooled estimate is meaningful.
When to Use This Skill
Activate this skill when users:
- Ask about I², tau², or Q statistic
- Want to know if studies are "too different to combine"
- See conflicting results in their forest plot
- Ask about "inconsistency" or "variability"
- Need to interpret heterogeneity statistics
Key Heterogeneity Measures
1. Cochran's Q Statistic
What it is: Tests the null hypothesis that all studies share a common true effect.
Interpretation:
- Significant Q (p < 0.10) → Evidence of heterogeneity
- Non-significant Q → Does NOT prove homogeneity (low power)
Limitation: The test has low power when there are few studies, and with many studies it can flag heterogeneity too small to matter.
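A minimal base-R sketch of how Q is computed, assuming inverse-variance weights and a common-effect (fixed-effect) pooled estimate. The vectors `yi` (effect estimates) and `vi` (sampling variances) are hypothetical toy data, and `cochran_q()` is an illustrative helper, not a metafor function.

```r
# Cochran's Q by hand, base R only.
# yi, vi: hypothetical toy effect estimates and their sampling variances.
yi <- c(0.10, 0.30, 0.50, 0.20)
vi <- c(0.01, 0.01, 0.01, 0.01)

cochran_q <- function(yi, vi) {
  w     <- 1 / vi                      # inverse-variance weights
  theta <- sum(w * yi) / sum(w)        # common-effect pooled estimate
  Q     <- sum(w * (yi - theta)^2)     # weighted squared deviations
  df    <- length(yi) - 1
  p     <- pchisq(Q, df = df, lower.tail = FALSE)  # upper-tail chi-square p-value
  list(Q = Q, df = df, p = p)
}

cochran_q(yi, vi)
# Q = 8.75 on 3 df, p approx. 0.033
```

For the same `yi` and `vi`, metafor's `res$QE` should match this Q, because the Q test uses common-effect weights and does not depend on which tau² estimator the model uses.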
2. I² (I-squared)
What it is: The percentage of the total variability in effect estimates that is due to between-study heterogeneity rather than sampling error (chance).
Interpretation Guidelines (Cochrane):
| I² Value | Interpretation |
|---|---|
| 0-40% | Might not be important |
| 30-60% | May represent moderate heterogeneity |
| 50-90% | May represent substantial heterogeneity |
| 75-100% | Considerable heterogeneity |
Key Teaching Points:
- I² is a relative measure (a proportion of total variability), not an absolute one
- Overlapping ranges are intentional: context matters
- Always consider clinical and methodological diversity
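A minimal base-R sketch of the Q-based definitions, I² = (Q - df) / Q and H² = Q / df, on hypothetical toy data. metafor reports I² computed from its tau² estimate and a "typical" within-study variance; the two versions coincide for the DerSimonian-Laird estimator but can differ under REML.

```r
# I^2 and H^2 from Cochran's Q, base R only.
# yi, vi: hypothetical toy effect estimates and sampling variances.
yi <- c(0.10, 0.30, 0.50, 0.20)
vi <- c(0.01, 0.01, 0.01, 0.01)

w     <- 1 / vi
theta <- sum(w * yi) / sum(w)      # common-effect estimate
Q     <- sum(w * (yi - theta)^2)   # Cochran's Q (8.75 here)
df    <- length(yi) - 1

I2 <- 100 * max(0, (Q - df) / Q)   # percent; set to 0 when Q < df
H2 <- Q / df                       # total vs. within-study variability
round(c(I2 = I2, H2 = H2), 1)
# I2 65.7, H2 2.9
```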
Socratic Questions:
- "If I² is 75%, what does that tell us about the studies?"
- "Can we still do a meta-analysis with high I²?"
- "What might cause studies to have different true effects?"
3. Tau² (Tau-squared)
What it is: The estimated variance of the true effects across studies.
Interpretation:
- Tau² = 0 → No heterogeneity (all studies estimate the same effect)
- Larger tau² → Greater spread of true effects
- Tau (the square root of tau²) is on the same scale as the effect size
Advantage: Tau² is an absolute measure, unlike I², which is relative.
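tau² can be estimated in several ways; the metafor example in this skill uses REML, which is iterative. This base-R sketch swaps in the DerSimonian-Laird moment estimator because it has a closed form, so expect somewhat different values from REML on real data. `tau2_dl()` is a hypothetical helper and the data are toy values.

```r
# DerSimonian-Laird (method-of-moments) tau^2, base R only.
# Swapped in for REML because it has a closed form; results will differ.
# yi, vi: hypothetical toy effect estimates and sampling variances.
yi <- c(0.10, 0.30, 0.50, 0.20)
vi <- c(0.01, 0.01, 0.01, 0.01)

tau2_dl <- function(yi, vi) {
  w     <- 1 / vi                          # inverse-variance weights
  theta <- sum(w * yi) / sum(w)            # common-effect estimate
  Q     <- sum(w * (yi - theta)^2)         # Cochran's Q
  df    <- length(yi) - 1
  C     <- sum(w) - sum(w^2) / sum(w)      # scaling constant
  max(0, (Q - df) / C)                     # negative estimates truncated to 0
}

tau2 <- tau2_dl(yi, vi)
tau  <- sqrt(tau2)   # same units as yi (e.g. log odds ratio)
round(c(tau2 = tau2, tau = tau), 4)
# tau2 0.0192, tau 0.1384
```

Reading `tau` on the effect-size scale is what makes it clinically interpretable: here the true effects are estimated to spread with a standard deviation of about 0.14 units of `yi`.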
4. Prediction Interval
What it is: Range where we expect the true effect of a NEW study to fall.
Why it matters:
- Wider than the confidence interval, because it adds tau² to the uncertainty of the pooled estimate
- Shows practical implications of heterogeneity
- Critical for clinical decision-making
Example:
Pooled effect: OR = 0.70, 95% CI [0.55, 0.89]
Prediction interval: [0.35, 1.40]
Interpretation: Although the average effect favors treatment,
the true effect in a new setting could range from strongly
beneficial (OR = 0.35) to harmful (OR = 1.40).
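A quick arithmetic check, using only the numbers in the example, shows why the interval looks lopsided on the OR scale: odds ratios are pooled on the log scale, where both the CI and the prediction interval are symmetric around log(0.70).

```r
# Numbers copied from the OR example above.
or_pooled <- 0.70
ci_or     <- c(0.55, 0.89)
pi_or     <- c(0.35, 1.40)

# Distances from the pooled log-OR to each interval bound
ci_half <- log(ci_or) - log(or_pooled)
pi_half <- log(pi_or) - log(or_pooled)

round(ci_half, 3)  # -0.241  0.240 (symmetric up to rounding of the bounds)
round(pi_half, 3)  # -0.693  0.693, i.e. -log(2) and +log(2)
```

On the log scale the prediction interval's half-width (log 2, about 0.693) is nearly three times the CI's (about 0.24), which is the practical footprint of between-study variance.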
R Code for Heterogeneity Assessment
Basic Heterogeneity Statistics
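```r
library(metafor)

# Fit a random-effects model (REML estimator of tau²).
# Assumes `dat` has columns yi (effect sizes, e.g. log odds ratios)
# and sei (their standard errors).
res <- rma(yi = yi, sei = sei, data = dat, method = "REML")

# View heterogeneity statistics
print(res)
# Look for: tau², I², H², Q, and the p-value for Q

# Extract specific values
res$tau2  # tau-squared
res$I2    # I-squared, stored as a percentage (e.g. 62.4), not a proportion
res$H2    # H-squared
res$QE    # Q statistic
res$QEp   # p-value for the Q test
```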
Confidence Intervals for I²
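```r
# Confidence intervals for tau², tau, I², and H²
# (Q-profile method by default)
confint(res)
# Illustrative output (values depend on your data):
#          estimate   ci.lb   ci.ub
# tau^2      0.0234  0.0012  0.1456
# I^2(%)    62.4000 12.3000 89.2000
# (rows for tau and H^2 are also printed)
```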
Prediction Interval
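```r
# Prediction interval: see the pi.lb and pi.ub columns. The critical value used
# depends on how the model was fit (e.g. its test argument).
predict(res)
predict(res, transf = exp)  # back-transform, e.g. log odds ratios -> odds ratios
# Or manually, using a t distribution on k - 2 df (may differ slightly from predict()):
pi_se    <- sqrt(res$tau2 + res$se^2)
pi_lower <- as.numeric(res$beta) - qt(0.975, res$k - 2) * pi_se
pi_upper <- as.numeric(res$beta) + qt(0.975, res$k - 2) * pi_se
```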
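For readers without a fitted metafor object, here is a self-contained base-R sketch of the full chain: DerSimonian-Laird tau², the random-effects pooled estimate, and a t-based prediction interval on k - 2 df. It swaps REML for DL to keep a closed form, so numbers will not match `rma(..., method = "REML")` exactly; the toy data and the `random_effects_pi()` helper are hypothetical.

```r
# Random-effects pooling with a prediction interval, base R only.
# yi, vi: hypothetical toy effect estimates and sampling variances.
yi <- c(0.10, 0.30, 0.50, 0.20)
vi <- c(0.01, 0.01, 0.01, 0.01)

# Requires k >= 3 so that the t distribution has k - 2 > 0 df.
random_effects_pi <- function(yi, vi, level = 0.95) {
  k    <- length(yi)
  w    <- 1 / vi                                   # common-effect weights
  Q    <- sum(w * (yi - sum(w * yi) / sum(w))^2)   # Cochran's Q
  C    <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (Q - (k - 1)) / C)                # DerSimonian-Laird tau^2

  w_re <- 1 / (vi + tau2)                          # random-effects weights
  mu   <- sum(w_re * yi) / sum(w_re)               # pooled estimate
  se   <- sqrt(1 / sum(w_re))                      # its standard error

  crit <- qt(1 - (1 - level) / 2, df = k - 2)      # t quantile on k - 2 df
  half <- crit * sqrt(tau2 + se^2)                 # PI half-width
  c(estimate = mu, tau2 = tau2, se = se,
    pi_lower = mu - half, pi_upper = mu + half)
}

round(random_effects_pi(yi, vi), 3)
# estimate 0.275, tau2 0.019, se 0.085, pi_lower -0.425, pi_upper 0.975
```

With only four studies, `qt(0.975, 2)` is about 4.30, so the prediction interval is very wide even though the pooled estimate is fairly precise; this is typical when k is small.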
Visualizing Heterogeneity
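```r
# Forest plot with prediction interval
forest(res,
       slab    = dat$study,  # study labels (assumes a `study` column in dat)
       addpred = TRUE,       # adds the prediction interval
       header  = TRUE)

# Baujat plot: each study's contribution to Q vs. its influence on the pooled result
baujat(res)

# GOSH plot (sensitivity to study inclusion); fits the model to many
# subsets of the studies, so it can be slow when k is large
gosh_res <- gosh(res)
plot(gosh_res)
```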
Teaching Framework
Step 1: Report the Statistics
"Let's look at your heterogeneity results:
- Q = 24.5, p = 0.003 (significant)
- I² = 67% [42%, 82%]
- Tau² = 0.08"
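Step 2: Interpret in Context
"This suggests substantial heterogeneity: about 67% of the variability in the observed effects is attributed to real differences between studies rather than to chance. The confidence interval for I² (42% to 82%) is wide, so this estimate is itself uncertain."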
Step 3: Discuss Implications
"With this level of heterogeneity, we should:
- Still report the pooled effect, but with caution
- Explore sources of heterogeneity
- Consider subgroup or meta-regression analysis
- Report the prediction interval"
Step 4: Investigate Sources
"Let's think about what might cause these differences:
- Different populations (age, severity)?
- Different interventions (dose, duration)?
- Different outcome measures?
- Different study designs?"
Decision Framework
```text
I² Assessment
│
├── I² < 40%
│   └── Heterogeneity likely unimportant
│       → Proceed with pooled estimate
│
├── I² 40-75%
│   └── Moderate to substantial heterogeneity
│       → Report pooled estimate
│       → Explore sources (subgroups)
│       → Report prediction interval
│
└── I² > 75%
    └── Considerable heterogeneity
        → Question whether pooling is meaningful
        → Exploring sources is mandatory
        → Consider narrative synthesis
        → Always report prediction interval
```
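The tree's cut-offs can be encoded as a small helper, for example to annotate a batch of analyses. This is a hypothetical sketch using the thresholds in the decision tree (I² in percent, as metafor reports it); the bands are rough guides and should still be read alongside tau², the confidence interval for I², and the prediction interval.

```r
# Hypothetical helper: map I^2 (percent) to the next steps in the decision tree.
# Boundaries follow the tree: < 40, 40-75 (inclusive), > 75.
i2_next_steps <- function(i2) {
  ifelse(i2 < 40,
         "proceed with pooled estimate",
         ifelse(i2 <= 75,
                "report pooled estimate; explore sources; report prediction interval",
                "question pooling; explore sources; consider narrative synthesis; report prediction interval"))
}

i2_next_steps(c(25, 62.4, 88))
```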
Common Misconceptions
"High I² means we can't do meta-analysis"
- Reality: High I² means we need to investigate and interpret carefully
- Pooling may still be appropriate with proper caveats
"Non-significant Q means no heterogeneity"
- Reality: The Q test has low power when there are few studies
- Always report I² and tau² alongside Q
"I² tells us about clinical importance"
- Reality: I² is statistical, not clinical
- A small I² can hide clinically important variation
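One way to see why I² is relative while tau² is absolute: I² can be written as tau² / (tau² + s²), where s² is a "typical" within-study sampling variance (essentially the form metafor reports). Holding tau² fixed and changing only study precision moves I² dramatically. The values below are hypothetical.

```r
# Same between-study variance, different within-study precision (hypothetical values).
tau2 <- 0.04                       # variance of true effects, held fixed
s2   <- c(small_trials = 0.16,     # typical within-study variance, imprecise studies
          large_trials = 0.004)    # typical within-study variance, precise studies

I2 <- 100 * tau2 / (tau2 + s2)
round(I2, 1)
# small_trials 20.0, large_trials 90.9
```

The same spread of true effects looks "unimportant" among small, imprecise trials and "considerable" among large ones, which is why I² alone cannot tell you whether the variation matters clinically.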
Assessment Questions
Basic: "What does I² = 50% mean?"
- Correct: About half the observed variation is due to true differences between studies
Intermediate: "Q test is non-significant but I² = 45%. How do you interpret this?"
- Correct: The Q test may be underpowered; moderate heterogeneity may still exist
Advanced: "Pooled OR = 0.6 [0.4, 0.9] but prediction interval is [0.3, 1.2]. What's the clinical implication?"
- Correct: While the average effect is beneficial, a new setting might see no effect or even harm
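The intermediate question can be made concrete by inverting I² = (Q - df) / Q: for a fixed I², the implied Q and its p-value depend heavily on how many studies there are. `q_from_i2()` is a hypothetical helper for illustration.

```r
# For a given I^2 (as a proportion, < 1) and number of studies k, find the Q
# that would produce it and the corresponding Q-test p-value.
q_from_i2 <- function(i2, k) {
  df <- k - 1
  Q  <- df / (1 - i2)
  p  <- pchisq(Q, df = df, lower.tail = FALSE)
  c(k = k, Q = Q, p = p)
}

rbind(q_from_i2(0.45, k = 5),
      q_from_i2(0.45, k = 20))
# Same I^2 = 45%: p > 0.10 with 5 studies, p < 0.05 with 20 studies
```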
Related Skills
- meta-analysis-fundamentals: Understanding pooled effects
- forest-plot-creation: Visualizing heterogeneity
- publication-bias-detection: Another source of concern
Adaptation Guidelines
Glass (the teaching agent) MUST adapt this content to the learner:
- Language Detection: Detect the user's language from their messages and respond naturally in that language
- Cultural Context: Adapt examples to local healthcare systems and research contexts when relevant
- Technical Terms: Maintain standard English terms (e.g., "forest plot", "effect size", "I²") but explain them in the user's language
- Level Adaptation: Adjust complexity based on user's demonstrated knowledge level
- Socratic Method: Ask guiding questions in the detected language to promote deep understanding
- Local Examples: When possible, reference studies or guidelines familiar to the user's region
Example Adaptations:
- 🇧🇷 Portuguese: Use Brazilian health system examples (SUS, ANVISA guidelines)
- 🇪🇸 Spanish: Reference PAHO/OPS guidelines for Latin America
- 🇨🇳 Chinese: Include examples from Chinese medical literature