heterogeneity-analysis

from matheus-rech

Assess and interpret between-study heterogeneity in meta-analysis using I², Q statistic, tau², and prediction intervals. Use when users need to evaluate consistency across studies, understand sources of variation, or decide if pooling is appropriate.

0 stars · 0 forks · Updated Jan 10, 2026

When & Why to Use This Skill

This Claude skill provides a comprehensive framework for assessing and interpreting between-study heterogeneity in meta-analyses. It guides researchers through critical statistical measures including I², Cochran's Q, tau², and prediction intervals to evaluate research consistency, understand sources of variation, and determine the validity of data pooling in evidence synthesis.

Use Cases

  • Evaluating whether multiple clinical trials or studies are statistically consistent enough to justify a combined meta-analysis.
  • Interpreting complex statistical outputs like I² and tau² to distinguish between sampling error and true variation in effect sizes.
  • Generating executable R code using the 'metafor' package to calculate heterogeneity statistics and prediction intervals.
  • Visualizing research inconsistency through specialized plots such as Forest plots, Baujat plots, and GOSH plots to identify outliers.
  • Assessing the clinical implications of heterogeneity by calculating prediction intervals for new study settings.
  • Deciding between fixed-effects and random-effects models based on the magnitude and nature of observed variability.
name: heterogeneity-analysis
description: Assess and interpret between-study heterogeneity in meta-analysis using I², Q statistic, tau², and prediction intervals. Use when users need to evaluate consistency across studies, understand sources of variation, or decide if pooling is appropriate.
license: Apache-2.0
compatibility: Requires R with metafor package
author: meta-agent
version: "1.0.0"
category: statistics
domain: evidence-synthesis
difficulty: intermediate
estimated-time: "12 minutes"
prerequisites: meta-analysis-fundamentals

Heterogeneity Analysis

This skill teaches assessment and interpretation of between-study heterogeneity, a critical component of meta-analysis quality.

Overview

Heterogeneity refers to variation in true effects across studies beyond what we'd expect from sampling error alone. High heterogeneity calls into question whether a single pooled estimate is meaningful.

When to Use This Skill

Activate this skill when users:

  • Ask about I², tau², or Q statistic
  • Want to know if studies are "too different to combine"
  • See conflicting results in their forest plot
  • Ask about "inconsistency" or "variability"
  • Need to interpret heterogeneity statistics

Key Heterogeneity Measures

1. Cochran's Q Statistic

What it is: Tests null hypothesis that all studies share a common effect.

Interpretation:

  • Significant Q (p < 0.10) → Evidence of heterogeneity
  • Non-significant Q → Does NOT prove homogeneity (low power)

Limitation: Low power with few studies; with many studies it can flag trivially small heterogeneity as statistically significant.
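As a minimal sketch of what the Q test computes, the base R snippet below calculates Q from inverse-variance weights and takes its p-value from a chi-squared distribution on k - 1 degrees of freedom. The effect sizes `yi` and standard errors `sei` are made-up illustrative values, and `cochran_q` is a hypothetical helper name, not a metafor function.

```r
# Hypothetical illustrative data: log odds ratios and their standard errors
yi  <- c(-0.50, -0.20, -0.80, 0.10, -0.35)
sei <- c(0.20, 0.25, 0.30, 0.22, 0.18)

# Cochran's Q: weighted sum of squared deviations of the study estimates
# from the inverse-variance (common-effect) pooled estimate
cochran_q <- function(yi, sei) {
  wi     <- 1 / sei^2
  pooled <- sum(wi * yi) / sum(wi)
  q      <- sum(wi * (yi - pooled)^2)
  df     <- length(yi) - 1
  list(Q = q, df = df, p = pchisq(q, df = df, lower.tail = FALSE))
}

q_res <- cochran_q(yi, sei)
q_res
```

In practice `res$QE` and `res$QEp` from a fitted metafor model report the same quantities; the hand calculation is only meant to show where they come from.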

2. I² (I-squared)

What it is: Percentage of the total variability in effect estimates that is due to between-study heterogeneity rather than chance (sampling error).

Interpretation Guidelines (Cochrane):

I² Value    Interpretation
0-40%       Might not be important
30-60%      May represent moderate heterogeneity
50-90%      May represent substantial heterogeneity
75-100%     Considerable heterogeneity

Key Teaching Points:

  • I² is a proportion, not an absolute measure
  • Overlapping ranges are intentional; context matters
  • Always consider clinical and methodological diversity
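To make the "proportion, not an absolute measure" point concrete, I² can be computed directly from Q and its degrees of freedom as max(0, (Q - df) / Q) × 100. The sketch below uses a hypothetical helper `i_squared` and made-up values of Q and df.

```r
# I-squared (Higgins & Thompson): share of total variability in the
# effect estimates attributable to heterogeneity, truncated at 0
i_squared <- function(q, df) {
  if (q <= 0) return(0)
  max(0, (q - df) / q) * 100
}

i_squared(q = 20, df = 8)  # 60
i_squared(q = 5,  df = 8)  # 0 (Q is below its degrees of freedom)
```

Because Q is compared only with its own degrees of freedom, the result says nothing about how far apart the true effects are in absolute terms.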

Socratic Questions:

  • "If I² is 75%, what does that tell us about the studies?"
  • "Can we still do a meta-analysis with high I²?"
  • "What might cause studies to have different true effects?"

3. Tau² (Tau-squared)

What it is: Estimated variance of true effects across studies.

Interpretation:

  • Tau² = 0 → No heterogeneity (all studies estimate same effect)
  • Larger tau² → Greater spread of true effects
  • Tau (square root) is on same scale as effect size

Advantage: Absolute measure, unlike I² which is relative.
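For intuition about how tau² is estimated, here is a rough sketch of the DerSimonian-Laird moment estimator in base R. Note that this is a different estimator from the REML method used in the metafor code in this skill, so the two will generally give somewhat different values; the data and the helper name `tau2_dl` are hypothetical.

```r
# DerSimonian-Laird moment estimator of tau^2 (not REML):
# tau^2 = max(0, (Q - df) / C), with C = sum(w) - sum(w^2) / sum(w)
tau2_dl <- function(yi, sei) {
  wi     <- 1 / sei^2
  pooled <- sum(wi * yi) / sum(wi)
  q      <- sum(wi * (yi - pooled)^2)
  df     <- length(yi) - 1
  c_val  <- sum(wi) - sum(wi^2) / sum(wi)
  max(0, (q - df) / c_val)
}

yi  <- c(-0.50, -0.20, -0.80, 0.10, -0.35)  # hypothetical log odds ratios
sei <- c(0.20, 0.25, 0.30, 0.22, 0.18)      # hypothetical standard errors

tau2 <- tau2_dl(yi, sei)
tau  <- sqrt(tau2)  # tau is on the same scale as the effect sizes
c(tau2 = tau2, tau = tau)
```

Reporting tau alongside tau² is often easier to interpret, since it can be read as the standard deviation of true effects on the effect-size scale.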

4. Prediction Interval

What it is: Range where we expect the true effect of a NEW study to fall.

Why it matters:

  • Wider than confidence interval
  • Shows practical implications of heterogeneity
  • Critical for clinical decision-making

Example:

Pooled effect: OR = 0.70, 95% CI [0.55, 0.89]
Prediction interval: [0.35, 1.40]

Interpretation: While the average effect favors treatment,
the true effect in a new study might range from strongly
beneficial (0.35) to harmful (1.40).
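Prediction intervals for ratio measures are calculated on the log scale and then exponentiated. The sketch below applies the t-based formula mu ± t(k - 2) × sqrt(tau² + SE²) to the pooled OR and confidence interval from the example. The example does not report `tau2` or `k`, so the values used here are hypothetical and the result is not expected to match the interval above exactly. It also assumes the confidence interval is a normal-theory (Wald) interval, which is how the standard error is back-calculated; `prediction_interval_or` is a hypothetical helper name.

```r
# Prediction interval for an odds ratio, computed on the log scale
# with mu +/- t_{k-2} * sqrt(tau^2 + SE^2), then back-transformed
prediction_interval_or <- function(or, ci_lb, ci_ub, tau2, k) {
  stopifnot(k > 2)
  mu   <- log(or)
  # SE back-calculated from the 95% CI (assumes a Wald-type interval)
  se   <- (log(ci_ub) - log(ci_lb)) / (2 * qnorm(0.975))
  half <- qt(0.975, df = k - 2) * sqrt(tau2 + se^2)
  exp(c(lower = mu - half, upper = mu + half))
}

# tau2 and k are hypothetical; the example above does not report them
pi_or <- prediction_interval_or(or = 0.70, ci_lb = 0.55, ci_ub = 0.89,
                                tau2 = 0.08, k = 10)
round(pi_or, 2)
```

The interval widens as tau² grows and as the number of studies shrinks, which is why it can cross 1 even when the confidence interval does not.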

R Code for Heterogeneity Assessment

Basic Heterogeneity Statistics

library(metafor)

# Fit random-effects model
res <- rma(yi = yi, sei = sei, data = dat, method = "REML")

# View heterogeneity statistics
print(res)
# Look for: tau², I², H², Q, p-value

# Extract specific values
res$tau2   # tau-squared
res$I2     # I-squared (as a percentage, 0-100)
res$QE     # Q statistic
res$QEp    # p-value for Q test

Confidence Intervals for I²

# Get confidence interval for I²
confint(res)

# Output includes (illustrative values; the tau and H^2 rows are omitted here):
#        estimate   ci.lb   ci.ub
# tau^2    0.0234  0.0012  0.1456
# I^2(%)  62.4000 12.3000 89.2000

Prediction Interval

# Calculate prediction interval
predict(res)

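# Or manually, with a t quantile on k - 2 df (predict(res) defaults to a
# normal quantile, so its interval is slightly narrower):
mu       <- as.numeric(res$beta)  # res$beta is a 1 x 1 matrix
pi_half  <- qt(0.975, df = res$k - 2) * sqrt(res$tau2 + res$se^2)
pi_lower <- mu - pi_half
pi_upper <- mu + pi_half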

Visualizing Heterogeneity

# Forest plot with prediction interval
forest(res, 
       slab = dat$study,
       addpred = TRUE,  # Adds prediction interval
       header = TRUE)

# Baujat plot (flags studies that contribute most to heterogeneity and to the pooled result)
baujat(res)

# GOSH plot (sensitivity to study inclusion)
gosh_res <- gosh(res)
plot(gosh_res)

Teaching Framework

Step 1: Report the Statistics

"Let's look at your heterogeneity results:

  • Q = 24.5, p = 0.003 (significant)
  • I² = 67% [42%, 82%]
  • Tau² = 0.08"

Step 2: Interpret in Context

"This suggests substantial heterogeneity. About 67% of the variation we see is due to real differences between studies, not just chance."

Step 3: Discuss Implications

"With this level of heterogeneity, we should:

  1. Still report the pooled effect, but with caution
  2. Explore sources of heterogeneity
  3. Consider subgroup or meta-regression analysis
  4. Report the prediction interval"

Step 4: Investigate Sources

"Let's think about what might cause these differences:

  • Different populations (age, severity)?
  • Different interventions (dose, duration)?
  • Different outcome measures?
  • Different study designs?"
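One simple way to check whether a candidate source explains the differences is to partition Q into within-subgroup and between-subgroup parts. The base R sketch below does this under common-effect weights, using made-up data and a hypothetical `group` variable (dose). It is an illustration of the idea rather than a full analysis; for meta-regression, the `mods` argument of metafor's `rma()` is the usual route.

```r
# Hypothetical data: log odds ratios, standard errors, and a dose subgroup
yi    <- c(-0.50, -0.20, -0.80, 0.10, -0.35, 0.05)
sei   <- c(0.20, 0.25, 0.30, 0.22, 0.18, 0.28)
group <- c("high dose", "low dose", "high dose",
           "low dose", "high dose", "low dose")

# Cochran's Q under inverse-variance (common-effect) weights
q_stat <- function(yi, sei) {
  wi <- 1 / sei^2
  sum(wi * (yi - sum(wi * yi) / sum(wi))^2)
}

q_total  <- q_stat(yi, sei)
# Sum of the Q statistics computed separately within each subgroup
q_within <- sum(sapply(split(seq_along(yi), group),
                       function(idx) q_stat(yi[idx], sei[idx])))
# The remainder is the heterogeneity explained by subgroup membership
q_between  <- q_total - q_within
df_between <- length(unique(group)) - 1
p_between  <- pchisq(q_between, df = df_between, lower.tail = FALSE)

c(Q_total = q_total, Q_within = q_within,
  Q_between = q_between, p_between = p_between)
```

A large Q_between relative to its degrees of freedom suggests the subgroup variable accounts for part of the heterogeneity, though with few studies per subgroup such tests are easily underpowered.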

Decision Framework

I² Assessment
    │
    ├── I² < 40%
    │   └── Heterogeneity likely unimportant
    │       → Proceed with pooled estimate
    │
    ├── I² 40-75%
    │   └── Moderate heterogeneity
    │       → Report pooled estimate
    │       → Explore sources (subgroups)
    │       → Report prediction interval
    │
    └── I² > 75%
        └── Considerable heterogeneity
            → Question if pooling is meaningful
            → Mandatory exploration of sources
            → Consider narrative synthesis
            → Always report prediction interval
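The tree above could be encoded as a small helper, sketched here in base R. The function name `heterogeneity_actions` is hypothetical, and the handling of the exact boundaries (40 and 75 are placed in the middle band) is an assumption, since the tree does not say which branch they belong to. The thresholds are guides, so the output is a prompt for judgment rather than a rule.

```r
# Map an I-squared value (in percent) to the actions in the decision tree.
# Assumption: values of exactly 40 or 75 fall in the middle band.
heterogeneity_actions <- function(i2) {
  stopifnot(is.numeric(i2), length(i2) == 1, i2 >= 0, i2 <= 100)
  if (i2 < 40) {
    c("Proceed with pooled estimate")
  } else if (i2 <= 75) {
    c("Report pooled estimate",
      "Explore sources (subgroups)",
      "Report prediction interval")
  } else {
    c("Question if pooling is meaningful",
      "Explore sources of heterogeneity",
      "Consider narrative synthesis",
      "Report prediction interval")
  }
}

heterogeneity_actions(67)
```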

Common Misconceptions

  1. "High I² means we can't do meta-analysis"

    • Reality: High I² means we need to investigate and interpret carefully
    • Pooling may still be appropriate with proper caveats
  2. "Non-significant Q means no heterogeneity"

    • Reality: Q test has low power with few studies
    • Always report I² and tau² alongside Q
  3. "I² tells us about clinical importance"

    • Reality: I² is statistical, not clinical
    • A small I² can hide clinically important variation
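The third misconception can be illustrated numerically. When all studies share the same within-study variance v, I² is approximately tau² / (tau² + v), so the same amount of true heterogeneity produces very different I² values depending on study precision. The sketch below uses hypothetical values and a hypothetical helper `expected_i2`.

```r
# Approximate I-squared for studies with a common within-study variance v:
# I^2 = tau^2 / (tau^2 + v), expressed as a percentage
expected_i2 <- function(tau2, v) 100 * tau2 / (tau2 + v)

tau2 <- 0.04                 # same true heterogeneity in both scenarios
expected_i2(tau2, v = 0.16)  # small, imprecise studies: 20
expected_i2(tau2, v = 0.01)  # large, precise studies:   80
```

The spread of true effects is identical in both scenarios, which is why tau² and the prediction interval are better guides to clinical relevance than I² alone.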

Assessment Questions

  1. Basic: "What does I² = 50% mean?"

    • Correct: About half the observed variation is due to true differences between studies
  2. Intermediate: "Q test is non-significant but I² = 45%. How do you interpret this?"

    • Correct: Q test may be underpowered; moderate heterogeneity may still exist
  3. Advanced: "Pooled OR = 0.6 [0.4, 0.9] but prediction interval is [0.3, 1.2]. What's the clinical implication?"

    • Correct: While average effect is beneficial, a new setting might see no effect or even harm

Related Skills

  • meta-analysis-fundamentals - Understanding pooled effects
  • forest-plot-creation - Visualizing heterogeneity
  • publication-bias-detection - Another source of concern

Adaptation Guidelines

Glass (the teaching agent) MUST adapt this content to the learner:

  1. Language Detection: Detect the user's language from their messages and respond naturally in that language
  2. Cultural Context: Adapt examples to local healthcare systems and research contexts when relevant
  3. Technical Terms: Maintain standard English terms (e.g., "forest plot", "effect size", "I²") but explain them in the user's language
  4. Level Adaptation: Adjust complexity based on user's demonstrated knowledge level
  5. Socratic Method: Ask guiding questions in the detected language to promote deep understanding
  6. Local Examples: When possible, reference studies or guidelines familiar to the user's region

Example Adaptations:

  • 🇧🇷 Portuguese: Use Brazilian health system examples (SUS, ANVISA guidelines)
  • 🇪🇸 Spanish: Reference PAHO/OPS guidelines for Latin America
  • 🇨🇳 Chinese: Include examples from Chinese medical literature