What is grade-assessment?

This Claude skill implements the GRADE (Grading of Recommendations Assessment, Development and Evaluation) framework, a widely used international standard for rating the certainty of evidence. It helps researchers and clinicians rate the quality of evidence systematically, identify the factors that downgrade or upgrade confidence in findings, and produce Summary of Findings (SoF) tables for systematic reviews and clinical guidelines.

When should I use grade-assessment?

grade-assessment is useful in the following scenarios:

• Systematic Review Methodology: Systematically assess the certainty of evidence across multiple studies for specific research outcomes.
• Clinical Guideline Development: Support the transition from research evidence to clinical recommendations by providing structured quality ratings.
• Summary of Findings (SoF) Generation: Calculate absolute effect estimates and create standardized tables for evidence synthesis reports.
• Academic Peer Review: Evaluate the methodological rigor and evidence certainty of submitted manuscripts or meta-analyses.
• Evidence-Based Medicine Training: Teach students and researchers how to apply the five downgrade factors (Risk of Bias, Inconsistency, Indirectness, Imprecision, and Publication Bias).

name	grade-assessment
description	Apply the GRADE framework to assess certainty of evidence in systematic reviews. Use when users need to rate evidence quality, create Summary of Findings tables, or understand the factors that affect confidence in effect estimates.
license	Apache-2.0
compatibility	Works with any AI agent; GRADE methodology is universal
author	meta-agent
version	"1.0.0"
category	evidence-assessment
domain	evidence-synthesis
difficulty	advanced
estimated-time	"20 minutes"
prerequisites	meta-analysis-fundamentals, heterogeneity-analysis

GRADE Assessment

This skill teaches the GRADE (Grading of Recommendations Assessment, Development and Evaluation) framework for assessing certainty of evidence.

Overview

GRADE is the internationally recognized standard for rating the quality of evidence in systematic reviews. It provides a systematic approach to moving from evidence to recommendations.

When to Use This Skill

Activate this skill when users:

Ask about "quality of evidence" or "certainty"
Need to create a Summary of Findings (SoF) table
Want to understand GRADE ratings
Ask about downgrading or upgrading evidence
Are preparing a Cochrane review or guideline

GRADE Certainty Levels

Level	Symbol	Meaning
High	⊕⊕⊕⊕	Very confident the true effect is close to the estimate
Moderate	⊕⊕⊕◯	Moderately confident; true effect likely close to estimate
Low	⊕⊕◯◯	Limited confidence; true effect may be substantially different
Very Low	⊕◯◯◯	Very little confidence; true effect likely substantially different

Starting Point

Study Design	Starting Certainty
Randomized trials	High (⊕⊕⊕⊕)
Observational studies	Low (⊕⊕◯◯)

Factors That Lower Certainty (Downgrade)

1. Risk of Bias

What to assess:

Randomization and allocation concealment
Blinding of participants, personnel, outcome assessors
Incomplete outcome data
Selective reporting
Other biases

When to downgrade:

Serious limitations → Down 1 level
Very serious limitations → Down 2 levels

Socratic Questions:

"Were the studies properly randomized?"
"Could the lack of blinding have affected results?"
"Was there substantial loss to follow-up?"

2. Inconsistency (Heterogeneity)

What to assess:

Point estimates vary widely
Confidence intervals show minimal overlap
I² is high
Studies show different directions of effect

When to downgrade:

Unexplained heterogeneity with I² > 50%
Studies show conflicting results
Prediction interval crosses null

Key Teaching Point: "Inconsistency is different from imprecision. Inconsistency means the studies disagree with one another; imprecision means the pooled estimate itself is uncertain, typically shown by a wide confidence interval."
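To make the I² criterion concrete, here is a minimal sketch in base R that derives I² from Cochran's Q under a fixed-effect (inverse-variance) model and checks whether studies point in different directions. The study data and the function name `i_squared` are hypothetical, chosen only for illustration; a real review would normally take these values from its meta-analysis software.

```r
# Hypothetical study-level data: log risk ratios and their standard errors
log_rr <- c(-0.35, -0.10, 0.20, -0.60, 0.05)
se     <- c(0.15, 0.20, 0.18, 0.25, 0.22)

i_squared <- function(yi, sei) {
  wi     <- 1 / sei^2                    # inverse-variance weights
  pooled <- sum(wi * yi) / sum(wi)       # fixed-effect pooled estimate
  q      <- sum(wi * (yi - pooled)^2)    # Cochran's Q
  df     <- length(yi) - 1
  # I² = (Q - df) / Q, truncated at 0 and expressed as a percentage
  i2 <- if (q > 0) max(0, (q - df) / q) * 100 else 0
  list(Q = q, df = df, I2 = i2,
       mixed_direction = any(yi > 0) && any(yi < 0))
}

res <- i_squared(log_rr, se)
res$I2               # compare against the I² > 50% rule of thumb
res$mixed_direction  # TRUE when studies show different directions of effect
```

A high I² alone is not a reason to downgrade; the question is whether the heterogeneity remains unexplained after looking for sources such as population or dose differences.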

3. Indirectness

Types of indirectness:

Type	Example
Population	Studies in adults, question about children
Intervention	Studies of drug A, question about drug B
Comparator	Studies vs. placebo, question vs. active treatment
Outcome	Studies measure surrogate, question about clinical outcome

When to downgrade:

Important differences between evidence and question
Surrogate outcomes used instead of patient-important outcomes

4. Imprecision

What to assess:

Wide confidence intervals
Small sample size / few events
Optimal Information Size (OIS) not met

Rules of thumb:

Binary: < 300 events total → consider downgrading
Continuous: < 400 participants total → consider downgrading
CI crosses thresholds of clinical importance

When to downgrade:

CI includes both appreciable benefit and appreciable harm
CI includes no effect and appreciable benefit (or harm)
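The imprecision checks above can be sketched as a small helper for a binary outcome reported as a risk ratio. The function name `flag_imprecision` and the thresholds for appreciable benefit and harm (RR 0.75 and 1.25) are assumptions for illustration; the thresholds that matter should come from the clinical context of the review. The output is a prompt for judgment, not an automatic decision.

```r
flag_imprecision <- function(ci_lower, ci_upper, total_events,
                             benefit_rr = 0.75, harm_rr = 1.25) {
  ois_not_met  <- total_events < 300              # rule of thumb for binary outcomes
  incl_null    <- ci_lower <= 1 && ci_upper >= 1  # CI includes no effect (RR = 1)
  incl_benefit <- ci_lower <= benefit_rr          # CI reaches appreciable benefit
  incl_harm    <- ci_upper >= harm_rr             # CI reaches appreciable harm

  benefit_and_harm     <- incl_benefit && incl_harm
  null_and_appreciable <- incl_null && (incl_benefit || incl_harm)

  list(ois_not_met          = ois_not_met,
       benefit_and_harm     = benefit_and_harm,
       null_and_appreciable = null_and_appreciable,
       consider_downgrade   = ois_not_met || benefit_and_harm || null_and_appreciable)
}

# Hypothetical inputs: a reasonably precise estimate and an imprecise one
precise   <- flag_imprecision(0.60, 0.94, total_events = 420)
imprecise <- flag_imprecision(0.55, 1.30, total_events = 120)
```

In this sketch the first call raises no flag, while the second trips all three checks, which would usually prompt a discussion of downgrading by one or two levels.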

5. Publication Bias

What to assess:

Funnel plot asymmetry
Egger's test significant
Industry funding with positive results only
Small study effects

When to downgrade:

Strong suspicion of missing studies
Trim-and-fill suggests meaningful impact
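As an illustration of the Egger's test item, the sketch below runs the classic Egger regression in base R: each study's standardized effect (effect divided by its standard error) is regressed on its precision (1 / standard error), and an intercept far from zero suggests funnel plot asymmetry. The ten studies and the function name `egger_test` are hypothetical. Tests of this kind have low power with few studies; a common recommendation is to use them only with roughly ten or more.

```r
# Hypothetical log risk ratios and standard errors for 10 studies
yi  <- c(-0.62, -0.55, -0.48, -0.40, -0.33, -0.30, -0.25, -0.22, -0.20, -0.18)
sei <- c(0.40, 0.36, 0.30, 0.26, 0.20, 0.17, 0.13, 0.11, 0.09, 0.08)

egger_test <- function(yi, sei) {
  snd       <- yi / sei        # standardized effect (standard normal deviate)
  precision <- 1 / sei
  fit   <- lm(snd ~ precision) # Egger regression
  coefs <- summary(fit)$coefficients
  list(intercept = coefs["(Intercept)", "Estimate"],
       p_value   = coefs["(Intercept)", "Pr(>|t|)"])
}

egger <- egger_test(yi, sei)
egger$intercept  # distance from 0 reflects asymmetry
egger$p_value    # test of the intercept against 0
```

A significant result supports, but does not prove, publication bias; small-study effects can also arise from genuine differences between small and large trials.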

Factors That Raise Certainty (Upgrade)

Applies mainly to observational studies, which start at Low, and only when none of the five downgrade factors raises serious concern

1. Large Effect

Magnitude	Upgrade
RR > 2 or < 0.5	Consider +1
RR > 5 or < 0.2	Consider +2
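The thresholds in the table are symmetric around RR = 1, so they can be checked with a single comparison on the larger of RR and 1/RR. The helper name `large_effect_upgrade` is hypothetical, and the result is the number of levels to consider, not a final decision.

```r
large_effect_upgrade <- function(rr) {
  stopifnot(rr > 0)
  # RR 0.4 is treated like a 2.5-fold effect, RR 0.1 like a 10-fold effect
  magnitude <- max(rr, 1 / rr)
  if (magnitude > 5) {
    2L   # RR > 5 or < 0.2: consider +2
  } else if (magnitude > 2) {
    1L   # RR > 2 or < 0.5: consider +1
  } else {
    0L   # effect not large enough to upgrade
  }
}

large_effect_upgrade(0.4)  # 1
large_effect_upgrade(6)    # 2
large_effect_upgrade(1.5)  # 0
```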

2. Dose-Response Gradient

Clear relationship between dose/exposure and outcome
Biological plausibility

3. Plausible Confounding

All plausible confounders would reduce the effect
Yet effect is still observed

GRADE Assessment Process

Step 1: Define the Question (PICO)
    │
Step 2: Identify Study Designs
    │
    ├── RCTs → Start at HIGH
    └── Observational → Start at LOW
    │
Step 3: Assess Downgrade Factors
    │
    ├── Risk of Bias?
    ├── Inconsistency?
    ├── Indirectness?
    ├── Imprecision?
    └── Publication Bias?
    │
Step 4: Assess Upgrade Factors (if observational)
    │
    ├── Large Effect?
    ├── Dose-Response?
    └── Confounding?
    │
Step 5: Determine Final Rating
    │
Step 6: Write Certainty Statement
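Steps 2 to 5 of the process above can be sketched as simple bookkeeping: pick the starting level from the study design, subtract the downgrades, add the upgrades (observational studies only), and keep the result between Very Low and High. The function `rate_certainty` and its argument names are hypothetical. GRADE ratings are structured judgments rather than arithmetic, so treat this as a way to record decisions, not to make them.

```r
grade_levels <- c("Very Low", "Low", "Moderate", "High")

rate_certainty <- function(design = c("rct", "observational"),
                           downgrades = c(risk_of_bias = 0, inconsistency = 0,
                                          indirectness = 0, imprecision = 0,
                                          publication_bias = 0),
                           upgrades = c(large_effect = 0, dose_response = 0,
                                        confounding = 0)) {
  design <- match.arg(design)
  stopifnot(all(downgrades %in% 0:2), all(upgrades %in% 0:2))

  start <- if (design == "rct") 4 else 2   # RCTs start High, observational Low
  if (design == "rct") upgrades[] <- 0     # upgrades apply to observational studies

  final <- start - sum(downgrades) + sum(upgrades)
  final <- min(4, max(1, final))           # clamp to the four GRADE levels
  grade_levels[final]
}

# RCT evidence with serious risk of bias: High minus one level
rate_certainty("rct", downgrades = c(risk_of_bias = 1))            # "Moderate"
# Observational evidence with a large effect: Low plus one level
rate_certainty("observational", upgrades = c(large_effect = 1))    # "Moderate"
```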

Summary of Findings Table

Template

Outcome	№ of studies (participants)	Certainty	Relative effect (95% CI)	Anticipated absolute effects
Mortality	5 RCTs (2,340)	⊕⊕⊕◯ Moderate	RR 0.75 (0.60-0.94)	50 fewer per 1000 (from 80 fewer to 12 fewer)

Creating Absolute Effects

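# Absolute effects from a relative risk (RR) and an assumed control-group risk
baseline_risk <- 0.20  # 20% risk in the control group
RR <- 0.75
RR_lower <- 0.60       # lower bound of the 95% CI
RR_upper <- 0.94       # upper bound of the 95% CI

# Absolute risk reduction (ARR) = baseline risk * (1 - RR)
ARR <- baseline_risk * (1 - RR)              # 0.05  = 50 fewer per 1000
ARR_lower <- baseline_risk * (1 - RR_upper)  # 0.012 = 12 fewer per 1000 (smallest reduction)
ARR_upper <- baseline_risk * (1 - RR_lower)  # 0.08  = 80 fewer per 1000 (largest reduction)

# Express per 1000 people for the SoF table
round(1000 * c(ARR, ARR_lower, ARR_upper))   # 50 12 80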
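The same calculation can be wrapped in a helper that produces the wording used in the "Anticipated absolute effects" column of the template. The function name `absolute_effect_text` is hypothetical, and the sketch assumes a risk ratio applied to a single assumed baseline risk; it reports "fewer" when the RR lowers risk and "more" when it raises it.

```r
absolute_effect_text <- function(baseline_risk, rr, rr_lower, rr_upper) {
  # Risk difference per 1000 people; negative values mean fewer events
  per_1000 <- function(r) round(1000 * baseline_risk * (r - 1))
  describe <- function(x) paste(abs(x), if (x < 0) "fewer" else "more")

  sprintf("%s per 1000 (from %s to %s)",
          describe(per_1000(rr)),
          describe(per_1000(rr_lower)),
          describe(per_1000(rr_upper)))
}

sof_text <- absolute_effect_text(0.20, 0.75, 0.60, 0.94)
sof_text  # "50 fewer per 1000 (from 80 fewer to 12 fewer)"
```

Because the absolute effect scales with the baseline risk, it is worth rerunning the helper for low-, medium-, and high-risk populations when the review covers more than one.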

Certainty Statements

High certainty: "We are very confident that the true effect lies close to that of the estimate of the effect."

Moderate certainty: "We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different."

Low certainty: "Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect."

Very low certainty: "We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect."

Teaching Framework

Step 1: Establish the Question

"What exactly are we trying to answer? Let's define:

Population
Intervention
Comparator
Outcomes"

Step 2: Identify the Evidence

"What studies do we have?

How many RCTs vs observational?
What's our starting point?"

Step 3: Systematic Assessment

"Let's go through each GRADE domain:

First, risk of bias...
Then, inconsistency..." [Continue through all domains]

Step 4: Make Judgments

"Based on our assessment:

We downgraded for [reasons]
Final certainty: [level]"

Step 5: Write the Statement

"Now let's write what this means for decision-makers..."

Common Mistakes to Avoid

Double-counting
- Don't downgrade for both inconsistency AND imprecision when the wide CI is itself a consequence of the heterogeneity
Automatic downgrading
- Not every limitation requires downgrading
- Consider impact on the effect estimate
Ignoring context
- A "large" CI depends on clinical context
- What difference matters to patients?
Forgetting outcomes
- GRADE is assessed per outcome, not per review

Assessment Questions

Basic: "RCTs start at what GRADE certainty level?"
- Correct: High
Intermediate: "I² = 70% with studies showing effects in opposite directions. Which domain is affected?"
- Correct: Inconsistency
Advanced: "Studies are in adults but your question is about children. The intervention and outcomes are the same. Which domain is affected, and by how many levels would you downgrade?"
- Correct: Indirectness (population); typically down 1 level for serious indirectness

Related Skills

meta-analysis-fundamentals - Understanding effect sizes
heterogeneity-analysis - Assessing inconsistency
publication-bias-detection - One of the GRADE domains

Adaptation Guidelines

Glass (the teaching agent) MUST adapt this content to the learner:

Language Detection: Detect the user's language from their messages and respond naturally in that language
Cultural Context: Adapt examples to local healthcare systems and research contexts when relevant
Technical Terms: Maintain standard English terms (e.g., "forest plot", "effect size", "I²") but explain them in the user's language
Level Adaptation: Adjust complexity based on user's demonstrated knowledge level
Socratic Method: Ask guiding questions in the detected language to promote deep understanding
Local Examples: When possible, reference studies or guidelines familiar to the user's region

Example Adaptations:

🇧🇷 Portuguese: Use Brazilian health system examples (SUS, ANVISA guidelines)
🇪🇸 Spanish: Reference PAHO/OPS guidelines for Latin America
🇨🇳 Chinese: Include examples from Chinese medical literature
