What is group-sequential-methods?

This Claude skill provides comprehensive statistical tools and guidance for Group Sequential Design, essential for modern clinical trials and experimental research. It enables users to implement interim analyses, manage alpha spending functions, and set futility stopping rules, ensuring statistical rigor while optimizing trial efficiency and resource allocation.

When should I use group-sequential-methods?

group-sequential-methods is useful in the following scenarios: • Designing clinical trials with planned interim looks to allow for early stopping for efficacy, potentially bringing treatments to market faster. • Implementing alpha spending functions like O'Brien-Fleming or Pocock to strictly control Type I error across multiple data analyses. • Setting up binding or non-binding futility boundaries to terminate unsuccessful experiments early and minimize participant risk or financial waste. • Simulating trial operating characteristics using R packages such as simtrial and gsDesign2 to evaluate power and expected sample sizes under various scenarios. • Calculating information fractions and updating statistical boundaries based on actual versus planned event accrual during a study.

name	group-sequential-methods
description	Group sequential design methods for interim analyses, alpha spending, and futility stopping. Use when designing trials with interim looks or implementing spending functions.

Group Sequential Methods

When to Use This Skill

Designing group sequential trials with interim analyses
Implementing alpha spending functions
Setting futility stopping rules
Calculating information fractions
Using sim_gs_n() for GS simulations
Integrating with gsDesign2 package

Fundamental Concepts

Group Sequential Design

A group sequential design allows for:

Early stopping for efficacy: If treatment effect is larger than expected
Early stopping for futility: If treatment effect is unlikely to reach significance
Reduced expected sample size: When treatment effect is present

Information Fraction

Information fraction at analysis k:

I_k / I_K = (events at analysis k) / (total planned events)

For time-to-event trials, information ≈ number of events.

Type I Error Spending

The key constraint: Σ α_k ≤ α (overall Type I error)

Spending functions distribute alpha across analyses.

Alpha Spending Functions

O'Brien-Fleming (OBF)

Properties:

Conservative at early analyses
Nearly full alpha at final analysis
Difficult to stop early
Maintains nominal Type I error

Formula:

α*(t) = 2 - 2Φ(z_{α/2} / √t)

When to Use:

Want maximum power at final analysis
Early efficacy stopping unlikely
Regulatory preference for conservative early bounds

Pocock

Properties:

Equal spending at each analysis
Easier to stop early
Inflated final alpha
Lower power at final analysis

Formula:

α*(t) = α × log(1 + (e-1)t)

When to Use:

Early stopping is a priority
Treatment effect expected to be large
Willing to sacrifice final analysis power

Hwang-Shih-DeCani (HSD)

Properties:

Flexible family indexed by γ
γ = -4: Similar to OBF
γ = 1: Similar to Pocock
γ = 0: Linear (Pocock-like)

Formula:

α*(t) = α × (1 - e^{-γt}) / (1 - e^{-γ})

When to Use:

Want flexibility between OBF and Pocock
Customized spending pattern needed

Spending Function Comparison

Function	Early Spending	Final Power	Early Stopping
OBF	Low	High	Difficult
Pocock	High	Lower	Easier
HSD(γ=-4)	Low	High	Difficult
HSD(γ=1)	High	Lower	Easier

Futility Boundaries

Binding Futility

If futility boundary crossed, trial MUST stop
Affects Type I error calculation
More powerful than non-binding

Non-Binding Futility

Crossing futility boundary is advisory
Trial can continue at investigator discretion
Conservative: assumes no early stopping for futility in Type I error

Beta-Spending for Futility

Similar to alpha-spending, but for Type II error:

β*(t) = spending function × β

simtrial GS Implementation

create_cut() - Define Analysis Timing

# Interim Analysis 1
ia1_cut <- create_cut(
  planned_calendar_time = 20,        # Minimum 20 months
  target_event_overall = 100,        # Target 100 events
  max_extension_for_target_event = 24,  # Wait up to 24 months for events
  min_n_overall = 200,               # At least 200 enrolled
  min_followup = 12                  # 12 months minimum follow-up
)

# Interim Analysis 2
ia2_cut <- create_cut(
  planned_calendar_time = 32,
  target_event_overall = 200,
  max_extension_for_target_event = 34,
  min_time_after_previous_analysis = 10  # At least 10 months after IA1
)

# Final Analysis
fa_cut <- create_cut(
  planned_calendar_time = 45,
  target_event_overall = 350
)

sim_gs_n() - Run GS Simulations

library(simtrial)
library(gsDesign2)

# Define enrollment
enroll_rate <- define_enroll_rate(
  duration = c(4, 12),
  rate = c(10, 30)
)

# Define failure rates
fail_rate <- define_fail_rate(
  duration = c(3, 100),
  fail_rate = log(2)/9,
  hr = c(1, 0.6),
  dropout_rate = 0.001
)

# Run simulation
results <- sim_gs_n(
  n_sim = 1000,
  sample_size = 400,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  test = wlr,
  cut = list(ia1 = ia1_cut, ia2 = ia2_cut, fa = fa_cut),
  weight = fh(rho = 0, gamma = 0)
)

Integration with gsDesign2

library(gsDesign2)

# Design with gsDesign2
design <- gs_design_ahr(
  enroll_rate = define_enroll_rate(duration = c(4, 12), rate = c(10, 30)),
  fail_rate = define_fail_rate(
    duration = c(3, 100),
    fail_rate = log(2)/9,
    hr = c(1, 0.6),
    dropout_rate = 0.001
  ),
  alpha = 0.025,
  beta = 0.1,
  analysis_time = c(24, 36, 48),
  upper = gs_spending_bound,
  upar = list(sf = gsDesign::sfLDOF, total_spend = 0.025),
  lower = gs_spending_bound,
  lpar = list(sf = gsDesign::sfHSD, param = -4, total_spend = 0.1)
) |> to_integer()

# Simulate with design object
sim_results <- sim_gs_n(
  n_sim = 1000,
  sample_size = max(design$analysis$n),
  enroll_rate = design$enroll_rate,
  fail_rate = design$fail_rate,
  test = wlr,
  cut = NULL,  # Auto-generated from design
  original_design = design,
  weight = fh(rho = 0, gamma = 0)
)

Bound Updates with sim_gs_n()

When using original_design, sim_gs_n() can compute updated bounds:

# Results include planned and updated bounds
results <- sim_gs_n(
  # ... parameters ...
  original_design = design,
  ia_alpha_spending = "min_planned_actual",  # Conservative
  fa_alpha_spending = "full_alpha"           # Spend full alpha at FA
)

# Output includes:
# - planned_upper_bound, planned_lower_bound
# - updated_upper_bound, updated_lower_bound

Alpha Spending Options:

ia_alpha_spending	Description
"min_planned_actual"	Conservative: min of planned and actual
"actual"	Spend based on actual information

fa_alpha_spending	Description
"full_alpha"	Spend remaining alpha at final
"info_frac"	Spend based on information fraction

Different Tests Across Analyses

# Different tests at each analysis
ia1_test <- create_test(wlr, weight = fh(rho = 0, gamma = 0))
ia2_test <- create_test(wlr, weight = fh(rho = 0, gamma = 0.5))
fa_test <- create_test(wlr, weight = mb(delay = 6, w_max = Inf))

results <- sim_gs_n(
  n_sim = 1000,
  sample_size = 400,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  test = list(ia1 = ia1_test, ia2 = ia2_test, fa = fa_test),
  cut = list(ia1 = ia1_cut, ia2 = ia2_cut, fa = fa_cut)
)

Common GS Patterns

Two-Look Design (1 IA + FA)

# IA at 50% information, FA at 100%
ia_cut <- create_cut(target_event_overall = 150)  # 50%
fa_cut <- create_cut(target_event_overall = 300)  # 100%

sim_gs_n(
  n_sim = 1000,
  sample_size = 400,
  test = wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = fh(0, 0)
)

Three-Look Design (2 IA + FA)

# Standard 33%, 67%, 100% information
ia1_cut <- create_cut(target_event_overall = 100)
ia2_cut <- create_cut(target_event_overall = 200)
fa_cut <- create_cut(target_event_overall = 300)

Event-Driven with Calendar Constraints

# Events-based but with minimum calendar time
ia_cut <- create_cut(
  target_event_overall = 150,
  planned_calendar_time = 18,  # At least 18 months
  max_extension_for_target_event = 24  # Max 24 months
)

Operating Characteristics

Key Metrics to Evaluate

Power: P(reject H0 | H1 true)
Type I Error: P(reject H0 | H0 true)
Expected Sample Size: E[N] under H0 and H1
Expected Events: E[events] at each analysis
Stopping Probabilities: P(stop at analysis k)

Simulation Summary

# Summarize simulation results
results_summary <- results |>
  group_by(analysis) |>
  summarise(
    mean_events = mean(event),
    mean_z = mean(z),
    power = mean(z < qnorm(0.025)),  # One-sided
    .groups = "drop"
  )

Best Practices

Information Fraction: Target evenly spaced (e.g., 50%, 100% or 33%, 67%, 100%)
Alpha Spending: OBF is default for most regulatory submissions
Futility: Use non-binding to preserve flexibility
Validation: Compare simulated power to gsDesign analytical results
Documentation: Record all boundary calculations for regulatory submission
Parallelization: Use plan("multisession") for large simulations

Regulatory Considerations

Pre-specify number and timing of interim analyses
Pre-specify spending function and parameters
Document stopping rules clearly in protocol
Consider DSMB recommendations for unblinded reviews
Maintain blinding for operational team

group-sequential-methods

When & Why to Use This Skill

Use Cases

Group Sequential Methods

When to Use This Skill

Fundamental Concepts

Group Sequential Design

Information Fraction

Type I Error Spending

Alpha Spending Functions

O'Brien-Fleming (OBF)

Pocock

Hwang-Shih-DeCani (HSD)

Spending Function Comparison

Futility Boundaries

Binding Futility

Non-Binding Futility

Beta-Spending for Futility

simtrial GS Implementation

create_cut() - Define Analysis Timing

sim_gs_n() - Run GS Simulations

Integration with gsDesign2

Bound Updates with sim_gs_n()

Different Tests Across Analyses

Common GS Patterns

Two-Look Design (1 IA + FA)

Three-Look Design (2 IA + FA)

Event-Driven with Calendar Constraints

Operating Characteristics

Key Metrics to Evaluate

Simulation Summary

Best Practices

Regulatory Considerations