statistics
Geostatistics, uncertainty quantification, and time series analysis for paleoseismic research. Use for z-score analysis, Monte Carlo dating uncertainty, spatial clustering, and recurrence statistics.
When & Why to Use This Skill
This Claude skill provides a specialized toolkit for geostatistics and paleoseismic research, focusing on uncertainty quantification, time series analysis, and spatial clustering. It streamlines complex scientific workflows such as Monte Carlo dating simulations, Z-score anomaly detection, and recurrence interval statistics to enhance the precision and rigor of geoscience data interpretation.
Use Cases
- Quantifying dating uncertainty for U-Th and radiocarbon samples using Monte Carlo propagation to build reliable age-depth models.
- Detecting significant geochemical anomalies in time series data through rolling window Z-score analysis and statistical thresholding.
- Analyzing earthquake recurrence patterns to characterize seismic behavior as quasi-periodic, random, or clustered using Coefficient of Variation (COV) statistics.
- Identifying 'orphan' earthquakes and seismic hotspots by performing spatial clustering analysis and nearest-neighbor distance calculations.
- Validating geological hypotheses through rigorous significance testing, including Kolmogorov-Smirnov tests and non-parametric comparisons of distributions.
Statistics for Geoscience Research
When to Use This Skill
Invoke when:
- Calculating z-scores or anomaly detection
- Quantifying dating uncertainty
- Analyzing spatial clustering of events
- Computing recurrence intervals
- Performing Monte Carlo simulations
- Assessing statistical significance
Core Methods
1. Z-Score Analysis
Standard approach for anomaly detection in geochemical time series:
z = (x - μ) / σ
where:
- x = observed value
- μ = mean of reference period
- σ = standard deviation of reference period
Interpretation thresholds:
| z-score | Interpretation | Action |
|---|---|---|
| \|z\| < 2.0 | Within background variability | No action |
| \|z\| ≥ 2.0 | Anomalous at the 2σ level | Flag for review |
| \|z\| ≥ 2.5 | Strongly anomalous | Investigate candidate events |
| \|z\| ≥ 3.0 | Extreme anomaly | Report with supporting evidence |
Best practices:
- Use a rolling window for the mean and standard deviation (50-100 data points; see the sketch after this list)
- Exclude the test point from reference calculation
- Report both positive and negative z-scores
- Consider autocorrelation in time series
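A minimal sketch of these practices, assuming pandas is available (`geochem_values` is a hypothetical input array):

```python
import numpy as np
import pandas as pd

def rolling_zscores(values, window=75):
    """Z-score each point against the trailing window,
    excluding the test point from the reference statistics."""
    s = pd.Series(values)
    ref = s.shift(1).rolling(window)  # trailing window ends just before the test point
    return ((s - ref.mean()) / ref.std()).to_numpy()

z = rolling_zscores(geochem_values, window=75)
anomalies = np.flatnonzero(np.abs(z) >= 2.0)  # indices beyond the 2-sigma threshold
```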
2. Dating Uncertainty
U-Th dating (typical for speleothems):
- Precision: ±50-100 years for Holocene
- Report as:
1285 ± 85 yr (U-Th: 1237-1322 CE)
Radiocarbon dating:
- Raw dates in ¹⁴C years BP
- Calibrate using IntCal20/SHCal20
- Report 2σ range:
830 ± 50 ¹⁴C yr BP (cal. 720-940 CE)
Monte Carlo uncertainty propagation:
```python
# Pseudocode for age-depth model uncertainty
import numpy as np

n_iter = 10_000
interpolated_ages = np.empty(n_iter)
for i in range(n_iter):
    # Draw one plausible age for each dated depth from its error distribution
    ages_sample = sample_from_age_distributions(age_points)
    model = fit_age_depth_model(ages_sample, depths)
    interpolated_ages[i] = model.interpolate(target_depth)

uncertainty = np.percentile(interpolated_ages, [2.5, 97.5])  # 95% range
```
3. Recurrence Interval Statistics
For earthquake recurrence from event dates:
```python
import numpy as np

# `events` holds sorted event dates; successive differences are recurrence intervals
intervals = [events[i + 1] - events[i] for i in range(len(events) - 1)]
mean_recurrence = np.mean(intervals)
std_recurrence = np.std(intervals, ddof=1)  # sample standard deviation
cov = std_recurrence / mean_recurrence      # coefficient of variation (COV)
```
Interpretation:
- COV < 0.5: Quasi-periodic behavior
- COV ~ 1.0: Random (Poisson) process
- COV > 1.0: Clustered behavior
MCP Tool: calc_recurrence computes these statistics
4. Spatial Clustering Analysis
Orphan earthquake detection: Identify seismicity far from mapped faults using nearest-neighbor distance.
```python
# For each earthquake, calculate the distance to the nearest mapped fault
distances = [min_distance_to_fault(eq) for eq in earthquakes]

# Earthquakes more than 50 km from any fault are "orphans"
orphans = [eq for eq, d in zip(earthquakes, distances) if d > 50]
```
Kernel density estimation:
- Use for seismicity hotspot mapping
- Bandwidth selection: Scott's rule or cross-validation (see the sketch below)
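A sketch of hotspot density estimation with SciPy, assuming epicentres are already projected to planar kilometre coordinates (`eq_x_km` and `eq_y_km` are hypothetical arrays):

```python
import numpy as np
from scipy.stats import gaussian_kde

xy = np.vstack([eq_x_km, eq_y_km])         # coordinates, shape (2, n_events)
kde = gaussian_kde(xy, bw_method="scott")  # Scott's rule bandwidth

# Evaluate the density on a regular grid for hotspot mapping
gx, gy = np.mgrid[xy[0].min():xy[0].max():200j,
                  xy[1].min():xy[1].max():200j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
```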
5. Significance Testing
When comparing distributions:
- Kolmogorov-Smirnov test: Compare CDFs (see the SciPy sketch after these lists)
- Mann-Whitney U: Non-parametric comparison of medians
- Permutation tests: Distribution-free significance
Correlation analysis:
- Pearson r: Linear relationship (assumes normality)
- Spearman ρ: Monotonic relationship (rank-based)
- Cross-correlation: Time-lagged relationships
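A minimal SciPy sketch of these tests (`intervals_a`, `intervals_b`, `proxy_a`, and `proxy_b` are hypothetical arrays):

```python
from scipy import stats

ks_stat, ks_p = stats.ks_2samp(intervals_a, intervals_b)    # compare CDFs
u_stat, u_p = stats.mannwhitneyu(intervals_a, intervals_b)  # non-parametric comparison

r, p_r = stats.pearsonr(proxy_a, proxy_b)       # linear correlation
rho, p_rho = stats.spearmanr(proxy_a, proxy_b)  # rank-based correlation
```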
P-value interpretation:
| p-value | Interpretation |
|---|---|
| p > 0.10 | Not significant |
| 0.05 < p ≤ 0.10 | Marginally significant |
| 0.01 < p ≤ 0.05 | Significant |
| p ≤ 0.01 | Highly significant |
Warning: with small samples (n < 20), tests have low statistical power and p-values are unstable, so interpret them cautiously.
6. Time Series Analysis
Detrending:
- Remove long-term trend (linear, polynomial, or LOESS)
- Calculate residuals for anomaly detection
- Avoid over-fitting the trend
Autocorrelation:
- Check lag-1 autocorrelation before independence tests
- Effective sample size: n_eff = n × (1 - ρ) / (1 + ρ), where ρ is the lag-1 autocorrelation (see the sketch below)
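A sketch combining both steps (`t` and `y` are hypothetical time and value arrays):

```python
import numpy as np

# Remove a low-order polynomial trend and keep the residuals
coeffs = np.polyfit(t, y, deg=2)
residuals = y - np.polyval(coeffs, t)

# Lag-1 autocorrelation and effective sample size n_eff = n(1 - ρ)/(1 + ρ)
rho = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
n_eff = len(residuals) * (1 - rho) / (1 + rho)
```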
Change point detection:
- PELT algorithm for multiple change points (sketched after this list)
- Bayesian change point detection for uncertainty
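A sketch of PELT using the third-party `ruptures` package (an assumed dependency, not part of this skill), applied to the detrended residuals from the sketch above:

```python
import ruptures as rpt

algo = rpt.Pelt(model="l2").fit(residuals)  # least-squares segment cost
change_points = algo.predict(pen=10)        # segment end indices; tune `pen`
```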
Available MCP Calculation Tools
| Tool | Function |
|---|---|
| calc_recurrence | Recurrence interval from event dates |
| calc_pga | Peak Ground Acceleration (attenuation) |
| calc_energy | Seismic energy density (Wang & Manga) |
| calc_distance | Great-circle distance |
Common Pitfalls
Multiple comparisons: Testing many anomalies inflates false positive rate. Apply Bonferroni correction or FDR.
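A sketch of an FDR adjustment with `statsmodels` (an assumed dependency; `p_values` is a hypothetical list collecting every test performed):

```python
from statsmodels.stats.multitest import multipletests

# Benjamini-Hochberg false discovery rate control across all tests
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
```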
Small sample size: With n < 10 events, statistical power is low. Report effect sizes, not just p-values.
Circular logic: Don't use detected anomalies to define the reference period.
Ignoring uncertainty: Always propagate dating errors through calculations.
Cherry-picking: Report ALL tests performed, not just significant ones.
Reporting Standards
Always report:
- Sample size (n)
- Central tendency AND spread (mean ± std, or median + IQR)
- Uncertainty ranges (95% CI or 2σ)
- Effect size, not just significance
- Method used for calculation