statistics
Geostatistics, uncertainty quantification, and time series analysis for paleoseismic research. Use for z-score analysis, Monte Carlo dating uncertainty, spatial clustering, and recurrence statistics.
When & Why to Use This Skill
This Claude skill provides a specialized toolkit for geostatistics and paleoseismic research, focusing on uncertainty quantification, time series analysis, and spatial clustering. It streamlines complex scientific workflows such as Monte Carlo dating simulations, Z-score anomaly detection, and recurrence interval statistics to enhance the precision and rigor of geoscience data interpretation.
Use Cases
- Quantifying dating uncertainty for U-Th and radiocarbon samples using Monte Carlo propagation to build reliable age-depth models.
- Detecting significant geochemical anomalies in time series data through rolling window Z-score analysis and statistical thresholding.
- Analyzing earthquake recurrence patterns to characterize seismic behavior as quasi-periodic, random, or clustered using Coefficient of Variation (COV) statistics.
- Identifying 'orphan' earthquakes and seismic hotspots by performing spatial clustering analysis and nearest-neighbor distance calculations.
- Validating geological hypotheses through rigorous significance testing, including Kolmogorov-Smirnov tests and non-parametric comparisons of distributions.
Statistics for Geoscience Research
When to Use This Skill
Invoke when:
- Calculating z-scores or anomaly detection
- Quantifying dating uncertainty
- Analyzing spatial clustering of events
- Computing recurrence intervals
- Performing Monte Carlo simulations
- Assessing statistical significance
Core Methods
1. Z-Score Analysis
Standard approach for anomaly detection in geochemical time series:
z = (x - μ) / σ
where:
- x = observed value
- μ = mean of reference period
- σ = standard deviation of reference period
Interpretation thresholds:
| z-score | Interpretation | Action |
|---|---|---|
| \|z\| < 2.0 | Within background variability | No action |
| \|z\| ≥ 2.0 | Anomalous at the 2σ level | Flag for review |
| \|z\| ≥ 2.5 | Strongly anomalous | Investigate candidate events |
| \|z\| ≥ 3.0 | Extreme anomaly | Report with supporting evidence |
Best practices:
- Use a rolling window for the mean and standard deviation (50-100 data points; see the sketch after this list)
- Exclude the test point from reference calculation
- Report both positive and negative z-scores
- Consider autocorrelation in time series
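A minimal sketch of these practices, assuming pandas is available (`geochem_values` is a hypothetical input array):

```python
import numpy as np
import pandas as pd

def rolling_zscores(values, window=75):
    """Z-score each point against the trailing window,
    excluding the test point from the reference statistics."""
    s = pd.Series(values)
    ref = s.shift(1).rolling(window)  # trailing window ends just before the test point
    return ((s - ref.mean()) / ref.std()).to_numpy()

z = rolling_zscores(geochem_values, window=75)
anomalies = np.flatnonzero(np.abs(z) >= 2.0)  # indices beyond the 2-sigma threshold
```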
2. Dating Uncertainty
U-Th dating (typical for speleothems):
- Precision: ±50-100 years for Holocene
- Report as:
1285 ± 85 yr (U-Th: 1237-1322 CE)
Radiocarbon dating:
- Raw dates in ¹⁴C years BP
- Calibrate using IntCal20/SHCal20
- Report 2σ range:
830 ± 50 ¹⁴C yr BP (cal. 720-940 CE)
Monte Carlo uncertainty propagation:
```python
# Pseudocode for age-depth model uncertainty
import numpy as np

n_iter = 10_000
interpolated_ages = np.empty(n_iter)
for i in range(n_iter):
    # Draw one plausible age for each dated depth from its error distribution
    ages_sample = sample_from_age_distributions(age_points)
    model = fit_age_depth_model(ages_sample, depths)
    interpolated_ages[i] = model.interpolate(target_depth)

uncertainty = np.percentile(interpolated_ages, [2.5, 97.5])  # 95% range
```
3. Recurrence Interval Statistics
For earthquake recurrence from event dates:
```python
import numpy as np

# `events` holds sorted event dates; successive differences are recurrence intervals
intervals = [events[i + 1] - events[i] for i in range(len(events) - 1)]
mean_recurrence = np.mean(intervals)
std_recurrence = np.std(intervals, ddof=1)  # sample standard deviation
cov = std_recurrence / mean_recurrence      # coefficient of variation (COV)
```
Interpretation:
- COV < 0.5: Quasi-periodic behavior
- COV ~ 1.0: Random (Poisson) process
- COV > 1.0: Clustered behavior
MCP Tool: calc_recurrence computes these statistics
4. Spatial Clustering Analysis
Orphan earthquake detection: Identify seismicity far from mapped faults using nearest-neighbor distance.
```python
# For each earthquake, calculate the distance to the nearest mapped fault
distances = [min_distance_to_fault(eq) for eq in earthquakes]

# Earthquakes more than 50 km from any fault are "orphans"
orphans = [eq for eq, d in zip(earthquakes, distances) if d > 50]
```
Kernel density estimation:
- Use for seismicity hotspot mapping
- Bandwidth selection: Scott's rule or cross-validation (see the sketch below)
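A sketch of hotspot density estimation with SciPy, assuming epicentres are already projected to planar kilometre coordinates (`eq_x_km` and `eq_y_km` are hypothetical arrays):

```python
import numpy as np
from scipy.stats import gaussian_kde

xy = np.vstack([eq_x_km, eq_y_km])         # coordinates, shape (2, n_events)
kde = gaussian_kde(xy, bw_method="scott")  # Scott's rule bandwidth

# Evaluate the density on a regular grid for hotspot mapping
gx, gy = np.mgrid[xy[0].min():xy[0].max():200j,
                  xy[1].min():xy[1].max():200j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
```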
5. Significance Testing
When comparing distributions:
- Kolmogorov-Smirnov test: Compare CDFs (see the SciPy sketch after these lists)
- Mann-Whitney U: Non-parametric comparison of medians
- Permutation tests: Distribution-free significance
Correlation analysis:
- Pearson r: Linear relationship (assumes normality)
- Spearman ρ: Monotonic relationship (rank-based)
- Cross-correlation: Time-lagged relationships
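A minimal SciPy sketch of these tests (`intervals_a`, `intervals_b`, `proxy_a`, and `proxy_b` are hypothetical arrays):

```python
from scipy import stats

ks_stat, ks_p = stats.ks_2samp(intervals_a, intervals_b)    # compare CDFs
u_stat, u_p = stats.mannwhitneyu(intervals_a, intervals_b)  # non-parametric comparison

r, p_r = stats.pearsonr(proxy_a, proxy_b)       # linear correlation
rho, p_rho = stats.spearmanr(proxy_a, proxy_b)  # rank-based correlation
```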
P-value interpretation:
| p-value | Interpretation |
|---|---|
| p > 0.10 | Not significant |
| 0.05 < p ≤ 0.10 | Marginally significant |
| 0.01 < p ≤ 0.05 | Significant |
| p ≤ 0.01 | Highly significant |
Warning: with small samples (n < 20), tests have low statistical power and p-values are unstable, so interpret them cautiously.
6. Time Series Analysis
Detrending:
- Remove long-term trend (linear, polynomial, or LOESS)
- Calculate residuals for anomaly detection
- Avoid over-fitting the trend
Autocorrelation:
- Check lag-1 autocorrelation before independence tests
- Effective sample size: n_eff = n × (1 - ρ) / (1 + ρ), where ρ is the lag-1 autocorrelation (see the sketch below)
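A sketch combining both steps (`t` and `y` are hypothetical time and value arrays):

```python
import numpy as np

# Remove a low-order polynomial trend and keep the residuals
coeffs = np.polyfit(t, y, deg=2)
residuals = y - np.polyval(coeffs, t)

# Lag-1 autocorrelation and effective sample size n_eff = n(1 - ρ)/(1 + ρ)
rho = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
n_eff = len(residuals) * (1 - rho) / (1 + rho)
```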
Change point detection:
- PELT algorithm for multiple change points (sketched after this list)
- Bayesian change point detection for uncertainty
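A sketch of PELT using the third-party `ruptures` package (an assumed dependency, not part of this skill), applied to the detrended residuals from the sketch above:

```python
import ruptures as rpt

algo = rpt.Pelt(model="l2").fit(residuals)  # least-squares segment cost
change_points = algo.predict(pen=10)        # segment end indices; tune `pen`
```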
Available MCP Calculation Tools
| Tool | Function |
|---|---|
| calc_recurrence | Recurrence interval from event dates |
| calc_pga | Peak Ground Acceleration (attenuation) |
| calc_energy | Seismic energy density (Wang & Manga) |
| calc_distance | Great-circle distance |
Common Pitfalls
Multiple comparisons: Testing many anomalies inflates false positive rate. Apply Bonferroni correction or FDR.
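A sketch of an FDR adjustment with `statsmodels` (an assumed dependency; `p_values` is a hypothetical list collecting every test performed):

```python
from statsmodels.stats.multitest import multipletests

# Benjamini-Hochberg false discovery rate control across all tests
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
```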
Small sample size: With n < 10 events, statistical power is low. Report effect sizes, not just p-values.
Circular logic: Don't use detected anomalies to define the reference period.
Ignoring uncertainty: Always propagate dating errors through calculations.
Cherry-picking: Report ALL tests performed, not just significant ones.
Reporting Standards
Always report:
- Sample size (n)
- Central tendency AND spread (mean ± std, or median + IQR)
- Uncertainty ranges (95% CI or 2σ)
- Effect size, not just significance
- Method used for calculation