nixtla-forecast-validator
Validates time series forecast quality metrics by comparing current performance against historical benchmarks. Detects degradation in MASE and sMAPE metrics. Activates when the user mentions "validate forecast", "check forecast quality", or "assess forecast metrics".
When & Why to Use This Skill
The Nixtla Forecast Validator is a specialized tool for time series quality assurance. It automates the detection of model degradation by comparing current performance metrics, specifically MASE and sMAPE, against historical benchmarks. This skill helps data scientists and analysts maintain high-accuracy forecasting pipelines by providing automated validation reports, alerts, and visual comparisons of model health.
Use Cases
- Production Model Monitoring: Automatically detect when a forecasting model's accuracy drops below acceptable historical thresholds, signaling potential concept drift or data issues.
- Model Retraining Validation: Compare the performance of a newly trained model against the previous champion model to ensure that updates actually improve or maintain forecast quality.
- Automated Quality Reporting: Generate standardized validation reports and visualizations (MASE/sMAPE) for stakeholders to demonstrate the reliability of time series predictions.
- Threshold-Based Alerting: Configure custom sensitivity levels (Conservative to Lenient) to trigger alerts only when forecast errors exceed specific business-defined tolerances.
| name | nixtla-forecast-validator |
|---|---|
| description | Validates time series forecast quality metrics by comparing current performance against historical benchmarks. Detects degradation in MASE and sMAPE metrics. Activates when the user mentions "validate forecast", "check forecast quality", or "assess forecast metrics". |
| allowed-tools | "Read,Write,Bash,Glob,Grep" |
| version | "1.0.0" |
Nixtla Forecast Validator
Validates time series forecast quality metrics and detects performance degradation using statistical measures. Compares current forecast accuracy against historical benchmarks to identify significant deviations in MASE and sMAPE metrics.
Overview
This skill analyzes forecast quality by comparing current performance metrics against historical baselines. It detects significant increases in error metrics (MASE and sMAPE) that may indicate model degradation, data quality issues, or changing patterns in the time series. The skill generates comprehensive reports, alerts, and visualizations to help users identify and address forecast quality problems quickly.
Activates automatically when Claude detects forecast validation needs, or when explicitly requested with phrases like "validate forecast quality", "check model performance", or "assess forecast accuracy".
Prerequisites
Tools: Read, Write, Bash, Glob, Grep
Environment: No API keys required (operates on CSV metrics files)
Python Packages:
pip install pandas matplotlib
Required CSV Format:
CSV files must contain columns: model, MASE, sMAPE
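A quick way to confirm a metrics file matches this format is shown below; the helper name and file path are illustrative, not part of the skill:

```python
import pandas as pd

REQUIRED_COLUMNS = {"model", "MASE", "sMAPE"}

def check_metrics_csv(path: str) -> pd.DataFrame:
    """Load a metrics CSV and fail fast if it does not match the required format."""
    df = pd.read_csv(path)
    missing = REQUIRED_COLUMNS - set(df.columns)  # column names are case-sensitive
    if missing:
        raise ValueError(f"{path} is missing required columns: {sorted(missing)}")
    return df

# check_metrics_csv("historical_metrics.csv")  # example usage; path is a placeholder
```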
Instructions
Step 1: Prepare metrics data
Ensure you have two CSV files containing forecast metrics:
- Historical metrics CSV (baseline performance)
- Current metrics CSV (recent performance to validate)
Each CSV must have columns: model, MASE, sMAPE
Example format:
model,MASE,sMAPE
model_A,1.2,0.15
model_B,0.8,0.10
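The validator consumes precomputed metrics; it does not compute MASE or sMAPE itself. If you need to produce such a file, the sketch below uses the standard definitions (MASE scaled by the in-sample naive forecast MAE, sMAPE expressed as a fraction); the function names, variable names, and one-step naive baseline are assumptions, not the skill's code:

```python
import numpy as np
import pandas as pd

def mase(y_true, y_pred, y_train, season=1):
    """MASE: forecast MAE scaled by the MAE of a (seasonal) naive forecast
    on the training data. Values above 1 mean worse than the naive baseline."""
    y_true, y_pred, y_train = map(np.asarray, (y_true, y_pred, y_train))
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)

def smape(y_true, y_pred):
    """Symmetric MAPE as a fraction (0.15 == 15%), matching the example values above."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred))))

# Assemble one row per model in the required format (names below are placeholders):
# rows = [{"model": "model_A",
#          "MASE": mase(test_y, forecast, train_y),
#          "sMAPE": smape(test_y, forecast)}]
# pd.DataFrame(rows).to_csv("current_metrics.csv", index=False)
```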
Step 2: Set validation thresholds
Configure acceptable deviation thresholds for MASE and sMAPE metrics. Default thresholds are 0.2 (20% increase), but these can be adjusted based on business requirements and model characteristics.
Recommended thresholds:
- Conservative: 0.1 (10% increase triggers alert)
- Standard: 0.2 (20% increase triggers alert)
- Lenient: 0.3 (30% increase triggers alert)
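The thresholds are interpreted as relative increases over the historical value, as in the illustrative check below (not the script's exact code):

```python
def exceeds_threshold(historical: float, current: float, threshold: float = 0.2) -> bool:
    """Flag degradation when the relative increase over the baseline exceeds the threshold."""
    relative_increase = (current - historical) / historical
    return relative_increase > threshold

# Example: MASE rising from 1.2 to 1.8 is a 50% increase, so it trips the 0.2 threshold.
print(exceeds_threshold(1.2, 1.8))    # True
print(exceeds_threshold(0.80, 0.85))  # False (6.25% increase)
```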
Step 3: Execute validation
Run the validation script to compare current metrics against historical benchmarks:
python {baseDir}/scripts/validate_forecast.py \
--historical historical_metrics.csv \
--current current_metrics.csv \
--mase_threshold 0.2 \
--smape_threshold 0.2
The script performs the following steps:
- Loads historical and current metrics from CSV files
- Calculates percentage increase for each metric per model
- Compares increases against configured thresholds
- Generates validation report, comparison CSV, alert log, and visualization
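For orientation, here is a condensed, hypothetical sketch of that flow. The bundled validate_forecast.py is the authoritative implementation; column names, message wording, and the plotting step may differ:

```python
import pandas as pd

def validate(historical_csv, current_csv, mase_threshold=0.2, smape_threshold=0.2):
    """Compare current metrics to historical baselines and write report artifacts."""
    hist = pd.read_csv(historical_csv)
    curr = pd.read_csv(current_csv)
    merged = hist.merge(curr, on="model", suffixes=("_hist", "_curr"))

    alerts = []
    for metric, threshold in [("MASE", mase_threshold), ("sMAPE", smape_threshold)]:
        increase = (merged[f"{metric}_curr"] - merged[f"{metric}_hist"]) / merged[f"{metric}_hist"]
        merged[f"{metric}_pct_increase"] = increase
        for model, value in zip(merged["model"], increase):
            if value > threshold:
                alerts.append(f"WARNING: Significant increase in {metric} detected for model {model}.")

    merged.to_csv("metrics_comparison.csv", index=False)
    with open("alert.log", "w") as f:
        f.write("\n".join(alerts))
    report = "\n".join(alerts) if alerts else "Forecast validation passed. No significant degradation detected."
    with open("validation_report.txt", "w") as f:
        f.write(report)
    return merged
```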
Step 4: Review validation outputs
Analyze the generated outputs to identify forecast quality issues:
- Read validation_report.txt for a summary of findings
- Check alert.log for models requiring immediate attention
- Review metrics_comparison.csv for detailed metric changes
- Examine metrics_visualization.png for a visual comparison
If degradation is detected, investigate potential causes such as data quality changes, concept drift, or model staleness.
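To triage programmatically rather than by eye, you can filter the comparison CSV for the worst offenders. The column names below are assumptions (based on the sketch in Step 3); inspect the file header first:

```python
import pandas as pd

comparison = pd.read_csv("metrics_comparison.csv")
# Assumed columns: MASE_pct_increase, sMAPE_pct_increase; adjust to the actual header.
flagged = comparison[(comparison["MASE_pct_increase"] > 0.2) |
                     (comparison["sMAPE_pct_increase"] > 0.2)]
print(flagged.sort_values("MASE_pct_increase", ascending=False))
```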
Output
The validation process generates four output files:
- validation_report.txt: Summary report indicating which models show significant degradation and overall validation status
- metrics_comparison.csv: Side-by-side comparison of historical vs current metrics for all models
- alert.log: Alert messages for models exceeding degradation thresholds
- metrics_visualization.png: Bar chart visualization comparing historical and current MASE and sMAPE values
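If you want to reproduce a similar comparison chart outside the skill, here is a minimal matplotlib sketch (not the skill's own plotting code; the output file name is deliberately different so it does not overwrite metrics_visualization.png):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

hist = pd.read_csv("historical_metrics.csv")
curr = pd.read_csv("current_metrics.csv")
merged = hist.merge(curr, on="model", suffixes=("_hist", "_curr"))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
x = np.arange(len(merged))
for ax, metric in zip(axes, ["MASE", "sMAPE"]):
    ax.bar(x - 0.2, merged[f"{metric}_hist"], width=0.4, label="historical")
    ax.bar(x + 0.2, merged[f"{metric}_curr"], width=0.4, label="current")
    ax.set_xticks(x)
    ax.set_xticklabels(merged["model"])
    ax.set_title(metric)
    ax.legend()
fig.tight_layout()
fig.savefig("my_metrics_comparison.png")
```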
Error Handling
Common errors and solutions:
Missing required metrics column (MASE or sMAPE)
- Ensure input CSV files contain columns named exactly MASE and sMAPE (case-sensitive)
- Verify column headers match the expected format
Invalid threshold value
- Provide positive numerical values for --mase_threshold and --smape_threshold
- Thresholds represent a percentage increase (0.2 = 20%)
Historical data unavailable
- Verify path to historical metrics CSV file is correct
- Ensure file exists and is readable
- Check file format matches required CSV structure
File not found error
- Verify both --historical and --current file paths are correct
- Use absolute paths if relative paths fail
- Check file permissions
Empty DataFrame error
- Ensure CSV files are not empty
- Verify CSV files contain data rows beyond the header
- Check for proper CSV formatting (commas as delimiters)
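Most of these failure modes can be caught up front. A small pre-flight sketch (the helper name, messages, and file paths are illustrative, not part of the skill):

```python
import os
import pandas as pd

def preflight(historical_path, current_path, mase_threshold, smape_threshold):
    """Catch common failure modes before invoking the validation script."""
    for name, value in [("--mase_threshold", mase_threshold), ("--smape_threshold", smape_threshold)]:
        if not value > 0:
            raise ValueError(f"{name} must be a positive number, got {value!r}")
    for path in (historical_path, current_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Metrics file not found: {path}")
        df = pd.read_csv(path)
        if df.empty:
            raise ValueError(f"{path} has a header but no data rows")
        if {"model", "MASE", "sMAPE"} - set(df.columns):
            raise ValueError(f"{path} is missing required columns")

# preflight("historical_metrics.csv", "current_metrics.csv", 0.2, 0.2)  # paths are placeholders
```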
Examples
Example 1: Significant MASE degradation detected
Input (historical_metrics.csv):
model,MASE,sMAPE
model_A,1.2,0.15
Input (current_metrics.csv):
model,MASE,sMAPE
model_A,1.8,0.18
Command:
python scripts/validate_forecast.py --historical historical_metrics.csv --current current_metrics.csv
Output (validation_report.txt):
WARNING: Significant increase in MASE detected for model model_A.
Interpretation: Model A shows a 50% increase in MASE (from 1.2 to 1.8), exceeding the default 20% threshold. This indicates forecast quality degradation requiring investigation.
Example 2: Stable performance, no alerts
Input (historical_metrics.csv):
model,MASE,sMAPE
model_B,0.8,0.10
Input (current_metrics.csv):
model,MASE,sMAPE
model_B,0.85,0.11
Command:
python scripts/validate_forecast.py --historical historical_metrics.csv --current current_metrics.csv
Output (validation_report.txt):
Forecast validation passed. No significant degradation detected.
Interpretation: Model B shows only a 6.25% increase in MASE and a 10% increase in sMAPE, both below the 20% threshold. Performance is stable.
Example 3: Multiple models with custom thresholds
Command:
python scripts/validate_forecast.py \
--historical multi_model_historical.csv \
--current multi_model_current.csv \
--mase_threshold 0.3 \
--smape_threshold 0.25
Uses more lenient thresholds (30% for MASE, 25% for sMAPE) suitable for volatile forecasts or experimental models.
Resources
Script: {baseDir}/scripts/validate_forecast.py
Metrics: MASE (Mean Absolute Scaled Error), sMAPE (symmetric Mean Absolute Percentage Error)
Related skills: nixtla-timegpt-lab, nixtla-experiment-architect, nixtla-schema-mapper