nixtla-cross-validator
Performs rigorous time series cross-validation using expanding and sliding windows. Use when you need to evaluate the performance of time series models on unseen data. Trigger with "cross validate time series", "evaluate forecasting model", "time series backtesting".
When & Why to Use This Skill
The Nixtla Cross-Validator is a Claude skill for time series model evaluation and backtesting. It automates expanding and sliding window cross-validation so that data scientists can assess how forecasting models generalize to unseen data. It works with TimeGPT and StatsForecast and reports accuracy metrics such as MAE and RMSE.
Use Cases
- Retail Sales Backtesting: Evaluate the accuracy of seasonal sales forecasts by simulating predictions over historical data to refine inventory planning.
- Model Comparison: Compare the performance of traditional statistical models (ARIMA, ETS) against modern AI models (TimeGPT) for energy demand forecasting.
- Financial Risk Assessment: Validate market trend prediction models by testing their robustness across different historical time windows.
- Hyperparameter Optimization: Use multi-fold cross-validation results to select the best window sizes and model parameters for long-term demand forecasting.
| name | nixtla-cross-validator |
|---|---|
| description | |
| allowed-tools | "Read,Write,Bash,Glob,Grep" |
| version | "1.0.0" |
Cross-Validator Skill
Evaluates time series model performance using cross-validation.
Purpose
Rigorously assesses how well a time series model generalizes to unseen data by simulating future predictions.
Overview
This skill automates time series cross-validation by splitting historical data into multiple training and validation sets based on expanding or sliding window techniques. It integrates with TimeGPT and StatsForecast to evaluate model performance across various time periods. It reports key accuracy metrics, helping users select the best model.
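The difference between the two window techniques can be sketched in plain Python. This is an illustration only, not the skill's own code: the function name `window_splits`, its parameters (`horizon`, `step`, `train_size`), and the choice to anchor the last fold at the end of the series are all assumptions made for the example.

```python
from typing import Iterator, Optional, Tuple


def window_splits(
    n_obs: int,
    horizon: int,
    n_folds: int,
    step: Optional[int] = None,
    train_size: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, valid_idx) index ranges, oldest fold first.

    train_size=None -> expanding window: training always starts at index 0.
    train_size=k    -> sliding window: training keeps only the last k points.
    """
    step = horizon if step is None else step
    for fold in range(n_folds):
        # Assumption: the last fold ends at the final observation and
        # earlier folds are shifted back by `step` observations each.
        valid_end = n_obs - (n_folds - 1 - fold) * step
        valid_start = valid_end - horizon
        train_start = 0 if train_size is None else valid_start - train_size
        if valid_start <= 0 or train_start < 0:
            raise ValueError("not enough observations for the requested folds")
        yield range(train_start, valid_start), range(valid_start, valid_end)


# Expanding: the training range grows with every fold.
expanding = list(window_splits(n_obs=10, horizon=2, n_folds=3))
# Sliding: the training range keeps a fixed length of 3.
sliding = list(window_splits(n_obs=10, horizon=2, n_folds=3, train_size=3))
```

In both variants the validation ranges are identical; only the amount of history given to the model differs, which is what makes the two results comparable.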
Prerequisites
Tools: Read, Write, Bash, Glob, Grep
Environment: NIXTLA_TIMEGPT_API_KEY (if using TimeGPT)
Packages:
pip install nixtla pandas statsforecast matplotlib
Instructions
Step 1: Prepare data
Read the time series data from a CSV file into a pandas DataFrame using the data loader script.
Script: {baseDir}/scripts/load_data.py
The script expects a CSV file with columns: unique_id, ds (timestamp), and y (target value).
Example usage:
python {baseDir}/scripts/load_data.py data.csv
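The contents of load_data.py are not shown here, so the following is only a dependency-free sketch of the kind of schema check such a loader could perform on the expected columns. It uses the standard library instead of pandas to stay self-contained; the function name `read_series` and the assumption that ds holds ISO 8601 dates are hypothetical.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = ("unique_id", "ds", "y")


def read_series(handle):
    """Parse long-format rows and fail early on schema problems."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        try:
            rows.append({
                "unique_id": row["unique_id"],
                "ds": datetime.fromisoformat(row["ds"]),  # assumes ISO dates
                "y": float(row["y"]),
            })
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: cannot parse row {row!r}") from exc
    # Sort so that each series is in chronological order before splitting.
    rows.sort(key=lambda r: (r["unique_id"], r["ds"]))
    return rows


sample = io.StringIO(
    "unique_id,ds,y\n"
    "store_1,2023-01-02,12\n"
    "store_1,2023-01-01,10\n"
)
rows = read_series(sample)
```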
Step 2: Configure cross-validation
Define parameters like window size, step size, and number of folds using the configuration script.
Script: {baseDir}/scripts/configure_cv.py
The script creates expanding window splits for cross-validation. It validates that the data is sufficient for the specified window size and number of folds.
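One way to reason about the sufficiency check is to compute the shortest series that still fits every fold. The sketch below is an assumption about how such a check could work, not the logic of configure_cv.py: it treats the window as the length of each validation block and, unless told otherwise, assumes folds do not overlap and that the first fold needs at least one window of training data.

```python
def min_required_obs(window: int, folds: int, step: int, min_train: int) -> int:
    """Shortest series that leaves `min_train` points before the first fold."""
    return min_train + window + (folds - 1) * step


def check_sufficient(n_obs, window, folds, step=None, min_train=None):
    """Return the required length, or raise if the series is too short."""
    step = window if step is None else step            # assumed default
    min_train = window if min_train is None else min_train  # assumed default
    needed = min_required_obs(window, folds, step, min_train)
    if n_obs < needed:
        raise ValueError(
            f"Insufficient data for cross-validation: "
            f"have {n_obs} observations, need at least {needed}"
        )
    return needed


# One year of daily data with 4 folds of 30 days each.
needed_daily = check_sufficient(n_obs=365, window=30, folds=4)
```

When the check fails, the two remedies are the ones listed under Error Handling: provide a longer series or reduce the window size (reducing the number of folds has the same effect).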
Step 3: Execute cross-validation
Run the cross-validation script with your chosen model and parameters.
Script: {baseDir}/scripts/cross_validate.py
Usage:
python {baseDir}/scripts/cross_validate.py \
--input data.csv \
--model arima \
--window 20 \
--folds 3 \
--freq D
Supported models:
- timegpt: TimeGPT API (requires NIXTLA_TIMEGPT_API_KEY)
- arima: AutoARIMA from StatsForecast
- ets: AutoETS from StatsForecast
- theta: AutoTheta from StatsForecast
- naive: SeasonalNaive baseline
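For readers who want to see how a command line like the one above could be parsed, here is a minimal argparse sketch. It mirrors the flags and model names shown in this section, but it is not the actual cross_validate.py: the help texts, the decision to treat --freq as optional, and its default of D are assumptions.

```python
import argparse

SUPPORTED_MODELS = ("timegpt", "arima", "ets", "theta", "naive")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cross_validate.py")
    parser.add_argument("--input", required=True,
                        help="CSV file with unique_id, ds, y columns")
    # `choices` rejects anything outside the supported model list.
    parser.add_argument("--model", required=True, choices=SUPPORTED_MODELS)
    parser.add_argument("--window", required=True, type=int)
    parser.add_argument("--folds", required=True, type=int)
    # Assumption: frequency is optional and defaults to daily.
    parser.add_argument("--freq", default="D",
                        help="pandas-style frequency alias")
    return parser


args = build_parser().parse_args(
    ["--input", "data.csv", "--model", "arima",
     "--window", "20", "--folds", "3", "--freq", "D"]
)
```

Declaring the model list through `choices` means an unsupported name is rejected before any data is read.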
Step 4: Analyze results
The script automatically calculates and outputs cross-validation metrics (MAE, RMSE) for all folds.
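As a reference for what these two metrics measure, the sketch below computes them per fold from rows shaped like cv_results.csv (fold, y, y_hat). The helper names are hypothetical and the script's own implementation may differ.

```python
import math
from collections import defaultdict


def mae(actual, predicted):
    """Mean absolute error: the average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def metrics_by_fold(rows):
    """Group rows by fold and compute MAE and RMSE for each group."""
    grouped = defaultdict(lambda: {"y": [], "y_hat": []})
    for row in rows:
        grouped[row["fold"]]["y"].append(float(row["y"]))
        grouped[row["fold"]]["y_hat"].append(float(row["y_hat"]))
    return {
        fold: {"MAE": mae(v["y"], v["y_hat"]), "RMSE": rmse(v["y"], v["y_hat"])}
        for fold, v in grouped.items()
    }


# Illustrative rows only, not real model output.
demo_rows = [
    {"fold": 1, "y": 1, "y_hat": 1},
    {"fold": 1, "y": 2, "y_hat": 2},
    {"fold": 1, "y": 3, "y_hat": 5},
    {"fold": 2, "y": 10, "y_hat": 8},
]
demo_metrics = metrics_by_fold(demo_rows)
```

RMSE is never smaller than MAE on the same data; a large gap between the two usually points to a few large misses rather than uniformly mediocre forecasts.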
Output
- cv_results.csv: CSV file containing the cross-validation results for each fold.
- metrics.json: JSON file containing overall performance metrics across all folds.
- plots/: Directory containing plots comparing actual vs. predicted values for each fold.
Error Handling
Error: Input file not found
Solution: Ensure the specified input CSV file exists at the given path.
Error: Invalid model name
Solution: Use a supported model name: 'timegpt', 'arima', 'ets', 'theta', 'naive'.
Error: Insufficient data for cross-validation
Solution: Increase the length of the input time series or reduce the window size.
Error: Missing required parameter
Solution: Specify all required parameters: input, model, window, folds.
Error: NIXTLA_TIMEGPT_API_KEY environment variable not set.
Solution: Set the NIXTLA_TIMEGPT_API_KEY environment variable before running the script when using TimeGPT.
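Several of these errors can be caught before any model is fitted. The following preflight sketch shows one possible ordering of the checks; the function name `preflight`, the exception types, and the order are assumptions, while the messages mirror the ones listed above.

```python
import os

SUPPORTED_MODELS = ("timegpt", "arima", "ets", "theta", "naive")


def preflight(input_path, model, env=None):
    """Raise early for problems that do not depend on the data itself."""
    env = os.environ if env is None else env
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Invalid model name: {model!r}")
    # The API key only matters when the TimeGPT backend is selected.
    if model == "timegpt" and not env.get("NIXTLA_TIMEGPT_API_KEY"):
        raise RuntimeError(
            "NIXTLA_TIMEGPT_API_KEY environment variable not set."
        )
    if not os.path.isfile(input_path):
        raise FileNotFoundError(f"Input file not found: {input_path}")
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.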
Examples
Example 1: Cross-validating TimeGPT on daily sales
Input:
unique_id,ds,y
store_1,2023-01-01,10
store_1,2023-01-02,12
store_1,2023-01-03,15
...
store_1,2023-12-31,20
Command:
python {baseDir}/scripts/cross_validate.py \
--input sales.csv \
--model timegpt \
--window 30 \
--folds 4 \
--freq D
Output:
fold,unique_id,ds,y,y_hat
1,store_1,2023-11-01,18,17.5
1,store_1,2023-11-02,20,19.2
...
Example 2: Cross-validating ARIMA on monthly demand
Input:
unique_id,ds,y
product_1,2020-01-01,100
product_1,2020-02-01,110
...
product_1,2023-12-01,125
Command:
python {baseDir}/scripts/cross_validate.py \
--input demand.csv \
--model arima \
--window 6 \
--folds 3 \
--freq M
Output:
{
"MAE": 5.2,
"RMSE": 7.1
}
Resources
- StatsForecast documentation: https://nixtlaverse.nixtla.io/statsforecast/
- TimeGPT API documentation: https://docs.nixtla.io/
- Cross-validation best practices: https://otexts.com/fpp3/tscv.html
- Scripts: the {baseDir}/scripts/ directory contains all executable code