feature-engineering-kit

from majiayu000

Auto-generate features with encodings, scaling, polynomial features, and interaction terms for ML pipelines.

5 stars · 1 fork · Updated Jan 11, 2026

When & Why to Use This Skill

The Feature Engineering Kit is a Claude skill that automates the data preprocessing phase of machine learning pipelines. It handles categorical encoding, numerical scaling, and the generation of interaction terms, so data scientists can turn raw datasets into model-ready features. It also covers missing value imputation and time-based feature extraction, which reduces manual effort and can improve the predictive performance of ML models.

Use Cases

  • Preprocessing raw CSV datasets for classification or regression models by automatically applying One-Hot encoding and Standard scaling.
  • Generating polynomial features and interaction terms so that models can capture nonlinear relationships between variables.
  • Cleaning and preparing messy data using automated imputation strategies for missing values and discretizing continuous variables through binning.
  • Deriving date-based features from timestamp columns and TF-IDF features from text columns.
name: feature-engineering-kit
description: Auto-generate features with encodings, scaling, polynomial features, and interaction terms for ML pipelines.

Feature Engineering Kit

Automated feature engineering with encodings, scaling, and transformations.

Features

  • Encodings: One-hot, label, target encoding
  • Scaling: Standard, min-max, robust scaling
  • Polynomial Features: Generate polynomial and interaction terms
  • Binning: Discretize continuous features
  • Date Features: Extract time-based features
  • Text Features: TF-IDF, word counts
  • Missing Value Handling: Imputation strategies

CLI Usage

python feature_engineering.py --data train.csv --output engineered.csv --config config.json
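
The source does not show how feature_engineering.py reads these flags or what config.json contains. As a rough sketch only, a script with the same three flags could parse them as follows; the helper names `build_parser` and `load_config` are hypothetical, and the config is treated as an opaque JSON object because its schema is not documented here.

```python
import argparse
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags shown in the CLI usage line."""
    parser = argparse.ArgumentParser(description="Feature engineering (sketch)")
    parser.add_argument("--data", required=True, help="Input CSV path")
    parser.add_argument("--output", required=True, help="Output CSV path")
    parser.add_argument("--config", required=True, help="JSON config path")
    return parser


def load_config(path: str) -> dict:
    """Read the JSON config; its keys are not specified by the source."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(
    ["--data", "train.csv", "--output", "engineered.csv", "--config", "config.json"]
)
print(args.data, args.output, args.config)
```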

Dependencies

  • scikit-learn>=1.3.0
  • pandas>=2.0.0
  • numpy>=1.24.0