ml-workflow
ML development workflow covering experiment design, baseline establishment, iterative improvement, and experiment tracking best practices.
When & Why to Use This Skill
This Claude skill provides a comprehensive framework for the machine learning development lifecycle, focusing on systematic experiment design, baseline establishment, and rigorous tracking. It integrates industry best practices for MLflow logging and iterative model improvement to ensure reproducible, data-driven, and high-quality ML outcomes.
Use Cases
- Case 1: Establishing a performance floor by quickly training and comparing multiple baseline models like Logistic Regression and Random Forest.
- Case 2: Automating the tracking of hypotheses, parameters, and metrics using MLflow to maintain a searchable history of all ML experiments.
- Case 3: Implementing a structured iterative improvement workflow to quantify the exact performance gain from feature engineering or hyperparameter tuning.
- Case 4: Standardizing the ML development process across a team using predefined commands for project initialization and baseline training.
ML Workflow
Systematic approach to ML model development.
Development Lifecycle
┌─────────────────────────────────────────────────────────────┐
│ ML DEVELOPMENT WORKFLOW │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. PROBLEM 2. BASELINE 3. EXPERIMENT │
│ SETUP MODEL ITERATE │
│ ↓ ↓ ↓ │
│ Define metrics Simple model Hypothesis │
│ Success criteria Benchmark Test ideas │
│ Constraints Comparison Track results │
│ │
│ 4. EVALUATE 5. VALIDATE 6. DEPLOY │
│ ↓ ↓ ↓ │
│ Full metrics Production Ship to prod │
│ Error analysis validation Monitor │
│ Fairness A/B test Iterate │
│ │
└─────────────────────────────────────────────────────────────┘
Experiment Design
import mlflow
from dataclasses import dataclass

@dataclass
class Experiment:
    name: str
    hypothesis: str
    metrics: list
    success_criteria: dict

experiment = Experiment(
    name="feature_engineering_v2",
    hypothesis="Adding temporal features improves prediction",
    metrics=["accuracy", "f1", "latency_ms"],
    success_criteria={"f1": 0.85, "latency_ms": 50},
)

# Track the experiment in MLflow
mlflow.set_experiment(experiment.name)
with mlflow.start_run():
    mlflow.log_param("hypothesis", experiment.hypothesis)
    # ... training code producing a `results` dict of metric values ...
    mlflow.log_metrics(results)
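The `success_criteria` dict can then gate the run. The skill doesn't prescribe a checker, but a minimal sketch might look like this, assuming `results` is the metrics dict produced by the training step and that `latency_ms` is the only lower-is-better metric:

```python
def meets_criteria(results: dict, criteria: dict) -> bool:
    # Assumption: latency_ms is lower-is-better; all other metrics are
    # higher-is-better floors (e.g. f1 >= 0.85, latency_ms <= 50).
    lower_is_better = {"latency_ms"}
    for metric, threshold in criteria.items():
        if metric in lower_is_better and results[metric] > threshold:
            return False
        if metric not in lower_is_better and results[metric] < threshold:
            return False
    return True

if meets_criteria(results, experiment.success_criteria):
    print("Success criteria met: candidate for promotion")
```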
Baseline Models
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

baselines = {
    "majority": DummyClassifier(strategy="most_frequent"),
    "logistic": LogisticRegression(),
    "random_forest": RandomForestClassifier(n_estimators=100),
}

results = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, average="macro"),
    }

# Best baseline by macro F1
best = max(results.items(), key=lambda x: x[1]["f1"])
print(f"Best baseline: {best[0]} with F1={best[1]['f1']:.3f}")
Experiment Tracking
import mlflow
import mlflow.sklearn

# Point MLflow at the tracking server and select the experiment
mlflow.set_tracking_uri("http://mlflow.example.com")
mlflow.set_experiment("churn_prediction")

with mlflow.start_run(run_name="xgboost_v3"):
    # Log parameters
    params = {
        "model_type": "xgboost",
        "max_depth": 6,
        "learning_rate": 0.1,
    }
    mlflow.log_params(params)

    # Train model
    model = train_model(X_train, y_train, params)

    # Log metrics
    mlflow.log_metrics({
        "train_accuracy": train_acc,
        "val_accuracy": val_acc,
        "f1_score": f1,
    })

    # Log the fitted model
    mlflow.sklearn.log_model(model, "model")

    # Log supporting artifacts
    mlflow.log_artifact("feature_importance.png")
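Because everything is logged, past runs stay searchable. For instance, a sketch using MLflow's `search_runs` (the param and metric column names assume the logging code above):

```python
# Fetch all runs in the experiment, best validation accuracy first;
# search_runs returns a pandas DataFrame.
runs = mlflow.search_runs(
    experiment_names=["churn_prediction"],
    order_by=["metrics.val_accuracy DESC"],
)
print(runs[["run_id", "params.model_type", "metrics.val_accuracy"]].head())
```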
Iterative Improvement
class ExperimentIterator:
    """Runs named experiments and tracks improvement over a fixed baseline."""

    def __init__(self, baseline_metrics):
        self.baseline = baseline_metrics
        self.experiments = []

    def run_experiment(self, name, model_fn, hypothesis):
        with mlflow.start_run(run_name=name):
            mlflow.log_param("hypothesis", hypothesis)
            model, metrics = model_fn()
            mlflow.log_metrics(metrics)
            # Delta vs. baseline for every metric the baseline also reports
            improvement = {k: metrics[k] - self.baseline[k]
                           for k in metrics if k in self.baseline}
            mlflow.log_metrics({f"{k}_improvement": v
                                for k, v in improvement.items()})
            self.experiments.append({
                "name": name,
                "hypothesis": hypothesis,
                "metrics": metrics,
                "improvement": improvement,
            })
            return model, metrics
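A possible way to wire the iterator to the baseline comparison above; the temporal-feature matrices and the `train_model` helper are hypothetical stand-ins:

```python
# Seed the iterator with the best baseline's metrics from earlier
iterator = ExperimentIterator(baseline_metrics=results[best[0]])

def with_temporal_features():
    # Hypothetical: X_train_temporal / X_test_temporal carry the new features
    model = train_model(X_train_temporal, y_train, params)
    y_pred = model.predict(X_test_temporal)
    return model, {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, average="macro"),
    }

model, metrics = iterator.run_experiment(
    name="temporal_features_v1",
    model_fn=with_temporal_features,
    hypothesis="Adding temporal features improves prediction",
)
```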
Commands
- `/omgml:init` - Initialize project
- `/omgtrain:baseline` - Train baselines
Best Practices
- Always start with a baseline
- Change one thing at a time
- Track all experiments
- Document hypotheses
- Validate before deploying
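Tying the last two points together, one possible pre-deployment gate (a sketch reusing the earlier `meets_criteria` helper and `ExperimentIterator`; all names come from the sketches above):

```python
candidate = iterator.experiments[-1]

# Assumption: improvements are higher-is-better deltas over the baseline.
no_regressions = all(v >= 0 for v in candidate["improvement"].values())

if no_regressions and meets_criteria(candidate["metrics"],
                                     experiment.success_criteria):
    print(f"{candidate['name']}: ready for production validation / A/B test")
else:
    print("Gate not cleared - keep iterating")
```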