deep-learning-primer
Deep learning foundations including neural network basics, backpropagation, optimization, regularization, and training best practices.
When & Why to Use This Skill
This Claude skill serves as a comprehensive Deep Learning Primer, providing foundational knowledge and production-ready PyTorch code snippets. It covers the entire neural network lifecycle, including architecture design, backpropagation, optimization strategies, and regularization techniques to help developers and learners build and debug machine learning models efficiently.
Use Cases
- Scaffolding neural network architectures using standardized PyTorch templates for Multi-Layer Perceptrons (MLPs).
- Selecting and implementing appropriate activation and loss functions for binary classification, multi-class classification, and regression tasks.
- Setting up professional-grade training loops with AdamW optimization, Cosine Annealing learning rate schedules, and gradient clipping.
- Applying advanced regularization techniques such as Dropout, Batch Normalization, and Early Stopping to prevent overfitting and improve model generalization.
- Implementing weight initialization strategies (e.g., Xavier/Glorot) to ensure stable training and faster convergence.
| name | deep-learning-primer |
|---|---|
| description | Deep learning foundations including neural network basics, backpropagation, optimization, regularization, and training best practices. |
Deep Learning Primer
Essential deep learning concepts for ML systems.
Neural Network Basics
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Simple multi-layer perceptron: two hidden layers with ReLU and dropout."""

    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, output_dim)  # raw logits; pair with a loss that applies softmax/sigmoid
        )

    def forward(self, x):
        return self.layers(x)
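A quick usage sketch (the dimensions and batch size here are illustrative, not part of the skill):

model = MLP(input_dim=64, hidden_dim=128, output_dim=10)
x = torch.randn(32, 64)   # batch of 32 examples with 64 features
logits = model(x)         # shape (32, 10); raw scores, no softmax applied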
Activation Functions
| Function | Use Case | Output Range |
|---|---|---|
| ReLU | Hidden layers | [0, ∞) |
| LeakyReLU | Prevent dead neurons | (-∞, ∞) |
| GELU | Transformers | ≈(-0.17, ∞) |
| Sigmoid | Binary classification | (0, 1) |
| Softmax | Multi-class (outputs sum to 1) | (0, 1) |
| Tanh | Centered output | (-1, 1) |
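A rough sketch of how these pair with PyTorch output heads (the dimensions are assumptions); note that sigmoid and softmax are usually folded into the loss rather than added to the model:

hidden = nn.Sequential(nn.Linear(64, 64), nn.GELU())   # GELU is common inside transformer blocks
binary_head = nn.Linear(64, 1)                          # pair with BCEWithLogitsLoss (sigmoid applied in the loss)
multiclass_head = nn.Linear(64, 10)                     # pair with CrossEntropyLoss (softmax applied in the loss)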
Loss Functions
import torch.nn.functional as F

# Classification
cross_entropy = nn.CrossEntropyLoss()
binary_ce = nn.BCEWithLogitsLoss()

def focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss on raw logits: down-weights easy, well-classified examples."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p_t = torch.exp(-bce)                     # probability assigned to the true class
    return ((1 - p_t) ** gamma * bce).mean()

# Regression
mse = nn.MSELoss()
mae = nn.L1Loss()
huber = nn.HuberLoss()

# Custom loss
def contrastive_loss(embeddings, labels, margin=1.0):
    distances = torch.cdist(embeddings, embeddings)                   # pairwise distances
    same_class = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    loss = same_class * distances + (1 - same_class) * F.relu(margin - distances)
    return loss.mean()
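A minimal shape sketch for the two classification losses (values are illustrative): CrossEntropyLoss expects raw logits and integer class indices, while BCEWithLogitsLoss expects raw logits and float targets.

logits = torch.randn(32, 10)                      # raw scores, no softmax
targets = torch.randint(0, 10, (32,))             # class indices, not one-hot
loss = cross_entropy(logits, targets)

binary_logits = torch.randn(32, 1)
binary_targets = torch.randint(0, 2, (32, 1)).float()   # float targets required
loss = binary_ce(binary_logits, binary_targets)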
Optimization
# Optimizers
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=0.01
)

# Learning rate scheduling
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer,
    T_max=num_epochs
)

# Training loop
model.train()
for epoch in range(num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = criterion(model(batch.x), batch.y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        optimizer.step()
    scheduler.step()  # with T_max=num_epochs, step the schedule once per epoch
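The loop above covers training only; a minimal validation sketch that produces the val_loss consumed by early stopping below (val_dataloader is an assumed name):

model.eval()
val_loss = 0.0
with torch.no_grad():                             # no gradients needed for evaluation
    for batch in val_dataloader:
        val_loss += criterion(model(batch.x), batch.y).item()
val_loss /= len(val_dataloader)
model.train()                                     # re-enable dropout / batch norm updates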
Regularization Techniques
# Dropout
nn.Dropout(p=0.5)

# Weight decay (L2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

# Batch normalization
nn.BatchNorm1d(num_features)

# Layer normalization (for transformers)
nn.LayerNorm(hidden_dim)

# Early stopping
class EarlyStopping:
    """Returns True once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.counter = 0
        self.best_score = None

    def __call__(self, val_loss):
        if self.best_score is None or val_loss < self.best_score:
            self.best_score = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience
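Used once per epoch with the validation loss, for example:

early_stopping = EarlyStopping(patience=5)
for epoch in range(num_epochs):
    # ... training and validation passes as above ...
    if early_stopping(val_loss):
        print(f"Stopping early at epoch {epoch}")
        break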
Initialization
def init_weights(module):
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)   # Xavier/Glorot keeps activation variance stable
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, std=0.02)

model.apply(init_weights)  # applies recursively to every submodule
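Because the MLP above uses ReLU activations, Kaiming (He) initialization is the variant usually recommended in place of Xavier for ReLU layers; a hedged sketch:

def init_weights_relu(module):
    # Alternative sketch: He initialization scaled for ReLU nonlinearities
    if isinstance(module, nn.Linear):
        nn.init.kaiming_uniform_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model.apply(init_weights_relu)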
Best Practices
- Start with AdamW (or Adam) at lr=1e-3 as a sensible default
- Use learning rate warmup (see the sketch below)
- Apply gradient clipping (e.g., max norm 1.0)
- Monitor the gap between training and validation loss to catch overfitting
- Use mixed precision training on GPU (see the sketch below)
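A hedged sketch of the last two items, combining a linear warmup ramp (via LambdaLR) with mixed precision (via torch.cuda.amp); warmup_steps and the CUDA device are assumptions:

warmup_steps = 500  # assumption; tune per dataset
warmup = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)

scaler = torch.cuda.amp.GradScaler()          # newer PyTorch exposes this as torch.amp.GradScaler
for batch in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # run the forward pass in mixed precision
        loss = criterion(model(batch.x), batch.y)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                # unscale before clipping so the threshold is meaningful
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)
    scaler.update()
    warmup.step()                             # warmup schedules typically step per batch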