deep-learning-primer


Deep learning foundations including neural network basics, backpropagation, optimization, regularization, and training best practices.


When & Why to Use This Skill

This Claude skill serves as a comprehensive Deep Learning Primer, providing foundational knowledge and production-ready PyTorch code snippets. It covers the entire neural network lifecycle, including architecture design, backpropagation, optimization strategies, and regularization techniques to help developers and learners build and debug machine learning models efficiently.

Use Cases

  • Scaffolding neural network architectures using standardized PyTorch templates for Multi-Layer Perceptrons (MLPs).
  • Selecting and implementing effective activation and loss functions for tasks such as binary classification, multi-class classification, and regression.
  • Setting up professional-grade training loops with AdamW optimization, Cosine Annealing learning rate schedules, and gradient clipping.
  • Applying advanced regularization techniques such as Dropout, Batch Normalization, and Early Stopping to prevent overfitting and improve model generalization.
  • Implementing weight initialization strategies (e.g., Xavier/Glorot) to ensure stable training and faster convergence.
name: deep-learning-primer
description: Deep learning foundations including neural network basics, backpropagation, optimization, regularization, and training best practices.

Deep Learning Primer

Essential deep learning concepts for ML systems.

Neural Network Basics

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self, x):
        return self.layers(x)
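
A quick usage check for the MLP above (a minimal sketch; the dimensions and batch size are arbitrary example values):

# Sketch: instantiate and run a forward pass (example dimensions only)
model = MLP(input_dim=64, hidden_dim=128, output_dim=10)
x = torch.randn(32, 64)        # batch of 32 feature vectors
logits = model(x)              # shape (32, 10); raw scores, no softmax applied
print(logits.shape)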

Activation Functions

| Function | Use Case | Output Range |
|---|---|---|
| ReLU | Hidden layers | [0, ∞) |
| LeakyReLU | Prevent dead neurons | (-∞, ∞) |
| GELU | Transformers | (-∞, ∞) |
| Sigmoid | Binary classification | (0, 1) |
| Softmax | Multi-class | (0, 1) |
| Tanh | Centered output | (-1, 1) |
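
As a rough guide to how these pair with output heads and losses in PyTorch (a sketch; hidden_dim, num_classes, and x are placeholder names, not defined elsewhere in this primer):

# Illustrative pairing of activations with tasks
binary_head = nn.Linear(hidden_dim, 1)            # pair with BCEWithLogitsLoss (sigmoid built in)
multi_head = nn.Linear(hidden_dim, num_classes)   # pair with CrossEntropyLoss (softmax built in)
block = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.GELU())  # GELU in transformer-style blocks

probs = torch.sigmoid(binary_head(x))             # apply sigmoid explicitly only at inference time
class_probs = torch.softmax(multi_head(x), dim=-1)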

Loss Functions

# Classification
cross_entropy = nn.CrossEntropyLoss()
binary_ce = nn.BCEWithLogitsLoss()

# Binary focal loss: expects probabilities p in (0, 1); down-weights easy examples
def focal_loss(p, y, gamma=2.0):
    return -((1 - p) ** gamma * y * torch.log(p)
             + p ** gamma * (1 - y) * torch.log(1 - p)).mean()

# Regression
mse = nn.MSELoss()
mae = nn.L1Loss()
huber = nn.HuberLoss()

# Custom loss: pull same-class embeddings together, push others apart by a margin
import torch.nn.functional as F

def contrastive_loss(embeddings, labels, margin=1.0):
    distances = torch.cdist(embeddings, embeddings)
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    loss = same_class * distances + (~same_class) * F.relu(margin - distances)
    return loss.mean()
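
A quick shape check for the built-in losses above, using dummy tensors (illustrative values only):

# CrossEntropyLoss: raw logits of shape (batch, classes), integer class targets
logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
print(cross_entropy(logits, targets))

# BCEWithLogitsLoss: raw logits and float targets with matching shapes
bin_logits = torch.randn(8, 1)
bin_targets = torch.randint(0, 2, (8, 1)).float()
print(binary_ce(bin_logits, bin_targets))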

Optimization

# Optimizers
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=0.01
)

# Learning rate scheduling
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer,
    T_max=num_epochs
)

# Training loop
for epoch in range(num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = criterion(model(batch.x), batch.y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
    scheduler.step()
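
Learning rate warmup (recommended under Best Practices) can be chained in front of the cosine schedule. A minimal sketch assuming epoch-level scheduler stepping; warmup_epochs is an arbitrary example value:

# Linear warmup followed by cosine annealing
warmup_epochs = 5
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=warmup_epochs
)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs - warmup_epochs
)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs]
)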

Regularization Techniques

# Dropout
nn.Dropout(p=0.5)

# Weight decay (L2)
optimizer = torch.optim.AdamW(model.parameters(), weight_decay=0.01)

# Batch normalization
nn.BatchNorm1d(num_features)

# Layer normalization (for transformers)
nn.LayerNorm(hidden_dim)

# Early stopping
class EarlyStopping:
    def __init__(self, patience=5):
        self.patience = patience
        self.counter = 0
        self.best_score = None

    def __call__(self, val_loss):
        if self.best_score is None or val_loss < self.best_score:
            self.best_score = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience
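
A sketch of how the helper above plugs into a training loop (train_one_epoch, evaluate, and val_loader are placeholder names, not defined in this primer):

# Stop training once validation loss stops improving for `patience` epochs
early_stopping = EarlyStopping(patience=5)
for epoch in range(num_epochs):
    train_one_epoch(model, dataloader, optimizer, criterion)
    val_loss = evaluate(model, val_loader, criterion)
    if early_stopping(val_loss):
        print(f"Early stopping at epoch {epoch}")
        break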

Initialization

def init_weights(module):
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, std=0.02)

model.apply(init_weights)
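
Xavier/Glorot initialization is tuned for tanh/sigmoid activations; for ReLU networks like the MLP above, Kaiming (He) initialization is a common alternative. A minimal sketch:

# Kaiming (He) initialization, scaled for ReLU activations
def init_weights_relu(module):
    if isinstance(module, nn.Linear):
        nn.init.kaiming_uniform_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model.apply(init_weights_relu)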

Best Practices

  1. Start with Adam or AdamW, lr=1e-3
  2. Use learning rate warmup (see the warmup sketch under Optimization)
  3. Apply gradient clipping (e.g., max norm 1.0, as in the training loop above)
  4. Monitor the training/validation loss gap to catch overfitting early
  5. Use mixed precision training (see the sketch below)
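
A minimal mixed precision sketch for item 5, assuming a CUDA device and reusing the training loop pattern from the Optimization section:

# Automatic mixed precision with gradient scaling (CUDA assumed)
scaler = torch.cuda.amp.GradScaler()

for epoch in range(num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch.x), batch.y)
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)   # unscale before clipping so the norm is correct
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()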