Question 1

What is restart-crashloop?

Accepted Answer

This Claude skill automates the remediation of Kubernetes pods stuck in the CrashLoopBackOff state. It restarts failing containers by deleting the affected pod, so that the controller that owns it (for example, a Deployment's ReplicaSet) creates a replacement. Before deleting anything, it validates preconditions such as the restart count and the pod type, so automated restarts are applied only to appropriate workloads. This reduces manual SRE intervention and can shorten service recovery time.
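The precondition check and deletion step described above can be sketched roughly as follows. This is an illustration, not the skill's actual implementation: the restart threshold, the set of accepted owner kinds, and all function names are assumptions. The pod is taken to be a dict in the shape returned by `kubectl get pod <name> -o json`.

```python
from typing import Any, Optional

# Assumed values for illustration; the skill's real thresholds are not documented here.
MIN_RESTARTS = 3
RESTARTABLE_OWNER_KINDS = {"ReplicaSet", "StatefulSet", "DaemonSet"}


def is_crashlooping(pod: dict[str, Any], min_restarts: int = MIN_RESTARTS) -> bool:
    """True if any container is waiting in CrashLoopBackOff with enough restarts."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for cs in statuses:
        waiting = cs.get("state", {}).get("waiting") or {}
        if (waiting.get("reason") == "CrashLoopBackOff"
                and cs.get("restartCount", 0) >= min_restarts):
            return True
    return False


def is_controller_managed(pod: dict[str, Any]) -> bool:
    """True if a controller owns the pod, so deleting it leads to a replacement."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    return any(o.get("kind") in RESTARTABLE_OWNER_KINDS for o in owners)


def delete_command(pod: dict[str, Any]) -> list[str]:
    """Build the kubectl argv that deletes the pod."""
    meta = pod["metadata"]
    return ["kubectl", "delete", "pod", meta["name"],
            "-n", meta.get("namespace", "default")]


def plan_restart(pod: dict[str, Any]) -> Optional[list[str]]:
    """Return the delete command if all preconditions hold, otherwise None."""
    if is_crashlooping(pod) and is_controller_managed(pod):
        return delete_command(pod)
    return None
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. A bare pod with no owner reference is skipped deliberately, because nothing would recreate it after deletion.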

Question 2

When should I use restart-crashloop?

Accepted Answer

restart-crashloop is useful in the following scenarios:

• Transient Issue Resolution: Automatically restarting microservices that fail to initialize due to temporary network timeouts or external dependency unavailability.
• SRE Runbook Automation: Serving as an automated step in an incident response workflow to handle 'stuck' pods without requiring manual kubectl commands.
• Environment Maintenance: Quickly clearing failing pods in development or staging environments to ensure resource availability and clean state transitions.
• First-Line Defense: Acting as a primary automated response for non-critical pod failures before escalating to human engineers.

name	restart-crashloop
description	… and a restart might resolve transient issues. Keywords: crashloop, restart, …
domain	k8s
category	remediation
requires-approval	false
confidence	0.85

restart-crashloop

When & Why to Use This Skill

Use Cases

Restart CrashLoopBackOff Pod

Preconditions

Actions

1. Delete Pod to Trigger Recreation

Success Criteria

Failure Handling

Examples