restart-imagepullbackoff

from X-McKay

Handle a pod stuck in ImagePullBackOff state. First investigates the image pull error, then restarts to retry. Keywords: imagepull, image pull backoff, ErrImagePull, registry, container image, pull failed.

Updated Jan 11, 2026

When & Why to Use This Skill

This Claude skill troubleshoots and remediates Kubernetes pods stuck in the ImagePullBackOff or ErrImagePull state. It lists the namespace's events to surface the cause of the pull failure, then deletes the pod so that it is recreated and the pull is retried. This resolves transient registry or network failures without manual intervention, which shortens recovery time for those cases. Persistent causes, such as a wrong image name or a missing pull secret, are not fixed by a retry and are escalated to a human.

Use Cases

  • Scenario 1: Automatically resolving transient image pull failures caused by temporary container registry downtime or network instability.
  • Scenario 2: Streamlining Kubernetes deployment troubleshooting by providing instant event analysis and automated retry actions for failed pod starts.
  • Scenario 3: Reducing manual SRE toil by implementing an automated runbook for common 'ErrImagePull' errors in production and staging environments.
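name: restart-imagepullbackoff
description: >
  Handle a pod stuck in ImagePullBackOff state. First investigates the image
  pull error, then restarts to retry. Keywords: imagepull, image pull backoff,
  ErrImagePull, registry, container image, pull failed.
domain: k8s
category: remediation
requires-approval: false
confidence: 0.70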

Handle ImagePullBackOff

Preconditions

Before applying this skill, verify:

  • Pod status is ImagePullBackOff or ErrImagePull
  • Pod has been in this state for more than 2 minutes
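
The two preconditions can be expressed as a small guard. This is a minimal sketch, not part of the skill itself: the function name `preconditions_met` and the `stuck_since` argument (the time the pod was first observed in the failing state) are hypothetical, and how that timestamp is obtained is left to the caller.

```python
from datetime import datetime, timedelta, timezone

# Waiting reasons this skill applies to (taken from the preconditions above).
PULL_FAILURE_REASONS = {"ImagePullBackOff", "ErrImagePull"}
# Minimum time the pod must have been stuck before acting.
MIN_STUCK_DURATION = timedelta(minutes=2)


def preconditions_met(reason, stuck_since, now=None):
    """Return True when the pod is in a pull-failure state for more than 2 minutes.

    `stuck_since` is a hypothetical input: a timezone-aware datetime marking
    when the failing state was first observed.
    """
    now = now or datetime.now(timezone.utc)
    return reason in PULL_FAILURE_REASONS and (now - stuck_since) > MIN_STUCK_DURATION
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current UTC time.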

Actions

1. Get Pod Events

First, get pod events to understand the image pull failure.

mcp_tool: kubernetes-mcp-server/events_list
params:
  namespace: $namespace
timeout: 30s
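
Because the events call is scoped to the namespace, the result usually needs to be narrowed to the affected pod and then read for a cause. The sketch below assumes the tool output has been parsed into a list of dictionaries shaped like Kubernetes Event objects (`involvedObject.name`, `message`); the actual output format of `events_list` is not stated here, so treat that shape as an assumption. The substring patterns are rough heuristics over common kubelet and registry messages, not an exhaustive list.

```python
# Ordered heuristics: the first category with a matching substring wins.
FAILURE_PATTERNS = [
    ("rate_limit", ("toomanyrequests", "rate limit")),
    ("auth", ("unauthorized", "authentication required", "access denied")),
    ("not_found", ("manifest unknown", "not found")),
    ("network", ("no such host", "i/o timeout", "connection refused")),
]


def classify_pull_failure(events, pod_name):
    """Return a coarse failure category for the pod's events, or "unknown".

    `events` is assumed to be a list of dicts shaped like Kubernetes Event
    objects; only events whose involvedObject.name equals `pod_name` are read.
    """
    for event in events:
        if event.get("involvedObject", {}).get("name") != pod_name:
            continue
        message = event.get("message", "").lower()
        for category, needles in FAILURE_PATTERNS:
            if any(needle in message for needle in needles):
                return category
    return "unknown"
```

A `network` or `rate_limit` result suggests the failure may be transient and a retry is reasonable; `auth` or `not_found` suggests the retry in the next step is unlikely to help on its own.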

2. Delete Pod to Trigger Recreation

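Delete the pod so that its controller (for example, the ReplicaSet behind a Deployment) recreates it and retries the image pull. A bare pod with no controller is not recreated after deletion.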

mcp_tool: kubernetes-mcp-server/pods_delete
params:
  name: $pod_name
  namespace: $namespace
timeout: 30s
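
Deleting a pod only leads to a retry when something recreates it, so it can be worth checking for a controlling owner before this step. The sketch below assumes the pod is available as a dictionary in the Kubernetes API JSON shape; `metadata.ownerReferences` and its `controller` flag are standard fields, while the function name is hypothetical.

```python
def is_controller_managed(pod):
    """Return True if the pod has an owner reference marked as its controller.

    `pod` is assumed to be a dict in the Kubernetes API JSON shape.
    """
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    return any(owner.get("controller") for owner in owners)
```

If this returns False, deleting the pod removes it permanently, and escalation is probably the safer path.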

Success Criteria

The skill succeeds when:

  • New pod created and image pull succeeds
  • Pod reaches Running state within 5 minutes
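
The success check amounts to polling with a 5-minute deadline. In this sketch, `get_phase` is a hypothetical callable supplied by the caller that returns the current phase of the replacement pod (which, for a Deployment, has a different name than the deleted one). The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time


def wait_for_running(get_phase, timeout_s=300, interval_s=5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `get_phase()` until it returns "Running" or the timeout elapses.

    Returns True on success and False on timeout. `get_phase` is a
    caller-supplied (hypothetical) function, not part of any library.
    """
    deadline = clock() + timeout_s
    while True:
        if get_phase() == "Running":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The default `timeout_s=300` mirrors the 5-minute criterion above; the 5-second polling interval is an arbitrary choice.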

Failure Handling

If image pull continues to fail:

  1. Verify image name and tag are correct
  2. Check image registry connectivity
  3. Verify image pull secrets are configured
  4. Escalate to human with registry details
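
For steps 1 and 4, it helps to split the image reference into its parts so the registry, repository, and tag can be checked and reported separately. The following is a simplified sketch that follows the common Docker reference convention (a first path component containing `.` or `:`, or equal to `localhost`, is treated as the registry); it does not implement the full reference grammar, and the function names and the escalation payload shape are hypothetical.

```python
def parse_image_ref(image):
    """Split an image reference into registry, repository, tag, and digest."""
    digest = None
    if "@" in image:
        image, digest = image.split("@", 1)
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        # No explicit registry: assume the Docker Hub default.
        registry, remainder = "docker.io", image
    # A colon after the last slash separates the tag from the repository.
    colon = remainder.rfind(":")
    if colon > remainder.rfind("/"):
        repository, tag = remainder[:colon], remainder[colon + 1:]
    else:
        repository, tag = remainder, None
    if tag is None and digest is None:
        tag = "latest"  # default tag when none is given
    return {"registry": registry, "repository": repository,
            "tag": tag, "digest": digest}


def build_escalation(context):
    """Combine the skill's input context with the parsed registry details."""
    details = parse_image_ref(context["image"])
    return {"pod_name": context["pod_name"],
            "namespace": context["namespace"],
            "status": context["status"],
            **details}
```

A missing tag falling back to `latest` is itself worth flagging during escalation, since an unpinned tag is a common source of unexpected pull results.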

Examples

Input Context:

{
  "pod_name": "myapp-deployment-xyz789",
  "namespace": "production",
  "status": "ImagePullBackOff",
  "image": "registry.example.com/myapp:v1.2.3"
}

Expected Outcome: Pod deleted, new pod successfully pulls image and reaches Running state.