Question 1

What is investigate-pod-failure?

Accepted Answer

investigate-pod-failure is a Claude skill that automates the investigation of failing Kubernetes pods by aggregating their logs, events, and resource status. It streamlines troubleshooting for SREs and DevOps engineers and helps them quickly identify the root cause of common in-cluster failures such as OOM kills, configuration errors, and connection failures.

Question 2

When should I use investigate-pod-failure?

Accepted Answer

investigate-pod-failure is useful in the following scenarios:
• Rapidly diagnosing 'CrashLoopBackOff' or 'Error' states in production namespaces to minimize service downtime.
• Identifying the root cause of pod failures by correlating container logs with cluster-level events and resource specifications.
• Troubleshooting failed deployments where pods are created but never reach the 'Running' state because of application errors.
• Investigating resource-related terminations, such as out-of-memory (OOM) kills, by inspecting the pod's status and event history.
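As a rough illustration of the CrashLoopBackOff and OOM cases above, the sketch below scans the JSON returned by "kubectl get pod <name> -o json" and reports containers that are waiting or that last exited abnormally. The field names (state.waiting.reason, lastState.terminated.reason, exitCode, restartCount) come from the Kubernetes Pod API; the function classify_pod, the HINTS table, and the set of benign waiting reasons are hypothetical and are not taken from the skill itself. An OOM kill typically shows up as reason "OOMKilled" with exit code 137 (128 + SIGKILL).

```python
from typing import Any

# Hypothetical hint table: maps container-level reasons reported by the
# Kubernetes API to a suggested next step. Illustrative, not exhaustive.
HINTS = {
    "OOMKilled": "container exceeded its memory limit; check resources.limits.memory",
    "CrashLoopBackOff": "container keeps exiting; inspect logs from the previous run",
    "ImagePullBackOff": "image could not be pulled; check image name, tag and pull secrets",
    "ErrImagePull": "image could not be pulled; check image name, tag and pull secrets",
    "CreateContainerConfigError": "bad container config, often a missing ConfigMap or Secret",
    "Error": "container exited with a non-zero code; inspect its logs",
}

# Waiting reasons that are normal while a pod is starting up.
BENIGN_WAITING = {"ContainerCreating", "PodInitializing"}


def classify_pod(pod: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one finding per abnormal waiting or terminated state in the pod."""
    findings = []
    status = pod.get("status", {})
    containers = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    for cs in containers:
        state = cs.get("state", {})
        candidates = []
        # A waiting container (e.g. CrashLoopBackOff) has no exit code of its own.
        waiting = state.get("waiting")
        if waiting and waiting.get("reason") not in BENIGN_WAITING:
            candidates.append((waiting.get("reason") or "Unknown", None))
        # Check the current and the previous termination; exit code 0 is a clean exit.
        for term in (state.get("terminated"), cs.get("lastState", {}).get("terminated")):
            if term and term.get("exitCode", 0) != 0:
                candidates.append((term.get("reason") or "Unknown", term.get("exitCode")))
        for reason, exit_code in candidates:
            findings.append({
                "container": cs.get("name"),
                "reason": reason,
                "exit_code": exit_code,
                "restarts": cs.get("restartCount", 0),
                "hint": HINTS.get(reason, "no built-in hint; inspect events and logs"),
            })
    return findings
```

A pod stuck in CrashLoopBackOff after an OOM kill yields two findings for the same container: the current waiting reason and the OOMKilled termination that caused it, which is the correlation the scenarios above describe.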

name	investigate-pod-failure
description	… status to identify root cause. Keywords: investigate, debug, pod failure, …
domain	k8s
category	diagnostic
requires-approval	false
confidence	0.95

investigate-pod-failure

When & Why to Use This Skill

Use Cases

Investigate Pod Failure

Preconditions

Actions

1. Get Pod Status and Details

2. Get Pod Logs

3. Get Recent Events
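The three actions above map naturally onto read-only kubectl commands. The sketch below is an assumption about how those steps could be carried out, not the skill's actual command list: it builds the argument lists for each step and filters the resulting events JSON down to Warning events about the pod. The function names investigation_commands and warning_events, and the default tail of 200 lines, are hypothetical; the kubectl subcommands and flags used (get, describe, logs, -c, --all-containers, --previous, --tail, --field-selector, --sort-by, -o json) are standard.

```python
import json
from typing import Any, Optional


def investigation_commands(pod: str, namespace: str, container: Optional[str] = None,
                           tail: int = 200) -> dict[str, list[list[str]]]:
    """Build read-only kubectl invocations, grouped by the three action steps."""
    ns = ["-n", namespace]
    target = ["-c", container] if container else ["--all-containers=true"]
    return {
        # 1. Pod status and details: structured status plus the human-readable summary.
        "status": [
            ["kubectl", "get", "pod", pod, *ns, "-o", "json"],
            ["kubectl", "describe", "pod", pod, *ns],
        ],
        # 2. Logs from the running container and from the previous (crashed) instance.
        "logs": [
            ["kubectl", "logs", pod, *ns, *target, f"--tail={tail}"],
            ["kubectl", "logs", pod, *ns, *target, "--previous", f"--tail={tail}"],
        ],
        # 3. Events whose involvedObject is this pod, sorted by last occurrence.
        "events": [
            ["kubectl", "get", "events", *ns,
             f"--field-selector=involvedObject.name={pod}",
             "--sort-by=.lastTimestamp", "-o", "json"],
        ],
    }


def warning_events(events_json: str, pod: str) -> list[tuple[str, str, str]]:
    """Return (timestamp, reason, message) for Warning events about the given pod."""
    items: list[dict[str, Any]] = json.loads(events_json).get("items", [])
    out = []
    for e in items:
        obj = e.get("involvedObject", {})
        if obj.get("kind") == "Pod" and obj.get("name") == pod and e.get("type") == "Warning":
            # Some events carry only eventTime, so fall back to it when lastTimestamp is null.
            ts = e.get("lastTimestamp") or e.get("eventTime") or ""
            out.append((ts, e.get("reason", ""), e.get("message", "")))
    return sorted(out)
```

Each argument list can be passed to subprocess.run(cmd, capture_output=True, text=True). Note that "kubectl logs --previous" returns an error when the container has not restarted yet, so a failure on that call is expected for pods that never crashed and should not abort the investigation.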

Success Criteria

Failure Handling

Examples