health-checks
Implement liveness, readiness, and dependency health checks
When & Why to Use This Skill
This Claude skill provides a standardized framework for implementing robust health checks in cloud-native applications. It helps developers distinguish between liveness, readiness, and startup probes and validate process health and external dependencies correctly. In Kubernetes environments this keeps services highly available, prevents cascading failures, and ensures instances receive traffic only when they can actually serve it.
Use Cases
- Case 1: Configuring Kubernetes liveness probes to automatically detect and restart unresponsive application processes.
- Case 2: Implementing readiness checks that validate critical dependencies like databases and caches before allowing traffic to reach a service instance.
- Case 3: Designing startup probes for legacy or resource-heavy applications to manage long initialization periods without triggering premature restarts.
- Case 4: Standardizing health check API response formats (JSON) across microservices to improve observability and integration with SRE monitoring tools.
- Case 5: Avoiding common anti-patterns: checking dependencies in liveness probes, running checks without timeouts, and skipping result caching, which exposes dependencies to a 'thundering herd' of probe traffic.
| name | health-checks |
|---|---|
| description | "Implement liveness, readiness, and dependency health checks" |
| priority | 1 |
Health Checks
Different checks serve different purposes. Don't conflate them.
Check Types
| Endpoint | Purpose | On Failure | Should Check |
|---|---|---|---|
| /health/live | Process alive? | K8s restarts the container | Only process responsiveness |
| /health/ready | Can handle traffic? | K8s removes the pod from Service endpoints (no traffic, no restart) | DB, cache, critical deps |
| /health/startup | Init complete? | K8s holds off liveness/readiness probes; restarts the container once failureThreshold is reached | Initialization status |
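The startup check only has to report whether initialization has finished. A minimal Python sketch, assuming a framework-agnostic handler that returns `(status_code, body)`; `mark_startup_complete` and `startup_status` are hypothetical names, not part of any library.

```python
import threading
from typing import Dict, Tuple

# Hypothetical process-wide flag, set once initialization finishes
# (e.g. config loaded, connection pools created, caches warmed).
_startup_complete = threading.Event()


def mark_startup_complete() -> None:
    """Call once, at the end of application initialization."""
    _startup_complete.set()


def startup_status() -> Tuple[int, Dict[str, str]]:
    """Handler logic for /health/startup: 503 until init is done, then 200."""
    if _startup_complete.is_set():
        return 200, {"status": "healthy"}
    return 503, {"status": "unhealthy"}
```

Using a `threading.Event` lets initialization code on one thread flip the flag while request threads read it safely.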
Liveness (Simple)
Return 200 OK immediately. Never check dependencies.
If the liveness probe checks the database, every pod restarts whenever the database is down, turning one dependency outage into a cascading failure.
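A minimal liveness endpoint, sketched with only the Python standard library: the handler answers from inside the process and touches no dependency, so it fails only when the process itself stops responding. `liveness_status` and `HealthHandler` are illustrative names, not part of any framework.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple


def liveness_status() -> Tuple[int, Dict[str, str]]:
    """If this code runs, the process is responsive. No dependency checks."""
    return 200, {"status": "healthy"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/health/live":
            code, body = liveness_status()
        else:
            code, body = 404, {"error": "not found"}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

Something like `HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()` (with `HTTPServer` from `http.server`) would expose it on the port used in the probe configuration. In practice the route usually lives in the application's existing web server, so that a wedged request loop also fails the probe.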
Readiness (Dependency Checks)
For each dependency:
→ Check it with a timeout (1-2s)
→ Record whether it is healthy or unhealthy
Return 503 Service Unavailable if any critical dependency is unhealthy; otherwise return 200 OK.
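The steps above could look like this in Python. It is a sketch, not a reference implementation: it assumes each dependency check is a zero-argument callable that raises on failure (a hypothetical convention), runs the checks in parallel, and applies one shared deadline so a single hung dependency cannot stretch the endpoint past `timeout_s`. `readiness_status` and the `critical` set are illustrative names.

```python
import concurrent.futures
import time
from typing import Callable, Dict, Set, Tuple

# Hypothetical convention: a check raises on failure (e.g. wraps "SELECT 1").
Check = Callable[[], None]


def readiness_status(
    checks: Dict[str, Check],
    critical: Set[str],
    timeout_s: float = 1.5,
) -> Tuple[int, Dict[str, str]]:
    """Run all checks in parallel under one shared deadline.

    Returns (HTTP status, per-dependency results): 503 if any critical
    dependency is unhealthy. Non-critical failures are reported only.
    """
    results: Dict[str, str] = {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(checks)))
    try:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        deadline = time.monotonic() + timeout_s
        for name, fut in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                fut.result(timeout=remaining)
                results[name] = "healthy"
            except concurrent.futures.TimeoutError:
                results[name] = f"unhealthy: timed out after {timeout_s}s"
            except Exception as exc:  # any error from the check counts as unhealthy
                results[name] = f"unhealthy: {exc}"
    finally:
        # Don't wait for hung checks; their threads finish in the background.
        pool.shutdown(wait=False)

    # A critical dependency with no registered check also counts as unhealthy.
    failed = any(results.get(name) != "healthy" for name in critical)
    return (503 if failed else 200), results
```

Creating an executor per request keeps the sketch short; a long-running service would more likely reuse one.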
Response Format
```json
{
  "status": "healthy|unhealthy",
  "version": "1.2.3",
  "dependencies": {
    "database": "healthy",
    "cache": "unhealthy: connection refused"
  }
}
```
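A sketch of producing that payload in Python. The rule for the top-level `status` is an assumption, chosen to match the readiness rule: "unhealthy" when any critical dependency is not "healthy", otherwise "healthy". `build_health_response` is a hypothetical helper.

```python
import json
from typing import Dict, Set


def build_health_response(version: str, dependencies: Dict[str, str], critical: Set[str]) -> str:
    """Serialize the health response format.

    Per-dependency strings ("healthy" or "unhealthy: <reason>") pass through
    unchanged; only critical dependencies affect the top-level status.
    """
    unhealthy = any(dependencies.get(name) != "healthy" for name in critical)
    body = {
        "status": "unhealthy" if unhealthy else "healthy",
        "version": version,
        "dependencies": dependencies,
    }
    return json.dumps(body, indent=2)
```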
Kubernetes Config
```yaml
livenessProbe:
  httpGet: { path: /health/live, port: 8080 }
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet: { path: /health/ready, port: 8080 }
  periodSeconds: 5
  failureThreshold: 3
startupProbe:
  httpGet: { path: /health/startup, port: 8080 }
  periodSeconds: 5
  failureThreshold: 30  # 150s max startup
```
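These probe settings translate into rough time budgets: how long a check must keep failing before Kubernetes acts. A small Python sketch of that arithmetic, using the common failureThreshold × periodSeconds approximation (it ignores timeoutSeconds and scheduling jitter); the `Probe` class is illustrative only and simply mirrors the Kubernetes field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical mirror of a Kubernetes probe's timing fields."""

    period_seconds: int              # periodSeconds
    failure_threshold: int           # failureThreshold
    initial_delay_seconds: int = 0   # initialDelaySeconds (Kubernetes default: 0)

    def failure_window_seconds(self) -> int:
        """Approximate failing time before a restart (liveness) or removal
        from endpoints (readiness)."""
        return self.failure_threshold * self.period_seconds

    def startup_budget_seconds(self) -> int:
        """Approximate maximum startup time before the container is restarted."""
        return self.initial_delay_seconds + self.failure_threshold * self.period_seconds


liveness = Probe(period_seconds=10, failure_threshold=3)   # ~30s
readiness = Probe(period_seconds=5, failure_threshold=3)   # ~15s
startup = Probe(period_seconds=5, failure_threshold=30)    # ~150s
```

With these numbers, an instance whose readiness check fails is pulled from endpoints after about 15s, a liveness failure triggers a restart after about 30s, and the application gets roughly 150s to finish initializing.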
Anti-Patterns
- Liveness checks dependencies → Cascading restarts
- No timeout on checks → Health endpoint hangs
- No caching of dependency results → Thundering herd: every probe from every caller hits the dependencies
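Caching addresses the last anti-pattern: a process's health endpoint may be hit by the kubelet, load balancers, and monitoring, and without a cache each hit becomes a dependency call. A minimal Python sketch, assuming the wrapped check returns a status string such as "healthy" or "unhealthy: connection refused" and enforces its own timeout; `CachedCheck` and the 5-second default TTL are illustrative choices.

```python
import threading
import time
from typing import Callable, Optional, Tuple


class CachedCheck:
    """Reuse a dependency check's result for `ttl_s` seconds.

    All callers of this process's health endpoint share at most one real
    dependency call per TTL window.
    """

    def __init__(self, check: Callable[[], str], ttl_s: float = 5.0) -> None:
        self._check = check  # must enforce its own timeout (see note below)
        self._ttl_s = ttl_s
        self._lock = threading.Lock()
        self._cached: Optional[Tuple[float, str]] = None  # (timestamp, result)

    def __call__(self) -> str:
        # Concurrent callers wait for one refresh instead of each calling out.
        with self._lock:
            now = time.monotonic()
            if self._cached is not None and now - self._cached[0] < self._ttl_s:
                return self._cached[1]
            result = self._check()
            self._cached = (time.monotonic(), result)
            return result
```

Holding the lock during a refresh means concurrent callers wait for one in-flight check rather than each starting their own; that is also why the wrapped check needs its own timeout, since a hung dependency would otherwise block every caller.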
References
references/platforms/{platform}/health-checks.md