instance-actors
Managing instance actor orchestrations for PostgreSQL health monitoring. Use when debugging stale actors, restarting actors, or troubleshooting health check issues.
When & Why to Use This Skill
This Claude skill covers managing the instance actor orchestrations that monitor PostgreSQL health in Duroxide-based environments. It helps developers and SREs debug stale actors, restart failed orchestrations, and troubleshoot health check discrepancies.
Use Cases
- Detecting and restarting 'stale' actors where the last health check is older than 5 minutes to prevent monitoring gaps.
- Identifying and resolving 'zombie' actors that appear as 'Running' in the database but have ceased processing health updates due to server crashes.
- Automating the recovery of all instance actors following a database migration where orchestration states were not preserved.
- Managing the full lifecycle of monitoring orchestrations, including manual starts, restarts, and cancellations via API or Duroxide client.
- Troubleshooting missing actors by verifying orchestration IDs against the Duroxide state and the CMS database.
Instance Actor Management
Overview
Instance actors are detached Duroxide orchestrations that run continuously to monitor PostgreSQL instance health. They can become orphaned or stale.
Actor Lifecycle
- Created: When an instance is created, the create_instance orchestration spawns an actor
- Running: The actor loops forever: check health → update CMS → timer → continue-as-new
- Cancelled: When an instance is deleted, the actor is cancelled via cancel_instance()
Detecting Problems
Stale Actor (running but not working)
- last_health_check > 5 minutes old
- Actor may be stuck, timer broken, or worker not processing
Missing Actor
- instance_actor_orchestration_id is NULL
- Or orchestration doesn't exist in Duroxide state
Zombie Actor (in DB but not processing)
- Status shows "Running" but no health updates
- Often caused by server crash or DB migration
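The three symptoms above can be folded into a single check. The following is a minimal sketch, not the project's actual code: `ActorSnapshot`, `ActorHealth`, and `classify` are hypothetical names, and the snapshot is assumed to be built by combining a CMS row with a Duroxide lookup. Because a zombie actor shows the same symptom as a stale one (no recent health updates), the sketch reports both as `Stale`.

```rust
use std::time::Duration;

/// Threshold from the text above: a health check older than 5 minutes is stale.
const STALE_AFTER: Duration = Duration::from_secs(5 * 60);

/// Hypothetical snapshot of one instance (not a real type from the codebase).
struct ActorSnapshot {
    /// Value of `instance_actor_orchestration_id`; `None` when the column is NULL.
    orchestration_id: Option<String>,
    /// Whether the orchestration was found in Duroxide state.
    exists_in_duroxide: bool,
    /// Age of `last_health_check`; `None` if no check was ever recorded.
    health_check_age: Option<Duration>,
}

#[derive(Debug, PartialEq)]
enum ActorHealth {
    Healthy,
    /// Covers both stale and zombie actors: an actor exists but health is not updating.
    Stale,
    /// No orchestration id recorded, or the orchestration is absent from Duroxide.
    Missing,
}

/// Classifies an instance's actor using the rules listed above.
fn classify(snapshot: &ActorSnapshot) -> ActorHealth {
    if snapshot.orchestration_id.is_none() || !snapshot.exists_in_duroxide {
        return ActorHealth::Missing;
    }
    match snapshot.health_check_age {
        // A check at or under the threshold counts as healthy.
        Some(age) if age <= STALE_AFTER => ActorHealth::Healthy,
        // Older than the threshold, or never checked at all.
        _ => ActorHealth::Stale,
    }
}
```

Checking for a missing actor first is a deliberate choice in this sketch: an instance with no actor will also have an old `last_health_check`, and the remedy (start rather than restart) differs.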
API Endpoints
# Start new actor
curl -X POST /api/instances/:name/actor/start
# Restart actor (cancel + start new)
curl -X POST /api/instances/:name/actor/restart
# Cancel actor
curl -X POST /api/instances/:name/actor/cancel
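The paths above omit the server address and use `:name` as a placeholder. As a rough sketch of how the full URL might be assembled, the helper below substitutes an instance name into the path. `ActorAction` and `actor_url` are hypothetical names, the base URL is an assumption (the source does not say where the API is served), and the instance name is assumed to be URL-safe, so no escaping is done.

```rust
/// The three actor actions exposed by the endpoints above.
#[derive(Clone, Copy)]
enum ActorAction {
    Start,
    Restart,
    Cancel,
}

impl ActorAction {
    /// Final path segment for this action.
    fn as_str(self) -> &'static str {
        match self {
            ActorAction::Start => "start",
            ActorAction::Restart => "restart",
            ActorAction::Cancel => "cancel",
        }
    }
}

/// Builds the full endpoint URL for an instance and action.
/// `base_url` is an assumption, e.g. "http://localhost:8080".
fn actor_url(base_url: &str, instance_name: &str, action: ActorAction) -> String {
    format!(
        "{}/api/instances/{}/actor/{}",
        base_url.trim_end_matches('/'), // tolerate a trailing slash on the base
        instance_name,
        action.as_str()
    )
}
```

All three endpoints are called with POST, so the resulting URL can be passed to `curl -X POST` or to any HTTP client.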
Duroxide Client Usage
// Cancel an actor
client.cancel_instance(&actor_id, "User requested cancellation").await?;
// Check if actor exists
match client.get_instance_info(&actor_id).await {
    Ok(info) => println!("Status: {}", info.status),
    Err(_) => println!("Actor not found"),
}
// Start new actor (detached)
client.start_orchestration(
    &new_actor_id,
    orchestrations::INSTANCE_ACTOR,
    serde_json::to_string(&input)?,
).await?;
Recovery Procedures
Restart All Stale Actors
-- Find instances with stale health
SELECT user_name, k8s_name, last_health_check, instance_actor_orchestration_id
FROM toygres_cms.instances
WHERE state = 'running'
AND (last_health_check IS NULL OR last_health_check < NOW() - INTERVAL '5 minutes');
Then use the UI or API to restart each actor.
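To script this step, one possible approach is to turn each row of the query result into the API path to call. The sketch below is illustrative only: `StaleRow`, `recovery_path`, and `recovery_plan` are hypothetical names, and it assumes that the `:name` segment corresponds to the `user_name` column, which the source does not confirm. It also assumes that rows with a NULL `instance_actor_orchestration_id` should use `start` (there is nothing to cancel), while all others use `restart`.

```rust
/// One row of the stale-instance query above (hypothetical struct;
/// field names follow the SELECT list).
struct StaleRow {
    user_name: String,
    instance_actor_orchestration_id: Option<String>,
}

/// Picks the API path for one row: `start` when no actor is recorded,
/// `restart` (cancel + start new) otherwise.
/// Assumption: `:name` in the path is the `user_name` column.
fn recovery_path(row: &StaleRow) -> String {
    let action = if row.instance_actor_orchestration_id.is_some() {
        "restart"
    } else {
        "start"
    };
    format!("/api/instances/{}/actor/{}", row.user_name, action)
}

/// Maps every stale row to the path that should receive a POST request.
fn recovery_plan(rows: &[StaleRow]) -> Vec<String> {
    rows.iter().map(recovery_path).collect()
}
```

Producing the list of paths first, rather than sending requests immediately, leaves room to review the plan before any actor is touched.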
After Database Migration
If the Duroxide state wasn't migrated, all actors are orphaned:
- List all running instances
- For each, call /api/instances/:name/actor/start