instance-actors

affandar's avatarfrom affandar

Managing instance actor orchestrations for PostgreSQL health monitoring. Use when debugging stale actors, restarting actors, or troubleshooting health check issues.

0stars🔀0forks📁View on GitHub🕐Updated Jan 11, 2026

When & Why to Use This Skill

This Claude skill facilitates the management and orchestration of instance actors dedicated to PostgreSQL health monitoring. It provides a robust framework for developers and SREs to debug stale processes, restart failed orchestrations, and troubleshoot health check discrepancies within Duroxide-based environments, ensuring high availability and database reliability.

Use Cases

  • Detecting and restarting 'stale' actors where the last health check is older than 5 minutes to prevent monitoring gaps.
  • Identifying and resolving 'zombie' actors that appear as 'Running' in the database but have ceased processing health updates due to server crashes.
  • Automating the recovery of all instance actors following a database migration where orchestration states were not preserved.
  • Managing the full lifecycle of monitoring orchestrations, including manual starts, restarts, and cancellations via API or Duroxide client.
  • Troubleshooting missing actors by verifying orchestration IDs against the Duroxide state and the CMS database.
nameinstance-actors
descriptionManaging instance actor orchestrations for PostgreSQL health monitoring. Use when debugging stale actors, restarting actors, or troubleshooting health check issues.

Instance Actor Management

Overview

Instance actors are detached Duroxide orchestrations that run continuously to monitor PostgreSQL instance health. They can become orphaned or stale.

Actor Lifecycle

  1. Created: When instance is created, create_instance orchestration spawns an actor
  2. Running: Actor loops forever: check health → update CMS → timer → continue-as-new
  3. Cancelled: When instance is deleted, actor is cancelled via cancel_instance()

Detecting Problems

Stale Actor (running but not working)

  • last_health_check > 5 minutes old
  • Actor may be stuck, timer broken, or worker not processing

Missing Actor

  • instance_actor_orchestration_id is NULL
  • Or orchestration doesn't exist in duroxide state

Zombie Actor (in DB but not processing)

  • Status shows "Running" but no health updates
  • Often caused by server crash or DB migration

API Endpoints

# Start new actor
curl -X POST /api/instances/:name/actor/start

# Restart actor (cancel + start new)
curl -X POST /api/instances/:name/actor/restart

# Cancel actor
curl -X POST /api/instances/:name/actor/cancel

Duroxide Client Usage

// Cancel an actor
client.cancel_instance(&actor_id, "User requested cancellation").await?;

// Check if actor exists
match client.get_instance_info(&actor_id).await {
    Ok(info) => println!("Status: {}", info.status),
    Err(_) => println!("Actor not found"),
}

// Start new actor (detached)
client.start_orchestration(
    &new_actor_id,
    orchestrations::INSTANCE_ACTOR,
    serde_json::to_string(&input)?,
).await?;

Recovery Procedures

Restart All Stale Actors

-- Find instances with stale health
SELECT user_name, k8s_name, last_health_check, instance_actor_orchestration_id
FROM toygres_cms.instances
WHERE state = 'running'
  AND (last_health_check IS NULL OR last_health_check < NOW() - INTERVAL '5 minutes');

Then use the UI or API to restart each actor.

After Database Migration

If duroxide state wasn't migrated, all actors are orphaned:

  1. List all running instances
  2. For each, call /api/instances/:name/actor/start