What is planning-disaster-recovery?

This Claude skill supports backup automation and disaster recovery planning. It provides step-by-step guidance and automated workflows for protecting data with standard tools such as tar, rsync, and AWS S3, so that systems can be restored quickly and with verified data integrity after a critical failure.

When should I use planning-disaster-recovery?

planning-disaster-recovery is useful in the following scenarios:

Automating Cloud Backups: Schedule and manage off-site data storage by integrating AWS S3 for secure, scalable backup solutions.
Disaster Recovery Planning: Generate detailed assessment reports and implementation plans to minimize downtime during infrastructure or system outages.
Server Data Synchronization: Utilize rsync and tar to create efficient, compressed archives of local file systems and synchronize them across remote environments.
Operational Runbook Generation: Automatically produce step-by-step documentation and scripts for maintenance, troubleshooting, and emergency data restoration procedures.
System State Archiving: Implement automated scripts to capture system configurations and baseline metrics for consistent recovery points.
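As a rough illustration of the tar, rsync, and S3 scenarios above, the following sketch archives a directory into a timestamped tarball and shows, as commented-out lines, how that archive might then be copied off-site. The function name create_archive, the host backup.example.com, and the bucket example-backup-bucket are hypothetical placeholders, not part of the skill.

```shell
# Sketch: archive a directory with tar, then (optionally) copy it off-site.
# All names, hosts, and buckets below are hypothetical placeholders.

# create_archive SRC_DIR DEST_DIR -> prints the path of the new archive
create_archive() {
  local src="${1%/}" dest="${2%/}"
  local stamp archive
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  archive="${dest}/$(basename "$src")-${stamp}.tar.gz"
  mkdir -p "$dest" || return 1
  # -C stores paths relative to the parent of SRC_DIR, so the archive
  # unpacks into a single top-level directory named after the source.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  printf '%s\n' "$archive"
}

# Off-site copies (adapt before use; both targets are assumptions):
# rsync -a --partial "$archive" backup@backup.example.com:/srv/backups/
# aws s3 cp "$archive" "s3://example-backup-bucket/$(hostname)/"
```

A call such as archive="$(create_archive /etc /var/backups)" captures the archive path for the later copy steps.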


name	planning-disaster-recovery
description	|
- Bash(tar:*, rsync:*, aws:s3:*)
version	1.0.0
license	MIT

Prerequisites

Before using this skill, make sure you have:

Required credentials and permissions for the backup and restore operations
An understanding of the system architecture and dependencies
A backup of critical data before making structural changes
Access to relevant documentation and configuration files
Monitoring tools configured for observability
A development or staging environment available for testing
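Some of these prerequisites can be checked mechanically. The sketch below assumes the tools involved are tar, rsync, and the AWS CLI; the function names check_tools and check_aws_identity are illustrative and not defined by the skill.

```shell
# check_tools TOOL... -> returns non-zero and reports each missing tool on stderr
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# check_aws_identity -> succeeds only if the AWS CLI can authenticate
# with the credentials currently configured in the environment
check_aws_identity() {
  aws sts get-caller-identity >/dev/null 2>&1
}

# Example preflight:
# check_tools tar rsync aws && check_aws_identity || echo "prerequisites not met" >&2
```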

Instructions

Step 1: Assess Current State

Review current configuration, setup, and baseline metrics
Identify specific requirements, goals, and constraints
Document existing patterns, issues, and pain points
Analyze dependencies and integration points
Validate all prerequisites are met before proceeding
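One possible way to record baseline metrics for this step is a small snapshot function that writes a few text files into a dated directory. This is only a sketch: capture_baseline and the output file names are assumptions, and a real assessment would likely collect more than disk figures.

```shell
# capture_baseline OUT_DIR [PATH...]
# Writes a timestamp, kernel/host info, filesystem usage, and the size
# (in KB) of each PATH into OUT_DIR for later comparison.
capture_baseline() {
  local out="$1" p
  shift
  mkdir -p "$out" || return 1
  date -u +%Y-%m-%dT%H:%M:%SZ > "$out/captured_at.txt"
  uname -a > "$out/uname.txt"
  df -P > "$out/disk_free.txt"
  for p in "$@"; do
    du -sk "$p"
  done > "$out/path_sizes_kb.txt"
}

# Example (paths are placeholders):
# capture_baseline "/var/backups/baseline-$(date -u +%Y%m%d)" /etc /var/www
```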

Step 2: Design Solution

Define optimal approach based on best practices
Create detailed implementation plan with clear steps
Identify potential risks and mitigation strategies
Document expected outcomes and success criteria
Review plan with team or stakeholders if needed

Step 3: Implement Changes

Prepare a rollback plan and recovery procedures before changing anything
Execute the implementation in a non-production environment first
Verify that changes work as expected with thorough testing
Monitor for any issues, errors, or performance impacts
Document all changes, decisions, and configurations
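For synchronization changes, a dry run is a cheap way to test before executing. The sketch below wraps rsync so the same source and destination can first be previewed and then applied; preview_sync and apply_sync are hypothetical helper names.

```shell
# preview_sync SRC DEST -> lists what rsync would change, without copying anything
preview_sync() {
  # The trailing slashes make rsync copy the contents of SRC into DEST.
  rsync -a --dry-run --itemize-changes "${1%/}/" "${2%/}/"
}

# apply_sync SRC DEST -> performs the copy once the preview looks right
apply_sync() {
  rsync -a "${1%/}/" "${2%/}/"
}

# Example (paths are placeholders):
# preview_sync /srv/app /mnt/staging-backup/app
# apply_sync   /srv/app /mnt/staging-backup/app
```

Neither helper passes --delete, so files removed from the source stay in the destination; whether to mirror deletions is a decision to make explicitly in the plan.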

Step 4: Validate Implementation

Run comprehensive tests to verify all functionality
Compare performance metrics against baseline
Confirm no unintended side effects or regressions
Update all relevant documentation
Obtain approval before production deployment
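For backups specifically, one validation worth automating is a test restore. The sketch below assumes the archive was created with tar -czf ARCHIVE -C PARENT NAME, so that it unpacks into a single top-level directory named after the source; verify_restore is an illustrative name.

```shell
# verify_restore ARCHIVE SRC_DIR
# Restores ARCHIVE into a temporary directory and compares the result
# with SRC_DIR. Succeeds only if the archive is readable and identical.
verify_restore() {
  local archive="$1" src="${2%/}" tmp rc
  tmp="$(mktemp -d)" || return 1
  # 1) list the archive to confirm it is readable end to end
  # 2) extract it into the temporary directory
  # 3) compare the restored tree with the source, file by file
  tar -tzf "$archive" >/dev/null &&
    tar -xzf "$archive" -C "$tmp" &&
    diff -r "$src" "$tmp/$(basename "$src")" >/dev/null
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

A comparison against a live directory only passes if the source has not changed since the archive was taken, so this check fits best right after the backup runs or against a quiesced copy.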

Step 5: Deploy to Production

Schedule deployment during appropriate maintenance window
Execute implementation with real-time monitoring
Watch closely for any issues or anomalies
Verify successful deployment and functionality
Document completion, metrics, and lessons learned

Output

This skill produces:

Implementation Artifacts: Scripts, configuration files, code, and automation tools

Documentation: Comprehensive documentation of changes, procedures, and architecture

Test Results: Validation reports, test coverage, and quality metrics

Monitoring Configuration: Dashboards, alerts, metrics, and observability setup

Runbooks: Operational procedures for maintenance, troubleshooting, and incident response

Error Handling

Permission and Access Issues:

Verify credentials and permissions for all operations
Request elevated access if required for specific tasks
Document all permission requirements for automation
Use separate service accounts for privileged operations
Implement least-privilege access principles

Connection and Network Failures:

Check network connectivity, firewalls, and security groups
Verify service endpoints, DNS resolution, and routing
Test connections using diagnostic and troubleshooting tools
Review network policies, ACLs, and security configurations
Implement retry logic with exponential backoff
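Retry with exponential backoff can be sketched as a small wrapper that reruns a command and doubles the wait after each failure. The function name retry and its argument order are assumptions made for this example.

```shell
# retry MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay between attempts.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example (bucket name is a placeholder):
# retry 5 2 aws s3 cp backup.tar.gz s3://example-backup-bucket/
```

With a base delay of 2 seconds and 5 attempts, the waits are 2, 4, 8, and 16 seconds. The sketch adds no jitter and no upper bound on the delay, both of which may be worth adding for jobs that run on many hosts at once.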

Resource Constraints:

Monitor resource usage (CPU, memory, disk, network)
Implement throttling, rate limiting, or queue mechanisms
Schedule resource-intensive tasks during low-traffic periods
Scale infrastructure resources if consistently hitting limits
Optimize queries, code, or configurations for efficiency
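A common resource failure for backup jobs is a full destination disk. The sketch below checks available space before an archive is written; has_free_space is a hypothetical helper, and it assumes the filesystem name reported by df contains no spaces.

```shell
# has_free_space DIR REQUIRED_KB
# Succeeds if the filesystem holding DIR has at least REQUIRED_KB available.
has_free_space() {
  local dir="$1" required_kb="$2" avail_kb
  # df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
  # the fourth column of the data line is the available space.
  avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$required_kb" ]
}

# Example: require roughly 5 GB before archiving, and run tar at low CPU priority
# (paths are placeholders):
# has_free_space /var/backups 5242880 && nice -n 19 tar -czf /var/backups/etc.tar.gz -C / etc
```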

Configuration and Syntax Errors:

Validate all configuration syntax before applying changes
Test configurations thoroughly in non-production first
Implement automated configuration validation checks
Maintain version control for all configuration files
Keep previous working configuration for quick rollback
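The validate-then-apply pattern with a retained previous version might look like the sketch below. The helpers apply_config and rollback_config are illustrative; the validator is whatever syntax checker suits the file, for example bash -n for a shell script.

```shell
# apply_config NEW TARGET VALIDATOR [ARGS...]
# Runs VALIDATOR with NEW appended as its last argument. On success,
# keeps the current TARGET as TARGET.prev and installs NEW in its place.
apply_config() {
  local new="$1" target="$2"
  shift 2
  "$@" "$new" || { echo "validation failed: $new" >&2; return 1; }
  if [ -f "$target" ]; then
    cp -p "$target" "${target}.prev" || return 1
  fi
  cp -p "$new" "$target"
}

# rollback_config TARGET -> restores the copy saved as TARGET.prev
rollback_config() {
  cp -p "${1}.prev" "$1"
}

# Example (paths are placeholders):
# apply_config ./backup.sh.new /usr/local/bin/backup.sh bash -n
# rollback_config /usr/local/bin/backup.sh
```

Only one previous version is kept; version control remains the place for full history.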

Resources

Configuration Templates: {baseDir}/templates/disaster-recovery-planner/

Documentation and Guides: {baseDir}/docs/disaster-recovery-planner/

Example Scripts and Code: {baseDir}/examples/disaster-recovery-planner/

Troubleshooting Guide: {baseDir}/docs/disaster-recovery-planner-troubleshooting.md

Best Practices: {baseDir}/docs/disaster-recovery-planner-best-practices.md

Monitoring Setup: {baseDir}/monitoring/disaster-recovery-planner-dashboard.json
