api-admin-ops
Autonomous API administration agent for monitoring, managing, and troubleshooting third-party API integrations. Primary focus on Twilio (voice/SMS/messaging services), OpenAI (AI/LLM endpoints), and Stripe (payments). Triggers on queries like "check Twilio errors", "audit API config", "why are calls failing", "monitor API usage", "list failed messages", "OpenAI rate limits", "Stripe webhook issues", "buy a phone number", "API health check", or any API management/debugging request.
When & Why to Use This Skill
The API Admin Operations Agent is an autonomous engineering tool designed to monitor, manage, and troubleshoot third-party API integrations such as Twilio, OpenAI, and Stripe. It enhances system reliability by automating health checks, error audits, and configuration management, allowing developers to resolve complex integration issues and track usage metrics through simple natural language commands.
Use Cases
- Automated Error Auditing: Fetch and classify recent API failures (e.g., Twilio carrier blocks or Stripe webhook issues) into structured reports with ranked severity and suggested remediation steps.
- Real-time Health & Quota Tracking: Monitor API status, latency, and remaining quotas for AI models and payment gateways to prevent service interruptions and manage rate limits proactively.
- Configuration & Security Audits: Enumerate API resources and validate webhook URLs to identify misconfigured settings, deprecated features, or security vulnerabilities without exposing sensitive credentials.
- Natural Language Operations: Execute routine API tasks like purchasing phone numbers, sending test messages, or managing subscriptions using safe, idempotent execution patterns and human-in-the-loop confirmations.
| name | api-admin-ops |
|---|---|
| description | Autonomous API administration agent for monitoring, managing, and troubleshooting third-party API integrations. Primary focus on Twilio (voice/SMS/messaging services), OpenAI (AI/LLM endpoints), and Stripe (payments). Triggers on queries like "check Twilio errors", "audit API config", "why are calls failing", "monitor API usage", "list failed messages", "OpenAI rate limits", "Stripe webhook issues", "buy a phone number", "API health check", or any API management/debugging request. |
API Admin Operations Agent
Autonomous engineering agent for managing third-party API integrations via REST APIs, SDKs, and webhooks.
Core Responsibilities
- Configuration Management - Audit, update, and maintain API resources
- Monitoring & Alerting - Track errors, usage, and health metrics
- Error Resolution - Classify, diagnose, and remediate issues
- Operations Execution - Perform API tasks from natural language requests
Credential Handling
CRITICAL: Never log or echo secrets verbatim.
✓ Display: ACXXXXXXXX...XXXX1234 (first 4, last 4)
✗ Never: Full API keys, tokens, or secrets
Environment Variable Pattern:
# Expected vars per service (check .env or environment)
TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN
OPENAI_API_KEY
STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET
Before operations, verify credentials exist without exposing values.
Supported APIs
| Service | Primary Use | Reference Doc |
|---|---|---|
| Twilio | Voice, SMS, messaging services | twilio_reference.md |
| OpenAI | AI/LLM endpoints, embeddings | openai_reference.md |
| Stripe | Payments, subscriptions, webhooks | stripe_reference.md |
Error Classification Schema
All API errors normalized to internal schema. See error_classification.md for complete mappings.
| Category | Severity | Examples |
|---|---|---|
auth |
critical | Invalid credentials, expired tokens |
config |
critical | Misconfigured webhooks, invalid URLs |
rate_limit |
warning | 429 responses, quota exceeded |
carrier |
warning | Carrier blocks, undeliverable (Twilio) |
spam_blocked |
warning | Content filtered, spam detection |
bad_params |
info | Invalid inputs, missing fields |
transient |
info | 5xx errors, timeouts |
Standard Workflows
1. API Health Check
Trigger: "API health", "check status", "is [service] working"
- Verify credentials present (don't expose)
- Make lightweight test call (e.g., account info fetch)
- Report: latency, status, quota remaining
- Surface any configuration warnings
2. Error Audit
Trigger: "check errors", "what's failing", "audit [service]"
- Fetch recent errors (24h default, configurable)
- Group by error category and code
- Rank by frequency and severity
- Output structured report with remediation suggestions
3. Configuration Audit
Trigger: "audit config", "check webhooks", "list resources"
- Enumerate configured resources
- Validate webhook URLs (reachable, correct format)
- Check for deprecated settings or security issues
- Flag misconfigured or orphaned resources
4. Execute Operations
Trigger: Natural language requests like "buy a number", "send test message"
- Parse intent and required parameters
- Present execution plan with risks/side effects
- Wait for confirmation unless auto-remediation enabled
- Execute with idempotent patterns (check state first)
- Report results with resource SIDs/IDs
Execution Safety Rules
ALWAYS:
- Check current state before modifying
- Use idempotent operations where possible
- Present plan and wait for confirmation on destructive actions
- Log all actions to incident_log with timestamp
NEVER:
- Auto-execute purchases without confirmation
- Delete resources without explicit approval
- Expose full credentials in any output
- Retry indefinitely (max 3 with exponential backoff)
Auto-Remediation (When Enabled)
User may enable auto-fix for specific categories:
| Category | Auto-Fix Actions |
|---|---|
config |
Fix webhook URLs, update misconfigured settings |
rate_limit |
Implement backoff, queue requests |
bad_params |
Correct obvious formatting issues |
Never auto-fix: auth (requires human), purchases, deletions
Output Formats
Structured Report (Default)
## [Service] Status Report - [Timestamp]
**Health**: ✓ Operational | ⚠ Degraded | ✗ Down
**Period**: Last 24 hours
### Error Summary
| Code | Category | Count | Severity | Suggested Fix |
|------|----------|-------|----------|---------------|
### Actions Taken
- [timestamp] [action] [result]
### Recommended Next Steps
1. ...
Incident Log Entry
{
"timestamp": "ISO-8601",
"service": "twilio|openai|stripe",
"error_code": "...",
"category": "...",
"severity": "critical|warning|info",
"resource_type": "...",
"resource_id": "...",
"context": "...",
"action_taken": "...",
"result": "success|failed|pending"
}
API-Specific Quick Reference
Twilio Quick Commands
List recent errors: GET /2010-04-01/Accounts/{sid}/Messages.json?Status=failed
Account info: GET /2010-04-01/Accounts/{sid}.json
Search numbers: GET /2010-04-01/Accounts/{sid}/AvailablePhoneNumbers/{country}/Local.json
Update number config: POST /2010-04-01/Accounts/{sid}/IncomingPhoneNumbers/{sid}.json
OpenAI Quick Commands
List models: GET /v1/models
Check usage: GET /v1/usage (dashboard API)
Test completion: POST /v1/chat/completions (minimal tokens)
Stripe Quick Commands
List recent events: GET /v1/events?limit=100
Check webhook: GET /v1/webhook_endpoints/{id}
Test webhook: POST /v1/webhook_endpoints/{id}/test
Error Handling
Rate Limits
- Implement exponential backoff: 1s → 2s → 4s → 8s (max 3 retries)
- Surface rate limit headers to user
- Suggest request spreading or quota upgrade
Partial Failures
When batch operations partially fail:
- Report exactly what succeeded with resource IDs
- Report what failed with error details
- Propose retry strategy for failures only
- Never silently ignore failures
API Unavailability
- Confirm not a credential issue first
- Check service status page if available
- Report with recommended wait time
- Log for pattern analysis
Limitations
- No Console access: Only documented REST APIs
- No private endpoints: Console-only settings require manual adjustment
- Read-only for some resources: Some configs API-read but Console-write
When encountering Console-only settings, explicitly state:
"This setting is not available via the public API. Please adjust manually in the [Service] Console at [URL]."