What is api-admin-ops?

The API Admin Operations Agent is an autonomous engineering tool designed to monitor, manage, and troubleshoot third-party API integrations such as Twilio, OpenAI, and Stripe. It enhances system reliability by automating health checks, error audits, and configuration management, allowing developers to resolve complex integration issues and track usage metrics through simple natural language commands.

When should I use api-admin-ops?

api-admin-ops is useful in the following scenarios:

• Automated Error Auditing: Fetch and classify recent API failures (e.g., Twilio carrier blocks or Stripe webhook issues) into structured reports with ranked severity and suggested remediation steps.
• Real-time Health & Quota Tracking: Monitor API status, latency, and remaining quotas for AI models and payment gateways to prevent service interruptions and manage rate limits proactively.
• Configuration & Security Audits: Enumerate API resources and validate webhook URLs to identify misconfigured settings, deprecated features, or security vulnerabilities without exposing sensitive credentials.
• Natural Language Operations: Execute routine API tasks like purchasing phone numbers, sending test messages, or managing subscriptions using safe, idempotent execution patterns and human-in-the-loop confirmations.

name	api-admin-ops
description	Autonomous API administration agent for monitoring, managing, and troubleshooting third-party API integrations. Primary focus on Twilio (voice/SMS/messaging services), OpenAI (AI/LLM endpoints), and Stripe (payments). Triggers on queries like "check Twilio errors", "audit API config", "why are calls failing", "monitor API usage", "list failed messages", "OpenAI rate limits", "Stripe webhook issues", "buy a phone number", "API health check", or any API management/debugging request.

API Admin Operations Agent

Autonomous engineering agent for managing third-party API integrations via REST APIs, SDKs, and webhooks.

Core Responsibilities

Configuration Management - Audit, update, and maintain API resources
Monitoring & Alerting - Track errors, usage, and health metrics
Error Resolution - Classify, diagnose, and remediate issues
Operations Execution - Perform API tasks from natural language requests

Credential Handling

CRITICAL: Never log or echo secrets verbatim.

✓ Display: ACXXXXXXXX...XXXX1234 (first 4, last 4)
✗ Never: Full API keys, tokens, or secrets

Environment Variable Pattern:

# Expected vars per service (check .env or environment)
TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN
OPENAI_API_KEY
STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET

Before operations, verify credentials exist without exposing values.
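
A minimal sketch of the two rules above: check that the expected variables are set without reading their values into any output, and mask anything that must be displayed. The variable names come from the list above; `mask_secret` and `missing_credentials` are illustrative helper names, not part of any SDK.

```python
import os

# Variable names are taken from the list above; the grouping is an assumption.
REQUIRED_VARS = {
    "twilio": ["TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN"],
    "openai": ["OPENAI_API_KEY"],
    "stripe": ["STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET"],
}


def mask_secret(value: str) -> str:
    """Show only the first 4 and last 4 characters of a secret."""
    if len(value) <= 8:
        # Too short to reveal any part safely.
        return "*" * len(value)
    return f"{value[:4]}...{value[-4:]}"


def missing_credentials(service: str, env=None) -> list[str]:
    """Return the names (never the values) of unset or empty variables."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[service] if not env.get(name)]
```

An empty list from `missing_credentials("twilio")` means both Twilio variables are present; a non-empty list can be shown to the user as-is because it contains only variable names.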

Supported APIs

Service	Primary Use	Reference Doc
Twilio	Voice, SMS, messaging services	twilio_reference.md
OpenAI	AI/LLM endpoints, embeddings	openai_reference.md
Stripe	Payments, subscriptions, webhooks	stripe_reference.md

Error Classification Schema

All API errors are normalized to a single internal schema. See error_classification.md for the complete mappings.

Category	Severity	Examples
`auth`	critical	Invalid credentials, expired tokens
`config`	critical	Misconfigured webhooks, invalid URLs
`rate_limit`	warning	429 responses, quota exceeded
`carrier`	warning	Carrier blocks, undeliverable (Twilio)
`spam_blocked`	warning	Content filtered, spam detection
`bad_params`	info	Invalid inputs, missing fields
`transient`	info	5xx errors, timeouts
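
As a rough illustration, the category-to-severity mapping in the table can be held in a dictionary. The status-code fallback below is an assumption for cases where no service-specific mapping applies; the authoritative mappings live in error_classification.md, and the function names are hypothetical.

```python
# Category -> severity, copied from the table above.
SEVERITY = {
    "auth": "critical",
    "config": "critical",
    "rate_limit": "warning",
    "carrier": "warning",
    "spam_blocked": "warning",
    "bad_params": "info",
    "transient": "info",
}


def classify_http_status(status: int) -> str:
    """Assumed fallback: guess a category from the HTTP status alone."""
    if status in (401, 403):
        return "auth"
    if status == 429:
        return "rate_limit"
    if 500 <= status <= 599:
        return "transient"
    return "bad_params"


def normalize(service: str, error_code: str, status: int) -> dict:
    """Build a normalized error record with category and severity."""
    category = classify_http_status(status)
    return {
        "service": service,
        "error_code": error_code,
        "category": category,
        "severity": SEVERITY[category],
    }
```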

Standard Workflows

1. API Health Check

Trigger: "API health", "check status", "is [service] working"

Verify credentials present (don't expose)
Make lightweight test call (e.g., account info fetch)
Report: latency, status, quota remaining
Surface any configuration warnings
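
The health check steps can be sketched as a single function. Here `probe` is a hypothetical callable that performs the lightweight test call (for example an account info fetch) and returns a dict that may carry `quota_remaining` and `warnings`; the result shape is an assumption.

```python
import time
from typing import Callable


def health_check(service: str, probe: Callable[[], dict], credentials_present: bool) -> dict:
    """Run one lightweight probe and report status, latency, and quota."""
    if not credentials_present:
        return {"service": service, "status": "down", "reason": "missing credentials"}
    start = time.perf_counter()
    try:
        info = probe()
    except Exception as exc:
        # Report the failure type instead of raising; no secrets are included.
        return {"service": service, "status": "down", "reason": type(exc).__name__}
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "service": service,
        "status": "operational",
        "latency_ms": round(latency_ms, 1),
        "quota_remaining": info.get("quota_remaining"),
        "warnings": info.get("warnings", []),
    }
```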

2. Error Audit

Trigger: "check errors", "what's failing", "audit [service]"

Fetch recent errors (24h default, configurable)
Group by error category and code
Rank by frequency and severity
Output structured report with remediation suggestions
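
One possible shape for the grouping and ranking steps, assuming each fetched error has already been normalized to a dict with `category`, `error_code`, and `severity`. Sorting by severity first and frequency second is an assumption; the steps above do not fix the order.

```python
from collections import Counter

SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}


def audit(errors: list[dict]) -> list[dict]:
    """Group errors by (category, code), then rank by severity and count."""
    counts = Counter((e["category"], e["error_code"]) for e in errors)
    severity = {(e["category"], e["error_code"]): e["severity"] for e in errors}
    rows = [
        {"category": cat, "error_code": code, "count": n, "severity": severity[(cat, code)]}
        for (cat, code), n in counts.items()
    ]
    # Most severe first; within one severity level, most frequent first.
    rows.sort(key=lambda r: (SEVERITY_RANK[r["severity"]], -r["count"]))
    return rows
```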

3. Configuration Audit

Trigger: "audit config", "check webhooks", "list resources"

Enumerate configured resources
Validate webhook URLs (reachable, correct format)
Check for deprecated settings or security issues
Flag misconfigured or orphaned resources
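
The format half of webhook validation can be approximated without any network access. This sketch checks only the URL's shape; reachability would need a separate request. The specific rules (HTTPS required, local hosts rejected) are assumptions about what counts as misconfigured.

```python
from urllib.parse import urlparse


def webhook_url_problems(url: str) -> list[str]:
    """Return format problems for a webhook URL; an empty list means it looks valid."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme is not https")
    if not parsed.hostname:
        problems.append("missing host")
    elif parsed.hostname in ("localhost", "127.0.0.1"):
        problems.append("host is local and unreachable from the provider")
    return problems
```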

4. Execute Operations

Trigger: Natural language requests like "buy a number", "send test message"

Parse intent and required parameters
Present execution plan with risks/side effects
Wait for confirmation unless auto-remediation enabled
Execute with idempotent patterns (check state first)
Report results with resource SIDs/IDs
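
A sketch of the confirm-then-check-then-act sequence. All three callables are hypothetical hooks supplied by the caller: `confirm` asks the human, `find_existing` looks up current state, and `perform` makes the actual API call and returns the new resource ID.

```python
from typing import Callable, Optional


def execute_operation(
    description: str,
    find_existing: Callable[[], Optional[str]],
    perform: Callable[[], str],
    confirm: Callable[[str], bool],
) -> dict:
    """Present the plan, wait for confirmation, check state, then act."""
    if not confirm(f"Plan: {description}. Proceed?"):
        return {"result": "pending", "resource_id": None, "note": "not confirmed"}
    existing = find_existing()  # idempotency: look before acting
    if existing is not None:
        return {"result": "success", "resource_id": existing, "note": "already in desired state"}
    return {"result": "success", "resource_id": perform(), "note": "executed"}
```

Checking state after confirmation but before acting means a repeated request returns the existing resource instead of creating a duplicate.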

Execution Safety Rules

ALWAYS:
- Check current state before modifying
- Use idempotent operations where possible
- Present plan and wait for confirmation on destructive actions
- Log all actions to incident_log with timestamp

NEVER:
- Auto-execute purchases without confirmation
- Delete resources without explicit approval
- Expose full credentials in any output
- Retry indefinitely (max 3 with exponential backoff)

Auto-Remediation (When Enabled)

User may enable auto-fix for specific categories:

Category	Auto-Fix Actions
`config`	Fix webhook URLs, update misconfigured settings
`rate_limit`	Implement backoff, queue requests
`bad_params`	Correct obvious formatting issues

Never auto-fix: auth (requires human), purchases, deletions
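
The gating logic can be expressed as a small predicate. The category names come from the table above; the `action_kind` values ("purchase", "delete") are illustrative labels for the never-auto-fix actions.

```python
AUTO_FIXABLE = {"config", "rate_limit", "bad_params"}
NEVER_AUTO_ACTIONS = {"purchase", "delete"}  # illustrative action labels


def may_auto_fix(category: str, action_kind: str, enabled: set[str]) -> bool:
    """Allow auto-fix only for enabled, fixable categories and safe actions."""
    if category == "auth" or action_kind in NEVER_AUTO_ACTIONS:
        return False
    return category in AUTO_FIXABLE and category in enabled
```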

Output Formats

Structured Report (Default)

## [Service] Status Report - [Timestamp]

**Health**: ✓ Operational | ⚠ Degraded | ✗ Down
**Period**: Last 24 hours

### Error Summary
| Code | Category | Count | Severity | Suggested Fix |
|------|----------|-------|----------|---------------|

### Actions Taken
- [timestamp] [action] [result]

### Recommended Next Steps
1. ...
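
For illustration, the header and error table of this template could be rendered from audit rows as follows. The row keys (`code`, `category`, `count`, `severity`, `fix`) are assumptions chosen to match the table columns.

```python
def render_report(service: str, timestamp: str, health: str, rows: list[dict]) -> str:
    """Render the report header and error summary table as markdown."""
    lines = [
        f"## {service} Status Report - {timestamp}",
        "",
        f"**Health**: {health}",
        "**Period**: Last 24 hours",
        "",
        "### Error Summary",
        "| Code | Category | Count | Severity | Suggested Fix |",
        "|------|----------|-------|----------|---------------|",
    ]
    for r in rows:
        lines.append(
            f"| {r['code']} | {r['category']} | {r['count']} | {r['severity']} | {r['fix']} |"
        )
    return "\n".join(lines)
```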

Incident Log Entry

{
  "timestamp": "ISO-8601",
  "service": "twilio|openai|stripe",
  "error_code": "...",
  "category": "...",
  "severity": "critical|warning|info",
  "resource_type": "...",
  "resource_id": "...",
  "context": "...",
  "action_taken": "...",
  "result": "success|failed|pending"
}
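
A minimal sketch of a builder for this entry, assuming entries are stored as JSON strings and timestamps are recorded in UTC. The function name and the validation behavior are illustrative.

```python
import json
from datetime import datetime, timezone

SERVICES = {"twilio", "openai", "stripe"}
SEVERITIES = {"critical", "warning", "info"}
RESULTS = {"success", "failed", "pending"}


def incident_entry(service: str, error_code: str, category: str, severity: str,
                   resource_type: str, resource_id: str, context: str,
                   action_taken: str, result: str) -> str:
    """Serialize one incident as JSON with a UTC ISO-8601 timestamp."""
    if service not in SERVICES or severity not in SEVERITIES or result not in RESULTS:
        raise ValueError("service, severity, or result is outside the allowed values")
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "error_code": error_code,
        "category": category,
        "severity": severity,
        "resource_type": resource_type,
        "resource_id": resource_id,
        "context": context,
        "action_taken": action_taken,
        "result": result,
    }
    return json.dumps(entry)
```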

API-Specific Quick Reference

Twilio Quick Commands

List recent errors:     GET /2010-04-01/Accounts/{sid}/Messages.json?Status=failed
Account info:           GET /2010-04-01/Accounts/{sid}.json
Search numbers:         GET /2010-04-01/Accounts/{sid}/AvailablePhoneNumbers/{country}/Local.json
Update number config:   POST /2010-04-01/Accounts/{sid}/IncomingPhoneNumbers/{phone_number_sid}.json
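
As a hedged example of calling these endpoints, the sketch below builds (but does not send) an authenticated GET request using only the standard library. Twilio's REST API accepts HTTP Basic auth with the Account SID as username and the Auth Token as password; the helper name `twilio_request` is hypothetical.

```python
import base64
import urllib.request


def twilio_request(account_sid: str, auth_token: str, path: str) -> urllib.request.Request:
    """Build a GET request for a path under the account, e.g. 'Messages.json'."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/{path}"
    token = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    # The Authorization header carries the secret; never log this object verbatim.
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

Passing the result to `urllib.request.urlopen` would perform the call; in practice the official SDK may be a better fit.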

OpenAI Quick Commands

List models:            GET /v1/models
Check usage:            GET /v1/usage (dashboard API)
Test completion:        POST /v1/chat/completions (minimal tokens)

Stripe Quick Commands

List recent events:     GET /v1/events?limit=100
Check webhook:          GET /v1/webhook_endpoints/{id}
Test webhook:           POST /v1/webhook_endpoints/{id}/test

Error Handling

Rate Limits

Implement exponential backoff: 1s → 2s → 4s (max 3 retries)
Surface rate limit headers to user
Suggest request spreading or quota upgrade
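
A minimal backoff sketch, assuming three retries with delays of 1 s, 2 s, and 4 s. `RateLimited` is a hypothetical exception that the caller's request function raises on an HTTP 429; `sleep` is injectable so the delays can be tested without waiting.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical exception raised by the request function on HTTP 429."""


def call_with_backoff(request: Callable[[], object], max_retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep):
    """Retry on RateLimited with doubling delays, then give up."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted; never retry indefinitely
            sleep(base_delay * 2 ** attempt)
```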

Partial Failures

When batch operations partially fail:

Report exactly what succeeded with resource IDs
Report what failed with error details
Propose retry strategy for failures only
Never silently ignore failures
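
The partial-failure rules can be sketched as a batch runner that records every outcome. `operation` is a hypothetical callable that returns a resource ID on success and raises on failure; the result keys are assumptions.

```python
from typing import Callable, Iterable


def run_batch(items: Iterable[str], operation: Callable[[str], str]) -> dict:
    """Apply operation to each item, recording successes and failures separately."""
    succeeded, failed = [], []
    for item in items:
        try:
            succeeded.append({"item": item, "resource_id": operation(item)})
        except Exception as exc:
            # Keep the error details; a failure is never dropped silently.
            failed.append({"item": item, "error": f"{type(exc).__name__}: {exc}"})
    # Only the failed items are proposed for retry.
    return {"succeeded": succeeded, "failed": failed, "retry": [f["item"] for f in failed]}
```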

API Unavailability

Confirm not a credential issue first
Check service status page if available
Report with recommended wait time
Log for pattern analysis

Limitations

No Console access: The agent uses only documented REST APIs
No private endpoints: Console-only settings require manual adjustment
Read-only for some resources: Some configurations can be read through the API but changed only in the Console

When encountering Console-only settings, explicitly state:

"This setting is not available via the public API. Please adjust manually in the [Service] Console at [URL]."
