azure-openai
Azure OpenAI expert for deployments, authentication, and best practices
When & Why to Use This Skill
This Claude skill serves as a comprehensive expert for the Azure OpenAI Service, designed to streamline the deployment, management, and optimization of AI models within the Azure ecosystem. It provides actionable guidance on secure authentication (Managed Identity, Service Principals), resource configuration via Azure CLI, and advanced troubleshooting for enterprise-grade AI infrastructure.
Use Cases
- Configuring secure, production-ready authentication using Azure Managed Identities and Service Principals to eliminate hardcoded API keys.
- Automating the deployment and scaling of OpenAI models like GPT-4o using Azure CLI commands and resource group management.
- Implementing robust error handling and reliability patterns, such as exponential backoff for 429 rate limits and troubleshooting 404 deployment errors.
- Optimizing cloud expenditures through strategic model selection, token usage monitoring, and prompt engineering best practices.
Azure OpenAI Expert
You are an expert in Azure OpenAI Service configuration, deployment, and best practices.
Azure OpenAI vs OpenAI
| Aspect | OpenAI | Azure OpenAI |
|---|---|---|
| Endpoint | api.openai.com | {resource}.openai.azure.com |
| Auth | API Key | Azure Entra ID / API Key |
| Models | Model names | Deployment names |
| API | Responses API | Chat Completions API |
Authentication Methods
1. Azure CLI (Development)
# Login
az login
# Verify subscription
az account show
Configuration:
# ~/.config/codex/config.toml
azure_endpoint = "https://your-resource.openai.azure.com"
model = "your-deployment-name"
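In application code, the same az login session can be picked up through the azure-identity library. A minimal Python sketch, assuming the openai and azure-identity packages are installed; the endpoint, deployment name, and api-version are placeholders:
from azure.identity import AzureCliCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Reuse the local `az login` session as the token source (development only)
token_provider = get_bearer_token_provider(
    AzureCliCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",  # any current GA api-version
)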
2. Managed Identity (Production)
For Azure-hosted applications:
- System-assigned: Automatic, tied to resource
- User-assigned: Reusable across resources
Required role: Cognitive Services OpenAI User
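A hedged sketch of using a managed identity from Python; ManagedIdentityCredential comes from azure-identity, and the endpoint and api-version are placeholders:
from azure.identity import ManagedIdentityCredential, get_bearer_token_provider
from openai import AzureOpenAI

# System-assigned identity by default; pass client_id=... for a user-assigned identity
credential = ManagedIdentityCredential()
token_provider = get_bearer_token_provider(
    credential, "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",
)
Tokens are acquired and refreshed automatically, so no key material has to be stored with the application.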
3. Service Principal
# Create service principal
az ad sp create-for-rbac --name "codex-sp"
# Assign role
az role assignment create \
--assignee <client-id> \
--role "Cognitive Services OpenAI User" \
--scope /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{resource}
Environment variables:
AZURE_CLIENT_ID=<client-id>
AZURE_CLIENT_SECRET=<client-secret>
AZURE_TENANT_ID=<tenant-id>
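With those three variables set, azure-identity can authenticate the service principal without further code changes; a minimal Python sketch (endpoint and api-version are placeholders):
from azure.identity import EnvironmentCredential, get_bearer_token_provider
from openai import AzureOpenAI

# EnvironmentCredential reads AZURE_CLIENT_ID / AZURE_CLIENT_SECRET / AZURE_TENANT_ID
token_provider = get_bearer_token_provider(
    EnvironmentCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",
)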
4. API Key (Simple but less secure)
# Get key from Azure Portal or CLI
az cognitiveservices account keys list \
--name your-resource \
--resource-group your-rg
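A minimal sketch of key-based access with the openai Python SDK; endpoint, key, and api-version are placeholders, and the key is best supplied via an environment variable rather than a literal:
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    api_key="<key-from-the-command-above>",
    api_version="2024-06-01",
)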
Deployment Configuration
Creating a Deployment
az cognitiveservices account deployment create \
--name your-resource \
--resource-group your-rg \
--deployment-name gpt-4o \
--model-name gpt-4o \
--model-version "2024-05-13" \
--model-format OpenAI \
--sku-capacity 10 \
--sku-name Standard
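Once the deployment shows as succeeded, requests address it by deployment name rather than the underlying model name. A hedged Python sketch, reusing a client configured as in the authentication examples above:
# The value passed to --deployment-name is what goes in the model field
response = client.chat.completions.create(
    model="gpt-4o",  # deployment name, not the OpenAI model id
    messages=[{"role": "user", "content": "Say hello"}],
    max_tokens=50,
)
print(response.choices[0].message.content)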
Listing Deployments
az cognitiveservices account deployment list \
--name your-resource \
--resource-group your-rg
Rate Limiting & Quotas
Understanding TPM (Tokens Per Minute)
- Quota is measured in TPM (tokens per minute)
- Quota is granted per model and region, then allocated across your deployments
- Both input (prompt) and output (completion) tokens count toward the limit
Rate Limit Headers
x-ratelimit-limit-requests: 60
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 59
x-ratelimit-remaining-tokens: 39500
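To read these headers from the Python SDK, the raw HTTP response can be requested alongside the parsed result; a sketch assuming a client configured as above (header availability can vary by api-version and region):
# with_raw_response exposes headers in addition to the parsed completion
raw = client.chat.completions.with_raw_response.create(
    model="your-deployment-name",
    messages=[{"role": "user", "content": "ping"}],
)
print(raw.headers.get("x-ratelimit-remaining-requests"))
print(raw.headers.get("x-ratelimit-remaining-tokens"))
completion = raw.parse()  # the usual ChatCompletion object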
Handling Rate Limits
// Retry with exponential backoff (make_request() and e.is_rate_limited()
// stand in for the caller's HTTP layer)
use std::time::Duration;
use tokio::time::sleep;

let mut delay = Duration::from_millis(100);
for _attempt in 0..max_retries {
    match make_request().await {
        Ok(response) => return Ok(response),
        Err(e) if e.is_rate_limited() => {
            // 429: wait, then double the delay before the next attempt
            sleep(delay).await;
            delay *= 2;
        }
        // Other errors are not retryable; surface them immediately
        Err(e) => return Err(e),
    }
}
// If every retry was rate limited, report exhaustion to the caller
Cost Optimization
Strategies
- Use appropriate models: GPT-3.5 for simple tasks, GPT-4 for complex ones (see the sketch after this list)
- Optimize prompts: Shorter prompts = fewer tokens
- Cache responses: Reuse for identical queries
- Set max_tokens: Limit response length
- Use streaming: Better UX, same cost
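A small sketch combining the model-selection and max_tokens points above; the deployment names are placeholders and the routing rule is illustrative:
def ask(client, prompt: str, complex_task: bool = False) -> str:
    # Route simple prompts to a cheaper deployment; reserve GPT-4 for hard ones
    deployment = "gpt-4o" if complex_task else "gpt-35-turbo"
    response = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,  # cap output length to bound per-call cost
    )
    return response.choices[0].message.content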
Monitoring Costs
View usage in the Azure Portal under Cost Management + Billing > Cost Analysis, then filter by your Azure OpenAI resource.
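Per-call token usage can also be logged from the API response itself, which helps attribute spend to individual features; a minimal sketch assuming a client configured as above:
response = client.chat.completions.create(
    model="your-deployment-name",
    messages=[{"role": "user", "content": "Summarize this ticket"}],
)
usage = response.usage
print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} total={usage.total_tokens}")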
Troubleshooting
Common Errors
404 DeploymentNotFound
- Deployment name doesn't match
- Deployment not yet ready (wait 1-2 minutes after creation)
- Wrong resource endpoint
401 Unauthorized
- Token expired (re-authenticate)
- Wrong tenant
- Insufficient permissions
429 Too Many Requests
- Rate limit exceeded
- Implement backoff and retry (see the handling sketch after this error list)
- Request quota increase
400 Bad Request
- Invalid model parameters
- Token limit exceeded
- Malformed request body
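A hedged sketch of mapping these errors to actions using the openai Python SDK's exception types; the retry count and delays are illustrative:
import time
import openai

def call_with_handling(client, **kwargs):
    delay = 1.0
    for _ in range(5):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:       # 429: back off and retry
            time.sleep(delay)
            delay *= 2
        except openai.NotFoundError:        # 404: wrong deployment name or endpoint
            raise
        except openai.AuthenticationError:  # 401: expired token, wrong tenant, missing role
            raise
        except openai.BadRequestError:      # 400: invalid parameters or token limit exceeded
            raise
    raise RuntimeError("Still rate limited after retries - consider a quota increase")
The SDK also retries some failures on its own (the client's max_retries setting); the explicit handling here just makes the per-error response visible.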
Debugging Tips
# Test endpoint connectivity
curl -I https://your-resource.openai.azure.com/
# Test with Azure CLI token
az account get-access-token --resource https://cognitiveservices.azure.com
# Check deployment status
az cognitiveservices account deployment show \
--name your-resource \
--resource-group your-rg \
--deployment-name gpt-4o
Best Practices
Security
- Use Managed Identity in production
- Rotate API keys regularly
- Use network restrictions (VNet, Private Endpoints)
- Enable diagnostic logging
Reliability
- Deploy in multiple regions for DR
- Implement retry logic with backoff
- Monitor for quota exhaustion
- Set up alerts for errors
Performance
- Use streaming for better UX (see the sketch after this list)
- Batch requests where possible
- Choose appropriate model for task
- Optimize prompt length
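A brief sketch of the streaming point above, using the SDK's stream option; the deployment name is a placeholder, and the guard on empty choices accounts for Azure content-filter chunks:
stream = client.chat.completions.create(
    model="your-deployment-name",
    messages=[{"role": "user", "content": "Write a haiku about the cloud"}],
    stream=True,
)
for chunk in stream:
    # Print tokens as they arrive; some Azure chunks carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)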
Output Format
When helping with Azure OpenAI:
## Issue/Request
[What needs to be done]
## Solution
[Step-by-step instructions]
## Configuration
[Required settings/code]
## Verification
[How to confirm it works]