azure-openai
Azure OpenAI expert for deployments, authentication, and best practices
When & Why to Use This Skill
This Claude skill serves as a comprehensive expert for the Azure OpenAI Service, designed to streamline the deployment, management, and optimization of AI models within the Azure ecosystem. It provides actionable guidance on secure authentication (Managed Identity, Service Principals), resource configuration via Azure CLI, and advanced troubleshooting for enterprise-grade AI infrastructure.
Use Cases
- Configuring secure, production-ready authentication using Azure Managed Identities and Service Principals to eliminate hardcoded API keys.
- Automating the deployment and scaling of OpenAI models like GPT-4o using Azure CLI commands and resource group management.
- Implementing robust error handling and reliability patterns, such as exponential backoff for 429 rate limits and troubleshooting 404 deployment errors.
- Optimizing cloud expenditures through strategic model selection, token usage monitoring, and prompt engineering best practices.
Azure OpenAI Expert
You are an expert in Azure OpenAI Service configuration, deployment, and best practices.
Azure OpenAI vs OpenAI
| Aspect | OpenAI | Azure OpenAI |
|---|---|---|
| Endpoint | api.openai.com | {resource}.openai.azure.com |
| Auth | API Key | Azure Entra ID / API Key |
| Models | Model names | Deployment names |
| API | Responses API | Chat Completions API |
Authentication Methods
1. Azure CLI (Development)
# Login
az login
# Verify subscription
az account show
Configuration:
# ~/.config/codex/config.toml
azure_endpoint = "https://your-resource.openai.azure.com"
model = "your-deployment-name"
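In application code, the same az login session can be picked up through the azure-identity library. A minimal Python sketch, assuming the openai and azure-identity packages are installed; the endpoint, deployment name, and api-version are placeholders:
from azure.identity import AzureCliCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Reuse the local `az login` session as the token source (development only)
token_provider = get_bearer_token_provider(
    AzureCliCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",  # any current GA api-version
)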
2. Managed Identity (Production)
For Azure-hosted applications:
- System-assigned: Automatic, tied to resource
- User-assigned: Reusable across resources
Required role: Cognitive Services OpenAI User
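A hedged sketch of using a managed identity from Python; ManagedIdentityCredential comes from azure-identity, and the endpoint and api-version are placeholders:
from azure.identity import ManagedIdentityCredential, get_bearer_token_provider
from openai import AzureOpenAI

# System-assigned identity by default; pass client_id=... for a user-assigned identity
credential = ManagedIdentityCredential()
token_provider = get_bearer_token_provider(
    credential, "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",
)
Tokens are acquired and refreshed automatically, so no key material has to be stored with the application.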
3. Service Principal
# Create service principal
az ad sp create-for-rbac --name "codex-sp"
# Assign role
az role assignment create \
--assignee <client-id> \
--role "Cognitive Services OpenAI User" \
--scope /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{resource}
Environment variables:
AZURE_CLIENT_ID=<client-id>
AZURE_CLIENT_SECRET=<client-secret>
AZURE_TENANT_ID=<tenant-id>
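With those three variables set, azure-identity can authenticate the service principal without further code changes; a minimal Python sketch (endpoint and api-version are placeholders):
from azure.identity import EnvironmentCredential, get_bearer_token_provider
from openai import AzureOpenAI

# EnvironmentCredential reads AZURE_CLIENT_ID / AZURE_CLIENT_SECRET / AZURE_TENANT_ID
token_provider = get_bearer_token_provider(
    EnvironmentCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",
)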
4. API Key (Simple but less secure)
# Get key from Azure Portal or CLI
az cognitiveservices account keys list \
--name your-resource \
--resource-group your-rg
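A minimal sketch of key-based access with the openai Python SDK; endpoint, key, and api-version are placeholders, and the key is best supplied via an environment variable rather than a literal:
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    api_key="<key-from-the-command-above>",
    api_version="2024-06-01",
)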
Deployment Configuration
Creating a Deployment
az cognitiveservices account deployment create \
--name your-resource \
--resource-group your-rg \
--deployment-name gpt-4o \
--model-name gpt-4o \
--model-version "2024-05-13" \
--model-format OpenAI \
--sku-capacity 10 \
--sku-name Standard
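Once the deployment shows as succeeded, requests address it by deployment name rather than the underlying model name. A hedged Python sketch, reusing a client configured as in the authentication examples above:
# The value passed to --deployment-name is what goes in the model field
response = client.chat.completions.create(
    model="gpt-4o",  # deployment name, not the OpenAI model id
    messages=[{"role": "user", "content": "Say hello"}],
    max_tokens=50,
)
print(response.choices[0].message.content)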
Listing Deployments
az cognitiveservices account deployment list \
--name your-resource \
--resource-group your-rg
Rate Limiting & Quotas
Understanding TPM (Tokens Per Minute)
- Quota is measured in TPM (tokens per minute)
- Quota is granted per model and region, then allocated across your deployments
- Both input (prompt) and output (completion) tokens count toward the limit
Rate Limit Headers
x-ratelimit-limit-requests: 60
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 59
x-ratelimit-remaining-tokens: 39500
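To read these headers from the Python SDK, the raw HTTP response can be requested alongside the parsed result; a sketch assuming a client configured as above (header availability can vary by api-version and region):
# with_raw_response exposes headers in addition to the parsed completion
raw = client.chat.completions.with_raw_response.create(
    model="your-deployment-name",
    messages=[{"role": "user", "content": "ping"}],
)
print(raw.headers.get("x-ratelimit-remaining-requests"))
print(raw.headers.get("x-ratelimit-remaining-tokens"))
completion = raw.parse()  # the usual ChatCompletion object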
Handling Rate Limits
// Retry with exponential backoff (make_request() and e.is_rate_limited()
// stand in for the caller's HTTP layer)
use std::time::Duration;
use tokio::time::sleep;

let mut delay = Duration::from_millis(100);
for _attempt in 0..max_retries {
    match make_request().await {
        Ok(response) => return Ok(response),
        Err(e) if e.is_rate_limited() => {
            // 429: wait, then double the delay before the next attempt
            sleep(delay).await;
            delay *= 2;
        }
        // Other errors are not retryable; surface them immediately
        Err(e) => return Err(e),
    }
}
// If every retry was rate limited, report exhaustion to the caller
Cost Optimization
Strategies
- Use appropriate models: GPT-3.5 for simple tasks, GPT-4 for complex ones (see the sketch after this list)
- Optimize prompts: Shorter prompts = fewer tokens
- Cache responses: Reuse for identical queries
- Set max_tokens: Limit response length
- Use streaming: Better UX, same cost
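A small sketch combining the model-selection and max_tokens points above; the deployment names are placeholders and the routing rule is illustrative:
def ask(client, prompt: str, complex_task: bool = False) -> str:
    # Route simple prompts to a cheaper deployment; reserve GPT-4 for hard ones
    deployment = "gpt-4o" if complex_task else "gpt-35-turbo"
    response = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,  # cap output length to bound per-call cost
    )
    return response.choices[0].message.content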
Monitoring Costs
View usage in the Azure Portal under Cost Management + Billing > Cost Analysis, then filter by your Azure OpenAI resource.
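Per-call token usage can also be logged from the API response itself, which helps attribute spend to individual features; a minimal sketch assuming a client configured as above:
response = client.chat.completions.create(
    model="your-deployment-name",
    messages=[{"role": "user", "content": "Summarize this ticket"}],
)
usage = response.usage
print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} total={usage.total_tokens}")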
Troubleshooting
Common Errors
404 DeploymentNotFound
- Deployment name doesn't match
- Deployment not yet ready (wait 1-2 minutes after creation)
- Wrong resource endpoint
401 Unauthorized
- Token expired (re-authenticate)
- Wrong tenant
- Insufficient permissions
429 Too Many Requests
- Rate limit exceeded
- Implement backoff and retry (see the handling sketch after this error list)
- Request quota increase
400 Bad Request
- Invalid model parameters
- Token limit exceeded
- Malformed request body
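A hedged sketch of mapping these errors to actions using the openai Python SDK's exception types; the retry count and delays are illustrative:
import time
import openai

def call_with_handling(client, **kwargs):
    delay = 1.0
    for _ in range(5):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:       # 429: back off and retry
            time.sleep(delay)
            delay *= 2
        except openai.NotFoundError:        # 404: wrong deployment name or endpoint
            raise
        except openai.AuthenticationError:  # 401: expired token, wrong tenant, missing role
            raise
        except openai.BadRequestError:      # 400: invalid parameters or token limit exceeded
            raise
    raise RuntimeError("Still rate limited after retries - consider a quota increase")
The SDK also retries some failures on its own (the client's max_retries setting); the explicit handling here just makes the per-error response visible.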
Debugging Tips
# Test endpoint connectivity
curl -I https://your-resource.openai.azure.com/
# Test with Azure CLI token
az account get-access-token --resource https://cognitiveservices.azure.com
# Check deployment status
az cognitiveservices account deployment show \
--name your-resource \
--resource-group your-rg \
--deployment-name gpt-4o
Best Practices
Security
- Use Managed Identity in production
- Rotate API keys regularly
- Use network restrictions (VNet, Private Endpoints)
- Enable diagnostic logging
Reliability
- Deploy in multiple regions for DR
- Implement retry logic with backoff
- Monitor for quota exhaustion
- Set up alerts for errors
Performance
- Use streaming for better UX (see the sketch after this list)
- Batch requests where possible
- Choose appropriate model for task
- Optimize prompt length
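A brief sketch of the streaming point above, using the SDK's stream option; the deployment name is a placeholder, and the guard on empty choices accounts for Azure content-filter chunks:
stream = client.chat.completions.create(
    model="your-deployment-name",
    messages=[{"role": "user", "content": "Write a haiku about the cloud"}],
    stream=True,
)
for chunk in stream:
    # Print tokens as they arrive; some Azure chunks carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)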
Output Format
When helping with Azure OpenAI:
## Issue/Request
[What needs to be done]
## Solution
[Step-by-step instructions]
## Configuration
[Required settings/code]
## Verification
[How to confirm it works]