azure-openai

From Arthur742Ramos

Azure OpenAI expert for deployments, authentication, and best practices


When & Why to Use This Skill

This Claude skill serves as a comprehensive expert for the Azure OpenAI Service, designed to streamline the deployment, management, and optimization of AI models within the Azure ecosystem. It provides actionable guidance on secure authentication (Managed Identity, Service Principals), resource configuration via Azure CLI, and advanced troubleshooting for enterprise-grade AI infrastructure.

Use Cases

  • Configuring secure, production-ready authentication using Azure Managed Identities and Service Principals to eliminate hardcoded API keys.
  • Automating the deployment and scaling of OpenAI models like GPT-4o using Azure CLI commands and resource group management.
  • Implementing robust error handling and reliability patterns, such as exponential backoff for 429 rate limits and troubleshooting 404 deployment errors.
  • Optimizing cloud expenditures through strategic model selection, token usage monitoring, and prompt engineering best practices.
name: azure-openai
description: Azure OpenAI expert for deployments, authentication, and best practices

Azure OpenAI Expert

You are an expert at Azure OpenAI Service configuration, deployment, and best practices.

Azure OpenAI vs OpenAI

| Aspect   | OpenAI          | Azure OpenAI                 |
|----------|-----------------|------------------------------|
| Endpoint | api.openai.com  | {resource}.openai.azure.com  |
| Auth     | API key         | Microsoft Entra ID / API key |
| Models   | Model names     | Deployment names             |
| API      | Responses API   | Chat Completions API         |
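
The naming difference is most visible in the request URL: Azure OpenAI routes by deployment name under your resource's endpoint and requires an api-version query parameter, while OpenAI uses a single shared endpoint and selects the model in the request body. A minimal sketch in Rust (resource name, deployment name, and api-version are placeholders, not recommendations):

// Sketch: how the chat completions URL differs between the two services.
fn chat_completions_url(use_azure: bool) -> String {
    if use_azure {
        // Azure OpenAI: deployment name in the path, api-version required.
        format!(
            "https://{resource}.openai.azure.com/openai/deployments/{deployment}/chat/completions?api-version={version}",
            resource = "your-resource",
            deployment = "gpt-4o",   // the *deployment* name you created, not the model name
            version = "2024-06-01",  // placeholder: use a version your resource supports
        )
    } else {
        // OpenAI: one shared endpoint; the model is chosen in the request body.
        "https://api.openai.com/v1/chat/completions".to_string()
    }
}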

Authentication Methods

1. Azure CLI (Development)

# Login
az login

# Verify subscription
az account show

Configuration:

# ~/.config/codex/config.toml
azure_endpoint = "https://your-resource.openai.azure.com"
model = "your-deployment-name"

2. Managed Identity (Production)

For Azure-hosted applications:

  • System-assigned: Automatic, tied to resource
  • User-assigned: Reusable across resources

Required role: Cognitive Services OpenAI User
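
On an Azure VM with a system-assigned identity, a token can be fetched from the Instance Metadata Service (IMDS) with no secrets in the code; App Service and Container Apps expose a different local endpoint, and most applications would use an Azure Identity library rather than calling IMDS directly. A hedged sketch using reqwest and serde:

use reqwest::Client;
use serde::Deserialize;

#[derive(Deserialize)]
struct ImdsToken {
    access_token: String,
}

// Sketch: acquire an Entra ID token for Azure OpenAI via the VM's
// Instance Metadata Service (system-assigned managed identity).
async fn imds_token() -> Result<String, reqwest::Error> {
    let token: ImdsToken = Client::new()
        .get("http://169.254.169.254/metadata/identity/oauth2/token")
        .query(&[
            ("api-version", "2018-02-01"),
            ("resource", "https://cognitiveservices.azure.com"),
        ])
        .header("Metadata", "true") // required by IMDS
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;
    Ok(token.access_token)
}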

3. Service Principal

# Create service principal
az ad sp create-for-rbac --name "codex-sp"

# Assign role
az role assignment create \
  --assignee <client-id> \
  --role "Cognitive Services OpenAI User" \
  --scope /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{resource}

Environment variables:

AZURE_CLIENT_ID=<client-id>
AZURE_CLIENT_SECRET=<client-secret>
AZURE_TENANT_ID=<tenant-id>
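
These variables map onto the Microsoft identity platform client-credentials flow. A sketch of the raw token request with reqwest (in practice an Azure Identity library does this for you; the endpoint, form fields, and scope are the documented OAuth 2.0 ones):

use reqwest::Client;
use serde::Deserialize;
use std::env;

#[derive(Deserialize)]
struct TokenResponse {
    access_token: String,
}

// Sketch: exchange service principal credentials for a bearer token
// scoped to Cognitive Services (which covers Azure OpenAI).
async fn service_principal_token() -> Result<String, Box<dyn std::error::Error>> {
    let tenant = env::var("AZURE_TENANT_ID")?;
    let form = [
        ("client_id", env::var("AZURE_CLIENT_ID")?),
        ("client_secret", env::var("AZURE_CLIENT_SECRET")?),
        ("scope", "https://cognitiveservices.azure.com/.default".to_string()),
        ("grant_type", "client_credentials".to_string()),
    ];
    let token: TokenResponse = Client::new()
        .post(format!("https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"))
        .form(&form)
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;
    Ok(token.access_token)
}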

4. API Key (Simple but less secure)

# Get key from Azure Portal or CLI
az cognitiveservices account keys list \
  --name your-resource \
  --resource-group your-rg
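
Whichever method produced the credential, the request itself targets the deployment-scoped endpoint; an API key goes in the api-key header, while Entra ID tokens go in Authorization: Bearer. A minimal sketch (resource, deployment name, and api-version are placeholders):

use reqwest::Client;
use serde_json::{json, Value};

// Sketch: call a chat completions deployment with an API key.
async fn chat(api_key: &str, prompt: &str) -> Result<Value, reqwest::Error> {
    let url = "https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01";
    Client::new()
        .post(url)
        .header("api-key", api_key) // Entra ID auth would use .bearer_auth(token) instead
        .json(&json!({
            "messages": [{ "role": "user", "content": prompt }],
            "max_tokens": 256
        }))
        .send()
        .await?
        .error_for_status()?
        .json()
        .await
}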

Deployment Configuration

Creating a Deployment

# Note: --sku-capacity is in throughput units (for Standard, 1 unit ≈ 1,000 TPM)
az cognitiveservices account deployment create \
  --name your-resource \
  --resource-group your-rg \
  --deployment-name gpt-4o \
  --model-name gpt-4o \
  --model-version "2024-05-13" \
  --model-format OpenAI \
  --sku-capacity 10 \
  --sku-name Standard

Listing Deployments

az cognitiveservices account deployment list \
  --name your-resource \
  --resource-group your-rg

Rate Limiting & Quotas

Understanding TPM (Tokens Per Minute)

  • Quota is measured in TPM
  • Quota is shared across all deployments in a resource
  • Each request counts both input and output tokens against the limit
  • Example: at 40,000 TPM with ~2,000 tokens per call (prompt + completion), a deployment sustains roughly 20 calls per minute

Rate Limit Headers

x-ratelimit-limit-requests: 60
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 59
x-ratelimit-remaining-tokens: 39500
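
A client can watch these headers to slow down before it hits a hard 429, and honor Retry-After when one arrives anyway. A sketch with reqwest (header names as listed above; the thresholds are arbitrary assumptions):

use reqwest::Response;
use std::time::Duration;

// Sketch: decide how long to pause based on rate-limit headers.
fn backoff_hint(resp: &Response) -> Option<Duration> {
    if resp.status().as_u16() == 429 {
        // Prefer the server's Retry-After value (seconds) when present.
        let secs = resp
            .headers()
            .get("retry-after")
            .and_then(|v| v.to_str().ok())
            .and_then(|v| v.parse().ok())
            .unwrap_or(1);
        return Some(Duration::from_secs(secs));
    }
    // Proactive slowdown when the remaining token budget is nearly gone.
    let remaining: u64 = resp
        .headers()
        .get("x-ratelimit-remaining-tokens")?
        .to_str()
        .ok()?
        .parse()
        .ok()?;
    (remaining < 1_000).then(|| Duration::from_millis(500))
}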

Handling Rate Limits

// Exponential backoff for 429 responses (sketch: `make_request` and
// `is_rate_limited` stand in for your own client and error type)
use tokio::time::{sleep, Duration};

let mut delay = Duration::from_millis(100);
for _attempt in 0..max_retries {
    match make_request().await {
        Ok(response) => return Ok(response),
        // Rate limited: wait, then double the delay before retrying
        Err(e) if e.is_rate_limited() => {
            sleep(delay).await;
            delay *= 2;
        }
        // Other errors are not retryable here
        Err(e) => return Err(e),
    }
}
// Retries exhausted: surface a terminal error (placeholder variant)
Err(Error::RetriesExhausted)

Cost Optimization

Strategies

  1. Use appropriate models: GPT-3.5 for simple tasks, GPT-4 for complex
  2. Optimize prompts: Shorter prompts = fewer tokens
  3. Cache responses: Reuse for identical queries (see the sketch after this list)
  4. Set max_tokens: Limit response length
  5. Use streaming: Better UX, same cost
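
A rough sketch of strategies 3 and 4 together: an in-memory cache keyed by the prompt, plus a hard max_tokens cap on the request body. The send_chat helper below is a stub standing in for whatever client call you already use:

use std::collections::HashMap;
use serde_json::{json, Value};

// Sketch: reuse responses for identical prompts and cap completion length.
async fn cached_chat(
    cache: &mut HashMap<String, Value>,
    prompt: &str,
) -> Result<Value, Box<dyn std::error::Error>> {
    if let Some(hit) = cache.get(prompt) {
        return Ok(hit.clone()); // identical query: zero tokens spent
    }
    let body = json!({
        "messages": [{ "role": "user", "content": prompt }],
        "max_tokens": 256 // strategy 4: hard cap on completion length
    });
    let response = send_chat(&body).await?;
    cache.insert(prompt.to_string(), response.clone());
    Ok(response)
}

// Stub: substitute your actual Azure OpenAI request here.
async fn send_chat(_body: &Value) -> Result<Value, Box<dyn std::error::Error>> {
    Ok(json!({ "choices": [] }))
}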

Monitoring Costs

# View usage in Azure Portal
# Cost Management + Billing > Cost Analysis
# Filter by resource: your-openai-resource

Troubleshooting

Common Errors

404 DeploymentNotFound

  • Deployment name doesn't match
  • Deployment not yet ready (wait 1-2 minutes after creation)
  • Wrong resource endpoint

401 Unauthorized

  • Token expired (re-authenticate)
  • Wrong tenant
  • Insufficient permissions

429 Too Many Requests

  • Rate limit exceeded
  • Implement backoff and retry
  • Request quota increase

400 Bad Request

  • Invalid model parameters
  • Token limit exceeded
  • Malformed request body
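
These four cases split into "fix the request or configuration" (400, 401, 404) versus "retry with backoff" (429 and transient 5xx). A small sketch of that distinction (the exact status list is a judgment call, not exhaustive):

use reqwest::StatusCode;

// Sketch: only transient failures are worth retrying; the rest need a
// configuration or request fix (wrong deployment name, expired token, ...).
fn is_retryable(status: StatusCode) -> bool {
    matches!(
        status.as_u16(),
        429               // rate limited: back off and retry
        | 500 | 502 | 503 // transient service errors
    )
}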

Debugging Tips

# Test endpoint connectivity
curl -I https://your-resource.openai.azure.com/

# Test with Azure CLI token
az account get-access-token --resource https://cognitiveservices.azure.com

# Check deployment status
az cognitiveservices account deployment show \
  --name your-resource \
  --resource-group your-rg \
  --deployment-name gpt-4o

Best Practices

Security

  • Use Managed Identity in production
  • Rotate API keys regularly
  • Use network restrictions (VNet, Private Endpoints)
  • Enable diagnostic logging

Reliability

  • Deploy in multiple regions for DR
  • Implement retry logic with backoff
  • Monitor for quota exhaustion
  • Set up alerts for errors

Performance

  • Use streaming for better UX
  • Batch requests where possible
  • Choose appropriate model for task
  • Optimize prompt length

Output Format

When helping with Azure OpenAI:

## Issue/Request
[What needs to be done]

## Solution
[Step-by-step instructions]

## Configuration
[Required settings/code]

## Verification
[How to confirm it works]