token-cost-tracking

from majiayu000

Track token usage and costs across agents for budget management

Updated Jan 5, 2026

When & Why to Use This Skill

This Claude skill describes how to monitor LLM token consumption and spend across AI agents. It covers per-call cost attribution, budget alerting, and signals for choosing cheaper models, so that teams can see where their AI spend goes and keep it within budget.

Use Cases

  • Budget Monitoring & Alerting: Establish automated thresholds for daily, session-based, or organizational spending to prevent unexpected API billing spikes.
  • Granular Cost Attribution: Assign LLM expenses to specific agents, features, or user IDs to identify high-cost areas and calculate the ROI of different AI workflows.
  • Model Selection Optimization: Analyze usage patterns to determine where expensive models can be replaced by more cost-effective alternatives like Claude 3 Haiku without sacrificing quality.
  • Usage Auditing & Reporting: Generate detailed logs of input, output, and cached tokens to facilitate accurate internal resource allocation or external client billing.
name: token-cost-tracking
description: Track token usage and costs across agents for budget management
priority: 1

Token and Cost Tracking

Track token usage and costs across agents for budget management and optimization.

Core Principle

Every organization needs to answer:

  1. How much are we spending on LLMs?
  2. Where is the spend going (features, agents, users)?
  3. Are we within budget?
  4. What's trending (up or down)?

Essential Attributes

# Per-call token tracking
span.set_attribute("llm.tokens.input", 1500)
span.set_attribute("llm.tokens.output", 350)
span.set_attribute("llm.tokens.total", 1850)

# Per-call cost
span.set_attribute("llm.cost_usd", 0.025)

# Attribution
span.set_attribute("cost.feature", "document_analysis")
span.set_attribute("cost.agent", "researcher")
span.set_attribute("cost.user_id", "user_abc")  # Hashed
span.set_attribute("cost.org_id", "org_123")
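
The `# Hashed` comment above implies the raw user ID should not be written to telemetry. A minimal sketch of one way to do that, assuming a keyed HMAC-SHA256 is acceptable for your privacy requirements; the `hash_user_id` helper, the secret, and the 16-character truncation are illustrative assumptions, not part of the original skill:

```python
import hashlib
import hmac


def hash_user_id(user_id: str, secret: bytes) -> str:
    """Return a stable, keyed hash of a user ID for use in span attributes."""
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability in dashboards (assumption: 16 hex chars is enough).
    return digest[:16]


# Placeholder secret; in practice this would come from a secret store.
hashed = hash_user_id("user_abc", secret=b"replace-with-a-real-secret")
# The result can then be attached as: span.set_attribute("cost.user_id", hashed)
```

The same input and secret always yield the same value, so per-user aggregation still works without exposing the original ID.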

Model Pricing Table

Keep pricing current. The prices below are in USD per 1M tokens as of late 2024; check each provider's pricing page before relying on them:

MODEL_PRICING = {
    # Anthropic (per 1M tokens)
    "claude-3-opus": {"input": 15.00, "output": 75.00},
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    "claude-3-5-haiku": {"input": 0.80, "output": 4.00},
    "claude-3-haiku": {"input": 0.25, "output": 1.25},

    # OpenAI (per 1M tokens)
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},

    # Embeddings (per 1M tokens)
    "text-embedding-3-large": {"input": 0.13, "output": 0},
    "text-embedding-3-small": {"input": 0.02, "output": 0},
    "text-embedding-ada-002": {"input": 0.10, "output": 0},
}
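
Provider responses often report a versioned model ID (for example `claude-3-5-sonnet-20241022`) and not the short keys used in the table, so an exact dictionary lookup can miss. The sketch below shows one possible way to resolve such IDs by longest matching prefix; `resolve_pricing` and `PRICING_SUBSET` are hypothetical names introduced here for illustration:

```python
from typing import Optional

# Small subset of the table above, repeated so the sketch runs on its own.
PRICING_SUBSET = {
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    "claude-3-5-haiku": {"input": 0.80, "output": 4.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def resolve_pricing(model_id: str, table: dict) -> Optional[dict]:
    """Find pricing for a model ID, allowing a versioned suffix after the key."""
    if model_id in table:
        return table[model_id]
    # Longest matching prefix wins, so "gpt-4o-mini-..." is not priced as "gpt-4o".
    matches = [key for key in table if model_id.startswith(key + "-")]
    if not matches:
        return None
    return table[max(matches, key=len)]


sonnet = resolve_pricing("claude-3-5-sonnet-20241022", PRICING_SUBSET)
mini = resolve_pricing("gpt-4o-mini-2024-07-18", PRICING_SUBSET)
```

Returning `None` for unknown models leaves it to the caller to decide whether to log, raise, or fall back.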

Cost Calculation

def calculate_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    cache_discount: float = 0.9  # 90% discount for cached
) -> float:
    """Calculate cost for an LLM call."""
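    pricing = MODEL_PRICING.get(model)
    if not pricing:
        # Unknown models are reported as $0; log or raise here if silent
        # under-reporting is a concern. Keys must match MODEL_PRICING exactly.
        return 0.0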

    # Cached tokens are discounted
    effective_input = input_tokens - cached_tokens
    cached_cost = (cached_tokens / 1_000_000) * pricing["input"] * (1 - cache_discount)
    input_cost = (effective_input / 1_000_000) * pricing["input"]
    output_cost = (output_tokens / 1_000_000) * pricing["output"]

    return round(input_cost + cached_cost + output_cost, 6)
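
As a worked example of the arithmetic in the function above, the sketch below prices one call step by step. It assumes the `claude-3-5-sonnet` prices from the table and the function's default 90% cache discount; actual cache pricing varies by provider, so treat the discount as a parameter and not as a fact about any particular API:

```python
# Prices in USD per 1M tokens (claude-3-5-sonnet row of the pricing table).
PRICE_INPUT = 3.00
PRICE_OUTPUT = 15.00
CACHE_DISCOUNT = 0.9  # assumed default, as in calculate_cost

input_tokens, cached_tokens, output_tokens = 1500, 1000, 350

# 500 uncached input tokens at full price.
uncached_cost = (input_tokens - cached_tokens) / 1_000_000 * PRICE_INPUT
# 1000 cached input tokens at 10% of the input price.
cached_cost = cached_tokens / 1_000_000 * PRICE_INPUT * (1 - CACHE_DISCOUNT)
# 350 output tokens at the output price.
output_cost = output_tokens / 1_000_000 * PRICE_OUTPUT

total = round(uncached_cost + cached_cost + output_cost, 6)
print(total)  # 0.00705
```

The three components are roughly $0.0015, $0.0003, and $0.00525, so output tokens dominate this call even though they are the smallest count.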

Aggregation Levels

Track at multiple granularities:

Per-Call

span.set_attribute("llm.cost_usd", 0.025)

Per-Agent Run

span.set_attribute("agent.total_tokens", 15000)
span.set_attribute("agent.total_cost_usd", 0.45)
span.set_attribute("agent.llm_calls", 5)
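
The run-level values above have to be accumulated from individual calls. A minimal sketch of such an accumulator, assuming each call's token counts and cost are already known; `RunTotals` and its methods are hypothetical names, not part of the original skill:

```python
from dataclasses import dataclass


@dataclass
class RunTotals:
    """Running totals for one agent run (or, with another prefix, one session)."""
    total_tokens: int = 0
    total_cost_usd: float = 0.0
    llm_calls: int = 0

    def add_call(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        self.total_tokens += input_tokens + output_tokens
        self.total_cost_usd += cost_usd
        self.llm_calls += 1

    def as_attributes(self, prefix: str = "agent") -> dict:
        """Return the totals keyed by the attribute names used above."""
        return {
            f"{prefix}.total_tokens": self.total_tokens,
            f"{prefix}.total_cost_usd": round(self.total_cost_usd, 6),
            f"{prefix}.llm_calls": self.llm_calls,
        }


totals = RunTotals()
totals.add_call(input_tokens=1500, output_tokens=350, cost_usd=0.00975)
totals.add_call(input_tokens=2000, output_tokens=500, cost_usd=0.0135)
attributes = totals.as_attributes()
```

Each key/value pair in `attributes` can be passed to `span.set_attribute` when the run finishes.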

Per-Session

span.set_attribute("session.total_tokens", 45000)
span.set_attribute("session.total_cost_usd", 1.35)
span.set_attribute("session.agent_runs", 3)

Per-User (Daily/Monthly)

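Per-user totals span many traces, so compute them in your observability platform by aggregating the per-call cost attributes; do not set them on individual spans.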
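
If the platform can export per-call records, the daily rollup is a simple group-by. The sketch below assumes each exported record is a dict with `user_id`, `timestamp` (Unix seconds), and `cost_usd` fields; that record shape and the `daily_cost_by_user` name are assumptions for illustration:

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_cost_by_user(records: list) -> dict:
    """Sum cost per (user_id, UTC date) from exported per-call records."""
    totals = defaultdict(float)
    for rec in records:
        day = datetime.fromtimestamp(rec["timestamp"], tz=timezone.utc).date().isoformat()
        totals[(rec["user_id"], day)] += rec["cost_usd"]
    return dict(totals)


# 1704067200 is 2024-01-01 00:00:00 UTC.
records = [
    {"user_id": "u1", "timestamp": 1704067200, "cost_usd": 0.40},
    {"user_id": "u1", "timestamp": 1704070800, "cost_usd": 0.10},
    {"user_id": "u1", "timestamp": 1704153600, "cost_usd": 0.25},
    {"user_id": "u2", "timestamp": 1704067200, "cost_usd": 0.05},
]
daily = daily_cost_by_user(records)
```

Bucketing by UTC date is a design choice; a billing system tied to a local calendar day would need a different time zone.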

Budget Alerting

Set up alerts for:

BUDGET_THRESHOLDS = {
    "per_call_max": 1.00,      # Alert if single call > $1
    "per_session_max": 10.00,  # Alert if session > $10
    "per_user_daily": 50.00,   # Alert if user > $50/day
    "org_daily": 1000.00,      # Alert if org > $1000/day
}

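def check_budget(cost: float, level: str, entity_id: str) -> None:
    """Alert when cost exceeds the threshold for level (e.g. "per_call", "org_daily")."""
    # Some keys end in "_max" and others do not, so try both forms.
    threshold = BUDGET_THRESHOLDS.get(f"{level}_max", BUDGET_THRESHOLDS.get(level))
    if threshold is not None and cost > threshold:
        # log_budget_alert is assumed to be provided by your alerting layer.
        log_budget_alert(level, entity_id, cost, threshold)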
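
`log_budget_alert` is not defined in this skill. One possible shape for it, sketched with the standard `logging` module; the message format and the returned string are assumptions, and a real deployment might page someone or post to a chat channel instead:

```python
import logging

logger = logging.getLogger("budget")


def log_budget_alert(level: str, entity_id: str, cost: float, threshold: float) -> str:
    """Log a warning describing a budget overrun and return the message."""
    overage_pct = (cost - threshold) / threshold * 100
    message = (
        f"budget exceeded: level={level} entity={entity_id} "
        f"cost=${cost:.2f} threshold=${threshold:.2f} over_by={overage_pct:.0f}%"
    )
    logger.warning(message)
    return message


msg = log_budget_alert("per_session", "sess_1", cost=12.50, threshold=10.00)
```

Returning the message makes the helper easy to test and lets callers forward the same text to other channels.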

Framework Integration

Langfuse

from langfuse import Langfuse

langfuse = Langfuse()

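# Automatic token/cost tracking
# (this uses the trace/generation style of the Langfuse Python SDK v2;
# newer SDK versions may expose a different interface)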
trace = langfuse.trace(name="agent_run")
generation = trace.generation(
    name="llm_call",
    model="claude-3-5-sonnet",
    usage={
        "input": 1500,
        "output": 350,
        "unit": "TOKENS"
    }
)
# Langfuse derives cost from the model name and usage when the model
# matches one of its model definitions

LangSmith

from langsmith import Client

client = Client()
# Token tracking is automatic via callbacks once tracing is enabled
# Cost is calculated and shown in the LangSmith dashboard

OpenTelemetry

from opentelemetry import trace

tracer = trace.get_tracer("agent")

with tracer.start_as_current_span("llm_call") as span:
    span.set_attribute("llm.tokens.input", 1500)
    span.set_attribute("llm.tokens.output", 350)
    span.set_attribute("llm.cost_usd", calculate_cost(...))
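
To avoid repeating the `set_attribute` calls at every call site, they can be wrapped in a helper. The sketch below relies only on the span having a `set_attribute(key, value)` method, which OpenTelemetry spans provide; `record_llm_usage` and the `RecordingSpan` stand-in are hypothetical names used so the example runs without a tracing backend:

```python
class RecordingSpan:
    """Stand-in for a tracing span; stores attributes in a dict for inspection."""

    def __init__(self) -> None:
        self.attributes = {}

    def set_attribute(self, key: str, value) -> None:
        self.attributes[key] = value


def record_llm_usage(span, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
    """Write the per-call token and cost attributes onto a span."""
    span.set_attribute("llm.tokens.input", input_tokens)
    span.set_attribute("llm.tokens.output", output_tokens)
    span.set_attribute("llm.tokens.total", input_tokens + output_tokens)
    span.set_attribute("llm.cost_usd", cost_usd)


span = RecordingSpan()
record_llm_usage(span, input_tokens=1500, output_tokens=350, cost_usd=0.00975)
```

Inside a `with tracer.start_as_current_span(...)` block, the real span can be passed in place of `RecordingSpan`.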

Caching Impact

Track cache hits for accurate costs:

span.set_attribute("llm.cache.hit", True)
span.set_attribute("llm.cache.tokens_saved", 1200)
span.set_attribute("llm.cache.cost_saved_usd", 0.018)
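
The saved amount is the difference between what the cached tokens would have cost at full price and what they cost at the discounted rate. A minimal sketch, assuming the same 90% discount used as the default in `calculate_cost`; the `cache_savings` helper is a hypothetical name:

```python
def cache_savings(
    cached_tokens: int,
    input_price_per_million: float,
    cache_discount: float = 0.9,  # assumed discount; varies by provider
) -> dict:
    """Return cache attributes for a call, keyed by the names used above."""
    full_price = cached_tokens / 1_000_000 * input_price_per_million
    return {
        "llm.cache.hit": cached_tokens > 0,
        "llm.cache.tokens_saved": cached_tokens,
        "llm.cache.cost_saved_usd": round(full_price * cache_discount, 6),
    }


# 1200 cached tokens at $15.00 per 1M input tokens (claude-3-opus in the table).
savings = cache_savings(cached_tokens=1200, input_price_per_million=15.00)
```

With these inputs the cached tokens would have cost $0.018 at full price, and 90% of that is counted as saved.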

Optimization Signals

Track metrics that indicate optimization opportunities:

# Prompt efficiency
span.set_attribute("prompt.compression_ratio", 0.7)
span.set_attribute("prompt.could_use_haiku", True)

# Model selection
span.set_attribute("model.recommendation", "could_downgrade")
span.set_attribute("model.quality_requirement", "low")
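
A downgrade signal is easier to act on when it carries an estimated saving. The sketch below re-prices a call's tokens on a cheaper model and reports the difference; it assumes the cheaper model would use the same number of tokens, which is an approximation, and `downgrade_savings` is a hypothetical helper:

```python
def downgrade_savings(
    input_tokens: int,
    output_tokens: int,
    current_pricing: dict,
    candidate_pricing: dict,
) -> float:
    """Estimate USD saved per call by moving to a cheaper model."""

    def cost(pricing: dict) -> float:
        return (
            input_tokens / 1_000_000 * pricing["input"]
            + output_tokens / 1_000_000 * pricing["output"]
        )

    return round(cost(current_pricing) - cost(candidate_pricing), 6)


# Prices per 1M tokens, taken from the pricing table (late 2024).
sonnet = {"input": 3.00, "output": 15.00}
haiku = {"input": 0.80, "output": 4.00}
saved = downgrade_savings(1500, 350, current_pricing=sonnet, candidate_pricing=haiku)
```

Whether the cheaper model meets the quality requirement is a separate question that this estimate does not answer.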

Anti-Patterns

  • Not tracking tokens (cost blindness)
  • Missing cost attribution (can't optimize)
  • Hardcoded pricing (becomes stale)
  • No budget alerts (surprise bills)
  • Tracking at the wrong granularity (for example, only org-level totals, which hide the agent or feature driving spend)
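
One way to address the hardcoded-pricing anti-pattern is to keep prices in a versioned config loaded at startup. The sketch below parses such a config from a JSON string; the schema, the `effective_date` field with its placeholder value, and the `load_pricing` name are all assumptions for illustration:

```python
import json

# In practice this would live in a file or config service, not in the code.
PRICING_JSON = """
{
  "effective_date": "2024-12-01",
  "models": {
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60}
  }
}
"""


def load_pricing(raw: str) -> dict:
    """Parse a pricing config and reject negative prices."""
    config = json.loads(raw)
    for model, prices in config["models"].items():
        for kind in ("input", "output"):
            if prices[kind] < 0:
                raise ValueError(f"negative {kind} price for {model}")
    return config


pricing = load_pricing(PRICING_JSON)
```

Recording an effective date next to the prices makes it possible to tell which price list a historical cost figure was computed with.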

Related Skills

  • llm-call-tracing - LLM instrumentation
  • session-conversation-tracking - Session aggregation