langsmith-debugger

Debug and analyze {{PROJECT_NAME}} LangGraph agent traces. Use when investigating agent behavior patterns, finding failures, analyzing latency, or understanding why Orchestrator/Analyst responses went wrong. Covers trace queries by agent tags, pattern analysis across runs, and common debugging scenarios.

When & Why to Use This Skill

The LangSmith Debugger skill is a specialized tool designed for developers to monitor, debug, and optimize LangGraph-based AI agents. It provides deep visibility into agent traces, allowing users to investigate execution patterns, identify specific node failures (like Orchestrator or Analyst errors), and perform detailed latency and token usage analysis to ensure production-grade reliability.

Use Cases

  • Root Cause Analysis: Investigating why an agent provided an incorrect response by examining the exact inputs, outputs, and tool calls of specific nodes like the 'analyst_node'.
  • Performance Optimization: Identifying system bottlenecks by calculating average latency across different agent phases and filtering for runs that exceed specific time thresholds.
  • Error Monitoring: Automatically listing and analyzing failed runs in production environments using specific tags and filters to improve agent success rates.
  • Cost Management: Analyzing token consumption patterns across different agent components to identify high-usage prompts and optimize LLM expenses.
  • Behavioral Pattern Analysis: Comparing success rates and execution paths between the Orchestrator and Analyst nodes to refine agent planning and intent parsing logic.

LangSmith Debugger for {{PROJECT_NAME}}

⚡ Quick Reference ({{PROJECT_NAME}} Specific)

USE THIS FIRST - These are the exact IDs for this project:

| Item | Value |
|---|---|
| API Endpoint | https://api.smith.langchain.com |
| Workspace ID | 81b6468e-dacf-403c-8cd6-b9b672b12836 |
| Org ID | ca825949-d89f-475c-8a5d-3df627044cbe |
| Project: Production | 092619fa-b4af-4543-8253-2903027dd7c5 ({{PROJECT_PREFIX}}-production) |
| Project: Default | c1c001cc-425d-4019-b1d6-5688bb1d2d1a |
| Project: Legacy | aecb1488-c9b9-4863-b1e9-6ad2fe72a357 ({{PROJECT_NAME}}) |
| API Key Location | backend/.env → LANGSMITH_API_KEY |

Quick Start

  • Run the bundled script with credentials passed inline:
    LANGSMITH_API_KEY=... LANGSMITH_TENANT_ID=... LANGSMITH_PROJECT_ID=... \
        python .claude/skills/langsmith-debugger/scripts/query_langsmith_runs.py
  • Add LANGSMITH_FILTER to narrow runs (examples in references/production-traces.md).

Bundled Resources

References

  • references/production-traces.md: production env vars + trace query examples.

Scripts

  • scripts/query_langsmith_runs.py: list recent runs via LANGSMITH_* env vars.
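The script itself isn't reproduced here, but a minimal sketch of the behavior described above, assuming it follows the same REST pattern as the Quick Start script at the end of this document (the bundled script may differ):

# Hypothetical sketch of scripts/query_langsmith_runs.py -- the real
# script may differ; it reads the same LANGSMITH_* variables.
import os
import requests

headers = {
    'x-api-key': os.environ['LANGSMITH_API_KEY'],
    'X-Tenant-ID': os.environ['LANGSMITH_TENANT_ID'],
    'Content-Type': 'application/json'
}
body = {'session': [os.environ['LANGSMITH_PROJECT_ID']], 'limit': 25}
if os.environ.get('LANGSMITH_FILTER'):
    body['filter'] = os.environ['LANGSMITH_FILTER']  # optional narrowing

resp = requests.post('https://api.smith.langchain.com/runs/query',
                     headers=headers, json=body)
resp.raise_for_status()
for run in resp.json().get('runs', []):
    print(f"{run.get('name')} | {run.get('status')}")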

⚠️ IMPORTANT: Org-Scoped API Key

The API key is org-scoped and requires the X-Tenant-ID header for all requests:

headers = {
    'x-api-key': API_KEY,
    'X-Tenant-ID': '81b6468e-dacf-403c-8cd6-b9b672b12836',  # Required!
    'Content-Type': 'application/json'
}

Without this header, you'll get: 403 Forbidden: This API key is org-scoped and requires workspace specification
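A quick way to confirm whether a key is org-scoped (a sketch; the exact 403 body text may vary):

import os
import requests

API_KEY = os.environ['LANGSMITH_API_KEY']
PROJECT_ID = '092619fa-b4af-4543-8253-2903027dd7c5'  # {{PROJECT_PREFIX}}-production

# Deliberately omit X-Tenant-ID to reproduce the org-scoped failure mode
resp = requests.post('https://api.smith.langchain.com/runs/query',
                     headers={'x-api-key': API_KEY},
                     json={'session': [PROJECT_ID], 'limit': 1})
if resp.status_code == 403:
    print('Org-scoped key: add the X-Tenant-ID header from the Quick Reference')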

Environment Variables (Railway + Local)

# In backend/.env (local) or Railway dashboard (production)
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_PROJECT={{PROJECT_PREFIX}}-production
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=lsv2_sk_...  # Get from backend/.env or Railway
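For ad-hoc local scripts, these can be loaded from backend/.env with python-dotenv (an assumption; any env loader works):

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv('backend/.env')  # silently no-ops if the file is absent (e.g. on Railway)
api_key = os.environ['LANGSMITH_API_KEY']
project = os.environ.get('LANGSMITH_PROJECT', '{{PROJECT_PREFIX}}-production')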

Agent Architecture

memory_hydrate_node → orchestrator_node → analyst_node → response
| Node | Tags | Purpose |
|---|---|---|
| memory_hydrate_node | agent:memory, phase:hydration | Load context from ChromaDB |
| orchestrator_node | agent:orchestrator, phase:planning | Parse intent, identify knowledge gaps |
| analyst_node | agent:analyst, phase:analysis | Gather evidence, generate options |
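The phase tags make a per-node latency breakdown straightforward. A sketch, using the client configured in SDK Setup below:

from datetime import datetime, timedelta
from statistics import mean

# Average latency per pipeline phase over the last 24 hours
for phase in ("phase:hydration", "phase:planning", "phase:analysis"):
    runs = client.list_runs(
        project_name=project_name,
        filter=f'has(tags, "{phase}")',
        start_time=datetime.now() - timedelta(hours=24)
    )
    latencies = [r.latency_ms for r in runs if r.latency_ms]
    if latencies:
        print(f"{phase}: avg {mean(latencies):.0f}ms across {len(latencies)} runs")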

SDK Setup

from langsmith import Client
import os

client = Client(
    api_url=os.environ["LANGSMITH_ENDPOINT"],
    api_key=os.environ["LANGSMITH_API_KEY"]
)
project_name = os.environ["LANGSMITH_PROJECT"]
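A quick smoke test for the connection (with an org-scoped key the SDK may still 403; see Known Issues and fall back to the REST pattern in the Quick Start script):

# Fetch a single run to confirm credentials and project name resolve
first = next(iter(client.list_runs(project_name=project_name, limit=1)), None)
print("Latest run:", first.name if first else "none found")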

Common Queries

List Recent Runs by Agent

from datetime import datetime, timedelta

# All Analyst runs from the last 24 hours
runs = client.list_runs(
    project_name=project_name,
    filter='has(tags, "agent:analyst")',
    start_time=datetime.now() - timedelta(hours=24)
)
for run in runs:
    print(f"{run.name} | {run.status} | {run.latency_ms}ms")

Find Failed Runs

# Failed runs by agent
runs = client.list_runs(
    project_name=project_name,
    filter='and(has(tags, "agent:analyst"), eq(status, "error"))',
    limit=20
)

Latency Analysis

# Slow Analyst runs (>3 seconds)
runs = client.list_runs(
    project_name=project_name,
    filter='and(has(tags, "agent:analyst"), gt(latency, 3000))',
    limit=50
)

# Calculate average latency per agent
from statistics import mean
analyst_latencies = [r.latency_ms for r in runs if r.latency_ms]
print(f"Avg Analyst latency: {mean(analyst_latencies):.0f}ms")

Get Run Details

# Full trace for a specific run
run = client.read_run(run_id="<run-id-here>")

# What went in
print("INPUT:", run.inputs)

# What came out
print("OUTPUT:", run.outputs)

# Error details (if failed)
print("ERROR:", run.error)

Compare Orchestrator vs Analyst Patterns

from collections import Counter

# Get recent runs for both agents
orch_runs = list(client.list_runs(
    project_name=project_name,
    filter='has(tags, "agent:orchestrator")',
    limit=100
))
analyst_runs = list(client.list_runs(
    project_name=project_name,
    filter='has(tags, "agent:analyst")',
    limit=100
))

# Status breakdown and success rates (Counter returns 0 for missing statuses;
# max(..., 1) avoids division by zero on an empty sample)
orch_statuses = Counter(r.status for r in orch_runs)
analyst_statuses = Counter(r.status for r in analyst_runs)
orch_success = orch_statuses["success"] / max(len(orch_runs), 1)
analyst_success = analyst_statuses["success"] / max(len(analyst_runs), 1)

print(f"Orchestrator success: {orch_success:.1%}")
print(f"Analyst success: {analyst_success:.1%}")

Debugging Scenarios

"Why did Analyst give a bad recommendation?"

# 1. Find the run
run = client.read_run(run_id="<problem-run-id>")

# 2. Check what Orchestrator sent
print("Knowledge gaps sent to Analyst:")
print(run.inputs.get("knowledge_gaps"))

# 3. Check what tools Analyst called
print("Evidence gathered:")
print(run.outputs.get("evidence"))

# 4. Check the draft options
print("Options generated:")
print(run.outputs.get("draft_options"))

"Why is the system slow?"

# Get parent trace with all child spans
runs = client.list_runs(
    project_name=project_name,
    filter='eq(is_root, true)',
    limit=10
)

for run in runs:
    # Get child runs (each agent node)
    children = client.list_runs(
        project_name=project_name,
        filter=f'eq(parent_run_id, "{run.id}")'
    )
    print(f"\n--- Run {run.id[:8]} (total: {run.latency_ms}ms) ---")
    for child in children:
        print(f"  {child.name}: {child.latency_ms}ms")

"What queries confuse Orchestrator?"

# Find low-confidence or clarification-needed runs
runs = client.list_runs(
    project_name=project_name,
    filter='has(tags, "agent:orchestrator")',
    limit=100
)

confused = []
for run in runs:
    outputs = run.outputs or {}
    if outputs.get("needs_clarification") or outputs.get("confidence") == "LOW":
        confused.append({
            "query": run.inputs.get("query"),
            "intent": outputs.get("user_intent"),
            "gaps": outputs.get("knowledge_gaps")
        })

print(f"Found {len(confused)} confusing queries")
for c in confused[:5]:
    print(f"  Query: {c['query'][:50]}...")

Filter Syntax Reference

| Filter | Example |
|---|---|
| By tag | has(tags, "agent:analyst") |
| By status | eq(status, "error") |
| By latency | gt(latency, 3000) |
| Combined | and(has(tags, "agent:analyst"), eq(status, "error")) |
| Root traces only | eq(is_root, true) |
| By parent | eq(parent_run_id, "xxx") |
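Combined clauses get unwieldy quickly; a small helper keeps them readable (hypothetical, not part of the langsmith SDK):

# Hypothetical helper for composing filter strings -- not an SDK function
def and_filter(*clauses: str) -> str:
    return clauses[0] if len(clauses) == 1 else f'and({", ".join(clauses)})'

# and_filter('has(tags, "agent:analyst")', 'eq(status, "error")')
# -> 'and(has(tags, "agent:analyst"), eq(status, "error"))'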

Token Usage Analysis

# Find high token usage runs
runs = client.list_runs(
    project_name=project_name,
    filter='has(tags, "agent:analyst")',
    limit=50
)

for run in runs:
    if run.total_tokens and run.total_tokens > 5000:
        print(f"{run.name}: {run.total_tokens} tokens")
        print(f"  Prompt: {run.prompt_tokens}, Completion: {run.completion_tokens}")

🚀 Quick Start: Full Analysis Script

Copy-paste this to run a complete trace analysis:

import requests
from datetime import datetime, timedelta, timezone

# {{PROJECT_NAME}} specific config
API_KEY = 'lsv2_sk_...'  # Get from backend/.env
TENANT_ID = '81b6468e-dacf-403c-8cd6-b9b672b12836'
PROJECT_ID = '092619fa-b4af-4543-8253-2903027dd7c5'  # {{PROJECT_PREFIX}}-production

headers = {
    'x-api-key': API_KEY,
    'X-Tenant-ID': TENANT_ID,
    'Content-Type': 'application/json'
}

# Query runs
body = {'session': [PROJECT_ID], 'limit': 100}
resp = requests.post('https://api.smith.langchain.com/runs/query', headers=headers, json=body)

if resp.ok:
    runs = resp.json().get('runs', [])
    print(f'Total runs: {len(runs)}')

    # Status breakdown
    success = sum(1 for r in runs if r.get('status') == 'success')
    errors = sum(1 for r in runs if r.get('status') == 'error')
    print(f'✅ Success: {success} | ❌ Errors: {errors}')

    # Show errors
    for r in runs:
        if r.get('status') == 'error':
            print(f"  ❌ {r.get('name')}: {r.get('error', 'No details')[:100]}")

    # Token usage
    total_tokens = sum(r.get('total_tokens', 0) or 0 for r in runs)
    print(f'Total tokens: {total_tokens:,}')
else:
    print(f'Error: {resp.status_code} - {resp.text}')

🐛 Known Issues & Gotchas

| Issue | Solution |
|---|---|
| 403 Forbidden: org-scoped key | Add the X-Tenant-ID header (see Quick Reference) |
| 404: sessions not found | POST to /runs/query with session: [PROJECT_ID], not the project name |
| ModuleNotFoundError: langsmith | Run pip install langsmith in the backend venv |
| No latency data | Runs may lack total_time; calculate from start_time/end_time |
| SDK list_runs fails | Use the direct REST API with headers (the SDK doesn't handle tenant IDs well) |

📅 Last Updated

  • Date: 2025-12-28
  • By: SAMA 2.0 (AI/ML Engineer)
  • Added: Workspace ID, Project IDs, X-Tenant-ID requirement, Quick Start script