langfuse-cli
This skill should be used when the user asks to "query Langfuse traces", "show sessions", "check LLM costs", "analyse token usage", "view observations", "get scores", "create score", "query metrics", or mentions Langfuse, traces, or LLM observability. Also triggers on requests to analyse API latency, debug LLM calls, or investigate model performance, on prompt management tasks such as "list prompts", "create prompt", or "deploy prompt to production", and on dataset management tasks such as "create dataset", "add dataset item", or "manage evaluation datasets".
When & Why to Use This Skill
The Langfuse CLI skill provides a comprehensive interface to the Langfuse LLM observability platform, enabling developers to monitor, debug, and optimise their AI applications. It supports deep-dive analysis of LLM traces, token usage, and operational costs, and offers robust tools for prompt versioning and evaluation-dataset management, both essential for maintaining high-performance AI agents.
Use Cases
- Performance Debugging: Analyze API latency and investigate model performance by retrieving detailed traces and observations for specific LLM calls.
- Cost & Token Optimization: Query aggregated metrics to monitor total LLM costs and token consumption across different models, versions, or environments.
- Prompt Engineering Workflow: Manage the full lifecycle of prompts, including creating text/chat templates, versioning, and deploying specific versions to production with labels.
- Evaluation & Benchmarking: Create and manage evaluation datasets and scores to track the accuracy and quality of AI responses over time.
Langfuse CLI (lf)
Command-line interface for the Langfuse LLM observability platform. Query traces, sessions, observations, scores, and metrics. Manage prompts with versioning and labels. Manage datasets for evaluation.
Quick Reference
lf traces list [OPTIONS] # List traces with filters
lf traces get <ID> # Get specific trace (--with-observations)
lf sessions list [OPTIONS] # List sessions
lf sessions show <ID> # Show session details (--with-traces)
lf observations list [OPTIONS] # List observations (spans/generations/events)
lf observations get <ID> # Get specific observation
lf scores list [OPTIONS] # List scores
lf scores get <ID> # Get specific score
lf scores create [OPTIONS] # Create a new score
lf metrics query [OPTIONS] # Query aggregated metrics
lf prompts list [OPTIONS] # List prompts
lf prompts get <NAME> # Get prompt (by label or version)
lf prompts create-text # Create text prompt (-m for commit message)
lf prompts create-chat # Create chat prompt (-m for commit message)
lf prompts label <NAME> <VER> # Set labels on prompt version
lf prompts delete <NAME> # Delete prompt
lf datasets list [OPTIONS] # List datasets
lf datasets get <NAME> # Get dataset by name
lf datasets create <NAME> # Create a new dataset
lf datasets items [OPTIONS] # List dataset items (--dataset to filter)
lf datasets item-get <ID> # Get dataset item by ID
lf datasets item-create # Create dataset item (--dataset, --input required)
lf datasets runs <DATASET> # List runs for a dataset
lf datasets run-get <DS> <RUN> # Get a specific run
Common Tasks
View Recent Traces
lf traces list --limit 20
Filter Traces by Time
# Today's traces
lf traces list --from "$(date -u +%Y-%m-%dT00:00:00Z)"
# Last 24 hours (BSD/macOS date syntax; see the GNU equivalent below)
lf traces list --from "$(date -u -v-1d +%Y-%m-%dT%H:%M:%SZ)"
# Specific date range
lf traces list --from 2024-01-15T00:00:00Z --to 2024-01-16T00:00:00Z
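The `-v-1d` flag above is BSD/macOS `date` syntax; GNU `date` on Linux uses `-d` instead. A sketch of a portable way to compute the last-24-hours timestamp:

```shell
# Compute an ISO-8601 UTC timestamp for 24 hours ago, on either GNU or BSD date
if date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ >/dev/null 2>&1; then
  FROM="$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)"   # GNU date (Linux)
else
  FROM="$(date -u -v-1d +%Y-%m-%dT%H:%M:%SZ)"               # BSD date (macOS)
fi
echo "$FROM"
```

Then pass it along: `lf traces list --from "$FROM"`.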
Filter by User or Session
lf traces list --user-id user123
lf traces list --session-id sess456
lf traces list --name "chat-completion"
lf traces list --tags production --tags v2
Analyse Costs
# Total cost over time
lf metrics query --view traces --measure total-cost --aggregation sum --granularity day
# Cost by model
lf metrics query --view observations --measure total-cost --aggregation sum --dimensions model
# Average cost per trace
lf metrics query --view traces --measure total-cost --aggregation avg
Analyse Latency
# P95 latency
lf metrics query --view traces --measure latency --aggregation p95
# Latency by trace name
lf metrics query --view traces --measure latency --aggregation avg --dimensions traceName
# Latency trends
lf metrics query --view traces --measure latency --aggregation p50 --granularity hour
Token Usage
# Total tokens
lf metrics query --view observations --measure total-tokens --aggregation sum
# Tokens by model
lf metrics query --view observations --measure total-tokens --aggregation sum --dimensions model
# Input vs output tokens
lf metrics query --view observations --measure input-tokens --aggregation sum
lf metrics query --view observations --measure output-tokens --aggregation sum
Investigate Specific Trace
# Get trace details
lf traces get tr-abc123
# Get trace with all observations included
lf traces get tr-abc123 --with-observations
# See all observations in a trace
lf observations list --trace-id tr-abc123
# List scores filtered by name
lf scores list --name accuracy
Create Scores
# Score a trace
lf scores create --name accuracy --value 0.95 --trace-id tr-abc123
# Score an observation with comment
lf scores create --name relevance --value 0.8 \
--observation-id obs-xyz789 --comment "Good but could be more specific"
# Categorical score
lf scores create --name sentiment --value 1 \
--data-type CATEGORICAL --trace-id tr-abc123
# Boolean score
lf scores create --name approved --value 1 \
--data-type BOOLEAN --trace-id tr-abc123
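To score many traces at once, the single-trace command above can be wrapped in a loop. A sketch, shown as a dry run that only prints each command (remove the `echo` to execute; the inline trace IDs stand in for a file of IDs, one per line):

```shell
# Batch-score traces listed one per line; `echo` makes this a dry run
cmds="$(while IFS= read -r trace_id; do
  echo lf scores create --name accuracy --value 1 --trace-id "$trace_id"
done <<'EOF'
tr-abc123
tr-def456
EOF
)"
printf '%s\n' "$cmds"
```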
Manage Prompts
# List all prompts
lf prompts list
# Filter by label or tag
lf prompts list --label production
lf prompts list --tag summarisation
# Get production version of a prompt
lf prompts get my-prompt
# Get specific version or label
lf prompts get my-prompt --version 3
lf prompts get my-prompt --label staging
# Get raw content (for piping)
lf prompts get my-prompt --raw > prompt.txt
Create and Update Prompts
# Create text prompt from file
lf prompts create-text --name my-prompt -f prompt.txt
# Create with commit message documenting the change
lf prompts create-text --name my-prompt -f prompt.txt \
-m "Add context about user preferences"
# Create from stdin
echo "You are a helpful assistant." | lf prompts create-text --name my-prompt
# Create with labels and config
lf prompts create-text --name my-prompt -f prompt.txt \
--labels production --tags summarisation \
--config '{"model": "gpt-4", "temperature": 0.7}'
# Create chat prompt from JSON
lf prompts create-chat --name chat-prompt -f messages.json
# Label a version as production
lf prompts label my-prompt 5 --labels production
# Delete a prompt
lf prompts delete old-prompt
lf prompts delete my-prompt --version 2
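The `messages.json` file passed to `create-chat` above is assumed (the exact schema is documented in `references/cli-reference.md`) to be a JSON array of role/content message objects, with `{{variable}}` placeholders for prompt variables:

```json
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Summarise the following text: {{text}}"}
]
```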
Manage Datasets
Datasets store input/output pairs for evaluation. Items can be created manually or from existing traces.
# List all datasets
lf datasets list
# Create a dataset
lf datasets create my-eval-dataset -d "Test cases for summarisation"
# Create with metadata
lf datasets create my-eval-dataset \
-d "Test cases for summarisation" \
-m '{"version": "1.0", "owner": "team-ml"}'
# Get dataset details
lf datasets get my-eval-dataset
Add Dataset Items
# Create item with input and expected output
lf datasets item-create --dataset my-eval-dataset \
--input '{"text": "Long article content..."}' \
--expected-output '{"summary": "Brief summary..."}'
# Create item from existing trace
lf datasets item-create --dataset my-eval-dataset \
--input '{"prompt": "Summarise this"}' \
--source-trace-id tr-abc123
# Create item with metadata
lf datasets item-create --dataset my-eval-dataset \
--input '{"text": "Content"}' \
--expected-output '{"result": "Expected"}' \
--metadata '{"category": "short-form", "difficulty": "easy"}'
# List items in a dataset
lf datasets items --dataset my-eval-dataset
# Get specific item
lf datasets item-get item-abc123
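For bulk imports, the `item-create` calls above can be generated from a table of test cases. A minimal sketch in Python (the row shape and field names are illustrative assumptions; in practice the rows might come from `csv.DictReader` over a test-case file):

```python
import json
import shlex

def build_commands(rows, dataset):
    """Turn (input, expected) pairs into `lf datasets item-create` command strings."""
    cmds = []
    for row in rows:
        cmds.append(
            "lf datasets item-create"
            f" --dataset {shlex.quote(dataset)}"
            f" --input {shlex.quote(json.dumps({'text': row['input']}))}"
            f" --expected-output {shlex.quote(json.dumps({'result': row['expected']}))}"
        )
    return cmds

# Example rows standing in for a real test-case file
rows = [
    {"input": "Long article content...", "expected": "Brief summary..."},
    {"input": "Another article...", "expected": "Another summary..."},
]
for cmd in build_commands(rows, "my-eval-dataset"):
    print(cmd)  # pipe the printed commands to `sh` to execute them
```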
View Dataset Runs
Runs represent evaluation executions against a dataset.
# List runs for a dataset
lf datasets runs my-eval-dataset
# Get details of a specific run
lf datasets run-get my-eval-dataset run-2024-01-15
Output Formats
All list and query commands support output format selection:
lf traces list --format table # Default, human-readable
lf traces list --format json # Machine-readable, full details
lf traces list --format csv # Spreadsheet-compatible
lf traces list --format markdown # Documentation-friendly
Save to file:
lf traces list --format json --output traces.json
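The JSON export can then be post-processed with standard tools. A sketch that totals cost per trace name, assuming each exported trace carries `name` and `totalCost` fields (this field layout is an assumption, not a documented schema):

```python
import json
from collections import defaultdict

def cost_by_name(traces):
    """Sum totalCost per trace name; missing or null costs count as zero."""
    totals = defaultdict(float)
    for trace in traces:
        totals[trace.get("name", "unknown")] += trace.get("totalCost") or 0.0
    return dict(totals)

# Sample shaped like a hypothetical traces.json export
sample = [
    {"name": "chat-completion", "totalCost": 0.012},
    {"name": "chat-completion", "totalCost": 0.008},
    {"name": "embedding", "totalCost": None},
]
print(cost_by_name(sample))
# For a real export: totals = cost_by_name(json.load(open("traces.json")))
```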
Configuration
The CLI uses profile-based configuration. Credentials resolve in order:
1. CLI arguments (`--public-key`, `--secret-key`, `--host`)
2. Environment variables (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, `LANGFUSE_HOST`)
3. Config file profile (`~/.config/langfuse/config.yml`)
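For CI jobs or one-off scripts, the environment-variable option is often the simplest. A sketch (the key values are placeholders; the host shown is the Langfuse cloud endpoint):

```shell
# Point the CLI at a Langfuse project via environment variables
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"
```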
Setup Profile
lf config setup
Use Specific Profile
lf traces list --profile production
Metrics Query Deep Dive
The metrics command provides aggregated analytics:
Required parameters:
- `--view`: `traces` or `observations`
- `--measure`: what to measure
- `--aggregation`: how to aggregate

Measures:
- `count` - number of items
- `latency` - duration in milliseconds
- `input-tokens`, `output-tokens`, `total-tokens` - token counts
- `input-cost`, `output-cost`, `total-cost` - cost in USD

Aggregations:
- `count` - total count
- `sum` - total sum
- `avg` - average
- `p50`, `p95`, `p99` - percentiles
- `histogram` - distribution buckets

Dimensions (group by):
- `traceName`, `model`, `environment`, `version`, `release`

Granularity (time bucketing):
- `auto`, `minute`, `hour`, `day`, `week`, `month`
Additional Resources
For complete CLI documentation including all options:
- `references/cli-reference.md` - full command reference with all flags