# langfuse-integration

Replaces Phoenix observability with Langfuse Cloud (EU) traceability for pharmaceutical test generation. Adds @observe decorators to existing code, configures LlamaIndex callbacks, propagates GAMP-5 compliance attributes, and removes Phoenix dependencies. Use PROACTIVELY when implementing Task 2.3 (LangFuse setup), migrating observability systems, or ensuring ALCOA+ trace attribution. MUST BE USED for pharmaceutical compliance monitoring requiring persistent cloud storage.
## When & Why to Use This Skill
This Claude skill automates the transition from Phoenix to Langfuse Cloud (EU) to provide persistent, pharmaceutical-grade observability. It handles the complete migration lifecycle, including dependency management, code instrumentation with @observe decorators, and the configuration of GAMP-5 compliant metadata for robust LLM traceability and regulatory audit trails.
### Use Cases
- Migrating from local, ephemeral Phoenix observability to Langfuse Cloud for persistent, production-ready monitoring and analytics.
- Ensuring GAMP-5 and ALCOA+ compliance in pharmaceutical AI workflows by automating the attribution of metadata and user/session tags.
- Instrumenting LlamaIndex-based agentic workflows with Langfuse callback handlers and decorators to capture detailed execution traces.
- Streamlining the removal of legacy observability dependencies while maintaining parity in span counts and workflow visibility.
| name | langfuse-integration |
|---|---|
| description | Replaces Phoenix observability with Langfuse Cloud (EU) traceability for pharmaceutical test generation. Adds @observe decorators to existing code, configures LlamaIndex callbacks, propagates GAMP-5 compliance attributes, and removes Phoenix dependencies. Use PROACTIVELY when implementing Task 2.3 (LangFuse setup), migrating observability systems, or ensuring ALCOA+ trace attribution. MUST BE USED for pharmaceutical compliance monitoring requiring persistent cloud storage. |
| allowed-tools | ["Bash", "Read", "Write", "Edit", "Grep", "Glob", "LS"] |
## Langfuse Integration Skill

**Purpose**: Replace Phoenix observability with Langfuse Cloud (EU) for pharmaceutical-grade traceability and monitoring.

**Target Architecture**:
- **From**: Phoenix (local-only, ephemeral traces)
- **To**: Langfuse Cloud EU (persistent storage, analytics, GAMP-5 compliant)
- **Strategy**: Complete replacement (no dual observability)
## When to Use This Skill
✅ **Use when**:
- Implementing PRP Task 2.3 (LangFuse Integration and Dashboard)
- Migrating from Phoenix to production observability
- Adding traceability to new pharmaceutical workflows
- Ensuring ALCOA+ attributable traces for regulatory compliance
- Preparing for AWS production deployment
❌ **Do NOT use when**:
- Extracting existing traces from Langfuse (use the `langfuse-extraction` skill)
- Automating dashboard interactions (use the `langfuse-dashboard` skill)
- Phoenix is required for local development (conflicts with the replacement strategy)
## Prerequisites
Before invoking this skill, verify:
1. **Langfuse Cloud (EU) Account**:
   - Project URL: https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107
   - API keys available (public + secret)
   - EU data residency confirmed

2. **Environment Variables**:

   ```bash
   export LANGFUSE_PUBLIC_KEY="pk-lf-..."
   export LANGFUSE_SECRET_KEY="sk-lf-..."
   export LANGFUSE_HOST="https://cloud.langfuse.com"
   ```

3. **Dependencies**:
   - `langfuse` Python package (will be installed if missing)
   - `llama-index-core>=0.12.0` (for callback handler)
   - Existing Phoenix instrumentation code identified
## Workflow Phases
### Phase 1: Assessment and Analysis (5-10 minutes)

**Objective**: Understand the current Phoenix instrumentation and identify migration points.

**Steps**:
1. **Locate Phoenix Configuration**:

   ```bash
   # Search for Phoenix setup
   grep -r "phoenix" main/src/monitoring/ --include="*.py"
   grep -r "from phoenix" main/src/ --include="*.py"
   grep -r "import phoenix" main/src/ --include="*.py"
   ```

2. **Identify Instrumentation Points**:
   - Read `main/src/core/unified_workflow.py` - identify workflow entry points
   - Read `main/src/agents/` - identify agent methods needing tracing
   - Look for existing OpenTelemetry span creation
   - Document all files importing Phoenix

3. **Analyze Compliance Attributes**:
   - Check whether GAMP-5 attributes are set (category, confidence)
   - Check whether ALCOA+ attributes are set (user_id, session_id, timestamps)
   - Verify 21 CFR Part 11 metadata if applicable

4. **Generate Assessment Report**:

   ```markdown
   # Phoenix → Langfuse Migration Assessment

   ## Current Phoenix Instrumentation
   - Configuration file: <path>
   - Instrumented files: <count>
   - Span count per workflow: <number>
   - Compliance attributes: <present/missing>

   ## Migration Scope
   - Files requiring decorator addition: <list>
   - Phoenix imports to remove: <count>
   - Callback handlers to replace: <list>
   - Estimated migration time: <minutes>

   ## Risk Assessment
   - Breaking changes: <yes/no>
   - Test coverage: <percentage>
   - Rollback complexity: <low/medium/high>
   ```
**Quality Gate**: Assessment report generated with a complete file inventory and attribute analysis.
### Phase 2: Langfuse Configuration Setup (10-15 minutes)

**Objective**: Create the Langfuse configuration module and verify cloud connectivity.

**Steps**:
1. **Install Langfuse SDK**:

   ```bash
   # Add to pyproject.toml
   uv add langfuse
   # For LlamaIndex integration
   uv add llama-index-instrumentation-langfuse
   ```

2. **Create Langfuse Configuration Module**:
   - File: `main/src/monitoring/langfuse_config.py`
   - Content: see `reference/decorator-patterns.md` for the template
   - Key functions:
     - `setup_langfuse()`: initialize client with EU cloud config
     - `get_langfuse_client()`: singleton accessor
     - `get_langfuse_callback_handler()`: LlamaIndex integration
     - `add_compliance_attributes()`: GAMP-5/ALCOA+ attribute helper
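A minimal sketch of what this module could look like. The function bodies are illustrative assumptions, not the canonical template; the lazy `Langfuse` import is a deliberate choice so the module stays loadable in environments without the SDK installed:

```python
# Sketch of main/src/monitoring/langfuse_config.py (illustrative, not the canonical template)
import os
from functools import lru_cache


def setup_langfuse():
    """Validate configuration, then return the singleton client. Fails loudly (no fallbacks)."""
    missing = [k for k in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY") if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing Langfuse environment variables: {missing}")
    return get_langfuse_client()


@lru_cache(maxsize=1)
def get_langfuse_client():
    """Singleton accessor for the Langfuse client."""
    from langfuse import Langfuse  # lazy import keeps this module loadable without the SDK

    return Langfuse(
        public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
        secret_key=os.environ["LANGFUSE_SECRET_KEY"],
        host=os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com"),
    )


def get_langfuse_callback_handler():
    """LlamaIndex integration: a callback handler bound to the same credentials."""
    from langfuse.llama_index import LlamaIndexCallbackHandler

    return LlamaIndexCallbackHandler(
        public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
        secret_key=os.environ["LANGFUSE_SECRET_KEY"],
        host=os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com"),
    )


def add_compliance_attributes(metadata=None):
    """Merge GAMP-5/ALCOA+ defaults into span metadata."""
    defaults = {
        "compliance.gamp5.applicable": True,
        "compliance.alcoa_plus.attributable": True,
    }
    return {**defaults, **(metadata or {})}
```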
3. **Verify Cloud Connectivity**:

   ```python
   # Test script (temporary)
   from main.src.monitoring.langfuse_config import setup_langfuse

   client = setup_langfuse()
   client.trace(name="connectivity-test", input={"test": True})
   client.flush()

   # Verify the trace appears at:
   # https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107/traces
   ```

4. **Update Environment Configuration**:
   - Add Langfuse environment variables to `.env.example`
   - Update `main/src/config.py` to load Langfuse settings
   - Add Langfuse to the `ObservabilityConfig` dataclass
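The `ObservabilityConfig` extension might look like the following sketch (field and property names beyond the three environment variables are illustrative assumptions about `main/src/config.py`):

```python
# Illustrative sketch of extending ObservabilityConfig in main/src/config.py
import os
from dataclasses import dataclass, field


@dataclass
class ObservabilityConfig:
    # Langfuse Cloud (EU) settings, read from the environment at instantiation time
    langfuse_public_key: str = field(default_factory=lambda: os.getenv("LANGFUSE_PUBLIC_KEY", ""))
    langfuse_secret_key: str = field(default_factory=lambda: os.getenv("LANGFUSE_SECRET_KEY", ""))
    langfuse_host: str = field(default_factory=lambda: os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com"))

    @property
    def langfuse_enabled(self) -> bool:
        # Tracing is active only when both keys are present
        return bool(self.langfuse_public_key and self.langfuse_secret_key)
```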
**Quality Gate**:
- ✅ Langfuse SDK installed
- ✅ `langfuse_config.py` created and tested
- ✅ Connectivity test trace visible in the Langfuse Cloud dashboard
- ✅ Configuration variables documented
### Phase 3: Code Instrumentation (20-30 minutes)

**Objective**: Add `@observe` decorators and replace Phoenix callbacks with Langfuse.

**Steps**:
1. **Add Decorators to Workflow Entry Points**:

   Use the automated script for systematic instrumentation:

   ```bash
   python .claude/skills/langfuse-integration/scripts/add_instrumentation.py \
     --target main/src/core/unified_workflow.py \
     --dry-run  # Preview changes first
   ```

   Manual pattern (if the script is unavailable):

   ```python
   # main/src/core/unified_workflow.py
   from langfuse import observe

   class UnifiedWorkflow(Workflow):
       @observe(name="unified-workflow-run", as_type="span")
       async def run(self, ctx: Context, ev: StartEvent) -> StopEvent:
           # Existing code unchanged
           ...
   ```

2. **Instrument Agent Methods**:

   Target key agent operations:

   ```python
   # main/src/agents/categorizer.py
   from langfuse import observe, get_current_observation

   @observe(name="gamp5-categorization", as_type="span")
   async def categorize_urs(self, urs_content: str) -> dict:
       # Add compliance attributes
       obs = get_current_observation()
       if obs:
           obs.update(metadata={
               "compliance.gamp5.applicable": True,
               "compliance.alcoa_plus.attributable": True
           })

       # Existing categorization logic
       result = await self._categorize(urs_content)

       # Tag with the determined category
       if obs:
           obs.update(metadata={
               "compliance.gamp5.category": result["category"]
           })

       return result
   ```

3. **Replace LlamaIndex Callback Handler**:

   ```python
   # main/src/core/unified_workflow.py or main/main.py

   # OLD (Phoenix):
   # from phoenix.otel import register
   # tracer_provider = register()

   # NEW (Langfuse):
   from langfuse.llama_index import LlamaIndexCallbackHandler

   langfuse_handler = LlamaIndexCallbackHandler(
       public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
       secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
       host=os.getenv("LANGFUSE_HOST")
   )

   # Register with the workflow
   workflow = UnifiedWorkflow(
       callbacks=[langfuse_handler],
       timeout=600
   )
   ```

4. **Propagate User/Session Attributes**:

   ```python
   # In the API endpoint or workflow entry point
   from langfuse import observe, get_current_trace

   @observe()
   async def generate_test_suite(user_id: str, urs_file: str, job_id: str):
       # Set trace-level attributes
       trace = get_current_trace()
       if trace:
           trace.update(
               user_id=user_id,
               session_id=job_id,
               tags=["pharmaceutical", "gamp5"],
               metadata={
                   "compliance.alcoa_plus.attributable": True,
                   "user.clerk_id": user_id,
                   "job.id": job_id
               }
           )

       # All nested operations inherit these attributes
       result = await unified_workflow.run(urs_file)
       return result
   ```

5. **Verify Decorator Coverage**:

   ```bash
   # Check that all instrumentation points have decorators
   grep -r "@observe" main/src/ --include="*.py" | wc -l
   # Compare to the Phoenix span count (should match or exceed)
   ```
**Quality Gate**:
- ✅ `@observe` decorators added to all workflow entry points
- ✅ LlamaIndex callback handler replaced
- ✅ User/session attributes propagated correctly
- ✅ GAMP-5 category metadata attached to categorization spans
- ✅ No syntax errors or import failures
### Phase 4: Phoenix Removal (10-15 minutes)

**Objective**: Remove all Phoenix dependencies without breaking functionality.

**Steps**:
1. **Remove Phoenix Configuration File**:

   ```bash
   # Backup first (optional)
   cp main/src/monitoring/phoenix_config.py main/src/monitoring/phoenix_config.py.bak
   # Remove
   rm main/src/monitoring/phoenix_config.py
   ```

2. **Update Imports**:

   Use the automated script:

   ```bash
   python .claude/skills/langfuse-integration/scripts/remove_phoenix.py \
     --target main/src/ \
     --dry-run  # Preview changes
   ```

   Manual pattern:

   ```python
   # Remove all instances of:
   # - from phoenix.otel import register
   # - from phoenix import ...
   # - import phoenix
   # - Any calls to phoenix.trace(), register(), etc.
   ```

3. **Remove Phoenix from Dependencies**:

   ```bash
   # Remove from pyproject.toml
   uv remove arize-phoenix arize-phoenix-otel
   ```

4. **Update Monitoring Module Init**:

   ```python
   # main/src/monitoring/__init__.py

   # OLD:
   # from .phoenix_config import setup_phoenix, PhoenixManager

   # NEW:
   from .langfuse_config import setup_langfuse, get_langfuse_client

   __all__ = ["setup_langfuse", "get_langfuse_client"]
   ```

5. **Remove Phoenix Server Command (if applicable)**:

   ```bash
   # Check whether "phoenix serve" appears in any scripts
   grep -r "phoenix serve" . --include="*.sh" --include="*.py" --include="*.md"
   # Remove or comment out
   ```
**Quality Gate**:
- ✅ `phoenix_config.py` removed
- ✅ All Phoenix imports removed from the codebase
- ✅ Phoenix packages uninstalled
- ✅ No references to Phoenix in documentation
- ✅ Codebase still imports successfully
### Phase 5: Validation and Testing (15-20 minutes)

**Objective**: Verify the Langfuse integration works correctly and traces appear in the dashboard.

**Steps**:
1. **Run Integration Health Check**:

   ```bash
   python .claude/skills/langfuse-integration/scripts/validate_integration.py
   ```

   Expected output:

   ```
   ✅ Langfuse SDK installed
   ✅ API keys configured
   ✅ Cloud connectivity successful
   ✅ Test trace created: trace_id=xxx
   ✅ @observe decorators found: 15
   ✅ Callback handler configured
   ✅ No Phoenix imports found (expected)
   ```

2. **Run End-to-End Workflow**:

   ```bash
   # Execute a test workflow with a real URS
   uv run python main/main.py --urs examples/test_urs_001.md
   ```

3. **Verify Trace in Dashboard**:
   - Navigate to: https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107/traces
   - Find the most recent trace by timestamp
   - Check:
     - ✅ Trace appears (not 404)
     - ✅ Span count matches expected (compare to the Phoenix baseline)
     - ✅ User ID populated
     - ✅ Session ID populated
     - ✅ Tags include "pharmaceutical", "gamp5"
     - ✅ GAMP-5 category metadata present
     - ✅ No errors in observations

4. **Compare Span Structure**:

   ```bash
   # If a Phoenix baseline is available, compare span counts
   echo "Phoenix baseline: 131 spans/workflow"
   echo "Langfuse actual: <count from dashboard>"
   # Acceptable range: 120-140 (some variation expected)
   ```

5. **Test Compliance Attributes**:
   - Click on the categorization span in the dashboard
   - Verify the metadata contains:
     - `compliance.gamp5.category`: 1-5
     - `compliance.alcoa_plus.attributable`: true
     - `user.clerk_id`
     - `job.id`
6. **Run Existing Tests**:

   ```bash
   # Ensure no regressions
   pytest main/tests/ -v
   # Check for type and import errors
   mypy main/src/
   # Lint the codebase
   ruff check main/src/
   ```
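If the bundled `validate_integration.py` is unavailable, its core checks can be approximated in a few lines (a sketch; `run_health_checks` and `format_report` are illustrative names, and the connectivity and decorator-count checks are omitted here):

```python
# Sketch: minimal stand-in for validate_integration.py
import importlib.util
import os

def run_health_checks(env: dict) -> dict:
    """Core checks: SDK present, keys configured, Phoenix gone."""
    return {
        "Langfuse SDK installed": importlib.util.find_spec("langfuse") is not None,
        "API keys configured": bool(env.get("LANGFUSE_PUBLIC_KEY") and env.get("LANGFUSE_SECRET_KEY")),
        "No Phoenix imports found": importlib.util.find_spec("phoenix") is None,
    }

def format_report(checks: dict) -> str:
    """Render pass/fail results in the same style as the expected output above."""
    return "\n".join(f"{'✅' if ok else '❌'} {name}" for name, ok in checks.items())

if __name__ == "__main__":
    checks = run_health_checks(dict(os.environ))
    print(format_report(checks))
    if not all(checks.values()):
        raise SystemExit(1)
```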
**Quality Gate**:
- ✅ Health check passes all tests
- ✅ End-to-end workflow completes successfully
- ✅ Trace visible in Langfuse Cloud dashboard
- ✅ Span count within 10% of Phoenix baseline
- ✅ All compliance attributes present
- ✅ Existing tests pass
- ✅ No mypy/ruff errors
### Phase 6: Documentation and Finalization (5-10 minutes)

**Objective**: Document the migration and update project references.

**Steps**:
1. **Update Quick Start Guide**:
   - Edit `main/docs/guides/QUICK_START_GUIDE.md`
   - Replace Phoenix setup instructions with Langfuse
   - Update environment variable examples
   - Add the Langfuse dashboard URL

2. **Update README**:
   - Replace the Phoenix badge/link with Langfuse
   - Update the observability section
   - Add a Langfuse Cloud (EU) data residency note

3. **Create Migration Notes**:

   ```markdown
   # Phoenix → Langfuse Migration Summary

   **Date**: <YYYY-MM-DD>
   **Scope**: Complete Phoenix replacement

   ## Changes Made
   - Removed: phoenix_config.py, Phoenix dependencies
   - Added: langfuse_config.py, Langfuse SDK
   - Instrumented: 15 functions with @observe decorators
   - Replaced: LlamaIndex callback handler

   ## Verification
   - Trace count: 131 spans/workflow (matches Phoenix baseline)
   - Dashboard URL: https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107
   - Compliance: GAMP-5 + ALCOA+ attributes preserved

   ## Rollback (if needed)
   - Restore phoenix_config.py.bak
   - Run: uv add arize-phoenix arize-phoenix-otel
   - Remove @observe decorators
   ```

4. **Update CLAUDE.md**:
   - Replace Phoenix references in the "Technology Stack" section
   - Update observability commands
   - Add Langfuse skill invocation instructions

5. **Commit Changes**:

```bash
git add -A
git status  # Review changes

# Commit with a detailed message
git commit -m "$(cat <<'EOF'
feat: Replace Phoenix with Langfuse Cloud (EU) observability

- Add Langfuse SDK and LlamaIndex instrumentation
- Add @observe decorators to 15 workflow/agent functions
- Configure Langfuse Cloud (EU) with GAMP-5 compliance attributes
- Remove Phoenix dependencies and configuration
- Verify trace parity: 131 spans/workflow maintained
- Update documentation (Quick Start, README, CLAUDE.md)

Task: PRP 2.3 (LangFuse Integration and Dashboard)
Validation: All tests passing, traces visible in dashboard

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
EOF
)"
```
**Quality Gate**:
- ✅ Quick Start Guide updated
- ✅ README updated
- ✅ Migration notes created
- ✅ CLAUDE.md reflects Langfuse
- ✅ Changes committed to Git
## Success Criteria
Before marking this skill complete, verify ALL criteria:
### Functional Requirements
- ✅ Langfuse SDK installed and configured for EU cloud
- ✅ API keys set in environment variables
- ✅ `langfuse_config.py` created with setup functions
- ✅ `@observe` decorators added to all critical paths
- ✅ LlamaIndex callback handler replaced
- ✅ Phoenix configuration file removed
- ✅ Phoenix imports removed from all files
- ✅ Phoenix dependencies uninstalled
### Observability Requirements
- ✅ End-to-end workflow generates traces
- ✅ Traces visible in Langfuse Cloud dashboard
- ✅ Span count matches Phoenix baseline (±10%)
- ✅ Trace structure maintains workflow visibility
### Compliance Requirements
- ✅ User ID (Clerk) propagated to all traces
- ✅ Session ID (job_id) propagated to all traces
- ✅ GAMP-5 category metadata on categorization spans
- ✅ ALCOA+ attributable=true on all traces
- ✅ Tags include ["pharmaceutical", "gamp5"]
### Quality Requirements
- ✅ No FALLBACK LOGIC introduced
- ✅ All errors throw with full stack traces
- ✅ Existing tests pass (pytest)
- ✅ Type checking passes (mypy)
- ✅ Linting passes (ruff)
- ✅ No import errors or circular dependencies
### Documentation Requirements
- ✅ Quick Start Guide updated
- ✅ README updated
- ✅ CLAUDE.md updated
- ✅ Migration notes created
- ✅ Changes committed to Git with descriptive message
## Troubleshooting

### Issue: Langfuse SDK Import Error
**Symptom**:

```
ModuleNotFoundError: No module named 'langfuse'
```

**Solution**:

```bash
uv add langfuse llama-index-instrumentation-langfuse
uv sync
```
### Issue: Traces Not Appearing in Dashboard

**Symptom**: The workflow runs successfully but no traces appear in Langfuse Cloud.

**Diagnosis**:

1. Check the API keys:

   ```python
   import os
   print(f"Public key: {os.getenv('LANGFUSE_PUBLIC_KEY')[:10]}...")
   print(f"Secret key configured: {bool(os.getenv('LANGFUSE_SECRET_KEY'))}")
   ```

2. Check the flush call:

   ```python
   from langfuse import get_client

   client = get_client()
   client.flush()  # CRITICAL: must flush before exit
   ```

3. Check network connectivity:

   ```bash
   curl -I https://cloud.langfuse.com
   ```

**Solution**:
- Verify the API keys match the dashboard (Settings → API Keys)
- Add `client.flush()` before process exit
- Check firewall/proxy settings
### Issue: Missing Compliance Attributes

**Symptom**: Traces appear but lack GAMP-5 metadata.

**Solution**:

```python
# Ensure get_current_observation() is called inside the decorated function
from langfuse import observe, get_current_observation

@observe()
def my_function():
    obs = get_current_observation()
    if obs:  # CRITICAL: check that obs exists
        obs.update(metadata={"compliance.gamp5.category": 5})
```
### Issue: Span Count Mismatch

**Symptom**: Langfuse shows fewer spans than the Phoenix baseline.

**Diagnosis**:
- Check that all `@observe` decorators are applied
- Verify the LlamaIndex callback handler is registered
- Check for early return statements before instrumented code

**Solution**:

```bash
# List async agent methods; review the hits for missing decorators
# (note: @observe sits on the line above the def, so this grep cannot
# filter out decorated functions on its own)
grep -r "async def" main/src/agents/ --include="*.py" | \
  grep -v "@observe"
```
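Because `@observe` sits on the line above the `def`, a plain grep over-reports; an AST walk is more precise. A sketch (it inspects a single source string, so wrap it in a file loop as needed; `undecorated_async_funcs` is an illustrative name):

```python
# Sketch: find async functions lacking an @observe decorator using the ast module
import ast

def undecorated_async_funcs(source: str) -> list:
    """Names of async functions in `source` without an @observe(...) decorator."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.AsyncFunctionDef):
            names = set()
            for dec in node.decorator_list:
                # Handle @observe, @observe(...), and @module.observe forms
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Name):
                    names.add(target.id)
                elif isinstance(target, ast.Attribute):
                    names.add(target.attr)
            if "observe" not in names:
                missing.append(node.name)
    return missing
```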
### Issue: High Latency After Migration

**Symptom**: Workflows are slower with Langfuse than with Phoenix.

**Diagnosis**:
- Langfuse batches events asynchronously (default: every 1 second)
- Network calls to the EU cloud add latency

**Solution**:

```python
# Tune batch settings
from langfuse import Langfuse

client = Langfuse(
    flush_interval=5,  # Flush every 5 seconds instead of 1
    flush_at=50,       # Batch 50 events before flushing
)
```
## Reference Materials

### Decorator Patterns

See `reference/decorator-patterns.md` for:
- Function-level instrumentation patterns
- Async function handling
- Nested span creation
- LLM generation tracing
### Phoenix Migration Guide

See `reference/phoenix-migration-guide.md` for:
- Side-by-side comparison of Phoenix vs Langfuse APIs
- Import migration table
- Span structure equivalence
- Common pitfalls during migration
### Compliance Attributes

See `reference/compliance-attributes.md` for:
- GAMP-5 category metadata schema
- ALCOA+ attribute requirements
- 21 CFR Part 11 considerations
- Audit trail best practices
## Advanced Usage

### Context Manager Pattern (Fine-Grained Control)

For more control than decorators provide:
```python
from langfuse import get_client

langfuse = get_client()

def complex_workflow():
    with langfuse.start_as_current_span(
        name="complex-workflow",
        as_type="span"
    ) as span:
        span.update(input={"mode": "batch"})

        # Manual sub-span creation
        with langfuse.start_as_current_span(
            name="data-validation",
            as_type="span"
        ) as sub_span:
            validate_data()
            sub_span.update(output={"valid": True})

        # Main logic
        result = process_data()
        span.update(output=result)
```
### Custom Event Tracking

For discrete events (not spans):

```python
from datetime import datetime

from langfuse import get_current_observation

obs = get_current_observation()
if obs:
    obs.event(
        name="gamp5-category-assigned",
        metadata={
            "category": 5,
            "confidence": 0.95,
            "timestamp": datetime.now().isoformat()
        }
    )
```
### Multi-Tenant Attribution

For pharmaceutical companies with multiple users:

```python
from langfuse import observe, get_current_trace

@observe()
async def multi_tenant_workflow(org_id: str, user_id: str):
    trace = get_current_trace()
    if trace:
        trace.update(
            user_id=user_id,
            tags=[f"org:{org_id}", "gamp5"],
            metadata={
                "organization.id": org_id,
                "organization.name": get_org_name(org_id),
                "compliance.data_residency": "EU"
            }
        )

    # Workflow logic
    ...
```
## Skill Completion Checklist
Before reporting success to the user, verify:
- [ ] **Phase 1**: Assessment report generated
- [ ] **Phase 2**: Langfuse configured and connectivity verified
- [ ] **Phase 3**: Decorators added, callback handler replaced
- [ ] **Phase 4**: Phoenix removed completely
- [ ] **Phase 5**: Validation passes all tests
- [ ] **Phase 6**: Documentation updated and committed
- [ ] All success criteria met (see above)
- [ ] No FALLBACK LOGIC violations
- [ ] User confirmation obtained: "Did you see traces in the dashboard?"
**IMPORTANT**: NEVER claim success without user verification. Always ask: "Can you confirm you see traces appearing in the Langfuse dashboard at https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107/traces?"
## Post-Migration: Next Steps
After successful migration:
1. **Use the `langfuse-extraction` skill** to:
- Extract traces for debugging
- Generate audit trails for compliance
- Export data to pandas for analysis
2. **Use the `langfuse-dashboard` skill** to:
- Capture dashboard screenshots for documentation
- Automate metric extraction for alerting
- Investigate specific traces interactively
3. **Proceed with PRP tasks**:
- Task 3.1: FastAPI backend development
- Task 4.3: Bedrock model integration
- Task 5.1: Production deployment validation
---

**Skill Version**: 1.0.0
**Last Updated**: 2025-01-17
**Compatibility**: LlamaIndex 0.12.0+, Langfuse SDK 3.0+
**Data Residency**: EU (cloud.langfuse.com)
**Compliance**: GAMP-5, ALCOA+, 21 CFR Part 11 ready