langfuse-integration

by Danik911

Replaces Phoenix observability with Langfuse Cloud (EU) traceability for pharmaceutical test generation. Adds @observe decorators to existing code, configures LlamaIndex callbacks, propagates GAMP-5 compliance attributes, and removes Phoenix dependencies. Use PROACTIVELY when implementing Task 2.3 (LangFuse setup), migrating observability systems, or ensuring ALCOA+ trace attribution. MUST BE USED for pharmaceutical compliance monitoring requiring persistent cloud storage.


When & Why to Use This Skill

This Claude skill automates the transition from Phoenix to Langfuse Cloud (EU) to provide persistent, pharmaceutical-grade observability. It handles the complete migration lifecycle, including dependency management, code instrumentation with @observe decorators, and the configuration of GAMP-5 compliant metadata for robust LLM traceability and regulatory audit trails.

Use Cases

  • Migrating from local, ephemeral Phoenix observability to Langfuse Cloud for persistent, production-ready monitoring and analytics.
  • Ensuring GAMP-5 and ALCOA+ compliance in pharmaceutical AI workflows by automating the attribution of metadata and user/session tags.
  • Instrumenting LlamaIndex-based agentic workflows with Langfuse callback handlers and decorators to capture detailed execution traces.
  • Streamlining the removal of legacy observability dependencies while maintaining parity in span counts and workflow visibility.

name: langfuse-integration
description: Replaces Phoenix observability with Langfuse Cloud (EU) traceability for pharmaceutical test generation. Adds @observe decorators to existing code, configures LlamaIndex callbacks, propagates GAMP-5 compliance attributes, and removes Phoenix dependencies. Use PROACTIVELY when implementing Task 2.3 (LangFuse setup), migrating observability systems, or ensuring ALCOA+ trace attribution. MUST BE USED for pharmaceutical compliance monitoring requiring persistent cloud storage.
allowed-tools: ["Bash", "Read", "Write", "Edit", "Grep", "Glob", "LS"]

Langfuse Integration Skill

Purpose: Replace Phoenix observability with Langfuse Cloud (EU) for pharmaceutical-grade traceability and monitoring.

Target Architecture:

  • From: Phoenix (local-only, ephemeral traces)
  • To: Langfuse Cloud EU (persistent storage, analytics, GAMP-5 compliant)
  • Strategy: Complete replacement (no dual observability)

When to Use This Skill

Use when:

  • Implementing PRP Task 2.3 (LangFuse Integration and Dashboard)
  • Migrating from Phoenix to production observability
  • Adding traceability to new pharmaceutical workflows
  • Ensuring ALCOA+ attributable traces for regulatory compliance
  • Preparing for AWS production deployment

Do NOT use when:

  • Extracting existing traces from Langfuse (use langfuse-extraction skill)
  • Automating dashboard interactions (use langfuse-dashboard skill)
  • Phoenix is required for local development (conflicts with replacement strategy)

Prerequisites

Before invoking this skill, verify:

  1. Langfuse Cloud (EU) Account:

    • Project URL: https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107
    • API keys available (public + secret)
    • EU data residency confirmed
  2. Environment Variables:

    export LANGFUSE_PUBLIC_KEY="pk-lf-..."
    export LANGFUSE_SECRET_KEY="sk-lf-..."
    export LANGFUSE_HOST="https://cloud.langfuse.com"
    
  3. Dependencies:

    • langfuse Python package (will be installed if missing)
    • llama-index-core>=0.12.0 (for callback handler)
    • Existing Phoenix instrumentation code identified
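
A quick way to confirm these prerequisites before starting is a short check script. This is a minimal sketch that only assumes the environment variable names listed above:

# Minimal prerequisite check (sketch): env vars present, SDK importable.
import importlib.util
import os

missing = [
    var for var in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST")
    if not os.getenv(var)
]
if missing:
    raise RuntimeError(f"Missing Langfuse environment variables: {missing}")

if importlib.util.find_spec("langfuse") is None:
    print("langfuse package not installed yet - it will be added in Phase 2")
else:
    print("Prerequisites satisfied")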

Workflow Phases

Phase 1: Assessment and Analysis (5-10 minutes)

Objective: Understand current Phoenix instrumentation and identify migration points.

Steps:

  1. Locate Phoenix Configuration:

    # Search for Phoenix setup
    grep -r "phoenix" main/src/monitoring/ --include="*.py"
    grep -r "from phoenix" main/src/ --include="*.py"
    grep -r "import phoenix" main/src/ --include="*.py"
    
  2. Identify Instrumentation Points:

    • Read main/src/core/unified_workflow.py - identify workflow entry points
    • Read main/src/agents/ - identify agent methods needing tracing
    • Look for existing OpenTelemetry span creation
    • Document all files importing Phoenix
  3. Analyze Compliance Attributes:

    • Check if GAMP-5 attributes are set (category, confidence)
    • Check if ALCOA+ attributes are set (user_id, session_id, timestamps)
    • Verify 21 CFR Part 11 metadata if applicable
  4. Generate Assessment Report:

    # Phoenix → Langfuse Migration Assessment
    
    ## Current Phoenix Instrumentation
    - Configuration file: <path>
    - Instrumented files: <count>
    - Span count per workflow: <number>
    - Compliance attributes: <present/missing>
    
    ## Migration Scope
    - Files requiring decorator addition: <list>
    - Phoenix imports to remove: <count>
    - Callback handlers to replace: <list>
    - Estimated migration time: <minutes>
    
    ## Risk Assessment
    - Breaking changes: <yes/no>
    - Test coverage: <percentage>
    - Rollback complexity: <low/medium/high>
    

Quality Gate: Assessment report generated with complete file inventory and attribute analysis.
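
To help produce the report, a rough scan along these lines can supply the file inventory and counts. This is a sketch only; the main/src path follows the steps above and should be adjusted if the layout differs:

# Rough migration scan (sketch): Phoenix references and existing span sites.
from pathlib import Path

phoenix_files: list[str] = []
span_sites = 0
for py_file in Path("main/src").rglob("*.py"):
    text = py_file.read_text(encoding="utf-8", errors="ignore")
    if "phoenix" in text.lower():
        phoenix_files.append(str(py_file))
    span_sites += text.count("start_as_current_span") + text.count("@observe")

print(f"Files referencing Phoenix: {len(phoenix_files)}")
print(f"Existing span/decorator call sites: {span_sites}")
print("\n".join(phoenix_files))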


Phase 2: Langfuse Configuration Setup (10-15 minutes)

Objective: Create Langfuse configuration module and verify cloud connectivity.

Steps:

  1. Install Langfuse SDK:

    # Add to pyproject.toml
    uv add langfuse
    
    # For LlamaIndex integration
    uv add llama-index-instrumentation-langfuse
    
  2. Create Langfuse Configuration Module:

    • File: main/src/monitoring/langfuse_config.py
    • Content: See reference/decorator-patterns.md for the full template (a minimal sketch also follows this list)
    • Key functions:
      • setup_langfuse(): Initialize client with EU cloud config
      • get_langfuse_client(): Singleton accessor
      • get_langfuse_callback_handler(): LlamaIndex integration
      • add_compliance_attributes(): GAMP-5/ALCOA+ attribute helper
  3. Verify Cloud Connectivity:

    # Test script (temporary)
    from main.src.monitoring.langfuse_config import setup_langfuse
    
    client = setup_langfuse()
    client.trace(name="connectivity-test", input={"test": True})
    client.flush()
    
    # Verify trace appears at:
    # https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107/traces
    
  4. Update Environment Configuration:

    • Add Langfuse environment variables to .env.example
    • Update main/src/config.py to load Langfuse settings
    • Add Langfuse to ObservabilityConfig dataclass
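
A minimal sketch of what main/src/monitoring/langfuse_config.py could look like (the authoritative template lives in reference/decorator-patterns.md; the function names follow the list above, and the handler import assumes the langfuse.llama_index integration used in Phase 3):

# main/src/monitoring/langfuse_config.py (sketch, not the final template)
import os

from langfuse import Langfuse

_client: Langfuse | None = None


def setup_langfuse() -> Langfuse:
    """Initialize the Langfuse client against the EU cloud instance."""
    global _client
    if _client is None:
        _client = Langfuse(
            public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
            secret_key=os.environ["LANGFUSE_SECRET_KEY"],
            host=os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com"),
        )
    return _client


def get_langfuse_client() -> Langfuse:
    """Singleton accessor; initializes the client on first use."""
    return setup_langfuse()


def get_langfuse_callback_handler():
    """LlamaIndex callback handler wired to the same credentials."""
    from langfuse.llama_index import LlamaIndexCallbackHandler

    return LlamaIndexCallbackHandler(
        public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
        secret_key=os.environ["LANGFUSE_SECRET_KEY"],
        host=os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com"),
    )


def add_compliance_attributes(observation, gamp5_category: int | None = None) -> None:
    """Attach GAMP-5 / ALCOA+ metadata to a Langfuse observation."""
    metadata: dict = {"compliance.alcoa_plus.attributable": True}
    if gamp5_category is not None:
        metadata["compliance.gamp5.category"] = gamp5_category
    observation.update(metadata=metadata)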

Quality Gate:

  • ✅ Langfuse SDK installed
  • ✅ langfuse_config.py created and tested
  • ✅ Connectivity test trace visible in Langfuse Cloud dashboard
  • ✅ Configuration variables documented

Phase 3: Code Instrumentation (20-30 minutes)

Objective: Add @observe decorators and replace Phoenix callbacks with Langfuse.

Steps:

  1. Add Decorators to Workflow Entry Points:

    Use the automated script for systematic instrumentation:

    python .claude/skills/langfuse-integration/scripts/add_instrumentation.py \
      --target main/src/core/unified_workflow.py \
      --dry-run  # Preview changes first
    

    Manual pattern (if script unavailable):

    # main/src/core/unified_workflow.py
    from langfuse import observe
    
    class UnifiedWorkflow(Workflow):
        @observe(name="unified-workflow-run", as_type="span")
        async def run(self, ctx: Context, ev: StartEvent) -> StopEvent:
            # Existing code unchanged
            ...
    
  2. Instrument Agent Methods:

    Target key agent operations (a reusable helper variant is sketched after this list):

    # main/src/agents/categorizer.py
    from langfuse import observe
    
    @observe(name="gamp5-categorization", as_type="span")
    async def categorize_urs(self, urs_content: str) -> dict:
        # Add compliance attributes
        from langfuse import get_current_observation
        obs = get_current_observation()
        if obs:
            obs.update(metadata={
                "compliance.gamp5.applicable": True,
                "compliance.alcoa_plus.attributable": True
            })
    
        # Existing categorization logic
        result = await self._categorize(urs_content)
    
        # Tag with category
        if obs:
            obs.update(metadata={
                "compliance.gamp5.category": result["category"]
            })
    
        return result
    
  3. Replace LlamaIndex Callback Handler:

    # main/src/core/unified_workflow.py or main/main.py
    # OLD (Phoenix):
    # from phoenix.otel import register
    # tracer_provider = register()
    
    # NEW (Langfuse):
    from langfuse.llama_index import LlamaIndexCallbackHandler
    
    langfuse_handler = LlamaIndexCallbackHandler(
        public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
        secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
        host=os.getenv("LANGFUSE_HOST")
    )
    
    # Register with workflow
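    # NOTE: if UnifiedWorkflow does not accept a callbacks argument, an
    # alternative is to register the handler globally via LlamaIndex settings:
    #   from llama_index.core import Settings
    #   from llama_index.core.callbacks import CallbackManager
    #   Settings.callback_manager = CallbackManager([langfuse_handler])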
    workflow = UnifiedWorkflow(
        callbacks=[langfuse_handler],
        timeout=600
    )
    
  4. Propagate User/Session Attributes:

    # In API endpoint or workflow entry point
    from langfuse import observe, get_current_trace
    
    @observe()
    async def generate_test_suite(user_id: str, urs_file: str, job_id: str):
        # Set trace-level attributes
        trace = get_current_trace()
        if trace:
            trace.update(
                user_id=user_id,
                session_id=job_id,
                tags=["pharmaceutical", "gamp5"],
                metadata={
                    "compliance.alcoa_plus.attributable": True,
                    "user.clerk_id": user_id,
                    "job.id": job_id
                }
            )
    
        # All nested operations inherit these attributes
        result = await unified_workflow.run(urs_file)
        return result
    
  5. Verify Decorator Coverage:

    # Check all instrumentation points have decorators
    grep -r "@observe" main/src/ --include="*.py" | wc -l
    # Compare to Phoenix span count (should match or exceed)
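
To avoid repeating the obs.update() boilerplate in every agent method, the add_compliance_attributes() helper sketched for langfuse_config.py in Phase 2 can be reused. This is a hedged variant of the categorization example in step 2, not a required pattern:

# Sketch: same categorization flow as step 2, using the shared helper.
from langfuse import observe, get_current_observation

from main.src.monitoring.langfuse_config import add_compliance_attributes


@observe(name="gamp5-categorization", as_type="span")
async def categorize_urs(self, urs_content: str) -> dict:
    obs = get_current_observation()
    if obs:
        add_compliance_attributes(obs)  # ALCOA+ attribution up front

    result = await self._categorize(urs_content)

    if obs:
        add_compliance_attributes(obs, gamp5_category=result["category"])
    return result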
    

Quality Gate:

  • ✅ @observe decorators added to all workflow entry points
  • ✅ LlamaIndex callback handler replaced
  • ✅ User/session attributes propagated correctly
  • ✅ GAMP-5 category metadata attached to categorization spans
  • ✅ No syntax errors or import failures

Phase 4: Phoenix Removal (10-15 minutes)

Objective: Remove all Phoenix dependencies without breaking functionality.

Steps:

  1. Remove Phoenix Configuration File:

    # Backup first (optional)
    cp main/src/monitoring/phoenix_config.py main/src/monitoring/phoenix_config.py.bak
    
    # Remove
    rm main/src/monitoring/phoenix_config.py
    
  2. Update Imports:

    Use automated script:

    python .claude/skills/langfuse-integration/scripts/remove_phoenix.py \
      --target main/src/ \
      --dry-run  # Preview changes
    

    Manual pattern:

    # Remove all instances of:
    # - from phoenix.otel import register
    # - from phoenix import ...
    # - import phoenix
    # - Any calls to phoenix.trace(), register(), etc.
    
  3. Remove Phoenix from Dependencies:

    # Remove from pyproject.toml
    uv remove arize-phoenix arize-phoenix-otel
    
  4. Update Monitoring Module Init:

    # main/src/monitoring/__init__.py
    # OLD:
    # from .phoenix_config import setup_phoenix, PhoenixManager
    
    # NEW:
    from .langfuse_config import setup_langfuse, get_langfuse_client
    
    __all__ = ["setup_langfuse", "get_langfuse_client"]
    
  5. Remove Phoenix Server Command (if applicable):

    # Check if phoenix serve is in any scripts
    grep -r "phoenix serve" . --include="*.sh" --include="*.py" --include="*.md"
    
    # Remove or comment out
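
A quick import-health check after removal. This is a sketch; the module paths are the main.src ones used elsewhere in this skill, so adjust them if the project layout differs:

# Sketch: confirm the codebase still imports and no Phoenix modules load.
import importlib
import sys

for module in ("main.src.monitoring", "main.src.core.unified_workflow"):
    importlib.import_module(module)  # raises immediately on broken imports

leftovers = [name for name in sys.modules if "phoenix" in name.lower()]
if leftovers:
    raise RuntimeError(f"Phoenix modules still imported: {leftovers}")
print("Import check passed - no Phoenix modules loaded")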
    

Quality Gate:

  • ✅ phoenix_config.py removed
  • ✅ All Phoenix imports removed from codebase
  • ✅ Phoenix packages uninstalled
  • ✅ No references to Phoenix in documentation
  • ✅ Codebase still imports successfully

Phase 5: Validation and Testing (15-20 minutes)

Objective: Verify Langfuse integration works correctly and traces appear in dashboard.

Steps:

  1. Run Integration Health Check (a hand-rolled fallback is sketched after this list):

    python .claude/skills/langfuse-integration/scripts/validate_integration.py
    

    Expected output:

    ✅ Langfuse SDK installed
    ✅ API keys configured
    ✅ Cloud connectivity successful
    ✅ Test trace created: trace_id=xxx
    ✅ @observe decorators found: 15
    ✅ Callback handler configured
    ✅ No Phoenix imports found (expected)
    
  2. Run End-to-End Workflow:

    # Execute test workflow with real URS
    uv run python main/main.py --urs examples/test_urs_001.md
    
  3. Verify Trace in Dashboard:

    • Navigate to: https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107/traces
    • Find most recent trace by timestamp
    • Check:
      • ✅ Trace appears (not 404)
      • ✅ Span count matches expected (compare to Phoenix baseline)
      • ✅ User ID populated
      • ✅ Session ID populated
      • ✅ Tags include "pharmaceutical", "gamp5"
      • ✅ GAMP-5 category metadata present
      • ✅ No errors in observations
  4. Compare Span Structure:

    # If Phoenix baseline available, compare span counts
    echo "Phoenix baseline: 131 spans/workflow"
    echo "Langfuse actual: <count from dashboard>"
    # Acceptable range: 120-140 (some variation expected)
    
  5. Test Compliance Attributes:

    • Click on categorization span in dashboard
    • Verify metadata contains:
      • compliance.gamp5.category: 1-5
      • compliance.alcoa_plus.attributable: true
      • user.clerk_id: <Clerk user ID>
      • job.id: <job ID>
  6. Run Existing Tests:

    # Ensure no regressions
    pytest main/tests/ -v
    
    # Check for import errors
    mypy main/src/
    
    # Check for Phoenix references
    ruff check main/src/
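
The validate_integration.py script referenced in step 1 ships with this skill. If it is unavailable, a hand-rolled check along these lines covers similar ground (a sketch; the auth_check() call assumes current Langfuse SDK behavior):

# Sketch: minimal fallback for validate_integration.py.
import os
import subprocess

from langfuse import Langfuse  # fails loudly if the SDK is missing

assert os.getenv("LANGFUSE_PUBLIC_KEY"), "LANGFUSE_PUBLIC_KEY not set"
assert os.getenv("LANGFUSE_SECRET_KEY"), "LANGFUSE_SECRET_KEY not set"

client = Langfuse()  # reads credentials from the environment
if not client.auth_check():
    raise RuntimeError("Langfuse auth check failed - verify API keys and host")

decorators = subprocess.run(
    ["grep", "-r", "@observe", "main/src/", "--include=*.py"],
    capture_output=True, text=True,
).stdout
print(f"@observe decorators found: {len(decorators.splitlines())}")

phoenix_refs = subprocess.run(
    ["grep", "-r", "import phoenix", "main/src/", "--include=*.py"],
    capture_output=True, text=True,
).stdout
print("Phoenix imports:", "none" if not phoenix_refs.strip() else phoenix_refs)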
    

Quality Gate:

  • ✅ Health check passes all tests
  • ✅ End-to-end workflow completes successfully
  • ✅ Trace visible in Langfuse Cloud dashboard
  • ✅ Span count within 10% of Phoenix baseline
  • ✅ All compliance attributes present
  • ✅ Existing tests pass
  • ✅ No mypy/ruff errors

Phase 6: Documentation and Finalization (5-10 minutes)

Objective: Document the migration and update project references.

Steps:

  1. Update Quick Start Guide:

    • Edit main/docs/guides/QUICK_START_GUIDE.md
    • Replace Phoenix setup instructions with Langfuse
    • Update environment variable examples
    • Add Langfuse dashboard URL
  2. Update README:

    • Replace Phoenix badge/link with Langfuse
    • Update observability section
    • Add Langfuse Cloud (EU) data residency note
  3. Create Migration Notes:

    # Phoenix → Langfuse Migration Summary
    
    **Date**: <YYYY-MM-DD>
    **Scope**: Complete Phoenix replacement
    
    ## Changes Made
    - Removed: phoenix_config.py, Phoenix dependencies
    - Added: langfuse_config.py, Langfuse SDK
    - Instrumented: 15 functions with @observe decorators
    - Replaced: LlamaIndex callback handler
    
    ## Verification
    - Trace count: 131 spans/workflow (matches Phoenix baseline)
    - Dashboard URL: https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107
    - Compliance: GAMP-5 + ALCOA+ attributes preserved
    
    ## Rollback (if needed)
    - Restore phoenix_config.py.bak
    - Run: uv add arize-phoenix arize-phoenix-otel
    - Remove @observe decorators
    
  4. Update CLAUDE.md:

    • Replace Phoenix references in "Technology Stack" section
    • Update observability commands
    • Add Langfuse skill invocation instructions
  5. Commit Changes:

    git add -A
    git status  # Review changes
    
    # Commit with detailed message
    git commit -m "$(cat <<'EOF'
    feat: Replace Phoenix with Langfuse Cloud (EU) observability
    
    - Add Langfuse SDK and LlamaIndex instrumentation
    - Add @observe decorators to 15 workflow/agent functions
    - Configure Langfuse Cloud (EU) with GAMP-5 compliance attributes
    - Remove Phoenix dependencies and configuration
    - Verify trace parity: 131 spans/workflow maintained
    - Update documentation (Quick Start, README, CLAUDE.md)
    
    Task: PRP 2.3 (LangFuse Integration and Dashboard)
    Validation: All tests passing, traces visible in dashboard
    
    🤖 Generated with Claude Code
    
    Co-Authored-By: Claude <noreply@anthropic.com>
    EOF
    )"
    

Quality Gate:

  • ✅ Quick Start Guide updated
  • ✅ README updated
  • ✅ Migration notes created
  • ✅ CLAUDE.md reflects Langfuse
  • ✅ Changes committed to Git

Success Criteria

Before marking this skill complete, verify ALL criteria:

Functional Requirements

  • ✅ Langfuse SDK installed and configured for EU cloud
  • ✅ API keys set in environment variables
  • ✅ langfuse_config.py created with setup functions
  • ✅ @observe decorators added to all critical paths
  • ✅ LlamaIndex callback handler replaced
  • ✅ Phoenix configuration file removed
  • ✅ Phoenix imports removed from all files
  • ✅ Phoenix dependencies uninstalled

Observability Requirements

  • ✅ End-to-end workflow generates traces
  • ✅ Traces visible in Langfuse Cloud dashboard
  • ✅ Span count matches Phoenix baseline (±10%)
  • ✅ Trace structure maintains workflow visibility

Compliance Requirements

  • ✅ User ID (Clerk) propagated to all traces
  • ✅ Session ID (job_id) propagated to all traces
  • ✅ GAMP-5 category metadata on categorization spans
  • ✅ ALCOA+ attributable=true on all traces
  • ✅ Tags include ["pharmaceutical", "gamp5"]

Quality Requirements

  • ✅ No FALLBACK LOGIC introduced
  • ✅ All errors throw with full stack traces
  • ✅ Existing tests pass (pytest)
  • ✅ Type checking passes (mypy)
  • ✅ Linting passes (ruff)
  • ✅ No import errors or circular dependencies

Documentation Requirements

  • ✅ Quick Start Guide updated
  • ✅ README updated
  • ✅ CLAUDE.md updated
  • ✅ Migration notes created
  • ✅ Changes committed to Git with descriptive message

Troubleshooting

Issue: Langfuse SDK Import Error

Symptom:

ModuleNotFoundError: No module named 'langfuse'

Solution:

uv add langfuse llama-index-instrumentation-langfuse
uv sync

Issue: Traces Not Appearing in Dashboard

Symptom: Workflow runs successfully but no traces in Langfuse Cloud.

Diagnosis:

  1. Check API keys:

    import os
    print(f"Public key: {os.getenv('LANGFUSE_PUBLIC_KEY')[:10]}...")
    print(f"Secret key configured: {bool(os.getenv('LANGFUSE_SECRET_KEY'))}")
    
  2. Check flush call:

    from langfuse import get_client
    client = get_client()
    client.flush()  # CRITICAL: Must flush before exit
    
  3. Check network connectivity:

    curl -I https://cloud.langfuse.com
    

Solution:

  • Verify API keys match dashboard (Settings → API Keys)
  • Add client.flush() before process exit
  • Check firewall/proxy settings

Issue: Missing Compliance Attributes

Symptom: Traces appear but lack GAMP-5 metadata.

Solution:

# Ensure get_current_observation() is called inside decorated function
from langfuse import observe, get_current_observation

@observe()
def my_function():
    obs = get_current_observation()
    if obs:  # CRITICAL: Check if obs exists
        obs.update(metadata={"compliance.gamp5.category": 5})

Issue: Span Count Mismatch

Symptom: Langfuse shows fewer spans than Phoenix baseline.

Diagnosis:

  • Check if all @observe decorators are applied
  • Verify LlamaIndex callback handler is registered
  • Check for early return statements before instrumented code

Solution:

# List async methods together with the line above them; any method whose
# preceding line lacks @observe still needs a decorator
grep -rn -B1 "async def" main/src/agents/ --include="*.py"

Issue: High Latency After Migration

Symptom: Workflows slower with Langfuse vs Phoenix.

Diagnosis:

  • Langfuse batches events asynchronously (default: every 1 second)
  • Network calls to EU cloud add latency

Solution:

# Tune batch settings
from langfuse import Langfuse

client = Langfuse(
    flush_interval=5,  # Flush every 5 seconds instead of 1
    flush_at=50,       # Batch 50 events before flushing
)

Reference Materials

Decorator Patterns

See reference/decorator-patterns.md for:

  • Function-level instrumentation patterns
  • Async function handling
  • Nested span creation
  • LLM generation tracing

Phoenix Migration Guide

See reference/phoenix-migration-guide.md for:

  • Side-by-side comparison of Phoenix vs Langfuse APIs
  • Import migration table
  • Span structure equivalence
  • Common pitfalls during migration

Compliance Attributes

See reference/compliance-attributes.md for:

  • GAMP-5 category metadata schema
  • ALCOA+ attribute requirements
  • 21 CFR Part 11 considerations
  • Audit trail best practices

Advanced Usage

Context Manager Pattern (Fine-Grained Control)

For more control than decorators provide:

from langfuse import get_client

langfuse = get_client()

def complex_workflow():
    with langfuse.start_as_current_span(
        name="complex-workflow",
        as_type="span"
    ) as span:
        span.update(input={"mode": "batch"})

        # Manual sub-span creation
        with langfuse.start_as_current_span(
            name="data-validation",
            as_type="span"
        ) as sub_span:
            validate_data()
            sub_span.update(output={"valid": True})

        # Main logic
        result = process_data()

        span.update(output=result)

Custom Event Tracking

For discrete events (not spans):

from datetime import datetime

from langfuse import get_current_observation

obs = get_current_observation()
if obs:
    obs.event(
        name="gamp5-category-assigned",
        metadata={
            "category": 5,
            "confidence": 0.95,
            "timestamp": datetime.now().isoformat()
        }
    )

Multi-Tenant Attribution

For pharmaceutical companies with multiple users:

from langfuse import observe, get_current_trace

@observe()
async def multi_tenant_workflow(org_id: str, user_id: str):
    trace = get_current_trace()
    if trace:
        trace.update(
            user_id=user_id,
            tags=[f"org:{org_id}", "gamp5"],
            metadata={
                "organization.id": org_id,
                "organization.name": get_org_name(org_id),
                "compliance.data_residency": "EU"
            }
        )

    # Workflow logic
    ...

Skill Completion Checklist

Before reporting success to the user, verify:

  • Phase 1: Assessment report generated
  • Phase 2: Langfuse configured and connectivity verified
  • Phase 3: Decorators added, callback handler replaced
  • Phase 4: Phoenix removed completely
  • Phase 5: Validation passes all tests
  • Phase 6: Documentation updated and committed
  • All success criteria met (see above)
  • No FALLBACK LOGIC violations
  • User confirmation obtained: "Did you see traces in the dashboard?"

IMPORTANT: NEVER claim success without user verification. Always ask: "Can you confirm you see traces appearing in the Langfuse dashboard at https://cloud.langfuse.com/project/cmhuwhcfe006yad06cqfub107/traces?"


Post-Migration: Next Steps

After successful migration:

  1. Use langfuse-extraction skill to:

    • Extract traces for debugging
    • Generate audit trails for compliance
    • Export data to pandas for analysis
  2. Use langfuse-dashboard skill to:

    • Capture dashboard screenshots for documentation
    • Automate metric extraction for alerting
    • Investigate specific traces interactively
  3. Proceed with PRP tasks:

    • Task 3.1: FastAPI backend development
    • Task 4.3: Bedrock model integration
    • Task 5.1: Production deployment validation

Skill Version: 1.0.0
Last Updated: 2025-01-17
Compatibility: LlamaIndex 0.12.0+, Langfuse SDK 3.0+
Data Residency: EU (cloud.langfuse.com)
Compliance: GAMP-5, ALCOA+, 21 CFR Part 11 ready
