schema-mapper
Database schema understanding and relationship mapping. Use when exploring unfamiliar databases, documenting table relationships, identifying join paths, or generating ERD documentation for existing schemas.
When & Why to Use This Skill
The Schema Mapper skill automates the discovery, documentation, and visualization of complex database structures. It enables users to explore unfamiliar databases by mapping table relationships, identifying optimal join paths, and generating comprehensive ERD documentation. By combining explicit foreign key detection with intelligent naming pattern inference, it provides a clear, actionable overview of data architectures, significantly reducing the time required for database onboarding and query preparation.
Use Cases
- Onboarding to Unfamiliar Databases: Quickly visualize and understand the structure of a new project's database without relying on outdated documentation.
- Automated ERD Generation: Generate visual Mermaid diagrams and documentation to keep technical wikis and data dictionaries up to date.
- Complex Join Path Discovery: Automatically identify the shortest and most efficient SQL join sequences between distant tables in a large schema.
- Data Dictionary Creation: Export a comprehensive catalog of tables, columns, data types, and constraints into CSV or Markdown formats.
- Implicit Relationship Mapping: Detect hidden links between tables based on naming conventions (e.g., user_id to users.id) even when formal foreign keys are missing.
Schema Mapper
Quick Start
Automatically discover, document, and visualize database schemas including tables, columns, relationships, and join paths. Essential for understanding unfamiliar databases or creating documentation.
Context Requirements
Before mapping the schema, I need:
- Database Access: Connection details or schema export
- Scope: Which tables/schemas to map (or all)
- Documentation Goal: What you need (ERD, join paths, data dictionary, etc.)
- Known Relationships (optional): Explicit foreign keys or implicit relationships
Context Gathering
For Database Access:
"I can map your schema from:
Option 1 - Direct Database Connection:
connection_details = {
    'host': 'your-db.example.com',
    'database': 'production',
    'user': 'readonly_user',
    'password': '***',
    'type': 'postgresql'  # or mysql, snowflake, bigquery, redshift
}
Option 2 - Schema Export:
-- PostgreSQL:
SELECT table_schema, table_name, column_name, data_type,
       is_nullable, column_default
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
-- Also export constraints:
SELECT * FROM information_schema.table_constraints;
SELECT * FROM information_schema.key_column_usage;
Option 3 - dbt Project:
Share your dbt_project.yml and models/ directory - I'll extract schema from dbt docs
Option 4 - Existing Documentation: Share any ERDs, data dictionaries, or schema docs you have"
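If you go with Option 1, here is a minimal sketch of turning those details into the connection_string used in Step 1. The drivername below is an assumption for PostgreSQL; swap in the dialect and driver for your database (e.g., 'mysql+pymysql').

from sqlalchemy.engine import URL

# Build the connection_string used in Step 1 from the Option 1 details.
# drivername='postgresql+psycopg2' is an assumption for the PostgreSQL case.
connection_string = URL.create(
    drivername='postgresql+psycopg2',
    username=connection_details['user'],
    password=connection_details['password'],
    host=connection_details['host'],
    database=connection_details['database'],
)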
For Scope:
"Should I map:
- All tables in the database?
- Specific schema (e.g., 'public', 'analytics', 'staging')?
- Specific tables (list the important ones)?
- Tables matching pattern (e.g., all 'fct_*' and 'dim_*' tables)?"
For Documentation Goal:
"What do you need from this schema mapping?
Common Goals:
- Visual ERD - See table relationships graphically
- Join Path Finder - How to join Table A to Table B
- Data Dictionary - Complete catalog of tables and columns
- Lineage Map - Understand data flow through tables
- Quick Reference - Cheat sheet for common joins
Which would be most useful?"
For Relationships:
"I'll auto-detect relationships from:
- Foreign key constraints
- Column name patterns (id, user_id, etc.)
- Naming conventions
If you have implicit relationships (not enforced by FK constraints), let me know:
- Example: 'orders.customer_email relates to customers.email but no FK exists'"
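If you confirm implicit relationships like the example above, they can be merged into the auto-detected list. A minimal sketch, using the same dict shape that Step 3's infer_relationships() produces:

# Hypothetical example: declaring a relationship that has no FK constraint.
manual_relationships = [{
    'type': 'declared_implicit',
    'from_table': 'orders',
    'from_column': 'customer_email',
    'to_table': 'customers',
    'to_column': 'email',
    'confidence': 'high'  # high because a human confirmed it
}]
# After Step 3: relationships.extend(manual_relationships)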
Workflow
Step 1: Connect and Discover Schema
import pandas as pd
from sqlalchemy import create_engine, inspect

# Connect to the database (connection_string is built from the details above)
engine = create_engine(connection_string)
inspector = inspect(engine)

# Get all schemas
schemas = inspector.get_schema_names()
print(f"📚 Schemas found: {schemas}")

# Get tables in the target schema
tables = inspector.get_table_names(schema='public')
print(f"📊 Tables in 'public': {len(tables)}")
for table in sorted(tables):
    print(f"  - {table}")
Checkpoint: "Found {N} tables in schema. Does this look right? Any tables missing or unexpected?"
Step 2: Extract Table Metadata
def extract_table_metadata(engine, schema, table_name):
    """Extract comprehensive metadata for a single table."""
    inspector = inspect(engine)

    # Get columns
    columns = inspector.get_columns(table_name, schema=schema)

    # Get primary keys
    pk = inspector.get_pk_constraint(table_name, schema=schema)
    pk_columns = pk.get('constrained_columns', [])

    # Get foreign keys
    fks = inspector.get_foreign_keys(table_name, schema=schema)

    # Get indexes
    indexes = inspector.get_indexes(table_name, schema=schema)

    # Get unique constraints
    unique = inspector.get_unique_constraints(table_name, schema=schema)

    return {
        'table_name': table_name,
        'schema': schema,
        'columns': columns,
        'primary_keys': pk_columns,
        'foreign_keys': fks,
        'indexes': indexes,
        'unique_constraints': unique
    }

# Extract metadata for all tables
schema_metadata = {}
for table in tables:
    schema_metadata[table] = extract_table_metadata(engine, 'public', table)
    print(f"✓ Extracted metadata for {table}")
Step 3: Infer Relationships
def infer_relationships(schema_metadata):
    """
    Infer relationships from:
    1. Explicit foreign keys
    2. Column naming patterns (user_id → users.id)
    3. Common audit patterns (created_by → users.id)
    """
    relationships = []
    for table_name, metadata in schema_metadata.items():
        # Explicit FKs (note: only the first column of composite keys is captured)
        for fk in metadata['foreign_keys']:
            relationships.append({
                'type': 'explicit_fk',
                'from_table': table_name,
                'from_column': fk['constrained_columns'][0],
                'to_table': fk['referred_table'],
                'to_column': fk['referred_columns'][0],
                'confidence': 'high'
            })
        # Inferred from naming patterns
        for col in metadata['columns']:
            col_name = col['name']
            # Pattern: <singular>_id → <singular>s.id
            if col_name.endswith('_id') and col_name != 'id':
                potential_table = col_name[:-3] + 's'  # user_id → users
                if potential_table in schema_metadata:
                    relationships.append({
                        'type': 'inferred_naming',
                        'from_table': table_name,
                        'from_column': col_name,
                        'to_table': potential_table,
                        'to_column': 'id',
                        'confidence': 'medium'
                    })
    return relationships

relationships = infer_relationships(schema_metadata)
print(f"🔗 Found {len(relationships)} relationships")
Step 4: Generate Data Dictionary
def generate_data_dictionary(schema_metadata):
    """Create a comprehensive data dictionary."""
    dictionary = []
    for table_name, metadata in schema_metadata.items():
        for col in metadata['columns']:
            dictionary.append({
                'table': table_name,
                'column': col['name'],
                'type': str(col['type']),
                'nullable': col['nullable'],
                'default': col.get('default'),
                'is_pk': col['name'] in metadata['primary_keys'],
                'is_fk': any(col['name'] in fk['constrained_columns']
                             for fk in metadata['foreign_keys'])
            })
    df_dict = pd.DataFrame(dictionary)
    return df_dict

data_dict = generate_data_dictionary(schema_metadata)
print("\n📖 Data Dictionary:")
print(data_dict.head(20))

# Save to CSV
data_dict.to_csv('data_dictionary.csv', index=False)
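Since the use cases mention Markdown as well as CSV, the same DataFrame can be exported with pandas' to_markdown (which requires the optional tabulate package):

# Optional Markdown export (pandas' to_markdown requires tabulate)
data_dict.to_markdown('data_dictionary.md', index=False)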
Step 5: Find Join Paths
def find_join_path(from_table, to_table, relationships):
    """
    Find how to join from_table to to_table,
    using BFS to find the shortest path.
    """
    from collections import deque

    # Build adjacency graph (directed: edges follow FK direction)
    graph = {}
    for rel in relationships:
        if rel['from_table'] not in graph:
            graph[rel['from_table']] = []
        graph[rel['from_table']].append({
            'to_table': rel['to_table'],
            'from_col': rel['from_column'],
            'to_col': rel['to_column'],
            'type': rel['type']
        })

    # BFS to find the shortest path
    queue = deque([(from_table, [])])
    visited = {from_table}
    while queue:
        current_table, path = queue.popleft()
        if current_table == to_table:
            return path
        if current_table in graph:
            for edge in graph[current_table]:
                if edge['to_table'] not in visited:
                    visited.add(edge['to_table'])
                    queue.append((edge['to_table'], path + [edge]))
    return None  # No path found

# Example: How do we join orders to customers?
path = find_join_path('orders', 'customers', relationships)
if path:
    print("\n🛤️ Join Path: orders → customers")
    for i, step in enumerate(path, 1):
        # The left side of each ON clause is the previous table in the path
        left_table = path[i - 2]['to_table'] if i > 1 else 'orders'
        print(f"  Step {i}: JOIN {step['to_table']} "
              f"ON {left_table}.{step['from_col']} = "
              f"{step['to_table']}.{step['to_col']}")
else:
    print("❌ No join path found")
Step 6: Generate ERD
def generate_erd_mermaid(schema_metadata, relationships):
    """Generate a Mermaid ERD diagram."""
    mermaid = ["erDiagram"]

    # Add tables with columns
    for table_name, metadata in schema_metadata.items():
        mermaid.append(f"    {table_name.upper()} {{")
        for col in metadata['columns'][:10]:  # Limit to the first 10 columns
            col_type = str(col['type'])[:20].replace(' ', '_')  # Mermaid types can't contain spaces
            markers = []
            if col['name'] in metadata['primary_keys']:
                markers.append('PK')
            if any(col['name'] in fk['constrained_columns']
                   for fk in metadata['foreign_keys']):
                markers.append('FK')
            marker = ' ' + ', '.join(markers) if markers else ''
            mermaid.append(f"        {col_type} {col['name']}{marker}")
        mermaid.append("    }")

    # Add relationships: the referenced table goes on the "one" side,
    # the referencing table on the "many" side
    for rel in relationships:
        if rel['type'] == 'explicit_fk':
            mermaid.append(
                f"    {rel['to_table'].upper()} ||--o{{ "
                f"{rel['from_table'].upper()} : {rel['from_column']}"
            )
    return "\n".join(mermaid)

erd = generate_erd_mermaid(schema_metadata, relationships)
print("\n📊 ERD (Mermaid format):")
print(erd)

# Save to file
with open('schema_erd.mmd', 'w') as f:
    f.write(erd)
Step 7: Generate Quick Reference Guide
def generate_quick_reference(schema_metadata, relationships):
    """Create a quick reference for common joins."""
    ref = []
    ref.append("# Schema Quick Reference\n")

    # Table summary
    ref.append("## Tables Overview\n")
    for table_name, metadata in sorted(schema_metadata.items()):
        row_count = f"(~{metadata['row_count']} rows)" if 'row_count' in metadata else ""
        ref.append(f"- **{table_name}** {row_count}")
        ref.append(f"  - Primary Key: {', '.join(metadata['primary_keys']) or 'None'}")
        ref.append(f"  - Columns: {len(metadata['columns'])}\n")

    # Common joins
    ref.append("\n## Common Join Patterns\n")

    # Group relationships by from_table
    joins_by_table = {}
    for rel in relationships:
        if rel['from_table'] not in joins_by_table:
            joins_by_table[rel['from_table']] = []
        joins_by_table[rel['from_table']].append(rel)

    for table, rels in sorted(joins_by_table.items()):
        ref.append(f"\n### From {table}:\n")
        for rel in rels:
            ref.append(
                f"```sql\n"
                f"JOIN {rel['to_table']} ON {table}.{rel['from_column']} = "
                f"{rel['to_table']}.{rel['to_column']}\n"
                f"```\n"
            )
    return "\n".join(ref)

quick_ref = generate_quick_reference(schema_metadata, relationships)
print(quick_ref)

# Save to markdown
with open('schema_quick_reference.md', 'w') as f:
    f.write(quick_ref)
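The report template below also lists relationship_graph.json among the generated files; a minimal sketch of producing that machine-readable export from the relationships list:

import json

# Machine-readable relationship export (listed under FILES GENERATED below)
with open('relationship_graph.json', 'w') as f:
    json.dump(relationships, f, indent=2)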
Context Validation
Before proceeding, verify:
- Database connection works or schema export is complete
- Target tables/schemas are clearly defined
- Have permissions to query INFORMATION_SCHEMA
- Understand which relationships are explicit vs inferred
- Know what documentation format is needed
Output Template
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SCHEMA MAPPING REPORT
Database: production_db
Schema: public
Generated: 2025-01-11
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 SCHEMA OVERVIEW
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total Tables: 42
Total Columns: 387
Total Relationships: 56
- Explicit (FK): 38
- Inferred: 18
🗂️ TABLE CATEGORIES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Fact Tables (fct_*): 8
Dimension Tables (dim_*): 12
Staging Tables (stg_*): 15
Raw Tables (raw_*): 7
📋 KEY TABLES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. users (dim_users)
- PK: id
- Columns: 15
- Referenced by: orders, sessions, events
2. orders (fct_orders)
- PK: id
- FK: user_id → users.id, product_id → products.id
- Columns: 23
3. products (dim_products)
- PK: id
- Columns: 12
- Referenced by: orders, cart_items
🔗 RELATIONSHIP MAP
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
orders → users (user_id)
orders → products (product_id)
sessions → users (user_id)
events → sessions (session_id)
cart_items → products (product_id)
🛤️ COMMON JOIN PATHS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
users → orders:
JOIN orders ON users.id = orders.user_id
orders → products:
JOIN products ON orders.product_id = products.id
users → events (via sessions):
JOIN sessions ON users.id = sessions.user_id
JOIN events ON sessions.id = events.session_id
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FILES GENERATED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ schema_erd.mmd (Mermaid ERD)
✓ data_dictionary.csv (All columns)
✓ schema_quick_reference.md (Join guide)
✓ relationship_graph.json (Machine-readable)
Common Scenarios
Scenario 1: "New to the database, need overview"
→ Full schema map with ERD and data dictionary → Highlight fact and dimension tables → Show most common join patterns
Scenario 2: "How do I join Table A to Table B?"
→ Use find_join_path to show exact SQL → If no direct path, show multi-hop joins → Validate path makes business sense
Scenario 3: "Document schema for new team members"
→ Generate comprehensive quick reference → Include business context for key tables → Create visual ERD for overview
Scenario 4: "Find all tables related to users"
→ Traverse relationship graph from users table → Show both direct and indirect relationships → Categorize by relationship type
Scenario 5: "Validate schema against expectations"
→ Compare actual schema to documented schema → Flag missing tables, columns, relationships → Identify schema drift
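For Scenario 5, a minimal sketch of diffing the discovered schema against an expected spec; the expected dict here is a hypothetical example you would replace with your documented schema:

# Sketch: compare the live schema to an expected table → column spec
expected = {
    'users': {'id', 'email', 'created_at'},
    'orders': {'id', 'user_id', 'total'},
}

for table, expected_cols in expected.items():
    if table not in schema_metadata:
        print(f"❌ Missing table: {table}")
        continue
    actual_cols = {c['name'] for c in schema_metadata[table]['columns']}
    for col in sorted(expected_cols - actual_cols):
        print(f"⚠️ {table}: missing column {col}")
    for col in sorted(actual_cols - expected_cols):
        print(f"ℹ️ {table}: undocumented column {col}")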
Handling Missing Context
User provides database but no context: "I'll map the entire schema and present an overview. You can then tell me which areas to explore in depth."
User has partial access (read-only): "I'll work with INFORMATION_SCHEMA queries which don't require write permissions. This will show structure but not data samples."
User has dbt project instead of database: "Great! I'll extract schema from your dbt models and documentation. This often has better business context than raw database schema."
User wants specific tables only: "I'll focus on those tables and map their direct relationships. Let me know if you want me to expand to related tables."
Foreign keys not enforced: "I'll infer relationships from column naming patterns. Review the 'inferred' relationships and confirm which are correct."
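For the unenforced-FK case, one quick way to surface inferred links for review, reusing the relationships list from Step 3:

# Surface inferred (non-FK) relationships for human review
for rel in relationships:
    if rel['type'] != 'explicit_fk':
        print(f"❓ {rel['from_table']}.{rel['from_column']} → "
              f"{rel['to_table']}.{rel['to_column']} "
              f"({rel['confidence']} confidence)")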
Advanced Options
After basic schema mapping, offer:
Lineage Tracking: "Want me to trace data lineage? I can show how data flows from raw tables through transformations to final fact tables."
Schema Comparison: "I can compare schemas across environments (dev vs prod) or over time to identify changes."
Query Pattern Analysis: "If you provide query logs, I can identify most-used join patterns and optimize documentation around them."
dbt Integration: "I can generate dbt schema.yml files from this mapping to document your models."
Auto-Documentation: "I can set up automated schema documentation that updates when schema changes."
Performance Insights: "Based on table sizes and join patterns, I can identify potential performance bottlenecks."