schema-mapper
Database schema understanding and relationship mapping. Use when exploring unfamiliar databases, documenting table relationships, identifying join paths, or generating ERD documentation for existing schemas.
When & Why to Use This Skill
The Schema Mapper skill automates the discovery, documentation, and visualization of complex database structures. It enables users to explore unfamiliar databases by mapping table relationships, identifying optimal join paths, and generating comprehensive ERD documentation. By combining explicit foreign key detection with intelligent naming pattern inference, it provides a clear, actionable overview of data architectures, significantly reducing the time required for database onboarding and query preparation.
Use Cases
- Onboarding to Unfamiliar Databases: Quickly visualize and understand the structure of a new project's database without relying on outdated documentation.
- Automated ERD Generation: Generate visual Mermaid diagrams and documentation to keep technical wikis and data dictionaries up to date.
- Complex Join Path Discovery: Automatically identify the shortest and most efficient SQL join sequences between distant tables in a large schema.
- Data Dictionary Creation: Export a comprehensive catalog of tables, columns, data types, and constraints into CSV or Markdown formats.
- Implicit Relationship Mapping: Detect hidden links between tables based on naming conventions (e.g., user_id to users.id) even when formal foreign keys are missing.
Schema Mapper
Quick Start
Automatically discover, document, and visualize database schemas including tables, columns, relationships, and join paths. Essential for understanding unfamiliar databases or creating documentation.
Context Requirements
Before mapping the schema, I need:
- Database Access: Connection details or schema export
- Scope: Which tables/schemas to map (or all)
- Documentation Goal: What you need (ERD, join paths, data dictionary, etc.)
- Known Relationships (optional): Explicit foreign keys or implicit relationships
Context Gathering
For Database Access:
"I can map your schema from:
Option 1 - Direct Database Connection:
connection_details = {
    'host': 'your-db.example.com',
    'database': 'production',
    'user': 'readonly_user',
    'password': '***',
    'type': 'postgresql'  # or mysql, snowflake, bigquery, redshift
}
Option 2 - Schema Export:
-- PostgreSQL:
SELECT table_schema, table_name, column_name, data_type,
       is_nullable, column_default
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
-- Also export constraints:
SELECT * FROM information_schema.table_constraints;
SELECT * FROM information_schema.key_column_usage;
Option 3 - dbt Project:
Share your dbt_project.yml and models/ directory - I'll extract schema from dbt docs
Option 4 - Existing Documentation: Share any ERDs, data dictionaries, or schema docs you have"
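If you go with Option 1, here is a minimal sketch of turning those details into the connection_string used in Step 1. The drivername below is an assumption for PostgreSQL; swap in the dialect and driver for your database (e.g., 'mysql+pymysql').

from sqlalchemy.engine import URL

# Build the connection_string used in Step 1 from the Option 1 details.
# drivername='postgresql+psycopg2' is an assumption for the PostgreSQL case.
connection_string = URL.create(
    drivername='postgresql+psycopg2',
    username=connection_details['user'],
    password=connection_details['password'],
    host=connection_details['host'],
    database=connection_details['database'],
)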
For Scope:
"Should I map:
- All tables in the database?
- Specific schema (e.g., 'public', 'analytics', 'staging')?
- Specific tables (list the important ones)?
- Tables matching pattern (e.g., all 'fct_*' and 'dim_*' tables)?"
For Documentation Goal:
"What do you need from this schema mapping?
Common Goals:
- Visual ERD - See table relationships graphically
- Join Path Finder - How to join Table A to Table B
- Data Dictionary - Complete catalog of tables and columns
- Lineage Map - Understand data flow through tables
- Quick Reference - Cheat sheet for common joins
Which would be most useful?"
For Relationships:
"I'll auto-detect relationships from:
- Foreign key constraints
- Column name patterns (id, user_id, etc.)
- Naming conventions
If you have implicit relationships (not enforced by FK constraints), let me know:
- Example: 'orders.customer_email relates to customers.email but no FK exists'"
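If you confirm implicit relationships like the example above, they can be merged into the auto-detected list. A minimal sketch, using the same dict shape that Step 3's infer_relationships() produces:

# Hypothetical example: declaring a relationship that has no FK constraint.
manual_relationships = [{
    'type': 'declared_implicit',
    'from_table': 'orders',
    'from_column': 'customer_email',
    'to_table': 'customers',
    'to_column': 'email',
    'confidence': 'high'  # high because a human confirmed it
}]
# After Step 3: relationships.extend(manual_relationships)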
Workflow
Step 1: Connect and Discover Schema
import pandas as pd
from sqlalchemy import create_engine, inspect

# Connect to the database (connection_string is built from the details above)
engine = create_engine(connection_string)
inspector = inspect(engine)

# Get all schemas
schemas = inspector.get_schema_names()
print(f"📚 Schemas found: {schemas}")

# Get tables in the target schema
tables = inspector.get_table_names(schema='public')
print(f"📊 Tables in 'public': {len(tables)}")
for table in sorted(tables):
    print(f"  - {table}")
Checkpoint: "Found {N} tables in schema. Does this look right? Any tables missing or unexpected?"
Step 2: Extract Table Metadata
def extract_table_metadata(engine, schema, table_name):
    """Extract comprehensive metadata for a single table."""
    inspector = inspect(engine)

    # Get columns
    columns = inspector.get_columns(table_name, schema=schema)

    # Get primary keys
    pk = inspector.get_pk_constraint(table_name, schema=schema)
    pk_columns = pk.get('constrained_columns', [])

    # Get foreign keys
    fks = inspector.get_foreign_keys(table_name, schema=schema)

    # Get indexes
    indexes = inspector.get_indexes(table_name, schema=schema)

    # Get unique constraints
    unique = inspector.get_unique_constraints(table_name, schema=schema)

    return {
        'table_name': table_name,
        'schema': schema,
        'columns': columns,
        'primary_keys': pk_columns,
        'foreign_keys': fks,
        'indexes': indexes,
        'unique_constraints': unique
    }

# Extract metadata for all tables
schema_metadata = {}
for table in tables:
    schema_metadata[table] = extract_table_metadata(engine, 'public', table)
    print(f"✓ Extracted metadata for {table}")
Step 3: Infer Relationships
def infer_relationships(schema_metadata):
    """
    Infer relationships from:
    1. Explicit foreign keys
    2. Column naming patterns (user_id → users.id)
    3. Common audit patterns (created_by → users.id)
    """
    relationships = []
    for table_name, metadata in schema_metadata.items():
        # Explicit FKs (note: only the first column of composite keys is captured)
        for fk in metadata['foreign_keys']:
            relationships.append({
                'type': 'explicit_fk',
                'from_table': table_name,
                'from_column': fk['constrained_columns'][0],
                'to_table': fk['referred_table'],
                'to_column': fk['referred_columns'][0],
                'confidence': 'high'
            })
        # Inferred from naming patterns
        for col in metadata['columns']:
            col_name = col['name']
            # Pattern: <singular>_id → <singular>s.id
            if col_name.endswith('_id') and col_name != 'id':
                potential_table = col_name[:-3] + 's'  # user_id → users
                if potential_table in schema_metadata:
                    relationships.append({
                        'type': 'inferred_naming',
                        'from_table': table_name,
                        'from_column': col_name,
                        'to_table': potential_table,
                        'to_column': 'id',
                        'confidence': 'medium'
                    })
    return relationships

relationships = infer_relationships(schema_metadata)
print(f"🔗 Found {len(relationships)} relationships")
Step 4: Generate Data Dictionary
def generate_data_dictionary(schema_metadata):
    """Create a comprehensive data dictionary."""
    dictionary = []
    for table_name, metadata in schema_metadata.items():
        for col in metadata['columns']:
            dictionary.append({
                'table': table_name,
                'column': col['name'],
                'type': str(col['type']),
                'nullable': col['nullable'],
                'default': col.get('default'),
                'is_pk': col['name'] in metadata['primary_keys'],
                'is_fk': any(col['name'] in fk['constrained_columns']
                             for fk in metadata['foreign_keys'])
            })
    df_dict = pd.DataFrame(dictionary)
    return df_dict

data_dict = generate_data_dictionary(schema_metadata)
print("\n📖 Data Dictionary:")
print(data_dict.head(20))

# Save to CSV
data_dict.to_csv('data_dictionary.csv', index=False)
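Since the use cases mention Markdown as well as CSV, the same DataFrame can be exported with pandas' to_markdown (which requires the optional tabulate package):

# Optional Markdown export (pandas' to_markdown requires tabulate)
data_dict.to_markdown('data_dictionary.md', index=False)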
Step 5: Find Join Paths
def find_join_path(from_table, to_table, relationships):
    """
    Find how to join from_table to to_table,
    using BFS to find the shortest path.
    """
    from collections import deque

    # Build adjacency graph (directed: edges follow FK direction)
    graph = {}
    for rel in relationships:
        if rel['from_table'] not in graph:
            graph[rel['from_table']] = []
        graph[rel['from_table']].append({
            'to_table': rel['to_table'],
            'from_col': rel['from_column'],
            'to_col': rel['to_column'],
            'type': rel['type']
        })

    # BFS to find the shortest path
    queue = deque([(from_table, [])])
    visited = {from_table}
    while queue:
        current_table, path = queue.popleft()
        if current_table == to_table:
            return path
        if current_table in graph:
            for edge in graph[current_table]:
                if edge['to_table'] not in visited:
                    visited.add(edge['to_table'])
                    queue.append((edge['to_table'], path + [edge]))
    return None  # No path found

# Example: How do we join orders to customers?
path = find_join_path('orders', 'customers', relationships)
if path:
    print("\n🛤️ Join Path: orders → customers")
    for i, step in enumerate(path, 1):
        # The left side of each ON clause is the previous table in the path
        left_table = path[i - 2]['to_table'] if i > 1 else 'orders'
        print(f"  Step {i}: JOIN {step['to_table']} "
              f"ON {left_table}.{step['from_col']} = "
              f"{step['to_table']}.{step['to_col']}")
else:
    print("❌ No join path found")
Step 6: Generate ERD
def generate_erd_mermaid(schema_metadata, relationships):
    """Generate a Mermaid ERD diagram."""
    mermaid = ["erDiagram"]

    # Add tables with columns
    for table_name, metadata in schema_metadata.items():
        mermaid.append(f"    {table_name.upper()} {{")
        for col in metadata['columns'][:10]:  # Limit to the first 10 columns
            col_type = str(col['type'])[:20].replace(' ', '_')  # Mermaid types can't contain spaces
            markers = []
            if col['name'] in metadata['primary_keys']:
                markers.append('PK')
            if any(col['name'] in fk['constrained_columns']
                   for fk in metadata['foreign_keys']):
                markers.append('FK')
            marker = ' ' + ', '.join(markers) if markers else ''
            mermaid.append(f"        {col_type} {col['name']}{marker}")
        mermaid.append("    }")

    # Add relationships: the referenced table goes on the "one" side,
    # the referencing table on the "many" side
    for rel in relationships:
        if rel['type'] == 'explicit_fk':
            mermaid.append(
                f"    {rel['to_table'].upper()} ||--o{{ "
                f"{rel['from_table'].upper()} : {rel['from_column']}"
            )
    return "\n".join(mermaid)

erd = generate_erd_mermaid(schema_metadata, relationships)
print("\n📊 ERD (Mermaid format):")
print(erd)

# Save to file
with open('schema_erd.mmd', 'w') as f:
    f.write(erd)
Step 7: Generate Quick Reference Guide
def generate_quick_reference(schema_metadata, relationships):
    """Create a quick reference for common joins."""
    ref = []
    ref.append("# Schema Quick Reference\n")

    # Table summary
    ref.append("## Tables Overview\n")
    for table_name, metadata in sorted(schema_metadata.items()):
        row_count = f"(~{metadata['row_count']} rows)" if 'row_count' in metadata else ""
        ref.append(f"- **{table_name}** {row_count}")
        ref.append(f"  - Primary Key: {', '.join(metadata['primary_keys']) or 'None'}")
        ref.append(f"  - Columns: {len(metadata['columns'])}\n")

    # Common joins
    ref.append("\n## Common Join Patterns\n")

    # Group relationships by from_table
    joins_by_table = {}
    for rel in relationships:
        if rel['from_table'] not in joins_by_table:
            joins_by_table[rel['from_table']] = []
        joins_by_table[rel['from_table']].append(rel)

    for table, rels in sorted(joins_by_table.items()):
        ref.append(f"\n### From {table}:\n")
        for rel in rels:
            ref.append(
                f"```sql\n"
                f"JOIN {rel['to_table']} ON {table}.{rel['from_column']} = "
                f"{rel['to_table']}.{rel['to_column']}\n"
                f"```\n"
            )
    return "\n".join(ref)

quick_ref = generate_quick_reference(schema_metadata, relationships)
print(quick_ref)

# Save to markdown
with open('schema_quick_reference.md', 'w') as f:
    f.write(quick_ref)
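The report template below also lists relationship_graph.json among the generated files; a minimal sketch of producing that machine-readable export from the relationships list:

import json

# Machine-readable relationship export (listed under FILES GENERATED below)
with open('relationship_graph.json', 'w') as f:
    json.dump(relationships, f, indent=2)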
Context Validation
Before proceeding, verify:
- Database connection works or schema export is complete
- Target tables/schemas are clearly defined
- Have permissions to query INFORMATION_SCHEMA
- Understand which relationships are explicit vs inferred
- Know what documentation format is needed
Output Template
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SCHEMA MAPPING REPORT
Database: production_db
Schema: public
Generated: 2025-01-11
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 SCHEMA OVERVIEW
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total Tables: 42
Total Columns: 387
Total Relationships: 56
- Explicit (FK): 38
- Inferred: 18
🗂️ TABLE CATEGORIES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Fact Tables (fct_*): 8
Dimension Tables (dim_*): 12
Staging Tables (stg_*): 15
Raw Tables (raw_*): 7
📋 KEY TABLES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. users (dim_users)
- PK: id
- Columns: 15
- Referenced by: orders, sessions, events
2. orders (fct_orders)
- PK: id
- FK: user_id → users.id, product_id → products.id
- Columns: 23
3. products (dim_products)
- PK: id
- Columns: 12
- Referenced by: orders, cart_items
🔗 RELATIONSHIP MAP
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
orders → users (user_id)
orders → products (product_id)
sessions → users (user_id)
events → sessions (session_id)
cart_items → products (product_id)
🛤️ COMMON JOIN PATHS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
users → orders:
JOIN orders ON users.id = orders.user_id
orders → products:
JOIN products ON orders.product_id = products.id
users → events (via sessions):
JOIN sessions ON users.id = sessions.user_id
JOIN events ON sessions.id = events.session_id
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FILES GENERATED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ schema_erd.mmd (Mermaid ERD)
✓ data_dictionary.csv (All columns)
✓ schema_quick_reference.md (Join guide)
✓ relationship_graph.json (Machine-readable)
Common Scenarios
Scenario 1: "New to the database, need overview"
→ Full schema map with ERD and data dictionary → Highlight fact and dimension tables → Show most common join patterns
Scenario 2: "How do I join Table A to Table B?"
→ Use find_join_path to show exact SQL → If no direct path, show multi-hop joins → Validate path makes business sense
Scenario 3: "Document schema for new team members"
→ Generate comprehensive quick reference → Include business context for key tables → Create visual ERD for overview
Scenario 4: "Find all tables related to users"
→ Traverse relationship graph from users table → Show both direct and indirect relationships → Categorize by relationship type
Scenario 5: "Validate schema against expectations"
→ Compare actual schema to documented schema → Flag missing tables, columns, relationships → Identify schema drift
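For Scenario 5, a minimal sketch of diffing the discovered schema against an expected spec; the expected dict here is a hypothetical example you would replace with your documented schema:

# Sketch: compare the live schema to an expected table → column spec
expected = {
    'users': {'id', 'email', 'created_at'},
    'orders': {'id', 'user_id', 'total'},
}

for table, expected_cols in expected.items():
    if table not in schema_metadata:
        print(f"❌ Missing table: {table}")
        continue
    actual_cols = {c['name'] for c in schema_metadata[table]['columns']}
    for col in sorted(expected_cols - actual_cols):
        print(f"⚠️ {table}: missing column {col}")
    for col in sorted(actual_cols - expected_cols):
        print(f"ℹ️ {table}: undocumented column {col}")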
Handling Missing Context
User provides database but no context: "I'll map the entire schema and present an overview. You can then tell me which areas to explore in depth."
User has partial access (read-only): "I'll work with INFORMATION_SCHEMA queries which don't require write permissions. This will show structure but not data samples."
User has dbt project instead of database: "Great! I'll extract schema from your dbt models and documentation. This often has better business context than raw database schema."
User wants specific tables only: "I'll focus on those tables and map their direct relationships. Let me know if you want me to expand to related tables."
Foreign keys not enforced: "I'll infer relationships from column naming patterns. Review the 'inferred' relationships and confirm which are correct."
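For the unenforced-FK case, one quick way to surface inferred links for review, reusing the relationships list from Step 3:

# Surface inferred (non-FK) relationships for human review
for rel in relationships:
    if rel['type'] != 'explicit_fk':
        print(f"❓ {rel['from_table']}.{rel['from_column']} → "
              f"{rel['to_table']}.{rel['to_column']} "
              f"({rel['confidence']} confidence)")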
Advanced Options
After basic schema mapping, offer:
Lineage Tracking: "Want me to trace data lineage? I can show how data flows from raw tables through transformations to final fact tables."
Schema Comparison: "I can compare schemas across environments (dev vs prod) or over time to identify changes."
Query Pattern Analysis: "If you provide query logs, I can identify most-used join patterns and optimize documentation around them."
dbt Integration: "I can generate dbt schema.yml files from this mapping to document your models."
Auto-Documentation: "I can set up automated schema documentation that updates when schema changes."
Performance Insights: "Based on table sizes and join patterns, I can identify potential performance bottlenecks."