career-growth
Portfolio building, interviews, job search, continuous learning
## When & Why to Use This Skill
This Claude skill is a career advancement and interview preparation toolkit for data engineering professionals. It helps navigate a competitive technical job market by providing structured roadmaps for portfolio building, technical interview preparation (SQL/Python), and resume optimization using the STAR method.
## Use Cases
- Developing a high-quality Data Engineering portfolio with end-to-end ETL, streaming, and cloud-native project templates.
- Preparing for technical interviews with curated coding patterns for SQL window functions, rate limiters, and system design architectures.
- Optimizing professional resumes and LinkedIn profiles by quantifying technical achievements and aligning with industry-standard tech stacks.
- Mapping out long-term professional development from Junior to Staff level with targeted learning paths and skill certification checklists.
| Field | Value |
|---|---|
| name | career-growth |
| description | Portfolio building, technical interviews, job search strategies, and continuous learning |
| sasmp_version | "1.3.0" |
| bonded_agent | 01-data-engineer |
| bond_type | SUPPORT_BOND |
| skill_version | "2.0.0" |
| last_updated | "2025-01" |
| complexity | foundational |
| estimated_mastery_hours | 40 |
| prerequisites | [] |
| unlocks | [] |
# Career Growth
Professional development strategies for data engineering career advancement.
## Quick Start
# Data Engineer Portfolio Checklist
## Required Projects (Pick 3-5)
- [ ] End-to-end ETL pipeline (Airflow + dbt; see the starter DAG sketch at the end of this Quick Start)
- [ ] Real-time streaming project (Kafka/Spark Streaming)
- [ ] Data warehouse design (Snowflake/BigQuery)
- [ ] ML pipeline with MLOps (MLflow)
- [ ] API for data access (FastAPI)
## Documentation Template
Each project should include:
1. Problem statement
2. Architecture diagram
3. Tech stack justification
4. Challenges & solutions
5. Results/metrics
6. GitHub link with clean code
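To make the first checklist item concrete, here is a minimal sketch of the orchestration skeleton such a project might start from. It assumes Airflow 2.x driving a dbt project through `BashOperator`; the DAG id, schedule, script names, and commands are hypothetical placeholders, not a prescribed layout.

```python
# Hypothetical skeleton for the end-to-end ETL portfolio project.
# Assumes Airflow 2.x; dag_id, scripts, and commands are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="portfolio_daily_etl",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # on Airflow < 2.4, use schedule_interval instead
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    load = BashOperator(task_id="load", bash_command="python load.py")
    transform = BashOperator(task_id="dbt_build", bash_command="dbt build")

    extract >> load >> transform
```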
## Core Concepts
### 1. Technical Interview Preparation
# Common coding patterns for data engineering interviews
# 1. SQL Window Functions
"""
Write a query to find the running total of sales by month,
and the percentage change from the previous month.
"""
sql = """
SELECT
    month,
    sales,
    SUM(sales) OVER (ORDER BY month) AS running_total,
    100.0 * (sales - LAG(sales) OVER (ORDER BY month))
        / NULLIF(LAG(sales) OVER (ORDER BY month), 0) AS pct_change
FROM monthly_sales
ORDER BY month;
"""
# 2. Data Processing - Find duplicates
def find_duplicates(data: list[dict], key: str) -> list[dict]:
    """Find duplicate records based on a key."""
    seen = {}
    duplicates = []
    for record in data:
        k = record[key]
        if k in seen:
            duplicates.append(record)
        else:
            seen[k] = record
    return duplicates
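# Quick sanity check (sample records are made up for illustration):
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},
]
assert find_duplicates(rows, "email") == [{"id": 3, "email": "a@example.com"}]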
# 3. Implement rate limiter
from collections import defaultdict
import time
class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        now = time.time()
        # Remove old requests
        self.requests[user_id] = [
            t for t in self.requests[user_id]
            if now - t < self.window
        ]
        if len(self.requests[user_id]) < self.max_requests:
            self.requests[user_id].append(now)
            return True
        return False
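# Usage sketch: allow at most 3 requests per user in any 10-second window.
# This is a sliding-window log; a common follow-up is to contrast it with
# a token bucket, which needs O(1) memory per user instead of a timestamp list.
limiter = RateLimiter(max_requests=3, window_seconds=10)
print([limiter.is_allowed("user-1") for _ in range(4)])  # [True, True, True, False]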
# 4. Design question: Data pipeline for e-commerce
"""
Requirements:
- Process 1M orders/day
- Real-time dashboard updates
- Historical analytics
Architecture:
1. Ingestion: Kafka for real-time events
2. Processing: Spark Streaming for aggregations
3. Storage: Delta Lake for ACID, Snowflake for analytics
4. Serving: Redis for real-time metrics, API for dashboards
"""
### 2. Resume Optimization
## Data Engineer Resume Template
### Summary
Data Engineer with X years of experience building scalable data pipelines
processing Y TB/day. Expert in [Spark/Airflow/dbt]. Reduced pipeline
latency by Z% at [Company].
### Experience Format (STAR Method)
**Senior Data Engineer** | Company | 2022-Present
- **Situation**: Legacy ETL system processing 500GB daily with 4-hour latency
- **Task**: Redesign for real-time analytics
- **Action**: Built Spark Streaming pipeline with Delta Lake, implemented
incremental processing
- **Result**: Reduced latency to 5 minutes, cut infrastructure costs by 40%
### Skills Section
**Languages**: Python, SQL, Scala
**Frameworks**: Spark, Airflow, dbt, Kafka
**Databases**: PostgreSQL, Snowflake, MongoDB, Redis
**Cloud**: AWS (Glue, EMR, S3), GCP (BigQuery, Dataflow)
**Tools**: Docker, Kubernetes, Terraform, Git
### Quantify Everything
- "Built data pipeline" → "Built pipeline processing 2TB/day with 99.9% uptime"
- "Improved performance" → "Reduced query time from 30min to 30sec (60x improvement)"
### 3. Interview Questions to Ask
## Questions for Data Engineering Interviews
### About the Team
- What does a typical data pipeline look like here?
- How do you handle data quality issues?
- What's the tech stack? Any planned migrations?
### About the Role
- What would success look like in 6 months?
- What's the biggest data challenge the team faces?
- How do data engineers collaborate with data scientists?
### About Engineering Practices
- How do you handle schema changes in production?
- What's your approach to testing data pipelines?
- How do you manage technical debt?
### Red Flags to Watch For
- "We don't have time for testing"
- "One person handles all the data infrastructure"
- "We're still on [very outdated technology]"
- Vague answers about on-call and incident response
### 4. Learning Path by Experience Level
## Career Progression
### Junior (0-2 years)
Focus Areas:
- SQL proficiency (complex queries, optimization)
- Python for data processing
- One cloud platform in depth (AWS or GCP)
- Git and basic CI/CD
- Understanding ETL patterns
### Mid-Level (2-5 years)
Focus Areas:
- Distributed systems (Spark)
- Data modeling (dimensional, Data Vault)
- Orchestration (Airflow)
- Infrastructure as Code
- Data quality frameworks
### Senior (5+ years)
Focus Areas:
- System design and architecture
- Cost optimization at scale
- Team leadership and mentoring
- Cross-functional collaboration
- Vendor evaluation and selection
### Staff/Principal (8+ years)
Focus Areas:
- Organization-wide data strategy
- Building data platforms
- Technical roadmap ownership
- Industry thought leadership
## Resources
### Books
- "Fundamentals of Data Engineering" - Reis & Housley
- "Designing Data-Intensive Applications" - Kleppmann
- "The Data Warehouse Toolkit" - Kimball
## Best Practices
# ✅ DO:
- Build public projects on GitHub
- Write technical blog posts
- Contribute to open source
- Network at meetups/conferences
- Keep skills current (follow trends)
# ❌ DON'T:
- Apply without tailoring resume
- Neglect soft skills
- Stop learning after getting hired
- Ignore feedback from interviews
- Burn bridges when leaving jobs
## Skill Certification Checklist
- [ ] Have 3+ portfolio projects on GitHub
- [ ] Can explain system design decisions
- [ ] Can solve SQL problems efficiently
- [ ] Have updated LinkedIn and resume
- [ ] Active in data engineering community