Data Science Expert
Comprehensive data science frameworks for analytics, machine learning, and data-driven decision making.
Data Strategy
Data Maturity Model
| Level |
Name |
Characteristics |
| 1 |
Ad Hoc |
Manual, inconsistent, siloed |
| 2 |
Opportunistic |
Some automation, point solutions |
| 3 |
Systematic |
Defined processes, governance emerging |
| 4 |
Differentiating |
Data-driven decisions, advanced analytics |
| 5 |
Transformative |
AI-first, competitive advantage |
Analytics Value Chain
DATA → INFORMATION → INSIGHT → ACTION → VALUE
PROGRESSION:
Descriptive: What happened?
Diagnostic: Why did it happen?
Predictive: What will happen?
Prescriptive: What should we do?
Autonomous: Self-optimizing systems
Statistical Analysis
Descriptive Statistics
CENTRAL TENDENCY:
- Mean: Sum / Count (sensitive to outliers)
- Median: Middle value (robust to outliers)
- Mode: Most frequent value
DISPERSION:
- Range: Max - Min
- Variance: Average squared deviation
- Standard Deviation: √Variance
- IQR: Q3 - Q1 (robust)
DISTRIBUTION SHAPE:
- Skewness: Asymmetry (0 = symmetric)
- Kurtosis: Tail heaviness (3 = normal)
For detailed inferential statistics and hypothesis testing, see Statistical Methods Reference.
Machine Learning
Algorithm Selection
| Task |
Algorithms |
When to Use |
| Classification |
Logistic Regression, Random Forest, XGBoost, Neural Networks |
Categorical outcomes |
| Regression |
Linear Regression, Ridge/Lasso, Random Forest, XGBoost |
Continuous outcomes |
| Clustering |
K-Means, Hierarchical, DBSCAN |
Group discovery |
| Dimensionality Reduction |
PCA, t-SNE, UMAP |
Feature reduction, visualization |
| Anomaly Detection |
Isolation Forest, One-Class SVM, Autoencoders |
Outlier detection |
| Time Series |
ARIMA, Prophet, LSTM |
Sequential data |
| Recommendation |
Collaborative Filtering, Content-Based, Matrix Factorization |
Personalization |
| NLP |
Transformers, BERT, GPT |
Text understanding/generation |
For detailed ML pipelines, feature engineering, and model monitoring, see ML Pipelines Reference.
Data Governance
Data Governance Framework
GOVERNANCE PILLARS:
POLICIES:
- Data ownership
- Data classification
- Data retention
- Data access
- Data quality standards
ROLES:
- Data Owner: Accountable for data domain
- Data Steward: Day-to-day quality management
- Data Custodian: Technical implementation
- Data Consumer: End user
PROCESSES:
- Data cataloging
- Metadata management
- Data lineage
- Issue resolution
- Change management
METRICS:
- Data quality scores
- Policy compliance
- Data access requests
- Issue resolution time
Data Quality Dimensions
| Dimension |
Definition |
Measurement |
| Accuracy |
Correct representation of reality |
% records matching source |
| Completeness |
All required data present |
% non-null values |
| Consistency |
Same across systems |
% matching across sources |
| Timeliness |
Available when needed |
Latency, freshness |
| Validity |
Conforms to format/rules |
% passing validation |
| Uniqueness |
No unwanted duplicates |
Duplicate rate |
Business Intelligence
BI Architecture
ARCHITECTURE LAYERS:
DATA SOURCES:
- Operational systems
- External data
- IoT/streaming
DATA INTEGRATION:
- ETL/ELT pipelines
- Data lakes
- Data warehouses
SEMANTIC LAYER:
- Business definitions
- Calculated metrics
- Hierarchies
- Relationships
PRESENTATION:
- Dashboards
- Reports
- Ad-hoc analysis
- Embedded analytics
Dashboard Design Principles
DESIGN PRINCIPLES:
PURPOSE:
- One clear objective per dashboard
- Know your audience
- Enable decisions
LAYOUT:
- Most important top-left
- Related items grouped
- Progressive disclosure
- Whitespace for clarity
VISUALS:
- Right chart for data type
- Consistent formatting
- Minimal decoration
- Color with purpose
INTERACTIVITY:
- Filters for exploration
- Drill-down capability
- Cross-filtering
- Tooltip details
Metric Design
METRIC DEFINITION TEMPLATE:
NAME: [Metric name]
DEFINITION: [Clear business definition]
FORMULA: [Precise calculation]
OWNER: [Responsible person]
DATA SOURCE: [Where it comes from]
GRAIN: [Level of detail]
FREQUENCY: [Update cadence]
DIMENSIONS: [Slicing attributes]
TARGETS: [Goals/benchmarks]
RELATED: [Related metrics]
Predictive Modeling
Use Case Framework
| Use Case |
Business Application |
Approach |
| Churn Prediction |
Retention programs |
Classification |
| Demand Forecasting |
Inventory planning |
Time series |
| Lead Scoring |
Sales prioritization |
Classification |
| Price Optimization |
Revenue management |
Regression/RL |
| Fraud Detection |
Risk mitigation |
Anomaly detection |
| Recommendation |
Personalization |
Collaborative filtering |
| Customer Segmentation |
Marketing targeting |
Clustering |
| Lifetime Value |
Customer investment |
Regression |
Data Ethics & Privacy
Ethical AI Framework
PRINCIPLES:
FAIRNESS:
- No discriminatory outcomes
- Bias testing across groups
- Regular auditing
ACCOUNTABILITY:
- Clear ownership
- Decision audit trails
- Escalation process
TRANSPARENCY:
- Explainable decisions
- Clear documentation
- User communication
PRIVACY:
- Data minimization
- Consent management
- Security controls
Bias Detection
BIAS TYPES:
HISTORICAL: Reflects past discrimination
REPRESENTATION: Training data not representative
MEASUREMENT: Proxy variables correlate with protected attributes
AGGREGATION: Single model for diverse populations
EVALUATION: Inappropriate benchmarks
FAIRNESS METRICS:
- Demographic Parity: Equal positive rates
- Equalized Odds: Equal TPR and FPR
- Individual Fairness: Similar inputs, similar outputs
- Calibration: Equal accuracy across groups
Analytics Team Structure
Team Roles
| Role |
Focus |
Skills |
| Data Engineer |
Pipelines, infrastructure |
SQL, Python, Spark, Cloud |
| Data Analyst |
Reporting, ad-hoc analysis |
SQL, BI tools, Statistics |
| Data Scientist |
Modeling, ML |
Python/R, ML, Statistics |
| ML Engineer |
Model deployment |
MLOps, Software Engineering |
| Analytics Engineer |
Data modeling |
dbt, SQL, Data Modeling |
Operating Models
| Model |
Description |
Best For |
| Centralized |
Single analytics team |
Consistency, efficiency |
| Decentralized |
Embedded in business units |
Business alignment |
| Hub & Spoke |
Central CoE + embedded |
Balance of both |
| Federated |
Shared platform, domain teams |
Scale with autonomy |
References
See Also