🧹 Data Cleaning Skills

Browse skills in the Data Cleaning category.

programmatic-eda

from nimrodfisher

Systematic exploratory data analysis following best practices. Use when analyzing any dataset to understand structure, identify data quality issues (duplicates, missing values, inconsistencies, outliers), examine distributions, detect correlations, and generate visualizations. Provides comprehensive data profiling with sanity checks before analysis.

[Data Cleaning]

pandas-data-analyzer

from aliemrevezir

Performs data analysis, data cleaning, exploratory data analysis (EDA), and visualization using the Pandas library. Use this skill when the user shares .csv, .xlsx, or .json files or gives commands such as "analyze the data", "use pandas", or "create a dataframe".

[Data Cleaning]

data-quality-audit

from nimrodfisher

Comprehensive data quality assessment against defined business rules and constraints. Use when validating data against expected schemas, checking referential integrity across tables, or auditing data pipeline outputs before production use.

[Data Cleaning]

nixtla-schema-mapper

from intent-solutions-io

Transform data sources to Nixtla schema (unique_id, ds, y) with column inference. Use when preparing data for forecasting. Trigger with 'map to Nixtla schema' or 'transform data'.

[Data Cleaning]
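For reference, Nixtla's forecasting libraries expect long-format data with one row per series and timestamp: unique_id (series identifier), ds (timestamp), and y (target value). A minimal sketch of name-based column inference in plain Python follows; the synonym lists and the helpers infer_mapping and to_nixtla are hypothetical and are not taken from the skill itself.

```python
# Hypothetical synonym lists used to guess which source column maps to each
# Nixtla field. A real mapper would likely also inspect dtypes and values.
SYNONYMS = {
    "unique_id": ["unique_id", "series_id", "id", "store", "sku", "ticker"],
    "ds": ["ds", "date", "timestamp", "datetime", "time"],
    "y": ["y", "target", "value", "sales", "close", "price"],
}


def infer_mapping(columns):
    """Return {nixtla_field: source_column}, matching names case-insensitively."""
    by_lower = {c.lower(): c for c in columns}
    mapping = {}
    for field, candidates in SYNONYMS.items():
        match = next((by_lower[c] for c in candidates if c in by_lower), None)
        if match is None:
            raise KeyError(f"could not infer a source column for {field!r}")
        mapping[field] = match
    return mapping


def to_nixtla(rows):
    """Rename the columns of a list of dict records to unique_id/ds/y."""
    m = infer_mapping(rows[0].keys())
    return [
        {"unique_id": r[m["unique_id"]], "ds": r[m["ds"]], "y": float(r[m["y"]])}
        for r in rows
    ]


rows = [
    {"Store": "A", "Date": "2024-01-01", "Sales": "120"},
    {"Store": "A", "Date": "2024-01-02", "Sales": "98.5"},
]
print(to_nixtla(rows))
```

Matching on column names alone is the simplest form of inference; it fails loudly with a KeyError rather than guessing when no candidate name is present.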

csv-validator

from CBoser

Validates and fixes BOM CSV files for ECIR tool compatibility. Use when users need to check CSV files before running ECIR comparisons, fix CSV formatting issues, ensure required columns exist, or diagnose why the ECIR tool fails to process a CSV file.

[Data Cleaning]

nixtla-contract-schema-mapper

from intent-solutions-io

Transform prediction market data to Nixtla format (unique_id, ds, y). Use when preparing datasets for forecasting. Trigger with 'convert to Nixtla format' or 'schema mapping'.

[Data Cleaning]

dataset-curator

from eddiebe147

Curate and clean training datasets for high-quality machine learning

[Data Cleaning]

dotted-symbol-exclusion

from smith6jt-cop

Exclude ALL dotted symbols from yfinance sector lookups and training. Trigger when: (1) yfinance errors on warrants/units/class shares, (2) training notebook fails on excluded symbol types, (3) adding new symbol exclusion patterns.

[Data Cleaning]
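A minimal sketch of the exclusion rule, assuming symbols arrive as plain strings and that any "." marks a warrant, unit, or share-class ticker to skip. exclude_dotted is a hypothetical helper, and the ticker list is illustrative only.

```python
def exclude_dotted(symbols):
    """Drop every ticker containing a dot (warrants, units, class shares)."""
    return [s for s in symbols if "." not in s]


# Illustrative tickers only; BRK.B is a class-share symbol.
symbols = ["AAPL", "BRK.B", "MSFT", "ABCD.WS", "ABCD.U"]
print(exclude_dotted(symbols))  # ['AAPL', 'MSFT']
```

Applying one blanket rule ("ALL dotted symbols") before both the sector lookup and training keeps the two stages consistent, so a symbol cannot pass one filter and fail the other.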

pandas-data-processing

from vamseeachanta

Pandas for time series analysis, OrcaFlex results processing, and marine engineering data workflows

[Data Cleaning]

data-engineering

from doanchienthangdev

ML data engineering covering data pipelines, data quality, collection strategies, storage, and versioning for machine learning systems.

[Data Cleaning]

training-data

from doanchienthangdev

Training data management including labeling strategies, data augmentation, handling imbalanced data, and data splitting best practices.

[Data Cleaning]

feature-engineering

from doanchienthangdev

Feature engineering techniques including feature extraction, transformation, selection, and feature store management for ML systems.

[Data Cleaning]

dataset-engineering

from doanchienthangdev

Building and processing datasets - data quality, curation, deduplication, synthesis, annotation, formatting. Use when creating training data, improving data quality, or generating synthetic data.

[Data Cleaning]

result-check

from ljchg12-hue

A tool that checks whether a file you made came out correctly. It automatically finds blank cells, duplicates, and errors!

[Data Cleaning]

field-validation

from majiayu000

Validate data quality in CSV/Excel files for a vehicle insurance platform. Use when checking required fields, validating data formats, detecting quality issues, or generating quality reports. Also use when the user mentions "validate", "check fields", "data quality", "missing values", or "quality score".

[Data Cleaning]

numpy-sorting

from majiayu000

Sorting and searching algorithms including O(n) partitioning, binary search, and hierarchical multi-key sorting. Triggers: sort, argsort, partition, searchsorted, lexsort, NaN sort order.

[Data Cleaning]
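A short NumPy sketch of the operations named in the numpy-sorting entry, using toy arrays chosen for illustration (they are not from the skill):

```python
import numpy as np

a = np.array([7, 2, 9, 1, 5])

# argsort: the indices that would sort the array.
order = np.argsort(a)                    # [3 1 4 0 2]

# partition: average O(n); element k ends up in its sorted position, with
# smaller values before it and larger values after it (unordered otherwise).
k = 2
third_smallest = np.partition(a, k)[k]   # 5

# searchsorted: binary search for the insertion point in a sorted array.
sorted_a = np.sort(a)                    # [1 2 5 7 9]
pos = np.searchsorted(sorted_a, 6)       # 3

# lexsort: multi-key sort; the LAST key in the tuple is the primary key.
group = np.array([1, 0, 1, 0])
score = np.array([2, 3, 1, 1])
idx = np.lexsort((score, group))         # by group, then score: [3 1 2 0]

# NaN sort order: np.sort places NaN values at the end.
with_nan = np.sort(np.array([3.0, np.nan, 1.0]))  # [ 1.  3. nan]
```

partition is the cheaper choice when only the k smallest (or a median) are needed, since it avoids a full O(n log n) sort.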

numpy-set-ops

from majiayu000

Set-theoretic operations for finding unique elements, membership testing, and array intersections. Triggers: unique, isin, intersect1d, setdiff1d, union1d.

[Data Cleaning]
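A minimal NumPy sketch of the set operations listed in the numpy-set-ops entry; the arrays a (observed IDs) and b (a reference list) are toy data, not from the skill:

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1])  # e.g. IDs observed in a data file
b = np.array([2, 3, 4])        # e.g. IDs in a reference table

values, counts = np.unique(a, return_counts=True)  # [1 2 3], [2 1 2]
in_b = np.isin(a, b)           # membership mask: [ True False  True  True False]
valid = a[in_b]                # keep only IDs known to the reference: [3 2 3]
common = np.intersect1d(a, b)  # [2 3]
only_a = np.setdiff1d(a, b)    # IDs missing from the reference: [1]
either = np.union1d(a, b)      # [1 2 3 4]
```

np.unique with return_counts is a quick way to spot duplicated keys (any count above 1), and setdiff1d lists orphan values that fail a reference check.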

staff-mapping-management

from majiayu000

Manage the staff-institution mapping table for a vehicle insurance platform. Use when updating mapping files, resolving name conflicts, converting Excel to JSON, or checking mapping coverage. Also use when the user mentions "update mapping", "staff conflicts", "mapping table", or "institution assignment".

[Data Cleaning]

numpy-masked

from majiayu000

Masked arrays for robust handling of missing or invalid data, ensuring that such values are excluded from statistical and mathematical computations. Triggers: masked array, numpy.ma, missing data, invalid values, hard mask.

[Data Cleaning]
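A minimal numpy.ma sketch of the idea in the numpy-masked entry. The readings array and the -999.0 "missing" sentinel are assumptions made for this example only.

```python
import numpy as np

# Toy sensor readings; -999.0 is an assumed "missing" sentinel for this example.
readings = np.array([4.0, np.nan, 6.0, -999.0, 8.0])

# Mask non-finite values (NaN/inf) and the sentinel in one step.
m = np.ma.masked_where(~np.isfinite(readings) | (readings == -999.0), readings)

print(m.mean())       # 6.0, computed from the unmasked 4, 6 and 8 only
print(m.count())      # 3 valid entries
print(m.filled(0.0))  # masked slots replaced by 0.0, e.g. for export

# A hard mask cannot be cleared by assignment: writing to a masked slot is ignored.
m.harden_mask()
m[1] = 5.0
print(m.mask[1])      # True
```

Without the mask, np.mean(readings) would return nan, and even a nan-aware mean would still include the -999.0 sentinel.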

processing-data

from jesseotremblay

Brief description of what this skill does. Use when the user mentions specific keywords or requests specific tasks related to this skill's functionality.

[Data Cleaning]

preprocessing-data-with-automated-pipelines

from BbgnsurfTech

Automate data cleaning, transformation, and validation for ML tasks. Use when requesting "preprocess data", "clean data", "ETL pipeline", or "data transformation".

[Data Cleaning]

engineering-features-for-machine-learning

from BbgnsurfTech

Create, select, and transform features to improve machine learning model performance. Handles feature scaling, encoding, and importance analysis. Use when asked to "engineer features" or "select features".

[Data Cleaning]

splitting-datasets

from BbgnsurfTech

Split datasets into training, validation, and testing sets for ML model development. Use when requesting "split dataset", "train-test split", or "data partitioning".

[Data Cleaning]
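A stdlib-only sketch of a reproducible train/validation/test split, assuming a 70/15/15 default and a fixed seed; train_val_test_split is a hypothetical helper, not the splitting-datasets skill's own code.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a copy of items and cut it into train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

A plain random split like this does not preserve class proportions; for imbalanced labels a stratified split is usually preferred.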

skill-name

from aRustyDev

{{DESCRIPTION}}

[Data Cleaning]

ml-fundamentals

from pluginagentmarketplace

Master machine learning foundations - algorithms, preprocessing, feature engineering, and evaluation

[Data Cleaning]

data-analyst

from agent-trust-protocol

This skill should be used when analyzing CSV datasets, handling missing values through intelligent imputation, and creating interactive dashboards to visualize data trends. Use this skill for tasks involving data quality assessment, automated missing value detection and filling, statistical analysis, and generating Plotly Dash dashboards for exploratory data analysis.

[Data Cleaning]

d3-core-data

from zacharyr0th

Use when working with data transformations, scales, color schemes, formatting, or CSV/TSV parsing. Invoke for data processing pipelines, scale creation, color interpolation, number/date formatting, or data loading/parsing operations.

[Data Cleaning]

policyengine-uk-data

from PolicyEngine

UK survey data enhancement: Family Resources Survey (FRS) data with Wealth and Assets Survey (WAS) imputation patterns

[Data Cleaning]

date-normalizer

from majiayu000

Use when asked to parse, normalize, standardize, or convert dates from various formats to consistent ISO 8601 or custom formats.

[Data Cleaning]
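A minimal date-normalization sketch using only datetime.strptime and strftime. The list of accepted input formats, and the month-first reading of ambiguous slash dates, are assumptions for illustration; normalize_date is a hypothetical helper.

```python
from datetime import datetime

# Hypothetical list of accepted input layouts, tried in order. Ambiguous
# inputs such as "03/05/2024" are read as month/day because "%m/%d/%Y" comes
# first; reorder the list if the data is day-first.
INPUT_FORMATS = [
    "%Y-%m-%d",    # 2024-03-05
    "%m/%d/%Y",    # 03/05/2024
    "%d.%m.%Y",    # 05.03.2024
    "%B %d, %Y",   # March 5, 2024 (English month names under the default C locale)
    "%d %b %Y",    # 5 Mar 2024
    "%Y%m%d",      # 20240305
]


def normalize_date(text, out_format="%Y-%m-%d"):
    """Parse text with the first matching input format and re-emit it."""
    text = text.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(out_format)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


samples = ["2024-03-05", "03/05/2024", "05.03.2024", "March 5, 2024", "5 Mar 2024", "20240305"]
print([normalize_date(s) for s in samples])  # six copies of '2024-03-05'
print(normalize_date("5 Mar 2024", "%d/%m/%Y"))  # 05/03/2024
```

The default output is an ISO 8601 calendar date; passing a different out_format covers the "custom formats" case.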

ai-annotation-workflow

from majiayu000

Data annotation expert. Use for ML labeling, annotation workflows, and quality control.

[Data Cleaning]

data-pipeline-patterns

from majiayu000

Follow these patterns when implementing data pipelines, ETL, data ingestion, or data validation in OptAIC. Use for point-in-time (PIT) correctness, Arrow schemas, quality checks, and Prefect orchestration.

[Data Cleaning]

data-validator

from majiayu000

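Validates the completeness and correctness of vehicle insurance CSV data: checks the 26 required fields and validates data types, enum values, and business rules. Use when the user mentions "validate data", "check data", "data import", or "CSV".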

[Data Cleaning]

phone-number-formatter

from majiayu000

Standardize and format phone numbers with international support, validation, and multiple output formats.

[Data Cleaning]

data-cleaning

from majiayu000

Data cleaning, preprocessing, and quality assurance techniques

[Data Cleaning]

address-parser

from majiayu000

Parse unstructured addresses into structured components - street, city, state, zip, country with validation.

[Data Cleaning]

feature-engineering-kit

from majiayu000

Auto-generate features with encodings, scaling, polynomial features, and interaction terms for ML pipelines.

[Data Cleaning]

outlier-detective

from majiayu000

Detect anomalies and outliers in datasets using statistical and ML methods. Use for data cleaning, fraud detection, or quality control analysis.

[Data Cleaning]
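One common statistical method is Tukey's IQR fences: flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. A stdlib sketch follows; iqr_outliers is a hypothetical helper, and the outlier-detective skill may use other statistical or ML methods instead.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 11, 13, 12, 11, 95, 10, 12, -40]
print(iqr_outliers(data))  # [95, -40]
```

Quartiles are far less sensitive to extreme values than the mean and standard deviation, so a single large outlier does not widen the fences enough to hide itself, as can happen with z-scores on small samples.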

geocoder

from majiayu000

Convert addresses to coordinates (geocoding) and coordinates to addresses (reverse geocoding). Use for location data enrichment or address validation.

[Data Cleaning]

data-quality-auditor

from majiayu000

Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.

[Data Cleaning]
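A stdlib-only sketch of three of the checks named in the data-quality-auditor entry (missing values, exact duplicate rows, numeric type issues), assuming rows are dicts of strings such as csv.DictReader produces. The audit function and its report keys are hypothetical, and consistency checks are omitted.

```python
from collections import Counter


def audit(rows, numeric_cols=()):
    """Summarize missing values, exact duplicate rows and numeric type issues."""
    columns = sorted({col for row in rows for col in row})
    missing = {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}

    # Two rows are duplicates when every field (key and value) matches.
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)

    type_issues = {}
    for c in numeric_cols:
        bad = 0
        for r in rows:
            value = r.get(c)
            if value in (None, ""):
                continue  # already counted as missing
            try:
                float(value)
            except (TypeError, ValueError):
                bad += 1
        type_issues[c] = bad

    return {"rows": len(rows), "missing": missing,
            "duplicate_rows": duplicates, "type_issues": type_issues}


rows = [
    {"id": "1", "name": "Ann", "amount": "10.5"},
    {"id": "2", "name": "", "amount": "abc"},
    {"id": "1", "name": "Ann", "amount": "10.5"},
    {"id": "3", "name": "Bob"},
]
report = audit(rows, numeric_cols=["amount"])
print(report)
```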

loading-insurance-data

from majiayu000

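Loads and preprocesses weekly insurance policy data, with support for intelligent period detection, multi-week data loading, and data validation and cleaning. Use at the start of any insurance data analysis task.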

[Data Cleaning]

numpy-string-ops

from majiayu000

Vectorized string manipulation using the numpy.char module and modern string alternatives, including cleaning and search operations. Triggers: string operations, numpy.char, text cleaning, substring search.

[Data Cleaning]
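A short numpy.char sketch of vectorized cleaning and substring search on a toy array of names (illustrative data only). NumPy 2.0 and later also provide a numpy.strings module with similar functions; this example sticks to numpy.char.

```python
import numpy as np

names = np.array(["  Alice ", "BOB", " carol  "])

# Vectorized cleaning: trim whitespace, then normalize capitalization.
clean = np.char.title(np.char.strip(names))           # ['Alice' 'Bob' 'Carol']

# Vectorized substring search: find returns -1 where the substring is absent.
has_o = np.char.find(np.char.lower(clean), "o") >= 0  # [False  True  True]

# Vectorized replacement.
tagged = np.char.replace(clean, "o", "0")             # ['Alice' 'B0b' 'Car0l']
```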

data-cleaning-standards

from majiayu000

Clean and standardize vehicle insurance CSV/Excel data. Use when handling missing values, fixing data formats, removing duplicates, or standardizing fields. Also use when the user mentions "clean data", "handle nulls", "standardize", "duplicates", or "normalize".

[Data Cleaning]

backend-data-processor

from majiayu000

Process vehicle insurance Excel data using Pandas - file handling, data cleaning, merging, validation. Use when processing Excel/CSV files, handling data imports, implementing business rules (negative premiums, zero commissions), debugging data pipelines, or optimizing Pandas performance. Keywords: data_processor.py, Excel, CSV, Pandas, merge, deduplication, date normalization.

[Data Cleaning]

polars

from majiayu000

Lightning-fast DataFrame library written in Rust for high-performance data manipulation and analysis. Use when the user wants blazing-fast data transformations, is working with large datasets or lazy evaluation pipelines, or needs better performance than pandas. Ideal for ETL, data wrangling, aggregations, joins, and reading/writing CSV, Parquet, and JSON files.

[Data Cleaning]

data-profiler

from majiayu000

Profile datasets to understand schema, quality, and characteristics. Use when analyzing data files (CSV, JSON, Parquet), discovering dataset properties, assessing data quality, or when the user mentions data profiling, schema detection, data analysis, or quality metrics. Provides basic and intermediate profiling, including distributions, uniqueness, and pattern detection.

[Data Cleaning]

data-things

from tachyon-beep

Helps with data stuff

[Data Cleaning]

data-stuff

from tachyon-beep

For data

[Data Cleaning]

data-quality-checker

from armanzeroeight

Implement data quality checks, validation rules, and monitoring. Use when ensuring data quality, validating data pipelines, or implementing data governance.

[Data Cleaning]

polars

from lifangda

Fast DataFrame library (Apache Arrow). Select, filter, group_by, joins, lazy evaluation, CSV/Parquet I/O, expression API, for high-performance data analysis workflows.

[Data Cleaning]

csv-excel-merger

from OneWave-AI

Merge multiple CSV/Excel files with intelligent column matching, data deduplication, and conflict resolution. Handles different schemas, formats, and combines data sources. Use when users need to merge spreadsheets, combine data exports, or consolidate multiple files into one.

[Data Cleaning]