Document Data Extraction Skills

detailed-design-parser

Parses detailed-design.md files to extract and format file-specific changes for easier copying. Use this skill when the user wants to generate a `detailed-design-by-file.md` from a `detailed-design.md` file.

[Document Data Extraction]

award-extractor

from Lbstrydom

Extracts wine awards from PDF documents. Use when importing competition results, processing wine ratings, or when user mentions "extract awards", "parse awards PDF", "import competition results", or "process wine ratings booklet".

[Document Data Extraction]

extract-images

from Embassy-of-the-Free-Mind

Extract and catalog illustrations from historical books using AI vision. Generates rich metadata (subjects, figures, symbols, style, technique) and museum-style descriptions. Use when asked to extract images, run image detection, or process book illustrations.

[Document Data Extraction]

gemini-ocr

from erymuzuan

Document OCR integration using Google Gemini Flash API for extracting data from passports and licenses.

[Document Data Extraction]

pdf-tools

from caiopizzol

Search and extract content from PDF files. Use when searching PDFs, finding text in documents, or extracting specific pages without reading the entire file.

[Document Data Extraction]
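The pdf-tools description mentions extracting specific pages without reading the entire file. The skill's actual interface is not documented here, so what follows is only a sketch: a hypothetical `parse_page_spec` helper that turns a 1-based page specification such as `"1-3,7"` into the 0-based indices that Python sequences (such as a PDF library's list of pages) use.

```python
from __future__ import annotations


def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based page spec such as "1-3,7" into sorted 0-based indices.

    Hypothetical helper: not part of any documented pdf-tools interface.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range {part!r} for {page_count} pages")
        pages.update(range(start - 1, end))
    return sorted(pages)
```

For example, `parse_page_spec("1-3,7", 10)` returns `[0, 1, 2, 6]`; only those pages would then need to be opened and searched.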

receipt-parser-engineer

from ThomasMcCrossin

Create, fix, and refine deterministic receipt/invoice parsers in curlys-books, including vendor detection, OCR text extraction routing (pdfplumber for text PDFs → AWS Textract fallback; images → Textract), golden fixture creation, and updates to the vendor dispatcher/registry. Use when adding a new vendor parser, debugging mis-detections, improving totals/tax/date/line extraction, or deciding when to rely on Claude Vision fallback for vendors without a tested parser.

[Document Data Extraction]
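The routing rule in the receipt-parser-engineer description (text PDFs go to pdfplumber with AWS Textract as the fallback; images go to Textract) and its vendor dispatcher/registry can be sketched as below. This is an illustration under assumptions, not code from curlys-books: `choose_extractor`, `register`, `dispatch`, the `acme` vendor, and the premise that the caller already knows whether a PDF has a text layer are all hypothetical. A `None` result from `dispatch` marks the point where, per the description, a Claude Vision fallback might be considered.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Optional

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def choose_extractor(path: str, pdf_has_text_layer: bool) -> str:
    """Pick a text-extraction backend following the routing in the description."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # Text PDFs go to pdfplumber; scanned PDFs fall back to AWS Textract
        return "pdfplumber" if pdf_has_text_layer else "textract"
    if suffix in IMAGE_SUFFIXES:
        return "textract"
    raise ValueError(f"unsupported file type: {suffix!r}")


# Hypothetical registry: vendor name -> (detector, parser)
Detector = Callable[[str], bool]
Parser = Callable[[str], dict]
REGISTRY: dict[str, tuple[Detector, Parser]] = {}


def register(name: str, detector: Detector) -> Callable[[Parser], Parser]:
    """Decorator that adds a vendor parser and its detector to the registry."""
    def wrap(parser: Parser) -> Parser:
        REGISTRY[name] = (detector, parser)
        return parser
    return wrap


def dispatch(text: str) -> Optional[tuple[str, dict]]:
    """Return (vendor, parsed fields) from the first detector that matches."""
    for name, (detect, parse) in REGISTRY.items():
        if detect(text):
            return name, parse(text)
    return None  # no tested parser matched; a vision-model fallback could go here


@register("acme", lambda text: "ACME FOODS" in text.upper())
def parse_acme(text: str) -> dict:
    # Illustrative rule only: take the last token on the line starting with "TOTAL"
    total = None
    for line in text.splitlines():
        if line.strip().upper().startswith("TOTAL"):
            total = line.split()[-1]
    return {"vendor": "acme", "total": total}
```

Keeping detection and parsing as separate callables means a mis-detection can be debugged without touching the parser, and golden fixtures can exercise each half independently.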

timeline-generator

from Feedforward-AI

Generates a chronological timeline of key events, decisions, and flashpoints from a collection of documents. Use when asked to create a timeline, understand the sequence of events, see what happened when, or track how a situation evolved over time.

[Document Data Extraction]
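As a rough illustration of the last step of timeline-generator: once events have been pulled out of the documents (the extraction itself is not shown), ordering them is a stable sort on the event date. The `Event` record and its fields are assumptions made for this sketch, not the skill's documented format.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One extracted event (hypothetical record)."""
    when: date
    summary: str
    source: str  # document the event was found in


def build_timeline(events: list[Event]) -> list[Event]:
    # sorted() is stable, so events on the same day keep their extraction order
    return sorted(events, key=lambda event: event.when)


def render(timeline: list[Event]) -> str:
    # One line per event: ISO date, summary, and the source document in brackets
    return "\n".join(
        f"{event.when.isoformat()}  {event.summary}  [{event.source}]" for event in timeline
    )
```

Keeping the source document on every event lets each timeline entry be traced back to the material it came from.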

parse-bank-statement-pdf

from houfu

Parse bank statement PDF text into structured transaction data with account information and transactions in consistent JSON format. Works with any bank format. Use when you need to extract or parse transactions from PDF bank statements.

[Document Data Extraction]
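The parse-bank-statement-pdf skill says it works with any bank format; the sketch below handles only one hypothetical line layout (`DD/MM/YYYY  description  amount  balance`) and is meant to show the general shape of turning extracted statement text into one consistent JSON structure. The layout, the regular expression, and the output field names are all assumptions. Amounts are parsed as `Decimal` and serialized as strings to avoid floating-point rounding.

```python
import json
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical line layout: "15/01/2024  COFFEE SHOP  -4.50  1,020.30"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<desc>.+?)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})\s+(?P<balance>-?[\d,]+\.\d{2})$"
)


def to_decimal(text: str) -> Decimal:
    # Drop thousands separators before converting
    return Decimal(text.replace(",", ""))


def parse_statement(text: str, account_number: str) -> dict:
    """Collect every line matching LINE_RE into a transaction record."""
    transactions = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # headers, footers, page numbers, etc.
        transactions.append({
            "date": datetime.strptime(match["date"], "%d/%m/%Y").date().isoformat(),
            "description": match["desc"],
            "amount": str(to_decimal(match["amount"])),
            "balance": str(to_decimal(match["balance"])),
        })
    return {"account": {"number": account_number}, "transactions": transactions}
```

`json.dumps(parse_statement(text, "12345678"), indent=2)` would then produce the JSON output; supporting another bank would mean adding another layout pattern (or, as the skill presumably does, letting a model handle the variation).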

nlp-pipeline-builder

from eddiebe147

Build natural language processing pipelines for text analysis and understanding.

[Document Data Extraction]

datasheet

from lumberbarons

Extract structured information from integrated circuit and component datasheets (PDF files or URLs) and generate consistent markdown summaries. Use when the user requests to extract, summarize, analyze, or document information from IC/component datasheets, or when they provide a datasheet and want structured documentation. Triggers on phrases like "extract this datasheet", "summarize this datasheet", "analyze [component name]", "document this IC", or when working with datasheets for hardware design.

[Document Data Extraction]

entity-extractor

from eddiebe147

Extract named entities from text with high accuracy and customization.

[Document Data Extraction]

receipt-processing

from mattleonard16

Receipt OCR extraction, LLM fallback, and job pipeline patterns for TaxHelper. Use when working on receipt upload, extraction, inbox, or transaction creation from receipts.

[Document Data Extraction]

ocr

from masayan1126

Extracts text from image files and copies it to the clipboard. Use for requests such as "transcribe" or "OCR".

[Document Data Extraction]

ingredient-scanner

from raydocs

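Scans skincare product ingredient lists, recognizes them with OCR, and uses AI to explain each ingredient's effects and risks. Use this skill when implementing the ingredient-scanning feature.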

[Document Data Extraction]

genome-analyzer

from artwist-polyakov

Analyzes the user's genetic data from a VCF file. Use when the user asks about their genetics, inherited traits, predispositions, metabolism of substances (caffeine, alcohol, medications), athletic abilities, disease risks, or gene-based nutrition.

[Document Data Extraction]

look-at

from edwinhu

This skill should be used when the user asks to 'look at', 'analyze', 'describe', 'extract from', or 'what's in' media files like PDFs, images, diagrams, screenshots, or charts. Triggers include: 'what does this image show', 'extract the table from this PDF', 'describe this diagram', 'what's in this screenshot', 'analyze this chart', 'read this image', 'get text from this PDF', 'summarize this document', or requests for specific data extraction from visual or document files. Use when analyzed or interpreted content is needed rather than literal file reading (which uses the Read tool).

[Document Data Extraction]

langextract

from aeonbridge

Extract structured information from unstructured text using LLMs with source grounding. Use when extracting entities from documents, medical notes, clinical reports, or any text requiring precise, traceable extraction. Supports Gemini, OpenAI, and local models (Ollama). Includes visualization and long document processing.

[Document Data Extraction]

multimodal-looker

from bahayonghang

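Expert agent for multimodal content analysis, focused on interpreting and extracting information from visual content such as images, PDFs, and charts. Use when the user needs help to: (1) analyze image content, (2) extract information from PDFs, (3) interpret charts and data visualizations, (4) understand architecture diagrams and flowcharts, (5) extract information from screenshots, or (6) analyze design mockups. Trigger phrases include visual-analysis requests such as "look at this image", "analyze this PDF", "what does this chart show", and "take a look for me".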

[Document Data Extraction]

docling

from wfukatsu

Document reading and conversion using Docling. Use this skill when the user asks to read, open, or process document files in these formats: PDF, DOCX, PPTX, XLSX, HTML, Markdown, AsciiDoc, or images (PNG, JPG, TIFF). Supports OCR for scanned documents. Trigger when: (1) the user asks to read or open a document file (e.g., "read this PDF", "read this document", "check the contents of this file"); (2) the file extension is .pdf, .docx, .pptx, .xlsx, .html, .md, .adoc, .png, .jpg, or .tiff; (3) the user wants to extract text from scanned documents with OCR; (4) the user wants to convert documents to Markdown/JSON/HTML; (5) the user wants to process documents with tables, figures, or photos; (6) the user wants to extract images/figures from documents.

[Document Data Extraction]

docx-reader

from childbamboo

Reads Microsoft Word (.docx) files and extracts text content. Use when needing to read .docx documents. Requires python-docx package.

[Document Data Extraction]

pdf-vision-reader

from childbamboo

Converts PDF pages to images and uses vision analysis to extract content including diagrams, charts, and visual elements. Use for PDFs with rich visual content. Requires pdf2image and poppler-utils.

[Document Data Extraction]

pdf-reader

from childbamboo

Reads PDF files and extracts text content in Markdown format. Handles tables and multi-page documents. Use when needing to read PDF documents. Requires pdfplumber package.

[Document Data Extraction]

land-reduction-trespass

from ACSKamloops

Clerk for reserve reduction, trespass, survey errors, and railway takings; use when processing the Land_Reduction_Trespass queue.

[Document Data Extraction]

fiduciary-duty-negligence

from ACSKamloops

Clerk for Crown fiduciary breaches, fund mismanagement, conflicts of interest, and failure to protect reserve lands; use for Fiduciary_Duty_Negligence queue.

[Document Data Extraction]

data-analysis

from Acurioustractor

AI-powered data analysis for Empathy Ledger. Use when working with themes, quotes, story suggestions, transcript analysis, storyteller connections, or any feature requiring extracted insights. Ensures consistent analysis patterns across the platform.

[Document Data Extraction]

coercion-duress

from ACSKamloops

Clerk for forced surrenders, threats, procedural irregularities, and lack of informed consent; use for Coercion_Duress queue.

[Document Data Extraction]

water-rights-fishing

from ACSKamloops

Clerk for water licenses, irrigation, riparian rights, and fishing restrictions affecting Pukaist/Nlaka'pamux; use for Water_Rights_Fishing queue.

[Document Data Extraction]

governance-sovereignty

from ACSKamloops

Clerk for Chief/Council authority, assertions of title, self-government, and resistance to federal imposition; use for Governance_Sovereignty queue.

[Document Data Extraction]

design-asset-parser

from munlucky

Parse Figma/PDF design exports to extract UI specs and draft design-spec.md and pending-questions.md. Use when analyzing design assets.

[Document Data Extraction]

gemini-document-processing

from majiayu000

Guide for implementing Google Gemini API document processing - analyze PDFs with native vision to extract text, images, diagrams, charts, and tables. Use when processing documents, extracting structured data, summarizing PDFs, answering questions about document content, or converting documents to structured formats. (project)

[Document Data Extraction]

docling

from majiayu000

Convert documents (PPTX, PDF, DOCX, XLSX, images) to Markdown/JSON/HTML using IBM Docling. This skill should be used when the user asks to convert, parse, or extract content from documents. Triggers on "convert pptx", "parse pdf", "extract from document", "convert the presentation", "extract from the pdf".

[Document Data Extraction]

legal-ocr

from majiayu000

Extracts text from scanned legal documents in PDF using OCR optimized for Brazilian legal language. Use when you need to convert scanned PDFs (judgments, petitions, appellate court decisions) into editable text with high accuracy. Supports low-quality documents, multi-column layouts, tables, and specialized legal terms.

[Document Data Extraction]

document-ocr-processing

from majiayu000

Process scanned documents and images containing Chuukese text using OCR with specialized post-processing for accent characters and traditional formatting. Use when working with scanned books, documents, or images that contain Chuukese text that needs to be digitized.

[Document Data Extraction]

bill-processing

from majiayu000

Extract data from bill/receipt images and return JSON for the lunch-splitter app.

[Document Data Extraction]

airparser-api

from majiayu000

Guide for integrating with the Airparser document-parsing service via API, webhooks, and Make.com. Use when configuring inboxes, extraction schemas, or automation workflows for receipt processing.

[Document Data Extraction]

data-extraction

from majiayu000

Use when extracting structured data from medical research PDFs, parsing study characteristics, patient demographics, outcomes, and results. Invoke for systematic review data collection from papers.

[Document Data Extraction]

ai-training-data-generation

from majiayu000

Generate high-quality training datasets from documents, text corpora, and structured content. Use when creating AI training data from dictionaries, documents, or when generating examples for machine learning models. Optimized for low-resource languages and domain-specific knowledge extraction.

[Document Data Extraction]

document-parser

from majiayu000

Parse large documents into structured sections with abstracts and metadata.

[Document Data Extraction]
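A minimal sketch of splitting a large Markdown document into sections with per-section metadata, in the spirit of the document-parser description. It assumes Markdown `#` headings mark section boundaries, and its "abstract" is simply the first paragraph, truncated; the skill itself may generate real summaries, which this sketch does not attempt. `Section`, `split_sections`, and `to_records` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# An ATX-style Markdown heading: 1 to 6 '#' characters, then the title
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    title: str
    level: int  # 0 for text before the first heading
    lines: list = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines).strip()

    def abstract(self, max_chars: int = 200) -> str:
        # Naive stand-in for an abstract: the first paragraph, truncated
        return self.text.split("\n\n", 1)[0][:max_chars]


def split_sections(markdown: str) -> list:
    sections = [Section(title="(preamble)", level=0)]
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            sections.append(Section(title=match.group(2), level=len(match.group(1))))
        else:
            sections[-1].lines.append(line)
    # Keep the preamble only if it actually contains text
    return [s for s in sections if s.level > 0 or s.text]


def to_records(sections: list) -> list:
    return [
        {
            "title": s.title,
            "level": s.level,
            "abstract": s.abstract(),
            "word_count": len(s.text.split()),
        }
        for s in sections
    ]
```

Note that this naive splitter would also treat `#` lines inside fenced code blocks as headings; a real parser would need to track fences.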

legislative-flattener

from majiayu000

Converts hierarchical legislative text from Word documents into a flat list of requirements. Use when processing regulatory documents, compliance frameworks, or legal text that needs to be extracted into individual, numbered requirements for analysis or mapping.

[Document Data Extraction]
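The core of legislative-flattener, turning nested clauses into a flat, numbered list of requirements, can be sketched as a depth-first walk. Reading the Word document is not shown; the sketch assumes the text has already been parsed into a clause tree. The `Clause` type, the reference format such as `3(a)`, and the `REQ-001` numbering are hypothetical choices. Each leaf clause inherits its ancestors' text so that every requirement can be read on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    label: str  # e.g. "3", "(a)", "(i)"
    text: str
    children: list = field(default_factory=list)


def flatten(clauses, parent_ref="", parent_text=""):
    """Emit one requirement per leaf clause, prefixed with its ancestors' text."""
    requirements = []
    for clause in clauses:
        ref = f"{parent_ref}{clause.label}"
        text = f"{parent_text} {clause.text}".strip()
        if clause.children:
            # Intermediate clauses only contribute context to their children
            requirements.extend(flatten(clause.children, ref, text))
        else:
            requirements.append({"ref": ref, "text": text})
    return requirements


def number(requirements):
    # Assign sequential IDs in document order
    return [{"id": f"REQ-{i:03d}", **req} for i, req in enumerate(requirements, start=1)]
```

For example, a clause "3 The operator must" with sub-clauses "(a) keep records;" and "(b) report incidents." flattens to two standalone requirements, `3(a)` and `3(b)`.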

neurosurgical-book-parser

from majiayu000

Extract structured knowledge from neurosurgical and spine surgery textbooks. Identifies anatomical structures, surgical procedures, complications, and clinical relationships. Use when processing medical PDFs, building surgical knowledge graphs, or creating clinical decision support content. Applies kaizen continuous improvement from prior extractions.

[Document Data Extraction]

gemini-pdf

from odysseus0

Process multimodal documents using Gemini CLI, leveraging Gemini's superior multimodal capabilities. Use for PDFs, scanned documents, image-heavy documents, or any file where visual understanding matters. Ideal for extracting content from complex layouts, tables, diagrams, handwritten notes, or mixed text/image documents. Triggers on PDF processing, document extraction, "use Gemini for this", or when document has visual complexity that benefits from multimodal understanding.

[Document Data Extraction]

data-normalizer

from majiayu000

Collects archaeological excavation survey materials (papers, reports, nearby sites) and normalizes their metadata.

[Document Data Extraction]

nanonets-api

from majiayu000

Guide for integrating with the Nanonets OCR service via API. Use when you need to extract data from documents, create OCR models, upload files for prediction, or train custom models.

[Document Data Extraction]

pdf-reader

from majiayu000

Extract text, tables, and images from PDF files using pdfplumber and PyMuPDF. Use when analyzing PDF documents, brand materials, reports, or any content that requires structured extraction from PDF format. Supports table detection, layout preservation, and high-quality image extraction.

[Document Data Extraction]

sense

from plurigrid

sense - Diagrammatic Video Extraction with Subtitle Alignment

[Document Data Extraction]

bsee-data-extractor

from vamseeachanta

Extract and process BSEE (Bureau of Safety and Environmental Enforcement) production data. Use for querying oil/gas production data by API number, block, lease, or field with automatic data normalization and caching.

[Document Data Extraction]

sodir-data-extractor

from vamseeachanta

SODIR Data Extractor (user)

[Document Data Extraction]

ocr-super-surya

from aktsmm

GPU-optimized OCR using Surya. Use when: (1) Extracting text from images/screenshots, (2) Processing PDFs with embedded images, (3) Multi-language document OCR, (4) Layout analysis and table detection. Supports 90+ languages with 2x accuracy over Tesseract.

[Document Data Extraction]

extract-heal-text-from-powerpoint

from karstegg

Extract HEAL matrix content from PowerPoint slides and save it as formatted text files matching the N2 HEAL format.

[Document Data Extraction]

extract-epiroc-bev-weekly-report

from karstegg

Extract key sections from Epiroc BRMO BEV weekly PDF reports into structured Markdown for Week N.

[Document Data Extraction]
