markdown-converter
Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, remote URLs (including YouTube), or EPubs to Markdown format for LLM processing or text analysis.
When & Why to Use This Skill
This Claude skill converts a wide range of file formats into clean, structured Markdown using MarkItDown. It turns complex inputs such as PDFs, Office files, and media into LLM-friendly text while preserving structural elements like headings and tables.
Use Cases
- LLM Context Optimization: Convert complex PDF reports, Word documents, and PowerPoint presentations into Markdown to provide Claude with high-fidelity context for summarization or reasoning.
- Multimodal Data Extraction: Use OCR for images and transcription for audio files to turn visual and spoken content into searchable, analyzable text.
- Web and Remote Content Ingestion: Fetch and convert remote URLs, including YouTube content, and convert EPub files to Markdown for research and knowledge synthesis.
- Structured Data Conversion: Transform Excel spreadsheets, CSVs, and JSON files into Markdown tables and blocks, making it easier to perform data analysis within a chat interface.
- Batch Archive Processing: Automatically extract and convert the contents of ZIP archives, consolidating multiple documents into a unified format for streamlined workflows.
Markdown Converter
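Convert files to Markdown using uvx markitdown. No separate installation of markitdown is required: uvx (part of uv, which must be installed) fetches and runs the tool on demand.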
Basic Usage
# Convert to stdout
uvx markitdown input.pdf
# Convert a remote URL (markitdown will fetch it)
uvx markitdown https://example.com
# Save to file
uvx markitdown input.pdf -o output.md
uvx markitdown input.docx > output.md
# From stdin
cat input.pdf | uvx markitdown
Supported Formats
- Documents: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
- Web/Data: HTML, CSV, JSON, XML
- Media: Images (EXIF + OCR), Audio (EXIF + transcription)
- Other: ZIP (iterates contents), remote URLs (HTTP/HTTPS, including YouTube), EPub
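A quick pre-check can avoid handing markitdown an input outside the list above. The sketch below is illustrative only: `is_supported` is a hypothetical helper, not part of markitdown, and it recognizes only the extensions this section names explicitly, plus HTTP/HTTPS URLs.

```shell
# Hypothetical helper: succeeds when the input looks like one of the formats
# this document lists by extension. Image and audio extensions are not named
# in the list above, so they are left out here; add the ones you rely on.
is_supported() {
  # Lowercase the name so REPORT.PDF and report.pdf are treated the same.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.pdf|*.docx|*.pptx|*.xlsx|*.xls) return 0 ;;  # documents
    *.html|*.csv|*.json|*.xml)        return 0 ;;  # web/data
    *.zip|*.epub)                     return 0 ;;  # archives and e-books
    http://*|https://*)               return 0 ;;  # remote URLs
    *)                                return 1 ;;
  esac
}
```

For example, `is_supported report.docx && uvx markitdown report.docx -o report.md` runs the conversion only when the check passes.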
Options
-o OUTPUT # Output file
-x EXTENSION # Hint file extension (for stdin)
-m MIME_TYPE # Hint MIME type
-c CHARSET # Hint charset (e.g., UTF-8)
-d # Use Azure Document Intelligence
-e ENDPOINT # Document Intelligence endpoint
--use-plugins # Enable 3rd-party plugins
--list-plugins # Show installed plugins
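The hint options matter most for stdin, where there is no filename to infer a type from. As a minimal sketch, the wrapper below combines the -x and -c options listed above; `convert_stdin` is a hypothetical name, and defaulting the charset to UTF-8 is an assumption made for illustration.

```shell
# Hypothetical wrapper: convert data arriving on stdin, passing the hints
# that cannot be inferred without a filename.
#   $1 = extension hint (e.g. .html)
#   $2 = charset hint (optional; UTF-8 is assumed when omitted)
convert_stdin() {
  ext="$1"
  charset="${2:-UTF-8}"
  # Output goes to stdout, so the caller decides where to redirect it.
  uvx markitdown -x "$ext" -c "$charset"
}
```

A call such as `cat page.html | convert_stdin .html > page.md` would then pass both hints in one step.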
Examples
# Convert Word document
uvx markitdown report.docx -o report.md
# Convert a remote document
uvx markitdown https://example.com/report.pdf -o report.md
# Convert Excel spreadsheet
uvx markitdown data.xlsx > data.md
# Convert PowerPoint presentation
uvx markitdown slides.pptx -o slides.md
# Convert with file type hint (for stdin)
cat document | uvx markitdown -x .pdf > output.md
# Use Azure Document Intelligence for better PDF extraction
uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"
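The single-file examples above extend naturally to a whole directory. The following is a sketch under stated assumptions rather than a recommended workflow: `convert_dir` is a hypothetical helper, and it handles only .docx files to keep the loop short.

```shell
# Hypothetical helper: convert every .docx file in a directory to Markdown.
#   $1 = source directory
#   $2 = output directory (created if missing)
convert_dir() {
  src_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1
  for f in "$src_dir"/*.docx; do
    [ -e "$f" ] || continue             # the glob matched nothing
    base=$(basename "$f" .docx)
    # Same invocation as the single-file examples; report failures and go on.
    uvx markitdown "$f" -o "$out_dir/$base.md" || echo "failed: $f" >&2
  done
}
```

Calling `convert_dir ./reports ./markdown` would write one .md file per .docx file, leaving other file types untouched.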
Notes
- Output preserves document structure: headings, tables, lists, links
- First run caches dependencies; subsequent runs are faster
- For complex PDFs with poor extraction, use -d with Azure Document Intelligence
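The last note can be turned into a simple fallback: try the default conversion first and retry with Azure Document Intelligence only when it produces nothing. This is a rough sketch, not a definitive recipe. `convert_pdf_with_fallback` and the `DOCINTEL_ENDPOINT` variable are hypothetical names, and "empty output" is only a crude stand-in for "poor extraction".

```shell
# Hypothetical helper: plain conversion first, Document Intelligence second.
#   $1 = input PDF
#   $2 = output Markdown file
# DOCINTEL_ENDPOINT is an assumed variable name holding your endpoint URL.
convert_pdf_with_fallback() {
  input="$1"
  output="$2"
  # [ -s FILE ] is true only when the file exists and is non-empty.
  if uvx markitdown "$input" -o "$output" && [ -s "$output" ]; then
    return 0
  fi
  if [ -z "${DOCINTEL_ENDPOINT:-}" ]; then
    echo "extraction was empty and DOCINTEL_ENDPOINT is not set" >&2
    return 1
  fi
  # Retry with the -d and -e options from the Options list.
  uvx markitdown "$input" -d -e "$DOCINTEL_ENDPOINT" -o "$output"
}
```

A stricter check, such as a minimum word count in the output, may suit scanned documents better than testing for an empty file.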