article-extractor

by jrajasekera

Extract clean article content from URLs and save as markdown. Triggers when user provides a webpage URL and wants to download it, extract content, get a clean version without ads, capture an article for offline reading, save an article, grab content from a page, archive a webpage, clip an article, or read something later. Handles blog posts, news articles, tutorials, documentation pages, and similar web content. Supports Wayback Machine for dead links or paywalled content. This skill handles the entire workflow - do NOT use web_fetch or other tools first, just call the extraction script directly with the URL.

0 stars · 0 forks · Updated Jan 7, 2026

When & Why to Use This Skill

The Article Extractor skill is a powerful utility designed to transform cluttered web pages into clean, structured Markdown files. It solves the problem of 'web noise' by automatically stripping away advertisements, navigation bars, and irrelevant scripts, leaving only the core content. With built-in support for the Wayback Machine and multiple extraction engines (Jina, Trafilatura, Readability), it ensures high reliability even when dealing with dead links, paywalls, or complex JavaScript-heavy websites.

Use Cases

  • Personal Knowledge Management: Seamlessly convert online blog posts, tutorials, and news articles into Markdown format for easy import into tools like Obsidian, Notion, or Logseq.
  • AI Content Analysis: Provide clean, high-signal text input for LLMs to perform accurate summarization, sentiment analysis, or data synthesis without the interference of HTML clutter.
  • Digital Archiving and Offline Reading: Capture and save permanent local copies of web content for offline access, ensuring that valuable information remains available even if the original source goes offline.
  • Research Recovery: Utilize the integrated Wayback Machine fallback to retrieve and extract content from broken URLs or pages that have been moved behind a paywall.
  • Automated Documentation Collection: Build local technical libraries by batch-extracting documentation pages and tutorials into a standardized, readable format.
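The documentation-collection use case above can be sketched as a small shell loop. This is illustrative only: the `urls.txt` file, the `batch_extract` helper, and the `EXTRACT` override are assumptions introduced here, not part of the skill; only `scripts/extract-article.sh` and its `-d`/`-q` flags come from this document.

```shell
# Sketch of batch extraction: read URLs (one per line) from a file and
# save each as markdown under ./library. EXTRACT lets you point at the
# extractor script; it defaults to the path used in this repo.
EXTRACT="${EXTRACT:-scripts/extract-article.sh}"

batch_extract() {
  # $1 = file containing one URL per line
  while IFS= read -r url; do
    [ -z "$url" ] && continue                        # skip blank lines
    "$EXTRACT" "$url" -d ./library -q || echo "failed: $url" >&2
  done < "$1"
}
```

Run it as `batch_extract urls.txt`; failures are reported on stderr so one bad URL does not stop the batch.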

Article Extractor

Extract clean article content from URLs, removing ads, navigation, and clutter. Multi-tool fallback ensures reliability.

Workflow

When the user provides a URL to download or extract:

  1. Call the extraction script directly with the URL (do NOT fetch it first with web_fetch)
  2. The script handles fetching, extraction, and saving automatically
  3. It returns a clean markdown file with frontmatter
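For illustration, the saved file opens with YAML frontmatter followed by the cleaned article body. The field names below (title, url, date) are assumptions, not confirmed output; the script defines the actual frontmatter keys.

```markdown
---
title: Example Article               # illustrative field names only
url: https://example.com/article
date: 2026-01-07
---

# Example Article

Clean article body, free of ads and navigation...
```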

Usage

# Basic extraction
scripts/extract-article.sh "https://example.com/article"

# Specify output location
scripts/extract-article.sh "https://example.com/article" -o my-article.md -d ~/Documents

# Try Wayback Machine if original fails
scripts/extract-article.sh "https://example.com/article" --wayback

Make script executable if needed: chmod +x scripts/extract-article.sh

Key Options

  • -o <file> - Output filename
  • -d <dir> - Output directory
  • -w, --wayback - Try Wayback Machine if extraction fails
  • -t <tool> - Force tool: jina, trafilatura, readability, fallback
  • -q - Quiet mode

For complete options, exit codes, tool details, and examples, see references/tools-and-options.md.

Common Failures

  • Exit 3 (access denied): Paywall or login required - try --wayback
  • Exit 4 (no content): Heavy JavaScript - try different --tool
  • Exit 2 (network): Connection issue - check URL
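The retry advice above can be folded into a small wrapper, sketched here under assumptions: the `extract_with_fallback` helper and `EXTRACT` override are introduced for illustration, while the `--wayback` flag and exit codes 2, 3, and 4 are those documented in this README.

```shell
# Sketch: retry via the Wayback Machine when the extractor exits 3
# (access denied) or 4 (no content). EXTRACT defaults to the script
# path used in this repo.
EXTRACT="${EXTRACT:-scripts/extract-article.sh}"

extract_with_fallback() {
  # $1 = URL to extract
  if "$EXTRACT" "$1"; then
    return 0
  else
    status=$?                           # exit code of the first attempt
  fi
  case "$status" in
    3|4) "$EXTRACT" "$1" --wayback ;;   # paywall or JS-heavy: try the archive
    *)   return "$status" ;;            # e.g. exit 2 (network): check the URL
  esac
}
```

The wrapper only escalates to the archive for the two failure modes where it can help; a network error (exit 2) is returned unchanged so the caller can fix the URL instead.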

Local Tools (Optional)

To install local extraction tools for offline use, run: scripts/install-deps.sh