free-vision

from thanhtunguet

Handle vision/image tasks (read, describe, analyze images) by calling Gemini CLI or Qwen Code CLI from the shell. Use for requests to interpret or describe images, extract visible text, or summarize visual content; prefer Gemini and fall back to Qwen if Gemini fails or is too generic.

Updated Jan 11, 2026

When & Why to Use This Skill

The Free Vision skill lets an agent analyze local image files by calling the Gemini and Qwen CLI tools from the shell. It covers describing images, extracting visible text verbatim (OCR), and producing structured visual summaries. Gemini is tried first; Qwen serves as the fallback when Gemini fails or answers too generically.

Use Cases

  • Automated OCR and Text Extraction: Efficiently extract verbatim text from screenshots, scanned documents, or infographics for data entry and analysis.
  • Visual Content Summarization: Generate concise one-sentence summaries and detailed object descriptions for large sets of images to assist in content indexing.
  • Technical Image Analysis: Identify specific UI elements, key objects, and notable details within technical diagrams or software screenshots.
  • Accessibility Enhancements: Create descriptive alt-text and detailed visual narrations for images to improve content accessibility.
name: free-vision
description: Handle vision/image tasks (read, describe, analyze images) by calling Gemini CLI or Qwen Code CLI from the shell. Use for requests to interpret or describe images, extract visible text, or summarize visual content; prefer Gemini and fall back to Qwen if Gemini fails or is too generic.

Free Vision

Quick workflow

  1. Identify the image path(s). Prefer absolute paths; confirm the file exists before calling a CLI.
  2. Run Gemini first with a specific, structured prompt. Use the CLI's file-include syntax if supported (commonly @/path/to/image).
    gemini "Analyze the image: (1) 1-sentence summary, (2) key objects, (3) visible text verbatim, (4) notable details. @/absolute/path/to/image"
    
  3. If Gemini errors, produces empty output, or responds too generically, run Qwen with the same prompt structure and image reference.
    qwen "Analyze the image: (1) 1-sentence summary, (2) key objects, (3) visible text verbatim, (4) notable details. @/absolute/path/to/image"
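
The three steps above can be sketched as a small shell helper. This is a minimal sketch, not part of the skill itself: the function names and the `GEMINI_CMD` / `QWEN_CMD` variables are hypothetical, and it assumes both CLIs accept the prompt as a single positional argument, as in the commands above. It only detects errors and empty output; judging whether a response is "too generic" is left to the caller.

```shell
# Sketch of the Gemini-first, Qwen-fallback workflow.
# Assumption: GEMINI_CMD and QWEN_CMD are hypothetical override variables,
# added here so the commands can be swapped or stubbed.
GEMINI_CMD="${GEMINI_CMD:-gemini}"
QWEN_CMD="${QWEN_CMD:-qwen}"

# Build the structured prompt with the image attached via @/path syntax.
vision_prompt() {
  printf 'Analyze the image: (1) 1-sentence summary, (2) key objects, (3) visible text verbatim, (4) notable details. @%s' "$1"
}

# Run one CLI; succeed only if it exits 0 and prints non-whitespace output.
run_vision_cli() {
  local cmd="$1" prompt="$2" out
  out="$("$cmd" "$prompt")" || return 1
  case "$out" in
    *[![:space:]]*) printf '%s\n' "$out" ;;
    *) return 1 ;;
  esac
}

# Confirm the file exists, try Gemini, then fall back to Qwen.
describe_image() {
  local image="$1" prompt
  if [ ! -f "$image" ]; then
    echo "error: image not found: $image" >&2
    return 2
  fi
  prompt="$(vision_prompt "$image")"
  run_vision_cli "$GEMINI_CMD" "$prompt" || run_vision_cli "$QWEN_CMD" "$prompt"
}
```

With the functions loaded, `describe_image /absolute/path/to/image` prints the first usable response and returns non-zero if both tools fail.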
    

Prompting tips

  • Be explicit about the required fields and verbosity.
  • Ask for verbatim text extraction when relevant.
  • If the output is vague, re-run with stricter instructions: “Be specific; avoid generic phrases; list exact items and locations.”
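
One way to apply these tips is to build the prompt in a helper that can switch to the stricter wording on a re-run. The sketch below is illustrative only: `build_prompt` and its `strict` mode argument are hypothetical names, and placing the extra instruction before the `@` reference is a design choice made here, not something the skill prescribes.

```shell
# Hypothetical helper: build the structured prompt, optionally with the
# stricter instruction added for a re-run after a vague answer.
build_prompt() {
  local image="$1" mode="${2:-normal}"
  local text='Analyze the image: (1) 1-sentence summary, (2) key objects, (3) visible text verbatim, (4) notable details.'
  if [ "$mode" = strict ]; then
    text="$text Be specific; avoid generic phrases; list exact items and locations."
  fi
  # Keep the @/path reference last, matching the workflow commands.
  printf '%s @%s' "$text" "$image"
}
```

A re-run would then look like `gemini "$(build_prompt /absolute/path/to/image strict)"`.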

Failure handling

  • If the CLI rejects the @/path syntax, retry in interactive mode and reference the image path in the prompt in whatever form that CLI supports.
  • If both tools fail to load the image, report the failure and ask the user for guidance on the expected image input format.
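
When both tools fail, the report to the user is more useful if it states what is known about the file. The following is a rough sketch under stated assumptions: `report_image_failure` is a hypothetical name, the message wording is illustrative, and the type detection relies on the common `file` utility being installed (it is skipped otherwise).

```shell
# Hypothetical helper: summarize a double failure and ask the user how to proceed.
report_image_failure() {
  local image="$1" kind='unknown'
  if [ ! -r "$image" ]; then
    echo "Both CLIs failed: $image is missing or unreadable. Please confirm the path."
    return 0
  fi
  # Detect the MIME type when the `file` utility is available.
  if command -v file >/dev/null 2>&1; then
    kind="$(file -b --mime-type "$image")"
  fi
  echo "Both CLIs failed to load $image (detected type: $kind). Which image format or input method should be used?"
}
```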