What is vibium-browser-automation?

This Claude skill provides a comprehensive framework for browser automation using the Vibium MCP tool within Cursor IDE. It enables users to programmatically navigate web pages, interact with UI elements like buttons and forms, capture screenshots for debugging, and perform complex web scraping or automated testing tasks directly from their development environment. By bridging the gap between the IDE and the browser, it streamlines workflows that require real-time web interaction and data extraction.

When should I use vibium-browser-automation?

vibium-browser-automation is useful in the following scenarios: • Automated Web Scraping: Extracting structured data from dynamic websites by navigating through pages and identifying specific CSS elements. • End-to-End UI Testing: Verifying website functionality and user flows by simulating actions like clicking, typing, and form submission within the Cursor IDE. • Visual Regression and Documentation: Automatically capturing viewport screenshots across multiple URLs to monitor UI consistency or generate visual documentation. • Data Entry Automation: Streamlining repetitive tasks such as filling out multi-page web forms or migrating data between web-based platforms. • Web Research and Analysis: Combining browser navigation with OCR tools to extract and analyze text from complex web layouts or images.

name	vibium-browser-automation
description	Comprehensive guide for using Vibium browser automation tool via MCP. Use when (1) Automating web interactions in Cursor IDE, (2) Navigating web pages, (3) Taking screenshots, (4) Clicking buttons or links, (5) Filling forms or input fields, (6) Finding elements on web pages, (7) Web scraping tasks, (8) Testing websites, or any task requiring programmatic browser control within Cursor IDE
short-description	Browser automation with Vibium MCP

Vibium Browser Automation

Guide for using Vibium browser automation tool through MCP protocol in Cursor IDE.

Available Tools

Tool	Description	Notes
`browser_launch`	Start browser (visible by default)	Use `headless: false` for debugging
`browser_navigate`	Go to URL	Waits for page load
`browser_find`	Find element by CSS selector	Returns element info (tag, text, box)
`browser_click`	Click an element	Waits for element to be clickable
`browser_type`	Type text into an element	Waits for element to be editable
`browser_screenshot`	Capture viewport	Saves to `Pictures/Vibium/` by default
`browser_quit`	Close browser	Clean up when done

Core Workflows

Basic Navigation and Screenshot

# Launch browser
browser_launch(headless=False)

# Navigate to page
browser_navigate(url="https://example.com")

# Take screenshot
browser_screenshot(filename="example.png")

# Close browser
browser_quit()

Finding and Interacting with Elements

# Find element
element = browser_find(selector="h1")
# Returns: tag=h1, text="Example", box={x:100, y:200, w:300, h:50}

# Click element
browser_click(selector="a")

# Type into input
browser_type(selector="input[name='q']", text="search query")

Form Filling

# Navigate to form page
browser_navigate(url="https://example.com/form")

# Fill multiple fields
browser_type(selector="input[name='name']", text="John Doe")
browser_type(selector="input[name='email']", text="john@example.com")
browser_type(selector="input[name='phone']", text="123-456-7890")

# Take screenshot of filled form
browser_screenshot(filename="form-filled.png")

Best Practices

CSS Selectors

Use simple, reliable selectors:

✅ input[name="q"] - Attribute selectors
✅ textarea[name="q"] - Element + attribute
✅ h1, a, button - Simple element selectors
✅ .class-name - Class selectors
❌ :has-text() - Not supported
❌ Complex pseudo-selectors - May fail

Multiple selectors for flexibility:

# Try multiple selectors if unsure
browser_find(selector="textarea[name='q'], input[name='q']")

Error Handling

Connection issues:

If browser connection fails, relaunch: browser_launch()
Check if browser is still running before operations

Element not found:

Wait for page to load before finding elements
Use more generic selectors if specific ones fail
Take screenshot to inspect page state

Timeouts:

Some operations may timeout after 30s
Break complex tasks into smaller steps
Take screenshots at key points for debugging

Screenshot Management

Screenshots save to C:\Users\<user>\Pictures\Vibium\ on Windows
Use descriptive filenames: google-search-typed.png
Take screenshots before/after important actions for debugging

Limitations and Workarounds

For detailed explanation of limitations and comprehensive workarounds, see limitations.md.

No Snapshot Log Support

Problem: Vibium doesn't support saving accessibility snapshots (YAML format).

Workaround: Combine with cursor-ide-browser:

Use vibium for browser operations
Use cursor-ide-browser's browser_snapshot to get accessibility snapshot
Save snapshot to YAML file
Use snapshot-query tools to analyze snapshot

No Keyboard Actions

Problem: Can't press Enter or other keys directly.

Workaround:

For form submission, find and click submit button instead
For search, find search button and click it
Or navigate directly to result URL if possible

Limited Selector Support

Problem: Some advanced CSS selectors don't work.

Workaround:

Use basic selectors (element, class, attribute)
Try multiple selector variations
Use browser_find to test selectors before clicking/typing

Text Extraction Limitations

Problem: Cannot directly extract full page text content.

Recommended Solution: EasyOCR + Screenshot + Scroll + Grep

This is the most reliable one-time solution:

Take screenshot with browser_screenshot()
Use EasyOCR to extract all text from image
Use grep/regex to search in OCR text
Scroll page and repeat for full content

See limitations.md for complete implementation.

Other Workarounds:

Combine with cursor-ide-browser snapshot for full text extraction
Use web_search tool for structured information
Extract multiple small elements instead of large ones

Manual Information Organization

Problem: Need to manually combine and organize data from multiple browser_find results.

Workaround:

Use web_search for pre-structured data
Combine tools (vibium + cursor-ide-browser + snapshot-query)
Create extraction scripts for automated organization

Common Patterns

Web Search Automation

# Navigate to search engine
browser_navigate(url="https://www.google.com")

# Find search box
browser_find(selector="textarea[name='q'], input[name='q']")

# Type search query
browser_type(selector="textarea[name='q']", text="your query")

# Take screenshot
browser_screenshot(filename="search-typed.png")

# Note: Can't press Enter, need to find and click search button
# Or navigate directly to search results URL

Information Extraction

# Navigate to page
browser_navigate(url="https://example.com")

# Find elements
heading = browser_find(selector="h1")
links = browser_find(selector="a")

# Extract information from results
# Results include: tag, text, box (position/size)

Multi-Page Workflow

# Page 1
browser_navigate(url="https://example.com/page1")
browser_screenshot(filename="page1.png")

# Page 2
browser_navigate(url="https://example.com/page2")
browser_screenshot(filename="page2.png")

# Page 3
browser_navigate(url="https://example.com/page3")
element = browser_find(selector=".target")
browser_click(selector=".target")
browser_screenshot(filename="page3-clicked.png")

Troubleshooting

Browser Connection Lost

Symptoms: "failed to get browsing context" errors

Solution:

# Relaunch browser
browser_launch(headless=False)
# Then retry operations

Element Not Found

Symptoms: "element not found" errors

Solutions:

Wait for page to fully load
Use more generic selectors
Take screenshot to verify page state
Try alternative selectors

Timeout Errors

Symptoms: Operations timeout after 30s

Solutions:

Break task into smaller steps
Take screenshots between steps
Verify page loaded correctly before operations

Search/Form Submission

Problem: Can't press Enter to submit

Solutions:

Find submit button and click it
Navigate directly to result URL
Use form action URL with parameters

Integration with Other Tools

Combining with cursor-ide-browser

For tasks requiring accessibility snapshots:

Use vibium for browser operations
Use cursor-ide-browser for snapshot capture
Save snapshot to YAML
Use snapshot-query for analysis

Combining with web_search

For research tasks:

Use web_search for initial information gathering
Use vibium to visit specific URLs found
Extract detailed information with browser_find
Take screenshots for documentation

Example Use Cases

1. Website Screenshot Collection

urls = ["https://example.com", "https://example.org", "https://example.net"]
for i, url in enumerate(urls):
    browser_navigate(url=url)
    browser_screenshot(filename=f"site-{i+1}.png")

2. Form Data Entry

browser_navigate(url="https://example.com/form")
browser_type(selector="input[name='name']", text="Test User")
browser_type(selector="input[name='email']", text="test@example.com")
browser_type(selector="input[name='phone']", text="123-456-7890")
browser_screenshot(filename="form-filled.png")

3. Information Research

# Search for information
browser_navigate(url="https://www.google.com/search?q=your+query")
browser_screenshot(filename="search-results.png")

# Find and click first result
first_result = browser_find(selector=".g h3 a")
browser_click(selector=".g h3 a")
browser_screenshot(filename="result-page.png")

Key Takeaways

EasyOCR for text extraction - Use EasyOCR + Screenshot + Scroll + Grep for reliable full-page text extraction (see easyocr-workflow.md)
Keep selectors simple - Use basic CSS selectors for reliability
Take screenshots often - Helps with debugging and documentation
Handle errors gracefully - Relaunch browser if connection lost
Combine tools - Use vibium with other browser tools for advanced features
Test selectors first - Use browser_find to verify selectors before clicking/typing
No keyboard actions - Use clicks instead of key presses
No snapshot logs - Use cursor-ide-browser for accessibility snapshots

When & Why to Use This Skill

Use Cases

Vibium Browser Automation

Available Tools

Core Workflows

Basic Navigation and Screenshot

Finding and Interacting with Elements

Form Filling

Best Practices

CSS Selectors

Error Handling

Screenshot Management

Limitations and Workarounds

No Snapshot Log Support

No Keyboard Actions

Limited Selector Support

Text Extraction Limitations

Manual Information Organization

Common Patterns

Web Search Automation

Information Extraction

Multi-Page Workflow

Troubleshooting

Browser Connection Lost

Element Not Found

Timeout Errors

Search/Form Submission

Integration with Other Tools

Combining with cursor-ide-browser

Combining with web_search

Example Use Cases

1. Website Screenshot Collection

2. Form Data Entry

3. Information Research

Key Takeaways