dev-browser
Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.
When & Why to Use This Skill
The Dev Browser skill provides browser automation with persistent page state for navigation, form interaction, and data extraction. Because login sessions and cookies persist across script executions, Claude can carry out multi-step web workflows without re-authenticating at each step.
Use Cases
- Automated Web Testing: Perform end-to-end QA testing by simulating user journeys, clicking elements, and verifying page states with visual screenshots.
- Data Scraping and Extraction: Efficiently harvest structured data from websites using token-efficient methods like ARIA snapshots and visible text extraction.
- Persistent Session Workflows: Automate tasks on password-protected sites by maintaining authenticated states, eliminating the need for repeated logins.
- Web Interaction Debugging: Use visual feedback and accessibility trees to discover and interact with elements on unknown or complex page layouts.
Dev Browser Skill
Browser automation that maintains page state across script executions. Write small, focused scripts to accomplish tasks incrementally.
Choosing Your Approach
- Local/source-available sites: Read the source code first to write selectors directly
- Unknown page layouts: Use getAISnapshot() to discover elements and selectSnapshotRef() to interact with them
- Visual feedback: Take screenshots to see what the user sees
Setup
IMPORTANT: Always use Standalone Mode for browser automation.
Standalone Mode (Default)
Launches a Chromium browser with a persistent profile. Login sessions, cookies, and local storage persist across browser restarts.
Start the server:
~/.config/opencode/skill/dev-browser/server.sh &
Wait for the Ready message before running scripts. Add the --headless flag if the user requests headless mode.
Key points:
- Profile stored at ~/.config/opencode/skill/dev-browser/profiles/browser-data
- Once logged in, future sessions remain authenticated
- Use this mode for local dev testing with auth (localhost:3000, etc.)
Writing Scripts
Run all scripts from the dev-browser directory.
Execute scripts inline using heredocs:
cd ~/.config/opencode/skill/dev-browser && npx tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect();
const page = await client.page("example");
await page.setViewportSize({ width: 1280, height: 800 });
await page.goto("https://example.com");
await waitForPageLoad(page);
console.log({ title: await page.title(), url: page.url() });
await client.disconnect();
EOF
Key Principles
- Small scripts: Each script does ONE thing (navigate, click, fill, check)
- Evaluate state: Log/return state at the end to decide next steps
- Descriptive page names: Use "checkout", "login", not "main"
- Disconnect to exit: await client.disconnect() - pages persist on server
- Plain JS in evaluate: page.evaluate() runs in browser - no TypeScript syntax
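The last principle can be made concrete. One way to keep browser-side code free of TypeScript syntax is to pass page.evaluate() a plain JavaScript expression string. This sketch assumes page is a Playwright Page (which accepts string expressions), as the calls elsewhere in this document suggest; the helper name headingTextsExpression is hypothetical.

```typescript
// Hypothetical helper: builds a plain-JavaScript expression for page.evaluate().
// Because the browser receives a string, no TypeScript syntax can leak into it.
function headingTextsExpression(selector: string): string {
  // JSON.stringify safely quotes the selector inside the generated source.
  const quoted = JSON.stringify(selector);
  return (
    "Array.from(document.querySelectorAll(" + quoted + "))" +
    ".map(function (el) { return (el.textContent || '').trim(); })"
  );
}

// Usage inside a dev-browser script (assumes `page` came from client.page("name")):
//   const headings = await page.evaluate(headingTextsExpression("h1, h2"));
//   console.log(headings);
```

Types and annotations stay on the Node side of the call; only the generated string crosses into the browser.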
Workflow Loop
- Write a script to perform one action
- Run it and observe the output
- Evaluate - did it work? What's the current state?
- Decide - is the task complete or do we need another script?
- Repeat until task is done
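One step of the loop above could be sketched as follows. The runStep helper and the StepPage and StepResult interfaces are hypothetical and only describe the two page methods the sketch relies on (url() and title(), both used earlier in this document).

```typescript
// Minimal structural view of the page methods this sketch needs (an assumption:
// the real `page` object exposes far more than this).
interface StepPage {
  url(): string;
  title(): Promise<string>;
}

interface StepResult {
  ok: boolean;
  error?: string;
  url: string;
  title: string;
}

// Hypothetical helper: run exactly one action, then report the resulting state
// so the next script can be chosen from the output.
async function runStep<P extends StepPage>(
  page: P,
  action: (page: P) => Promise<void>,
): Promise<StepResult> {
  let error: string | undefined;
  try {
    await action(page);
  } catch (e) {
    // The page persists on the server, so its state is still worth reporting.
    error = e instanceof Error ? e.message : String(e);
  }
  return { ok: error === undefined, error, url: page.url(), title: await page.title() };
}

// Usage inside a dev-browser script (the selector is illustrative):
//   const page = await client.page("checkout");
//   console.log(await runStep(page, (p) => p.click("button[type=submit]")));
//   await client.disconnect();
```

Returning the state even when the action fails keeps the "evaluate, then decide" steps possible without a separate debugging script.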
Client API
const client = await connect();
const page = await client.page("name"); // Get or create named page
const pages = await client.list(); // List all page names
await client.close("name"); // Close a page
await client.disconnect(); // Disconnect (pages persist)
// ARIA Snapshot methods
const snapshot = await client.getAISnapshot("name"); // Get accessibility tree
const element = await client.selectSnapshotRef("name", "e5"); // Get element by ref
// Token-efficient content extraction
const outline = await client.getOutline("name"); // Tree of all elements
const interactive = await client.getInteractiveOutline("name"); // Only interactive elements
const text = await client.getVisibleText("name"); // Visible text only
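Because pages persist on the server after disconnect(), named pages can accumulate. A minimal cleanup sketch using only list() and close() from the API above might look like this; the PageRegistry interface and closePagesExcept are hypothetical names, and the sketch assumes list() resolves to an array of page names.

```typescript
// Structural subset of the client, mirroring list() and close() above.
// Assumption: list() resolves to an array of page-name strings.
interface PageRegistry {
  list(): Promise<string[]>;
  close(name: string): Promise<unknown>;
}

// Hypothetical helper: close every named page except the ones still in use.
// Returns the names that were closed so the script can log them.
async function closePagesExcept(client: PageRegistry, keep: string[]): Promise<string[]> {
  const keepSet = new Set(keep);
  const closed: string[] = [];
  for (const name of await client.list()) {
    if (!keepSet.has(name)) {
      await client.close(name);
      closed.push(name);
    }
  }
  return closed;
}

// Usage inside a dev-browser script:
//   const client = await connect();
//   console.log(await closePagesExcept(client, ["checkout"]));
//   await client.disconnect();
```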
Token-Efficient Content Extraction
| Method | Use case | Token efficiency |
|---|---|---|
| getInteractiveOutline() | Discover clickable elements | Most efficient |
| getOutline() | Understand page structure | Very efficient |
| getVisibleText() | Extract readable content | Very efficient |
| getAISnapshot() | Need ref-based clicking | Full ARIA tree |
| screenshot() | Visual debugging | Uses vision tokens |
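The table suggests starting with the cheapest view and escalating only when needed. The sketch below tries the three text-based methods in the order the table lists them. It assumes each method resolves to a string, which this document does not state explicitly; ExtractionClient and cheapestViewContaining are hypothetical names.

```typescript
// Assumption: each extraction method resolves to a string.
interface ExtractionClient {
  getInteractiveOutline(name: string): Promise<string>;
  getOutline(name: string): Promise<string>;
  getVisibleText(name: string): Promise<string>;
}

type ExtractionMethod = keyof ExtractionClient;

// Same order as the rows of the table above.
const CHEAPEST_FIRST: ExtractionMethod[] = [
  "getInteractiveOutline",
  "getOutline",
  "getVisibleText",
];

// Hypothetical helper: return the first view whose output mentions `needle`,
// or null if none of the text-based views contains it.
async function cheapestViewContaining(
  client: ExtractionClient,
  pageName: string,
  needle: string,
): Promise<{ method: ExtractionMethod; content: string } | null> {
  const wanted = needle.toLowerCase();
  for (const method of CHEAPEST_FIRST) {
    const content = await client[method](pageName);
    if (content.toLowerCase().includes(wanted)) return { method, content };
  }
  return null;
}
```

A null result is the signal to fall back to getAISnapshot() or a screenshot.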
Screenshots
await page.screenshot({ path: "tmp/screenshot.png" });
await page.screenshot({ path: "tmp/full.png", fullPage: true });
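When a task spans several small scripts, fixed filenames such as tmp/screenshot.png get overwritten. One possible convention, sketched here with hypothetical helpers toSlug and screenshotPath, is to derive the path from the page name, a step label, and a timestamp.

```typescript
// Hypothetical helper: turn free text into a filename-safe slug.
function toSlug(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
}

// Hypothetical helper: build a unique path under tmp/ for one screenshot.
function screenshotPath(pageName: string, label: string, at: Date = new Date()): string {
  // ISO timestamps contain ":" and ".", which are awkward in filenames.
  const stamp = at.toISOString().replace(/[:.]/g, "-");
  return `tmp/${toSlug(pageName)}-${toSlug(label)}-${stamp}.png`;
}

// Usage inside a dev-browser script:
//   await page.screenshot({ path: screenshotPath("checkout", "after submit") });
```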
ARIA Snapshot (Element Discovery)
Use getAISnapshot() to discover page elements. It returns a YAML-formatted accessibility tree with [ref=eN] references for interaction:
const snapshot = await client.getAISnapshot("hackernews");
console.log(snapshot); // Find the ref you need
const element = await client.selectSnapshotRef("hackernews", "e2");
await element.click();
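Instead of reading a long snapshot by eye, the refs can be filtered programmatically. The sketch below scans snapshot text for [ref=eN] markers on lines that mention a given label. The findRefs helper is hypothetical, and the sample snapshot is illustrative only; real output will differ in detail.

```typescript
interface SnapshotMatch {
  ref: string;
  line: string;
}

// Hypothetical helper: collect refs from snapshot lines containing `text`
// (case-insensitive). Lines without a [ref=eN] marker are skipped.
function findRefs(snapshot: string, text: string): SnapshotMatch[] {
  const wanted = text.toLowerCase();
  const matches: SnapshotMatch[] = [];
  for (const rawLine of snapshot.split("\n")) {
    const ref = /\[ref=(e\d+)\]/.exec(rawLine);
    if (ref && rawLine.toLowerCase().includes(wanted)) {
      matches.push({ ref: ref[1], line: rawLine.trim() });
    }
  }
  return matches;
}

// Illustrative snapshot text, not captured from a real page.
const sampleSnapshot = [
  "- banner:",
  '  - link "Hacker News" [ref=e2]',
  '  - link "new" [ref=e3]',
  '- textbox "Search" [ref=e10]',
].join("\n");

console.log(findRefs(sampleSnapshot, "hacker news").map((m) => m.ref));
```

A matching ref can then be passed to client.selectSnapshotRef() as shown above.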
Error Recovery
Page state persists after failures. Debug with:
cd ~/.config/opencode/skill/dev-browser && npx tsx <<'EOF'
import { connect } from "@/client.js";
const client = await connect();
const page = await client.page("hackernews");
await page.screenshot({ path: "tmp/debug.png" });
console.log({
url: page.url(),
title: await page.title(),
bodyText: await page.textContent("body").then((t) => t?.slice(0, 200)),
});
await client.disconnect();
EOF
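The same debugging information can be captured automatically at the moment a step fails. This is a minimal sketch, assuming only the url() and screenshot() page methods used above; withDebugOnFailure and DebuggablePage are hypothetical names.

```typescript
// Structural subset of the page used for debugging.
interface DebuggablePage {
  url(): string;
  screenshot(options: { path: string }): Promise<unknown>;
}

// Hypothetical wrapper: run an action and, if it throws, save a screenshot
// and log the current URL before rethrowing the original error.
async function withDebugOnFailure<T>(
  page: DebuggablePage,
  label: string,
  action: () => Promise<T>,
): Promise<T> {
  try {
    return await action();
  } catch (error) {
    const path = `tmp/${label}-failure.png`;
    // Best effort: a failed screenshot should not mask the original error.
    await page.screenshot({ path }).catch(() => undefined);
    console.error({ label, url: page.url(), screenshot: path });
    throw error;
  }
}

// Usage inside a dev-browser script (the selector is illustrative):
//   await withDebugOnFailure(page, "login", () => page.click("text=Sign in"));
```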