@simonw

simonw/tools.md

Created March 25, 2025 01:39

Browser Automation Tools Overview

These files define a set of browser automation tools, likely intended for AI agents that need to interact with web pages. The tools are built on top of Playwright (a browser automation library) and provide a programmatic interface for common browsing tasks.

Core Components

Tool Structure

  • Each tool has a schema (name, description, input schema) and a handler function
  • Tools use Zod for input validation and schema definition
  • Tools return structured content (text or images) as results
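
The schema-plus-handler shape described above can be sketched roughly as follows. This is a minimal illustration, not the code from the files: the `Tool` interface, the `callTool` helper, and the `time` parameter name are assumptions, and a hand-written `parse` function stands in for the Zod schema so the sketch has no dependencies.

```typescript
// Hypothetical result shape: a list of text or image content items.
type ToolResult = {
  content: Array<
    | { type: "text"; text: string }
    | { type: "image"; data: string; mimeType: string }
  >;
};

// Hypothetical tool shape: a schema (name, description, input schema) plus a handler.
interface Tool<Input> {
  schema: { name: string; description: string; inputSchema: Record<string, unknown> };
  parse(raw: unknown): Input; // stands in for a Zod schema's parse()
  handle(input: Input): Promise<ToolResult>;
}

const waitTool: Tool<{ time: number }> = {
  schema: {
    name: "browser_wait",
    description: "Wait for a specified time in seconds",
    // JSON Schema description of the input (the field name "time" is an assumption).
    inputSchema: {
      type: "object",
      properties: { time: { type: "number" } },
      required: ["time"],
    },
  },
  parse(raw: unknown): { time: number } {
    if (typeof raw !== "object" || raw === null) {
      throw new Error("input must be an object");
    }
    const time = (raw as { time?: unknown }).time;
    if (typeof time !== "number" || !Number.isFinite(time) || time < 0) {
      throw new Error("time must be a non-negative number");
    }
    return { time };
  },
  async handle({ time }: { time: number }): Promise<ToolResult> {
    await new Promise<void>((resolve) => setTimeout(resolve, time * 1000));
    return { content: [{ type: "text", text: `Waited for ${time} seconds` }] };
  },
};

// Validate the raw input first, then run the handler.
async function callTool<Input>(tool: Tool<Input>, raw: unknown): Promise<ToolResult> {
  return tool.handle(tool.parse(raw));
}
```

With this sketch, `callTool(waitTool, { time: 0 })` resolves to a single text item, while `callTool(waitTool, { time: "soon" })` rejects before the handler runs.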

Available Tools

Navigation Tools (common.ts)

  • browser_navigate: Navigate to a specific URL
  • browser_go_back: Navigate back in browser history
  • browser_go_forward: Navigate forward in browser history
  • browser_wait: Wait for a specified time in seconds
  • browser_press_key: Press a keyboard key
  • browser_save_as_pdf: Save current page as PDF
  • browser_close: Close the current page

Screenshot and Mouse Tools (screenshot.ts)

  • browser_screenshot: Take a screenshot of the current page
  • browser_move_mouse: Move mouse to specific coordinates
  • browser_click (coordinate-based): Click at specific x,y coordinates
  • browser_drag (coordinate-based): Drag mouse from one position to another
  • browser_type (keyboard): Type text and optionally submit

Accessibility Snapshot Tools (snapshot.ts)

  • browser_snapshot: Capture accessibility structure of the page
  • browser_click (element-based): Click on a specific element using accessibility reference
  • browser_drag (element-based): Drag between two elements
  • browser_hover: Hover over an element
  • browser_type (element-based): Type text into a specific element

Utility Functions (utils.ts)

  • runAndWait: Execute an action and wait for navigation/network activity to complete
  • captureAriaSnapshot: Generate an accessibility snapshot of the page
  • waitForCompletion: Wait for all network requests to complete
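
One way to picture "run an action, then wait for network activity to settle" is the sketch below. It is an assumption about the general technique, not the contents of utils.ts: `runAndWaitSketch`, `RequestSource`, and the 5000 ms default are hypothetical. The event names `request`, `requestfinished`, and `requestfailed` match events that a Playwright `Page` emits, but the sketch is written against a minimal interface so it runs without Playwright.

```typescript
type RequestEvent = "request" | "requestfinished" | "requestfailed";
type RequestListener = (req: unknown) => void;

// Minimal event surface the sketch needs (hypothetical interface).
interface RequestSource {
  on(event: RequestEvent, listener: RequestListener): unknown;
  off(event: RequestEvent, listener: RequestListener): unknown;
}

async function runAndWaitSketch<T>(
  source: RequestSource,
  action: () => Promise<T>,
  timeoutMs = 5000, // assumed default, not taken from the source
): Promise<T> {
  const pending = new Set<unknown>();
  let actionDone = false;
  let markIdle: () => void = () => {};
  const idle = new Promise<void>((resolve) => {
    markIdle = resolve;
  });

  // Track every request that starts and forget it once it finishes or fails.
  const onRequest: RequestListener = (req) => {
    pending.add(req);
  };
  const onDone: RequestListener = (req) => {
    pending.delete(req);
    if (actionDone && pending.size === 0) markIdle();
  };

  source.on("request", onRequest);
  source.on("requestfinished", onDone);
  source.on("requestfailed", onDone);

  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const result = await action();
    actionDone = true;
    if (pending.size > 0) {
      // Wait until the network is idle, but give up quietly after timeoutMs.
      const timeout = new Promise<void>((resolve) => {
        timer = setTimeout(resolve, timeoutMs);
      });
      await Promise.race([idle, timeout]);
    }
    return result;
  } finally {
    clearTimeout(timer);
    source.off("request", onRequest);
    source.off("requestfinished", onDone);
    source.off("requestfailed", onDone);
  }
}
```

The timeout resolves instead of rejecting, so a request that never completes delays the caller by at most `timeoutMs` and the action's result is still returned.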

Key Features

  1. Dual interaction methods:

    • Coordinate-based interactions (x,y positions)
    • Element-based interactions (using accessibility references)
  2. Snapshot capabilities:

    • Screenshot capture
    • Accessibility tree snapshots (more semantic than screenshots)
  3. Input validation:

    • All tools use Zod schemas to validate inputs
    • Schemas are converted to JSON Schema for documentation
  4. Page state awareness:

    • Tools wait for page navigation and network activity to complete
    • Timeouts are handled gracefully
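
To make the two interaction methods from item 1 concrete, the sketch below puts a coordinate-based click and an element-based click side by side. The source files define these as separate tools that share the name browser_click; the union type here is only a device for comparison, and the field names (`mode`, `x`, `y`, `ref`, `element`) are assumptions.

```typescript
// Hypothetical input shapes for the two styles of click.
type CoordinateClick = { mode: "coordinate"; x: number; y: number };
type ElementClick = { mode: "element"; ref: string; element: string };
type ClickInput = CoordinateClick | ElementClick;

// Produce a human-readable description of what each click targets.
function describeClick(input: ClickInput): string {
  switch (input.mode) {
    case "coordinate":
      // Only a pixel position is known; meaning comes from a screenshot.
      return `Click at (${input.x}, ${input.y})`;
    case "element":
      // The target is identified by a reference taken from an accessibility snapshot.
      return `Click "${input.element}" (ref ${input.ref})`;
  }
}
```

The coordinate form depends on the current layout and viewport, while the element form names what is being clicked, which is why the accessibility snapshot is described above as more semantic than a screenshot.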

This toolkit supports browser automation ranging from basic navigation to complex interactions with page elements, with a strong focus on accessibility and stable interactions with web content.
