These files define a set of browser automation tools, likely intended for AI agents that interact with web pages. The tools are built on Playwright (a browser automation library) and provide a programmatic interface for common browsing tasks.
Tool structure:
- Each tool has a schema (name, description, and input schema) and a handler function
- Tools use Zod for input validation and schema definition
- Tools return structured content (text or images) as results
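The schema-plus-handler shape described above could look roughly like the following. This is a minimal, dependency-free sketch: the type names, the `isError` flag, the `FakeContext` object, and the `url` parameter are assumptions made for illustration, and the handler is synchronous only to keep the example short (a Playwright-backed handler would be asynchronous).

```typescript
// Hypothetical shapes for illustration; the real files rely on Zod and Playwright.
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type ToolResult = { content: Array<TextContent | ImageContent>; isError?: boolean };

interface ToolSchema {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema describing the arguments
}

interface Tool<Context> {
  schema: ToolSchema;
  handle(context: Context, params: Record<string, unknown>): ToolResult;
}

// Stand-in for a browser context: it only remembers the last URL.
interface FakeContext {
  currentUrl: string;
}

const navigate: Tool<FakeContext> = {
  schema: {
    name: "browser_navigate",
    description: "Navigate to a specific URL",
    inputSchema: {
      type: "object",
      properties: { url: { type: "string" } },
      required: ["url"],
    },
  },
  handle(context, params) {
    const url = params.url;
    // Reject missing or non-string input before touching the context.
    if (typeof url !== "string" || url.length === 0) {
      return {
        content: [{ type: "text", text: "Invalid input: url must be a non-empty string" }],
        isError: true,
      };
    }
    context.currentUrl = url;
    return { content: [{ type: "text", text: `Navigated to ${url}` }] };
  },
};
```

Keeping the schema as plain data next to the handler means the same object can be listed for an agent and invoked by name.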
Navigation and page tools:
- browser_navigate: Navigate to a specific URL
- browser_go_back: Navigate back in browser history
- browser_go_forward: Navigate forward in browser history
- browser_wait: Wait for a specified time in seconds
- browser_press_key: Press a keyboard key
- browser_save_as_pdf: Save current page as PDF
- browser_close: Close the current page
Screenshot and coordinate-based tools:
- browser_screenshot: Take a screenshot of the current page
- browser_move_mouse: Move mouse to specific coordinates
- browser_click (coordinate-based): Click at specific x,y coordinates
- browser_drag (coordinate-based): Drag mouse from one position to another
- browser_type (keyboard): Type text and optionally submit
Snapshot and element-based tools:
- browser_snapshot: Capture accessibility structure of the page
- browser_click (element-based): Click on a specific element using accessibility reference
- browser_drag (element-based): Drag between two elements
- browser_hover: Hover over an element
- browser_type (element-based): Type text into a specific element
Utility functions:
- runAndWait: Execute an action and wait for navigation/network activity to complete
- captureAriaSnapshot: Generate an accessibility snapshot of the page
- waitForCompletion: Wait for all network requests to complete
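One plausible way for a helper like waitForCompletion to decide that network activity has settled is to count in-flight requests. The sketch below shows only that bookkeeping, without Playwright; the class and its method names are hypothetical, and the actual helper may work differently.

```typescript
// Hypothetical bookkeeping for "all requests finished"; not the toolkit's actual code.
class RequestTracker {
  private readonly pending = new Set<string>();
  private idleCallbacks: Array<() => void> = [];

  // Record a request that has started but not yet finished.
  requestStarted(id: string): void {
    this.pending.add(id);
  }

  // Record a finished (or failed) request and notify waiters once nothing is pending.
  requestFinished(id: string): void {
    this.pending.delete(id);
    if (this.pending.size === 0) {
      const callbacks = this.idleCallbacks;
      this.idleCallbacks = [];
      for (const callback of callbacks) callback();
    }
  }

  get isIdle(): boolean {
    return this.pending.size === 0;
  }

  // Run the callback immediately if idle, otherwise when the last request finishes.
  onIdle(callback: () => void): void {
    if (this.isIdle) callback();
    else this.idleCallbacks.push(callback);
  }
}
```

In a Playwright-backed version, `requestStarted` and `requestFinished` would presumably be wired to the page's `request`, `requestfinished`, and `requestfailed` events, with a timeout bounding the wait so that a stuck request cannot block the tool forever.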
Dual interaction methods:
- Coordinate-based interactions (x,y positions)
- Element-based interactions (using accessibility references)
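The two addressing modes can be pictured as two input shapes for the same action. The sketch below models them as a discriminated union; the field names (`x`, `y`, `ref`, `element`) and the `describeClick` helper are illustrative assumptions, and the original files expose these as separate tools rather than a single union.

```typescript
// Hypothetical input shapes for the two click variants.
type CoordinateTarget = { kind: "coordinates"; x: number; y: number };
type ElementTarget = { kind: "element"; ref: string; element: string };
type ClickTarget = CoordinateTarget | ElementTarget;

// Produce a human-readable description of what a click would do.
function describeClick(target: ClickTarget): string {
  switch (target.kind) {
    case "coordinates":
      // Position-based: depends on the current layout and viewport.
      return `click at (${target.x}, ${target.y})`;
    case "element":
      // Reference-based: points at a node from an accessibility snapshot.
      return `click "${target.element}" (ref ${target.ref})`;
  }
}
```

Coordinate targets pair naturally with screenshots, while element references only make sense relative to a previously captured accessibility snapshot.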
Snapshot capabilities:
- Screenshot capture
- Accessibility tree snapshots (more semantic than screenshots)
Input validation:
- All tools use Zod schemas to validate inputs
- Schemas are converted to JSON Schema for documentation
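The idea that one schema definition drives both runtime validation and a JSON Schema description can be shown without any library. The sketch below is a hand-rolled stand-in for the Zod-based pattern, not Zod itself; `FieldSpec`, `toJsonSchema`, `validate`, and the `text`/`submit` parameter names are assumptions chosen for illustration.

```typescript
// Dependency-free stand-in for a Zod object schema (hypothetical types).
type FieldSpec = {
  type: "string" | "number" | "boolean";
  description: string;
  optional?: boolean;
};
type ObjectSpec = Record<string, FieldSpec>;

// Derive a JSON Schema object from the field specs.
function toJsonSchema(spec: ObjectSpec) {
  const properties: Record<string, { type: string; description: string }> = {};
  const required: string[] = [];
  for (const [key, field] of Object.entries(spec)) {
    properties[key] = { type: field.type, description: field.description };
    if (!field.optional) required.push(key);
  }
  return { type: "object", properties, required };
}

// Check an input against the same specs; returns a list of error messages.
function validate(spec: ObjectSpec, input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const [key, field] of Object.entries(spec)) {
    const value = input[key];
    if (value === undefined) {
      if (!field.optional) errors.push(`${key}: required`);
    } else if (typeof value !== field.type) {
      errors.push(`${key}: expected ${field.type}, got ${typeof value}`);
    }
  }
  return errors;
}

// Assumed inputs for a keyboard typing tool: required text, optional submit flag.
const typeSpec: ObjectSpec = {
  text: { type: "string", description: "Text to type" },
  submit: { type: "boolean", description: "Press Enter after typing", optional: true },
};
```

With Zod the same split would presumably come from parsing inputs with the schema and passing that schema to a JSON Schema converter, so the documentation cannot drift from what is actually enforced.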
Page state awareness:
- Tools wait for page navigation and network activity to complete
- Timeouts are handled gracefully
This toolkit allows for comprehensive browser automation, from basic navigation to complex interactions with page elements, with a strong focus on accessibility and stable interactions with web content.