Computer Use Preview — Detailed Introduction
Computer Use Preview is an open-source demo repository published by the google-gemini organization that implements a browser-controlling agent powered by Gemini models, accessed through either the Gemini Developer API or Vertex AI. The project demonstrates how a language model can act as a task-oriented agent that interacts with web pages and performs typical "computer use" operations such as navigation, typing, clicking, and capturing screenshots.
Key features
- Uses Gemini Developer API or Vertex AI as the model backend.
- Two execution environments:
  - "playwright": runs a local Chromium instance controlled with Playwright.
  - "browserbase": connects to a Browserbase instance as a remote browser backend.
- Simple CLI (main.py) for issuing natural-language queries like: "Go to Google and type 'Hello World' into the search bar".
- Options to highlight the mouse cursor in screenshots for debugging and to specify an initial URL.
- Environment-variable driven configuration for API keys and Vertex/Browserbase settings.
Installation & quick start
- Clone the repo and create a Python virtual environment.
- Install Python dependencies from requirements.txt.
- Install Playwright and required browser/system deps (if using the playwright environment).
- Set environment variables:
  - For Gemini Developer API: GEMINI_API_KEY
  - For Vertex AI: USE_VERTEXAI=true, VERTEXAI_PROJECT, VERTEXAI_LOCATION
  - For Browserbase: BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID (when using browserbase)
- Run the agent with the CLI, e.g.:
  python main.py --query "Go to Google and type 'Hello World' into the search bar" --env="playwright"
  You can also pass the --initial_url and --highlight_mouse flags.
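Before launching the CLI, it can help to confirm that the variables listed in the setup steps are actually set. The following is a minimal sketch of such a preflight check, not code from the repository: the helper name missing_env_vars and the rule that USE_VERTEXAI=true switches the required model variables are assumptions made for illustration, while the variable names themselves come from the list above.

```python
import os
from typing import Mapping

# Variable names come from the setup steps; how they are grouped is an assumption.
MODEL_VARS_GEMINI = ["GEMINI_API_KEY"]
MODEL_VARS_VERTEX = ["VERTEXAI_PROJECT", "VERTEXAI_LOCATION"]
BROWSERBASE_VARS = ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]


def missing_env_vars(env_name: str, environ: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    use_vertex = environ.get("USE_VERTEXAI", "").strip().lower() == "true"
    required = list(MODEL_VARS_VERTEX if use_vertex else MODEL_VARS_GEMINI)
    if env_name == "browserbase":
        # The remote browser backend needs its own credentials on top of the model ones.
        required += BROWSERBASE_VARS
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars("playwright", os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Environment looks complete for the playwright environment.")
```

Passing the environment as a plain mapping keeps the check easy to exercise with a dictionary instead of the real process environment.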
Configuration details
- The repository exposes clear environment variables for choosing the model backend and browser backend.
- When USE_VERTEXAI is true, the code uses Vertex AI client configuration (project and location required).
- For local runs, the Playwright environment needs the Chrome browser installed via playwright install chrome, plus Playwright system deps (playwright install-deps chrome).
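The backend switch described above can be sketched as a small function that turns environment variables into client keyword arguments. This is an illustrative sketch rather than the repository's implementation: the function name model_client_kwargs is hypothetical, and the keyword names (vertexai, project, location, api_key) assume the google-genai SDK's genai.Client constructor.

```python
from typing import Any, Mapping


def model_client_kwargs(environ: Mapping[str, str]) -> dict[str, Any]:
    """Build client keyword arguments from environment variables (hypothetical helper)."""
    use_vertex = environ.get("USE_VERTEXAI", "").strip().lower() == "true"
    if use_vertex:
        project = environ.get("VERTEXAI_PROJECT")
        location = environ.get("VERTEXAI_LOCATION")
        if not project or not location:
            # Both values are required when the Vertex AI backend is selected.
            raise ValueError("Vertex AI needs VERTEXAI_PROJECT and VERTEXAI_LOCATION")
        return {"vertexai": True, "project": project, "location": location}
    api_key = environ.get("GEMINI_API_KEY")
    if not api_key:
        raise ValueError("Gemini Developer API needs GEMINI_API_KEY")
    return {"api_key": api_key}


# Assumed usage with the google-genai SDK (not verified against the repo):
#   import os
#   from google import genai
#   client = genai.Client(**model_client_kwargs(os.environ))
```

Failing early with a ValueError makes a missing project or location visible before any browser session is started.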
Known issues & workarounds
- Playwright Dropdown Menu: On some OSes, native <select> elements are rendered by the OS and cannot be properly captured or interacted with. The repo documents two mitigations:
  - Use the Browserbase backend instead of Playwright.
  - Inject a custom proxy-select script and CSS to replace native <select> elements with JS-rendered ones. This is a partial workaround and not 100% reliable.
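The second mitigation could be wired into a Playwright session roughly as follows. This is a sketch under assumptions, not the repository's code: the helper names and the file paths passed to them are hypothetical, and the contents of the proxy-select script and stylesheet are not shown here. The only Playwright calls used are page.add_init_script and page.add_style_tag from the Python sync API.

```python
from pathlib import Path
from typing import Any


def register_proxy_select_script(page: Any, script_path: Path) -> None:
    """Register the replacement script once (hypothetical helper).

    add_init_script evaluates the file in every document the page loads afterwards.
    """
    page.add_init_script(path=str(script_path))


def apply_proxy_select_css(page: Any, css_path: Path) -> None:
    """Inject the stylesheet into the current document (hypothetical helper).

    add_style_tag affects only the document that is loaded right now,
    so this has to be called again after each navigation.
    """
    page.add_style_tag(path=str(css_path))


# Assumed usage with a Playwright sync API page object; file names are placeholders:
#   register_proxy_select_script(page, Path("proxy-select.js"))
#   page.goto("https://example.com")
#   apply_proxy_select_css(page, Path("proxy-select.css"))
```

Splitting the script and the stylesheet into two helpers reflects their different lifetimes: the init script persists across navigations, while the style tag does not.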
Use cases
- Prototyping multi-step web tasks driven by a large model (RPA-like scenarios).
- Demonstrating model-driven UI workflows and browser automation with LLM guidance.
- Debugging and experimenting with model-to-action pipelines (screenshot feedback, mouse highlighting).
Who is this for
Developers and researchers who want a hands-on example of how to integrate Gemini (or Vertex AI) with a browser automation backend to build an agent that performs web interactions and other on-screen tasks. The repo is a preview/demo rather than a polished production product—useful as a starting point for building more robust agent systems.
Repo metadata
- Stars: 2408 (at time of collection)
- Created at: 2025-05-06
Overall, Computer Use Preview is a practical reference implementation that ties a language model backend to browser automation infrastructure to illustrate model-driven "computer use" capabilities.
