
Computer Use Preview

Computer Use Preview is an open-source browser-agent demo from google-gemini that shows how to use Gemini models, accessed through the Gemini Developer API or Vertex AI, to perform computer-style tasks such as web navigation, form input, and screenshots. It supports local Playwright and Browserbase backends, requires a Gemini API key or Vertex AI configuration, and includes a CLI, environment-variable setup, and documented workarounds for platform-specific issues.

Introduction

Computer Use Preview: Detailed Introduction

Computer Use Preview is an open-source demo repository published by the google-gemini organization. It implements a browser-controlling agent powered by Gemini models, accessed through either the Gemini Developer API or Vertex AI. The project demonstrates how a language model can act as a task-oriented agent that interacts with web pages and performs typical "computer use" operations such as navigating, typing, clicking, and capturing screenshots.
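
The agent pattern described above can be sketched as a loop in which the model proposes one action at a time and receives a screenshot back after each step. This is a minimal illustration, not the repository's code: the Action type, FakeBrowser, run_agent, and scripted_model are all hypothetical names, and the model is replaced by a stub that replays a fixed plan.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str            # "navigate", "type", "click", or "done" (hypothetical vocabulary)
    argument: str = ""   # URL, text to type, or a target description


class FakeBrowser:
    """Stand-in for a real browser backend; it only records what it was asked to do."""

    def __init__(self) -> None:
        self.history: List[Action] = []

    def execute(self, action: Action) -> bytes:
        self.history.append(action)
        # A real backend would return image bytes of the page after the action.
        return f"screenshot after {action.kind}".encode()


def run_agent(
    query: str,
    propose: Callable[[str, bytes], Action],
    browser: FakeBrowser,
    max_steps: int = 10,
) -> List[Action]:
    """Alternate between asking the model for an action and executing it."""
    screenshot = b""  # nothing has been observed before the first step
    for _ in range(max_steps):
        action = propose(query, screenshot)
        if action.kind == "done":
            break
        screenshot = browser.execute(action)
    return browser.history


def scripted_model(plan: List[Action]) -> Callable[[str, bytes], Action]:
    """Hypothetical model stub: replays a fixed plan, one step per call."""
    steps = iter(plan)

    def propose(query: str, screenshot: bytes) -> Action:
        # Once the plan is exhausted, keep answering "done".
        return next(steps, Action("done"))

    return propose


browser = FakeBrowser()
plan = [
    Action("navigate", "https://www.google.com"),
    Action("type", "Hello World"),
    Action("done"),
]
executed = run_agent(
    "Go to Google and type 'Hello World' into the search bar",
    scripted_model(plan),
    browser,
)
print([action.kind for action in executed])
```

In a real agent, propose would send the query and the latest screenshot to the model and parse its reply into an action; the max_steps cap is a simple guard against a model that never signals completion.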

Key features
  • Uses Gemini Developer API or Vertex AI as the model backend.
  • Two execution environments:
    • "playwright": runs a local Chromium instance controlled with Playwright.
    • "browserbase": connects to a Browserbase instance as a remote browser backend.
  • Simple CLI (main.py) for issuing natural-language queries like: "Go to Google and type 'Hello World' into the search bar".
  • Options to highlight the mouse cursor in screenshots for debugging and to specify an initial URL.
  • Environment-variable driven configuration for API keys and Vertex/Browserbase settings.

Installation & quick start
  1. Clone the repo and create a Python virtual environment.
  2. Install Python dependencies from requirements.txt.
  3. Install Playwright and required browser/system deps (if using the playwright environment).
  4. Set environment variables:
    • For Gemini Developer API: GEMINI_API_KEY
    • For Vertex AI: USE_VERTEXAI=true, VERTEXAI_PROJECT, VERTEXAI_LOCATION
    • For Browserbase: BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID (when using browserbase)
  5. Run the agent with the CLI, e.g.:
    python main.py --query "Go to Google and type 'Hello World' into the search bar" --env="playwright"

You can also pass the --initial_url flag to set the starting page and the --highlight_mouse flag to highlight the mouse cursor in screenshots.
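
For readers who want to see how such a command line might be parsed, here is a minimal argparse sketch that accepts the four flags named above. It is an illustration only: the defaults, the help text, and the treatment of --highlight_mouse as a boolean switch are assumptions, not taken from the repository's main.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags named in the text (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Run a browser agent on one query.")
    parser.add_argument("--query", required=True,
                        help="Natural-language task for the agent.")
    parser.add_argument("--env", choices=("playwright", "browserbase"),
                        default="playwright",  # assumed default
                        help="Browser backend to use.")
    parser.add_argument("--initial_url", default=None,  # assumed default
                        help="Page to open before the agent starts.")
    parser.add_argument("--highlight_mouse", action="store_true",
                        help="Highlight the mouse cursor in screenshots.")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args([
    "--query", "Go to Google and type 'Hello World' into the search bar",
    "--env=playwright",
])
print(args.env, args.initial_url, args.highlight_mouse)
```

Restricting --env with choices means a mistyped backend name fails at parse time instead of partway through a run.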

Configuration details
  • Environment variables select the model backend and supply the credentials for Vertex AI and Browserbase; the browser backend itself is chosen with the --env flag.
  • When USE_VERTEXAI is set to true, the code uses the Vertex AI client configuration, which requires VERTEXAI_PROJECT and VERTEXAI_LOCATION.
  • For local runs, the Playwright environment needs the Chrome browser, installed with the command playwright install chrome, along with Playwright's system dependencies (playwright install-deps chrome).
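
The backend selection described in this list can be sketched as a small function that turns environment variables into client settings. This is a hedged sketch, not the repository's code: model_client_kwargs is a hypothetical helper, and treating only the case-insensitive string "true" as enabling Vertex AI is an assumption. The returned keys (api_key, vertexai, project, location) are chosen to match keyword arguments accepted by genai.Client in the Google Gen AI Python SDK.

```python
from typing import Dict, Mapping, Union


def model_client_kwargs(env: Mapping[str, str]) -> Dict[str, Union[str, bool]]:
    """Return client keyword arguments for the selected model backend.

    Hypothetical helper: raises ValueError when a required variable is missing.
    """
    # Assumption: only the string "true" (any letter case) enables Vertex AI.
    use_vertex = env.get("USE_VERTEXAI", "").strip().lower() == "true"
    if use_vertex:
        required = ("VERTEXAI_PROJECT", "VERTEXAI_LOCATION")
        missing = [name for name in required if not env.get(name)]
        if missing:
            raise ValueError("Missing Vertex AI settings: " + ", ".join(missing))
        return {
            "vertexai": True,
            "project": env["VERTEXAI_PROJECT"],
            "location": env["VERTEXAI_LOCATION"],
        }
    if not env.get("GEMINI_API_KEY"):
        raise ValueError("Missing GEMINI_API_KEY")
    return {"api_key": env["GEMINI_API_KEY"]}


# Example with an explicit mapping; pass os.environ to read the real environment.
example_env = {
    "USE_VERTEXAI": "true",
    "VERTEXAI_PROJECT": "my-project",      # placeholder value
    "VERTEXAI_LOCATION": "us-central1",    # placeholder value
}
print(model_client_kwargs(example_env))
```

Taking a mapping instead of reading os.environ directly keeps the function easy to test and makes the missing-variable errors appear before any browser is launched.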

Known issues & workarounds
  • Playwright dropdown menus: on some operating systems, native <select> elements are rendered by the OS rather than by the page, so Playwright cannot capture them in screenshots or interact with them properly. The repo documents two mitigations:
    1. Use the Browserbase backend instead of Playwright.
    2. Inject a custom proxy-select script and CSS that replace native <select> elements with JS-rendered ones. This is a partial workaround and is not fully reliable.
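
To make the second mitigation concrete, the sketch below registers a small script through Playwright's page.add_init_script so that each native <select> is hidden and mirrored by a page-rendered list. It is an illustrative stand-in, not the script the repository documents: PROXY_SELECT_JS, the proxy-select class name, and install_select_proxy are hypothetical, no CSS is included, and the script only handles <select> elements present when the page finishes loading.

```python
# Hypothetical proxy script: hides each native <select> and mirrors its options
# as a plain list that is rendered by the page and therefore shows in screenshots.
PROXY_SELECT_JS = """
(() => {
  function proxify(select) {
    if (select.dataset.proxied) return;
    select.dataset.proxied = "1";
    const list = document.createElement("ul");
    list.className = "proxy-select";
    for (const option of select.options) {
      const item = document.createElement("li");
      item.textContent = option.textContent;
      item.addEventListener("click", () => {
        // Forward the choice to the hidden native element.
        select.value = option.value;
        select.dispatchEvent(new Event("change", { bubbles: true }));
      });
      list.appendChild(item);
    }
    select.style.display = "none";
    select.insertAdjacentElement("afterend", list);
  }
  document.addEventListener("DOMContentLoaded", () => {
    document.querySelectorAll("select").forEach(proxify);
  });
})();
"""


def install_select_proxy(page) -> None:
    """Register the proxy script so it runs in every document loaded by `page`.

    `page` is expected to be a Playwright Page (sync API); anything exposing
    add_init_script(script=...) works for this sketch.
    """
    page.add_init_script(script=PROXY_SELECT_JS)
```

With a real Playwright page, calling install_select_proxy(page) before page.goto(...) would apply the script to each subsequent navigation. Selects added dynamically after load are not covered, which is one reason a workaround of this kind stays partial.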

Use cases
  • Prototyping multi-step web tasks driven by a large model (RPA-like scenarios).
  • Demonstrating model-driven UI workflows and browser automation with LLM guidance.
  • Debugging and experimenting with model-to-action pipelines (screenshot feedback, mouse highlighting).

Who is this for

Developers and researchers who want a hands-on example of how to integrate Gemini, through the Gemini Developer API or Vertex AI, with a browser automation backend to build an agent that performs web interactions and other on-screen tasks. The repo is a preview/demo rather than a polished production product, and it is most useful as a starting point for building more robust agent systems.

Repo metadata
  • Stars: 2408 (at the time of collection)
  • Created at: 2025-05-06

Overall, Computer Use Preview is a practical reference implementation that ties a language model backend to browser automation infrastructure to illustrate model-driven "computer use" capabilities.

Information

  • Website: github.com
  • Authors: google-gemini (Google)
  • Published date: 2025/05/06