AIAny - Computer Use Preview

Computer Use Preview — Detailed Introduction

Computer Use Preview is an open-source demo repository published by the google-gemini organization. It implements a browser-controlling agent powered by Gemini models, accessed through either the Gemini Developer API or Vertex AI. The project demonstrates how a language model can act as a task-oriented agent that interacts with web pages and performs typical "computer use" operations such as navigation, typing, clicking, and capturing screenshots.
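To make the "task-oriented agent" idea concrete, here is a minimal sketch of an observe/propose/execute loop. It is not the repository's code: Action, FakeBrowser, run_agent, and scripted_policy are hypothetical names invented for illustration, and a scripted list of actions stands in for the model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Action:
    """One step proposed by the model (hypothetical shape, not the repo's)."""
    kind: str          # "navigate", "click", "type", or "done"
    target: str = ""   # URL or a description of the element
    text: str = ""     # text to type, if any


@dataclass
class FakeBrowser:
    """Stand-in for a real browser backend; it only records what happened."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real backend would return image bytes; a string keeps this runnable.
        return f"screenshot of {self.url}"

    def apply(self, action: Action) -> None:
        if action.kind == "navigate":
            self.url = action.target
        self.log.append((action.kind, action.target, action.text))


def run_agent(browser: FakeBrowser, propose: Callable[[str], Action], max_steps: int = 10) -> int:
    """Observe, ask for the next action, execute; stop on "done" or the step limit."""
    for step in range(max_steps):
        action = propose(browser.screenshot())
        if action.kind == "done":
            return step
        browser.apply(action)
    return max_steps


def scripted_policy(actions: Iterable[Action]) -> Callable[[str], Action]:
    """Replays a fixed list of actions in place of a model call."""
    queue = iter(actions)
    return lambda _screenshot: next(queue, Action("done"))


browser = FakeBrowser()
plan = [
    Action("navigate", "https://www.google.com"),
    Action("type", "search bar", "Hello World"),
]
steps = run_agent(browser, scripted_policy(plan))
```

In a real agent, the propose callable would send the screenshot and the user's query to the model and parse its reply into an action; the step limit guards against a model that never reports completion.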

Key features

Uses the Gemini Developer API or Vertex AI as the model backend.
Two execution environments:
- "playwright": runs a local Chromium instance controlled with Playwright.
- "browserbase": connects to a Browserbase instance as a remote browser backend.
Simple CLI (main.py) for issuing natural-language queries like: "Go to Google and type 'Hello World' into the search bar".
Options to highlight the mouse cursor in screenshots for debugging and to specify an initial URL.
Environment-variable driven configuration for API keys and Vertex/Browserbase settings.

Installation & quick start

Clone the repo and create a Python virtual environment.
Install Python dependencies from requirements.txt.
Install Playwright and required browser/system deps (if using the playwright environment).
Set environment variables:
- For Gemini Developer API: GEMINI_API_KEY
- For Vertex AI: USE_VERTEXAI=true, VERTEXAI_PROJECT, VERTEXAI_LOCATION
- For Browserbase: BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID (when using browserbase)
Run the agent with the CLI, e.g.:

python main.py --query "Go to Google and type 'Hello World' into the search bar" --env="playwright"

You can also pass --initial_url and --highlight_mouse flags.
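The flags above could be parsed along the following lines. This is a sketch, not the contents of main.py: the flag names come from the text, but the defaults, the choice to make --query required, and the treatment of --highlight_mouse as a boolean switch are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Builds a parser for the four flags named in this section."""
    parser = argparse.ArgumentParser(
        description="Run a browser agent from a natural-language query."
    )
    # Assumption: the query is mandatory.
    parser.add_argument("--query", required=True, help="Task for the agent, in plain language.")
    # Assumption: "playwright" is the default environment.
    parser.add_argument(
        "--env",
        choices=["playwright", "browserbase"],
        default="playwright",
        help="Browser backend to use.",
    )
    # Assumption: no initial URL unless one is given.
    parser.add_argument("--initial_url", default=None, help="Page to open before the agent starts.")
    # Assumption: a switch that is off unless the flag is present.
    parser.add_argument(
        "--highlight_mouse",
        action="store_true",
        help="Mark the mouse cursor in screenshots for debugging.",
    )
    return parser


args = build_parser().parse_args(
    [
        "--query", "Go to Google and type 'Hello World' into the search bar",
        "--env=playwright",
        "--initial_url", "https://www.google.com",
        "--highlight_mouse",
    ]
)
```

Restricting --env with choices makes a mistyped backend name fail at parse time rather than partway through a run.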

Configuration details

Environment variables select the model backend and supply credentials for the browser backend; the browser backend itself is chosen with the --env flag.
When USE_VERTEXAI is set to true, the code uses the Vertex AI client configuration, which requires VERTEXAI_PROJECT and VERTEXAI_LOCATION.
For local runs, the Playwright environment needs the Chrome browser, installed with playwright install chrome, and the Playwright system dependencies, installed with playwright install-deps chrome.
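The backend selection described here can be sketched as a small function that reads the environment and fails early when a required variable is missing. The variable names are the ones listed in this document; the function names, the case-insensitive match on "true", and the shape of the returned dictionary are assumptions. The dictionary keys are chosen to resemble keyword arguments of the google-genai Client constructor, but whether the repository builds its client this way is not confirmed here.

```python
import os
from typing import Mapping


def model_client_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Returns settings for the model backend, or raises if variables are missing."""
    # Assumption: any casing of "true" enables Vertex AI.
    use_vertex = environ.get("USE_VERTEXAI", "").strip().lower() == "true"
    required = ("VERTEXAI_PROJECT", "VERTEXAI_LOCATION") if use_vertex else ("GEMINI_API_KEY",)

    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    if use_vertex:
        return {
            "vertexai": True,
            "project": environ["VERTEXAI_PROJECT"],
            "location": environ["VERTEXAI_LOCATION"],
        }
    return {"api_key": environ["GEMINI_API_KEY"]}


def missing_browser_vars(env_name: str, environ: Mapping[str, str] = os.environ) -> list:
    """Lists Browserbase variables that are unset when env_name is "browserbase"."""
    if env_name != "browserbase":
        return []  # the local Playwright backend needs no credentials
    required = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")
    return [name for name in required if not environ.get(name)]
```

Passing the environment as a parameter instead of reading os.environ directly keeps the check easy to exercise with a plain dictionary.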

Known issues & workarounds

Playwright Dropdown Menu: On some OSes, native <select> elements are rendered by the OS and cannot be properly captured or interacted with. The repo documents two mitigations:
1. Use the Browserbase backend instead of Playwright.
2. Inject a custom proxy-select script and CSS to replace native <select> elements with JS-rendered ones. This is a partial workaround and not 100% reliable.
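The second mitigation can be illustrated with a rough sketch. This is not the script the repository documents: PROXY_SELECT_JS and install_proxy_select are hypothetical names, the JavaScript is a simplified stand-in that renders each option as a clickable list item, and the accompanying CSS is omitted. The only Playwright call used is page.add_init_script, which registers a script to run in every page before the page's own scripts.

```python
# Simplified stand-in for a proxy-select script (hypothetical, not the repo's).
PROXY_SELECT_JS = """
(() => {
  const proxy = (select) => {
    if (select.dataset.proxied) return;
    select.dataset.proxied = "1";
    const list = document.createElement("ul");
    list.className = "proxy-select";
    for (const option of select.options) {
      const item = document.createElement("li");
      item.textContent = option.textContent;
      item.addEventListener("click", () => {
        // Mirror the choice onto the hidden native element.
        select.value = option.value;
        select.dispatchEvent(new Event("change", { bubbles: true }));
      });
      list.appendChild(item);
    }
    select.style.display = "none";
    select.insertAdjacentElement("afterend", list);
  };
  document.addEventListener("DOMContentLoaded", () => {
    document.querySelectorAll("select").forEach(proxy);
  });
})();
"""


def install_proxy_select(page) -> None:
    """Registers the script on a page object.

    `page` is assumed to be a Playwright Page; any object exposing
    add_init_script(script) works for the purposes of this sketch.
    """
    page.add_init_script(PROXY_SELECT_JS)
```

With real Playwright, install_proxy_select would be called on a page before navigating. The sketch only handles <select> elements present when the DOM finishes loading; elements added later are left untouched, which is one reason a workaround of this kind is only partially reliable.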

Use cases

Prototyping multi-step web tasks driven by a large model (RPA-like scenarios).
Demonstrating model-driven UI workflows and browser automation with LLM guidance.
Debugging and experimenting with model-to-action pipelines (screenshot feedback, mouse highlighting).

Who is this for

Developers and researchers who want a hands-on example of how to integrate Gemini (or Vertex AI) with a browser automation backend to build an agent that performs web interactions and other on-screen tasks. The repo is a preview/demo rather than a polished production product—useful as a starting point for building more robust agent systems.

Repo metadata

Stars: 2408 (at the time of collection)
Created at: 2025-05-06

Overall, Computer Use Preview is a practical reference implementation that ties a language model backend to browser automation infrastructure to illustrate model-driven "computer use" capabilities.
