Overview
Browser Use is an MIT-licensed Python package and cloud service that bridges large-language-model agents and real-world websites.
Rather than relying solely on screenshot-based vision, which can be brittle, it converts the live DOM into structured JSON so that an agent can identify buttons, forms, and text directly. Developers can run it locally with Playwright or point it at the hosted Browser Use Cloud.
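To make the DOM-to-JSON idea concrete, here is a minimal sketch using only the Python standard library. It is not Browser Use's actual implementation: the class `InteractiveElementParser`, the function `dom_to_json`, the set of tags treated as interactive, and the output field names are all assumptions chosen for illustration. It also assumes reasonably well-formed HTML.

```python
import json
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not the library's rule set).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Void elements have no closing tag, so they are never pushed onto the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class InteractiveElementParser(HTMLParser):
    """Collects interactive elements with a positional XPath for each."""

    def __init__(self):
        super().__init__()
        self.elements = []    # collected interactive elements, in document order
        self._path = []       # XPath steps of the currently open elements
        self._counts = [{}]   # per-open-element counters of child tags seen so far
        self._open = []       # stack of (tag, element dict or None)

    def handle_starttag(self, tag, attrs):
        counts = self._counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        step = f"{tag}[{counts[tag]}]"
        element = None
        if tag in INTERACTIVE_TAGS:
            element = {
                "index": len(self.elements),
                "tag": tag,
                "xpath": "/" + "/".join(self._path + [step]),
                "attributes": {k: v for k, v in attrs if v is not None},
                "text": "",
            }
            self.elements.append(element)
        if tag not in VOID_TAGS:
            self._path.append(step)
            self._counts.append({})
            self._open.append((tag, element))

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open element.
        if self._open and self._open[-1][0] == tag:
            self._open.pop()
            self._path.pop()
            self._counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute visible text to the nearest open interactive ancestor, if any.
        for _, element in reversed(self._open):
            if element is not None:
                element["text"] = (element["text"] + " " + text).strip()
                break


def dom_to_json(html: str) -> str:
    """Return a JSON array describing the interactive elements in `html`."""
    parser = InteractiveElementParser()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.elements, indent=2)


SAMPLE = """
<html><body>
  <form>
    <input type="text" name="q">
    <button type="submit">Search</button>
  </form>
  <a href="/about">About us</a>
</body></html>
"""

if __name__ == "__main__":
    print(dom_to_json(SAMPLE))
```

An agent given this JSON can refer to an element by its `index` or `xpath` instead of guessing pixel coordinates from an image. A production system would also need to handle malformed markup, hidden elements, iframes, and shadow DOM, which this sketch ignores.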
Key Capabilities
- Universal LLM support – works with any LangChain-compatible model.
- Interactive-element detection and XPath extraction for precise clicks and scraping.
- Multi-tab support & session memory for complex, multi-step workflows.
- Vision-model integration to reason over screenshots when needed.
- Custom actions & plug-ins so you can add domain-specific automation.
- Robust handling of dynamic sites including login flows, cookies and CAPTCHAs.
- REST & WebSocket Cloud API for scalable, headless browser fleets.
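As an illustration of the custom-actions capability listed above, the sketch below shows one common way such an extension point can be structured: a registry that maps action names to Python callables through a decorator. Everything here (`ActionRegistry`, `action`, `describe`, `execute`, and the `format_price` example) is hypothetical and written for this overview; the package's real extension API may look different, so consult its documentation before relying on any of these names.

```python
from typing import Callable, Dict


class ActionRegistry:
    """Hypothetical registry mapping action names to Python callables."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[..., str]] = {}
        self._descriptions: Dict[str, str] = {}

    def action(self, description: str):
        """Decorator that registers a function under its own name."""
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            self._actions[func.__name__] = func
            self._descriptions[func.__name__] = description
            return func
        return decorator

    def describe(self) -> Dict[str, str]:
        """Return name -> description, e.g. to include in an LLM prompt."""
        return dict(self._descriptions)

    def execute(self, name: str, **params) -> str:
        """Run a registered action chosen by the agent."""
        if name not in self._actions:
            raise KeyError(f"Unknown action: {name}")
        return self._actions[name](**params)


registry = ActionRegistry()


@registry.action("Convert a price in cents to a formatted dollar string")
def format_price(cents: int) -> str:
    # Domain-specific helper the agent can call by name.
    return f"${cents / 100:.2f}"


if __name__ == "__main__":
    print(registry.describe())
    print(registry.execute("format_price", cents=1999))
```

The design choice worth noting is that the descriptions are stored alongside the callables. That lets the same registry both advertise the available actions to the model and dispatch whichever one the model selects.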