vibium
Browser automation for AI agents and humans
Stars: 2564
Vibium is a browser automation infrastructure designed for AI agents, providing a single binary that manages browser lifecycle, WebDriver BiDi protocol, and an MCP server. It offers zero configuration, AI-native capabilities, and is lightweight with no runtime dependencies. It is suitable for AI agents, test automation, and any tasks requiring browser interaction.
README:
Browser automation without the drama.
Vibium is browser automation infrastructure built for AI agents. A single binary handles browser lifecycle, WebDriver BiDi protocol, and exposes an MCP server — so Claude Code (or any MCP client) can drive a browser with zero setup. Works great for AI agents, test automation, and anything else that needs a browser.
New here? Getting Started Tutorial — zero to hello world in 5 minutes.
Browser automation for AI agents and humans.
- AI-native. MCP server built-in. Claude Code can drive a browser out of the box.
- Zero config. One install, browser downloads automatically, visible by default.
- Standards-based. Built on WebDriver BiDi, not proprietary protocols controlled by large corporations.
- Lightweight. Single ~10MB binary. No runtime dependencies.
| Component | Purpose | Interface |
|---|---|---|
| Clicker | Browser automation, BiDi proxy, MCP server | CLI / stdio / WebSocket :9515 |
| JS Client | Developer-facing API | npm package |
| Python Client | Developer-facing API | pip package |
┌─────────────────────────────────────────────────────────────┐
│ LLM / Agent │
│ (Claude Code, Codex, Gemini, Local Models) │
└─────────────────────────────────────────────────────────────┘
▲
│ MCP Protocol (stdio)
▼
┌─────────────────────┐
│ Vibium Clicker │
│ │
│ ┌───────────────┐ │
│ │ MCP Server │ │
│ └───────▲───────┘ │ ┌──────────────────┐
│ │ │ │ │
│ ┌───────▼───────┐ │WebSocket│ │
│ │ BiDi Proxy │ │◄───────►│ Chrome Browser │
│ └───────────────┘ │ BiDi │ │
│ │ │ │
└─────────────────────┘ └──────────────────┘
▲
│ WebSocket BiDi :9515
▼
┌─────────────────────────────────────────────────────────────┐
│ Client Libraries │
│ npm install vibium · pip install vibium │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Async API │ │ Sync API │ │
│ │ await vibe.go() │ │ vibe.go() │ │
│ │ │ │ │ │
│ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────┘
A single Go binary (~10MB) that does everything:
- Browser Management: Detects/launches Chrome with BiDi enabled
- BiDi Proxy: WebSocket server that routes commands to browser
- MCP Server: stdio interface for LLM agents
- Auto-Wait: Polls for elements before interacting
- Screenshots: Viewport capture as PNG
Design goal: The binary is invisible. JS developers just npm install vibium and it works.
// Option 1: require (REPL-friendly)
const { browserSync } = require('vibium')
// Option 2: dynamic import (REPL with --experimental-repl-await)
const { browser } = await import('vibium')
// Option 3: static import (in .mjs or .ts files)
import { browser, browserSync } from 'vibium'Sync API:
const fs = require('fs')
const { browserSync } = require('vibium')
const vibe = browserSync.launch()
vibe.go('https://example.com')
const png = vibe.screenshot()
fs.writeFileSync('screenshot.png', png)
const link = vibe.find('a')
link.click()
vibe.quit()Async API:
const fs = await import('fs/promises')
const { browser } = await import('vibium')
const vibe = await browser.launch()
await vibe.go('https://example.com')
const png = await vibe.screenshot()
await fs.writeFile('screenshot.png', png)
const link = await vibe.find('a')
await link.click()
await vibe.quit()from vibium import browser, browser_syncSync API:
from vibium import browser_sync as browser
vibe = browser.launch()
vibe.go("https://example.com")
png = vibe.screenshot()
with open("screenshot.png", "wb") as f:
f.write(png)
link = vibe.find("a")
link.click()
vibe.quit()Async API:
import asyncio
from vibium import browser
async def main():
vibe = await browser.launch()
await vibe.go("https://example.com")
png = await vibe.screenshot()
with open("screenshot.png", "wb") as f:
f.write(png)
link = await vibe.find("a")
await link.click()
await vibe.quit()
asyncio.run(main())One command to add browser control to your AI coding assistant:
Claude Code:
claude mcp add vibium -- npx -y vibiumGemini CLI:
gemini mcp add vibium npx -y vibiumThat's it. Chrome downloads automatically on first use.
See detailed setup guides: Claude Code | Gemini CLI
| Tool | Description |
|---|---|
browser_launch |
Start browser (visible by default) |
browser_navigate |
Go to URL |
browser_find |
Find element by CSS selector |
browser_evaluate |
Execute JavaScript to extract data, query DOM, or inspect page state |
browser_click |
Click an element |
browser_type |
Type text into an element |
browser_screenshot |
Capture viewport (base64 or save to file with --screenshot-dir) |
browser_quit |
Close browser |
npm install vibium # JavaScript/TypeScript
pip install vibium # PythonThis automatically:
- Installs the Clicker binary for your platform
- Downloads Chrome for Testing + chromedriver to platform cache:
- Linux:
~/.cache/vibium/ - macOS:
~/Library/Caches/vibium/ - Windows:
%LOCALAPPDATA%\vibium\
- Linux:
No manual browser setup required.
Skip browser download (if you manage browsers separately):
VIBIUM_SKIP_BROWSER_DOWNLOAD=1 npm install vibium| Platform | Architecture | Status |
|---|---|---|
| Linux | x64 | ✅ Supported |
| macOS | x64 (Intel) | ✅ Supported |
| macOS | arm64 (Apple Silicon) | ✅ Supported |
| Windows | x64 | ✅ Supported |
As a library:
import { browser } from "vibium";
const vibe = await browser.launch();
await vibe.go("https://example.com");
const el = await vibe.find("a");
await el.click();
await vibe.quit();With Claude Code:
Once installed via claude mcp add, just ask Claude to browse:
"Go to example.com and click the first link"
See CONTRIBUTING.md for development setup and guidelines.
V1 focuses on the core loop: browser control via MCP and JS client.
See V2-ROADMAP.md for planned features:
- Java client
- Cortex (memory/navigation layer)
- Retina (recording extension)
- Video recording
- AI-powered locators
- 2025-12-31: Python Client
- 2025-12-22: Day 12 - Published to npm
- 2025-12-21: Day 11 - Polish & Error Handling
- 2025-12-20: Day 10 - MCP Server
- 2025-12-19: Day 9 - Actionability
- 2025-12-19: Day 8 - Elements & Sync API
- 2025-12-17: Halfway There
- 2025-12-16: Week 1 Progress
- 2025-12-11: V1 Announcement
Apache 2.0
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for vibium
Similar Open Source Tools
For similar tasks
crawlee-python
Crawlee-python is a web scraping and browser automation library that covers crawling and scraping end-to-end, helping users build reliable scrapers fast. It allows users to crawl the web for links, scrape data, and store it in machine-readable formats without worrying about technical details. With rich configuration options, users can customize almost any aspect of Crawlee to suit their project's needs.
browser-use-webui
Browser-Use WebUI is a project that enhances the original browser-use tool by providing a brand new web interface, expanded LLM support for various Large Language Models, custom browser support for using your own browser with the tool, and a customized agent with optimized prompts. The tool aims to make websites accessible for AI agents and offers user-friendly interaction with the browser agent, eliminating the need for re-login to sites and dealing with authentication challenges. It also supports high-definition screen recording.
TermNet
TermNet is an AI-powered terminal assistant that connects a Large Language Model (LLM) with shell command execution, browser search, and dynamically loaded tools. It streams responses in real-time, executes tools one at a time, and maintains conversational memory across steps. The project features terminal integration for safe shell command execution, dynamic tool loading without code changes, browser automation powered by Playwright, WebSocket architecture for real-time communication, a memory system to track planning and actions, streaming LLM output integration, a safety layer to block dangerous commands, dual interface options, a notification system, and scratchpad memory for persistent note-taking. The architecture includes a multi-server setup with servers for WebSocket, browser automation, notifications, and web UI. The project structure consists of core backend files, various tools like web browsing and notification management, and servers for browser automation and notifications. Installation requires Python 3.9+, Ollama, and Chromium, with setup steps provided in the README. The tool can be used via the launcher for managing components or directly by starting individual servers. Additional tools can be added by registering them in `toolregistry.json` and implementing them in Python modules. Safety notes highlight the blocking of dangerous commands, allowed risky commands with warnings, and the importance of monitoring tool execution and setting appropriate timeouts.
vibium
Vibium is a browser automation infrastructure designed for AI agents, providing a single binary that manages browser lifecycle, WebDriver BiDi protocol, and an MCP server. It offers zero configuration, AI-native capabilities, and is lightweight with no runtime dependencies. It is suitable for AI agents, test automation, and any tasks requiring browser interaction.
AIPex
AIPex is a revolutionary Chrome extension that transforms your browser into an intelligent automation platform. Using natural language commands and AI-powered intelligence, AIPex can automate virtually any browser task - from complex multi-step workflows to simple repetitive actions. It offers features like natural language control, AI-powered intelligence, multi-step automation, universal compatibility, smart data extraction, precision actions, form automation, visual understanding, developer-friendly with extensive API, and lightning-fast execution of automation tasks.
moling
MoLing is a computer-use and browser-use MCP Server that implements system interaction through operating system APIs, enabling file system operations such as reading, writing, merging, statistics, and aggregation, as well as the ability to execute system commands. It is a dependency-free local office automation assistant. Requiring no installation of any dependencies, MoLing can be run directly and is compatible with multiple operating systems, including Windows, Linux, and macOS. This eliminates the hassle of dealing with environment conflicts involving Node.js, Python, Docker, and other development environments. Command-line operations are dangerous and should be used with caution. MoLing supports features like file system operations, command-line terminal execution, browser control powered by 'github.com/chromedp/chromedp', and future plans for personal PC data organization, document writing assistance, schedule planning, and life assistant features. MoLing has been tested on macOS but may have issues on other operating systems.
browser
Lightpanda Browser is an open-source headless browser designed for fast web automation, AI agents, LLM training, scraping, and testing. It features ultra-low memory footprint, exceptionally fast execution, and compatibility with Playwright and Puppeteer through CDP. Built for performance, Lightpanda offers Javascript execution, support for Web APIs, and is optimized for minimal memory usage. It is a modern solution for web scraping and automation tasks, providing a lightweight alternative to traditional browsers like Chrome.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.