vibium

Browser automation for AI agents and humans

Stars: 2564

Vibium is browser automation infrastructure designed for AI agents. A single binary manages the browser lifecycle, speaks the WebDriver BiDi protocol, and exposes an MCP server. It needs no configuration, is AI-native, and is lightweight, with no runtime dependencies. It suits AI agents, test automation, and any other task that requires browser interaction.

README:

Vibium

Browser automation without the drama.

Vibium is browser automation infrastructure built for AI agents. A single binary manages the browser lifecycle, speaks the WebDriver BiDi protocol, and exposes an MCP server, so Claude Code (or any MCP client) can drive a browser with zero setup. It works well for AI agents, test automation, and anything else that needs a browser.

New here? Getting Started Tutorial — zero to hello world in 5 minutes.

Why Vibium?

Browser automation for AI agents and humans.

- AI-native. MCP server built-in. Claude Code can drive a browser out of the box.
- Zero config. One install, browser downloads automatically, visible by default.
- Standards-based. Built on WebDriver BiDi, not proprietary protocols controlled by large corporations.
- Lightweight. Single ~10MB binary. No runtime dependencies.

Quick Reference

Component	Purpose	Interface
Clicker	Browser automation, BiDi proxy, MCP server	CLI / stdio / WebSocket :9515
JS Client	Developer-facing API	npm package
Python Client	Developer-facing API	pip package

Architecture

┌─────────────────────────────────────────────────────────────┐
│                         LLM / Agent                         │
│          (Claude Code, Codex, Gemini, Local Models)         │
└─────────────────────────────────────────────────────────────┘
                      ▲
                      │ MCP Protocol (stdio)
                      ▼
           ┌─────────────────────┐         
           │   Vibium Clicker    │
           │                     │
           │  ┌───────────────┐  │
           │  │  MCP Server   │  │
           │  └───────▲───────┘  │         ┌──────────────────┐
           │          │          │         │                  │
           │  ┌───────▼───────┐  │WebSocket│                  │
           │  │  BiDi Proxy   │  │◄───────►│  Chrome Browser  │
           │  └───────────────┘  │  BiDi   │                  │
           │                     │         │                  │
           └─────────────────────┘         └──────────────────┘
                      ▲
                      │ WebSocket BiDi :9515
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                     Client Libraries                        │
│         npm install vibium  ·  pip install vibium           │
│                                                             │
│    ┌─────────────────┐               ┌─────────────────┐    │
│    │ Async API       │               │    Sync API     │    │
│    │ await vibe.go() │               │    vibe.go()    │    │
│    │                 │               │                 │    │
│    └─────────────────┘               └─────────────────┘    │
└─────────────────────────────────────────────────────────────┘
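
The diagram shows client libraries talking to Clicker over WebSocket BiDi on port 9515. As a rough illustration of what travels over that socket, the sketch below builds one WebDriver BiDi command as a JSON text frame. The `bidi_command` helper and the `"ctx-1"` context id are hypothetical; only the `id` / `method` / `params` envelope and the `browsingContext.navigate` command come from the WebDriver BiDi specification. It does not open a connection and is not taken from Vibium's client code.

```python
import itertools
import json

# Each BiDi command carries a unique numeric id so the response can be matched to it.
_ids = itertools.count(1)


def bidi_command(method: str, params: dict) -> str:
    """Serialize one WebDriver BiDi command as a JSON text frame (hypothetical helper)."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


# "ctx-1" is a placeholder; a real browsing-context id comes from the browser.
navigate = bidi_command(
    "browsingContext.navigate",
    {"context": "ctx-1", "url": "https://example.com", "wait": "complete"},
)
print(navigate)
```

A real client would send this string over the WebSocket and wait for the response whose `id` matches.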

Components

Clicker

A single Go binary (~10MB) that does everything:

- Browser Management: Detects/launches Chrome with BiDi enabled
- BiDi Proxy: WebSocket server that routes commands to browser
- MCP Server: stdio interface for LLM agents
- Auto-Wait: Polls for elements before interacting
- Screenshots: Viewport capture as PNG

Design goal: The binary is invisible. JS developers just npm install vibium and it works.
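
The Auto-Wait feature above is described as polling for elements before interacting. The general technique can be sketched as a poll-until-deadline loop. This is an illustration of the idea only, not Clicker's implementation (which lives in the Go binary); `wait_for`, its parameters, and the default timeout are all assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]],
             timeout: float = 5.0,
             interval: float = 0.1) -> T:
    """Call probe() repeatedly until it returns something other than None.

    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"probe returned nothing within {timeout}s")
        time.sleep(interval)
```

Here `probe` stands in for "look the element up once"; any callable that returns `None` while the element is missing would do.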

JS/TS Client

// Option 1: require (REPL-friendly)
const { browserSync } = require('vibium')

// Option 2: dynamic import (REPL with --experimental-repl-await)
const { browser } = await import('vibium')

// Option 3: static import (in .mjs or .ts files)
import { browser, browserSync } from 'vibium'

Sync API:

const fs = require('fs')
const { browserSync } = require('vibium')

const vibe = browserSync.launch()
vibe.go('https://example.com')

const png = vibe.screenshot()
fs.writeFileSync('screenshot.png', png)

const link = vibe.find('a')
link.click()
vibe.quit()

Async API:

const fs = await import('fs/promises')
const { browser } = await import('vibium')

const vibe = await browser.launch()
await vibe.go('https://example.com')

const png = await vibe.screenshot()
await fs.writeFile('screenshot.png', png)

const link = await vibe.find('a')
await link.click()
await vibe.quit()

Python Client

from vibium import browser, browser_sync

Sync API:

from vibium import browser_sync as browser

vibe = browser.launch()
vibe.go("https://example.com")

png = vibe.screenshot()
with open("screenshot.png", "wb") as f:
    f.write(png)

link = vibe.find("a")
link.click()
vibe.quit()
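
The example above writes the screenshot bytes straight to disk. Since screenshots are described as PNG captures, a cheap sanity check is to compare the first eight bytes against the standard PNG signature before saving. The `is_png` helper is hypothetical and not part of the vibium package.

```python
# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """Return True if data begins with the PNG file signature."""
    return data[:8] == PNG_SIGNATURE
```

Under that assumption, a script could call `is_png(png)` on the value returned by `vibe.screenshot()` and refuse to write an empty or truncated capture.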

Async API:

import asyncio
from vibium import browser

async def main():
    vibe = await browser.launch()
    await vibe.go("https://example.com")

    png = await vibe.screenshot()
    with open("screenshot.png", "wb") as f:
        f.write(png)

    link = await vibe.find("a")
    await link.click()
    await vibe.quit()

asyncio.run(main())
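
In the async example, an exception raised before `vibe.quit()` would skip the cleanup call. One way to guard against that is a small `try`/`finally` wrapper. `with_browser` is a hypothetical helper, not part of the vibium package; it assumes only that the launched object has an awaitable `quit()`, as shown in the example above.

```python
import asyncio


async def with_browser(launch, task):
    """Launch a browser, run task(vibe), and always call vibe.quit() afterwards."""
    vibe = await launch()
    try:
        return await task(vibe)
    finally:
        # Runs on success and on failure, so the browser is not left open.
        await vibe.quit()
```

With the real client this would be called as `asyncio.run(with_browser(browser.launch, my_task))`, where `my_task` is an async function taking `vibe`.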

For Agents

One command to add browser control to your AI coding assistant:

Claude Code:

claude mcp add vibium -- npx -y vibium

Gemini CLI:

gemini mcp add vibium npx -y vibium

That's it. Chrome downloads automatically on first use.

See detailed setup guides: Claude Code | Gemini CLI

Tool	Description
`browser_launch`	Start browser (visible by default)
`browser_navigate`	Go to URL
`browser_find`	Find element by CSS selector
`browser_evaluate`	Execute JavaScript to extract data, query DOM, or inspect page state
`browser_click`	Click an element
`browser_type`	Type text into an element
`browser_screenshot`	Capture viewport (base64 or save to file with `--screenshot-dir`)
`browser_quit`	Close browser
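
For readers curious what an MCP client sends when it invokes one of the tools in the table, the sketch below builds a JSON-RPC 2.0 `tools/call` request, the message shape defined by the Model Context Protocol. The `mcp_tool_call` helper is hypothetical, and the `url` argument name is an assumption; the actual argument schema is whatever the server reports via `tools/list`.

```python
import json


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message (hypothetical helper)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# The "url" argument name is assumed for illustration.
request = mcp_tool_call(1, "browser_navigate", {"url": "https://example.com"})
print(request)
```

In practice an MCP client such as Claude Code handles this exchange, including the initialization handshake that precedes any tool call.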

For Humans

npm install vibium   # JavaScript/TypeScript
pip install vibium   # Python

This automatically:

- Installs the Clicker binary for your platform
- Downloads Chrome for Testing + chromedriver to the platform cache:
  - Linux: ~/.cache/vibium/
  - macOS: ~/Library/Caches/vibium/
  - Windows: %LOCALAPPDATA%\vibium\

No manual browser setup required.
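
To locate the downloaded browser from a script (for example, to clear it in CI), the cache locations listed above can be expressed as a small lookup. This is a sketch under assumptions: `vibium_cache_dir` is a hypothetical helper that mirrors the documented paths, not the installer's actual logic, and its `system` argument is expected to be a value as returned by `platform.system()`.

```python
from pathlib import PurePosixPath, PureWindowsPath


def vibium_cache_dir(system: str, home: str, local_app_data: str = "") -> str:
    """Map an OS name to the documented Vibium cache directory (hypothetical helper).

    system: "Linux", "Darwin" (macOS), or "Windows".
    home: the user's home directory (Linux/macOS).
    local_app_data: the value of %LOCALAPPDATA% (Windows only).
    """
    if system == "Linux":
        return str(PurePosixPath(home) / ".cache" / "vibium")
    if system == "Darwin":
        return str(PurePosixPath(home) / "Library" / "Caches" / "vibium")
    if system == "Windows":
        return str(PureWindowsPath(local_app_data) / "vibium")
    raise ValueError(f"no documented cache location for {system!r}")
```

On a real machine the inputs would come from `platform.system()`, `os.path.expanduser("~")`, and `os.environ.get("LOCALAPPDATA", "")`.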

Skip browser download (if you manage browsers separately):

VIBIUM_SKIP_BROWSER_DOWNLOAD=1 npm install vibium

Platform Support

Platform	Architecture	Status
Linux	x64	✅ Supported
macOS	x64 (Intel)	✅ Supported
macOS	arm64 (Apple Silicon)	✅ Supported
Windows	x64	✅ Supported
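
A script that installs Vibium on mixed hardware may want to check the current machine against this table first. The sketch below encodes the four listed rows; `is_listed` and the architecture alias map are hypothetical, and a platform missing from the table is only "not listed", which is not a confirmed statement that it cannot work.

```python
import platform

# The (OS, architecture) pairs listed in the table; "Darwin" is macOS.
LISTED = {
    ("Linux", "x64"),
    ("Darwin", "x64"),
    ("Darwin", "arm64"),
    ("Windows", "x64"),
}

# Normalize common platform.machine() values to the table's architecture names.
_ARCH_ALIASES = {
    "x86_64": "x64",
    "amd64": "x64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def is_listed(system: str, machine: str) -> bool:
    """Return True if the (system, machine) pair appears in the support table."""
    arch = _ARCH_ALIASES.get(machine.lower())
    return (system, arch) in LISTED


print(is_listed(platform.system(), platform.machine()))
```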

Quick Start

As a library:

import { browser } from "vibium";

const vibe = await browser.launch();
await vibe.go("https://example.com");
const el = await vibe.find("a");
await el.click();
await vibe.quit();

With Claude Code:

Once installed via claude mcp add, just ask Claude to browse:

"Go to example.com and click the first link"

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Roadmap

V1 focuses on the core loop: browser control via MCP and JS client.

See V2-ROADMAP.md for planned features:

- Java client
- Cortex (memory/navigation layer)
- Retina (recording extension)
- Video recording
- AI-powered locators

License

Apache 2.0
