web-ui
Run AI Agent in your browser.
Stars: 3612
![screenshot](/screenshots_githubs/browser-use-web-ui.jpg)
WebUI is a user-friendly tool built on Gradio that makes websites accessible to AI agents. It supports various Large Language Models (LLMs) and allows custom browser integration for seamless interaction, eliminating the need to re-log in to sites or deal with other authentication challenges, and it offers high-definition screen recording.
README:
This project builds upon the foundation of browser-use, which is designed to make websites accessible for AI agents.
We would like to officially thank WarmShao for his contribution to this project.
WebUI: Built on Gradio and supports most browser-use functionality. This UI is designed to be user-friendly and enables easy interaction with the browser agent.
Expanded LLM Support: We've integrated support for various Large Language Models (LLMs), including Gemini, OpenAI, Azure OpenAI, Anthropic, DeepSeek, Ollama, and more, with support for additional models planned.
Custom Browser Support: You can use your own browser with our tool, eliminating the need to re-login to sites or deal with other authentication challenges. This feature also supports high-definition screen recording.
Persistent Browser Sessions: You can choose to keep the browser window open between AI tasks, allowing you to see the complete history and state of AI interactions.
Read the quickstart guide or follow the steps below to get started.
Python 3.11 or higher is required.
First, we recommend using uv to set up the Python environment:
uv venv --python 3.11
and activate it with:
source .venv/bin/activate
Install the dependencies:
uv pip install -r requirements.txt
Then install playwright:
playwright install
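Putting the pieces together, the full local setup (using the repository URL from the Docker section below) looks roughly like this; a minimal sketch, assuming a Unix-like shell:

```bash
# Clone the repository (same URL as in the Docker setup below)
git clone https://github.com/browser-use/web-ui.git
cd web-ui

# Create and activate a Python 3.11 virtual environment with uv
uv venv --python 3.11
source .venv/bin/activate

# Install Python dependencies and Playwright browser binaries
uv pip install -r requirements.txt
playwright install
```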
Prerequisites:
- Docker and Docker Compose installed on your system
- Git to clone the repository

Setup:
# Clone the repository
git clone https://github.com/browser-use/web-ui.git
cd web-ui
# Copy and configure environment variables
cp .env.example .env
# Edit .env with your preferred text editor and add your API keys

Run with Docker:
# Build and start the container with default settings (browser closes after AI tasks)
docker compose up --build
# Or run with persistent browser (browser stays open between AI tasks)
CHROME_PERSISTENT_SESSION=true docker compose up --build
Access the Application:
- WebUI: http://localhost:7788
- VNC Viewer (to see browser interactions): http://localhost:6080/vnc.html

The default VNC password is "vncpassword". You can change it by setting the VNC_PASSWORD environment variable in your .env file.
- Copy .env.example to .env and set your environment variables, including API keys for the LLM (an example .env snippet follows below):
cp .env.example .env
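As a minimal sketch of what to put in .env, you might set only the key for the provider you plan to use; the variable names below are the ones listed under Environment Variables further down, and the values are placeholders:

```bash
# Minimal .env sketch — API keys for the LLM providers you actually use
OPENAI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
GOOGLE_API_KEY=your_key_here
```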
- Run the WebUI:
python webui.py --ip 127.0.0.1 --port 7788
- WebUI options (an example invocation follows this list):
  - --ip: The IP address to bind the WebUI to. Default is 127.0.0.1.
  - --port: The port to bind the WebUI to. Default is 7788.
  - --theme: The theme for the user interface. Default is Ocean.
    - Default: The standard theme with a balanced design.
    - Soft: A gentle, muted color scheme for a relaxed viewing experience.
    - Monochrome: A grayscale theme with minimal color for simplicity and focus.
    - Glass: A sleek, semi-transparent design for a modern appearance.
    - Origin: A classic, retro-inspired theme for a nostalgic feel.
    - Citrus: A vibrant, citrus-inspired palette with bright and fresh colors.
    - Ocean (default): A blue, ocean-inspired theme providing a calming effect.
  - --dark-mode: Enables dark mode for the user interface.
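For illustration, the flags above can be combined in a single invocation; the flag names and defaults are the documented ones, while the specific values here are just an example:

```bash
# Example: bind to all interfaces, keep the default port, use the Citrus theme, enable dark mode
python webui.py --ip 0.0.0.0 --port 7788 --theme Citrus --dark-mode
```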
- Access the WebUI: Open your web browser and navigate to http://127.0.0.1:7788.
- Using Your Own Browser (Optional):
  - Set CHROME_PATH to the executable path of your browser and CHROME_USER_DATA to the user data directory of your browser (see the example .env snippet after this section).
    - Windows:
      CHROME_PATH="C:\Program Files\Google\Chrome\Application\chrome.exe"
      CHROME_USER_DATA="C:\Users\YourUsername\AppData\Local\Google\Chrome\User Data"
      Note: Replace YourUsername with your actual Windows username.
    - Mac:
      CHROME_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
      CHROME_USER_DATA="~/Library/Application Support/Google/Chrome/Profile 1"
  - Close all Chrome windows.
  - Open the WebUI in a non-Chrome browser, such as Firefox or Edge. This is important because the persistent browser context will use the Chrome data when running the agent.
  - Check the "Use Own Browser" option within the Browser Settings.
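As a sketch of how this can look in practice, the macOS values above could be placed in the .env file (the Environment Variables section below notes that all configuration is read from .env); the exact executable and profile paths will vary per machine:

```bash
# .env (macOS example, using the paths documented above)
CHROME_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
CHROME_USER_DATA="~/Library/Application Support/Google/Chrome/Profile 1"
```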
- Keep Browser Open (Optional):
  - Set CHROME_PERSISTENT_SESSION=true in the .env file.
- Environment Variables:
  - All configuration is done through the .env file.
  - Available environment variables:

# LLM API Keys
OPENAI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
GOOGLE_API_KEY=your_key_here

# Browser Settings
CHROME_PERSISTENT_SESSION=true  # Set to true to keep browser open between AI tasks
RESOLUTION=1920x1080x24         # Custom resolution format: WIDTHxHEIGHTxDEPTH
RESOLUTION_WIDTH=1920           # Custom width in pixels
RESOLUTION_HEIGHT=1080          # Custom height in pixels

# VNC Settings
VNC_PASSWORD=your_vnc_password  # Optional, defaults to "vncpassword"
- Browser Persistence Modes:
  - Default Mode (CHROME_PERSISTENT_SESSION=false):
    - Browser opens and closes with each AI task
    - Clean state for each interaction
    - Lower resource usage
  - Persistent Mode (CHROME_PERSISTENT_SESSION=true):
    - Browser stays open between AI tasks
    - Maintains history and state
    - Allows viewing previous AI interactions
    - Set in the .env file or via an environment variable when starting the container
- Viewing Browser Interactions:
  - Access the noVNC viewer at http://localhost:6080/vnc.html
  - Enter the VNC password (default: "vncpassword", or whatever you set in VNC_PASSWORD)
  - You can now see all browser interactions in real time
- Container Management:

# Start with persistent browser
CHROME_PERSISTENT_SESSION=true docker compose up -d
# Start with default mode (browser closes after tasks)
docker compose up -d
# View logs
docker compose logs -f
# Stop the container
docker compose down
- [x] 2025/01/10: Thanks to @casistack, we now have a Docker setup option and support for keeping the browser open between tasks. Video tutorial demo.
- [x] 2025/01/06: Thanks to @richard-devbot, a new and well-designed WebUI has been released. Video tutorial demo.
Alternative AI tools for web-ui
Similar Open Source Tools
![phospho Screenshot](/screenshots_githubs/phospho-app-phospho.jpg)
phospho
Phospho is a text analytics platform for LLM apps. It helps you detect issues and extract insights from text messages of your users or your app. You can gather user feedback, measure success, and iterate on your app to create the best conversational experience for your users.
![extension-gen-ai Screenshot](/screenshots_githubs/looker-open-source-extension-gen-ai.jpg)
extension-gen-ai
The Looker GenAI Extension provides code examples and resources for building a Looker Extension that integrates with Vertex AI Large Language Models (LLMs). Users can leverage the power of LLMs to enhance data exploration and analysis within Looker. The extension offers generative explore functionality to ask natural language questions about data and generative insights on dashboards to analyze data by asking questions. It leverages components like BQML Remote Models, BQML Remote UDF with Vertex AI, and Custom Fine Tune Model for different integration options. Deployment involves setting up infrastructure with Terraform and deploying the Looker Extension by creating a Looker project, copying extension files, configuring BigQuery connection, connecting to Git, and testing the extension. Users can save example prompts and configure user settings for the extension. Development of the Looker Extension environment includes installing dependencies, starting the development server, and building for production.
![shell-ai Screenshot](/screenshots_githubs/ricklamers-shell-ai.jpg)
shell-ai
Shell-AI (`shai`) is a CLI utility that enables users to input commands in natural language and receive single-line command suggestions. It leverages natural language understanding and interactive CLI tools to enhance command line interactions. Users can describe tasks in plain English and receive corresponding command suggestions, making it easier to execute commands efficiently. Shell-AI supports cross-platform usage and is compatible with Azure OpenAI deployments, offering a user-friendly and efficient way to interact with the command line.
![horde-worker-reGen Screenshot](/screenshots_githubs/Haidra-Org-horde-worker-reGen.jpg)
horde-worker-reGen
This repository provides the latest implementation for the AI Horde Worker, allowing users to utilize their graphics card(s) to generate, post-process, or analyze images for others. It offers a platform where users can create images and earn 'kudos' in return, granting priority for their own image generations. The repository includes important details for setup, recommendations for system configurations, instructions for installation on Windows and Linux, basic usage guidelines, and information on updating the AI Horde Worker. Users can also run the worker with multiple GPUs and receive notifications for updates through Discord. Additionally, the repository contains models that are licensed under the CreativeML OpenRAIL License.
![rag-gpt Screenshot](/screenshots_githubs/open-kf-rag-gpt.jpg)
rag-gpt
RAG-GPT is a tool that allows users to quickly launch an intelligent customer service system with Flask, LLM, and RAG. It includes frontend, backend, and admin console components. The tool supports cloud-based and local LLMs, enables deployment of conversational service robots in minutes, integrates diverse knowledge bases, offers flexible configuration options, and features an attractive user interface.
![gitdiagram Screenshot](/screenshots_githubs/ahmedkhaleel2004-gitdiagram.jpg)
gitdiagram
GitDiagram is a tool that turns any GitHub repository into an interactive diagram for visualization in seconds. It offers instant visualization, interactivity, fast generation, customization, and API access. The tool utilizes a tech stack including Next.js, FastAPI, PostgreSQL, Claude 3.5 Sonnet, Vercel, EC2, GitHub Actions, PostHog, and Api-Analytics. Users can self-host the tool for local development and contribute to its development. GitDiagram is inspired by Gitingest and has future plans to use larger context models, allow user API key input, implement RAG with Mermaid.js docs, and include font-awesome icons in diagrams.
![codepair Screenshot](/screenshots_githubs/yorkie-team-codepair.jpg)
codepair
CodePair is an open-source real-time collaborative markdown editor with AI intelligence, allowing users to collaboratively edit documents, share documents with external parties, and utilize AI intelligence within the editor. It is built using React, NestJS, and LangChain. The repository contains frontend and backend code, with detailed instructions for setting up and running each part. Users can choose between Frontend Development Only Mode or Full Stack Development Mode based on their needs. CodePair also integrates GitHub OAuth for Social Login feature. Contributors are welcome to submit patches and follow the contribution workflow.
![rag-gpt Screenshot](/screenshots_githubs/gpt-open-rag-gpt.jpg)
rag-gpt
RAG-GPT is a tool that allows users to quickly launch an intelligent customer service system with Flask, LLM, and RAG. It includes frontend, backend, and admin console components. The tool supports cloud-based and local LLMs, offers quick setup for conversational service robots, integrates diverse knowledge bases, provides flexible configuration options, and features an attractive user interface.
![rclip Screenshot](/screenshots_githubs/yurijmikhalevich-rclip.jpg)
rclip
rclip is a command-line photo search tool powered by OpenAI's CLIP neural network. It allows users to search for images using text queries, similar image search, and combining multiple queries. The tool extracts features from photos to enable searching and indexing, with options for previewing results in supported terminals or custom viewers. Users can install rclip on Linux, macOS, and Windows using different installation methods. The repository follows the Conventional Commits standard and welcomes contributions from the community.
![t3rn-airdrop-bot Screenshot](/screenshots_githubs/dante4rt-t3rn-airdrop-bot.jpg)
t3rn-airdrop-bot
A bot designed to automate transactions and bridge assets on the t3rn network, making the process seamless and efficient. It supports multiple wallets through a JSON file containing private keys, with robust error handling and retry mechanisms. The tool is user-friendly, easy to set up, and supports bridging from Optimism Sepolia and Arbitrum Sepolia.
![preswald Screenshot](/screenshots_githubs/StructuredLabs-preswald.jpg)
preswald
Preswald is a full-stack platform for building, deploying, and managing interactive data applications in Python. It simplifies the process by combining ingestion, storage, transformation, and visualization into one lightweight SDK. With Preswald, users can connect to various data sources, customize app themes, and easily deploy apps locally. The platform focuses on code-first simplicity, end-to-end coverage, and efficiency by design, making it suitable for prototyping internal tools or deploying production-grade apps with reduced complexity and cost.
![pear-landing-page Screenshot](/screenshots_githubs/trypear-pear-landing-page.jpg)
pear-landing-page
PearAI Landing Page is an open-source AI-powered code editor managed by Nang and Pan. It is built with Next.js, Vercel, Tailwind CSS, and TypeScript. The project requires setting up environment variables for proper configuration. Users can run the project locally by starting the development server and visiting the specified URL in the browser. Recommended extensions include Prettier, ESLint, and JavaScript and TypeScript Nightly. Contributions to the project are welcomed and appreciated.
![openai-chat-api-workflow Screenshot](/screenshots_githubs/yohasebe-openai-chat-api-workflow.jpg)
openai-chat-api-workflow
**OpenAI Chat API Workflow for Alfred** An Alfred 5 Workflow for using the OpenAI Chat API to interact with GPT-3.5/GPT-4. It also allows image generation, image understanding, speech-to-text conversion, and text-to-speech synthesis. **Features:** * Execute all features using the Alfred UI, selected text, or a dedicated web UI * The web UI is constructed by the workflow and runs locally on your Mac * API calls are made directly between the workflow and OpenAI, ensuring your chat messages are not shared online with anyone other than OpenAI * OpenAI does not use the data from the API Platform for training * Export chat data to a simple JSON-format external file * Continue the chat by importing the exported data later
![trustgraph Screenshot](/screenshots_githubs/trustgraph-ai-trustgraph.jpg)
trustgraph
TrustGraph is a tool that deploys private GraphRAG pipelines to build an RDF-style knowledge graph from data, enabling accurate and secure RAG requests compatible with cloud LLMs and open-source SLMs. It showcases the reliability and efficiencies of GraphRAG algorithms, capturing contextual language flags missed in conventional RAG approaches. The tool offers features like PDF decoding, text chunking, inference of various LMs, RDF-aligned Knowledge Graph extraction, and more. TrustGraph is designed to be modular, supporting multiple Language Models and environments, with a plug'n'play architecture for easy customization.
![code2prompt Screenshot](/screenshots_githubs/raphaelmansuy-code2prompt.jpg)
code2prompt
Code2Prompt is a powerful command-line tool that generates comprehensive prompts from codebases, designed to streamline interactions between developers and Large Language Models (LLMs) for code analysis, documentation, and improvement tasks. It bridges the gap between codebases and LLMs by converting projects into AI-friendly prompts, enabling users to leverage AI for various software development tasks. The tool offers features like holistic codebase representation, intelligent source tree generation, customizable prompt templates, smart token management, Gitignore integration, flexible file handling, clipboard-ready output, multiple output options, and enhanced code readability.
For similar tasks
![Fay Screenshot](/screenshots_githubs/xszyou-Fay.jpg)
Fay
Fay is an open-source digital human framework that offers different versions for various purposes. The sales version is suitable for online and offline salespersons. The assistant version serves as a human-machine interactive digital assistant that can also control devices upon command. The agent version is designed to be an autonomous agent capable of making decisions and contacting its owner. The framework provides updates and improvements across its different versions, including features like emotion analysis integration, model optimizations, and compatibility enhancements. Users can access detailed documentation for each version through the provided links.
![hume-python-sdk Screenshot](/screenshots_githubs/HumeAI-hume-python-sdk.jpg)
hume-python-sdk
The Hume AI Python SDK allows users to integrate Hume APIs directly into their Python applications. Users can access complete documentation, quickstart guides, and example notebooks to get started. The SDK is designed to provide support for Hume's expressive communication platform built on scientific research. Users are encouraged to create an account at beta.hume.ai and stay updated on changes through Discord. The SDK may undergo breaking changes to improve tooling and ensure reliable releases in the future.
![deid-examples Screenshot](/screenshots_githubs/privateai-deid-examples.jpg)
deid-examples
This repository contains examples demonstrating how to use the Private AI REST API for identifying and replacing Personally Identifiable Information (PII) in text. The API supports over 50 entity types, such as Credit Card information and Social Security numbers, across 50 languages. Users can access documentation and the API reference on Private AI's website. The examples include common API call scenarios and use cases in both Python and JavaScript, with additional content related to PrivateGPT for secure work with Language Models (LLMs).
![AutoWebGLM Screenshot](/screenshots_githubs/THUDM-AutoWebGLM.jpg)
AutoWebGLM
AutoWebGLM is a project focused on developing a language model-driven automated web navigation agent. It extends the capabilities of the ChatGLM3-6B model to navigate the web more efficiently and address real-world browsing challenges. The project includes features such as an HTML simplification algorithm, hybrid human-AI training, reinforcement learning, rejection sampling, and a bilingual web navigation benchmark for testing AI web navigation agents.
![browser-use-webui Screenshot](/screenshots_githubs/warmshao-browser-use-webui.jpg)
browser-use-webui
Browser-Use WebUI is a project that enhances the original browser-use tool by providing a brand new web interface, expanded LLM support for various Large Language Models, custom browser support for using your own browser with the tool, and a customized agent with optimized prompts. The tool aims to make websites accessible for AI agents and offers user-friendly interaction with the browser agent, eliminating the need for re-login to sites and dealing with authentication challenges. It also supports high-definition screen recording.
![Bavarder Screenshot](/screenshots_githubs/Bavarder-Bavarder.jpg)
Bavarder
Bavarder is an AI-powered chit-chat tool designed for informal conversations about unimportant matters. Users can engage in light-hearted discussions with the AI, simulating casual chit-chat scenarios. The tool provides a platform for users to interact with AI in a fun and entertaining way, offering a unique experience of engaging with artificial intelligence in a conversational manner.
![ChaKt-KMP Screenshot](/screenshots_githubs/PatilShreyas-ChaKt-KMP.jpg)
ChaKt-KMP
ChaKt is a multiplatform app built using Kotlin and Compose Multiplatform to demonstrate the use of Generative AI SDK for Kotlin Multiplatform to generate content using Google's Generative AI models. It features a simple chat based user interface and experience to interact with AI. The app supports mobile, desktop, and web platforms, and is built with Kotlin Multiplatform, Kotlin Coroutines, Compose Multiplatform, Generative AI SDK, Calf - File picker, and BuildKonfig. Users can contribute to the project by following the guidelines in CONTRIBUTING.md. The app is licensed under the MIT License.
For similar jobs
![sweep Screenshot](/screenshots_githubs/sweepai-sweep.jpg)
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
![teams-ai Screenshot](/screenshots_githubs/microsoft-teams-ai.jpg)
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
![ai-guide Screenshot](/screenshots_githubs/Crataco-ai-guide.jpg)
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
![classifai Screenshot](/screenshots_githubs/10up-classifai.jpg)
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
![chatbot-ui Screenshot](/screenshots_githubs/mckaywrigley-chatbot-ui.jpg)
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
![BricksLLM Screenshot](/screenshots_githubs/bricks-cloud-BricksLLM.jpg)
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
![uAgents Screenshot](/screenshots_githubs/fetchai-uAgents.jpg)
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
![griptape Screenshot](/screenshots_githubs/griptape-ai-griptape.jpg)
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.