pi-browser
A browser automation CLI that uses multiple AI models (supports Google Gemini, OpenAI, Anthropic, and Ollama)
Stars: 57
Pi-Browser is a CLI tool for automating browsers with multiple AI models, including Google Gemini, OpenAI, Anthropic Claude, and Ollama. Users control the browser with natural-language commands. It also offers a web UI for managing tasks, Telegram bot integration, Notion integration, an extension mode that reuses an existing Chrome login session, parallel execution across multiple browsers, and offline operation with a local Ollama model.
README:
A browser automation CLI based on multiple AI models
Control the browser with natural-language commands. Supports a range of AI models, including Google Gemini, OpenAI, Anthropic Claude, and Ollama.
| Feature | Description |
|---|---|
| Natural-language control | "Tell me the iPhone price on Coupang" |
| Multiple AI models | 20+ providers, including Gemini, GPT, Claude, and Ollama |
| Web UI | Manage tasks and settings in the browser |
| Telegram bot | Run commands from anywhere |
| Notion integration | Automatically save task results |
| Extension mode | Reuse your existing Chrome login session |
| Parallel processing | Run tasks in several browsers at once |
| Local AI | Run offline with Ollama |
# Installation
git clone https://github.com/johunsang/pi-browser.git
cd pi-browser
npm install
# Set up API keys
cp .env.example .env
# Put your GOOGLE_API_KEY in the .env file
# Run
npm start 'Tell me the weather on Naver today'
All features are available from the browser.
npm start /web
# or
npx tsx src/cli.ts /web
After opening http://localhost:3000:
- Tasks tab: enter commands and check execution status
- Settings tab: configure Telegram, AI models, the browser, and Notion
Launches a new Chrome instance.
npm start 'Tell me the iPhone 16 price on Coupang'
npm start # interactive mode
Keeps the login session of your existing Chrome.
# Install the extension (first time only)
# 1. Open chrome://extensions
# 2. Turn on Developer mode
# 3. Click "Load unpacked" → select the extension folder
# Run
npm start /ext
> Tell me the subjects of the 3 most recent emails in Naver Mail
> Tell me how many unread emails I have in Gmail
Runs tasks in several browsers at the same time.
# Run in parallel with 3 anonymous browsers
npm start '/parallel 3 "weather on Google" "news on Naver" "movies on Daum"'
# Run in parallel with profile browsers (logins kept)
npm start '/parallel "Default,Profile 1" "check Naver Mail" "check Gmail"'
# List Chrome profiles
npm start /profiles
| Mode | Command | Login | Use case |
|---|---|---|---|
| Anonymous | /parallel 3 "task"... | None | Search, crawling |
| Profile | /parallel "P1,P2" "task"... | Kept | Mail, SNS |
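Conceptually, the parallel modes drive several independent browser instances at once and collect the results. Below is a minimal TypeScript sketch of that idea using Playwright (credited as a dependency below) and Promise.all; the runTask helper and the URLs are illustrative, not pi-browser's actual /parallel implementation.

```typescript
// parallel-sketch.ts - run several tasks in separate Chromium instances concurrently.
// Illustrative only; pi-browser's real /parallel mode may work differently.
import { chromium } from 'playwright';

async function runTask(url: string): Promise<string> {
  const browser = await chromium.launch({ headless: true }); // one browser per task
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title(); // stand-in for "do the task and collect a result"
  } finally {
    await browser.close();
  }
}

async function main() {
  const urls = ['https://www.google.com', 'https://www.naver.com', 'https://www.daum.net'];
  const results = await Promise.all(urls.map((url) => runTask(url))); // the "parallel" part
  results.forEach((title, i) => console.log(`${urls[i]} -> ${title}`));
}

main().catch(console.error);
```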
| Command | Description |
|---|---|
| /web | Web UI mode (control from the browser) |
| /ext | Extension mode (keeps logins) |
| /parallel N "task"... | N anonymous browsers in parallel |
| /parallel "profiles" "task"... | Profile browsers in parallel |
| /profiles | List Chrome profiles |
| /models | List AI models |
| /set <provider> <model> | Switch models |
| /config | Show configuration |
| exit | Quit |
Run commands from anywhere.
- Create a bot with @BotFather → copy the token
- Web UI (/web) → Settings → Telegram Bot → enter the Bot Token
- Enter the allowed user ID (required; get it from @userinfobot)
- Save and enable
/start - start
/help - help
Tell me the weather on Naver - runs a command
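If the bot is configured but no messages arrive, the token and user ID can be checked directly against the Telegram Bot API, independently of pi-browser. A small sketch; TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are placeholder environment variables, not names used by pi-browser.

```typescript
// telegram-check.ts - send a test message straight through the Telegram Bot API.
const BOT_TOKEN = process.env.TELEGRAM_BOT_TOKEN!; // token from @BotFather
const CHAT_ID = process.env.TELEGRAM_CHAT_ID!;     // your numeric ID from @userinfobot

async function main() {
  const res = await fetch(`https://api.telegram.org/bot${BOT_TOKEN}/sendMessage`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ chat_id: CHAT_ID, text: 'pi-browser test message' }),
  });
  console.log(res.status, await res.json()); // "ok": true means token and chat_id are valid
}

main().catch(console.error);
```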
Automatically saves task results to Notion.
- Create an Integration at notion.so/my-integrations
- Copy the Internal Integration Token
- Create a Notion database → connect the Integration
- Copy the ID from the database URL (notion.so/[ID]/...)
- Web UI → Settings → Notion integration
- Enter the API Key and Database ID, then save
- Title: [task-id] task description
- Body: 📋 task, ✅ result, ⏰ time
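Saving a result boils down to creating a page in the configured database through the Notion API. Below is a hedged sketch of that call, not pi-browser's actual code; it assumes the database's title property is named "Name" and uses NOTION_API_KEY and NOTION_DATABASE_ID as placeholder environment variables.

```typescript
// notion-check.ts - create a minimal result page in the configured Notion database.
const NOTION_KEY = process.env.NOTION_API_KEY!;
const DATABASE_ID = process.env.NOTION_DATABASE_ID!;

async function main() {
  const res = await fetch('https://api.notion.com/v1/pages', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${NOTION_KEY}`,
      'Notion-Version': '2022-06-28',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      parent: { database_id: DATABASE_ID },
      properties: {
        // assumes the database's title column is called "Name"
        Name: { title: [{ text: { content: '[task-id] example task' } }] },
      },
      children: [
        {
          object: 'block',
          type: 'paragraph',
          paragraph: { rich_text: [{ type: 'text', text: { content: '✅ example result' } }] },
        },
      ],
    }),
  });
  console.log(res.status, await res.json());
}

main().catch(console.error);
```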
# Google Gemini (default; free tier available)
npm start '/set google gemini-2.5-flash'
# OpenAI
npm start '/set openai gpt-4o'
# Anthropic Claude
npm start '/set anthropic claude-sonnet-4-20250514'
# Groq (fast inference, free)
npm start '/set groq llama-3.3-70b-versatile'
# Install Ollama and download a model
brew install ollama
ollama run llama3.2
# Use it from Pi-Browser
npm start '/set ollama llama3.2'
npm start 'Open Google'
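pi-browser reaches Ollama through its local HTTP API on port 11434 (see the troubleshooting section). To confirm the model itself responds, independent of pi-browser, a quick sketch:

```typescript
// ollama-check.ts - confirm the local Ollama server answers a simple prompt.
async function main() {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'llama3.2', prompt: 'Say hello.', stream: false }),
  });
  const data = await res.json();
  console.log(data.response); // errors here point at Ollama, not pi-browser
}

main().catch(console.error);
```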
.env file:
GOOGLE_API_KEY=your-google-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
OPENAI_API_KEY=your-openai-api-key
GROQ_API_KEY=your-groq-api-key
| Provider | Link | Free |
|---|---|---|
| Google | aistudio.google.com | Yes |
| Groq | console.groq.com | Yes |
| OpenAI | platform.openai.com | No |
| Anthropic | console.anthropic.com | No |
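As a quick sanity check for the default Google setup (outside pi-browser itself), listing models from the Generative Language API should succeed when GOOGLE_API_KEY is valid. A small sketch:

```typescript
// gemini-key-check.ts - verify GOOGLE_API_KEY by listing available Gemini models.
const KEY = process.env.GOOGLE_API_KEY!;

async function main() {
  const res = await fetch(`https://generativelanguage.googleapis.com/v1beta/models?key=${KEY}`);
  if (!res.ok) throw new Error(`Key check failed: ${res.status}`);
  const data = await res.json();
  // print the first few model names, e.g. models/gemini-2.5-flash
  for (const m of (data.models ?? []).slice(0, 5)) console.log(m.name);
}

main().catch(console.error);
```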
Tools available to the AI:
| Tool | Description |
|---|---|
| browser_navigate | Go to a URL |
| browser_click | Click an element |
| browser_fill | Type text |
| browser_press | Press a key (Enter, Tab, etc.) |
| browser_screenshot | Take a screenshot |
| browser_snapshot | List the page's elements |
| browser_scroll | Scroll |
| browser_get_text | Extract text |
| browser_wait | Wait (for a duration or for text) |
| browser_download | Download a file |
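These tools correspond to ordinary Playwright operations (Playwright is credited below as the browser automation layer). A rough sketch of what a few of them amount to; the helper names mirror the tool names, but this is not the project's actual implementation.

```typescript
// tools-sketch.ts - roughly what browser_navigate / browser_get_text / browser_screenshot do.
import { chromium, Page } from 'playwright';

async function browserNavigate(page: Page, url: string) {
  await page.goto(url); // browser_navigate
}

async function browserGetText(page: Page, selector: string): Promise<string> {
  return page.innerText(selector); // browser_get_text
}

async function browserScreenshot(page: Page, path: string) {
  await page.screenshot({ path, fullPage: true }); // browser_screenshot
}

async function main() {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await browserNavigate(page, 'https://example.com');
  console.log(await browserGetText(page, 'h1'));
  await browserScreenshot(page, 'example.png');
  await browser.close();
}

main().catch(console.error);
```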
# Shopping
npm start 'Compare AirPods Pro prices on Coupang'
# Looking up information
npm start 'Tell me the weather in Seoul on Naver'
# SNS (Extension mode)
npm start /ext
> Write a test post in the 옥토퍼스맨 Naver Cafe
# Parallel crawling
npm start '/parallel 5 "crawl site 1" "crawl site 2" "crawl site 3" "crawl site 4" "crawl site 5"'
pi-browser/
├── src/
│ ├── cli.ts # main CLI
│ ├── web-client.ts # web UI server
│ └── telegram.ts # Telegram bot
├── extension/ # Chrome Extension
│ ├── manifest.json
│ ├── background.js
│ └── popup.html
├── .env # API keys
└── package.json
# Web UI port conflict
lsof -ti:3000 | xargs kill -9
# Extension not connecting
lsof -i :9876 # check the WebSocket port
# Chrome won't launch
lsof -i :9444 # check the CDP port
# Ollama not connecting
curl http://localhost:11434/api/tags
# Test the Telegram bot connection
curl https://api.telegram.org/bot<TOKEN>/getMe
# Test the Notion connection
curl https://api.notion.com/v1/databases/<DB_ID> \
-H "Authorization: Bearer <API_KEY>" \
-H "Notion-Version: 2022-06-28"클라우드: Google, OpenAI, Anthropic, Mistral, Groq, xAI, OpenRouter, AWS Bedrock, Google Vertex
로컬: Ollama (Llama, Mistral, Qwen, Gemma 등)
MIT License
- @mariozechner/pi-ai - multi-AI integration
- Playwright - browser automation
Similar Open Source Tools
llmio
LLMIO is a Go-based LLM load balancing gateway that provides a unified REST API, weight scheduling, logging, and modern management interface for your LLM clients. It helps integrate different model capabilities from OpenAI, Anthropic, Gemini, and more in a single service. Features include unified API compatibility, weight scheduling with two strategies, visual management dashboard, rate and failure handling, and local persistence with SQLite. The tool supports multiple vendors' APIs and authentication methods, making it versatile for various AI model integrations.
ChatTTS-Forge
ChatTTS-Forge is a powerful text-to-speech generation tool that supports generating rich audio long texts using a SSML-like syntax and provides comprehensive API services, suitable for various scenarios. It offers features such as batch generation, support for generating super long texts, style prompt injection, full API services, user-friendly debugging GUI, OpenAI-style API, Google-style API, support for SSML-like syntax, speaker management, style management, independent refine API, text normalization optimized for ChatTTS, and automatic detection and processing of markdown format text. The tool can be experienced and deployed online through HuggingFace Spaces, launched with one click on Colab, deployed using containers, or locally deployed after cloning the project, preparing models, and installing necessary dependencies.
Langchain-Chatchat
LangChain-Chatchat is an open-source, offline-deployable retrieval-enhanced generation (RAG) large model knowledge base project based on large language models such as ChatGLM and application frameworks such as Langchain. It aims to establish a knowledge base Q&A solution that is friendly to Chinese scenarios, supports open-source models, and can run offline.
app-builder
AppBuilder SDK is a one-stop development tool for AI native applications, providing basic cloud resources, AI capability engine, Qianfan large model, and related capability components to improve the development efficiency of AI native applications.
LangChain-SearXNG
LangChain-SearXNG is an open-source AI search engine built on LangChain and SearXNG. It supports faster and more accurate search and question-answering functionalities. Users can deploy SearXNG and set up Python environment to run LangChain-SearXNG. The tool integrates AI models like OpenAI and ZhipuAI for search queries. It offers two search modes: Searxng and ZhipuWebSearch, allowing users to control the search workflow based on input parameters. LangChain-SearXNG v2 version enhances response speed and content quality compared to the previous version, providing a detailed configuration guide and showcasing the effectiveness of different search modes through comparisons.
Awesome-ChatTTS
Awesome-ChatTTS is an official recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including official introduction, quick experience options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trends analysis. Users can explore various branches for different functionalities and enhancements related to ChatTTS.
TelegramForwarder
Telegram Forwarder is a message forwarding tool that allows you to forward messages from specified chats to other chats without the need for a bot to enter the corresponding channels/groups to listen. It can be used for information stream integration filtering, message reminders, content archiving, and more. The tool supports multiple sources forwarding, keyword filtering in whitelist and blacklist modes, regular expression matching, message content modification, AI processing using major vendors' AI interfaces, media file filtering, and synchronization with a universal forum blocking plugin to achieve three-end blocking.
grps_trtllm
The grps-trtllm repository is a C++ implementation of a high-performance OpenAI LLM service, combining GRPS and TensorRT-LLM. It supports functionalities like Chat, Ai-agent, and Multi-modal. The repository offers advantages over triton-trtllm, including a complete LLM service implemented in pure C++, integrated tokenizer supporting huggingface and sentencepiece, custom HTTP functionality for OpenAI interface, support for different LLM prompt styles and result parsing styles, integration with tensorrt backend and opencv library for multi-modal LLM, and stable performance improvement compared to triton-trtllm.
OpenClawChineseTranslation
OpenClaw Chinese Translation is a localization project that provides a fully Chinese interface for the OpenClaw open-source personal AI assistant platform. It allows users to interact with their AI assistant through chat applications like WhatsApp, Telegram, and Discord to manage daily tasks such as emails, calendars, and files. The project includes both CLI command-line and dashboard web interface fully translated into Chinese.
moonpalace
MoonPalace is a debugging tool for API provided by Moonshot AI. It supports all platforms (Mac, Windows, Linux) and is simple to use by replacing 'base_url' with 'http://localhost:9988'. It captures complete requests, including 'accident scenes' during network errors, and allows quick retrieval and viewing of request information using 'request_id' and 'chatcmpl_id'. It also enables one-click export of BadCase structured reporting data to help improve Kimi model capabilities. MoonPalace is recommended for use as an API 'supplier' during code writing and debugging stages to quickly identify and locate various issues related to API calls and code writing processes, and to export request details for submission to Moonshot AI to improve Kimi model.
Muice-Chatbot
Muice-Chatbot is an AI chatbot designed to proactively engage in conversations with users. It is based on the ChatGLM2-6B and Qwen-7B models, with a training dataset of 1.8K+ dialogues. The chatbot has a speaking style similar to a 2D girl, being somewhat tsundere but willing to share daily life details and greet users differently every day. It provides various functionalities, including initiating chats and offering 5 available commands. The project supports model loading through different methods and provides onebot service support for QQ users. Users can interact with the chatbot by running the main.py file in the project directory.
AIClient-2-API
AIClient-2-API is a versatile and lightweight API proxy designed for developers, providing ample free API request quotas and comprehensive support for various mainstream large models like Gemini, Qwen Code, Claude, etc. It converts multiple backend APIs into standard OpenAI format interfaces through a Node.js HTTP server. The project adopts a modern modular architecture, supports strategy and adapter patterns, comes with complete test coverage and health check mechanisms, and is ready to use after 'npm install'. By easily switching model service providers in the configuration file, any OpenAI-compatible client or application can seamlessly access different large model capabilities through the same API address, eliminating the hassle of maintaining multiple sets of configurations for different services and dealing with incompatible interfaces.
tiny-llm-zh
Tiny LLM zh is a project aimed at building a small-parameter Chinese language large model for quick entry into learning large model-related knowledge. The project implements a two-stage training process for large models and subsequent human alignment, including tokenization, pre-training, instruction fine-tuning, human alignment, evaluation, and deployment. It is deployed on the ModelScope Tiny LLM website and features open access to all data and code, including pre-training data and tokenizer. The project trains a tokenizer using 10GB of Chinese encyclopedia text to build a Tiny LLM vocabulary. It supports training with Transformers and DeepSpeed, multi-machine multi-card setups, and ZeRO optimization techniques. The project has three main branches: llama2_torch, main tiny_llm, and tiny_llm_moe, each with specific modifications and features.
DeepAI
DeepAI is a proxy server that enhances the interaction experience of large language models (LLMs) by integrating the 'thinking chain' process. It acts as an intermediary layer, receiving standard OpenAI API compatible requests, using independent 'thinking services' to generate reasoning processes, and then forwarding the enhanced requests to the LLM backend of your choice. This ensures that responses are not only generated by the LLM but also based on pre-inference analysis, resulting in more insightful and coherent answers. DeepAI supports seamless integration with applications designed for the OpenAI API, providing endpoints for '/v1/chat/completions' and '/v1/models', making it easy to integrate into existing applications. It offers features such as reasoning chain enhancement, flexible backend support, API key routing, weighted random selection, proxy support, comprehensive logging, and graceful shutdown.
agentica
Agentica is a human-centric framework for building large language model agents. It provides functionalities for planning, memory management, tool usage, and supports features like reflection, planning and execution, RAG, multi-agent, multi-role, and workflow. The tool allows users to quickly code and orchestrate agents, customize prompts, and make API calls to various services. It supports API calls to OpenAI, Azure, Deepseek, Moonshot, Claude, Ollama, and Together. Agentica aims to simplify the process of building AI agents by providing a user-friendly interface and a range of functionalities for agent development.
For similar tasks
freegenius
FreeGenius AI is an ambitious project offering a comprehensive suite of AI solutions that mirror the capabilities of LetMeDoIt AI. It is designed to engage in intuitive conversations, execute codes, provide up-to-date information, and perform various tasks. The tool is free, customizable, and provides access to real-time data and device information. It aims to support offline and online backends, open-source large language models, and optional API keys. Users can use FreeGenius AI for tasks like generating tweets, analyzing audio, searching financial data, checking weather, and creating maps.
Google-Shortcuts-Launcher
Google Shortcuts Launcher provides a seamless way to integrate powerful Google services into your daily workflow. With just a tap, you can quickly access a variety of shortcuts designed to enhance your daily device use and simplify your interactions with Google features. It offers shortcuts for games launcher, Google Lens, Google Music Search, Google Password Manager, Google Weather, and Voice Assistant. The tool requires Google, Google Play Services, and Google Play Games to be installed on the device for proper functionality, and some features may require root access.
ai-real-estate-assistant
AI Real Estate Assistant is a modern platform that uses AI to assist real estate agencies in helping buyers and renters find their ideal properties. It features multiple AI model providers, intelligent query processing, advanced search and retrieval capabilities, and enhanced user experience. The tool is built with a FastAPI backend and Next.js frontend, offering semantic search, hybrid agent routing, and real-time analytics.
MegaParse
MegaParse is a powerful and versatile parser designed to handle various types of documents such as text, PDFs, Powerpoint presentations, and Word documents with no information loss. It is fast, efficient, and open source, supporting a wide range of file formats. MegaParse ensures compatibility with tables, table of contents, headers, footers, and images, making it a comprehensive solution for document parsing.
NekoImageGallery
NekoImageGallery is an online AI image search engine that utilizes the Clip model and Qdrant vector database. It supports keyword search and similar image search. The tool generates 768-dimensional vectors for each image using the Clip model, supports OCR text search using PaddleOCR, and efficiently searches vectors using the Qdrant vector database. Users can deploy the tool locally or via Docker, with options for metadata storage using Qdrant database or local file storage. The tool provides API documentation through FastAPI's built-in Swagger UI and can be used for tasks like image search, text extraction, and vector search.
gemini_multipdf_chat
Gemini PDF Chatbot is a Streamlit-based application that allows users to chat with a conversational AI model trained on PDF documents. The chatbot extracts information from uploaded PDF files and answers user questions based on the provided context. It features PDF upload, text extraction, conversational AI using the Gemini model, and a chat interface. Users can deploy the application locally or to the cloud, and the project structure includes main application script, environment variable file, requirements, and documentation. Dependencies include PyPDF2, langchain, Streamlit, google.generativeai, and dotenv.
screen-pipe
Screen-pipe is a Rust + WASM tool that allows users to turn their screen into actions using Large Language Models (LLMs). It enables users to record their screen 24/7, extract text from frames, and process text and images for tasks like analyzing sales conversations. The tool is still experimental and aims to simplify the process of recording screens, extracting text, and integrating with various APIs for tasks such as filling CRM data based on screen activities. The project is open-source and welcomes contributions to enhance its functionalities and usability.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.