pi-browser
A browser automation CLI that uses multiple AI models (supports Google Gemini, OpenAI, Anthropic, and Ollama)
Stars: 57
Pi-Browser is a CLI tool for automating browsers with multiple AI models, including Google Gemini, OpenAI, Anthropic Claude, and Ollama. Users control the browser with natural-language commands. It also offers a web UI for managing tasks, Telegram bot integration, Notion integration, an extension mode that reuses an existing Chrome login session, parallel execution across multiple browsers, and offline operation with a local Ollama model.
README:
A browser automation CLI based on multiple AI models
Control the browser with natural-language commands. Supports a range of AI models, including Google Gemini, OpenAI, Anthropic Claude, and Ollama.
| Feature | Description |
|---|---|
| Natural-language control | "Tell me the iPhone price on Coupang" |
| Multiple AI models | 20+ providers, including Gemini, GPT, Claude, and Ollama |
| Web UI | Manage tasks and settings in the browser |
| Telegram bot | Run commands from anywhere |
| Notion integration | Automatically save task results |
| Extension mode | Reuse your existing Chrome login session |
| Parallel processing | Run tasks in several browsers at once |
| Local AI | Run offline with Ollama |
# Installation
git clone https://github.com/johunsang/pi-browser.git
cd pi-browser
npm install
# Set up API keys
cp .env.example .env
# Put your GOOGLE_API_KEY in the .env file
# Run
npm start 'Tell me the weather on Naver today'
All features are available from the browser.
npm start /web
# or
npx tsx src/cli.ts /web
After opening http://localhost:3000:
- Tasks tab: enter commands and check execution status
- Settings tab: configure Telegram, AI models, the browser, and Notion
Launches a new Chrome instance.
npm start 'Tell me the iPhone 16 price on Coupang'
npm start # interactive mode
Keeps the login session of your existing Chrome.
# Install the extension (first time only)
# 1. Open chrome://extensions
# 2. Turn on Developer mode
# 3. Click "Load unpacked" → select the extension folder
# Run
npm start /ext
> Tell me the subjects of the 3 most recent emails in Naver Mail
> Tell me how many unread emails I have in Gmail
Runs tasks in several browsers at the same time.
# Run in parallel with 3 anonymous browsers
npm start '/parallel 3 "weather on Google" "news on Naver" "movies on Daum"'
# Run in parallel with profile browsers (logins kept)
npm start '/parallel "Default,Profile 1" "check Naver Mail" "check Gmail"'
# List Chrome profiles
npm start /profiles
| Mode | Command | Login | Use case |
|---|---|---|---|
| Anonymous | /parallel 3 "task"... | None | Search, crawling |
| Profile | /parallel "P1,P2" "task"... | Kept | Mail, SNS |
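Conceptually, the parallel modes drive several independent browser instances at once and collect the results. Below is a minimal TypeScript sketch of that idea using Playwright (credited as a dependency below) and Promise.all; the runTask helper and the URLs are illustrative, not pi-browser's actual /parallel implementation.

```typescript
// parallel-sketch.ts - run several tasks in separate Chromium instances concurrently.
// Illustrative only; pi-browser's real /parallel mode may work differently.
import { chromium } from 'playwright';

async function runTask(url: string): Promise<string> {
  const browser = await chromium.launch({ headless: true }); // one browser per task
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title(); // stand-in for "do the task and collect a result"
  } finally {
    await browser.close();
  }
}

async function main() {
  const urls = ['https://www.google.com', 'https://www.naver.com', 'https://www.daum.net'];
  const results = await Promise.all(urls.map((url) => runTask(url))); // the "parallel" part
  results.forEach((title, i) => console.log(`${urls[i]} -> ${title}`));
}

main().catch(console.error);
```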
| Command | Description |
|---|---|
| /web | Web UI mode (control from the browser) |
| /ext | Extension mode (keeps logins) |
| /parallel N "task"... | N anonymous browsers in parallel |
| /parallel "profiles" "task"... | Profile browsers in parallel |
| /profiles | List Chrome profiles |
| /models | List AI models |
| /set <provider> <model> | Switch models |
| /config | Show configuration |
| exit | Quit |
Run commands from anywhere.
- Create a bot with @BotFather → copy the token
- Web UI (/web) → Settings → Telegram Bot → enter the Bot Token
- Enter the allowed user ID (required; get it from @userinfobot)
- Save and enable
/start - start
/help - help
Tell me the weather on Naver - runs a command
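If the bot is configured but no messages arrive, the token and user ID can be checked directly against the Telegram Bot API, independently of pi-browser. A small sketch; TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are placeholder environment variables, not names used by pi-browser.

```typescript
// telegram-check.ts - send a test message straight through the Telegram Bot API.
const BOT_TOKEN = process.env.TELEGRAM_BOT_TOKEN!; // token from @BotFather
const CHAT_ID = process.env.TELEGRAM_CHAT_ID!;     // your numeric ID from @userinfobot

async function main() {
  const res = await fetch(`https://api.telegram.org/bot${BOT_TOKEN}/sendMessage`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ chat_id: CHAT_ID, text: 'pi-browser test message' }),
  });
  console.log(res.status, await res.json()); // "ok": true means token and chat_id are valid
}

main().catch(console.error);
```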
Automatically saves task results to Notion.
- Create an Integration at notion.so/my-integrations
- Copy the Internal Integration Token
- Create a Notion database → connect the Integration
- Copy the ID from the database URL (notion.so/[ID]/...)
- Web UI → Settings → Notion integration
- Enter the API Key and Database ID, then save
- Title: [task-id] task description
- Body: 📋 task, ✅ result, ⏰ time
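Saving a result boils down to creating a page in the configured database through the Notion API. Below is a hedged sketch of that call, not pi-browser's actual code; it assumes the database's title property is named "Name" and uses NOTION_API_KEY and NOTION_DATABASE_ID as placeholder environment variables.

```typescript
// notion-check.ts - create a minimal result page in the configured Notion database.
const NOTION_KEY = process.env.NOTION_API_KEY!;
const DATABASE_ID = process.env.NOTION_DATABASE_ID!;

async function main() {
  const res = await fetch('https://api.notion.com/v1/pages', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${NOTION_KEY}`,
      'Notion-Version': '2022-06-28',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      parent: { database_id: DATABASE_ID },
      properties: {
        // assumes the database's title column is called "Name"
        Name: { title: [{ text: { content: '[task-id] example task' } }] },
      },
      children: [
        {
          object: 'block',
          type: 'paragraph',
          paragraph: { rich_text: [{ type: 'text', text: { content: '✅ example result' } }] },
        },
      ],
    }),
  });
  console.log(res.status, await res.json());
}

main().catch(console.error);
```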
# Google Gemini (default; free tier available)
npm start '/set google gemini-2.5-flash'
# OpenAI
npm start '/set openai gpt-4o'
# Anthropic Claude
npm start '/set anthropic claude-sonnet-4-20250514'
# Groq (fast inference, free)
npm start '/set groq llama-3.3-70b-versatile'
# Install Ollama and download a model
brew install ollama
ollama run llama3.2
# Use it from Pi-Browser
npm start '/set ollama llama3.2'
npm start 'Open Google'
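pi-browser reaches Ollama through its local HTTP API on port 11434 (see the troubleshooting section). To confirm the model itself responds, independent of pi-browser, a quick sketch:

```typescript
// ollama-check.ts - confirm the local Ollama server answers a simple prompt.
async function main() {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'llama3.2', prompt: 'Say hello.', stream: false }),
  });
  const data = await res.json();
  console.log(data.response); // errors here point at Ollama, not pi-browser
}

main().catch(console.error);
```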
.env file:
GOOGLE_API_KEY=your-google-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
OPENAI_API_KEY=your-openai-api-key
GROQ_API_KEY=your-groq-api-key
| Provider | Link | Free |
|---|---|---|
| Google | aistudio.google.com | Yes |
| Groq | console.groq.com | Yes |
| OpenAI | platform.openai.com | No |
| Anthropic | console.anthropic.com | No |
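As a quick sanity check for the default Google setup (outside pi-browser itself), listing models from the Generative Language API should succeed when GOOGLE_API_KEY is valid. A small sketch:

```typescript
// gemini-key-check.ts - verify GOOGLE_API_KEY by listing available Gemini models.
const KEY = process.env.GOOGLE_API_KEY!;

async function main() {
  const res = await fetch(`https://generativelanguage.googleapis.com/v1beta/models?key=${KEY}`);
  if (!res.ok) throw new Error(`Key check failed: ${res.status}`);
  const data = await res.json();
  // print the first few model names, e.g. models/gemini-2.5-flash
  for (const m of (data.models ?? []).slice(0, 5)) console.log(m.name);
}

main().catch(console.error);
```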
Tools available to the AI:
| Tool | Description |
|---|---|
| browser_navigate | Go to a URL |
| browser_click | Click an element |
| browser_fill | Type text |
| browser_press | Press a key (Enter, Tab, etc.) |
| browser_screenshot | Take a screenshot |
| browser_snapshot | List the page's elements |
| browser_scroll | Scroll |
| browser_get_text | Extract text |
| browser_wait | Wait (for a duration or for text) |
| browser_download | Download a file |
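These tools correspond to ordinary Playwright operations (Playwright is credited below as the browser automation layer). A rough sketch of what a few of them amount to; the helper names mirror the tool names, but this is not the project's actual implementation.

```typescript
// tools-sketch.ts - roughly what browser_navigate / browser_get_text / browser_screenshot do.
import { chromium, Page } from 'playwright';

async function browserNavigate(page: Page, url: string) {
  await page.goto(url); // browser_navigate
}

async function browserGetText(page: Page, selector: string): Promise<string> {
  return page.innerText(selector); // browser_get_text
}

async function browserScreenshot(page: Page, path: string) {
  await page.screenshot({ path, fullPage: true }); // browser_screenshot
}

async function main() {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await browserNavigate(page, 'https://example.com');
  console.log(await browserGetText(page, 'h1'));
  await browserScreenshot(page, 'example.png');
  await browser.close();
}

main().catch(console.error);
```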
# Shopping
npm start 'Compare AirPods Pro prices on Coupang'
# Looking up information
npm start 'Tell me the weather in Seoul on Naver'
# SNS (Extension mode)
npm start /ext
> Write a test post in the 옥토퍼스맨 Naver Cafe
# Parallel crawling
npm start '/parallel 5 "crawl site 1" "crawl site 2" "crawl site 3" "crawl site 4" "crawl site 5"'
pi-browser/
├── src/
│ ├── cli.ts # main CLI
│ ├── web-client.ts # web UI server
│ └── telegram.ts # Telegram bot
├── extension/ # Chrome Extension
│ ├── manifest.json
│ ├── background.js
│ └── popup.html
├── .env # API keys
└── package.json
# Web UI port conflict
lsof -ti:3000 | xargs kill -9
# Extension not connecting
lsof -i :9876 # check the WebSocket port
# Chrome won't launch
lsof -i :9444 # check the CDP port
# Ollama not connecting
curl http://localhost:11434/api/tags
# Test the Telegram bot connection
curl https://api.telegram.org/bot<TOKEN>/getMe
# Test the Notion connection
curl https://api.notion.com/v1/databases/<DB_ID> \
-H "Authorization: Bearer <API_KEY>" \
-H "Notion-Version: 2022-06-28"클라우드: Google, OpenAI, Anthropic, Mistral, Groq, xAI, OpenRouter, AWS Bedrock, Google Vertex
로컬: Ollama (Llama, Mistral, Qwen, Gemma 등)
MIT License
- @mariozechner/pi-ai - multi-AI integration
- Playwright - browser automation
Similar Open Source Tools
llmio
LLMIO is a Go-based LLM load balancing gateway that provides a unified REST API, weight scheduling, logging, and modern management interface for your LLM clients. It helps integrate different model capabilities from OpenAI, Anthropic, Gemini, and more in a single service. Features include unified API compatibility, weight scheduling with two strategies, visual management dashboard, rate and failure handling, and local persistence with SQLite. The tool supports multiple vendors' APIs and authentication methods, making it versatile for various AI model integrations.
ChatTTS-Forge
ChatTTS-Forge is a powerful text-to-speech generation tool that supports generating rich audio long texts using a SSML-like syntax and provides comprehensive API services, suitable for various scenarios. It offers features such as batch generation, support for generating super long texts, style prompt injection, full API services, user-friendly debugging GUI, OpenAI-style API, Google-style API, support for SSML-like syntax, speaker management, style management, independent refine API, text normalization optimized for ChatTTS, and automatic detection and processing of markdown format text. The tool can be experienced and deployed online through HuggingFace Spaces, launched with one click on Colab, deployed using containers, or locally deployed after cloning the project, preparing models, and installing necessary dependencies.
Langchain-Chatchat
LangChain-Chatchat is an open-source, offline-deployable retrieval-enhanced generation (RAG) large model knowledge base project based on large language models such as ChatGLM and application frameworks such as Langchain. It aims to establish a knowledge base Q&A solution that is friendly to Chinese scenarios, supports open-source models, and can run offline.
app-builder
AppBuilder SDK is a one-stop development tool for AI native applications, providing basic cloud resources, AI capability engine, Qianfan large model, and related capability components to improve the development efficiency of AI native applications.
LangChain-SearXNG
LangChain-SearXNG is an open-source AI search engine built on LangChain and SearXNG. It supports faster and more accurate search and question-answering functionalities. Users can deploy SearXNG and set up Python environment to run LangChain-SearXNG. The tool integrates AI models like OpenAI and ZhipuAI for search queries. It offers two search modes: Searxng and ZhipuWebSearch, allowing users to control the search workflow based on input parameters. LangChain-SearXNG v2 version enhances response speed and content quality compared to the previous version, providing a detailed configuration guide and showcasing the effectiveness of different search modes through comparisons.
Awesome-ChatTTS
Awesome-ChatTTS is an official recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including official introduction, quick experience options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trends analysis. Users can explore various branches for different functionalities and enhancements related to ChatTTS.
TelegramForwarder
Telegram Forwarder is a message forwarding tool that allows you to forward messages from specified chats to other chats without the need for a bot to enter the corresponding channels/groups to listen. It can be used for information stream integration filtering, message reminders, content archiving, and more. The tool supports multiple sources forwarding, keyword filtering in whitelist and blacklist modes, regular expression matching, message content modification, AI processing using major vendors' AI interfaces, media file filtering, and synchronization with a universal forum blocking plugin to achieve three-end blocking.
grps_trtllm
The grps-trtllm repository is a C++ implementation of a high-performance OpenAI LLM service, combining GRPS and TensorRT-LLM. It supports functionalities like Chat, Ai-agent, and Multi-modal. The repository offers advantages over triton-trtllm, including a complete LLM service implemented in pure C++, integrated tokenizer supporting huggingface and sentencepiece, custom HTTP functionality for OpenAI interface, support for different LLM prompt styles and result parsing styles, integration with tensorrt backend and opencv library for multi-modal LLM, and stable performance improvement compared to triton-trtllm.
OpenClawChineseTranslation
OpenClaw Chinese Translation is a localization project that provides a fully Chinese interface for the OpenClaw open-source personal AI assistant platform. It allows users to interact with their AI assistant through chat applications like WhatsApp, Telegram, and Discord to manage daily tasks such as emails, calendars, and files. The project includes both CLI command-line and dashboard web interface fully translated into Chinese.
moonpalace
MoonPalace is a debugging tool for API provided by Moonshot AI. It supports all platforms (Mac, Windows, Linux) and is simple to use by replacing 'base_url' with 'http://localhost:9988'. It captures complete requests, including 'accident scenes' during network errors, and allows quick retrieval and viewing of request information using 'request_id' and 'chatcmpl_id'. It also enables one-click export of BadCase structured reporting data to help improve Kimi model capabilities. MoonPalace is recommended for use as an API 'supplier' during code writing and debugging stages to quickly identify and locate various issues related to API calls and code writing processes, and to export request details for submission to Moonshot AI to improve Kimi model.
Muice-Chatbot
Muice-Chatbot is an AI chatbot designed to proactively engage in conversations with users. It is based on the ChatGLM2-6B and Qwen-7B models, with a training dataset of 1.8K+ dialogues. The chatbot has a speaking style similar to a 2D girl, being somewhat tsundere but willing to share daily life details and greet users differently every day. It provides various functionalities, including initiating chats and offering 5 available commands. The project supports model loading through different methods and provides onebot service support for QQ users. Users can interact with the chatbot by running the main.py file in the project directory.
AIClient-2-API
AIClient-2-API is a versatile and lightweight API proxy designed for developers, providing ample free API request quotas and comprehensive support for various mainstream large models like Gemini, Qwen Code, Claude, etc. It converts multiple backend APIs into standard OpenAI format interfaces through a Node.js HTTP server. The project adopts a modern modular architecture, supports strategy and adapter patterns, comes with complete test coverage and health check mechanisms, and is ready to use after 'npm install'. By easily switching model service providers in the configuration file, any OpenAI-compatible client or application can seamlessly access different large model capabilities through the same API address, eliminating the hassle of maintaining multiple sets of configurations for different services and dealing with incompatible interfaces.
tiny-llm-zh
Tiny LLM zh is a project aimed at building a small-parameter Chinese language large model for quick entry into learning large model-related knowledge. The project implements a two-stage training process for large models and subsequent human alignment, including tokenization, pre-training, instruction fine-tuning, human alignment, evaluation, and deployment. It is deployed on the ModelScope Tiny LLM website and features open access to all data and code, including pre-training data and tokenizer. The project trains a tokenizer using 10GB of Chinese encyclopedia text to build a Tiny LLM vocabulary. It supports training with Transformers and DeepSpeed, multi-machine multi-card setups, and ZeRO optimization techniques. The project has three main branches: llama2_torch, main tiny_llm, and tiny_llm_moe, each with specific modifications and features.
DeepAI
DeepAI is a proxy server that enhances the interaction experience of large language models (LLMs) by integrating the 'thinking chain' process. It acts as an intermediary layer, receiving standard OpenAI API compatible requests, using independent 'thinking services' to generate reasoning processes, and then forwarding the enhanced requests to the LLM backend of your choice. This ensures that responses are not only generated by the LLM but also based on pre-inference analysis, resulting in more insightful and coherent answers. DeepAI supports seamless integration with applications designed for the OpenAI API, providing endpoints for '/v1/chat/completions' and '/v1/models', making it easy to integrate into existing applications. It offers features such as reasoning chain enhancement, flexible backend support, API key routing, weighted random selection, proxy support, comprehensive logging, and graceful shutdown.
agentica
Agentica is a human-centric framework for building large language model agents. It provides functionalities for planning, memory management, tool usage, and supports features like reflection, planning and execution, RAG, multi-agent, multi-role, and workflow. The tool allows users to quickly code and orchestrate agents, customize prompts, and make API calls to various services. It supports API calls to OpenAI, Azure, Deepseek, Moonshot, Claude, Ollama, and Together. Agentica aims to simplify the process of building AI agents by providing a user-friendly interface and a range of functionalities for agent development.
For similar tasks
freegenius
FreeGenius AI is an ambitious project offering a comprehensive suite of AI solutions that mirror the capabilities of LetMeDoIt AI. It is designed to engage in intuitive conversations, execute codes, provide up-to-date information, and perform various tasks. The tool is free, customizable, and provides access to real-time data and device information. It aims to support offline and online backends, open-source large language models, and optional API keys. Users can use FreeGenius AI for tasks like generating tweets, analyzing audio, searching financial data, checking weather, and creating maps.
Google-Shortcuts-Launcher
Google Shortcuts Launcher provides a seamless way to integrate powerful Google services into your daily workflow. With just a tap, you can quickly access a variety of shortcuts designed to enhance your daily device use and simplify your interactions with Google features. It offers shortcuts for games launcher, Google Lens, Google Music Search, Google Password Manager, Google Weather, and Voice Assistant. The tool requires Google, Google Play Services, and Google Play Games to be installed on the device for proper functionality, and some features may require root access.
ai-real-estate-assistant
AI Real Estate Assistant is a modern platform that uses AI to assist real estate agencies in helping buyers and renters find their ideal properties. It features multiple AI model providers, intelligent query processing, advanced search and retrieval capabilities, and enhanced user experience. The tool is built with a FastAPI backend and Next.js frontend, offering semantic search, hybrid agent routing, and real-time analytics.
MegaParse
MegaParse is a powerful and versatile parser designed to handle various types of documents such as text, PDFs, Powerpoint presentations, and Word documents with no information loss. It is fast, efficient, and open source, supporting a wide range of file formats. MegaParse ensures compatibility with tables, table of contents, headers, footers, and images, making it a comprehensive solution for document parsing.
NekoImageGallery
NekoImageGallery is an online AI image search engine that utilizes the Clip model and Qdrant vector database. It supports keyword search and similar image search. The tool generates 768-dimensional vectors for each image using the Clip model, supports OCR text search using PaddleOCR, and efficiently searches vectors using the Qdrant vector database. Users can deploy the tool locally or via Docker, with options for metadata storage using Qdrant database or local file storage. The tool provides API documentation through FastAPI's built-in Swagger UI and can be used for tasks like image search, text extraction, and vector search.
gemini_multipdf_chat
Gemini PDF Chatbot is a Streamlit-based application that allows users to chat with a conversational AI model trained on PDF documents. The chatbot extracts information from uploaded PDF files and answers user questions based on the provided context. It features PDF upload, text extraction, conversational AI using the Gemini model, and a chat interface. Users can deploy the application locally or to the cloud, and the project structure includes main application script, environment variable file, requirements, and documentation. Dependencies include PyPDF2, langchain, Streamlit, google.generativeai, and dotenv.
screen-pipe
Screen-pipe is a Rust + WASM tool that allows users to turn their screen into actions using Large Language Models (LLMs). It enables users to record their screen 24/7, extract text from frames, and process text and images for tasks like analyzing sales conversations. The tool is still experimental and aims to simplify the process of recording screens, extracting text, and integrating with various APIs for tasks such as filling CRM data based on screen activities. The project is open-source and welcomes contributions to enhance its functionalities and usability.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.