AIPex

AIPex: an open-source alternative to Claude for Chrome for automating your browser

Stars: 375



README:

🤖 AIPex - AI-Powered Browser Automation Extension

Automate your browser with natural language commands - the open-source browser-use solution


**⭐ Star this repo if you find it helpful! ⭐**

🤖 What is AIPex?

AIPex is a Chrome extension that turns your browser into an AI-driven automation platform. You describe what you want in natural language, and AIPex carries out browser tasks ranging from simple repetitive actions to complex multi-step workflows.

🎯 Why Choose AIPex for Browser Automation?

🧠 Natural Language Control: Command your browser in plain English - no coding required
🤖 AI-Powered Intelligence: An AI model interprets context and drives 30+ MCP tools to carry out your requests
⚡ Multi-Step Automation: Execute complex workflows with single commands
🔄 Universal Compatibility: Works with any website - no special setup needed
📊 Smart Data Extraction: Automatically collect and organize information from web pages
🎯 Precision Actions: Click, fill, scroll, and interact with elements using AI vision
📝 Form Automation: Fill out forms, submit data, and handle complex interactions
🖼️ Visual Understanding: AI can see and understand page content for intelligent automation
🔧 Developer Friendly: Open source with extensive API for custom automation
🚀 Lightning Fast: Execute automation tasks in seconds, not minutes
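MCP clients invoke tools by sending JSON-RPC 2.0 `tools/call` requests, as defined in the Model Context Protocol specification. The sketch below shows roughly what such a request looks like for a command like "Click the login button". It is only an illustration. Whether AIPex dispatches its tools with this exact wire format is an assumption. The `buildToolCall` helper and the `selector` argument name are also hypothetical, not AIPex's documented schema.

```typescript
// Shape of a JSON-RPC 2.0 request that invokes a tool under the Model Context Protocol.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Request IDs must be unique per session; a simple counter is enough for a sketch.
let nextRequestId = 1;

// Hypothetical helper: wrap a tool name and its arguments in a tools/call request.
function buildToolCall(toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// "Click the login button" might be translated by the model into a call like this.
// The `selector` argument name is an assumption, not AIPex's documented schema.
const loginClick = buildToolCall("click_element", { selector: "button#login" });
console.log(JSON.stringify(loginClick));
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"click_element","arguments":{"selector":"button#login"}}}
```

A multi-step workflow would then be a sequence of such requests, each one chosen by the model after it sees the previous result.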

✨ Core Automation Features

📊 Intelligent Data Extraction

Smart Content Analysis: Extract structured data from any webpage
Price Monitoring: Track prices across multiple e-commerce sites
Research Automation: Gather information from multiple sources automatically
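Price monitoring ultimately means turning extracted page text into numbers you can compare. The sketch below shows that post-processing step. It assumes US-style prices such as "$1,299.99". The names `parsePrice` and `cheapest` and the shop names are hypothetical and not part of AIPex.

```typescript
// One price observation extracted from a product page (hypothetical shape).
interface PriceObservation {
  site: string;
  text: string; // raw price text as it appears on the page
}

// Parse the first US-style price such as "$1,299.99" into a number.
// Returns null when no price-like token is found.
function parsePrice(text: string): number | null {
  const match = text.match(/\$\s*([\d,]+(?:\.\d{1,2})?)/);
  if (!match) return null;
  return Number(match[1].replace(/,/g, ""));
}

// Return the observation with the lowest parsable price, or null if none parse.
function cheapest(observations: PriceObservation[]): { site: string; price: number } | null {
  let best: { site: string; price: number } | null = null;
  for (const obs of observations) {
    const price = parsePrice(obs.text);
    if (price !== null && (best === null || price < best.price)) {
      best = { site: obs.site, price };
    }
  }
  return best;
}

const observations: PriceObservation[] = [
  { site: "shop-a.example", text: "Now only $1,299.99" },
  { site: "shop-b.example", text: "Price: $1,249.00" },
  { site: "shop-c.example", text: "Out of stock" },
];
console.log(cheapest(observations)); // logs the shop-b.example entry with price 1249
```

Pages that fail to parse, such as the out-of-stock one, are skipped rather than treated as zero.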

🎯 Precision Element Interaction

Visual Element Detection: AI can see and interact with page elements
Form Automation: Fill out complex forms with intelligent field mapping
Dynamic Content Handling: Adapt to changing page layouts and content

📝 Content Processing & Analysis

Text Highlighting & Summarization: Automatically highlight and summarize important content
Document Processing: Extract and organize information from web documents
Smart Note-Taking: Capture and organize insights from web browsing
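Inline highlighting can be sketched on plain text by wrapping matching phrases in `<mark>` tags. This is the kind of work a tool like `highlight_text_inline` does. The sketch rests on an assumption: a real extension would more likely walk DOM text nodes than rebuild an HTML string. `highlightPhrases` is a hypothetical helper.

```typescript
// Escape characters that have special meaning in regular expressions.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape characters that have special meaning in HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap every case-insensitive occurrence of the given phrases in <mark> tags.
// Takes plain text and returns an HTML string.
function highlightPhrases(text: string, phrases: string[]): string {
  const terms = phrases.filter((p) => p.length > 0).map(escapeRegExp);
  if (terms.length === 0) return escapeHtml(text);
  // Try longer phrases first so "React hooks" wins over "React" at the same position.
  terms.sort((a, b) => b.length - a.length);
  const pattern = new RegExp(`(${terms.join("|")})`, "gi");
  // split() with one capturing group puts the matched phrases at odd indices.
  return text
    .split(pattern)
    .map((part, i) => (i % 2 === 1 ? `<mark>${escapeHtml(part)}</mark>` : escapeHtml(part)))
    .join("");
}

console.log(highlightPhrases("Use React hooks, not class components.", ["react hooks"]));
// Use <mark>React hooks</mark>, not class components.
```

Escaping both the regex input and the HTML output keeps phrases like "1+1" or text containing "<" from breaking the result.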

🗂️ Advanced Tab & Window Management

AI-Powered Organization: Automatically group and organize tabs by topic
Smart Tab Switching: Find and switch between tabs using natural language
Multi-Window Coordination: Manage complex workflows across multiple browser windows
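AIPex groups tabs by topic using AI. The sketch below instead groups tabs by hostname, a simpler deterministic stand-in. This is a swapped-in technique, not AIPex's method. The `TabInfo` shape loosely mirrors the `id`, `title`, and `url` fields of tabs returned by `chrome.tabs.query`. `groupTabsByHost` is a hypothetical helper.

```typescript
// Minimal tab record; a real extension would read these from chrome.tabs.query.
interface TabInfo {
  id: number;
  title: string;
  url: string;
}

// Return the hostname of a URL, or null if it is not a valid absolute URL.
function hostnameOf(url: string): string | null {
  try {
    return new URL(url).hostname;
  } catch {
    return null;
  }
}

// Group tab IDs by hostname, skipping tabs whose URL cannot be parsed.
function groupTabsByHost(tabs: TabInfo[]): Record<string, number[]> {
  const groups: Record<string, number[]> = {};
  for (const tab of tabs) {
    const host = hostnameOf(tab.url);
    if (host === null) continue;
    if (!groups[host]) groups[host] = [];
    groups[host].push(tab.id);
  }
  return groups;
}

const sampleTabs: TabInfo[] = [
  { id: 1, title: "React docs", url: "https://react.dev/learn" },
  { id: 2, title: "Hooks reference", url: "https://react.dev/reference/react" },
  { id: 3, title: "MDN: URL", url: "https://developer.mozilla.org/en-US/docs/Web/API/URL" },
];
console.log(groupTabsByHost(sampleTabs)); // react.dev -> [1, 2], developer.mozilla.org -> [3]
```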

🚀 Getting Started

Quick Start

Install from Chrome Web Store (Recommended)
- Install AIPex from its Chrome Web Store listing
Open AIPex
- Press ⌘+M (Mac) or Ctrl+M (Windows/Linux)
- Or click the AIPex icon in your toolbar
Start Automating
- Type /ai to start an AI automation chat
- Use natural language: "Click the login button", "Fill out this form"
- Try complex workflows: "Research React best practices and save to notes"

🛠️ Development & Contributing

We love contributions! Here's how you can help make AIPex even better:

📖 For detailed development setup, build instructions, and contribution guidelines, please see DEVELOPMENT.md

Quick Start for Contributors

🏗️ Local Development: See DEVELOPMENT.md#local-development-setup
🔧 Building: See DEVELOPMENT.md#building-for-production
🤝 Contributing: See DEVELOPMENT.md#how-to-contribute
📊 Project Status: See DEVELOPMENT.md#development-status

📊 Tool Categories Overview

🗂️ Tab Management - 8 tools

Complete tab control and navigation:

get_all_tabs - Get all open tabs across all windows
get_current_tab - Get information about the currently active tab
switch_to_tab - Switch to a specific tab by ID
create_new_tab - Create a new tab with the specified URL
get_tab_info - Get detailed information about a specific tab
duplicate_tab - Duplicate an existing tab
close_tab - Close a specific tab
get_current_tab_content - Get the visible text content of the current tab
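Acting on a request like "switch to the React tab" first requires picking a tab ID that `switch_to_tab` can use. The sketch below scores each tab by the keywords its title and URL share with the request. This keyword overlap is an assumption for illustration, since AIPex presumably relies on its AI model instead. `findTabId` is a hypothetical helper.

```typescript
interface BrowserTab {
  id: number;
  title: string;
  url: string;
}

// Lowercase, split into alphanumeric words, and drop empties and duplicates.
function tokenize(s: string): string[] {
  return s
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w, i, all) => w.length > 0 && all.indexOf(w) === i);
}

// Return the ID of the tab whose title and URL share the most words with the query,
// or null if no tab shares any word.
function findTabId(query: string, candidates: BrowserTab[]): number | null {
  const queryWords = tokenize(query);
  let bestId: number | null = null;
  let bestScore = 0;
  for (const tab of candidates) {
    const tabWords = tokenize(`${tab.title} ${tab.url}`);
    let score = 0;
    for (const w of queryWords) {
      if (tabWords.indexOf(w) !== -1) score++;
    }
    if (score > bestScore) {
      bestScore = score;
      bestId = tab.id;
    }
  }
  return bestId;
}

const exampleTabs: BrowserTab[] = [
  { id: 11, title: "Inbox - Gmail", url: "https://mail.google.com/mail/u/0/" },
  { id: 12, title: "React best practices", url: "https://example.com/react-tips" },
];
console.log(findTabId("switch to the react tab", exampleTabs)); // 12
```

Returning null when nothing matches lets the caller ask the user to clarify instead of switching to an arbitrary tab.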

📄 Page Content & Interaction - 16 tools

Content extraction, analysis, and page interaction:

get_page_metadata - Get page metadata including title, description, keywords
extract_page_text - Extract text content with word count and reading time
get_page_links - Get all links from the current page
search_page_text - Search for text on the current page
get_interactive_elements - Get all interactive elements (links, buttons, inputs)
get_interactive_elements_optimized - Optimized version for complex pages
click_element - Click an element using CSS selector
summarize_page - Summarize page content with key points
fill_input - Fill an input field with text
clear_input - Clear the content of an input field
get_input_value - Get the current value of an input field
submit_form - Submit a form using CSS selector
get_form_elements - Get all form elements and input fields
scroll_to_element - Scroll to a DOM element and center it
highlight_element - Permanently highlight DOM elements
highlight_text_inline - Highlight specific words or phrases within text
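`extract_page_text` reports a word count and a reading time. The sketch below shows how such numbers are commonly computed. It assumes whitespace-separated words and an average reading speed of 200 words per minute. AIPex's actual constant is not documented here, and `textStats` is a hypothetical helper.

```typescript
interface TextStats {
  wordCount: number;
  readingTimeMinutes: number;
}

// Assumed average reading speed; not taken from AIPex.
const WORDS_PER_MINUTE = 200;

function textStats(text: string): TextStats {
  // Split on runs of whitespace; filtering handles the empty-string case.
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const wordCount = words.length;
  // Round up so any non-empty text reports at least one minute.
  const readingTimeMinutes = wordCount === 0 ? 0 : Math.ceil(wordCount / WORDS_PER_MINUTE);
  return { wordCount, readingTimeMinutes };
}

console.log(textStats("The quick brown fox jumps over the lazy dog."));
// { wordCount: 9, readingTimeMinutes: 1 }
```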

⬇️ Downloads & Files - 4 tools

Download control and file management:

download_text_as_markdown - Download text content as markdown file
download_image - Download an image from base64 data
download_chat_images - Download multiple images from chat messages
download_current_chat_images - Download all images from current AI chat
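`download_image` works from base64 data. The sketch below covers the decoding step. It assumes the image arrives as a `data:<mime>;base64,...` URL. `decodeDataUrl` and `extensionFor` are hypothetical helpers. An extension could then save the file with Chrome's `chrome.downloads.download` API.

```typescript
interface DecodedImage {
  mimeType: string;
  bytes: Uint8Array;
}

// Parse a base64 data URL such as "data:image/png;base64,iVBORw0KGgo=" into raw bytes.
function decodeDataUrl(dataUrl: string): DecodedImage {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64 data URL");
  }
  const binary = atob(match[2]); // each character holds one byte (0-255)
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { mimeType: match[1], bytes };
}

// Pick a file extension from the MIME type, e.g. "image/png" -> "png".
function extensionFor(mimeType: string): string {
  const subtype = mimeType.split("/")[1] ?? "bin";
  return subtype === "jpeg" ? "jpg" : subtype;
}

// "iVBORw0KGgo=" is the 8-byte PNG file signature.
const image = decodeDataUrl("data:image/png;base64,iVBORw0KGgo=");
console.log(image.mimeType, image.bytes.length, extensionFor(image.mimeType)); // image/png 8 png
```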

📸 Screenshots - 3 tools

Visual capture and screenshot management:

capture_screenshot - Capture screenshot of current visible tab
capture_tab_screenshot - Capture screenshot of a specific tab by ID
capture_screenshot_to_clipboard - Capture screenshot and save to clipboard

🔧 Advanced Features - 3+ tools

Advanced browser automation and utilities:

Additional specialized tools for enhanced browser control
AI-powered content analysis and processing
Custom automation workflows

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Support & Community

🐛 Found a bug? Open an issue
💡 Have a feature request? Start a discussion
🤝 Want to contribute? See our Contributing Guide
💬 Need help? Join our community discussions

🏆 Contributors

Thank you to all the amazing contributors who help make AIPex better:

ropzislaw - 56 commits
Codexiaoyi - 10 commits
buttercannfly - 5 commits

Total Contributors: 3 | Total Commits: 71

Want to contribute? Check out our Contributing Guide and help make AIPex even better!


Made with ❤️ by the AIPex Team

For Tasks:


automate form filling, extract data, summarize content, manage browser tabs, capture screenshots

For Jobs:

automation engineer, web developer, data analyst, quality assurance tester, digital marketer

Alternative AI tools for AIPex

Similar Open Source Tools


zotero-mcp

Zotero MCP is an open-source project that integrates AI capabilities with Zotero using the Model Context Protocol. It consists of a Zotero plugin and an MCP server, enabling AI assistants to search, retrieve, and cite references from a Zotero library. The project features a unified architecture with an integrated MCP server, eliminating the need for a separate server process. It provides features like intelligent search, detailed reference information, and filtering by tags and identifiers, aiding in academic tasks such as literature reviews and citation management.

GitHub stars: 99

layra

LAYRA is the world's first visual-native AI automation engine that sees documents like a human, preserves layout and graphical elements, and executes arbitrarily complex workflows with full Python control. It empowers users to build next-generation intelligent systems with no limits or compromises. Built for Enterprise-Grade deployment, LAYRA features a modern frontend, high-performance backend, decoupled service architecture, visual-native multimodal document understanding, and a powerful workflow engine.

GitHub stars: 817

mcp-pointer

MCP Pointer is a local tool that combines an MCP Server with a Chrome Extension to allow users to visually select DOM elements in the browser and make textual context available to agentic coding tools like Claude Code. It bridges between the browser and AI tools via the Model Context Protocol, enabling real-time communication and compatibility with various AI tools. The tool extracts detailed information about selected elements, including text content, CSS properties, React component detection, and more, making it a valuable asset for developers working with AI-powered web development.

GitHub stars: 206

persistent-ai-memory

Persistent AI Memory System is a comprehensive tool that offers persistent, searchable storage for AI assistants. It includes features like conversation tracking, MCP tool call logging, and intelligent scheduling. The system supports multiple databases, provides enhanced memory management, and offers various tools for memory operations, schedule management, and system health checks. It also integrates with various platforms like LM Studio, VS Code, Koboldcpp, Ollama, and more. The system is designed to be modular, platform-agnostic, and scalable, allowing users to handle large conversation histories efficiently.

GitHub stars: 138

evi-run

evi-run is a powerful, production-ready multi-agent AI system built on Python using the OpenAI Agents SDK. It offers instant deployment, ultimate flexibility, built-in analytics, Telegram integration, and scalable architecture. The system features memory management, knowledge integration, task scheduling, multi-agent orchestration, custom agent creation, deep research, web intelligence, document processing, image generation, DEX analytics, and Solana token swap. It supports flexible usage modes like private, free, and pay mode, with upcoming features including NSFW mode, task scheduler, and automatic limit orders. The technology stack includes Python 3.11, OpenAI Agents SDK, Telegram Bot API, PostgreSQL, Redis, and Docker & Docker Compose for deployment.

GitHub stars: 74

astrsk

astrsk is a tool that pushes the boundaries of AI storytelling by offering advanced AI agents, customizable response formatting, and flexible prompt editing for immersive roleplaying experiences. It provides complete AI agent control, a visual flow editor for conversation flows, and 100% local-first data storage. The tool is truly cross-platform, supports various AI providers, and is built with modern technologies like React, TypeScript, and Tailwind CSS. Upcoming features include cross-device sync, enhanced session customization, and community features.

GitHub stars: 106

paelladoc

PAELLADOC is an intelligent documentation system that uses AI to analyze code repositories and generate comprehensive technical documentation. It offers a modular architecture with MECE principles, interactive documentation process, key features like Orchestrator and Commands, and a focus on context for successful AI programming. The tool aims to streamline documentation creation, code generation, and product management tasks for software development teams, providing a definitive standard for AI-assisted development documentation.

GitHub stars: 221

llamafarm

LlamaFarm is a comprehensive AI framework that empowers users to build powerful AI applications locally, with full control over costs and deployment options. It provides modular components for RAG systems, vector databases, model management, prompt engineering, and fine-tuning. Users can create differentiated AI products without needing extensive ML expertise, using simple CLI commands and YAML configs. The framework supports local-first development, production-ready components, strategy-based configuration, and deployment anywhere from laptops to the cloud.

GitHub stars: 115

CrewAI-GUI

CrewAI-GUI is a node-based frontend tool designed to revolutionize AI workflow creation. It lets users design complex AI agent interactions through an intuitive drag-and-drop interface, export designs to JSON for modularity and reusability, and use either the GPT-4 API or Ollama as the AI backend. The tool is cross-platform, allowing users to create AI workflows on Windows, Linux, or macOS.

GitHub stars: 88

RepoMaster

RepoMaster is an AI agent that leverages GitHub repositories to solve complex real-world tasks. It automatically finds suitable GitHub tools and makes them work together. Users describe their tasks, and RepoMaster analyzes them, discovers relevant repositories, and executes a solution. The tool provides a web interface for beginners and a command-line interface for advanced users, along with specialized agents for deep search, general assistance, and repository tasks.

GitHub stars: 167

ai-doc-gen

An AI-powered code documentation generator that automatically analyzes repositories and creates comprehensive documentation using advanced language models. The system employs a multi-agent architecture to perform specialized code analysis and generate structured documentation.

GitHub stars: 616

DeepSeekAI

DeepSeekAI is a browser extension plugin that allows users to interact with AI by selecting text on web pages and invoking the DeepSeek large model to provide AI responses. The extension enhances browsing experience by enabling users to get summaries or answers for selected text directly on the webpage. It features context text selection, API key integration, draggable and resizable window, AI streaming replies, Markdown rendering, one-click copy, re-answer option, code copy functionality, language switching, and multi-turn dialogue support. Users can install the extension from Chrome Web Store or Edge Add-ons, or manually clone the repository, install dependencies, and build the extension. Configuration involves entering the DeepSeek API key in the extension popup window to start using the AI-driven responses.

GitHub stars: 203

chat-ollama

ChatOllama is an open-source chatbot based on LLMs (Large Language Models). It supports a wide range of language models, including Ollama served models, OpenAI, Azure OpenAI, and Anthropic. ChatOllama supports multiple types of chat, including free chat with LLMs and chat with LLMs based on a knowledge base. Key features of ChatOllama include Ollama models management, knowledge bases management, chat, and commercial LLMs API keys management.

GitHub stars: 3.4k

DreamLayer

DreamLayer AI is an open-source Stable Diffusion WebUI designed for AI researchers, labs, and developers. It automates prompts, seeds, and metrics for benchmarking models, datasets, and samplers, enabling reproducible evaluations across multiple seeds and configurations. The tool integrates custom metrics and evaluation pipelines, providing a streamlined workflow for AI research. With features like automated benchmarking, reproducibility, built-in metrics, multi-modal readiness, and researcher-friendly interface, DreamLayer AI aims to simplify and accelerate the model evaluation process.

GitHub stars: 367

pixel-banner

Pixel Banner is a powerful Obsidian plugin that enhances note-taking by creating visually stunning headers with customizable banner images. It offers AI-generated banners, professional banner images from a store, local image support, and direct URL banners. Users can customize banner placement, appearance, display modes, and add decorative icons. The plugin provides efficient workflow with quick banner selection, command integration, and custom field names. It also offers smart organization features like folder-specific settings and image shuffling. Premium features include a token-based system for AI banners, banner history, and prompt inspiration. Enhance your Obsidian experience with beautiful, intelligent banners that make your notes visually distinctive and organized.

GitHub stars: 153

For similar tasks

fuji-web

Fuji-Web is an intelligent AI partner designed for full browser automation. It autonomously navigates websites and performs tasks on behalf of the user while providing explanations for each action step. Users can easily install the extension in their browser, access the Fuji icon to input tasks, and interact with the tool to streamline web browsing tasks. The tool aims to enhance user productivity by automating repetitive web actions and providing a seamless browsing experience.

GitHub stars: 252


skyvern

Skyvern automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows, replacing brittle or unreliable automation solutions. Traditional approaches to browser automation required writing custom scripts for each website, often relying on DOM parsing and XPath-based interactions that would break whenever the website layout changed. Instead of relying only on code-defined XPath interactions, Skyvern adds computer vision and LLMs to parse items in the viewport in real time, create a plan for interaction, and interact with them. This approach gives us a few advantages:

1. Skyvern can operate on websites it has never seen before, as it is able to map visual elements to the actions necessary to complete a workflow, without any customized code.
2. Skyvern is resistant to website layout changes, as there are no predetermined XPaths or other selectors our system is looking for while trying to navigate.
3. Skyvern leverages LLMs to reason through interactions so it can cover complex situations. Examples include:
   1. If you wanted to get an auto insurance quote from Geico, the answer to a common question, "Were you eligible to drive at 18?", could be inferred from the driver receiving their license at age 16.
   2. If you were doing competitor analysis, it understands that an Arnold Palmer 22 oz can at 7-Eleven is almost certainly the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error).

GitHub stars: 14.5k

airbyte-connectors

This repository contains Airbyte connectors used in Faros and Faros Community Edition platforms as well as Airbyte Connector Development Kit (CDK) for JavaScript/TypeScript.

GitHub stars: 121

open-parse

Open Parse is a Python library for visually discerning document layouts and chunking them effectively. It is designed to fill the gap in open-source libraries for handling complex documents. Unlike text splitting, which converts a file to raw text and slices it up, Open Parse visually analyzes documents for superior LLM input. It also supports basic markdown for parsing headings, bold, and italics, and has high-precision table support, extracting tables into clean Markdown formats with accuracy that surpasses traditional tools. Open Parse is extensible, allowing users to easily implement their own post-processing steps. It is also intuitive, with great editor support and completion everywhere, making it easy to use and learn.

GitHub stars: 2.4k

unstract

Unstract is a no-code platform that enables users to launch APIs and ETL pipelines to structure unstructured documents. With Unstract, users can go beyond co-pilots by enabling machine-to-machine automation. Unstract's Prompt Studio provides a simple, no-code approach to creating prompts for LLMs, vector databases, embedding models, and text extractors. Users can then configure Prompt Studio projects as API deployments or ETL pipelines to automate critical business processes that involve complex documents. Unstract supports a wide range of LLM providers, vector databases, embeddings, text extractors, ETL sources, and ETL destinations, providing users with the flexibility to choose the best tools for their needs.

GitHub stars: 5.8k

Dot

Dot is a standalone, open-source application designed for seamless interaction with documents and files using local LLMs and Retrieval Augmented Generation (RAG). It is inspired by solutions like Nvidia's Chat with RTX, providing a user-friendly interface for those without a programming background. Pre-packaged with Mistral 7B, Dot ensures accessibility and simplicity right out of the box. Dot allows you to load multiple documents into an LLM and interact with them in a fully local environment. Supported document types include PDF, DOCX, PPTX, XLSX, and Markdown. Users can also engage with Big Dot for inquiries not directly related to their documents, similar to interacting with ChatGPT. Built with Electron JS, Dot encapsulates a comprehensive Python environment that includes all necessary libraries. The application leverages libraries such as FAISS for creating local vector stores, Langchain, llama.cpp & Huggingface for setting up conversation chains, and additional tools for document management and interaction.

GitHub stars: 726

instructor

Instructor is a Python library that makes it a breeze to work with structured outputs from large language models (LLMs). Built on top of Pydantic, it provides a simple, transparent, and user-friendly API to manage validation, retries, and streaming responses. Get ready to supercharge your LLM workflows!

GitHub stars: 7.7k

For similar jobs

aiscript

AiScript is a lightweight scripting language that runs on JavaScript. It supports arrays, objects, and functions as first-class citizens, and is easy to write without the need for semicolons or commas. AiScript runs in a secure sandbox environment, preventing infinite loops from freezing the host. It also allows for easy provision of variables and functions from the host.

GitHub stars: 227

askui

AskUI is a reliable end-to-end automation tool that depends only on what is shown on your screen rather than on the technology or platform you are running on.

GitHub stars: 83

bots

The 'bots' repository is a collection of guides, tools, and example bots for programming bots to play video games. It provides resources on running bots live, installing the BotLab client, debugging bots, testing bots in simulated environments, and more. The repository also includes example bots for games like EVE Online, Tribal Wars 2, and Elvenar. Users can learn about developing bots for specific games, syntax of the Elm programming language, and tools for memory reading development. Additionally, there are guides on bot programming, contributing to BotLab, and exploring Elm syntax and core library.

GitHub stars: 179

ain

Ain is a terminal HTTP API client designed for scripting input and processing output via pipes. It allows flexible organization of APIs using files and folders, supports shell-scripts and executables for common tasks, handles url-encoding, and enables sharing the resulting curl, wget, or httpie command-line. Users can put things that change in environment variables or .env-files, and pipe the API output for further processing. Ain targets users who work with many APIs using a simple file format and uses curl, wget, or httpie to make the actual calls.

GitHub stars: 592

LaVague

LaVague is an open-source Large Action Model framework that uses advanced AI techniques to compile natural language instructions into browser automation code. It leverages Selenium or Playwright for browser actions. Users can interact with LaVague through an interactive Gradio interface to automate web interactions. The tool requires an OpenAI API key for default examples and offers a Playwright integration guide. Contributors can help by working on outlined tasks, submitting PRs, and engaging with the community on Discord. The project roadmap is available to track progress, but users should exercise caution when executing LLM-generated code using 'exec'.

GitHub stars: 5.8k

robocorp

Robocorp is a platform that allows users to create, deploy, and operate Python automations and AI actions. It provides an easy way to extend the capabilities of AI agents, assistants, and copilots with custom actions written in Python. Users can create and deploy tools, skills, loaders, and plugins that securely connect any AI Assistant platform to their data and applications. The Robocorp Action Server makes Python scripts compatible with ChatGPT and LangChain by automatically creating and exposing an API based on function declaration, type hints, and docstrings. It simplifies the process of developing and deploying AI actions, enabling users to interact with AI frameworks effortlessly.

GitHub stars: 501

Open-Interface

Open Interface is self-driving software that automates computer tasks by sending user requests to a language model backend (e.g., GPT-4V) and simulating keyboard and mouse inputs to execute the steps. It course-corrects by sending current screenshots to the language model. The tool supports macOS, Linux, and Windows, and requires an OpenAI API key for access to GPT-4V. It can automate tasks like creating meal plans, and it supports setting up custom language model backends. Open Interface currently struggles with accurate spatial reasoning, tracking itself in tabular contexts, and navigating complex GUI-rich applications. Future improvements aim to enhance the tool's capabilities with better models trained on video walkthroughs. The tool is cost-effective, with user requests costing between $0.05 and $0.20, and offers features like interrupting the app and primary display visibility in multi-monitor setups.

GitHub stars: 934

AI-Case-Sorter-CS7.1

AI-Case-Sorter-CS7.1 is a project focused on building a case sorter using machine vision and machine learning AI to sort cases by headstamp. The repository includes Arduino code and 3D models necessary for the project.

GitHub stars: 67