open-computer-use

The Open Framework for autonomous virtual computer agents at scale, fully open-source, safe, auditable, and production-ready.

Stars: 312


Open Computer Use is an open-source platform that enables AI agents to control computers through browser automation, terminal access, and desktop interaction. It is designed for developers who want to build autonomous AI workflows. Agents can browse the web, run terminal commands, control desktop applications, coordinate with other agents, and stream their execution in real time. The platform is 100% open-source and self-hostable, and it offers capabilities similar to Anthropic's Claude Computer Use while remaining fully extensible.

README:

💻 Open Computer Use - Autonomous Computer Using Agents at Scale

Your AI Agent That Actually Uses Computers Like Humans Do

Open Computer Use is an open-source platform that gives AI agents real computer control through browser automation, terminal access, and desktop interaction. Built for developers who want to create truly autonomous AI workflows.


✨ What Makes This Special?

Unlike traditional AI assistants that only talk about tasks, Open Computer Use enables AI agents to actually perform them by:

🌐 Browsing the web like a human (search, click, fill forms, extract data)
💻 Running terminal commands and managing files
🖱️ Controlling desktop applications with full UI automation
🤖 Multi-agent orchestration that breaks down complex tasks
🔄 Streaming execution with real-time feedback
🎯 100% open-source and self-hostable

"Computer use" capabilities similar to Anthropic's Claude Computer Use, but fully open-source and extensible.

🎬 See It In Action

Browser Automation

AI agent searching, navigating, and interacting with websites autonomously

▶️ Watch: AI Agent Browsing and Playing

Terminal Operations & Development

Executing commands, managing files, and running complex workflows

▶️ Watch: Quant Trading & Research on QuantConnect

Multi-Agent Orchestration

Complex tasks broken down and executed by specialized agents

▶️ Watch: Building Nvidia Options Dashboard

Advanced Features

Human-in-the-loop control and intelligent collaboration

▶️ Watch: AI Agent with Human Intervention

🎯 Core Capabilities

🌐 Browser Agent
Search-first strategy using Google Search API
Smart web navigation with automatic form filling
Element detection and intelligent clicking
Multi-tab management for parallel workflows
Page context extraction for AI understanding
Screenshot capture for visual verification

💻 Terminal Agent
Command execution in isolated environments
File operations (read, write, edit, delete)
Directory management with full control
Script execution (Python, Node.js, bash)
Package installation and environment setup
Output streaming with real-time feedback

🖱️ Desktop Agent
UI element detection using computer vision
Mouse and keyboard control for any application
Window management (focus, resize, arrange)
Screenshot analysis for context awareness
OCR capabilities for text extraction
Platform support: Linux desktop (Windows and macOS are on the roadmap)

🤖 Multi-Agent System
Task decomposition by AI planner
Sequential execution with context passing
Specialized agents for different capabilities
Error handling with automatic retries
User interaction when clarification is needed
Execution reports with detailed summaries
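"Output streaming with real-time feedback" means forwarding each line a command prints as soon as it appears, rather than waiting for the command to finish. The following is a minimal sketch of that idea using Python's asyncio subprocess API. It is a hypothetical illustration, not the project's Terminal Agent code: `stream_command`, the 60-second per-line timeout, and the `demo` helper are assumptions.

```python
import asyncio
import contextlib
import sys
from typing import AsyncIterator, Sequence

async def stream_command(argv: Sequence[str], timeout: float = 60.0) -> AsyncIterator[str]:
    """Yield each output line of a command as soon as it is produced.

    Raises asyncio.TimeoutError if no line arrives within `timeout` seconds.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into the same stream
    )
    assert proc.stdout is not None
    try:
        while True:
            line = await asyncio.wait_for(proc.stdout.readline(), timeout)
            if not line:  # EOF: the command closed its output
                break
            yield line.decode(errors="replace").rstrip("\r\n")
        await proc.wait()  # normal exit: reap the process
    finally:
        if proc.returncode is None:  # timed out or the consumer stopped early
            with contextlib.suppress(ProcessLookupError):
                proc.kill()
            await proc.wait()

async def demo() -> list[str]:
    # Runs a tiny Python child process that prints two lines.
    cmd = [sys.executable, "-c", "print('step 1'); print('step 2')"]
    return [line async for line in stream_command(cmd)]
```

In a real agent, each yielded line would be pushed to the client (for example over the WebSocket connection shown in the architecture diagram) instead of being collected into a list.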

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (Next.js 15)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │  Chat UI     │  │  Model       │  │  VM          │           │
│  │  Components  │  │  Selection   │  │  Management  │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Backend API (FastAPI)                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │           Multi-Agent Executor Service                   │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │   │
│  │  │   Planner   │→ │   Browser   │→ │   Terminal  │       │   │
│  │  │    Agent    │  │    Agent    │  │    Agent    │       │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘       │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   WebSocket  │  │   Database   │  │   Billing    │           │
│  │   VM Control │  │   Service    │  │   Service    │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│               Docker VM (Ubuntu 22.04 + XFCE)                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  Chrome Browser  │  Terminal  │  Desktop Apps  │  Tools  │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │         WebSocket Agent Server (Port 8080)               │   │
│  │         VNC Server (Port 5900)                           │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

🚀 Quick Start

Prerequisites

Node.js 20+ and npm
Python 3.10+ and pip
Docker and Docker Compose
Supabase account (free tier works)
API keys for AI providers (OpenAI, Anthropic, etc.)

1. Clone the Repository

git clone https://github.com/LLmHub-dev/open-computer-use.git
cd open-computer-use

2. Set Up Supabase Database

Create Supabase Project

Go to Supabase and create a new project
Wait for the project to finish setting up
Go to Project Settings → API to get your keys

Run Database Schema

Execute the schema to create all required tables:

# Option A: Using Supabase Dashboard
# 1. Go to SQL Editor in your Supabase dashboard
# 2. Copy contents of supabase/schema.sql
# 3. Paste and run the SQL

# Option B: Using Supabase CLI (recommended)
# Supabase does not support installing the CLI as a global npm module; run it through npx instead
npx supabase login
npx supabase link --project-ref your-project-ref
npx supabase db push

Or manually run the schema file:

psql -h db.your-project.supabase.co -U postgres -d postgres -f supabase/schema.sql

This creates all necessary tables:

👤 Users & Auth: users, user_preferences, user_keys
💬 Chat System: chats, messages, chat_participants, chat_attachments
🤖 AI Agents: machine_sessions, machine_usage, machine_ai_actions
💳 Billing: user_credits, credit_transactions, stripe_customers, subscription_plans
📊 Projects: projects, user_machines, machine_snapshots

3. Set Up Environment Variables

# Frontend
cp .env.example .env
# Edit .env with your configuration

# Backend
cp backend/.env.example backend/.env
# Edit backend/.env with your configuration

Required Variables

Supabase (Required)

NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key-from-supabase-dashboard
SUPABASE_SERVICE_ROLE=your-service-role-key-from-supabase-dashboard

Security Keys (Required)

# Generate with: openssl rand -hex 32
ENCRYPTION_KEY=your-generated-32-byte-hex-string
CSRF_SECRET=your-generated-32-byte-hex-string
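If `openssl` is not available, Python's standard `secrets` module produces an equivalent value. `secrets.token_hex(32)` returns 32 random bytes encoded as 64 hex characters, which is the same shape as the output of `openssl rand -hex 32`. Here is a minimal sketch that prints both lines, ready to paste into `.env`:

```python
import secrets

# Variable names taken from the .env snippet above.
SECRET_VARS = ("ENCRYPTION_KEY", "CSRF_SECRET")

def generate_env_secrets() -> dict[str, str]:
    """Return a fresh 32-byte (64 hex character) value for each secret."""
    return {name: secrets.token_hex(32) for name in SECRET_VARS}

if __name__ == "__main__":
    for name, value in generate_env_secrets().items():
        print(f"{name}={value}")
```

Generate separate values for each variable, and never reuse them across environments.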

Google Search API (Required for web search)

GOOGLE_SEARCH_KEY=your-google-api-key
GOOGLE_SEARCH_CX=your-custom-search-engine-id

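Get these values as follows:

In Google Cloud Console, enable the Custom Search API
In Google Cloud Console, create an API key (GOOGLE_SEARCH_KEY)
Create a Custom Search Engine at programmablesearchengine.google.com and copy its search engine ID (GOOGLE_SEARCH_CX)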
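To confirm that the key and engine ID work before you start the app, you can query the Custom Search JSON API directly. Below is a minimal standard-library sketch that reads the two variables above from the environment. The function names are illustrative and are not part of Open Computer Use. The API's `num` parameter (results per request) is capped at 10.

```python
import json
import os
import urllib.parse
import urllib.request

# Public endpoint of the Custom Search JSON API.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(query: str, key: str, cx: str, num: int = 3) -> str:
    """Build a request URL; the API returns at most 10 results per call."""
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    params = {"key": key, "cx": cx, "q": query, "num": num}
    return f"{ENDPOINT}?{urllib.parse.urlencode(params)}"

def search(query: str, num: int = 3) -> list[tuple[str, str]]:
    """Return (title, link) pairs using the credentials from the environment."""
    url = build_search_url(
        query,
        key=os.environ["GOOGLE_SEARCH_KEY"],
        cx=os.environ["GOOGLE_SEARCH_CX"],
        num=num,
    )
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    # "items" is omitted from the response when nothing matches.
    return [(item["title"], item["link"]) for item in data.get("items", [])]
```

Calling `search("latest AI news")` should return a few titles and links. An HTTP error at this step usually points to a wrong key, a wrong engine ID, or an API that has not been enabled.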

AI Provider Keys (Choose at least one)

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Azure OpenAI (Optional)
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
AZURE_OPENAI_API_VERSION=2024-02-15-preview
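At least one provider key is required, so a startup check can report early which providers are actually configured. The helper below is hypothetical and not part of the project. It maps the variable names above to provider names and counts Azure OpenAI only when all four of its variables are set.

```python
import os
from collections.abc import Mapping

# Variable names come from the .env examples above.
REQUIRED_VARS = {
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
    "azure_openai": (
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_DEPLOYMENT",
        "AZURE_OPENAI_API_VERSION",
    ),
}

def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return providers whose required variables are all non-empty."""
    return [
        provider
        for provider, names in REQUIRED_VARS.items()
        if all(env.get(name, "").strip() for name in names)
    ]

def require_one_provider(env: Mapping[str, str] = os.environ) -> list[str]:
    """Fail fast when no provider is configured."""
    providers = configured_providers(env)
    if not providers:
        raise RuntimeError(
            "Set at least one AI provider key (OpenAI, Anthropic, or Azure OpenAI)."
        )
    return providers
```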

Azure Container Instances (Optional - for cloud VM deployment)

AZURE_SUBSCRIPTION_ID=your-subscription-id
AZURE_RESOURCE_GROUP=your-resource-group
AZURE_TENANT_ID=your-tenant-id
AZURE_CLIENT_ID=your-client-id
AZURE_CLIENT_SECRET=your-client-secret
AZURE_CONTAINER_REGISTRY=your-registry.azurecr.io
AZURE_DESKTOP_IMAGE=your-registry.azurecr.io/ai-desktop:latest

Stripe (Optional - for billing)

STRIPE_API_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...

4. Install Dependencies

# Frontend
npm install

# Backend
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cd ..

5. Start Development Servers

Option A: Using Docker (Recommended)

# Start all services
docker-compose up --build

# Access the application
# Frontend: http://localhost:3000
# Backend: http://localhost:8001

Option B: Manual Start

# Terminal 1: Frontend
npm run dev

# Terminal 2: Backend
cd backend
python main.py

# Terminal 3: AI Desktop (if needed)
docker-compose -f docker-compose.ai-desktop.yml up --build
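While the services start up, a quick TCP check shows when each port begins accepting connections. The following is a minimal standard-library sketch. The ports are the ones listed for the Docker option (frontend 3000, backend 8001), so adjust them if your setup differs. A successful connection only means that something is listening, not that the application is healthy.

```python
import socket
import time

# Ports from the Docker option above (assumed to match your setup).
SERVICES = {"frontend": 3000, "backend": 8001}

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for(host: str, port: int, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, `for name, port in SERVICES.items(): print(name, wait_for("localhost", port))` waits up to about 30 seconds per service.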

6. Create Your First Agent Session

Open http://localhost:3000
Sign up / Log in with Supabase Auth
Start a new chat
Try a command: "Search for the latest AI news and summarize the top 3 articles"
Watch your AI agent work! 🎉

🎨 Features

Multi-Provider AI Support

Connect your own API keys and switch between providers mid-conversation:

✅ OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5)
✅ Anthropic (Claude 3.5 Sonnet, Claude 3 Opus)
✅ Google (Gemini Pro, Gemini 1.5)
✅ Azure OpenAI (Enterprise deployments)
✅ xAI (Grok models)
✅ Mistral AI (Mistral Large, Mixtral)
✅ Perplexity (Online models)
✅ OpenRouter (Access to 100+ models)

Bring Your Own Keys (BYOK)

All API keys are encrypted and stored securely. You maintain full control over your AI costs and usage.

Real-Time Streaming

Watch your agents work in real-time with:

📊 Task progress indicators
🛠️ Tool call visualization
📸 Live screenshots from VM
💬 Streaming responses
📋 Detailed execution logs

Advanced Task Planning

The AI automatically:

Analyzes your request
Breaks it down into subtasks
Assigns each subtask to a specialized agent
Executes the subtasks with full context
Reports detailed results
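This planning loop can be sketched together with the "sequential execution with context passing" and "automatic retries" listed under the Multi-Agent System. The sketch is hypothetical and is not the project's executor. `Subtask`, `Report`, and `execute_plan` are invented names. The plan is passed in directly instead of being produced by an LLM planner, and each agent is a plain function that receives the results of the earlier steps.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical agent signature: (instruction, results of earlier steps) -> result.
Agent = Callable[[str, list[str]], str]

@dataclass
class Subtask:
    agent: str        # e.g. "browser" or "terminal"
    instruction: str

@dataclass
class Report:
    # One (agent, instruction, result) entry per completed subtask.
    steps: list[tuple[str, str, str]] = field(default_factory=list)

def execute_plan(plan: list[Subtask], agents: dict[str, Agent], max_retries: int = 2) -> Report:
    """Run subtasks in order, passing earlier results to each agent."""
    report = Report()
    context: list[str] = []
    for task in plan:
        agent = agents[task.agent]
        for attempt in range(max_retries + 1):
            try:
                result = agent(task.instruction, list(context))
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the last retry
        context.append(result)
        report.steps.append((task.agent, task.instruction, result))
    return report
```

Because the executor passes an accumulated context list, a later agent (for example, a summarizer) can build on what an earlier browser step found. A production executor would probably also trim or summarize that context to fit the model's window.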

Secure VM Isolation

Each agent session runs in an isolated Docker container:

🔒 Sandboxed execution environment
🔄 Ephemeral containers (no data persistence)
🌐 Network isolation options
📊 Resource limits and monitoring
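These isolation properties map onto standard `docker run` flags:

- `--rm` removes the container when it stops.
- `--memory` and `--cpus` cap resources.
- `--pids-limit` bounds the number of processes.
- `--network none` cuts the container off from the network.

The sketch below assembles such a command for one session. The image name `ai-desktop:latest`, the default limits, and the function itself are illustrative assumptions, not the project's actual configuration. The 2g memory default mirrors the ~2GB per-session figure in the Performance section, and ports 5900 (VNC) and 8080 (agent server) come from the architecture diagram.

```python
from typing import Optional

def sandbox_run_args(
    image: str,
    name: str,
    memory: str = "2g",
    cpus: float = 2.0,
    pids_limit: int = 512,
    network: str = "bridge",
    ports: Optional[dict[int, int]] = None,
) -> list[str]:
    """Build a `docker run` argument list for one isolated, ephemeral session."""
    args = [
        "docker", "run",
        "--detach",
        "--rm",                           # delete the container when it stops
        "--name", name,
        "--memory", memory,               # hard memory cap
        "--cpus", str(cpus),              # CPU quota
        "--pids-limit", str(pids_limit),  # bound the number of processes
        "--network", network,             # "none" disables networking entirely
    ]
    # Published ports are unreachable without a network, so skip them.
    if network != "none":
        for host_port, container_port in (ports or {}).items():
            args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)
    return args
```

You can launch the container with `subprocess.run(sandbox_run_args(...), check=True)`, or print a copy-pasteable shell command with `shlex.join(sandbox_run_args(...))`.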

📚 Use Cases

🔍 Research & Data Gathering
Web scraping and data extraction
Competitive analysis
Market research automation
Academic paper collection

🧪 Testing & QA
Automated UI testing
Cross-browser testing
E2E test generation
Regression testing

📝 Content Creation
Screenshot and documentation
Tutorial generation
Workflow recording
Demo creation

🔧 DevOps & Automation
Server configuration
Deployment automation
Log analysis
System monitoring

🛒 E-commerce Operations
Price monitoring
Product research
Order management
Inventory tracking

📊 Business Intelligence
Report generation
Dashboard monitoring
Data analysis workflows
KPI tracking

🛠️ Technology Stack

Frontend

Framework: Next.js 15 (App Router, React 19)
Language: TypeScript
Styling: Tailwind CSS 4
UI Components: Radix UI, shadcn/ui
State Management: Zustand
AI SDK: Vercel AI SDK
Database: Supabase (Auth + Postgres)
Payments: Stripe

Backend

Framework: FastAPI (Python 3.10+)
Async Runtime: asyncio, uvicorn
WebSocket: websockets library
AI Providers: openai, anthropic, google-generativeai
Search: Google Custom Search API
Caching: Redis (optional)
Image Processing: Pillow, ImageMagick

Infrastructure

Containerization: Docker, Docker Compose
VM Environment: Ubuntu 22.04 LTS + XFCE
Browser: Google Chrome (with remote debugging)
Automation: Selenium, Playwright, PyAutoGUI
Cloud: Azure Container Instances (optional)

🤝 Contributing

We love contributions! Here's how you can help:

🐛 Found a Bug?

Open an issue with:

Clear description of the bug
Steps to reproduce
Expected vs actual behavior
Screenshots or logs

💡 Have a Feature Idea?

Check if it's already requested
Open a new issue with the enhancement label
Describe your use case and proposed solution

🔧 Want to Contribute Code?

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Make your changes
Write tests if applicable
Commit: git commit -m 'Add amazing feature'
Push: git push origin feature/amazing-feature
Open a Pull Request

Please read our Contributing Guide for detailed guidelines.


🗺️ Roadmap

Q1 2026

[ ] Multi-VM orchestration (parallel agents)
[ ] Advanced workflow builder (visual programming)
[ ] Marketplace for custom agents
[ ] Windows and macOS VM support
[ ] Mobile app (iOS/Android)

Q2 2026

[ ] Plugin system for custom tools
[ ] Collaborative agent sessions
[ ] Advanced analytics dashboard
[ ] Enterprise SSO support
[ ] Self-hosted cloud deployment guides

Future

[ ] Voice control integration
[ ] Video understanding capabilities
[ ] Agent memory and learning
[ ] Multi-modal agent interactions
[ ] Community agent templates

Vote on features: Feature Requests

📊 Performance & Benchmarks

Metric                     Value
Average Task Completion    ~45 seconds
Concurrent Sessions        50+ (per server)
Browser Navigation         ~2 s per page
Tool Call Latency          <500 ms
VM Startup Time            ~15 seconds
Memory per Session         ~2 GB

Benchmarks measured on: 4 CPU cores, 8GB RAM, SSD storage
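Memory is usually the binding constraint when you size a host. As a back-of-the-envelope check using the ~2GB-per-session figure above, the 8GB benchmark machine holds about 8 / 2 = 4 sessions. By contrast, 50 concurrent sessions would need roughly 50 × 2 = 100GB, so the 50+ figure presumably assumes a larger server or lighter sessions. The helpers below let you plug in your own numbers. Treat `reserved_gb` (memory held back for the OS, Docker, and the backend) as an assumption to tune.

```python
import math

def sessions_that_fit(host_ram_gb: float, per_session_gb: float, reserved_gb: float = 0.0) -> int:
    """How many sessions fit in RAM after reserving memory for the host."""
    usable = host_ram_gb - reserved_gb
    if usable <= 0:
        return 0
    return math.floor(usable / per_session_gb)

def ram_needed_gb(sessions: int, per_session_gb: float, reserved_gb: float = 0.0) -> float:
    """Total RAM required to run `sessions` sessions concurrently."""
    return sessions * per_session_gb + reserved_gb
```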

⚠️ Responsible AI Use

Open Computer Use gives AI agents significant autonomy. Please use responsibly:

✅ Do: Automate repetitive tasks, research, testing, content creation
❌ Don't: Violate terms of service, spam, scrape without permission
🔒 Security: Never share credentials, use isolated environments
📋 Compliance: Follow data protection laws (GDPR, CCPA, etc.)
🤝 Ethics: Respect website robots.txt and rate limits

Read our Responsible Use Guidelines for more details.
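Before an agent fetches pages from a site, Python's standard `urllib.robotparser` can check that site's robots.txt rules and any `Crawl-delay` it declares. The minimal sketch below works on robots.txt text that has already been fetched. For a live check, `RobotFileParser(url)` followed by `.read()` downloads and parses the file. The function names and the user-agent string are placeholders.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

def parse_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def fetch_policy(parser: RobotFileParser, user_agent: str, url: str) -> tuple[bool, Optional[float]]:
    """Return (allowed, crawl delay in seconds or None)."""
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)
    return allowed, (float(delay) if delay is not None else None)
```

If the policy returns a delay, sleep for at least that long between requests to the same site.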

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Apache License 2.0

Copyright (c) 2025 Open Computer Use Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

🙏 Acknowledgments

Built with amazing open-source projects:

Next.js - The React Framework
FastAPI - Modern Python web framework
Supabase - Open source Firebase alternative
Vercel AI SDK - AI toolkit for TypeScript
Radix UI - Unstyled, accessible components
Anthropic - Inspiration from Claude Computer Use
Docker - Containerization platform

Special thanks to all our contributors! 💙


💬 Community & Support

💬 Discord: Join our community server
🐦 Twitter: Follow @llmhub_dev
📧 Email: [email protected]
🐛 Issues: GitHub Issues
💡 Discussions: GitHub Discussions

⭐ Star us on GitHub if you find this useful!

Made with ❤️ by the Open Computer Use community


For Tasks:


automate web browsing run terminal commands control desktop apps orchestrate tasks stream execution

For Jobs:

automation engineer ai developer software developer data scientist web developer

Alternative AI tools for open-computer-use

Similar Open Source Tools


Shannon

Shannon is a battle-tested infrastructure for AI agents that solves problems at scale, such as runaway costs, non-deterministic failures, and security concerns. It offers features like intelligent caching, deterministic replay of workflows, time-travel debugging, WASI sandboxing, and hot-swapping between LLM providers. Shannon allows users to ship faster with zero configuration multi-agent setup, multiple AI patterns, time-travel debugging, and hot configuration changes. It is production-ready with features like WASI sandbox, token budget control, policy engine (OPA), and multi-tenancy. Shannon helps scale without breaking by reducing costs, being provider agnostic, observable by default, and designed for horizontal scaling with Temporal workflow orchestration.

GitHub stars: 258

helix

HelixML is a private GenAI platform that allows users to deploy the best of open AI in their own data center or VPC while retaining complete data security and control. It includes support for fine-tuning models with drag-and-drop functionality. HelixML brings the best of open source AI to businesses in an ergonomic and scalable way, optimizing the tradeoff between GPU memory and latency.

GitHub stars: 713

vibium

Vibium is a browser automation infrastructure designed for AI agents, providing a single binary that manages browser lifecycle, WebDriver BiDi protocol, and an MCP server. It offers zero configuration, AI-native capabilities, and is lightweight with no runtime dependencies. It is suitable for AI agents, test automation, and any tasks requiring browser interaction.

GitHub stars: 2.6k

vllm-mlx

vLLM-MLX is a tool that brings native Apple Silicon GPU acceleration to vLLM by integrating Apple's ML framework with unified memory and Metal kernels. It offers optimized LLM inference with KV cache and quantization, vision-language models for multimodal inference, speech-to-text and text-to-speech with native voices, text embeddings for semantic search and RAG, and more. Users can benefit from features like multimodal support for text, image, video, and audio, native GPU acceleration on Apple Silicon, compatibility with OpenAI API, Anthropic Messages API, reasoning models extraction, integration with external tools via Model Context Protocol, memory-efficient caching, and high throughput for multiple concurrent users.

GitHub stars: 369

aiohomematic

AIO Homematic (hahomematic) is a lightweight Python 3 library for controlling and monitoring HomeMatic and HomematicIP devices, with support for third-party devices/gateways. It automatically creates entities for device parameters, offers custom entity classes for complex behavior, and includes features like caching paramsets for faster restarts. Designed to integrate with Home Assistant, it requires specific firmware versions for HomematicIP devices. The public API is defined in modules like central, client, model, exceptions, and const, with example usage provided. Useful links include changelog, data point definitions, troubleshooting, and developer resources for architecture, data flow, model extension, and Home Assistant lifecycle.

GitHub stars: 162

solo-server

Solo Server is a lightweight server designed for managing hardware-aware inference. It provides seamless setup through a simple CLI and HTTP servers, an open model registry for pulling models from platforms like Ollama and Hugging Face, cross-platform compatibility for effortless deployment of AI models on hardware, and a configurable framework that auto-detects hardware components (CPU, GPU, RAM) and sets optimal configurations.

GitHub stars: 225

mesh

MCP Mesh is an open-source control plane for MCP traffic that provides a unified layer for authentication, routing, and observability. It replaces multiple integrations with a single production endpoint, simplifying configuration management. Built for multi-tenant organizations, it offers workspace/project scoping for policies, credentials, and logs. With core capabilities like MeshContext, AccessControl, and OpenTelemetry, it ensures fine-grained RBAC, full tracing, and metrics for tools and workflows. Users can define tools with input/output validation, access control checks, audit logging, and OpenTelemetry traces. The project structure includes apps for full-stack MCP Mesh, encryption, observability, and more, with deployment options ranging from Docker to Kubernetes. The tech stack includes Bun/Node runtime, TypeScript, Hono API, React, Kysely ORM, and Better Auth for OAuth and API keys.

GitHub stars: 331

AgentX

AgentX is a next-generation open-source AI agent development framework and runtime platform. It provides an event-driven runtime with a simple framework and minimal UI. The platform is ready-to-use and offers features like multi-user support, session persistence, real-time streaming, and Docker readiness. Users can build AI Agent applications with event-driven architecture using TypeScript for server-side (Node.js) and client-side (Browser/React) development. AgentX also includes comprehensive documentation, core concepts, guides, API references, and various packages for different functionalities. The architecture follows an event-driven design with layered components for server-side and client-side interactions.

GitHub stars: 75

pilot

Pilot is an AI tool designed to streamline the process of handling tickets from GitHub, Linear, Jira, or Asana. It plans the implementation, writes the code, runs tests, and opens a PR for you to review and merge. With features like Autopilot, Epic Decomposition, Self-Review, and more, Pilot aims to automate the ticket handling process and reduce the time spent on prioritizing and completing tasks. It integrates with various platforms, offers intelligence features, and provides real-time visibility through a dashboard. Pilot is free to use, with costs associated with Claude API usage. It is designed for bug fixes, small features, refactoring, tests, docs, and dependency updates, but may not be suitable for large architectural changes or security-critical code.

GitHub stars: 71

memsearch

Memsearch is a tool that allows users to give their AI agents persistent memory in a few lines of code. It enables users to write memories as markdown and search them semantically. Inspired by OpenClaw's markdown-first memory architecture, Memsearch is pluggable into any agent framework. The tool offers features like smart deduplication, live sync, and a ready-made Claude Code plugin for building agent memory.

GitHub stars: 188

giztoy

Giztoy is a multi-language framework designed for building AI toys and intelligent applications. It provides a unified abstraction layer that spans from resource-constrained embedded systems to powerful cloud services. With features like native support for ESP32 and other MCUs, cross-platform app development, a unified build system with Bazel, an agent framework for AI agents, audio processing capabilities, support for various Large Language Models, real-time models with WebSocket streaming, secure transport protocols, and multi-language implementations in Go, Rust, Zig, and C/C++, Giztoy serves as a versatile tool for developing AI-powered applications across different platforms and devices.

GitHub stars: 218

boxlite

BoxLite is an embedded, lightweight micro-VM runtime designed for AI agents running OCI containers with hardware-level isolation. It is built for high concurrency with no daemon required, offering features like lightweight VMs, high concurrency, hardware isolation, embeddability, and OCI compatibility. Users can spin up 'Boxes' to run containers for AI agent sandboxes and multi-tenant code execution scenarios where Docker alone is insufficient and full VM infrastructure is too heavy. BoxLite supports Python, Node.js, and Rust with quick start guides for each, along with features like CPU/memory limits, storage options, networking capabilities, security layers, and image registry configuration. The tool provides SDKs for Python and Node.js, with Go support coming soon. It offers detailed documentation, examples, and architecture insights for users to understand how BoxLite works under the hood.

GitHub stars: 1.1k

MediCareAI

MediCareAI is an intelligent disease management system powered by AI, designed for patient follow-up and disease tracking. It integrates medical guidelines, AI-powered diagnosis, and document processing to provide comprehensive healthcare support. The system includes features like user authentication, patient management, AI diagnosis, document processing, medical records management, knowledge base system, doctor collaboration platform, and admin system. It ensures privacy protection through automatic PII detection and cleaning for document sharing.

GitHub stars: 83

claudex

Claudex is an open-source, self-hosted Claude Code UI that runs entirely on your machine. It provides multiple sandboxes, allows users to use their own plans, offers a full IDE experience with VS Code in the browser, and is extensible with skills, agents, slash commands, and MCP servers. Users can run AI agents in isolated environments, view and interact with a browser via VNC, switch between multiple AI providers, automate tasks with Celery workers, and enjoy various chat features and preview capabilities. Claudex also supports marketplace plugins, secrets management, integrations like Gmail, and custom instructions. The tool is configured through providers and supports various providers like Anthropic, OpenAI, OpenRouter, and Custom. It has a tech stack consisting of React, FastAPI, Python, PostgreSQL, Celery, Redis, and more.

GitHub stars: 202

Zen-Ai-Pentest

Zen-AI-Pentest is a professional AI-powered penetration testing framework designed for security professionals, bug bounty hunters, and enterprise security teams. It combines cutting-edge language models with 20+ integrated security tools, offering comprehensive security assessments. The framework is security-first with multiple safety controls, extensible with a plugin system, cloud-native for deployment on AWS, Azure, or GCP, and production-ready with CI/CD, monitoring, and support. It features autonomous AI agents, risk analysis, exploit validation, benchmarking, CI/CD integration, AI persona system, subdomain scanning, and multi-cloud & virtualization support.

GitHub stars: 192

For similar tasks

crawlee

Crawlee is a web scraping and browser automation library that helps you build reliable scrapers quickly. Your crawlers will appear human-like and fly under the radar of modern bot protections even with the default configuration. Crawlee gives you the tools to crawl the web for links, scrape data, and store it to disk or cloud while staying configurable to suit your project's needs.

GitHub stars: 21.7k

rpaframework

RPA Framework is an open-source collection of libraries and tools for Robotic Process Automation (RPA), designed to be used with Robot Framework and Python. It offers well-documented core libraries for Software Robot Developers, optimized for Robocorp Control Room and Developer Tools, and accepts external contributions. The project includes various libraries for tasks like archiving, browser automation, date/time manipulations, cloud services integration, encryption operations, database interactions, desktop automation, document processing, email operations, Excel manipulation, file system operations, FTP interactions, web API interactions, image manipulation, AI services, and more. The development of the repository is Python-based and requires Python version 3.8+, with tooling based on poetry and invoke for compiling, building, and running the package. The project is licensed under the Apache License 2.0.

GitHub stars: 1.1k

apify-mcp-server

The Apify MCP Server enables AI agents to extract data from various websites using ready-made scrapers and automation tools. It supports OAuth for easy connection from clients like Claude.ai or Visual Studio Code. The server also supports Skyfire agentic payments for AI agents to pay for Actor runs without an API token. Compatible with various clients adhering to the Model Context Protocol, it allows dynamic tool discovery and interaction with Apify Actors. The server provides tools for interacting with Apify Actors, dynamic tool discovery, and telemetry data collection. It offers a set of example prompts and resources for users to explore and interact with Apify through MCP.

GitHub stars: 777


dbt-airflow

A Python package that helps Data and Analytics engineers render dbt projects in Apache Airflow DAGs. It enables teams to automatically render their dbt projects in a granular level, creating individual Airflow tasks for every model, seed, snapshot, and test within the dbt project. This allows for full control at the task-level, improving visibility and management of data models within the team.

GitHub stars: 52

blades

Blades is a multimodal AI Agent framework in Go, supporting custom models, tools, memory, middleware, and more. It is well-suited for multi-turn conversations, chain reasoning, and structured output. The framework provides core components like Agent, Prompt, Chain, ModelProvider, Tool, Memory, and Middleware, enabling developers to build intelligent applications with flexible configuration and high extensibility. Blades leverages the characteristics of Go to achieve high decoupling and efficiency, making it easy to integrate different language model services and external tools. The project is in its early stages, inviting Go developers and AI enthusiasts to contribute and explore the possibilities of building AI applications in Go.

GitHub stars: 393

flyte-sdk

Flyte 2 SDK is a pure Python tool for type-safe, distributed orchestration of agents, ML pipelines, and more. It allows users to write data pipelines, ML training jobs, and distributed compute in Python without any DSL constraints. With features like async-first parallelism and fine-grained observability, Flyte 2 offers a seamless workflow experience. Users can leverage core concepts like TaskEnvironments for container configuration, pure Python workflows for flexibility, and async parallelism for distributed execution. Advanced features include sub-task observability with tracing and remote task execution. The tool also provides native Jupyter integration for running and monitoring workflows directly from notebooks. Configuration and deployment are made easy with configuration files and commands for deploying and running workflows. Flyte 2 is licensed under the Apache 2.0 License.

GitHub stars: 67

kilocode

Kilo Code is an open-source VS Code AI agent that allows users to generate code from natural language, check its own work, run terminal commands, automate the browser, and utilize the latest AI models. It offers features like task automation, automated refactoring, and integration with MCP servers. Users can access 400+ AI models and benefit from transparent pricing. Kilo Code is a fork of Roo Code and Cline, with improvements and unique features developed independently.

GitHub stars: 15.5k

For similar jobs

promptflow

Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

GitHub stars: 9.2k

deepeval

DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.

GitHub stars: 13.7k

MegaDetector

MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aims to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our overview of the field, affectionately titled "Everything I know about machine learning and camera traps".

Stars: 186

leapfrogai

LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped, resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, so tools built for OpenAI/ChatGPT work seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI uses Chainguard's apko to harden its base Python images, ensuring that the other components of the stack run on the latest supported Python versions. The LeapfrogAI SDK provides a standard set of protobufs and Python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use cases such as chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, and is built using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.

Stars: 255

llava-docker

This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

Stars: 59

carrot

The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.

Stars: 17.1k

TrustLLM

TrustLLM is a comprehensive study of trustworthiness in LLMs. It covers principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs on TrustLLM, which comprises over 30 datasets. The documentation explains how to use the trustllm Python package to assess your LLM's trustworthiness more quickly. For more details about TrustLLM, please refer to the project website.

Stars: 535

AI-YinMei

AI-YinMei is a development tool for an AI virtual streamer (VTuber), in a version for NVIDIA GPUs. It supports knowledge-base chat through FastGPT, with a complete LLM solution built from [fastgpt] + [one-api] + [Xinference]. Its features include:
- replying to Bilibili live-stream bullet comments and greeting viewers who enter the stream;
- speech synthesis with Microsoft edge-tts, Bert-VITS2, or GPT-SoVITS;
- expression control through VTube Studio;
- drawing with stable-diffusion-webui and showing the output in an OBS live room;
- NSFW detection for generated images with public-NSFW-y-distinguish;
- web and image search via DuckDuckGo (requires a VPN/proxy) and image search via Baidu (no VPN/proxy needed);
- an AI reply chat box and a playlist, both as HTML plug-ins;
- AI singing with Auto-Convert-Music, including automatically starting to dance while singing;
- dancing, expression video playback, head-pat reactions, and reactions to viewer gifts;
- automatically looping sway motions during chat and singing;
- multi-scene switching, background music switching, and automatic day/night scene switching;
- an option to enable singing and drawing, with the AI automatically judging the content.

Stars: 529



Alternative AI tools for open-computer-use

Similar Open Source Tools

open-computer-use

Shannon

helix

vibium

vllm-mlx

aiohomematic

solo-server

mesh

AgentX

pilot

memsearch

giztoy

boxlite

MediCareAI

claudex