open-computer-use
The Open Framework for autonomous virtual computer agents at scale, fully open-source, safe, auditable, and production-ready.
Stars: 312
Open Computer Use is an open-source platform that enables AI agents to control computers through browser automation, terminal access, and desktop interaction. It is designed for developers to create autonomous AI workflows. The platform allows agents to browse the web, run terminal commands, control desktop applications, orchestrate multi-agents, stream execution, and is 100% open-source and self-hostable. It provides capabilities similar to Anthropic's Claude Computer Use but is fully open-source and extensible.
README:
Open Computer Use is an open-source platform that gives AI agents real computer control through browser automation, terminal access, and desktop interaction. Built for developers who want to create truly autonomous AI workflows.
Unlike traditional AI assistants that only talk about tasks, Open Computer Use enables AI agents to actually perform them by:
- π Browsing the web like a human (search, click, fill forms, extract data)
- π» Running terminal commands and managing files
- π±οΈ Controlling desktop applications with full UI automation
- π€ Multi-agent orchestration that breaks down complex tasks
- π Streaming execution with real-time feedback
- π― 100% open-source and self-hostable
"Computer use" capabilities similar to Anthropic's Claude Computer Use, but fully open-source and extensible.
AI agent searching, navigating, and interacting with websites autonomously
Executing commands, managing files, and running complex workflows
Complex tasks broken down and executed by specialized agents
Human-in-the-loop control and intelligent collaboration
|
|
|
|
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Frontend (Next.js 15) β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Chat UI β β Model β β VM β β
β β Components β β Selection β β Management β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Backend API (FastAPI) β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Multi-Agent Executor Service β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β β β Planner ββ β Browser ββ β Terminal β β β
β β β Agent β β Agent β β Agent β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β WebSocket β β Database β β Billing β β
β β VM Control β β Service β β Service β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Docker VM (Ubuntu 22.04 + XFCE) β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Chrome Browser β Terminal β Desktop Apps β Tools β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β WebSocket Agent Server (Port 8080) β β
β β VNC Server (Port 5900) β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
- Node.js 20+ and npm
- Python 3.10+ and pip
- Docker and Docker Compose
- Supabase account (free tier works)
- API keys for AI providers (OpenAI, Anthropic, etc.)
git clone https://github.com/LLmHub-dev/open-computer-use.git
cd open-computer-use- Go to Supabase and create a new project
- Wait for the project to finish setting up
- Go to Project Settings β API to get your keys
Execute the schema to create all required tables:
# Option A: Using Supabase Dashboard
# 1. Go to SQL Editor in your Supabase dashboard
# 2. Copy contents of supabase/schema.sql
# 3. Paste and run the SQL
# Option B: Using Supabase CLI (recommended)
npm install -g supabase
supabase login
supabase link --project-ref your-project-ref
supabase db pushOr manually run the schema file:
psql -h db.your-project.supabase.co -U postgres -d postgres -f supabase/schema.sqlThis creates all necessary tables:
- π€ Users & Auth: users, user_preferences, user_keys
- π¬ Chat System: chats, messages, chat_participants, chat_attachments
- π€ AI Agents: machine_sessions, machine_usage, machine_ai_actions
- π³ Billing: user_credits, credit_transactions, stripe_customers, subscription_plans
- π Projects: projects, user_machines, machine_snapshots
# Frontend
cp .env.example .env
# Edit .env with your configuration
# Backend
cp backend/.env.example backend/.env
# Edit backend/.env with your configurationSupabase (Required)
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key-from-supabase-dashboard
SUPABASE_SERVICE_ROLE=your-service-role-key-from-supabase-dashboardSecurity Keys (Required)
# Generate with: openssl rand -hex 32
ENCRYPTION_KEY=your-generated-32-byte-hex-string
CSRF_SECRET=your-generated-32-byte-hex-stringGoogle Search API (Required for web search)
GOOGLE_SEARCH_KEY=your-google-api-key
GOOGLE_SEARCH_CX=your-custom-search-engine-idGet these from Google Cloud Console:
- Enable Custom Search API
- Create API key
- Create Custom Search Engine at programmablesearchengine.google.com
AI Provider Keys (Choose at least one)
# OpenAI
OPENAI_API_KEY=sk-...
# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
# Azure OpenAI (Optional)
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
AZURE_OPENAI_API_VERSION=2024-02-15-previewAzure Container Instances (Optional - for cloud VM deployment)
AZURE_SUBSCRIPTION_ID=your-subscription-id
AZURE_RESOURCE_GROUP=your-resource-group
AZURE_TENANT_ID=your-tenant-id
AZURE_CLIENT_ID=your-client-id
AZURE_CLIENT_SECRET=your-client-secret
AZURE_CONTAINER_REGISTRY=your-registry.azurecr.io
AZURE_DESKTOP_IMAGE=your-registry.azurecr.io/ai-desktop:latestStripe (Optional - for billing)
STRIPE_API_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...# Frontend
npm install
# Backend
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cd ..Option A: Using Docker (Recommended)
# Start all services
docker-compose up --build
# Access the application
# Frontend: http://localhost:3000
# Backend: http://localhost:8001Option B: Manual Start
# Terminal 1: Frontend
npm run dev
# Terminal 2: Backend
cd backend
python main.py
# Terminal 3: AI Desktop (if needed)
docker-compose -f docker-compose.ai-desktop.yml up --build- Open http://localhost:3000
- Sign up / Log in with Supabase Auth
- Start a new chat
- Try a command: "Search for the latest AI news and summarize the top 3 articles"
- Watch your AI agent work! π
Connect your own API keys and switch between providers mid-conversation:
- β OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5)
- β Anthropic (Claude 3.5 Sonnet, Claude 3 Opus)
- β Google (Gemini Pro, Gemini 1.5)
- β Azure OpenAI (Enterprise deployments)
- β xAI (Grok models)
- β Mistral AI (Mistral Large, Mixtral)
- β Perplexity (Online models)
- β OpenRouter (Access to 100+ models)
All API keys are encrypted and stored securely. You maintain full control over your AI costs and usage.
Watch your agents work in real-time with:
- π Task progress indicators
- π οΈ Tool call visualization
- πΈ Live screenshots from VM
- π¬ Streaming responses
- π Detailed execution logs
The AI automatically:
- Analyzes your request
- Breaks down into subtasks
- Assigns to specialized agents
- Executes with full context
- Reports detailed results
Each agent session runs in an isolated Docker container:
- π Sandboxed execution environment
- π Ephemeral containers (no data persistence)
- π Network isolation options
- π Resource limits and monitoring
|
|
|
|
|
|
- Framework: Next.js 15 (App Router, React 19)
- Language: TypeScript
- Styling: Tailwind CSS 4
- UI Components: Radix UI, shadcn/ui
- State Management: Zustand
- AI SDK: Vercel AI SDK
- Database: Supabase (Auth + Postgres)
- Payments: Stripe
- Framework: FastAPI (Python 3.10+)
- Async Runtime: asyncio, uvicorn
- WebSocket: websockets library
- AI Providers: openai, anthropic, google-generativeai
- Search: Google Custom Search API
- Caching: Redis (optional)
- Image Processing: Pillow, ImageMagick
- Containerization: Docker, Docker Compose
- VM Environment: Ubuntu 22.04 LTS + XFCE
- Browser: Google Chrome (with remote debugging)
- Automation: Selenium, Playwright, PyAutoGUI
- Cloud: Azure Container Instances (optional)
We love contributions! Here's how you can help:
Open an issue with:
- Clear description of the bug
- Steps to reproduce
- Expected vs actual behavior
- Screenshots or logs
- Check if it's already requested
- Open a new issue with the
enhancementlabel - Describe your use case and proposed solution
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature - Make your changes
- Write tests if applicable
- Commit:
git commit -m 'Add amazing feature' - Push:
git push origin feature/amazing-feature - Open a Pull Request
Please read our Contributing Guide for detailed guidelines.
- π¬ Discord Community
- [ ] Multi-VM orchestration (parallel agents)
- [ ] Advanced workflow builder (visual programming)
- [ ] Marketplace for custom agents
- [ ] Windows and macOS VM support
- [ ] Mobile app (iOS/Android)
- [ ] Plugin system for custom tools
- [ ] Collaborative agent sessions
- [ ] Advanced analytics dashboard
- [ ] Enterprise SSO support
- [ ] Self-hosted cloud deployment guides
- [ ] Voice control integration
- [ ] Video understanding capabilities
- [ ] Agent memory and learning
- [ ] Multi-modal agent interactions
- [ ] Community agent templates
Vote on features: Feature Requests
| Metric | Value |
|---|---|
| Average Task Completion | ~45 seconds |
| Concurrent Sessions | 50+ (per server) |
| Browser Navigation | ~2s per page |
| Tool Call Latency | <500ms |
| VM Startup Time | ~15 seconds |
| Memory per Session | ~2GB |
Benchmarks measured on: 4 CPU cores, 8GB RAM, SSD storage
Open Computer Use gives AI agents significant autonomy. Please use responsibly:
- β Do: Automate repetitive tasks, research, testing, content creation
- β Don't: Violate terms of service, spam, scrape without permission
- π Security: Never share credentials, use isolated environments
- π Compliance: Follow data protection laws (GDPR, CCPA, etc.)
- π€ Ethics: Respect website robots.txt and rate limits
Read our Responsible Use Guidelines for more details.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Apache License 2.0
Copyright (c) 2025 Open Computer Use Contributors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Built with amazing open-source projects:
- Next.js - The React Framework
- FastAPI - Modern Python web framework
- Supabase - Open source Firebase alternative
- Vercel AI SDK - AI toolkit for TypeScript
- Radix UI - Unstyled, accessible components
- Anthropic - Inspiration from Claude Computer Use
- Docker - Containerization platform
Special thanks to all our contributors! π
- π¬ Discord: Join our community server
- π¦ Twitter: Follow @llmhub_dev
- π§ Email: [email protected]
- π Issues: GitHub Issues
- π‘ Discussions: GitHub Discussions
Made with β€οΈ by the Open Computer Use community
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for open-computer-use
Similar Open Source Tools
open-computer-use
Open Computer Use is an open-source platform that enables AI agents to control computers through browser automation, terminal access, and desktop interaction. It is designed for developers to create autonomous AI workflows. The platform allows agents to browse the web, run terminal commands, control desktop applications, orchestrate multi-agents, stream execution, and is 100% open-source and self-hostable. It provides capabilities similar to Anthropic's Claude Computer Use but is fully open-source and extensible.
Shannon
Shannon is a battle-tested infrastructure for AI agents that solves problems at scale, such as runaway costs, non-deterministic failures, and security concerns. It offers features like intelligent caching, deterministic replay of workflows, time-travel debugging, WASI sandboxing, and hot-swapping between LLM providers. Shannon allows users to ship faster with zero configuration multi-agent setup, multiple AI patterns, time-travel debugging, and hot configuration changes. It is production-ready with features like WASI sandbox, token budget control, policy engine (OPA), and multi-tenancy. Shannon helps scale without breaking by reducing costs, being provider agnostic, observable by default, and designed for horizontal scaling with Temporal workflow orchestration.
helix
HelixML is a private GenAI platform that allows users to deploy the best of open AI in their own data center or VPC while retaining complete data security and control. It includes support for fine-tuning models with drag-and-drop functionality. HelixML brings the best of open source AI to businesses in an ergonomic and scalable way, optimizing the tradeoff between GPU memory and latency.
vibium
Vibium is a browser automation infrastructure designed for AI agents, providing a single binary that manages browser lifecycle, WebDriver BiDi protocol, and an MCP server. It offers zero configuration, AI-native capabilities, and is lightweight with no runtime dependencies. It is suitable for AI agents, test automation, and any tasks requiring browser interaction.
vllm-mlx
vLLM-MLX is a tool that brings native Apple Silicon GPU acceleration to vLLM by integrating Apple's ML framework with unified memory and Metal kernels. It offers optimized LLM inference with KV cache and quantization, vision-language models for multimodal inference, speech-to-text and text-to-speech with native voices, text embeddings for semantic search and RAG, and more. Users can benefit from features like multimodal support for text, image, video, and audio, native GPU acceleration on Apple Silicon, compatibility with OpenAI API, Anthropic Messages API, reasoning models extraction, integration with external tools via Model Context Protocol, memory-efficient caching, and high throughput for multiple concurrent users.
aiohomematic
AIO Homematic (hahomematic) is a lightweight Python 3 library for controlling and monitoring HomeMatic and HomematicIP devices, with support for third-party devices/gateways. It automatically creates entities for device parameters, offers custom entity classes for complex behavior, and includes features like caching paramsets for faster restarts. Designed to integrate with Home Assistant, it requires specific firmware versions for HomematicIP devices. The public API is defined in modules like central, client, model, exceptions, and const, with example usage provided. Useful links include changelog, data point definitions, troubleshooting, and developer resources for architecture, data flow, model extension, and Home Assistant lifecycle.
solo-server
Solo Server is a lightweight server designed for managing hardware-aware inference. It provides seamless setup through a simple CLI and HTTP servers, an open model registry for pulling models from platforms like Ollama and Hugging Face, cross-platform compatibility for effortless deployment of AI models on hardware, and a configurable framework that auto-detects hardware components (CPU, GPU, RAM) and sets optimal configurations.
mesh
MCP Mesh is an open-source control plane for MCP traffic that provides a unified layer for authentication, routing, and observability. It replaces multiple integrations with a single production endpoint, simplifying configuration management. Built for multi-tenant organizations, it offers workspace/project scoping for policies, credentials, and logs. With core capabilities like MeshContext, AccessControl, and OpenTelemetry, it ensures fine-grained RBAC, full tracing, and metrics for tools and workflows. Users can define tools with input/output validation, access control checks, audit logging, and OpenTelemetry traces. The project structure includes apps for full-stack MCP Mesh, encryption, observability, and more, with deployment options ranging from Docker to Kubernetes. The tech stack includes Bun/Node runtime, TypeScript, Hono API, React, Kysely ORM, and Better Auth for OAuth and API keys.
AgentX
AgentX is a next-generation open-source AI agent development framework and runtime platform. It provides an event-driven runtime with a simple framework and minimal UI. The platform is ready-to-use and offers features like multi-user support, session persistence, real-time streaming, and Docker readiness. Users can build AI Agent applications with event-driven architecture using TypeScript for server-side (Node.js) and client-side (Browser/React) development. AgentX also includes comprehensive documentation, core concepts, guides, API references, and various packages for different functionalities. The architecture follows an event-driven design with layered components for server-side and client-side interactions.
pilot
Pilot is an AI tool designed to streamline the process of handling tickets from GitHub, Linear, Jira, or Asana. It plans the implementation, writes the code, runs tests, and opens a PR for you to review and merge. With features like Autopilot, Epic Decomposition, Self-Review, and more, Pilot aims to automate the ticket handling process and reduce the time spent on prioritizing and completing tasks. It integrates with various platforms, offers intelligence features, and provides real-time visibility through a dashboard. Pilot is free to use, with costs associated with Claude API usage. It is designed for bug fixes, small features, refactoring, tests, docs, and dependency updates, but may not be suitable for large architectural changes or security-critical code.
memsearch
Memsearch is a tool that allows users to give their AI agents persistent memory in a few lines of code. It enables users to write memories as markdown and search them semantically. Inspired by OpenClaw's markdown-first memory architecture, Memsearch is pluggable into any agent framework. The tool offers features like smart deduplication, live sync, and a ready-made Claude Code plugin for building agent memory.
giztoy
Giztoy is a multi-language framework designed for building AI toys and intelligent applications. It provides a unified abstraction layer that spans from resource-constrained embedded systems to powerful cloud services. With features like native support for ESP32 and other MCUs, cross-platform app development, a unified build system with Bazel, an agent framework for AI agents, audio processing capabilities, support for various Large Language Models, real-time models with WebSocket streaming, secure transport protocols, and multi-language implementations in Go, Rust, Zig, and C/C++, Giztoy serves as a versatile tool for developing AI-powered applications across different platforms and devices.
boxlite
BoxLite is an embedded, lightweight micro-VM runtime designed for AI agents running OCI containers with hardware-level isolation. It is built for high concurrency with no daemon required, offering features like lightweight VMs, high concurrency, hardware isolation, embeddability, and OCI compatibility. Users can spin up 'Boxes' to run containers for AI agent sandboxes and multi-tenant code execution scenarios where Docker alone is insufficient and full VM infrastructure is too heavy. BoxLite supports Python, Node.js, and Rust with quick start guides for each, along with features like CPU/memory limits, storage options, networking capabilities, security layers, and image registry configuration. The tool provides SDKs for Python and Node.js, with Go support coming soon. It offers detailed documentation, examples, and architecture insights for users to understand how BoxLite works under the hood.
MediCareAI
MediCareAI is an intelligent disease management system powered by AI, designed for patient follow-up and disease tracking. It integrates medical guidelines, AI-powered diagnosis, and document processing to provide comprehensive healthcare support. The system includes features like user authentication, patient management, AI diagnosis, document processing, medical records management, knowledge base system, doctor collaboration platform, and admin system. It ensures privacy protection through automatic PII detection and cleaning for document sharing.
claudex
Claudex is an open-source, self-hosted Claude Code UI that runs entirely on your machine. It provides multiple sandboxes, allows users to use their own plans, offers a full IDE experience with VS Code in the browser, and is extensible with skills, agents, slash commands, and MCP servers. Users can run AI agents in isolated environments, view and interact with a browser via VNC, switch between multiple AI providers, automate tasks with Celery workers, and enjoy various chat features and preview capabilities. Claudex also supports marketplace plugins, secrets management, integrations like Gmail, and custom instructions. The tool is configured through providers and supports various providers like Anthropic, OpenAI, OpenRouter, and Custom. It has a tech stack consisting of React, FastAPI, Python, PostgreSQL, Celery, Redis, and more.
Zen-Ai-Pentest
Zen-AI-Pentest is a professional AI-powered penetration testing framework designed for security professionals, bug bounty hunters, and enterprise security teams. It combines cutting-edge language models with 20+ integrated security tools, offering comprehensive security assessments. The framework is security-first with multiple safety controls, extensible with a plugin system, cloud-native for deployment on AWS, Azure, or GCP, and production-ready with CI/CD, monitoring, and support. It features autonomous AI agents, risk analysis, exploit validation, benchmarking, CI/CD integration, AI persona system, subdomain scanning, and multi-cloud & virtualization support.
For similar tasks
crawlee
Crawlee is a web scraping and browser automation library that helps you build reliable scrapers quickly. Your crawlers will appear human-like and fly under the radar of modern bot protections even with the default configuration. Crawlee gives you the tools to crawl the web for links, scrape data, and store it to disk or cloud while staying configurable to suit your project's needs.
rpaframework
RPA Framework is an open-source collection of libraries and tools for Robotic Process Automation (RPA), designed to be used with Robot Framework and Python. It offers well-documented core libraries for Software Robot Developers, optimized for Robocorp Control Room and Developer Tools, and accepts external contributions. The project includes various libraries for tasks like archiving, browser automation, date/time manipulations, cloud services integration, encryption operations, database interactions, desktop automation, document processing, email operations, Excel manipulation, file system operations, FTP interactions, web API interactions, image manipulation, AI services, and more. The development of the repository is Python-based and requires Python version 3.8+, with tooling based on poetry and invoke for compiling, building, and running the package. The project is licensed under the Apache License 2.0.
apify-mcp-server
The Apify MCP Server enables AI agents to extract data from various websites using ready-made scrapers and automation tools. It supports OAuth for easy connection from clients like Claude.ai or Visual Studio Code. The server also supports Skyfire agentic payments for AI agents to pay for Actor runs without an API token. Compatible with various clients adhering to the Model Context Protocol, it allows dynamic tool discovery and interaction with Apify Actors. The server provides tools for interacting with Apify Actors, dynamic tool discovery, and telemetry data collection. It offers a set of example prompts and resources for users to explore and interact with Apify through MCP.
open-computer-use
Open Computer Use is an open-source platform that enables AI agents to control computers through browser automation, terminal access, and desktop interaction. It is designed for developers to create autonomous AI workflows. The platform allows agents to browse the web, run terminal commands, control desktop applications, orchestrate multi-agents, stream execution, and is 100% open-source and self-hostable. It provides capabilities similar to Anthropic's Claude Computer Use but is fully open-source and extensible.
dbt-airflow
A Python package that helps Data and Analytics engineers render dbt projects in Apache Airflow DAGs. It enables teams to automatically render their dbt projects in a granular level, creating individual Airflow tasks for every model, seed, snapshot, and test within the dbt project. This allows for full control at the task-level, improving visibility and management of data models within the team.
blades
Blades is a multimodal AI Agent framework in Go, supporting custom models, tools, memory, middleware, and more. It is well-suited for multi-turn conversations, chain reasoning, and structured output. The framework provides core components like Agent, Prompt, Chain, ModelProvider, Tool, Memory, and Middleware, enabling developers to build intelligent applications with flexible configuration and high extensibility. Blades leverages the characteristics of Go to achieve high decoupling and efficiency, making it easy to integrate different language model services and external tools. The project is in its early stages, inviting Go developers and AI enthusiasts to contribute and explore the possibilities of building AI applications in Go.
flyte-sdk
Flyte 2 SDK is a pure Python tool for type-safe, distributed orchestration of agents, ML pipelines, and more. It allows users to write data pipelines, ML training jobs, and distributed compute in Python without any DSL constraints. With features like async-first parallelism and fine-grained observability, Flyte 2 offers a seamless workflow experience. Users can leverage core concepts like TaskEnvironments for container configuration, pure Python workflows for flexibility, and async parallelism for distributed execution. Advanced features include sub-task observability with tracing and remote task execution. The tool also provides native Jupyter integration for running and monitoring workflows directly from notebooks. Configuration and deployment are made easy with configuration files and commands for deploying and running workflows. Flyte 2 is licensed under the Apache 2.0 License.
kilocode
Kilo Code is an open-source VS Code AI agent that allows users to generate code from natural language, check its own work, run terminal commands, automate the browser, and utilize the latest AI models. It offers features like task automation, automated refactoring, and integration with MCP servers. Users can access 400+ AI models and benefit from transparent pricing. Kilo Code is a fork of Roo Code and Cline, with improvements and unique features developed independently.
For similar jobs
promptflow
**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.
deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.
MegaDetector
MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aims to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our of the field, affectionately titled "Everything I know about machine learning and camera traps".
leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.
llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.
carrot
The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.
TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.
AI-YinMei
AI-YinMei is an AI virtual anchor Vtuber development tool (N card version). It supports fastgpt knowledge base chat dialogue, a complete set of solutions for LLM large language models: [fastgpt] + [one-api] + [Xinference], supports docking bilibili live broadcast barrage reply and entering live broadcast welcome speech, supports Microsoft edge-tts speech synthesis, supports Bert-VITS2 speech synthesis, supports GPT-SoVITS speech synthesis, supports expression control Vtuber Studio, supports painting stable-diffusion-webui output OBS live broadcast room, supports painting picture pornography public-NSFW-y-distinguish, supports search and image search service duckduckgo (requires magic Internet access), supports image search service Baidu image search (no magic Internet access), supports AI reply chat box [html plug-in], supports AI singing Auto-Convert-Music, supports playlist [html plug-in], supports dancing function, supports expression video playback, supports head touching action, supports gift smashing action, supports singing automatic start dancing function, chat and singing automatic cycle swing action, supports multi scene switching, background music switching, day and night automatic switching scene, supports open singing and painting, let AI automatically judge the content.

