
desktop
E2B Desktop Sandbox for LLMs. E2B Sandbox with desktop graphical environment that you can connect to any LLM for secure computer use.
Stars: 1097

E2B Desktop Sandbox is a secure virtual desktop environment powered by E2B, allowing users to create isolated sandboxes with customizable dependencies. It provides features such as streaming the desktop screen, mouse and keyboard control, taking screenshots, opening files, and running bash commands. The environment is based on Linux and Xfce, offering a fast and lightweight experience that can be fully customized to create unique desktop environments.
README:
E2B Desktop Sandbox is an open source secure virtual desktop ready for Computer Use. Powered by E2B.
Each sandbox is isolated from the others and can be customized with any dependencies you want.
SDK Examples
- Basic Examples:
- Streaming Desktop Applications:
- Computer use made with 100% open source LLMs.
- OpenAI Computer Use Agent using E2B's Desktop Sandbox. Runs as a Next.js app.
The E2B Desktop Sandbox is built on top of E2B Sandbox.
Sign up at E2B and get your API key.
Set environment variable E2B_API_KEY
with your API key.
Python
pip install e2b-desktop
JavaScript
npm install @e2b/desktop
Python
from e2b_desktop import Sandbox
# Create a new desktop sandbox
desktop = Sandbox.create()
# Launch an application
desktop.launch('google-chrome') # or vscode, firefox, etc.
# Wait 10s for the application to open
desktop.wait(10000)
# Stream the application's window
# Note: There can be only one stream at a time
# You need to stop the current stream before streaming another application
desktop.stream.start(
window_id=desktop.get_current_window_id(), # if not provided the whole desktop will be streamed
require_auth=True
)
# Get the stream auth key
auth_key = desktop.stream.get_auth_key()
# Print the stream URL
print('Stream URL:', desktop.stream.get_url(auth_key=auth_key))
# Kill the sandbox after the tasks are finished
# desktop.kill()
JavaScript
import { Sandbox } from '@e2b/desktop'
// Start a new desktop sandbox
const desktop = await Sandbox.create()
// Launch an application
await desktop.launch('google-chrome') // or vscode, firefox, etc.
// Wait 10s for the application to open
await desktop.wait(10000)
// Stream the application's window
// Note: There can be only one stream at a time
// You need to stop the current stream before streaming another application
await desktop.stream.start({
windowId: await desktop.getCurrentWindowId(), // if not provided the whole desktop will be streamed
requireAuth: true,
})
// Get the stream auth key
const authKey = desktop.stream.getAuthKey()
// Print the stream URL
console.log('Stream URL:', desktop.stream.getUrl({ authKey }))
// Kill the sandbox after the tasks are finished
// await desktop.kill()
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Start the stream
desktop.stream.start()
# Get stream URL
url = desktop.stream.get_url()
print(url)
# Get stream URL and disable user interaction
url = desktop.stream.get_url(view_only=True)
print(url)
# Stop the stream
desktop.stream.stop()
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Start the stream
await desktop.stream.start()
// Get stream URL
const url = desktop.stream.getUrl()
console.log(url)
// Get stream URL and disable user interaction
const url = desktop.stream.getUrl({ viewOnly: true })
console.log(url)
// Stop the stream
await desktop.stream.stop()
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Start the stream
desktop.stream.start(
require_auth=True # Require authentication with an auto-generated key
)
# Retrieve the authentication key
auth_key = desktop.stream.get_auth_key()
# Get stream URL
url = desktop.stream.get_url(auth_key=auth_key)
print(url)
# Stop the stream
desktop.stream.stop()
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Start the stream
await desktop.stream.start({
requireAuth: true, // Require authentication with an auto-generated key
})
// Retrieve the authentication key
const authKey = await desktop.stream.getAuthKey()
// Get stream URL
const url = desktop.stream.getUrl({ authKey })
console.log(url)
// Stop the stream
await desktop.stream.stop()
[!WARNING]
- Will raise an error if the desired application is not open yet
- The stream will close once the application closes
- Creating multiple streams at the same time is not supported, you may have to stop the current stream and start a new one for each application
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Get current (active) window ID
window_id = desktop.get_current_window_id()
# Get all windows of the application
window_ids = desktop.get_application_windows("Firefox")
# Start the stream
desktop.stream.start(window_id=window_ids[0])
# Stop the stream
desktop.stream.stop()
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Get current (active) window ID
const windowId = await desktop.getCurrentWindowId()
// Get all windows of the application
const windowIds = await desktop.getApplicationWindows('Firefox')
// Start the stream
await desktop.stream.start({ windowId: windowIds[0] })
// Stop the stream
await desktop.stream.stop()
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
desktop.double_click()
desktop.left_click()
desktop.left_click(x=100, y=200)
desktop.right_click()
desktop.right_click(x=100, y=200)
desktop.middle_click()
desktop.middle_click(x=100, y=200)
desktop.scroll(10) # Scroll by the amount. Positive for up, negative for down.
desktop.move_mouse(100, 200) # Move to x, y coordinates
desktop.drag((100, 100), (200, 200)) # Drag using the mouse
desktop.mouse_press("left") # Press the mouse button
desktop.mouse_release("left") # Release the mouse button
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
await desktop.doubleClick()
await desktop.leftClick()
await desktop.leftClick(100, 200)
await desktop.rightClick()
await desktop.rightClick(100, 200)
await desktop.middleClick()
await desktop.middleClick(100, 200)
await desktop.scroll(10) // Scroll by the amount. Positive for up, negative for down.
await desktop.moveMouse(100, 200) // Move to x, y coordinates
await desktop.drag([100, 100], [200, 200]) // Drag using the mouse
await desktop.mousePress('left') // Press the mouse button
await desktop.mouseRelease('left') // Release the mouse button
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Write text at the current cursor position with customizable typing speed
desktop.write("Hello, world!") # Default: chunk_size=25, delay_in_ms=75
desktop.write("Fast typing!", chunk_size=50, delay_in_ms=25) # Faster typing
# Press keys
desktop.press("enter")
desktop.press("space")
desktop.press("backspace")
desktop.press(["ctrl", "c"]) # Key combination
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Write text at the current cursor position with customizable typing speed
await desktop.write('Hello, world!')
await desktop.write('Fast typing!', { chunkSize: 50, delayInMs: 25 }) // Faster typing
// Press keys
await desktop.press('enter')
await desktop.press('space')
await desktop.press('backspace')
await desktop.press(['ctrl', 'c']) // Key combination
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Get current (active) window ID
window_id = desktop.get_current_window_id()
# Get all windows of the application
window_ids = desktop.get_application_windows("Firefox")
# Get window title
title = desktop.get_window_title(window_id)
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Get current (active) window ID
const windowId = await desktop.getCurrentWindowId()
// Get all windows of the application
const windowIds = await desktop.getApplicationWindows('Firefox')
// Get window title
const title = await desktop.getWindowTitle(windowId)
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Take a screenshot and save it as "screenshot.png" locally
image = desktop.screenshot()
# Save the image to a file
with open("screenshot.png", "wb") as f:
f.write(image)
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
const image = await desktop.screenshot()
// Save the image to a file
fs.writeFileSync('screenshot.png', image)
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Open file with default application
desktop.files.write("/home/user/index.js", "console.log('hello')") # First create the file
desktop.open("/home/user/index.js") # Then open it
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Open file with default application
await desktop.files.write('/home/user/index.js', "console.log('hello')") // First create the file
await desktop.open('/home/user/index.js') // Then open it
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Launch the application
desktop.launch('google-chrome')
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Launch the application
await desktop.launch('google-chrome')
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Run any bash command
out = desktop.commands.run("ls -la /home/user")
print(out)
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Run any bash command
const out = await desktop.commands.run('ls -la /home/user')
console.log(out)
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
desktop.wait(1000) # Wait for 1 second
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
await desktop.wait(1000) // Wait for 1 second
The desktop-like environment is based on Linux and Xfce at the moment. We chose Xfce because it's a fast and lightweight environment that's also popular and actively supported. However, this Sandbox template is fully customizable and you can create your own desktop environment. Check out the sandbox template's code here.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for desktop
Similar Open Source Tools

desktop
E2B Desktop Sandbox is a secure virtual desktop environment powered by E2B, allowing users to create isolated sandboxes with customizable dependencies. It provides features such as streaming the desktop screen, mouse and keyboard control, taking screenshots, opening files, and running bash commands. The environment is based on Linux and Xfce, offering a fast and lightweight experience that can be fully customized to create unique desktop environments.

aiotdlib
aiotdlib is a Python asyncio Telegram client based on TDLib. It provides automatic generation of types and functions from tl schema, validation, good IDE type hinting, and high-level API methods for simpler work with tdlib. The package includes prebuilt TDLib binaries for macOS (arm64) and Debian Bullseye (amd64). Users can use their own binary by passing `library_path` argument to `Client` class constructor. Compatibility with other versions of the library is not guaranteed. The tool requires Python 3.9+ and users need to get their `api_id` and `api_hash` from Telegram docs for installation and usage.

docutranslate
Docutranslate is a versatile tool for translating documents efficiently. It supports multiple file formats and languages, making it ideal for businesses and individuals needing quick and accurate translations. The tool uses advanced algorithms to ensure high-quality translations while maintaining the original document's formatting. With its user-friendly interface, Docutranslate simplifies the translation process and saves time for users. Whether you need to translate legal documents, technical manuals, or personal letters, Docutranslate is the go-to solution for all your document translation needs.

rust-genai
genai is a multi-AI providers library for Rust that aims to provide a common and ergonomic single API to various generative AI providers such as OpenAI, Anthropic, Cohere, Ollama, and Gemini. It focuses on standardizing chat completion APIs across major AI services, prioritizing ergonomics and commonality. The library initially focuses on text chat APIs and plans to expand to support images, function calling, and more in the future versions. Version 0.1.x will have breaking changes in patches, while version 0.2.x will follow semver more strictly. genai does not provide a full representation of a given AI provider but aims to simplify the differences at a lower layer for ease of use.

generative-ai-python
The Google AI Python SDK is the easiest way for Python developers to build with the Gemini API. The Gemini API gives you access to Gemini models created by Google DeepMind. Gemini models are built from the ground up to be multimodal, so you can reason seamlessly across text, images, and code.

llm-scraper
LLM Scraper is a TypeScript library that allows you to convert any webpages into structured data using LLMs. It supports Local (GGUF), OpenAI, Groq chat models, and schemas defined with Zod. With full type-safety in TypeScript and based on the Playwright framework, it offers streaming when crawling multiple pages and supports four input modes: html, markdown, text, and image.

openrouter-kit
OpenRouter Kit is a powerful TypeScript/JavaScript library for interacting with the OpenRouter API. It simplifies working with LLMs by providing a high-level API for chats, dialogue history management, tool calls with error handling, security module, and cost tracking. Ideal for building chatbots, AI agents, and integrating LLMs into applications.

react-native-rag
React Native RAG is a library that enables private, local RAGs to supercharge LLMs with a custom knowledge base. It offers modular and extensible components like `LLM`, `Embeddings`, `VectorStore`, and `TextSplitter`, with multiple integration options. The library supports on-device inference, vector store persistence, and semantic search implementation. Users can easily generate text responses, manage documents, and utilize custom components for advanced use cases.

acte
Acte is a framework designed to build GUI-like tools for AI Agents. It aims to address the issues of cognitive load and freedom degrees when interacting with multiple APIs in complex scenarios. By providing a graphical user interface (GUI) for Agents, Acte helps reduce cognitive load and constraints interaction, similar to how humans interact with computers through GUIs. The tool offers APIs for starting new sessions, executing actions, and displaying screens, accessible via HTTP requests or the SessionManager class.

mediapipe-rs
MediaPipe-rs is a Rust library designed for MediaPipe tasks on WasmEdge WASI-NN. It offers easy-to-use low-code APIs similar to mediapipe-python, with low overhead and flexibility for custom media input. The library supports various tasks like object detection, image classification, gesture recognition, and more, including TfLite models, TF Hub models, and custom models. Users can create task instances, run sessions for pre-processing, inference, and post-processing, and speed up processing by reusing sessions. The library also provides support for audio tasks using audio data from symphonia, ffmpeg, or raw audio. Users can choose between CPU, GPU, or TPU devices for processing.

markdrop
Markdrop is a Python package that facilitates the conversion of PDFs to markdown format while extracting images and tables. It also generates descriptive text descriptions for extracted tables and images using various LLM clients. The tool offers additional functionalities such as PDF URL support, AI-powered image and table descriptions, interactive HTML output with downloadable Excel tables, customizable image resolution and UI elements, and a comprehensive logging system. Markdrop aims to simplify the process of handling PDF documents and enhancing their content with AI-generated descriptions.

openai-scala-client
This is a no-nonsense async Scala client for OpenAI API supporting all the available endpoints and params including streaming, chat completion, vision, and voice routines. It provides a single service called OpenAIService that supports various calls such as Models, Completions, Chat Completions, Edits, Images, Embeddings, Batches, Audio, Files, Fine-tunes, Moderations, Assistants, Threads, Thread Messages, Runs, Run Steps, Vector Stores, Vector Store Files, and Vector Store File Batches. The library aims to be self-contained with minimal dependencies and supports API-compatible providers like Azure OpenAI, Azure AI, Anthropic, Google Vertex AI, Groq, Grok, Fireworks AI, OctoAI, TogetherAI, Cerebras, Mistral, Deepseek, Ollama, FastChat, and more.

aiocryptopay
The aiocryptopay repository is an asynchronous API wrapper for interacting with the @cryptobot and @CryptoTestnetBot APIs. It provides methods for creating, getting, and deleting invoices and checks, as well as handling webhooks for invoice payments. Users can easily integrate this tool into their applications to manage cryptocurrency payments and transactions.

aiotieba
Aiotieba is an asynchronous Python library for interacting with the Tieba API. It provides a comprehensive set of features for working with Tieba, including support for authentication, thread and post management, and image and file uploading. Aiotieba is well-documented and easy to use, making it a great choice for developers who want to build applications that interact with Tieba.

flutter_gemma
Flutter Gemma is a family of lightweight, state-of-the art open models that bring the power of Google's Gemma language models directly to Flutter applications. It allows for local execution on user devices, supports both iOS and Android platforms, and offers LoRA support for tailored AI behavior. The tool provides a simple interface for integrating Gemma models into Flutter projects, enabling advanced AI capabilities without relying on external servers. Users can easily download pre-trained Gemma models, fine-tune them for specific use cases, and customize behavior using LoRA weights. The tool supports model and LoRA weight management, model initialization, response generation, and chat scenarios, with considerations for model size, LoRA weights, and production app deployment.

obsei
Obsei is an open-source, low-code, AI powered automation tool that consists of an Observer to collect unstructured data from various sources, an Analyzer to analyze the collected data with various AI tasks, and an Informer to send analyzed data to various destinations. The tool is suitable for scheduled jobs or serverless applications as all Observers can store their state in databases. Obsei is still in alpha stage, so caution is advised when using it in production. The tool can be used for social listening, alerting/notification, automatic customer issue creation, extraction of deeper insights from feedbacks, market research, dataset creation for various AI tasks, and more based on creativity.
For similar tasks

desktop
E2B Desktop Sandbox is a secure virtual desktop environment powered by E2B, allowing users to create isolated sandboxes with customizable dependencies. It provides features such as streaming the desktop screen, mouse and keyboard control, taking screenshots, opening files, and running bash commands. The environment is based on Linux and Xfce, offering a fast and lightweight experience that can be fully customized to create unique desktop environments.

Warp
Warp is a blazingly-fast modern Rust based GPU-accelerated terminal built to make you and your team more productive. It is available for macOS and Linux users, with plans to support Windows and the Web (WASM) in the future. Warp has a community search page where you can find solutions to common issues, and you can file issue requests in the repo if you can't find a solution. Warp is open-source, and the team is planning to first open-source their Rust UI framework, and then parts and potentially all of their client codebase.

llm-code-interpreter
The 'llm-code-interpreter' repository is a deprecated plugin that provides a code interpreter on steroids for ChatGPT by E2B. It gives ChatGPT access to a sandboxed cloud environment with capabilities like running any code, accessing Linux OS, installing programs, using filesystem, running processes, and accessing the internet. The plugin exposes commands to run shell commands, read files, and write files, enabling various possibilities such as running different languages, installing programs, starting servers, deploying websites, and more. It is powered by the E2B API and is designed for agents to freely experiment within a sandboxed environment.

rosa
ROSA is an AI Agent designed to interact with ROS-based robotics systems using natural language queries. It can generate system reports, read and parse ROS log files, adapt to new robots, and run various ROS commands using natural language. The tool is versatile for robotics research and development, providing an easy way to interact with robots and the ROS environment.

llm2sh
llm2sh is a command-line utility that leverages Large Language Models (LLMs) to translate plain-language requests into shell commands. It provides a convenient way to interact with your system using natural language. The tool supports multiple LLMs for command generation, offers a customizable configuration file, YOLO mode for running commands without confirmation, and is easily extensible with new LLMs and system prompts. Users can set up API keys for OpenAI, Claude, Groq, and Cerebras to use the tool effectively. llm2sh does not store user data or command history, and it does not record or send telemetry by itself, but the LLM APIs may collect and store requests and responses for their purposes.

ai-terminal
AI Terminal is a Tauri + Angular terminal application with integrated AI capabilities, offering natural language command interpretation, an integrated AI assistant, command history and auto-completion, and cross-platform support (macOS, Windows, Linux). The modern UI is built with Tauri and Angular. The tool requires Node.js 18+, Rust and Cargo, and Ollama for AI features. Users can build a universal binary for macOS, install the tool using Homebrew, and use Ollama to download specific models. Contributions are welcome under the MIT License.

DesktopCommanderMCP
Desktop Commander MCP is a server that allows the Claude desktop app to execute long-running terminal commands on your computer and manage processes through Model Context Protocol (MCP). It is built on top of MCP Filesystem Server to provide additional search and replace file editing capabilities. The tool enables users to execute terminal commands with output streaming, manage processes, perform full filesystem operations, and edit code with surgical text replacements or full file rewrites. It also supports vscode-ripgrep based recursive code or text search in folders.

SWE-ReX
SWE-ReX is a runtime interface for interacting with sandboxed shell environments, allowing AI agents to run any command on any environment. It enables agents to interact with running shell sessions, use interactive command line tools, and manage multiple shell sessions in parallel. SWE-ReX simplifies agent development and evaluation by abstracting infrastructure concerns, supporting fast parallel runs on various platforms, and disentangling agent logic from infrastructure.
For similar jobs

AirGo
AirGo is a front and rear end separation, multi user, multi protocol proxy service management system, simple and easy to use. It supports vless, vmess, shadowsocks, and hysteria2.

n8n-docs
n8n is an extendable workflow automation tool that enables you to connect anything to everything. It is open-source and can be self-hosted or used as a service. n8n provides a visual interface for creating workflows, which can be used to automate tasks such as data integration, data transformation, and data analysis. n8n also includes a library of pre-built nodes that can be used to connect to a variety of applications and services. This makes it easy to create complex workflows without having to write any code.

Winpilot
Winpilot is a tool that helps you remove bloatware, optimize your system, and improve your privacy. It has a hybrid web app foundation that allows you to remove AI features in Windows and provides you with access to various system information and settings. Winpilot can also be used to install and uninstall apps, change various settings, and access third-party plugins and scripts.

vpnfast.github.io
VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.

AirBattery
AirBattery is a tool for Mac that allows users to monitor the battery levels of all their connected devices, such as iPhone, iPad, and Apple Watch, and display this information in the Dock, menu bar, or widgets. It automatically detects devices that support wireless battery monitoring and provides a seamless user experience without the need for manual configuration. Users can customize the display settings, hide specific devices, and easily manage their battery information. The tool requires macOS 11.0 or higher and offers a convenient way to keep track of multiple device battery levels from a single interface.

tlm
tlm is a local CLI copilot tool powered by CodeLLaMa, providing efficient command line suggestions without the need for an API key or internet connection. It works on macOS, Linux, and Windows, with automatic shell detection for Powershell, Bash, and Zsh. The tool offers one-liner generation and command explanation, and can be installed via an installation script or using Go Install. Ollama is required to download necessary models, and the tool can be easily deployed and configured. Contributors are welcome to enhance the tool's functionality.

Open-Interface
Open Interface is a self-driving software that automates computer tasks by sending user requests to a language model backend (e.g., GPT-4V) and simulating keyboard and mouse inputs to execute the steps. It course-corrects by sending current screenshots to the language models. The tool supports MacOS, Linux, and Windows, and requires setting up the OpenAI API key for access to GPT-4V. It can automate tasks like creating meal plans, setting up custom language model backends, and more. Open Interface is currently not efficient in accurate spatial reasoning, tracking itself in tabular contexts, and navigating complex GUI-rich applications. Future improvements aim to enhance the tool's capabilities with better models trained on video walkthroughs. The tool is cost-effective, with user requests priced between $0.05 - $0.20, and offers features like interrupting the app and primary display visibility in multi-monitor setups.

AIDA64CRCK
AIDA64CRCK is a tool designed for Windows users to access the latest version for free. It provides users with comprehensive system information and diagnostics to optimize their computer performance. The tool is user-friendly and offers detailed insights into hardware components, software configurations, and system stability. With AIDA64CRCK, users can easily monitor their system health and troubleshoot any issues that may arise, making it a valuable utility for both casual users and tech enthusiasts.