desktop
E2B Desktop Sandbox for LLMs. E2B Sandbox with desktop graphical environment that you can connect to any LLM for secure computer use.
Stars: 1249
E2B Desktop Sandbox is a secure virtual desktop environment powered by E2B, allowing users to create isolated sandboxes with customizable dependencies. It provides features such as streaming the desktop screen, mouse and keyboard control, taking screenshots, opening files, and running bash commands. The environment is based on Linux and Xfce, offering a fast and lightweight experience that can be fully customized to create unique desktop environments.
README:
E2B Desktop Sandbox is an open source secure virtual desktop ready for Computer Use. Powered by E2B.
Each sandbox is isolated from the others and can be customized with any dependencies you want.
SDK Examples
- Basic Examples:
- Streaming Desktop Applications:
- Computer use made with 100% open source LLMs.
- OpenAI Computer Use Agent using E2B's Desktop Sandbox. Runs as a Next.js app.
The E2B Desktop Sandbox is built on top of E2B Sandbox.
Sign up at E2B and get your API key.
Set environment variable E2B_API_KEY with your API key.
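A small pre-flight check can catch a missing key before any sandbox is created. The sketch below is not part of the SDK: the helper name `require_e2b_api_key` is hypothetical, and the only thing it takes from the text above is the variable name `E2B_API_KEY`.

```python
import os


def require_e2b_api_key(env=None):
    """Return the E2B API key from the environment or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("E2B_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "E2B_API_KEY is not set; export it before creating a sandbox."
        )
    return key
```

Calling `require_e2b_api_key()` at the top of a script fails fast with a readable message instead of a later authentication error.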
Python
pip install e2b-desktop
JavaScript
npm install @e2b/desktop
Python
from e2b_desktop import Sandbox
# Create a new desktop sandbox
desktop = Sandbox.create()
# Launch an application
desktop.launch('google-chrome') # or vscode, firefox, etc.
# Wait 10s for the application to open
desktop.wait(10000)
# Stream the application's window
# Note: There can be only one stream at a time
# You need to stop the current stream before streaming another application
desktop.stream.start(
    window_id=desktop.get_current_window_id(),  # if not provided the whole desktop will be streamed
    require_auth=True
)
# Get the stream auth key
auth_key = desktop.stream.get_auth_key()
# Print the stream URL
print('Stream URL:', desktop.stream.get_url(auth_key=auth_key))
# Kill the sandbox after the tasks are finished
# desktop.kill()
JavaScript
import { Sandbox } from '@e2b/desktop'
// Start a new desktop sandbox
const desktop = await Sandbox.create()
// Launch an application
await desktop.launch('google-chrome') // or vscode, firefox, etc.
// Wait 10s for the application to open
await desktop.wait(10000)
// Stream the application's window
// Note: There can be only one stream at a time
// You need to stop the current stream before streaming another application
await desktop.stream.start({
  windowId: await desktop.getCurrentWindowId(), // if not provided the whole desktop will be streamed
  requireAuth: true,
})
// Get the stream auth key
const authKey = desktop.stream.getAuthKey()
// Print the stream URL
console.log('Stream URL:', desktop.stream.getUrl({ authKey }))
// Kill the sandbox after the tasks are finished
// await desktop.kill()
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Start the stream
desktop.stream.start()
# Get stream URL
url = desktop.stream.get_url()
print(url)
# Get stream URL and disable user interaction
url = desktop.stream.get_url(view_only=True)
print(url)
# Stop the stream
desktop.stream.stop()
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Start the stream
await desktop.stream.start()
// Get stream URL
const url = desktop.stream.getUrl()
console.log(url)
// Get stream URL and disable user interaction
const viewOnlyUrl = desktop.stream.getUrl({ viewOnly: true })
console.log(viewOnlyUrl)
// Stop the stream
await desktop.stream.stop()
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Start the stream
desktop.stream.start(
    require_auth=True  # Require authentication with an auto-generated key
)
# Retrieve the authentication key
auth_key = desktop.stream.get_auth_key()
# Get stream URL
url = desktop.stream.get_url(auth_key=auth_key)
print(url)
# Stop the stream
desktop.stream.stop()
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Start the stream
await desktop.stream.start({
  requireAuth: true, // Require authentication with an auto-generated key
})
// Retrieve the authentication key
const authKey = await desktop.stream.getAuthKey()
// Get stream URL
const url = desktop.stream.getUrl({ authKey })
console.log(url)
// Stop the stream
await desktop.stream.stop()
Warning:
- Streaming a specific application window raises an error if the application is not open yet.
- The stream closes once the application closes.
- Multiple simultaneous streams are not supported; stop the current stream before starting a new one for another application.
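Because window lookups can fail until the application has finished opening, one option is to poll before starting a stream. The helper below is a hypothetical sketch, not an SDK function: `wait_for_windows` accepts any zero-argument callable that returns a list of window IDs, assumes errors raised in the meantime are transient, and gives up after a timeout.

```python
import time


def wait_for_windows(get_windows, timeout_s=30.0, interval_s=0.5):
    """Poll get_windows() until it returns a non-empty list of window IDs.

    get_windows is any zero-argument callable; errors raised while the
    application is still starting are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout_s
    last_error = None
    while True:
        try:
            window_ids = get_windows()
            if window_ids:
                return window_ids
        except Exception as exc:  # assumption: failures here are transient
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError("application window did not appear") from last_error
        time.sleep(interval_s)
```

With the SDK this might be called as `wait_for_windows(lambda: desktop.get_application_windows("Firefox"))`, passing the first returned ID to `desktop.stream.start`.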
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Get current (active) window ID
window_id = desktop.get_current_window_id()
# Get all windows of the application
window_ids = desktop.get_application_windows("Firefox")
# Start the stream
desktop.stream.start(window_id=window_ids[0])
# Stop the stream
desktop.stream.stop()
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Get current (active) window ID
const windowId = await desktop.getCurrentWindowId()
// Get all windows of the application
const windowIds = await desktop.getApplicationWindows('Firefox')
// Start the stream
await desktop.stream.start({ windowId: windowIds[0] })
// Stop the stream
await desktop.stream.stop()
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
desktop.double_click()
desktop.left_click()
desktop.left_click(x=100, y=200)
desktop.right_click()
desktop.right_click(x=100, y=200)
desktop.middle_click()
desktop.middle_click(x=100, y=200)
desktop.scroll(10) # Scroll by the amount. Positive for up, negative for down.
desktop.move_mouse(100, 200) # Move to x, y coordinates
desktop.drag((100, 100), (200, 200)) # Drag using the mouse
desktop.mouse_press("left") # Press the mouse button
desktop.mouse_release("left") # Release the mouse button
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
await desktop.doubleClick()
await desktop.leftClick()
await desktop.leftClick(100, 200)
await desktop.rightClick()
await desktop.rightClick(100, 200)
await desktop.middleClick()
await desktop.middleClick(100, 200)
await desktop.scroll(10) // Scroll by the amount. Positive for up, negative for down.
await desktop.moveMouse(100, 200) // Move to x, y coordinates
await desktop.drag([100, 100], [200, 200]) // Drag using the mouse
await desktop.mousePress('left') // Press the mouse button
await desktop.mouseRelease('left') // Release the mouse button
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Write text at the current cursor position with customizable typing speed
desktop.write("Hello, world!") # Default: chunk_size=25, delay_in_ms=75
desktop.write("Fast typing!", chunk_size=50, delay_in_ms=25) # Faster typing
# Press keys
desktop.press("enter")
desktop.press("space")
desktop.press("backspace")
desktop.press(["ctrl", "c"]) # Key combination
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Write text at the current cursor position with customizable typing speed
await desktop.write('Hello, world!')
await desktop.write('Fast typing!', { chunkSize: 50, delayInMs: 25 }) // Faster typing
// Press keys
await desktop.press('enter')
await desktop.press('space')
await desktop.press('backspace')
await desktop.press(['ctrl', 'c']) // Key combination
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Get current (active) window ID
window_id = desktop.get_current_window_id()
# Get all windows of the application
window_ids = desktop.get_application_windows("Firefox")
# Get window title
title = desktop.get_window_title(window_id)
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Get current (active) window ID
const windowId = await desktop.getCurrentWindowId()
// Get all windows of the application
const windowIds = await desktop.getApplicationWindows('Firefox')
// Get window title
const title = await desktop.getWindowTitle(windowId)
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Take a screenshot and save it as "screenshot.png" locally
image = desktop.screenshot()
# Save the image to a file
with open("screenshot.png", "wb") as f:
    f.write(image)
JavaScript
import fs from 'fs'
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
const image = await desktop.screenshot()
// Save the image to a file
fs.writeFileSync('screenshot.png', image)
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Open file with default application
desktop.files.write("/home/user/index.js", "console.log('hello')") # First create the file
desktop.open("/home/user/index.js") # Then open it
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Open file with default application
await desktop.files.write('/home/user/index.js', "console.log('hello')") // First create the file
await desktop.open('/home/user/index.js') // Then open it
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Launch the application
desktop.launch('google-chrome')
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Launch the application
await desktop.launch('google-chrome')
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
# Run any bash command
out = desktop.commands.run("ls -la /home/user")
print(out)
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
// Run any bash command
const out = await desktop.commands.run('ls -la /home/user')
console.log(out)
Python
from e2b_desktop import Sandbox
desktop = Sandbox.create()
desktop.wait(1000) # Wait for 1 second
JavaScript
import { Sandbox } from '@e2b/desktop'
const desktop = await Sandbox.create()
await desktop.wait(1000) // Wait for 1 second
The desktop-like environment is currently based on Linux and Xfce. We chose Xfce because it's a fast and lightweight environment that's also popular and actively supported. However, this Sandbox template is fully customizable and you can create your own desktop environment. Check out the sandbox template's code here.
Alternative AI tools for desktop
Similar Open Source Tools
generative-ai-python
The Google AI Python SDK is the easiest way for Python developers to build with the Gemini API. The Gemini API gives you access to Gemini models created by Google DeepMind. Gemini models are built from the ground up to be multimodal, so you can reason seamlessly across text, images, and code.
aiotdlib
aiotdlib is a Python asyncio Telegram client based on TDLib. It provides automatic generation of types and functions from tl schema, validation, good IDE type hinting, and high-level API methods for simpler work with tdlib. The package includes prebuilt TDLib binaries for macOS (arm64) and Debian Bullseye (amd64). Users can use their own binary by passing `library_path` argument to `Client` class constructor. Compatibility with other versions of the library is not guaranteed. The tool requires Python 3.9+ and users need to get their `api_id` and `api_hash` from Telegram docs for installation and usage.
js-genai
The Google Gen AI JavaScript SDK is an experimental SDK for TypeScript and JavaScript developers to build applications powered by Gemini. It supports both the Gemini Developer API and Vertex AI. The SDK is designed to work with Gemini 2.0 features. Users can access API features through the GoogleGenAI classes, which provide submodules for querying models, managing caches, creating chats, uploading files, and starting live sessions. The SDK also allows for function calling to interact with external systems. Users can find more samples in the GitHub samples directory.
llm-scraper
LLM Scraper is a TypeScript library that allows you to convert any webpages into structured data using LLMs. It supports Local (GGUF), OpenAI, Groq chat models, and schemas defined with Zod. With full type-safety in TypeScript and based on the Playwright framework, it offers streaming when crawling multiple pages and supports four input modes: html, markdown, text, and image.
java-genai
Java idiomatic SDK for the Gemini Developer APIs and Vertex AI APIs. The SDK provides a Client class for interacting with both APIs, allowing seamless switching between the 2 backends without code rewriting. It supports features like generating content, embedding content, generating images, upscaling images, editing images, and generating videos. The SDK also includes options for setting API versions, HTTP request parameters, client behavior, and response schemas.
daytona
Daytona is a secure and elastic infrastructure tool designed for running AI-generated code. It offers lightning-fast infrastructure with sub-90ms sandbox creation, separated and isolated runtime for executing AI code with zero risk, massive parallelization for concurrent AI workflows, programmatic control through various APIs, unlimited sandbox persistence, and OCI/Docker compatibility. Users can create sandboxes using Python or TypeScript SDKs, run code securely inside the sandbox, and clean up the sandbox after execution. Daytona is open source under the GNU Affero General Public License and welcomes contributions from developers.
mediasoup-client-aiortc
mediasoup-client-aiortc is a handler for the aiortc Python library, allowing Node.js applications to connect to a mediasoup server using WebRTC for real-time audio, video, and DataChannel communication. It facilitates the creation of Worker instances to manage Python subprocesses, obtain audio/video tracks, and create mediasoup-client handlers. The tool supports features like getUserMedia, handlerFactory creation, and event handling for subprocess closure and unexpected termination. It provides custom classes for media stream and track constraints, enabling diverse audio/video sources like devices, files, or URLs. The tool enhances WebRTC capabilities in Node.js applications through seamless Python subprocess communication.
OpenAI-DotNet
OpenAI-DotNet is a simple C# .NET client library for OpenAI to use through their RESTful API. It is independently developed and not an official library affiliated with OpenAI. Users need an OpenAI API account to utilize this library. The library targets .NET 6.0 and above, working across various platforms like console apps, winforms, wpf, asp.net, etc., and on Windows, Linux, and Mac. It provides functionalities for authentication, interacting with models, assistants, threads, chat, audio, images, files, fine-tuning, embeddings, and moderations.
aiosonic
Aiosonic is a lightweight Python asyncio HTTP/WebSocket client that offers fast and efficient communication with HTTP/1.1, HTTP/2, and WebSocket protocols. It supports keepalive, connection pooling, multipart file uploads, chunked responses, timeouts, automatic decompression, redirect following, type annotations, WebSocket communication, HTTP proxy, cookie sessions, elegant cookies, and nearly 100% test coverage. It requires Python version 3.10 or higher for installation and provides a simple API for making HTTP requests and WebSocket connections. Additionally, it allows API wrapping for customizing response handling and includes a performance benchmark script for comparing its speed with other HTTP clients.
model.nvim
model.nvim is a tool designed for Neovim users who want to utilize AI models for completions or chat within their text editor. It allows users to build prompts programmatically with Lua, customize prompts, experiment with multiple providers, and use both hosted and local models. The tool supports features like provider agnosticism, programmatic prompts in Lua, async and multistep prompts, streaming completions, and chat functionality in 'mchat' filetype buffer. Users can customize prompts, manage responses, and context, and utilize various providers like OpenAI ChatGPT, Google PaLM, llama.cpp, ollama, and more. The tool also supports treesitter highlights and folds for chat buffers.
com.openai.unity
com.openai.unity is an OpenAI package for Unity that allows users to interact with OpenAI's API through RESTful requests. It is independently developed and not an official library affiliated with OpenAI. Users can fine-tune models, create assistants, chat completions, and more. The package requires Unity 2021.3 LTS or higher and can be installed via Unity Package Manager or Git URL. Various features like authentication, Azure OpenAI integration, model management, thread creation, chat completions, audio processing, image generation, file management, fine-tuning, batch processing, embeddings, and content moderation are available.
client
Gemini API PHP Client is a library that allows you to interact with Google's generative AI models, such as Gemini Pro and Gemini Pro Vision. It provides functionalities for basic text generation, multimodal input, chat sessions, streaming responses, tokens counting, listing models, and advanced usages like safety settings and custom HTTP client usage. The library requires an API key to access Google's Gemini API and can be installed using Composer. It supports various features like generating content, starting chat sessions, embedding content, counting tokens, and listing available models.
ElevenLabs-DotNet
ElevenLabs-DotNet is a non-official Eleven Labs voice synthesis RESTful client that allows users to convert text to speech. The library targets .NET 8.0 and above, working across various platforms like console apps, winforms, wpf, and asp.net, and across Windows, Linux, and Mac. Users can authenticate using API keys directly, from a configuration file, or system environment variables. The tool provides functionalities for text to speech conversion, streaming text to speech, accessing voices, dubbing audio or video files, generating sound effects, managing history of synthesized audio clips, and accessing user information and subscription status.
aioshelly
Aioshelly is an asynchronous library designed to control Shelly devices. It is currently under development and requires Python version 3.11 or higher, along with dependencies like bluetooth-data-tools, aiohttp, and orjson. The library provides examples for interacting with Gen1 devices using CoAP protocol and Gen2/Gen3 devices using RPC and WebSocket protocols. Users can easily connect to Shelly devices, retrieve status information, and perform various actions through the provided APIs. The repository also includes example scripts for quick testing and usage guidelines for contributors to maintain consistency with the Shelly API.
mcp-ui
mcp-ui is a collection of SDKs that bring interactive web components to the Model Context Protocol (MCP). It allows servers to define reusable UI snippets, render them securely in the client, and react to their actions in the MCP host environment. The SDKs include @mcp-ui/server (TypeScript) for generating UI resources on the server, @mcp-ui/client (TypeScript) for rendering UI components on the client, and mcp_ui_server (Ruby) for generating UI resources in a Ruby environment. The project is an experimental community playground for MCP UI ideas, with rapid iteration and enhancements.
For similar tasks
Warp
Warp is a blazingly-fast modern Rust based GPU-accelerated terminal built to make you and your team more productive. It is available for macOS and Linux users, with plans to support Windows and the Web (WASM) in the future. Warp has a community search page where you can find solutions to common issues, and you can file issue requests in the repo if you can't find a solution. Warp is open-source, and the team is planning to first open-source their Rust UI framework, and then parts and potentially all of their client codebase.
llm-code-interpreter
The 'llm-code-interpreter' repository is a deprecated plugin that provides a code interpreter on steroids for ChatGPT by E2B. It gives ChatGPT access to a sandboxed cloud environment with capabilities like running any code, accessing Linux OS, installing programs, using filesystem, running processes, and accessing the internet. The plugin exposes commands to run shell commands, read files, and write files, enabling various possibilities such as running different languages, installing programs, starting servers, deploying websites, and more. It is powered by the E2B API and is designed for agents to freely experiment within a sandboxed environment.
rosa
ROSA is an AI Agent designed to interact with ROS-based robotics systems using natural language queries. It can generate system reports, read and parse ROS log files, adapt to new robots, and run various ROS commands using natural language. The tool is versatile for robotics research and development, providing an easy way to interact with robots and the ROS environment.
llm2sh
llm2sh is a command-line utility that leverages Large Language Models (LLMs) to translate plain-language requests into shell commands. It provides a convenient way to interact with your system using natural language. The tool supports multiple LLMs for command generation, offers a customizable configuration file, YOLO mode for running commands without confirmation, and is easily extensible with new LLMs and system prompts. Users can set up API keys for OpenAI, Claude, Groq, and Cerebras to use the tool effectively. llm2sh does not store user data or command history, and it does not record or send telemetry by itself, but the LLM APIs may collect and store requests and responses for their purposes.
ai-terminal
AI Terminal is a Tauri + Angular terminal application with integrated AI capabilities, offering natural language command interpretation, an integrated AI assistant, command history and auto-completion, and cross-platform support (macOS, Windows, Linux). The modern UI is built with Tauri and Angular. The tool requires Node.js 18+, Rust and Cargo, and Ollama for AI features. Users can build a universal binary for macOS, install the tool using Homebrew, and use Ollama to download specific models. Contributions are welcome under the MIT License.
DesktopCommanderMCP
Desktop Commander MCP is a server that allows the Claude desktop app to execute long-running terminal commands on your computer and manage processes through Model Context Protocol (MCP). It is built on top of MCP Filesystem Server to provide additional search and replace file editing capabilities. The tool enables users to execute terminal commands with output streaming, manage processes, perform full filesystem operations, and edit code with surgical text replacements or full file rewrites. It also supports vscode-ripgrep based recursive code or text search in folders.
SWE-ReX
SWE-ReX is a runtime interface for interacting with sandboxed shell environments, allowing AI agents to run any command on any environment. It enables agents to interact with running shell sessions, use interactive command line tools, and manage multiple shell sessions in parallel. SWE-ReX simplifies agent development and evaluation by abstracting infrastructure concerns, supporting fast parallel runs on various platforms, and disentangling agent logic from infrastructure.
For similar jobs
AirGo
AirGo is a simple, easy-to-use proxy service management system with separate front end and back end, supporting multiple users and multiple protocols: vless, vmess, shadowsocks, and hysteria2.
n8n-docs
n8n is an extendable workflow automation tool that enables you to connect anything to everything. It is open-source and can be self-hosted or used as a service. n8n provides a visual interface for creating workflows, which can be used to automate tasks such as data integration, data transformation, and data analysis. n8n also includes a library of pre-built nodes that can be used to connect to a variety of applications and services. This makes it easy to create complex workflows without having to write any code.
Winpilot
Winpilot is a tool that helps you remove bloatware, optimize your system, and improve your privacy. It has a hybrid web app foundation that allows you to remove AI features in Windows and provides you with access to various system information and settings. Winpilot can also be used to install and uninstall apps, change various settings, and access third-party plugins and scripts.
vpnfast.github.io
VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.
AirBattery
AirBattery is a tool for Mac that allows users to monitor the battery levels of all their connected devices, such as iPhone, iPad, and Apple Watch, and display this information in the Dock, menu bar, or widgets. It automatically detects devices that support wireless battery monitoring and provides a seamless user experience without the need for manual configuration. Users can customize the display settings, hide specific devices, and easily manage their battery information. The tool requires macOS 11.0 or higher and offers a convenient way to keep track of multiple device battery levels from a single interface.
tlm
tlm is a local CLI copilot tool powered by CodeLLaMa, providing efficient command line suggestions without the need for an API key or internet connection. It works on macOS, Linux, and Windows, with automatic shell detection for Powershell, Bash, and Zsh. The tool offers one-liner generation and command explanation, and can be installed via an installation script or using Go Install. Ollama is required to download necessary models, and the tool can be easily deployed and configured. Contributors are welcome to enhance the tool's functionality.
Open-Interface
Open Interface is a self-driving software that automates computer tasks by sending user requests to a language model backend (e.g., GPT-4V) and simulating keyboard and mouse inputs to execute the steps. It course-corrects by sending current screenshots to the language models. The tool supports MacOS, Linux, and Windows, and requires setting up the OpenAI API key for access to GPT-4V. It can automate tasks like creating meal plans, setting up custom language model backends, and more. Open Interface is currently not efficient in accurate spatial reasoning, tracking itself in tabular contexts, and navigating complex GUI-rich applications. Future improvements aim to enhance the tool's capabilities with better models trained on video walkthroughs. The tool is cost-effective, with user requests priced between $0.05 - $0.20, and offers features like interrupting the app and primary display visibility in multi-monitor setups.
AIDA64CRCK
AIDA64CRCK is a tool designed for Windows users to access the latest version for free. It provides users with comprehensive system information and diagnostics to optimize their computer performance. The tool is user-friendly and offers detailed insights into hardware components, software configurations, and system stability. With AIDA64CRCK, users can easily monitor their system health and troubleshoot any issues that may arise, making it a valuable utility for both casual users and tech enthusiasts.
