UI-TARS-desktop

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

Stars: 19020

Visit

UI-TARS-desktop is a desktop application that provides a native GUI Agent based on the UI-TARS model. It offers features such as natural language control powered by Vision-Language Model, screenshot and visual recognition support, precise mouse and keyboard control, cross-platform support (Windows/MacOS/Browser), real-time feedback and status display, and private and secure fully local processing. The application aims to enhance the user's computer experience, introduce new browser operation features, and support the advanced UI-TARS-1.5 model for improved performance and precise control.

README:

Introduction

English | 简体中文

TARS^* is a Multimodal AI Agent stack, currently shipping two projects: Agent TARS and UI-TARS-desktop:

Agent TARS	UI-TARS-desktop

Agent TARS is a general multimodal AI Agent stack, it brings the power of GUI Agent and Vision into your terminal, computer, browser and product. It primarily ships with a CLI and Web UI for usage. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.	UI-TARS Desktop is a desktop application that provides a native GUI Agent based on the UI-TARS model. It primarily ships a local and remote computer as well as browser operators.

News
Agent TARS
UI-TARS Desktop
Contributing
License
Citation

News

[2025-06-25] We released a Agent TARS Beta and Agent TARS CLI - Introducing Agent TARS Beta, a multimodal AI agent that aims to explore a work form that is closer to human-like task completion through rich multimodal capabilities (such as GUI Agent, Vision) and seamless integration with various real-world tools.
[2025-06-12] - 🎁 We are thrilled to announce the release of UI-TARS Desktop v0.2.0! This update introduces two powerful new features: Remote Computer Operator and Remote Browser Operator—both completely free. No configuration required: simply click to remotely control any computer or browser, and experience a new level of convenience and intelligence.
[2025-04-17] - 🎉 We're thrilled to announce the release of new UI-TARS Desktop application v0.1.0, featuring a redesigned Agent UI. The application enhances the computer using experience, introduces new browser operation features, and supports the advanced UI-TARS-1.5 model for improved performance and precise control.
[2025-02-20] - 📦 Introduced UI TARS SDK, is a powerful cross-platform toolkit for building GUI automation agents.
[2025-01-23] - 🚀 We updated the Cloud Deployment section in the 中文版: GUI模型部署教程 with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.

Agent TARS

Agent TARS is a general multimodal AI Agent stack, it brings the power of GUI Agent and Vision into your terminal, computer, browser and product.

It primarily ships with a CLI and Web UI for usage. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.

Showcase

Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline

https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8

Booking Hotel	Generate Chart with extra MCP Servers

Instruction: I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me	Instruction: Draw me a chart of Hangzhou's weather for one month

For more use cases, please check out #842.

Core Features

🖱️ One-Click Out-of-the-box CLI - Supports both headful Web UI and headless server) execution.
🌐 Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
🔄 Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
🧰 MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.

Quick Start

# Luanch with `npx`.
npx @agent-tars/cli@latest

# Install globally, required Node.js >= 22
npm install @agent-tars/cli@latest -g

# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key

Visit the comprehensive Quick Start guide for detailed setup instructions.

Documentation

🌟 Explore Agent TARS Universe 🌟

Category	Resource Link	Description
🏠 Central Hub		Your gateway to Agent TARS ecosystem
📚 Quick Start		Zero to hero in 5 minutes
🚀 What's New		Discover cutting-edge features & vision
🛠️ Developer Zone		Master every command & features
🎯 Showcase		View use cases built by the official and community
🔧 Reference		Complete technical reference

UI-TARS Desktop

UI-TARS Desktop is a native GUI agent for your local computer, driven by UI-TARS and Seed-1.5-VL/1.6 series models.

&nbsp&nbsp 📑 Paper &nbsp&nbsp | 🤗 Hugging Face Models&nbsp&nbsp | &nbsp&nbsp🫨 Discord&nbsp&nbsp | &nbsp&nbsp🤖 ModelScope&nbsp&nbsp
🖥️ Desktop Application &nbsp&nbsp | &nbsp&nbsp 👓 Midscene (use in browser) &nbsp&nbsp

Showcase

Instruction	Local Operator	Remote Operator
Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting.
Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub?

Features

🤖 Natural language control powered by Vision-Language Model
🖥️ Screenshot and visual recognition support
🎯 Precise mouse and keyboard control
💻 Cross-platform support (Windows/MacOS/Browser)
🔄 Real-time feedback and status display
🔐 Private and secure - fully local processing

Quick Start

See Quick Start

Contributing

See CONTRIBUTING.md.

License

This project is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝

@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}

For Tasks:

Click tags to check more tools for each tasks

control browsers generate chart book flights book hotels compile transportation guide

For Jobs:

ai engineer software developer automation specialist data scientist product manager

Alternative AI tools for UI-TARS-desktop

Similar Open Source Tools

UI-TARS-desktop

github

: 19.0k

pyspur

PySpur is a graph-based editor designed for LLM (Large Language Models) workflows. It offers modular building blocks, node-level debugging, and performance evaluation. The tool is easy to hack, supports JSON configs for workflow graphs, and is lightweight with minimal dependencies. Users can quickly set up PySpur by cloning the repository, creating a .env file, starting docker services, and accessing the portal. PySpur can also work with local models served using Ollama, with steps provided for configuration. The roadmap includes features like canvas, async/batch execution, support for Ollama, new nodes, pipeline optimization, templates, code compilation, multimodal support, and more.

github

: 4.2k

big-AGI

big-AGI is an AI suite designed for professionals seeking function, form, simplicity, and speed. It offers best-in-class Chats, Beams, and Calls with AI personas, visualizations, coding, drawing, side-by-side chatting, and more, all wrapped in a polished UX. The tool is powered by the latest models from 12 vendors and open-source servers, providing users with advanced AI capabilities and a seamless user experience. With continuous updates and enhancements, big-AGI aims to stay ahead of the curve in the AI landscape, catering to the needs of both developers and AI enthusiasts.

github

: 6.3k

deepchat

DeepChat is a versatile chat tool that supports multiple model cloud services and local model deployment. It offers multi-channel chat concurrency support, platform compatibility, complete Markdown rendering, and easy usability with a comprehensive guide. The tool aims to enhance chat experiences by leveraging various AI models and ensuring efficient conversation management.

github

: 4.1k

fastRAG

fastRAG is a research framework designed to build and explore efficient retrieval-augmented generative models. It incorporates state-of-the-art Large Language Models (LLMs) and Information Retrieval to empower researchers and developers with a comprehensive tool-set for advancing retrieval augmented generation. The framework is optimized for Intel hardware, customizable, and includes key features such as optimized RAG pipelines, efficient components, and RAG-efficient components like ColBERT and Fusion-in-Decoder (FiD). fastRAG supports various unique components and backends for running LLMs, making it a versatile tool for research and development in the field of retrieval-augmented generation.

github

: 1.3k

RAG-Retrieval

RAG-Retrieval is an end-to-end code repository that provides training, inference, and distillation capabilities for the RAG retrieval model. It supports fine-tuning of various open-source RAG retrieval models, including embedding models, late interactive models, and reranker models. The repository offers a lightweight Python library for calling different RAG ranking models and allows distillation of LLM-based reranker models into bert-based reranker models. It includes features such as support for end-to-end fine-tuning, distillation of large models, advanced algorithms like MRL, multi-GPU training strategy, and a simple code structure for easy modifications.

github

: 755

RAG-Retrieval

RAG-Retrieval provides full-chain RAG retrieval fine-tuning and inference code. It supports fine-tuning any open-source RAG retrieval models, including vector (embedding, graph a), delayed interactive models (ColBERT, graph d), interactive models (cross encoder, graph c). For inference, RAG-Retrieval focuses on ranking (reranker) and has developed a lightweight Python library rag-retrieval, providing a unified way to call any different RAG ranking models.

github

: 655

Kiln

Kiln is an intuitive tool for fine-tuning LLM models, generating synthetic data, and collaborating on datasets. It offers desktop apps for Windows, MacOS, and Linux, zero-code fine-tuning for various models, interactive data generation, and Git-based version control. Users can easily collaborate with QA, PM, and subject matter experts, generate auto-prompts, and work with a wide range of models and providers. The tool is open-source, privacy-first, and supports structured data tasks in JSON format. Kiln is free to use and helps build high-quality AI products with datasets, facilitates collaboration between technical and non-technical teams, allows comparison of models and techniques without code, ensures structured data integrity, and prioritizes user privacy.

github

: 4.2k

kornia

Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer vision problems. At its core, the package uses PyTorch as its main backend both for efficiency and to take advantage of the reverse-mode auto-differentiation to define and compute the gradient of complex functions.

github

: 9.9k

denser-retriever

Denser Retriever is an enterprise-grade AI retriever designed to streamline AI integration into applications, combining keyword-based searches, vector databases, and machine learning rerankers using xgboost. It provides state-of-the-art accuracy on MTEB Retrieval benchmarking and supports various heterogeneous retrievers for end-to-end applications like chatbots and semantic search.

github

: 119

llm-awq

AWQ (Activation-aware Weight Quantization) is a tool designed for efficient and accurate low-bit weight quantization (INT3/4) for Large Language Models (LLMs). It supports instruction-tuned models and multi-modal LMs, providing features such as AWQ search for accurate quantization, pre-computed AWQ model zoo for various LLMs, memory-efficient 4-bit linear in PyTorch, and efficient CUDA kernel implementation for fast inference. The tool enables users to run large models on resource-constrained edge platforms, delivering more efficient responses with LLM/VLM chatbots through 4-bit inference.

github

: 2.6k

assistant-ui

assistant-ui is a set of React components for AI chat, providing wide model provider support out of the box and the ability to integrate custom APIs. It includes integrations with Langchain, Vercel AI SDK, TailwindCSS, shadcn-ui, react-markdown, react-syntax-highlighter, React Hook Form, and more. The tool allows users to quickly create AI chat applications with pre-configured templates and easy setup steps.

github

: 6.5k

HuatuoGPT-o1

HuatuoGPT-o1 is a medical language model designed for advanced medical reasoning. It can identify mistakes, explore alternative strategies, and refine answers. The model leverages verifiable medical problems and a specialized medical verifier to guide complex reasoning trajectories and enhance reasoning through reinforcement learning. The repository provides access to models, data, and code for HuatuoGPT-o1, allowing users to deploy the model for medical reasoning tasks.

github

: 480

nexa-sdk

Nexa SDK is a comprehensive toolkit supporting ONNX and GGML models for text generation, image generation, vision-language models (VLM), and text-to-speech (TTS) capabilities. It offers an OpenAI-compatible API server with JSON schema mode and streaming support, along with a user-friendly Streamlit UI. Users can run Nexa SDK on any device with Python environment, with GPU acceleration supported. The toolkit provides model support, conversion engine, inference engine for various tasks, and differentiating features from other tools.

github

: 5.0k

shards

Shards is a high-performance, multi-platform, type-safe programming language designed for visual development. It is a dataflow visual programming language that enables building full-fledged apps and games without traditional coding. Shards features automatic type checking, optimized shard implementations for high performance, and an intuitive visual workflow for beginners. The language allows seamless round-trip engineering between code and visual models, empowering users to create multi-platform apps easily. Shards also powers an upcoming AI-powered game creation system, enabling real-time collaboration and game development in a low to no-code environment.

github

: 78

ai-marketplace-monitor

An intelligent tool that monitors Facebook Marketplace listings using AI to help users find the best deals. It provides instant notifications when items matching specific criteria are posted, along with AI-powered analysis of each listing. The tool offers smart search capabilities, AI-powered listing evaluation and recommendations, various notification options, support for multiple locations, and customizable search parameters. Users can configure the tool to search for specific products, filter by price and location, and receive notifications through different channels. The tool also supports AI service providers and offers a self-hosted model option.

github

: 66

For similar tasks

amadeus-python

Amadeus Python SDK provides a rich set of APIs for the travel industry. It allows users to make API calls for various travel-related tasks such as flight offers search, hotel bookings, trip purpose prediction, flight delay prediction, airport on-time performance, travel recommendations, and more. The SDK conveniently maps API paths to similar paths, making it easy to interact with the Amadeus APIs. Users can initialize the client with their API key and secret, make API calls, handle responses, and enable logging for debugging purposes. The SDK documentation includes detailed information about each SDK method, arguments, and return types.

github

: 212

temporal-ai-agent

Temporal AI Agent is a demo showcasing a multi-turn conversation with an AI agent running inside a Temporal workflow. The agent collects information towards a goal using a simple DSL input. It is currently set up to search for events, book flights around those events, and create an invoice for those flights. The AI agent responds with clarifications and prompts for missing information. Users can configure the agent to use ChatGPT 4o or a local LLM via Ollama. The tool requires Rapidapi key for sky-scrapper to find flights and a Stripe key for creating invoices. Users can customize the agent by modifying tool and goal definitions in the codebase.

github

: 64

UI-TARS-desktop

github

: 19.0k

amadeus-node

Amadeus Node SDK provides a rich set of APIs for the travel industry. It allows developers to interact with various endpoints related to flights, hotels, activities, and more. The SDK simplifies making API calls, handling promises, pagination, logging, and debugging. It supports a wide range of functionalities such as flight search, booking, seat maps, flight status, points of interest, hotel search, sentiment analysis, trip predictions, and more. Developers can easily integrate the SDK into their Node.js applications to access Amadeus APIs and build travel-related applications.

github

: 186

amadeus-java

Amadeus Java SDK provides a rich set of APIs for the travel industry, allowing developers to access various functionalities such as flight search, booking, airport information, and more. The SDK simplifies interaction with the Amadeus API by providing self-contained code examples and detailed documentation. Developers can easily make API calls, handle responses, and utilize features like pagination and logging. The SDK supports various endpoints for tasks like flight search, booking management, airport information retrieval, and travel analytics. It also offers functionalities for hotel search, booking, and sentiment analysis. Overall, the Amadeus Java SDK is a comprehensive tool for integrating Amadeus APIs into Java applications.

github

: 87

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 668

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k

UI-TARS-desktop

README:

Introduction

Table of Contents

News

Agent TARS

Showcase

Core Features

Quick Start

Documentation

UI-TARS Desktop

Showcase

Features

Quick Start

Contributing

License

Citation

For Tasks:

For Jobs:

Alternative AI tools for UI-TARS-desktop

Similar Open Source Tools

UI-TARS-desktop

pyspur

big-AGI

deepchat

fastRAG

RAG-Retrieval

RAG-Retrieval

Kiln

kornia

denser-retriever

llm-awq

assistant-ui

HuatuoGPT-o1

nexa-sdk

shards

ai-marketplace-monitor

For similar tasks

amadeus-python

temporal-ai-agent

UI-TARS-desktop

amadeus-node

amadeus-java

For similar jobs

sweep

teams-ai

ai-guide

classifai

chatbot-ui

BricksLLM

uAgents

griptape