
mobile-use
AI agents can now use real Android and iOS apps, just like a human.
Stars: 1666

Mobile-use is an open-source AI agent that controls Android or IOS devices using natural language. It understands commands to perform tasks like sending messages and navigating apps. Features include natural language control, UI-aware automation, data scraping, and extensibility. Users can automate their mobile experience by setting up environment variables, customizing LLM configurations, and launching the tool via Docker or manually for development. The tool supports physical Android phones, Android simulators, and iOS simulators. Contributions are welcome, and the project is licensed under MIT.
README:
Mobile-use is a powerful, open-source AI agent that controls your Android or IOS device using natural language. It understands your commands and interacts with the UI to perform tasks, from sending messages to navigating complex apps.
Mobile-use is quickly evolving. Your suggestions, ideas, and reported bugs will shape this project. Do not hesitate to join in the conversation on Discord or contribute directly, we will reply to everyone! ❤️
- 🗣️ Natural Language Control: Interact with your phone using your native language.
- 📱 UI-Aware Automation: Intelligently navigates through app interfaces (note: currently has limited effectiveness with games as they don't provide accessibility tree data).
- 📊 Data Scraping: Extract information from any app and structure it into your desired format (e.g., JSON) using a natural language description.
- 🔧 Extensible & Customizable: Easily configure different LLMs to power the agents that power mobile-use.
We are global number 1 Opensource pass@1 on the AndroidWorld benchmark.
More info here: https://minitap.ai/research/mobile-ai-agents-benchmark
The official leaderboard is available here
Ready to automate your mobile experience? Follow these steps to get mobile-use up and running.
-
Set up Environment Variables: Copy the example
.env.example
file to.env
and add your API keys.cp .env.example .env
-
(Optional) Customize LLM Configuration: To use different models or providers, create your own LLM configuration file.
cp llm-config.override.template.jsonc llm-config.override.jsonc
Then, edit
llm-config.override.jsonc
to fit your needs.You can also use local LLMs or any other openai-api compatible providers :
- Set
OPENAI_BASE_URL
andOPENAI_API_KEY
in your.env
- In your
llm-config.override.jsonc
, setopenai
as the provider for the agent nodes you want, and choose a model supported by your provider.
[!NOTE]
If you want to use Google Vertex AI, you must either:- Have credentials configured for your environment (gcloud, workload identity, etc…)
- Store the path to a service account JSON file as the GOOGLE_APPLICATION_CREDENTIALS environment variable
More information: - Credential types - google.auth API reference
- Set
[!NOTE]
This quickstart, is only available for Android devices/emulators as of now, and you must have Docker installed.
First:
- Either plug your Android device and enable USB-debugging via the Developer Options
- Or launch an Android emulator
[!IMPORTANT]
At some point, the terminal will HANG, and Maestro will ask youMaestro CLI would like to collect anonymous usage data to improve the product.
It's up to you whether you accept (i.e enter 'Y') or not (i.e. enter 'n').
Then run in your terminal:
- For Linux/macOS:
chmod +x mobile-use.sh
bash ./mobile-use.sh \
"Open Gmail, find first 3 unread emails, and list their sender and subject line" \
--output-description "A JSON list of objects, each with 'sender' and 'subject' keys"
- For Windows (inside a Powershell terminal):
powershell.exe -ExecutionPolicy Bypass -File mobile-use.ps1 `
"Open Gmail, find first 3 unread emails, and list their sender and subject line" `
--output-description "A JSON list of objects, each with 'sender' and 'subject' keys"
[!NOTE]
If using your own device, make sure to accept the ADB-related connection requests that will pop up on your device. Similarly, Maestro will need to install its APK on your device, which will also require you to accept the installation request.
The script will try to connect to your device via IP.
Therefore, your device must be connected to the same Wi-Fi network as your computer.
If the script fails with the following message:
Could not get device IP. Is a device connected via USB and on the same Wi-Fi network?
Then it couldn't find one of the common Wi-Fi interfaces on your device.
Therefore, you must determine what WLAN interface your phone is using via adb shell ip addr show up
.
Then add the --interface <YOUR_INTERFACE_NAME>
option to the script.
This is most probably an issue with your firewall blocking the connection. Therefore there is no clear fix for this.
Since UV docker images rely on a ghcr.io
public repositories, you may have an expired token if you used ghcr.io
before for private repositories.
Try running docker logout ghcr.io
and then run the script again.
For developers who want to set up the environment manually:
Mobile-use currently supports the following devices:
- Physical Android Phones: Connect via USB with USB debugging enabled.
- Android Simulators: Set up through Android Studio.
- iOS Simulators: Supported for macOS users.
[!NOTE]
Physical iOS devices are not yet supported.
For Android:
- Android Debug Bridge (ADB): A tool to connect to your device.
For iOS:
- Xcode: Apple's IDE for iOS development.
Before you begin, ensure you have the following installed:
- uv: A lightning-fast Python package manager.
- Maestro: The framework we use to interact with your device.
-
Clone the repository:
git clone https://github.com/minitap-ai/mobile-use.git && cd mobile-use
-
Create & activate the virtual environment:
# This will create a .venv directory using the Python version in .python-version uv venv # Activate the environment # On macOS/Linux: source .venv/bin/activate # On Windows: .venv\Scripts\activate
-
Install dependencies:
# Sync with the locked dependencies for a consistent setup uv sync
To run mobile-use, simply pass your command as an argument.
Example 1: Basic Command
python ./src/mobile_use/main.py "Go to settings and tell me my current battery level"
Example 2: Data Scraping
Extract specific information and get it back in a structured format. For instance, to get a list of your unread emails:
python ./src/mobile_use/main.py \
"Open Gmail, find all unread emails, and list their sender and subject line" \
--output-description "A JSON list of objects, each with 'sender' and 'subject' keys"
[!NOTE]
If you haven't configured a specific model, mobile-use will prompt you to choose one from the available options.
This diagram is automatically updated from the codebase. This is our current agentic system architecture.
We love contributions! Whether you're fixing a bug, adding a feature, or improving documentation, your help is welcome. Please read our Contributing Guidelines to get started.
This project is licensed under the MIT License - see the LICENSE file for details.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for mobile-use
Similar Open Source Tools

mobile-use
Mobile-use is an open-source AI agent that controls Android or IOS devices using natural language. It understands commands to perform tasks like sending messages and navigating apps. Features include natural language control, UI-aware automation, data scraping, and extensibility. Users can automate their mobile experience by setting up environment variables, customizing LLM configurations, and launching the tool via Docker or manually for development. The tool supports physical Android phones, Android simulators, and iOS simulators. Contributions are welcome, and the project is licensed under MIT.

ChatGPT-desktop
ChatGPT Desktop Application is a multi-platform tool that provides a powerful AI wrapper for generating text. It offers features like text-to-speech, exporting chat history in various formats, automatic application upgrades, system tray hover window, support for slash commands, customization of global shortcuts, and pop-up search. The application is built using Tauri and aims to enhance user experience by simplifying text generation tasks. It is available for Mac, Windows, and Linux, and is designed for personal learning and research purposes.

Local-File-Organizer
The Local File Organizer is an AI-powered tool designed to help users organize their digital files efficiently and securely on their local device. By leveraging advanced AI models for text and visual content analysis, the tool automatically scans and categorizes files, generates relevant descriptions and filenames, and organizes them into a new directory structure. All AI processing occurs locally using the Nexa SDK, ensuring privacy and security. With support for multiple file types and customizable prompts, this tool aims to simplify file management and bring order to users' digital lives.

open-webui
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs. For more information, be sure to check out our Open WebUI Documentation.

upscayl
Upscayl is a free and open-source AI image upscaler that uses advanced AI algorithms to enlarge and enhance low-resolution images without losing quality. It is a cross-platform application built with the Linux-first philosophy, available on all major desktop operating systems. Upscayl utilizes Real-ESRGAN and Vulkan architecture for image enhancement, and its backend is fully open-source under the AGPLv3 license. It is important to note that a Vulkan compatible GPU is required for Upscayl to function effectively.

code2prompt
code2prompt is a command-line tool that converts your codebase into a single LLM prompt with a source tree, prompt templating, and token counting. It automates generating LLM prompts from codebases of any size, customizing prompt generation with Handlebars templates, respecting .gitignore, filtering and excluding files using glob patterns, displaying token count, including Git diff output, copying prompt to clipboard, saving prompt to an output file, excluding files and folders, adding line numbers to source code blocks, and more. It helps streamline the process of creating LLM prompts for code analysis, generation, and other tasks.

langmanus
LangManus is a community-driven AI automation framework that combines language models with specialized tools for tasks like web search, crawling, and Python code execution. It implements a hierarchical multi-agent system with agents like Coordinator, Planner, Supervisor, Researcher, Coder, Browser, and Reporter. The framework supports LLM integration, search and retrieval tools, Python integration, workflow management, and visualization. LangManus aims to give back to the open-source community and welcomes contributions in various forms.

preswald
Preswald is a full-stack platform for building, deploying, and managing interactive data applications in Python. It simplifies the process by combining ingestion, storage, transformation, and visualization into one lightweight SDK. With Preswald, users can connect to various data sources, customize app themes, and easily deploy apps locally. The platform focuses on code-first simplicity, end-to-end coverage, and efficiency by design, making it suitable for prototyping internal tools or deploying production-grade apps with reduced complexity and cost.

chatgpt-vscode
ChatGPT-VSCode is a Visual Studio Code integration that allows users to prompt OpenAI's GPT-4, GPT-3.5, GPT-3, and Codex models within the editor. It offers features like using improved models via OpenAI API Key, Azure OpenAI Service deployments, generating commit messages, storing conversation history, explaining and suggesting fixes for compile-time errors, viewing code differences, and more. Users can customize prompts, quick fix problems, save conversations, and export conversation history. The extension is designed to enhance developer experience by providing AI-powered assistance directly within VS Code.

middleware
Middleware is an open-source engineering management tool that helps engineering leaders measure and analyze team effectiveness using DORA metrics. It integrates with CI/CD tools, automates DORA metric collection and analysis, visualizes key performance indicators, provides customizable reports and dashboards, and integrates with project management platforms. Users can set up Middleware using Docker or manually, generate encryption keys, set up backend and web servers, and access the application to view DORA metrics. The tool calculates DORA metrics using GitHub data, including Deployment Frequency, Lead Time for Changes, Mean Time to Restore, and Change Failure Rate. Middleware aims to provide DORA metrics to users based on their Git data, simplifying the process of tracking software delivery performance and operational efficiency.

ai-telephony-demo
Build a fully functional AI telephony agent using VideoSDK Agent. The project covers setting up the agent locally, configuring SIP trunks for inbound and outbound calls, and connecting the agent to the phone network. It provides step-by-step instructions, including creating environment variables, installing dependencies, and running the Python script. The agent can handle incoming calls, greet users, engage in conversations using natural speech, and respond using the Gemini Live model with voice synthesis. Additionally, it explains how to make outbound calls through API requests to the VideoSDK SIP endpoint. The project aims to help users create and deploy an AI agent for telephony tasks.

gitingest
GitIngest is a tool that allows users to turn any Git repository into a prompt-friendly text ingest for LLMs. It provides easy code context by generating a text digest from a git repository URL or directory. The tool offers smart formatting for optimized output format for LLM prompts and provides statistics about file and directory structure, size of the extract, and token count. GitIngest can be used as a CLI tool on Linux and as a Python package for code integration. The tool is built using Tailwind CSS for frontend, FastAPI for backend framework, tiktoken for token estimation, and apianalytics.dev for simple analytics. Users can self-host GitIngest by building the Docker image and running the container. Contributions to the project are welcome, and the tool aims to be beginner-friendly for first-time contributors with a simple Python and HTML codebase.

air-script
Air Script is a versatile tool designed for Wi-Fi penetration testing, offering automated and user-friendly features to streamline the hacking process. It allows users to easily capture handshakes from nearby networks, automate attacks, and even send email notifications upon completion. The tool is ideal for individuals looking to efficiently pwn Wi-Fi networks without extensive manual input. With additional tools and options available, Air Script caters to a wide range of users, including script kiddies, hackers, pentesters, and security researchers. Whether on the go or using a Raspberry Pi, Air Script provides a convenient solution for network penetration testing and password cracking.

Deep-Live-Cam
Deep-Live-Cam is a software tool designed to assist artists in tasks such as animating custom characters or using characters as models for clothing. The tool includes built-in checks to prevent unethical applications, such as working on inappropriate media. Users are expected to use the tool responsibly and adhere to local laws, especially when using real faces for deepfake content. The tool supports both CPU and GPU acceleration for faster processing and provides a user-friendly GUI for swapping faces in images or videos.

transcriptionstream
Transcription Stream is a self-hosted diarization service that works offline, allowing users to easily transcribe and summarize audio files. It includes a web interface for file management, Ollama for complex operations on transcriptions, and Meilisearch for fast full-text search. Users can upload files via SSH or web interface, with output stored in named folders. The tool requires a NVIDIA GPU and provides various scripts for installation and running. Ports for SSH, HTTP, Ollama, and Meilisearch are specified, along with access details for SSH server and web interface. Customization options and troubleshooting tips are provided in the documentation.

openroleplay.ai
Open Roleplay is an open-source alternative to Character.ai. It allows users to create their own AI characters, customize them, and generate images and voices for them. Open Roleplay also supports group chat and automatic translation. The tool is built with Next.js, React.js, Tailwind CSS, Vercel, Convex, and Clerk.
For similar tasks

mobile-use
Mobile-use is an open-source AI agent that controls Android or IOS devices using natural language. It understands commands to perform tasks like sending messages and navigating apps. Features include natural language control, UI-aware automation, data scraping, and extensibility. Users can automate their mobile experience by setting up environment variables, customizing LLM configurations, and launching the tool via Docker or manually for development. The tool supports physical Android phones, Android simulators, and iOS simulators. Contributions are welcome, and the project is licensed under MIT.

dspy.rb
DSPy.rb is a Ruby framework for building reliable LLM applications using composable, type-safe modules. It enables developers to define typed signatures and compose them into pipelines, offering a more structured approach compared to traditional prompting. The framework embraces Ruby conventions and adds innovations like CodeAct agents and enhanced production instrumentation, resulting in scalable LLM applications that are robust and efficient. DSPy.rb is actively developed, with a focus on stability and real-world feedback through the 0.x series before reaching a stable v1.0 API.

tuui
TUUI is a desktop MCP client designed for accelerating AI adoption through the Model Context Protocol (MCP) and enabling cross-vendor LLM API orchestration. It is an LLM chat desktop application based on MCP, created using AI-generated components with strict syntax checks and naming conventions. The tool integrates AI tools via MCP, orchestrates LLM APIs, supports automated application testing, TypeScript, multilingual, layout management, global state management, and offers quick support through the GitHub community and official documentation.

blurr
Panda is a proactive, on-device AI agent for Android that autonomously understands natural language commands and operates your phone's UI to achieve them. It acts as a personal operator, handling complex, multi-step tasks across different applications. With intelligent UI automation, high-quality voice, and personalized local memory, Panda simplifies interactions with technology. Built on Kotlin, Panda's architecture includes Eyes & Hands for physical device connection, The Brain for reasoning, and The Agent for execution. The project is a proof-of-concept aiming to become an indispensable assistant.

PythonPark
PythonPark is a paradise for learning Python, providing babysitter-level tutorials on AI labs, treasure videos, data structures, study guides, machine learning practicals, deep learning practicals, Python basics, web scraping, big company interview experiences, programming life, and resource sharing. Original articles are published at least twice a week, with the latest articles being first released on WeChat and videos on Bilibili. Join the WeChat group for technical discussions or to provide feedback. Continuously improving and outputting content!

Fyin
Fyin is an open-source tool that serves as an alternative to Perplexity AI, allowing users to run it locally for faster answers. It features the ability to run locally using ollama or OpenAI API, a local VectorDB for fast search, quick searching, scraping & answering due to parallelism, configurable number of search results to parse, and local scraping of websites. The tool aims to provide a more efficient and customizable solution for obtaining answers through search and scraping functionalities.

ArcadiaScriptPublic
ArcadiaScriptPublic is a repository containing various scripts for learning and practicing JavaScript, Python, and Shell scripting. It is intended for testing and educational purposes only, and not for commercial use. The repository does not guarantee the legality, accuracy, completeness, or effectiveness of the scripts, and users are advised to use them at their own discretion. No resources from the repository are allowed to be republished or redistributed by any public account or self-media. The repository owner disclaims any responsibility for script-related issues, including losses or damages resulting from script errors. Users indirectly utilizing the scripts, such as setting up VPS or engaging in activities that violate national/regional laws or regulations, are solely responsible for any privacy leaks or consequences. If any entity or individual believes that the scripts in the project may infringe upon their rights, they should promptly notify and provide proof of identity and ownership, upon which the relevant scripts will be removed after verification. Anyone viewing or using the scripts in this project should carefully read and accept the disclaimer provided by zjk2017/ArcadiaScriptPublic, as the repository reserves the right to change or supplement the disclaimer at any time. Users must completely delete the downloaded content from their computers or phones within 24 hours of downloading, and any form of profit chain generation is strictly prohibited.

browser
Lightpanda Browser is an open-source headless browser designed for fast web automation, AI agents, LLM training, scraping, and testing. It features ultra-low memory footprint, exceptionally fast execution, and compatibility with Playwright and Puppeteer through CDP. Built for performance, Lightpanda offers Javascript execution, support for Web APIs, and is optimized for minimal memory usage. It is a modern solution for web scraping and automation tasks, providing a lightweight alternative to traditional browsers like Chrome.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.