droidclaw

turn old phones into ai agents - give it a goal in plain english. it reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done.

Droidclaw is an experimental tool that turns old Android devices into AI agents. You give it a goal in plain English; it reads the screen, asks an LLM what to do next, executes the action over ADB, and repeats until the goal is done. It can delegate tasks to AI services such as ChatGPT, Gemini, or Google Search running on the device, and it works against whatever apps are installed, so there are no app APIs to integrate. Droidclaw offers two automation modes: workflows, where an LLM decides how to act, and flows, which replay fixed sequences of actions without any LLM calls. It supports several LLM providers and can be controlled remotely over Tailscale, turning an old Android phone into an always-on automation agent.

README:

droidclaw

experimental. i wanted to build something to turn my old android devices into ai agents. after a few hours reverse engineering accessibility trees and the kernel and playing with tailscale.. it worked.

ai agent that controls your android phone. give it a goal in plain english - it figures out what to tap, type, and swipe. it reads the screen, asks an llm what to do, executes via adb, and repeats until the job is done.

one of the biggest things it can do right now is delegate incoming requests to chatgpt, gemini, or google search on the device... and hand the result back. a few years back this kind of automation meant predefined flows. now think of it as automation with ai intelligence... it can just do stuff. you don't need to worry about messy apis. install your fav apps, write workflows, or give it instructions on the fly. it will get it done.

$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)

setup

curl -fsSL https://droidclaw.ai/install.sh | sh

installs bun and adb if missing, clones the repo, sets up .env. or do it manually:

# install adb
brew install android-platform-tools

# install bun (required — npm/node won't work)
curl -fsSL https://bun.sh/install | bash

# clone and setup
git clone https://github.com/unitedbyai/droidclaw.git
cd droidclaw && bun install
cp .env.example .env

note: droidclaw requires bun, not node/npm. it uses bun-specific apis (Bun.spawnSync, native .env loading) that don't exist in node.
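for context, here's roughly what that looks like - a minimal sketch, not droidclaw's actual code, just the kind of bun-only call it relies on:

// Bun.spawnSync is a bun-specific api: it runs a command synchronously
// and returns stdout/stderr buffers plus the exit status.
const result = Bun.spawnSync(["adb", "shell", "input", "tap", "540", "1200"]);
if (!result.success) {
  console.error("adb call failed:", result.stderr.toString());
} else {
  console.log(result.stdout.toString());
}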

edit .env - fastest way to start is with groq (free tier):

LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here

or run fully local with ollama (no api key needed):

ollama pull llama3.2
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.2

connect your phone (usb debugging on):

adb devices   # should show your device
bun run src/kernel.ts

that's the simplest way - just type a goal and let the agent figure it out. but for anything you want to run repeatedly, there are two modes: workflows and flows.

workflows

workflows are ai-powered. you describe goals in natural language, and the llm decides how to navigate, what to tap, what to type. use these when the ui might change, when you need the agent to think, or when chaining goals across multiple apps.

bun run src/kernel.ts --workflow examples/workflows/research/weather-to-whatsapp.json

each workflow is a json file - just a name and a list of steps:

{
  "name": "weather to whatsapp",
  "steps": [
    { "app": "com.google.android.googlequicksearchbox", "goal": "search for chennai weather today" },
    { "goal": "share the result to whatsapp contact Sanju" }
  ]
}

you can also pass form data into steps when you need to inject specific text:

{
  "name": "slack standup",
  "steps": [
    {
      "app": "com.Slack",
      "goal": "open #standup channel, type the message and send it",
      "formData": { "Message": "yesterday: api integration\ntoday: tests\nblockers: none" }
    }
  ]
}

examples

35 ready-to-use workflows organised by category:

messaging - whatsapp, telegram, slack, email

social - instagram, youtube, cross-posting

productivity - calendar, notes, github, notifications

research - search, compare, monitor

lifestyle - food, transport, music, fitness

flows

for tasks where you don't need ai thinking at all - just a fixed sequence of taps and types. no llm calls, instant execution. good for things you do exactly the same way every time.

bun run src/kernel.ts --flow examples/flows/send-whatsapp.yaml

each flow is a yaml file - an app id, a name, then the list of actions:

appId: com.whatsapp
name: Send WhatsApp Message
---
- launchApp
- wait: 2
- tap: "Contact Name"
- wait: 1
- tap: "Message"
- type: "hello from droidclaw"
- tap: "Send"
- done: "Message sent"

examples

5 flow templates in examples/flows/.

quick comparison

                     workflows                 flows
format               json                      yaml
uses ai              yes                       no
handles ui changes   yes                       no
speed                slower (llm calls)        instant
best for             complex/multi-app tasks   simple repeatable tasks

providers

provider     cost           vision   notes
groq         free tier      no       fastest to start
ollama       free (local)   yes*     no api key, runs on your machine
openrouter   per token      yes      200+ models
openai       per token      yes      gpt-4o
bedrock      per token      yes      claude on aws

*ollama vision requires a vision model like llama3.2-vision or llava

config

all in .env:

key               default    what
MAX_STEPS         30         steps before giving up
STEP_DELAY        2          seconds between actions
STUCK_THRESHOLD   3          steps before stuck recovery
VISION_MODE       fallback   off / fallback / always
MAX_ELEMENTS      40         ui elements sent to llm
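these map one-to-one to lines in .env. for example, to give a slow device more headroom and always send screenshots to the llm (values here are illustrative, not recommendations):

MAX_STEPS=50
STEP_DELAY=3
VISION_MODE=always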

how it works

each step: dump accessibility tree → filter elements → send to llm → execute action → repeat.

the llm thinks before acting - returns { think, plan, action }. if the screen doesn't change for 3 steps, stuck recovery kicks in. when the accessibility tree is empty (webviews, flutter), it falls back to screenshots.
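a minimal sketch of that loop in typescript - the helper names are made up for illustration, this is not the real kernel.ts:

type AgentAction = { think: string; plan: string; action: string };

// stand-ins for the real adb / llm plumbing
declare function dumpAccessibilityTree(): string;
declare function filterElements(tree: string, max: number): string[];
declare function askLlm(goal: string, elements: string[], opts: { stuck: boolean }): Promise<AgentAction>;
declare function executeAction(action: string): void;

async function runGoal(goal: string, maxSteps = 30): Promise<void> {
  let lastScreen = "";
  let stuck = 0;

  for (let step = 1; step <= maxSteps; step++) {
    // dump the accessibility tree and keep at most MAX_ELEMENTS elements
    const screen = dumpAccessibilityTree();
    const elements = filterElements(screen, 40);

    // stuck detection: an unchanged screen for STUCK_THRESHOLD steps triggers recovery
    stuck = screen === lastScreen ? stuck + 1 : 0;
    lastScreen = screen;

    // the llm thinks before acting and answers with { think, plan, action }
    const { action } = await askLlm(goal, elements, { stuck: stuck >= 3 });

    // execute via adb and repeat until it decides it's done
    if (action === "done") return;
    executeAction(action);
  }
}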

source

src/
  kernel.ts          main loop
  actions.ts         22 actions + adb retry
  skills.ts          6 multi-step skills
  workflow.ts        workflow orchestration
  flow.ts            yaml flow runner
  llm-providers.ts   5 providers + system prompt
  sanitizer.ts       accessibility xml parser
  config.ts          env config
  constants.ts       keycodes, coordinates
  logger.ts          session logging

remote control with tailscale

the default setup is usb - phone plugged into your laptop. but you can go further.

install tailscale on both your android device and your laptop/vps. once they're on the same tailnet, connect adb over the network:

# on your phone: enable wireless debugging (developer options → wireless debugging)
# note the ip:port shown on the screen

# from your laptop/vps, anywhere in the world:
adb connect <phone-tailscale-ip>:<port>
adb devices   # should show your phone

bun run src/kernel.ts

now your phone is a remote ai agent. leave it on a desk, plugged into power, and control it from your vps, your laptop at a cafe, or a cron job running workflows at 8am every morning. the phone doesn't need to be on the same wifi or even in the same country.
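for the cron case, a single crontab line is enough - the paths below are examples, point them at wherever bun and your clone actually live:

0 8 * * * cd $HOME/droidclaw && $HOME/.bun/bin/bun run src/kernel.ts --workflow examples/workflows/research/weather-to-whatsapp.json >> /tmp/droidclaw.log 2>&1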

this is what makes old android devices useful again - they become always-on agents that can do things in apps that don't have apis.

troubleshooting

"adb: command not found" - install adb or set ADB_PATH in .env

"no devices found" - check usb debugging is on, tap "allow" on the phone

agent repeating - stuck detection handles this. if it persists, use a better model
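for the ADB_PATH fix above, one line in .env pointing at your adb binary is enough - the path below is just an example, use whatever which adb prints:

ADB_PATH=/opt/homebrew/bin/adb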

contributors

built by unitedby.ai — an open ai community

acknowledgements

droidclaw's workflow orchestration was influenced by android action kernel from action state labs. we took the core idea of sub-goal decomposition and built a different system around it — with stuck recovery, 22 actions, multi-step skills, and vision fallback.

license

mit
