
cli
Next generation autonomous AI agent for end-to-end testing of web & desktop
Stars: 191

TestDriver is an innovative test framework that automates and scales QA using computer-use agents. It leverages AI vision, mouse, and keyboard emulation to control the entire desktop, making it more like a QA employee than a traditional test framework. With TestDriver, users can easily set up tests without complex selectors, reduce maintenance efforts as tests don't break with code changes, and gain more power to test any application and control any OS setting.
README:
Automate and scale QA with computer-use agents.
Docs | Website | GitHub Action | Join our Discord
Follow the instructions on our docs for more..
TestDriver isn't like any test framework you've used before. TestDriver is an OS Agent for QA. TestDriver uses AI vision along with mouse and keyboard emulation to control the entire desktop. It's more like a QA employee than a test framework. This kind of black-box testing has some major advantages:
- Easier set up: No need to add test IDs or craft complex selectors
- Less Maintenance: Tests don't break when code changes
- More Power: TestDriver can test any application and control any OS setting
https://github.com/user-attachments/assets/7cb9ee5a-0d05-4ff0-a4fa-084bcee12e98
- Test any user flow on any website in any browser
- Clone, build, and test any desktop app
- Render multiple browser windows and popups like 3rd party auth
- Test
<canvas>
,<iframe>
, and<video>
tags with ease - Use file selectors to upload files to the browser
- Test chrome extensions
- Test integrations between applications
- Integrates into CI/CD via GitHub Actions ($)
Check out the docs.
- Tell TestDriver what to do in natural language on your local machine using
npm i testdriverai -g
- TestDriver looks at the screen and uses mouse and keyboard emulation to accomplish the goal
- Run TestDriver tests on our test infrastructure
In your project directory:
npx testdriverai@latest init
Let's show TestDriver what we want to test. Run the following command:
npx testdriverai@latest .testdriver/test.yaml
TestDriver best practice is to start instructing TestDriver with your app in it's initial state. For browsers, this means creating a new tab with the website you want to test.
If you have multiple monitors, make sure you do this on your primary display.
Now, just tell TestDriver what you want it to do. For now, stick with single commands like "click sign up" and "scroll down."
Later, try to perform higher level objectives like "complete the onboarding."
> Click on sign up
TestDriver Generates a Test
TestDriver will look at your screen and generate a test script. TestDriver can see the screen, control the mouse, keyboard, and more!
TestDriver can only see your primary display!
To navigate to testdriver.ai, we need to focus on the
Google Chrome application, click on the search bar, type
the URL, and then press Enter.
Here are the steps:
1. Focus on the Google Chrome application.
2. Click on the search bar.
3. Type "testdriver.ai".
4. Press Enter.
Let's start with focusing on the Google Chrome application
and clicking on the search bar.
commands:
- command: focus-application
name: Google Chrome
- command: hover-text
text: Search Google or type a URL
description: main google search
action: click
After this, we will type the URL and press Enter.
TestDriver will execute the commands found in the yml codeblocks of the response.
See the yml TestDriver generated? That's our own schema. You can learn more about it in the reference.
Take your hands off the mouse and keyboard while TestDriver executes! TestDriver is not a fan of backseat drivers.
Feel free to ask TestDriver to perform some more tasks. Every time you prompt TestDriver it will look at your screen and generate more test step to complete your goal.
> navigate to airbnb.com
> search for destinations in austin tx
> click check in
> select august 8
If something didn't work, you can use /undo
to remove all of the test steps added since the last prompt.
Now it's time to make sure the test plan works before we deploy it. Use testdriver run to run the test file you just created with /save .
npx testdriverai@latest run testdriver/test.yaml
Make sure to reset the test state!
Now it's time to deploy your test using our GitHub action! testdriver init
already did the work for you and will start triggering tests once you commit the new files to your repository.
git add .
git commit -am "Add TestDriver tests"
gh pr create --web
Your test will run on every commit and the results will be posted as a Dashcam.io video within your GitHub summary! Learn more about deploying on CI here.
TestDriver can also be used programmatically as a Node.js module. This is useful when you want to integrate TestDriver into your own applications or customize the test file paths.
By default, TestDriver looks for test files at testdriver/testdriver.yaml
relative to the current working directory. You can customize this:
const TestDriverAgent = require("testdriverai");
// Option 1: Set default via environment variable
const agent1 = new TestDriverAgent({
TD_DEFAULT_TEST_FILE: "my-tests/integration.yaml",
});
// Option 2: Explicitly specify test file
const agent2 = new TestDriverAgent(
{},
{
args: ["path/to/specific/test.yaml"],
},
);
// Option 3: Custom working directory + relative path
const agent3 = new TestDriverAgent(
{ TD_DEFAULT_TEST_FILE: "tests/smoke.yaml" },
{ options: { workingDir: "/path/to/your/project" } },
);
// Run the test
await agent1.run();
You can also set the default test file path using environment variables:
export TD_DEFAULT_TEST_FILE="custom/path/test.yaml"
node your-script.js
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for cli
Similar Open Source Tools

cli
TestDriver is an innovative test framework that automates and scales QA using computer-use agents. It leverages AI vision, mouse, and keyboard emulation to control the entire desktop, making it more like a QA employee than a traditional test framework. With TestDriver, users can easily set up tests without complex selectors, reduce maintenance efforts as tests don't break with code changes, and gain more power to test any application and control any OS setting.

testdriverai
TestDriver.ai is a unique test framework that acts as an OS Agent for QA, utilizing AI vision, mouse, and keyboard emulation to control the desktop. It simplifies testing setup, requires less maintenance, and offers more power to test any application and control any OS setting. Users can automate testing of user flows on websites, desktop apps, browser windows, popups, HTML elements, file uploads, chrome extensions, and application integrations. The tool allows users to instruct TestDriver in natural language, generate test scripts, execute tests, and deploy tests using GitHub Actions for continuous integration.

cog-comfyui
Cog-ComfyUI is a tool designed to run ComfyUI workflows on Replicate. It allows users to easily integrate their own workflows into their app or website using the Replicate API. The tool includes popular model weights and custom nodes, with the option to request more custom nodes or models. Users can get their API JSON, gather input files, and use custom LoRAs from CivitAI or HuggingFace. Additionally, users can run their workflows and set up their own dedicated instances for better performance and control. The tool provides options for private deployments, forking using Cog, or creating new models from the train tab on Replicate. It also offers guidance on developing locally and running the Web UI from a Cog container.

mentat
Mentat is an AI tool designed to assist with coding tasks directly from the command line. It combines human creativity with computer-like processing to help users understand new codebases, add new features, and refactor existing code. Unlike other tools, Mentat coordinates edits across multiple locations and files, with the context of the project already in mind. The tool aims to enhance the coding experience by providing seamless assistance and improving edit quality.

cog-comfyui
Cog-comfyui allows users to run ComfyUI workflows on Replicate. ComfyUI is a visual programming tool for creating and sharing generative art workflows. With cog-comfyui, users can access a variety of pre-trained models and custom nodes to create their own unique artworks. The tool is easy to use and does not require any coding experience. Users simply need to upload their API JSON file and any necessary input files, and then click the "Run" button. Cog-comfyui will then generate the output image or video file.

azure-search-openai-javascript
This sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure OpenAI Service to access the ChatGPT model (gpt-35-turbo), and Azure AI Search for data indexing and retrieval.

CLI
Bito CLI provides a command line interface to the Bito AI chat functionality, allowing users to interact with the AI through commands. It supports complex automation and workflows, with features like long prompts and slash commands. Users can install Bito CLI on Mac, Linux, and Windows systems using various methods. The tool also offers configuration options for AI model type, access key management, and output language customization. Bito CLI is designed to enhance user experience in querying AI models and automating tasks through the command line interface.

reai-ghidra
The RevEng.AI Ghidra Plugin by RevEng.ai allows users to interact with their API within Ghidra for Binary Code Similarity analysis to aid in Reverse Engineering stripped binaries. Users can upload binaries, rename functions above a confidence threshold, and view similar functions for a selected function.

llm-code-interpreter
The 'llm-code-interpreter' repository is a deprecated plugin that provides a code interpreter on steroids for ChatGPT by E2B. It gives ChatGPT access to a sandboxed cloud environment with capabilities like running any code, accessing Linux OS, installing programs, using filesystem, running processes, and accessing the internet. The plugin exposes commands to run shell commands, read files, and write files, enabling various possibilities such as running different languages, installing programs, starting servers, deploying websites, and more. It is powered by the E2B API and is designed for agents to freely experiment within a sandboxed environment.

clapper
Clapper is an open-source AI story visualization tool that can interpret screenplays and render them into storyboards, videos, voice, sound, and music. It is currently in early development stages and not recommended for general use due to some non-functional features and lack of tutorials. A public alpha version is available on Hugging Face's platform. Users can sponsor specific features through bounties and developers can contribute to the project under the GPL v3 license. The tool lacks automated tests and code conventions like Prettier or a Linter.

devika
Devika is an advanced AI software engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika utilizes large language models, planning and reasoning algorithms, and web browsing abilities to intelligently develop software. Devika aims to revolutionize the way we build software by providing an AI pair programmer who can take on complex coding tasks with minimal human guidance. Whether you need to create a new feature, fix a bug, or develop an entire project from scratch, Devika is here to assist you.

alexa-skill-llm-intent
An Alexa Skill template that provides a ready-to-use skill for starting a conversation with an AI. Users can ask questions and receive answers in Alexa's voice, powered by ChatGPT or other llm. The template includes setup instructions for configuring the AI provider API and model, as well as usage commands for interacting with the skill. It serves as a starting point for creating custom Alexa Skills and should be used at the user's own risk.

ai-voice-cloning
This repository provides a tool for AI voice cloning, allowing users to generate synthetic speech that closely resembles a target speaker's voice. The tool is designed to be user-friendly and accessible, with a graphical user interface that guides users through the process of training a voice model and generating synthetic speech. The tool also includes a variety of features that allow users to customize the generated speech, such as the pitch, volume, and speaking rate. Overall, this tool is a valuable resource for anyone interested in creating realistic and engaging synthetic speech.

OSWorld
OSWorld is a benchmarking tool designed to evaluate multimodal agents for open-ended tasks in real computer environments. It provides a platform for running experiments, setting up virtual machines, and interacting with the environment using Python scripts. Users can install the tool on their desktop or server, manage dependencies with Conda, and run benchmark tasks. The tool supports actions like executing commands, checking for specific results, and evaluating agent performance. OSWorld aims to facilitate research in AI by providing a standardized environment for testing and comparing different agent baselines.

HackBot
HackBot is an AI-powered cybersecurity chatbot designed to provide accurate answers to cybersecurity-related queries, conduct code analysis, and scan analysis. It utilizes the Meta-LLama2 AI model through the 'LlamaCpp' library to respond coherently. The chatbot offers features like local AI/Runpod deployment support, cybersecurity chat assistance, interactive interface, clear output presentation, static code analysis, and vulnerability analysis. Users can interact with HackBot through a command-line interface and utilize it for various cybersecurity tasks.

serverless-chat-langchainjs
This sample shows how to build a serverless chat experience with Retrieval-Augmented Generation using LangChain.js and Azure. The application is hosted on Azure Static Web Apps and Azure Functions, with Azure Cosmos DB for MongoDB vCore as the vector database. You can use it as a starting point for building more complex AI applications.
For similar tasks

testdriverai
TestDriver.ai is a unique test framework that acts as an OS Agent for QA, utilizing AI vision, mouse, and keyboard emulation to control the desktop. It simplifies testing setup, requires less maintenance, and offers more power to test any application and control any OS setting. Users can automate testing of user flows on websites, desktop apps, browser windows, popups, HTML elements, file uploads, chrome extensions, and application integrations. The tool allows users to instruct TestDriver in natural language, generate test scripts, execute tests, and deploy tests using GitHub Actions for continuous integration.

cli
TestDriver is an innovative test framework that automates and scales QA using computer-use agents. It leverages AI vision, mouse, and keyboard emulation to control the entire desktop, making it more like a QA employee than a traditional test framework. With TestDriver, users can easily set up tests without complex selectors, reduce maintenance efforts as tests don't break with code changes, and gain more power to test any application and control any OS setting.

psychic
Finic is an open source python-based integration platform designed to simplify integration workflows for both business users and developers. It offers a drag-and-drop UI, a dedicated Python environment for each workflow, and generative AI features to streamline transformation tasks. With a focus on decoupling integration from product code, Finic aims to provide faster and more flexible integrations by supporting custom connectors. The tool is open source and allows deployment to users' own cloud environments with minimal legal friction.

finic
Finic is an open source python-based integration platform designed for business users to create v1 integrations with minimal code, while also being flexible for developers to build complex integrations directly in python. It offers a low-code web UI, a dedicated Python environment for each workflow, and generative AI features. Finic decouples integration from product code, supports custom connectors, and is open source. It is not an ETL tool but focuses on integrating functionality between applications via APIs or SFTP, and it is not a workflow automation tool optimized for complex use cases.

specmatic
Eliminate API integration headaches with Specmatic's no-code AI-powered API development suite. Teams ship APIs 10x faster by transforming specifications into executable contracts instantly—no coding required, no integration surprises. In a complex, interdependent ecosystem, where each service is evolving rapidly, we want to make the dependencies between them explicit in the form of executable contracts. Contract Driven Development leverages API specifications like OpenAPI, AsyncAPI, GraphQL SDL files, gRPC Proto files, etc. as executable contracts allowing teams to get instantaneous feedback while making changes to avoid accidental breakage. With this ability, we can now independently deploy, at will, any service at any time without having to depend on expensive and fragile integration tests.

aim
Aim is a command-line tool for downloading and uploading files with resume support. It supports various protocols including HTTP, FTP, SFTP, SSH, and S3. Aim features an interactive mode for easy navigation and selection of files, as well as the ability to share folders over HTTP for easy access from other devices. Additionally, it offers customizable progress indicators and output formats, and can be integrated with other commands through piping. Aim can be installed via pre-built binaries or by compiling from source, and is also available as a Docker image for platform-independent usage.

allchat
ALLCHAT is a Node.js backend and React MUI frontend for an application that interacts with the Gemini Pro 1.5 (and others), with history, image generating/recognition, PDF/Word/Excel upload, code run, model function calls and markdown support. It is a comprehensive tool that allows users to connect models to the world with Web Tools, run locally, deploy using Docker, configure Nginx, and monitor the application using a dockerized monitoring solution (Loki+Grafana).

chat-xiuliu
Chat-xiuliu is a bidirectional voice assistant powered by ChatGPT, capable of accessing the internet, executing code, reading/writing files, and supporting GPT-4V's image recognition feature. It can also call DALL·E 3 to generate images. The project is a fork from a background of a virtual cat girl named Xiuliu, with removed live chat interaction and added voice input. It can receive questions from microphone or interface, answer them vocally, upload images and PDFs, process tasks through function calls, remember conversation content, search the web, generate images using DALL·E 3, read/write local files, execute JavaScript code in a sandbox, open local files or web pages, customize the cat girl's speaking style, save conversation screenshots, and support Azure OpenAI and other API endpoints in openai format. It also supports setting proxies and various AI models like GPT-4, GPT-3.5, and DALL·E 3.
For similar jobs

testzeus-hercules
Hercules is the world’s first open-source testing agent designed to handle the toughest testing tasks for modern web applications. It turns simple Gherkin steps into fully automated end-to-end tests, making testing simple, reliable, and efficient. Hercules adapts to various platforms like Salesforce and is suitable for CI/CD pipelines. It aims to democratize and disrupt test automation, making top-tier testing accessible to everyone. The tool is transparent, reliable, and community-driven, empowering teams to deliver better software. Hercules offers multiple ways to get started, including using PyPI package, Docker, or building and running from source code. It supports various AI models, provides detailed installation and usage instructions, and integrates with Nuclei for security testing and WCAG for accessibility testing. The tool is production-ready, open core, and open source, with plans for enhanced LLM support, advanced tooling, improved DOM distillation, community contributions, extensive documentation, and a bounty program.

SDET-GENIE
SDET-GENIE is a cutting-edge, AI-powered Quality Assurance (QA) automation framework that revolutionizes the software testing process. Leveraging a suite of specialized AI agents, SDET-GENIE transforms rough user stories into comprehensive, executable test automation code through a seamless end-to-end process. The framework integrates five powerful AI agents working in sequence: User Story Enhancement Agent, Manual Test Case Agent, Gherkin Scenario Agent, Browser Agent, and Code Generation Agent. It supports multiple testing frameworks and provides advanced browser automation capabilities with AI features.

cli
TestDriver is an innovative test framework that automates and scales QA using computer-use agents. It leverages AI vision, mouse, and keyboard emulation to control the entire desktop, making it more like a QA employee than a traditional test framework. With TestDriver, users can easily set up tests without complex selectors, reduce maintenance efforts as tests don't break with code changes, and gain more power to test any application and control any OS setting.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

ai-on-gke
This repository contains assets related to AI/ML workloads on Google Kubernetes Engine (GKE). Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. A robust AI/ML platform considers the following layers: Infrastructure orchestration that support GPUs and TPUs for training and serving workloads at scale Flexible integration with distributed computing and data processing frameworks Support for multiple teams on the same infrastructure to maximize utilization of resources

tidb
TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

nvidia_gpu_exporter
Nvidia GPU exporter for prometheus, using `nvidia-smi` binary to gather metrics.

tracecat
Tracecat is an open-source automation platform for security teams. It's designed to be simple but powerful, with a focus on AI features and a practitioner-obsessed UI/UX. Tracecat can be used to automate a variety of tasks, including phishing email investigation, evidence collection, and remediation plan generation.