Best AI tools for< Test Agents >
20 - AI tool Sites

Vocera
Vocera is an AI voice agent testing tool that allows users to test and monitor voice AI agents efficiently. It enables users to launch voice agents in minutes, ensuring a seamless conversational experience. With features like testing against AI-generated datasets, simulating scenarios, and monitoring AI performance, Vocera helps in evaluating and improving voice agent interactions. The tool provides real-time insights, detailed logs, and trend analysis for optimal performance, along with instant notifications for errors and failures. Vocera is designed to work for everyone, offering an intuitive dashboard and data-driven decision-making for continuous improvement.

Prompt Hippo
Prompt Hippo is an AI tool designed as a side-by-side LLM prompt testing suite to ensure the robustness, reliability, and safety of prompts. It saves time by streamlining the process of testing LLM prompts and allows users to test custom agents and optimize them for production. With a focus on science and efficiency, Prompt Hippo helps users identify the best prompts for their needs.

Hamming
Hamming is an AI tool designed to help automate voice agent testing and optimization. It offers features such as prompt optimization, automated voice testing, monitoring, and more. The platform allows users to test AI voice agents against simulated users, create optimized prompts, actively monitor AI app usage, and simulate customer calls to identify system gaps. Hamming is trusted by AI-forward enterprises and is built for inbound and outbound agents, including AI appointment scheduling, AI drive-through, AI customer support, AI phone follow-ups, AI personal assistant, and AI coaching and tutoring.

TestArmy
TestArmy is an AI-driven software testing platform that offers an army of testing agents to help users achieve software quality by balancing cost, speed, and quality. The platform leverages AI agents to generate Gherkin tests based on user specifications, automate test execution, and provide detailed logs and suggestions for test maintenance. TestArmy is designed for rapid scaling and adaptability to changes in the codebase, making it a valuable tool for both technical and non-technical users.

MAIHEM
MAIHEM is an AI-powered quality assurance platform that helps businesses test and improve the performance and safety of their AI applications. It automates the testing process, generates realistic test cases, and provides comprehensive analytics to help businesses identify and fix potential issues. MAIHEM is used by a variety of businesses, including those in the customer support, healthcare, education, and sales industries.

Elixir
Elixir is an AI tool designed for observability and testing of AI voice agents. It offers features such as automated testing, call review, monitoring, analytics, tracing, scoring, and reviewing. Elixir helps in simulating realistic test calls, analyzing conversations, identifying mistakes, and debugging issues with audio snippets and call transcripts. It provides detailed traces for complex abstractions, streamlines manual review processes, and allows for simulating thousands of calls for full test coverage. The tool is suitable for monitoring agent performance, detecting anomalies in real-time, and improving conversational systems through human-in-the-loop feedback.

Filuta AI
Filuta AI is an advanced AI application that redefines game testing with planning agents. It utilizes Composite AI with planning techniques to provide a 24/7 testing environment for smooth, bug-free releases. The application brings deep space technology to game testing, enabling intelligent agents to analyze game states, adapt in real time, and execute action sequences to achieve test goals. Filuta AI offers goal-driven testing, adaptive exploration, detailed insights, and shorter development cycles, making it a valuable tool for game developers, QA leads, game designers, automation engineers, and producers.

Danora
Danora is an AI application that offers Persona AI Agents to help personalize marketing strategies for Gen Z parents. These AI Agents decode online conversations to create virtual personas in real-time, providing actionable insights and enabling the launch of smarter, more personalized campaigns. The application gathers data from various platforms in multiple languages to deliver insights on trends, intent, emotions, behaviors, and sentiment of Gen Z parents. Users can interact with the AI Agents, ask questions, and receive instant answers on various topics. Danora aims to simplify the process from insight to action by offering a flexible and efficient solution for businesses to connect with their target audience effectively.

Coval
Coval is an AI tool designed to help users ship reliable AI agents faster by providing simulation and evaluations for voice and chat agents. It allows users to simulate thousands of scenarios from a few test cases, create prompts for testing, and evaluate agent interactions comprehensively. Coval offers AI-powered simulations, voice AI compatibility, performance tracking, workflow metrics, and customizable evaluation metrics to optimize AI agents efficiently.

Voiceflow
Voiceflow is a powerful, flexible, and collaborative platform for building AI automation. It allows teams of any size to build agents of any scale and complexity, easily. Voiceflow's visual workflow builder is used by developers and designers to collaboratively create, iterate, and ship complex agents. Voiceflow also offers a central CMS for managing all of your agent content, including variables, intents, entities, and knowledge base sources. With Voiceflow, you can integrate with any API or service, share and test prototypes, and launch agents to any interface.

WhenX
WhenX is an AI tool designed to create robots that monitor the web for users. It allows users to create Semantic Alerts by asking questions, searching the web for answers, and monitoring for any changes. Users can track updates on their favorite writers, job postings, or new product releases. WhenX is a personal project not intended for commercial use, and it is open source, built by edmar and hosted on Vercel.

Nunu.ai
Nunu.ai is an AI application focused on advancing Artificial General Intelligence (AGI) for games. The platform is dedicated to building multimodal gameplay agents that can test and play any game. These vision-based agents interact with games like humans, providing interpretable insights into their decision-making process. Nunu.ai introduces breakthrough capabilities in interactivity, reporting, and interpretability, specializing in Quality Assurance for gaming, particularly in open-world scenarios. The tool accelerates QA processes and extends to player simulation and other use cases.

Zencoder
Zencoder is an intuitive AI coding agent designed to assist developers in coding tasks by leveraging advanced AI workflows and intelligent systems. It offers features like Repo Grokking for deep code insights, AI Agents for streamlining development processes, and capabilities such as code generation, chat assistance, code completion, and more. Zencoder aims to enhance software development efficiency, code quality, and project alignment by integrating seamlessly into developers' workflows.

Functionize
Functionize is an AI Agentic Automation Platform for Enterprises that offers expert AI agents to handle business processes autonomously. The platform utilizes deep learning neural networks to deliver unparalleled performance across various enterprise applications. Functionize's AI agents run autonomously, self-heal workflows, and redefine efficiency and reliability in automation. The platform provides immediate value with pretrained automation, evolves with operational environments, and ensures seamless adaptability and precision in every task. Functionize helps mitigate risks, unlock gains, and support digital transformation for enterprises.

testRigor
testRigor is an AI-based test automation tool that allows users to create and execute test cases using plain English instructions. It leverages generative AI in software testing to automate test creation and maintenance, offering features such as no code/codeless testing, web, mobile, and desktop testing, Salesforce automation, and accessibility testing. With testRigor, users can achieve test coverage faster and with minimal maintenance, enabling organizations to reallocate QA engineers to build API tests and increase test coverage significantly. The tool is designed to simplify test automation, reduce QA headaches, and improve productivity by streamlining the testing process.

Octomind
Octomind is an AI-powered Playwright end-to-end testing tool for web applications. It automatically discovers, generates, and runs tests to find bugs before customers do. Octomind's AI agents analyze web apps, generate tests, run them, and provide debugging details. It offers auto-maintenance, parallel test execution, and integration with CI/CD pipelines. Octomind aims to provide stability, speed, and a better developer experience for testing. It eliminates vendor lock-in, requires no code access, and is built on top of Playwright.

Langflow
Langflow is a low-code app builder for RAG and multi-agent AI applications. It is Python-based and agnostic to any model, API, or database. Langflow offers a visual IDE for building and testing workflows, multi-agent orchestration, free cloud service, observability features, and ecosystem integrations. Users can customize workflows using Python and publish them as APIs or export as Python applications.

GitStart
GitStart is an AI-powered platform that offers Elastic Engineering Capacity by assigning tickets and delivering high-quality production code. It leverages AI agents and a global developer community to increase engineering capacity without hiring more staff. GitStart is supported by top tech leaders and provides solutions for bugs, tech debt, frontend and backend development. The platform aims to empower developers, create economic opportunities, and grow the world's future software talent.

Tabnine
Tabnine is an AI code assistant that accelerates and simplifies software development while keeping your code private, secure, and compliant. It offers industry-leading AI code assistance, personalized to fit your team's needs, ensuring total code privacy, and providing complete protection from intellectual property issues. Tabnine's AI agents cover various aspects of the software development lifecycle, from code generation and explanations to testing, documentation, and bug fixes.

KushoAI
Kusho is an AI-powered tool designed to help software developers build bug-free software efficiently. It offers the capability to transform API specs into exhaustive test suites that seamlessly integrate into the CI/CD pipeline. With KushoAI, developers can generate robust AI-generated test suites, receive AI-analyzed test results, and modify code instantly based on real-time reports. The tool is customizable to meet company's context and understands natural language prompts to produce test case code instantly. KushoAI ensures maximum test coverage in minutes, saves hours of manual effort, and adapts to the codebase to prevent missing any test cases.
20 - Open Source AI Tools

ASTRA.ai
Astra.ai is a multimodal agent powered by TEN, showcasing its capabilities in speech, vision, and reasoning through RAG from local documentation. It provides a platform for developing AI agents with features like RTC transportation, extension store, workflow builder, and local deployment. Users can build and test agents locally using Docker and Node.js, with prerequisites including Agora App ID, Azure's speech-to-text and text-to-speech API keys, and OpenAI API key. The platform offers advanced customization options through config files and API keys setup, enabling users to create and deploy their AI agents for various tasks.

agent-evaluation
Agent Evaluation is a generative AI-powered framework for testing virtual agents. It implements an LLM agent (evaluator) to orchestrate conversations with your own agent (target) and evaluate responses. It supports popular AWS services, allows concurrent multi-turn conversations, defines hooks for additional tasks, and can be used in CI/CD pipelines for faster delivery and stable production environments.

WindowsAgentArena
Windows Agent Arena (WAA) is a scalable Windows AI agent platform designed for testing and benchmarking multi-modal, desktop AI agents. It provides researchers and developers with a reproducible and realistic Windows OS environment for AI research, enabling testing of agentic AI workflows across various tasks. WAA supports deploying agents at scale using Azure ML cloud infrastructure, allowing parallel running of multiple agents and delivering quick benchmark results for hundreds of tasks in minutes.

synthora
Synthora is a lightweight and extensible framework for LLM-driven Agents and ALM research. It aims to simplify the process of building, testing, and evaluating agents by providing essential components. The framework allows for easy agent assembly with a single config, reducing the effort required for tuning and sharing agents. Although in early development stages with unstable APIs, Synthora welcomes feedback and contributions to enhance its stability and functionality.

GhostOS
GhostOS is an AI Agent framework designed to replace JSON Schema with a Turing-complete code interaction interface (Moss Protocol). It aims to create intelligent entities capable of continuous learning and growth through code generation and project management. The framework supports various capabilities such as turning Python files into web agents, real-time voice conversation, body movements control, and emotion expression. GhostOS is still in early experimental development and focuses on out-of-the-box capabilities for AI agents.

AIOS
AIOS, a Large Language Model (LLM) Agent operating system, embeds large language model into Operating Systems (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI. AIOS is designed to optimize resource allocation, facilitate context switch across agents, enable concurrent execution of agents, provide tool service for agents, maintain access control for agents, and provide a rich set of toolkits for LLM Agent developers.

AutoGPT
AutoGPT is a revolutionary tool that empowers everyone to harness the power of AI. With AutoGPT, you can effortlessly build, test, and delegate tasks to AI agents, unlocking a world of possibilities. Our mission is to provide the tools you need to focus on what truly matters: innovation and creativity.

agentops
AgentOps is a toolkit for evaluating and developing robust and reliable AI agents. It provides benchmarks, observability, and replay analytics to help developers build better agents. AgentOps is open beta and can be signed up for here. Key features of AgentOps include: - Session replays in 3 lines of code: Initialize the AgentOps client and automatically get analytics on every LLM call. - Time travel debugging: (coming soon!) - Agent Arena: (coming soon!) - Callback handlers: AgentOps works seamlessly with applications built using Langchain and LlamaIndex.

Arcade-Learning-Environment
The Arcade Learning Environment (ALE) is a simple framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games. It is built on top of the Atari 2600 emulator Stella and separates the details of emulation from agent design. The ALE currently supports three different interfaces: C++, Python, and OpenAI Gym.

AgentBench
AgentBench is a benchmark designed to evaluate Large Language Models (LLMs) as autonomous agents in various environments. It includes 8 distinct environments such as Operating System, Database, Knowledge Graph, Digital Card Game, and Lateral Thinking Puzzles. The tool provides a comprehensive evaluation of LLMs' ability to operate as agents by offering Dev and Test sets for each environment. Users can quickly start using the tool by following the provided steps, configuring the agent, starting task servers, and assigning tasks. AgentBench aims to bridge the gap between LLMs' proficiency as agents and their practical usability.

awesome-langchain
LangChain is an amazing framework to get LLM projects done in a matter of no time, and the ecosystem is growing fast. Here is an attempt to keep track of the initiatives around LangChain. Subscribe to the newsletter to stay informed about the Awesome LangChain. We send a couple of emails per month about the articles, videos, projects, and tools that grabbed our attention Contributions welcome. Add links through pull requests or create an issue to start a discussion. Please read the contribution guidelines before contributing.

ollama
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications. Ollama is designed to be easy to use and accessible to developers of all levels. It is open source and available for free on GitHub.

Nothotdog
NotHotDog is an open-source testing framework for evaluating and validating voice and text-based AI agents. It offers a user-friendly interface for creating, managing, and executing tests against AI models. The framework supports WebSocket and REST API, test case management, automated evaluation of responses, and provides a seamless experience for test creation and execution.

vivaria
Vivaria is a web application tool designed for running evaluations and conducting agent elicitation research. Users can interact with Vivaria using a web UI and a command-line interface. It allows users to start task environments based on METR Task Standard definitions, run AI agents, perform agent elicitation research, view API requests and responses, add tags and comments to runs, store results in a PostgreSQL database, sync data to Airtable, test prompts against LLMs, and authenticate using Auth0.

agent-kit
AgentKit is a framework for creating and orchestrating AI Agents, enabling developers to build, test, and deploy reliable AI applications at scale. It allows for creating networked agents with separate tasks and instructions to solve specific tasks, as well as simple agents for tasks like writing content. The framework requires the Inngest TypeScript SDK as a dependency and provides documentation on agents, tools, network, state, and routing. Example projects showcase AgentKit in action, such as the Test Writing Network demo using Workflow Kit, Supabase, and OpenAI.

agents-flex
Agents-Flex is a LLM Application Framework like LangChain base on Java. It provides a set of tools and components for building LLM applications, including LLM Visit, Prompt and Prompt Template Loader, Function Calling Definer, Invoker and Running, Memory, Embedding, Vector Storage, Resource Loaders, Document, Splitter, Loader, Parser, LLMs Chain, and Agents Chain.

Co-LLM-Agents
This repository contains code for building cooperative embodied agents modularly with large language models. The agents are trained to perform tasks in two different environments: ThreeDWorld Multi-Agent Transport (TDW-MAT) and Communicative Watch-And-Help (C-WAH). TDW-MAT is a multi-agent environment where agents must transport objects to a goal position using containers. C-WAH is an extension of the Watch-And-Help challenge, which enables agents to send messages to each other. The code in this repository can be used to train agents to perform tasks in both of these environments.
20 - OpenAI Gpts

INSIGHT Business SIM
The future of business education: Generate and test ideas in a complex global market simulation, populated by autonomous agents. Powered by the MANNS engine for unparalleled entity autonomy and simulated market forces

Wordon, World's Worst Customer | Divergent AI
I simulate tough Customer Support scenarios for Agent Training.

(Unofficial) Bullhorn Support Agent
I am not affiliated with Bullhorn, nor do I have rights to this software. For this, please visit Bullhorn.com as they are the owner. The rights holders may ask me to remove this test bot.

Code de la route française - Entrainement
Entrainez-vous pour votre examen du code de la route en posant toutes sortes de questions sur différentes situations de la route.

Sports Nerds Trivia MCQ
I host a diverse range of sports trivia: Prompt a difficulty to begin

Chófer Pork 🐷🚘
Tu guía para el examen teórico del permiso de conducir B de España. Miles de preguntas reales de examen, apoyo intelectual y emocional de parte de un choffer profesional 🏁

Language Coach
Practice speaking another language like a local without being a local (use ChatGPT Voice via mobile app!)

Test Shaman
Test Shaman: Guiding software testing with Grug wisdom and humor, balancing fun with practical advice.

Raven's Progressive Matrices Test
Provides Raven's Progressive Matrices test with explanations and calculates your IQ score.

IQ Test Assistant
An AI conducting 30-question IQ tests, assessing and providing detailed feedback.