mahilo
mahilo (Multi-Agent Human-in-the-Loop Framework) is a flexible framework for creating multi-agent systems in which each agent can interact with humans while sharing relevant context internally.
Stars: 338
Mahilo is a flexible framework for creating multi-agent systems that can interact with humans while sharing context internally. It allows developers to set up complex agent networks for various applications, from customer service to emergency response simulations. Agents can communicate with each other and with humans, making the system efficient by handling context from multiple agents and helping humans stay focused on specific problems. The system supports Realtime API for voice interactions, WebSocket-based communication, flexible communication patterns, session management, and easy agent definition.
README:
mahilo is a multi-agent framework that allows you to create new agents or register agents from other frameworks in a team, where they can talk to each other and share information, all under human supervision.
pip install mahilo
For voice features, install with the voice extras as below. You also need to install PyAudio for your system. Learn how to do it for your OS, here.
pip install mahilo[voice]
    from mahilo import BaseAgent, AgentManager, ServerManager
    from mahilo.integrations.langgraph.agent import LangGraphAgent

    # sales_agent_prompt, sales_tools, graph_builder and marketing_agent_prompt
    # are assumed to be defined elsewhere in your project.

    # a mahilo agent
    sales_agent = BaseAgent(
        type="sales_agent",
        description=sales_agent_prompt,
        tools=sales_tools,
    )

    # a langgraph agent
    marketing_agent = LangGraphAgent(
        langgraph_agent=graph_builder,
        name="MarketingAgent",
        description=marketing_agent_prompt,
        can_contact=[],
    )

    # Create Agent Manager (think of it as a team)
    manager = AgentManager()
    manager.register_agent(sales_agent)
    manager.register_agent(marketing_agent)

    # activate any agents with runtime params (here server_id is the thread_id for the langgraph agent)
    marketing_agent.activate(server_id="1")

    # initialize the server manager
    server = ServerManager(manager)

    # Start WebSocket Server
    server.run()

When the code above is run, it starts a WebSocket server on localhost (unless some other host is specified). Each registered agent gets its own WebSocket endpoint that clients can connect to. For example, to connect to one agent, you can run the following command:

    mahilo connect --url http://localhost:8000 --agent-name marketing_agent

This allows you to talk to the marketing agent. If you pass the --voice flag, you can talk to the agent using voice.
Ideally, you would spin up more terminals for the other agents so that you can observe how the conversation unfolds across the agents.
Every BaseAgent comes with a function called chat_with_an_agent that takes in a question or a message and the agent it is being sent to. This function is used by the agents whenever they feel that they want info from the other agents.
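The idea of one agent asking another for information can be pictured with a toy, in-memory sketch. Everything here (`ToyAgent`, `ToyTeam`, the `answer` method, and the exact argument list of `chat_with_an_agent`) is hypothetical and written only for illustration; it is not mahilo's implementation, where the function is one of the agent's tools.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent; not mahilo's BaseAgent."""
    name: str
    knowledge: dict[str, str] = field(default_factory=dict)
    inbox: list[tuple[str, str]] = field(default_factory=list)

    def answer(self, sender: str, question: str) -> str:
        # Record who asked what, then reply from local knowledge.
        self.inbox.append((sender, question))
        return self.knowledge.get(question, "I don't know.")


class ToyTeam:
    """Hypothetical registry that routes questions between agents."""

    def __init__(self) -> None:
        self.agents: dict[str, ToyAgent] = {}

    def register(self, agent: ToyAgent) -> None:
        self.agents[agent.name] = agent

    def chat_with_an_agent(self, sender: str, recipient: str, question: str) -> str:
        # Forward the question to the named agent and return its reply.
        if recipient not in self.agents:
            raise KeyError(f"unknown agent: {recipient}")
        return self.agents[recipient].answer(sender, question)


team = ToyTeam()
team.register(ToyAgent("sales_agent"))
team.register(ToyAgent("marketing_agent", knowledge={"current campaign?": "Spring launch"}))

reply = team.chat_with_an_agent("sales_agent", "marketing_agent", "current campaign?")
print(reply)
```

In this sketch the recipient keeps the incoming question in its own inbox, which loosely mirrors the message history an agent accumulates.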
The AgentManager class manages the context and makes the last N conversations available across agents, for added visibility. More on this is in the Detailed Features section below.
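A "last N conversations" window can be sketched with a bounded queue. This is a minimal illustration under assumptions: the `SharedContext` class, its method names, and the choice to hide an agent's own entries from itself are all hypothetical and not taken from mahilo's `AgentManager`.

```python
from collections import deque


class SharedContext:
    """Hypothetical rolling window of recent conversation entries."""

    def __init__(self, max_entries: int = 3) -> None:
        # deque(maxlen=N) drops the oldest entry once N is exceeded.
        self._entries: deque[tuple[str, str]] = deque(maxlen=max_entries)

    def record(self, agent_name: str, summary: str) -> None:
        self._entries.append((agent_name, summary))

    def visible_to(self, agent_name: str) -> list[str]:
        # An agent sees what the other agents discussed, oldest first.
        return [f"{name}: {text}" for name, text in self._entries if name != agent_name]


context = SharedContext(max_entries=3)
context.record("sales_agent", "Customer asked about pricing")
context.record("marketing_agent", "Drafted launch email")
context.record("sales_agent", "Customer requested a demo")
context.record("marketing_agent", "Scheduled webinar")  # evicts the first entry

visible = context.visible_to("sales_agent")
print(visible)
```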
For a demo of agents sharing context with each other, check out the video below, in addition to the Realtime API video above:
This project provides a flexible framework for defining and creating multi-agent systems in which each agent can interact with humans while sharing relevant context internally. It allows developers to easily set up complex agent networks for various applications, from customer service to emergency response simulations.
Agents are aware of other agents in the system and can decide to talk to one or more agents based on the current conversation context, simultaneously. The system is designed to make humans more efficient by giving them an assistant that can handle context from multiple agents and help the human stay focused on their specific problem, while surfacing all relevant information on demand.
Above is an architecture diagram that shows the different components of the system in the context of a health emergency scenario. You have three humans talking to their respective agents, which all share information internally.
- Realtime API support for talking to your agents via voice!
- Easy-to-extend BaseAgent class to create your own agents
- WebSocket-based real-time communication with multiple users simultaneously
- Flexible communication patterns: peer-to-peer and hierarchical (or centralized)
- Control hierarchy in communication via `can_contact` lists: limit what agents can talk to what other agents.
- Session management for persistent conversations
- CLI client for easy testing and interaction
- Multiple users can connect to the same agent. In emergency situation scenarios, this means multiple police officers can connect to the same dispatcher and receive updates from the dispatcher.
- Agents are only activated when they are needed.
- Multi-provider LLM support through LiteLLM, allowing you to use models from OpenAI, Anthropic, Azure, and more with a simple environment variable.
- Message validation policies to prevent harmful content, repetitive loops, and ensure high-quality agent communications.
Above is an image that shows a three-agent system where a medical advisor is talking about a public health emergency and the agent decides to call the logistics coordinator and the public communication director agents simultaneously to coordinate the response to the emergency.
More information on the features can be found in the Detailed Features section below.
- Install the package:

      pip install mahilo

  Note that if you want to use the voice feature, you need to have `pyaudio` installed. Learn how to do it for your OS, here.

      pip install mahilo[voice]

- Export your OpenAI API key:

      export OPENAI_API_KEY=<your_api_key>

- Go to one of the example directories and run the server:

      cd examples/your_example
      python run_server.py

  This starts the agent server locally at http://localhost:8000.

- Connect to the server using the CLI: For each of the agents in the system, you can spin up a client to connect to the server.

      mahilo connect --agent-name your_agent_name

  Run this command in separate terminals for each of the agents and you can then start talking with them.

  If you want to use the voice feature, you can run the same command with the `--voice` flag:

      mahilo connect --agent-name your_agent_name --voice

[!TIP] You don't have to specify the URL if you want to connect to the default server.
- Define your agents looking at examples in the `templates` directory.
- Create a run script for your specific use case. See the examples in the `examples` directory.
- Run your server:

      python examples/your_example/run_server.py

- Connect to your server using the CLI:

      mahilo connect --agent-name your_agent_name

  You can connect to the same server using multiple clients to test the system with multiple users. This is useful for testing the system in a real-world scenario where multiple agents need to coordinate their actions.

  If you want to use the voice feature, you can run the same command with the `--voice` flag:

      mahilo connect --agent-name your_agent_name --voice

[!TIP] You don't have to specify the URL if you want to connect to the default server.
- `agent_manager.py`: Defines the `AgentManager` and `BaseAgent` classes
- `server.py`: Implements the `ServerManager` for handling WebSocket connections
- `session.py`: Manages conversation sessions for each agent
- `client.py`: Provides a CLI client for interacting with the agents
- `templates/`: Contains agent templates for different use cases
- `examples/`: Includes example implementations of multi-agent systems
- Human-in-the-Loop
- Easy-to-use agent definition system
- WebSocket-based real-time communication
- Flexible communication patterns: peer-to-peer and hierarchical (or centralized)
- Flexible agent manager for handling multiple agent types
- Session management for persistent conversations
- Multi-provider LLM support
- Policy Validation for Inter-Agent Communication
- The human-in-the-loop is implemented by having the human client connect to each agent in the system.
- The system is designed to make humans more efficient by giving them an assistant that can handle context from multiple agents and help them stay focused on the conversation.
- The agents are aware of what's going on in all the conversations and can help the human get information on demand.
- The human can override the agent's decision to choose an agent for any situation.
The BaseAgent class is designed to be subclassed for defining new agents. It comes with:
- tools that allow it to talk to other agents.
- a message queue that stores the history of messages that the agent has received.
- a prompt that tells the agent about the system
- a method to process a message which takes care of the context and the conversation history.
- a session object that stores the conversation history in a file for persistence.
- The server uses FastAPI's WebSocket support to handle real-time communication between the agents and the client. This allows for natural, two-way conversations that can be used for a variety of applications, from customer service to emergency response simulations.
- The server keeps track of all connected agents and the messages they receive from other agents and coordinates the conversation between them.
- In a peer-to-peer communication pattern, agents are connected directly to each other and can call each other directly.
- This is useful when a complex problem needs to be tackled by a combination of one or more agents. The example directory contains a health emergency scenario where a medical advisor, a logistics coordinator and a public communication director each independently decide on a course of action.
- In a hierarchical (or centralized) communication pattern, agents are connected to a single dispatcher agent. This is useful for a group of agents who need to coordinate their actions with a single leader. The example directory contains a dispatch scenario where the dispatcher coordinates the actions of a plumber and mold removal specialist agent.
- The `AgentManager` class is designed to manage multiple agent types, allowing for easy addition and removal of agents.
- Agent manager makes sure that an agent can only talk to agents that are on its `can_contact` list.
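The two communication patterns can be expressed as different allow-lists. The sketch below is an illustration under assumptions, not mahilo's code: `ContactPolicy` and the agent names are hypothetical, and treating a missing or empty list as "may contact nobody" is a choice made for this example only.

```python
class ContactPolicy:
    """Hypothetical allow-list check; names are illustrative only."""

    def __init__(self, can_contact: dict[str, list[str]]) -> None:
        self.can_contact = can_contact

    def allowed(self, sender: str, recipient: str) -> bool:
        # Assumption: an agent with no entry may not contact anyone.
        return recipient in self.can_contact.get(sender, [])


# Peer-to-peer: every agent may call every other agent directly.
peers = ["medical_advisor", "logistics_coordinator", "public_communication_director"]
peer_to_peer = ContactPolicy({a: [b for b in peers if b != a] for a in peers})

# Hierarchical: workers only talk to the dispatcher, who can reach everyone.
hierarchical = ContactPolicy({
    "dispatcher": ["plumber", "mold_specialist"],
    "plumber": ["dispatcher"],
    "mold_specialist": ["dispatcher"],
})

print(peer_to_peer.allowed("medical_advisor", "logistics_coordinator"))
print(hierarchical.allowed("plumber", "mold_specialist"))
print(hierarchical.allowed("plumber", "dispatcher"))
```

Switching between the two patterns then only means changing the lists, not the routing logic.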
- The `Session` class is designed to manage conversation sessions for each agent. It stores the conversation history in a file for persistence.
- The messages from the queue or the shared context are not stored to avoid duplication and redundancy.
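File-backed history of this kind might look roughly like the following. This is a sketch under assumptions: `FileSession`, the JSON layout, and the `from_shared_context` flag are hypothetical names chosen for the example, and mahilo's `Session` class may store its data differently.

```python
import json
import tempfile
from pathlib import Path


class FileSession:
    """Hypothetical file-backed history; not mahilo's Session class."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload earlier history if the file already exists.
        self.history = json.loads(path.read_text()) if path.exists() else []

    def add(self, role: str, content: str, from_shared_context: bool = False) -> None:
        # Queue or shared-context messages are skipped so they are not stored twice.
        if from_shared_context:
            return
        self.history.append({"role": role, "content": content})
        self.path.write_text(json.dumps(self.history, indent=2))


session_file = Path(tempfile.mkdtemp()) / "sales_agent_session.json"
session = FileSession(session_file)
session.add("user", "What discounts are available?")
session.add("system", "Update relayed from another agent", from_shared_context=True)

restored = FileSession(session_file)  # simulates reconnecting after a restart
print(restored.history)
```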
- Mahilo uses LiteLLM to support a wide range of LLM providers and models.
- You can easily switch between different models and providers by setting the `MAHILO_LLM_MODEL` environment variable in the format `provider/model`.
- For detailed documentation, see LLM Integration.
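To make the `provider/model` format concrete, here is a small sketch that reads the variable and splits it. The helper `parse_model_setting`, the fallback value, and the model name `example-model` are placeholders invented for this example; the real parsing is handled by mahilo and LiteLLM.

```python
import os


def parse_model_setting(default: str = "openai/example-model") -> tuple[str, str]:
    """Split MAHILO_LLM_MODEL into (provider, model); the default is a placeholder."""
    value = os.environ.get("MAHILO_LLM_MODEL", default)
    # Split on the first slash only, in case the model part contains slashes.
    provider, sep, model = value.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {value!r}")
    return provider, model


os.environ["MAHILO_LLM_MODEL"] = "anthropic/example-model"
provider, model = parse_model_setting()
print(provider, model)
```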
- Enforces rules on messages exchanged between agents through a validation system
- Prevents common issues like repetitive conversation loops, off-topic discussions, and harmful content
- Supports two types of policies:
- Heuristic Policies: Rule-based, programmatic checks (e.g., message length, repetition detection)
- Natural Language Policies: Defined in plain English and evaluated using LLMs
- Comes with built-in policies:
- Anti-Loop Policy: Prevents repetitive patterns using similarity detection
- Message Length Policy: Ensures messages are neither too short nor too long
- Relevance Policy: Ensures messages stay on topic
- Toxicity Policy: Prevents harmful or inappropriate content
- Easily extendable with custom policies for specific use cases
- For detailed documentation, see Policy Validation.
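Heuristic policies of the kind listed above can be sketched with the standard library alone. The function names, thresholds, and the use of `difflib.SequenceMatcher` for similarity are assumptions made for illustration; they are not mahilo's built-in policies or its policy API.

```python
from difflib import SequenceMatcher


def length_policy(message: str, min_chars: int = 5, max_chars: int = 500) -> bool:
    """Pass only messages whose length falls inside the allowed range."""
    return min_chars <= len(message.strip()) <= max_chars


def anti_loop_policy(message: str, history: list[str], threshold: float = 0.9) -> bool:
    """Fail if the message is nearly identical to one of the last five messages."""
    for previous in history[-5:]:
        similarity = SequenceMatcher(None, message.lower(), previous.lower()).ratio()
        if similarity >= threshold:
            return False
    return True


def validate(message: str, history: list[str]) -> list[str]:
    """Return the names of the policies the message violates."""
    violations = []
    if not length_policy(message):
        violations.append("message_length")
    if not anti_loop_policy(message, history):
        violations.append("anti_loop")
    return violations


history = ["Can you confirm the delivery date?"]
repeated = validate("Can you confirm the delivery date?", history)
too_short = validate("ok", history)
fine = validate("The shipment leaves on Friday morning.", history)
print(repeated, too_short, fine)
```

A natural language policy would replace the rule-based check with an LLM call that judges the message against a plain-English rule, which is outside the scope of this sketch.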
Contributions are welcome! Please feel free to submit a Pull Request.