mahilo

mahilo: Multi-Agent Human-in-the-Loop Framework is a flexible framework for creating multi-agent systems that can each interact with humans while sharing relevant context internally.

Stars: 338


Mahilo is a flexible framework for creating multi-agent systems that can interact with humans while sharing context internally. It allows developers to set up complex agent networks for various applications, from customer service to emergency response simulations. Agents can communicate with each other and with humans; by handling context from multiple agents, the system helps humans stay focused on specific problems. It supports the Realtime API for voice interactions, WebSocket-based communication, flexible communication patterns, session management, and easy agent definition.

README:

Control Plane For Your AI Agents

mahilo is a multi-agent framework that allows you to create new agents or register agents from other frameworks in a team, where they can talk to each other and share information, all under human supervision.

Install

```
pip install mahilo
```

For voice features, install with the voice extras as shown below. You also need to install PyAudio for your system; see the PyAudio installation instructions for your OS.

```
pip install "mahilo[voice]"
```

Usage

```
from mahilo import BaseAgent, AgentManager, ServerManager
from mahilo.integrations.langgraph.agent import LangGraphAgent

# NOTE: sales_agent_prompt, sales_tools, graph_builder and marketing_agent_prompt
# are placeholders that are assumed to be defined elsewhere in your application.

# a mahilo agent
sales_agent = BaseAgent(
    type="sales_agent",
    description=sales_agent_prompt,
    tools=sales_tools,
)

# a langgraph agent
marketing_agent = LangGraphAgent(
    langgraph_agent=graph_builder,
    name="MarketingAgent",
    description=marketing_agent_prompt,
    can_contact=[],
)

# Create Agent Manager (think of it as a team)
manager = AgentManager()
manager.register_agent(sales_agent)
manager.register_agent(marketing_agent)

# activate any agents with runtime params (here server_id is the thread_id for the langgraph agent)
marketing_agent.activate(server_id="1")

# initialize the server manager
server = ServerManager(manager)
# Start WebSocket Server
server.run()
```

Running the code above starts a WebSocket server on localhost (unless another host is specified) that clients can connect to. In this case, clients can connect to two WebSocket endpoints, one for each of the two registered agents. For example, to connect to one agent, run the following command:

```
mahilo connect --url http://localhost:8000 --agent-name marketing_agent
```

This lets you talk to the marketing agent. If you pass the --voice flag, you can talk to the agent using voice.

Ideally, spin up more terminals for the other agents so you can observe how the conversation unfolds across the agents.

How does the transfer of context work?

Every BaseAgent comes with a function called chat_with_an_agent that takes a question or message and the agent it should be sent to. Agents call this function whenever they decide they need information from other agents.

The AgentManager class manages the context and makes the last N conversations available across agents, for added visibility. More on this is in the Detailed Features section below.
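The "last N conversations" idea can be illustrated with a bounded buffer. The following is a minimal sketch, not mahilo's implementation: the `SharedContext` class, its `max_items` parameter, and this toy `chat_with_an_agent` function are hypothetical stand-ins written only to show how older exchanges fall out of the shared view.

```python
from collections import deque


class SharedContext:
    """Hypothetical shared log that keeps only the most recent exchanges."""

    def __init__(self, max_items: int = 3) -> None:
        # A deque with maxlen drops the oldest entry once it is full.
        self._exchanges = deque(maxlen=max_items)

    def record(self, sender: str, recipient: str, message: str) -> None:
        self._exchanges.append(f"{sender} -> {recipient}: {message}")

    def recent(self) -> list:
        """Return the exchanges currently visible to every agent."""
        return list(self._exchanges)


def chat_with_an_agent(context: SharedContext, sender: str,
                       recipient: str, question: str) -> list:
    """Toy stand-in for the tool described above: log the exchange and
    return what all agents can now see. No LLM call is made here."""
    context.record(sender, recipient, question)
    return context.recent()
```

With `max_items=2`, a third exchange pushes the first one out of the shared view, which keeps the context passed to each agent bounded.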

For a demo of agents sharing context with each other, check out the video below, in addition to the Realtime API video above:

Table of Contents

Overview
Features
Getting Started
Project Structure
Detailed Features
Contributing

Overview

This project provides a flexible framework for defining and creating multi-agent systems that can each interact with humans while sharing relevant context internally. It allows developers to easily set up complex agent networks for various applications, from customer service to emergency response simulations.

Agents are aware of other agents in the system and can decide, based on the current conversation context, to talk to one or more agents simultaneously. The system is designed to make humans more efficient by giving them an assistant that can handle context from multiple agents and help the human stay focused on their specific problem, while surfacing all relevant information on demand.

Features

Above is an architecture diagram that shows the different components of the system in the context of a health emergency scenario. You have three humans talking to their respective agents, which all share information internally.

TL;DR:

Realtime API support for talking to your agents via voice!
Easy-to-extend BaseAgent class to create your own agents
WebSocket-based real-time communication with multiple users simultaneously
Flexible communication patterns: peer-to-peer and hierarchical (or centralized)
Control hierarchy in communication via can_contact lists: limit which agents can talk to which other agents.
Session management for persistent conversations
CLI client for easy testing and interaction
Multiple users can connect to the same agent. In emergency situation scenarios, this means multiple police officers can connect to the same dispatcher and receive updates from the dispatcher.
Agents are only activated when they are needed.
Multi-provider LLM support through LiteLLM, allowing you to use models from OpenAI, Anthropic, Azure, and more with a simple environment variable.
Message validation policies to prevent harmful content, repetitive loops, and ensure high-quality agent communications.

Above is an image that shows a three-agent system where a medical advisor is talking about a public health emergency and the agent decides to call the logistics coordinator and the public communication director agents simultaneously to coordinate the response to the emergency.

More information on the features can be found in the Detailed Features section below.

Getting Started

🤘 Quickstart

Install the package:
```
pip install mahilo
```
Note that if you want to use the voice feature, you need to have PyAudio installed; see the PyAudio installation instructions for your OS.
```
pip install "mahilo[voice]"
```
Export your OpenAI API key:
```
export OPENAI_API_KEY=<your_api_key>
```
Go to one of the example directories and run the server:
```
cd examples/your_example
python run_server.py
```
This starts the agent server locally at http://localhost:8000.
Connect to the server using the CLI: For each of the agents in the system, you can spin up a client to connect to the server.
```
mahilo connect --agent-name your_agent_name
```
Run this command in separate terminals for each of the agents and you can then start talking with them.

If you want to use the voice feature, you can run the same command with the --voice flag:
```
mahilo connect --agent-name your_agent_name --voice
```

[!TIP] You don't have to specify the URL if you want to connect to the default server.

🧑‍🍳 Building your own agents

Define your agents looking at examples in the templates directory.
Create a run script for your specific use case. See the examples in the examples directory.

Run your server:

```
python examples/your_example/run_server.py
```

Connect to your server using the CLI:
```
mahilo connect --agent-name your_agent_name
```
You can connect to the same server using multiple clients to test the system with multiple users. This is useful for testing the system in a real-world scenario where multiple agents need to coordinate their actions.

If you want to use the voice feature, you can run the same command with the --voice flag:
```
mahilo connect --agent-name your_agent_name --voice
```

[!TIP] You don't have to specify the URL if you want to connect to the default server.

Project Structure

agent_manager.py: Defines the AgentManager and BaseAgent classes
server.py: Implements the ServerManager for handling WebSocket connections
session.py: Manages conversation sessions for each agent
client.py: Provides a CLI client for interacting with the agents
templates/: Contains agent templates for different use cases
examples/: Includes example implementations of multi-agent systems

Detailed Features

Index

Human-in-the-Loop
Easy-to-use agent definition system
WebSocket-based real-time communication
Flexible communication patterns: peer-to-peer and hierarchical (or centralized)
Flexible agent manager for handling multiple agent types
Session management for persistent conversations
Multi-provider LLM support
Policy Validation for Inter-Agent Communication

Human-in-the-Loop

The human-in-the-loop is implemented by having the human client connect to each agent in the system.
The system is designed to make humans more efficient by giving them an assistant that can handle context from multiple agents and help the human stay focused on the conversation.
The agents are aware of what's going on in all the conversations and can help the human get information on demand.
The human can override an agent's choice of which agent to contact in any situation.

Easy-to-use agent definition system

The BaseAgent class is designed to be subclassed for defining new agents. It comes with:

tools that allow it to talk to other agents.
a message queue that stores the history of messages that the agent has received.
a prompt that tells the agent about the system
a method to process a message which takes care of the context and the conversation history.
a session object that stores the conversation history in a file for persistence.

WebSocket-based real-time communication

The server uses FastAPI's WebSocket support to handle real-time communication between the agents and the client. This allows for natural, two-way conversations that can be used for a variety of applications, from customer service to emergency response simulations.
The server keeps track of all connected agents and the messages they receive from other agents and coordinates the conversation between them.
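The bookkeeping described here, where several clients can be attached to the same agent and all receive its updates, can be sketched without any web framework. This is a simulation under stated assumptions: `ConnectionRegistry` and its methods are hypothetical names, and each `asyncio.Queue` stands in for one WebSocket connection; it does not show mahilo's or FastAPI's actual API.

```python
import asyncio
from collections import defaultdict


class ConnectionRegistry:
    """Hypothetical registry of client connections per agent.

    Each asyncio.Queue plays the role of one connected WebSocket client.
    """

    def __init__(self) -> None:
        self._clients = defaultdict(list)

    def connect(self, agent_name: str) -> asyncio.Queue:
        inbox = asyncio.Queue()
        self._clients[agent_name].append(inbox)
        return inbox

    def disconnect(self, agent_name: str, inbox: asyncio.Queue) -> None:
        self._clients[agent_name].remove(inbox)

    async def broadcast(self, agent_name: str, message: str) -> int:
        """Deliver a message to every client of one agent; return the count."""
        for inbox in self._clients[agent_name]:
            await inbox.put(message)
        return len(self._clients[agent_name])


async def demo() -> list:
    registry = ConnectionRegistry()
    # Two police officers connect to the same dispatcher agent.
    officer_1 = registry.connect("dispatcher")
    officer_2 = registry.connect("dispatcher")
    await registry.broadcast("dispatcher", "Unit 7 is en route")
    return [await officer_1.get(), await officer_2.get()]
```

Calling `asyncio.run(demo())` delivers the same update to both simulated clients. In a real server the queues would be replaced by live WebSocket connections.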

Flexible communication patterns: peer-to-peer and hierarchical (or centralized)

In a peer-to-peer communication pattern, agents are connected directly to each other and can call each other directly.
This is useful when a complex problem needs to be tackled by several agents working together. The examples directory contains a health emergency scenario where a medical advisor, a logistics coordinator and a public communication director each independently decide on a course of action.
In a hierarchical (or centralized) communication pattern, agents are connected to a single dispatcher agent. This is useful for a group of agents who need to coordinate their actions with a single leader. The examples directory contains a dispatch scenario where the dispatcher coordinates the actions of a plumber agent and a mold removal specialist agent.
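The two patterns differ only in who is allowed to contact whom, which can be expressed as contact lists. The sketch below is illustrative: the helper functions and the agent names are hypothetical, and it does not show how mahilo itself builds or interprets can_contact lists (for example, what an empty list means).

```python
def peer_to_peer(agents: list) -> dict:
    """Every agent may contact every other agent."""
    return {a: [b for b in agents if b != a] for a in agents}


def hierarchical(leader: str, workers: list) -> dict:
    """Workers may contact only the leader; the leader may contact all workers."""
    contacts = {worker: [leader] for worker in workers}
    contacts[leader] = list(workers)
    return contacts


def may_contact(contacts: dict, sender: str, recipient: str) -> bool:
    """Check a message against the sender's contact list."""
    return recipient in contacts.get(sender, [])


# Health emergency scenario: all three agents can reach each other.
emergency = peer_to_peer(
    ["medical_advisor", "logistics_coordinator", "public_communication_director"]
)

# Dispatch scenario: everything goes through the dispatcher.
dispatch = hierarchical("dispatcher", ["plumber", "mold_specialist"])
```

Here `may_contact(dispatch, "plumber", "mold_specialist")` is false, because in the hierarchical layout the two workers can only reach each other through the dispatcher.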

Flexible agent manager for handling multiple agent types

The AgentManager class is designed to manage multiple agent types, allowing for easy addition and removal of agents.
The AgentManager ensures that an agent can only talk to agents that are on its can_contact list.

Session management for persistent conversations

The Session class is designed to manage conversation sessions for each agent. It stores the conversation history in a file for persistence.
Messages from the queue or the shared context are not stored in the session, to avoid duplication.
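File-backed persistence of this kind can be sketched in a few lines. The `FileSession` class, the `sessions` directory, and the JSON layout below are assumptions made for illustration; the actual Session class in session.py may store history differently.

```python
import json
from pathlib import Path


class FileSession:
    """Hypothetical file-backed conversation history, one JSON file per agent."""

    def __init__(self, agent_name: str, directory: str = "sessions") -> None:
        self.path = Path(directory) / f"{agent_name}.json"
        self.history = self._load()

    def _load(self) -> list:
        # Resume the previous conversation if a history file already exists.
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return []

    def add_message(self, role: str, content: str) -> None:
        """Append one message and write the full history back to disk."""
        self.history.append({"role": role, "content": content})
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.history, indent=2), encoding="utf-8")
```

Creating a second `FileSession` with the same agent name and directory reloads the saved messages, which is what makes a conversation survive a server restart.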

Multi-provider LLM support

Mahilo uses LiteLLM to support a wide range of LLM providers and models.
You can easily switch between different models and providers by setting the MAHILO_LLM_MODEL environment variable in the format provider/model.
For detailed documentation, see LLM Integration.
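As a rough illustration of the provider/model format, the sketch below reads MAHILO_LLM_MODEL and splits it into its two parts. The `resolve_model` helper and the default value are hypothetical and exist only for this example; mahilo's own handling of the variable may differ.

```python
import os


def resolve_model(default: str = "openai/gpt-4o-mini") -> tuple:
    """Read MAHILO_LLM_MODEL and split it into (provider, model).

    The default value is an assumption made for this sketch.
    """
    raw = os.environ.get("MAHILO_LLM_MODEL", default)
    # Split on the first slash only, so model names may contain slashes.
    provider, separator, model = raw.partition("/")
    if not separator or not provider or not model:
        raise ValueError(
            f"Expected MAHILO_LLM_MODEL in the form provider/model, got {raw!r}"
        )
    return provider, model
```

For example, after `export MAHILO_LLM_MODEL=anthropic/claude-3-haiku` the helper returns `("anthropic", "claude-3-haiku")`, and a value without a slash raises a `ValueError` instead of failing later at request time.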

Policy Validation for Inter-Agent Communication

Enforces rules on messages exchanged between agents through a validation system
Prevents common issues like repetitive conversation loops, off-topic discussions, and harmful content
Supports two types of policies:
- Heuristic Policies: Rule-based, programmatic checks (e.g., message length, repetition detection)
- Natural Language Policies: Defined in plain English and evaluated using LLMs
Comes with built-in policies:
- Anti-Loop Policy: Prevents repetitive patterns using similarity detection
- Message Length Policy: Ensures messages are neither too short nor too long
- Relevance Policy: Ensures messages stay on topic
- Toxicity Policy: Prevents harmful or inappropriate content
Easily extendable with custom policies for specific use cases
For detailed documentation, see Policy Validation.
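To make the heuristic policies concrete, here is a minimal sketch of a length check and a similarity-based loop check using only the standard library. The function names, thresholds, and the three-message lookback are assumptions chosen for the example, not mahilo's built-in policies or their defaults.

```python
from difflib import SequenceMatcher


def length_policy(message: str, min_chars: int = 5, max_chars: int = 500) -> bool:
    """Pass only messages that are neither too short nor too long."""
    return min_chars <= len(message.strip()) <= max_chars


def anti_loop_policy(message: str, history: list, threshold: float = 0.9) -> bool:
    """Pass only messages that are not near-duplicates of recent ones."""
    return all(
        SequenceMatcher(None, message.lower(), prior.lower()).ratio() < threshold
        for prior in history[-3:]  # compare against the last three messages
    )


def validate(message: str, history: list) -> list:
    """Return the names of the policies the message violates (empty if none)."""
    violations = []
    if not length_policy(message):
        violations.append("message_length")
    if not anti_loop_policy(message, history):
        violations.append("anti_loop")
    return violations
```

A message that repeats the previous one almost word for word is flagged by the loop check, while a very short reply such as "ok" is flagged by the length check. A natural language policy would replace these functions with an LLM call that evaluates the message against a plain-English rule.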

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
