
AIOS: AI Agent Operating System

AIOS, a Large Language Model (LLM) agent operating system, embeds large language models into the operating system (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI. AIOS is designed to optimize resource allocation, facilitate context switching across agents, enable concurrent execution of agents, provide tool services for agents, maintain access control for agents, and offer a rich set of toolkits for LLM agent developers.
README:
AIOS is the AI Agent Operating System, which embeds large language models (LLMs) into the operating system and facilitates the development and deployment of LLM-based AI agents. AIOS is designed to address problems (e.g., scheduling, context switching, memory management, storage management, tool management, agent SDK management, etc.) during the development and deployment of LLM-based agents, towards a better AIOS-Agent ecosystem for agent developers and agent users. AIOS includes the AIOS kernel (this AIOS repository) and the AIOS SDK (the Cerebrum repository). AIOS supports both a Web UI and a Terminal UI.
The AIOS system comprises two key components: the AIOS kernel and the AIOS SDK. The AIOS kernel acts as an abstraction layer over the operating system kernel, managing the various resources that agents require, such as LLMs, memory, storage, and tools. The AIOS SDK is designed for agent users and developers, enabling them to build and run agent applications by interacting with the AIOS kernel. The AIOS kernel lives in this repository, and the AIOS SDK can be found in the Cerebrum repository.
Below, we show how agents use the AIOS SDK to interact with the AIOS kernel, and how the AIOS kernel receives agent queries and turns them into a chain of syscalls that are scheduled and dispatched to run in different modules.
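As a rough mental model of this flow, here is a minimal, hypothetical Python sketch. The endpoint path and function names below are illustrative placeholders, not the actual Cerebrum API; consult the Cerebrum repository for the real interface.

```python
# Hypothetical sketch of the agent -> SDK -> kernel flow described above.
# The endpoint path and function names are illustrative placeholders only.
import requests  # the kernel is served over HTTP by uvicorn

KERNEL_URL = "http://localhost:8000"  # assumed local kernel address


def submit_agent_query(agent_name: str, task: str) -> dict:
    """Send an agent query to the AIOS kernel, which turns it into a chain of
    syscalls handled by the scheduler, LLM core, memory/storage, and tool managers."""
    response = requests.post(
        f"{KERNEL_URL}/agents/submit",  # hypothetical endpoint for illustration
        json={"agent": agent_name, "task": task},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(submit_agent_query("example/demo_agent", "Summarize recent papers on LLM agents."))
```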
For computer-use agents, the architecture extends the AIOS kernel with significant enhancements focused on computer contextualization. While preserving essential components such as the LLM core(s), Context Manager, and Memory Manager, the Tool Manager module has been fundamentally redesigned to incorporate a VM (Virtual Machine) Controller and an MCP Server. This redesign creates a sandboxed environment that allows agents to safely interact with computer systems while maintaining a consistent semantic mapping between agent intentions and computer operations.
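Purely as a conceptual illustration of that semantic mapping (this is not AIOS or LiteCUA source code, and the operation names are made up), the idea looks roughly like this:

```python
# Conceptual illustration of mapping high-level agent intentions to concrete,
# sandboxed computer operations exposed by an MCP-style server. Not AIOS code.
from typing import Callable, Dict


def open_app(name: str) -> str:
    return f"[vm] launched application: {name}"  # would run inside the VM sandbox


def click(x: int, y: int) -> str:
    return f"[vm] clicked at ({x}, {y})"  # would be executed by the VM controller


# Semantic mapping: agent intention -> sandboxed operation
OPERATIONS: Dict[str, Callable[..., str]] = {
    "open_app": open_app,
    "click": click,
}


def dispatch(intention: str, **kwargs) -> str:
    """Route an agent intention to the corresponding sandboxed operation."""
    if intention not in OPERATIONS:
        raise ValueError(f"Unsupported intention: {intention}")
    return OPERATIONS[intention](**kwargs)


print(dispatch("open_app", name="firefox"))
```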
- [2025-07-08] The foundational paper AIOS: LLM Agent Operating System has been accepted by the Conference on Language Modeling (COLM 2025). Congratulations to the team!
- [2025-07-02] AIOS has been selected as a finalist for the AgentX - LLM Agents MOOC Competition, hosted by Berkeley RDI in conjunction with the Advanced LLM Agents MOOC. Congratulations to the team!
- [2025-05-24] Check out our paper on the computer-use agent: LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS and the corresponding codebase.
- [2025-03-13] Our paper Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery has been accepted by NAACL 2025! Its features have been integrated into Cerebrum.
- [2025-03-12] A major refactor of the codebase, packed with powerful new features, has been integrated into the main repo. Please check out the AIOS v0.2.2 release.
- [2025-03-10] Check out our paper on agentic memory, A-MEM: Agentic Memory for LLM Agents, and the corresponding codebase.
- [2025-02-07] Our paper From Commands to Prompts: LLM-based Semantic File System for AIOS has been accepted by ICLR 2025! The features of this paper have been integrated into AIOS as the Terminal UI.
- [2025-01-27] Deepseek-r1 (1.5b, 7b, 8b, 14b, 32b, 70b, 671b) is already supported in AIOS; both the open-sourced versions and the DeepSeek APIs (deepseek-chat and deepseek-reasoner) are available.
- [2024-11-30] AIOS v0.2: disentangled the AIOS kernel (this AIOS repository) and the AIOS SDK (the Cerebrum repository), and added Remote Kernel for agent users.
- [2024-09-01] AIOS supports multiple agent creation frameworks (e.g., ReAct, Reflexion, OpenAGI, AutoGen, Open Interpreter, MetaGPT). Agents created by these frameworks can onboard AIOS. Onboarding guidelines can be found at the Doc.
- [2024-07-10] AIOS documentation is up and can be found at the Website.
- [2024-06-20] Function calling for open-sourced LLMs (native HuggingFace, vLLM, ollama) is supported.
- [2024-05-20] More agents with ChatGPT-based tool calling have been added (i.e., MathAgent, RecAgent, TravelAgent, AcademicAgent, and CreationAgent); their profiles and workflows can be found in OpenAGI.
- [2024-05-13] Local models (diffusion models) from HuggingFace are integrated as tools.
- [2024-05-01] The agent creation in AIOS has been refactored, which can be found in our OpenAGI package.
- [2024-04-05] AIOS currently supports external tool calling (Google Search, WolframAlpha, Rapid API, etc.).
- [2024-04-02] The AIOS Discord Community is up. Welcome to join the community for discussions, brainstorming, development, or just random chats! For how to contribute to AIOS, please see CONTRIBUTE.
- [2024-03-25] Our paper AIOS: LLM Agent Operating System is released!
- [2023-12-06] After several months of work, our perspective paper LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem is officially released.
Here are some key notations that you need to know before we introduce the different modes of AIOS.
- AHM (Agent Hub Machine): Central server that hosts the agent marketplace/repository where users can publish, download, and share agents. Acts as the distribution center for all agent-related resources.
- AUM (Agent UI Machine): Client machine that provides user interface for interacting with agents. Can be any device from mobile phones to desktops that supports agent visualization and control.
- ADM (Agent Development Machine): Development environment where agent developers write, debug and test their agents. Requires proper development tools and libraries.
- ARM (Agent Running Machine): Execution environment where agents actually run and perform tasks. Needs adequate computational resources for agent operations.
The following parts introduce the different modes of deploying AIOS. Currently, AIOS supports Mode 1 and Mode 2; other modes with new features are still in progress.
- Features:
- For agent users: They can download agents from the agent hub on Machine B and run them on Machine A.
- For agent developers: They can develop and test agents on Machine A and upload them to the agent hub on Machine B.
- Features:
- Remote use of agents: Agent users/developers can use agents on Machine B, which is different from the development and running machine (Machine A).
- Benefits users who would like to use agents on resource-restricted machines (e.g., mobile or edge devices)
- Features:
- Remote development of agents: Agent developers can develop their agents on Machine B while running and testing them on Machine A. Benefits developers who would like to develop agents on resource-restricted machines (e.g., mobile or edge devices)
- Critical technique:
- Packaging and transmission of agents across different machines for distributed agent development and testing
- Ongoing Features:
- Each user/developer can have their personal AIOS with long-term persistent data as long as they have a registered account in the AIOS ecosystem
- Their personal data can be synced to different machines with the same account
- Critical techniques:
- User account registration and verification mechanism
- Persistent personal data storage for each user's AIOS
- Synchronization for different AIOS instances on different devices within the same account
- Data privacy mechanism
- Ongoing Features:
- Different users'/developers' personal AIOS kernels can co-exist on the same physical machine through virtualization
- Critical techniques:
- Virtualization of different AIOS kernel instances on the same machine
- Scheduling and resource allocation mechanisms for different virtual machines located on the same machine
Please see our ongoing documentation for more information.
- Supported versions: Python 3.10 - 3.11
Clone the AIOS kernel:
git clone https://github.com/agiresearch/AIOS.git
Create a venv environment:
python3.x -m venv venv # Only Python 3.10 and 3.11 are supported
source venv/bin/activate
Or create a conda environment:
conda create -n venv python=3.x # Only Python 3.10 and 3.11 are supported
conda activate venv
[!TIP] We strongly recommend using uv for faster and more reliable package installation. To install uv:
pip install uv
For GPU environments:
uv pip install -r requirements-cuda.txt
For CPU-only environments:
uv pip install -r requirements.txt
Alternatively, if you prefer using pip:
For GPU environments:
pip install -r requirements-cuda.txt
For CPU-only environments:
pip install -r requirements.txt
- Clone the Cerebrum repository:
git clone https://github.com/agiresearch/Cerebrum.git
- Install using uv (recommended):
cd Cerebrum && uv pip install -e .
Or using pip:
cd Cerebrum && pip install -e .
To use the MCP for the computer-use agent, we strongly recommend installing a virtualized environment equipped with a GUI. Instructions can be found here.
Note: The machine where the AIOS kernel (AIOS) is installed must also have the AIOS SDK (Cerebrum) installed. Installing the AIOS kernel will install the AIOS SDK automatically by default. If you are using the Local Kernel mode, i.e., running AIOS and agents on the same machine, simply install both AIOS and Cerebrum on that machine. If you are using the Remote Kernel mode, i.e., running AIOS on Machine 1 and running agents on Machine 2 so that the agents interact with the kernel remotely, then you need to install both the AIOS kernel and the AIOS SDK on Machine 1, and install only the AIOS SDK on Machine 2. Please follow the guidelines at Cerebrum regarding how to install the SDK.
Before launching AIOS, you need to set up its configuration. AIOS provides two ways to do this: either modify the configuration file directly, or set it up interactively.
You need API keys for services such as OpenAI, Anthropic, Groq, and HuggingFace. The simplest way to configure them is to edit aios/config/config.yaml.
[!TIP] We strongly recommend using the aios/config/config.yaml file to set up your API keys. This method is straightforward and helps avoid potential synchronization issues with environment variables.
A simple example of setting up your API keys in aios/config/config.yaml is shown below:
```yaml
api_keys:
  openai: "your-openai-key"
  gemini: "your-gemini-key"
  groq: "your-groq-key"
  anthropic: "your-anthropic-key"
  huggingface:
    auth_token: "your-huggingface-token-for-authorized-models"
    cache_dir: "your-cache-dir-for-saving-models"
  novita: "your-novita-api-key"
```
To obtain these API keys:
- Deepseek API: Visit https://api-docs.deepseek.com/
- OpenAI API: Visit https://platform.openai.com/api-keys
- Google Gemini API: Visit https://makersuite.google.com/app/apikey
- Groq API: Visit https://console.groq.com/keys
- HuggingFace Token: Visit https://huggingface.co/settings/tokens
- Anthropic API: Visit https://console.anthropic.com/keys
- Novita AI API: Visit https://novita.ai/api-keys
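If you want to sanity-check that the keys in the file are filled in before launching, here is a small illustrative Python snippet (not part of AIOS) that reads the api_keys section shown above, assuming PyYAML is installed:

```python
# Illustrative check (not part of AIOS) that the api_keys section of
# aios/config/config.yaml is filled in. Requires PyYAML (pip install pyyaml).
import yaml

with open("aios/config/config.yaml") as f:
    config = yaml.safe_load(f)

for provider, key in (config.get("api_keys") or {}).items():
    # Note: the huggingface entry is a nested mapping rather than a single string.
    status = "set" if key else "MISSING"
    print(f"{provider}: {status}")
```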
You can configure which LLM models to use in the same aios/config/config.yaml file. Here's an example configuration:
```yaml
llms:
  models:
    # Ollama Models
    - name: "qwen2.5:7b"
      backend: "ollama"
      hostname: "http://localhost:11434" # Make sure to run the ollama server
    # vLLM Models
    - name: "meta-llama/Llama-3.1-8B-Instruct"
      backend: "vllm"
      hostname: "http://localhost:8091/v1" # Make sure to run the vllm server
```
Using Ollama Models:
- First, download ollama from https://ollama.com/
- Start the ollama server in a separate terminal:
ollama serve
- Pull your desired models from https://ollama.com/library:
ollama pull qwen2.5:7b # example model
[!TIP] Ollama supports both CPU-only and GPU environments. For more details about Ollama usage, visit the Ollama documentation.
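As an optional, illustrative sanity check (not part of AIOS), you can confirm the Ollama server is reachable and that the model from the config above has been pulled; GET /api/tags lists locally available models in the Ollama HTTP API:

```python
# Illustrative check (not part of AIOS): verify the local Ollama server is up
# and the configured model has been pulled. Requires the requests package.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("qwen2.5:7b available:", "qwen2.5:7b" in models)
```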
Using vLLM Models:
- Install vLLM following their installation guide
- Start the vLLM server in a separate terminal:
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8091
[!NOTE] vLLM currently only supports Linux and GPU-enabled environments. If you don't have a compatible environment, please choose other backend options. To enable the tool calling feature of vLLM, refer to https://docs.vllm.ai/en/latest/features/tool_calling.html
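Since the vLLM server exposes an OpenAI-compatible API, a quick illustrative check (not part of AIOS) is to list the served models from the endpoint configured above:

```python
# Illustrative check (not part of AIOS): the vLLM server started above exposes
# an OpenAI-compatible API, so GET /v1/models should list the served model.
import requests

resp = requests.get("http://localhost:8091/v1/models", timeout=5)
resp.raise_for_status()
print([m["id"] for m in resp.json().get("data", [])])
```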
Using HuggingFace Models: You can configure HuggingFace models with specific GPU memory allocation:
- name: "meta-llama/Llama-3.1-8B-Instruct"
backend: "huggingface"
max_gpu_memory: {0: "24GB", 1: "24GB"} # GPU memory allocation
eval_device: "cuda:0" # Device for model evaluation
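For intuition, the max_gpu_memory mapping above has the same shape as the max_memory argument that the Hugging Face transformers library accepts when sharding a model across GPUs. The snippet below is an independent illustration of that transformers/accelerate argument, not AIOS internals:

```python
# Independent illustration (not AIOS internals) of a per-GPU memory budget with
# Hugging Face transformers. Requires transformers and accelerate installed,
# plus an HF auth token for this gated model.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",                   # let accelerate place layers across devices
    max_memory={0: "24GB", 1: "24GB"},   # per-GPU memory budget, as in the config above
)
```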
Alternatively, you can set up the AIOS configuration interactively using the following commands:
- `aios env list`: Show current environment variables, or show available API keys if no variables are set
- `aios env set`: Set environment variables interactively
- `aios refresh`: Refresh the AIOS configuration. Reloads the configuration from aios/config/config.yaml and reinitializes all components without restarting the server. The server must be running.
When no environment variables are set, the following API keys will be shown:
- DEEPSEEK_API_KEY: Deepseek API key for accessing Deepseek services
- OPENAI_API_KEY: OpenAI API key for accessing OpenAI services
- GEMINI_API_KEY: Google Gemini API key for accessing Google's Gemini services
- GROQ_API_KEY: Groq API key for accessing Groq services
- HF_AUTH_TOKEN: HuggingFace authentication token for accessing models
- HF_HOME: Optional path to store HuggingFace models
- NOVITA_API_KEY: Novita AI API key for accessing Novita AI services
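Before launching, you can quickly see which of these variables are set in your current shell with a small illustrative Python snippet (not part of AIOS):

```python
# Illustrative helper (not part of AIOS): report which of the environment
# variables listed above are currently set before launching the kernel.
import os

ENV_VARS = [
    "DEEPSEEK_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY",
    "GROQ_API_KEY", "HF_AUTH_TOKEN", "HF_HOME", "NOVITA_API_KEY",
]

for var in ENV_VARS:
    print(f"{var}: {'set' if os.environ.get(var) else 'not set'}")
```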
After you set up your keys or environment parameters, you can follow the instructions below to start.
Run:
bash runtime/launch_kernel.sh
Or, if you need to explicitly set the Python version (e.g., python3.10, python3.11, or python3), run the command below:
python3.x -m uvicorn runtime.launch:app --host 0.0.0.0 --port 8000 # replace the port with your own port
You also need to set up the host and port in the configuration of Cerebrum (AIOS SDK) to make sure it is consistent with the configurations of AIOS.
You can also run the kernel in the background with:
python3.x -m uvicorn runtime.launch:app --host 0.0.0.0 > uvicorn.log 2>&1 &
To keep it running even after the shell closes, prepend nohup to the entire command.
The recommended command launches the kernel in the background so it continues running even after the active shell is closed, while also logging information to the specified log file:
nohup python3 -m uvicorn runtime.launch:app --host 0.0.0.0 --port 8000 > uvicorn.log 2>&1 &
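As a quick illustrative check (not part of AIOS) that the backgrounded kernel is listening on the expected port before you start agents:

```python
# Illustrative check (not part of AIOS): confirm the kernel process launched in
# the background is listening on the expected host/port.
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(3)
    result = s.connect_ex(("127.0.0.1", 8000))  # port used in the launch command above
print("AIOS kernel reachable" if result == 0 else "AIOS kernel not reachable")
```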
To interact with the AIOS terminal (the LLM-based semantic file system), run the following command to start it.
python scripts/run_terminal.py
Then you can start interacting with the AIOS terminal by typing natural language commands.
If you successfully start the AIOS terminal, it will look as shown below:
Detailed instructions on how to use the AIOS terminal can be found here.
[!WARNING] The rollback feature of the AIOS terminal requires a connection to a Redis server. Make sure you have a Redis server running if you would like to use the rollback feature.
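An illustrative way (not part of AIOS) to confirm Redis is reachable before relying on the rollback feature, assuming the default localhost:6379 address and the redis Python package:

```python
# Illustrative check (not part of AIOS) that a Redis server is reachable before
# using the terminal's rollback feature. Requires: pip install redis
import redis

try:
    redis.Redis(host="localhost", port=6379).ping()
    print("Redis is running; rollback can be used.")
except redis.exceptions.ConnectionError:
    print("Redis is not reachable; start a Redis server to use rollback.")
```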
Make sure you have installed a virtualized environment with a GUI; then you can refer to Cerebrum for how to run the computer-use agent.
| Provider | Model Name | Open Source | Model String | Backend | Required API Key |
|---|---|---|---|---|---|
| Anthropic | All Models | ❌ | model-name | anthropic | ANTHROPIC_API_KEY |
| OpenAI | All Models | ❌ | model-name | openai | OPENAI_API_KEY |
| Deepseek | All Models | ❌ | model-name | deepseek | DEEPSEEK_API_KEY |
| Google | All Models | ❌ | model-name | gemini | GEMINI_API_KEY |
| Groq | All Models | ❌ | model-name | groq | GROQ_API_KEY |
| HuggingFace | All Models | ✅ | model-name | huggingface | HF_HOME |
| ollama | All Models | ✅ | model-name | ollama | - |
| vLLM | All Models | ✅ | model-name | vllm | - |
| Novita | All Models | ❌ | model-name | novita | NOVITA_API_KEY |
An early experimental Rust scaffold lives in `aios-rs/`, providing trait definitions and minimal placeholder implementations (context, memory, storage, tool, scheduler, llm). It does NOT have feature parity yet; it is a foundation for incremental porting and performance-focused components.
cd aios-rs
cargo build
cargo test
```rust
use aios_rs::prelude::*;

fn main() -> anyhow::Result<()> {
    // Placeholder LLM backend that simply echoes prompts back.
    let llm = std::sync::Arc::new(EchoLLM);
    // In-memory memory manager, guarded by a mutex for shared access.
    let memory = std::sync::Arc::new(std::sync::Mutex::new(InMemoryMemoryManager::new()));
    // Filesystem-backed storage rooted at a temporary directory.
    let storage = std::sync::Arc::new(FsStorageManager::new("/tmp/aios_store"));
    // Tool manager stub with no real tools registered.
    let tool = std::sync::Arc::new(NoopToolManager);
    // Wire the placeholder components into the no-op scheduler.
    let mut scheduler = NoopScheduler::new(llm, memory, storage, tool);
    scheduler.start()?;
    scheduler.stop()?;
    Ok(())
}
```
- [x] Core trait scaffolding
- [ ] Async runtime + channels
- [ ] Vector store abstraction
- [ ] Python bridge (pyo3 / IPC)
- [ ] Port FIFO / RR schedulers
- [ ] Benchmarks & feature flags
Contributions welcome via focused PRs extending this scaffold. See `aios-rs/README.md` for details.
@article{mei2025aios,
title={AIOS: LLM Agent Operating System},
author={Mei, Kai and Zhu, Xi and Xu, Wujiang and Hua, Wenyue and Jin, Mingyu and Li, Zelong and Xu, Shuyuan and Ye, Ruosong and Ge, Yingqiang and Zhang, Yongfeng},
journal={In Proceedings of the 2nd Conference on Language Modeling (COLM 2025)},
year={2025}
}
@article{mei2025litecua,
title={LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS},
author={Mei, Kai and Zhu, Xi and Gao, Hang and Lin, Shuhang and Zhang, Yongfeng},
journal={arXiv preprint arXiv:2505.18829},
year={2025}
}
@article{xu2025mem,
title={A-Mem: Agentic Memory for LLM Agents},
author={Xu, Wujiang and Liang, Zujie and Mei, Kai and Gao, Hang and Tan, Juntao and Zhang, Yongfeng},
journal={arXiv preprint arXiv:2502.12110},
year={2025}
}
@inproceedings{rama2025cerebrum,
title={Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery},
author={Balaji Rama and Kai Mei and Yongfeng Zhang},
booktitle={2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics},
year={2025}
}
@inproceedings{shi2025from,
title={From Commands to Prompts: {LLM}-based Semantic File System for AIOS},
author={Zeru Shi and Kai Mei and Mingyu Jin and Yongye Su and Chaoji Zuo and Wenyue Hua and Wujiang Xu and Yujie Ren and Zirui Liu and Mengnan Du and Dong Deng and Yongfeng Zhang},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=2G021ZqUEZ}
}
@article{ge2023llm,
title={LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem},
author={Ge, Yingqiang and Ren, Yujie and Hua, Wenyue and Xu, Shuyuan and Tan, Juntao and Zhang, Yongfeng},
journal={arXiv preprint arXiv:2312.03815},
year={2023}
}
For how to contribute, see CONTRIBUTE. If you would like to contribute to the codebase, issues or pull requests are always welcome!
We learned from the design of, and reused code from, the following projects: LiteLLM and OSWorld.
If you would like to join the community, ask questions, chat with fellows, learn about or propose new features, and participate in future developments, join our Discord Community!
Similar Open Source Tools

AIOS
AIOS, a Large Language Model (LLM) Agent operating system, embeds large language model into Operating Systems (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI. AIOS is designed to optimize resource allocation, facilitate context switch across agents, enable concurrent execution of agents, provide tool service for agents, maintain access control for agents, and provide a rich set of toolkits for LLM Agent developers.

DataFlow
DataFlow is a data preparation and training system designed to parse, generate, process, and evaluate high-quality data from noisy sources, improving the performance of large language models in specific domains. It constructs diverse operators and pipelines, validated to enhance domain-oriented LLM's performance in fields like healthcare, finance, and law. DataFlow also features an intelligent DataFlow-agent capable of dynamically assembling new pipelines by recombining existing operators on demand.

bee-agent-framework
The Bee Agent Framework is an open-source tool for building, deploying, and serving powerful agentic workflows at scale. It provides AI agents, tools for creating workflows in Javascript/Python, a code interpreter, memory optimization strategies, serialization for pausing/resuming workflows, traceability features, production-level control, and upcoming features like model-agnostic support and a chat UI. The framework offers various modules for agents, llms, memory, tools, caching, errors, adapters, logging, serialization, and more, with a roadmap including MLFlow integration, JSON support, structured outputs, chat client, base agent improvements, guardrails, and evaluation.

aigne-framework
AIGNE Framework is a functional AI application development framework designed to simplify and accelerate the process of building modern applications. It combines functional programming features, powerful artificial intelligence capabilities, and modular design principles to help developers easily create scalable solutions. With key features like modular design, TypeScript support, multiple AI model support, flexible workflow patterns, MCP protocol integration, code execution capabilities, and Blocklet ecosystem integration, AIGNE Framework offers a comprehensive solution for developers. The framework provides various workflow patterns such as Workflow Router, Workflow Sequential, Workflow Concurrency, Workflow Handoff, Workflow Reflection, Workflow Orchestration, Workflow Code Execution, and Workflow Group Chat to address different application scenarios efficiently. It also includes built-in MCP support for running MCP servers and integrating with external MCP servers, along with packages for core functionality, agent library, CLI, and various models like OpenAI, Gemini, Claude, and Nova.

raga-llm-hub
Raga LLM Hub is a comprehensive evaluation toolkit for Language and Learning Models (LLMs) with over 100 meticulously designed metrics. It allows developers and organizations to evaluate and compare LLMs effectively, establishing guardrails for LLMs and Retrieval Augmented Generation (RAG) applications. The platform assesses aspects like Relevance & Understanding, Content Quality, Hallucination, Safety & Bias, Context Relevance, Guardrails, and Vulnerability scanning, along with Metric-Based Tests for quantitative analysis. It helps teams identify and fix issues throughout the LLM lifecycle, revolutionizing reliability and trustworthiness.

agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

starwhale
Starwhale is an MLOps/LLMOps platform that brings efficiency and standardization to machine learning operations. It streamlines the model development lifecycle, enabling teams to optimize workflows around key areas like model building, evaluation, release, and fine-tuning. Starwhale abstracts Model, Runtime, and Dataset as first-class citizens, providing tailored capabilities for common workflow scenarios including Models Evaluation, Live Demo, and LLM Fine-tuning. It is an open-source platform designed for clarity and ease of use, empowering developers to build customized MLOps features tailored to their needs.

AgentBench
AgentBench is a benchmark designed to evaluate Large Language Models (LLMs) as autonomous agents in various environments. It includes 8 distinct environments such as Operating System, Database, Knowledge Graph, Digital Card Game, and Lateral Thinking Puzzles. The tool provides a comprehensive evaluation of LLMs' ability to operate as agents by offering Dev and Test sets for each environment. Users can quickly start using the tool by following the provided steps, configuring the agent, starting task servers, and assigning tasks. AgentBench aims to bridge the gap between LLMs' proficiency as agents and their practical usability.

inferable
Inferable is an open source platform that helps users build reliable LLM-powered agentic automations at scale. It offers a managed agent runtime, durable tool calling, zero network configuration, multiple language support, and is fully open source under the MIT license. Users can define functions, register them with Inferable, and create runs that utilize these functions to automate tasks. The platform supports Node.js/TypeScript, Go, .NET, and React, and provides SDKs, core services, and bootstrap templates for various languages.

OmAgent
OmAgent is an open-source agent framework designed to streamline the development of on-device multimodal agents. It enables agents to empower various hardware devices, integrates speed-optimized SOTA multimodal models, provides SOTA multimodal agent algorithms, and focuses on optimizing the end-to-end computing pipeline for real-time user interaction experience. Key features include easy connection to diverse devices, scalability, flexibility, and workflow orchestration. The architecture emphasizes graph-based workflow orchestration, native multimodality, and device-centricity, allowing developers to create bespoke intelligent agent programs.

agents
The LiveKit Agent Framework is designed for building real-time, programmable participants that run on servers. Easily tap into LiveKit WebRTC sessions and process or generate audio, video, and data streams. The framework includes plugins for common workflows, such as voice activity detection and speech-to-text. Agents integrates seamlessly with LiveKit server, offloading job queuing and scheduling responsibilities to it. This eliminates the need for additional queuing infrastructure. Agent code developed on your local machine can scale to support thousands of concurrent sessions when deployed to a server in production.

nous
Nous is an open-source TypeScript platform for autonomous AI agents and LLM based workflows. It aims to automate processes, support requests, review code, assist with refactorings, and more. The platform supports various integrations, multiple LLMs/services, CLI and web interface, human-in-the-loop interactions, flexible deployment options, observability with OpenTelemetry tracing, and specific agents for code editing, software engineering, and code review. It offers advanced features like reasoning/planning, memory and function call history, hierarchical task decomposition, and control-loop function calling options. Nous is designed to be a flexible platform for the TypeScript community to expand and support different use cases and integrations.

multi-agent-orchestrator
Multi-Agent Orchestrator is a flexible and powerful framework for managing multiple AI agents and handling complex conversations. It intelligently routes queries to the most suitable agent based on context and content, supports dual language implementation in Python and TypeScript, offers flexible agent responses, context management across agents, extensible architecture for customization, universal deployment options, and pre-built agents and classifiers. It is suitable for various applications, from simple chatbots to sophisticated AI systems, accommodating diverse requirements and scaling efficiently.

glide
Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.

sophia
Sophia is an open-source TypeScript platform designed for autonomous AI agents and LLM based workflows. It aims to automate processes, review code, assist with refactorings, and support various integrations. The platform offers features like advanced autonomous agents, reasoning/planning inspired by Google's Self-Discover paper, memory and function call history, adaptive iterative planning, and more. Sophia supports multiple LLMs/services, CLI and web interface, human-in-the-loop interactions, flexible deployment options, observability with OpenTelemetry tracing, and specific agents for code editing, software engineering, and code review. It provides a flexible platform for the TypeScript community to expand and support various use cases and integrations.

mmore
MMORE is an open-source, end-to-end pipeline for ingesting, processing, indexing, and retrieving knowledge from various file types such as PDFs, Office docs, images, audio, video, and web pages. It standardizes content into a unified multimodal format, supports distributed CPU/GPU processing, and offers hybrid dense+sparse retrieval with an integrated RAG service through CLI and APIs.
For similar tasks

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.

jupyter-ai
Jupyter AI connects generative AI with Jupyter notebooks. It provides a user-friendly and powerful way to explore generative AI models in notebooks and improve your productivity in JupyterLab and the Jupyter Notebook. Specifically, Jupyter AI offers: * An `%%ai` magic that turns the Jupyter notebook into a reproducible generative AI playground. This works anywhere the IPython kernel runs (JupyterLab, Jupyter Notebook, Google Colab, Kaggle, VSCode, etc.). * A native chat UI in JupyterLab that enables you to work with generative AI as a conversational assistant. * Support for a wide range of generative model providers, including AI21, Anthropic, AWS, Cohere, Gemini, Hugging Face, NVIDIA, and OpenAI. * Local model support through GPT4All, enabling use of generative AI models on consumer grade machines with ease and privacy.

khoj
Khoj is an open-source, personal AI assistant that extends your capabilities by creating always-available AI agents. You can share your notes and documents to extend your digital brain, and your AI agents have access to the internet, allowing you to incorporate real-time information. Khoj is accessible on Desktop, Emacs, Obsidian, Web, and Whatsapp, and you can share PDF, markdown, org-mode, notion files, and GitHub repositories. You'll get fast, accurate semantic search on top of your docs, and your agents can create deeply personal images and understand your speech. Khoj is self-hostable and always will be.

langchain_dart
LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules: * **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers). * **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG). * **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task. The different components can be composed together using the LangChain Expression Language (LCEL).

danswer
Danswer is an open-source Gen-AI Chat and Unified Search tool that connects to your company's docs, apps, and people. It provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts. Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc. By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already supported?" or "Where's the pull request for feature Y?"

infinity
Infinity is an AI-native database designed for LLM applications, providing incredibly fast full-text and vector search capabilities. It supports a wide range of data types, including vectors, full-text, and structured data, and offers a fused search feature that combines multiple embeddings and full text. Infinity is easy to use, with an intuitive Python API and a single-binary architecture that simplifies deployment. It achieves high performance, with 0.1 milliseconds query latency on million-scale vector datasets and up to 15K QPS.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customerβs subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.