
AIOS
AIOS: AI Agent Operating System
Stars: 3927

AIOS, a Large Language Model (LLM) agent operating system, embeds large language models into the operating system as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI. AIOS is designed to optimize resource allocation, facilitate context switching across agents, enable concurrent execution of agents, provide tool services for agents, maintain access control for agents, and provide a rich set of toolkits for LLM agent developers.
README:
AIOS is the AI Agent Operating System, which embeds large language models (LLMs) into the operating system and facilitates the development and deployment of LLM-based AI agents. AIOS is designed to address problems (e.g., scheduling, context switching, memory management, storage management, tool management, Agent SDK management, etc.) during the development and deployment of LLM-based agents, towards a better AIOS-Agent ecosystem for agent developers and agent users. AIOS includes the AIOS Kernel (this AIOS repository) and the AIOS SDK (the Cerebrum repository). AIOS supports both a Web UI and a Terminal UI.
The AIOS system is comprised of two key components: the AIOS kernel and the AIOS SDK. The AIOS kernel acts as an abstraction layer over the operating system kernel, managing the resources that agents require, such as LLMs, memory, storage, and tools. The AIOS SDK is designed for agent users and developers, enabling them to build and run agent applications by interacting with the AIOS kernel. The AIOS kernel is this repository; the AIOS SDK can be found in the Cerebrum repository.
The diagram below shows how agents use the AIOS SDK to interact with the AIOS kernel, and how the AIOS kernel receives agent queries and dispatches a chain of syscalls that are scheduled to run in different modules.
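For orientation, here is a purely illustrative sketch of that flow in Python. The class and function names are hypothetical, not the real AIOS/Cerebrum API (see the Cerebrum repository for actual SDK usage); it only mimics the idea of a query being turned into a scheduled chain of syscalls across kernel modules:

```python
# Hypothetical sketch only -- these names are NOT the real AIOS/Cerebrum API.
# It illustrates the idea: an agent query becomes a chain of syscalls that
# the kernel schedules and dispatches to its modules (LLM, memory, storage, tool).
from dataclasses import dataclass

@dataclass
class Syscall:
    module: str   # e.g. "llm", "memory", "storage", "tool"
    action: str
    payload: dict

def handle_agent_query(query: str) -> list[Syscall]:
    # A real kernel would schedule these; here we simply return them in order.
    return [
        Syscall("memory", "load_context", {"agent": "demo-agent"}),
        Syscall("llm", "generate", {"prompt": query}),
        Syscall("tool", "call", {"name": "google_search", "query": query}),
        Syscall("storage", "persist", {"agent": "demo-agent"}),
    ]

for sc in handle_agent_query("plan a weekend trip"):
    print(f"dispatch -> {sc.module}.{sc.action}")
```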
- [2025-03-13] Paper "Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery" has been accepted by NAACL 2025! Its features have been integrated into Cerebrum.
- [2025-03-12] A major refactor of the codebase, packed with powerful new features, has been merged into the main repo. Please check out the AIOS v0.2.2 release.
- [2025-03-10] Check out our paper on agentic memory, A-MEM: Agentic Memory for LLM Agents, and the corresponding codebase.
- [2025-02-07] Our paper From Commands to Prompts: LLM-based Semantic File System for AIOS has been accepted by ICLR 2025! The features of this paper have been integrated into AIOS as the Terminal UI.
- [2025-01-27] Deepseek-r1 (1.5b, 7b, 8b, 14b, 32b, 70b, 671b) is now supported in AIOS; both the open-sourced versions and the deepseek APIs (deepseek-chat and deepseek-reasoner) are available.
- [2024-11-30] AIOS v0.2: disentangled the AIOS Kernel (this AIOS repository) and the AIOS SDK (the Cerebrum repository); added Remote Kernel for agent users.
- [2024-09-01] AIOS supports multiple agent creation frameworks (e.g., ReAct, Reflexion, OpenAGI, AutoGen, Open Interpreter, MetaGPT). Agents created by these frameworks can onboard AIOS. Onboarding guidelines can be found in the Doc.
- [2024-07-10] AIOS documentation is up; it can be found at the Website.
- [2024-06-20] Function calling for open-sourced LLMs (native huggingface, vLLM, ollama) is supported.
- [2024-05-20] More agents with ChatGPT-based tool calling have been added (i.e., MathAgent, RecAgent, TravelAgent, AcademicAgent, and CreationAgent); their profiles and workflows can be found in OpenAGI.
- [2024-05-13] Local models (diffusion models) from HuggingFace are integrated as tools.
- [2024-05-01] Agent creation in AIOS has been refactored; it can be found in our OpenAGI package.
- [2024-04-05] AIOS currently supports external tool calling (google search, wolframalpha, rapid API, etc.).
- [2024-04-02] The AIOS Discord Community is up. Welcome to join the community for discussions, brainstorming, development, or just random chats! For how to contribute to AIOS, please see CONTRIBUTE.
- [2024-03-25] Our paper AIOS: LLM Agent Operating System is released!
- [2023-12-06] After several months of work, our perspective paper LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem is officially released.
Here are some key notations that you should know before we introduce the different modes of AIOS.
- AHM (Agent Hub Machine): Central server that hosts the agent marketplace/repository where users can publish, download, and share agents. Acts as the distribution center for all agent-related resources.
- AUM (Agent UI Machine): Client machine that provides user interface for interacting with agents. Can be any device from mobile phones to desktops that supports agent visualization and control.
- ADM (Agent Development Machine): Development environment where agent developers write, debug and test their agents. Requires proper development tools and libraries.
- ARM (Agent Running Machine): Execution environment where agents actually run and perform tasks. Needs adequate computational resources for agent operations.
The following parts introduce the different modes of deploying AIOS. Currently, AIOS supports Mode 1 and Mode 2; the other modes with new features are still in progress.
Mode 1
- Features:
  - For agent users: download agents from the agent hub on Machine B and run them on Machine A.
  - For agent developers: develop and test agents on Machine A and upload them to the agent hub on Machine B.
Mode 2
- Features:
  - Remote use of agents: agent users and developers can use agents on Machine B, which is different from the development and running machine (Machine A).
  - Benefits users who would like to use agents on resource-restricted machines (e.g., mobile or edge devices).
Mode 3
- Features:
  - Remote development of agents: agent developers can develop their agents on Machine B while running and testing them on Machine A. Benefits developers who would like to develop agents on resource-restricted machines (e.g., mobile or edge devices).
- Critical technique:
  - Packaging and transmitting agents across machines for distributed agent development and testing.
Mode 4 (ongoing)
- Features:
  - Each user/developer can have a personal AIOS with long-term persistent data, as long as they have a registered account in the AIOS ecosystem.
  - Their personal data can be synced to different machines under the same account.
- Critical techniques:
  - User account registration and verification mechanism
  - Persistent personal data storage for each user's AIOS
  - Synchronization of AIOS instances across different devices under the same account
  - Data privacy mechanism
Mode 5 (ongoing)
- Features:
  - Different users'/developers' personal AIOS kernels can co-exist on the same physical machine through virtualization.
- Critical techniques:
  - Virtualization of different AIOS kernel instances on the same machine
  - Scheduling and resource allocation for different virtual machines located on the same machine
Please see our ongoing documentation for more information.
- Supported versions: Python 3.10 and 3.11

Git clone the AIOS kernel:
```bash
git clone https://github.com/agiresearch/AIOS.git
```
Create a venv environment:
```bash
python3.x -m venv venv # only Python 3.10 and 3.11 are supported
source venv/bin/activate
```
Or create a conda environment:
```bash
conda create -n venv python=3.x # only Python 3.10 and 3.11 are supported
conda activate venv
```
[!TIP] We strongly recommend using uv for faster and more reliable package installation. To install uv:
```bash
pip install uv
```
For GPU environments:
```bash
uv pip install -r requirements-cuda.txt
```
For CPU-only environments:
```bash
uv pip install -r requirements.txt
```
Alternatively, if you prefer using pip, for GPU environments:
```bash
pip install -r requirements-cuda.txt
```
For CPU-only environments:
```bash
pip install -r requirements.txt
```
- Clone the Cerebrum repository:
```bash
git clone https://github.com/agiresearch/Cerebrum.git
```
- Install using uv (recommended):
```bash
cd Cerebrum && uv pip install -e .
```
Or using pip:
```bash
cd Cerebrum && pip install -e .
```
Note: The machine where the AIOS kernel (AIOS) is installed must also have the AIOS SDK (Cerebrum) installed; installing the AIOS kernel installs the AIOS SDK automatically by default. If you are using Local Kernel mode, i.e., running AIOS and agents on the same machine, simply install both AIOS and Cerebrum on that machine. If you are using Remote Kernel mode, i.e., running AIOS on Machine 1 and agents on Machine 2, where the agents interact with the kernel remotely, install both the AIOS kernel and the AIOS SDK on Machine 1, and install the AIOS SDK alone on Machine 2. Please follow the guidelines in the Cerebrum repository on how to install the SDK.
Before launching AIOS, you need to set up its configuration. AIOS provides two ways to do this: edit the configuration file directly, or configure interactively.
You need API keys for services like OpenAI, Anthropic, Groq, and HuggingFace. The simplest way to configure them is to edit aios/config/config.yaml.

[!TIP] We strongly recommend using the aios/config/config.yaml file to set up your API keys. This method is straightforward and helps avoid potential synchronization issues with environment variables.

A simple example of setting up your API keys in aios/config/config.yaml is shown below:
```yaml
api_keys:
  openai: "your-openai-key"
  gemini: "your-gemini-key"
  groq: "your-groq-key"
  anthropic: "your-anthropic-key"
  huggingface:
    auth_token: "your-huggingface-token"
    home: "optional-path" # Optional: HuggingFace models path
```
To obtain these API keys:
- Deepseek API: Visit https://api-docs.deepseek.com/
- OpenAI API: Visit https://platform.openai.com/api-keys
- Google Gemini API: Visit https://makersuite.google.com/app/apikey
- Groq API: Visit https://console.groq.com/keys
- HuggingFace Token: Visit https://huggingface.co/settings/tokens
- Anthropic API: Visit https://console.anthropic.com/keys
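As a quick sanity check (a convenience script of my own, not part of AIOS), you can parse the file with PyYAML and confirm that each key has been filled in:

```python
# Convenience check, not part of AIOS: verify aios/config/config.yaml parses
# and that the API keys are filled in. Run from the repository root.
import yaml  # pip install pyyaml

with open("aios/config/config.yaml") as f:
    config = yaml.safe_load(f)

for provider, key in config.get("api_keys", {}).items():
    if isinstance(key, dict):  # e.g. the nested huggingface block
        key = key.get("auth_token")
    status = "set" if key and "your-" not in str(key) else "missing"
    print(f"{provider}: {status}")
```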
You can configure which LLM models to use in the same aios/config/config.yaml file. Here's an example configuration:
```yaml
llms:
  models:
    # Ollama models
    - name: "qwen2.5:7b"
      backend: "ollama"
      hostname: "http://localhost:11434" # Make sure the ollama server is running
    # vLLM models
    - name: "meta-llama/Llama-3.1-8B-Instruct"
      backend: "vllm"
      hostname: "http://localhost:8091/v1" # Make sure the vllm server is running
```
Using Ollama models:
- First, download ollama from https://ollama.com/
- Start the ollama server in a separate terminal:
```bash
ollama serve
```
- Pull your desired models from https://ollama.com/library:
```bash
ollama pull qwen2.5:7b # example model
```
[!TIP] Ollama supports both CPU-only and GPU environments. For more details about ollama usage, visit the ollama documentation.
Using vLLM models:
- Install vLLM following its installation guide
- Start the vLLM server in a separate terminal:
```bash
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8091
```
[!NOTE] vLLM currently only supports Linux with GPU-enabled environments. If you don't have a compatible environment, please choose another backend. To enable vLLM's tool-calling feature, refer to https://docs.vllm.ai/en/latest/features/tool_calling.html
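Before pointing AIOS at these backends, it can help to confirm that the servers are actually reachable. A minimal check of my own (not part of AIOS), assuming the default endpoints configured above:

```python
# Convenience check, not part of AIOS: confirm the model servers are up.
# Ollama exposes /api/tags; vLLM serves an OpenAI-compatible /v1/models route.
import json
import urllib.request

def check(name: str, url: str) -> None:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            data = json.load(resp)
        models = data.get("models", data.get("data", []))  # ollama vs. vLLM key
        print(f"{name}: reachable, {len(models)} model(s) served")
    except Exception as exc:
        print(f"{name}: not reachable ({exc})")

check("ollama", "http://localhost:11434/api/tags")
check("vllm", "http://localhost:8091/v1/models")
```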
Using HuggingFace models: You can configure HuggingFace models with a specific GPU memory allocation:
```yaml
- name: "meta-llama/Llama-3.1-8B-Instruct"
  backend: "huggingface"
  max_gpu_memory: {0: "24GB", 1: "24GB"} # GPU memory allocation per device index
  eval_device: "cuda:0" # Device for model evaluation
```
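For intuition, the {gpu_index: "24GB"} mapping has the same shape as the max_memory argument that HuggingFace transformers accepts when sharding a model across devices. A sketch of that correspondence (an assumption on my part; I have not verified how AIOS passes the setting through internally):

```python
# Illustrative only: how a {gpu_index: "24GB"} mapping is typically used with
# HuggingFace transformers to cap per-device memory when loading a model.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",                   # let accelerate shard across devices
    max_memory={0: "24GB", 1: "24GB"},   # mirrors max_gpu_memory above
)
```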
Alternatively, you can set up the AIOS configuration interactively by using the following commands:
- aios env list: Show current environment variables, or show available API keys if no variables are set
- aios env set: Set environment variables interactively
- aios refresh: Refresh the AIOS configuration. Reloads the configuration from aios/config/config.yaml and reinitializes all components without restarting the server. The server must be running.
When no environment variables are set, the following API keys will be shown:
- DEEPSEEK_API_KEY: Deepseek API key for accessing Deepseek services
- OPENAI_API_KEY: OpenAI API key for accessing OpenAI services
- GEMINI_API_KEY: Google Gemini API key for accessing Google's Gemini services
- GROQ_API_KEY: Groq API key for accessing Groq services
- HF_AUTH_TOKEN: HuggingFace authentication token for accessing models
- HF_HOME: Optional path to store HuggingFace models
After you have set up your keys or environment parameters, follow the instructions below to start.

Run:
```bash
bash runtime/launch_kernel.sh
```
Or, if you need to explicitly set the Python version (e.g., python3.10, python3.11, python3), run:
```bash
python3.x -m uvicorn runtime.kernel:app --host 0.0.0.0 --port 8000 # replace the port with your own port
```
You also need to set the host and port in the Cerebrum (AIOS SDK) configuration to make sure they are consistent with the AIOS configuration.
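Once the kernel is up, a plain TCP check on the host and port used above can confirm that it is listening (a generic check of my own, not an AIOS-specific health endpoint):

```python
# Generic TCP check (not AIOS-specific) that the kernel is listening on the
# host/port passed to uvicorn above.
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(3)
    is_open = s.connect_ex(("localhost", 8000)) == 0
print("AIOS kernel port is open" if is_open else "AIOS kernel port is closed")
```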
You can also run the kernel in the background with:
```bash
python3.x -m uvicorn runtime.kernel:app --host 0.0.0.0 > MYLOGFILE.txt 2>&1 &
```
You can keep it running even after the shell closes by prefixing the entire command with nohup, as shown below.
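Concretely, applying the nohup prefix to the background command above:

```bash
nohup python3.x -m uvicorn runtime.kernel:app --host 0.0.0.0 > MYLOGFILE.txt 2>&1 &
```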
To interact with the AIOS terminal (an LLM-based semantic file system), run the following command to start it:
```bash
python scripts/run_terminal.py
```
Then you can start interacting with the AIOS terminal by typing natural-language commands. If the AIOS terminal starts successfully, it will look as shown below. Detailed instructions on how to use the AIOS terminal can be found here.

[!WARNING] The rollback feature of the AIOS terminal requires a connection to a redis server. Make sure you have a redis server running if you would like to use the rollback feature.
| Provider | Model Name | Open Source | Model String | Backend | Required API Key |
|---|---|---|---|---|---|
| Deepseek | Deepseek-reasoner | Yes | deepseek-reasoner | deepseek | DEEPSEEK_API_KEY |
| Deepseek | Deepseek-chat | Yes | deepseek-chat | deepseek | DEEPSEEK_API_KEY |
| Anthropic | Claude 3.5 Sonnet | No | claude-3-5-sonnet-20241022 | anthropic | ANTHROPIC_API_KEY |
| Anthropic | Claude 3.5 Haiku | No | claude-3-5-haiku-20241022 | anthropic | ANTHROPIC_API_KEY |
| Anthropic | Claude 3 Opus | No | claude-3-opus-20240229 | anthropic | ANTHROPIC_API_KEY |
| Anthropic | Claude 3 Sonnet | No | claude-3-sonnet-20240229 | anthropic | ANTHROPIC_API_KEY |
| Anthropic | Claude 3 Haiku | No | claude-3-haiku-20240307 | anthropic | ANTHROPIC_API_KEY |
| OpenAI | GPT-4 | No | gpt-4 | openai | OPENAI_API_KEY |
| OpenAI | GPT-4 Turbo | No | gpt-4-turbo | openai | OPENAI_API_KEY |
| OpenAI | GPT-4o | No | gpt-4o | openai | OPENAI_API_KEY |
| OpenAI | GPT-4o mini | No | gpt-4o-mini | openai | OPENAI_API_KEY |
| OpenAI | GPT-3.5 Turbo | No | gpt-3.5-turbo | openai | OPENAI_API_KEY |
| Google | Gemini 1.5 Flash | No | gemini-1.5-flash | google | GEMINI_API_KEY |
| Google | Gemini 1.5 Flash-8B | No | gemini-1.5-flash-8b | google | GEMINI_API_KEY |
| Google | Gemini 1.5 Pro | No | gemini-1.5-pro | google | GEMINI_API_KEY |
| Google | Gemini 1.0 Pro | No | gemini-1.0-pro | google | GEMINI_API_KEY |
| Groq | Llama 3.2 90B Vision | Yes | llama-3.2-90b-vision-preview | groq | GROQ_API_KEY |
| Groq | Llama 3.2 11B Vision | Yes | llama-3.2-11b-vision-preview | groq | GROQ_API_KEY |
| Groq | Llama 3.1 70B | Yes | llama-3.1-70b-versatile | groq | GROQ_API_KEY |
| Groq | Llama Guard 3 8B | Yes | llama-guard-3-8b | groq | GROQ_API_KEY |
| Groq | Llama 3 70B | Yes | llama3-70b-8192 | groq | GROQ_API_KEY |
| Groq | Llama 3 8B | Yes | llama3-8b-8192 | groq | GROQ_API_KEY |
| Groq | Mixtral 8x7B | Yes | mixtral-8x7b-32768 | groq | GROQ_API_KEY |
| Groq | Gemma 7B | Yes | gemma-7b-it | groq | GROQ_API_KEY |
| Groq | Gemma 2 9B | Yes | gemma2-9b-it | groq | GROQ_API_KEY |
| Groq | Llama3 Groq 70B | Yes | llama3-groq-70b-8192-tool-use-preview | groq | GROQ_API_KEY |
| Groq | Llama3 Groq 8B | Yes | llama3-groq-8b-8192-tool-use-preview | groq | GROQ_API_KEY |
| ollama | All models | Yes | model-name | ollama | - |
| vLLM | All models | Yes | model-name | vllm | - |
| HuggingFace | All models | Yes | model-name | huggingface | HF_AUTH_TOKEN |
```bibtex
@article{xu2025mem,
  title={A-MEM: Agentic Memory for LLM Agents},
  author={Xu, Wujiang and Liang, Zujie and Mei, Kai and Gao, Hang and Tan, Juntao and Zhang, Yongfeng},
  journal={arXiv preprint arXiv:2502.12110},
  year={2025}
}

@inproceedings{shi2025from,
  title={From Commands to Prompts: {LLM}-based Semantic File System for AIOS},
  author={Zeru Shi and Kai Mei and Mingyu Jin and Yongye Su and Chaoji Zuo and Wenyue Hua and Wujiang Xu and Yujie Ren and Zirui Liu and Mengnan Du and Dong Deng and Yongfeng Zhang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=2G021ZqUEZ}
}

@article{mei2024aios,
  title={AIOS: LLM Agent Operating System},
  author={Mei, Kai and Zhu, Xi and Xu, Wujiang and Hua, Wenyue and Jin, Mingyu and Li, Zelong and Xu, Shuyuan and Ye, Ruosong and Ge, Yingqiang and Zhang, Yongfeng},
  journal={arXiv:2403.16971},
  year={2024}
}

@article{ge2023llm,
  title={LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem},
  author={Ge, Yingqiang and Ren, Yujie and Hua, Wenyue and Xu, Shuyuan and Tan, Juntao and Zhang, Yongfeng},
  journal={arXiv:2312.03815},
  year={2023}
}
```
For how to contribute, see CONTRIBUTE. If you would like to contribute to the codebase, issues or pull requests are always welcome!
If you would like to join the community, ask questions, chat with fellows, learn about or propose new features, and participate in future developments, join our Discord Community!
Alternative AI tools for AIOS
Similar Open Source Tools

AIOS
AIOS, a Large Language Model (LLM) Agent operating system, embeds large language model into Operating Systems (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI. AIOS is designed to optimize resource allocation, facilitate context switch across agents, enable concurrent execution of agents, provide tool service for agents, maintain access control for agents, and provide a rich set of toolkits for LLM Agent developers.

openrl
OpenRL is an open-source general reinforcement learning research framework that supports training for various tasks such as single-agent, multi-agent, offline RL, self-play, and natural language. Developed based on PyTorch, the goal of OpenRL is to provide a simple-to-use, flexible, efficient and sustainable platform for the reinforcement learning research community. It supports a universal interface for all tasks/environments, single-agent and multi-agent tasks, offline RL training with expert dataset, self-play training, reinforcement learning training for natural language tasks, DeepSpeed, Arena for evaluation, importing models and datasets from Hugging Face, user-defined environments, models, and datasets, gymnasium environments, callbacks, visualization tools, unit testing, and code coverage testing. It also supports various algorithms like PPO, DQN, SAC, and environments like Gymnasium, MuJoCo, Atari, and more.

OmAgent
OmAgent is an open-source agent framework designed to streamline the development of on-device multimodal agents. It enables agents to empower various hardware devices, integrates speed-optimized SOTA multimodal models, provides SOTA multimodal agent algorithms, and focuses on optimizing the end-to-end computing pipeline for real-time user interaction experience. Key features include easy connection to diverse devices, scalability, flexibility, and workflow orchestration. The architecture emphasizes graph-based workflow orchestration, native multimodality, and device-centricity, allowing developers to create bespoke intelligent agent programs.

inference
Xorbits Inference (Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy and serve your own or state-of-the-art built-in models using just a single command. Whether you are a researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full potential of cutting-edge AI models.

superduperdb
SuperDuperDB is a Python framework for integrating AI models, APIs, and vector search engines directly with your existing databases, including hosting of your own models, streaming inference, and scalable model training/fine-tuning. Build, deploy, and manage any AI application without the need for complex pipelines, infrastructure, or specialized vector databases, and without moving your data there, by integrating AI at your data's source: - Generative AI, LLMs, RAG, vector search - Standard machine learning use-cases (classification, segmentation, regression, forecasting, recommendation, etc.) - Custom AI use-cases involving specialized models - Even the most complex applications/workflows in which different models work together. SuperDuperDB is **not** a database. Think `db = superduper(db)`: SuperDuperDB transforms your databases into an intelligent platform that allows you to leverage the full AI and Python ecosystem. A single development and deployment environment for all your AI applications in one place, fully scalable and easy to manage.

vision-parse
Vision Parse is a tool that leverages Vision Language Models to parse PDF documents into beautifully formatted markdown content. It offers smart content extraction, content formatting, multi-LLM support, PDF document support, and local model hosting using Ollama. Users can easily convert PDFs to markdown with high precision and preserve document hierarchy and styling. The tool supports multiple Vision LLM providers like OpenAI, LLama, and Gemini for accuracy and speed, making document processing efficient and effortless.

auto-news
Auto-News is an automatic news aggregator tool that utilizes Large Language Models (LLM) to pull information from various sources such as Tweets, RSS feeds, YouTube videos, web articles, Reddit, and journal notes. The tool aims to help users efficiently read and filter content based on personal interests, providing a unified reading experience and organizing information effectively. It features feed aggregation with summarization, transcript generation for videos and articles, noise reduction, task organization, and deep dive topic exploration. The tool supports multiple LLM backends, offers weekly top-k aggregations, and can be deployed on Linux/MacOS using docker-compose or Kubernetes.

openlit
OpenLIT is an OpenTelemetry-native GenAI and LLM Application Observability tool. It's designed to make the integration process of observability into GenAI projects as easy as pie, literally, with just **a single line of code**. Whether you're working with popular LLM Libraries such as OpenAI and HuggingFace or leveraging vector databases like ChromaDB, OpenLIT ensures your applications are monitored seamlessly, providing critical insights to improve performance and reliability.

SoM-LLaVA
SoM-LLaVA is a new data source and learning paradigm for Multimodal LLMs, empowering open-source Multimodal LLMs with Set-of-Mark prompting and improved visual reasoning ability. The repository provides a new dataset that is complementary to existing training sources, enhancing multimodal LLMs with Set-of-Mark prompting and improved general capacity. By adding 30k SoM data to the visual instruction tuning stage of LLaVA, the tool achieves 1% to 6% relative improvements on all benchmarks. Users can train SoM-LLaVA via command line and utilize the implementation to annotate COCO images with SoM. Additionally, the tool can be loaded in Huggingface for further usage.

keras-llm-robot
The Keras-llm-robot Web UI project is an open-source tool designed for offline deployment and testing of various open-source models from the Hugging Face website. It allows users to combine multiple models through configuration to achieve functionalities like multimodal, RAG, Agent, and more. The project consists of three main interfaces: chat interface for language models, configuration interface for loading models, and tools & agent interface for auxiliary models. Users can interact with the language model through text, voice, and image inputs, and the tool supports features like model loading, quantization, fine-tuning, role-playing, code interpretation, speech recognition, image recognition, network search engine, and function calling.

evalverse
Evalverse is an open-source project designed to support Large Language Model (LLM) evaluation needs. It provides a standardized and user-friendly solution for processing and managing LLM evaluations, catering to AI research engineers and scientists. Evalverse supports various evaluation methods, insightful reports, and no-code evaluation processes. Users can access unified evaluation with submodules, request evaluations without code via Slack bot, and obtain comprehensive reports with scores, rankings, and visuals. The tool allows for easy comparison of scores across different models and swift addition of new evaluation tools.

glide
Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.

FuzzyAI
The FuzzyAI Fuzzer is a powerful tool for automated LLM fuzzing, designed to help developers and security researchers identify jailbreaks and mitigate potential security vulnerabilities in their LLM APIs. It supports various fuzzing techniques, provides input generation capabilities, can be easily integrated into existing workflows, and offers an extensible architecture for customization and extension. The tool includes attacks like ArtPrompt, Taxonomy-based paraphrasing, Many-shot jailbreaking, Genetic algorithm, Hallucinations, DAN (Do Anything Now), WordGame, Crescendo, ActorAttack, Back To The Past, Please, Thought Experiment, and Default. It supports models from providers like Anthropic, OpenAI, Gemini, Azure, Bedrock, AI21, and Ollama, with the ability to add support for newer models. The tool also supports various cloud APIs and datasets for testing and experimentation.

Consistency_LLM
Consistency Large Language Models (CLLMs) is a family of efficient parallel decoders that reduce inference latency by efficiently decoding multiple tokens in parallel. The models are trained to perform efficient Jacobi decoding, mapping any randomly initialized token sequence to the same result as auto-regressive decoding in as few steps as possible. CLLMs have shown significant improvements in generation speed on various tasks, achieving up to 3.4 times faster generation. The tool provides a seamless integration with other techniques for efficient Large Language Model (LLM) inference, without the need for draft models or architectural modifications.

CodeGeeX4
CodeGeeX4-ALL-9B is an open-source multilingual code generation model based on GLM-4-9B, offering enhanced code generation capabilities. It supports functions like code completion, code interpreter, web search, function call, and repository-level code Q&A. The model has competitive performance on benchmarks like BigCodeBench and NaturalCodeBench, outperforming larger models in terms of speed and performance.

tts-generation-webui
TTS Generation WebUI is a comprehensive tool that provides a user-friendly interface for text-to-speech and voice cloning tasks. It integrates various AI models such as Bark, MusicGen, AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and MAGNeT. The tool offers one-click installers, Google Colab demo, videos for guidance, and extra voices for Bark. Users can generate audio outputs, manage models, caches, and system space for AI projects. The project is open-source and emphasizes ethical and responsible use of AI technology.
For similar tasks

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built-in logits processing like repetition penalties, and allows for easy custom scoring.

jupyter-ai
Jupyter AI connects generative AI with Jupyter notebooks. It provides a user-friendly and powerful way to explore generative AI models in notebooks and improve your productivity in JupyterLab and the Jupyter Notebook. Specifically, Jupyter AI offers: * An `%%ai` magic that turns the Jupyter notebook into a reproducible generative AI playground. This works anywhere the IPython kernel runs (JupyterLab, Jupyter Notebook, Google Colab, Kaggle, VSCode, etc.). * A native chat UI in JupyterLab that enables you to work with generative AI as a conversational assistant. * Support for a wide range of generative model providers, including AI21, Anthropic, AWS, Cohere, Gemini, Hugging Face, NVIDIA, and OpenAI. * Local model support through GPT4All, enabling use of generative AI models on consumer grade machines with ease and privacy.

khoj
Khoj is an open-source, personal AI assistant that extends your capabilities by creating always-available AI agents. You can share your notes and documents to extend your digital brain, and your AI agents have access to the internet, allowing you to incorporate real-time information. Khoj is accessible on Desktop, Emacs, Obsidian, Web, and Whatsapp, and you can share PDF, markdown, org-mode, notion files, and GitHub repositories. You'll get fast, accurate semantic search on top of your docs, and your agents can create deeply personal images and understand your speech. Khoj is self-hostable and always will be.

langchain_dart
LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules: * **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers). * **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG). * **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task. The different components can be composed together using the LangChain Expression Language (LCEL).

danswer
Danswer is an open-source Gen-AI Chat and Unified Search tool that connects to your company's docs, apps, and people. It provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts. Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc. By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already supported?" or "Where's the pull request for feature Y?"

infinity
Infinity is an AI-native database designed for LLM applications, providing incredibly fast full-text and vector search capabilities. It supports a wide range of data types, including vectors, full-text, and structured data, and offers a fused search feature that combines multiple embeddings and full text. Infinity is easy to use, with an intuitive Python API and a single-binary architecture that simplifies deployment. It achieves high performance, with 0.1 milliseconds query latency on million-scale vector datasets and up to 15K QPS.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customerβs subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.