agentscope
Start building LLM-empowered multi-agent applications in an easier way.
Stars: 4817
AgentScope is a multi-agent platform designed to empower developers to build multi-agent applications with large-scale models. It features three high-level capabilities: Easy-to-Use, High Robustness, and Actor-Based Distribution. AgentScope provides a list of `ModelWrapper` to support both local model services and third-party model APIs, including OpenAI API, DashScope API, Gemini API, and ollama. It also enables developers to rapidly deploy local model services using libraries such as ollama (CPU inference), Flask + Transformers, Flask + ModelScope, FastChat, and vllm. AgentScope supports various services, including Web Search, Data Query, Retrieval, Code Execution, File Operation, and Text Processing. Example applications include Conversation, Game, and Distribution. AgentScope is released under Apache License 2.0 and welcomes contributions.
README:
English | 中文
Start building LLM-empowered multi-agent applications in an easier way.
- If you find our work helpful, please kindly cite our paper.
- Visit our workstation to build multi-agent applications with drag-and-drop.
- Welcome to join our community on Discord or DingTalk.
- [2024-09-03] AgentScope supports Web Browser Control now! Refer to our example for more details.
- [2024-07-18] AgentScope supports streaming mode now! Refer to our tutorial and example conversation in stream mode for more details.
- [2024-07-15] AgentScope has implemented the Mixture-of-Agents algorithm. Refer to our MoA example for more details.
- [2024-06-14] A new prompt tuning module is available in AgentScope to help developers generate and optimize the agents' system prompts! Refer to our tutorial for more details!
- [2024-06-11] The RAG functionality is available for agents in AgentScope now! A quick introduction to RAG in AgentScope can help you equip your agent with external knowledge!
- [2024-06-09] We release AgentScope v0.0.5 now! In this new version, AgentScope Workstation (the online version is running on agentscope.io) is open-sourced with the refactored AgentScope Studio!
Full News
- [2024-05-24] We are pleased to announce that features related to the AgentScope Workstation will soon be open-sourced! The online website service is temporarily offline; it will be upgraded and back online shortly. Stay tuned...
- [2024-05-15] A new Parser Module for formatted responses has been added to AgentScope! Refer to our tutorial for more details. The `DictDialogAgent` and the werewolf game example have been updated accordingly.
https://github.com/qbc2016/AgentScope/assets/22984042/22d45aee-3470-4923-850f-348a5b0faaa7
- [2024-05-14] Dear AgentScope users, we are conducting a survey on the AgentScope Workstation & Copilot user experience. Your valuable feedback will help us improve AgentScope's drag-and-drop multi-agent application development and Copilot. The survey takes about 3~5 minutes; please click the URL to participate. Thank you very much for your support and contribution!
- [2024-05-14] AgentScope supports gpt-4o as well as other OpenAI vision models now! Try gpt-4o with its model configuration and the new example Conversation with gpt-4o!
- [2024-04-30] We release AgentScope v0.0.4 now!
- [2024-04-27] AgentScope Workstation is now online! You are welcome to try building your multi-agent application simply with our drag-and-drop platform and to ask our copilot questions about AgentScope!
- [2024-04-19] AgentScope supports Llama3 now! We provide scripts and example model configurations for quick set-up. Feel free to try Llama3 in our examples!
- [2024-04-06] We release AgentScope v0.0.3 now!
- [2024-04-06] New examples Gomoku, Conversation with ReAct Agent, Conversation with RAG Agent and Distributed Parallel Optimization are available now!
- [2024-03-19] We release AgentScope v0.0.2 now! In this new version, AgentScope supports ollama (a local CPU inference engine), DashScope and Google Gemini APIs.
- [2024-03-19] New examples "Autonomous Conversation with Mentions" and "Basic Conversation with LangChain library" are available now!
- [2024-03-19] The Chinese tutorial of AgentScope is online now!
- [2024-02-27] We release AgentScope v0.0.1 now, which is also available on PyPI!
- [2024-02-14] We release our paper "AgentScope: A Flexible yet Robust Multi-Agent Platform" on arXiv now!
AgentScope is an innovative multi-agent platform designed to empower developers to build multi-agent applications with large-scale models. It features three high-level capabilities:
- 🤝 Easy-to-Use: Designed for developers, with rich components, comprehensive documentation, and broad compatibility. In addition, AgentScope Workstation provides a drag-and-drop programming platform and a copilot for AgentScope beginners!
- ✅ High Robustness: Supports customized fault-tolerance controls and retry mechanisms to enhance application stability.
- 🚀 Actor-Based Distribution: Builds distributed multi-agent applications in a centralized programming manner for streamlined development (a minimal sketch follows below).
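For the distribution capability, converting a local agent into a distributed actor is designed to be a one-line change. A minimal, hedged sketch, assuming the `to_dist` interface described in the Distribution tutorial (the config file and config name are placeholders):

```python
import agentscope
from agentscope.agents import DialogAgent

agentscope.init(model_configs="./model_configs.json")  # placeholder config file

# A hedged sketch: per the Distribution tutorial, .to_dist() runs the agent as
# a separate actor process, while call sites stay the same as the local case.
assistant = DialogAgent(
    name="assistant",
    model_config_name="my_openai_config",  # placeholder config name
).to_dist()
```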
Supported Model Libraries
AgentScope provides a list of `ModelWrapper` to support both local model services and third-party model APIs.
| API | Task | Model Wrapper | Configuration | Some Supported Models |
|---|---|---|---|---|
| OpenAI API | Chat | `OpenAIChatWrapper` | guidance template | gpt-4o, gpt-4, gpt-3.5-turbo, ... |
| | Embedding | `OpenAIEmbeddingWrapper` | guidance template | text-embedding-ada-002, ... |
| | DALL·E | `OpenAIDALLEWrapper` | guidance template | dall-e-2, dall-e-3 |
| DashScope API | Chat | `DashScopeChatWrapper` | guidance template | qwen-plus, qwen-max, ... |
| | Image Synthesis | `DashScopeImageSynthesisWrapper` | guidance template | wanx-v1 |
| | Text Embedding | `DashScopeTextEmbeddingWrapper` | guidance template | text-embedding-v1, text-embedding-v2, ... |
| | Multimodal | `DashScopeMultiModalWrapper` | guidance template | qwen-vl-max, qwen-vl-chat-v1, qwen-audio-chat |
| Gemini API | Chat | `GeminiChatWrapper` | guidance template | gemini-pro, ... |
| | Embedding | `GeminiEmbeddingWrapper` | guidance template | models/embedding-001, ... |
| ZhipuAI API | Chat | `ZhipuAIChatWrapper` | guidance template | glm-4, ... |
| | Embedding | `ZhipuAIEmbeddingWrapper` | guidance template | embedding-2, ... |
| ollama | Chat | `OllamaChatWrapper` | guidance template | llama3, llama2, Mistral, ... |
| | Embedding | `OllamaEmbeddingWrapper` | guidance template | llama2, Mistral, ... |
| | Generation | `OllamaGenerationWrapper` | guidance template | llama2, Mistral, ... |
| LiteLLM API | Chat | `LiteLLMChatWrapper` | guidance template | models supported by litellm, ... |
| Yi API | Chat | `YiChatWrapper` | guidance template | yi-large, yi-medium, ... |
| Post Request based API | - | `PostAPIModelWrapper` | guidance template | - |
Supported Local Model Deployment
AgentScope enables developers to rapidly deploy local model services using the following libraries:
- ollama (CPU inference)
- Flask + Transformers
- Flask + ModelScope
- FastChat
- vllm
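Once a local service is running, it is referenced through a model configuration like any other backend. A minimal, hedged sketch for an ollama-served model (the `model_type` value is taken from the wrapper table above; adjust to your deployment):

```python
# A hedged example: configuration for a locally served ollama model.
# "ollama_chat" is the model_type associated with OllamaChatWrapper above.
ollama_model_config = {
    "config_name": "my_ollama_config",  # name used to reference this config
    "model_type": "ollama_chat",        # selects the OllamaChatWrapper
    "model_name": "llama3",             # model pulled into the local ollama service
}
```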
Supported Services
- Web Search
- Data Query
- Retrieval
- Code Execution
- File Operation
- Text Processing
- Multi Modality
- Wikipedia Search and Retrieval
- TripAdvisor Search
- Web Browser Control
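These services are exposed as plain Python functions under `agentscope.service`. A minimal, hedged sketch of calling one directly (the function name and response fields are assumed from the API reference; verify before use):

```python
# A hedged sketch: calling a built-in code-execution service directly.
# execute_python_code and its ServiceResponse return value are assumed
# to live in agentscope.service.
from agentscope.service import execute_python_code

response = execute_python_code(code="print(1 + 1)")
print(response.status)   # execution status (assumed enum-like field)
print(response.content)  # captured output of the executed snippet
```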
Example Applications
- Model
- Conversation
  - Basic Conversation
  - Autonomous Conversation with Mentions
  - Self-Organizing Conversation
  - Basic Conversation with LangChain library
  - Conversation with ReAct Agent
  - Conversation in Natural Language to Query SQL
  - Conversation with RAG Agent
  - Conversation with gpt-4o
  - Conversation with Software Engineering Agent
  - Conversation with Customized Tools
  - Mixture of Agents Algorithm
  - Conversation in Stream Mode
  - Conversation with CodeAct Agent
  - Conversation with Router Agent
- Game
- Distribution
More models, services and examples are coming soon!
AgentScope requires Python 3.9 or higher.
Note: This project is currently in active development; it is recommended to install AgentScope from source.
- Install AgentScope in editable mode:

```bash
# Pull the source code from GitHub
git clone https://github.com/modelscope/agentscope.git
# Install the package in editable mode
cd agentscope
pip install -e .
```
- Install AgentScope from pip:

```bash
pip install agentscope
```
To support different deployment scenarios, AgentScope provides several optional dependencies; the full list is in the tutorial. Taking the distribution mode as an example, its dependencies can be installed as follows:
```bash
# On Windows
# From source
pip install -e .[distribute]
# From PyPI
pip install agentscope[distribute]
```

```bash
# On Mac (the brackets must be escaped in zsh)
# From source
pip install -e .\[distribute\]
# From PyPI
pip install agentscope\[distribute\]
```
In AgentScope, model deployment and invocation are decoupled by `ModelWrapper`.
To use these model wrappers, you need to prepare a model config file as follows.
```python
model_config = {
    # Identifiers of the config and the model wrapper to use
    "config_name": "{your_config_name}",  # The name to identify the config
    "model_type": "{model_type}",         # The type to identify the model wrapper
    # Detailed parameters used to initialize the model wrapper
    # ...
}
```
Taking OpenAI Chat API as an example, the model configuration is as follows:
```python
openai_model_config = {
    "config_name": "my_openai_config",  # The name to identify the config
    "model_type": "openai_chat",        # The type to identify the model wrapper
    # Detailed parameters used to initialize the model wrapper
    "model_name": "gpt-4",  # The model used in the OpenAI API, e.g. gpt-4, gpt-3.5-turbo
    "api_key": "xxx",  # The API key for the OpenAI API. If not set, the env
                       # variable OPENAI_API_KEY will be used.
    "organization": "xxx",  # The organization for the OpenAI API. If not set, the env
                            # variable OPENAI_ORGANIZATION will be used.
}
```
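A config prepared this way is registered when initializing AgentScope. A minimal sketch, assuming `model_configs` also accepts a single dict in addition to the JSON file path used in the quick start below:

```python
import agentscope

# Register the config defined above; a list of dicts or a path to a JSON
# config file should work here as well (assumed from the init interface).
agentscope.init(model_configs=openai_model_config)
```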
More details about how to set up local model services and prepare model configurations are in our tutorial.
Create built-in user and assistant agents as follows.
```python
from agentscope.agents import DialogAgent, UserAgent
import agentscope

# Load model configs
agentscope.init(model_configs="./model_configs.json")

# Create a dialog agent and a user agent
dialog_agent = DialogAgent(name="assistant",
                           model_config_name="my_openai_config")
user_agent = UserAgent()
```
In AgentScope, the message is the bridge among agents. It is a dict with two required fields, `name` and `content`, and an optional field `url` pointing to local files (image, video or audio) or a website.
```python
from agentscope.message import Msg

x = Msg(name="Alice", content="Hi!")
x = Msg("Bob", "What about this picture I took?", url="/path/to/picture.jpg")
```
Start a conversation between two agents (e.g. dialog_agent and user_agent) with the following code:
```python
x = None
while True:
    x = dialog_agent(x)
    x = user_agent(x)
    if x.content == "exit":  # the user inputs "exit" to end the conversation
        break
```
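Such loops can also be written with AgentScope's pipeline helpers. A hedged sketch, assuming a `sequentialpipeline` helper as described in the Pipeline and MsgHub tutorial chapter:

```python
# A hedged sketch: sequentialpipeline (name assumed from the Pipeline and
# MsgHub tutorial) passes the message through the listed agents in order.
from agentscope.pipelines import sequentialpipeline

x = None
while x is None or x.content != "exit":
    x = sequentialpipeline([dialog_agent, user_agent], x)
```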
AgentScope provides an easy-to-use runtime user interface capable of displaying multimodal output on the front end, including text, images, audio and video.
Refer to our tutorial for more details.
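A hedged sketch of connecting an application to this UI, assuming the `studio_url` parameter of `agentscope.init` described in the AgentScope Studio tutorial chapter:

```python
import agentscope

# A hedged sketch: register this run with a locally running AgentScope Studio
# instance (parameter names assumed; adjust the address to your deployment).
agentscope.init(
    model_configs="./model_configs.json",
    project="my_project",
    studio_url="http://127.0.0.1:5000",
)
```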
- About AgentScope
- Installation
- Quick Start
- Model
- Prompt Engineering
- Agent
- Memory
- Response Parser
- Tool
- Pipeline and MsgHub
- Distribution
- AgentScope Studio
- Logging
- Monitor
- Example: Werewolf Game
AgentScope is released under Apache License 2.0.
Contributions are always welcome!
Compared to the official version, the developer version installs additional pre-commit hooks that perform checks before each commit:
```bash
# For windows
pip install -e .[dev]
# For mac
pip install -e .\[dev\]

# Install pre-commit hooks
pre-commit install
```
Please refer to our Contribution Guide for more details.
If you find our work helpful for your research or application, please cite our papers.
- AgentScope: A Flexible yet Robust Multi-Agent Platform

```bibtex
@article{agentscope,
  author  = {Dawei Gao and Zitao Li and Xuchen Pan and Weirui Kuang and Zhijian Ma and Bingchen Qian and Fei Wei and Wenhao Zhang and Yuexiang Xie and Daoyuan Chen and Liuyi Yao and Hongyi Peng and Ze Yu Zhang and Lin Zhu and Chen Cheng and Hongzhu Shi and Yaliang Li and Bolin Ding and Jingren Zhou},
  title   = {AgentScope: A Flexible yet Robust Multi-Agent Platform},
  journal = {CoRR},
  volume  = {abs/2402.14034},
  year    = {2024},
}
```

- On the Design and Analysis of LLM-Based Algorithms

```bibtex
@article{llm_based_algorithms,
  author  = {Yanxi Chen and Yaliang Li and Bolin Ding and Jingren Zhou},
  title   = {On the Design and Analysis of LLM-Based Algorithms},
  journal = {CoRR},
  volume  = {abs/2407.14788},
  year    = {2024},
}
```

- Very Large-Scale Multi-Agent Simulation in AgentScope

```bibtex
@article{agentscope_simulation,
  author  = {Xuchen Pan and Dawei Gao and Yuexiang Xie and Zhewei Wei and Yaliang Li and Bolin Ding and Ji{-}Rong Wen and Jingren Zhou},
  title   = {Very Large-Scale Multi-Agent Simulation in AgentScope},
  journal = {CoRR},
  volume  = {abs/2407.17789},
  year    = {2024},
}
```
Similar Open Source Tools
TempCompass
TempCompass is a benchmark designed to evaluate the temporal perception ability of Video LLMs. It encompasses a diverse set of temporal aspects and task formats to comprehensively assess the capability of Video LLMs in understanding videos. The benchmark includes conflicting videos to prevent models from relying on single-frame bias and language priors. Users can clone the repository, install required packages, prepare data, run inference using examples like Video-LLaVA and Gemini, and evaluate the performance of their models across different tasks such as Multi-Choice QA, Yes/No QA, Caption Matching, and Caption Generation.
aichat
Aichat is an AI-powered CLI chat and copilot tool that seamlessly integrates with over 10 leading AI platforms, providing a powerful combination of chat-based interaction, context-aware conversations, and AI-assisted shell capabilities, all within a customizable and user-friendly environment.
MooER
MooER (摩耳) is an LLM-based speech recognition and translation model developed by Moore Threads. It allows users to transcribe speech into text (ASR) and translate speech into other languages (AST) in an end-to-end manner. The model was trained using 5K hours of data and is now also available with an 80K hours version. MooER is the first LLM-based speech model trained and inferred using domestic GPUs. The repository includes pretrained models, inference code, and a Gradio demo for a better user experience.
cb-tumblebug
CB-Tumblebug (CB-TB) is a system for managing multi-cloud infrastructure consisting of resources from multiple cloud service providers. It provides an overview, features, and architecture. The tool supports various cloud providers and resource types, with ongoing development and localization efforts. Users can deploy a multi-cloud infra with GPUs, enjoy multiple LLMs in parallel, and utilize LLM-related scripts. The tool requires Linux, Docker, Docker Compose, and Golang for building the source. Users can run CB-TB with Docker Compose or from the Makefile, set up prerequisites, contribute to the project, and view a list of contributors. The tool is licensed under an open-source license.
LynxHub
LynxHub is a platform that allows users to seamlessly install, configure, launch, and manage all their AI interfaces from a single, intuitive dashboard. It offers features like AI interface management, arguments manager, custom run commands, pre-launch actions, extension management, in-app tools like terminal and web browser, AI information dashboard, Discord integration, and additional features like theme options and favorite interface pinning. The platform supports modular design for custom AI modules and upcoming extensions system for complete customization. LynxHub aims to streamline AI workflow and enhance user experience with a user-friendly interface and comprehensive functionalities.
duolingo-clone
Lingo is an interactive platform for language learning that provides a modern UI/UX experience. It offers features like courses, quests, and a shop for users to engage with. The tech stack includes React JS, Next JS, Typescript, Tailwind CSS, Vercel, and Postgresql. Users can contribute to the project by submitting changes via pull requests. The platform utilizes resources from CodeWithAntonio, Kenney Assets, Freesound, Elevenlabs AI, and Flagpack. Key dependencies include @clerk/nextjs, @neondatabase/serverless, @radix-ui/react-avatar, and more. Users can follow the project creator on GitHub and Twitter, as well as subscribe to their YouTube channel for updates. To learn more about Next.js, users can refer to the Next.js documentation and interactive tutorial.
Q-Bench
Q-Bench is a benchmark for general-purpose foundation models on low-level vision, focusing on multi-modality LLMs performance. It includes three realms for low-level vision: perception, description, and assessment. The benchmark datasets LLVisionQA and LLDescribe are collected for perception and description tasks, with open submission-based evaluation. An abstract evaluation code is provided for assessment using public datasets. The tool can be used with the datasets API for single images and image pairs, allowing for automatic download and usage. Various tasks and evaluations are available for testing MLLMs on low-level vision tasks.
llama-assistant
Llama Assistant is an AI-powered assistant that helps with daily tasks, such as voice recognition, natural language processing, summarizing text, rephrasing sentences, answering questions, and more. It runs offline on your local machine, ensuring privacy by not sending data to external servers. The project is a work in progress with regular feature additions.
Liger-Kernel
Liger Kernel is a collection of Triton kernels designed for LLM training, increasing training throughput by 20% and reducing memory usage by 60%. It includes Hugging Face Compatible modules like RMSNorm, RoPE, SwiGLU, CrossEntropy, and FusedLinearCrossEntropy. The tool works with Flash Attention, PyTorch FSDP, and Microsoft DeepSpeed, aiming to enhance model efficiency and performance for researchers, ML practitioners, and curious novices.
cortex.cpp
Cortex is a C++ AI engine with a Docker-like command-line interface and client libraries. It supports running AI models using ONNX, TensorRT-LLM, and llama.cpp engines. Cortex can function as a standalone server or be integrated as a library. The tool provides support for various engines and models, allowing users to easily deploy and interact with AI models. It offers a range of CLI commands for managing models, embeddings, and engines, as well as a REST API for interacting with models. Cortex is designed to simplify the deployment and usage of AI models in C++ applications.
dora
Dataflow-oriented robotic application (dora-rs) is a framework that makes the creation of robotic applications fast and simple. Building a robotic application can be summed up as bringing together hardware, algorithms, and AI models, and making them communicate with each other. At dora-rs, we try to: make integration of hardware and software easy by supporting Python, C, C++, and also ROS2; make communication low latency by using zero-copy Arrow messages. dora-rs is still experimental and you might experience bugs, but we're working very hard to make it as stable as possible.
auto-news
Auto-News is an automatic news aggregator tool that utilizes Large Language Models (LLM) to pull information from various sources such as Tweets, RSS feeds, YouTube videos, web articles, Reddit, and journal notes. The tool aims to help users efficiently read and filter content based on personal interests, providing a unified reading experience and organizing information effectively. It features feed aggregation with summarization, transcript generation for videos and articles, noise reduction, task organization, and deep dive topic exploration. The tool supports multiple LLM backends, offers weekly top-k aggregations, and can be deployed on Linux/MacOS using docker-compose or Kubernetes.
GPTSwarm
GPTSwarm is a graph-based framework for LLM-based agents that enables the creation of LLM-based agents from graphs and facilitates the customized and automatic self-organization of agent swarms with self-improvement capabilities. The library includes components for domain-specific operations, graph-related functions, LLM backend selection, memory management, and optimization algorithms to enhance agent performance and swarm efficiency. Users can quickly run predefined swarms or utilize tools like the file analyzer. GPTSwarm supports local LM inference via LM Studio, allowing users to run with a local LLM model. The framework has been accepted by ICML2024 and offers advanced features for experimentation and customization.
PromptClip
PromptClip is a tool that allows developers to create video clips using LLM prompts. Users can upload videos from various sources, prompt the video in natural language, use different LLM models, instantly watch the generated clips, finetune the clips, and add music or image overlays. The tool provides a seamless way to extract specific moments from videos based on user queries, making video editing and content creation more efficient and intuitive.
For similar tasks
h2ogpt
h2oGPT is an Apache V2 open-source project that allows users to query and summarize documents or chat with local private GPT LLMs. It features a private offline database of any documents (PDFs, Excel, Word, Images, Video Frames, Youtube, Audio, Code, Text, MarkDown, etc.), a persistent database (Chroma, Weaviate, or in-memory FAISS) using accurate embeddings (instructor-large, all-MiniLM-L6-v2, etc.), and efficient use of context using instruct-tuned LLMs (no need for LangChain's few-shot approach). h2oGPT also offers parallel summarization and extraction, reaching an output of 80 tokens per second with the 13B LLaMa2 model, HYDE (Hypothetical Document Embeddings) for enhanced retrieval based upon LLM responses, a variety of models supported (LLaMa2, Mistral, Falcon, Vicuna, WizardLM. With AutoGPTQ, 4-bit/8-bit, LORA, etc.), GPU support from HF and LLaMa.cpp GGML models, and CPU support using HF, LLaMa.cpp, and GPT4ALL models. Additionally, h2oGPT provides Attention Sinks for arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.), a UI or CLI with streaming of all models, the ability to upload and view documents through the UI (control multiple collaborative or personal collections), Vision Models LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision, Image Generation Stable Diffusion (sdxl-turbo, sdxl) and PlaygroundAI (playv2), Voice STT using Whisper with streaming audio conversion, Voice TTS using MIT-Licensed Microsoft Speech T5 with multiple voices and Streaming audio conversion, Voice TTS using MPL2-Licensed TTS including Voice Cloning and Streaming audio conversion, AI Assistant Voice Control Mode for hands-free control of h2oGPT chat, Bake-off UI mode against many models at the same time, Easy Download of model artifacts and control over models like LLaMa.cpp through the UI, Authentication in the UI by user/password via Native or Google OAuth, State Preservation in the UI by user/password, Linux, Docker, macOS, and Windows support, Easy Windows Installer for Windows 10 64-bit (CPU/CUDA), Easy macOS Installer for macOS (CPU/M1/M2), Inference Servers support (oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, OpenAI, Azure OpenAI, Anthropic), OpenAI-compliant, Server Proxy API (h2oGPT acts as drop-in-replacement to OpenAI server), Python client API (to talk to Gradio server), JSON Mode with any model via code block extraction. Also supports MistralAI JSON mode, Claude-3 via function calling with strict Schema, OpenAI via JSON mode, and vLLM via guided_json with strict Schema, Web-Search integration with Chat and Document Q/A, Agents for Search, Document Q/A, Python Code, CSV frames (Experimental, best with OpenAI currently), Evaluate performance using reward models, and Quality maintained with over 1000 unit and integration tests taking over 4 GPU-hours.
serverless-chat-langchainjs
This sample shows how to build a serverless chat experience with Retrieval-Augmented Generation using LangChain.js and Azure. The application is hosted on Azure Static Web Apps and Azure Functions, with Azure Cosmos DB for MongoDB vCore as the vector database. You can use it as a starting point for building more complex AI applications.
react-native-vercel-ai
Run Vercel AI package on React Native, Expo, Web and Universal apps. Currently React Native fetch API does not support streaming which is used as a default on Vercel AI. This package enables you to use AI library on React Native but the best usage is when used on Expo universal native apps. On mobile you get back responses without streaming with the same API of `useChat` and `useCompletion` and on web it will fallback to `ai/react`
LLamaSharp
LLamaSharp is a cross-platform library to run 🦙LLaMA/LLaVA model (and others) on your local device. Based on llama.cpp, inference with LLamaSharp is efficient on both CPU and GPU. With the higher-level APIs and RAG support, it's convenient to deploy LLM (Large Language Model) in your application with LLamaSharp.
gpt4all
GPT4All is an ecosystem to run powerful and customized large language models that work locally on consumer grade CPUs and any GPU. Note that your CPU needs to support AVX or AVX2 instructions. Learn more in the documentation. A GPT4All model is a 3GB - 8GB file that you can download and plug into the GPT4All open-source ecosystem software. Nomic AI supports and maintains this software ecosystem to enforce quality and security alongside spearheading the effort to allow any person or enterprise to easily train and deploy their own on-edge large language models.
ChatGPT-Telegram-Bot
ChatGPT Telegram Bot is a Telegram bot that provides a smooth AI experience. It supports both Azure OpenAI and native OpenAI, and offers real-time (streaming) response to AI, with a faster and smoother experience. The bot also has 15 preset bot identities that can be quickly switched, and supports custom bot identities to meet personalized needs. Additionally, it supports clearing the contents of the chat with a single click, and restarting the conversation at any time. The bot also supports native Telegram bot button support, making it easy and intuitive to implement required functions. User level division is also supported, with different levels enjoying different single session token numbers, context numbers, and session frequencies. The bot supports English and Chinese on UI, and is containerized for easy deployment.
twinny
Twinny is a free and open-source AI code completion plugin for Visual Studio Code and compatible editors. It integrates with various tools and frameworks, including Ollama, llama.cpp, oobabooga/text-generation-webui, LM Studio, LiteLLM, and Open WebUI. Twinny offers features such as fill-in-the-middle code completion, chat with AI about your code, customizable API endpoints, and support for single or multiline fill-in-middle completions. It is easy to install via the Visual Studio Code extensions marketplace and provides a range of customization options. Twinny supports both online and offline operation and conforms to the OpenAI API standard.
agnai
Agnaistic is an AI roleplay chat tool that allows users to interact with personalized characters using their favorite AI services. It supports multiple AI services, persona schema formats, and features such as group conversations, user authentication, and memory/lore books. Agnaistic can be self-hosted or run using Docker, and it provides a range of customization options through its settings.json file. The tool is designed to be user-friendly and accessible, making it suitable for both casual users and developers.
For similar jobs
mistral.rs
Mistral.rs is a fast LLM inference platform written in Rust. It supports inference on a variety of devices, quantization, and ships an easy-to-use, OpenAI-API-compatible HTTP server and Python bindings.
ollama
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications. Ollama is designed to be easy to use and accessible to developers of all levels. It is open source and available for free on GitHub.
llama-cpp-agent
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It allows users to chat with LLM models, execute structured function calls and get structured output (objects). It provides a simple yet robust interface and supports llama-cpp-python and OpenAI endpoints with GBNF grammar support (like the llama-cpp-python server) and the llama.cpp backend server. It works by generating a formal GGML-BNF grammar of the user-defined structures and functions, which is then used by llama.cpp to generate text valid to that grammar. In contrast to most GBNF grammar generators, it also supports nested objects, dictionaries, enums and lists of them.
llama_ros
This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. By using the llama_ros packages, you can easily incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs.
MITSUHA
OneReality is a virtual waifu/assistant that you can speak to through your mic and it'll speak back to you! It has many features such as: * You can speak to her with a mic * It can speak back to you * Has short-term memory and long-term memory * Can open apps * Smarter than you * Fluent in English, Japanese, Korean, and Chinese * Can control your smart home like Alexa if you set up Tuya (more info in Prerequisites) It is built with Python, Llama-cpp-python, Whisper, SpeechRecognition, PocketSphinx, VITS-fast-fine-tuning, VITS-simple-api, HyperDB, Sentence Transformers, and Tuya Cloud IoT.
wenxin-starter
WenXin-Starter is a spring-boot-starter for Baidu's "Wenxin Qianfan WENXINWORKSHOP" large model, which can help you quickly access Baidu's AI capabilities. It fully integrates the official API documentation of Wenxin Qianfan. Supports text-to-image generation, built-in dialogue memory, and supports streaming return of dialogue. Supports QPS control of a single model and supports queuing mechanism. Plugins will be added soon.
FlexFlow
FlexFlow Serve is an open-source compiler and distributed system for **low latency**, **high performance** LLM serving. FlexFlow Serve outperforms existing systems by 1.3-2.0x for single-node, multi-GPU inference and by 1.4-2.4x for multi-node, multi-GPU inference.