Open-LLM-VTuber
Talk to any LLM with hands-free voice interaction, voice interruption, Live2D talking face, and long-term memory running locally across platforms
Stars: 542
Open-LLM-VTuber is a project in early stages of development that allows users to interact with Large Language Models (LLM) using voice commands and receive responses through a Live2D talking face. The project aims to provide a minimum viable prototype for offline use on macOS, Linux, and Windows, with features like long-term memory using MemGPT, customizable LLM backends, speech recognition, and text-to-speech providers. Users can configure the project to chat with LLMs, choose different backend services, and utilize Live2D models for visual representation. The project supports perpetual chat, offline operation, and GPU acceleration on macOS, addressing limitations of existing solutions on macOS.
README:
⚠️ Read this if you are updating from an old version without the voice interruption feature: The latest version changed how to open the Live2D server and the backend: server.py now launches everything it needs (except the browser). To run with Live2D and the browser, launch server.py and open the web page in the browser. You no longer need to run main.py alongside server.py. Running server.py assumes Live2D mode with the browser, and running main.py assumes no Live2D mode without the browser. In addition, the options MIC-IN-BROWSER and LIVE2D in the configuration file no longer have any effect and have been deprecated due to the changes in the backend.
⚠️ This project is in its early stages and is currently under active development. Features are unstable, code is messy, and breaking changes will occur. The main goal of this stage is to build a minimum viable prototype using technologies that are easy to integrate.
⚠️ This project currently has a lot of issues on Windows. In theory, it should all work, but many people using Windows have many problems with many dependencies. I might fix those features in the future, but Windows support currently requires testing and debugging. If you have a Mac or a Linux machine, use them instead for the time being. Join the Discord server if you need help or to get updates about this project.
⚠️ If you want to run this program on a server and access it remotely on your laptop, the microphone on the front end will only launch in a secure context (a.k.a. https or localhost). See MDN Web Doc. Therefore, you might want to configure https with a reverse proxy if you want to access the page on a remote machine (non-localhost).
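As a rough illustration of the secure-context rule above, the following sketch guesses whether the browser microphone is likely to be available for a given page URL. It is a simplified approximation of the browser's behavior, not the project's code; the function name and the set of loopback hosts are assumptions made for this example.

```python
from urllib.parse import urlparse

# Hosts that browsers commonly treat as trustworthy even over plain http
# (simplified; real browsers apply a more detailed rule set).
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def mic_likely_available(page_url: str) -> bool:
    """Return True if the page URL looks like a secure context."""
    parts = urlparse(page_url)
    if parts.scheme == "https":
        return True
    host = parts.hostname or ""
    return host in LOOPBACK_HOSTS or host.endswith(".localhost")
```

With this check, http://localhost:12393 passes, while a plain http page served from a LAN address does not, which is the case where a reverse proxy with https helps.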
Open-LLM-VTuber allows you to talk to (and interrupt!) any LLM locally by voice (hands-free) with a Live2D talking face. The LLM inference backend, speech recognition, and speech synthesizer are all designed to be swappable. This project can be configured to run offline on macOS, Linux, and Windows. Online LLM/ASR/TTS options are also supported.
Long-term memory with MemGPT can be configured to achieve perpetual chat, infinite* context length, and external data source.
This project started as an attempt to recreate the closed-source AI VTuber neuro-sama with open-source alternatives that can run offline on platforms other than Windows.
English demo:
https://github.com/user-attachments/assets/1a147c4c-68e6-4248-a429-47ef286cc9c8
Chinese demo:
- It works on macOS
- Many existing solutions display Live2D models with VTube Studio and achieve lip sync by routing desktop internal audio into VTube Studio and controlling the lips with that. On macOS, however, there is no easy way to let VTube Studio listen to internal audio on the desktop.
- Many existing solutions lack support for GPU acceleration on macOS, which makes them run slow on Mac.
- This project supports MemGPT for perpetual chat. The chatbot remembers what you've said.
- No data leaves your computer if you don't want it to
- You can choose local LLM/voice recognition/speech synthesis solutions; everything works offline. Tested on macOS.
- You can interrupt the LLM anytime with your voice without wearing headphones.
- [x] Chat with any LLM by voice
- [x] Interrupt LLM with voice at any time
- [x] Choose your own LLM backend
- [x] Choose your own Speech Recognition & Text to Speech provider
- [x] Long-term memory
- [x] Live2D frontend
- macOS
- Linux
- Windows
- [Sep 17, 2024] Added DeepLX translation to change the language for audio
- [Sep 6, 2024] Added GroqWhisperASR
- [Sep 5, 2024] Better Docker support
- [Sep 1, 2024] Added voice interruption (and refactored the backend)
- [Jul 15, 2024] Added MeloTTS
- [Jul 15, 2024] Refactored llm and launch.py and reduced TTS latency
- [Jul 11, 2024] Added CosyVoiceTTS
- [Jul 11, 2024] Added FunASR with SenseVoiceSmall speech recognition model.
- [Jul 7, 2024] Totally untested Docker support with Nvidia GPU passthrough (no Mac, no AMD)
- [Jul 6, 2024] Support for Chinese and probably some other languages...
- [Jul 6, 2024] WhisperCPP with macOS GPU acceleration. Dramatically decreased latency on Mac
- ...
- Talk to LLM with voice. Offline.
- RAG on chat history (temporarily removed)
Currently supported LLM backend
- Any OpenAI-API-compatible backend, such as Ollama, Groq, LM Studio, OpenAI, and more.
- MemGPT (setup required)
Currently supported Speech recognition backend
- FunASR, which supports SenseVoiceSmall and many other models. (Local compute; currently requires an internet connection for loading)
- Faster-Whisper (Local)
- Whisper-CPP using the python binding pywhispercpp (Local, mac GPU acceleration can be configured)
- Whisper (local)
- Groq Whisper (API Key required). This is a hosted Whisper endpoint, which is fast and has a generous free limit every day.
- Azure Speech Recognition (API Key required)
- The microphone in the server terminal will be used by default. You can change the setting MIC_IN_BROWSER in the conf.yaml to move the microphone (and voice activation detection) to the browser (at the cost of latency, for now). You might want to use the microphone on your client (the browser) rather than the one on your server if you run the backend on a different machine or inside a VM or docker.
Currently supported Text to Speech backend
- py3-tts (Local, it uses your system's default TTS engine)
- meloTTS (Local, fast)
- bark (Local, very resource-consuming)
- CosyVoice (Local, very resource-consuming)
- xTTSv2 (Local, very resource-consuming)
- Edge TTS (online, no API key required)
- Azure Text-to-Speech (online, API Key required)
Fast Text Synthesis
- Synthesize sentences as soon as they arrive, so there is no need to wait for the entire LLM response.
- Producer-consumer model with multithreading: Audio is synthesized continuously in the background, and each clip is played in order as soon as it is ready. The audio player does not block the audio synthesizer.
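The producer-consumer idea above can be sketched with the standard library. This is a minimal illustration, not the project's implementation: synthesize is a stand-in for a real TTS call, and run_pipeline and the sentinel object are names invented for this example.

```python
import queue
import threading


def synthesize(sentence: str) -> str:
    """Stand-in for a TTS call; returns a fake 'audio' label."""
    return f"audio({sentence})"


def run_pipeline(sentences, play):
    """Synthesize in a background thread while the caller's thread plays audio."""
    audio_queue: queue.Queue = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def producer():
        # Producer: synthesize each sentence as soon as it arrives
        for sentence in sentences:
            audio_queue.put(synthesize(sentence))
        audio_queue.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    # Consumer: play clips in arrival order without blocking the producer
    while True:
        item = audio_queue.get()
        if item is done:
            break
        play(item)
    worker.join()


played = []
run_pipeline(["Hello.", "How are you?"], played.append)
```

Because the queue is first-in, first-out and there is a single producer, clips are played in the order the sentences were generated.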
Live2D Talking face
- Change Live2D model with conf.yaml (model needs to be listed in model_dict.json)
- Load local Live2D models. Check doc/live2d.md for documentation.
- Uses expression keywords in LLM response to control facial expression, so there is no additional model for emotion detection. The expression keywords are automatically loaded into the system prompt and excluded from the speech synthesis output.
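To make the expression-keyword idea concrete, here is a small sketch of how keywords could be pulled out of an LLM response and removed before speech synthesis. The square-bracket syntax, the EXPRESSION_MAP contents, and the function name are assumptions for illustration; the real keywords come from model_dict.json and the project's format may differ.

```python
import re

# Hypothetical keyword-to-expression-index map; real values live in model_dict.json
EXPRESSION_MAP = {"joy": 3, "anger": 2, "neutral": 0}

_KEYWORD = re.compile(r"\[(\w+)\]")


def split_expressions(text: str):
    """Return (expression indices found, text with known keywords removed)."""
    found = [EXPRESSION_MAP[k] for k in _KEYWORD.findall(text) if k in EXPRESSION_MAP]

    def _strip(match):
        # Remove only keywords we recognize; leave other bracketed text alone
        return "" if match.group(1) in EXPRESSION_MAP else match.group(0)

    cleaned = _KEYWORD.sub(_strip, text)
    return found, " ".join(cleaned.split())


expressions, speech = split_expressions("[joy] Nice to meet you! [note] ok")
```

The indices would drive the Live2D face, while the cleaned text is what goes to the TTS engine.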
live2d technical details
- Uses guansss/pixi-live2d-display to display live2d models in browser
- Uses WebSocket to control facial expressions and talking state between the server and the front end
- All the required packages are locally available, so the front end works offline.
- You can load live2d models from a URL or from the local live2d-models directory. The default shizuku-local is stored locally and works offline. If the URL property of a model in model_dict.json is a URL rather than a path starting with /live2d-models, the model will need to be fetched from the specified URL whenever the front end is opened. Read doc/live2d.md for documentation on loading your Live2D model locally.
- Run server.py to run the WebSocket communication server, open index.html in the ./static folder to open the front end, and run main.py (previously launch.py) to run the backend for LLM/ASR/TTS processing.
New installation instruction is being created here
Install FFmpeg on your computer.
Clone this repository.
You need to have Ollama or any other OpenAI-API-Compatible backend ready and running. If you want to use MemGPT as your backend, scroll down to the MemGPT section.
Prepare the LLM of your choice. Edit the BASE_URL and MODEL in the project directory's conf.yaml.
This project was developed using Python 3.10.13 and is incompatible with Python versions lower than 3.9. I strongly recommend creating a virtual Python environment like conda for this project (because the dependencies are a mess!).
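If you want to confirm the interpreter in your environment before installing anything, a quick check along these lines may help. It is only a convenience sketch; the function name is made up for this example, and the 3.9 floor is taken from the statement above.

```python
import sys

MIN_VERSION = (3, 9)  # versions lower than 3.9 are stated to be incompatible


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


if not is_supported():
    print("Python 3.9 or newer is required; the project was developed on 3.10.13.")
```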
Run the following in the terminal to install the dependencies.
pip install -r requirements.txt # Run this in the project directory
# Install Speech recognition dependencies and text-to-speech dependencies according to the instructions below
This project, by default, launches the audio interaction mode, meaning you can talk to the LLM by voice, and the LLM will talk back to you by voice.
Edit the conf.yaml for configurations. You can follow the configuration used in the demo video.
If you want to use live2d, run server.py. Open the page localhost:12393 (you can change this) with your browser, and you are ready. Once the live2D model appears on the screen, it's ready to talk to you.
If you don't want the live2d, you can run main.py with Python for cli mode.
Some models will be downloaded on your first launch, which may require an internet connection and may take a while.
Back up the configuration file conf.yaml if you've edited it, and then update the repo.
Or just clone the repo again and make sure to transfer your configurations. The configuration file will sometimes change because this project is still in its early stages. Be cautious when updating the program.
Edit the ASR_MODEL settings in the conf.yaml to change the provider.
Here are the options you have for speech recognition:
FunASR (local) (Runs very fast even on CPU. Not sure how they did it)
- FunASR is a Fundamental End-to-End Speech Recognition Toolkit from ModelScope that runs many ASR models. The result and speed are pretty good with the SenseVoiceSmall from FunAudioLLM at Alibaba Group.
- Install with pip install -U funasr modelscope huggingface_hub. Also, ensure you have torch (torch>=1.13) and torchaudio. Install them with pip install torch torchaudio
- It requires an internet connection on launch even if the models are locally available. See https://github.com/modelscope/FunASR/issues/1897
Faster-Whisper (local)
- Whisper, but faster. On macOS, it runs on CPU only, which is not so fast, but it's easy to use.
WhisperCPP (local) (runs super fast on a Mac if configured correctly)
- If you are on a Mac, read below for instructions on setting up WhisperCPP with coreML support. If you want to use CPU or Nvidia GPU, install the package by running pip install pywhispercpp.
- The whisper cpp python binding. It can run on coreML with configuration, which makes it very fast on macOS.
- On CPU or Nvidia GPU, it's probably slower than Faster-Whisper
WhisperCPP coreML configuration:
- Uninstall the original pywhispercpp if you have already installed it. We are building the package.
- Run install_coreml_whisper.py with Python to automatically clone and build the coreML-supported pywhispercpp for you.
- Prepare the appropriate coreML models.
  - You can either convert models to coreml according to the documentation on Whisper.cpp repo
  - ...or you can find some magical huggingface repo that happens to have those converted models. Just remember to decompress them. If the program fails to load the model, it will produce a segmentation fault.
- You don't need to include those weird prefixes in the model name in the conf.yaml. For example, if the coreML model's name looks like ggml-base-encoder.mlmodelc, just put base into the model_name under WhisperCPP settings in the conf.yaml.
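The naming rule in the last step can be expressed as a tiny helper. This is only an illustration of the rule described above (the function name is hypothetical); it assumes the file name follows the ggml-<name>-encoder.mlmodelc pattern from the example.

```python
def model_name_from_file(filename: str) -> str:
    """Strip the 'ggml-' prefix and '-encoder.mlmodelc' suffix from a coreML model name."""
    # str.removeprefix / removesuffix exist in Python 3.9+, the project's minimum version
    return filename.removeprefix("ggml-").removesuffix("-encoder.mlmodelc")


name = model_name_from_file("ggml-base-encoder.mlmodelc")
```

The resulting value is what goes into model_name under the WhisperCPP settings.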
Whisper (local)
- Original Whisper from OpenAI. Install it with pip install -U openai-whisper
- The slowest of all. Added as an experiment to see if it can utilize macOS GPU. It didn't.
GroqWhisperASR (online, API Key required)
- Whisper endpoint from Groq. It's very fast and has a lot of free usage every day. It's pre-installed. Get an API key from groq and add it into the GroqWhisper setting in the conf.yaml.
- API key and internet connection are required.
AzureASR (online, API Key required)
- Azure Speech Recognition. Install with pip install azure-cognitiveservices-speech.
- API key and internet connection are required.
Install the respective package and turn it on using the TTS_MODEL option in conf.yaml.
pyttsx3TTS (local, fast)
- Install with the command pip install py3-tts.
- This package will use the default TTS engine on your system. It uses sapi5 on Windows, nsss on Mac, and espeak on other platforms.
- py3-tts is used instead of the more famous pyttsx3 because pyttsx3 seems unmaintained, and I couldn't get the latest version of pyttsx3 working.
meloTTS (local, fast)
- Install MeloTTS according to their documentation (don't install via docker). A nice place to clone the repo is the submodule folder, but you can put it wherever you want. If you encounter a problem related to mecab-python, try this fork (it hadn't been merged into main as of July 16, 2024).
- It's not the best, but it's definitely better than pyttsx3TTS, and it's pretty fast on my mac. I would choose this for now if I can't access the internet (and I would use edgeTTS if I have the internet).
barkTTS (local, slow)
- Install the pip package with this command pip install git+https://github.com/suno-ai/bark.git and turn it on in conf.yaml.
- The required models will be downloaded on the first launch.
cosyvoiceTTS (local, slow)
- Configure CosyVoice and launch the WebUI demo according to their documentation.
- Edit conf.yaml to match your desired configurations. Check their WebUI and the API documentation on the WebUI to see the meaning of the configurations under the setting cosyvoiceTTS in the conf.yaml.
xTTSv2 (local, slow)
- xtts-api-server is recommended: it has clear API docs and is relatively easy to deploy.
edgeTTS (online, no API key required)
- Install the pip package with this command pip install edge-tts and turn it on in conf.yaml.
- It sounds pretty good. Runs pretty fast.
- Remember to connect to the internet when using edge tts.
AzureTTS (online, API key required)
- See below
Create a file named api_keys.py in the project directory, paste the following text into the file, and fill in the API keys and region you gathered from your Azure account.
# Azure API key
AZURE_API_Key="YOUR-API-KEY-GOES-HERE"
# Azure region
AZURE_REGION="YOUR-REGION"
# Choose the Text to speech model you want to use
AZURE_VOICE="en-US-AshleyNeural"
If you're using macOS, you need to enable the microphone permission of your terminal emulator (you run this program inside your terminal, right? Enable the microphone permission for your terminal). If you fail to do so, the speech recognition will not be able to hear you because it does not have permission to use your microphone.
DeepLX translation was implemented to let the program speak in a language different from the conversation language. For example, the LLM might be thinking in English, the subtitle is in English, and you are speaking English, but the voice of the LLM is in Japanese. This is achieved by translating each sentence before it is sent for audio generation.
DeepLX is the only supported translation backend for now. Other providers will be implemented soon.
- Set TRANSLATE_AUDIO in conf.yaml to True
- Set DEEPLX_TARGET_LANG to your desired language. Make sure this language matches the language of the TTS speaker (for example, if the DEEPLX_TARGET_LANG is "JA", which is Japanese, the TTS should also be speaking Japanese).
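The translate-before-synthesis flow described above might look roughly like the sketch below. It is not the project's code: prepare_sentence and fake_translate are hypothetical names, and the translator is passed in as a plain function so the example does not depend on any DeepLX client API.

```python
from typing import Callable, Tuple


def prepare_sentence(
    sentence: str,
    translate_audio: bool,
    target_lang: str,
    translate: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Return (subtitle_text, speech_text) for one sentence."""
    if not translate_audio:
        # Translation off: the subtitle and the spoken text are identical
        return sentence, sentence
    # Translation on: keep the subtitle, translate only what is spoken
    return sentence, translate(sentence, target_lang)


def fake_translate(text: str, target_lang: str) -> str:
    """Stand-in translator used only for this example."""
    return f"[{target_lang}] {text}"


subtitle, speech = prepare_sentence("Hello there.", True, "JA", fake_translate)
```

Only the text sent to the TTS engine changes; the subtitle stays in the conversation language, which is why the TTS voice must match DEEPLX_TARGET_LANG.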
MemGPT integration is very experimental and requires quite a lot of setup. In addition, MemGPT requires a powerful LLM (larger than 7b and quantization above Q5) with a lot of token footprint, which means it's a lot slower. MemGPT does have its own LLM endpoint for free, though. You can test things with it. Check their docs.
This project can use MemGPT as its LLM backend. MemGPT enables LLM with long-term memory.
To use MemGPT, you need to have the MemGPT server configured and running. You can install it using pip or docker or run it on a different machine. Check their GitHub repo and official documentation.
⚠️ I recommend you install MemGPT either in a separate Python virtual environment or in docker because there is currently a dependency conflict between this project and MemGPT (on FastAPI, it seems). You can check this issue: Can you please upgrade typer version in your dependancies #1382.
Here is a checklist:
- Install memgpt
- Configure memgpt
- Run memgpt using the memgpt server command. Remember to have the server running before launching Open-LLM-VTuber.
- Set up an agent either through its cli or web UI. Add your system prompt with the Live2D Expression Prompt and the expression keywords you want to use (find them in model_dict.json) into MemGPT
- Copy the server admin password and the Agent id into ./llm/memgpt_config.yaml. By the way, agent id is not the agent's name.
- Set the LLM_PROVIDER to memgpt in conf.yaml.
- Remember, if you use memgpt, all LLM-related configurations in conf.yaml will be ignored because memgpt doesn't work that way.
PortAudio Missing
- Install libportaudio2 to your computer via your package manager like apt
You can either build the image yourself or pull it from the docker hub.
- (but the image size is crazy large)
- The image on the docker hub might not be updated as regularly as it could be. GitHub action can't build an image as big as this. I might look into other options.
Current issues:
- Large image size (~20GB), and will require more space because some models are optional and will be downloaded only when used.
- Nvidia GPU required (GPU passthrough limitation)
- Nvidia Container Toolkit needs to be configured for GPU passthrough.
- Some models will have to be downloaded again if you stop the container. (will be fixed)
- Don't build the image on an Arm machine. One of the dependencies (grpc, to be exact) will fail for some reason https://github.com/grpc/grpc/issues/34998.
- And as mentioned before, you can't run it on a remote server unless the web page has https. That's because the web mic on the front end will only launch in a secure context (which means localhost or https environment only).
Most of the ASR and TTS backends will be pre-installed. However, bark TTS and the original OpenAI Whisper (Whisper, not WhisperCPP) are NOT included in the default build process because they are huge (~8GB, which makes the whole container about 25GB). In addition, they don't deliver the best performance either. To include bark and/or whisper in the image, add the argument --build-arg INSTALL_ORIGINAL_WHISPER=true --build-arg INSTALL_BARK=true to the image build command.
Setup guide:
- Review conf.yaml before building (currently burned into the image, I'm sorry):
- Build the image:
docker build -t open-llm-vtuber .
(Grab a drink, this will take a while)
- Grab a conf.yaml configuration file. Grab a conf.yaml file from this repo. Or you can get it directly from this link.
- Run the container:
$(pwd)/conf.yaml should be the path of your conf.yaml file.
docker run -it --net=host --rm -v $(pwd)/conf.yaml:/app/conf.yaml -p 12393:12393 open-llm-vtuber
- Open localhost:12393 to test
(this project is in the active prototyping stage, so many things will change)
Some abbreviations used in this project:
- LLM: Large Language Model
- TTS: Text-to-speech, Speech Synthesis, Voice Synthesis
- ASR: Automatic Speech Recognition, Speech recognition, Speech to text, STT
- VAD: Voice Activation Detection
You can assume that the sample rate is 16000 throughout this project.
The frontend streams chunks of Float32Array with a sample rate of 16000 to the backend.
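As a sketch of what handling such a chunk could look like on the Python side, the snippet below decodes raw 32-bit float samples and computes the chunk's duration at 16000 Hz. It assumes the chunk arrives as raw bytes in the machine's native byte order, which the source does not specify; decode_chunk and duration_seconds are hypothetical helper names.

```python
from array import array

SAMPLE_RATE = 16000  # samples per second, as assumed throughout the project


def decode_chunk(raw: bytes) -> array:
    """Decode raw 32-bit float samples (native byte order assumed)."""
    samples = array("f")
    samples.frombytes(raw)  # raises ValueError if len(raw) is not a multiple of 4
    return samples


def duration_seconds(samples: array) -> float:
    """Length of the chunk in seconds at the fixed sample rate."""
    return len(samples) / SAMPLE_RATE


# Half a second of silence, encoded the way a Float32Array would be laid out
chunk = array("f", [0.0] * 8000).tobytes()
decoded = decode_chunk(chunk)
```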
- Implement TTSInterface defined in tts/tts_interface.py.
- Add your new TTS provider into tts_factory: the factory to instantiate and return the tts instance.
- Add configuration to conf.yaml. The dict with the same name will be passed into the constructor of your TTSEngine as kwargs.
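The three steps above follow an interface-plus-factory pattern, which can be sketched as follows. Everything here is a simplified stand-in: the real TTSInterface lives in tts/tts_interface.py and its method names may differ, and EchoTTS, generate_audio, and the config keys under EchoTTS are hypothetical.

```python
from abc import ABC, abstractmethod


class TTSInterface(ABC):
    """Stand-in for the project's interface; the method name is an assumption."""

    @abstractmethod
    def generate_audio(self, text: str) -> str:
        ...


class EchoTTS(TTSInterface):
    """Toy provider that returns a label instead of real audio."""

    def __init__(self, voice: str = "default", speed: float = 1.0):
        self.voice = voice
        self.speed = speed

    def generate_audio(self, text: str) -> str:
        return f"{self.voice}@{self.speed}: {text}"


def tts_factory(name: str, **kwargs) -> TTSInterface:
    """Instantiate the provider registered under the given name."""
    engines = {"EchoTTS": EchoTTS}
    if name not in engines:
        raise ValueError(f"Unknown TTS engine: {name}")
    return engines[name](**kwargs)


# Mirrors the conf.yaml idea: the dict named after the engine becomes its kwargs
config = {"TTS_MODEL": "EchoTTS", "EchoTTS": {"voice": "en-test", "speed": 1.2}}
engine = tts_factory(config["TTS_MODEL"], **config[config["TTS_MODEL"]])
```

The same shape applies to the ASR and LLM providers: implement the interface, register the class in the factory, and name the configuration dict after the provider.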
- Implement ASRInterface defined in asr/asr_interface.py.
- Add your new ASR provider into asr_factory: the factory to instantiate and return the ASR instance.
- Add configuration to conf.yaml. The dict with the same name will be passed into the constructor of your class as kwargs.
- Implement LLMInterface defined in llm/llm_interface.py.
- Add your new LLM provider into llm_factory: the factory to instantiate and return the LLM instance.
- Add configuration to conf.yaml. The dict with the same name will be passed into the constructor of your class as kwargs.
Awesome projects I learned from
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for Open-LLM-VTuber
Similar Open Source Tools
Open-LLM-VTuber
Open-LLM-VTuber is a project in early stages of development that allows users to interact with Large Language Models (LLM) using voice commands and receive responses through a Live2D talking face. The project aims to provide a minimum viable prototype for offline use on macOS, Linux, and Windows, with features like long-term memory using MemGPT, customizable LLM backends, speech recognition, and text-to-speech providers. Users can configure the project to chat with LLMs, choose different backend services, and utilize Live2D models for visual representation. The project supports perpetual chat, offline operation, and GPU acceleration on macOS, addressing limitations of existing solutions on macOS.
StableSwarmUI
StableSwarmUI is a modular Stable Diffusion web user interface that emphasizes making power tools easily accessible, high performance, and extensible. It is designed to be a one-stop-shop for all things Stable Diffusion, providing a wide range of features and capabilities to enhance the user experience.
gpt-subtrans
GPT-Subtrans is an open-source subtitle translator that utilizes large language models (LLMs) as translation services. It supports translation between any language pairs that the language model supports. Note that GPT-Subtrans requires an active internet connection, as subtitles are sent to the provider's servers for translation, and their privacy policy applies.
SwarmUI
SwarmUI is a modular stable diffusion web-user-interface designed to make powertools easily accessible, high performance, and extensible. It is in Beta status, offering a primary Generate tab for beginners and a Comfy Workflow tab for advanced users. The tool aims to become a full-featured one-stop-shop for all things Stable Diffusion, with plans for better mobile browser support, detailed 'Current Model' display, dynamic tab shifting, LLM-assisted prompting, and convenient direct distribution as an Electron app.
GlaDOS
This project aims to create a real-life version of GLaDOS, an aware, interactive, and embodied AI entity. It involves training a voice generator, developing a 'Personality Core,' implementing a memory system, providing vision capabilities, creating 3D-printable parts, and designing an animatronics system. The software architecture focuses on low-latency voice interactions, utilizing a circular buffer for data recording, text streaming for quick transcription, and a text-to-speech system. The project also emphasizes minimal dependencies for running on constrained hardware. The hardware system includes servo- and stepper-motors, 3D-printable parts for GLaDOS's body, animations for expression, and a vision system for tracking and interaction. Installation instructions cover setting up the TTS engine, required Python packages, compiling llama.cpp, installing an inference backend, and voice recognition setup. GLaDOS can be run using 'python glados.py' and tested using 'demo.ipynb'.
claude.vim
Claude.vim is a Vim plugin that integrates Claude, an AI pair programmer, into your Vim workflow. It allows you to chat with Claude about what to build or how to debug problems, and Claude offers opinions, proposes modifications, or even writes code. The plugin provides a chat/instruction-centric interface optimized for human collaboration, with killer features like access to chat history and vimdiff interface. It can refactor code, modify or extend selected pieces of code, execute complex tasks by reading documentation, cloning git repositories, and more. Note that it is early alpha software and expected to rapidly evolve.
RouteLLM
RouteLLM is a framework for serving and evaluating LLM routers. It allows users to launch an OpenAI-compatible API that routes requests to the best model based on cost thresholds. Trained routers are provided to reduce costs while maintaining performance. Users can easily extend the framework, compare router performance, and calibrate cost thresholds. RouteLLM supports multiple routing strategies and benchmarks, offering a lightweight server and evaluation framework. It enables users to evaluate routers on benchmarks, calibrate thresholds, and modify model pairs. Contributions for adding new routers and benchmarks are welcome.
redbox-copilot
Redbox Copilot is a retrieval augmented generation (RAG) app that uses GenAI to chat with and summarise civil service documents. It increases organisational memory by indexing documents and can summarise reports read months ago, supplement them with current work, and produce a first draft that lets civil servants focus on what they do best. The project uses a microservice architecture with each microservice running in its own container defined by a Dockerfile. Dependencies are managed using Python Poetry. Contributions are welcome, and the project is licensed under the MIT License.
REINVENT4
REINVENT is a molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design, molecule optimization, and other small molecule design tasks. It uses a Reinforcement Learning (RL) algorithm to generate optimized molecules compliant with a user-defined property profile defined as a multi-component score. Transfer Learning (TL) can be used to create or pre-train a model that generates molecules closer to a set of input molecules.
RAGMeUp
RAG Me Up is a generic framework that enables users to perform Retrieve and Generate (RAG) on their own dataset easily. It consists of a small server and UIs for communication. Best run on GPU with 16GB vRAM. Users can combine RAG with fine-tuning using LLaMa2Lang repository. The tool allows configuration for LLM, data, LLM parameters, prompt, and document splitting. Funding is sought to democratize AI and advance its applications.
llamafile
llamafile is a tool that enables users to distribute and run Large Language Models (LLMs) with a single file. It combines llama.cpp with Cosmopolitan Libc to create a framework that simplifies the complexity of LLMs into a single-file executable called a 'llamafile'. Users can run these executable files locally on most computers without the need for installation, making open LLMs more accessible to developers and end users. llamafile also provides example llamafiles for various LLM models, allowing users to try out different LLMs locally. The tool supports multiple CPU microarchitectures, CPU architectures, and operating systems, making it versatile and easy to use.
ezkl
EZKL is a library and command-line tool for doing inference for deep learning models and other computational graphs in a zk-snark (ZKML). It enables the following workflow: 1. Define a computational graph, for instance a neural network (but really any arbitrary set of operations), as you would normally in pytorch or tensorflow. 2. Export the final graph of operations as an .onnx file and some sample inputs to a .json file. 3. Point ezkl to the .onnx and .json files to generate a ZK-SNARK circuit with which you can prove statements such as: > "I ran this publicly available neural network on some private data and it produced this output" > "I ran my private neural network on some public data and it produced this output" > "I correctly ran this publicly available neural network on some public data and it produced this output" In the backend we use the collaboratively-developed Halo2 as a proof system. The generated proofs can then be verified with much less computational resources, including on-chain (with the Ethereum Virtual Machine), in a browser, or on a device.
ollama-autocoder
Ollama Autocoder is a simple to use autocompletion engine that integrates with Ollama AI. It provides options for streaming functionality and requires specific settings for optimal performance. Users can easily generate text completions by pressing a key or using a command pallete. The tool is designed to work with Ollama API and a specified model, offering real-time generation of text suggestions.
llm.c
LLM training in simple, pure C/CUDA. There is no need for 245MB of PyTorch or 107MB of cPython. For example, training GPT-2 (CPU, fp32) is ~1,000 lines of clean code in a single file. It compiles and runs instantly, and exactly matches the PyTorch reference implementation. I chose GPT-2 as the first working example because it is the grand-daddy of LLMs, the first time the modern stack was put together.
ultravox
Ultravox is a fast multimodal Language Model (LLM) that can understand both text and human speech in real-time without the need for a separate Audio Speech Recognition (ASR) stage. By extending Meta's Llama 3 model with a multimodal projector, Ultravox converts audio directly into a high-dimensional space used by Llama 3, enabling quick responses and potential understanding of paralinguistic cues like timing and emotion in human speech. The current version (v0.3) has impressive speed metrics and aims for further enhancements. Ultravox currently converts audio to streaming text and plans to emit speech tokens for direct audio conversion. The tool is open for collaboration to enhance this functionality.
llamabot
LlamaBot is a Pythonic bot interface to Large Language Models (LLMs), providing an easy way to experiment with LLMs in Jupyter notebooks and build Python apps utilizing LLMs. It supports all models available in LiteLLM. Users can access LLMs either through local models with Ollama or by using API providers like OpenAI and Mistral. LlamaBot offers different bot interfaces like SimpleBot, ChatBot, QueryBot, and ImageBot for various tasks such as rephrasing text, maintaining chat history, querying documents, and generating images. The tool also includes CLI demos showcasing its capabilities and supports contributions for new features and bug reports from the community.
For similar tasks
glide
Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.
agents-flex
Agents-Flex is a LLM Application Framework like LangChain base on Java. It provides a set of tools and components for building LLM applications, including LLM Visit, Prompt and Prompt Template Loader, Function Calling Definer, Invoker and Running, Memory, Embedding, Vector Storage, Resource Loaders, Document, Splitter, Loader, Parser, LLMs Chain, and Agents Chain.
secret-llama
Entirely-in-browser, fully private LLM chatbot supporting Llama 3, Mistral and other open source models. Fully private = No conversation data ever leaves your computer. Runs in the browser = No server needed and no install needed! Works offline. Easy-to-use interface on par with ChatGPT, but for open source LLMs. System requirements include a modern browser with WebGPU support. Supported models include TinyLlama-1.1B-Chat-v0.4-q4f32_1-1k, Llama-3-8B-Instruct-q4f16_1, Phi1.5-q4f16_1-1k, and Mistral-7B-Instruct-v0.2-q4f16_1. Looking for contributors to improve the interface, support more models, speed up initial model loading time, and fix bugs.
shellgpt
ShellGPT is a tool that allows users to chat with a large language model (LLM) in the terminal. It can be used for various purposes such as generating shell commands, telling stories, and interacting with the Linux terminal. The tool provides different modes of usage, including direct mode for asking questions, REPL mode for chatting with the LLM, and TUI mode tailored for inferring shell commands. Users can customize the tool by setting up different language model backends, such as Ollama or OpenAI-compatible API endpoints. Additionally, ShellGPT comes with built-in system contents for general questions, correcting typos, generating URL slugs, programming questions, shell command inference, and git commit message generation. Users can define their own contents or share customized contents in the discussion section.
Open-LLM-VTuber
Open-LLM-VTuber is a project in early stages of development that allows users to interact with Large Language Models (LLM) using voice commands and receive responses through a Live2D talking face. The project aims to provide a minimum viable prototype for offline use on macOS, Linux, and Windows, with features like long-term memory using MemGPT, customizable LLM backends, speech recognition, and text-to-speech providers. Users can configure the project to chat with LLMs, choose different backend services, and utilize Live2D models for visual representation. The project supports perpetual chat, offline operation, and GPU acceleration on macOS, addressing limitations of existing solutions on macOS.
demo-chatbot
The demo-chatbot repository contains a simple app for chatting with an LLM and shows how to build LLM inference web apps in Python. The app uses OpenAI's GPT-4 API to generate responses to user messages and can be switched to other APIs or models. A tutorial for creating the app is available in the Taipy documentation. To run the app, users need an OpenAI account with an active API key: they clone the repository, install the dependencies, set the API key in a .env file, and run the main.py file.
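The .env step can be pictured with a small standard-library sketch that reads `KEY=VALUE` lines from a file. This is not the repository's own code: `load_env_file` is a hypothetical helper, and the variable name `OPENAI_API_KEY` is an assumption about what the app expects.

```python
from pathlib import Path


def load_env_file(path: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

A caller might copy the result into `os.environ` before starting the app, for example `os.environ.update(load_env_file(".env"))`.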
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud-native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Here are some use cases for BricksLLM:
* Set LLM usage limits for users on different pricing tiers
* Track LLM usage on a per-user and per-organization basis
* Block or redact requests containing PII
* Improve LLM reliability with failovers, retries and caching
* Distribute API keys with rate limits and cost limits for internal development/production use cases
* Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.