

Local SRT/LLM/TTS Voicechat



Voicechat2 is a fast, fully local AI voice chat tool that communicates over WebSockets. It includes a WebSocket server for remote access, a default web UI with VAD and Opus support, and modular, swappable SRT, LLM, and TTS servers, so you can run different models for each stage of voice-to-voice communication. The project aims to keep voice-to-voice latency low while leaving the server configuration flexible.



A fast, fully local AI Voicechat using WebSockets

  • WebSocket server, allows for simple remote access
  • Default web UI w/ VAD, Opus support
  • Modular/swappable SRT, LLM, TTS servers

voicechat2 demo video (unmute to hear the audio)

On a 7900-class AMD RDNA3 card, voice-to-voice latency is in the 1 second range.

On a 4090, using Faster Whisper with faster-distil-whisper-large-v2, we can cut the latency down to as low as 300ms:

voicechat2 demo

You can of course run any model or swap out any of the SRT, LLM, or TTS components as you like. For example, you can run whisper.cpp for SRT, or use the StyleTTS2 server in the test folder as an alternative TTS. For a bit more about this project, see my Hackster.io writeup.


These installation instructions are for Ubuntu LTS and assume you've already set up ROCm or CUDA.

I recommend you use conda or (my preferred) mamba for environment management. It will make your life easier.

System Prereqs

sudo apt update

# Not strictly required, but these are the helpers we use
sudo apt install byobu curl wget

# Audio processing
sudo apt install espeak-ng ffmpeg libopus0 libopus-dev 
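
Before moving on, it can help to confirm that the command-line tools from the packages above actually landed on your PATH. The following is a minimal sketch, not part of the repo; the `missing_tools` helper is a hypothetical name, and it only checks commands (libopus is a library, so there is nothing to look up for it).

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command in the argument
# list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands provided by the packages installed above.
missing=$(missing_tools byobu curl wget espeak-ng ffmpeg)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All command-line prerequisites found."
fi
```

If anything is reported missing, re-run the matching `apt install` line before continuing.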

Checkout code

# Create env
mamba create -y -n voicechat2 python=3.11

# Setup
mamba activate voicechat2
git clone https://github.com/lhl/voicechat2
cd voicechat2
pip install -r requirements.txt


# Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# AMD version
make GGML_HIPBLAS=1 -j 
# Nvidia version
make GGML_CUDA=1 -j 

# Grab your preferred GGUF model
wget https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf

# Go back up to the voicechat2 directory before the next step
cd ..
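
If you script the build, the AMD/Nvidia choice above can be picked automatically. This is only a sketch under one assumption: that having `nvidia-smi` on your PATH means a CUDA setup and having `rocminfo` means a ROCm setup. The `detect_backend_flag` function is a hypothetical helper, not something from the repo, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the make flag matching the GPU tooling on PATH.
# Assumption: nvidia-smi implies CUDA, rocminfo implies ROCm.
detect_backend_flag() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "GGML_CUDA=1"
    elif command -v rocminfo >/dev/null 2>&1; then
        echo "GGML_HIPBLAS=1"
    else
        # No GPU tooling found; print an empty flag.
        echo ""
    fi
}

flag=$(detect_backend_flag)
# Dry run: show the build command instead of executing it.
echo "Would run: make $flag -j"
```

An empty flag means neither tool was found, in which case it is worth double-checking your ROCm or CUDA install before building.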

Some extra convenience scripts for launching:

  • run-voicechat2.sh - on your GPU machine, tries to launch all servers in separate byobu sessions; update the MODEL variables
  • remote-tunnel.sh - connect your GPU machine to a jump machine
  • local-tunnel.sh - connect to the GPU machine via a jump machine
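
To give a feel for what the two tunnel scripts are for, here is a rough sketch of a jump-machine setup using plain SSH port forwarding. It is not the contents of the actual scripts: the jump host, the port (8000 is assumed here), and the function names are all hypothetical, and the functions only print the commands so you can inspect them first.

```shell
#!/bin/sh
# Hypothetical values; replace with your own jump host and web UI port.
JUMP_HOST="user@jump.example.com"
PORT=8000

# On the GPU machine: reverse-forward so the jump machine's PORT
# reaches this machine's PORT.
remote_tunnel_cmd() {
    echo "ssh -N -R ${PORT}:localhost:${PORT} ${JUMP_HOST}"
}

# On your client machine: forward local PORT to the jump machine's PORT.
local_tunnel_cmd() {
    echo "ssh -N -L ${PORT}:localhost:${PORT} ${JUMP_HOST}"
}

remote_tunnel_cmd
local_tunnel_cmd
```

With both tunnels up, opening localhost on the chosen port from the client machine would reach the server running on the GPU machine.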

Other AI Voicechat Projects


The demo shows a fair amount of latency (~10s), but this project isn't the closest to what voicechat2 is doing (it uses WebRTC, not WebSockets). (HF Transformers, Ollama)


A console-based local client (HF Transformers, Ollama, Coqui TTS, PortAudio)


This is a very responsive console-based local-client app that also has VAD and interruption support, plus a really clever hook! (whisper.cpp, llama.cpp, piper, espeak)


Another console-based local client, more of a proof of concept but with a blog writeup.

BUD-E - natural_voice_assistant

Another console-based local client (FastConformer, HF Transformers, StyleTTS2, espeak)


KoljaB has a number of interesting projects around console-based local clients like RealtimeSTT, RealtimeTTS, Linguflex, etc. (faster_whisper, llama.cpp, Coqui XTTS)


This is not a local voicechat client, but it does have a neat WebRTC front-end, so it might be worth poking around in (Vite/React, Tailwind, Radix)
