LocalAIVoiceChat

Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.

LocalAIVoiceChat is experimental alpha software that enables real-time voice chat with a customizable AI personality and voice on your PC. It integrates the Zephyr 7B language model with speech-to-text and text-to-speech libraries. The tool is aimed at users interested in state-of-the-art voice solutions and provides an early version of a local real-time chatbot.

README:

Local AI Voice Chat

Talk with an AI in real time, completely local on your PC, with a customizable AI personality and voice.

Hint: If you are interested in state-of-the-art voice solutions, please also have a look at Linguflex. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.

Note: If you run into a 'General synthesis error: isin() received an invalid combination of arguments' error, this is due to a newer transformers library version introducing an incompatibility with Coqui TTS (see here). Please downgrade to an older transformers version (pip install transformers==4.38.2) or upgrade RealtimeTTS to the latest version (pip install realtimetts==0.4.1).

About the Project

Integrates the powerful Zephyr 7B language model with real-time speech-to-text and text-to-speech libraries to create a fast and engaging voice-based local chatbot.

https://github.com/KoljaB/LocalAIVoiceChat/assets/7604638/cebacdad-8a57-4a03-bfd1-a469730dda51

Tech Stack

  • llama_cpp with Zephyr 7B
    • library interface for llama-based language models
  • RealtimeSTT with faster_whisper
    • real-time speech-to-text transcription library
  • RealtimeTTS with Coqui XTTS
    • real-time text-to-speech synthesis library

Notes

This software is in an experimental alpha state and does not provide production-ready stability. The current XTTS model used for synthesis still has glitches, and Zephyr, while really good for a 7B model, of course cannot compete with the answer quality of GPT-4, Claude or Perplexity.

Please take this as a first attempt to provide an early version of a local real-time chatbot.

Updates

  • Update to Coqui XTTS 2.0 model
  • Bugfix to RealtimeTTS (the download of the Coqui model did not work properly)

Prerequisites

You will need a GPU with around 8 GB VRAM to run this in real-time.

For NVIDIA users

  • Install NVIDIA CUDA Toolkit 11.8.

  • NVIDIA cuDNN 8.7.0 for CUDA 11.x:

    • Navigate to NVIDIA cuDNN Archive.
    • Locate and download "cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
    • Follow the provided installation guide.

For AMD users

  • Install ROCm v5.7.1

  • FFmpeg:

    Install FFmpeg according to your operating system:

    • Ubuntu/Debian:

      sudo apt update && sudo apt install ffmpeg
    • Arch Linux:

      sudo pacman -S ffmpeg
    • macOS (Homebrew):

      brew install ffmpeg
    • Windows (Chocolatey):

      choco install ffmpeg
    • Windows (Scoop):

      scoop install ffmpeg

Installation Steps

  1. Clone the repository or download the source code package.

  2. Install llama.cpp

    • (for AMD users) Before the next step, set the environment variable LLAMA_HIPBLAS to "on"

    • Official way:

    pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
    • If the official installation does not work for you, please install text-generation-webui instead, which provides excellent wheels for many platforms and environments.
  3. Install realtime libraries

    • Install the main libraries:
      pip install RealtimeSTT==0.1.7
      pip install RealtimeTTS==0.2.7
  4. Download zephyr-7b-beta.Q5_K_M.gguf from here.

    • Open creation_params.json and enter the filepath to the downloaded model into model_path.
    • Adjust n_gpu_layers (0-35; raise it if you have more VRAM) and n_threads (the number of CPU threads; I recommend not using all available cores but leaving some free for TTS)
  5. If dependency conflicts occur, install specific versions of conflicting libraries:

    pip install networkx==2.8.8
    pip install typing_extensions==4.8.0
    pip install fsspec==2023.6.0
    pip install imageio==2.31.6
    pip install numpy==1.24.3
    pip install requests==2.31.0
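If you are unsure whether your environment already matches these pins, a small standard-library check can list every mismatch before you reinstall anything. This helper is our own sketch, not part of the project:

```python
import importlib.metadata as md

# Version pins taken from the list above.
PINS = {
    "networkx": "2.8.8",
    "typing_extensions": "4.8.0",
    "fsspec": "2023.6.0",
    "imageio": "2.31.6",
    "numpy": "1.24.3",
    "requests": "2.31.0",
}

def check_pins(pins=PINS):
    """Return {package: (wanted, installed)} for every mismatched or missing package."""
    mismatches = {}
    for pkg, wanted in pins.items():
        try:
            installed = md.version(pkg)
        except md.PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != wanted:
            mismatches[pkg] = (wanted, installed)
    return mismatches

if __name__ == "__main__":
    for pkg, (wanted, got) in check_pins().items():
        print(f"{pkg}: want {wanted}, have {got}")
```

Only the packages it reports need to be reinstalled with the pip commands above.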

Running the Application

 python ai_voicetalk_local.py
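The script reads creation_params.json (step 4 above) to configure the language model. As a hedged sketch of how such a file can be loaded and defaulted — the helper name and the fallback values are ours, and the project's actual loading code may differ:

```python
import json
import os

def load_creation_params(path="creation_params.json"):
    """Read model parameters from JSON, applying conservative defaults."""
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    # Leave a couple of CPU cores free for the TTS engine, as recommended above.
    params.setdefault("n_threads", max(1, (os.cpu_count() or 4) - 2))
    params.setdefault("n_gpu_layers", 0)  # raise toward 35 if you have more VRAM
    return params

# The values map directly onto llama-cpp-python's constructor, e.g.:
#   from llama_cpp import Llama
#   llm = Llama(**load_creation_params())
```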

Customize

Change AI personality

Open chat_params.json to change the talk scenario.

Change AI Voice

  • Open ai_voicetalk_local.py.
  • Find this line: coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en")
  • Change "female.wav" to the filename of a WAV file (44100 or 22050 Hz, mono, 16-bit) containing the voice to clone
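Since the cloning reference must be a 44100 or 22050 Hz mono 16-bit WAV, it can be worth verifying the file before pointing the engine at it. A small standard-library check (the helper name is ours, not part of the project):

```python
import wave

def is_valid_reference_wav(path):
    """Check that a WAV file is mono, 16-bit, at 22050 or 44100 Hz."""
    with wave.open(path, "rb") as wf:
        return (wf.getnchannels() == 1
                and wf.getsampwidth() == 2      # 2 bytes per sample = 16-bit
                and wf.getframerate() in (22050, 44100))
```

If the check fails, the clip can be re-exported, for example with ffmpeg: ffmpeg -i in.wav -ar 22050 -ac 1 -c:a pcm_s16le out.wav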

Speech end detection

If the first sentence gets transcribed before you start speaking the second one, raise post_speech_silence_duration on AudioToTextRecorder: AudioToTextRecorder(model="tiny.en", language="en", spinner=False, post_speech_silence_duration=1.5)
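The idea behind post_speech_silence_duration can be shown with a toy end-of-speech detector: a turn only ends once the silence since the last voiced frame exceeds the threshold. This is a simplified illustration of the concept, not RealtimeSTT's actual implementation:

```python
def speech_ended(voiced_timestamps, now, post_speech_silence_duration=1.5):
    """Return True once the silence after the last voiced frame exceeds the threshold."""
    if not voiced_timestamps:
        return False
    return (now - voiced_timestamps[-1]) >= post_speech_silence_duration

# With a 1.5 s threshold, a 0.8 s pause between sentences does not end the turn:
#   speech_ended([0.0, 0.5, 1.0], now=1.8)  -> False
# ...but 1.6 s of silence does:
#   speech_ended([0.0, 0.5, 1.0], now=2.6)  -> True
```

A larger value therefore tolerates longer pauses mid-utterance, at the cost of a slower response once you actually stop talking.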

Contributing

Contributions to enhance or improve the project are warmly welcomed. Feel free to open a pull request with your proposed changes or fixes.

License

The project is licensed under the Coqui Public Model License 1.0.0.

This license allows only non-commercial use of a machine learning model and its outputs.

Contact

Kolja Beigel

Feel free to reach out for any queries or support related to this project.
