Best AI tools for Voice Assistant Developers
1 - AI Tool Sites

SoundHound
SoundHound is a leading innovator of conversational intelligence and voice AI technologies. Its independent voice AI platform is built for more natural conversation, enabling businesses to create customized, scalable voice AI solutions for their specific industries and use cases. With SoundHound, you can build voice assistants, enhance smart devices, improve customer experiences, and drive business value.
9 - Open Source Tools

SirChatalot
A Telegram bot that proves you don't need a body to have a personality. It can use various text and image generation APIs to generate responses to user messages. For text generation, the bot can use:
* OpenAI's ChatGPT API (or another compatible API). Vision is available with GPT-4 models, and function calling through OpenAI's function calling.
* Anthropic's Claude API. Vision is available with Claude 3 models, and function calling through tool use.
* YandexGPT API.
For image generation, the bot can use:
* OpenAI's DALL-E
* Stability AI
* Yandex ART
The bot can also respond to voice messages: it converts the voice message to text and then generates a response. Speech recognition uses OpenAI's Whisper model, and this feature requires the ffmpeg library. The bot also supports working with files (see the project's Files section for details). If function calling is enabled, the bot can generate images and perform limited web searches.
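
The voice-message flow described above (convert the audio, then transcribe it with Whisper) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SirChatalot's code: the function names are hypothetical, converting to mono 16 kHz WAV is an assumed intermediate step, and the transcription step assumes the official `openai` Python package (v1 client) is installed and `OPENAI_API_KEY` is set.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_to_wav_cmd(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that converts an audio file to mono WAV (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", src,                # input, e.g. a Telegram voice note
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the requested rate
        dst,
    ]


def convert_voice_note(src: str, dst: str) -> Path:
    """Run ffmpeg; raises if ffmpeg is missing or the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True, capture_output=True)
    return Path(dst)


def transcribe(path: Path) -> str:
    """Send audio to OpenAI's Whisper endpoint and return the recognized text."""
    from openai import OpenAI  # imported lazily so the helpers above work without it

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with path.open("rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text
```

Calling `transcribe(convert_voice_note("voice.oga", "voice.wav"))` would yield the text that a bot like this passes on to its language model.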

chat-xiuliu
Chat-xiuliu is a bidirectional voice assistant powered by ChatGPT that can access the internet, execute code, read and write files, and recognize images with GPT-4V. The project is a fork of a virtual cat-girl chatbot named Xiuliu, with the live-chat interaction removed and voice input added. It can take questions from the microphone or the interface and answer them aloud, accept uploaded images and PDFs, handle tasks through function calls, remember conversation content, search the web, generate images with DALL·E 3, read and write local files, execute JavaScript code in a sandbox, open local files or web pages, customize the cat girl's speaking style, and save conversation screenshots. It also supports Azure OpenAI and other OpenAI-compatible API endpoints, proxy settings, and models such as GPT-4, GPT-3.5, and DALL·E 3.

june
june-va is a local voice chatbot that combines Ollama for language model capabilities, Hugging Face Transformers for speech recognition, and the Coqui TTS Toolkit for text-to-speech synthesis. It provides a flexible, privacy-focused solution for voice-assisted interactions on your local machine, ensuring that no data is sent to external servers. The tool supports various interaction modes, including text input/text output, voice input/text output, text input/audio output, and voice input/audio output. Users can customize its behavior with a JSON configuration file that holds settings for the language model, speech-to-text model, and text-to-speech model, and can use voice conversion for voice cloning.
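
The four interaction modes listed above are the combinations of two input types and two output types. The sketch below, with hypothetical class names that are not june's API, shows which pipeline stages each mode would need: speech-to-text only for voice input, text-to-speech only for audio output, and the language model always.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Input(Enum):
    TEXT = "text"
    VOICE = "voice"


class Output(Enum):
    TEXT = "text"
    AUDIO = "audio"


@dataclass(frozen=True)
class Mode:
    inp: Input
    out: Output

    def stages(self) -> List[str]:
        """Return the pipeline stages this mode needs, in execution order."""
        needed: List[str] = []
        if self.inp is Input.VOICE:
            needed.append("stt")  # speech recognition (Transformers in june)
        needed.append("llm")      # response generation (Ollama in june)
        if self.out is Output.AUDIO:
            needed.append("tts")  # speech synthesis (Coqui TTS in june)
        return needed


# Every input/output pairing: the four modes described above.
ALL_MODES = [Mode(i, o) for i in Input for o in Output]
```

Modelling modes as a pair rather than four hard-coded cases keeps the stage selection in one place, so a text-only session never loads the speech models.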

OpenVoiceChat
OpenVoiceChat is an open-source tool for having natural voice conversations with an LLM. It supports a range of speech-to-text (STT), text-to-speech (TTS), and large language model (LLM) backends. The tool aims to provide an alternative to closed commercial implementations, with well-abstracted APIs that are easy to use and extend. Users can install the base package and functionality-specific packages with pip, and the tool supports interruptions during conversations. The project encourages contributions through bounties and publishes a detailed roadmap.
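
Supporting interruptions usually means the assistant stops speaking as soon as the user starts talking ("barge-in"). Below is a minimal, hypothetical sketch of that idea, not OpenVoiceChat's implementation: playback checks a `threading.Event` between audio chunks, and a separate listening thread (assumed, not shown) calls `interrupt()` when it detects speech.

```python
import threading
from typing import Iterable, List


class InterruptiblePlayer:
    """Plays TTS audio chunk by chunk and stops once an interrupt is signalled."""

    def __init__(self) -> None:
        self._interrupt = threading.Event()
        self.played: List[bytes] = []

    def interrupt(self) -> None:
        """Signal barge-in; meant to be called from the (hypothetical) listening thread."""
        self._interrupt.set()

    def play(self, chunks: Iterable[bytes]) -> bool:
        """Play chunks in order; return True if all played, False if cut off."""
        self._interrupt.clear()  # start each utterance un-interrupted
        for chunk in chunks:
            if self._interrupt.is_set():
                return False
            self.played.append(chunk)  # stand-in for writing to the audio device
        return True
```

Because the check happens between chunks, the reaction delay is bounded by the chunk length, so shorter chunks make interruptions feel more responsive.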

kobold_assistant
Kobold-Assistant is a voice assistant interface to KoboldAI's large language model API that can run fully offline; it can also work online with the KoboldAI Horde and with online speech-to-text and text-to-speech models. The assistant, called Jenny by default, uses the latest Coqui 'jenny' text-to-speech model and OpenAI's Whisper speech recognition. Users can customize the assistant's name, speech-to-text model, text-to-speech model, and prompts through configuration. The tool requires system packages such as GCC, the PortAudio development libraries, and ffmpeg, along with Python >=3.7, <3.11, and runs on Ubuntu/Debian systems. Users interact with the assistant through commands such as 'serve' and 'list-mics'.
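
The interpreter requirement above (Python >=3.7, <3.11) maps directly onto a tuple comparison. A small, hypothetical pre-install check, not part of Kobold-Assistant itself, might look like this:

```python
import sys
from typing import Tuple


def python_supported(version: Tuple[int, int] = tuple(sys.version_info[:2])) -> bool:
    """True if (major, minor) falls in the stated range: >=3.7 and <3.11."""
    return (3, 7) <= version < (3, 11)


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "supported" if python_supported() else "NOT supported"
    print(f"Python {current} is {status} (requires >=3.7, <3.11)")
```

Tuples compare element by element, so `(3, 10)` correctly falls inside the range even though `"3.10" < "3.7"` as strings.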

bailing
Bailing is an open-source voice assistant designed for natural conversations with users. It combines Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), a Large Language Model (LLM), and Text-to-Speech (TTS) to provide a high-quality voice interaction experience similar to GPT-4o. Bailing aims to achieve GPT-4o-like conversations without requiring a GPU, making it suitable for edge devices and other low-resource environments. The project features efficient open-source models; a modular design that allows individual modules to be replaced and upgraded; memory support; tool integration for information retrieval and task execution via voice commands; and task management with progress tracking and reminders.
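
In a pipeline like this, the VAD stage decides which audio frames contain speech before anything is sent to ASR. As a rough illustration of what that stage does, here is a simple energy-threshold detector in Python. It is a stand-in chosen for clarity, not Bailing's VAD, and the `threshold` and `hangover` defaults are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(frames: Sequence[Sequence[float]], threshold: float = 0.02,
                  hangover: int = 2) -> List[bool]:
    """Mark each frame as speech (True) or silence (False).

    A frame counts as speech if its RMS exceeds `threshold`. After speech ends,
    the next `hangover` frames are still marked as speech, so short pauses do
    not split one utterance into several ASR requests.
    """
    flags: List[bool] = []
    remaining = 0
    for frame in frames:
        if rms(frame) > threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags
```

An energy threshold is cheap enough for CPU-only devices but is easily fooled by background noise, which is why model-based VADs are common in practice.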

jaison-core
J.A.I.son is a Python project designed for generating responses using various components and applications. It requires specific plugins like STT, T2T, TTSG, and TTSC to function properly. Users can customize responses, voice, and configurations. The project provides a Discord bot, Twitch events and chat integration, and VTube Studio Animation Hotkeyer. It also offers features for managing conversation history, training AI models, and monitoring conversations.

xiaozhi-esphome
This GitHub project provides a simple way to use Xiaozhi-based devices with ESPHome, allowing them to serve as voice assistants integrated with Home Assistant. Users can follow a step-by-step installation guide to connect their devices, edit configurations, and set up the voice assistant. The project supports various devices such as Spotpear Ball, Muma Box, Puck, Guition Taichi pi, Xingzhi Cube, and more. Additionally, it offers links to purchase supported devices and accessories, including 3D files for holders and wireless chargers.

TTS-WebUI
TTS WebUI is a comprehensive tool for text-to-speech synthesis, audio and music generation, and audio conversion. It offers a user-friendly interface to various AI projects for voice and audio processing, with a range of models and extensions for different tasks and integrations such as SillyTavern and OpenWebUI. It supports Docker setup, runs on Linux and Windows, and aims to facilitate creative and responsible use of AI technologies.
20 - OpenAI GPTs

DateMate
Your friendly AI assistant for voice-based dating, offering personalized tips, safety advice, and fun interactions.

🤖 SmartLink Integrator 🌎
Your AI bridge to the Internet of Things! Easily connect, control, and automate your smart devices with voice or text commands. 🏠💎

Concept Tutor
Assistant focused on teaching concepts, evaluating comprehension, and recommending subsequent topics. USE WITH VOICE.

Ren'Py Visual Novel Assistant
Friendly and casual assistant for creating Ren'Py visual novels

Him
He is an incredibly humanlike friend, deeply trained for engaging voice conversation and meaningful connection.

😴 SleepyTales
(aka ChatSleepy-T) Spinning long and boring stories to help you unwind and fall asleep. Designed for voice mode, turn it on and chill...

Dialysis Assistant
Home Hemodialysis Helper for NxStage system. Step-by-step guidance, help for tricky situations, and voice interaction recommended.

Text Playground
Best AI-powered Text Playground!! I am your go-to assistant for text-to-other-media conversions. Flawlessly convert any text to voice, image, or video!! I am here to help. Ask me anything!!

Skillful Voice
Premier expert in household management, offering unparalleled advice and guidance.

CliniType EHR
Voice-to-text and vision-to-text transcription, plus transcript-to-'clinical format' conversion, integrated with CDS. Writes clinical notes and referral letters, generates PDFs, and prepares discharge summaries. (Ultimate aid for clinicians)

BostonGPT
Chat with a Boston accent. For best results, use voice in the native ChatGPT mobile app.

Marina the Brazilian Portuguese Tutor
More than your average AI teacher! A teacher with a REAL personality 👋🏻 Hi there! ❤️ Learn Brazilian Portuguese with me ✅ I coach beginner to advanced levels 💬 Practice vocabulary, writing, reading, speaking, or learn a new topic 📲 Use voice on mobile to talk with me

Bring Your Writing Voice to Every Task
This GPT will help you recreate your writing voice across multiple tasks. All you need is a prior writing sample (email, blog, article, tweet) and a new task.