Best AI tools for< Voice Assistant Developer >
Infographic
20 - AI tool Sites
SoundHound
SoundHound is a leading innovator of conversational intelligence and voice AI technologies. Our independent voice AI platform is built for more natural conversation, enabling businesses to create customized and scalable voice AI solutions for their specific industries and use cases. With SoundHound, you can build voice assistants, enhance smart devices, improve customer experiences, and drive business value.
Amy
Amy is a workplace assistant that uses conversational technology to help users with a variety of tasks, including communication, HR, web management, and recruitment. Amy can be used to send messages, schedule meetings, manage attendance and leaves, update websites, post blogs and jobs, and find talent. Amy is designed to be easy to use and can be accessed through a variety of devices, including smartphones, tablets, and computers.
BharatGPT
BharatGPT is an AI-powered conversational AI platform designed for the Indian market. It offers generative text, voice, and video capabilities, supporting over 12 Indian languages. The platform focuses on fostering domestic AI development and ensuring data localization in India. BharatGPT is optimized for Indian users, providing features like custom knowledge base integration, omni-channel support, and dialogue management.
OpenVoiceOS
OpenVoiceOS is a community-driven, open-source voice AI platform for creating custom voice-controlled interfaces across devices with NLP, a customizable UI, and a focus on privacy and security. OpenVoiceOS is designed to provide users with a seamless and intuitive voice interface for controlling their smart home devices, playing music, setting reminders, and much more. OpenVoiceOS is open to all developers and contributors wanting to support a specific device or a platform. OpenVoiceOS is the platform to throw your ideas at if you have an experimental feature you want users to experience before landing them into any of the Linux-based open-source voice assistant projects upstream.
APOB
APOB is an AI creator tool that allows users to generate AI personas. With APOB, users can easily create unique and customized AI characters for various purposes such as storytelling, gaming, and virtual assistants. The tool provides a user-friendly interface and a wide range of customization options to bring your AI persona to life. Whether you're a writer looking for character inspiration or a game developer in need of unique NPCs, APOB is the perfect tool to unleash your creativity and enhance your projects.
Moshi AI
Moshi AI is a new voice assistant with advanced vocal capabilities that simulate human-like conversations. It can be used as a personal coach or companion, providing guidance and support in various scenarios. Moshi AI offers real-time voice interaction, efficient multimodal processing, and enhanced privacy and security features. The application is designed to enhance business operations, improve customer interactions, and streamline decision-making processes.
helpmee.ai
helpmee.ai is an AI-guided computer help platform designed to empower seniors and individuals with tech challenges through patient, voice-enabled conversations, screen sharing, and cutting-edge AI vision technology. The platform offers personalized assistance in 50+ languages, 24/7, using OpenAI's latest GPT-4o model to ensure users can navigate the digital world with confidence and independence. With subscription plans tailored to different needs, helpmee.ai aims to provide digital autonomy and minimize family tech support frustrations.
CONVA
CONVA is a platform that allows developers to add voice-first AI copilot functionality to their mobile and web apps. It provides natural, multilingual, and multimodal conversational AI experiences for app users. CONVA's voice copilot can help users with tasks such as product discovery, search, navigation, and recommendations. It is easy to integrate and can be used across multiple platforms, including iOS, Android, Flutter, React, Web, and Shopify. CONVA also offers pre-trained category-specific voice copilot, cross-platform availability, multilingual support, demand insights, and a full-stack solution for voice and text-driven search, action, navigation, and recommendations.
Open GPT 4o
Open GPT 4o is an advanced large multimodal language model developed by OpenAI, offering real-time audiovisual responses, emotion recognition, and superior visual capabilities. It can handle text, audio, and image inputs, providing a rich and interactive user experience. GPT 4o is free for all users and features faster response times, advanced interactivity, and the ability to recognize and output emotions. It is designed to be more powerful and comprehensive than its predecessor, GPT 4, making it suitable for applications requiring voice interaction and multimodal processing.
SoundHound AI
SoundHound AI is a global leader in conversational intelligence, providing voice AI solutions for businesses to offer exceptional conversational experiences to their customers. Their proprietary technology enables best-in-class speed and accuracy in multiple languages across automotive, TV, IoT, and customer service industries. SoundHound offers innovative AI-driven products like Smart Answering, Smart Ordering, and Dynamic Interaction™, a real-time customer service interface. With SoundHound Chat AI, a powerful voice assistant integrated with Generative AI, the company powers millions of products and services, handling billions of interactions annually for top-tier businesses.
Text to Speech Online
Text to Speech Online is a free AI tool that offers unlimited text-to-speech conversion with over 409 realistic voices and 129 languages & dialects. Users can convert text to speech in seconds without the need to log in or sign up. The tool supports multiple languages and accents, including standard voices and AI voices, and offers flexible pricing models. Users can enjoy a full set of SSML features, create natural-sounding speech, download audio in MP3 or WAV formats, and share results on various platforms. Text to Speech Online is a versatile tool that can be used for various purposes, including providing audio cues for visually impaired users, assisting in education, creating audio versions of books, and developing virtual assistants.
AiCogni
AiCogni is a multi-lingual voice chat bot and writing assistant, powered by ChatGPT. It is designed to be a versatile AI companion that can help users with a wide range of tasks, from learning and communication to creativity and productivity. AiCogni's advanced ChatGPT technology enables it to understand and respond to user queries in a natural and informative way, making it an ideal tool for anyone looking to enhance their communication and learning experiences.
Voqal
Voqal is an intelligent voice coding assistant designed to provide software developers with natural speech programming capabilities. It offers customizable features, context extensions, and access to various compute providers. Voqal simplifies coding processes by allowing users to navigate, edit, and confirm changes using voice commands. With a low learning curve and high skill ceiling, Voqal aims to enhance software development efficiency and productivity.
Voice Air
Voice Air is an AI-powered Text to Speech Generator that allows users to create studio-quality audio and video content with advanced AI voices on web and mobile applications. It offers cutting-edge features to enhance content creation, such as human-like voiceovers, award-winning music library, and AI features for content scaling. Voice Air is used in 70+ countries, with 100,000+ downloads and is loved by 12,000+ content creators. The application aims to revolutionize content creation by providing high-quality, natural-sounding voices and innovative features.
Dasha
Dasha is a conversational AI-as-a-service platform that allows developers to embed realistic voice and text conversational capabilities into their apps or products. With a single integration, developers can create smart conversational apps for web, desktop, mobile, IoT, and call centers. Dasha's declarative programming language, DashaScript, makes it easy to design complex real-world conversations that pass a limited Turing test. Developers can use Dasha to automate call center conversations, recreate the Google Duplex demo, or create no-code GUIs for their users. Dasha's platform is flexible and can be integrated with any platform or programming language. It also offers a free tier for builders and testers.
Iconi Ai
Iconi Ai is an all-in-one platform that provides a suite of AI-powered tools to help businesses and individuals create and manage content, generate code, and automate tasks. With Iconi Ai, users can generate text, images, code, chatbots, and more, all with just a few clicks. The platform also includes a range of features to help users track their progress, manage their team, and get support. Iconi Ai is a powerful tool that can help businesses and individuals save time, money, and effort while creating high-quality content and code.
Bibit AI
Bibit AI is a real estate marketing AI designed to enhance the efficiency and effectiveness of real estate marketing and sales. It can help create listings, descriptions, and property content, and offers a host of other features. Bibit AI is the world's first AI for Real Estate. We are transforming the real estate industry by boosting efficiency and simplifying tasks like listing creation and content generation.
Fluid
Fluid is a private AI assistant designed for Mac users, specifically those with Apple Silicon and macOS 14 or later. It offers offline capabilities and is powered by the advanced Llama 3 AI by Meta. Fluid ensures unparalleled privacy by keeping all chats and data on the user's Mac, without the need to send sensitive information to third parties. The application features voice control, one-click installation, easy access, security by design, auto-updates, history mode, web search capabilities, context awareness, and memory storage. Users can interact with Fluid by typing or using voice commands, making it a versatile and user-friendly AI tool for various tasks.
Tune Chat
Tune Chat is a chat application that utilizes open-source Large Language Models (LLMs) to provide users with a conversational and informative experience. It is designed to understand and respond to a wide range of user queries, offering assistance with various tasks and engaging in natural language conversations.
Millis AI
Millis AI is an advanced AI tool that enables users to effortlessly create next-gen voice agents with ultra-low latency, providing a seamless and natural conversational experience. It offers affordable pricing, integration with various services through webhooks, and the ability to connect phone numbers to AI voice agents for inbound/outbound calls in over 100 countries. With Millis AI, users can build and deploy voice agents in minutes, from no-code to low-code developers, and transform voice interactions across industries.
20 - Open Source Tools
kobold_assistant
Kobold-Assistant is a fully offline voice assistant interface to KoboldAI's large language model API. It can work online with the KoboldAI horde and online speech-to-text and text-to-speech models. The assistant, called Jenny by default, uses the latest coqui 'jenny' text to speech model and openAI's whisper speech recognition. Users can customize the assistant name, speech-to-text model, text-to-speech model, and prompts through configuration. The tool requires system packages like GCC, portaudio development libraries, and ffmpeg, along with Python >=3.7, <3.11, and runs on Ubuntu/Debian systems. Users can interact with the assistant through commands like 'serve' and 'list-mics'.
chat-xiuliu
Chat-xiuliu is a bidirectional voice assistant powered by ChatGPT, capable of accessing the internet, executing code, reading/writing files, and supporting GPT-4V's image recognition feature. It can also call DALL·E 3 to generate images. The project is a fork from a background of a virtual cat girl named Xiuliu, with removed live chat interaction and added voice input. It can receive questions from microphone or interface, answer them vocally, upload images and PDFs, process tasks through function calls, remember conversation content, search the web, generate images using DALL·E 3, read/write local files, execute JavaScript code in a sandbox, open local files or web pages, customize the cat girl's speaking style, save conversation screenshots, and support Azure OpenAI and other API endpoints in openai format. It also supports setting proxies and various AI models like GPT-4, GPT-3.5, and DALL·E 3.
SirChatalot
A Telegram bot that proves you don't need a body to have a personality. It can use various text and image generation APIs to generate responses to user messages. For text generation, the bot can use: * OpenAI's ChatGPT API (or other compatible API). Vision capabilities can be used with GPT-4 models. Function calling can be used with Function calling. * Anthropic's Claude API. Vision capabilities can be used with Claude 3 models. Function calling can be used with tool use. * YandexGPT API Bot can also generate images with: * OpenAI's DALL-E * Stability AI * Yandex ART This bot can also be used to generate responses to voice messages. Bot will convert the voice message to text and will then generate a response. Speech recognition can be done using the OpenAI's Whisper model. To use this feature, you need to install the ffmpeg library. This bot is also support working with files, see Files section for more details. If function calling is enabled, bot can generate images and search the web (limited).
june
june-va is a local voice chatbot that combines Ollama for language model capabilities, Hugging Face Transformers for speech recognition, and the Coqui TTS Toolkit for text-to-speech synthesis. It provides a flexible, privacy-focused solution for voice-assisted interactions on your local machine, ensuring that no data is sent to external servers. The tool supports various interaction modes including text input/output, voice input/text output, text input/audio output, and voice input/audio output. Users can customize the tool's behavior with a JSON configuration file and utilize voice conversion features for voice cloning. The application can be further customized using a configuration file with attributes for language model, speech-to-text model, and text-to-speech model configurations.
OpenVoiceChat
OpenVoiceChat is an open-source tool designed for having natural voice conversations with an LLM model. It supports various speech-to-text (STT), text-to-speech (TTS), and large language model (LLM) models. The tool aims to provide an alternative to closed commercial implementations, with well-abstracted APIs that are easy to use and extend. Users can install base and functionality-specific packages using pip, and the tool supports interruptions during conversations. The project encourages contributions through bounties and has a detailed roadmap available for reference.
Srt-AI-Voice-Assistant
Srt-AI-Voice-Assistant is a convenient tool that generates audio from uploaded .srt subtitle files by calling APIs such as Bert-VITS2 (HiyoriUI), GPT-SoVITS, and Microsoft TTS (online). The code is currently not perfect, and feedback on bugs or suggestions can be provided at https://github.com/YYuX-1145/Srt-AI-Voice-Assistant/issues. Recent updates include adding custom API functionality with a focus on security, support for Microsoft online TTS (requires key configuration), error handling improvements, automatic project path detection, compatibility with API-v1 for limited functionality, and significant feature updates supporting card synthesis.
ai-devices
AI Devices Template is a project that serves as an AI-powered voice assistant utilizing various AI models and services to provide intelligent responses to user queries. It supports voice input, transcription, text-to-speech, image processing, and function calling with conditionally rendered UI components. The project includes customizable UI settings, optional rate limiting using Upstash, and optional tracing with Langchain's LangSmith for function execution. Users can clone the repository, install dependencies, add API keys, start the development server, and deploy the application. Configuration settings can be modified in `app/config.tsx` to adjust settings and configurations for the AI-powered voice assistant.
agents-js
LiveKit Agents for Node.js is a framework designed for building realtime, programmable voice agents that can see, hear, and understand. It includes support for OpenAI Realtime API, allowing for ultra-low latency WebRTC transport between GPT-4o and users' devices. The framework provides concepts like Agents, Workers, and Plugins to create complex tasks. It offers a CLI interface for running agents and a versatile web frontend called 'playground' for building and testing agents. The framework is suitable for developers looking to create conversational voice agents with advanced capabilities.
local-talking-llm
The 'local-talking-llm' repository provides a tutorial on building a voice assistant similar to Jarvis or Friday from Iron Man movies, capable of offline operation on a computer. The tutorial covers setting up a Python environment, installing necessary libraries like rich, openai-whisper, suno-bark, langchain, sounddevice, pyaudio, and speechrecognition. It utilizes Ollama for Large Language Model (LLM) serving and includes components for speech recognition, conversational chain, and speech synthesis. The implementation involves creating a TextToSpeechService class for Bark, defining functions for audio recording, transcription, LLM response generation, and audio playback. The main application loop guides users through interactive voice-based conversations with the assistant.
ovos-buildroot
OVOS - Buildroot OS is a minimalistic Linux OS designed to bring the open source voice assistant ovos-core to embedded, low-spec headless, and small touchscreen devices. It includes a full 64-bit distribution with Linux kernel 6.1.x, Buildroot 2023.02.x, and OVOS framework utilizing ovos-docker containers. The supported hardware includes Raspberry Pi 3, 3b, 3b+, Raspberry Pi 4, x86_64 Intel-based computers, and Open Virtual Appliance. The project is inspired by Mycroft AI, Buildroot, and HassOS, offering a platform for building voice assistant solutions on various devices.
agents
The LiveKit Agent Framework is designed for building real-time, programmable participants that run on servers. Easily tap into LiveKit WebRTC sessions and process or generate audio, video, and data streams. The framework includes plugins for common workflows, such as voice activity detection and speech-to-text. Agents integrates seamlessly with LiveKit server, offloading job queuing and scheduling responsibilities to it. This eliminates the need for additional queuing infrastructure. Agent code developed on your local machine can scale to support thousands of concurrent sessions when deployed to a server in production.
AI0x0.com
AI 0x0 is a versatile AI query generation desktop floating assistant application that supports MacOS and Windows. It allows users to utilize AI capabilities in any desktop software to query and generate text, images, audio, and video data, helping them work more efficiently. The application features a dynamic desktop floating ball, floating dialogue bubbles, customizable presets, conversation bookmarking, preset packages, network acceleration, query mode, input mode, mouse navigation, deep customization of ChatGPT Next Web, support for full-format libraries, online search, voice broadcasting, voice recognition, voice assistant, application plugins, multi-model support, online text and image generation, image recognition, frosted glass interface, light and dark theme adaptation for each language model, and free access to all language models except Chat0x0 with a key.
AIOsense
AIOsense is an all-in-one sensor that is modular, affordable, and easy to solder. It is designed to be an alternative to commercially available sensors and focuses on upgradeability. AIOsense is cheaper and better than most commercial sensors and supports a variety of sensors and modules, including: - (RGB)-LED - Barometer - Breath VOC equivalent - Buzzer / Beeper - CO² equivalent - Humidity sensor - Light / Illumination sensor - PIR motion sensor - Temperature sensor - mmWave / Radar sensor Upcoming features include full voice assistant support, microphone, and speaker. All supported sensors & modules are listed in the documentation. AIOsense has a low power consumption, with an idle power consumption of 0.45W / 0.09A on a fully equipped board. Without a mmWave sensor, the idle power consumption is around 0.11W / 0.02A. To get started with AIOsense, you can refer to the documentation. If you have any questions, you can open an issue.
awesome-chatgpt
Awesome ChatGPT is an artificial intelligence chatbot developed by OpenAI. It offers a wide range of applications, web apps, browser extensions, CLI tools, bots, integrations, and packages for various platforms. Users can interact with ChatGPT through different interfaces and use it for tasks like generating text, creating presentations, summarizing content, and more. The ecosystem around ChatGPT includes tools for developers, writers, researchers, and individuals looking to leverage AI technology for different purposes.
whatsapp-chatgpt
This repository contains a WhatsApp bot that utilizes OpenAI's GPT and DALL-E 2 to respond to user inputs. Users can interact with the bot through voice messages, which are transcribed and responded to. The bot requires Node.js, npm, an OpenAI API key, and a WhatsApp account. It uses Puppeteer to run a real instance of Whatsapp Web to avoid being blocked. However, there is a risk of being blocked by WhatsApp as it does not allow bots or unofficial clients on its platform. The bot is not free to use, and users will be charged by OpenAI for each request made.
voicechat2
Voicechat2 is a fast, fully local AI voice chat tool that uses WebSockets for communication. It includes a WebSocket server for remote access, default web UI with VAD and Opus support, and modular/swappable SRT, LLM, TTS servers. Users can customize components like SRT, LLM, and TTS servers, and run different models for voice-to-voice communication. The tool aims to reduce latency in voice communication and provides flexibility in server configurations.
doc2plan
doc2plan is a browser-based application that helps users create personalized learning plans by extracting content from documents. It features a Creator for manual or AI-assisted plan construction and a Viewer for interactive plan navigation. Users can extract chapters, key topics, generate quizzes, and track progress. The application includes AI-driven content extraction, quiz generation, progress tracking, plan import/export, assistant management, customizable settings, viewer chat with text-to-speech and speech-to-text support, and integration with various Retrieval-Augmented Generation (RAG) models. It aims to simplify the creation of comprehensive learning modules tailored to individual needs.
awesome-ai
Awesome AI is a curated list of artificial intelligence resources including courses, tools, apps, and open-source projects. It covers a wide range of topics such as machine learning, deep learning, natural language processing, robotics, conversational interfaces, data science, and more. The repository serves as a comprehensive guide for individuals interested in exploring the field of artificial intelligence and its applications across various domains.
awesome-langchain
LangChain is an amazing framework to get LLM projects done in a matter of no time, and the ecosystem is growing fast. Here is an attempt to keep track of the initiatives around LangChain. Subscribe to the newsletter to stay informed about the Awesome LangChain. We send a couple of emails per month about the articles, videos, projects, and tools that grabbed our attention Contributions welcome. Add links through pull requests or create an issue to start a discussion. Please read the contribution guidelines before contributing.
20 - OpenAI Gpts
DateMate
Your friendly AI assistant for voice-based dating, offering personalized tips, safety advice, and fun interactions.
🤖 SmartLink Integrator 🌎
Your AI bridge to the Internet of Things! Easily connect, control, and automate your smart devices with voice or text commands. 🏠💎
Concept Tutor
Assistant focused on teaching concepts, evaluating comprehension, and recommending subsequent topics. USE WITH VOICE.
Ren'Py Visual Novel Assistant
Friendly and casual assistant for creating Ren'Py visual novels
Him
He is an incredibly humanlike friend, deeply trained for engaging voice conversation and meaningful connection.
😴 SleepyTales
(aka ChatSleepy-T) Spinning long and boring stories to help you unwind and fall asleep. Designed for voice mode, turn it on and chill...
Dialysis Assistant
Home Hemodialysis Helper for NxStage system. Step-by-step guidance, help for tricky situations, and voice interaction recommended.
Text Playground
Best AI-powered Text Playground!! I am your go-to assistant for text-to other media conversions. Flawelessly convert any text to voice, image, or video!! I am here to help. Ask me anything!!
Skillful Voice
Premier expert in household management, offering unparalleled advice and guidance.
CliniType EHR
Voice-to-text, Vision-to-text transcription, Transcript-to-‘Clinical format’ integrated with CDS. Writes clinical notes, referral letter, generate PDF,prepare discharge summary. (Ultimate aid for clinicians)
BostonGPT
Chat with the Boston Accent. For best results, use voice in the native ChatGPT mobile app
Marina the Brazilian Portuguese Tutor
More than your average AI Teacher! A Teacher with a REAL personality👋🏻 Hi there! ❤️ Learn with me Brazilian Portuguese ✅ I coach beginner to advanced level 💬 Practice vocabulary, writing, reading, speaking, or learn a new topic 📲 Use voice in mobile for talking
Bring Your Writing Voice to Every Task
This GPT will help you recreate your writing voice across multiple tasks. All you need is a prior writing sample (email, blog, article, tweet) and a new task.