Best AI tools for Voice Assistant
20 - AI tool Sites
Personal Voice and Vision Assistant
This AI-powered voice and vision assistant offers a range of features to enhance communication, productivity, and learning. Engage in natural voice conversations, get assistance with daily tasks, manage your schedule, and interact with visuals seamlessly. The assistant adapts to your needs, providing personalized support and advice. With its intuitive interface and affordable pricing, it's an ideal companion for individuals of all ages and interests.
Suki Assistant
Suki Assistant is an enterprise-grade AI assistant designed to help clinicians save time by combining ambient documentation, dictation, ICD-10 and HCC coding, and question answering in one solution. It integrates deeply with all major EHRs and emphasizes safe AI practices, hassle-free partnership, and proven ROI. Suki is trusted by health systems across the country for its reliability, scalability, and convenience in clinical documentation.
AIReception
AIReception is a conversational AI voice assistant platform that allows businesses to build virtual receptionists capable of answering customer questions 24/7. The AI voice assistants are designed to replicate human speech patterns and interactions, providing a natural and immersive experience. The platform offers features such as hyper-realistic voices, human-like interaction, perfect memory, customizable responses, and call transferring. AIReception aims to enhance customer service, reduce overhead costs, and provide detailed analytics for customer interactions.
VoiceGPT
VoiceGPT is an Android app that provides a voice-based interface to interact with AI language models like ChatGPT, Bing AI, and Bard. It offers features such as unlimited free messages, voice input and output in 67+ languages, a floating bubble for easy switching between apps, OCR text recognition, code execution, image generation with DALL-E 2, and support for ChatGPT Plus accounts. VoiceGPT is designed to be accessible for users with visual impairments, dyslexia, or other conditions, and it can be set as the default assistant to be activated hands-free with a custom hotword.
Swift
Swift is an AI-powered voice assistant that utilizes cutting-edge technologies such as Groq, Cartesia, VAD, and Vercel to provide users with a fast and efficient voice interaction experience. With Swift, users can perform various tasks using voice commands, making it a versatile tool for hands-free operation in different settings. The application aims to streamline daily tasks and enhance user productivity through seamless voice recognition capabilities.
SiriGPT
SiriGPT is a voice assistant that allows users to access the power of ChatGPT on their iPhone and Mac devices. It is the fastest way to use GPT, and it is easy to set up and use. With SiriGPT, you can ask ChatGPT questions, get help with tasks, and more. It is a powerful tool that can help you be more productive and efficient.
TakeNote
TakeNote is a cutting-edge speech-to-text AI that transforms audio and video into documents, boosting productivity and enhancing meeting experiences. Its advanced AI models approach human-level robustness and accuracy in English speech recognition. TakeNote AI empowers teams to transcribe meetings into accurate transcripts, generate precise summaries, analyze sentiment, and identify speakers, all while ensuring high levels of security and data protection.
Fireflies.ai
Fireflies.ai is an AI-powered notetaker that helps teams transcribe, summarize, search, and analyze voice conversations. It integrates with popular video conferencing apps and dialers, allowing users to automatically record and transcribe meetings. Fireflies.ai also offers advanced features such as AI-powered search, collaboration tools, and conversation intelligence, enabling teams to quickly find key information, collaborate on meeting notes, and gain insights from their conversations.
Amy
Amy is a workplace assistant that uses conversational technology to help users with a variety of tasks, including communication, HR, web management, and recruitment. Amy can be used to send messages, schedule meetings, manage attendance and leaves, update websites, post blogs and jobs, and find talent. Amy is designed to be easy to use and can be accessed through a variety of devices, including smartphones, tablets, and computers.
Ascenscia
Ascenscia is a specialized AI voice assistant designed to streamline lab digitization processes. It integrates with laboratory software and machines to enable hands-free interactions, automating data collection, optimizing workflows, and accelerating R&D cycles. Ascenscia offers features such as data accessibility, data capturing, inventory access, and additional task management. The application is designed for scientific labs, addressing concerns with precision, safety, and adaptability. It boasts high accuracy in understanding scientific terminologies, end-to-end data encryption, multi-lingual support, and customization options for different lab workflows.
SoundHound
SoundHound is a leading innovator of conversational intelligence and voice AI technologies. Its independent voice AI platform is built for more natural conversation, enabling businesses to create customized and scalable voice AI solutions for their specific industries and use cases. With SoundHound, you can build voice assistants, enhance smart devices, improve customer experiences, and drive business value.
Soundverse AI
Soundverse AI is an AI music generator and music assistant that allows users to create music instantly from text prompts, interact with a voice assistant for music-related help, chat with the assistant for music recommendations, extend existing tracks with new sections, isolate individual audio tracks from a mix, auto-complete songs using initial ideas, craft lyrics with AI assistance, and more. The platform offers a range of AI tools to help users iterate and personalize their music creation process, making it easy to transform ideas into music in seconds.
OpenVoiceOS
OpenVoiceOS is a community-driven, open-source voice AI platform for creating custom voice-controlled interfaces across devices with NLP, a customizable UI, and a focus on privacy and security. It is designed to provide users with a seamless and intuitive voice interface for controlling their smart home devices, playing music, setting reminders, and much more. The project is open to all developers and contributors who want to support a specific device or platform. It is also the place to try out an experimental feature with users before landing it upstream in any of the Linux-based open-source voice assistant projects.
PolyAI
PolyAI is an AI tool that offers a conversational platform for contact centers, enabling natural interactions with customers. It provides voice AI solutions to handle various tasks like account management, authentication, billing, booking, and troubleshooting. PolyAI aims to enhance customer experience, increase operational efficiency, and drive revenue generation through voice assistants. The platform is designed to transform call centers into revenue generators by resolving inquiries, improving customer satisfaction, and reducing operational costs.
Moshi AI
Moshi AI is a new voice assistant with advanced vocal capabilities that simulate human-like conversations. It can be used as a personal coach or companion, providing guidance and support in various scenarios. Moshi AI offers real-time voice interaction, efficient multimodal processing, and enhanced privacy and security features. The application is designed to enhance business operations, improve customer interactions, and streamline decision-making processes.
Voicebot.ai
Voicebot.ai is an AI-focused website that provides comprehensive information and insights on voice assistants, AI models, generative AI, and related technologies. The platform covers a wide range of topics such as smart speakers, voice shopping, healthcare voice assistants, and AI in marketing. It also offers reports, research, and best practices in the field of voice technology. Voicebot.ai aims to educate and inform its audience about the latest developments and trends in the AI industry.
chatQR.ai
chatQR.ai is an AI-powered ordering application that serves as a complete Point Of Sale/Kiosk replacement. It utilizes voice recognition technology combined with the latest Large Language Model (LLM) AI to create a seamless QR code ordering experience for customers. The system is designed to be AI-first, offering mature point of sale features and the ability to integrate the ChatQR Voice Assistant into existing systems. With support for multiple currencies and payment providers like Stripe and Square, chatQR.ai aims to revolutionize the way businesses manage orders and payments.
Inbox Narrator
Inbox Narrator is an AI-powered email assistant application that provides users with morning summaries of their emails in a smooth voice. It offers features such as email chat, email podcast, and human-level email summaries. Users can seamlessly integrate with voice assistants like Siri and Google Assistant to manage their inbox efficiently. The application ensures privacy and security by providing read-only access to Gmail accounts and not storing email content. With a subscription fee of $5 per month after a 30-day free trial, Inbox Narrator aims to transform users' mornings by delivering personalized email summaries.
CONVA
CONVA is a platform that allows developers to add voice-first AI copilot functionality to their mobile and web apps. It provides natural, multilingual, and multimodal conversational AI experiences for app users. CONVA's voice copilot can help users with tasks such as product discovery, search, navigation, and recommendations. It is easy to integrate and can be used across multiple platforms, including iOS, Android, Flutter, React, Web, and Shopify. CONVA also offers pre-trained category-specific voice copilot, cross-platform availability, multilingual support, demand insights, and a full-stack solution for voice and text-driven search, action, navigation, and recommendations.
MTS AI
MTS AI is a platform offering AI-based products and solutions, leveraging artificial intelligence technologies to create voice assistants, chatbots, video analysis solutions, and more. They develop AI solutions using natural language processing, computer vision, and edge computing technologies, collaborating with leading tech companies and global experts. MTS AI aims to find the most viable AI applications for the benefit of society, providing automation for customer service systems, security control, and voice and video data analysis.
20 - Open Source Tools
Srt-AI-Voice-Assistant
Srt-AI-Voice-Assistant is a convenient tool that generates audio from uploaded .srt subtitle files by calling APIs such as Bert-VITS2 (HiyoriUI), GPT-SoVITS, and Microsoft TTS (online). The code is currently not perfect, and feedback on bugs or suggestions can be provided at https://github.com/YYuX-1145/Srt-AI-Voice-Assistant/issues. Recent updates include adding custom API functionality with a focus on security, support for Microsoft online TTS (requires key configuration), error handling improvements, automatic project path detection, compatibility with API-v1 for limited functionality, and significant feature updates supporting card synthesis.
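The first step of any subtitle-to-speech workflow like the one described above is reading the .srt file into timed cues that can be handed to a TTS backend. The following is a minimal sketch of that parsing step using only the Python standard library. It is not taken from the Srt-AI-Voice-Assistant code base; the names `Cue` and `parse_srt` are hypothetical and chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """One subtitle entry (hypothetical helper type for this sketch)."""
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


# SRT timestamps look like 00:01:02,345 (some files use a dot instead of a comma).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> List[Cue]:
    """Split SRT text into cues; blocks are separated by blank lines."""
    cues: List[Cue] = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed or empty blocks
        start, end = lines[1].split("-->")
        # Multi-line subtitle text is joined into one string for synthesis.
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0].strip()), _to_seconds(start), _to_seconds(end), text))
    return cues
```

Each `Cue` carries the text to synthesize plus the time window the generated audio is expected to fit into, which is the information a dubbing tool would need when placing clips on a timeline.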
kobold_assistant
Kobold-Assistant is a fully offline voice assistant interface to KoboldAI's large language model API. It can also work online with the KoboldAI horde and online speech-to-text and text-to-speech models. The assistant, called Jenny by default, uses the latest Coqui 'jenny' text-to-speech model and OpenAI's Whisper speech recognition. Users can customize the assistant name, speech-to-text model, text-to-speech model, and prompts through configuration. The tool requires system packages like GCC, portaudio development libraries, and ffmpeg, along with Python >=3.7, <3.11, and runs on Ubuntu/Debian systems. Users can interact with the assistant through commands like 'serve' and 'list-mics'.
chat-xiuliu
Chat-xiuliu is a bidirectional voice assistant powered by ChatGPT, capable of accessing the internet, executing code, reading/writing files, and supporting GPT-4V's image recognition feature. The project was forked from the background of a virtual cat girl named Xiuliu, with live chat interaction removed and voice input added. It can receive questions from the microphone or the interface, answer them vocally, accept uploaded images and PDFs, process tasks through function calls, remember conversation content, search the web, generate images using DALL·E 3, read/write local files, execute JavaScript code in a sandbox, open local files or web pages, customize the cat girl's speaking style, and save conversation screenshots. It supports Azure OpenAI and other API endpoints in OpenAI format, proxy settings, and various AI models like GPT-4, GPT-3.5, and DALL·E 3.
pipecat
Pipecat is an open-source framework designed for building generative AI voice bots and multimodal assistants. It provides code building blocks for interacting with AI services, creating low-latency data pipelines, and transporting audio, video, and events over the Internet. Pipecat supports various AI services like speech-to-text, text-to-speech, image generation, and vision models. Users can implement new services and contribute to the framework. Pipecat aims to simplify the development of applications like personal coaches, meeting assistants, customer support bots, and more by providing a complete framework for integrating AI services.
ai-devices
AI Devices Template is a project that serves as an AI-powered voice assistant utilizing various AI models and services to provide intelligent responses to user queries. It supports voice input, transcription, text-to-speech, image processing, and function calling with conditionally rendered UI components. The project includes customizable UI settings, optional rate limiting using Upstash, and optional tracing with LangChain's LangSmith for function execution. Users can clone the repository, install dependencies, add API keys, start the development server, and deploy the application. The assistant's settings can be adjusted in `app/config.tsx`.
ovos-buildroot
OVOS - Buildroot OS is a minimalistic Linux OS designed to bring the open source voice assistant ovos-core to embedded, low-spec headless, and small touchscreen devices. It includes a full 64-bit distribution with Linux kernel 6.1.x, Buildroot 2023.02.x, and OVOS framework utilizing ovos-docker containers. The supported hardware includes Raspberry Pi 3, 3b, 3b+, Raspberry Pi 4, x86_64 Intel-based computers, and Open Virtual Appliance. The project is inspired by Mycroft AI, Buildroot, and HassOS, offering a platform for building voice assistant solutions on various devices.
local-talking-llm
The 'local-talking-llm' repository provides a tutorial on building a voice assistant similar to Jarvis or Friday from the Iron Man movies, capable of offline operation on a computer. The tutorial covers setting up a Python environment and installing necessary libraries like rich, openai-whisper, suno-bark, langchain, sounddevice, pyaudio, and speechrecognition. It utilizes Ollama for Large Language Model (LLM) serving and includes components for speech recognition, conversational chain, and speech synthesis. The implementation involves creating a TextToSpeechService class for Bark and defining functions for audio recording, transcription, LLM response generation, and audio playback. The main application loop guides users through interactive voice-based conversations with the assistant.
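The loop described above (record, transcribe, generate a reply, synthesize speech) can be sketched as a single turn function whose stages are passed in as plain callables. This is an illustrative outline, not code from the tutorial: `run_turn`, `History`, and the three callable parameters are hypothetical names, and a real setup would plug in Whisper, an Ollama-served model, and Bark in their place.

```python
from typing import Callable, List, Tuple

# Conversation memory as (role, text) pairs; a hypothetical shape for this sketch.
History = List[Tuple[str, str]]


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[History], str],
    synthesize: Callable[[str], bytes],
    history: History,
) -> bytes:
    """Run one voice turn: speech -> text -> LLM reply -> speech."""
    user_text = transcribe(audio).strip()
    if not user_text:
        return b""  # nothing intelligible was said; skip the LLM call
    history.append(("user", user_text))
    reply = generate(history)  # the model sees the whole conversation so far
    history.append(("assistant", reply))
    return synthesize(reply)


if __name__ == "__main__":
    # Stub stages stand in for the real speech and language models.
    history: History = []
    spoken = run_turn(
        b"fake-audio",
        transcribe=lambda audio: "what time is it",
        generate=lambda h: "You asked: " + h[-1][1],
        synthesize=lambda text: text.encode("utf-8"),
        history=history,
    )
    print(spoken, history)
```

Passing the stages in as callables keeps the loop testable without a microphone or GPU, and lets any one of the three models be swapped without touching the others.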
M.I.L.E.S
M.I.L.E.S. (Machine Intelligent Language Enabled System) is a voice assistant powered by GPT-4 Turbo, offering a range of capabilities beyond existing assistants. With its advanced language understanding, M.I.L.E.S. provides accurate and efficient responses to user queries. It seamlessly integrates with smart home devices, Spotify, and offers real-time weather information. Additionally, M.I.L.E.S. possesses persistent memory, a built-in calculator, and multi-tasking abilities. Its realistic voice, accurate wake word detection, and internet browsing capabilities enhance the user experience. M.I.L.E.S. prioritizes user privacy by processing data locally, encrypting sensitive information, and adhering to strict data retention policies.
outspeed
Outspeed is a PyTorch-inspired SDK for building real-time AI applications on voice and video input. It offers low-latency processing of streaming audio and video, an intuitive API familiar to PyTorch users, flexible integration of custom AI models, and tools for data preprocessing and model deployment. It is ideal for developing voice assistants, video analytics, and other real-time AI applications that process audio-visual data.
AI0x0.com
AI 0x0 is a versatile AI query generation desktop floating assistant application that supports macOS and Windows. It allows users to utilize AI capabilities in any desktop software to query and generate text, images, audio, and video data, helping them work more efficiently. The application features a dynamic desktop floating ball, floating dialogue bubbles, customizable presets, conversation bookmarking, preset packages, network acceleration, query mode, input mode, mouse navigation, deep customization of ChatGPT Next Web, support for full-format libraries, online search, voice broadcasting, voice recognition, voice assistant, application plugins, multi-model support, online text and image generation, image recognition, frosted glass interface, light and dark theme adaptation for each language model, and free access to all language models except Chat0x0 with a key.
RealtimeSTT_LLM_TTS
RealtimeSTT is an easy-to-use, low-latency speech-to-text library for realtime applications. It listens to the microphone and transcribes voice into text, making it ideal for voice assistants and applications requiring fast and precise speech-to-text conversion. The library utilizes Voice Activity Detection, Realtime Transcription, and Wake Word Activation features. It supports GPU-accelerated transcription using PyTorch with CUDA support. RealtimeSTT offers various customization options for different parameters to enhance user experience and performance. The library is designed to provide a seamless experience for developers integrating speech-to-text functionality into their applications.
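Voice Activity Detection is the step that decides which stretches of microphone input are worth transcribing. The sketch below shows the general idea with a deliberately simple energy threshold over fixed-size frames. It is a stand-in for illustration only, not the detector RealtimeSTT uses; the function names, the default `frame_size`, `threshold`, and `hangover` values are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def speech_segments(
    samples: Sequence[float],
    frame_size: int = 160,
    threshold: float = 0.1,
    hangover: int = 2,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of spans judged to contain speech.

    A segment stays open across up to `hangover` quiet frames so that short
    pauses between words do not split one utterance into several.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # start index of the segment being built, if any
    voiced_end = 0    # end index of the most recent loud frame
    quiet = 0         # consecutive quiet frames since the last loud one
    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size]
        if rms(frame) >= threshold:
            if start is None:
                start = offset
            voiced_end = offset + len(frame)
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover:
                segments.append((start, voiced_end))
                start = None
    if start is not None:
        segments.append((start, voiced_end))  # audio ended mid-utterance
    return segments
```

Production detectors typically replace the fixed energy threshold with a trained model, but the surrounding bookkeeping (framing, a hangover period, emitting start and end indices) tends to look similar.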
AIOsense
AIOsense is an all-in-one sensor that is modular, affordable, and easy to solder. It is designed to be an alternative to commercially available sensors and focuses on upgradeability. AIOsense is cheaper and better than most commercial sensors and supports a variety of sensors and modules, including:
- (RGB)-LED
- Barometer
- Breath VOC equivalent
- Buzzer / Beeper
- CO₂ equivalent
- Humidity sensor
- Light / Illumination sensor
- PIR motion sensor
- Temperature sensor
- mmWave / Radar sensor
Upcoming features include full voice assistant support, microphone, and speaker. All supported sensors & modules are listed in the documentation. AIOsense has a low power consumption, with an idle power consumption of 0.45W / 0.09A on a fully equipped board. Without a mmWave sensor, the idle power consumption is around 0.11W / 0.02A. To get started with AIOsense, you can refer to the documentation. If you have any questions, you can open an issue.
awesome-ml
Awesome ML is a curated list of resources and tools related to machine learning, covering a wide range of topics such as large language models, image models, video models, audio models, and marketing data science. It includes open LLM models, tools, GUIs, backends, voice assistants, code generation, libraries, fine tuning, data sets, research, image and video models, audio tasks like compression, speech recognition, and music generation, as well as resources for marketing data science. The repository aims to provide a comprehensive collection of resources for individuals interested in machine learning and its applications.
Conversational-Azure-OpenAI-Accelerator
The Conversational Azure OpenAI Accelerator is a tool designed to provide rapid, no-cost custom demos tailored to customer use cases, from internal HR/IT to external contact centers. It focuses on top use cases of GenAI conversation and summarization, plus live backend data integration. The tool automates conversations across voice and text channels, providing a valuable way to save money and improve customer and employee experience. By combining Azure OpenAI + Cognitive Search, users can efficiently deploy a ChatGPT experience using web pages, knowledge base articles, and data sources. The tool enables simultaneous deployment of conversational content to chatbots, IVR, voice assistants, and more in one click, eliminating the need for in-depth IT involvement. It leverages Microsoft's advanced AI technologies, resulting in a conversational experience that can converse in human-like dialogue, respond intelligently, and capture content for omni-channel unified analytics.
agents
The LiveKit Agent Framework is designed for building real-time, programmable participants that run on servers. Easily tap into LiveKit WebRTC sessions and process or generate audio, video, and data streams. The framework includes plugins for common workflows, such as voice activity detection and speech-to-text. Agents integrates seamlessly with LiveKit server, offloading job queuing and scheduling responsibilities to it. This eliminates the need for additional queuing infrastructure. Agent code developed on your local machine can scale to support thousands of concurrent sessions when deployed to a server in production.
june
june-va is a local voice chatbot that combines Ollama for language model capabilities, Hugging Face Transformers for speech recognition, and the Coqui TTS Toolkit for text-to-speech synthesis. It provides a flexible, privacy-focused solution for voice-assisted interactions on your local machine, ensuring that no data is sent to external servers. The tool supports various interaction modes including text input/output, voice input/text output, text input/audio output, and voice input/audio output. Users can customize the tool's behavior with a JSON configuration file and utilize voice conversion features for voice cloning. The application can be further customized using a configuration file with attributes for language model, speech-to-text model, and text-to-speech model configurations.
agents-js
LiveKit Agents for Node.js is a framework designed for building realtime, programmable voice agents that can see, hear, and understand. It includes support for OpenAI Realtime API, allowing for ultra-low latency WebRTC transport between GPT-4o and users' devices. The framework provides concepts like Agents, Workers, and Plugins to create complex tasks. It offers a CLI interface for running agents and a versatile web frontend called 'playground' for building and testing agents. The framework is suitable for developers looking to create conversational voice agents with advanced capabilities.
vocode-python
Vocode is an open source library that enables users to easily build voice-based LLM (Large Language Model) apps. With Vocode, users can create real-time streaming conversations with LLMs and deploy them for phone calls, Zoom meetings, and more. The library offers abstractions and integrations for transcription services, LLMs, and synthesis services, making it a comprehensive tool for voice-based applications.
vocode-core
Vocode is an open source library that enables users to build voice-based LLM (Large Language Model) applications quickly and easily. With Vocode, users can create real-time streaming conversations with LLMs and deploy them for phone calls, Zoom meetings, and more. The library offers abstractions and integrations for transcription services, LLMs, and synthesis services, making it a comprehensive tool for voice-based app development. Vocode also provides out-of-the-box integrations with various services like AssemblyAI, OpenAI, Microsoft Azure, and more, allowing users to leverage these services seamlessly in their applications.
voicechat2
Voicechat2 is a fast, fully local AI voice chat tool that uses WebSockets for communication. It includes a WebSocket server for remote access, default web UI with VAD and Opus support, and modular/swappable SRT, LLM, TTS servers. Users can customize components like SRT, LLM, and TTS servers, and run different models for voice-to-voice communication. The tool aims to reduce latency in voice communication and provides flexibility in server configurations.
20 - OpenAI Gpts
DateMate
Your friendly AI assistant for voice-based dating, offering personalized tips, safety advice, and fun interactions.
Him
He is an incredibly humanlike friend, deeply trained for engaging voice conversation and meaningful connection.
🤖 SmartLink Integrator 🌎
Your AI bridge to the Internet of Things! Easily connect, control, and automate your smart devices with voice or text commands. 🏠💎
😴 SleepyTales
(aka ChatSleepy-T) Spinning long and boring stories to help you unwind and fall asleep. Designed for voice mode, turn it on and chill...
Dialysis Assistant
Home Hemodialysis Helper for NxStage system. Step-by-step guidance, help for tricky situations, and voice interaction recommended.
Concept Tutor
Assistant focused on teaching concepts, evaluating comprehension, and recommending subsequent topics. USE WITH VOICE.
Text Playground
Best AI-powered Text Playground!! I am your go-to assistant for text-to-other-media conversions. Flawlessly convert any text to voice, image, or video!! I am here to help. Ask me anything!!
Skillful Voice
Premier expert in household management, offering unparalleled advice and guidance.
Ren'Py Visual Novel Assistant
Friendly and casual assistant for creating Ren'Py visual novels
CliniType EHR
Voice-to-text and vision-to-text transcription, with transcript-to-‘Clinical format’ conversion integrated with CDS. Writes clinical notes and referral letters, generates PDFs, and prepares discharge summaries. (Ultimate aid for clinicians)
BostonGPT
Chat with the Boston Accent. For best results, use voice in the native ChatGPT mobile app
Marina the Brazilian Portuguese Tutor
More than your average AI Teacher! A Teacher with a REAL personality👋🏻 Hi there! ❤️ Learn with me Brazilian Portuguese ✅ I coach beginner to advanced level 💬 Practice vocabulary, writing, reading, speaking, or learn a new topic 📲 Use voice in mobile for talking
Bring Your Writing Voice to Every Task
This GPT will help you recreate your writing voice across multiple tasks. All you need is a prior writing sample (email, blog, article, tweet) and a new task.
English Mentor
I assist with English learning, mind maps, voice conversations, and writing.