Best AI tools for< Voice Input >
20 - AI tool Sites
VoiceGPT
VoiceGPT is an Android app that provides a voice-based interface to interact with AI language models like ChatGPT, Bing AI, and Bard. It offers features such as unlimited free messages, voice input and output in 67+ languages, a floating bubble for easy switching between apps, OCR text recognition, code execution, image generation with DALL-E 2, and support for ChatGPT Plus accounts. VoiceGPT is designed to be accessible for users with visual impairments, dyslexia, or other conditions, and it can be set as the default assistant to be activated hands-free with a custom hotword.
Audiobox
Audiobox is an AI tool developed by Meta for audio generation. It allows users to create custom audio content by generating voices and sound effects using voice inputs and natural language text prompts. The tool is designed to be user-friendly and versatile, catering to a wide range of use cases. Audiobox offers a series of interactive audio demos to showcase its unique capabilities and provides a platform for users to express their creativity through audio storytelling. The tool is built upon the shared self-supervised model Audiobox SSL, ensuring a safe and reliable AI experience for all users.
KnowledgeBot
The website offers an AI tool called KnowledgeBot that helps businesses save time by providing expert-level responses to repetitive questions. It uses AI to quote directly from experts and content, auto-escalates to experts when unsure, and learns reusable information from replies. KnowledgeBot can resolve help requests, find collateral quickly, discover popular queries, and absorb informal chats to capture insights. It aims to streamline sales enablement, customer support, and knowledge management processes, ultimately saving time and improving efficiency for businesses.
SpeakStruct
SpeakStruct is an AI-powered application that enables professionals, businesses, and developers to effortlessly convert voice input into structured formats using customizable templates. The platform leverages advanced AI and natural language processing to ensure high accuracy in voice transcription and data structuring, making it ideal for various industries such as sales & marketing, customer support, product & engineering, financial/mortgage advisors, and healthcare professionals. SpeakStruct's flexible template builder allows users to tailor the application to their specific needs, capturing voice input from any channel and transforming it into a consistent, structured format.
Spoken AI
Spoken AI is an innovative AI tool that enables users to interact with technology through voice commands. It leverages cutting-edge natural language processing and machine learning algorithms to understand and respond to spoken language. With Spoken AI, users can perform various tasks hands-free, such as setting reminders, sending messages, playing music, and getting weather updates. The application aims to enhance user experience by providing a seamless and intuitive way to engage with devices using voice input.
MonAi
MonAi is an AI-powered expense tracker that simplifies the process of tracking expenses by allowing users to input their expenses through voice messages. The AI technology automatically categorizes the expenses and generates a short description and amount. Users can easily confirm and save the details without the need for logging in. The data is securely stored in the user's private iCloud account. MonAi also enables users to share and collaborate on expense tracking. It offers a convenient and efficient way to manage expenses with the help of AI technology.
EmpathixAI
EmpathixAI is an innovative AI tool designed to analyze and interpret human emotions through text and voice inputs. The tool uses advanced natural language processing and sentiment analysis algorithms to provide accurate insights into the emotional state of individuals. EmpathixAI helps businesses understand customer feedback, improve communication strategies, and enhance user experiences. With its user-friendly interface and powerful analytics capabilities, EmpathixAI is a valuable tool for companies looking to gain a deeper understanding of customer sentiment and emotions.
ChatLabs
ChatLabs is an AI application that provides users with access to a variety of AI models for tasks such as chatting, writing, web searching, image generation, and more. Users can interact with AI assistants, browse the web, generate AI art, and utilize voice input features. The platform offers a prompt library, chat with files functionality, split-screen mode, and a Chrome extension for enhanced user experience.
MindOS
MindOS is an AI tool designed to streamline various aspects of business operations by offering AI agents that can be trained to handle tasks such as answering customer FAQs, scheduling appointments, collecting leads, and transitioning from AI to human support seamlessly. The platform provides a user-friendly interface for incorporating data sources, developing personalized AI agents, tailoring them to brand preferences, and integrating them into websites. MindOS stands out for its powerful features, including special avatar customization, access to various data sources, easy feedback mechanisms, prompt and precise answers, voice input, whitelabeling, multilingual support, and the latest AI models.
Genji
Genji is an AI Browser Assistant that aims to revolutionize the way users interact with their web browsers. By leveraging artificial intelligence, Genji acts as a virtual sidekick, capable of automating various tasks and actions within the browser environment. Users can delegate tasks to Genji using plain language commands, allowing them to focus on more important matters while Genji handles the rest. With features like task automation, voice input commands, and task scheduling, Genji offers a seamless browsing experience for both personal and professional use.
Debatia
Debatia is a free, real-time debate platform that allows users to debate anyone, worldwide, in text or voice, in their own language. Debatia's AI Judging System uses ChatGPT to deliver fair judgment in debates, offering a novel and engaging experience. Users are paired based on their debate skill level by Debatia's algorithm.
X-Me
X-Me is an AI-powered platform that allows users to create realistic digital human videos using just a selfie video and text input. With X-Me, users can generate videos in over 147 languages, and the platform offers a variety of features to customize the videos, including the ability to add music, change the background, and adjust the lighting. X-Me is a powerful tool for creating engaging and shareable content, and it is perfect for businesses, educators, and anyone who wants to create high-quality videos without the need for expensive equipment or software.
X-Me
X-Me is an AI tool that allows users to generate personalized AI avatar videos effortlessly. Users can create AI avatars that mimic famous personalities like AI Trump, AI Musk, AI Johnson, Al Kard, and AI Gaga by inputting text in multiple languages. The tool supports 147 languages, offers zero customization fees, and requires zero training. With X-Me, users can upload a selfie video, enter text, and generate AI avatar videos that reflect their face, voice, and story. The platform is known for its efficient, fast, and user-friendly approach to creating realistic digital human videos without the need for complex model training processes.
TikTok Voice Generator
TikTok Voice Generator is an AI tool that allows users to generate various AI voices for TikTok videos. Users can choose from a wide range of voice options including different languages, accents, genders, and characters. The tool provides a simple interface where users can input text and generate the desired voice within seconds. It is a free text-to-speech generator specifically designed for TikTok content creators.
Firebay Studios
Firebay Studios is an AI-powered platform that enables users to create high-quality radio ads in seconds. The tool helps companies and organizations of all sizes to automate production processes, streamline ad creation, and ultimately boost revenue. With features like AI & Cloned Voices, Editing & Production, Script Writing, SFX & Music, and support for 29 languages, Firebay Studios offers a comprehensive solution for creating captivating audio-based advertisements effortlessly.
Form2Agent AI
Form2Agent AI is a voice-assisted AI solution designed to enhance user experience by providing precise data entry support through text, voice, and file inputs. It offers hands-free operation, productivity enhancement, global reach expansion, flexible input options, seamless AI integration, and meaningful interactions. The application caters to various industries such as Fintech, Healthtech, support systems, help desk sectors, E-commerce, logistics, recruitment, HR, and software development, offering efficiency, simplification, accuracy, assistance, and accessibility.
ChatTTS
ChatTTS is an open-source text-to-speech model designed for dialogue scenarios, supporting both English and Chinese speech generation. Trained on approximately 100,000 hours of Chinese and English data, it delivers speech quality comparable to human dialogue. The tool is particularly suitable for tasks involving large language model assistants and creating dialogue-based audio and video introductions. It provides developers with a powerful and easy-to-use tool based on open-source natural language processing and speech synthesis technologies.
MacWhisper
MacWhisper is a native macOS application that utilizes OpenAI's Whisper technology for transcribing audio files into text. It offers a user-friendly interface for recording, transcribing, and editing audio, making it suitable for various use cases such as transcribing meetings, lectures, interviews, and podcasts. The application is designed to protect user privacy by performing all transcriptions locally on the device, ensuring that no data leaves the user's machine.
SpeechForms
SpeechForms is an AI-powered application that revolutionizes the traditional form-filling process by enabling users to verbally input information instead of typing. By leveraging cutting-edge voice recognition technology, SpeechForms simplifies data entry tasks and enhances user experience. Developed by Toggl ai, this innovative tool streamlines the form completion process, offering a seamless and efficient solution for individuals and businesses alike.
VMEG
VMEG is an AI-powered platform that enables users to create infinite AI-crafted videos for marketing purposes. It allows users to transform their inventory and ideas into dynamic and diverse short videos instantly. The platform supports multiple input formats such as video, image, text, and URL, and utilizes AI crafting to generate high-quality videos with various effects. VMEG offers features like automatic video subtitle generation, eye-catching title creation, precise alignment of audio and vision, and easy distribution to multiple platforms. With VMEG, users can efficiently create professional-level video content and significantly improve their marketing efforts.
20 - Open Source AI Tools
ai-devices
AI Devices Template is a project that serves as an AI-powered voice assistant utilizing various AI models and services to provide intelligent responses to user queries. It supports voice input, transcription, text-to-speech, image processing, and function calling with conditionally rendered UI components. The project includes customizable UI settings, optional rate limiting using Upstash, and optional tracing with Langchain's LangSmith for function execution. Users can clone the repository, install dependencies, add API keys, start the development server, and deploy the application. Configuration settings can be modified in `app/config.tsx` to adjust settings and configurations for the AI-powered voice assistant.
chat-xiuliu
Chat-xiuliu is a bidirectional voice assistant powered by ChatGPT, capable of accessing the internet, executing code, reading/writing files, and supporting GPT-4V's image recognition feature. It can also call DALL·E 3 to generate images. The project is a fork from a background of a virtual cat girl named Xiuliu, with removed live chat interaction and added voice input. It can receive questions from microphone or interface, answer them vocally, upload images and PDFs, process tasks through function calls, remember conversation content, search the web, generate images using DALL·E 3, read/write local files, execute JavaScript code in a sandbox, open local files or web pages, customize the cat girl's speaking style, save conversation screenshots, and support Azure OpenAI and other API endpoints in openai format. It also supports setting proxies and various AI models like GPT-4, GPT-3.5, and DALL·E 3.
moco-ai-client
The moco-ai-client is an AI assistant tool that allows users to send prompts continuously without waiting for answers. It saves conversation history locally to protect privacy. The tool supports various AI services like Google Gemini, ChatGPT, and GPT3.5. It also enables voice input in Chinese and English, text-to-speech in multiple languages, and image generation. Users can customize roles and share content easily. The tool is under development, and suggestions are welcome for improvements.
june
june-va is a local voice chatbot that combines Ollama for language model capabilities, Hugging Face Transformers for speech recognition, and the Coqui TTS Toolkit for text-to-speech synthesis. It provides a flexible, privacy-focused solution for voice-assisted interactions on your local machine, ensuring that no data is sent to external servers. The tool supports various interaction modes including text input/output, voice input/text output, text input/audio output, and voice input/audio output. Users can customize the tool's behavior with a JSON configuration file and utilize voice conversion features for voice cloning. The application can be further customized using a configuration file with attributes for language model, speech-to-text model, and text-to-speech model configurations.
chatty
Chatty is a private AI tool that runs large language models natively and privately in the browser, ensuring in-browser privacy and offline usability. It supports chat history management, open-source models like Gemma and Llama2, responsive design, intuitive UI, markdown & code highlight, chat with files locally, custom memory support, export chat messages, voice input support, response regeneration, and light & dark mode. It aims to bring popular AI interfaces like ChatGPT and Gemini into an in-browser experience.
ChatPilot
ChatPilot is a chat agent tool that enables AgentChat conversations, supports Google search, URL conversation (RAG), and code interpreter functionality, replicates Kimi Chat (file, drag and drop; URL, send out), and supports OpenAI/Azure API. It is based on LangChain and implements ReAct and OpenAI Function Call for agent Q&A dialogue. The tool supports various automatic tools such as online search using Google Search API, URL parsing tool, Python code interpreter, and enhanced RAG file Q&A with query rewriting support. It also allows front-end and back-end service separation using Svelte and FastAPI, respectively. Additionally, it supports voice input/output, image generation, user management, permission control, and chat record import/export.
nextjs-ollama-llm-ui
This web interface provides a user-friendly and feature-rich platform for interacting with Ollama Large Language Models (LLMs). It offers a beautiful and intuitive UI inspired by ChatGPT, making it easy for users to get started with LLMs. The interface is fully local, storing chats in local storage for convenience, and fully responsive, allowing users to chat on their phones with the same ease as on a desktop. It features easy setup, code syntax highlighting, and the ability to easily copy codeblocks. Users can also download, pull, and delete models directly from the interface, and switch between models quickly. Chat history is saved and easily accessible, and users can choose between light and dark mode. To use the web interface, users must have Ollama downloaded and running, and Node.js (18+) and npm installed. Installation instructions are provided for running the interface locally. Upcoming features include the ability to send images in prompts, regenerate responses, import and export chats, and add voice input support.
ChatGPT-Next-Web-Pro
ChatGPT-Next-Web-Pro is a tool that provides an enhanced version of ChatGPT-Next-Web with additional features and functionalities. It offers complete ChatGPT-Next-Web functionality, file uploading and storage capabilities, drawing and video support, multi-modal support, reverse model support, knowledge base integration, translation, customizations, and more. The tool can be deployed with or without a backend, allowing users to interact with AI models, manage accounts, create models, manage API keys, handle orders, manage memberships, and more. It supports various cloud services like Aliyun OSS, Tencent COS, and Minio for file storage, and integrates with external APIs like Azure, Google Gemini Pro, and Luma. The tool also provides options for customizing website titles, subtitles, icons, and plugin buttons, and offers features like voice input, file uploading, real-time token count display, and more.
Local-Multimodal-AI-Chat
Local Multimodal AI Chat is a multimodal chat application that integrates various AI models to manage audio, images, and PDFs seamlessly within a single interface. It offers local model processing with Ollama for data privacy, integration with OpenAI API for broader AI capabilities, audio chatting with Whisper AI for accurate voice interpretation, and PDF chatting with Chroma DB for efficient PDF interactions. The application is designed for AI enthusiasts and developers seeking a comprehensive solution for multimodal AI technologies.
local-talking-llm
The 'local-talking-llm' repository provides a tutorial on building a voice assistant similar to Jarvis or Friday from Iron Man movies, capable of offline operation on a computer. The tutorial covers setting up a Python environment, installing necessary libraries like rich, openai-whisper, suno-bark, langchain, sounddevice, pyaudio, and speechrecognition. It utilizes Ollama for Large Language Model (LLM) serving and includes components for speech recognition, conversational chain, and speech synthesis. The implementation involves creating a TextToSpeechService class for Bark, defining functions for audio recording, transcription, LLM response generation, and audio playback. The main application loop guides users through interactive voice-based conversations with the assistant.
LLM-Zero-to-Hundred
LLM-Zero-to-Hundred is a repository showcasing various applications of LLM chatbots and providing insights into training and fine-tuning Language Models. It includes projects like WebGPT, RAG-GPT, WebRAGQuery, LLM Full Finetuning, RAG-Master LLamaindex vs Langchain, open-source-RAG-GEMMA, and HUMAIN: Advanced Multimodal, Multitask Chatbot. The projects cover features like ChatGPT-like interaction, RAG capabilities, image generation and understanding, DuckDuckGo integration, summarization, text and voice interaction, and memory access. Tutorials include LLM Function Calling and Visualizing Text Vectorization. The projects have a general structure with folders for README, HELPER, .env, configs, data, src, images, and utils.
Awesome-AITools
This repo collects AI-related utilities. ## All Categories * All Categories * ChatGPT and other closed-source LLMs * AI Search engine * Open Source LLMs * GPT/LLMs Applications * LLM training platform * Applications that integrate multiple LLMs * AI Agent * Writing * Programming Development * Translation * AI Conversation or AI Voice Conversation * Image Creation * Speech Recognition * Text To Speech * Voice Processing * AI generated music or sound effects * Speech translation * Video Creation * Video Content Summary * OCR(Optical Character Recognition)
llmchat
LLMChat is an all-in-one AI chat interface that supports multiple language models, offers a plugin library for enhanced functionality, enables web search capabilities, allows customization of AI assistants, provides text-to-speech conversion, ensures secure local data storage, and facilitates data import/export. It also includes features like knowledge spaces, prompt library, personalization, and can be installed as a Progressive Web App (PWA). The tech stack includes Next.js, TypeScript, Pglite, LangChain, Zustand, React Query, Supabase, Tailwind CSS, Framer Motion, Shadcn, and Tiptap. The roadmap includes upcoming features like speech-to-text and knowledge spaces.
smartcat
Smartcat is a CLI interface that brings language models into the Unix ecosystem, allowing power users to leverage the capabilities of LLMs in their daily workflows. It features a minimalist design, seamless integration with terminal and editor workflows, and customizable prompts for specific tasks. Smartcat currently supports OpenAI, Mistral AI, and Anthropic APIs, providing access to a range of language models. With its ability to manipulate file and text streams, integrate with editors, and offer configurable settings, Smartcat empowers users to automate tasks, enhance code quality, and explore creative possibilities.
openai-chat-api-workflow
**OpenAI Chat API Workflow for Alfred** An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT-3.5/GPT-4 🤖💬 It also allows image generation 🖼️, image understanding 👀, speech-to-text conversion 🎤, and text-to-speech synthesis 🔈 **Features:** * Execute all features using Alfred UI, selected text, or a dedicated web UI * Web UI is constructed by the workflow and runs locally on your Mac 💻 * API call is made directly between the workflow and OpenAI, ensuring your chat messages are not shared online with anyone other than OpenAI 🔒 * OpenAI does not use the data from the API Platform for training 🚫 * Export chat data to a simple JSON format external file 📄 * Continue the chat by importing the exported data later 🔄
clickolas-cage
Clickolas-cage is a Chrome extension designed to autonomously perform web browsing actions to achieve specific goals using LLM as a brain. Users can interact with the extension by setting goals, which triggers a series of actions including navigation, element extraction, and step generation. The extension is developed using Node.js and can be locally run for testing and development purposes before packing it for submission to the Chrome Web Store.
RTutor
RTutor is an AI-based app that generates and tests R code by translating natural language into R scripts using API calls to OpenAI's ChatGPT. It executes the scripts within the Shiny platform, generating R Markdown source files and HTML reports. The tool features GPT-4 for accurate code, comprehensive EDA reports, and a chat window for code explanation, making it ideal for learning R and statistics.
AGiXT
AGiXT is a dynamic Artificial Intelligence Automation Platform engineered to orchestrate efficient AI instruction management and task execution across a multitude of providers. Our solution infuses adaptive memory handling with a broad spectrum of commands to enhance AI's understanding and responsiveness, leading to improved task completion. The platform's smart features, like Smart Instruct and Smart Chat, seamlessly integrate web search, planning strategies, and conversation continuity, transforming the interaction between users and AI. By leveraging a powerful plugin system that includes web browsing and command execution, AGiXT stands as a versatile bridge between AI models and users. With an expanding roster of AI providers, code evaluation capabilities, comprehensive chain management, and platform interoperability, AGiXT is consistently evolving to drive a multitude of applications, affirming its place at the forefront of AI technology.
20 - OpenAI Gpts
Voice Memo
Record your thoughts with ChatGPT Voice Conversations 💡. Get started by clicking the 🎧 icon right to the chat input. Available on mobile only. Ask 'how do you work?' to learn more.
Anime Voice Match
Anime Voice Match, identifies anime characters similar to the user's voice.
Voice/Style/Tone AI Prompt Snippet Generator
Analyzes your writing and produces a prompt snippet you can use in any other prompt to guide AI in replicating your voice, style, and tone. Just provide the text in the prompt box or in a document (don't use a link or image). You don't need to write any additional prompt language with your text.
Vedic Voice
A scholar in Hindu literature providing positive, brief insights against negativity.
Skillful Voice
Premier expert in household management, offering unparalleled advice and guidance.
Earth Conscious Voice
Hi ;) Ask me for data & insights gathered from an environmentally aware global community
Bring Your Writing Voice to Every Task
This GPT will help you recreate your writing voice across multiple tasks. All you need is a prior writing sample (email, blog, article, tweet) and a new task.
Passive to Active Voice Text Converter AI
I convert and rewrite passive voice text into active voice tone and language. Simply put your passive voice text below! Perfect for sentences, paragraphs, daily emails, and longer texts.