Best AI Tools for Recognizing Speech
20 - AI Tool Sites
Speech Studio
Speech Studio is a cloud-based speech-to-text and text-to-speech platform that enables developers to add speech capabilities to their applications. With Speech Studio, developers can easily transcribe audio and video files, generate synthetic speech, and build custom speech models. Speech Studio is a powerful tool that can be used to improve the accessibility, efficiency, and user experience of any application.
Future Tools
Future Tools is a website that collects and organizes AI tools. It provides a comprehensive list of AI tools categorized into various domains, including AI detection, aggregators, avatar chat, copywriting, finance, gaming, generative art, generative code, generative video, image improvement, image scanning, inspiration, marketing, motion capture, music, podcasting, productivity, prompt guides, research, self-improvement, social media, speech-to-text, text-to-speech, text-to-video, translation, video editing, and voice modulation. The website also offers a search bar to help users find specific tools based on their needs.
OneAudio
OneAudio is an AI-powered tool that allows users to summarize, transcribe, and convert audio files into notes effortlessly. With the ability to recognize words accurately and efficiently, OneAudio helps users organize their ideas in one place. The tool leverages the OpenAI GPT-4 and GPT-4o models to provide users with features like recording audio, saving notes, rewriting summaries using AI, and more. Users can trust the community's positive feedback and enjoy a seamless experience with OneAudio.
Onyxium
Onyxium is an AI platform that provides a comprehensive collection of AI tools for various tasks such as image recognition, text analysis, and speech recognition. It offers users the ability to access and utilize the latest AI technologies in one place, empowering them to enhance their projects and workflows with advanced AI capabilities. With a user-friendly interface and affordable pricing plans, Onyxium aims to make AI tools accessible to everyone, from individuals to large-scale businesses.
Mac AI Tools and Utilities
This website offers a variety of AI tools and utilities for Mac users. The tools include text assistants, speech-to-text software, image recognition software, and more. The utilities include tools for managing your Mac's settings, improving your productivity, and customizing your Mac's appearance.
AppTek
AppTek is a global leader in artificial intelligence (AI) and machine learning (ML) technologies for automatic speech recognition (ASR), neural machine translation (NMT), natural language processing/understanding (NLP/U), and text-to-speech (TTS). The AppTek platform delivers industry-leading solutions for organizations across global markets such as media and entertainment, call centers, government, and enterprise business. Built by scientists and research engineers who are recognized among the best in the world, AppTek's solutions cover a wide array of languages/dialects, channels, domains, and demographics.
AIBrain
AIBrain is a tech start-up in Palo Alto, California, focused on education and entertainment. Datamation recognized AIBrain as a top 5 entertainment AI company in 2023. Its offerings include bestselling AI courses, Autonomous Game AI, Humanoid AI, and a Soccer AI/VR Assistant. AIBrain has been a member company of the Stanford Computer Forum since 2013. It provides the Game Changer Football AI x VR solutions SAIVA (Sports AI Virtual Assistant) and SAICA (Sports AI Coach Assistant); its football/soccer solution ranked as a top 3 contender in the Camera Calibration Challenge of the SoccerNet Challenges 2023. AIBrain Asia develops robotic AI such as Tyche (Talking Robot AI) and Gretchen (Humanoid AI). AIBrain also provides bestselling AI training programs for non-AI professionals, including the Udemy online course Automated Machine Learning for Beginners (Google & Apple), a Udemy bestseller with 60,829 students as of Dec 2023. Gretchen: Open Humanoid AI Platform. Beta Launch: January.
ASAPP
ASAPP is a generative AI tool designed for contact centers to enhance agent productivity, automate call summaries, and transcribe calls accurately. It offers conversational AI voice and chat agents, automation of business intelligence, and real-time AI assistance for knowledge base answers. ASAPP has been recognized as a leader in AI-led innovation and provides transformational results for customer experience.
Quick, Draw!
Quick, Draw! is a game built with machine learning. You draw, and a neural network tries to guess what you're drawing. Of course, it doesn't always work. But the more you play with it, the more it will learn. So far we have trained it on a few hundred concepts, and we hope to add more over time. We made this as an example of how you can use machine learning in fun ways.
Teachable Machine
Teachable Machine is a web-based tool that makes it easy to create custom machine learning models, even if you don't have any coding experience. With Teachable Machine, you can train models to recognize images, sounds, and poses. Once you've trained a model, you can export it to use in your own projects.
AI Calorie Calculator
This AI Calorie Calculator is a free online tool that uses advanced AI algorithms to analyze the food in your uploaded images and estimate the total calorie count. It is designed to help you manage your diet and plan your meals effectively. The calculator is versatile and includes specialized features for children's calorie calculation, weight loss planning, athlete calorie estimation, sauna calorie estimation, and more. It also supports various dietary needs and counting methods globally.
Credly
Credly is a digital credentialing platform that helps organizations issue, manage, and track digital badges and certificates. It provides a network of over 3,500 certification, assessment, and training providers and employers, allowing earners to connect and grow through a catalog of over 90,000 learnings. Credly's solutions include digital credentialing, workforce insights, strategic workforce planning, and candidate assessment.
Alan AI
Alan AI is an advanced conversational AI platform that offers a wide range of AI solutions for various industries. It simplifies tasks, enhances business operations, and empowers sales strategies through AI technology. The platform provides features like question answering, semantic search, reporting, private data sources, and context awareness. With a focus on actionable AI, Alan AI aims to redefine learning and streamline decision-making processes. It offers a comprehensive suite of tools for developers, including technology architecture overview, integration, deployment, and analytics. Alan AI stands out for its innovative approach to AI reasoning, transparency, and control, making it a valuable asset for organizations seeking to leverage AI capabilities.
Ximilar Visual AI for Business
Ximilar Visual AI for Business is an AI tool that offers a comprehensive platform for image recognition and visual search solutions. It provides features such as image classification, regression, object detection, AI model combination, image annotation, and more. Users can easily build custom machine learning models without coding, access ready-to-use visual AI demos, and benefit from features like image upscaling, background removal, and color extraction. The platform caters to various industries including fashion, home decor, stock photos, collectibles, med & biotech, manufacturing, and real estate.
GoProfiles
GoProfiles is an AI People Platform designed for employee engagement and recognition. It offers features such as employee profiles, peer recognition, rewards, org chart visualization, dynamic people data search, and an AI assistant for company questions and connections. The platform aims to foster a connected and engaged culture within organizations by providing tools for meaningful coworker interactions and employee insights.
Japan Computer Vision (JCV)
Japan Computer Vision (JCV) is a technology company specializing in advanced computer vision (image recognition) solutions. A wholly owned subsidiary of SoftBank Corp., JCV focuses on security and innovation to provide technologies that transform industries and improve lives worldwide. Through solutions for smart buildings and smart retail, JCV enhances office environments, streamlines operations, improves hospitality in stores and commercial facilities, and creates new work and lifestyle experiences.
WizAI
WizAI is an AI tool that offers ChatGPT for WhatsApp, Instagram, and the web. It provides users with the ability to engage in text and voice chat, image and video recognition, and more. WizAI is powered by OpenAI's ChatGPT, offering advanced AI capabilities for generating smart replies and interacting with users in a human-like manner.
Neural4D
Neural4D is an AI tool designed to provide advanced neural network solutions. It offers a range of features for deep learning applications, including image recognition, natural language processing, and predictive analytics. With Neural4D, users can build and train complex neural networks to solve various real-world problems. The tool is user-friendly and suitable for both beginners and experienced AI practitioners.
Luxonis
Luxonis is a platform that offers robotic vision solutions through high-resolution cameras with depth vision and on-chip machine learning capabilities. Their products include OAK Cameras and Modules, providing features like Stereo Depth Sensing, Computer Vision, Artificial Intelligence, and Cloud Management. Luxonis enables the development of computer vision products and companies by offering performant and affordable hardware solutions. The platform caters to enterprises and hobbyists, empowering them to easily build embedded vision systems.
NuMind
NuMind is an AI tool designed to solve information extraction tasks efficiently. It offers high-quality lightweight models tailored to users' needs, automating classification, entity recognition, and structured extraction. The tool is powered by task-specific, domain-agnostic foundation models that reportedly outperform GPT-4 and similar models on these tasks. NuMind provides solutions for industries such as insurance and healthcare, with an emphasis on privacy, cost-effectiveness, and faster NLP projects.
20 - Open Source AI Tools
Awesome-AITools
This repo collects AI-related utilities. Categories: ChatGPT and other closed-source LLMs, AI search engines, open-source LLMs, GPT/LLM applications, LLM training platforms, applications that integrate multiple LLMs, AI agents, writing, programming development, translation, AI conversation or AI voice conversation, image creation, speech recognition, text to speech, voice processing, AI-generated music or sound effects, speech translation, video creation, video content summary, and OCR (Optical Character Recognition).
aiavatarkit
AIAvatarKit is a tool for building AI-based conversational avatars quickly. It supports various platforms like VRChat and cluster, along with real-world devices. The tool is extensible, allowing unlimited capabilities based on user needs. It requires VOICEVOX API, Google or Azure Speech Services API keys, and Python 3.10. Users can start conversations out of the box and enjoy seamless interactions with the avatars.
VSP-LLM
VSP-LLM (Visual Speech Processing incorporated with LLMs) is a framework that maximizes context modeling ability by leveraging the power of LLMs. It performs both visual speech recognition and translation, with the given instruction controlling the task type. The input video is mapped to the input latent space of an LLM using a self-supervised visual speech model. To address redundant information in input frames, a deduplication method based on visual speech units is employed. VSP-LLM uses Low-Rank Adaptation (LoRA) for computationally efficient training.
edenai-apis
Eden AI aims to simplify the use and deployment of AI technologies by providing a single API that connects to the best AI engines. With the rise of AI as a Service, many companies provide off-the-shelf trained models that you can access directly through an API. These companies are either the tech giants (Google, Microsoft, Amazon) or smaller, more specialized companies, and there are hundreds of them. Some of the best known are DeepL (translation), OpenAI (text and image analysis), and AssemblyAI (speech analysis). Eden AI brings the best ones together in one place.
LogChat
LogChat is a free, open-source AI chat client that supports various chat models and technologies such as ChatGPT, iFlytek Spark (讯飞星火), DeepSeek, LLM, TTS, STT, and Live2D. The tool provides a user-friendly interface designed using Qt Creator and can be used on Windows systems without any additional environment requirements. Users can interact with different AI models, perform voice synthesis and recognition, and customize Live2D character models. LogChat also offers features like language translation, AI platform integration, and menu items like screenshot editing, clock, and application launcher.
hezar
Hezar is an all-in-one AI library designed specifically for the Persian community. It brings together various AI models and tools, making it easy to use AI with just a few lines of code. The library seamlessly integrates with Hugging Face Hub, offering a developer-friendly interface and task-based model interface. In addition to models, Hezar provides tools like word embeddings, tokenizers, feature extractors, and more. It also includes supplementary ML tools for deployment, benchmarking, and optimization.
emeltal
Emeltal is a local ML voice chat tool that uses high-end models to provide a self-contained, user-friendly out-of-the-box experience. It offers a hand-picked list of proven open-source high-performance models, aiming to provide the best model for each category/size combination. Emeltal relies heavily on llama.cpp for LLM processing and whisper.cpp for voice recognition. Text rendering uses Ink to convert between Markdown and HTML, and PopTimer is used for debouncing. Emeltal is released under the terms of the MIT license. All model data downloaded locally by the app comes from HuggingFace, and use of the models and data is subject to the respective license of each specific model.
FunClip
FunClip is an open-source, locally deployed automated video clipping tool that leverages Alibaba TONGYI speech lab's FunASR Paraformer series models for speech recognition on videos. Users can select text segments or speakers from recognition results to obtain corresponding video clips. It integrates industrial-grade models for accurate predictions and offers hotword customization and speaker recognition features. The tool is user-friendly with Gradio interaction, supporting multi-segment clipping and providing full video and target segment subtitles. FunClip is suitable for users looking to automate video clipping tasks with advanced AI capabilities.
west
WeST is a speech recognition/transcription tool developed in 300 lines of code, inspired by SLAM-ASR and LLaMA 3.1. The model consists of a large language model (LLM), a speech encoder, and a trainable projector. It requires training data in jsonl format with 'wav' and 'txt' entries. WeST can be used for training and decoding speech recognition models.
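As an illustration of the jsonl training format mentioned above, here is a minimal Python sketch that writes and reads one JSON object per line with 'wav' and 'txt' keys. The helper names (to_jsonl, parse_jsonl) and the sample paths and transcripts are hypothetical; only the two key names come from the description, and the tool may expect additional fields.

```python
import json


def to_jsonl(samples):
    """Serialize (wav_path, transcript) pairs as one JSON object per line."""
    return "\n".join(
        json.dumps({"wav": wav, "txt": txt}, ensure_ascii=False)
        for wav, txt in samples
    )


def parse_jsonl(text):
    """Parse jsonl text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical example data: audio file paths paired with their transcripts.
samples = [
    ("audio/utt1.wav", "hello world"),
    ("audio/utt2.wav", "speech recognition"),
]
manifest = to_jsonl(samples)
records = parse_jsonl(manifest)
```

Each line of the resulting manifest is an independent JSON document, so large datasets can be streamed line by line without loading the whole file.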
FunClip
FunClip is an open-source, locally deployable automated video editing tool that utilizes the FunASR Paraformer series models from Alibaba DAMO Academy for speech recognition in videos. Users can select text segments or speakers from the recognition results and click the clip button to obtain the corresponding video segments. FunClip integrates advanced features such as the Paraformer-Large model for accurate Chinese ASR, SeACo-Paraformer for customized hotword recognition, CAM++ speaker recognition model, Gradio interactive interface for easy usage, support for multiple free edits with automatic SRT subtitles generation, and segment-specific SRT subtitles.
lobe-chat
Lobe Chat is an open-source ChatGPT/LLM UI framework with a modern design. It supports speech synthesis, multi-modal input, and an extensible (function call) plugin system, and offers one-click free deployment of your private OpenAI ChatGPT/Claude/Gemini/Groq/Ollama chat application.
Next-Gen-Dialogue
Next Gen Dialogue is a Unity dialogue plugin that combines traditional dialogue design with AI techniques. It features a visual dialogue editor, modular dialogue functions, AIGC support for generating dialogue at runtime, AIGC baking dialogue in Editor, and runtime debugging. The plugin aims to provide an experimental approach to dialogue design using large language models. Users can create dialogue trees, generate dialogue content using AI, and bake dialogue content in advance. The tool also supports localization, VITS speech synthesis, and one-click translation. Users can create dialogue by code using the DialogueSystem and DialogueTree components.
speechlib
Speechlib is a Python library that provides functionalities for speaker diarization, speaker recognition, and transcription on audio files. It offers features such as converting audio formats to WAV, converting stereo to mono, and re-encoding to 16-bit PCM. The library allows users to transcribe audio files, store transcripts, specify language and model size, and perform speaker recognition using voice samples. It supports various languages and provides performance metrics for different model sizes. Speechlib utilizes huggingface models for speaker recognition and transcription tasks.
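To make the stereo-to-mono preprocessing step concrete, here is a minimal sketch using only the Python standard library. It is not Speechlib's implementation; the function name stereo_to_mono is hypothetical, and the code assumes raw interleaved 16-bit PCM samples in the machine's native byte order (WAV data is little-endian, which matches most desktop hardware).

```python
from array import array


def stereo_to_mono(pcm16: bytes) -> bytes:
    """Average interleaved left/right 16-bit samples into one mono channel."""
    samples = array("h")  # "h" = signed 16-bit integers
    samples.frombytes(pcm16)
    if len(samples) % 2:
        raise ValueError("expected an even number of interleaved samples")
    # Samples alternate left, right, left, right; average each pair.
    mono = array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    return mono.tobytes()


# Two stereo frames: (left=100, right=200) and (left=-50, right=50).
stereo = array("h", [100, 200, -50, 50]).tobytes()
mono_samples = array("h", stereo_to_mono(stereo)).tolist()
```

Averaging the two channels keeps the result within the 16-bit range, so no clipping step is needed.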
rivet
Rivet is a desktop application for creating complex AI agents and prompt chains, and embedding them in your application. Rivet currently has LLM support for OpenAI GPT-3.5 and GPT-4, Anthropic Claude Instant and Claude 2, Anthropic Claude 3 Haiku, Sonnet, and Opus (https://www.anthropic.com/news/claude-3-family), and the AssemblyAI LeMUR framework for voice data. Rivet has embedding/vector database support for OpenAI Embeddings and Pinecone, and it also supports audio transcription from AssemblyAI. Rivet core is a TypeScript library for running graphs created in Rivet. It is used by the Rivet application, but can also be used in your own applications, so that Rivet can call into your own application's code, and your application can call into Rivet graphs.
llms
The 'llms' repository is a comprehensive guide on Large Language Models (LLMs), covering topics such as language modeling, applications of LLMs, statistical language modeling, neural language models, conditional language models, evaluation methods, transformer-based language models, practical LLMs like GPT and BERT, prompt engineering, fine-tuning LLMs, retrieval augmented generation, AI agents, and LLMs for computer vision. The repository provides detailed explanations, examples, and tools for working with LLMs.
ai-notes
Notes on AI state of the art, with a focus on generative and large language models. These are the "raw materials" for the https://lspace.swyx.io/ newsletter. This repo used to be called https://github.com/sw-yx/prompt-eng, but was renamed because Prompt Engineering is Overhyped. This is now an AI Engineering notes repo.
awesome-generative-ai
A curated list of Generative AI projects, tools, artworks, and models
20 - OpenAI GPTs
N.A.R.C. Bott
This app decodes texts from narcissists, advising across all life scenarios. Navigate. Analyze. Recognize. Communicate.
Bot Psycho - Le pervers narcissique.
I tell you about narcissistic perverts: I explain their traits and behavior, and I help you recognize the signs of a toxic relationship.
Street Sign Recognition GPT
Friendly and professional guide for street sign app development.
Coffee Beginner Cupping Assistant
Tell me the origin, processing method, and variety of a premium coffee that interests you, and I will provide you with some possible cupping notes about it
スタイル泥棒 / Style Thief
It'll tell you the style of the image you've uploaded!
Identify movies, dramas, and animations by image
Just send an image of a scene from a movie, drama, or animation, and I will guess its title!
Cause Crafters AI
Expert in EQ, workplace transformation, grant writing, resume creation, and team recognition.
DeepCSV
Answers Deep Learning questions based on the content of the DotCSV YouTube channel.
Charlie Dumas : Directrice IA & Innovation
Director of Innovation at KingLand, expert in AI, project management, and R&D.
AI Detektor
The AI Detektor GPT is powered by Winston AI and was designed to identify AI-generated content. It was developed to help you detect the use of AI writing chatbots such as ChatGPT, Claude, and Bard.
Journal Recognizer OCR
Optimized OCR for handwritten notebooks. Handles up to 10 images, with 1-click copy of the transcript. No text prompt necessary. Reads journals, reports, and notes. All handwriting is transcribed verbatim, then the text is summarized and graphic image features are described. Ask to change any behavior.