Best AI tools for< Edit Audio >
19 - AI tool Sites
AudioCut
AudioCut is an AI-powered tool designed for audio cutting, offering users a faster and smarter way to edit audio files. By leveraging artificial intelligence technology, AudioCut helps users improve work efficiency by accurately identifying and deleting unnecessary audio segments based on subtitles. The tool is equipped with features such as quick subtitle location, super accurate time points, and various subscription plans to cater to different user needs.
Kingshiper
Kingshiper is a versatile multimedia tool offering a wide range of audio, photo, and video conversion and editing features. It provides tools for screen recording, video compression, screen mirroring, audio editing, vocal removal, and more. With support for over 1000+ formats, Kingshiper aims to simplify multimedia processing tasks for users. Additionally, it offers utilities for office tasks, system tools, data solutions, and image processing, catering to various user needs. The software is designed to enhance productivity and creativity by providing efficient and user-friendly tools for multimedia and office-related tasks.
Apowersoft
Apowersoft is a multimedia solutions provider catering to both business and daily needs. The platform offers a wide range of tools and services, including video editing, file compression, screen recording, audio recording, and more. With a focus on enhancing working efficiency and productivity in the digital age, Apowersoft aims to simplify everyday challenges through innovative and creative solutions.
karaok-AI
karaok-AI is an open-source karaoke Player / Editor with automatic clip creation from any song file using vocals and lyrics extraction (Speech-to-Text). It uses WhisperHallu and WhisperTimeSync to extract vocals and lyrics. karaok-AI also includes kaiDJ, a minimalist and easy-to-use DJ Party Player with multi-sound cards support, two players with auto-mix between songs, and a pre-listen player. It can index thousands of songs in a single efficient database and allows for direct search and selection over all songs. Additionally, it offers playlist management with nested groups and the ability to open and save m3u and m3u8 playlists while keeping group definitions.
HitPaw Online
HitPaw Online is a website that provides a suite of AI-powered editing tools for photos, videos, and audio. The tools are easy to use and can be accessed online without the need to install any software. HitPaw Online's tools are powered by advanced AI algorithms that can automatically enhance the quality of your media files. For example, the Photo Enhancer tool can improve the resolution of images, remove noise, and adjust the colors. The Video Enhancer tool can upscale videos to 4K resolution, remove watermarks, and add subtitles. The Audio Enhancer tool can reduce background noise, extract audio from videos, and convert audio formats.
Transkribieren.xyz
Transkribieren.xyz is an all-in-one AI workspace that brings together the best AI tools for a more efficient workday. It offers a variety of features including transcription, text chat, and image creation. Transkribieren.xyz is trusted by people worldwide and is the go-to transcription solution for those who value speed, accuracy, and simplicity.
Podcastle
Podcastle is an all-in-one podcasting software that empowers creators of all backgrounds and experience levels with an intuitive, AI-powered platform. It offers a wide range of features, including a recording studio, audio editor, video editor, AI-generated voices, and hosting hub, making it easy to create, edit, and publish high-quality podcasts and videos. Podcastle is designed to be user-friendly and accessible, with no prior experience or technical expertise required.
Audiogen
Audiogen is a powerful audio creation tool that utilizes generative AI to empower users with the ability to generate an infinite array of sounds, sound effects, samples, and music instantaneously. With Audiogen, users can unleash their creativity and explore a boundless sonic landscape, pushing the boundaries of audio production.
Amped Studio
Amped Studio is an online music sequencer and sound editor that provides users with the tools and resources they need to create music. The platform offers a variety of features, including a built-in drum machine, sequencer tracks, and a rich sample library. Amped Studio also allows users to connect third-party instruments and effects using VST technology. The platform is designed to be easy to use, even for beginners, and it offers a variety of tutorials and articles to help users get started.
BlendAI
BlendAI is a platform that centralizes top AI models in one place, offering a pay-as-you-go model without the need for a monthly subscription. Its multi-modal graph interface allows easy chaining of models where you can do text to text to image to video to anything.
Voicera
Voicera is a text-to-speech tool that allows users to convert written content into natural-sounding speech. With Voicera, users can create audio versions of their articles, blog posts, and other written content, making it more accessible to a wider audience. Voicera offers a variety of features to help users create high-quality audio content, including a library of natural-sounding voices, advanced audio editing tools, and the ability to add music and sound effects.
Audacity
Audacity is a free and open-source audio editing and recording software that runs on Windows, macOS, GNU/Linux, and other operating systems. It is popular for its ease of use, multi-track editing capabilities, and support for a wide range of audio formats. Audacity can be used for a variety of tasks, including recording and editing podcasts, music, and other audio content. It also supports a variety of plugins, which can extend its functionality even further.
Adobe Firefly
Adobe Firefly is a cloud-based AI platform that helps businesses automate and accelerate their creative processes. It provides a suite of tools for image editing, video editing, and audio editing, all powered by AI. With Firefly, businesses can save time and money on their creative projects, while also improving the quality of their work.
SRVO
SRVO is a voice over service that provides high-quality, professional voice overs for a variety of purposes, including commercials, e-learning, and audiobooks. With a team of experienced voice actors and a state-of-the-art recording studio, SRVO can create custom voice overs that meet the specific needs of each client. SRVO also offers a variety of additional services, such as scriptwriting, audio editing, and mixing.
CyberLink
CyberLink is a leading provider of multimedia software, including video editing, photo editing, and media playback software. The company's products are used by consumers, businesses, and professionals around the world. CyberLink's mission is to provide innovative and easy-to-use software that helps people create and enjoy their own multimedia content.
OptimizerAI
OptimizerAI is an AI-powered sound effects generator that allows users to create custom sound effects for games, videos, animations, and other creative projects. The tool uses advanced machine learning algorithms to generate high-quality, realistic sound effects that can be used in a variety of applications. OptimizerAI is easy to use and requires no prior experience in sound design. Simply enter a description of the sound effect you want to create, and OptimizerAI will generate a custom sound effect that meets your needs.
Smart Media Cutter
Smart Media Cutter is an AI-powered tool designed for video and podcast creators to streamline the editing process. It offers fast and accurate lossless cutting of video and audio, transcription-aided editing, multi-track transcriptions, advanced speech denoiser, and wide support for common media formats. The tool runs on desktop platforms like Windows and macOS, with plans tailored for individual creators, small production companies, and enterprise clients. Smart Media Cutter ensures privacy by keeping all AI features offline on the user's computer.
Alitu Showplanner
Alitu Showplanner is an AI-powered tool designed to help users launch their podcasts quickly and efficiently. By answering a few questions about their podcast idea, users can generate a personalized launch kit including a catchy name, trailer script, episode ideas, and more. The tool simplifies the podcast creation process by providing step-by-step guidance from planning to recording and publishing. Created by The Podcast Host & Alitu team, Alitu Showplanner aims to streamline the podcasting experience for beginners and experienced creators alike.
TikTok Voice Generator
The TikTok Voice Generator is a free text-to-speech tool that allows users to transform text into various TikTok voices, such as popular lady voice, rocket, Ghostface (scream), and many more. It supports multiple languages and voice styles, giving users the option to download the generated voice for various purposes like reading text aloud, creating content, or editing. The tool offers a user-friendly interface and a wide range of voice options to cater to different preferences and needs.
20 - Open Source AI Tools
WavCraft
WavCraft is an LLM-driven agent for audio content creation and editing. It applies LLM to connect various audio expert models and DSP function together. With WavCraft, users can edit the content of given audio clip(s) conditioned on text input, create an audio clip given text input, get more inspiration from WavCraft by prompting a script setting and let the model do the scriptwriting and create the sound, and check if your audio file is synthesized by WavCraft.
ai-collective-tools
ai-collective-tools is an open-source community dedicated to creating a comprehensive collection of AI tools for developers, researchers, and enthusiasts. The repository provides a curated selection of AI tools and resources across various categories such as 3D, Agriculture, Art, Audio Editing, Avatars, Chatbots, Code Assistant, Cooking, Copywriting, Crypto, Customer Support, Dating, Design Assistant, Design Generator, Developer, E-Commerce, Education, Email Assistant, Experiments, Fashion, Finance, Fitness, Fun Tools, Gaming, General Writing, Gift Ideas, HealthCare, Human Resources, Image Classification, Image Editing, Image Generator, Interior Designing, Legal Assistant, Logo Generator, Low Code, Models, Music, Paraphraser, Personal Assistant, Presentations, Productivity, Prompt Generator, Psychology, Real Estate, Religion, Research, Resume, Sales, Search Engine, SEO, Shopping, Social Media, Spreadsheets, SQL, Startup Tools, Story Teller, Summarizer, Testing, Text to Speech, Text to Image, Transcriber, Travel, Video Editing, Video Generator, Weather, Writing Generator, and Other Resources.
XLICON-V2-MD
XLICON-V2-MD is a versatile Multi-Device WhatsApp bot developed by Salman Ahamed. It offers a wide range of features, making it an advanced and user-friendly bot for various purposes. The bot supports multi-device operation, AI photo enhancement, downloader commands, hidden NSFW commands, logo generation, anime exploration, economic activities, games, and audio/video editing. Users can deploy the bot on platforms like Heroku, Replit, Codespace, Okteto, Railway, Mongenius, Coolify, and Render. The bot is maintained by Salman Ahamed and Abraham Dwamena, with contributions from various developers and testers. Misusing the bot may result in a ban from WhatsApp, so users are advised to use it at their own risk.
Topu-ai
TOPU Md is a simple WhatsApp user bot created by Topu Tech. It offers various features such as multi-device support, AI photo enhancement, downloader commands, hidden NSFW commands, logo commands, anime commands, economy menu, various games, and audio/video editor commands. Users can fork the repo, get a session ID by pairing code, and deploy on Heroku. The bot requires Node version 18.x or higher for optimal performance. Contributions to TOPU-MD are welcome, and the tool is safe for use on WhatsApp and Heroku. The tool is licensed under the MIT License and is designed to enhance the WhatsApp experience with diverse features.
Applio
Applio is a VITS-based Voice Conversion tool focused on simplicity, quality, and performance. It features a user-friendly interface, cross-platform compatibility, and a range of customization options. Applio is suitable for various tasks such as voice cloning, voice conversion, and audio editing. Its key features include a modular codebase, hop length implementation, translations in over 30 languages, optimized requirements, streamlined installation, hybrid F0 estimation, easy-to-use UI, optimized code and dependencies, plugin system, overtraining detector, model search, enhancements in pretrained models, voice blender, accessibility improvements, new F0 extraction methods, output format selection, hashing system, model download system, TTS enhancements, split audio, Discord presence, Flask integration, and support tab.
ai-audio-startups
The 'ai-audio-startups' repository is a community list of startups working with AI for audio and music tech. It includes a comprehensive collection of tools and platforms that leverage artificial intelligence to enhance various aspects of music creation, production, source separation, analysis, recommendation, health & wellbeing, radio/podcast, hearing, sound detection, speech transcription, synthesis, enhancement, and manipulation. The repository serves as a valuable resource for individuals interested in exploring innovative AI applications in the audio and music industry.
Scriberr
Scriberr is a self-hostable AI audio transcription app that utilizes open-source Whisper models from OpenAI for transcribing audio files locally on user's hardware. It offers fast transcription with customizable compute settings, local transcription on device, API endpoints for automation, and integration with other tools. Users can optionally summarize transcripts using ChatGPT or Ollama, with support for custom prompts. The app is mobile-ready, simple, and easy to use, with planned features including speaker diarization, audio recording, file actions, full text fuzzy search, tag-based organization, follow-along text with playback, edit summaries, export options, and support for other languages. Despite being in beta, Scriberr is functional and usable, albeit with some rough edges and minor bugs.
friendly-stable-audio-tools
This repository is a refactored and updated version of `stable-audio-tools`, an open-source code for audio/music generative models originally by Stability AI. It contains refactored codes for improved readability and usability, useful scripts for evaluating and playing with trained models, and instructions on how to train models such as `Stable Audio 2.0`. The repository does not contain any pretrained checkpoints. Requirements include PyTorch 2.0 or later for Flash Attention support and Python 3.8.10 or later for development. The repository provides guidance on installing, building a training environment using Docker or Singularity, logging with Weights & Biases, training configurations, and stages for VAE-GAN and Diffusion Transformer (DiT) training.
ruby-openai
Use the OpenAI API with Ruby! 🤖🩵 Stream text with GPT-4, transcribe and translate audio with Whisper, or create images with DALL·E... Hire me | 🎮 Ruby AI Builders Discord | 🐦 Twitter | 🧠 Anthropic Gem | 🚂 Midjourney Gem ## Table of Contents * Ruby OpenAI * Table of Contents * Installation * Bundler * Gem install * Usage * Quickstart * With Config * Custom timeout or base URI * Extra Headers per Client * Logging * Errors * Faraday middleware * Azure * Ollama * Counting Tokens * Models * Examples * Chat * Streaming Chat * Vision * JSON Mode * Functions * Edits * Embeddings * Batches * Files * Finetunes * Assistants * Threads and Messages * Runs * Runs involving function tools * Image Generation * DALL·E 2 * DALL·E 3 * Image Edit * Image Variations * Moderations * Whisper * Translate * Transcribe * Speech * Errors * Development * Release * Contributing * License * Code of Conduct
liboai
liboai is a simple C++17 library for the OpenAI API, providing developers with access to OpenAI endpoints through a collection of methods and classes. It serves as a spiritual port of OpenAI's Python library, 'openai', with similar structure and features. The library supports various functionalities such as ChatGPT, Audio, Azure, Functions, Image DALL·E, Models, Completions, Edit, Embeddings, Files, Fine-tunes, Moderation, and Asynchronous Support. Users can easily integrate the library into their C++ projects to interact with OpenAI services.
DownEdit
DownEdit is a powerful program that allows you to download videos from various social media platforms such as TikTok, Douyin, Kuaishou, and more. With DownEdit, you can easily download videos from user profiles and edit them in bulk. You have the option to flip the videos horizontally or vertically throughout the entire directory with just a single click. Stay tuned for more exciting features coming soon!
simple-openai
Simple-OpenAI is a Java library that provides a simple way to interact with the OpenAI API. It offers consistent interfaces for various OpenAI services like Audio, Chat Completion, Image Generation, and more. The library uses CleverClient for HTTP communication, Jackson for JSON parsing, and Lombok to reduce boilerplate code. It supports asynchronous requests and provides methods for synchronous calls as well. Users can easily create objects to communicate with the OpenAI API and perform tasks like text-to-speech, transcription, image generation, and chat completions.
ultravox
Ultravox is a fast multimodal Language Model (LLM) that can understand both text and human speech in real-time without the need for a separate Audio Speech Recognition (ASR) stage. By extending Meta's Llama 3 model with a multimodal projector, Ultravox converts audio directly into a high-dimensional space used by Llama 3, enabling quick responses and potential understanding of paralinguistic cues like timing and emotion in human speech. The current version (v0.3) has impressive speed metrics and aims for further enhancements. Ultravox currently converts audio to streaming text and plans to emit speech tokens for direct audio conversion. The tool is open for collaboration to enhance this functionality.
Pandrator
Pandrator is a GUI tool for generating audiobooks and dubbing using voice cloning and AI. It transforms text, PDF, EPUB, and SRT files into spoken audio in multiple languages. It leverages XTTS, Silero, and VoiceCraft models for text-to-speech conversion and voice cloning, with additional features like LLM-based text preprocessing and NISQA for audio quality evaluation. The tool aims to be user-friendly with a one-click installer and a graphical interface.
Synthalingua
Synthalingua is an advanced, self-hosted tool that leverages artificial intelligence to translate audio from various languages into English in near real time. It offers multilingual outputs and utilizes GPU and CPU resources for optimized performance. Although currently in beta, it is actively developed with regular updates to enhance capabilities. The tool is not intended for professional use but for fun, language learning, and enjoying content at a reasonable pace. Users must ensure speakers speak clearly for accurate translations. It is not a replacement for human translators and users assume their own risk and liability when using the tool.
freegenius
FreeGenius AI is an ambitious project offering a comprehensive suite of AI solutions that mirror the capabilities of LetMeDoIt AI. It is designed to engage in intuitive conversations, execute codes, provide up-to-date information, and perform various tasks. The tool is free, customizable, and provides access to real-time data and device information. It aims to support offline and online backends, open-source large language models, and optional API keys. Users can use FreeGenius AI for tasks like generating tweets, analyzing audio, searching financial data, checking weather, and creating maps.
noScribe
noScribe is an AI-based software designed for automated audio transcription, specifically tailored for transcribing interviews for qualitative social research or journalistic purposes. It is a free and open-source tool that runs locally on the user's computer, ensuring data privacy. The software can differentiate between speakers and supports transcription in 99 languages. It includes a user-friendly editor for reviewing and correcting transcripts. Developed by Kai Dröge, a PhD in sociology with a background in computer science, noScribe aims to streamline the transcription process and enhance the efficiency of qualitative analysis.
OpenAI-DotNet
OpenAI-DotNet is a simple C# .NET client library for OpenAI to use through their RESTful API. It is independently developed and not an official library affiliated with OpenAI. Users need an OpenAI API account to utilize this library. The library targets .NET 6.0 and above, working across various platforms like console apps, winforms, wpf, asp.net, etc., and on Windows, Linux, and Mac. It provides functionalities for authentication, interacting with models, assistants, threads, chat, audio, images, files, fine-tuning, embeddings, and moderations.
ElevenLabs-DotNet
ElevenLabs-DotNet is a non-official Eleven Labs voice synthesis RESTful client that allows users to convert text to speech. The library targets .NET 8.0 and above, working across various platforms like console apps, winforms, wpf, and asp.net, and across Windows, Linux, and Mac. Users can authenticate using API keys directly, from a configuration file, or system environment variables. The tool provides functionalities for text to speech conversion, streaming text to speech, accessing voices, dubbing audio or video files, generating sound effects, managing history of synthesized audio clips, and accessing user information and subscription status.
metavoice-src
MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech). It has been built with the following priorities: * Emotional speech rhythm and tone in English. * Zero-shot cloning for American & British voices, with 30s reference audio. * Support for (cross-lingual) voice cloning with finetuning. * We have had success with as little as 1 minute training data for Indian speakers. * Synthesis of arbitrary length text
13 - OpenAI Gpts
Mike Russell
Virtual Mike Russell from Music Radio Creative. Ask me your audio, podcasting and AI questions!
Logic Pro - Talk to the Manual
I'm Logic Pro X's manual. Let me answer your questions, troubleshoot whatever issue you're having and get you back into the groove!
Ableton Live Mentor
Your personal Ableton Live mentor. Ask me anything about using Live for music production or live performance.
ConvertAnything
The ultimate tool for converting files, whether they are images, audio, video, documents, or other types. It can process single files or multiple files in bulk, accepts ZIP files, and offers a download link [Updated version].
ReaperGPT
Expert for the Reaper DAW with extensive knowledge on Reapack Packages, ReaScript, EEL, Lua, Python, general commands, and audio workflows.