AICoverGen
A WebUI to create song covers with any RVC v2 trained AI voice from YouTube videos or audio files.
Stars: 1031
AICoverGen is an autonomous pipeline designed to create covers using any RVC v2 trained AI voice from YouTube videos or local audio files. It caters to developers looking to incorporate singing functionality into AI assistants/chatbots/vtubers, as well as individuals interested in hearing their favorite characters sing. The tool offers a WebUI for easy conversions, cover generation from local audio files, volume control for vocals and instrumentals, pitch detection method control, pitch change for vocals and instrumentals, and audio output format options. Users can also download and upload RVC models via the WebUI, run the pipeline using CLI, and access various advanced options for voice conversion and audio mixing.
README:
An autonomous pipeline to create covers with any RVC v2 trained AI voice from YouTube videos or a local audio file. For developers who may want to add a singing functionality into their AI assistant/chatbot/vtuber, or for people who want to hear their favourite characters sing their favourite song.
Showcase: https://www.youtube.com/watch?v=2qZuE4WM7CM
Setup Guide: https://www.youtube.com/watch?v=pdlhk4vVHQk
The WebUI is under constant development and testing, but you can try it out right now, both locally and on Colab!
- WebUI for easier conversions and downloading of voice models
- Support for cover generations from a local audio file
- Option to keep intermediate files generated, e.g. isolated vocals/instrumentals
- Download suggested public voice models from a table with search/tag filters
- Support for Pixeldrain download links for voice models
- New rmvpe pitch extraction technique for faster and higher-quality vocal conversions
- Volume control for AI main vocals, backup vocals and instrumentals
- Index Rate for Voice conversion
- Reverb Control for AI main vocals
- Local network sharing option for webui
- Extra RVC options - filter_radius, rms_mix_rate, protect
- Local file upload via file browser option
- Upload of locally trained RVC v2 models via WebUI
- Pitch detection method control, e.g. rmvpe/mangio-crepe
- Pitch change for vocals and instrumentals together. Same effect as changing the key of a song in karaoke.
- Audio output format option: wav or mp3.
Pull any new changes and install any new requirements by opening a command line window in the AICoverGen directory and running the following commands (pull first so that the updated requirements.txt is the one installed).
git pull
pip install -r requirements.txt
For Colab users, simply click Runtime in the top navigation bar of the Colab notebook and select Disconnect and delete runtime from the dropdown menu.
Then follow the instructions in the notebook to run the webui.
For those without a powerful enough NVIDIA GPU, you may try AICoverGen out using Google Colab.
For those who face issues with Google Colab notebook disconnecting after a few minutes, here's an alternative that doesn't use the WebUI.
For those who want to run this locally, follow the setup guide below.
Follow the instructions here to install Git on your computer. Also follow this guide to install Python VERSION 3.9 if you haven't already. Using other versions of Python may result in dependency conflicts.
Follow the instructions here to install ffmpeg on your computer.
Follow the instructions here to install sox and add it to your Windows path environment.
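As a quick sanity check (an editorial addition, not part of the original instructions), you can confirm each prerequisite is installed and on your PATH by running:
git --version
python --version
ffmpeg -version
sox --version
Each command should print a version string; python --version should report 3.9.x.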
Open a command line window and run these commands to clone this entire repository and install the additional dependencies required.
git clone https://github.com/SociallyIneptWeeb/AICoverGen
cd AICoverGen
pip install -r requirements.txt
Run the following command to download the required MDXNET vocal separation models and hubert base model.
python src/download_models.py
To run the AICoverGen WebUI, run the following command.
python src/webui.py
| Flag | Description |
|---|---|
| -h, --help | Show this help message and exit. |
| --share | Create a public URL. This is useful for running the web UI on Google Colab. |
| --listen | Make the web UI reachable from your local network. |
| --listen-host LISTEN_HOST | The hostname that the server will use. |
| --listen-port LISTEN_PORT | The listening port that the server will use. |
Once the following output message Running on local URL: http://127.0.0.1:7860 appears, you can click on the link to open a tab with the WebUI.
Navigate to the Download model tab, and paste the download link to the RVC model and give it a unique name.
You may search the AI Hub Discord, where already trained voice models are available for download. You may refer to the examples for how the download link should look.
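For illustration only (the file ID below is a placeholder, not a real model), a Pixeldrain download link takes the form:
https://pixeldrain.com/u/<FILE_ID>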
The downloaded zip file should contain the .pth model file and an optional .index file.
Once the two input fields are filled in, simply click Download! Once the output message says [NAME] Model successfully downloaded!, you should be able to use it in the Generate tab after clicking the refresh models button!
For people who have trained RVC v2 models locally and would like to use them for AI Cover generations.
Navigate to the Upload model tab, and follow the instructions.
Once the output message says [NAME] Model successfully uploaded!, you should be able to use it in the Generate tab after clicking the refresh models button!
- From the Voice Models dropdown menu, select the voice model to use. Click Update if you added the files manually to the rvc_models directory to refresh the list.
- In the song input field, copy and paste the link to any song on YouTube or the full path to a local audio file.
- Pitch should be set to either -12, 0, or 12 depending on the original vocals and the RVC AI model. This ensures the voice is not out of tune.
- Other advanced options for Voice conversion and audio mixing can be viewed by clicking the accordion arrow to expand.
Once all Main Options are filled in, click Generate and the AI-generated cover should appear within a few minutes, depending on your GPU.
Unzip (if needed) and transfer the .pth and .index files to a new folder in the rvc_models directory. Each folder should only contain one .pth and one .index file.
The directory structure should look something like this:
├── rvc_models
│ ├── John
│ │ ├── JohnV2.pth
│ │ └── added_IVF2237_Flat_nprobe_1_v2.index
│ ├── May
│ │ ├── May.pth
│ │ └── added_IVF2237_Flat_nprobe_1_v2.index
│ ├── MODELS.txt
│ └── hubert_base.pt
├── mdxnet_models
├── song_output
└── src
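As a sketch of the manual transfer described above (assuming a downloaded archive named John.zip on a Unix-like system; adjust names to your model, and use your file explorer on Windows):
mkdir rvc_models/John
unzip John.zip -d rvc_models/John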
To run the AI cover generation pipeline using the command line, run the following command.
python src/main.py [-h] -i SONG_INPUT -dir RVC_DIRNAME -p PITCH_CHANGE [-k | --keep-files | --no-keep-files] [-ir INDEX_RATE] [-fr FILTER_RADIUS] [-rms RMS_MIX_RATE] [-palgo PITCH_DETECTION_ALGO] [-hop CREPE_HOP_LENGTH] [-pro PROTECT] [-mv MAIN_VOL] [-bv BACKUP_VOL] [-iv INST_VOL] [-pall PITCH_CHANGE_ALL] [-rsize REVERB_SIZE] [-rwet REVERB_WETNESS] [-rdry REVERB_DRYNESS] [-rdamp REVERB_DAMPING] [-oformat OUTPUT_FORMAT]
| Flag | Description |
|---|---|
| -h, --help | Show this help message and exit. |
| -i SONG_INPUT | Link to a song on YouTube or path to a local audio file. Should be enclosed in double quotes for Windows and single quotes for Unix-like systems. |
| -dir RVC_DIRNAME | Name of the folder in the rvc_models directory containing your .pth and .index files for a specific voice. |
| -p PITCH_CHANGE | Change pitch of AI vocals in octaves. Set to 0 for no change. Generally, use 1 for male-to-female conversions and -1 for vice versa. |
| -k | Optional. Keep all intermediate audio files generated, e.g. isolated AI vocals/instrumentals. Leave out to save space. |
| -ir INDEX_RATE | Optional. Default 0.5. Control how much of the AI's accent to leave in the vocals. 0 <= INDEX_RATE <= 1. |
| -fr FILTER_RADIUS | Optional. Default 3. If >= 3: apply median filtering to the harvested pitch results. 0 <= FILTER_RADIUS <= 7. |
| -rms RMS_MIX_RATE | Optional. Default 0.25. Control how much to use the original vocal's loudness (0) or a fixed loudness (1). 0 <= RMS_MIX_RATE <= 1. |
| -palgo PITCH_DETECTION_ALGO | Optional. Default rmvpe. Best option is rmvpe (clarity in vocals), then mangio-crepe (smoother vocals). |
| -hop CREPE_HOP_LENGTH | Optional. Default 128. Controls how often it checks for pitch changes in milliseconds when using the mangio-crepe algorithm specifically. Lower values lead to longer conversions and a higher risk of voice cracks, but better pitch accuracy. |
| -pro PROTECT | Optional. Default 0.33. Control how much of the original vocals' breath and voiceless consonants to leave in the AI vocals. Set to 0.5 to disable. 0 <= PROTECT <= 0.5. |
| -mv MAIN_VOL | Optional. Default 0. Control the volume of the main AI vocals. Use -3 to decrease the volume by 3 decibels, or 3 to increase it by 3 decibels. |
| -bv BACKUP_VOL | Optional. Default 0. Control the volume of the backup AI vocals. |
| -iv INST_VOL | Optional. Default 0. Control the volume of the background music/instrumentals. |
| -pall PITCH_CHANGE_ALL | Optional. Default 0. Change the pitch/key of the background music, backup vocals and AI vocals in semitones. Reduces sound quality slightly. |
| -rsize REVERB_SIZE | Optional. Default 0.15. The larger the room, the longer the reverb time. 0 <= REVERB_SIZE <= 1. |
| -rwet REVERB_WETNESS | Optional. Default 0.2. Level of AI vocals with reverb. 0 <= REVERB_WETNESS <= 1. |
| -rdry REVERB_DRYNESS | Optional. Default 0.8. Level of AI vocals without reverb. 0 <= REVERB_DRYNESS <= 1. |
| -rdamp REVERB_DAMPING | Optional. Default 0.7. Absorption of high frequencies in the reverb. 0 <= REVERB_DAMPING <= 1. |
| -oformat OUTPUT_FORMAT | Optional. Default mp3. wav for best quality and large file size, mp3 for decent quality and small file size. |
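As an example run (hypothetical YouTube URL, using the John model from the directory tree above; substitute your own values):
python src/main.py -i "https://www.youtube.com/watch?v=<VIDEO_ID>" -dir John -p 0 -k -oformat mp3
This converts the song with no pitch change, keeps the intermediate isolated stems, and writes an mp3; generated files should appear under the song_output directory shown in the tree above.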
The use of the converted voice for the following purposes is prohibited.
- Criticizing or attacking individuals.
- Advocating for or opposing specific political positions, religions, or ideologies.
- Publicly displaying strongly stimulating expressions without proper zoning.
- Selling of voice models and generated voice clips.
- Impersonation of the original owner of the voice with malicious intentions to harm/hurt others.
- Fraudulent purposes that lead to identity theft or fraudulent phone calls.
I am not liable for any direct, indirect, consequential, incidental, or special damages arising out of or in any way connected with the use/misuse or inability to use this software.