UltraSinger
AI based tool to convert vocals lyrics and pitch from music to autogenerate Ultrastar Deluxe, Midi and notes. It automatic tapping, adding text, pitch vocals and creates karaoke files.
Stars: 230
UltraSinger is a tool under development that automatically creates UltraStar.txt, midi, and notes from music. It pitches UltraStar files, adds text and tapping, creates separate UltraStar karaoke files, re-pitches current UltraStar files, and calculates in-game score. It uses multiple AI models to extract text from voice and determine pitch. Users should mention UltraSinger in UltraStar.txt files and only use it on Creative Commons licensed songs.
README:
⚠️ This project is still under development!
UltraSinger is a tool to automatically create UltraStar.txt, midi and notes from music. It automatically pitches UltraStar files, adding text and tapping to UltraStar files and creates separate UltraStar karaoke files. It also can re-pitch current UltraStar files and calculates the possible in-game score.
Multiple AI models are used to extract text from the voice and to determine the pitch.
Please mention UltraSinger in your UltraStar.txt file if you use it. It helps others find this tool, and it helps this tool get improved and maintained. You should only use it on Creative Commons licensed songs.
There are many ways to support this project. Starring ⭐️ the repo is just one 🙏
You can also support this work on GitHub sponsors or Patreon or Buy Me a Coffee.
This will help me a lot to keep this project alive and improve it.
- Install Python 3.10 (older and newer versions has some breaking changes). Download
- Also install ffmpeg separately with PATH. Download
- Go to folder
install
and run install script for your OS.- Choose
GPU
if you have an nvidia CUDA GPU. - Choose
CPU
if you don't have an nvidia CUDA GPU.
- Choose
- In root folder just run
run_on_windows.bat
orrun_on_linux.sh
to start the app. - Now you can use the UltraSinger source code with
py UltraSinger.py [opt] [mode] [transcription] [pitcher] [extra]
. See How to use for more information.
Not all options working now!
UltraSinger.py [opt] [mode] [transcription] [pitcher] [extra]
[opt]
-h This help text.
-i Ultrastar.txt
audio like .mp3, .wav, youtube link
-o Output folder
[mode]
## if INPUT is audio ##
default Creates all
# Single file creation selection is in progress, you currently getting all!
(-u Create ultrastar txt file) # In Progress
(-m Create midi file) # In Progress
(-s Create sheet file) # In Progress
## if INPUT is ultrastar.txt ##
default Creates all
# Single selection is in progress, you currently getting all!
(-r repitch Ultrastar.txt (input has to be audio)) # In Progress
(-p Check pitch of Ultrastar.txt input) # In Progress
(-m Create midi file) # In Progress
[transcription]
# Default is whisper
--whisper Multilingual model > tiny|base|small|medium|large-v1|large-v2 >> ((default) is large-v2
English-only model > tiny.en|base.en|small.en|medium.en
--whisper_align_model Use other languages model for Whisper provided from huggingface.co
--language Override the language detected by whisper, does not affect transcription but steps after transcription
--whisper_batch_size Reduce if low on GPU mem >> ((default) is 16)
--whisper_compute_type Change to "int8" if low on GPU mem (may reduce accuracy) >> ((default) is "float16" for cuda devices, "int8" for cpu)
[pitcher]
# Default is crepe
--crepe tiny|full >> ((default) is full)
--crepe_step_size unit is miliseconds >> ((default) is 10)
[extra]
--hyphenation True|False >> ((default) is True)
--disable_separation True|False >> ((default) is False)
--disable_karaoke True|False >> ((default) is False)
--create_audio_chunks True|False >> ((default) is False)
--keep_cache True|False >> ((default) is False)
--plot True|False >> ((default) is False)
--format_version 0.3.0|1.0.0|1.1.0 >> ((default) is 1.0.0)
--musescore_path path to MuseScore executable
[device]
--force_cpu True|False >> ((default) is False) All steps will be forced to cpu
--force_whisper_cpu True|False >> ((default) is False) Only whisper will be forced to cpu
--force_crepe_cpu True|False >> ((default) is False) Only crepe will be forced to cpu
For standard use, you only need to use [opt]. All other options are optional.
-i "input/music.mp3"
-i https://www.youtube.com/watch?v=BaW_jenozKc
This re-pitch the audio and creates a new txt file.
-i "input/ultrastar.txt"
Keep in mind that while a larger model is more accurate, it also takes longer to transcribe.
For the first test run, use the tiny
, to be accurate use the large-v2
model.
-i XYZ --whisper large-v2
Currently provided default language models are en, fr, de, es, it, ja, zh, nl, uk, pt
.
If the language is not in this list, you need to find a phoneme-based ASR model from
🤗 huggingface model hub. It will download automatically.
Example for romanian:
-i XYZ --whisper_align_model "gigant/romanian-wav2vec2"
Is on by default. Can also be deactivated if hyphenation does not produce anything useful. Note that the word is simply split, without paying attention to whether the separated word really starts at the place or is heard.
-i XYZ --hyphenation True
Pitching is done with the crepe
model.
Also consider that a bigger model is more accurate, but also takes longer to pitch.
For just testing you should use tiny
.
If you want solid accurate, then use the full
model.
-i XYZ --crepe full
The vocals are separated from the audio before they are passed to the models. If problems occur with this, you have the option to disable this function; in which case the original audio file is used instead.
-i XYZ --disable_separation True
For Sheet Music generation you need to have MuseScore
installed on your system.
Or provide the path to the MuseScore
executable.
-i XYZ --musescore_path "C:/Program Files/MuseScore 4/bin/MuseScore4.exe"
This defines the format version of the UltraStar.txt file. For more info see Official UltraStar format specification.
You can choose between 3 different format versions. The default is 1.0.0
.
-
0.3.0
is the old format version. Use this if you have problems with the new format. -
1.0.0
is the current format version. -
1.1.0
is the upcoming format version. It is not finished yet.
-i XYZ --format_version 1.0.0
The score that the singer in the audio would receive will be measured. You get 2 scores, simple and accurate. You wonder where the difference is? Ultrastar is not interested in pitch hights. As long as it is in the pitch range A-G you get one point. This makes sense for the game, because otherwise men don't get points for high female voices and women don't get points for low male voices. Accurate is the real tone specified in the txt. I had txt files where the pitch was in a range not singable by humans, but you could still reach the 10k points in the game. The accuracy is important here, because from this MIDI and sheet are created. And you also want to have accurate files
With a GPU you can speed up the process. Also the quality of the transcription and pitching is better.
You need a cuda device for this to work. Sorry, there is no cuda device for macOS.
It is optional (but recommended) to install the cuda driver for your gpu: see driver.
Install torch with cuda separately in your venv
. See tourch+cuda.
Also check you GPU cuda support. See cuda support
Command for pip
:
pip3 install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio==2.0.2+cu117 --index-url https://download.pytorch.org/whl/cu117
When you want to use conda
instead you need a different installation command.
The pitch tracker used by UltraSinger (crepe) uses TensorFlow as its backend. TensorFlow dropped GPU support for Windows for versions >2.10 as you can see in this release note and their installation instructions.
For now UltraSinger runs the latest version available that still supports GPUs on windows.
For running later versions of TensorFlow on windows while still taking advantage of GPU support the suggested solution is:
- install WSL2
- within the Ubuntu WSL2 installation
- run
sudo apt update && sudo apt install nvidia-cuda-toolkit
- follow the setup instructions for UltraSinger at the top of this document
- run
If something crashes because of low VRAM then use a smaller model.
Whisper needs more than 8GB VRAM in the large
model!
You can also force cpu usage with the extra option --force_cpu
.
to run the docker run git clone https://github.com/rakuri255/UltraSinger.git
enter the UltraSinger folder.
run this command to build the docker
docker build -t ultrasinger .
make sure to include the "." at the end
let this run till complete.
then run this command
docker run --gpus all -it --name UltraSinger -v $pwd/src/output:/app/src/output ultrasinger
Docker-Compose
there are two files that you can pick from.
cd into docker-compose
folder and then cd into Nvidia
or NonGPU
Run docker-compose up
to download and setup
Nvidia is for if you have a nvidia gpu to use with UltraSinger. NonGPU is for if you wish to only use the CPU for UltraSinger.
Output
by default the docker-compose will setup the output folder as /output
inside the docker.
on the host machine it will map to the folder with the docker-compose.yml
file under output
you may chnage this by editing the docker-compose.yml
to edit the file.
use any text editor you wish. i would recoment nano.
run nano docker-compose.yml
then change this line
- ./output:/app/UltraSinger/src/output
to anything you line for on your host machine.
- /yourfolderpathhere:/app/UltraSinger/src/output
sample
- /mnt/user/appdata/UltraSinger:/output
note the blank space before the -
formating is important here in this file.
this will create and drop you into the docker.
now run this command.
python3 UltraSinger.py -i file
or
python3 UltraSinger.py -i youtube_url
to use mp3's in the folder you git cloned you must place all songs you like in UltraSinger/src/output.
this will be the place for youtube links aswell.
to quit the docker just type exit.
to reenter docker run this command
docker start UltraSinger && Docker exec -it UltraSinger /bin/bash
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for UltraSinger
Similar Open Source Tools
UltraSinger
UltraSinger is a tool under development that automatically creates UltraStar.txt, midi, and notes from music. It pitches UltraStar files, adds text and tapping, creates separate UltraStar karaoke files, re-pitches current UltraStar files, and calculates in-game score. It uses multiple AI models to extract text from voice and determine pitch. Users should mention UltraSinger in UltraStar.txt files and only use it on Creative Commons licensed songs.
linkedin-api
The Linkedin API for Python allows users to programmatically search profiles, send messages, and find jobs using a regular Linkedin user account. It does not require 'official' API access, just a valid Linkedin account. However, it is important to note that this library is not officially supported by LinkedIn and using it may violate LinkedIn's Terms of Service. Users can authenticate using any Linkedin account credentials and access features like getting profiles, profile contact info, and connections. The library also provides commercial alternatives for extracting data, scraping public profiles, and accessing a full LinkedIn API. It is not endorsed or supported by LinkedIn and is intended for educational purposes and personal use only.
fiftyone
FiftyOne is an open-source tool designed for building high-quality datasets and computer vision models. It supercharges machine learning workflows by enabling users to visualize datasets, interpret models faster, and improve efficiency. With FiftyOne, users can explore scenarios, identify failure modes, visualize complex labels, evaluate models, find annotation mistakes, and much more. The tool aims to streamline the process of improving machine learning models by providing a comprehensive set of features for data analysis and model interpretation.
bia-bob
BIA `bob` is a Jupyter-based assistant for interacting with data using large language models to generate Python code. It can utilize OpenAI's chatGPT, Google's Gemini, Helmholtz' blablador, and Ollama. Users need respective accounts to access these services. Bob can assist in code generation, bug fixing, code documentation, GPU-acceleration, and offers a no-code custom Jupyter Kernel. It provides example notebooks for various tasks like bio-image analysis, model selection, and bug fixing. Installation is recommended via conda/mamba environment. Custom endpoints like blablador and ollama can be used. Google Cloud AI API integration is also supported. The tool is extensible for Python libraries to enhance Bob's functionality.
obs-cleanstream
CleanStream is an OBS plugin that utilizes AI to clean live audio streams by removing unwanted words and utterances, such as 'uh's and 'um's, and configurable words like profanity. It uses a neural network (OpenAI Whisper) in real-time to predict speech and eliminate unwanted words. The plugin is still experimental and not recommended for live production use, but it is functional for testing purposes. Users can adjust settings and configure the plugin to enhance audio quality during live streams.
aio-switch-updater
AIO-Switch-Updater is a Nintendo Switch homebrew app that allows users to download and update custom firmware, firmware files, cheat codes, and more. It supports Atmosphère, ReiNX, and SXOS on both unpatched and patched Switches. The app provides features like updating CFW with custom RCM payload, updating Hekate/payload, custom downloads, downloading firmwares and cheats, and various tools like rebooting to specific payload, changing color schemes, consulting cheat codes, and more. Users can contribute by submitting PRs and suggestions, and the app supports localization. It does not host or distribute any files and gives special thanks to contributors and supporters.
gpt-engineer
GPT-Engineer is a tool that allows you to specify a software in natural language, sit back and watch as an AI writes and executes the code, and ask the AI to implement improvements.
NekoImageGallery
NekoImageGallery is an online AI image search engine that utilizes the Clip model and Qdrant vector database. It supports keyword search and similar image search. The tool generates 768-dimensional vectors for each image using the Clip model, supports OCR text search using PaddleOCR, and efficiently searches vectors using the Qdrant vector database. Users can deploy the tool locally or via Docker, with options for metadata storage using Qdrant database or local file storage. The tool provides API documentation through FastAPI's built-in Swagger UI and can be used for tasks like image search, text extraction, and vector search.
Upscaler
Holloway's Upscaler is a consolidation of various compiled open-source AI image/video upscaling products for a CLI-friendly image and video upscaling program. It provides low-cost AI upscaling software that can run locally on a laptop, programmable for albums and videos, reliable for large video files, and works without GUI overheads. The repository supports hardware testing on various systems and provides important notes on GPU compatibility, video types, and image decoding bugs. Dependencies include ffmpeg and ffprobe for video processing. The user manual covers installation, setup pathing, calling for help, upscaling images and videos, and contributing back to the project. Benchmarks are provided for performance evaluation on different hardware setups.
mmwave-gesture-recognition
This repository provides a setup for basic gesture recognition using the TI AWR1642 mmWave sensor. Users can collect data from the sensor and choose from various neural network architectures for gesture recognition. The supported gestures include Swipe Up, Swipe Down, Swipe Right, Swipe Left, Spin Clockwise, Spin Counterclockwise, Letter Z, Letter S, and Letter X. The repository includes data and models for training and inference, along with instructions for installation, serial permissions setup, flashing firmware, running the system, collecting data, training models, selecting different models, and accessing help documentation. The project is developed using Python and TensorFlow 2.15.
obs-cleanstream
CleanStream is an OBS plugin that utilizes real-time local AI to clean live audio streams by removing unwanted words and utterances, such as 'uh' and 'um', and configurable words like profanity. It employs a neural network (OpenAI Whisper) to predict speech in real-time and eliminate undesired words. The plugin runs efficiently using the Whisper.cpp project from ggerganov. CleanStream offers users the ability to adjust settings and add the plugin to any audio-generating source in OBS, providing a seamless experience for content creators looking to enhance the quality of their live audio streams.
ChatGPT-desktop
ChatGPT Desktop Application is a multi-platform tool that provides a powerful AI wrapper for generating text. It offers features like text-to-speech, exporting chat history in various formats, automatic application upgrades, system tray hover window, support for slash commands, customization of global shortcuts, and pop-up search. The application is built using Tauri and aims to enhance user experience by simplifying text generation tasks. It is available for Mac, Windows, and Linux, and is designed for personal learning and research purposes.
air-light
Air-light is a minimalist WordPress starter theme designed to be an ultra minimal starting point for a WordPress project. It is built to be very straightforward, backwards compatible, front-end developer friendly and modular by its structure. Air-light is free of weird "app-like" folder structures or odd syntaxes that nobody else uses. It loves WordPress as it was and as it is.
NoLabs
NoLabs is an open-source biolab that provides easy access to state-of-the-art models for bio research. It supports various tasks, including drug discovery, protein analysis, and small molecule design. NoLabs aims to accelerate bio research by making inference models accessible to everyone.
neural
Neural is a Vim and Neovim plugin that integrates various machine learning tools to assist users in writing code, generating text, and explaining code or paragraphs. It supports multiple machine learning models, focuses on privacy, and is compatible with Vim 8.0+ and Neovim 0.8+. Users can easily configure Neural to interact with third-party machine learning tools, such as OpenAI, to enhance code generation and completion. The plugin also provides commands like `:NeuralExplain` to explain code or text and `:NeuralStop` to stop Neural from working. Neural is maintained by the Dense Analysis team and comes with a disclaimer about sending input data to third-party servers for machine learning queries.
depthai
This repository contains a demo application for DepthAI, a tool that can load different networks, create pipelines, record video, and more. It provides documentation for installation and usage, including running programs through Docker. Users can explore DepthAI features via command line arguments or a clickable QT interface. Supported models include various AI models for tasks like face detection, human pose estimation, and object detection. The tool collects anonymous usage statistics by default, which can be disabled. Users can report issues to the development team for support and troubleshooting.
For similar tasks
UltraSinger
UltraSinger is a tool under development that automatically creates UltraStar.txt, midi, and notes from music. It pitches UltraStar files, adds text and tapping, creates separate UltraStar karaoke files, re-pitches current UltraStar files, and calculates in-game score. It uses multiple AI models to extract text from voice and determine pitch. Users should mention UltraSinger in UltraStar.txt files and only use it on Creative Commons licensed songs.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.