UltraSinger

AI-based tool that converts the vocals, lyrics, and pitch of a song to automatically generate UltraStar Deluxe files, MIDI, and notes. It automatically taps, adds text, pitches the vocals, and creates karaoke files.

UltraSinger is a tool under development that automatically creates UltraStar.txt, midi, and notes from music. It pitches UltraStar files, adds text and tapping, creates separate UltraStar karaoke files, re-pitches current UltraStar files, and calculates in-game score. It uses multiple AI models to extract text from voice and determine pitch. Users should mention UltraSinger in UltraStar.txt files and only use it on Creative Commons licensed songs.

README:

UltraSinger

⚠️ This project is still under development!

UltraSinger is a tool to automatically create UltraStar.txt, MIDI, and notes from music. It automatically pitches UltraStar files, adds text and tapping to UltraStar files, and creates separate UltraStar karaoke files. It can also re-pitch existing UltraStar files and calculate the possible in-game score.

Multiple AI models are used to extract text from the voice and to determine the pitch.

Please mention UltraSinger in your UltraStar.txt file if you use it. It helps others find this tool, and it helps this tool get improved and maintained. You should only use it on Creative Commons licensed songs.

❤️ Support

There are many ways to support this project. Starring ⭐️ the repo is just one 🙏

You can also support this work on GitHub sponsors or Patreon or Buy Me a Coffee.

This will help me a lot to keep this project alive and improve it.

💻 How to use this source code

Installation

Install Python 3.10 (older and newer versions have some breaking changes).
Also install ffmpeg separately and add it to your PATH.
Go to the install folder and run the install script for your OS.
- Choose GPU if you have an NVIDIA CUDA GPU.
- Choose CPU if you don't have an NVIDIA CUDA GPU.

Run

In the root folder, run run_on_windows.bat or run_on_linux.sh to start the app.
Now you can use the UltraSinger source code with py UltraSinger.py [opt] [mode] [transcription] [pitcher] [extra]. See "How to use the App" for more information.

📖 How to use the App

Not all options are working yet!

    UltraSinger.py [opt] [mode] [transcription] [pitcher] [extra]
    
    [opt]
    -h      This help text.
    -i      Ultrastar.txt
            audio like .mp3, .wav, youtube link
    -o      Output folder
    
    [mode]
    ## if INPUT is audio ##
    default  Creates all
    
    # Single file creation selection is in progress; you currently get all files!
    (-u      Create ultrastar txt file) # In Progress
    (-m      Create midi file) # In Progress
    (-s      Create sheet file) # In Progress
    
    ## if INPUT is ultrastar.txt ##
    default  Creates all

    [separation]
    # Default is htdemucs
    --demucs              Model name htdemucs|htdemucs_ft|htdemucs_6s|hdemucs_mmi|mdx|mdx_extra|mdx_q|mdx_extra_q >> ((default) is htdemucs)

    [transcription]
    # Default is whisper
    --whisper               Multilingual model > tiny|base|small|medium|large-v1|large-v2|large-v3  >> ((default) is large-v2)
                            English-only model > tiny.en|base.en|small.en|medium.en
    --whisper_align_model   Use other languages model for Whisper provided from huggingface.co
    --language              Override the language detected by whisper, does not affect transcription but steps after transcription
    --whisper_batch_size    Reduce if low on GPU mem >> ((default) is 16)
    --whisper_compute_type  Change to "int8" if low on GPU mem (may reduce accuracy) >> ((default) is "float16" for cuda devices, "int8" for cpu)
    --keep_numbers          Numbers will be transcribed as numerics instead of as words 
    
    [pitcher]
    # Default is crepe
    --crepe            tiny|full >> ((default) is full)
    --crepe_step_size  unit is milliseconds >> ((default) is 10)
    
    [extra]
    --disable_hyphenation   Disable word hyphenation. Hyphenation is enabled by default.
    --disable_separation    Disable track separation. Track separation is enabled by default.
    --disable_karaoke       Disable creation of karaoke style txt file. Karaoke is enabled by default.
    --create_audio_chunks   Enable creation of audio chunks. Audio chunks are disabled by default.
    --keep_cache            Keep cache folder after creation. Cache folder is removed by default.
    --plot                  Enable creation of plots. Plots are disabled by default.
    --format_version        0.3.0|1.0.0|1.1.0|1.2.0 >> ((default) is 1.2.0)
    --musescore_path        path to MuseScore executable
    --keep_numbers          Transcribe numbers as digits and not words
    
    [yt-dlp]
    --cookiefile            File name where cookies should be read from

    [device]
    --force_cpu             Force all steps to be processed on CPU.
    --force_whisper_cpu     Only whisper will be forced to cpu
    --force_crepe_cpu       Only crepe will be forced to cpu

For standard use, you only need to use [opt]. All other options are optional.
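As a rough illustration of how the options above combine, the following sketch assembles an argument list that could be handed to `subprocess.run`. The helper name `build_command` and its parameters are hypothetical; only the flags themselves come from the help text above, and the script path assumes you run from the repository root.

```python
import sys


def build_command(source, output_dir=None, whisper_model=None,
                  crepe_model=None, extra_flags=()):
    """Assemble an argv list for UltraSinger.py (hypothetical helper, not part of the project)."""
    cmd = [sys.executable, "UltraSinger.py", "-i", source]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if whisper_model is not None:
        cmd += ["--whisper", whisper_model]
    if crepe_model is not None:
        cmd += ["--crepe", crepe_model]
    cmd += list(extra_flags)
    return cmd


cmd = build_command("input/music.mp3", output_dir="output",
                    whisper_model="tiny", crepe_model="tiny")
print(" ".join(cmd[1:]))
# prints: UltraSinger.py -i input/music.mp3 -o output --whisper tiny --crepe tiny

# To actually run it from the repository root:
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.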

🎶 Input

Audio (full automatic)

Local file

-i "input/music.mp3"

YouTube

-i https://www.youtube.com/watch?v=YwNs1Z0qRY0

If you run into a yt-dlp error such as "Sign in to confirm you’re not a bot. This helps protect our community" (see the yt-dlp issue), you can follow these steps:

- Generate a cookies.txt file with yt-dlp: yt-dlp --cookies cookies.txt --cookies-from-browser firefox
- Then pass the cookies.txt to UltraSinger: --cookiefile cookies.txt

UltraStar (re-pitch)

This re-pitches the audio and creates a new txt file.

-i "input/ultrastar.txt"
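The three kinds of input described above can be told apart with a few lines of Python. This is only a sketch for wrapper scripts: the function `classify_input` is hypothetical, and how UltraSinger itself detects the input type is not stated here.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_input(value):
    """Return 'url', 'ultrastar' or 'audio' for a -i value (illustrative only)."""
    # Anything with an http(s) scheme is treated as a link, e.g. a YouTube URL.
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    # Assumption: an UltraStar file is recognised by its .txt extension.
    if Path(value).suffix.lower() == ".txt":
        return "ultrastar"
    # Everything else (.mp3, .wav, ...) is treated as audio.
    return "audio"


for example in ("input/music.mp3",
                "https://www.youtube.com/watch?v=YwNs1Z0qRY0",
                "input/ultrastar.txt"):
    print(example, "->", classify_input(example))
```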

🗣 Transcriber

Keep in mind that while a larger model is more accurate, it also takes longer to transcribe.

Whisper

For a first test run, use the tiny model; for accurate results, use the large-v2 model.

-i XYZ --whisper large-v2

Whisper languages

The default language models currently provided are en, fr, de, es, it, ja, zh, nl, uk, pt. If your language is not in this list, you need to find a phoneme-based ASR model on the 🤗 Hugging Face model hub. It will be downloaded automatically.

Example for Romanian:

-i XYZ --whisper_align_model "gigant/romanian-wav2vec2"
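The rule above (default languages need nothing extra, other languages need an alignment model) can be sketched as a small helper. The function name `align_model_flags` is hypothetical, and the language codes are assumed to be the two-letter codes listed above.

```python
# Languages for which a default alignment model is provided (from the list above).
DEFAULT_ALIGN_LANGUAGES = {"en", "fr", "de", "es", "it", "ja", "zh", "nl", "uk", "pt"}


def align_model_flags(language, align_model=None):
    """Return the extra flags needed for a language (hypothetical helper)."""
    if language in DEFAULT_ALIGN_LANGUAGES:
        return []
    if align_model is None:
        raise ValueError(
            f"No default alignment model for '{language}'; "
            "pick a phoneme-based ASR model from the Hugging Face hub."
        )
    return ["--whisper_align_model", align_model]


print(align_model_flags("de"))
print(align_model_flags("ro", "gigant/romanian-wav2vec2"))
```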

✍️ Hyphenation

Hyphenation is on by default. It can be deactivated if it does not produce anything useful. Note that the word is simply split, without checking whether the separated part really starts at that position or is actually heard. To disable:

-i XYZ --disable_hyphenation
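To make the caveat above concrete, here is a toy sketch of what "simply splitting" a word can look like: the time span of one note is divided evenly across its syllables without looking at the audio. This is an assumption for illustration only and not UltraSinger's actual implementation; the function `split_note` and the even split are hypothetical.

```python
def split_note(start, duration, syllables):
    """Divide one note's time span evenly across syllables, ignoring the audio."""
    part = duration / len(syllables)
    return [(syllable, start + index * part, part)
            for index, syllable in enumerate(syllables)]


# A word sung over 8 beats, split into 4 syllables of 2 beats each.
for syllable, syllable_start, length in split_note(0, 8, ["ka", "ra", "o", "ke"]):
    print(syllable, syllable_start, length)
```

Because the split never consults the recording, a syllable boundary may land where the singer is still holding the previous syllable, which is why disabling hyphenation can give better results for some songs.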

👂 Pitcher

Pitching is done with the crepe model. Keep in mind that a bigger model is more accurate but also takes longer to pitch. For testing, use tiny. If you want solid accuracy, use the full model.

-i XYZ --crepe full
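The --crepe_step_size option also affects the workload. As a back-of-the-envelope estimate (assuming one pitch value per step and ignoring any padding the model may add), the number of pitch frames grows with the song length and shrinks with the step size. The helper `pitch_frame_count` is hypothetical.

```python
def pitch_frame_count(duration_seconds, step_size_ms=10):
    """Approximate number of pitch estimates for a track (one per step)."""
    return int(duration_seconds * 1000 // step_size_ms)


print(pitch_frame_count(180))      # 3-minute song at the default 10 ms step
print(pitch_frame_count(180, 20))  # same song with a 20 ms step
```

Doubling the step size roughly halves the number of frames, at the cost of a coarser pitch curve.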

👄 Separation

The vocals are separated from the audio before they are passed to the models. If problems occur with this, you have the option to disable this function; in which case the original audio file is used instead.

-i XYZ --disable_separation

Sheet Music

For sheet music generation, you need to have MuseScore installed on your system, or provide the path to the MuseScore executable.

-i XYZ --musescore_path "C:/Program Files/MuseScore 4/bin/MuseScore4.exe"

Format Version

This defines the format version of the UltraStar.txt file. For more info see Official UltraStar format specification.

You can choose between different format versions. The default is 1.2.0.

0.3.0 is the first format version. Use this if you have an old UltraStar program and problems with the newer format.
1.0.0 should be supported by most UltraStar programs. Use this if you have problems with the newest format version.
1.1.0 is the current format version.
1.2.0 is the upcoming format version. It is not finished yet.
2.0.0 is the next format version. It is not finished yet.

-i XYZ --format_version 1.2.0
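If you drive UltraSinger from a script, it can help to validate the version before starting a long run. The sketch below checks a value against the choices listed in the help text for --format_version; the helper name `format_version_flags` is hypothetical.

```python
# Choices taken from the --format_version entry in the help text.
SUPPORTED_FORMAT_VERSIONS = ("0.3.0", "1.0.0", "1.1.0", "1.2.0")


def format_version_flags(version="1.2.0"):
    """Return the flag pair for a supported version, or raise early."""
    if version not in SUPPORTED_FORMAT_VERSIONS:
        raise ValueError(
            f"Unsupported format version '{version}'; "
            f"choose one of {', '.join(SUPPORTED_FORMAT_VERSIONS)}"
        )
    return ["--format_version", version]


print(format_version_flags("1.0.0"))
```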

🏆 Ultrastar Score Calculation

The score that the singer in the audio would receive is measured. You get two scores: simple and accurate. The difference is that UltraStar is not interested in pitch height. As long as the sung note matches the note name (A-G), regardless of octave, you get the point. This makes sense for the game, because otherwise men would not get points for high female voices and women would not get points for low male voices. Accurate uses the real tone specified in the txt. I had txt files where the pitch was in a range not singable by humans, but you could still reach the 10k points in the game. The accurate score is important here, because the MIDI and sheet files are created from these pitches, and you also want to have accurate files.
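The difference between the two scores can be illustrated with MIDI note numbers, where notes one octave apart differ by 12. This is a simplified sketch and not UltraSinger's actual scoring code: the function names are hypothetical, and it ignores note lengths and any other weighting.

```python
def simple_hit(sung_midi, target_midi):
    """Octave-insensitive match: same note name, any octave."""
    return (sung_midi - target_midi) % 12 == 0


def accurate_hit(sung_midi, target_midi):
    """Exact match: same note in the same octave."""
    return sung_midi == target_midi


def hit_ratio(sung, target, is_hit):
    """Fraction of notes that count as hits under the given rule."""
    hits = sum(1 for s, t in zip(sung, target) if is_hit(s, t))
    return hits / len(target)


target = [60, 62, 64, 65]  # notes written in the txt
sung = [60, 74, 52, 66]    # exact, octave up, octave down, wrong note

print(hit_ratio(sung, target, simple_hit))    # 0.75
print(hit_ratio(sung, target, accurate_hit))  # 0.25
```

A file can therefore score well in the simple sense while its notes are an octave off, which is what makes the accurate score the relevant one for MIDI and sheet output.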

📟 Use GPU

With a GPU you can speed up the process. Also the quality of the transcription and pitching is better.

You need a cuda device for this to work. Sorry, there is no cuda device for macOS.

It is optional (but recommended) to install the CUDA driver for your GPU: see driver. Install torch with CUDA separately in your venv: see torch+cuda. Also check your GPU's CUDA support: see cuda support.

Command for pip:

pip3 install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio==2.0.2+cu117 --index-url https://download.pytorch.org/whl/cu117

When you want to use conda instead you need a different installation command.
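After installing, you can check whether PyTorch actually sees a CUDA device. The sketch below uses `torch.cuda.is_available()`; the helper `pick_device` and the idea of adding --force_cpu automatically are assumptions for wrapper scripts, not something UltraSinger requires.

```python
def pick_device():
    """Return 'cuda' if PyTorch reports a usable CUDA device, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# Hypothetical convenience: only add --force_cpu when no CUDA device is found.
extra_flags = [] if device == "cuda" else ["--force_cpu"]
print(device, extra_flags)
```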

Considerations for Windows users

The pitch tracker used by UltraSinger (crepe) uses TensorFlow as its backend. TensorFlow dropped GPU support for Windows for versions >2.10 as you can see in this release note and their installation instructions.

For now, UltraSinger runs the latest version available that still supports GPUs on Windows.

To run later versions of TensorFlow on Windows while still taking advantage of GPU support, the suggested solution is to run UltraSinger in a container.

Crashes due to low VRAM

If something crashes because of low VRAM, use a smaller model. Whisper needs more than 8GB VRAM for the large model!

You can also force cpu usage with the extra option --force_cpu.
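The memory-related options from the help text can be tried in order of increasing impact. The ladder below is only a sketch: the function `low_vram_fallbacks`, the ordering, the halved batch size, and the choice of the small model are assumptions, while the flags themselves come from the help text.

```python
def low_vram_fallbacks(batch_size=16):
    """Yield progressively more conservative flag sets (ordering is an assumption)."""
    smaller_batch = str(max(1, batch_size // 2))
    # 1. Halve the Whisper batch size (the default is 16).
    yield ["--whisper_batch_size", smaller_batch]
    # 2. Additionally switch to int8, which may reduce accuracy.
    yield ["--whisper_batch_size", smaller_batch, "--whisper_compute_type", "int8"]
    # 3. Use a smaller Whisper model.
    yield ["--whisper", "small", "--whisper_compute_type", "int8"]
    # 4. Last resort: run everything on the CPU.
    yield ["--force_cpu"]


for flags in low_vram_fallbacks():
    print(" ".join(flags))
```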

📦 Containerized (Docker or Podman)

See container/README.md
