suno-api
Use API to call the music generation AI of suno.ai, and easily integrate it into agents like GPTs.
Suno AI API is an open-source project that allows developers to integrate the music generation capabilities of Suno.ai into their own applications. The API provides a simple and convenient way to generate music, lyrics, and other audio content using Suno.ai's powerful AI models. With Suno AI API, developers can easily add music generation functionality to their apps, websites, and other projects.
Suno is an amazing AI music service. Although the official API is not yet available, we couldn't wait to integrate its capabilities somewhere.
We discovered that some users have similar needs, so we decided to open-source this project, hoping you'll like it.
This implementation uses the paid 2Captcha service (a.k.a. ruCaptcha) to solve hCaptcha challenges automatically and does not rely on any existing closed-source paid Suno API implementations.
We have deployed an example bound to a free Suno account, so it has daily usage limits, but you can see how it runs: suno.gcui.ai
- Perfectly implements the creation API from suno.ai.
- Automatically keeps the account active.
- Solves CAPTCHAs automatically using 2Captcha and Playwright with rebrowser-patches.
- Compatible with the format of OpenAI's `/v1/chat/completions` API.
- Supports Custom Mode.
- One-click deployment to Vercel & Docker.
- In addition to the standard API, it also adapts to the API Schema of Agent platforms like GPTs and Coze, so you can use it as a tool/plugin/Action for LLMs and integrate it into any AI Agent.
- Permissive open-source license, allowing you to freely integrate and modify.
- Head over to suno.com/create using your browser.
- Open up the browser console: hit `F12` or access the `Developer Tools`.
- Navigate to the `Network` tab.
- Give the page a quick refresh.
- Identify the latest request that includes the keyword `?__clerk_api_version`.
- Click on it and switch over to the `Header` tab.
- Locate the `Cookie` section, hover your mouse over it, and copy the value of the Cookie.
2Captcha is a paid CAPTCHA-solving service that uses real workers to solve the CAPTCHA and has high accuracy. It is needed because Suno constantly requests hCaptcha solving, which currently cannot be done for free by any means.
Create a new 2Captcha account, top up your balance and get your API key.
[!NOTE] If you are located in Russia or Belarus, use the ruCaptcha interface instead of 2Captcha. It's the same service, but it supports payments from those countries.
[!TIP] If you want as few CAPTCHAs as possible, it is recommended to use a macOS system. macOS systems usually get fewer CAPTCHAs than Linux and Windows—this is due to its unpopularity in the web scraping industry. Running suno-api on Windows and Linux will work, but in some cases, you could get a pretty large number of CAPTCHAs.
You can choose your preferred deployment method:
git clone https://github.com/gcui-art/suno-api.git
cd suno-api
npm install
[!IMPORTANT] GPU acceleration will be disabled in Docker. If you have a slow CPU, it is recommended to deploy locally.
Alternatively, you can use Docker Compose. However, follow the step below before running.
docker compose build && docker compose up
- If deployed to Vercel, please add the environment variables in the Vercel dashboard.
- If you're running this locally, be sure to add the following to your `.env` file:
  - `SUNO_COOKIE`: the `Cookie` header you obtained in the first step.
  - `TWOCAPTCHA_KEY`: your 2Captcha API key from the second step.
  - `BROWSER`: the name of the browser that is going to be used to solve the CAPTCHA. Only `chromium` and `firefox` are supported.
  - `BROWSER_GHOST_CURSOR`: use ghost-cursor-playwright to simulate smooth mouse movements. Please note that it doesn't seem to make any difference in the rate of CAPTCHAs, so you can set it to `false`. Retained for future testing.
  - `BROWSER_LOCALE`: the language of the browser. Using either `en` or `ru` is recommended, since those have the most workers on 2Captcha. See the list of supported languages.
  - `BROWSER_HEADLESS`: run the browser without the window. You probably want to set this to `true`.
SUNO_COOKIE=<…>
TWOCAPTCHA_KEY=<…>
BROWSER=chromium
BROWSER_GHOST_CURSOR=false
BROWSER_LOCALE=en
BROWSER_HEADLESS=true
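Before starting the server, it can help to sanity-check the `.env` file. The following is a minimal sketch, not part of suno-api itself: the helper names `parse_env` and `check_env` are hypothetical, and treating all six variables as required is an assumption based on the list above.

```python
# Hypothetical pre-flight check for the .env file described above.
# Assumption: all six variables are required; suno-api itself may be more lenient.
REQUIRED_KEYS = (
    "SUNO_COOKIE",
    "TWOCAPTCHA_KEY",
    "BROWSER",
    "BROWSER_GHOST_CURSOR",
    "BROWSER_LOCALE",
    "BROWSER_HEADLESS",
)
SUPPORTED_BROWSERS = ("chromium", "firefox")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only: cookie values contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = [
        f"{key} is missing or empty" for key in REQUIRED_KEYS if not values.get(key)
    ]
    browser = values.get("BROWSER")
    if browser and browser not in SUPPORTED_BROWSERS:
        problems.append(
            f"BROWSER must be one of {SUPPORTED_BROWSERS}, got {browser!r}"
        )
    return problems
```

A typical use would be `check_env(parse_env(open(".env").read()))`, printing any returned problems before running `npm run dev`.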
- If you've deployed to Vercel:
  - Please click on Deploy in the Vercel dashboard and wait for the deployment to be successful.
  - Visit the `https://<vercel-assigned-domain>/api/get_limit` API for testing.
- If running locally:
  - Run `npm run dev`.
  - Visit the `http://localhost:3000/api/get_limit` API for testing.
- If the following result is returned:
{
"credits_left": 50,
"period": "day",
"monthly_limit": 50,
"monthly_usage": 50
}
it means the program is running normally.
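The quota response can also be checked programmatically before submitting a generation request. This is a small sketch that relies only on the `credits_left` field shown in the sample response; the function name `can_generate` is hypothetical, and the number of credits a request consumes is not stated here, so it is left as a caller-supplied parameter.

```python
def can_generate(quota, credits_needed):
    """Return True if the quota response reports enough remaining credits.

    `quota` is the parsed JSON from /api/get_limit.
    `credits_needed` is an assumption supplied by the caller.
    """
    credits_left = quota.get("credits_left")
    if not isinstance(credits_left, (int, float)):
        raise ValueError("unexpected /api/get_limit response: no numeric credits_left")
    return credits_left >= credits_needed
```

For example, the response returned by `get_quota_information()` in the Python example further down could be passed in directly.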
You can check out the detailed API documentation at suno.gcui.ai/docs
Suno API currently mainly implements the following APIs:
- `/api/generate`: Generate music
- `/v1/chat/completions`: Generate music - Call the generate API in a format that works with OpenAI’s API.
- `/api/custom_generate`: Generate music (Custom Mode, support setting lyrics, music style, title, etc.)
- `/api/generate_lyrics`: Generate lyrics based on prompt
- `/api/get`: Get music information based on the id. Use “,” to separate multiple ids.
If no IDs are provided, all music will be returned.
- `/api/get_limit`: Get quota Info
- `/api/extend_audio`: Extend audio length
- `/api/generate_stems`: Make stem tracks (separate audio and music track)
- `/api/get_aligned_lyrics`: Get list of timestamps for each word in the lyrics
- `/api/clip`: Get clip information based on ID passed as query parameter `id`
- `/api/concat`: Generate the whole song from extensions
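As an illustration of the `/api/get` convention described above (comma-separated ids, or no ids to return everything), here is a minimal sketch of a URL builder. The helper name `build_get_url` is hypothetical; only the endpoint path and the `ids` query parameter come from this document.

```python
from urllib.parse import urlencode


def build_get_url(base_url, ids=None):
    """Build the /api/get URL for zero or more audio ids.

    With no ids the bare endpoint is returned, which, per the list above,
    returns all music.
    """
    url = f"{base_url.rstrip('/')}/api/get"
    if ids:
        # Keep "," unescaped so the ids stay readable in the query string.
        query = urlencode({"ids": ",".join(ids)}, safe=",")
        url = f"{url}?{query}"
    return url
```

The resulting URL can be passed to any HTTP client, for example `requests.get(build_get_url(base_url, ids))`.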
You can also specify the cookies in the `Cookie` header of your request, overriding the default cookies in the `SUNO_COOKIE` environment variable. This comes in handy when, for example, you want to use multiple free accounts at the same time.
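One way to use the per-request `Cookie` override with several accounts is to rotate through a list of cookie strings. The sketch below is only an illustration: `make_cookie_rotation` is a hypothetical helper, and round-robin rotation is one possible strategy, not something suno-api prescribes.

```python
from itertools import cycle


def make_cookie_rotation(cookies):
    """Return a function that yields request headers, cycling through cookies.

    Each call returns a headers dict whose Cookie value overrides the
    server-side SUNO_COOKIE default for that one request.
    """
    if not cookies:
        raise ValueError("at least one cookie string is required")
    pool = cycle(cookies)

    def next_headers():
        return {"Cookie": next(pool)}

    return next_headers
```

With the `requests` library this could be used as `requests.post(url, json=payload, headers=next_headers())`, where `next_headers = make_cookie_rotation([cookie_a, cookie_b])`.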
For more detailed documentation, please check out the demo site: suno.gcui.ai/docs
import time
import requests

# replace with your suno-api URL
base_url = 'http://localhost:3000'


def custom_generate_audio(payload):
    url = f"{base_url}/api/custom_generate"
    response = requests.post(url, json=payload, headers={'Content-Type': 'application/json'})
    return response.json()


def extend_audio(payload):
    url = f"{base_url}/api/extend_audio"
    response = requests.post(url, json=payload, headers={'Content-Type': 'application/json'})
    return response.json()


def generate_audio_by_prompt(payload):
    url = f"{base_url}/api/generate"
    response = requests.post(url, json=payload, headers={'Content-Type': 'application/json'})
    return response.json()


def get_audio_information(audio_ids):
    url = f"{base_url}/api/get?ids={audio_ids}"
    response = requests.get(url)
    return response.json()


def get_quota_information():
    url = f"{base_url}/api/get_limit"
    response = requests.get(url)
    return response.json()


def get_clip(clip_id):
    url = f"{base_url}/api/clip?id={clip_id}"
    response = requests.get(url)
    return response.json()


def generate_whole_song(clip_id):
    payload = {"clip_id": clip_id}
    url = f"{base_url}/api/concat"
    response = requests.post(url, json=payload)
    return response.json()


if __name__ == '__main__':
    data = generate_audio_by_prompt({
        "prompt": "A popular heavy metal song about war, sung by a deep-voiced male singer, slowly and melodiously. The lyrics depict the sorrow of people after the war.",
        "make_instrumental": False,
        "wait_audio": False
    })

    ids = f"{data[0]['id']},{data[1]['id']}"
    print(f"ids: {ids}")

    for _ in range(60):
        data = get_audio_information(ids)
        if data[0]["status"] == 'streaming':
            print(f"{data[0]['id']} ==> {data[0]['audio_url']}")
            print(f"{data[1]['id']} ==> {data[1]['audio_url']}")
            break
        # sleep 5s
        time.sleep(5)
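The polling loop at the end of the Python example can be factored into a reusable helper. The following is a sketch under stated assumptions: `wait_for_audio` is a hypothetical name, the fetch function is injected (for example `get_audio_information`), and only the `streaming` status seen in the example is treated as ready by default. Unlike the loop above, it waits until every clip is ready and raises if the attempts run out.

```python
import time


def wait_for_audio(fetch, ids, attempts=60, delay=5.0,
                   ready=("streaming",), sleep=time.sleep):
    """Poll fetch(ids) until every returned clip has a status in `ready`.

    `fetch` is any callable that returns a list of clip dicts.
    `ready` defaults to the one status used in the example above; other
    status values are an assumption the caller has to supply.
    Raises TimeoutError after `attempts` unsuccessful polls.
    """
    for _ in range(attempts):
        clips = fetch(ids)
        if clips and all(clip.get("status") in ready for clip in clips):
            return clips
        sleep(delay)
    raise TimeoutError(f"clips {ids} not ready after {attempts} attempts")
```

In the script above, the loop could then be replaced with `clips = wait_for_audio(get_audio_information, ids)` followed by printing each clip's `audio_url`.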
const axios = require("axios");

// replace with your suno-api URL (e.g. your Vercel domain)
const baseUrl = "http://localhost:3000";

async function customGenerateAudio(payload) {
  const url = `${baseUrl}/api/custom_generate`;
  const response = await axios.post(url, payload, {
    headers: { "Content-Type": "application/json" },
  });
  return response.data;
}

async function generateAudioByPrompt(payload) {
  const url = `${baseUrl}/api/generate`;
  const response = await axios.post(url, payload, {
    headers: { "Content-Type": "application/json" },
  });
  return response.data;
}

async function extendAudio(payload) {
  const url = `${baseUrl}/api/extend_audio`;
  const response = await axios.post(url, payload, {
    headers: { "Content-Type": "application/json" },
  });
  return response.data;
}

async function getAudioInformation(audioIds) {
  const url = `${baseUrl}/api/get?ids=${audioIds}`;
  const response = await axios.get(url);
  return response.data;
}

async function getQuotaInformation() {
  const url = `${baseUrl}/api/get_limit`;
  const response = await axios.get(url);
  return response.data;
}

async function getClipInformation(clipId) {
  const url = `${baseUrl}/api/clip?id=${clipId}`;
  const response = await axios.get(url);
  return response.data;
}

async function main() {
  const data = await generateAudioByPrompt({
    prompt:
      "A popular heavy metal song about war, sung by a deep-voiced male singer, slowly and melodiously. The lyrics depict the sorrow of people after the war.",
    make_instrumental: false,
    wait_audio: false,
  });

  const ids = `${data[0].id},${data[1].id}`;
  console.log(`ids: ${ids}`);

  for (let i = 0; i < 60; i++) {
    const data = await getAudioInformation(ids);
    if (data[0].status === "streaming") {
      console.log(`${data[0].id} ==> ${data[0].audio_url}`);
      console.log(`${data[1].id} ==> ${data[1].audio_url}`);
      break;
    }
    // sleep 5s
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}

main();
You can integrate Suno AI as a tool/plugin/action into your AI agent.
[coming soon...]
There are four ways you can support this project:
- Fork and Submit Pull Requests: We welcome any PRs that enhance the functionality, APIs, response time and availability. You can also help us just by translating this README into your language—any help for this project is welcome!
- Open Issues: We appreciate reasonable suggestions and bug reports.
- Donate: If this project has helped you, consider buying us a coffee using the Sponsor button at the top of the project. Cheers! ☕
- Spread the Word: Recommend this project to others, star the repo, or add a backlink after using the project.
We use GitHub Issues to manage feedback. Feel free to open an issue, and we'll address it promptly.
The license of this project is LGPL-3.0 or later. See LICENSE for more information.
- Project repository: github.com/gcui-art/suno-api
- Suno.ai official website: suno.ai
- Demo: suno.gcui.ai
- Readpo: ReadPo is an AI-powered reading and writing assistant. Collect, curate, and create content at lightning speed.
- Album AI: Auto generate image metadata and chat with the album. RAG + Album.
suno-api is an unofficial open source project, intended for learning and research purposes only.