
🐈 friendly-stable-audio-tools

Stable Audio 2.0

This repository is a refactored / updated version of stable-audio-tools, an open-source codebase for audio/music generative models originally released by Stability AI.

This repository contains the following additional features:

  • 🔥 Refactored code of stable-audio-tools for improved readability and usability.
  • 🔥 Useful scripts for evaluating and playing with your own trained models.
  • 🔥 Instructions on how to train models such as Stable Audio 2.0.

and does NOT contain:

  • Any pretrained checkpoints

Requirements

  • PyTorch 2.0 or later for Flash Attention support
  • Development for the repo is done in Python 3.8.10 or later

Install

To run the training scripts or inference code, you'll need to clone this repository, navigate to the root directory, and then run pip as follows:

$ git clone https://github.com/yukara-ikemiya/friendly-stable-audio-tools.git
$ cd friendly-stable-audio-tools
$ pip install .
$ # you may need to execute this to avoid Accelerate import error
$ pip uninstall -y transformer-engine
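
If you plan to modify the code itself, an editable install may be more convenient. This is standard pip behaviour rather than something specific to this repository:

$ # optional: editable install so local code changes take effect without reinstalling
$ pip install -e .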

Building a training environment

To simplify setting up the training environment, I recommend using container systems like Docker or Singularity instead of installing dependencies directly on each GPU machine. Below are the steps for creating Docker and Singularity containers.

All example scripts are stored in the container folder.

Please make sure that Docker and Singularity are installed in advance.

1. Create a Docker image

$ # create a Docker image
$ NAME=friendly-stable-audio-tools
$ docker build  -t ${NAME} -f ./container/${NAME}.Dockerfile .

2. Convert a Docker image to a Singularity container

$ # convert a Docker image to a Singularity container
$ singularity build friendly-stable-audio-tools.sif docker-daemon://friendly-stable-audio-tools:latest

By running the above script, friendly-stable-audio-tools.sif should be created in the working directory.
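
As an optional sanity check (not part of the original instructions), you can confirm that PyTorch and CUDA are visible inside the container before launching any training job:

$ # optional: verify the container before training
$ singularity exec --nv friendly-stable-audio-tools.sif python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"

The --nv flag enables NVIDIA GPU support inside the container and is also used by the training commands later in this README.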

Logging

WandB setting

The training code also requires a Weights & Biases account to log the training outputs and demos. Create an account and log in with:

$ wandb login

Alternatively, you can pass an API key via the WANDB_API_KEY environment variable. (You can obtain the API key from https://wandb.ai/authorize after logging in to your account.)

$ export WANDB_API_KEY="12345x6789y..."

This method is convenient when you want to execute the code using containers such as Docker or Singularity.
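
For example, the key can be handed to the container at launch time. The Singularity form below matches the training scripts later in this README; the Docker form uses Docker's standard -e flag and is shown only as a sketch:

$ # Singularity: pass the key with --env
$ singularity exec --nv --env WANDB_API_KEY=$WANDB_API_KEY friendly-stable-audio-tools.sif python3 train.py ...
$ # Docker: pass the key with -e
$ docker run --gpus all -e WANDB_API_KEY=$WANDB_API_KEY friendly-stable-audio-tools python3 train.py ...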

Training

Configuration files

Before starting your training run, you have to prepare the following two configuration files.

  • model config file
  • dataset config file

For more information about these files, refer to the Configurations section below.

Training from scratch

To start a training run, run the train.py script in the repo root with:

$ python3 train.py --dataset-config /path/to/dataset/config --model-config /path/to/model/config --name my_experiment

The --name parameter sets the project name for your Weights & Biases run.

Fine-tuning

Fine-tuning involves resuming a training run from a pre-trained checkpoint.

  • To resume training from a wrapped checkpoint, you can pass in the checkpoint path (.ckpt) to train.py with the --ckpt-path flag.
  • To start fresh training from a pre-trained unwrapped model, you can pass in the unwrapped checkpoint path (.ckpt) to train.py with the --pretrained-ckpt-path flag.
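
For example, the two cases look like this (all paths are placeholders):

$ # resume training (including optimizer states and EMA copies) from a wrapped checkpoint
$ python3 train.py --dataset-config /path/to/dataset/config --model-config /path/to/model/config --name my_finetune --ckpt-path /path/to/wrapped/last.ckpt

$ # start a fresh run initialized from a pre-trained unwrapped model
$ python3 train.py --dataset-config /path/to/dataset/config --model-config /path/to/model/config --name my_finetune --pretrained-ckpt-path /path/to/unwrapped/model.ckpt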

Unwrapping a model

stable-audio-tools uses PyTorch Lightning to facilitate multi-GPU and multi-node training.

When a model is being trained, it is wrapped in a "training wrapper", which is a pl.LightningModule that contains all of the relevant objects needed only for training. That includes things like discriminators for autoencoders, EMA copies of models, and all of the optimizer states.

The checkpoint files created during training include this training wrapper, which greatly increases the size of the checkpoint file.

unwrap_model.py takes in a wrapped model checkpoint and saves a new checkpoint file containing only the model itself.

It can be run from the repo root with:

$ python3 unwrap_model.py --model-config /path/to/model/config --ckpt-path /path/to/wrapped/ckpt.ckpt --name /new/path/to/new_ckpt_name

Unwrapped model checkpoints are required for:

  • Inference scripts
  • Using a model as a pretransform for another model (e.g. using an autoencoder model for latent diffusion)
  • Fine-tuning a pre-trained model with a modified configuration (i.e. partial initialization)

Configurations

Training and inference code for stable-audio-tools is based around JSON configuration files that define model hyperparameters, training settings, and information about your training dataset.

Model config

The model config file defines all of the information needed to load a model for training or inference. It also contains the training configuration needed to fine-tune a model or train from scratch.

The following properties are defined in the top level of the model configuration:

  • model_type
    • The type of model being defined, currently limited to one of "autoencoder", "diffusion_uncond", "diffusion_cond", "diffusion_cond_inpaint", "diffusion_autoencoder", "lm".
  • sample_size
    • The length of the audio provided to the model during training, in samples. For diffusion models, this is also the raw audio sample length used for inference.
  • sample_rate
    • The sample rate of the audio provided to the model during training, and generated during inference, in Hz.
  • audio_channels
    • The number of channels of audio provided to the model during training, and generated during inference. Defaults to 2. Set to 1 for mono.
  • model
    • The specific configuration for the model being defined; varies based on model_type.
  • training
    • The training configuration for the model, varies based on model_type. Provides parameters for training as well as demos.
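
Putting these together, a model config has roughly the following top-level shape. This is only an orientation sketch: the file name is made up, sample_size is left elided, and the contents of the "model" and "training" blocks depend entirely on the chosen model_type:

= model_config_skeleton.json =

{
    "model_type": "diffusion_cond",
    "sample_size": ...,
    "sample_rate": 44100,
    "audio_channels": 2,
    "model": { ... },
    "training": { ... }
}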

Dataset config

stable-audio-tools currently supports two kinds of data sources: local directories of audio files, and WebDataset datasets stored in Amazon S3. More information can be found in the dataset config documentation.
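
As a rough sketch, a local-directory dataset config follows the format used by upstream stable-audio-tools, along the lines of the example below. The file name and values here are placeholders; please check the dataset config documentation for the exact schema and options:

= my_local_dataset.json =

{
    "dataset_type": "audio_dir",
    "datasets": [
        {
            "id": "my_audio",
            "path": "/path/to/audio/"
        }
    ],
    "random_crop": true
}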

Additional training flags

Additional optional flags for train.py include:

  • --config-file
    • The path to the defaults.ini file in the repo root, required if running train.py from a directory other than the repo root
  • --pretransform-ckpt-path
    • Used in various model types such as latent diffusion models to load a pre-trained autoencoder. Requires an unwrapped model checkpoint.
  • --save-dir
    • The directory in which to save the model checkpoints
  • --checkpoint-every
    • The number of steps between saved checkpoints.
    • Default: 10000
  • --batch-size
    • Number of samples per-GPU during training. Should be set as large as your GPU VRAM will allow.
    • Default: 8
  • --num-gpus
    • Number of GPUs per-node to use for training
    • Default: 1
  • --num-nodes
    • Number of GPU nodes being used for training
    • Default: 1
  • --accum-batches
    • Enables and sets the number of batches for gradient batch accumulation. Useful for increasing effective batch size when training on smaller GPUs.
  • --strategy
    • Multi-GPU strategy for distributed training. Setting to deepspeed will enable DeepSpeed ZeRO Stage 2.
    • Default: ddp if --num-gpus > 1, else None
  • --precision
    • Floating-point precision to use during training
    • Default: 16
  • --num-workers
    • Number of CPU workers used by the data loader
  • --seed
    • RNG seed for PyTorch, helps with deterministic training

🔥 Let's train Stable Audio 2.0

Prerequisites

Prepare a checkpoint of CLAP encoder

To use the CLAP encoder for conditioning music generation, you have to prepare a pretrained CLAP checkpoint file.

  1. Download a pretrained CLAP checkpoint trained on a music dataset (music_audioset_epoch_15_esc_90.14.pt) from the LAION CLAP repository.
  2. Store the checkpoint file in a directory of your choice.
  3. Edit the Stable Audio 2.0 model config file as follows:

= stable_audio_2_0.json =

...
"model": {
  ...
  "conditioning": {
            "configs": [
                {
                    ...
                    "config": {
                        ...
                        "clap_ckpt_path": "ckpt/clap/music_audioset_epoch_15_esc_90.14.pt",
                    ...

Prepare audio and metadata for training

Since Stable Audio uses text prompts as conditioning for music generation, you have to prepare them as metadata in addition to the audio data.

When using a dataset in a local environment, I support metadata in JSON format, as follows.

  1. You can include any information as metadata in a JSON file, but you must always include the text field named prompt, which is required for training Stable Audio.

= music_2.json =

{
    "prompt": "This is an electronic song sending positive vibes."
}
  2. The metadata files must be placed in the same directory as the corresponding audio files, and the file names must match.
.
└── dataset/
    ├── music_1.wav
    ├── music_1.json
    ├── music_2.wav
    ├── music_2.json
    └── ...
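
If your audio files do not have metadata yet, a small shell loop can create a matching JSON file with a placeholder prompt next to each audio file. This is only a convenience sketch following the naming convention above, not a script shipped with the repository, and real descriptive prompts should replace the placeholder text:

for wav in dataset/*.wav; do
    json="${wav%.wav}.json"
    # skip files that already have metadata
    [ -e "$json" ] && continue
    printf '{\n    "prompt": "An instrumental music track."\n}\n' > "$json"
done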

Stage 1 : VAE-GAN (compression model)

Training

As the 1st stage of Stable Audio 2.0, you'll train a VAE-GAN, which is a compression model for audio signals.

The model config file for a VAE-GAN is placed in the configs directory. Regarding dataset configuration, please prepare a dataset config file corresponding to your own dataset.

Once you have prepared the configuration files, you can execute a training job like this:

CONTAINER_PATH="/path/to/sif/friendly-stable-audio-tools.sif"
ROOT_DIR="/path/to/friendly-stable-audio-tools/"
DATASET_DIR="/path/to/your/dataset/"
OUTPUT_DIR="/path/to/output/directory/"

MODEL_CONFIG="stable_audio_tools/configs/model_configs/autoencoders/stable_audio_2_0_vae.json"
DATASET_CONFIG="stable_audio_tools/configs/dataset_configs/local_training_example.json"

BATCH_SIZE=10 # WARNING : This is batch size per GPU
WANDB_API_KEY="12345x6789y..."
PORT=12345

# Singularity container case
# NOTE: Please change each configuration as you like

singularity exec --nv --pwd $ROOT_DIR -B $ROOT_DIR -B $DATASET_DIR \
  --env WANDB_API_KEY=$WANDB_API_KEY \
  ${CONTAINER_PATH} \
  torchrun --nproc_per_node gpu --master_port ${PORT} \
  ${ROOT_DIR}/train.py \
    --dataset-config ${DATASET_CONFIG} \
    --model-config ${MODEL_CONFIG} \
    --name "vae_training" \
    --num-gpus 8 \
    --batch-size ${BATCH_SIZE} \
    --num-workers 8 \
    --save-dir ${OUTPUT_DIR}

Model unwrapping

As described in the unwrapping-a-model section, after completing the VAE training, you need to unwrap the model checkpoint so it can be used in the next training stage.

CKPT_PATH="/path/to/wrapped_ckpt/last.ckpt"
# NOTE: the file extension ".ckpt" will be automatically added to the end of OUTPUT_PATH
OUTPUT_PATH="/path/to/output_name/unwrapped_last"

singularity exec --nv --pwd $ROOT_DIR -B $ROOT_DIR \
  --env WANDB_API_KEY=$WANDB_API_KEY \
  ${CONTAINER_PATH} \
  torchrun --nproc_per_node gpu --master_port ${PORT} \
    ${ROOT_DIR}/unwrap_model.py \
    --model-config ${MODEL_CONFIG} \
    --ckpt-path ${CKPT_PATH} \
    --name ${OUTPUT_PATH}

Reconstruction test

Once you have finished the VAE training, you might want to test and evaluate the reconstruction quality of the trained model.

I support reconstruction of audio files in a directory with reconstruct_audios.py, and you can use the reconstructed audio for your evaluation.

AUDIO_DIR="/path/to/original_audio/"
OUTPUT_DIR="/path/to/output_audio/"
# unwrapped VAE checkpoint created in the previous step
UNWRAP_CKPT_PATH="/path/to/output_name/unwrapped_last.ckpt"

FRAME_DURATION=1.0 # [sec]
OVERLAP_RATE=0.01
BATCH_SIZE=50

singularity exec --nv --pwd $ROOT_DIR -B $ROOT_DIR -B $DATASET_DIR \
  --env WANDB_API_KEY=$WANDB_API_KEY \
  ${CONTAINER_PATH} \
  torchrun --nproc_per_node gpu --master_port ${PORT} \
    ${ROOT_DIR}/reconstruct_audios.py \
    --model-config ${MODEL_CONFIG} \
    --ckpt-path ${UNWRAP_CKPT_PATH} \
    --audio-dir ${AUDIO_DIR} \
    --output-dir ${OUTPUT_DIR} \
    --frame-duration ${FRAME_DURATION} \
    --overlap-rate ${OVERLAP_RATE} \
    --batch-size ${BATCH_SIZE}

Stage 2 : Diffusion Transformer (DiT)

Training

As the 2nd stage of Stable Audio 2.0, you'll train a DiT, which is a generative model in the latent domain.

Before this part, please make sure that

  • you have met all of the prerequisites
  • you have trained the VAE model and created an unwrapped checkpoint file (see the VAE section)

Now, you can train a DiT model as follows:

CONTAINER_PATH="/path/to/sif/friendly-stable-audio-tools.sif"
ROOT_DIR="/path/to/friendly-stable-audio-tools/"
DATASET_DIR="/path/to/your/dataset/"
OUTPUT_DIR="/path/to/output/directory/"

MODEL_CONFIG="stable_audio_tools/configs/model_configs/txt2audio/stable_audio_2_0.json"
DATASET_CONFIG="stable_audio_tools/configs/dataset_configs/local_training_example.json"

# Pretrained checkpoint of VAE (Stage-1) model
PRETRANSFORM_CKPT="/path/to/vae_ckpt/unwrapped_last.ckpt"

BATCH_SIZE=10 # WARNING : This is batch size per GPU
NUM_GPUS=8    # number of GPUs per node
WANDB_API_KEY="12345x6789y..."
PORT=12345

singularity exec --nv --pwd $ROOT_DIR -B $ROOT_DIR -B $DATASET_DIR \
  --env WANDB_API_KEY=$WANDB_API_KEY \
  ${CONTAINER_PATH} \
  torchrun --nproc_per_node gpu --master_port ${PORT} \
    ${ROOT_DIR}/train.py \
    --dataset-config ${DATASET_CONFIG} \
    --model-config ${MODEL_CONFIG} \
    --pretransform-ckpt-path ${PRETRANSFORM_CKPT} \
    --name "dit_training" \
    --num-gpus ${NUM_GPUS} \
    --batch-size ${BATCH_SIZE} \
    --save-dir ${OUTPUT_DIR}

Todo

  • [ ] Add convenient scripts for sampling
  • [ ] Add documentation for Gradio interface
  • [ ] Add troubleshooting section
  • [ ] Add contribution guidelines
