
friendly-stable-audio-tools
Refactored and updated version of `stable-audio-tools`, the open-source codebase for audio/music generative models originally by Stability AI.
Stars: 63

This repository is a refactored and updated version of `stable-audio-tools`, an open-source codebase for audio/music generative models originally by Stability AI. It contains refactored code for improved readability and usability, useful scripts for evaluating and playing with trained models, and instructions on how to train models such as `Stable Audio 2.0`. The repository does not contain any pretrained checkpoints. Requirements include PyTorch 2.0 or later for Flash Attention support and Python 3.8.10 or later for development. The repository provides guidance on installation, building a training environment using Docker or Singularity, logging with Weights & Biases, training configurations, and the two training stages: VAE-GAN and Diffusion Transformer (DiT).
README:
This repository is a refactored / updated version of stable-audio-tools, an open-source codebase for audio/music generative models originally by Stability AI.
This repository contains the following additional features:
- 🔥 Refactored code from stable-audio-tools for improved readability and usability.
- 🔥 Useful scripts for evaluating and playing with your own trained models.
- 🔥 Instructions on how to train models such as Stable Audio 2.0.
It does NOT contain:
- Any pretrained checkpoints
Requirements:
- PyTorch 2.0 or later for Flash Attention support
- Development for the repo is done in Python 3.8.10 or later
To run the training scripts or inference code, you'll need to clone this repository, navigate to the root directory, and then execute the pip command as follows:
$ git clone https://github.com/yukara-ikemiya/friendly-stable-audio-tools.git
$ cd friendly-stable-audio-tools
$ pip install .
$ # you may need to execute this to avoid an Accelerate import error
$ pip uninstall -y transformer-engine
To simplify setting up the training environment, I recommend using container systems like Docker or Singularity instead of installing dependencies on each GPU machine. Below are the steps for creating Docker and Singularity containers.
All example scripts are stored in the container folder.
Please make sure that Docker and Singularity are installed in advance.
$ # create a Docker image
$ NAME=friendly-stable-audio-tools
$ docker build -t ${NAME} -f ./container/${NAME}.Dockerfile .
$ # convert a Docker image to a Singularity container
$ singularity build friendly-stable-audio-tools.sif docker-daemon://friendly-stable-audio-tools
By running the above script, friendly-stable-audio-tools.sif
should be created in the working directory.
The training code also requires a Weights & Biases account to log the training outputs and demos. Create an account and log in with:
$ wandb login
Alternatively, you can pass an API key as the environment variable WANDB_API_KEY.
(You can obtain the API key from https://wandb.ai/authorize after logging in to your account.)
$ WANDB_API_KEY="12345x6789y..."
This method is convenient when you want to execute the code using containers such as Docker or Singularity.
Before starting your training run, you have to prepare the following two configuration files.
- model config file
- dataset config file
For more information about those, refer to the Configuration section below.
To start a training run, run the train.py script in the repo root with:
$ python3 train.py --dataset-config /path/to/dataset/config --model-config /path/to/model/config --name my_experiment
The --name parameter will set the project name for your Weights & Biases run.
Fine-tuning involves resuming a training run from a pre-trained checkpoint.
- To resume training from a wrapped checkpoint, pass the checkpoint path (.ckpt) to train.py with the --ckpt-path flag.
- To start fresh training from a pre-trained unwrapped model, pass the unwrapped checkpoint path (.ckpt) to train.py with the --pretrained-ckpt-path flag.
stable-audio-tools uses PyTorch Lightning to facilitate multi-GPU and multi-node training.
When a model is being trained, it is wrapped in a "training wrapper", which is a pl.LightningModule
that contains all of the relevant objects needed only for training. That includes things like discriminators for autoencoders, EMA copies of models, and all of the optimizer states.
The checkpoint files created during training include this training wrapper, which greatly increases the size of the checkpoint file.
unwrap_model.py takes in a wrapped model checkpoint and saves a new checkpoint file containing only the model itself.
It can be run from the repo root with:
$ python3 unwrap_model.py --model-config /path/to/model/config --ckpt-path /path/to/wrapped/ckpt.ckpt --name /new/path/to/new_ckpt_name
Unwrapped model checkpoints are required for:
- Inference scripts
- Using a model as a pretransform for another model (e.g. using an autoencoder model for latent diffusion)
- Fine-tuning a pre-trained model with a modified configuration (i.e. partial initialization)
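For reference, below is a minimal sketch of loading an unwrapped checkpoint for inference in Python. It assumes the helper functions of upstream stable-audio-tools (create_model_from_config, load_ckpt_state_dict); module paths may differ in this refactored fork, so treat it as a starting point rather than an exact recipe.

# Sketch: load an unwrapped checkpoint for inference.
# Assumes upstream stable-audio-tools helpers; module paths may differ in this fork.
import json
import torch
from stable_audio_tools.models.factory import create_model_from_config
from stable_audio_tools.models.utils import load_ckpt_state_dict

# Placeholder paths - point these at your own model config and unwrapped checkpoint.
with open("stable_audio_tools/configs/model_configs/autoencoders/stable_audio_2_0_vae.json") as f:
    model_config = json.load(f)

model = create_model_from_config(model_config)
model.load_state_dict(load_ckpt_state_dict("/path/to/unwrapped_last.ckpt"))
model = model.to("cuda" if torch.cuda.is_available() else "cpu").eval().requires_grad_(False)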
Training and inference code for stable-audio-tools is based around JSON configuration files that define model hyperparameters, training settings, and information about your training dataset.
The model config file defines all of the information needed to load a model for training or inference. It also contains the training configuration needed to fine-tune a model or train from scratch.
The following properties are defined in the top level of the model configuration:
- model_type - The type of model being defined, currently limited to one of "autoencoder", "diffusion_uncond", "diffusion_cond", "diffusion_cond_inpaint", "diffusion_autoencoder", "lm".
- sample_size - The length of the audio provided to the model during training, in samples. For diffusion models, this is also the raw audio sample length used for inference.
- sample_rate - The sample rate of the audio provided to the model during training, and generated during inference, in Hz.
- audio_channels - The number of channels of audio provided to the model during training, and generated during inference. Defaults to 2. Set to 1 for mono.
- model - The specific configuration for the model being defined, varies based on model_type.
- training - The training configuration for the model, varies based on model_type. Provides parameters for training as well as demos.
stable-audio-tools currently supports two kinds of data sources: local directories of audio files, and WebDataset datasets stored in Amazon S3. More information can be found in the dataset config documentation.
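For the local-directory case, a dataset config is just a small JSON file. The sketch below writes one in Python; the field names (dataset_type, datasets, random_crop) follow the upstream stable-audio-tools dataset documentation and are assumptions here, so check stable_audio_tools/configs/dataset_configs/local_training_example.json in this repo for the authoritative schema.

# Sketch: write a minimal local-directory dataset config.
# Field names are assumptions based on upstream stable-audio-tools docs; verify against
# stable_audio_tools/configs/dataset_configs/local_training_example.json.
import json

dataset_config = {
    "dataset_type": "audio_dir",
    "datasets": [
        {"id": "my_music", "path": "/path/to/your/dataset/"},
    ],
    "random_crop": True,
}

with open("my_dataset_config.json", "w") as f:
    json.dump(dataset_config, f, indent=2)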
Additional optional flags for train.py include:
- --config-file - The path to the defaults.ini file in the repo root, required if running train.py from a directory other than the repo root.
- --pretransform-ckpt-path - Used in various model types such as latent diffusion models to load a pre-trained autoencoder. Requires an unwrapped model checkpoint.
- --save-dir - The directory in which to save the model checkpoints.
- --checkpoint-every - The number of steps between saved checkpoints. Default: 10000
- --batch-size - Number of samples per GPU during training. Should be set as large as your GPU VRAM will allow. Default: 8
- --num-gpus - Number of GPUs per node to use for training. Default: 1
- --num-nodes - Number of GPU nodes being used for training. Default: 1
- --accum-batches - Enables and sets the number of batches for gradient batch accumulation. Useful for increasing effective batch size when training on smaller GPUs.
- --strategy - Multi-GPU strategy for distributed training. Setting to deepspeed will enable DeepSpeed ZeRO Stage 2. Default: ddp if --num-gpus > 1, else None
- --precision - Floating-point precision to use during training. Default: 16
- --num-workers - Number of CPU workers used by the data loader.
- --seed - RNG seed for PyTorch, helps with deterministic training.
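When combining --batch-size, --num-gpus, --num-nodes and --accum-batches, the effective batch size is their product (standard PyTorch Lightning accounting; the helper below is just an illustration, not a function from this repo).

# Effective batch size when combining the flags above (standard DDP + gradient accumulation accounting).
def effective_batch_size(batch_size: int, num_gpus: int, num_nodes: int = 1, accum_batches: int = 1) -> int:
    return batch_size * num_gpus * num_nodes * accum_batches

print(effective_batch_size(batch_size=8, num_gpus=8, accum_batches=4))  # 256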
To use the CLAP encoder for conditioning music generation, you have to prepare a pretrained CLAP checkpoint file.
- Download a pretrained CLAP checkpoint trained on a music dataset (music_audioset_epoch_15_esc_90.14.pt) from the LAION CLAP repository.
- Store the checkpoint file in a directory of your choice.
- Edit a model config file of Stable Audio 2.0 as follows:
= stable_audio_2_0.json =
...
"model": {
...
"conditioning": {
"configs": [
{
...
"config": {
...
"clap_ckpt_path": "ckpt/clap/music_audioset_epoch_15_esc_90.14.pt",
...
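If you prefer to patch the config programmatically instead of editing it by hand, a small sketch like the one below works. It only assumes the "model" -> "conditioning" -> "configs" structure shown in the snippet above.

# Sketch: point clap_ckpt_path at your downloaded CLAP checkpoint.
# Only assumes the conditioning structure shown in the snippet above.
import json

config_path = "stable_audio_tools/configs/model_configs/txt2audio/stable_audio_2_0.json"
clap_ckpt = "ckpt/clap/music_audioset_epoch_15_esc_90.14.pt"

with open(config_path) as f:
    model_config = json.load(f)

for cond in model_config["model"]["conditioning"]["configs"]:
    if "clap_ckpt_path" in cond.get("config", {}):
        cond["config"]["clap_ckpt_path"] = clap_ckpt

with open(config_path, "w") as f:
    json.dump(model_config, f, indent=2)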
Since Stable Audio uses text prompts as the condition for music generation, you have to prepare them as metadata in addition to the audio data.
When using a dataset in a local environment, I support the use of metadata in JSON format as follows.
- You can include any information as metadata in a JSON file, but you must always include the text field named prompt, which is required for training Stable Audio.
= music_2.json =
{
"prompt": "This is an electronic song sending positive vibes."
}
- The metadata files must be placed in the same directory as the corresponding audio files, and the file names (apart from the extension) must match.
.
└── dataset/
├── music_1.wav
├── music_1.json
├── music_2.wav
├── music_2.json
└── ...
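If you already have captions for your audio files, a small helper like the following writes the matching JSON files next to them (a sketch; the file names and prompts are placeholders for your own captioning source).

# Sketch: write a "prompt" metadata JSON next to each audio file.
# The file/prompt pairs are placeholders for your own captions.
import json
from pathlib import Path

dataset_dir = Path("dataset")
prompts = {
    "music_1.wav": "A calm ambient track with soft pads.",
    "music_2.wav": "This is an electronic song sending positive vibes.",
}

for audio_name, prompt in prompts.items():
    meta_path = (dataset_dir / audio_name).with_suffix(".json")
    meta_path.write_text(json.dumps({"prompt": prompt}, indent=2))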
As the 1st stage of Stable Audio 2.0, you'll train a VAE-GAN, which is a compression model for audio signals.
The model config file for the VAE-GAN is placed in the configs directory. Regarding the dataset configuration, please prepare a dataset config file corresponding to your own dataset.
Once you have prepared the configuration files, you can execute a training job like this:
CONTAINER_PATH="/path/to/sif/friendly-stable-audio-tools.sif"
ROOT_DIR="/path/to/friendly-stable-audio-tools/"
DATASET_DIR="/path/to/your/dataset/"
OUTPUT_DIR="/path/to/output/directory/"
MODEL_CONFIG="stable_audio_tools/configs/model_configs/autoencoders/stable_audio_2_0_vae.json"
DATASET_CONFIG="stable_audio_tools/configs/dataset_configs/local_training_example.json"
BATCH_SIZE=10 # WARNING : This is batch size per GPU
WANDB_API_KEY="12345x6789y..."
PORT=12345
# Singularity container case
# NOTE: Please change each configuration as you like
singularity exec --nv --pwd $ROOT_DIR -B $ROOT_DIR -B $DATASET_DIR \
--env WANDB_API_KEY=$WANDB_API_KEY \
${CONTAINER_PATH} \
torchrun --nproc_per_node gpu --master_port ${PORT} \
${ROOT_DIR}/train.py \
--dataset-config ${DATASET_CONFIG} \
--model-config ${MODEL_CONFIG} \
--name "vae_training" \
--num-gpus 8 \
--batch-size ${BATCH_SIZE} \
--num-workers 8 \
--save-dir ${OUTPUT_DIR}
As described in the unwrapping-a-model section, after completing the VAE training, you need to unwrap the model checkpoint so it can be used for the next training stage.
CKPT_PATH="/path/to/wrapped_ckpt/last.ckpt"
# NOTE: the file extension ".ckpt" will be automatically added to the end of the OUTPUT_PATH name
OUTPUT_PATH="/path/to/output_name/unwrapped_last"
singularity exec --nv --pwd $ROOT_DIR -B $ROOT_DIR \
--env WANDB_API_KEY=$WANDB_API_KEY \
${CONTAINER_PATH} \
torchrun --nproc_per_node gpu --master_port ${PORT} \
${ROOT_DIR}/unwrap_model.py \
--model-config ${MODEL_CONFIG} \
--ckpt-path ${CKPT_PATH} \
--name ${OUTPUT_PATH}
Once you have finished the VAE training, you might want to test and evaluate the reconstruction quality of the trained model.
I support reconstruction of audio files in a directory with reconstruct_audios.py, and you can use the reconstructed audio for your own evaluation (see the sketch after the script below).
AUDIO_DIR="/path/to/original_audio/"
OUTPUT_DIR="/path/to/output_audio/"
UNWRAP_CKPT_PATH="/path/to/output_name/unwrapped_last.ckpt" # unwrapped VAE checkpoint from the previous step
FRAME_DURATION=1.0 # [sec]
OVERLAP_RATE=0.01
BATCH_SIZE=50
singularity exec --nv --pwd $ROOT_DIR -B $ROOT_DIR -B $DATASET_DIR \
--env WANDB_API_KEY=$WANDB_API_KEY \
${CONTAINER_PATH} \
torchrun --nproc_per_node gpu --master_port ${PORT} \
${ROOT_DIR}/reconstruct_audios.py \
--model-config ${MODEL_CONFIG} \
--ckpt-path ${UNWRAP_CKPT_PATH} \
--audio-dir ${AUDIO_DIR} \
--output-dir ${OUTPUT_DIR} \
--frame-duration ${FRAME_DURATION} \
--overlap-rate ${OVERLAP_RATE} \
--batch-size ${BATCH_SIZE}
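For a quick quantitative check on top of listening, you can compare the originals with their reconstructions using a simple metric such as SI-SDR. The sketch below assumes the reconstructed files keep the original file names inside OUTPUT_DIR (an assumption about reconstruct_audios.py, so adjust the file matching if needed).

# Sketch: mean SI-SDR between original and reconstructed audio directories.
# Assumes matching file names; lengths are cropped to the shorter of each pair.
import numpy as np
import soundfile as sf
from pathlib import Path

def si_sdr(ref: np.ndarray, est: np.ndarray) -> float:
    ref, est = ref.flatten(), est.flatten()
    n = min(len(ref), len(est))
    ref, est = ref[:n], est[:n]
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + 1e-8)
    target = alpha * ref
    return 10 * np.log10(np.sum(target ** 2) / (np.sum((target - est) ** 2) + 1e-8))

orig_dir = Path("/path/to/original_audio/")
recon_dir = Path("/path/to/output_audio/")
scores = []
for orig_path in sorted(orig_dir.glob("*.wav")):
    recon_path = recon_dir / orig_path.name
    if recon_path.exists():
        ref, _ = sf.read(orig_path)
        est, _ = sf.read(recon_path)
        scores.append(si_sdr(np.asarray(ref), np.asarray(est)))

if scores:
    print(f"Mean SI-SDR over {len(scores)} files: {np.mean(scores):.2f} dB")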
As the 2nd stage of Stable Audio 2.0, you'll train a DiT, which is a generative model in the latent domain.
Before this part, please make sure that
- you have met all of the prerequisites
- you have trained the VAE model and created an unwrapped checkpoint file (see the VAE section)
Now, you can train a DiT model as follows:
CONTAINER_PATH="/path/to/sif/friendly-stable-audio-tools.sif"
ROOT_DIR="/path/to/friendly-stable-audio-tools/"
DATASET_DIR="/path/to/your/dataset/"
OUTPUT_DIR="/path/to/output/directory/"
MODEL_CONFIG="stable_audio_tools/configs/model_configs/txt2audio/stable_audio_2_0.json"
DATASET_CONFIG="stable_audio_tools/configs/dataset_configs/local_training_example.json"
# Pretrained checkpoint of VAE (Stage-1) model
PRETRANSFORM_CKPT="/path/to/vae_ckpt/unwrapped_last.ckpt"
BATCH_SIZE=10 # WARNING : This is batch size per GPU
NUM_GPUS=8 # number of GPUs per node (adjust to your machine)
WANDB_API_KEY="12345x6789y..."
PORT=12345
singularity exec --nv --pwd $ROOT_DIR -B $ROOT_DIR -B $DATASET_DIR \
--env WANDB_API_KEY=$WANDB_API_KEY \
${CONTAINER_PATH} \
torchrun --nproc_per_node gpu --master_port ${PORT} \
${ROOT_DIR}/train.py \
--dataset-config ${DATASET_CONFIG} \
--model-config ${MODEL_CONFIG} \
--pretransform-ckpt-path ${PRETRANSFORM_CKPT} \
--name "dit_training" \
--num-gpus ${NUM_GPUS} \
--batch-size ${BATCH_SIZE} \
--save-dir ${OUTPUT_DIR}
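Convenient sampling scripts are still on the roadmap (see the checklist below). In the meantime, here is a hedged sketch of text-to-audio generation based on the upstream stable-audio-tools inference API; generate_diffusion_cond, the module paths, and the conditioning keys are assumptions and may differ in this fork.

# Sketch: sample from a trained DiT via the upstream stable-audio-tools API.
# Function names, module paths and conditioning keys are assumptions for this fork.
import json
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools.models.factory import create_model_from_config
from stable_audio_tools.models.utils import load_ckpt_state_dict
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

with open("stable_audio_tools/configs/model_configs/txt2audio/stable_audio_2_0.json") as f:
    model_config = json.load(f)

model = create_model_from_config(model_config)
model.load_state_dict(load_ckpt_state_dict("/path/to/dit_ckpt/unwrapped_last.ckpt"))
model = model.to(device).eval().requires_grad_(False)

conditioning = [{
    "prompt": "This is an electronic song sending positive vibes.",
    "seconds_start": 0,
    "seconds_total": 30,
}]

with torch.no_grad():
    audio = generate_diffusion_cond(
        model,
        steps=100,
        cfg_scale=7.0,
        conditioning=conditioning,
        sample_size=model_config["sample_size"],
        device=device,
    )

# (batch, channels, samples) -> (channels, samples), peak-normalized to [-1, 1]
audio = rearrange(audio, "b c n -> c (b n)").to(torch.float32)
audio = audio / audio.abs().max().clamp(min=1e-8)
torchaudio.save("sample.wav", audio.cpu(), model_config["sample_rate"])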
- [ ] Add convenient scripts for sampling
- [ ] Add documentation for Gradio interface
- [ ] Add troubleshooting section
- [ ] Add contribution guidelines