generative-models
Generative Models by Stability AI
Stars: 23604
Generative Models by Stability AI is a repository that provides various generative models for research purposes. It includes models like Stable Video 4D (SV4D) for video synthesis, Stable Video 3D (SV3D) for multi-view synthesis, SDXL-Turbo for text-to-image generation, and more. The repository focuses on modularity and implements a config-driven approach for building and combining submodules. It supports training with PyTorch Lightning and offers inference demos for different models. Users can access pre-trained models like SDXL-base-1.0 and SDXL-refiner-1.0 under a CreativeML Open RAIL++-M license. The codebase also includes tools for invisible watermark detection in generated images.
README:
July 24, 2024
- We are releasing Stable Video 4D (SV4D), a video-to-4D diffusion model for novel-view video synthesis, for research purposes:
- SV4D was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 context frames (the input video), and 8 reference views (synthesised from the first frame of the input video, using a multi-view diffusion model like SV3D) of the same size, ideally white-background images with one object.
- To generate longer novel-view videos (21 frames), we propose a novel sampling method using SV4D, by first sampling 5 anchor frames and then densely sampling the remaining frames while maintaining temporal consistency.
- Please check our project page, tech report and video summary for more details.
QUICKSTART: `python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4 --output_folder outputs/sv4d` (after downloading `sv4d.safetensors` and `sv3d_u.safetensors` from HuggingFace into `checkpoints/`)
To run SV4D on a single input video of 21 frames:
- Download the SV3D models (`sv3d_u.safetensors` and `sv3d_p.safetensors`) from here and the SV4D model (`sv4d.safetensors`) from here to `checkpoints/`
- Run `python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>`
  - `input_path`: The input video `<path/to/video>` can be
    - a single video file in `gif` or `mp4` format, such as `assets/test_video1.mp4`, or
    - a folder containing images of video frames in `.jpg`, `.jpeg`, or `.png` format, or
    - a file name pattern matching images of video frames.
  - `num_steps`: default is 20; increase to 50 for better quality at the cost of longer sampling time.
  - `sv3d_version`: To specify the SV3D model used to generate the reference multi-views, set `--sv3d_version=sv3d_u` for SV3D_u or `--sv3d_version=sv3d_p` for SV3D_p.
  - `elevations_deg`: To generate novel-view videos at a specified elevation (default elevation is 10) using SV3D_p (default is SV3D_u), run `python scripts/sampling/simple_video_sample_4d.py --input_path test_video1.mp4 --sv3d_version sv3d_p --elevations_deg 30.0`
  - Background removal: For input videos with a plain background, (optionally) use rembg to remove the background and crop video frames by setting `--remove_bg=True`. To obtain higher quality outputs on real-world input videos (with noisy backgrounds), try segmenting the foreground object using Clipdrop before running SV4D.
March 18, 2024
- We are releasing SV3D, an image-to-video model for novel multi-view synthesis, for research purposes:
- SV3D was trained to generate 21 frames at resolution 576x576, given 1 context frame of the same size, ideally a white-background image with one object.
- SV3D_u: This variant generates orbital videos based on single image inputs without camera conditioning.
- SV3D_p: Extending the capability of SV3D_u, this variant accommodates both single images and orbital views, allowing for the creation of 3D videos along specified camera paths.
- We extend the streamlit demo `scripts/demo/video_sampling.py` and the standalone python script `scripts/sampling/simple_video_sample.py` for inference of both models.
- Please check our project page, tech report and video summary for more details.
To run SV3D_u on a single image:
- Download `sv3d_u.safetensors` from https://huggingface.co/stabilityai/sv3d to `checkpoints/sv3d_u.safetensors`
- Run `python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_u`
To run SV3D_p on a single image:
- Download `sv3d_p.safetensors` from https://huggingface.co/stabilityai/sv3d to `checkpoints/sv3d_p.safetensors`
- Generate a static orbit at a specified elevation, e.g. 10.0: `python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg 10.0`
- Generate a dynamic orbit at specified elevations and azimuths: specify a sequence of 21 elevations (in degrees, within [-90, 90]) to `elevations_deg`, and 21 azimuths (in degrees, within [0, 360]) to `azimuths_deg` in sorted order from 0 to 360. For example: `python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg [<list of 21 elevations in degrees>] --azimuths_deg [<list of 21 azimuths in degrees>]` (a helper sketch for building these lists follows below)
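Typing 21 values by hand is tedious; the following hypothetical helper (not part of this repo) prints an evenly spaced, sorted azimuth list and a constant elevation list that can be pasted into the command above:

```python
# Hypothetical helper (not part of this repo): build 21 sorted azimuths
# in [0, 360) and a constant-elevation list for a dynamic orbit.
import numpy as np

azimuths_deg = np.linspace(0, 360, 21, endpoint=False).round(1).tolist()
elevations_deg = [10.0] * 21  # constant 10-degree elevation, illustrative

print("--elevations_deg", elevations_deg)
print("--azimuths_deg", azimuths_deg)
```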
To run SVD or SV3D on a streamlit server:
streamlit run scripts/demo/video_sampling.py
November 30, 2023
- Following the launch of SDXL-Turbo, we are releasing SD-Turbo.
November 28, 2023
- We are releasing SDXL-Turbo, a lightning-fast text-to-image model. Alongside the model, we release a technical report.
- Usage:
  - Follow the installation instructions or update the existing environment with `pip install streamlit-keyup`.
  - Download the weights and place them in the `checkpoints/` directory.
  - Run `streamlit run scripts/demo/turbo.py`.
November 21, 2023
- We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
  - SVD: This model was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware deflickering decoder.
  - SVD-XT: Same architecture as SVD but finetuned for 25 frame generation.
  - You can run the community-built gradio demo locally by running `python -m scripts.demo.gradio_app`.
  - We provide a streamlit demo `scripts/demo/video_sampling.py` and a standalone python script `scripts/sampling/simple_video_sample.py` for inference of both models.
  - Alongside the model, we release a technical report.
July 26, 2023
- We are releasing two new open models with a permissive CreativeML Open RAIL++-M license (see Inference for file hashes):
  - SDXL-base-1.0: An improved version over SDXL-base-0.9.
  - SDXL-refiner-1.0: An improved version over SDXL-refiner-0.9.
July 4, 2023
- A technical report on SDXL is now available here.
June 22, 2023
- We are releasing two new diffusion models for research purposes:
- SDXL-base-0.9: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The base model uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding whereas the refiner model only uses the OpenCLIP model.
- SDXL-refiner-0.9: The refiner has been trained to denoise small noise levels of high quality data and as such is not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.
If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. This means that you can apply for either of the two links, and if you are granted access, you can access both. Please log in to your Hugging Face Account with your organization email to request access. We plan to do a full release soon (July).
Modularity is king. This repo implements a config-driven approach where we build and combine submodules by
calling instantiate_from_config() on objects defined in yaml configs. See configs/ for many examples.
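As a minimal sketch of that pattern (the `torch.nn.Linear` target here is an arbitrary importable class chosen purely for illustration): a config names a `target` import path and optional constructor `params`, and `instantiate_from_config` imports and constructs it:

```python
# Minimal sketch of the config-driven pattern: any importable class can
# be named as "target"; "params" are passed to its constructor.
from sgm.util import instantiate_from_config

cfg = {
    "target": "torch.nn.Linear",          # arbitrary class, for illustration
    "params": {"in_features": 4, "out_features": 8},
}
layer = instantiate_from_config(cfg)      # equivalent to torch.nn.Linear(4, 8)
```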
For training, we use PyTorch Lightning, but it should be easy to use other
training wrappers around the base modules. The core diffusion model class (formerly LatentDiffusion,
now DiffusionEngine) has been cleaned up:
- No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial conditionings, and all combinations thereof) in a single class: `GeneralConditioner`, see `sgm/modules/encoders/modules.py`.
- We separate guiders (such as classifier-free guidance, see `sgm/modules/diffusionmodules/guiders.py`) from the samplers (`sgm/modules/diffusionmodules/sampling.py`), and the samplers are independent of the model.
- We adopt the "denoiser framework" for both training and inference (the most notable change is probably the option to train continuous time models):
  - Discrete time models (denoisers) are simply a special case of continuous time models (denoisers); see `sgm/modules/diffusionmodules/denoiser.py` and the conceptual sketch after this list.
  - The following features are now independent: weighting of the diffusion loss function (`sgm/modules/diffusionmodules/denoiser_weighting.py`), preconditioning of the network (`sgm/modules/diffusionmodules/denoiser_scaling.py`), and sampling of noise levels during training (`sgm/modules/diffusionmodules/sigma_sampling.py`).
- Autoencoding models have also been cleaned up.
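To illustrate the "special case" relationship above, a purely conceptual sketch (these are not the repo's actual classes or signatures):

```python
# Conceptual sketch only: a discrete-time denoiser is a continuous-time
# denoiser evaluated on a fixed grid of noise levels.

def denoise_continuous(network, x, sigma, cond):
    # continuous-time: sigma may be any positive noise level
    return network(x, sigma, cond)

def denoise_discrete(network, x, step, sigmas, cond):
    # discrete-time: look the noise level up in a fixed discretization
    return denoise_continuous(network, x, sigmas[step], cond)
```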
git clone https://github.com/Stability-AI/generative-models.git
cd generative-models

This is assuming you have navigated to the generative-models root after cloning it.
NOTE: This is tested under python3.10. For other python versions, you might encounter version conflicts.
PyTorch 2.0
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install -r requirements/pt2.txt
pip3 install .
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata

This repository uses PEP 517 compliant packaging using Hatch.
To build a distributable wheel, install hatch and run hatch build
(specifying -t wheel will skip building an sdist, which is not necessary).
pip install hatch
hatch build -t wheel
You will find the built package in dist/. You can install the wheel with pip install dist/*.whl.
Note that the package does not currently specify dependencies; you will need to install the required packages, depending on your use case and PyTorch version, manually.
We provide a streamlit demo for text-to-image and image-to-image sampling
in scripts/demo/sampling.py.
We provide file hashes for the complete file as well as for only the saved tensors in the file (
see Model Spec for a script to evaluate that).
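To check a download against the file hashes listed below, something like the following standard-library snippet works (the checkpoint path is an assumption):

```python
# Compute the full-file sha256 of a checkpoint to compare against the
# published file hashes (path is illustrative).
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

print(file_sha256("checkpoints/sd_xl_base_1.0.safetensors"))
```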
The following models are currently supported:
- SDXL-base-1.0
  - File hash (sha256): 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b
  - Tensordata hash (sha256): 0xd7a9105a900fd52748f20725fe52fe52b507fd36bee4fc107b1550a26e6ee1d7
- SDXL-refiner-1.0
  - File hash (sha256): 7440042bbdc8a24813002c09b6b69b64dc90fded4472613437b7f55f9b7d9c5f
  - Tensordata hash (sha256): 0x1a77d21bebc4b4de78c474a90cb74dc0d2217caf4061971dbfa75ad406b75d81
- SDXL-base-0.9
- SDXL-refiner-0.9
- SD-2.1-512
- SD-2.1-768
Weights for SDXL:
SDXL-1.0:
The weights of SDXL-1.0 are available (subject to
a CreativeML Open RAIL++-M license) here:
- base model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/
- refiner model: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/
SDXL-0.9: The weights of SDXL-0.9 are available and subject to a research license. If you would like to access these models for your research, please apply using one of the following links: SDXL-base-0.9 model, and SDXL-refiner-0.9. This means that you can apply for either of the two links, and if you are granted access, you can access both. Please log in to your Hugging Face Account with your organization email to request access.
After obtaining the weights, place them into checkpoints/.
Next, start the demo using
streamlit run scripts/demo/sampling.py --server.port <your_port>
Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.
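For reference, decoding with the invisible-watermark library looks roughly like the sketch below; the payload length and method shown are assumptions, and scripts/demo/detect.py contains the settings actually used by this repo:

```python
# Rough sketch of reading back an invisible watermark; the bit length
# and decoding method here are assumptions, not this repo's settings.
import cv2
from imwatermark import WatermarkDecoder

bgr = cv2.imread("generated_image.png")      # OpenCV loads images as BGR
decoder = WatermarkDecoder("bits", 48)       # assumed payload length
bits = decoder.decode(bgr, "dwtDct")         # frequency-domain decode
print(bits)
```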
To run the script you need to either have a working installation as above or try an experimental import using only a minimal set of packages:
python -m venv .detect
source .detect/bin/activate
pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark

The script is then usable in the following ways (don't forget to activate your virtual environment beforehand, e.g. source .pt2/bin/activate):
# test a single file
python scripts/demo/detect.py <your filename here>
# test multiple files at once
python scripts/demo/detect.py <filename 1> <filename 2> ... <filename n>
# test all files in a specific folder
python scripts/demo/detect.py <your folder name here>/*

We are providing example training configs in configs/example_training. To launch a training, run
python main.py --base configs/<config1.yaml> configs/<config2.yaml>
where configs are merged from left to right (later configs overwrite the same values). This can be used to combine model, training and data configs. However, all of them can also be defined in a single config. For example, to run a class-conditional pixel-based diffusion model training on MNIST, run
python main.py --base configs/example_training/toy/mnist_cond.yaml

NOTE 1: Using the non-toy-dataset configs configs/example_training/imagenet-f8_cond.yaml, configs/example_training/txt2img-clipl.yaml and configs/example_training/txt2img-clipl-legacy-ucg-training.yaml for training will require edits depending on the dataset used (which is expected to be stored in tar files in the webdataset format). To find the parts which have to be adapted, search for comments containing USER: in the respective config.
NOTE 2: This repository supports both pytorch1.13 and pytorch2 for training generative models. However, for
autoencoder training as e.g. in configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml,
only pytorch1.13 is supported.
NOTE 3: Training latent generative models (as e.g. in configs/example_training/imagenet-f8_cond.yaml) requires
retrieving the checkpoint from Hugging Face and replacing
the CKPT_PATH placeholder in this line. The same is to be done
for the provided text-to-image configs.
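To make the left-to-right merge behavior above concrete, a small sketch (assuming the OmegaConf-style merging that main.py relies on, where later configs overwrite earlier values):

```python
# Sketch of left-to-right config merging: later values win.
from omegaconf import OmegaConf

model_cfg = OmegaConf.create({"model": {"lr": 1e-4}, "data": {"batch_size": 8}})
override_cfg = OmegaConf.create({"model": {"lr": 3e-5}})

merged = OmegaConf.merge(model_cfg, override_cfg)
assert merged.model.lr == 3e-5          # overwritten by the later config
assert merged.data.batch_size == 8      # untouched keys are kept
```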
The GeneralConditioner is configured through the conditioner_config. Its only attribute is emb_models, a list of
different embedders (all inherited from AbstractEmbModel) that are used to condition the generative model.
All embedders should define whether or not they are trainable (is_trainable, default False), the classifier-free
guidance dropout rate used (ucg_rate, default 0), and an input key (input_key), for example, txt for
text-conditioning or cls for class-conditioning.
When computing conditionings, the embedder will get batch[input_key] as input.
We currently support two to four dimensional conditionings and conditionings of different embedders are concatenated
appropriately.
Note that the order of the embedders in the conditioner_config is important.
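Putting those attributes together, a hypothetical conditioner_config might look like the sketch below (the embedder class and its params are illustrative; see configs/example_training for working examples):

```python
# Hypothetical conditioner_config sketch; the embedder target and params
# are illustrative, real examples live in configs/example_training.
conditioner_config = {
    "target": "sgm.modules.GeneralConditioner",
    "params": {
        "emb_models": [
            {
                "is_trainable": False,   # keep the embedder frozen
                "ucg_rate": 0.1,         # classifier-free guidance dropout
                "input_key": "txt",      # embedder reads batch["txt"]
                "target": "sgm.modules.encoders.modules.FrozenCLIPEmbedder",
                "params": {},
            },
        ]
    },
}
```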
The neural network is set through the network_config. This used to be called unet_config, which is not general
enough as we plan to experiment with transformer-based diffusion backbones.
The loss is configured through loss_config. For standard diffusion model training, you will have to
set sigma_sampler_config.
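A hedged sketch of what that can look like (class paths follow the modules listed earlier; the parameter values are assumptions):

```python
# Illustrative loss_config with a sigma sampler; values are assumptions.
loss_config = {
    "target": "sgm.modules.diffusionmodules.loss.StandardDiffusionLoss",
    "params": {
        "sigma_sampler_config": {
            "target": "sgm.modules.diffusionmodules.sigma_sampling.EDMSampling",
            "params": {"p_mean": -1.2, "p_std": 1.2},  # EDM-style lognormal
        }
    },
}
```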
As discussed above, the sampler is independent of the model. In the sampler_config, we set the type of numerical
solver, number of steps, type of discretization, as well as, for example, guidance wrappers for classifier-free
guidance.
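For illustration, a sampler_config along these lines (module paths follow sampling.py and guiders.py; the concrete classes and values are assumptions, and working ones are in configs/):

```python
# Illustrative sampler_config: numerical solver, step count,
# discretization, and a classifier-free guidance wrapper.
sampler_config = {
    "target": "sgm.modules.diffusionmodules.sampling.EulerEDMSampler",
    "params": {
        "num_steps": 40,
        "discretization_config": {
            "target": "sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization",
        },
        "guider_config": {
            "target": "sgm.modules.diffusionmodules.guiders.VanillaCFG",
            "params": {"scale": 7.5},   # guidance strength, assumed
        },
    },
}
```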
For large-scale training we recommend using the data pipelines from our data pipelines project. The project is contained in the requirements and automatically included when following the steps from the Installation section. Small map-style datasets should be defined here in the repository (e.g., MNIST, CIFAR-10, ...), and return a dict of data keys/values, e.g.,
example = {"jpg": x,  # this is a tensor -1...1 chw
           "txt": "a beautiful image"}

where we expect images in -1...1, channel-first format.
Similar Open Source Tools
lhotse
Lhotse is a Python library designed to make speech and audio data preparation flexible and accessible. It aims to attract a wider community to speech processing tasks by providing a Python-centric design and an expressive command-line interface. Lhotse offers standard data preparation recipes, PyTorch Dataset classes for speech tasks, and efficient data preparation for model training with audio cuts. It supports data augmentation, feature extraction, and feature-space cut mixing. The tool extends Kaldi's data preparation recipes with seamless PyTorch integration, human-readable text manifests, and convenient Python classes.
mosec
Mosec is a high-performance and flexible model serving framework for building ML model-enabled backend and microservices. It bridges the gap between any machine learning models you just trained and the efficient online service API. * **Highly performant** : web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O * **Ease of use** : user interface purely in Python 🐍, by which users can serve their models in an ML framework-agnostic manner using the same code as they do for offline testing * **Dynamic batching** : aggregate requests from different users for batched inference and distribute results back * **Pipelined stages** : spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads * **Cloud friendly** : designed to run in the cloud, with the model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration systems * **Do one thing well** : focus on the online serving part, users can pay attention to the model optimization and business logic
datadreamer
DataDreamer is an advanced toolkit designed to facilitate the development of edge AI models by enabling synthetic data generation, knowledge extraction from pre-trained models, and creation of efficient and potent models. It eliminates the need for extensive datasets by generating synthetic datasets, leverages latent knowledge from pre-trained models, and focuses on creating compact models suitable for integration into any device and performance for specialized tasks. The toolkit offers features like prompt generation, image generation, dataset annotation, and tools for training small-scale neural networks for edge deployment. It provides hardware requirements, usage instructions, available models, and limitations to consider while using the library.
rtdl-num-embeddings
This repository provides the official implementation of the paper 'On Embeddings for Numerical Features in Tabular Deep Learning'. It focuses on transforming scalar continuous features into vectors before integrating them into the main backbone of tabular neural networks, showcasing improved performance. The embeddings for continuous features are shown to enhance the performance of tabular DL models and are applicable to various conventional backbones, offering efficiency comparable to Transformer-based models. The repository includes Python packages for practical usage, exploration of metrics and hyperparameters, and reproducing reported results for different algorithms and datasets.
LLMBox
LLMBox is a comprehensive library designed for implementing Large Language Models (LLMs) with a focus on a unified training pipeline and comprehensive model evaluation. It serves as a one-stop solution for training and utilizing LLMs, offering flexibility and efficiency in both training and utilization stages. The library supports diverse training strategies, comprehensive datasets, tokenizer vocabulary merging, data construction strategies, parameter efficient fine-tuning, and efficient training methods. For utilization, LLMBox provides comprehensive evaluation on various datasets, in-context learning strategies, chain-of-thought evaluation, evaluation methods, prefix caching for faster inference, support for specific LLM models like vLLM and Flash Attention, and quantization options. The tool is suitable for researchers and developers working with LLMs for natural language processing tasks.
physical-AI-interpretability
Physical AI Interpretability is a toolkit for transformer-based Physical AI and robotics models, providing tools for attention mapping, feature extraction, and out-of-distribution detection. It includes methods for post-hoc attention analysis, applying Dictionary Learning into robotics, and training sparse autoencoders. The toolkit aims to enhance interpretability and understanding of AI models in physical environments.
torchtitan
Torchtitan is a PyTorch native platform designed for rapid experimentation and large-scale training of generative AI models. It provides a flexible foundation for developers to build upon with extension points for creating custom extensions. The tool showcases PyTorch's latest distributed training features and supports pretraining Llama 3.1 LLMs of various sizes. It offers key features like multi-dimensional parallelisms, FSDP2 with per-parameter sharding, Tensor Parallel, Pipeline Parallel, and more. Users can contribute to the tool through the experiments folder and core contributions guidelines. Installation can be done from source, nightly builds, or stable releases. The tool also supports training Llama 3.1 models and provides guidance on starting a training run and multi-node training. Citation information is available for referencing the tool in academic work, and the source code is under a BSD 3 license.
Pixel-Reasoner
Pixel Reasoner is a framework that introduces reasoning in the pixel-space for Vision-Language Models (VLMs), enabling them to directly inspect, interrogate, and infer from visual evidences. This enhances reasoning fidelity for visual tasks by equipping VLMs with visual reasoning operations like zoom-in and select-frame. The framework addresses challenges like model's imbalanced competence and reluctance to adopt pixel-space operations through a two-phase training approach involving instruction tuning and curiosity-driven reinforcement learning. With these visual operations, VLMs can interact with complex visual inputs such as images or videos to gather necessary information, leading to improved performance across visual reasoning benchmarks.
LayerSkip
LayerSkip is an implementation enabling early exit inference and self-speculative decoding. It provides a code base for running models trained using the LayerSkip recipe, offering speedup through self-speculative decoding. The tool integrates with Hugging Face transformers and provides checkpoints for various LLMs. Users can generate tokens, benchmark on datasets, evaluate tasks, and sweep over hyperparameters to optimize inference speed. The tool also includes correctness verification scripts and Docker setup instructions. Additionally, other implementations like gpt-fast and Native HuggingFace are available. Training implementation is a work-in-progress, and contributions are welcome under the CC BY-NC license.
kvpress
This repository implements multiple key-value cache pruning methods and benchmarks using transformers, aiming to simplify the development of new methods for researchers and developers in the field of long-context language models. It provides a set of 'presses' that compress the cache during the pre-filling phase, with each press having a compression ratio attribute. The repository includes various training-free presses, special presses, and supports KV cache quantization. Users can contribute new presses and evaluate the performance of different presses on long-context datasets.
sieves
sieves is a library for zero- and few-shot NLP tasks with structured generation, enabling rapid prototyping of NLP applications without the need for training. It simplifies NLP prototyping by bundling capabilities into a single library, providing zero- and few-shot model support, a unified interface for structured generation, built-in tasks for common NLP operations, easy extendability, document-based pipeline architecture, caching to prevent redundant model calls, and more. The tool draws inspiration from spaCy and spacy-llm, offering features like immediate inference, observable pipelines, integrated tools for document parsing and text chunking, ready-to-use tasks such as classification, summarization, translation, and more, persistence for saving and loading pipelines, distillation for specialized model creation, and caching to optimize performance.
kafka-ml
Kafka-ML is a framework designed to manage the pipeline of Tensorflow/Keras and PyTorch machine learning models on Kubernetes. It enables the design, training, and inference of ML models with datasets fed through Apache Kafka, connecting them directly to data streams like those from IoT devices. The Web UI allows easy definition of ML models without external libraries, catering to both experts and non-experts in ML/AI.
mentals-ai
Mentals AI is a tool designed for creating and operating agents that feature loops, memory, and various tools, all through straightforward markdown syntax. This tool enables you to concentrate solely on the agent’s logic, eliminating the necessity to compose underlying code in Python or any other language. It redefines the foundational frameworks for future AI applications by allowing the creation of agents with recursive decision-making processes, integration of reasoning frameworks, and control flow expressed in natural language. Key concepts include instructions with prompts and references, working memory for context, short-term memory for storing intermediate results, and control flow from strings to algorithms. The tool provides a set of native tools for message output, user input, file handling, Python interpreter, Bash commands, and short-term memory. The roadmap includes features like a web UI, vector database tools, agent's experience, and tools for image generation and browsing. The idea behind Mentals AI originated from studies on psychoanalysis executive functions and aims to integrate 'System 1' (cognitive executor) with 'System 2' (central executive) to create more sophisticated agents.
llm-memorization
The 'llm-memorization' project is a tool designed to index, archive, and search conversations with a local LLM using a SQLite database enriched with automatically extracted keywords. It aims to provide personalized context at the start of a conversation by adding memory information to the initial prompt. The tool automates queries from local LLM conversational management libraries, offers a hybrid search function, enhances prompts based on posed questions, and provides an all-in-one graphical user interface for data visualization. It supports both French and English conversations and prompts for bilingual use.
deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.
For similar tasks
ai-agent-papers
The AI Agents Papers repository provides a curated collection of papers focusing on AI agents, covering topics such as agent capabilities, applications, architectures, and presentations. It includes a variety of papers on ideation, decision making, long-horizon tasks, learning, memory-based agents, self-evolving agents, and more. The repository serves as a valuable resource for researchers and practitioners interested in AI agent technologies and advancements.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.





