generative-models
Generative Models by Stability AI
Stars: 23604
Generative Models by Stability AI is a repository that provides various generative models for research purposes. It includes models like Stable Video 4D (SV4D) for video synthesis, Stable Video 3D (SV3D) for multi-view synthesis, SDXL-Turbo for text-to-image generation, and more. The repository focuses on modularity and implements a config-driven approach for building and combining submodules. It supports training with PyTorch Lightning and offers inference demos for different models. Users can access pre-trained models like SDXL-base-1.0 and SDXL-refiner-1.0 under a CreativeML Open RAIL++-M license. The codebase also includes tools for invisible watermark detection in generated images.
README:
July 24, 2024
- We are releasing Stable Video 4D (SV4D), a video-to-4D diffusion model for novel-view video synthesis. For research purposes:
- SV4D was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 context frames (the input video), and 8 reference views (synthesised from the first frame of the input video, using a multi-view diffusion model like SV3D) of the same size, ideally white-background images with one object.
- To generate longer novel-view videos (21 frames), we propose a novel sampling method using SV4D, by first sampling 5 anchor frames and then densely sampling the remaining frames while maintaining temporal consistency.
- Please check our project page, tech report and video summary for more details.
QUICKSTART: python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4 --output_folder outputs/sv4d (after downloading sv4d.safetensors and sv3d_u.safetensors from HuggingFace into checkpoints/)
To run SV4D on a single input video of 21 frames:
- Download SV3D models (sv3d_u.safetensors and sv3d_p.safetensors) from here and SV4D model (sv4d.safetensors) from here to checkpoints/
- Run
python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>
- input_path: The input video <path/to/video> can be
  - a single video file in gif or mp4 format, such as assets/test_video1.mp4, or
  - a folder containing images of video frames in .jpg, .jpeg, or .png format, or
  - a file name pattern matching images of video frames.
- num_steps: default is 20, can increase to 50 for better quality but longer sampling time.
- sv3d_version: To specify the SV3D model to generate reference multi-views, set --sv3d_version=sv3d_u for SV3D_u or --sv3d_version=sv3d_p for SV3D_p.
- elevations_deg: To generate novel-view videos at a specified elevation (default elevation is 10) using SV3D_p (default is SV3D_u), run
python scripts/sampling/simple_video_sample_4d.py --input_path test_video1.mp4 --sv3d_version sv3d_p --elevations_deg 30.0
- Background removal: For input videos with plain background, (optionally) use rembg to remove background and crop video frames by setting --remove_bg=True. To obtain higher quality outputs on real-world input videos (with noisy background), try segmenting the foreground object using Clipdrop before running SV4D.
March 18, 2024
- We are releasing SV3D, an image-to-video model for novel multi-view synthesis, for research purposes:
- SV3D was trained to generate 21 frames at resolution 576x576, given 1 context frame of the same size, ideally a white-background image with one object.
- SV3D_u: This variant generates orbital videos based on single image inputs without camera conditioning.
- SV3D_p: Extending the capability of SV3D_u, this variant accommodates both single images and orbital views, allowing for the creation of 3D video along specified camera paths.
- We extend the streamlit demo scripts/demo/video_sampling.py and the standalone python script scripts/sampling/simple_video_sample.py for inference of both models.
- Please check our project page, tech report and video summary for more details.
To run SV3D_u on a single image:
- Download sv3d_u.safetensors from https://huggingface.co/stabilityai/sv3d to checkpoints/sv3d_u.safetensors
- Run
python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_u
To run SV3D_p on a single image:
- Download sv3d_p.safetensors from https://huggingface.co/stabilityai/sv3d to checkpoints/sv3d_p.safetensors
- Generate a static orbit at a specified elevation, e.g. 10.0:
python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg 10.0
- Generate a dynamic orbit at specified elevations and azimuths: pass a sequence of 21 elevations (in degrees, each in [-90, 90]) to elevations_deg, and a sequence of 21 azimuths (in degrees, each in [0, 360], in sorted order from 0 to 360) to azimuths_deg. For example:
python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg [<list of 21 elevations in degrees>] --azimuths_deg [<list of 21 azimuths in degrees>]
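The two 21-element lists for a dynamic orbit can be generated rather than typed by hand. The sketch below is illustrative only: the sinusoidal elevation profile, the even azimuth spacing, and the helper as_cli_list are assumptions, not values or functions from the repository. Depending on your shell, the bracketed lists may need quoting.

```python
import math

NUM_FRAMES = 21  # SV3D generates 21 frames, so each list needs 21 entries

# Azimuths in degrees, evenly spaced and already sorted within [0, 360).
azimuths_deg = [round(i * 360.0 / NUM_FRAMES, 2) for i in range(NUM_FRAMES)]

# Elevations in degrees: an illustrative sinusoid around 10 degrees,
# which stays well inside the allowed [-90, 90] range.
elevations_deg = [
    round(10.0 + 5.0 * math.sin(2.0 * math.pi * i / NUM_FRAMES), 2)
    for i in range(NUM_FRAMES)
]


def as_cli_list(values):
    """Format a list without spaces so it is passed as a single argument."""
    return "[" + ",".join(str(v) for v in values) + "]"


# Assemble the command shown above with the generated lists filled in.
command = (
    "python scripts/sampling/simple_video_sample.py "
    "--input_path <path/to/image.png> --version sv3d_p "
    f"--elevations_deg {as_cli_list(elevations_deg)} "
    f"--azimuths_deg {as_cli_list(azimuths_deg)}"
)
print(command)
```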
To run SVD or SV3D on a streamlit server:
streamlit run scripts/demo/video_sampling.py
November 30, 2023
- Following the launch of SDXL-Turbo, we are releasing SD-Turbo.
November 28, 2023
- We are releasing SDXL-Turbo, a lightning-fast text-to-image model. Alongside the model, we release a technical report.
  - Usage:
    - Follow the installation instructions or update the existing environment with pip install streamlit-keyup.
    - Download the weights and place them in the checkpoints/ directory.
    - Run streamlit run scripts/demo/turbo.py.
November 21, 2023
- We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
  - SVD: This model was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware deflickering decoder.
  - SVD-XT: Same architecture as SVD but finetuned for 25 frame generation.
  - You can run the community-built gradio demo locally by running python -m scripts.demo.gradio_app.
  - We provide a streamlit demo scripts/demo/video_sampling.py and a standalone python script scripts/sampling/simple_video_sample.py for inference of both models.
  - Alongside the model, we release a technical report.
July 26, 2023
- We are releasing two new open models with a permissive CreativeML Open RAIL++-M license (see Inference for file hashes):
  - SDXL-base-1.0: An improved version over SDXL-base-0.9.
  - SDXL-refiner-1.0: An improved version over SDXL-refiner-0.9.
July 4, 2023
- A technical report on SDXL is now available here.
June 22, 2023
- We are releasing two new diffusion models for research purposes:
  - SDXL-base-0.9: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The base model uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding whereas the refiner model only uses the OpenCLIP model.
  - SDXL-refiner-0.9: The refiner has been trained to denoise small noise levels of high quality data and as such is not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.
If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. You only need to apply through one of the two links; once your request is granted, you can access both models. Please log in to your Hugging Face account with your organization email to request access. We plan to do a full release soon (July).
Modularity is king. This repo implements a config-driven approach where we build and combine submodules by calling instantiate_from_config() on objects defined in yaml configs. See configs/ for many examples.
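To make the config-driven idea concrete, here is a minimal sketch of how a helper of this kind can work: a config names a class by its dotted import path under a target key and supplies constructor arguments under params. The function name instantiate and the use of datetime.timedelta as a stand-in target are assumptions for illustration; the repository's own instantiate_from_config() may differ in detail.

```python
import importlib


def get_obj_from_str(path):
    """Resolve a dotted path such as 'datetime.timedelta' to the object it names."""
    module_name, attr_name = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attr_name)


def instantiate(config):
    """Build an object from a dict with a 'target' path and optional 'params'."""
    if "target" not in config:
        raise KeyError("Expected key 'target' to instantiate.")
    return get_obj_from_str(config["target"])(**config.get("params", {}))


# Stand-in config: in the repo this would be loaded from a yaml file and
# would point at a model class rather than a standard-library type.
config = {"target": "datetime.timedelta", "params": {"hours": 2}}
obj = instantiate(config)
print(obj.total_seconds())
```

Because every submodule is described the same way, swapping one component for another amounts to changing a target string and its params in the yaml file.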
For training, we use PyTorch Lightning, but it should be easy to use other training wrappers around the base modules. The core diffusion model class (formerly LatentDiffusion, now DiffusionEngine) has been cleaned up:
- No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial conditionings, and all combinations thereof) in a single class: GeneralConditioner, see sgm/modules/encoders/modules.py.
- We separate guiders (such as classifier-free guidance, see sgm/modules/diffusionmodules/guiders.py) from the samplers (sgm/modules/diffusionmodules/sampling.py), and the samplers are independent of the model.
- We adopt the "denoiser framework" for both training and inference (the most notable change is probably the new option to train continuous-time models):
  - Discrete-time models (denoisers) are simply a special case of continuous-time models (denoisers); see sgm/modules/diffusionmodules/denoiser.py.
  - The following features are now independent: weighting of the diffusion loss function (sgm/modules/diffusionmodules/denoiser_weighting.py), preconditioning of the network (sgm/modules/diffusionmodules/denoiser_scaling.py), and sampling of noise levels during training (sgm/modules/diffusionmodules/sigma_sampling.py).
- Autoencoding models have also been cleaned up.
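As an illustration of what "preconditioning of the network" means in the denoiser framework, the sketch below uses the EDM-style scaling, in which the denoiser output is a noise-level-dependent mix of the input and the network output. This is one possible scaling, shown on scalars for readability; the function names and the default sigma_data=0.5 are assumptions here, not a description of what the repository's scaling classes contain.

```python
import math


def edm_scaling(sigma, sigma_data=0.5):
    """Return (c_skip, c_out, c_in, c_noise) for noise level sigma."""
    total = sigma**2 + sigma_data**2
    c_skip = sigma_data**2 / total
    c_out = sigma * sigma_data / math.sqrt(total)
    c_in = 1.0 / math.sqrt(total)
    c_noise = 0.25 * math.log(sigma)
    return c_skip, c_out, c_in, c_noise


def denoise(network, x, sigma, sigma_data=0.5):
    """Wrap a raw network into a denoiser: c_skip * x + c_out * F(c_in * x, c_noise)."""
    c_skip, c_out, c_in, c_noise = edm_scaling(sigma, sigma_data)
    return c_skip * x + c_out * network(c_in * x, c_noise)


# A toy "network" that always predicts zero, so only the skip path remains.
zero_network = lambda scaled_x, noise_cond: 0.0
print(denoise(zero_network, x=2.0, sigma=0.5))
```

Keeping the scaling separate from the loss weighting and the noise-level sampling means each of the three can be replaced through its own config entry without touching the other two.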
git clone https://github.com/Stability-AI/generative-models.git
cd generative-models
This is assuming you have navigated to the generative-models root after cloning it.
NOTE: This is tested under python3.10. For other python versions, you might encounter version conflicts.
PyTorch 2.0
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install -r requirements/pt2.txt
pip3 install .
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
This repository uses PEP 517 compliant packaging using Hatch.
To build a distributable wheel, install hatch and run hatch build (specifying -t wheel will skip building an sdist, which is not necessary).
pip install hatch
hatch build -t wheel
You will find the built package in dist/. You can install the wheel with pip install dist/*.whl.
Note that the package does not currently specify dependencies; you will need to install the required packages, depending on your use case and PyTorch version, manually.
We provide a streamlit demo for text-to-image and image-to-image sampling in scripts/demo/sampling.py.
We provide file hashes for the complete file as well as for only the saved tensors in the file (see Model Spec for a script to evaluate that).
The following models are currently supported:
- SDXL-base-1.0
  File Hash (sha256): 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b
  Tensordata Hash (sha256): 0xd7a9105a900fd52748f20725fe52fe52b507fd36bee4fc107b1550a26e6ee1d7
- SDXL-refiner-1.0
  File Hash (sha256): 7440042bbdc8a24813002c09b6b69b64dc90fded4472613437b7f55f9b7d9c5f
  Tensordata Hash (sha256): 0x1a77d21bebc4b4de78c474a90cb74dc0d2217caf4061971dbfa75ad406b75d81
- SDXL-base-0.9
- SDXL-refiner-0.9
- SD-2.1-512
- SD-2.1-768
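A downloaded checkpoint can be checked against the file hashes listed above with Python's standard hashlib module. The sketch below covers only the whole-file sha256, not the tensordata hash; the helper names sha256_of_file and matches_expected are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the sha256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so multi-gigabyte checkpoints fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, matches_expected("checkpoints/<file>.safetensors", "<file hash from the list above>") returns True only if the download is byte-for-byte identical to the published file.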
Weights for SDXL:
SDXL-1.0: The weights of SDXL-1.0 are available (subject to a CreativeML Open RAIL++-M license) here:
- base model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/
- refiner model: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/
SDXL-0.9: The weights of SDXL-0.9 are available and subject to a research license. If you would like to access these models for your research, please apply using one of the following links: SDXL-base-0.9 model, and SDXL-refiner-0.9. You only need to apply through one of the two links; once your request is granted, you can access both models. Please log in to your Hugging Face account with your organization email to request access.
After obtaining the weights, place them into checkpoints/.
Next, start the demo using
streamlit run scripts/demo/sampling.py --server.port <your_port>
Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.
To run the script you need to either have a working installation as above or try an experimental import using only a minimal amount of packages:
python -m venv .detect
source .detect/bin/activate
pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark
With either setup in place, the script is usable in the following ways (don't forget to activate your virtual environment beforehand, e.g. source .pt2/bin/activate):
# test a single file
python scripts/demo/detect.py <your filename here>
# test multiple files at once
python scripts/demo/detect.py <filename 1> <filename 2> ... <filename n>
# test all files in a specific folder
python scripts/demo/detect.py <your folder name here>/*
We are providing example training configs in configs/example_training. To launch a training, run
python main.py --base configs/<config1.yaml> configs/<config2.yaml>
where configs are merged from left to right (later configs overwrite the same values). This can be used to combine model, training and data configs. However, all of them can also be defined in a single config. For example, to run a class-conditional pixel-based diffusion model training on MNIST, run
python main.py --base configs/example_training/toy/mnist_cond.yaml
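The left-to-right merge rule can be pictured with plain dictionaries: nested sections are combined key by key, and a value from a later config replaces the same key from an earlier one. This is a pure-Python sketch of that behavior, not the repository's merge code, and the config keys shown are hypothetical.

```python
import copy


def _merge_into(dst, src):
    """Recursively copy src into dst; values in src win on conflicts."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            _merge_into(dst[key], value)
        else:
            dst[key] = copy.deepcopy(value)


def merge_configs(*configs):
    """Merge configs from left to right, so later ones overwrite earlier ones."""
    merged = {}
    for cfg in configs:
        _merge_into(merged, cfg)
    return merged


# Hypothetical model and training configs that share one key.
model_cfg = {"model": {"learning_rate": 1e-4, "params": {"channels": 64}}}
train_cfg = {"model": {"learning_rate": 5e-5}, "trainer": {"max_epochs": 20}}
print(merge_configs(model_cfg, train_cfg))
```

Here the merged result keeps params from the first config, takes learning_rate from the second, and gains the trainer section, which mirrors how separate model, training and data configs can be combined on the command line.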
NOTE 1: Using the non-toy-dataset configs configs/example_training/imagenet-f8_cond.yaml, configs/example_training/txt2img-clipl.yaml and configs/example_training/txt2img-clipl-legacy-ucg-training.yaml for training will require edits depending on the dataset used (which is expected to be stored in tar files in the webdataset format). To find the parts which have to be adapted, search for comments containing USER: in the respective config.
NOTE 2: This repository supports both pytorch1.13 and pytorch2 for training generative models. However, for autoencoder training, as e.g. in configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml, only pytorch1.13 is supported.
NOTE 3: Training latent generative models (as e.g. in configs/example_training/imagenet-f8_cond.yaml) requires retrieving the checkpoint from Hugging Face and replacing the CKPT_PATH placeholder in this line. The same is to be done for the provided text-to-image configs.
The GeneralConditioner is configured through the conditioner_config. Its only attribute is emb_models, a list of different embedders (all inherited from AbstractEmbModel) that are used to condition the generative model.
All embedders should define whether or not they are trainable (is_trainable, default False), the classifier-free guidance dropout rate to use (ucg_rate, default 0), and an input key (input_key), for example, txt for text-conditioning or cls for class-conditioning.
When computing conditionings, the embedder will get batch[input_key] as input.
We currently support two to four dimensional conditionings, and conditionings of different embedders are concatenated appropriately.
Note that the order of the embedders in the conditioner_config is important.
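The flow described above can be sketched with toy stand-ins: each embedder reads batch[input_key], its output may be dropped with probability ucg_rate, and the results are concatenated in config order. Everything below is a simplified illustration for a single sample. ToyEmbedder, compute_conditioning and the choice to replace a dropped embedding with zeros are assumptions, not the repository's classes or exact behavior.

```python
import random


class ToyEmbedder:
    """Stand-in for an embedder; real ones inherit from AbstractEmbModel."""

    def __init__(self, input_key, embed_fn, is_trainable=False, ucg_rate=0.0):
        self.input_key = input_key
        self.embed_fn = embed_fn
        self.is_trainable = is_trainable
        self.ucg_rate = ucg_rate


def compute_conditioning(batch, embedders, rng):
    """Embed each input, apply conditioning dropout, and concatenate in order."""
    out = []
    for embedder in embedders:
        emb = embedder.embed_fn(batch[embedder.input_key])
        # With probability ucg_rate, drop this conditioning (assumed: zero it).
        if rng.random() < embedder.ucg_rate:
            emb = [0.0] * len(emb)
        out.extend(emb)
    return out


embedders = [
    ToyEmbedder("txt", lambda s: [float(len(s))]),
    ToyEmbedder("cls", lambda c: [float(c), float(c) * 2.0], ucg_rate=1.0),
]
batch = {"txt": "a beautiful image", "cls": 3}
print(compute_conditioning(batch, embedders, random.Random(0)))
```

Swapping the two entries in embedders changes the layout of the concatenated result, which is why the order in the config matters.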
The neural network is set through the network_config. This used to be called unet_config, which is not general enough as we plan to experiment with transformer-based diffusion backbones.
The loss is configured through loss_config. For standard diffusion model training, you will have to set sigma_sampler_config.
As discussed above, the sampler is independent of the model. In the sampler_config, we set the type of numerical solver, number of steps, type of discretization, as well as, for example, guidance wrappers for classifier-free guidance.
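To illustrate why the guider and the numerical solver can be configured separately, the sketch below writes each as its own function: classifier-free guidance combines a conditional and an unconditional prediction, and a single Euler step only needs the resulting denoised estimate. Both operate on plain lists of floats; the function names are hypothetical, and this is a generic formulation rather than the repository's sampler code.

```python
def classifier_free_guidance(uncond, cond, scale):
    """Move from the unconditional prediction toward the conditional one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def euler_step(x, denoised, sigma, sigma_next):
    """One Euler step from noise level sigma to sigma_next."""
    step = sigma_next - sigma
    # (x - denoised) / sigma is the estimated derivative dx/dsigma.
    return [xi + (xi - di) / sigma * step for xi, di in zip(x, denoised)]


# The solver never inspects how `denoised` was produced, so any guider
# (or none at all) can be plugged in front of it.
guided = classifier_free_guidance([0.0, 1.0], [1.0, 3.0], scale=2.0)
x_next = euler_step([4.0, 6.0], guided, sigma=2.0, sigma_next=1.0)
print(guided, x_next)
```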
For large scale training we recommend using the data pipelines from our data pipelines project. The project is included in the requirements and is installed automatically when following the steps from the Installation section. Small map-style datasets should be defined here in the repository (e.g., MNIST, CIFAR-10, ...), and return a dict of data keys/values, e.g.,
example = {"jpg": x,  # this is a tensor -1...1 chw
           "txt": "a beautiful image"}
where we expect images in -1...1, channel-first format.
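A map-style dataset that follows this convention only needs __len__ and __getitem__. The sketch below uses nested Python lists in place of tensors so that it runs without extra dependencies; the class name ToyImageTextDataset and the 8-bit input assumption are hypothetical, and a real dataset would return a tensor under the jpg key.

```python
class ToyImageTextDataset:
    """Map-style dataset returning {'jpg': image in -1...1 (chw), 'txt': caption}."""

    def __init__(self, images_uint8, captions):
        # images_uint8: list of images, each a nested list [channel][row][col]
        # with integer values in 0...255 (assumed input format).
        assert len(images_uint8) == len(captions)
        self.images = images_uint8
        self.captions = captions

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # Rescale 0...255 to -1...1 while keeping the channel-first layout.
        x = [[[p / 127.5 - 1.0 for p in row] for row in ch] for ch in image]
        return {"jpg": x, "txt": self.captions[idx]}


# One 3-channel image of size 1x2 with extreme pixel values.
dataset = ToyImageTextDataset(
    images_uint8=[[[[0, 255]], [[255, 0]], [[0, 0]]]],
    captions=["a beautiful image"],
)
example = dataset[0]
print(example["txt"], example["jpg"])
```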