generative-models
Generative Models by Stability AI
Stars: 23604
Generative Models by Stability AI is a repository that provides various generative models for research purposes. It includes models like Stable Video 4D (SV4D) for video synthesis, Stable Video 3D (SV3D) for multi-view synthesis, SDXL-Turbo for text-to-image generation, and more. The repository focuses on modularity and implements a config-driven approach for building and combining submodules. It supports training with PyTorch Lightning and offers inference demos for different models. Users can access pre-trained models like SDXL-base-1.0 and SDXL-refiner-1.0 under a CreativeML Open RAIL++-M license. The codebase also includes tools for invisible watermark detection in generated images.
README:
July 24, 2024
- We are releasing Stable Video 4D (SV4D), a video-to-4D diffusion model for novel-view video synthesis, for research purposes:
- SV4D was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 context frames (the input video), and 8 reference views (synthesised from the first frame of the input video, using a multi-view diffusion model like SV3D) of the same size, ideally white-background images with one object.
- To generate longer novel-view videos (21 frames), we propose a novel sampling method using SV4D, by first sampling 5 anchor frames and then densely sampling the remaining frames while maintaining temporal consistency.
- Please check our project page, tech report and video summary for more details.
QUICKSTART: `python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4 --output_folder outputs/sv4d` (after downloading `sv4d.safetensors` and `sv3d_u.safetensors` from HuggingFace into `checkpoints/`)
To run SV4D on a single input video of 21 frames:
- Download the SV3D models (`sv3d_u.safetensors` and `sv3d_p.safetensors`) from here and the SV4D model (`sv4d.safetensors`) from here to `checkpoints/`
- Run `python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>`
  - `input_path`: The input video `<path/to/video>` can be
    - a single video file in `gif` or `mp4` format, such as `assets/test_video1.mp4`, or
    - a folder containing images of video frames in `.jpg`, `.jpeg`, or `.png` format, or
    - a file name pattern matching images of video frames.
  - `num_steps`: default is 20; increase to 50 for better quality at the cost of longer sampling time.
  - `sv3d_version`: To specify the SV3D model used to generate the reference multi-views, set `--sv3d_version=sv3d_u` for SV3D_u or `--sv3d_version=sv3d_p` for SV3D_p.
  - `elevations_deg`: To generate novel-view videos at a specified elevation (default elevation is 10) using SV3D_p (default is SV3D_u), run `python scripts/sampling/simple_video_sample_4d.py --input_path test_video1.mp4 --sv3d_version sv3d_p --elevations_deg 30.0`
  - Background removal: For input videos with a plain background, (optionally) use rembg to remove the background and crop video frames by setting `--remove_bg=True`. To obtain higher quality outputs on real-world input videos (with noisy backgrounds), try segmenting the foreground object using Clipdrop before running SV4D.
March 18, 2024
- We are releasing SV3D, an image-to-video model for novel multi-view synthesis, for research purposes:
- SV3D was trained to generate 21 frames at resolution 576x576, given 1 context frame of the same size, ideally a white-background image with one object.
- SV3D_u: This variant generates orbital videos based on single image inputs without camera conditioning.
- SV3D_p: Extending the capability of SV3D_u, this variant accommodates both single images and orbital views, allowing for the creation of 3D videos along specified camera paths.
- We extend the streamlit demo `scripts/demo/video_sampling.py` and the standalone python script `scripts/sampling/simple_video_sample.py` for inference of both models.
- Please check our project page, tech report and video summary for more details.
To run SV3D_u on a single image:
- Download `sv3d_u.safetensors` from https://huggingface.co/stabilityai/sv3d to `checkpoints/sv3d_u.safetensors`
- Run `python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_u`
To run SV3D_p on a single image:
- Download `sv3d_p.safetensors` from https://huggingface.co/stabilityai/sv3d to `checkpoints/sv3d_p.safetensors`
- Generate a static orbit at a specified elevation, e.g. 10.0: `python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg 10.0`
- Generate a dynamic orbit at specified elevations and azimuths: specify a sequence of 21 elevations (in degrees, within [-90, 90]) to `elevations_deg`, and 21 azimuths (in degrees, within [0, 360]) to `azimuths_deg` in sorted order from 0 to 360. For example: `python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg [<list of 21 elevations in degrees>] --azimuths_deg [<list of 21 azimuths in degrees>]` (a helper sketch for building these lists follows below)
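Typing 21 values by hand is tedious; the following hypothetical helper (not part of this repo) prints an evenly spaced, sorted azimuth list and a constant elevation list that can be pasted into the command above:

```python
# Hypothetical helper (not part of this repo): build 21 sorted azimuths
# in [0, 360) and a constant-elevation list for a dynamic orbit.
import numpy as np

azimuths_deg = np.linspace(0, 360, 21, endpoint=False).round(1).tolist()
elevations_deg = [10.0] * 21  # constant 10-degree elevation, illustrative

print("--elevations_deg", elevations_deg)
print("--azimuths_deg", azimuths_deg)
```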
To run SVD or SV3D on a streamlit server:
streamlit run scripts/demo/video_sampling.py
November 30, 2023
- Following the launch of SDXL-Turbo, we are releasing SD-Turbo.
November 28, 2023
- We are releasing SDXL-Turbo, a lightning-fast text-to-image model. Alongside the model, we release a technical report.
- Usage:
  - Follow the installation instructions or update the existing environment with `pip install streamlit-keyup`.
  - Download the weights and place them in the `checkpoints/` directory.
  - Run `streamlit run scripts/demo/turbo.py`.
November 21, 2023
- We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
  - SVD: This model was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware deflickering decoder.
  - SVD-XT: Same architecture as SVD but finetuned for 25 frame generation.
  - You can run the community-built gradio demo locally by running `python -m scripts.demo.gradio_app`.
  - We provide a streamlit demo `scripts/demo/video_sampling.py` and a standalone python script `scripts/sampling/simple_video_sample.py` for inference of both models.
  - Alongside the model, we release a technical report.
July 26, 2023
- We are releasing two new open models with a permissive CreativeML Open RAIL++-M license (see Inference for file hashes):
  - SDXL-base-1.0: An improved version over SDXL-base-0.9.
  - SDXL-refiner-1.0: An improved version over SDXL-refiner-0.9.
July 4, 2023
- A technical report on SDXL is now available here.
June 22, 2023
- We are releasing two new diffusion models for research purposes:
- SDXL-base-0.9: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The base model uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding whereas the refiner model only uses the OpenCLIP model.
- SDXL-refiner-0.9: The refiner has been trained to denoise small noise levels of high quality data and as such is not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.
If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. This means that you can apply for either of the two links, and if you are granted access, you can access both. Please log in to your Hugging Face Account with your organization email to request access. We plan to do a full release soon (July).
Modularity is king. This repo implements a config-driven approach where we build and combine submodules by
calling instantiate_from_config() on objects defined in yaml configs. See configs/ for many examples.
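As a minimal sketch of that pattern (the `torch.nn.Linear` target here is an arbitrary importable class chosen purely for illustration): a config names a `target` import path and optional constructor `params`, and `instantiate_from_config` imports and constructs it:

```python
# Minimal sketch of the config-driven pattern: any importable class can
# be named as "target"; "params" are passed to its constructor.
from sgm.util import instantiate_from_config

cfg = {
    "target": "torch.nn.Linear",          # arbitrary class, for illustration
    "params": {"in_features": 4, "out_features": 8},
}
layer = instantiate_from_config(cfg)      # equivalent to torch.nn.Linear(4, 8)
```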
For training, we use PyTorch Lightning, but it should be easy to use other
training wrappers around the base modules. The core diffusion model class (formerly LatentDiffusion,
now DiffusionEngine) has been cleaned up:
- No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial conditionings, and all combinations thereof) in a single class: `GeneralConditioner`, see `sgm/modules/encoders/modules.py`.
- We separate guiders (such as classifier-free guidance, see `sgm/modules/diffusionmodules/guiders.py`) from the samplers (`sgm/modules/diffusionmodules/sampling.py`), and the samplers are independent of the model.
- We adopt the "denoiser framework" for both training and inference (the most notable change is probably the option to train continuous time models):
  - Discrete time models (denoisers) are simply a special case of continuous time models (denoisers); see `sgm/modules/diffusionmodules/denoiser.py` and the conceptual sketch after this list.
  - The following features are now independent: weighting of the diffusion loss function (`sgm/modules/diffusionmodules/denoiser_weighting.py`), preconditioning of the network (`sgm/modules/diffusionmodules/denoiser_scaling.py`), and sampling of noise levels during training (`sgm/modules/diffusionmodules/sigma_sampling.py`).
- Autoencoding models have also been cleaned up.
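To illustrate the "special case" relationship above, a purely conceptual sketch (these are not the repo's actual classes or signatures):

```python
# Conceptual sketch only: a discrete-time denoiser is a continuous-time
# denoiser evaluated on a fixed grid of noise levels.

def denoise_continuous(network, x, sigma, cond):
    # continuous-time: sigma may be any positive noise level
    return network(x, sigma, cond)

def denoise_discrete(network, x, step, sigmas, cond):
    # discrete-time: look the noise level up in a fixed discretization
    return denoise_continuous(network, x, sigmas[step], cond)
```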
git clone https://github.com/Stability-AI/generative-models.git
cd generative-models

This is assuming you have navigated to the generative-models root after cloning it.
NOTE: This is tested under python3.10. For other python versions, you might encounter version conflicts.
PyTorch 2.0
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install -r requirements/pt2.txt
pip3 install .
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata

This repository uses PEP 517 compliant packaging using Hatch.
To build a distributable wheel, install hatch and run hatch build
(specifying -t wheel will skip building an sdist, which is not necessary).
pip install hatch
hatch build -t wheel
You will find the built package in dist/. You can install the wheel with pip install dist/*.whl.
Note that the package does not currently specify dependencies; you will need to install the required packages, depending on your use case and PyTorch version, manually.
We provide a streamlit demo for text-to-image and image-to-image sampling
in scripts/demo/sampling.py.
We provide file hashes for the complete file as well as for only the saved tensors in the file (
see Model Spec for a script to evaluate that).
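To check a download against the file hashes listed below, something like the following standard-library snippet works (the checkpoint path is an assumption):

```python
# Compute the full-file sha256 of a checkpoint to compare against the
# published file hashes (path is illustrative).
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

print(file_sha256("checkpoints/sd_xl_base_1.0.safetensors"))
```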
The following models are currently supported:
- SDXL-base-1.0
  - File hash (sha256): 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b
  - Tensordata hash (sha256): 0xd7a9105a900fd52748f20725fe52fe52b507fd36bee4fc107b1550a26e6ee1d7
- SDXL-refiner-1.0
  - File hash (sha256): 7440042bbdc8a24813002c09b6b69b64dc90fded4472613437b7f55f9b7d9c5f
  - Tensordata hash (sha256): 0x1a77d21bebc4b4de78c474a90cb74dc0d2217caf4061971dbfa75ad406b75d81
- SDXL-base-0.9
- SDXL-refiner-0.9
- SD-2.1-512
- SD-2.1-768
Weights for SDXL:
SDXL-1.0:
The weights of SDXL-1.0 are available (subject to
a CreativeML Open RAIL++-M license) here:
- base model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/
- refiner model: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/
SDXL-0.9: The weights of SDXL-0.9 are available and subject to a research license. If you would like to access these models for your research, please apply using one of the following links: SDXL-base-0.9 model, and SDXL-refiner-0.9. This means that you can apply for either of the two links, and if you are granted access, you can access both. Please log in to your Hugging Face Account with your organization email to request access.
After obtaining the weights, place them into checkpoints/.
Next, start the demo using
streamlit run scripts/demo/sampling.py --server.port <your_port>
Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.
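For reference, decoding with the invisible-watermark library looks roughly like the sketch below; the payload length and method shown are assumptions, and scripts/demo/detect.py contains the settings actually used by this repo:

```python
# Rough sketch of reading back an invisible watermark; the bit length
# and decoding method here are assumptions, not this repo's settings.
import cv2
from imwatermark import WatermarkDecoder

bgr = cv2.imread("generated_image.png")      # OpenCV loads images as BGR
decoder = WatermarkDecoder("bits", 48)       # assumed payload length
bits = decoder.decode(bgr, "dwtDct")         # frequency-domain decode
print(bits)
```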
To run the script you need to either have a working installation as above or try an experimental import using only a minimal set of packages:
python -m venv .detect
source .detect/bin/activate
pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark

The script is then usable in the following ways (don't forget to activate your virtual environment beforehand, e.g. source .pt2/bin/activate):
# test a single file
python scripts/demo/detect.py <your filename here>
# test multiple files at once
python scripts/demo/detect.py <filename 1> <filename 2> ... <filename n>
# test all files in a specific folder
python scripts/demo/detect.py <your folder name here>/*

We are providing example training configs in configs/example_training. To launch a training, run
python main.py --base configs/<config1.yaml> configs/<config2.yaml>
where configs are merged from left to right (later configs overwrite the same values). This can be used to combine model, training and data configs. However, all of them can also be defined in a single config. For example, to run a class-conditional pixel-based diffusion model training on MNIST, run
python main.py --base configs/example_training/toy/mnist_cond.yaml

NOTE 1: Using the non-toy-dataset configs configs/example_training/imagenet-f8_cond.yaml, configs/example_training/txt2img-clipl.yaml and configs/example_training/txt2img-clipl-legacy-ucg-training.yaml for training will require edits depending on the dataset used (which is expected to be stored in tar files in the webdataset format). To find the parts which have to be adapted, search for comments containing USER: in the respective config.
NOTE 2: This repository supports both pytorch1.13 and pytorch2 for training generative models. However, for
autoencoder training as e.g. in configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml,
only pytorch1.13 is supported.
NOTE 3: Training latent generative models (as e.g. in configs/example_training/imagenet-f8_cond.yaml) requires
retrieving the checkpoint from Hugging Face and replacing
the CKPT_PATH placeholder in this line. The same is to be done
for the provided text-to-image configs.
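To make the left-to-right merge behavior above concrete, a small sketch (assuming the OmegaConf-style merging that main.py relies on, where later configs overwrite earlier values):

```python
# Sketch of left-to-right config merging: later values win.
from omegaconf import OmegaConf

model_cfg = OmegaConf.create({"model": {"lr": 1e-4}, "data": {"batch_size": 8}})
override_cfg = OmegaConf.create({"model": {"lr": 3e-5}})

merged = OmegaConf.merge(model_cfg, override_cfg)
assert merged.model.lr == 3e-5          # overwritten by the later config
assert merged.data.batch_size == 8      # untouched keys are kept
```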
The GeneralConditioner is configured through the conditioner_config. Its only attribute is emb_models, a list of
different embedders (all inherited from AbstractEmbModel) that are used to condition the generative model.
All embedders should define whether or not they are trainable (is_trainable, default False), the classifier-free
guidance dropout rate used (ucg_rate, default 0), and an input key (input_key), for example, txt for
text-conditioning or cls for class-conditioning.
When computing conditionings, the embedder will get batch[input_key] as input.
We currently support two to four dimensional conditionings and conditionings of different embedders are concatenated
appropriately.
Note that the order of the embedders in the conditioner_config is important.
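Putting those attributes together, a hypothetical conditioner_config might look like the sketch below (the embedder class and its params are illustrative; see configs/example_training for working examples):

```python
# Hypothetical conditioner_config sketch; the embedder target and params
# are illustrative, real examples live in configs/example_training.
conditioner_config = {
    "target": "sgm.modules.GeneralConditioner",
    "params": {
        "emb_models": [
            {
                "is_trainable": False,   # keep the embedder frozen
                "ucg_rate": 0.1,         # classifier-free guidance dropout
                "input_key": "txt",      # embedder reads batch["txt"]
                "target": "sgm.modules.encoders.modules.FrozenCLIPEmbedder",
                "params": {},
            },
        ]
    },
}
```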
The neural network is set through the network_config. This used to be called unet_config, which is not general
enough as we plan to experiment with transformer-based diffusion backbones.
The loss is configured through loss_config. For standard diffusion model training, you will have to
set sigma_sampler_config.
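A hedged sketch of what that can look like (class paths follow the modules listed earlier; the parameter values are assumptions):

```python
# Illustrative loss_config with a sigma sampler; values are assumptions.
loss_config = {
    "target": "sgm.modules.diffusionmodules.loss.StandardDiffusionLoss",
    "params": {
        "sigma_sampler_config": {
            "target": "sgm.modules.diffusionmodules.sigma_sampling.EDMSampling",
            "params": {"p_mean": -1.2, "p_std": 1.2},  # EDM-style lognormal
        }
    },
}
```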
As discussed above, the sampler is independent of the model. In the sampler_config, we set the type of numerical
solver, number of steps, type of discretization, as well as, for example, guidance wrappers for classifier-free
guidance.
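For illustration, a sampler_config along these lines (module paths follow sampling.py and guiders.py; the concrete classes and values are assumptions, and working ones are in configs/):

```python
# Illustrative sampler_config: numerical solver, step count,
# discretization, and a classifier-free guidance wrapper.
sampler_config = {
    "target": "sgm.modules.diffusionmodules.sampling.EulerEDMSampler",
    "params": {
        "num_steps": 40,
        "discretization_config": {
            "target": "sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization",
        },
        "guider_config": {
            "target": "sgm.modules.diffusionmodules.guiders.VanillaCFG",
            "params": {"scale": 7.5},   # guidance strength, assumed
        },
    },
}
```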
For large-scale training we recommend using the data pipelines from our data pipelines project. The project is contained in the requirements and automatically included when following the steps from the Installation section. Small map-style datasets should be defined here in the repository (e.g., MNIST, CIFAR-10, ...), and return a dict of data keys/values, e.g.,
example = {"jpg": x,  # this is a tensor -1...1 chw
           "txt": "a beautiful image"}

where we expect images in -1...1, channel-first format.
Similar Open Source Tools
lhotse
Lhotse is a Python library designed to make speech and audio data preparation flexible and accessible. It aims to attract a wider community to speech processing tasks by providing a Python-centric design and an expressive command-line interface. Lhotse offers standard data preparation recipes, PyTorch Dataset classes for speech tasks, and efficient data preparation for model training with audio cuts. It supports data augmentation, feature extraction, and feature-space cut mixing. The tool extends Kaldi's data preparation recipes with seamless PyTorch integration, human-readable text manifests, and convenient Python classes.
mosec
Mosec is a high-performance and flexible model serving framework for building ML model-enabled backend and microservices. It bridges the gap between any machine learning models you just trained and the efficient online service API. * **Highly performant** : web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O * **Ease of use** : user interface purely in Python 🐍, by which users can serve their models in an ML framework-agnostic manner using the same code as they do for offline testing * **Dynamic batching** : aggregate requests from different users for batched inference and distribute results back * **Pipelined stages** : spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads * **Cloud friendly** : designed to run in the cloud, with the model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration systems * **Do one thing well** : focus on the online serving part, users can pay attention to the model optimization and business logic
datadreamer
DataDreamer is an advanced toolkit designed to facilitate the development of edge AI models by enabling synthetic data generation, knowledge extraction from pre-trained models, and creation of efficient and potent models. It eliminates the need for extensive datasets by generating synthetic datasets, leverages latent knowledge from pre-trained models, and focuses on creating compact models suitable for integration into any device and performance for specialized tasks. The toolkit offers features like prompt generation, image generation, dataset annotation, and tools for training small-scale neural networks for edge deployment. It provides hardware requirements, usage instructions, available models, and limitations to consider while using the library.
rtdl-num-embeddings
This repository provides the official implementation of the paper 'On Embeddings for Numerical Features in Tabular Deep Learning'. It focuses on transforming scalar continuous features into vectors before integrating them into the main backbone of tabular neural networks, showcasing improved performance. The embeddings for continuous features are shown to enhance the performance of tabular DL models and are applicable to various conventional backbones, offering efficiency comparable to Transformer-based models. The repository includes Python packages for practical usage, exploration of metrics and hyperparameters, and reproducing reported results for different algorithms and datasets.
LLMBox
LLMBox is a comprehensive library designed for implementing Large Language Models (LLMs) with a focus on a unified training pipeline and comprehensive model evaluation. It serves as a one-stop solution for training and utilizing LLMs, offering flexibility and efficiency in both training and utilization stages. The library supports diverse training strategies, comprehensive datasets, tokenizer vocabulary merging, data construction strategies, parameter efficient fine-tuning, and efficient training methods. For utilization, LLMBox provides comprehensive evaluation on various datasets, in-context learning strategies, chain-of-thought evaluation, evaluation methods, prefix caching for faster inference, support for specific LLM models like vLLM and Flash Attention, and quantization options. The tool is suitable for researchers and developers working with LLMs for natural language processing tasks.
physical-AI-interpretability
Physical AI Interpretability is a toolkit for transformer-based Physical AI and robotics models, providing tools for attention mapping, feature extraction, and out-of-distribution detection. It includes methods for post-hoc attention analysis, applying Dictionary Learning into robotics, and training sparse autoencoders. The toolkit aims to enhance interpretability and understanding of AI models in physical environments.
torchtitan
Torchtitan is a PyTorch native platform designed for rapid experimentation and large-scale training of generative AI models. It provides a flexible foundation for developers to build upon with extension points for creating custom extensions. The tool showcases PyTorch's latest distributed training features and supports pretraining Llama 3.1 LLMs of various sizes. It offers key features like multi-dimensional parallelisms, FSDP2 with per-parameter sharding, Tensor Parallel, Pipeline Parallel, and more. Users can contribute to the tool through the experiments folder and core contributions guidelines. Installation can be done from source, nightly builds, or stable releases. The tool also supports training Llama 3.1 models and provides guidance on starting a training run and multi-node training. Citation information is available for referencing the tool in academic work, and the source code is under a BSD 3 license.
Pixel-Reasoner
Pixel Reasoner is a framework that introduces reasoning in the pixel-space for Vision-Language Models (VLMs), enabling them to directly inspect, interrogate, and infer from visual evidences. This enhances reasoning fidelity for visual tasks by equipping VLMs with visual reasoning operations like zoom-in and select-frame. The framework addresses challenges like model's imbalanced competence and reluctance to adopt pixel-space operations through a two-phase training approach involving instruction tuning and curiosity-driven reinforcement learning. With these visual operations, VLMs can interact with complex visual inputs such as images or videos to gather necessary information, leading to improved performance across visual reasoning benchmarks.
LayerSkip
LayerSkip is an implementation enabling early exit inference and self-speculative decoding. It provides a code base for running models trained using the LayerSkip recipe, offering speedup through self-speculative decoding. The tool integrates with Hugging Face transformers and provides checkpoints for various LLMs. Users can generate tokens, benchmark on datasets, evaluate tasks, and sweep over hyperparameters to optimize inference speed. The tool also includes correctness verification scripts and Docker setup instructions. Additionally, other implementations like gpt-fast and Native HuggingFace are available. Training implementation is a work-in-progress, and contributions are welcome under the CC BY-NC license.
kvpress
This repository implements multiple key-value cache pruning methods and benchmarks using transformers, aiming to simplify the development of new methods for researchers and developers in the field of long-context language models. It provides a set of 'presses' that compress the cache during the pre-filling phase, with each press having a compression ratio attribute. The repository includes various training-free presses, special presses, and supports KV cache quantization. Users can contribute new presses and evaluate the performance of different presses on long-context datasets.
sieves
sieves is a library for zero- and few-shot NLP tasks with structured generation, enabling rapid prototyping of NLP applications without the need for training. It simplifies NLP prototyping by bundling capabilities into a single library, providing zero- and few-shot model support, a unified interface for structured generation, built-in tasks for common NLP operations, easy extendability, document-based pipeline architecture, caching to prevent redundant model calls, and more. The tool draws inspiration from spaCy and spacy-llm, offering features like immediate inference, observable pipelines, integrated tools for document parsing and text chunking, ready-to-use tasks such as classification, summarization, translation, and more, persistence for saving and loading pipelines, distillation for specialized model creation, and caching to optimize performance.
kafka-ml
Kafka-ML is a framework designed to manage the pipeline of Tensorflow/Keras and PyTorch machine learning models on Kubernetes. It enables the design, training, and inference of ML models with datasets fed through Apache Kafka, connecting them directly to data streams like those from IoT devices. The Web UI allows easy definition of ML models without external libraries, catering to both experts and non-experts in ML/AI.
mentals-ai
Mentals AI is a tool designed for creating and operating agents that feature loops, memory, and various tools, all through straightforward markdown syntax. This tool enables you to concentrate solely on the agent’s logic, eliminating the necessity to compose underlying code in Python or any other language. It redefines the foundational frameworks for future AI applications by allowing the creation of agents with recursive decision-making processes, integration of reasoning frameworks, and control flow expressed in natural language. Key concepts include instructions with prompts and references, working memory for context, short-term memory for storing intermediate results, and control flow from strings to algorithms. The tool provides a set of native tools for message output, user input, file handling, Python interpreter, Bash commands, and short-term memory. The roadmap includes features like a web UI, vector database tools, agent's experience, and tools for image generation and browsing. The idea behind Mentals AI originated from studies on psychoanalysis executive functions and aims to integrate 'System 1' (cognitive executor) with 'System 2' (central executive) to create more sophisticated agents.
llm-memorization
The 'llm-memorization' project is a tool designed to index, archive, and search conversations with a local LLM using a SQLite database enriched with automatically extracted keywords. It aims to provide personalized context at the start of a conversation by adding memory information to the initial prompt. The tool automates queries from local LLM conversational management libraries, offers a hybrid search function, enhances prompts based on posed questions, and provides an all-in-one graphical user interface for data visualization. It supports both French and English conversations and prompts for bilingual use.
deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.
For similar tasks
ai-agent-papers
The AI Agents Papers repository provides a curated collection of papers focusing on AI agents, covering topics such as agent capabilities, applications, architectures, and presentations. It includes a variety of papers on ideation, decision making, long-horizon tasks, learning, memory-based agents, self-evolving agents, and more. The repository serves as a valuable resource for researchers and practitioners interested in AI agent technologies and advancements.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.





