sdkit

sdkit (stable diffusion kit) is an easy-to-use library for using Stable Diffusion in your AI Art projects. It is fast, feature-packed, and memory-efficient.

Stars: 164

sdkit (stable diffusion kit) is an easy-to-use library for utilizing Stable Diffusion in AI Art projects. It includes features like ControlNets, LoRAs, Textual Inversion Embeddings, GFPGAN, CodeFormer for face restoration, RealESRGAN for upscaling, k-samplers, support for custom VAEs, NSFW filter, model-downloader, parallel GPU support, and more. It offers a model database, auto-scanning for malicious models, and various optimizations. The API consists of modules for loading models, generating images, filters, model merging, and utilities, all managed through the sdkit.Context object.

README:

New: Stable Diffusion XL, ControlNets, LoRAs and Embeddings are now supported!

This is a community project, so please feel free to contribute (and to use it in your project)!

Why?

The goal is to let you be productive quickly (at your AI art project), so it bundles Stable Diffusion along with commonly-used features (like ControlNets, LoRAs, Textual Inversion Embeddings, GFPGAN and CodeFormer for face restoration, RealESRGAN for upscaling, k-samplers, support for loading custom VAEs, NSFW filter etc).

Advanced features include a model-downloader (with a database of commonly used models), support for running in parallel on multiple GPUs, and auto-scanning for malicious models. See the full list of features for more.

Installation

Tested with Python 3.8. Supports Windows, Linux, and Mac.

Windows/Linux:

Run pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116 (this installs the PyTorch builds for CUDA 11.6)
Run pip install sdkit

Mac:

Run pip install sdkit
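
After installing, a quick sanity check can confirm that the packages are importable from the Python environment you intend to use. This is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration and is not part of sdkit.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the packages from the install steps are visible.
for name in ("torch", "torchvision", "sdkit"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package shows up as missing, the pip command was probably run against a different Python interpreter or virtual environment.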

Example

Local model

A simple example for generating an image from a Stable Diffusion model file (already present on the disk):

import sdkit
from sdkit.models import load_model
from sdkit.generate import generate_images
from sdkit.utils import log

context = sdkit.Context()

# set the path to the model file on the disk (.ckpt or .safetensors file)
context.model_paths['stable-diffusion'] = 'D:\\path\\to\\512-base-ema.ckpt'
load_model(context, 'stable-diffusion')

# generate the image
images = generate_images(context, prompt='Photograph of an astronaut riding a horse', seed=42, width=512, height=512)

# save the image
images[0].save("image.png") # images is a list of PIL.Image

log.info("Generated images!")
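
The example above hard-codes a Windows path. Before handing a path to load_model, it may help to check that the file exists and has one of the two extensions mentioned above. The helper below is a hypothetical sketch (check_model_file is not an sdkit function) and uses only the standard library.

```python
from pathlib import Path

# Extensions named in the example above.
SUPPORTED_EXTENSIONS = {".ckpt", ".safetensors"}


def check_model_file(path_str):
    """Validate a model path and return it as a string.

    Raises ValueError for an unexpected extension and
    FileNotFoundError if the file is not on disk.
    """
    path = Path(path_str).expanduser()
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported model file type: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return str(path)
```

The returned string could then be assigned to context.model_paths['stable-diffusion'] as in the example.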

Auto-download a known model

A simple example for automatically downloading a known Stable Diffusion model file:

import sdkit
from sdkit.models import download_models, resolve_downloaded_model_path, load_model
from sdkit.generate import generate_images
from sdkit.utils import save_images

context = sdkit.Context()

download_models(context, models={'stable-diffusion': '1.5-pruned-emaonly'}) # downloads the known "SD 1.5-pruned-emaonly" model

context.model_paths['stable-diffusion'] = resolve_downloaded_model_path(context, 'stable-diffusion', '1.5-pruned-emaonly')
load_model(context, 'stable-diffusion')

images = generate_images(context, prompt='Photograph of an astronaut riding a horse', seed=42, width=512, height=512)
save_images(images, dir_path='D:\\path\\to\\images\\directory')

Please see the list of examples to learn how to use the other features (like filters, VAE, ControlNet, Embeddings, LoRA, memory optimizations, running on multiple GPUs, etc.).

API

Please see the API Reference page for a detailed summary.

Broadly, the API contains 5 modules:

sdkit.models # loading/unloading models, downloading known models, scanning models
sdkit.generate # generating images
sdkit.filter # face restoration, upscaling
sdkit.train # model merge, and (in the future) more training methods
sdkit.utils

A sdkit.Context object is passed to these functions. It encapsulates the data related to the runtime (e.g. device and vram_optimizations) as well as references to the loaded model files and paths. Context is a thread-local object, so each thread sees its own copy of this state.
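
To illustrate what "thread-local" means in practice, here is a small standalone sketch built on Python's threading.local. DemoContext is a stand-in written for this example; it is not sdkit's Context, and sdkit's actual implementation may differ.

```python
import threading


class DemoContext(threading.local):
    """Stand-in for a thread-local context; __init__ runs once per thread."""

    def __init__(self):
        self.device = "cpu"
        self.model_paths = {}


context = DemoContext()

# Configure the context in the main thread.
context.device = "cuda:0"
context.model_paths["stable-diffusion"] = "/path/to/model.safetensors"

seen_in_worker = {}


def worker():
    # A new thread gets fresh attributes, not the main thread's values.
    seen_in_worker["device"] = context.device
    seen_in_worker["model_paths"] = dict(context.model_paths)


thread = threading.Thread(target=worker)
thread.start()
thread.join()

print("main thread:", context.device, context.model_paths)
print("worker thread:", seen_in_worker)
```

The practical consequence is that settings made in one thread are not visible in another, so each thread that generates images would need to set up its own paths and load its own models.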

Models DB

See the list of known models.

sdkit includes a database of known models and their configurations. This lets you download a known model with a single line of code. (You can customize where it saves the downloaded model)

Additionally, sdkit will attempt to automatically determine the configuration for a given model (when loading from disk). E.g. if an SD 2.1 model is being loaded, sdkit will automatically know to use fp32 for attn_precision. If an SD 2.0 v-type model is being loaded, sdkit will automatically know to use the v2-inference-v.yaml configuration. It does this by matching the quick-hash of the given model file against the list of known quick-hashes.

For models that don't match a known hash (e.g. custom models), or if you want to override the config file, you can set the path to the config file in context.model_configs, e.g. context.model_configs['stable-diffusion'] = 'path/to/config.yaml'
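
The lookup described above can be sketched roughly as follows. This is not sdkit's code: the real quick-hash algorithm and the table of known hashes are not reproduced here. Hashing the first megabyte with SHA-256, and the names quick_hash, resolve_config and known_configs, are all assumptions made for illustration.

```python
import hashlib


def quick_hash(path, num_bytes=1 << 20):
    """Hash only the start of the file so large models are cheap to identify.

    Assumption: first `num_bytes` bytes, SHA-256, first 8 hex characters.
    """
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(num_bytes)).hexdigest()[:8]


def resolve_config(model_path, known_configs, override=None):
    """Pick a config: an explicit override wins, else look up the quick-hash.

    Returns None when the model is unknown and no override was given.
    """
    if override is not None:
        return override
    return known_configs.get(quick_hash(model_path))
```

A custom model that is not in the table resolves to None here, which corresponds to the case where you would supply the config path yourself.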

FAQ

Does it have all the cool features?

It was born out of a popular Stable Diffusion UI, splitting out the battle-tested core engine into sdkit.

Features include: SD 2.1, SDXL, ControlNet, LoRAs, Embeddings, txt2img, img2img, inpainting, NSFW filter, multiple GPU support, Mac support, GFPGAN and CodeFormer (fix faces), RealESRGAN (upscale), 16 samplers (including k-samplers and UniPC), custom VAE, low-memory optimizations, model merging, safetensor support, picklescan, etc. See the full list of features.

📢 We're looking to add support for Lycoris, AMD, Pix2Pix, and outpainting. We'd love some code contributions for these!

Is it fast?

It is fast, and close to the fastest. For the same image, sdkit took 5.5 seconds, while the automatic1111 WebUI took 4.95 seconds. 📢 We're looking for code contributions to make sdkit even faster!

xformers is supported experimentally, which will make sdkit even faster.

Details of the benchmark:

Windows 11, NVIDIA 3060 12GB, 512x512 image, sd-v1-4.ckpt, euler_a sampler, number of steps: 25, seed 42, guidance scale 7.5.

No xformers. No VRAM optimizations for low-memory usage.

| Tool | Time taken | Iterations/sec | Peak VRAM usage |
|---|---|---|---|
| `sdkit` | 5.5 sec | 6.0 it/s | 5.1 GB |
| `automatic1111` webui | 4.95 sec | 6.15 it/s | 5.1 GB |
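
To put the benchmark numbers in relative terms, the short calculation below works out the gap in total time and in sampling throughput. It uses only the figures from the table; the variable names are ours.

```python
# Figures copied from the benchmark table.
sdkit_seconds = 5.5
a1111_seconds = 4.95
sdkit_its = 6.0
a1111_its = 6.15

# How much longer sdkit took end to end, relative to automatic1111.
slowdown_pct = (sdkit_seconds - a1111_seconds) / a1111_seconds * 100

# How much lower sdkit's sampling rate was, relative to automatic1111.
throughput_gap_pct = (a1111_its - sdkit_its) / a1111_its * 100

print(f"Total time: sdkit is {slowdown_pct:.1f}% slower")
print(f"Iterations/sec: sdkit is {throughput_gap_pct:.1f}% lower")
```

The total-time gap (about 11%) is larger than the iterations/sec gap (about 2.4%), which suggests that part of the difference may lie outside the sampling loop itself.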

Does it work on lower-end GPUs, or without GPUs?

Yes. It works on NVIDIA/Mac GPUs with at least 2GB of VRAM. For PCs without a compatible GPU, it can run entirely on the CPU. Running on the CPU will be very slow, but at least you'll be able to try it out!

📢 We don't support AMD on Windows yet (there it will run in CPU mode; AMD GPUs can be used on Linux), but we're looking for code contributions for AMD support!

Why not just use diffusers?

You can certainly use diffusers. sdkit is in fact using diffusers internally, so you can think of sdkit as a convenient API and a collection of tools, focused on Stable Diffusion projects.

sdkit:

is a simple, lightweight toolkit for Stable Diffusion projects.
natively includes frequently-used projects like GFPGAN, CodeFormer, and RealESRGAN.
works with the popular .ckpt and .safetensors model formats.
includes memory optimizations for low-end GPUs.
has built-in support for running on multiple GPUs.
can download models from any server.
auto-scans for malicious models.
includes 16 samplers (including k-samplers).
was born out of the needs of the new Stable Diffusion AI Art scene, starting Aug 2022.
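
To give a feel for what scanning for malicious models involves: .ckpt files are typically pickle-based, and a pickle can import and call arbitrary functions when it is loaded. The sketch below disassembles pickle bytes with the standard library's pickletools and flags imports from a small denylist. It is a simplified illustration of the general idea, not the implementation used by sdkit or picklescan; the denylist and the function name find_suspicious_imports are assumptions.

```python
import pickletools

# Illustrative denylist only; a real scanner uses a far more complete policy.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "sys", "builtins"}
STRING_OPCODES = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}


def find_suspicious_imports(pickle_bytes):
    """Return (module, name) pairs imported by the pickle that look dangerous.

    The bytes are only disassembled, never unpickled, so nothing is executed.
    """
    hits = []
    recent_strings = []

    def record(module, name):
        if module.split(".")[0] in SUSPICIOUS_MODULES:
            hits.append((module, name))

    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Older protocols: argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            record(module, name)
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+: module and name are the two strings pushed just before.
            record(recent_strings[-2], recent_strings[-1])
    return hits
```

Formats like .safetensors avoid this class of problem by storing only tensor data rather than executable pickle instructions.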

Who is using sdkit?

Easy Diffusion (cmdr2 UI) for Stable Diffusion.
Arthemy AI

If your project is using sdkit, you can add it to this list. Please feel free to open a pull request (or let us know at our Discord community).

Contributing

We'd love to accept code contributions. Please feel free to drop by our Discord community!

📢 We're looking for code contributions for these features (or anything else you'd like to work on):

Lycoris.
Outpainting.
Pix2Pix.
AMD support.

If you'd like to set up a developer version on your PC (to contribute code changes), please follow these instructions.

Instructions for running automated tests: Running Tests.

Credits

Stable Diffusion: https://github.com/Stability-AI/stablediffusion
CodeFormer: https://github.com/sczhou/CodeFormer (license: https://github.com/sczhou/CodeFormer/blob/master/LICENSE)
GFPGAN: https://github.com/TencentARC/GFPGAN
RealESRGAN: https://github.com/xinntao/Real-ESRGAN
k-diffusion: https://github.com/crowsonkb/k-diffusion
Code contributors and artists on Easy Diffusion (cmdr2 UI): https://github.com/easydiffusion/easydiffusion and Discord (https://discord.com/invite/u9yhsFmEkB)
Lots of contributors on the internet

Disclaimer

The authors of this project are not responsible for any content generated using this project.

The license of this software forbids you from sharing any content that:

Violates any laws.
Produces any harm to a person or persons.
Disseminates (spreads) any personal information that would be meant for harm.
Spreads misinformation.
Targets vulnerable groups.

For the full list of restrictions please read the License. By using this software you agree to the terms.

For Tasks:

generate images, download models, face restoration, upscaling images, run on multiple GPUs

For Jobs:

AI researcher, computer vision engineer, machine learning engineer, data scientist, artificial intelligence developer

Alternative AI tools for sdkit

Similar Open Source Tools

SwarmUI

SwarmUI is a modular stable diffusion web-user-interface designed to make powertools easily accessible, high performance, and extensible. It is in Beta status, offering a primary Generate tab for beginners and a Comfy Workflow tab for advanced users. The tool aims to become a full-featured one-stop-shop for all things Stable Diffusion, with plans for better mobile browser support, detailed 'Current Model' display, dynamic tab shifting, LLM-assisted prompting, and convenient direct distribution as an Electron app.

GitHub stars: 2.3k

MARS5-TTS

MARS5 is a novel English speech model (TTS) developed by CAMB.AI, featuring a two-stage AR-NAR pipeline with a unique NAR component. The model can generate speech for various scenarios like sports commentary and anime with just 5 seconds of audio and a text snippet. It allows steering prosody using punctuation and capitalization in the transcript. Speaker identity is specified using an audio reference file, enabling 'deep clone' for improved quality. The model can be used via torch.hub or HuggingFace, supporting both shallow and deep cloning for inference. Checkpoints are provided for AR and NAR models, with hardware requirements of 750M+450M params on GPU. Contributions to improve model stability, performance, and reference audio selection are welcome.

GitHub stars: 2.1k

Pandrator

Pandrator is a GUI tool for generating audiobooks and dubbing using voice cloning and AI. It transforms text, PDF, EPUB, and SRT files into spoken audio in multiple languages. It leverages XTTS, Silero, and VoiceCraft models for text-to-speech conversion and voice cloning, with additional features like LLM-based text preprocessing and NISQA for audio quality evaluation. The tool aims to be user-friendly with a one-click installer and a graphical interface.

GitHub stars: 429

jaison-core

J.A.I.son is a Python project designed for generating responses using various components and applications. It requires specific plugins like STT, T2T, TTSG, and TTSC to function properly. Users can customize responses, voice, and configurations. The project provides a Discord bot, Twitch events and chat integration, and VTube Studio Animation Hotkeyer. It also offers features for managing conversation history, training AI models, and monitoring conversations.

GitHub stars: 216

stable-diffusion-webui

Stable Diffusion web UI is a web interface for Stable Diffusion, implemented using Gradio library. It provides a user-friendly interface to access the powerful image generation capabilities of Stable Diffusion. With Stable Diffusion web UI, users can easily generate images from text prompts, edit and refine images using inpainting and outpainting, and explore different artistic styles and techniques. The web UI also includes a range of advanced features such as textual inversion, hypernetworks, and embeddings, allowing users to customize and fine-tune the image generation process. Whether you're an artist, designer, or simply curious about the possibilities of AI-generated art, Stable Diffusion web UI is a valuable tool that empowers you to create stunning and unique images.

GitHub stars: 148.6k

keras-hub

KerasHub is a pretrained modeling library that provides Keras 3 implementations of popular model architectures with pretrained checkpoints. It supports text, image, and audio data for generation, classification, and other tasks. Models are compatible with JAX, TensorFlow, and PyTorch, and can be fine-tuned on GPUs and TPUs. KerasHub components are provided as Layer and Model implementations, extending the core Keras API.

GitHub stars: 885

mosec

Mosec is a high-performance and flexible model serving framework for building ML model-enabled backend and microservices. It bridges the gap between any machine learning models you just trained and the efficient online service API.
Highly performant: web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O.
Ease of use: user interface purely in Python 🐍, by which users can serve their models in an ML framework-agnostic manner using the same code as they do for offline testing.
Dynamic batching: aggregate requests from different users for batched inference and distribute results back.
Pipelined stages: spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads.
Cloud friendly: designed to run in the cloud, with the model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration systems.
Do one thing well: focus on the online serving part, users can pay attention to the model optimization and business logic.

GitHub stars: 834

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

GitHub stars: 405

agno

Agno is a lightweight library for building multi-modal Agents. It is designed with core principles of simplicity, uncompromising performance, and agnosticism, allowing users to create blazing fast agents with minimal memory footprint. Agno supports any model, any provider, and any modality, making it a versatile container for AGI. Users can build agents with lightning-fast agent creation, model agnostic capabilities, native support for text, image, audio, and video inputs and outputs, memory management, knowledge stores, structured outputs, and real-time monitoring. The library enables users to create autonomous programs that use language models to solve problems, improve responses, and achieve tasks with varying levels of agency and autonomy.

GitHub stars: 24.0k

lmstudio-js

lmstudio-js is LM Studio's official JavaScript/TypeScript client SDK. It allows you to use LLMs to respond in chats or predict text completions; define functions as tools and turn LLMs into autonomous agents that run completely locally; load, configure, and unload models from memory; generate embeddings for text; and more. It supports both browser and any Node-compatible environments. Why use lmstudio-js over the openai SDK? OpenAI's SDK is designed for use with OpenAI's proprietary models. As such, it is missing many features that are essential for using LLMs in a local environment, such as managing loading and unloading models from memory, configuring load parameters (context length, GPU offload settings, etc.), speculative decoding, getting information (such as context length, model size, etc.) about a model, and more. In addition, while the openai SDK is automatically generated, lmstudio-js is designed from the ground up to be clean and easy to use for TypeScript/JavaScript developers.

GitHub stars: 964

openai-agents-python

The OpenAI Agents SDK is a lightweight framework for building multi-agent workflows. It includes concepts like Agents, Handoffs, Guardrails, and Tracing to facilitate the creation and management of agents. The SDK is compatible with any model providers supporting the OpenAI Chat Completions API format. It offers flexibility in modeling various LLM workflows and provides automatic tracing for easy tracking and debugging of agent behavior. The SDK is designed for developers to create deterministic flows, iterative loops, and more complex workflows.

GitHub stars: 8.1k

libedgetpu

This repository contains the source code for the userspace level runtime driver for Coral devices. The software is distributed in binary form at coral.ai/software. Users can build the library using Docker + Bazel, Bazel, or Makefile methods. It supports building on Linux, macOS, and Windows. The library is used to enable the Edge TPU runtime, which may heat up during operation. Google does not accept responsibility for any loss or damage if the device is operated outside the recommended ambient temperature range.

GitHub stars: 177

NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational applications. Guardrails (or "rails" for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more.

GitHub stars: 4.6k

ygo-agent

YGO Agent is a project focused on using deep learning to master the Yu-Gi-Oh! trading card game. It utilizes reinforcement learning and large language models to develop advanced AI agents that aim to surpass human expert play. The project provides a platform for researchers and players to explore AI in complex, strategic game environments.

GitHub stars: 55

2p-kt

2P-Kt is a Kotlin-based and multi-platform reboot of tuProlog (2P), a multi-paradigm logic programming framework written in Java. It consists of an open ecosystem for Symbolic Artificial Intelligence (AI) with modules supporting logic terms, unification, indexing, resolution of logic queries, probabilistic logic programming, binary decision diagrams, OR-concurrent resolution, DSL for logic programming, parsing modules, serialisation modules, command-line interface, and graphical user interface. The tool is designed to support knowledge representation and automatic reasoning through logic programming in an extensible and flexible way, encouraging extensions towards other symbolic AI systems than Prolog. It is a pure, multi-platform Kotlin project supporting JVM, JS, Android, and Native platforms, with a lightweight library leveraging the Kotlin common library.

GitHub stars: 86

For similar tasks

comfy-cli

Comfy-cli is a command line tool designed to facilitate the installation and management of ComfyUI, an open-source machine learning framework. Users can easily set up ComfyUI, install packages, and manage custom nodes directly from the terminal. The tool offers features such as easy installation, seamless package management, custom node management, checkpoint downloads, cross-platform compatibility, and comprehensive documentation. Comfy-cli simplifies the process of working with ComfyUI, making it convenient for users to handle various tasks related to the framework.

GitHub stars: 214

Jlama

Jlama is a modern Java inference engine designed for large language models. It supports various model types such as Gemma, Llama, Mistral, GPT-2, BERT, and more. The tool implements features like Flash Attention, Mixture of Experts, and supports different model quantization formats. Built with Java 21 and utilizing the new Vector API for faster inference, Jlama allows users to add LLM inference directly to their Java applications. The tool includes a CLI for running models, a simple UI for chatting with LLMs, and examples for different model types.

GitHub stars: 987

olah

Olah is a self-hosted lightweight Huggingface mirror service that implements mirroring feature for Huggingface resources at file block level, enhancing download speeds and saving bandwidth. It offers cache control policies and allows administrators to configure accessible repositories. Users can install Olah with pip or from source, set up the mirror site, and download models and datasets using huggingface-cli. Olah provides additional configurations through a configuration file for basic setup and accessibility restrictions. Future work includes implementing an administrator and user system, OOS backend support, and mirror update schedule task. Olah is released under the MIT License.

GitHub stars: 132

gemma

Gemma is a family of open-weights Large Language Model (LLM) by Google DeepMind, based on Gemini research and technology. This repository contains an inference implementation and examples, based on the Flax and JAX frameworks. Gemma can run on CPU, GPU, and TPU, with model checkpoints available for download. It provides tutorials, reference implementations, and Colab notebooks for tasks like sampling and fine-tuning. Users can contribute to Gemma through bug reports and pull requests. The code is licensed under the Apache License, Version 2.0.

GitHub stars: 3.1k

FireRedTTS

FireRedTTS is a foundation text-to-speech framework designed for industry-level generative speech applications. It offers a rich-punctuation model with expanded punctuation coverage and enhanced audio production consistency. The tool provides pre-trained checkpoints, inference code, and an interactive demo space. Users can clone the repository, create a conda environment, download required model files, and utilize the tool for synthesizing speech in various languages. FireRedTTS aims to enhance stability and provide controllable human-like speech generation capabilities.

GitHub stars: 313

ai-dev-gallery

The AI Dev Gallery is an app designed to help Windows developers integrate AI capabilities within their own apps and projects. It contains over 25 interactive samples powered by local AI models, allows users to explore, download, and run models from Hugging Face and GitHub, and provides the ability to view the C# source code and export a standalone Visual Studio project for each sample. The app is open-source and welcomes contributions and suggestions from the community.

GitHub stars: 926

Ling

Ling is a MoE LLM provided and open-sourced by InclusionAI. It includes two different sizes, Ling-Lite with 16.8 billion parameters and Ling-Plus with 290 billion parameters. These models show impressive performance and scalability for various tasks, from natural language processing to complex problem-solving. The open-source nature of Ling encourages collaboration and innovation within the AI community, leading to rapid advancements and improvements. Users can download the models from Hugging Face and ModelScope for different use cases. Ling also supports offline batched inference and online API services for deployment. Additionally, users can fine-tune Ling models using Llama-Factory for tasks like SFT and DPO.

GitHub stars: 119

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

GitHub stars: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

GitHub stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

GitHub stars: 94

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

GitHub stars: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
Self-contained, with no need for a DBMS or cloud service.
OpenAPI interface, easy to integrate with existing infrastructure (e.g. Cloud IDE).
Supports consumer-grade GPUs.

GitHub stars: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

GitHub stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

GitHub stars: 675