mflux
An MLX port of FLUX based on the Huggingface Diffusers implementation.
Stars: 723
MFLUX is a line-by-line port of the FLUX implementation in the Huggingface Diffusers library to Apple MLX. It aims to run powerful FLUX models from Black Forest Labs locally on Mac machines. The codebase is minimal and explicit, prioritizing readability over generality and performance. Models are implemented from scratch in MLX, with tokenizers from the Huggingface Transformers library. Dependencies include Numpy and Pillow for image post-processing. Installation can be done using `uv tool` or classic virtual environment setup. Command-line arguments allow for image generation with specified models, prompts, and optional parameters. Quantization options for speed and memory reduction are available. LoRA adapters can be loaded for fine-tuning image generation. Controlnet support provides more control over image generation with reference images. Current limitations include generating images one by one, lack of support for negative prompts, and some LoRA adapters not working.
README:
An MLX port of FLUX based on the Huggingface Diffusers implementation.
Run the powerful FLUX models from Black Forest Labs locally on your Mac!
- Philosophy
- Installation
- Generating an image
- Image generation speed (updated)
- Equivalent to Diffusers implementation
- Quantization
- Running a non-quantized model directly from disk
- LoRA
- Controlnet
- Current limitations
- TODO
- License
MFLUX is a line-by-line port of the FLUX implementation in the Huggingface Diffusers library to Apple MLX. MFLUX is purposefully kept minimal and explicit: network architectures are hardcoded and no config files are used except for the tokenizers. The aim is to have a tiny codebase with the single purpose of expressing these models (thereby avoiding too many abstractions). While MFLUX prioritizes readability over generality and performance, it can still be quite fast, and even faster when quantized.
All models are implemented from scratch in MLX and only the tokenizers are used via the Huggingface Transformers library. Other than that, there are only minimal dependencies like Numpy and Pillow for simple image post-processing.
For users, the easiest way to install MFLUX is to use `uv tool`. If you have installed `uv`, simply run:
uv tool install --upgrade mflux
to get `mflux-generate` and the related command-line executables. You can skip to the usage guides below.
For the classic way to create a user virtual environment:
mkdir -p mflux && cd mflux && python3 -m venv .venv && source .venv/bin/activate
This creates and activates a virtual environment in the `mflux` folder. After that, install MFLUX via pip:
pip install -U mflux
For contributors:
- Clone the repo:
git clone git@github.com:filipstrand/mflux.git
- Install the application
make install
- To run the test suite
make test
- Follow format and lint checks prior to submitting Pull Requests. The recommended `make lint` and `make format` install and use `ruff`. You can set up your editor/IDE to lint/format automatically, or use our provided `make` helpers:
  - `make format` - formats your code
  - `make lint` - shows your lint errors and warnings, but does not auto fix
  - `make check` - via `pre-commit` hooks, formats your code and attempts to auto fix lint errors
  - consult the official `ruff` documentation on advanced usages
Run the command `mflux-generate` by specifying a prompt, the model, and some optional arguments. For example, here we use a quantized version of the `schnell` model for 2 steps:
mflux-generate --model schnell --prompt "Luxury food photograph" --steps 2 --seed 2 -q 8
This example uses the more powerful `dev` model with 25 time steps:
mflux-generate --model dev --prompt "Luxury food photograph" --steps 25 --seed 2 -q 8
By default, model files are downloaded to the `.cache` folder within your home directory. For example, in my setup, the path looks like this:
/Users/filipstrand/.cache/huggingface/hub/models--black-forest-labs--FLUX.1-dev
To change this default behavior, modify the `HF_HOME` environment variable. For more details on how to adjust this setting, please refer to the Hugging Face documentation.
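As a rough illustration of how the cache location follows from `HF_HOME`, the sketch below rebuilds the path shown above. The helper name `hub_model_dir` is hypothetical, and the sketch assumes the usual Hugging Face layout (`<HF_HOME>/hub/models--<org>--<name>`); it ignores other overrides such as `HF_HUB_CACHE`, so treat it as an approximation rather than the library's exact logic.

```python
import os
from pathlib import Path


def hub_model_dir(repo_id: str) -> Path:
    """Return the folder where a Hugging Face Hub model is expected to be cached.

    Hypothetical helper for illustration; it only looks at HF_HOME.
    """
    # HF_HOME overrides the default cache root (~/.cache/huggingface).
    default_root = Path.home() / ".cache" / "huggingface"
    hf_home = Path(os.environ.get("HF_HOME", default_root)).expanduser()
    # Cached repos live under "hub" in folders named "models--<org>--<name>".
    return hf_home / "hub" / ("models--" + repo_id.replace("/", "--"))


if __name__ == "__main__":
    print(hub_model_dir("black-forest-labs/FLUX.1-dev"))
```

Running this with `HF_HOME` unset should print a path of the same form as the one above; exporting `HF_HOME=/some/other/disk` before running moves the expected location accordingly.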
- `--prompt` (required, `str`): Text description of the image to generate.
- `--model` or `-m` (required, `str`): Model to use for generation (`"schnell"` or `"dev"`).
- `--output` (optional, `str`, default: `"image.png"`): Output image filename.
- `--seed` (optional, `int`, default: `None`): Seed for random number generation. Default is time-based.
- `--height` (optional, `int`, default: `1024`): Height of the output image in pixels.
- `--width` (optional, `int`, default: `1024`): Width of the output image in pixels.
- `--steps` (optional, `int`, default: `4`): Number of inference steps.
- `--guidance` (optional, `float`, default: `3.5`): Guidance scale (only used for the `"dev"` model).
- `--path` (optional, `str`, default: `None`): Path to a local model on disk.
- `--quantize` or `-q` (optional, `int`, default: `None`): Quantization (choose between `4` or `8`).
- `--lora-paths` (optional, `[str]`, default: `None`): The paths to the LoRA weights.
- `--lora-scales` (optional, `[float]`, default: `None`): The scale for each respective LoRA (defaults to `1.0` if not specified and only one LoRA weight is loaded).
- `--metadata` (optional): Exports a `.json` file containing the metadata for the image with the same name. (Even without this flag, the image metadata is saved and can be viewed using `exiftool image.png`.)
- `--controlnet-image-path` (required, `str`): Path to the local image used by ControlNet to guide output generation.
- `--controlnet-strength` (optional, `float`, default: `0.4`): Degree of influence the control image has on the output. Ranges from `0.0` (no influence) to `1.0` (full influence).
- `--controlnet-save-canny` (optional, `bool`, default: `False`): If set, saves the Canny edge detection reference image used by ControlNet.
Or, with the correct python environment active, create and run a separate script like the following:
from mflux import Flux1, Config
# Load the model
flux = Flux1.from_alias(
    alias="schnell",  # "schnell" or "dev"
    quantize=8,  # 4 or 8
)
# Generate an image
image = flux.generate_image(
    seed=2,
    prompt="Luxury food photograph",
    config=Config(
        num_inference_steps=2,  # "schnell" works well with 2-4 steps, "dev" works well with 20-25 steps
        height=1024,
        width=1024,
    ),
)
image.save(path="image.png")
For more options on how to configure MFLUX, please see generate.py.
These numbers are based on the non-quantized `schnell` model, with the configuration provided in the code snippet below.
To time your machine, run the following:
time mflux-generate \
--prompt "Luxury food photograph" \
--model schnell \
--steps 2 \
--seed 2 \
--height 1024 \
--width 1024
Device | User | Reported Time | Notes |
---|---|---|---|
M3 Max | @karpathy | ~20s | |
M2 Ultra | @awni | <15s | |
2023 M2 Max (96GB) | @explorigin | ~25s | |
2021 M1 Pro (16GB) | @qw-in | ~175s | Might freeze your Mac |
2023 M3 Pro (36GB) | @kush-gupt | ~80s | |
2020 M1 (8GB) | @mbvillaverde | ~335s | With resolution 512 x 512 |
2022 M1 Max (64GB) | @BosseParra | ~55s | |
2021 M1 Pro (32GB) | @filipstrand | ~160s | |
2023 M2 Max (32GB) | @filipstrand | ~70s | |
Note that these numbers include starting the application from scratch, which means doing model I/O, setting/quantizing weights, etc.
To see the generation time with the model already loaded, inspect the image metadata using `exiftool image.png`, which reports the total duration of the denoising loop (excluding text embedding).
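For readers who prefer to collect end-to-end timings from a script rather than the shell's `time`, the following is a minimal sketch. The helper `time_command` is hypothetical, and like `time` it measures the whole process, including model loading. A stand-in command is used so the sketch runs on any machine.

```python
import subprocess
import sys
import time


def time_command(argv: list[str]) -> float:
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. On a Mac with MFLUX installed,
    # replace it with something like:
    # ["mflux-generate", "--prompt", "Luxury food photograph", "--model", "schnell", "--steps", "2", "--seed", "2"]
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f}s")
```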
There is only a single source of randomness when generating an image: the initial latent array.
In this implementation, this initial latent is fully determined by the input `seed` parameter.
However, if we import a fixed instance of this latent array saved from the Diffusers implementation, MFLUX produces an image identical to that of the Diffusers implementation (assuming a fixed prompt and the default parameter settings in the Diffusers setup).
The images below illustrate this equivalence. In all cases the Schnell model was run for 2 time steps. The Diffusers implementation ran in CPU mode. The precision for MFLUX can be set in the Config class. There is typically a noticeable but very small difference in the final image when switching between 16-bit and 32-bit precision.
Luxury food photograph
detailed cinematic dof render of an old dusty detailed CRT monitor on a wooden desk in a dim room with items around, messy dirty room. On the screen are the letters "FLUX" glowing softly. High detail hard surface render
photorealistic, lotr, A tiny red dragon curled up asleep inside a nest, (Soft Focus) , (f_stop 2.8) , (focal_length 50mm) macro lens f/2. 8, medieval wizard table, (pastel) colors, (cozy) morning light filtering through a nearby window, (whimsical) steam shapes, captured with a (Canon EOS R5) , highlighting (serene) comfort, medieval, dnd, rpg, 3d, 16K, 8K
A weathered fisherman in his early 60s stands on the deck of his boat, gazing out at a stormy sea. He has a thick, salt-and-pepper beard, deep-set blue eyes, and skin tanned and creased from years of sun exposure. He's wearing a yellow raincoat and hat, with water droplets clinging to the fabric. Behind him, dark clouds loom ominously, and waves crash against the side of the boat. The overall atmosphere is one of tension and respect for the power of nature.
Luxury food photograph of an italian Linguine pasta alle vongole dish with lots of clams. It has perfect lighting and a cozy background with big bokeh and shallow depth of field. The mood is a sunset balcony in tuscany. The photo is taken from the side of the plate. The pasta is shiny with sprinkled parmesan cheese and basil leaves on top. The scene is complemented by a warm, inviting light that highlights the textures and colors of the ingredients, giving it an appetizing and elegant look.
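To make the point about the single source of randomness concrete, here is a small sketch of a seed-controlled initial latent. It uses NumPy as a stand-in for MLX's random number generator, and both the function name `initial_latent` and the latent shape are assumptions chosen for illustration, not MFLUX's actual code.

```python
import numpy as np


def initial_latent(seed: int, height: int = 1024, width: int = 1024) -> np.ndarray:
    """Draw a Gaussian latent array that depends only on the seed and the image size."""
    rng = np.random.default_rng(seed)
    # Assumed shape for illustration: one token per 16x16 pixel patch, 64 channels each.
    shape = (1, (height // 16) * (width // 16), 64)
    return rng.standard_normal(shape, dtype=np.float32)


a = initial_latent(seed=2)
b = initial_latent(seed=2)
c = initial_latent(seed=3)
print(np.array_equal(a, b))  # same seed gives the same latent
print(np.array_equal(a, c))  # a different seed gives a different latent
```

Because everything downstream of the latent is deterministic, two implementations that start from the same array and use the same prompt and settings can be compared image for image, which is the comparison described above.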
MFLUX supports running FLUX in 4-bit or 8-bit quantized mode. Running a quantized version can greatly speed up the generation process and reduce the memory consumption by several gigabytes. Quantized models also take up less disk space.
mflux-generate \
--model schnell \
--steps 2 \
--seed 2 \
--quantize 8 \
--height 1920 \
--width 1024 \
--prompt "Tranquil pond in a bamboo forest at dawn, the sun is barely starting to peak over the horizon, panda practices Tai Chi near the edge of the pond, atmospheric perspective through the mist of morning dew, sunbeams, its movements are graceful and fluid - creating a sense of harmony and balance, the pond's calm waters reflecting the scene, inviting a sense of meditation and connection with nature, style of Howard Terpning and Jessica Rossier"
In this example, weights are quantized at runtime - this is convenient if you don't want to save a quantized copy of the weights to disk, but still want to benefit from the potential speedup and RAM reduction quantization might bring.
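As a rough picture of what quantizing weights at runtime involves, the sketch below implements a simplified group-wise affine quantizer in NumPy. This is an illustration of the general technique only: the function names and the group size of 64 are assumptions, the values are not bit-packed, and MFLUX relies on MLX's own quantization routines, whose details may differ.

```python
import numpy as np


def quantize(weights: np.ndarray, bits: int, group_size: int = 64):
    """Quantize a 1-D float array, group by group, to unsigned integers of `bits` bits."""
    groups = weights.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    levels = 2**bits - 1
    # One scale and one offset per group map [lo, hi] onto the integers 0..levels.
    scale = np.maximum(hi - lo, 1e-12) / levels
    q = np.round((groups - lo) / scale).astype(np.uint8)
    return q, scale, lo


def dequantize(q: np.ndarray, scale: np.ndarray, lo: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights from the quantized representation."""
    return (q * scale + lo).reshape(-1)


rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
for bits in (4, 8):
    q, scale, lo = quantize(w, bits)
    err = np.abs(dequantize(q, scale, lo) - w).max()
    print(f"{bits}-bit max abs error: {err:.4f}")
```

The 8-bit reconstruction error is much smaller than the 4-bit one, which is consistent with the 8-bit images being nearly indistinguishable from the non-quantized result.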
By setting the `--quantize` or `-q` flag to `4` or `8`, or removing it entirely, we get all 3 images above. As can be seen, there is very little difference between the images (especially between the 8-bit and the non-quantized result).
Image generation times in this example are based on a 2021 M1 Pro (32GB) machine. Even though the images are almost identical, there is a ~2x speedup from running the 8-bit quantized version on this particular machine. Unlike the non-quantized version, the 8-bit version drastically reduces swap memory usage, and GPU utilization is close to 100% during the whole generation. Results can vary across machines.
The model sizes for both `schnell` and `dev` at various quantization levels are as follows:
4 bit | 8 bit | Original (16 bit) |
---|---|---|
9.85GB | 18.16GB | 33.73GB |
The weight sizes are not fully cut in half because a small number of weights are not quantized and are kept at full precision.
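The gap between the ideal and the observed reduction can be read directly off the table. The short calculation below uses only the sizes listed above; the function name `compression_ratio` is introduced here for illustration.

```python
# Model sizes in GB, taken from the table above.
SIZES_GB = {4: 9.85, 8: 18.16, 16: 33.73}


def compression_ratio(bits: int) -> float:
    """Return how many times smaller the quantized model is than the 16-bit original."""
    return SIZES_GB[16] / SIZES_GB[bits]


for bits in (8, 4):
    ideal = 16 / bits
    print(f"{bits}-bit: {compression_ratio(bits):.2f}x smaller (ideal would be {ideal:.0f}x)")
# 8-bit: 1.86x smaller (ideal would be 2x)
# 4-bit: 3.42x smaller (ideal would be 4x)
```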
To save a local copy of the quantized weights, run the `mflux-save` command like so:
mflux-save \
--path "/Users/filipstrand/Desktop/schnell_8bit" \
--model schnell \
--quantize 8
Note that when saving a quantized version, you will need the original Hugging Face weights.
It is also possible to specify LoRA adapters when saving the model, e.g.:
mflux-save \
--path "/Users/filipstrand/Desktop/schnell_8bit" \
--model schnell \
--quantize 8 \
--lora-paths "/path/to/lora.safetensors" \
--lora-scales 0.7
When generating images with a model like this, no LoRA adapter needs to be specified, since it is already baked into the saved quantized weights.
To generate a new image from the quantized model, simply provide a `--path` to where it was saved:
mflux-generate \
--path "/Users/filipstrand/Desktop/schnell_8bit" \
--model schnell \
--steps 2 \
--seed 2 \
--height 1920 \
--width 1024 \
--prompt "Tranquil pond in a bamboo forest at dawn, the sun is barely starting to peak over the horizon, panda practices Tai Chi near the edge of the pond, atmospheric perspective through the mist of morning dew, sunbeams, its movements are graceful and fluid - creating a sense of harmony and balance, the pond's calm waters reflecting the scene, inviting a sense of meditation and connection with nature, style of Howard Terpning and Jessica Rossier"
Note: When loading a quantized model from disk, there is no need to pass the `-q` flag, since the quantization level is inferred from the weight metadata.
Also note: Once we have a local model (quantized or not) specified via the `--path` argument, the Hugging Face cache models are not required to launch the model.
In other words, you can reclaim the 34GB of disk space (per model) by deleting the full 16-bit model from the Hugging Face cache if you choose.
If you don't want to download the full models and quantize them yourself, the 4-bit weights are available here for a direct download:
MFLUX also supports running a non-quantized model directly from a custom location.
In the example below, the model is placed in `/Users/filipstrand/Desktop/schnell`:
mflux-generate \
--path "/Users/filipstrand/Desktop/schnell" \
--model schnell \
--steps 2 \
--seed 2 \
--prompt "Luxury food photograph"
Note that the `--model` flag must be set when loading a model from disk.
Also note that, unlike the typical `alias` way of initializing the model (which internally ensures that the required resources are downloaded), loading a model directly from disk requires the downloaded models to look like the following:
.
├── text_encoder
│   └── model.safetensors
├── text_encoder_2
│   ├── model-00001-of-00002.safetensors
│   └── model-00002-of-00002.safetensors
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── tokenizer_2
│   ├── special_tokens_map.json
│   ├── spiece.model
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── transformer
│   ├── diffusion_pytorch_model-00001-of-00003.safetensors
│   ├── diffusion_pytorch_model-00002-of-00003.safetensors
│   └── diffusion_pytorch_model-00003-of-00003.safetensors
└── vae
    └── diffusion_pytorch_model.safetensors
This mirrors how the resources are placed in the HuggingFace Repo for FLUX.1. Huggingface weights, unlike quantized ones exported directly from this project, have to be processed a bit differently, which is why we require this structure above.
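Before pointing `--path` at a folder, it can be handy to check that the folder matches the layout above. The following is a minimal sketch of such a check; the `missing_files` helper is hypothetical and not part of MFLUX, and the file list is copied from the tree shown above.

```python
from pathlib import Path

# Files expected in a local FLUX.1 model folder, following the tree above.
EXPECTED_FILES = [
    "text_encoder/model.safetensors",
    "text_encoder_2/model-00001-of-00002.safetensors",
    "text_encoder_2/model-00002-of-00002.safetensors",
    "tokenizer/merges.txt",
    "tokenizer/special_tokens_map.json",
    "tokenizer/tokenizer_config.json",
    "tokenizer/vocab.json",
    "tokenizer_2/special_tokens_map.json",
    "tokenizer_2/spiece.model",
    "tokenizer_2/tokenizer.json",
    "tokenizer_2/tokenizer_config.json",
    "transformer/diffusion_pytorch_model-00001-of-00003.safetensors",
    "transformer/diffusion_pytorch_model-00002-of-00003.safetensors",
    "transformer/diffusion_pytorch_model-00003-of-00003.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
]


def missing_files(model_dir: str) -> list[str]:
    """Return the expected files (relative paths) that are absent from model_dir."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_files("/Users/filipstrand/Desktop/schnell")` would return an empty list for a complete folder, and otherwise the relative paths that still need to be downloaded.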
MFLUX supports loading trained LoRA adapters (actual training support is coming).
The following example uses the The_Hound LoRA from @TheLastBen:
mflux-generate --prompt "sandor clegane" --model dev --steps 20 --seed 43 -q 8 --lora-paths "sandor_clegane_single_layer.safetensors"
The following example uses the Flux_1_Dev_LoRA_Paper-Cutout-Style LoRA from @Norod78:
mflux-generate --prompt "pikachu, Paper Cutout Style" --model schnell --steps 4 --seed 43 -q 8 --lora-paths "Flux_1_Dev_LoRA_Paper-Cutout-Style.safetensors"
Note that LoRA trained weights are typically trained with a trigger word or phrase. For example, in the latter case, the sentence should include the phrase "Paper Cutout Style".
Also note that the same LoRA weights can work well with both the `schnell` and `dev` models. Refer to the original LoRA repository to see which model it was trained for.
Multiple LoRAs can be sent in to combine the effects of the individual adapters. The following example combines both of the above LoRAs:
mflux-generate \
--prompt "sandor clegane in a forest, Paper Cutout Style" \
--model dev \
--steps 20 \
--seed 43 \
--lora-paths sandor_clegane_single_layer.safetensors Flux_1_Dev_LoRA_Paper-Cutout-Style.safetensors \
--lora-scales 1.0 1.0 \
-q 8
Just to see the difference, this image displays the four cases, ranging from both adapters fully active, through partially active, to no LoRA at all.
The example above also shows the usage of the `--lora-scales` flag.
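To give some intuition for what a per-adapter scale does, the sketch below applies the standard LoRA update, in which each adapter contributes a scaled low-rank product to a base weight matrix. It illustrates the general technique in NumPy; the function name `apply_loras` and the toy shapes are assumptions, and MFLUX's internal handling of adapters may differ.

```python
import numpy as np


def apply_loras(weight, adapters, scales):
    """Return weight + sum_i scale_i * (B_i @ A_i) for low-rank adapter pairs (A_i, B_i)."""
    merged = weight.copy()
    for (a, b), scale in zip(adapters, scales):
        merged += scale * (b @ a)
    return merged


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
# Two hypothetical rank-2 adapters: A has shape (rank, in), B has shape (out, rank).
adapters = [(rng.standard_normal((2, 8)), rng.standard_normal((8, 2))) for _ in range(2)]

full = apply_loras(w, adapters, scales=[1.0, 1.0])  # both adapters fully active
half = apply_loras(w, adapters, scales=[0.5, 0.5])  # both adapters partially active
none = apply_loras(w, adapters, scales=[0.0, 0.0])  # equivalent to no LoRA at all
```

In this formulation a scale of `0.0` leaves the base weights untouched and intermediate values blend the adapter in proportionally, which matches the idea of adapters being fully active, partially active, or absent.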
Since different fine-tuning services can use different implementations of FLUX, the corresponding LoRA weights trained on these services can be different from one another. The aim of MFLUX is to support the most common ones. The following table shows the currently supported formats:
Supported | Name | Example | Notes |
---|---|---|---|
✅ | BFL | civitai - Impressionism | Many things on civitai seem to work |
✅ | Diffusers | Flux_1_Dev_LoRA_Paper-Cutout-Style | |
❌ | XLabs-AI | flux-RealismLora | |
To report additional formats or examples, or to make any other suggestions related to LoRA format support, please see issue #47.
MFLUX has Controlnet support for even more fine-grained control of the image generation. By providing a reference image via `--controlnet-image-path` and a strength parameter via `--controlnet-strength`, you can guide the generation toward the reference image.
mflux-generate-controlnet \
--prompt "A comic strip with a joker in a purple suit" \
--model dev \
--steps 20 \
--seed 1727047657 \
--height 1066 \
--width 692 \
-q 8 \
--lora-paths "Dark Comic - s0_8 g4.safetensors" \
--controlnet-image-path "reference.png" \
--controlnet-strength 0.5 \
--controlnet-save-canny
This example combines the controlnet reference image with the LoRA Dark Comic Flux.
Note that Controlnet generation is run through the `mflux-generate-controlnet` command.
At the moment, the Controlnet used is InstantX/FLUX.1-dev-Controlnet-Canny, which was trained for the `dev` model.
It can work well with `schnell`, but performance is not guaranteed.
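Since the right `--controlnet-strength` depends on the reference image, it can help to try a few values in a row. The sketch below only builds the command lines for such a sweep, using flags from the argument list earlier in this README. The helper `controlnet_command` is hypothetical, and it assumes that `mflux-generate-controlnet` accepts `--output` in the same way as `mflux-generate`.

```python
import shlex


def controlnet_command(prompt: str, image_path: str, strength: float, seed: int = 2) -> list[str]:
    """Build the argument list for one mflux-generate-controlnet run."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return [
        "mflux-generate-controlnet",
        "--prompt", prompt,
        "--model", "dev",
        "--steps", "20",
        "--seed", str(seed),
        "-q", "8",
        "--controlnet-image-path", image_path,
        "--controlnet-strength", str(strength),
        # Assumption: --output behaves as it does for mflux-generate.
        "--output", f"controlnet_{strength:.1f}.png",
    ]


for strength in (0.2, 0.4, 0.6):
    cmd = controlnet_command("A comic strip with a joker in a purple suit", "reference.png", strength)
    print(shlex.join(cmd))
```

Each printed line can be pasted into a terminal, or the list can be passed to `subprocess.run(cmd, check=True)` on a machine where MFLUX is installed. Keeping the seed fixed across runs makes the effect of the strength easier to compare.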
Controlnet can also work well together with LoRA adapters. In the example below the same reference image is used as a controlnet input with different prompts and LoRA adapters active.
- Images are generated one by one.
- Negative prompts are not supported.
- LoRA weights are only supported for the transformer part of the network.
- Some LoRA adapters do not work.
- Currently, the supported controlnet is the canny-only version.
- [ ] LoRA fine-tuning (now also in mlx-examples for reference)
- [ ] Frontend support (Gradio/Streamlit/Other?)
- [ ] ComfyUI support?
- [ ] Image2Image support (upcoming)
- [ ] Support for PuLID
- [ ] Support for depth based controlnet via ml-depth-pro or similar?
This project is licensed under the MIT License.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for mflux
Similar Open Source Tools
mflux
MFLUX is a line-by-line port of the FLUX implementation in the Huggingface Diffusers library to Apple MLX. It aims to run powerful FLUX models from Black Forest Labs locally on Mac machines. The codebase is minimal and explicit, prioritizing readability over generality and performance. Models are implemented from scratch in MLX, with tokenizers from the Huggingface Transformers library. Dependencies include Numpy and Pillow for image post-processing. Installation can be done using `uv tool` or classic virtual environment setup. Command-line arguments allow for image generation with specified models, prompts, and optional parameters. Quantization options for speed and memory reduction are available. LoRA adapters can be loaded for fine-tuning image generation. Controlnet support provides more control over image generation with reference images. Current limitations include generating images one by one, lack of support for negative prompts, and some LoRA adapters not working.
generative-models
Generative Models by Stability AI is a repository that provides various generative models for research purposes. It includes models like Stable Video 4D (SV4D) for video synthesis, Stable Video 3D (SV3D) for multi-view synthesis, SDXL-Turbo for text-to-image generation, and more. The repository focuses on modularity and implements a config-driven approach for building and combining submodules. It supports training with PyTorch Lightning and offers inference demos for different models. Users can access pre-trained models like SDXL-base-1.0 and SDXL-refiner-1.0 under a CreativeML Open RAIL++-M license. The codebase also includes tools for invisible watermark detection in generated images.
rtdl-num-embeddings
This repository provides the official implementation of the paper 'On Embeddings for Numerical Features in Tabular Deep Learning'. It focuses on transforming scalar continuous features into vectors before integrating them into the main backbone of tabular neural networks, showcasing improved performance. The embeddings for continuous features are shown to enhance the performance of tabular DL models and are applicable to various conventional backbones, offering efficiency comparable to Transformer-based models. The repository includes Python packages for practical usage, exploration of metrics and hyperparameters, and reproducing reported results for different algorithms and datasets.
raft
RAFT (Reusable Accelerated Functions and Tools) is a C++ header-only template library with an optional shared library that contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
llm-analysis
llm-analysis is a tool designed for Latency and Memory Analysis of Transformer Models for Training and Inference. It automates the calculation of training or inference latency and memory usage for Large Language Models (LLMs) or Transformers based on specified model, GPU, data type, and parallelism configurations. The tool helps users to experiment with different setups theoretically, understand system performance, and optimize training/inference scenarios. It supports various parallelism schemes, communication methods, activation recomputation options, data types, and fine-tuning strategies. Users can integrate llm-analysis in their code using the `LLMAnalysis` class or use the provided entry point functions for command line interface. The tool provides lower-bound estimations of memory usage and latency, and aims to assist in achieving feasible and optimal setups for training or inference.
web-llm
WebLLM is a modular and customizable javascript package that directly brings language model chats directly onto web browsers with hardware acceleration. Everything runs inside the browser with no server support and is accelerated with WebGPU. WebLLM is fully compatible with OpenAI API. That is, you can use the same OpenAI API on any open source models locally, with functionalities including json-mode, function-calling, streaming, etc. We can bring a lot of fun opportunities to build AI assistants for everyone and enable privacy while enjoying GPU acceleration.
storm
STORM is a LLM system that writes Wikipedia-like articles from scratch based on Internet search. While the system cannot produce publication-ready articles that often require a significant number of edits, experienced Wikipedia editors have found it helpful in their pre-writing stage. **Try out our [live research preview](https://storm.genie.stanford.edu/) to see how STORM can help your knowledge exploration journey and please provide feedback to help us improve the system π!**
mentals-ai
Mentals AI is a tool designed for creating and operating agents that feature loops, memory, and various tools, all through straightforward markdown syntax. This tool enables you to concentrate solely on the agentβs logic, eliminating the necessity to compose underlying code in Python or any other language. It redefines the foundational frameworks for future AI applications by allowing the creation of agents with recursive decision-making processes, integration of reasoning frameworks, and control flow expressed in natural language. Key concepts include instructions with prompts and references, working memory for context, short-term memory for storing intermediate results, and control flow from strings to algorithms. The tool provides a set of native tools for message output, user input, file handling, Python interpreter, Bash commands, and short-term memory. The roadmap includes features like a web UI, vector database tools, agent's experience, and tools for image generation and browsing. The idea behind Mentals AI originated from studies on psychoanalysis executive functions and aims to integrate 'System 1' (cognitive executor) with 'System 2' (central executive) to create more sophisticated agents.
HuggingFaceGuidedTourForMac
HuggingFaceGuidedTourForMac is a guided tour on how to install optimized pytorch and optionally Apple's new MLX, JAX, and TensorFlow on Apple Silicon Macs. The repository provides steps to install homebrew, pytorch with MPS support, MLX, JAX, TensorFlow, and Jupyter lab. It also includes instructions on running large language models using HuggingFace transformers. The repository aims to help users set up their Macs for deep learning experiments with optimized performance.
generative-fusion-decoding
Generative Fusion Decoding (GFD) is a novel shallow fusion framework that integrates Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recognition (OCR). GFD operates across mismatched token spaces of different models by mapping text token space to byte token space, enabling seamless fusion during the decoding process. It simplifies the complexity of aligning different model sample spaces, allows LLMs to correct errors in tandem with the recognition model, increases robustness in long-form speech recognition, and enables fusing recognition models deficient in Chinese text recognition with LLMs extensively trained on Chinese. GFD significantly improves performance in ASR and OCR tasks, offering a unified solution for leveraging existing pre-trained models through step-by-step fusion.
lhotse
Lhotse is a Python library designed to make speech and audio data preparation flexible and accessible. It aims to attract a wider community to speech processing tasks by providing a Python-centric design and an expressive command-line interface. Lhotse offers standard data preparation recipes, PyTorch Dataset classes for speech tasks, and efficient data preparation for model training with audio cuts. It supports data augmentation, feature extraction, and feature-space cut mixing. The tool extends Kaldi's data preparation recipes with seamless PyTorch integration, human-readable text manifests, and convenient Python classes.
paxml
Pax is a framework to configure and run machine learning experiments on top of Jax.
datadreamer
DataDreamer is an advanced toolkit designed to facilitate the development of edge AI models by enabling synthetic data generation, knowledge extraction from pre-trained models, and creation of efficient and potent models. It eliminates the need for extensive datasets by generating synthetic datasets, leverages latent knowledge from pre-trained models, and focuses on creating compact models suitable for integration into any device and performance for specialized tasks. The toolkit offers features like prompt generation, image generation, dataset annotation, and tools for training small-scale neural networks for edge deployment. It provides hardware requirements, usage instructions, available models, and limitations to consider while using the library.
lantern
Lantern is an open-source PostgreSQL database extension designed to store vector data, generate embeddings, and handle vector search operations efficiently. It introduces a new index type called 'lantern_hnsw' for vector columns, which speeds up 'ORDER BY ... LIMIT' queries. Lantern utilizes the state-of-the-art HNSW implementation called usearch. Users can easily install Lantern using Docker, Homebrew, or precompiled binaries. The tool supports various distance functions, index construction parameters, and operator classes for efficient querying. Lantern offers features like embedding generation, interoperability with pgvector, parallel index creation, and external index graph generation. It aims to provide superior performance metrics compared to other similar tools and has a roadmap for future enhancements such as cloud-hosted version, hardware-accelerated distance metrics, industry-specific application templates, and support for version control and A/B testing of embeddings.
mergekit
Mergekit is a toolkit for merging pre-trained language models. It uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.
jina
Jina is a tool that allows users to build multimodal AI services and pipelines using cloud-native technologies. It provides a Pythonic experience for serving ML models and transitioning from local deployment to advanced orchestration frameworks like Docker-Compose, Kubernetes, or Jina AI Cloud. Users can build and serve models for any data type and deep learning framework, design high-performance services with easy scaling, serve LLM models while streaming their output, integrate with Docker containers via Executor Hub, and host on CPU/GPU using Jina AI Cloud. Jina also offers advanced orchestration and scaling capabilities, a smooth transition to the cloud, and easy scalability and concurrency features for applications. Users can deploy to their own cloud or system with Kubernetes and Docker Compose integration, and even deploy to JCloud for autoscaling and monitoring.
For similar tasks
mflux
MFLUX is a line-by-line port of the FLUX implementation in the Huggingface Diffusers library to Apple MLX. It aims to run powerful FLUX models from Black Forest Labs locally on Mac machines. The codebase is minimal and explicit, prioritizing readability over generality and performance. Models are implemented from scratch in MLX, with tokenizers from the Huggingface Transformers library. Dependencies include Numpy and Pillow for image post-processing. Installation can be done using `uv tool` or classic virtual environment setup. Command-line arguments allow for image generation with specified models, prompts, and optional parameters. Quantization options for speed and memory reduction are available. LoRA adapters can be loaded for fine-tuning image generation. Controlnet support provides more control over image generation with reference images. Current limitations include generating images one by one, lack of support for negative prompts, and some LoRA adapters not working.
mindsdb
MindsDB is a platform for customizing AI from enterprise data. You can create, serve, and fine-tune models in real-time from your database, vector store, and application data. MindsDB "enhances" SQL syntax with AI capabilities to make it accessible for developers worldwide. With MindsDBβs nearly 200 integrations, any developer can create AI customized for their purpose, faster and more securely. Their AI systems will constantly improve themselves β using companiesβ own data, in real-time.
training-operator
Kubeflow Training Operator is a Kubernetes-native project for fine-tuning and scalable distributed training of machine learning (ML) models created with various ML frameworks such as PyTorch, Tensorflow, XGBoost, MPI, Paddle and others. Training Operator allows you to use Kubernetes workloads to effectively train your large models via Kubernetes Custom Resources APIs or using Training Operator Python SDK. > Note: Before v1.2 release, Kubeflow Training Operator only supports TFJob on Kubernetes. * For a complete reference of the custom resource definitions, please refer to the API Definition. * TensorFlow API Definition * PyTorch API Definition * Apache MXNet API Definition * XGBoost API Definition * MPI API Definition * PaddlePaddle API Definition * For details of all-in-one operator design, please refer to the All-in-one Kubeflow Training Operator * For details on its observability, please refer to the monitoring design doc.
helix
HelixML is a private GenAI platform that allows users to deploy the best of open AI in their own data center or VPC while retaining complete data security and control. It includes support for fine-tuning models with drag-and-drop functionality. HelixML brings the best of open source AI to businesses in an ergonomic and scalable way, optimizing the tradeoff between GPU memory and latency.
nntrainer
NNtrainer is a software framework for training neural network models on devices with limited resources. It enables on-device fine-tuning of neural networks using user data for personalization. NNtrainer supports various machine learning algorithms and provides examples for tasks such as few-shot learning, ResNet, VGG, and product rating. It is optimized for embedded devices and utilizes CBLAS and CUBLAS for accelerated calculations. NNtrainer is open source and released under the Apache License version 2.0.
petals
Petals is a tool that allows users to run large language models at home in a BitTorrent-style manner. It enables fine-tuning and inference up to 10x faster than offloading. Users can generate text with distributed models like Llama 2, Falcon, and BLOOM, and fine-tune them for specific tasks directly from their desktop computer or Google Colab. Petals is a community-run system that relies on people sharing their GPUs to increase its capacity and offer a distributed network for hosting model layers.
LLaVA-pp
This repository, LLaVA++, extends the visual capabilities of the LLaVA 1.5 model by incorporating the latest LLMs: Phi-3 Mini Instruct 3.8B and LLaMA-3 Instruct 8B. It provides various models for instruction-following LMMs and academic-task-oriented datasets, along with training scripts for Phi-3-V and LLaMA-3-V. The repository also includes installation instructions and acknowledgments to related open-source contributions.
KULLM
KULLM (구름) is a Korean Large Language Model developed by Korea University NLP & AI Lab and HIAI Research Institute. It is based on the upstage/SOLAR-10.7B-v1.0 model and has been fine-tuned for instruction. The model has been trained on 8×A100 GPUs and is capable of generating responses in Korean. KULLM exhibits hallucination and repetition phenomena due to its decoding strategy. Users should be cautious as the model may produce inaccurate or harmful results. Performance may vary in benchmarks without a fixed system prompt.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline are doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: it is self-contained, with no need for a DBMS or cloud service; it has an OpenAPI interface that is easy to integrate with existing infrastructure (e.g. a Cloud IDE); and it supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.