mergekit

Tools for merging pretrained large language models.

Mergekit is a toolkit for merging pre-trained language models. It uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.

Why Merge Models?
Features
Installation
Usage
Merge Configuration
Merge Methods
LoRA extraction
Mixture of Experts merging
Evolutionary merge methods
Merge in the Cloud
Citation

Why Merge Models?

Model merging is a powerful technique that allows combining the strengths of different models without the computational overhead of ensembling or the need for additional training. By operating directly in the weight space of models, merging can:

Combine multiple specialized models into a single versatile model
Transfer capabilities between models without access to training data
Find optimal trade-offs between different model behaviors
Improve performance while maintaining inference costs
Create new capabilities through creative model combinations

Unlike traditional ensembling which requires running multiple models, merged models maintain the same inference cost as a single model while often achieving comparable or superior performance.

Features

Key features of mergekit include:

Supports Llama, Mistral, GPT-NeoX, StableLM, and more
Many merge methods
GPU or CPU execution
Lazy loading of tensors for low memory use
Interpolated gradients for parameter values (inspired by Gryphe's BlockMerge_Gradient script)
Piecewise assembly of language models from layers ("Frankenmerging")
Mixture of Experts merging
LoRA extraction
Evolutionary merge methods

🌐 GUI Launch Alert 🤗 - We are excited to announce the launch of a mega-GPU backed graphical user interface for mergekit in Arcee! This GUI simplifies the merging process, making it more accessible to a broader audience. Check it out and contribute at the Arcee App. There is also a Hugging Face Space with a limited number of GPUs.

Installation

git clone https://github.com/arcee-ai/mergekit.git
cd mergekit

pip install -e .  # install the package and make scripts available

If the above fails with the following error:

ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode:
(A "pyproject.toml" file was found, but editable mode currently requires a setuptools-based build.)

You may need to upgrade pip to a version newer than 21.3 with the command python3 -m pip install --upgrade pip.

Usage

The script mergekit-yaml is the main entry point for mergekit. It takes a YAML configuration file and an output path, like so:

mergekit-yaml path/to/your/config.yml ./output-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]

This will run the merge and write your merged model to ./output-model-directory.

For more information on the arguments accepted by mergekit-yaml, run the command mergekit-yaml --help.

Uploading to Hugging Face

When you have a merged model you're happy with, you may want to share it on the Hugging Face Hub. mergekit generates a README.md for your merge with some basic information for a model card. You can edit it to include more details about your merge, like giving it a good name or explaining what it's good at; rewrite it entirely; or use the generated README.md as-is. It is also possible to edit your README.md online once it has been uploaded to the Hub.

Once you're happy with your model card and merged model, you can upload it to the Hugging Face Hub using the huggingface_hub Python library.

# log in to huggingface with an access token (must have write permission)
huggingface-cli login
# upload your model
huggingface-cli upload your_hf_username/my-cool-model ./output-model-directory .

The documentation for huggingface_hub goes into more detail about other options for uploading.

Merge Configuration

Merge configurations are YAML documents specifying the operations to perform in order to produce your merged model. Below are the primary elements of a configuration file:

merge_method: Specifies the method to use for merging models. See Merge Methods for a list.
slices: Defines slices of layers from different models to be used. This field is mutually exclusive with models.
models: Defines entire models to be used for merging. This field is mutually exclusive with slices.
base_model: Specifies the base model used in some merging methods.
parameters: Holds various parameters such as weights and densities, which can also be specified at different levels of the configuration.
dtype: Specifies the data type used for the merging operation.
tokenizer or tokenizer_source: Determines how to construct a tokenizer for the merged model.
chat_template: Specifies a chat template for the merged model.

Parameter Specification

Parameters are flexible and can be set with varying precedence. They can be specified conditionally using tensor name filters, which allows finer control such as differentiating between attention heads and fully connected layers.

Parameters can be specified as:

Scalars: Single floating-point values.
Gradients: List of floating-point values, specifying an interpolated gradient.
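
As a rough illustration of what an interpolated gradient means, the sketch below expands a short list of anchor values into one value per layer by linear interpolation. The function name `expand_gradient` and the even spacing of anchors across the layer range are assumptions made for this example, not a description of mergekit's internals.

```python
def expand_gradient(anchors, num_layers):
    """Linearly interpolate a list of anchor values across num_layers layers.

    Assumption: anchors are spaced evenly from the first layer to the last.
    """
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer along the anchor list, in [0, len(anchors) - 1].
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return values


print(expand_gradient([0.0, 1.0, 0.0], 5))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

A scalar behaves like a one-element gradient: every layer receives the same value.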

The parameters can be set at different levels, with decreasing precedence as follows:

slices.*.sources.parameters - applying to a specific input slice
slices.*.parameters - applying to a specific output slice
models.*.parameters or input_model_parameters - applying to any tensors coming from specific input models
parameters - catchall
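
The precedence order above can be pictured as a lookup that walks the levels from most to least specific and returns the first value it finds. This is a hypothetical sketch: `resolve_parameter` and the plain dictionaries are illustrative stand-ins, not mergekit's configuration classes.

```python
def resolve_parameter(name, *levels, default=None):
    """Return the value of `name` from the first level that defines it.

    `levels` are dicts ordered from highest to lowest precedence.
    """
    for level in levels:
        if level is not None and name in level:
            return level[name]
    return default


# Illustrative values only; each dict stands for one configuration level.
source_slice_params = {}                               # slices.*.sources.parameters
output_slice_params = {"weight": 0.7}                  # slices.*.parameters
input_model_params = {"weight": 0.5, "density": 0.6}   # input_model_parameters
global_params = {"density": 0.9, "normalize": True}    # parameters (catchall)

levels = (source_slice_params, output_slice_params, input_model_params, global_params)
print(resolve_parameter("weight", *levels))     # 0.7
print(resolve_parameter("density", *levels))    # 0.6
print(resolve_parameter("normalize", *levels))  # True
```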

Tokenizer Configuration

The tokenizer behavior can be configured in two ways: using the new tokenizer field (recommended) or the legacy tokenizer_source field (maintained for backward compatibility). These fields are mutually exclusive - you should use one or the other, not both.

Modern Configuration (tokenizer)

The tokenizer field provides fine-grained control over vocabulary and embeddings:

tokenizer:
  source: "union"  # or "base" or a specific model path
  tokens:          # Optional: configure specific tokens
    <token_name>:
      source: ...  # Specify embedding source
      force: false # Optional: force this embedding for all models
  pad_to_multiple_of: null  # Optional: pad vocabulary size

Tokenizer Source

The source field determines the vocabulary of the output model:

union: Combine vocabularies from all input models (default)
base: Use vocabulary from the base model
"path/to/model": Use vocabulary from a specific model

Token Embedding Handling

When merging models with different vocabularies, mergekit uses smart defaults to handle token embeddings:

If a token exists in the base model, its embedding is used as the default
If only one model has the token, that model's embedding is used
Otherwise, an average of all available embeddings is used
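
The three default rules can be sketched as a small function. The data layout here (a dictionary mapping model names to token embeddings stored as Python lists) and the name `default_embedding` are assumptions for illustration; mergekit operates on real tensors and its implementation may differ in detail.

```python
def default_embedding(token, base_model, embeddings):
    """Pick a default embedding for `token`.

    `embeddings` maps model name -> {token: vector}; `base_model` may be None.
    """
    available = {m: e[token] for m, e in embeddings.items() if token in e}
    if not available:
        raise KeyError(f"no input model has an embedding for {token!r}")
    if base_model in available:
        return available[base_model]            # rule 1: the base model's embedding
    if len(available) == 1:
        return next(iter(available.values()))   # rule 2: the only model that has it
    vectors = list(available.values())
    # rule 3: element-wise average of all available embeddings
    return [sum(column) / len(vectors) for column in zip(*vectors)]


embeddings = {
    "base": {"hello": [1.0, 1.0]},
    "model_a": {"hello": [3.0, 3.0], "<|im_start|>": [2.0, 0.0]},
    "model_b": {"<|im_start|>": [4.0, 2.0], "<|extra|>": [5.0, 5.0]},
}
print(default_embedding("hello", "base", embeddings))         # [1.0, 1.0]
print(default_embedding("<|extra|>", "base", embeddings))     # [5.0, 5.0]
print(default_embedding("<|im_start|>", "base", embeddings))  # [3.0, 1.0]
```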

You can override these defaults for specific tokens:

tokenizer:
  source: union
  tokens:
    # Use embedding from a specific model
    <|im_start|>:
      source: "path/to/chatml/model"

    # Force a specific embedding for all models
    <|special|>:
      source: "path/to/model"
      force: true

    # Map a token to another model's token embedding
    <|renamed_token|>:
      source:
        kind: "model_token"
        model: "path/to/model"
        token: "<|original_token|>"  # or use token_id: 1234

Practical Example

Here's how you might preserve both Llama 3 Instruct and ChatML prompt formats when merging models:

tokenizer:
  source: union
  tokens:
    # ChatML tokens
    <|im_start|>:
      source: "chatml_model"
    <|im_end|>:
      source: "chatml_model"

    # Llama 3 tokens - force original embeddings
    <|start_header_id|>:
      source: "llama3_model"
      force: true
    <|end_header_id|>:
      source: "llama3_model"
      force: true
    <|eot_id|>:
      source: "llama3_model"
      force: true

Legacy Configuration (tokenizer_source)

For backward compatibility, the tokenizer_source field is still supported:

tokenizer_source: "union"  # or "base" or a model path

This provides basic tokenizer selection but lacks the fine-grained control of the modern tokenizer field.

Chat Template Configuration

The optional chat_template field allows overriding the chat template used for the merged model.

chat_template: "auto"  # or a template name or Jinja2 template

Options include:

"auto": Automatically select the most common template among input models
Built-in templates: "alpaca", "chatml", "llama3", "mistral", "exaone"
A Jinja2 template string for custom formatting

Examples

Several examples of merge configurations are available in examples/.

Merge Methods

A quick overview of the currently supported merge methods:

| Method | `merge_method` value | Multi-Model | Uses base model |
| --- | --- | --- | --- |
| Linear (Model Soups) | `linear` | ✅ | ❌ |
| SLERP | `slerp` | ❌ | ✅ |
| Nearswap | `nearswap` | ❌ | ✅ |
| Task Arithmetic | `task_arithmetic` | ✅ | ✅ |
| TIES | `ties` | ✅ | ✅ |
| DARE TIES | `dare_ties` | ✅ | ✅ |
| DARE Task Arithmetic | `dare_linear` | ✅ | ✅ |
| Passthrough | `passthrough` | ❌ | ❌ |
| Model Breadcrumbs | `breadcrumbs` | ✅ | ✅ |
| Model Breadcrumbs + TIES | `breadcrumbs_ties` | ✅ | ✅ |
| Model Stock | `model_stock` | ✅ | ✅ |
| NuSLERP | `nuslerp` | ❌ | ✅ |
| DELLA | `della` | ✅ | ✅ |
| DELLA Task Arithmetic | `della_linear` | ✅ | ✅ |
| SCE | `sce` | ✅ | ✅ |

Linear

The classic merge method - a simple weighted average.

Parameters:

weight - relative (or absolute if normalize=False) weighting of a given tensor
normalize - if true, the weights of all models contributing to a tensor will be normalized. This is the default behavior.
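
To make the effect of normalize concrete, here is a minimal sketch of a weighted average over flat Python lists standing in for tensors. The function `linear_merge` is a name chosen for this example, not part of mergekit's API.

```python
def linear_merge(tensors, weights, normalize=True):
    """Weighted average of same-shaped flat tensors (lists of floats)."""
    if len(tensors) != len(weights):
        raise ValueError("need one weight per tensor")
    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights sum to zero; cannot normalize")
        # Relative weights: rescale so they sum to 1.
        weights = [w / total for w in weights]
    return [sum(w * x for w, x in zip(weights, column)) for column in zip(*tensors)]


a = [1.0, 2.0]
b = [3.0, 6.0]
print(linear_merge([a, b], [1.0, 3.0]))                   # [2.5, 5.0]
print(linear_merge([a, b], [1.0, 3.0], normalize=False))  # [10.0, 20.0]
```

With normalization the weights 1 and 3 act as 0.25 and 0.75; without it they are applied as given, so the result is a weighted sum rather than an average.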

SLERP

Spherically interpolate the parameters of two models. One must be set as base_model.

Parameters:

t - interpolation factor. At t=0 the result is base_model; at t=1 it is the other model.
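
For readers unfamiliar with spherical interpolation, the following is a textbook sketch on flat Python lists. It illustrates the general technique under simplifying assumptions (tensors flattened to vectors, a linear fallback for degenerate cases) and is not copied from mergekit's implementation.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between flat vectors v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # Degenerate input: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Angle between the two directions, with the cosine clamped for safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly colinear vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


base = [1.0, 0.0]
other = [0.0, 1.0]
halfway = slerp(0.5, base, other)
print(halfway)  # both components are close to 0.7071
```

Unlike a plain average, which would give [0.5, 0.5] here, the halfway point stays on the unit circle.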

Nearswap

Interpolates base model with secondary model if similarity is below t. Accepts two models.

Parameters:

t - similarity threshold

Task Arithmetic

Computes "task vectors" for each model by subtracting a base model. Merges the task vectors linearly and adds back the base. Works great for models that were fine tuned from a common ancestor. Also a super useful mental framework for several of the more involved merge methods.

Parameters: same as Linear, plus:

lambda - scaling factor applied after weighted sum of task vectors
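
The description above can be sketched in a few lines on flat Python lists. The function name `task_arithmetic` and the argument `lam` (standing in for lambda, which is a reserved word in Python) are choices made for this example; for simplicity the weights are applied as given, without normalization.

```python
def task_arithmetic(base, models, weights, lam=1.0):
    """Return base + lam * (weighted sum of task vectors)."""
    # A task vector is the difference between a fine-tuned model and the base.
    task_vectors = [[m - b for m, b in zip(model, base)] for model in models]
    merged_delta = [
        sum(w * d for w, d in zip(weights, column)) for column in zip(*task_vectors)
    ]
    return [b + lam * d for b, d in zip(base, merged_delta)]


base = [1.0, 1.0]
model_a = [2.0, 1.0]  # changed only the first parameter
model_b = [1.0, 3.0]  # changed only the second parameter
print(task_arithmetic(base, [model_a, model_b], [1.0, 1.0], lam=0.5))  # [1.5, 2.0]
```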

TIES

Builds on the task arithmetic framework. Resolves interference between models by sparsifying the task vectors and applying a sign consensus algorithm. Allows you to merge a larger number of models and retain more of their strengths.

Parameters: same as Task Arithmetic, plus:

density - fraction of weights in differences from the base model to retain
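
A minimal sketch of the two ideas named above, sparsification by magnitude and sign consensus, on flat Python lists. The function `ties_merge`, the rounding used to turn density into a count, and the choice to average agreeing entries by their weights are assumptions made for illustration rather than a transcription of mergekit's code.

```python
def ties_merge(base, models, weights, density, lam=1.0):
    """Trim each task vector to its largest entries, elect signs, then merge."""
    n = len(base)
    keep = max(1, round(density * n))
    deltas = []
    for model in models:
        delta = [m - b for m, b in zip(model, base)]
        # Indices of the `keep` largest-magnitude differences from the base.
        top = set(sorted(range(n), key=lambda i: abs(delta[i]), reverse=True)[:keep])
        deltas.append([d if i in top else 0.0 for i, d in enumerate(delta)])

    merged = []
    for i in range(n):
        weighted = [w * d[i] for w, d in zip(weights, deltas)]
        # Sign consensus: the sign of the weighted sum wins for this parameter.
        sign = 1.0 if sum(weighted) >= 0 else -1.0
        agreeing = [(w, x) for w, x in zip(weights, weighted) if x * sign > 0]
        total_weight = sum(w for w, _ in agreeing)
        delta_i = sum(x for _, x in agreeing) / total_weight if total_weight else 0.0
        merged.append(base[i] + lam * delta_i)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.1, 2.0, 0.0]
model_b = [-0.5, 0.05, 3.0, 0.2]
print(ties_merge(base, [model_a, model_b], [1.0, 1.0], density=0.5))
# [1.0, 0.0, 2.5, 0.0]
```

In the first position the two models disagree in sign, so only the winning side (model_a) contributes; in the third they agree and are averaged.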

DARE

In the same vein as TIES, sparsifies task vectors to reduce interference. Differs in that DARE uses random pruning with a novel rescaling to better match performance of the original models. DARE can be used either with the sign consensus algorithm of TIES (dare_ties) or without (dare_linear).

Parameters: same as TIES for dare_ties, or Linear for dare_linear
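
The random pruning and rescaling step can be sketched as follows. Each entry of a task vector is kept with probability density, and survivors are divided by density so that the expected value of every entry is unchanged. The helper name `dare_prune` and the use of Python's `random.Random` are assumptions for this example.

```python
import random


def dare_prune(delta, density, rng):
    """Randomly keep each entry with probability `density`, rescaling survivors."""
    if not 0 < density <= 1:
        raise ValueError("density must be in (0, 1]")
    # Dividing by density compensates for the entries that were dropped.
    return [d / density if rng.random() < density else 0.0 for d in delta]


rng = random.Random(0)
delta = [1.0] * 10_000
pruned = dare_prune(delta, density=0.5, rng=rng)
mean = sum(pruned) / len(pruned)
print(f"mean after pruning: {mean:.3f}")  # close to 1.0, the original mean
```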

Passthrough

passthrough is a no-op that simply passes input tensors through unmodified. It is meant to be used for layer-stacking type merges where you have only one input model. Useful for frankenmerging.

Model Breadcrumbs

An extension of task arithmetic that discards both small and extremely large differences from the base model. As with DARE, the Model Breadcrumbs algorithm can be used with (breadcrumbs_ties) or without (breadcrumbs) the sign consensus algorithm of TIES.

Parameters: same as Task Arithmetic, plus:

density - fraction of weights in differences from the base model to retain
gamma - fraction of largest magnitude differences to remove

Note that gamma corresponds with the parameter β described in the paper, while density is the final density of the sparsified tensors (related to γ and β by density = 1 - γ - β). For good default values, try density: 0.9 and gamma: 0.01.
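
The relationship between density and gamma can be illustrated with a small masking function: the largest gamma fraction of differences is removed, along with the smallest 1 - density - gamma fraction, leaving roughly density of the entries. This sketch works on flat Python lists; the name `breadcrumbs_mask` and the rounding of fractions to counts are assumptions for illustration.

```python
def breadcrumbs_mask(delta, density, gamma):
    """Zero out the largest and smallest differences, keeping about `density` of them."""
    n = len(delta)
    order = sorted(range(n), key=lambda i: abs(delta[i]))  # ascending magnitude
    drop_top = round(gamma * n)
    drop_bottom = max(0, round((1 - density - gamma) * n))
    keep = set(order[drop_bottom:n - drop_top])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


delta = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -5.0]
print(breadcrumbs_mask(delta, density=0.6, gamma=0.2))
# [0.0, 0.0, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.0, 0.0]
```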

Model Stock

Uses some neat geometric properties of fine tuned models to compute good weights for linear interpolation. Requires at least three models, including a base model.

Parameters:

filter_wise: if true, weight calculation will be per-row rather than per-tensor. Not recommended.

NuSLERP

Spherically interpolate between parameters, but with more options and more sensible configuration! Does not require a base model, but can use one to do spherical interpolation of task vectors. Only works with either two models or two plus a base model.

Parameters:

weight: relative weighting of a given tensor
nuslerp_flatten: set to false to do row-wise/column-wise interpolation instead of treating tensors as vectors
nuslerp_row_wise: SLERP row vectors instead of column vectors

To replicate the behavior of the original slerp method, set weight to 1-t and t for your first and second model respectively.

DELLA

Building upon DARE, DELLA uses adaptive pruning based on parameter magnitudes. DELLA first ranks parameters in each row of delta parameters and assigns drop probabilities inversely proportional to their magnitudes. This allows it to retain more important changes while reducing interference. After pruning, it rescales the remaining parameters, similar to DARE. DELLA can be used with (della) or without (della_linear) the sign election step of TIES.

Parameters: same as Task Arithmetic, plus:

density - fraction of weights in differences from the base model to retain
epsilon - maximum change in drop probability based on magnitude. Drop probabilities assigned will range from density - epsilon to density + epsilon. (When selecting values for density and epsilon, ensure that the range of probabilities falls within 0 to 1)

SCE

SCE introduces adaptive matrix-level merging weights based on parameter variances. SCE first selects the top-k% elements from each parameter matrix that exhibit high variance across all delta parameters. Following this selection, SCE calculates matrix-level merging weights based on the sum of squares of elements in the delta parameters. Finally, it erases minority elements, a step similar to the sign election process in TIES.

Parameters: same as TIES, plus:

select_topk - fraction of elements with the highest variance in the delta parameters to retain.

LoRA extraction

Mergekit allows extracting PEFT-compatible low-rank approximations of finetuned models.

Usage

mergekit-extract-lora --model finetuned_model_id_or_path --base-model base_model_id_or_path --out-path output_path [--no-lazy-unpickle] [--cuda] [--max-rank=desired_rank] [--sv-epsilon=tol]

Mixture of Experts merging

The mergekit-moe script supports merging multiple dense models into a mixture of experts, either for direct use or for further training. For more details see the mergekit-moe documentation.

Evolutionary merge methods

See docs/evolve.md for details.

✨ Merge in the Cloud ✨

We host merging on Arcee's cloud GPUs. You can launch a cloud merge in the Arcee App, or through Python after obtaining an ARCEE_API_KEY:

export ARCEE_API_KEY=<your-api-key>
pip install -q arcee-py

import arcee
arcee.merge_yaml("bio-merge","./examples/bio-merge.yml")

Check your merge status at the Arcee App

When complete, either deploy your merge:

arcee.start_deployment("bio-merge", merging="bio-merge")

Or download your merge:

!arcee merging download bio-merge

Citation

If you find mergekit useful in your research, please consider citing the paper:

@inproceedings{goddard-etal-2024-arcees,
    title = "Arcee{'}s {M}erge{K}it: A Toolkit for Merging Large Language Models",
    author = "Goddard, Charles  and
      Siriwardhana, Shamane  and
      Ehghaghi, Malikeh  and
      Meyers, Luke  and
      Karpukhin, Vladimir  and
      Benedict, Brian  and
      McQuade, Mark  and
      Solawetz, Jacob",
    editor = "Dernoncourt, Franck  and
      Preo{\c{t}}iuc-Pietro, Daniel  and
      Shimorina, Anastasia",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    month = nov,
    year = "2024",
    address = "Miami, Florida, US",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-industry.36",
    doi = "10.18653/v1/2024.emnlp-industry.36",
    pages = "477--485",
    abstract = "The rapid growth of open-source language models provides the opportunity to merge model checkpoints, combining their parameters to improve performance and versatility. Advances in transfer learning have led to numerous task-specific models, which model merging can integrate into powerful multitask models without additional training. MergeKit is an open-source library designed to support this process with an efficient and extensible framework suitable for any hardware. It has facilitated the merging of thousands of models, contributing to some of the world{'}s most powerful open-source model checkpoints. The library is accessible at: https://github.com/arcee-ai/mergekit.",
}
