Awesome-Model-Merging-Methods-Theories-Applications

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.

Stars: 347


A repository accompanying the survey 'Model Merging in LLMs, MLLMs, and Beyond', giving an overview of model merging methods, theories, applications, and future research directions. It covers advanced methods (pre-merging approaches such as architecture transformation and weight alignment, basic merging methods, and more), applications in foundation models, and applications in other machine learning subfields.

README:

Awesome-Model-Merging-Methods-Theories-Applications

A comprehensive list of papers accompanying the survey 'Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities' (arXiv, 2024).

[!IMPORTANT] Contributions welcome:

If you have a relevant paper that is not included in this list, or would like to clarify how a paper is described, please contact us or submit a pull request directly. Thank you!

If you think your paper belongs in a different category, please contact us or submit a pull request. If your paper has since been accepted, please consider updating its venue information. Thank you!

💥 News 💥

🔥🔥🔥 We have marked the papers whose experiments use models with $\geq$ 7B parameters.

Abstract

Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. To address this gap, this survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions.

Citation

If you find our paper or this resource helpful, please consider citing it:

@article{Survery_ModelMerging_2024,
  title={Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities},
  author={Yang, Enneng and Shen, Li and Guo, Guibing and Wang, Xingwei and Cao, Xiaochun and Zhang, Jie and Tao, Dacheng},
  journal={arXiv preprint arXiv:2408.07666},
  year={2024}
}

Thanks!

Framework


Survey

Paper Title	Year	Conference/Journal
From Task-Specific Models to Unified Systems: A Review of Model Merging Approaches	2025	Arxiv
SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques	2024	Arxiv
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities	2024	Arxiv
A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning	2024	Arxiv
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models	2024	Arxiv
Learn From Model Beyond Fine-Tuning: A Survey	2023	Arxiv
Deep Model Fusion: A Survey	2023	Arxiv

Benchmark/Evaluation

Paper Title	Year	Conference/Journal	Remark
Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging	2025	Arxiv	Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.2
How to Merge Your Multimodal Models Over Time?	2024	Arxiv
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning	2024	Arxiv	Aya 23 8B
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models	2024	Arxiv	LLaMA3-8B-Instruct, Qwen2-7B-Instruct, Mistral-7B-Instruct-v0.3
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild	2024	NeurIPS Track on Datasets and Benchmarks	Synthia-7B-v1.2, Llama-2-7b-evolcodealpaca, OpenHermes-7B, pygmalion-2-7b, Llama-2-7b-chat-hf, BeingWell_llama2_7b, MetaMath-7B-V1.0, vicuna-7b-v1.5, Platypus2-7B, GOAT-7B-Community, Llama-2-7b-WikiChat-fused, dolphin-llama2-7b, MetaMath-Llemma-7B, CodeLlama-7b-Instruct-hf, Magicoder-S-CL-7B, CrystalChat
What Matters for Model Merging at Scale?	2024	Arxiv	PaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B)
Realistic Evaluation of Model Merging for Compositional Generalization	2024	Arxiv
Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities	2024	Arxiv	Llama-3.1-8B, Mistral-7B-v0.3
FusionBench: A Comprehensive Benchmark of Deep Model Fusion	2024	Arxiv
Arcee's MergeKit: A Toolkit for Merging Large Language Models	2024	Arxiv	Llama2-7B-Chat, Meditron-7B

Advanced Methods

Pre-Merging Methods

Linearization Fine-tuning

Paper Title	Year	Conference/Journal
Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic	2025	ICLR
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic	2024	Arxiv
Tangent Transformers for Composition, Privacy and Removal	2024	ICLR
Parameter Efficient Multi-task Model Fusion with Partial Linearization	2024	ICLR
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models	2023	NeurIPS

Sparse Fine-tuning

Paper Title	Year	Conference/Journal	Remark
Efficient Model Editing with Task-Localized Sparse Fine-tuning	2024

Architecture Transformation

Paper Title	Year	Conference/Journal	Remark
Model Assembly Learning with Heterogeneous Layer Weight Merging	2025	ICLR Workshop
Training-free Heterogeneous Model Merging	2025	Arxiv
Knowledge fusion of large language models	2024	ICLR	Llama-2 7B, OpenLLaMA 7B, MPT 7B
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report	2024	Arxiv	NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks	2023	ICASSP
GAN Cocktail: mixing GANs without dataset access	2022	ECCV

Weight Alignment

Paper Title	Year	Conference/Journal	Remark
Model Assembly Learning with Heterogeneous Layer Weight Merging	2025	ICLR Workshop
Disentangling Task Interference within Neurons: Model Merging in Alignment with Neuronal Mechanisms	2025	Arxiv	Llama-2-7b
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion	2025	Arxiv
The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse	2024	Arxiv
Equivariant Deep Weight Space Alignment	2024	ICML
Harmony in diversity: Merging neural networks with canonical correlation analysis	2024	ICML
Transformer fusion with optimal transport	2024	ICLR
Layerwise linear mode connectivity	2024	ICLR
ZipIt! Merging Models from Different Tasks without Training	2024	ICLR
Proving linear mode connectivity of neural networks via optimal transport	2024	AISTATS
Training-Free Pretrained Model Merging	2024	CVPR
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering	2024	Arxiv	Llama2-7b, Llama2-13b
C2M3: Cycle-Consistent Multi Model Merging	2024	NeurIPS
PLeaS--Merging Models with Permutations and Least Squares	2024	Arxiv
Rethink Model Re-Basin and the Linear Mode Connectivity	2024	Arxiv
Git Re-Basin: Merging Models modulo Permutation Symmetries	2023	ICLR
Re-basin via implicit Sinkhorn differentiation	2023	CVPR
Plateau in Monotonic Linear Interpolation--A "Biased" View of Loss Landscape for Deep Networks	2023	ICLR
Linear Mode Connectivity of Deep Neural Networks via Permutation Invariance and Renormalization	2023	ICLR
REPAIR: REnormalizing Permuted Activations for Interpolation Repair	2023	ICLR
Going beyond linear mode connectivity: The layerwise linear feature connectivity	2023	NeurIPS
The role of permutation invariance in linear mode connectivity of neural networks	2022	ICLR
What can linear interpolation of neural network loss landscapes tell us?	2022	ICML
Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling	2021	ICML
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes	2021	ICML
Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances	2021	ICML
Linear Mode Connectivity and the Lottery Ticket Hypothesis	2020	ICML
Optimizing mode connectivity via neuron alignment	2020	NeurIPS
Model fusion via optimal transport	2020	NeurIPS
Uniform convergence may be unable to explain generalization in deep learning	2019	NeurIPS
Explaining landscape connectivity of low-cost solutions for multilayer nets	2019	NeurIPS
Essentially no barriers in neural network energy landscape	2018	ICML
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs	2018	NeurIPS

During Merging Methods

Basic Merging Methods

Paper Title	Year	Conference/Journal
Composing parameter-efficient modules with arithmetic operation	2023	NeurIPS
Editing models with task arithmetic	2023	ICLR
Model fusion via optimal transport	2020	NeurIPS
Weight averaging for neural networks and local resampling schemes	1996	AAAI Workshop
Acceleration of stochastic approximation by averaging	1992	SIAM Journal on Control and Optimization
Animating rotation with quaternion curves (Spherical Linear Interpolation (SLERP) Model Merging)	1985	SIGGRAPH Computer Graphics
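
To make the three basic operations listed above concrete, here is a minimal, illustrative sketch of uniform weight averaging, task arithmetic, and SLERP. It is not taken from any of the listed papers' code: the function names (`average_weights`, `task_arithmetic`, `slerp`), the `scale` parameter, and the representation of a model as a dict mapping parameter names to flat lists of floats are all assumptions made for illustration.

```python
import math


def average_weights(models):
    """Uniform weight averaging: element-wise mean over models with identical shapes."""
    n = len(models)
    return {
        name: [sum(m[name][i] for m in models) / n for i in range(len(w))]
        for name, w in models[0].items()
    }


def task_arithmetic(base, finetuned, scale=1.0):
    """Sum the task vectors (finetuned - base) and add them, scaled, to the base weights."""
    merged = {}
    for name, w0 in base.items():
        merged[name] = [
            w0[i] + scale * sum(m[name][i] - w0[i] for m in finetuned)
            for i in range(len(w0))
        ]
    return merged


def slerp(a, b, t):
    """Spherical linear interpolation between two flat weight vectors, t in [0, 1]."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    lerp = [(1 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a == 0.0 or norm_b == 0.0:
        return lerp  # angle undefined for a zero vector: fall back to linear interpolation
    cos = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos)))  # angle between the two vectors
    s = math.sin(omega)
    if abs(s) < 1e-8:
        return lerp  # (anti)parallel vectors: fall back to linear interpolation
    ca = math.sin((1 - t) * omega) / s
    cb = math.sin(t * omega) / s
    return [ca * x + cb * y for x, y in zip(a, b)]


if __name__ == "__main__":
    m1, m2 = {"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}
    print(average_weights([m1, m2]))
    print(task_arithmetic({"w": [1.0, 1.0]}, [{"w": [2.0, 1.0]}, {"w": [1.0, 3.0]}], scale=0.5))
    print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```

In practice the same operations would be applied tensor by tensor to real checkpoints; the plain-list representation here only keeps the sketch dependency-free.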

Weighted-based Merging Methods

Paper Title	Year	Conference/Journal	Remark
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	2025	Arxiv	Gemma-2-9B, Llama-3-8B
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models	2025	Arxiv	LLaMA-2 7B series, Mistral 7B series, LLaMA-2 13B series
Non-Uniform Parameter-Wise Model Merging	2024	Arxiv
How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging	2024	Arxiv
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging	2024	Arxiv
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation	2024	Arxiv	shisa-gamma-7b, WizardMath-7B-V1.1, Abel-7B-002, Llama-3-SauerkrautLM-8b-Instruct, Llama-3-Open-Ko-8B, llama-3-sqlcoder-8b, Meta-Llama-3-8B
Knowledge Composition using Task Vectors with Learned Anisotropic Scaling	2024	Arxiv
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic	2024	EMNLP	LLaMA-2-7B, Mistral-7B, LLaMA-2-13B
Checkpoint Merging via Bayesian Optimization in LLM Pretraining	2024	Arxiv	Baichuan2-220B, Baichuan2-440B, Baichuan2-660B, Baichuan2-1540B, Baichuan2-1760B, Baichuan2-1980B, Baichuan2-2200B, Baichuan2-2420B, DeepSeek-1400B, DeepSeek-1600B, DeepSeek-1800B, DeepSeek-2000B
Arcee’s MergeKit: A Toolkit for Merging Large Language Models	2024	Arxiv	Llama2-7B-Chat, Meditron-7B
Evolutionary optimization of model merging recipes	2024	Arxiv	shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts	2024	ACL
AdaMerging: Adaptive Model Merging for Multi-Task Learning	2024	ICLR
Model Merging by Uncertainty-Based Gradient Matching	2024	ICLR
Merging by Matching Models in Task Subspaces	2024	TMLR
Fisher Mask Nodes for Language Model Merging	2024	LREC-COLING
Erasure Coded Neural Network Inference via Fisher Averaging	2024	ISIT
Dataless Knowledge Fusion by Merging Weights of Language Models	2023	ICLR
Merging models with fisher-weighted averaging	2022	NeurIPS

Subspace-based Merging Method (Sparse or Low-rank Subspace)

Paper Title	Year	Conference/Journal	Remark
Task Vector Quantization for Memory-Efficient Model Merging	2025	Arxiv
Disentangling Task Interference within Neurons: Model Merging in Alignment with Neuronal Mechanisms	2025	Arxiv	Llama-2-7b
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts	2025	ICLR 2025 Workshop
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach	2025	ICLR 2025 Workshop	Gemma-9b, LLaMA 3.1 8b
CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging	2025	Arxiv	Mistral-7b-v0.1, WildMarcoroni-Variant1-7B and WestSeverus-7B-DPO-v2
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation	2025	Arxiv
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint	2025	Arxiv	Llama-3-8B, Mistral-7B, and Llama2-13B
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation	2025	Arxiv
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging	2025	Arxiv	Llama-2-13b, WizardMath-13B-V1.0, WizardLM-13B-V1.2, llama-2-13b-code-alpaca
Superpose Singular Features for Model Merging	2025	Arxiv	Llama-2-7B
STAR: Spectral Truncation and Rescale for Model Merging	2025	NAACL	Mistral-7B-Instruct
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces	2025	Arxiv
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging	2025	Arxiv
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent	2025	Arxiv
Revisiting Weight Averaging for Model Merging	2024	Arxiv
Task Singular Vectors: Reducing Task Interference in Model Merging	2024	Arxiv
Less is More: Efficient Model Merging with Binary Task Switch	2024	Arxiv
FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts	2024	Arxiv	Qwen-14B (LoRA), LLaMa2-13B, WizardLM-13B, WizardMath-13B, WizardCoder-Python-13B
Beyond Task Vectors: Selective Task Arithmetic Based on Importance Metrics	2024	Arxiv
Parameter Competition Balancing for Model Merging	2024	NeurIPS	Llama-2-7b
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch	2024	ICML	WizardLM-13B, WizardMath-13B, and llama-2-13b-codealpaca, Mistral-7B
Localizing Task Information for Improved Model Merging and Compression	2024	ICML
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging	2024	ICLR
Model merging with svd to tie the knots	2024	Arxiv	Llama3-8B
NegMerge: Consensual Weight Negation for Strong Machine Unlearning	2024	Arxiv
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic	2024	Arxiv
Activated Parameter Locating via Causal Intervention for Model Merging	2024	Arxiv	Llama-2-chat-7B
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning	2024	Arxiv	Mistral-7B-v0.1, Llama-3-8B, Neurotic-7B, MoMo-70B
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling	2024	Arxiv	Llama-2-13b-code-alpaca, WizardLM, Wizard-Math, WizardCoder-Python
EMR-Merging: Tuning-Free High-Performance Model Merging	2024	NeurIPS
DPPA: Pruning Method for Large Language Model to Model Merging	2024	Arxiv	LLaMa 2
Model breadcrumbs: Scaling multi-task model merging with sparse masks	2023	Arxiv
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion	2023	Arxiv
Effective and Parameter-Efficient Reusing Fine-Tuned Models	2023	OpenReview
Resolving Interference When Merging Models	2023	NeurIPS
Task-Specific Skill Localization in Fine-tuned Language Model	2023	ICML

Routing-based Merging Methods (Dynamic Merging)

Paper Title	Year	Conference/Journal	Remark
Dynamic Model Merging with Mixture of Weights	2025	TCSVT
CAMEx: Curvature-aware Merging of Experts	2025	ICLR
1bit-Merging: Dynamic Quantized Merging for Large Language Models	2025	Arxiv	LLaMA-2 7B, Mistral 7B, and LLaMA-2 13B
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing	2025	Arxiv	Qwen-2.5-7B, LLaMA-3.2-8B
Adapting Foundation Models via Training-free Dynamic Weight Interpolation	2024	NeurIPS 2024 Workshop
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging	2024	Arxiv
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts	2024	ICML
Learning to Route Among Specialized Experts for Zero-Shot Generalization	2024	ICML
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy	2024	ICLR
Soft merging of experts with adaptive routing	2024	TMLR
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models	2024	Arxiv	Mistral-7B-v0.1, MetaMath-Mistral-7B, dolphin-2.1-mistral-7b, speechless-code-mistral-7b-v1.0
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging	2024	NeurIPS	Qwen-14B
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts	2024	Arxiv	Gemma-7B, LLaMA-2 7B & 13B, Mistral 7B, LLaMA-3 8B
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion	2024	Arxiv
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints	2023	ICLR

Post-calibration based Methods

Paper Title	Year	Conference/Journal
Multi-Task Model Fusion via Adaptive Merging	2025	ICASSP
Tint Your Models Task-wise for Improved Multi-task Model Merging	2024	Arxiv
Parameter-Efficient Interventions for Enhanced Model Merging	2024	Arxiv
Rethink the Evaluation Protocol of Model Merging on Classification Task	2024	Arxiv
SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery	2024	Arxiv
Representation Surgery for Multi-Task Model Merging	2024	ICML

Other Merging Methods

Paper Title	Year	Conference/Journal	Remark
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization	2025	Arxiv	LLaMA2-7B
Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors	2025	Arxiv	WizardLM-13B (LM), WizardMath-13B (Math), and llama-2-13b-code-alpaca (Code)
GNNMERGE: Merging of GNN Models Without Accessing Training Data	2025	Arxiv
MERGE3: Efficient Evolutionary Merging on Consumer-grade GPUs	2025	Arxiv	Mistral-7B
Activation-Informed Merging of Large Language Models	2025	Arxiv	Llama-2-13b, WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca
Scalable Model Merging with Progressive Layer-wise Distillation	2025	Arxiv	WizardLM-13B, WizardMath-13B and llama-2-13b-code-alpaca
Fine, I’ll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging	2025	Arxiv	Llama-2-13B, WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca
Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts	2025	ICLR
Multi-Task Model Merging via Adaptive Weight Disentanglement	2024	Arxiv
Rethinking Weight-Averaged Model-merging	2024	Arxiv
ATM: Improving Model Merging by Alternating Tuning and Merging	2024	Arxiv
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models	2024	Arxiv	Llama-2-7B-Chat, WizardMath-7B, CodeLlama-7B
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging	2024	Arxiv
It’s Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization	2024	Arxiv	Qwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling	2023	Arxiv	SOLAR 10.7B, SOLAR 10.7B-Instruct

Theories and Analysis of Model Merging

Paper Title	Year	Conference/Journal	Remark
Multi-Level Collaboration in Model Merging	2025	Arxiv
Low-rank bias, weight decay, and model merging in neural networks	2025	Arxiv
Understanding SGD with Exponential Moving Average: A Case Study in Linear Regression	2025	Arxiv
SeWA: Selective Weight Average via Probabilistic Masking	2025	Arxiv
Efficient Model Editing with Task Vector Bases: A Theoretical Framework and Scalable Approach	2025	Arxiv
Task Arithmetic Through The Lens Of One-Shot Federated Learning	2024	Arxiv	WizardLM-13B, WizardMath-13B, Llama-2-13B-Code-Alpaca, Llama2-13B
A Unified Analysis for Finite Weight Averaging	2024	Arxiv
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average	2024	Arxiv
On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm	2024	ICML
Diverse weight averaging for out-of-distribution generalization	2022	NeurIPS
Ensemble of averages: Improving model selection and boosting performance in domain generalization	2022	NeurIPS
Stability analysis and generalization bounds of adversarial training	2022	NeurIPS
The role of permutation invariance in linear mode connectivity of neural networks	2022	ICLR
Swad: Domain generalization by seeking flat minima	2021	NeurIPS
Linear Mode Connectivity and the Lottery Ticket Hypothesis	2020	ICML
Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes	2020	ICLR
Optimizing mode connectivity via neuron alignment	2020	NeurIPS
Uniform convergence may be unable to explain generalization in deep learning	2019	NeurIPS
Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification	2018	JMLR
Iterate averaging as regularization for stochastic gradient descent	2018	Arxiv
Essentially no barriers in neural network energy landscape	2018	ICML
Averaging weights leads to wider optima and better generalization	2018	UAI
Train faster, generalize better: Stability of stochastic gradient descent	2016	ICML

Application of Model Merging in Foundation Models

Model Merging in Large Language Model

Human Preference Alignment for LLMs

Paper Title	Year	Conference/Journal	Remark
Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation	2025	Arxiv	LLaMA-2 7B
Model soup for better rlhf: Weight space averaging to improve alignment in llms	2024	NeurIPS 2024 Workshop	Llama2-7B, Mistral-7B, Gemma-2B
Weighted-reward preference optimization for implicit model fusion	2024	Arxiv	LLaMA3-8B-Instruct
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation	2024	Arxiv
H3Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs	2024	Arxiv	LLaMA-2 7B
Baichuan Alignment Technical Report	2024	Arxiv	Qwen2-Nova-72B, Llama3-PBM-Nova-70B
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning	2024	Arxiv
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging	2024	Arxiv	MetaMath-7B, MAmmoTH-7B, LLaMA2-7B
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning	2024	Arxiv	Mistral-7B-v0.1, Llama-3-8B
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch	2024	Arxiv	Mistral-0.2-7B-Instruct, LLaMA-3-8B-Instruct, OpenBioLLM-8B, MAmmoTH2-7B, WizardMath-1.1-7B
Towards Comprehensive Post Safety Alignment of Large Language Models via Safety Patching	2024	Arxiv	LLaMA-2-7B-Chat, LLaMA-3-8B-Instruct, Mistral-7B-Instruct-v0.1 and Gemma1.1-7B-it
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction	2024	Arxiv	Llama-2-7b
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment	2024	Arxiv	Qwen1.5-7B, LLaMa3-8B
A safety realignment framework via subspace-oriented model fusion for large language models	2024	Arxiv	WizardLM-7B
Weak-to-strong extrapolation expedites alignment	2024	Arxiv	zephyr-7b, starling-7b, snorkel-7b, llama3-8b, internlm2-7b, internlm2-20b, tulu-2-dpo-7b, tulu-2-dpo-13b, tulu-2-dpo-70b
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic	2024	Arxiv	Llama-2-7B-Chat
Rewarded soups: towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards	2023	NeurIPS	LLaMA-7b
Personalized soups: Personalized large language model alignment via post-hoc parameter merging	2023	Arxiv	Tulu-7B LM

Detoxification of LLMs

Paper Title	Year	Conference/Journal	Remark
Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach	2024	Arxiv
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation	2024	AAAI	LLaMA-7B
Mitigating Social Biases in Language Models through Unlearning	2024	Arxiv	LLaMA-2 7B
Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models	2024	Arxiv	Llama-2-7B, Llama-2-chat-7B, Vicuna-7B, Llama-2-13B
Composing Parameter-Efficient Modules with Arithmetic Operation	2023	NeurIPS
Editing models with task arithmetic	2023	ICLR
Elastic Weight Removal for Faithful and Abstractive Dialogue Generation	2023	Arxiv

Knowledge Unlearning of LLMs

Paper Title	Year	Conference/Journal	Remark
ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging	2025	Arxiv	OLMo-7B-0724-Instruct
Exact Unlearning of Finetuning Data via Model Merging at Scale	2025	ICLR 2025 Workshop MCDC
NegMerge: Consensual Weight Negation for Strong Machine Unlearning	2024	Arxiv
Towards Safer Large Language Models through Machine Unlearning	2024	ACL	LLAMA2-7B, LLAMA2-13B
Editing models with task arithmetic	2023	ICLR
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Model	2023	Arxiv	LLAMA2-7B, LLAMA-7B, BLOOM-7B
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion	2023	Arxiv

Faster Training of LLMs

Paper Title	Year	Conference/Journal	Remark
DEM: Distribution Edited Model for Training with Mixed Data Distributions	2024	Arxiv	OpenLLaMA 7B and 13B
Checkpoint Merging via Bayesian Optimization in LLM Pretraining	2024	Arxiv	Baichuan2-220B, Baichuan2-440B, Baichuan2-660B, Baichuan2-1540B, Baichuan2-1760B, Baichuan2-1980B, Baichuan2-2200B, Baichuan2-2420B, DeepSeek-1400B, DeepSeek-1600B, DeepSeek-1800B, DeepSeek-2000B
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning	2023	ACL
Early Weight Averaging meets High Learning Rates for LLM Pre-training	2023	NeurIPS Workshop
Stop wasting my time! saving days of imagenet and bert training with latest weight averaging	2022	NeurIPS Workshop
Fusing finetuned models for better pretraining	2022	Arxiv

Combine the Capabilities of Expert LLMs

Paper Title	Year	Conference/Journal	Remark
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging	2025	Arxiv	Qwen2.5-32B, DeepSeek-R1-32B
Extrapolation Merging: Keep Improving With Extrapolation and Merging	2025	Arxiv	Qwen2-7B, Meta-Llama-3-8B, Mistral-Nemo-Base-2407-12B, Qwen1.5-14B
Superficial Self-Improved Reasoners Benefit from Model Merging	2025	Arxiv	Llama2-7B
Nature-Inspired Population-Based Evolution of Large Language Models	2025	Arxiv
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	2025	Arxiv	Gemma-2-9B, Llama-3-8B
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation	2025	Arxiv	WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca
LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging	2025	Arxiv	NuminaMath-7B, DeepSeek-Math-7B-Base, LLaMA-series models, WizardMath-13B
Merging Language and Domain Specific Models: The Impact on Technical Vocabulary Acquisition	2025	Arxiv	ContactDoctor-8B
Transferring Textual Preferences to Vision-Language Understanding through Model Merging	2025	Arxiv	Llama-3.2-11B-Vision-Instruct, Llama-3.1-Tulu-2-8B-uf-mean-rm, Llama-3.1-Tulu-3-8B-RM
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging	2025	Arxiv	Llama-2-13b, WizardMath-13B-V1.0, WizardLM-13B-V1.2, llama-2-13b-code-alpaca
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging	2025	Arxiv	Typhoon2 70B Instruct, DeepSeek R1 70B Distill, Llama 3.1 70B, Llama 3.3 70B
Fine, I’ll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging	2025	Arxiv	WizardLM-13B, WizardMath-13B, and llama-2-13b-code-alpaca
Skill Expansion and Composition in Parameter Space	2025	Arxiv
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion	2025	Arxiv	Qwen2.5-Coder-14B-Instruct, Qwen2.5-14B-Instruct, and Mistral-Small-24B-Instruct-2501
Channel Merging: Preserving Specialization for Merged Experts	2025	AAAI	Dolphin-2.2.1-Mistral-7B, Speechless-Code-Mistral-7B, MetaMath-Mistral-7B, Chinese-Mistral-7B-Instruct-v0.1
Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion	2024	Arxiv	MiniGemini-8B and SLIME-8B
AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents	2024	Arxiv	Llama3.1-8B
JRadiEvo: A Japanese Radiology Report Generation Model Enhanced by Evolutionary Optimization of Model Merging	2024	Arxiv	Bunny-v1_1-Llama-3-8B-V, MMed-Llama-3-8B-EnIns, OpenBioLLM-Llama3-8B, Llama-3-Swallow-8B-Instruct-v0.1
If You Can’t Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs	2024	Arxiv	Command R+ 104B
Agent Skill Acquisition for Large Language Models via CycleQD	2024	Arxiv	Llama3-8B-Instruct
Collaboratively adding new knowledge to an LLM	2024	Arxiv	Meta-Llama-3-8B
Unconstrained Model Merging for Enhanced LLM Reasoning	2024	Arxiv	CodeLlama-7B-Ins, CodeLlama-70B-Ins, Deepseek-Coder-Ins-v1.5, Qwen2.5-Math-7B-Ins, WizardMath-7B-V1.1, OpenMath-Mistral 7B, MetaMath-7B, MetaMath-70B
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks	2024	Arxiv	Llama-7b, Llama2-7b-chat
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging	2024	Arxiv	Llama 2 7B
Exploring Model Kinship for Merging Large Language Models	2024	Arxiv	Mistral-7B, Mistral-7b-instruct-v0.2, MetaMath-mistral-7b, Open-chat-3.5-1210
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation	2024	Arxiv	shisa-gamma-7b, WizardMath-7B-V1.1, Abel-7B-002, Llama-3-SauerkrautLM-8b-Instruct, Llama-3-Open-Ko-8B, llama-3-sqlcoder-8b, Meta-Llama-3-8B
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models	2024	Arxiv	LLAMA 3.1 8B
What Matters for Model Merging at Scale?	2024	Arxiv	PaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B)
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models	2024	Arxiv	Llama-2-7B-Chat, WizardMath-7B, CodeLlama-7B
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging	2024	Arxiv	CodeLlama 7B
It’s Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization	2024	Arxiv	Qwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B
Knowledge Fusion By Evolving Weights of Language Models	2024	ACL
LLM Merging: Building LLMs Efficiently through Merging	2024	NeurIPS 2024 Competition Track	LLaMA-7B, Mistral-7B, Gemma-7B
Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement	2024	Arxiv	Qwen1.5-7B, Qwen1.5-Chat-7B, Sailor-7B, Qwen1.5-14B, Qwen1.5-Chat-14B, Sailor-14B, WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic	2024	EMNLP	LLaMA-2-7B, Mistral-7B, LLaMA-2-13B
PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models	2024	Arxiv	Mistral-Instruct-7B, Mixtral-Instruct-8x7B
Knowledge fusion of large language models	2024	ICLR	Llama-2 7B, OpenLLaMA 7B, MPT 7B
Language models are super mario: Absorbing abilities from homologous models as a free lunch	2024	ICML	WizardLM-13B, WizardMath-13B, and llama-2-13b-codealpaca, Mistral-7B
Controlled Text Generation via Language Model Arithmetic	2024	ICML	MPT-7B, Pythia-12B, Llama-2-Chat-13B
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models	2024	Arxiv	LlaMA2-13B and LlaMA3-8B (LoRA)
Evolutionary optimization of model merging recipes	2024	Arxiv	shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM	2024	Arxiv	Llama-2 7B
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report	2024	Arxiv	NH2-Mixtral-8x7B, NH2-Solar-10.7B, OpenChat-3.5-7B

Note: The following papers are from: LLM Merging Competition at NeurIPS 2024

Paper Title	Year	Conference/Journal	Models
LLM Merging: Building LLMs Efficiently through Merging	2024	LLM Merging Competition at NeurIPS	-
Towards an approach combining Knowledge Graphs and Prompt Engineering for Merging Large Language Models	2024	LLM Merging Competition at NeurIPS	meta-llama/Llama-2-7b; microsoft_phi1/2/3
Model Merging using Geometric Median of Task Vectors	2024	LLM Merging Competition at NeurIPS	flan_t5_xl
Interpolated Layer-Wise Merging for NeurIPS 2024 LLM Merging Competition	2024	LLM Merging Competition at NeurIPS	suzume-llama-3-8B-multilingual-orpo-borda-top75, Barcenas-Llama3-8bORPO, Llama-3-8B-Ultra-Instruct-SaltSprinkle, MAmmoTH2-8B-Plus, Daredevil-8B
A Model Merging Method	2024	LLM Merging Competition at NeurIPS	-
Differentiable DARE-TIES for NeurIPS 2024 LLM Merging Competition	2024	LLM Merging Competition at NeurIPS	suzume-llama-3-8B-multilingual-orpo-borda-top75, MAmmoTH2-8B-Plus, and Llama-3-Refueled
LLM Merging Competition Technical Report: Efficient Model Merging with Strategic Model Selection, Merging, and Hyperparameter Optimization	2024	LLM Merging Competition at NeurIPS	MaziyarPanahi/Llama3-8B-Instruct-v0.8, MaziyarPanahi/Llama-3-8B-Instruct-v0.9, shenzhiwang/Llama3-8B-Chinese-Chat, lightblue/suzume-llama-3-8B-multilingual
Simple Llama Merge: What Kind of LLM Do We Need?	2024	LLM Merging Competition at NeurIPS	Hermes-2-Pro-Llama-3-8B and Daredevil-8B
LLM Merging Competition Technical Report for NeurIPS 2024: Efficiently Building Large Language Models through Merging	2024	LLM Merging Competition at NeurIPS	Mistral-7B-Instruct v2, Llama3-8B-Instruct, Flan-T5-large, Gemma-7B-Instruct, and WizardLM-2-7B
MoD: A Distribution-Based Approach for Merging Large Language Models	2024	LLM Merging Competition at NeurIPS	Qwen2.5-1.5B and Qwen2.5-7B

Model Merging in Multimodal Large Language Models

Model Merging for Multimodal Fusion

Paper Title	Year	Conference/Journal	Remark
Jointly training large autoregressive multimodal models	2024	ICLR
Model Composition for Multimodal Large Language Models	2024	ACL	Vicuna-7B-v1.5
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation	2023	ICML
An Empirical Study of Multimodal Model Merging	2023	EMNLP
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks	2023	TMLR

Model Merging for Cross-Modal Knowledge Transfer

Paper Title	Year	Conference/Journal	Remark
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification	2024	ICASSP Workshop

Model Merging in Image Generative Models

Style Mixing in Generative Models

Paper Title	Year	Conference/Journal	Remark
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation	2024	Arxiv	LLaVA-Critic 7b
IterIS: Iterative Inference-Solving Alignment for LoRA Merging	2024	Arxiv
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models	2024	ECCV
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models	2024	Arxiv
MoLE: Mixture of LoRA Experts	2024	ICLR
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models	2024	Arxiv
Multi-LoRA Composition for Image Generation	2024	Arxiv
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models	2023	NeurIPS
Merging LoRAs	2023	(github)
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs	2023	Arxiv
GAN Cocktail: mixing GANs without dataset access	2022	ECCV

Reducing Training Cost of Generative Models

Paper Title	Year	Conference/Journal	Remark
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better	2024	Arxiv
A Unified Module for Accelerating STABLE-DIFFUSION: LCM-LORA	2024	Arxiv

Enhancing the Faithfulness (or Generation Quality) of Diffusion Models

Paper Title	Year	Conference/Journal	Remark
Decouple-Then-Merge: Towards Better Training for Diffusion Models	2024	Arxiv
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data	2024	Arxiv

Model Merging in Video Generative Models

Enhancing Motion Modeling

Paper Title	Year	Conference/Journal	Remark
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think	2025	CVPR	DynamiCrafter, SVD

Application of Model Merging in Different Machine Learning Subfields

Model Merging in Continual Learning

Model Merging to Mitigate Catastrophic Forgetting

Paper Title	Year	Conference/Journal	Remark
Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs	2025	Arxiv	Llama-3-8B-Instruct
Cost-Efficient Continual Learning with Sufficient Exemplar Memory	2025	Arxiv
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging	2025	Arxiv
Soup to go: mitigating forgetting during continual learning with model averaging	2025	Arxiv	Llama 2 (7B)
Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning	2024	Arxiv
Parameter Averaging is All You Need to Prevent Forgetting	2024	SLT Workshop
DESIRE: Dynamic Knowledge Consolidation for Rehearsal-Free Continual Learning	2024	Arxiv
Adaptive LoRA Merging for Efficient Domain Incremental Learning	2024	NeurIPS Workshop
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging	2024	Arxiv
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models	2024	ICML	InstructBLIP (Vicuna-7B), LLaVA-1.5 (Vicuna-7B)
Adaptive Discovering and Merging for Incremental Novel Class Discovery	2024	AAAI
MagMax: Leveraging Model Merging for Seamless Continual Learning	2024	ECCV
LM-Cocktail: Resilient tuning of language models via model merging	2024	ACL Findings	Llama-2-chat-7b
Backward Compatibility During Data Updates by Weight Interpolation	2024	EACL
Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models	2024	EMNLP Findings
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging	2024	Arxiv	MISTRAL-7B, LLAMA-3-8B
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation	2024	Arxiv	Llama3-70B
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs	2024	Arxiv	Mistral-7B, Llama-3-8B
WARP: On the Benefits of Weight Averaged Rewarded Policies	2024	Arxiv	Gemma-7B
A Second-Order perspective on Compositionality and Incremental Learning	2024	Arxiv
DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images	2024	Arxiv
DAM: Dynamic Adapter Merging for Continual Video QA Learning	2024	Arxiv
Task-Specific Skill Localization in Fine-tuned Language Model	2023	ICML
Tangent model composition for ensembling and continual fine-tuning	2023	ICCV
A Unified Continual Learning Framework with General Parameter-Efficient Tuning	2023	ICCV
Task Arithmetic with LoRA for Continual Learning	2023	NeurIPS Workshop
Mitigating the Alignment Tax of RLHF	2023	Arxiv	Mistral-7B
PAINT: Patching open-vocabulary models by interpolating weights	2022	NeurIPS
Robust fine-tuning of zero-shot models	2022	CVPR

Model Merging in Multi-Task/Multi-Objective/Multi-Domain/Auxiliary Learning

Model Merging for Knowledge Transfer in Multi-Task Learning

Paper Title	Year	Conference/Journal	Remark
Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging	2024	Arxiv
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging	2024	Arxiv
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning	2024	Arxiv	Aya 23 8B
Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks	2024	Arxiv
Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer	2024	Arxiv
Evolutionary optimization of model merging recipes	2024	Arxiv	shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch	2024	ICML	WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca, Mistral-7B
Representation Surgery for Multi-Task Model Merging	2024	ICML
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts	2024	ICML
ZipIt! Merging Models from Different Tasks without Training	2024	ICLR
AdaMerging: Adaptive Model Merging for Multi-Task Learning	2024	ICLR
Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies	2023	Arxiv
Resolving Interference When Merging Models	2023	NeurIPS
Editing models with task arithmetic	2023	ICLR

Model Merging for Knowledge Transfer in Multi-Objective Optimization

Paper Title	Year	Conference/Journal	Remark
Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation	2025	Arxiv	LLaMA-2 7B
You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging	2024	Arxiv
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion	2024	Arxiv
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation	2024	Arxiv	Llama3-8B

Model Merging for Knowledge Transfer in Multi-Domain Learning

Paper Title	Year	Conference/Journal	Remark
DEM: Distribution Edited Model for Training with Mixed Data Distributions	2024	Arxiv	OpenLLaMA-7B, OpenLLaMA-13B
Merging Vision Transformers from Different Tasks and Domains	2023	Arxiv

Model Merging for Knowledge Transfer in Auxiliary Learning

Paper Title	Year	Conference/Journal	Remark
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning	2023	NeurIPS

Model Merging in Out-of-Distribution/Domain Generalization

Model Merging for Better Out-of-Distribution Generalization

Paper Title	Year	Conference/Journal	Remark
SeWA: Selective Weight Average via Probabilistic Masking	2025	Arxiv
When, Where and Why to Average Weights?	2025	Arxiv
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation	2024	NeurIPS 2024 Workshop
Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging	2024	EMNLP	Llama-2-7b
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models	2024	Arxiv
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging	2024	ICLR
WARM: On the benefits of weight averaged reward models	2024	ICML
Scalable Learned Model Soup on a Single GPU: An Efficient Subspace Training Strategy	2024	ECCV
Adaptive Stochastic Weight Averaging	2024	JMLR
Population parameter averaging (PAPA)	2024	TMLR
WARP: On the Benefits of Weight Averaged Rewarded Policies	2024	Arxiv	Mistral 7B, Mixtral 8x7B
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average	2024	Arxiv
Model Stock: All we need is just a few fine-tuned models	2024	Arxiv
Lookaround Optimizer: 𝑘 steps around, 1 step average	2023	NeurIPS
Model ratatouille: Recycling diverse models for out-of-distribution generalization	2023	ICML
Trainable Weight Averaging: Efficient Training by Optimizing Historical Solutions	2023	ICLR
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models	2023	EACL
DART: Diversify aggregate-repeat training improves generalization of neural networks	2023	CVPR
When do flat minima optimizers work?	2022	NeurIPS
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time	2022	ICML
Diverse weight averaging for out-of-distribution generalization	2022	NeurIPS
Robust fine-tuning of zero-shot models	2022	CVPR
Neural networks with late-phase weights	2021	ICLR
Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well	2020	ICLR
SWALP: Stochastic weight averaging in low precision training	2019	ICML
Averaging weights leads to wider optima and better generalization	2018	UAI
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results	2017	NeurIPS

Model Merging for Better Domain Generalization or Domain Adaptation

Paper Title	Year	Conference/Journal	Remark
Realistic Evaluation of Model Merging for Compositional Generalization	2024	Arxiv
Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks	2024	Arxiv
Training-Free Model Merging for Multi-target Domain Adaptation	2024	Arxiv
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation	2024	Arxiv	Llama3-70B
Ensemble of averages: Improving model selection and boosting performance in domain generalization	2022	NeurIPS
SWAD: Domain generalization by seeking flat minima	2021	NeurIPS

Model Merging in Federated Learning

Model Merging for Local Knowledge Aggregation

Paper Title	Year	Conference/Journal
Many-Task Federated Fine-Tuning via Unified Task Vectors	2025	Arxiv
PrivFusion: Privacy-Preserving Model Fusion via Decentralized Federated Graph Matching	2024	TKDE
Model Trip: Enhancing Privacy and Fairness in Model Fusion Across Multi-Federations for Trustworthy Global Healthcare	2024	ICDE
DapperFL: Domain Adaptive Federated Learning with Model Fusion Pruning for Edge Devices	2024	NeurIPS
FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion	2024	Arxiv
Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning	2024	Arxiv
Closed-form merging of parameter-efficient modules for Federated Continual Learning	2024	Arxiv
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models	2024	CVPR
FedFisher: Leveraging Fisher Information for One-Shot Federated Learning	2024	AISTATS
lo-fi: distributed fine-tuning without communication	2023	TMLR
Revisiting Weighted Aggregation in Federated Learning with Neural Networks	2023	ICML
Deep neural network fusion via graph matching with applications to model ensemble and federated learning	2022	ICML
Federated Learning with Matched Averaging	2020	ICLR
Tackling the objective inconsistency problem in heterogeneous federated optimization	2020	NeurIPS
Model fusion via optimal transport	2020	NeurIPS
Bayesian nonparametric federated learning of neural networks	2019	ICML
Learning private neural language modeling with attentive aggregation	2019	IJCNN
Communication-Efficient Learning of Deep Networks from Decentralized Data	2017	AISTATS

Model Merging in Zero-shot/Few-shot Learning

Model Merging for Cross-task Generalization in Zero-shot Learning

Paper Title	Year	Conference/Journal	Remark
Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering	2024	Arxiv	Qwen 60x2.7B, Qwen 45x2.7B, Qwen 30x2.7B, Mixtral 8x7B, Mixtral 6x7B, Mixtral 4x7B
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models	2024	Arxiv	LLAMA 3.1 8B
Learning to Route Among Specialized Experts for Zero-Shot Generalization	2024	ICML
Towards Modular LLMs by Building and Reusing a Library of LoRAs	2024	ICML	Mistral-7B
Chat Vector: A Simple Approach to Equip LLMs With New Language Chat Capabilities	2024	ACL	LLaMA-2 13B, Chinese-LLaMA-13B, Chinese-Alpaca-13B, Mistral-7B, llama-2-ko-7b
Unlocking the Potential of Model Merging for Low-Resource Languages	2024	Arxiv	Llama-2-7B
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models	2024	ECCV
No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement	2024	Arxiv
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models	2024	Arxiv
AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging	2024	Arxiv	Llama2-7b
Model Composition for Multimodal Large Language Models	2024	ACL	Vicuna-7B-v1.5
Exploring the Benefits of Training Expert Language Models over Instruction Tuning	2023	ICML
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization	2023	Arxiv	Llama-2-7b
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization	2023	Arxiv	PaLM 2-S

Model Merging for Cross-task Generalization in Few-shot Learning

Paper Title	Year	Conference/Journal	Remark
Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs	2025	CVPR
LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks	2024	ACL	Llama-2-7B
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition	2024	COLM	Llama-2-7B, Llama-2-13B
LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild	2024	ACL
Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?	2024	Arxiv
MerA: Merging pretrained adapters for few-shot learning	2023	Arxiv

Model Merging in Adversarial Learning

Model Merging as an Attack

Paper Title	Year	Conference/Journal	Remark
Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging	2025	Arxiv
Be Cautious When Merging Unfamiliar LLMs: A Phishing Model Capable of Stealing Privacy	2025	Arxiv
LoBAM: LoRA-Based Backdoor Attack on Model Merging	2024	Arxiv
BadMerging: Backdoor Attacks Against Model Merging	2024	CCS
LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario	2024	ACL	Llama-2-7B

Model Merging as a Defense or Intellectual Property Protection

Paper Title	Year	Conference/Journal	Remark
Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy	2024	Arxiv
Large Language Models Merging for Enhancing the Link Stealing Attack on Graph Neural Networks	2024	Arxiv	Vicuna-7B, Vicuna-13B
Strong Copyright Protection for Language Models via Adaptive Model Fusion	2024	ICML	LLaMa2 7B, StarCoder 7B
Hyper Adversarial Tuning for Boosting Adversarial Robustness of Pretrained Large Vision Models	2024	Arxiv
REEF: Representation Encoding Fingerprints for Large Language Models	2024	Arxiv	Evollm-jp-7b, Shisa-gamma-7b-v1, Wizardmath-7b-1.1, Abel-7b-002, Llama-2-7b, Openllama-2-7b, Mpt-7b, Internlm2-chat-20b, Mixtral-8x7b-instruct, Qwen-1.5-chat-72b
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace	2024	Arxiv
MergePrint: Robust Fingerprinting against Merging Large Language Models	2024	Arxiv	LLaMA-2-7B, WizardMath-7B-V1.0, LLaMA-2-7B-CHAT
Avoiding Copyright Infringement via Machine Unlearning	2024	Arxiv	Llama3-8B
Merging Improves Self-Critique Against Jailbreak Attacks	2024	Arxiv	Mistral-7B, Mixtral-8x7B
Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging	2024	Arxiv	LLaMA-2-7B, LLaMA-2-7B-CHAT, WizardMath-7B-V1.0
Here’s a Free Lunch: Sanitizing Backdoored Models with Model Merge	2024	ACL
Revisiting adapters with adversarial training	2023	ICLR
Seasoning model soups for robustness to adversarial and natural distribution shifts	2023	CVPR

Other Applications

Paper Title	Year	Conference/Journal	Remark
Self-supervised Normality Learning and Divergence Vector-guided Model Merging for Zero-shot Congenital Heart Disease Detection in Fetal Ultrasound Videos	2025	Arxiv
MedForge: Building Medical Foundation Models Like Open Source Software Development	2025	Arxiv
Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging	2024	EMNLP	Llama-2-7b
Is Multiple Object Tracking a Matter of Specialization?	2024	NeurIPS
Tracking Universal Features Through Fine-Tuning and Model Merging	2024	Arxiv
HM3: Heterogeneous Multi-Class Model Merging	2024	Arxiv
Emotion Arithmetic: Emotional Speech Synthesis via Weight Space Interpolation	2024	Interspeech
Erasure Coded Neural Network Inference via Fisher Averaging	2024	Arxiv
MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair	2024	Arxiv
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks	2024	Arxiv	Llama2-7B, Llama2-13B-chat, Mistral-7B-instruct
Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization	2024	Arxiv
An Attribute Interpolation Method in Speech Synthesis by Model Merging	2024	Arxiv
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition	2024	Arxiv
MedMerge: Merging Models for Effective Transfer Learning to Medical Imaging Tasks	2024	Arxiv
Experts Weights Averaging: A New General Training Scheme for Vision Transformers	2023	Arxiv
One Student Knows All Experts Know: From Sparse to Dense	2022	Arxiv
Meta-Learning PAC-Bayes Priors in Model Averaging	2019	AAAI

Contact

We welcome all researchers to contribute to this repository on model merging in foundation models and machine learning.

If you have a related paper that was not added to the library, please contact us.

Email: [email protected] / [email protected]

For Tasks:

Click tags to check more tools for each tasks

train models merge models optimize weights improve generalization enhance performance

For Jobs:

machine learning engineer research scientist data scientist ai engineer software developer

Alternative AI tools for Awesome-Model-Merging-Methods-Theories-Applications

Similar Open Source Tools

Awesome-Model-Merging-Methods-Theories-Applications

github

: 347

Awesome-Resource-Efficient-LLM-Papers

A curated list of high-quality papers on resource-efficient Large Language Models (LLMs) with a focus on various aspects such as architecture design, pre-training, fine-tuning, inference, system design, and evaluation metrics. The repository covers topics like efficient transformer architectures, non-transformer architectures, memory efficiency, data efficiency, model compression, dynamic acceleration, deployment optimization, support infrastructure, and other related systems. It also provides detailed information on computation metrics, memory metrics, energy metrics, financial cost metrics, network communication metrics, and other metrics relevant to resource-efficient LLMs. The repository includes benchmarks for evaluating the efficiency of NLP models and references for further reading.

github

: 105

LLM4EC

LLM4EC is an interdisciplinary research repository focusing on the intersection of Large Language Models (LLM) and Evolutionary Computation (EC). It provides a comprehensive collection of papers and resources exploring various applications, enhancements, and synergies between LLM and EC. The repository covers topics such as LLM-assisted optimization, EA-based LLM architecture search, and applications in code generation, software engineering, neural architecture search, and other generative tasks. The goal is to facilitate research and development in leveraging LLM and EC for innovative solutions in diverse domains.

github

: 78

Awesome-Tabular-LLMs

This repository is a collection of papers on Tabular Large Language Models (LLMs) specialized for processing tabular data. It includes surveys, models, and applications related to table understanding tasks such as Table Question Answering, Table-to-Text, Text-to-SQL, and more. The repository categorizes the papers based on key ideas and provides insights into the advancements in using LLMs for processing diverse tables and fulfilling various tabular tasks based on natural language instructions.

github

: 151

Awesome-LLM-Constrained-Decoding

Awesome-LLM-Constrained-Decoding is a curated list of papers, code, and resources related to constrained decoding of Large Language Models (LLMs). The repository aims to facilitate reliable, controllable, and efficient generation with LLMs by providing a comprehensive collection of materials in this domain.

github

: 180

speech-trident

Speech Trident is a repository focusing on speech/audio large language models, covering representation learning, neural codec, and language models. It explores speech representation models, speech neural codec models, and speech large language models. The repository includes contributions from various researchers and provides a comprehensive list of speech/audio language models, representation models, and codec models.

github

: 636

AudioLLM

AudioLLMs is a curated collection of research papers focusing on developing, implementing, and evaluating language models for audio data. The repository aims to provide researchers and practitioners with a comprehensive resource to explore the latest advancements in AudioLLMs. It includes models for speech interaction, speech recognition, speech translation, audio generation, and more. Additionally, it covers methodologies like multitask audioLLMs and segment-level Q-Former, as well as evaluation benchmarks like AudioBench and AIR-Bench. Adversarial attacks such as VoiceJailbreak are also discussed.

github

: 71

Awesome-Neuro-Symbolic-Learning-with-LLM

The Awesome-Neuro-Symbolic-Learning-with-LLM repository is a curated collection of papers and resources focusing on improving reasoning and planning capabilities of Large Language Models (LLMs) and Multi-Modal Large Language Models (MLLMs) through neuro-symbolic learning. It covers a wide range of topics such as neuro-symbolic visual reasoning, program synthesis, logical reasoning, mathematical reasoning, code generation, visual reasoning, geometric reasoning, classical planning, game AI planning, robotic planning, AI agent planning, and more. The repository provides a comprehensive overview of tutorials, workshops, talks, surveys, papers, datasets, and benchmarks related to neuro-symbolic learning with LLMs and MLLMs.

github

: 53

CogVLM2

CogVLM2 is a new generation of open source models that offer significant improvements in benchmarks such as TextVQA and DocVQA. It supports 8K content length, image resolution up to 1344 * 1344, and both Chinese and English languages. The project provides basic calling methods, fine-tuning examples, and OpenAI API format calling examples to help developers quickly get started with the model.

github

: 83

Awesome-LLMs-for-Video-Understanding

Awesome-LLMs-for-Video-Understanding is a repository dedicated to exploring Video Understanding with Large Language Models. It provides a comprehensive survey of the field, covering models, pretraining, instruction tuning, and hybrid methods. The repository also includes information on tasks, datasets, and benchmarks related to video understanding. Contributors are encouraged to add new papers, projects, and materials to enhance the repository.

github

: 1.8k

Awesome-LLM4IE-Papers

github

: 645

open-llms

Open LLMs is a repository containing various Large Language Models licensed for commercial use. It includes models like T5, GPT-NeoX, UL2, Bloom, Cerebras-GPT, Pythia, Dolly, and more. These models are designed for tasks such as transfer learning, language understanding, chatbot development, code generation, and more. The repository provides information on release dates, checkpoints, papers/blogs, parameters, context length, and licenses for each model. Contributions to the repository are welcome, and it serves as a resource for exploring the capabilities of different language models.

github

: 10.3k

LLM4Opt

LLM4Opt is a collection of references and papers focusing on applying Large Language Models (LLMs) for diverse optimization tasks. The repository includes research papers, tutorials, workshops, competitions, and related collections related to LLMs in optimization. It covers a wide range of topics such as algorithm search, code generation, machine learning, science, industry, and more. The goal is to provide a comprehensive resource for researchers and practitioners interested in leveraging LLMs for optimization tasks.

github

: 125

Awesome-LLM-Papers-Comprehensive-Topics

github

: 172

Github-Ranking-AI

This repository provides a list of the most starred and forked repositories on GitHub. It is updated automatically and includes information such as the project name, number of stars, number of forks, language, number of open issues, description, and last commit date. The repository is divided into two sections: LLM and chatGPT. The LLM section includes repositories related to large language models, while the chatGPT section includes repositories related to the chatGPT chatbot.

github

: 227

LLM-Agent-Survey

Autonomous agents are designed to achieve specific objectives through self-guided instructions. With the emergence and growth of large language models (LLMs), there is a growing trend in utilizing LLMs as fundamental controllers for these autonomous agents. This repository conducts a comprehensive survey study on the construction, application, and evaluation of LLM-based autonomous agents. It explores essential components of AI agents, application domains in natural sciences, social sciences, and engineering, and evaluation strategies. The survey aims to be a resource for researchers and practitioners in this rapidly evolving field.

github

: 2.2k

For similar tasks

Awesome-Model-Merging-Methods-Theories-Applications

github

: 347

optillm

optillm is an OpenAI API compatible optimizing inference proxy implementing state-of-the-art techniques to enhance accuracy and performance of LLMs, focusing on reasoning over coding, logical, and mathematical queries. By leveraging additional compute at inference time, it surpasses frontier models across diverse tasks.

github

: 2.1k

llm-structured-output

This repository contains a library for constraining LLM generation to structured output, enforcing a JSON schema for precise data types and property names. It includes an acceptor/state machine framework, JSON acceptor, and JSON schema acceptor for guiding decoding in LLMs. The library provides reference implementations using Apple's MLX library and examples for function calling tasks. The tool aims to improve LLM output quality by ensuring adherence to a schema, reducing unnecessary output, and enhancing performance through pre-emptive decoding. Evaluations show performance benchmarks and comparisons with and without schema constraints.

github

: 69

HookPHP

HookPHP is an open-source project that provides a PHP extension for hooking into various aspects of PHP applications. It allows developers to easily extend and customize the behavior of their PHP applications by providing hooks at key points in the execution flow. With HookPHP, developers can efficiently add custom functionality, modify existing behavior, and enhance the overall performance of their PHP applications. The project is licensed under the MIT license, making it accessible for developers to use and contribute to.

github

: 617

ai-gateway

Envoy AI Gateway is an open source project that utilizes Envoy Gateway to manage request traffic from application clients to Generative AI services. The project aims to provide a seamless and efficient solution for handling communication between clients and AI services. It is designed to enhance the performance and scalability of AI applications by leveraging the capabilities of Envoy Gateway. The project welcomes contributions from the community and encourages collaboration to further develop and improve the functionality of the AI Gateway.

github

: 201

aligner

Aligner is a model-agnostic alignment tool designed to efficiently correct responses from large language models. It redistributes initial answers to align with human intentions, improving performance across various LLMs. The tool can be applied with minimal training, enhancing upstream models and reducing hallucination. Aligner's 'copy and correct' method preserves the base structure while enhancing responses. It achieves significant performance improvements in helpfulness, harmlessness, and honesty dimensions, with notable success in boosting Win Rates on evaluation leaderboards.

github

: 138

AirLine

AirLine is a learnable edge-based line detection algorithm designed for various robotic tasks such as scene recognition, 3D reconstruction, and SLAM. It offers a novel approach to extracting line segments directly from edges, enhancing generalization ability for unseen environments. The algorithm balances efficiency and accuracy through a region-grow algorithm and local edge voting scheme for line parameterization. AirLine demonstrates state-of-the-art precision with significant runtime acceleration compared to other learning-based methods, making it ideal for low-power robots.

GitHub stars: 64

LongRecipe

LongRecipe is a tool designed for efficient long context generalization in large language models. It provides a recipe for extending the context window of language models while maintaining their original capabilities. The tool includes data preprocessing steps, model training stages, and a process for merging fine-tuned models to enhance foundational capabilities. Users can follow the provided commands and scripts to preprocess data, train models in multiple stages, and merge models effectively.

GitHub stars: 57

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

GitHub stars: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

GitHub stars: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

GitHub stars: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

GitHub stars: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

GitHub stars: 27.7k

BricksLLM

BricksLLM is a cloud-native AI gateway written in Go. It currently provides native support for OpenAI, Anthropic, Azure OpenAI, and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Here are some use cases for BricksLLM:

* Set LLM usage limits for users on different pricing tiers
* Track LLM usage on a per-user and per-organization basis
* Block or redact requests containing PII
* Improve LLM reliability with failovers, retries, and caching
* Distribute API keys with rate limits and cost limits for internal development/production use cases
* Distribute API keys with rate limits and cost limits for students

GitHub stars: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

GitHub stars: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

GitHub stars: 2.2k