Awesome-Model-Merging-Methods-Theories-Applications
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.
A comprehensive repository focusing on 'Model Merging in LLMs, MLLMs, and Beyond', providing an exhaustive overview of model merging methods, theories, applications, and future research directions. The repository covers various advanced methods, applications in foundation models, different machine learning subfields, and tasks like pre-merging methods, architecture transformation, weight alignment, basic merging methods, and more.
README:
A comprehensive list of papers on 'Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities' (arXiv, 2024).
[!IMPORTANT] Contributions welcome:
- If you have a relevant paper that is not yet included in this list, or a clarification about the content of a paper, please contact us, or submit a pull request directly. Thank you!
- If you think your paper fits better in another category, please contact us or submit a pull request; once it is merged, feel free to update the relevant information. Thank you!
- 🔥🔥🔥 We mark papers whose experiments use models of size $\geq$ 7B.
Model merging is an efficient technique for empowering models that requires neither the collection of raw training data nor expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, the literature lacks a systematic and thorough review of these techniques. To address this gap, this survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomy that exhaustively covers existing model merging methods. Second, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, and few-shot learning. Finally, we highlight the remaining challenges of model merging and discuss future research directions.
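As a minimal illustration of the core idea (a hedged sketch, not the method of any specific paper below): task arithmetic merges fine-tuned models by adding their "task vectors", i.e., the weight deltas relative to the shared pre-trained checkpoint. All names and values here are illustrative toy data.

```python
import numpy as np

def merge_task_arithmetic(theta_pre, finetuned, lam=1.0):
    """Merge fine-tuned models by summing their task vectors
    (theta_ft - theta_pre) and adding the result to the
    pre-trained weights, scaled by lam."""
    merged = {}
    for name, w_pre in theta_pre.items():
        task_vectors = [ft[name] - w_pre for ft in finetuned]
        merged[name] = w_pre + lam * np.sum(task_vectors, axis=0)
    return merged

# Toy "models": dicts mapping parameter names to weight arrays.
theta_pre = {"w": np.array([1.0, 1.0])}
ft_a = {"w": np.array([2.0, 1.0])}  # task A shifts the first weight
ft_b = {"w": np.array([1.0, 3.0])}  # task B shifts the second weight

merged = merge_task_arithmetic(theta_pre, [ft_a, ft_b], lam=1.0)
print(merged["w"])  # [2. 3.] -- both task deltas are preserved
```

Real merges apply the same update per tensor of a full checkpoint; the papers below refine how the task vectors are scaled, sparsified, or aligned before summation.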
If you find our paper or this resource helpful, please consider citing:
@article{Survey_ModelMerging_2024,
title={Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities},
author={Yang, Enneng and Shen, Li and Guo, Guibing and Wang, Xingwei and Cao, Xiaochun and Zhang, Jie and Tao, Dacheng},
journal={arXiv preprint arXiv:2408.07666},
year={2024}
}
Thanks!
- Awesome-Model-Merging-Methods-Theories-Applications
  - Survey
  - Benchmark/Evaluation
  - Advanced Methods
  - Application of Model Merging in Foundation Models
  - Application of Model Merging in Different Machine Learning Subfields
  - Other Applications
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models | 2024 | Arxiv | LLaMA3-8B-Instruct, Qwen2-7B-Instruct, Mistral-7B-Instruct-v0.3 |
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | 2024 | NeurIPS Track on Datasets and Benchmarks | Synthia-7B-v1.2, Llama-2-7b-evolcodealpaca, OpenHermes-7B, pygmalion-2-7b, Llama-2-7b-chat-hf, BeingWell_llama2_7b, MetaMath-7B-V1.0, vicuna-7b-v1.5, Platypus2-7B, GOAT-7B-Community, Llama-2-7b-WikiChat-fused, dolphin-llama2-7b, MetaMath-Llemma-7B, CodeLlama-7b-Instruct-hf, Magicoder-S-CL-7B, CrystalChat |
What Matters for Model Merging at Scale? | 2024 | Arxiv | PaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B) |
Realistic Evaluation of Model Merging for Compositional Generalization | 2024 | Arxiv | |
Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities | 2024 | Arxiv | Llama-3.1-8B, Mistral-7B-v0.3 |
FusionBench: A Comprehensive Benchmark of Deep Model Fusion | 2024 | Arxiv | |
Arcee's MergeKit: A Toolkit for Merging Large Language Models | 2024 | Arxiv | Llama2-7B-Chat, Meditron-7B |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic | 2024 | Arxiv | |
Tangent Transformers for Composition,Privacy and Removal | 2024 | ICLR | |
Parameter Efficient Multi-task Model Fusion with Partial Linearization | 2024 | ICLR | |
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models | 2023 | NeurIPS |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Efficient Model Editing with Task-Localized Sparse Fine-tuning | 2024 | | |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Training-free Heterogeneous Model Merging | 2025 | Arxiv | |
Knowledge fusion of large language models | 2024 | ICLR | Llama-2 7B, OpenLLaMA 7B, MPT 7B |
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | 2024 | Arxiv | NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B |
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks | 2023 | ICASSP | |
GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Composing parameter-efficient modules with arithmetic operation | 2023 | NeurIPS | |
Editing models with task arithmetic | 2023 | ICLR | |
Model fusion via optimal transport | 2020 | NeurIPS | |
Weight averaging for neural networks and local resampling schemes | 1996 | AAAI Workshop | |
Animating rotation with quaternion curves (Spherical Linear Interpolation (SLERP) Model Merging) | 1985 | SIGGRAPH Computer Graphics |
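The last entry above is the geometric origin of SLERP merging: interpolate between two weight vectors along the sphere rather than along the straight line, preserving their norm. A minimal numpy sketch (illustrative only; in practice this is applied per tensor or to flattened checkpoints):

```python
import numpy as np

def slerp(w0, w1, t, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.
    t=0 returns w0, t=1 returns w1."""
    v0 = w0 / np.linalg.norm(w0)
    v1 = w1 / np.linalg.norm(w1)
    dot = np.clip(np.dot(v0, v1), -1.0, 1.0)
    omega = np.arccos(dot)          # angle between the two models
    if omega < eps:                 # nearly parallel: fall back to lerp
        return (1 - t) * w0 + t * w1
    s = np.sin(omega)
    return (np.sin((1 - t) * omega) / s) * w0 + (np.sin(t * omega) / s) * w1

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid = slerp(a, b, 0.5)
print(mid)  # [0.70710678 0.70710678] -- stays on the unit circle
```

Unlike plain averaging, which would give [0.5, 0.5] with norm ~0.71, the SLERP midpoint keeps unit norm; this norm preservation is the usual motivation for SLERP in toolkits such as MergeKit.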
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Tint Your Models Task-wise for Improved Multi-task Model Merging | 2024 | Arxiv | |
Parameter-Efficient Interventions for Enhanced Model Merging | 2024 | Arxiv | |
Rethink the Evaluation Protocol of Model Merging on Classification Task | 2024 | Arxiv | |
SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery | 2024 | Arxiv | |
Representation Surgery for Multi-Task Model Merging | 2024 | ICML |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent | 2025 | Arxiv | |
How to Merge Your Multimodal Models Over Time? | 2024 | Arxiv | |
Multi-Task Model Merging via Adaptive Weight Disentanglement | 2024 | Arxiv | |
Rethinking Weight-Averaged Model-merging | 2024 | Arxiv | |
ATM: Improving Model Merging by Alternating Tuning and Merging | 2024 | Arxiv | |
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | 2024 | Arxiv | Llama-2-7B-Chat, WizardMath-7B, CodeLlama-7B |
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging | 2024 | Arxiv | |
It’s Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization | 2024 | Arxiv | Qwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B |
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | 2023 | Arxiv | SOLAR 10.7B, SOLAR 10.7B-Instruct |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach | 2024 | Arxiv | |
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | 2024 | AAAI | LLaMA-7B |
Mitigating Social Biases in Language Models through Unlearning | 2024 | Arxiv | LLaMA-2 7B |
Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models | 2024 | Arxiv | Llama-2-7B, Llama-2-chat-7B, Vicuna-7B, Llama-2-13B |
Composing Parameter-Efficient Modules with Arithmetic Operation | 2023 | NeurIPS | |
Editing models with task arithmetic | 2023 | ICLR | |
Elastic Weight Removal for Faithful and Abstractive Dialogue Generation | 2023 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
NegMerge: Consensual Weight Negation for Strong Machine Unlearning | 2024 | Arxiv | |
Towards Safer Large Language Models through Machine Unlearning | 2024 | ACL | LLAMA2-7B, LLAMA2-13B |
Editing models with task arithmetic | 2023 | ICLR | |
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Model | 2023 | Arxiv | LLAMA2-7B, LLAMA-7B, BLOOM-7B |
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion | 2023 | Arxiv |
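Several of the unlearning papers above build on task-vector negation from 'Editing models with task arithmetic': fine-tune a copy of the model on the unwanted behavior, then subtract the resulting task vector from the base weights. A toy sketch (all names and values illustrative):

```python
import numpy as np

def forget_by_negation(theta_pre, theta_bad, lam=1.0):
    """Subtract the task vector of an unwanted behavior
    (theta_bad - theta_pre) from the pre-trained weights."""
    return {name: theta_pre[name] - lam * (theta_bad[name] - theta_pre[name])
            for name in theta_pre}

theta_pre = {"w": np.array([1.0, 1.0])}
theta_bad = {"w": np.array([3.0, 1.0])}  # fine-tuned on behavior to remove

cleaned = forget_by_negation(theta_pre, theta_bad, lam=0.5)
print(cleaned["w"])  # [0. 1.] -- moved away from the unwanted direction
```

The scaling coefficient `lam` trades off how strongly the behavior is removed against how much general capability is preserved; the papers above differ mainly in how they localize or sparsify the negated vector.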
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
DEM: Distribution Edited Model for Training with Mixed Data Distributions | 2024 | Arxiv | OpenLLaMA 7B and 13B |
Checkpoint Merging via Bayesian Optimization in LLM Pretraining | 2024 | Arxiv | Baichuan2-220B, Baichuan2-440B, Baichuan2-660B, Baichuan2-1540B, Baichuan2-1760B, Baichuan2-1980B, Baichuan2-2200B, Baichuan2-2420B, DeepSeek-1400B, DeepSeek-1600B, DeepSeek-1800B, DeepSeek-2000B |
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning | 2023 | ACL | |
Early Weight Averaging meets High Learning Rates for LLM Pre-training | 2023 | NeurIPS Workshop | |
Stop wasting my time! saving days of imagenet and bert training with latest weight averaging | 2022 | NeurIPS Workshop | |
Fusing finetuned models for better pretraining | 2022 | Arxiv |
Note: The following papers are from the LLM Merging Competition at NeurIPS 2024.
Paper Title | Year | Conference/Journal | Models |
---|---|---|---|
LLM Merging: Building LLMs Efficiently through Merging | 2024 | LLM Merging Competition at NeurIPS | - |
Towards an approach combining Knowledge Graphs and Prompt Engineering for Merging Large Language Models | 2024 | LLM Merging Competition at NeurIPS | meta-llama/Llama-2-7b; microsoft_phi1/2/3 |
Model Merging using Geometric Median of Task Vectors | 2024 | LLM Merging Competition at NeurIPS | flan_t5_xl |
Interpolated Layer-Wise Merging for NeurIPS 2024 LLM Merging Competition | 2024 | LLM Merging Competition at NeurIPS | suzume-llama-3-8B-multilingual-orpo-borda-top75, Barcenas-Llama3-8bORPO, Llama-3-8B-Ultra-Instruct-SaltSprinkle, MAmmoTH2-8B-Plus, Daredevil-8B |
A Model Merging Method | 2024 | LLM Merging Competition at NeurIPS | - |
Differentiable DARE-TIES for NeurIPS 2024 LLM Merging Competition | 2024 | LLM Merging Competition at NeurIPS | suzume-llama-3-8B-multilingual-orpo-borda-top75, MAmmoTH2-8B-Plus, and Llama-3-Refueled |
LLM Merging Competition Technical Report: Efficient Model Merging with Strategic Model Selection, Merging, and Hyperparameter Optimization | 2024 | LLM Merging Competition at NeurIPS | MaziyarPanahi/Llama3-8B-Instruct-v0.8, MaziyarPanahi/Llama-3-8B-Instruct-v0.9, shenzhiwang/Llama3-8B-Chinese-Chat, lightblue/suzume-llama-3-8B-multilingual |
Simple Llama Merge: What Kind of LLM Do We Need? | 2024 | LLM Merging Competition at NeurIPS | Hermes-2-Pro-Llama-3-8B, and Daredevil-8B |
LLM Merging Competition Technical Report for NeurIPS 2024: Efficiently Building Large Language Models through Merging | 2024 | LLM Merging Competition at NeurIPS | Mistral-7B-Instruct-v2, Llama3-8B-Instruct, Flan-T5-large, Gemma-7B-Instruct, and WizardLM-2-7B |
MoD: A Distribution-Based Approach for Merging Large Language Models | 2024 | LLM Merging Competition at NeurIPS | Qwen2.5-1.5B and Qwen2.5-7B |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Jointly training large autoregressive multimodal models | 2024 | ICLR | |
Model Composition for Multimodal Large Language Models | 2024 | ACL | Vicuna-7B-v1.5 |
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation | 2023 | ICML | |
An Empirical Study of Multimodal Model Merging | 2023 | EMNLP | |
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks | 2023 | TMLR |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification | 2024 | ICASSP Workshop |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation | 2024 | Arxiv | LLaVA-Critic 7b |
IterIS: Iterative Inference-Solving Alignment for LoRA Merging | 2024 | Arxiv | |
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models | 2024 | ECCV | |
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models | 2024 | Arxiv | |
MoLE: Mixture of LoRA Experts | 2024 | ICLR | |
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | 2024 | Arxiv | |
Multi-LoRA Composition for Image Generation | 2024 | Arxiv | |
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models | 2023 | NeurIPS | |
Merging LoRAs | 2023 | GitHub | |
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs | 2023 | Arxiv | |
GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better | 2024 | Arxiv | |
A Unified Module for Accelerating STABLE-DIFFUSION: LCM-LORA | 2024 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Decouple-Then-Merge: Towards Better Training for Diffusion Models | 2024 | Arxiv | |
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data | 2024 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging | 2024 | Arxiv | |
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | 2024 | Arxiv | |
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation | 2024 | Arxiv | Llama3-8B |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
DEM: Distribution Edited Model for Training with Mixed Data Distributions | 2024 | Arxiv | OpenLLaMA-7B, OpenLLaMA-13B |
Merging Vision Transformers from Different Tasks and Domains | 2023 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning | 2023 | NeurIPS |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
Realistic Evaluation of Model Merging for Compositional Generalization | 2024 | Arxiv | |
Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks | 2024 | Arxiv | |
Training-Free Model Merging for Multi-target Domain Adaptation | 2024 | Arxiv | |
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation | 2024 | Arxiv | Llama3-70B |
Ensemble of averages: Improving model selection and boosting performance in domain generalization | 2022 | NeurIPS | |
Swad: Domain generalization by seeking flat minima | 2021 | NeurIPS |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks | 2024 | ACL | Llama-2-7B |
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | 2024 | COLM | Llama-2-7B, Llama-2-13B |
LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild | 2024 | ACL | |
Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy? | 2024 | Arxiv | |
MerA: Merging pretrained adapters for few-shot learning | 2023 | Arxiv |
Paper Title | Year | Conference/Journal | Remark |
---|---|---|---|
LoBAM: LoRA-Based Backdoor Attack on Model Merging | 2024 | Arxiv | |
BadMerging: Backdoor Attacks Against Model Merging | 2024 | CCS | |
LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario | 2024 | ACL | Llama-2-7B |
Star History
We welcome all researchers to contribute to this repository on model merging in foundation models and machine learning.
If you have a related paper that has not yet been added to the list, please contact us.
Email: [email protected] / [email protected]