
Awesome-LLM-Post-training
Awesome Reasoning LLM Tutorial/Survey/Guide
Stars: 807

The Awesome-LLM-Post-training repository is a curated collection of influential papers, code implementations, benchmarks, and resources related to Large Language Models (LLMs) Post-Training Methodologies. It covers various aspects of LLMs, including reasoning, decision-making, reinforcement learning, reward learning, policy optimization, explainability, multimodal agents, benchmarks, tutorials, libraries, and implementations. The repository aims to provide a comprehensive overview and resources for researchers and practitioners interested in advancing LLM technologies.
README:
Welcome to the Awesome-LLM-Post-training repository! This repository is a curated collection of the most influential papers, code implementations, benchmarks, and resources related to Large Language Models (LLMs) Post-Training Methodologies.
Our work is based on the following paper:
LLM Post-Training: A Deep Dive into Reasoning Large Language Models (available on arXiv)
Komal Kumar*, Tajamul Ashraf*, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Philip H.S. Torr, Fahad Shahbaz Khan, and Salman Khan
* Equally contributing first authors
- Corresponding authors: Komal Kumar, Tajamul Ashraf.
Feel free to ⭐ star and fork this repository to keep up with the latest advancements and contribute to the community.
Section | Subsection |
---|---|
Papers | Survey, Theory, Explainability |
LLMs in RL | LLM-Augmented Reinforcement Learning |
Reward Learning | Human Feedback, Preference-Based RL, Intrinsic Motivation |
Policy Optimization | Offline RL, Imitation Learning, Hierarchical RL |
LLMs for Reasoning & Decision-Making | Causal Reasoning, Planning, Commonsense RL |
Exploration & Generalization | Zero-Shot RL, Generalization in RL, Self-Supervised RL |
Multi-Agent RL (MARL) | Emergent Communication, Coordination, Social RL |
Applications & Benchmarks | Autonomous Agents, Simulations, LLM-RL Benchmarks |
Tutorials & Courses | Lectures, Workshops |
Libraries & Implementations | Open-Source RL-LLM Frameworks |
Other Resources | Additional Research & Readings |
Title | Publication Date | Link |
---|---|---|
Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning | 3 Mar 2025 | Arxiv |
From System 1 to System 2: A Survey of Reasoning Large Language Models | 25 Feb 2025 | Arxiv |
Empowering LLMs with Logical Reasoning: A Comprehensive Survey | 24 Feb 2025 | Arxiv |
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models | 16 Jan 2025 | Arxiv |
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey | 26 Sep 2024 | Arxiv |
Reasoning with Large Language Models, a Survey | 16 July 2024 | Arxiv |
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods | 30 Mar 2024 | Arxiv |
Reinforcement Learning Enhanced LLMs: A Survey | 5 Dec 2024 | Arxiv |
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey | 29 Dec 2024 | Arxiv |
Large Language Models: A Survey of Their Development, Capabilities, and Applications | 15 Jan 2025 | Springer |
A Survey on Multimodal Large Language Models | 10 Feb 2025 | Oxford Academic |
Large Language Models (LLMs): Survey, Technical Frameworks, and Future Directions | 20 Jul 2024 | Springer |
Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machines | 11 Feb 2024 | Arxiv |
ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models | 14 Mar 2024 | Arxiv |
Reinforcement Learning Problem Solving with Large Language Models | 29 Apr 2024 | Arxiv |
A Survey on Large Language Models for Reinforcement Learning | 10 Dec 2023 | Arxiv |
Large Language Models as Decision-Makers: A Survey | 23 Aug 2023 | Arxiv |
A Survey on Large Language Model Alignment Techniques | 6 May 2023 | Arxiv |
Reinforcement Learning with Human Feedback: A Survey | 12 April 2023 | Arxiv |
Reasoning with Large Language Models: A Survey | 14 Feb 2023 | Arxiv |
A Survey on Foundation Models for Decision Making | 9 Jan 2023 | Arxiv |
Large Language Models in Reinforcement Learning: Opportunities and Challenges | 5 Dec 2022 | Arxiv |
- "Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search" | 08-02-2025 | [Paper]
- "DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL" | 08-02-2025 | [Paper]
- "QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search" | 08-02-2025 | [Paper]
- "Process Reinforcement through Implicit Rewards" | 04-02-2025 | [Paper]
- "Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling" | 29-01-2025 | [Paper]
- "Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies" | 30-01-2025 [Paper]
- "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" | 26-01-2025 | [Paper]
- "Kimi k1.5: Scaling Reinforcement Learning with LLMs" | 25-01-2025 | [Paper]
- "Does RLHF Scale? Exploring the Impacts From Data, Model, and Method" | 18-12-2024 | [Paper]
- "Offline Reinforcement Learning for LLM Multi-Step Reasoning" | 29-12-2024 | [Paper]
- "ReFT: Representation Finetuning for Language Models" | 10-07-2024 | [Paper]
- "Deepseekmath: Pushing the Limits of Mathematical Reasoning in Open Language Models" | 02-02-2024 | [Paper]
- "Reasoning with Reinforced Functional Token Tuning" | 15-02-2025 | [Paper]
- "Value-Based Deep RL Scales Predictably" | 07-02-2025 | [Paper]
- "InfAlign: Inference-aware Language Model Alignment" | 30-12-2024 | [Paper]
- "LIMR: Less is More for RL Scaling" | 12-02-2025 | [Paper]
- "A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics" | 14-02-2025 | [Paper]
- PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. [Paper]
- ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding [Paper]
- The Lessons of Developing Process Reward Models in Mathematical Reasoning. [Paper]
- ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark. [Paper]
- AutoPSV: Automated Process-Supervised Verifier [Paper]
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search [Paper]
- Free Process Rewards without Process Labels. [Paper]
- Outcome-Refining Process Supervision for Code Generation [Paper]
- Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations [Paper]
- OVM: Outcome-supervised Value Models for Planning in Mathematical Reasoning [Paper]
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [Paper]
- Let's Verify Step by Step. [Paper]
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision [Paper]
- Making Large Language Models Better Reasoners with Step-Aware Verifier [Paper]
- Solving Math Word Problems with Process and Outcome-Based Feedback [Paper]
- Uncertainty-Aware Step-wise Verification with Generative Reward Models [Paper]
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [Paper]
- Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models [Paper]
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling [Paper]
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems [Paper]
- "Decision Transformer: Reinforcement Learning via Sequence Modeling" - Chen et al. (2021) [Paper]
- "Offline RL with LLMs as Generalist Memory" - Tian et al. (2023) [Paper]
- Agents Thinking Fast and Slow: A Talker-Reasoner Architecture [Paper]
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective [Paper]
- When a Language Model is Optimized for Reasoning, Does It Still Show Embers of Autoregression? An Analysis of OpenAI o1 [Paper]
- The Impact of Reasoning Step Length on Large Language Models [Paper]
- Distilling System 2 into System 1 [Paper]
- System 2 Attention (is something you might need too) [Paper]
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought [Paper]
- LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs [Paper]
- Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time [Paper]
- Diving into Self-Evolving Training for Multimodal Reasoning [Paper]
- Visual Agents as Fast and Slow Thinkers [Paper]
- Virgo: A Preliminary Exploration on Reproducing o1-like MLLM [Paper]
- Scaling Inference-Time Search With Vision Value Model for Improved Visual Comprehension [Paper]
- Slow Perception: Let's Perceive Geometric Figures Step-by-Step [Paper]
- AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning [Paper]
- LLaVA-o1: Let Vision Language Models Reason Step-by-Step [Paper]
- Vision-Language Models Can Self-Improve Reasoning via Reflection [Paper]
- I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models [Paper]
- RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision [Paper]
- Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models [Paper]
- PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models [Paper]
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [Paper]
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-like LLMs [Paper]
- A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? [Paper]
- EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking [Paper]
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines [Paper]
- Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models [Paper]
- FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI [Paper]
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI [Paper]
- MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations [Paper]
- LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion [Paper]
- Humanity's Last Exam [Paper]
- LR2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems [Paper]
- BIG-Bench Extra Hard [Paper]
- Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable [Paper]
- OverThink: Slowdown Attacks on Reasoning LLMs [Paper]
- GuardReasoner: Towards Reasoning-based LLM Safeguards [Paper]
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities [Paper]
- ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails [Paper]
- H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking [Paper]
- BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack [Paper]
# | Repository & Link | Description |
---|---|---|
1 | RL4VLM (archived and read-only as of December 15, 2024) | Offers code for fine-tuning large vision-language models as decision-making agents via RL. Includes implementations for training models with task-specific rewards and evaluating them in various environments. |
2 | LlamaGym | Simplifies fine-tuning large language model (LLM) agents with online RL. Provides an abstract Agent class to handle various aspects of RL training, allowing for quick iteration and experimentation across different environments. |
3 | RL-Based Fine-Tuning of Diffusion Models for Biological Sequences | Accompanies a tutorial and review paper on RL-based fine-tuning, focusing on the design of biological sequences (DNA/RNA). Provides comprehensive tutorials and code implementations for training and fine-tuning diffusion models using RL. |
4 | LM-RL-Finetune | Aims to improve KL penalty optimization in RL fine-tuning of language models by computing the KL penalty term analytically. Includes configurations for training with Proximal Policy Optimization (PPO). |
5 | InstructLLaMA | Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT. |
6 | SEIKO | Introduces a novel RL method to efficiently fine-tune diffusion models in an online setting. Its techniques outperform baselines such as PPO, classifier-based guidance, and direct reward backpropagation for fine-tuning Stable Diffusion. |
7 | TRL (Train Transformer Language Models with RL) | A state-of-the-art library for post-training foundation models using methods like Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), GRPO, and Direct Preference Optimization (DPO). Built on the 🤗 Transformers ecosystem, it supports multiple model architectures and scales efficiently across hardware setups (see the minimal usage sketch after this table). |
8 | Fine-Tuning Reinforcement Learning Models as Continual Learning | Explores fine-tuning RL models as a forgetting mitigation problem (continual learning). Provides insights and code implementations to address forgetting in RL models. |
9 | RL4LMs | A modular RL library to fine-tune language models to human preferences. Rigorously evaluated through 2000+ experiments using the GRUE benchmark, ensuring robustness across various NLP tasks. |
10 | Lamorel | A high-throughput, distributed architecture for seamless LLM integration in interactive environments. While not specialized in RL or RLHF by default, it supports custom implementations and is ideal for users needing maximum flexibility. |
11 | LLM-Reverse-Curriculum-RL | Implements the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning". Focuses on enhancing LLM reasoning capabilities using a reverse curriculum RL approach. |
12 | veRL | A flexible, efficient, and production-ready RL training library for large language models (LLMs). Serves as the open-source implementation of the HybridFlow framework and supports various RL algorithms (PPO, GRPO), advanced resource utilization, and scalability up to 70B models on hundreds of GPUs. Integrates with Hugging Face models, supervised fine-tuning, and RLHF with multiple reward types. |
13 | trlX | A distributed training framework for fine-tuning large language models (LLMs) with reinforcement learning. Supports both Accelerate and NVIDIA NeMo backends, allowing training of models up to 20B+ parameters. Implements PPO and ILQL, and integrates with CHEESE for human-in-the-loop data collection. |
14 | Okapi | A framework for instruction tuning in LLMs with RLHF, supporting 26 languages. Provides multilingual resources such as ChatGPT prompts, instruction datasets, and response ranking data, along with both BLOOM-based and LLaMa-based models and evaluation benchmarks. |
15 | LLaMA-Factory | Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024). Supports a wide array of models (e.g., LLaMA, LLaVA, Qwen, Mistral) with methods including pre-training, multimodal fine-tuning, reward modeling, PPO, DPO, and ORPO. Offers scalable tuning (16-bit, LoRA, QLoRA) with advanced optimizations and logging integrations, and provides fast inference via API, Gradio UI, and CLI with vLLM workers. |
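As a quick orientation to how a library like TRL is typically driven, here is a minimal supervised fine-tuning (SFT) sketch. The model and dataset identifiers are illustrative assumptions rather than examples from this repository, and argument names can differ across TRL versions.

```python
# Minimal TRL SFT sketch (illustrative only; the model/dataset ids are
# assumptions and the API surface varies across TRL versions).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# A small chat-style dataset, assumed to be available on the Hugging Face Hub.
dataset = load_dataset("trl-lib/Capybara", split="train")

# Short training run purely for demonstration purposes.
training_args = SFTConfig(output_dir="qwen2.5-0.5b-sft", max_steps=100)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # any causal LM id could be substituted
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

Preference- and reward-based post-training follows the same pattern: recent TRL releases expose DPOTrainer, PPOTrainer, and GRPOTrainer, each paired with a preference- or reward-labeled dataset instead of plain demonstrations.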
- "AutoGPT: LLMs for Autonomous RL Agents" - OpenAI (2023) [Paper]
- "Barkour: Benchmarking LLM-Augmented RL" - Wu et al. (2023) [Paper]
- Decision Transformer (GitHub)
- ReAct (GitHub)
- RLHF (GitHub)
Contributions are welcome! If you have relevant papers, code, or insights, feel free to submit a pull request.
If you find our work useful or use it in your research, please consider citing:
@misc{kumar2025llmposttrainingdeepdive,
title={LLM Post-Training: A Deep Dive into Reasoning Large Language Models},
author={Komal Kumar and Tajamul Ashraf and Omkar Thawakar and Rao Muhammad Anwer and Hisham Cholakkal and Mubarak Shah and Ming-Hsuan Yang and Phillip H. S. Torr and Salman Khan and Fahad Shahbaz Khan},
year={2025},
eprint={2502.21321},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.21321},
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Looking forward to your feedback, contributions, and stars! Please raise any issues or questions here.
Similar Open Source Tools


LLM-Tool-Survey
This repository contains a collection of papers related to tool learning with large language models (LLMs). The papers are organized according to the survey paper 'Tool Learning with Large Language Models: A Survey'. The survey focuses on the benefits and implementation of tool learning with LLMs, covering aspects such as task planning, tool selection, tool calling, response generation, benchmarks, evaluation, challenges, and future directions in the field. It aims to provide a comprehensive understanding of tool learning with LLMs and inspire further exploration in this emerging area.

Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey provides a comprehensive review of efficient and lightweight Multimodal Large Language Models (MLLMs), focusing on model size reduction and cost efficiency for edge computing scenarios. The survey covers the timeline of efficient MLLMs, research on efficient structures and strategies, and their applications, while also discussing current limitations and future directions.


llm-continual-learning-survey
This repository is an updating survey for Continual Learning of Large Language Models (CL-LLMs), providing a comprehensive overview of various aspects related to the continual learning of large language models. It covers topics such as continual pre-training, domain-adaptive pre-training, continual fine-tuning, model refinement, model alignment, multimodal LLMs, and miscellaneous aspects. The survey includes a collection of relevant papers, each focusing on different areas within the field of continual learning of large language models.

Awesome-GenAI-Unlearning
This repository is a collection of papers on Generative AI Machine Unlearning, categorized based on modality and applications. It includes datasets, benchmarks, and surveys related to unlearning scenarios in generative AI. The repository aims to provide a comprehensive overview of research in the field of machine unlearning for generative models.

Awesome-Efficient-AIGC
This repository, Awesome Efficient AIGC, collects efficient approaches for AI-generated content (AIGC) to cope with its huge demand for computing resources. It includes efficient Large Language Models (LLMs), Diffusion Models (DMs), and more. The repository is continuously improving and welcomes contributions of works like papers and repositories that are missed by the collection.

Efficient-LLMs-Survey
This repository provides a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from **model-centric**, **data-centric**, and **framework-centric** perspectives, respectively. We hope our survey and this GitHub repository can serve as valuable resources to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.

awesome-AIOps
awesome-AIOps is a curated list of academic researches and industrial materials related to Artificial Intelligence for IT Operations (AIOps). It includes resources such as competitions, white papers, blogs, tutorials, benchmarks, tools, companies, academic materials, talks, workshops, papers, and courses covering various aspects of AIOps like anomaly detection, root cause analysis, incident management, microservices, dependency tracing, and more.

LLM-Agent-Survey
LLM-Agent-Survey is a comprehensive repository that provides a curated list of papers related to Large Language Model (LLM) agents. The repository categorizes papers based on LLM-Profiled Roles and includes high-quality publications from prestigious conferences and journals. It aims to offer a systematic understanding of LLM-based agents, covering topics such as tool use, planning, and feedback learning. The repository also includes unpublished papers with insightful analysis and novelty, marked for future updates. Users can explore a wide range of surveys, tool use cases, planning workflows, and benchmarks related to LLM agents.

Awesome_Mamba
Awesome Mamba is a curated collection of groundbreaking research papers and articles on Mamba Architecture, a pioneering framework in deep learning known for its selective state spaces and efficiency in processing complex data structures. The repository offers a comprehensive exploration of Mamba architecture through categorized research papers covering various domains like visual recognition, speech processing, remote sensing, video processing, activity recognition, image enhancement, medical imaging, reinforcement learning, natural language processing, 3D recognition, multi-modal understanding, time series analysis, graph neural networks, point cloud analysis, and tabular data handling.

lobe-cli-toolbox
Lobe CLI Toolbox is an AI CLI Toolbox designed to enhance git commit and i18n workflow efficiency. It includes tools like Lobe Commit for generating Gitmoji-based commit messages and Lobe i18n for automating the i18n translation process. The toolbox also features Lobe label for automatically copying issues labels from a template repo. It supports features such as automatic splitting of large files, incremental updates, and customization options for the OpenAI model, API proxy, and temperature.
For similar tasks


mindsdb
MindsDB is a platform for customizing AI from enterprise data. You can create, serve, and fine-tune models in real-time from your database, vector store, and application data. MindsDB "enhances" SQL syntax with AI capabilities to make it accessible for developers worldwide. With MindsDB's nearly 200 integrations, any developer can create AI customized for their purpose, faster and more securely. Their AI systems will constantly improve themselves, using companies' own data, in real-time.

training-operator
Kubeflow Training Operator is a Kubernetes-native project for fine-tuning and scalable distributed training of machine learning (ML) models created with various ML frameworks such as PyTorch, TensorFlow, XGBoost, MPI, Paddle, and others. Training Operator allows you to use Kubernetes workloads to effectively train your large models via Kubernetes Custom Resources APIs or using the Training Operator Python SDK.
> Note: Before the v1.2 release, Kubeflow Training Operator only supports TFJob on Kubernetes.
* For a complete reference of the custom resource definitions, please refer to the API Definition: TensorFlow, PyTorch, Apache MXNet, XGBoost, MPI, and PaddlePaddle API Definitions.
* For details of the all-in-one operator design, please refer to the All-in-one Kubeflow Training Operator.
* For details on its observability, please refer to the monitoring design doc.

helix
HelixML is a private GenAI platform that allows users to deploy the best of open AI in their own data center or VPC while retaining complete data security and control. It includes support for fine-tuning models with drag-and-drop functionality. HelixML brings the best of open source AI to businesses in an ergonomic and scalable way, optimizing the tradeoff between GPU memory and latency.

nntrainer
NNtrainer is a software framework for training neural network models on devices with limited resources. It enables on-device fine-tuning of neural networks using user data for personalization. NNtrainer supports various machine learning algorithms and provides examples for tasks such as few-shot learning, ResNet, VGG, and product rating. It is optimized for embedded devices and utilizes CBLAS and CUBLAS for accelerated calculations. NNtrainer is open source and released under the Apache License version 2.0.

petals
Petals is a tool that allows users to run large language models at home in a BitTorrent-style manner. It enables fine-tuning and inference up to 10x faster than offloading. Users can generate text with distributed models like Llama 2, Falcon, and BLOOM, and fine-tune them for specific tasks directly from their desktop computer or Google Colab. Petals is a community-run system that relies on people sharing their GPUs to increase its capacity and offer a distributed network for hosting model layers.
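As a quick orientation to the Transformers-style interface Petals exposes, here is a minimal distributed-generation sketch. It mirrors the project's documented quickstart; the model id and API details are assumptions and may differ across Petals releases.

```python
# Minimal Petals sketch (mirrors the project's quickstart; model id and exact
# API are assumptions and may differ across releases).
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # assumed to be hosted on a public swarm
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Embeddings and sampling run locally; the transformer blocks are served by
# remote peers in the Petals swarm.
input_ids = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
output_ids = model.generate(input_ids, max_new_tokens=5)
print(tokenizer.decode(output_ids[0]))
```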

LLaVA-pp
This repository, LLaVA++, extends the visual capabilities of the LLaVA 1.5 model by incorporating the latest LLMs, Phi-3 Mini Instruct 3.8B, and LLaMA-3 Instruct 8B. It provides various models for instruction-following LMMS and academic-task-oriented datasets, along with training scripts for Phi-3-V and LLaMA-3-V. The repository also includes installation instructions and acknowledgments to related open-source contributions.

KULLM
KULLM (구름) is a Korean Large Language Model developed by Korea University NLP & AI Lab and HIAI Research Institute. It is based on the upstage/SOLAR-10.7B-v1.0 model and has been fine-tuned for instruction following. The model has been trained on 8×A100 GPUs and is capable of generating responses in the Korean language. KULLM exhibits hallucination and repetition phenomena due to its decoding strategy. Users should be cautious as the model may produce inaccurate or harmful results. Performance may vary in benchmarks without a fixed system prompt.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
* Self-contained, with no need for a DBMS or cloud service.
* OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
* Supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.