
Awesome-LLM-Post-training
Awesome Reasoning LLM Tutorial/Survey/Guide
Stars: 1222

The Awesome-LLM-Post-training repository is a curated collection of influential papers, code implementations, benchmarks, and resources related to Large Language Models (LLMs) Post-Training Methodologies. It covers various aspects of LLMs, including reasoning, decision-making, reinforcement learning, reward learning, policy optimization, explainability, multimodal agents, benchmarks, tutorials, libraries, and implementations. The repository aims to provide a comprehensive overview and resources for researchers and practitioners interested in advancing LLM technologies.
README:
Welcome to the Awesome-LLM-Post-training repository! This repository is a curated collection of the most influential papers, code implementations, benchmarks, and resources related to Large Language Models (LLMs) Post-Training Methodologies.
Our work is based on the following paper:
LLM Post-Training: A Deep Dive into Reasoning Large Language Models (available on arXiv)
Komal Kumar*, Tajamul Ashraf*, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Philip H.S. Torr, Fahad Shahbaz Khan, and Salman Khan
* Equally contributing first authors
- Corresponding authors: Komal Kumar, Tajamul Ashraf.
Feel free to ⭐ star and fork this repository to keep up with the latest advancements and contribute to the community.
A taxonomy of post-training approaches for **LLMs**, categorized into Fine-tuning, Reinforcement Learning, and Test-time Scaling methods. We summarize the key techniques used in recent LLMs; a minimal best-of-N sketch illustrating test-time scaling appears after the paper list below.
Section | Subsection |
---|---|
Papers | Survey, Theory, Explainability |
LLMs in RL | LLM-Augmented Reinforcement Learning |
Reward Learning | Human Feedback, Preference-Based RL, Intrinsic Motivation |
Policy Optimization | Offline RL, Imitation Learning, Hierarchical RL |
LLMs for Reasoning & Decision-Making | Causal Reasoning, Planning, Commonsense RL |
Exploration & Generalization | Zero-Shot RL, Generalization in RL, Self-Supervised RL |
Multi-Agent RL (MARL) | Emergent Communication, Coordination, Social RL |
Applications & Benchmarks | Autonomous Agents, Simulations, LLM-RL Benchmarks |
Tutorials & Courses | Lectures, Workshops |
Libraries & Implementations | Open-Source RL-LLM Frameworks |
Other Resources | Additional Research & Readings |
Title | Publication Date | Link |
---|---|---|
A Survey on Post-training of Large Language Models | 8 Mar 2025 | arXiv |
LLM Post-Training: A Deep Dive into Reasoning Large Language Models | 28 Feb 2025 | arXiv |
From System 1 to System 2: A Survey of Reasoning Large Language Models | 25 Feb 2025 | arXiv |
Empowering LLMs with Logical Reasoning: A Comprehensive Survey | 24 Feb 2025 | arXiv |
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models | 16 Jan 2025 | arXiv |
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey | 26 Sep 2024 | arXiv |
Reasoning with Large Language Models, a Survey | 16 Jul 2024 | arXiv |
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods | 30 Mar 2024 | arXiv |
Reinforcement Learning Enhanced LLMs: A Survey | 5 Dec 2024 | arXiv |
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey | 29 Dec 2024 | arXiv |
Large Language Models: A Survey of Their Development, Capabilities, and Applications | 15 Jan 2025 | Springer |
A Survey on Multimodal Large Language Models | 10 Feb 2025 | Oxford Academic |
Large Language Models (LLMs): Survey, Technical Frameworks, and Future Directions | 20 Jul 2024 | Springer |
Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machines | 11 Feb 2024 | arXiv |
ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models | 14 Mar 2024 | arXiv |
Reinforcement Learning Problem Solving with Large Language Models | 29 Apr 2024 | arXiv |
A Survey on Large Language Models for Reinforcement Learning | 10 Dec 2023 | arXiv |
Large Language Models as Decision-Makers: A Survey | 23 Aug 2023 | arXiv |
A Survey on Large Language Model Alignment Techniques | 6 May 2023 | arXiv |
Reinforcement Learning with Human Feedback: A Survey | 12 Apr 2023 | arXiv |
Reasoning with Large Language Models: A Survey | 14 Feb 2023 | arXiv |
A Survey on Foundation Models for Decision Making | 9 Jan 2023 | arXiv |
Large Language Models in Reinforcement Learning: Opportunities and Challenges | 5 Dec 2022 | arXiv |
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [Paper]
- DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL [Paper]
- QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search [Paper]
- Process Reinforcement through Implicit Rewards [Paper]
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling [Paper]
- Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies [Paper]
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Paper]
- Kimi k1.5: Scaling Reinforcement Learning with LLMs [Paper]
- Does RLHF Scale? Exploring the Impacts From Data, Model, and Method [Paper]
- Offline Reinforcement Learning for LLM Multi-Step Reasoning [Paper]
- ReFT: Representation Finetuning for Language Models [Paper]
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models [Paper]
- Reasoning with Reinforced Functional Token Tuning [Paper]
- Value-Based Deep RL Scales Predictably [Paper]
- InfAlign: Inference-aware language model alignment [Paper]
- LIMR: Less is More for RL Scaling [Paper]
- A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics [Paper]
- PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. [Paper]
- ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding [Paper]
- The Lessons of Developing Process Reward Models in Mathematical Reasoning. [Paper]
- ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark. [Paper]
- AutoPSV: Automated Process-Supervised Verifier [Paper]
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search [Paper]
- Free Process Rewards without Process Labels. [Paper]
- Outcome-Refining Process Supervision for Code Generation [Paper]
- Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations [Paper]
- OVM: Outcome-supervised Value Models for Planning in Mathematical Reasoning [Paper]
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [Paper]
- Let's Verify Step by Step. [Paper]
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision [Paper]
- Making Large Language Models Better Reasoners with Step-Aware Verifier [Paper]
- Solving Math Word Problems with Process and Outcome-Based Feedback [Paper]
- Uncertainty-Aware Step-wise Verification with Generative Reward Models [Paper]
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [Paper]
- Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models [Paper]
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling [Paper]
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems [Paper]
- On the Convergence Rate of MCTS for the Optimal Value Estimation in Markov Decision Processes [Paper]
- Search-o1: Agentic Search-Enhanced Large Reasoning Models [Paper]
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking [Paper]
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search [Paper]
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning [Paper]
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs [Paper]
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search [Paper]
- Proposing and solving olympiad geometry with guided tree search [Paper]
- SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models [Paper]
- Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning [Paper]
- CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [Paper]
- GPT-Guided Monte Carlo Tree Search for Symbolic Regression in Financial Fraud Detection [Paper]
- MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree [Paper]
- Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions [Paper]
- SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation [Paper]
- Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding [Paper]
- AFlow: Automating Agentic Workflow Generation [Paper]
- Interpretable Contrastive Monte Carlo Tree Search Reasoning [Paper]
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [Paper]
- Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning [Paper]
- TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling [Paper]
- Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination [Paper]
- RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation [Paper]
- Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search [Paper]
- LiteSearch: Efficacious Tree Search for LLM [Paper]
- Tree Search for Language Model Agents [Paper]
- Uncertainty-Guided Optimization on Large Language Model Search Trees [Paper]
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B [Paper]
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping [Paper]
- LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models [Paper]
- AlphaMath Almost Zero: process Supervision without process [Paper]
- Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search [Paper]
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [Paper]
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [Paper]
- Stream of Search (SoS): Learning to Search in Language [Paper]
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [Paper]
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models [Paper]
- Reasoning with Language Model is Planning with World Model [Paper]
- Large Language Models as Commonsense Knowledge for Large-Scale Task Planning [Paper]
- Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training [Paper]
- Making PPO even Better: Value-Guided Monte-Carlo Tree Search Decoding [Paper]
- Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning [Paper]
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [Paper]
- Fine-grained Conversational Decoding via Isotropic and Proximal Search [Paper]
- Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata [Paper]
- Look-back Decoding for Open-Ended Text Generation [Paper]
- LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction [Paper]
- Agents Thinking Fast and Slow: A Talker-Reasoner Architecture [Paper]
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective [Paper]
- When a Language Model is Optimized for Reasoning, Does It Still Show Embers of Autoregression? An Analysis of OpenAI o1 [Paper]
- The Impact of Reasoning Step Length on Large Language Models [Paper]
- Distilling System 2 into System 1 [Paper]
- System 2 Attention (is something you might need too) [Paper]
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought [Paper]
- LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs [Paper]
- Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time [Paper]
- Diving into Self-Evolving Training for Multimodal Reasoning [Paper]
- Visual Agents as Fast and Slow Thinkers [Paper]
- Virgo: A Preliminary Exploration on Reproducing o1-like MLLM [Paper]
- Scaling Inference-Time Search With Vision Value Model for Improved Visual Comprehension [Paper]
- Slow Perception: Let's Perceive Geometric Figures Step-by-Step [Paper]
- AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning [Paper]
- LLaVA-o1: Let Vision Language Models Reason Step-by-Step [Paper]
- Vision-Language Models Can Self-Improve Reasoning via Reflection [Paper]
- I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models [Paper]
- RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision [Paper]
- Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models [Paper]
- PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models [Paper]
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [Paper]
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-like LLMs [Paper]
- A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? [Paper]
- EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking [Paper]
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines [Paper]
- Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models [Paper]
- FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI [Paper]
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI [Paper]
- MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations [Paper]
- LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion [Paper]
- Humanity's Last Exam [Paper]
- LR2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems [Paper]
- BIG-Bench Extra Hard [Paper]
- Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable [Paper]
- OverThink: Slowdown Attacks on Reasoning LLMs [Paper]
- GuardReasoner: Towards Reasoning-based LLM Safeguards [Paper]
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities [Paper]
- ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails [Paper]
- H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking [Paper]
- BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack [Paper]
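Several of the test-time scaling papers above (e.g., TreeBoN and the compute-optimal test-time scaling study) build on best-of-N sampling against a reward model or verifier. A minimal sketch of that base pattern is below; `generate` and `score` are hypothetical placeholders for a policy model and a verifier, not any specific paper's API.

```python
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # samples one candidate answer from the policy
    score: Callable[[str, str], float],  # reward-model / verifier score for a candidate
    n: int = 8,
) -> str:
    """Sample n candidates and return the highest-scoring one."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```

Tree-search methods such as MCTS refine this idea by scoring partial reasoning steps rather than only complete answers, which is where process reward models come in.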
# | Repository & Link | Description |
---|---|---|
1 | RL4VLM (archived and read-only as of December 15, 2024) | Offers code for fine-tuning large vision-language models as decision-making agents via RL. Includes implementations for training models with task-specific rewards and evaluating them in various environments. |
2 | LlamaGym | Simplifies fine-tuning large language model (LLM) agents with online RL. Provides an abstract Agent class to handle various aspects of RL training, allowing for quick iteration and experimentation across different environments. |
3 | RL-Based Fine-Tuning of Diffusion Models for Biological Sequences | Accompanies a tutorial and review paper on RL-based fine-tuning, focusing on the design of biological sequences (DNA/RNA). Provides comprehensive tutorials and code implementations for training and fine-tuning diffusion models using RL. |
4 | LM-RL-Finetune | Aims to improve KL penalty optimization in RL fine-tuning of language models by computing the KL penalty term analytically. Includes configurations for training with Proximal Policy Optimization (PPO). |
5 | InstructLLaMA | Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT. |
6 | SEIKO | Introduces a novel RL method to efficiently fine-tune diffusion models in an online setting. Its techniques outperform baselines such as PPO, classifier-based guidance, and direct reward backpropagation for fine-tuning Stable Diffusion. |
7 | TRL (Train Transformer Language Models with RL) | A state-of-the-art library for post-training foundation models using methods like Supervised Fine-Tuning (SFT), Proximal Policy Optimization (PPO), GRPO, and Direct Preference Optimization (DPO). Built on the Hugging Face Transformers ecosystem, it supports multiple model architectures and scales efficiently across hardware setups (a minimal usage sketch appears after this table). |
8 | Fine-Tuning Reinforcement Learning Models as Continual Learning | Explores fine-tuning RL models as a forgetting mitigation problem (continual learning). Provides insights and code implementations to address forgetting in RL models. |
9 | RL4LMs | A modular RL library to fine-tune language models to human preferences. Rigorously evaluated through 2000+ experiments using the GRUE benchmark, ensuring robustness across various NLP tasks. |
10 | Lamorel | A high-throughput, distributed architecture for seamless LLM integration in interactive environments. While not specialized in RL or RLHF by default, it supports custom implementations and is ideal for users needing maximum flexibility. |
11 | LLM-Reverse-Curriculum-RL | Implements the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning". Focuses on enhancing LLM reasoning capabilities using a reverse curriculum RL approach. |
12 | veRL | A flexible, efficient, and production-ready RL training library for large language models (LLMs). Serves as the open-source implementation of the HybridFlow framework and supports various RL algorithms (PPO, GRPO), advanced resource utilization, and scalability up to 70B models on hundreds of GPUs. Integrates with Hugging Face models, supervised fine-tuning, and RLHF with multiple reward types. |
13 | trlX | A distributed training framework for fine-tuning large language models (LLMs) with reinforcement learning. Supports both Accelerate and NVIDIA NeMo backends, allowing training of models up to 20B+ parameters. Implements PPO and ILQL, and integrates with CHEESE for human-in-the-loop data collection. |
14 | Okapi | A framework for instruction tuning in LLMs with RLHF, supporting 26 languages. Provides multilingual resources such as ChatGPT prompts, instruction datasets, and response ranking data, along with both BLOOM-based and LLaMa-based models and evaluation benchmarks. |
15 | LLaMA-Factory | Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024). Supports a wide array of models (e.g., LLaMA, LLaVA, Qwen, Mistral) with methods including pre-training, multimodal fine-tuning, reward modeling, PPO, DPO, and ORPO. Offers scalable tuning (16-bit, LoRA, QLoRA) with advanced optimizations and logging integrations, and provides fast inference via API, Gradio UI, and CLI with vLLM workers. |
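To make the table concrete, here is a minimal supervised fine-tuning sketch with TRL (entry 7), assuming a recent TRL release; the model and dataset names are illustrative placeholders, not endorsements.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset and model; swap in your own.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",              # model id string is resolved by TRL
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-output"),
)
trainer.train()
```

The other trainers in TRL (PPO, DPO, GRPO) follow the same trainer-plus-config shape, with preference or reward data in place of the plain SFT dataset.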
- "AutoGPT: LLMs for Autonomous RL Agents" - OpenAI (2023) [Paper]
- "Barkour: Benchmarking LLM-Augmented RL" - Wu et al. (2023) [Paper]
- Decision Transformer (GitHub)
- ReAct (GitHub) (a minimal ReAct-style loop sketch follows this list)
- RLHF (GitHub)
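For readers new to ReAct, the sketch below shows the interleaved reason-act loop in its simplest form. This is an illustrative sketch, not the linked repository's API; `call_llm` and the `calculator` tool are hypothetical placeholders.

```python
import re

def call_llm(prompt: str) -> str:
    """Placeholder for any text-completion model call."""
    raise NotImplementedError("plug in your model here")

# Toy tool registry for illustration only; eval is unsafe outside a demo.
TOOLS = {"calculator": lambda expr: str(eval(expr))}

def react(question: str, max_steps: int = 5) -> str:
    trace = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(trace + "Thought:")
        trace += f"Thought:{step}\n"
        action = re.search(r"Action:\s*(\w+)\[(.+?)\]", step)
        if action is None:
            return step  # no tool call: treat the output as the final answer
        tool, arg = action.groups()
        trace += f"Observation: {TOOLS[tool](arg)}\n"
    return trace  # step budget exhausted; return the full trace
```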
Contributions are welcome! If you have relevant papers, code, or insights, feel free to submit a pull request.
If you find our work useful or use it in your research, please consider citing:
@misc{kumar2025llmposttrainingdeepdive,
title={LLM Post-Training: A Deep Dive into Reasoning Large Language Models},
author={Komal Kumar and Tajamul Ashraf and Omkar Thawakar and Rao Muhammad Anwer and Hisham Cholakkal and Mubarak Shah and Ming-Hsuan Yang and Philip H. S. Torr and Fahad Shahbaz Khan and Salman Khan},
year={2025},
eprint={2502.21321},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.21321},
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Looking forward to your feedback, contributions, and stars! Please raise any issues or questions here.
Alternative AI tools for Awesome-LLM-Post-training
Similar Open Source Tools


NewEraAI-Papers
The NewEraAI-Papers repository provides links to collections of influential and interesting research papers from top AI conferences, along with open-source code to promote reproducibility and provide detailed implementation insights beyond the scope of the article. Users can stay up to date with the latest advances in AI research by exploring this repository. Contributions to improve the completeness of the list are welcomed, and users can create pull requests, open issues, or contact the repository owner via email to enhance the repository further.

dive-into-llms
The 'Dive into Large Language Models' series programming practice tutorial is an extension of the 'Artificial Intelligence Security Technology' course lecture notes from Shanghai Jiao Tong University (Instructor: Zhang Zhuosheng). It aims to provide introductory programming references related to large models. Through simple practice, it helps students quickly grasp large models, better engage in course design, or academic research. The tutorial covers topics such as fine-tuning and deployment, prompt learning and thought chains, knowledge editing, model watermarking, jailbreak attacks, multimodal models, large model intelligent agents, and security. Disclaimer: The content is based on contributors' personal experiences, internet data, and accumulated research work, provided for reference only.

Awesome-Code-LLM
A curated collection of papers and resources on large language models for code, following the awesome-list convention of the repositories above.

lobe-icons
Lobe Icons is a collection of popular AI / LLM Model Brand SVG logos and icons. It features lightweight and scalable icons designed with highly optimized scalable vector graphics (SVG) for optimal performance. The collection is tree-shakable, allowing users to import only the icons they need to reduce the overall bundle size of their projects. Lobe Icons has an active community of designers and developers who can contribute and seek support on platforms like GitHub and Discord. The repository supports a wide range of brands across different models, providers, and applications, with more brands continuously being added through contributions. Users can easily install Lobe UI with the provided commands and integrate it with NextJS for server-side rendering. Local development can be done using Github Codespaces or by cloning the repository. Contributions are welcome, and users can contribute code by checking out the GitHub Issues. The project is MIT licensed and maintained by LobeHub.

ColossalAI
Colossal-AI is a deep learning system for large-scale parallel training. It provides a unified interface to scale sequential code of model training to distributed environments. Colossal-AI supports parallel training methods such as data, pipeline, tensor, and sequence parallelism and is integrated with heterogeneous training and zero redundancy optimizer.

Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey provides a comprehensive review of efficient and lightweight Multimodal Large Language Models (MLLMs), focusing on model size reduction and cost efficiency for edge computing scenarios. The survey covers the timeline of efficient MLLMs, research on efficient structures and strategies, and their applications, while also discussing current limitations and future directions.


omi
Omi is an open-source AI wearable that provides automatic, high-quality transcriptions of meetings, chats, and voice memos. It revolutionizes how conversations are captured and managed by connecting to mobile devices. The tool offers features for seamless documentation and integration with third-party services.

PPTAgent
PPTAgent is an innovative system that automatically generates presentations from documents. It employs a two-step process for quality assurance and introduces PPTEval for comprehensive evaluation. With dynamic content generation, smart reference learning, and quality assessment, PPTAgent aims to streamline presentation creation. The tool follows an analysis phase to learn from reference presentations and a generation phase to develop structured outlines and cohesive slides. PPTEval evaluates presentations based on content accuracy, visual appeal, and logical coherence.

RAG-Survey
This repository is dedicated to collecting and categorizing papers related to Retrieval-Augmented Generation (RAG) for AI-generated content. It serves as a survey repository based on the paper 'Retrieval-Augmented Generation for AI-Generated Content: A Survey'. The repository is continuously updated to keep up with the rapid growth in the field of RAG.

Awesome-Graph-LLM
Awesome-Graph-LLM is a curated collection of research papers exploring the intersection of graph-based techniques with Large Language Models (LLMs). The repository aims to bridge the gap between LLMs and graph structures prevalent in real-world applications by providing a comprehensive list of papers covering various aspects of graph reasoning, node classification, graph classification/regression, knowledge graphs, multimodal models, applications, and tools. It serves as a valuable resource for researchers and practitioners interested in leveraging LLMs for graph-related tasks.

LLaMA-Factory
LLaMA Factory is a unified framework for fine-tuning 100+ large language models (LLMs) with various methods, including pre-training, supervised fine-tuning, reward modeling, PPO, DPO and ORPO. It features integrated algorithms like GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, LoRA+, LoftQ and Agent tuning, as well as practical tricks like FlashAttention-2, Unsloth, RoPE scaling, NEFTune and rsLoRA. LLaMA Factory provides experiment monitors like LlamaBoard, TensorBoard, Wandb, MLflow, etc., and supports faster inference with OpenAI-style API, Gradio UI and CLI with vLLM worker. Compared to ChatGLM's P-Tuning, LLaMA Factory's LoRA tuning offers up to 3.7 times faster training speed with a better Rouge score on the advertising text generation task. By leveraging 4-bit quantization technique, LLaMA Factory's QLoRA further improves the efficiency regarding the GPU memory.

Awesome-Deep-Learning-Papers-for-Search-Recommendation-Advertising
This repository contains a curated list of deep learning papers focused on industrial applications such as search, recommendation, and advertising. The papers cover various topics including embedding, matching, ranking, large models, transfer learning, and reinforcement learning.

gateway
Gateway is a tool that streamlines requests to 100+ open & closed source models with a unified API. It is production-ready with support for caching, fallbacks, retries, timeouts, load balancing, and can be edge-deployed for minimum latency. It is blazing fast with a tiny footprint, supports load balancing across multiple models, providers, and keys, ensures app resilience with fallbacks, offers automatic retries with exponential fallbacks, allows configurable request timeouts, supports multimodal routing, and can be extended with plug-in middleware. It is battle-tested over 300B tokens and enterprise-ready for enhanced security, scale, and custom deployments.

InternLM
InternLM is a powerful language model series with features such as a 200K context window for long-context tasks; outstanding comprehensive performance in reasoning, math, code, chat experience, instruction following, and creative writing; code interpreter and data analysis capabilities; and stronger tool utilization capabilities. It offers models in sizes of 7B and 20B, suitable for research and complex scenarios. The models are recommended for various applications and exhibit better performance than previous generations. InternLM models may match or surpass models like ChatGPT. The tool has been evaluated on various datasets and has shown superior performance in multiple tasks. It requires Python >= 3.8, PyTorch >= 1.12.0, and Transformers >= 4.34. InternLM can be used for tasks like chat, agent applications, fine-tuning, deployment, and long-context inference.
For similar tasks


mindsdb
MindsDB is a platform for customizing AI from enterprise data. You can create, serve, and fine-tune models in real time from your database, vector store, and application data. MindsDB "enhances" SQL syntax with AI capabilities to make it accessible for developers worldwide. With MindsDB's nearly 200 integrations, any developer can create AI customized for their purpose, faster and more securely. Their AI systems constantly improve themselves using companies' own data, in real time.

training-operator
Kubeflow Training Operator is a Kubernetes-native project for fine-tuning and scalable distributed training of machine learning (ML) models created with various ML frameworks such as PyTorch, TensorFlow, XGBoost, MPI, Paddle, and others. Training Operator allows you to use Kubernetes workloads to effectively train your large models via Kubernetes Custom Resources APIs or using the Training Operator Python SDK. Note: before the v1.2 release, Kubeflow Training Operator only supported TFJob on Kubernetes. For a complete reference of the custom resource definitions, see the API definitions for TensorFlow, PyTorch, Apache MXNet, XGBoost, MPI, and PaddlePaddle; for the all-in-one operator design, see the All-in-one Kubeflow Training Operator doc; for observability details, see the monitoring design doc.

helix
HelixML is a private GenAI platform that allows users to deploy the best of open AI in their own data center or VPC while retaining complete data security and control. It includes support for fine-tuning models with drag-and-drop functionality. HelixML brings the best of open source AI to businesses in an ergonomic and scalable way, optimizing the tradeoff between GPU memory and latency.

nntrainer
NNtrainer is a software framework for training neural network models on devices with limited resources. It enables on-device fine-tuning of neural networks using user data for personalization. NNtrainer supports various machine learning algorithms and provides examples for tasks such as few-shot learning, ResNet, VGG, and product rating. It is optimized for embedded devices and utilizes CBLAS and CUBLAS for accelerated calculations. NNtrainer is open source and released under the Apache License version 2.0.

petals
Petals is a tool that allows users to run large language models at home in a BitTorrent-style manner. It enables fine-tuning and inference up to 10x faster than offloading. Users can generate text with distributed models like Llama 2, Falcon, and BLOOM, and fine-tune them for specific tasks directly from their desktop computer or Google Colab. Petals is a community-run system that relies on people sharing their GPUs to increase its capacity and offer a distributed network for hosting model layers.

LLaVA-pp
This repository, LLaVA++, extends the visual capabilities of the LLaVA 1.5 model by incorporating the latest LLMs, Phi-3 Mini Instruct 3.8B, and LLaMA-3 Instruct 8B. It provides various models for instruction-following LMMS and academic-task-oriented datasets, along with training scripts for Phi-3-V and LLaMA-3-V. The repository also includes installation instructions and acknowledgments to related open-source contributions.

KULLM
KULLM (구름) is a Korean Large Language Model developed by Korea University NLP & AI Lab and HIAI Research Institute. It is based on the upstage/SOLAR-10.7B-v1.0 model and has been fine-tuned for instruction following. The model was trained on 8×A100 GPUs and is capable of generating responses in Korean. KULLM exhibits hallucination and repetition phenomena due to its decoding strategy. Users should be cautious, as the model may produce inaccurate or harmful results. Performance may vary in benchmarks without a fixed system prompt.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: it is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a cloud IDE); and it supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.