Awesome-AI-Papers

This repository collects papers and code in the field of AI. The contents are organized into the following parts:

Table of Contents

  ├─ NLP/  
  │  ├─ Word2Vec/  
  │  ├─ Seq2Seq/           
  │  └─ Pretraining/  
  │    ├─ Large Language Model/          
  │    ├─ LLM Application/ 
  │      ├─ AI Agent/          
  │      ├─ Academic/          
  │      ├─ Code/       
  │      ├─ Financial Application/
  │      ├─ Information Retrieval/  
  │      ├─ Math/     
  │      ├─ Medicine and Law/   
  │      ├─ Recommend System/      
  │      └─ Tool Learning/             
  │    ├─ LLM Technique/ 
  │      ├─ Alignment/          
  │      ├─ Context Length/          
  │      ├─ Corpus/       
  │      ├─ Evaluation/
  │      ├─ Hallucination/  
  │      ├─ Inference/     
  │      ├─ MoE/   
  │      ├─ PEFT/     
  │      ├─ Prompt Learning/   
  │      ├─ RAG/       
  │      └─ Reasoning and Planning/       
  │    ├─ LLM Theory/       
  │    └─ Chinese Model/             
  ├─ CV/  
  │  ├─ CV Application/          
  │  ├─ Contrastive Learning/         
  │  ├─ Foundation Model/ 
  │  ├─ Generative Model (GAN and VAE)/          
  │  ├─ Image Editing/          
  │  ├─ Object Detection/          
  │  ├─ Semantic Segmentation/            
  │  └─ Video/          
  ├─ Multimodal/       
  │  ├─ Audio/          
  │  ├─ BLIP/         
  │  ├─ CLIP/        
  │  ├─ Diffusion Model/   
  │  ├─ Multimodal LLM/          
  │  ├─ Text2Image/          
  │  ├─ Text2Video/            
  │  └─ Survey/           
  ├─ Reinforcement Learning/ 
  ├─ GNN/ 
  └─ Transformer Architecture/          

NLP

1. Word2Vec

  • Efficient Estimation of Word Representations in Vector Space, Mikolov et al., arxiv 2013. [paper]
  • Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., NIPS 2013. [paper]
  • Distributed representations of sentences and documents, Le and Mikolov, ICML 2014. [paper]
  • Word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method, Goldberg and Levy, arxiv 2014. [paper]
  • word2vec Parameter Learning Explained, Rong, arxiv 2014. [paper]
  • GloVe: Global Vectors for Word Representation, Pennington et al., EMNLP 2014. [paper][code]
  • fastText: Bag of Tricks for Efficient Text Classification, Joulin et al., arxiv 2016. [paper][code]
  • ELMo: Deep Contextualized Word Representations, Peters et al., NAACL 2018. [paper]
  • Distilling the Knowledge in a Neural Network, Hinton et al., arxiv 2015. [paper][FitNets]
  • BPE: Neural Machine Translation of Rare Words with Subword Units, Sennrich et al., ACL 2016. [paper][code]
  • Byte-Level BPE: Neural Machine Translation with Byte-Level Subwords, Wang et al., arxiv 2019. [paper][code]
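
A minimal sketch of training skip-gram embeddings with negative sampling via the gensim library (assuming gensim is installed; the toy corpus is illustrative):

  from gensim.models import Word2Vec

  # Toy corpus: each sentence is a list of tokens.
  sentences = [
      ["the", "quick", "brown", "fox", "jumps"],
      ["distributed", "representations", "of", "words", "and", "phrases"],
  ]

  # sg=1 selects skip-gram, negative=5 enables negative sampling (Mikolov et al., 2013).
  model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, negative=5, epochs=10)

  vector = model.wv["words"]                      # 100-dimensional embedding for a token
  print(model.wv.most_similar("words", topn=3))   # nearest neighbours in embedding space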

2. Seq2Seq

  • Generating Sequences With Recurrent Neural Networks, Graves, arxiv 2013. [paper]
  • Sequence to Sequence Learning with Neural Networks, Sutskever et al., NeurIPS 2014. [paper]
  • Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., ICLR 2015. [paper][code]
  • On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, Cho et al., arxiv 2014. [paper]
  • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., arxiv 2014. [paper]
  • [fairseq][fairseq2][pytorch-seq2seq]
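
A minimal GRU encoder-decoder sketch in PyTorch, illustrating the sequence-to-sequence pattern of Sutskever et al. (2014); it is a toy example, not drawn from any of the codebases above:

  import torch
  import torch.nn as nn

  class Seq2Seq(nn.Module):
      def __init__(self, src_vocab, tgt_vocab, emb=128, hid=256):
          super().__init__()
          self.src_emb = nn.Embedding(src_vocab, emb)
          self.tgt_emb = nn.Embedding(tgt_vocab, emb)
          self.encoder = nn.GRU(emb, hid, batch_first=True)
          self.decoder = nn.GRU(emb, hid, batch_first=True)
          self.proj = nn.Linear(hid, tgt_vocab)

      def forward(self, src, tgt):
          _, h = self.encoder(self.src_emb(src))       # final hidden state summarizes the source
          out, _ = self.decoder(self.tgt_emb(tgt), h)  # teacher forcing: feed gold target tokens
          return self.proj(out)                        # per-step logits over the target vocabulary

  model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
  logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 9)))
  print(logits.shape)  # torch.Size([2, 9, 1000])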

3. Pretraining

3.1 Large Language Model

3.2 LLM Application

3.2.1 AI Agent
3.2.2 Academic
3.2.3 Code
3.2.4 Financial Application
3.2.5 Information Retrieval
3.2.6 Math
3.2.7 Medicine and Law
3.2.8 Recommend System
3.2.9 Tool Learning
  • Tool Learning with Foundation Models, Qin et al., arxiv 2023. [paper][code]

  • Tool Learning with Large Language Models: A Survey, Qu et al., arxiv 2024. [paper][code]

  • Toolformer: Language Models Can Teach Themselves to Use Tools, Schick et al., arxiv 2023. [paper][toolformer-pytorch][conceptofmind/toolformer][xrsrke/toolformer][Graph_Toolformer]

  • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, Qin et al., ICLR 2024 Spotlight. [paper][code][StableToolBench]

  • Gorilla: Large Language Model Connected with Massive APIs, Patil et al., arxiv 2023. [paper][code]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al., NeurIPS 2023. [paper][code]

  • GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction, Yang et al., arxiv 2023. [paper][code]

  • RestGPT: Connecting Large Language Models with Real-World RESTful APIs, Song et al., arxiv 2023. [paper][code]

  • LLMCompiler: An LLM Compiler for Parallel Function Calling, Kim et al., ICML 2024. [paper][code]

  • Large Language Models as Tool Makers, Cai et al., arxiv 2023. [paper][code]

  • ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases, Tang et al., arxiv 2023. [paper][code][ToolQA][toolbench]

  • ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search, Zhuang et al., arxiv 2023. [paper][code]

  • Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models, Lu et al., NeurIPS 2023. [paper][code]

  • ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios, Ye et al., arxiv 2024. [paper][code]

  • AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls, Du et al., arxiv 2024. [paper][code]

  • LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error, Wang et al., arxiv 2024. [paper][code]

  • What Are Tools Anyway? A Survey from the Language Model Perspective, Wang et al., arxiv 2024. [paper]

  • ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities, Lu et al., arxiv 2024. [paper][code][API-Bank][ToolHop][ComplexFuncBench]

  • Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval, Chen et al., arxiv 2024. [paper]

  • ToolACE: Winning the Points of LLM Function Calling, Liu et al., arxiv 2024. [paper][ToolGen]

  • Hammer: Robust Function-Calling for On-Device Language Models via Function Masking, Lin et al., arxiv 2024. [paper][code]

  • [functionary][ToolLearningPapers][awesome-tool-llm]
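
The systems above share a simple control loop: the model either emits a tool call, which the runtime executes and feeds back, or a final answer. A toy sketch of that loop, where call_llm is a hypothetical stand-in for any chat-completion API and the tool registry is illustrative:

  import json

  TOOLS = {
      "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only; never eval untrusted input
  }

  def run_agent(question, call_llm, max_steps=5):
      # call_llm(messages) is assumed to return either {"tool": name, "arguments": "..."} or {"final": "..."}.
      messages = [{"role": "user", "content": question}]
      for _ in range(max_steps):
          reply = call_llm(messages)
          if "final" in reply:
              return reply["final"]                              # the model chose to answer directly
          result = TOOLS[reply["tool"]](reply["arguments"])      # execute the requested tool
          messages.append({"role": "tool",
                           "content": json.dumps({"tool": reply["tool"], "result": result})})
      return "stopped after max_steps"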

3.3 LLM Technique

3.3.1 Alignment
3.3.2 Context Length
  • ALiBi: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, Press et al., ICLR 2022. [paper][code]
  • Positional Interpolation: Extending Context Window of Large Language Models via Positional Interpolation, Chen et al., arxiv 2023. [paper]
  • Scaling Transformer to 1M tokens and beyond with RMT, Bulatov et al., AAAI 2024. [paper][code][LM-RMT]
  • RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text, Zhou et al., arxiv 2023. [paper][code]
  • LongNet: Scaling Transformers to 1,000,000,000 Tokens, Ding et al., arxiv 2023. [paper][code][unofficial code]
  • Focused Transformer: Contrastive Training for Context Scaling, Tworkowski et al., NeurIPS 2023. [paper][code]
  • LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models, Chen et al., ICLR 2024 Oral. [paper][code]
  • StreamingLLM: Efficient Streaming Language Models with Attention Sinks, Xiao et al., ICLR 2024. [paper][code][SwiftInfer][SwiftInfer blog]
  • YaRN: Efficient Context Window Extension of Large Language Models, Peng et al., ICLR 2024. [paper][code][LM-Infinite]
  • Ring Attention with Blockwise Transformers for Near-Infinite Context, Liu et al., ICLR 2024. [paper][code][ring-attention-pytorch][ring-flash-attention][local-attention][tree_attention]
  • LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression, Jiang et al., ACL 2024. [paper][code]
  • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, Ding et al., arxiv 2024. [paper][code]
  • LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, Jin et al., arxiv 2024. [paper][code]
  • The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey, Pawar et al., arxiv 2024. [paper][Awesome-LLM-Long-Context-Modeling]
  • Data Engineering for Scaling Language Models to 128K Context, Fu et al., arxiv 2024. [paper][code]
  • CEPE: Long-Context Language Modeling with Parallel Context Encoding, Yen et al., ACL 2024. [paper][code]
  • Training-Free Long-Context Scaling of Large Language Models, An et al., ICML 2024. [paper][code]
  • InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory, Xiao et al., NeurIPS 2024. [paper][code]
  • Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models, Song et al., arxiv 2024. [paper][code][LLMTest_NeedleInAHaystack][RULER][LooGLE][LongBench][google-deepmind/loft]
  • Infini-Transformer: Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, Munkhdalai et al., arxiv 2024. [paper][infini-transformer-pytorch][InfiniTransformer][infini-mini-transformer][megalodon][InfiniteHiP]
  • Activation Beacon: Long Context Compression with Activation Beacon, Zhang et al., ICLR 2025. [paper][code][Extending Llama-3's Context Ten-Fold Overnight]
  • Make Your LLM Fully Utilize the Context, An et al., arxiv 2024. [paper][code]
  • CoPE: Contextual Position Encoding: Learning to Count What's Important, Golovneva et al., arxiv 2024. [paper][rope_cope]
  • Scaling Granite Code Models to 128K Context, Stallone et al., arxiv 2024. [paper][code][granite-3.1-language-models]
  • Generalizing an LLM from 8k to 1M Context using Qwen-Agent, Qwen Team, 2024. [blog][Qwen2.5-1M]
  • LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs, Bai et al., arxiv 2024. [paper][code][LongCite][LongReward][context-cite][OmniThink][SelfCite]
  • A failed experiment: Infini-Attention, and why we should keep trying, HuggingFace Blog, 2024. [blog][Magic Blog]
  • Why Does the Effective Context Length of LLMs Fall Short, An et al., arxiv 2024. [paper][code][rotary-embedding-torch]
  • How to Train Long-Context Language Models (Effectively), Gao et al., arxiv 2024. [paper][code]
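
Several of the entries above change how position enters attention. As one concrete example, a sketch of the ALiBi bias (Press et al., ICLR 2022): a per-head linear penalty on query-key distance that is added to the attention scores, letting models extrapolate beyond the training length:

  import torch

  def alibi_bias(num_heads, seq_len):
      # Geometric head slopes 2^(-8/n), 2^(-16/n), ... as in the paper (assumes num_heads is a power of two).
      slopes = torch.tensor([2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)])
      pos = torch.arange(seq_len)
      rel = pos[None, :] - pos[:, None]                 # rel[i, j] = j - i (<= 0 for past keys)
      bias = slopes[:, None, None] * rel[None, :, :]    # farther-back keys get a larger penalty
      causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
      return bias + causal                              # (num_heads, seq_len, seq_len)

  bias = alibi_bias(num_heads=8, seq_len=16)            # add to q @ k.T / sqrt(d) before the softmax
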
3.3.3 Corpus
3.3.4 Evaluation
3.3.5 Hallucination
  • Extrinsic Hallucinations in LLMs, Lilian Weng, 2024. [blog]
  • Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models, Zhang et al., arxiv 2023. [paper][code]
  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, Huang et al., arxiv 2023. [paper][code][Awesome-MLLM-Hallucination]
  • The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models, Li et al., arxiv 2024. [paper][code]
  • FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios, Chen et al., arxiv 2023. [paper][code][OlympicArena][FActScore]
  • Chain-of-Verification Reduces Hallucination in Large Language Models, Dhuliawala et al., arxiv 2023. [paper][code]
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models, Guan et al., CVPR 2024. [paper][code]
  • Woodpecker: Hallucination Correction for Multimodal Large Language Models, Yin et al., arxiv 2023. [paper][code]
  • OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation, Huang et al., CVPR 2024 Highlight. [paper][code]
  • TrustLLM: Trustworthiness in Large Language Models, Sun et al., arxiv 2024. [paper][code]
  • SAFE: Long-form factuality in large language models, Wei et al., arxiv 2024. [paper][code]
  • RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models, Hu et al., arxiv 2024. [paper][code][HaluAgent][LLMsKnow]
  • Detecting hallucinations in large language models using semantic entropy, Farquhar et al., Nature 2024. [paper][semantic_uncertainty][long_hallucinations][Semantic Uncertainty ICLR 2023][Lynx-hallucination-detection]
  • A Survey on the Honesty of Large Language Models, Li et al., arxiv 2024. [paper][code]
  • LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations, Orgad et al., arxiv 2024. [paper][code]
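
A sketch of the semantic-entropy idea from Farquhar et al. (Nature 2024): sample several answers, cluster them by bidirectional entailment, and treat high entropy over the clusters as a confabulation signal. Here sample_answers and entails are hypothetical stand-ins for an LLM sampler and an NLI model:

  import math

  def semantic_entropy(question, sample_answers, entails, n=10):
      answers = sample_answers(question, n)                      # n stochastic generations
      clusters = []
      for a in answers:
          for c in clusters:
              if entails(a, c[0]) and entails(c[0], a):          # mutual entailment => same meaning
                  c.append(a)
                  break
          else:
              clusters.append([a])                               # start a new semantic cluster
      probs = [len(c) / len(answers) for c in clusters]
      return -sum(p * math.log(p) for p in probs)                # high entropy suggests confabulation
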
3.3.6 Inference
3.3.7 MoE
3.3.8 PEFT (Parameter-efficient Fine-tuning)
3.3.9 Prompt Learning
3.3.10 RAG (Retrieval Augmented Generation)
Text Embedding
3.3.11 Reasoning and Planning
Survey

3.4 LLM Theory

3.5 Chinese Model


CV

  • CS231n: Deep Learning for Computer Vision [link]

1. Basic for CV

  • AlexNet: ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012. [paper]
  • VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan et al., ICLR 2015. [paper]
  • GoogLeNet: Going Deeper with Convolutions, Szegedy et al., CVPR 2015. [paper]
  • ResNet: Deep Residual Learning for Image Recognition, He et al., CVPR 2016 Best Paper. [paper][code][resnet_inference.py]
  • DenseNet: Densely Connected Convolutional Networks, Huang et al., CVPR 2017 Oral. [paper][code]
  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Tan et al., ICML 2019. [paper][code][EfficientNet-PyTorch][noisystudent]
  • BYOL: Bootstrap your own latent: A new approach to self-supervised Learning, Grill et al., arxiv 2020. [paper][code][byol-pytorch][simsiam]
  • ConvNeXt: A ConvNet for the 2020s, Liu et al., CVPR 2022. [paper][code][ConvNeXt-V2]
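
A minimal PyTorch sketch of the residual (basic) block from ResNet (He et al., 2016), the key idea behind several of the models above:

  import torch
  import torch.nn as nn

  class BasicBlock(nn.Module):
      def __init__(self, channels):
          super().__init__()
          self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
          self.bn1 = nn.BatchNorm2d(channels)
          self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
          self.bn2 = nn.BatchNorm2d(channels)
          self.relu = nn.ReLU(inplace=True)

      def forward(self, x):
          out = self.relu(self.bn1(self.conv1(x)))
          out = self.bn2(self.conv2(out))
          return self.relu(out + x)                   # identity shortcut keeps gradients flowing

  y = BasicBlock(64)(torch.randn(1, 64, 56, 56))      # shape preserved: (1, 64, 56, 56)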

2. Contrastive Learning

  • MoCo: Momentum Contrast for Unsupervised Visual Representation Learning, He et al., CVPR 2020. [paper][code]

  • SimCLR: A Simple Framework for Contrastive Learning of Visual Representations, Chen et al., PMLR 2020. [paper][code]

  • CoCa: Contrastive Captioners are Image-Text Foundation Models, Yu et al., arxiv 2022. [paper][CoCa-pytorch][multimodal]

  • DINOv2: Learning Robust Visual Features without Supervision, Oquab et al., arxiv 2023. [paper][code]

  • FeatUp: A Model-Agnostic Framework for Features at Any Resolution, Fu et al., ICLR 2024. [paper][code]

  • InfoNCE Loss: Representation Learning with Contrastive Predictive Coding, Oord et al., arxiv 2018. [paper][unofficial code]
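
A sketch of the InfoNCE loss (Oord et al., 2018) behind SimCLR-, MoCo-, and CLIP-style training: positive pairs sit on the diagonal of a similarity matrix and every other sample in the batch acts as a negative:

  import torch
  import torch.nn.functional as F

  def info_nce(queries, keys, temperature=0.07):
      # queries, keys: (N, D) embeddings where (queries[i], keys[i]) is the positive pair.
      q = F.normalize(queries, dim=-1)
      k = F.normalize(keys, dim=-1)
      logits = q @ k.t() / temperature                 # (N, N) cosine-similarity matrix
      labels = torch.arange(q.size(0), device=q.device)
      return F.cross_entropy(logits, labels)           # classify each query against its own key

  loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))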

3. CV Application

4. Foundation Model

5. Generative Model (GAN and VAE)

6. Image Editing

  • InstructPix2Pix: Learning to Follow Image Editing Instructions, Brooks et al., CVPR 2023 Highlight. [paper][code]

  • Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold, Pan et al., SIGGRAPH 2023. [paper][code]

  • DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing, Shi et al., arxiv 2023. [paper][code]

  • DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models, Mou et al., ICLR 2024 Spotlight. [paper][code]

  • DragAnything: Motion Control for Anything using Entity Representation, Wu et al., ECCV 2024. [paper][code][Framer][SG-I2V][Go-with-the-Flow]

  • LEDITS++: Limitless Image Editing using Text-to-Image Models, Brack et al., arxiv 2023. [paper][code][demo]

  • Diffusion Model-Based Image Editing: A Survey, Huang et al., arxiv 2024. [paper][code]

  • PromptFix: You Prompt and We Fix the Photo, Yu et al., NeurIPS 2024. [paper][code]

  • MimicBrush: Zero-shot Image Editing with Reference Imitation, Chen et al., arxiv 2024. [paper][code][EchoMimic][echomimic_v2]

  • A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models, Shuai et al., arxiv 2024. [paper][code]

  • Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models, Atzmon et al., arxiv 2024. [paper]

  • MagicQuill: An Intelligent Interactive Image Editing System, Liu et al., arxiv 2024. [paper][code]

  • BrushEdit: All-In-One Image Inpainting and Editing, Li et al., arxiv 2024. [paper][code][DiffuEraser]

  • [EditAnything][ComfyUI-UltraEdit-ZHO][libcom][Awesome-Image-Composition][RF-Solver-Edit]

7. Object Detection

  • DETR: End-to-End Object Detection with Transformers, Carion et al., arxiv 2020. [paper][code][detrex][RT-DETR]

  • Focus-DETR: Less is More: Focus Attention for Efficient DETR, Zheng et al., arxiv 2023. [paper][code]

  • U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection, Qin et al., arxiv 2020. [paper][code]

  • YOLO: You Only Look Once: Unified, Real-Time Object Detection, Redmon et al., arxiv 2015. [paper]

  • YOLOX: Exceeding YOLO Series in 2021, Ge et al., arxiv 2021. [paper][code]

  • Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism, Wang et al., arxiv 2023. [paper][code]

  • Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, Liu et al., ECCV 2024. [paper][code][DINO-X][OV-DINO][OmDet][groundingLMM][Awesome-Visual-Grounding]

  • YOLO-World: Real-Time Open-Vocabulary Object Detection, Cheng et al., CVPR 2024. [paper][code]

  • YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, Wang et al., arxiv 2024. [paper][code]

  • T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy, Jiang et al., arxiv 2024. [paper][code][ChatRex]

  • YOLOv10: Real-Time End-to-End Object Detection, Wang et al., arxiv 2024. [paper][code][YOLOv12]

  • D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement, Peng et al., ICLR 2025. [paper][code]

  • [detectron2][yolov5][mmdetection][mmdetection3d][detrex][Ultralytics YOLO11][AlphaPose]
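
A quick inference sketch with the ultralytics package listed above (assuming it is installed; the checkpoint name and image path are placeholders):

  from ultralytics import YOLO

  model = YOLO("yolov8n.pt")                 # downloads pretrained weights on first use
  results = model("path/to/image.jpg")       # returns one Results object per image
  for r in results:
      for box in r.boxes:
          print(model.names[int(box.cls)], float(box.conf), box.xyxy.tolist())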

8. Semantic Segmentation

9. Video

  • VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training, Tong et al., NeurIPS 2022 Spotlight. [paper][code]

  • Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts, Zhao et al., arxiv 2024. [paper][code]

  • MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation, Wang et al., arxiv 2024. [paper]

  • [V-JEPA][I-JEPA][jepa-intuitive-physics][DINO-WM]

  • VideoMamba: State Space Model for Efficient Video Understanding, Li et al., ECCV 2024. [paper][code]

  • VideoChat: Chat-Centric Video Understanding, Li et al., CVPR 2024 Highlight. [paper][code]

  • Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models, Maaz et al., ACL 2024. [paper][code][Video-LLaMA][MovieChat][Chat-UniVi]

  • MVBench: A Comprehensive Multi-modal Video Understanding Benchmark, Li et al., CVPR 2024 Highlight. [paper][code][PhyGenBench]

  • OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer, Zhang et al., EMNLP 2024. [paper][code]

  • Tarsier: Recipes for Training and Evaluating Large Video Description Models, Wang et al., arxiv 2024. [paper][code][Tarsier2]

  • MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions, Ju et al., arxiv 2024. [paper][code]

  • MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling, Men et al., arxiv 2024. [paper][code][MIMO-pytorch][StableV2V]

  • Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding, Shu et al., arxiv 2024. [paper][code][LongVU][VisionZip][TimeChat]

  • Enhance-A-Video: Better Generated Video for Free, Luo et al., arxiv 2025. [paper][code][VideoSys][Magic-1-For-1]

  • Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos, Yuan et al., arxiv 2025. [paper][code]

  • VideoWorld: Exploring Knowledge Learning from Unlabeled Videos, Ren et al., arxiv 2025. [paper][code][LWM][iVideoGPT]

  • [Awesome-LLMs-for-Video-Understanding]

10. Survey for CV

  • ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy, Vishniakov et al., arxiv 2023. [paper][code]
  • Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey, Xin et al., arxiv 2024. [paper][code]

Multimodal

1. Audio

2. BLIP

  • ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation, Li et al., NeurIPS 2021. [paper][code]
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, Li et al., ICML 2022. [paper][code][laion-coco]
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, Li et al., ICML 2023. [paper][code]
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning, Dai et al., arxiv 2023. [paper][code]
  • X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning, Panagopoulou et al., arxiv 2023. [paper][code]
  • xGen-MM (BLIP-3): A Family of Open Large Multimodal Models, Xue et al., arxiv 2024. [paper][code]
  • xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations, Qin et al., arxiv 2024. [paper][code]
  • xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs, Ryoo et al., arxiv 2024. [paper]
  • LAVIS: A Library for Language-Vision Intelligence, Li et al., arxiv 2022. [paper][code]
  • VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts, Bao et al., NeurIPS 2022. [paper][code]
  • BEiT: BERT Pre-Training of Image Transformers, Bao et al., ICLR 2022 Oral presentation. [paper][code]
  • BeiT-V3: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, Wang et al., CVPR 2023. [paper][code]
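
An image-captioning sketch with BLIP-2 through Hugging Face transformers, assuming the library, a GPU, and a local image are available; the checkpoint id follows the public BLIP-2 release:

  import torch
  from PIL import Image
  from transformers import Blip2Processor, Blip2ForConditionalGeneration

  processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
  model = Blip2ForConditionalGeneration.from_pretrained(
      "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
  ).to("cuda")

  image = Image.open("path/to/image.jpg")
  inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
  caption_ids = model.generate(**inputs, max_new_tokens=30)
  print(processor.decode(caption_ids[0], skip_special_tokens=True))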

3. CLIP

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision, Radford et al., ICML 2021. [paper][code][open_clip][clip-as-service][SigLIP][EVA][DIVA][Clip-Forge]
  • DALL-E2: Hierarchical Text-Conditional Image Generation with CLIP Latents, Ramesh et al., arxiv 2022. [paper][code]
  • GLIPv2: Unifying Localization and Vision-Language Understanding, Zhang et al., NeurIPS 2022. [paper][code][GLIGEN]
  • SigLIP: Sigmoid Loss for Language Image Pre-Training, Zhai et al., arxiv 2023. [paper][SigLIP 2][siglip]
  • EVA-CLIP: Improved Training Techniques for CLIP at Scale, Sun et al., arxiv 2023. [paper][code][EVA-CLIP-18B]
  • Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese, Yang et al., arxiv 2022. [paper][code]
  • MetaCLIP: Demystifying CLIP Data, Xu et al., ICLR 2024 Spotlight. [paper][code]
  • Alpha-CLIP: A CLIP Model Focusing on Wherever You Want, Sun et al., arxiv 2023. [paper][code][Bootstrap3D]
  • MMVP: Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs, Tong et al., arxiv 2024. [paper][code]
  • MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training, Vasu et al., CVPR 2024. [paper][code]
  • Long-CLIP: Unlocking the Long-Text Capability of CLIP, Zhang et al., ECCV 2024. [paper][code][Inf-CLIP]
  • CLOC: Contrastive Localized Language-Image Pre-Training, Chen et al., arxiv 2024. [paper]
  • LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation, Huang et al., arxiv 2024. [paper][code]
  • SuperClass: Classification Done Right for Vision-Language Pre-Training, Huang et al., NeurIPS 2024. [paper][code]
  • AIM-v2: Multimodal Autoregressive Pre-training of Large Vision Encoders, Fini et al., arxiv 2024. [paper][code]
  • Scaling Pre-training to One Hundred Billion Data for Vision Language Models, Wang et al., arxiv 2025. [paper]
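
A zero-shot classification sketch with the original CLIP checkpoint via Hugging Face transformers (Radford et al., 2021): encode the image and a set of text prompts, then softmax the image-text similarities (image path and labels are placeholders):

  from PIL import Image
  from transformers import CLIPModel, CLIPProcessor

  model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
  processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

  image = Image.open("path/to/image.jpg")
  labels = ["a photo of a cat", "a photo of a dog"]
  inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
  probs = model(**inputs).logits_per_image.softmax(dim=-1)   # image-text similarity -> class probabilities
  print(dict(zip(labels, probs[0].tolist())))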

4. Diffusion Model

5. Multimodal LLM

6. Text2Image

  • DALL-E: Zero-Shot Text-to-Image Generation, Ramesh et al., arxiv 2021. [paper][code]

  • DALL-E3: Improving Image Generation with Better Captions, Betker et al., OpenAI 2023. [paper][code][blog][Glyph-ByT5]

  • ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models, Zhang et al., ICCV 2023 Marr Prize. [paper][code][ControlNet_Plus_Plus][ControlNeXt][ControlAR][OminiControl][ROICtrl]

  • T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models, Mou et al., AAAI 2024. [paper][code]

  • AnyText: Multilingual Visual Text Generation And Editing, Tuo et al., arxiv 2023. [paper][code]

  • RPG: Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs, Yang et al., ICML 2024. [paper][code][IterComp]

  • LAION-5B: An open large-scale dataset for training next generation image-text models, Schuhmann et al., NeurIPS 2022. [paper][code][blog][laion-coco][multimodal_textbook][kangas]

  • DeepFloyd IF: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., arxiv 2022. [paper][code]

  • Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., NeurIPS 2022. [paper][unofficial code]

  • Instruct-Imagen: Image Generation with Multi-modal Instruction, Hu et al., arxiv 2024. [paper][Imagen 3]

  • CogView: Mastering Text-to-Image Generation via Transformers, Ding et al., NeurIPS 2021. [paper][code][ImageReward]

  • CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers, Ding et al., arxiv 2022. [paper][code]

  • CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion, Zheng et al., ECCV 2024. [paper][code]

  • TextDiffuser: Diffusion Models as Text Painters, Chen et al., arxiv 2023. [paper][code]

  • TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering, Chen et al., arxiv 2023. [paper][code]

  • PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, Chen et al., arxiv 2023. [paper][code]

  • PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models, Chen et al., arxiv 2024. [paper][code]

  • PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation, Chen et al., arxiv 2024. [paper][code]

  • IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models, Ye et al., arxiv 2023. [paper][code][ID-Animator][InstantID]

  • Controllable Generation with Text-to-Image Diffusion Models: A Survey, Cao et al., arxiv 2024. [paper][code]

  • StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation, Zhou et al., NeurIPS 2024. [paper][code][AutoStudio][story-adapter]

  • Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding, Li et al., arxiv 2024. [paper][code][Hunyuan3D-1][Hunyuan3D-2][xDiT]

  • GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation, Li et al., CVPR 2024. [paper][t2v_metrics][VQAScore]

  • [Kolors][Kolors-Virtual-Try-On][EVLM: An Efficient Vision-Language Model for Visual Understanding]

  • EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models, Zhao et al., NeurIPS 2024. [paper][code]

  • Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens, Fan et al., arxiv 2024. [paper]

  • Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis, Bai et al., arxiv 2024. [paper][code]

  • SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers, Xie et al., ICLR 2025. [paper][code]

  • [flux][x-flux][x-flux-comfyui][FLUX.1-dev-LoRA][qwen2vl-flux][1.58-bit FLUX][3DIS]
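
A minimal text-to-image sketch with the diffusers library (assuming it is installed and a GPU is available; the checkpoint id and prompt are placeholders):

  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
  ).to("cuda")
  image = pipe("a watercolor painting of a lighthouse at dusk", num_inference_steps=30).images[0]
  image.save("lighthouse.png")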

7. Text2Video

8. Survey for Multimodal

  • A Survey on Multimodal Large Language Models, Yin et al., arxiv 2023. [paper][Awesome-Multimodal-Large-Language-Models][MME-Survey]
  • Multimodal Foundation Models: From Specialists to General-Purpose Assistants, Li et al., arxiv 2023. [paper][cvinw_readings]
  • From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities, Lu et al., arxiv 2024. [paper][Leaderboards]
  • Efficient Multimodal Large Language Models: A Survey, Jin et al., arxiv 2024. [paper][code]
  • An Introduction to Vision-Language Modeling, Bordes et al., arxiv 2024. [paper]
  • Building and better understanding vision-language models: insights and future directions, Laurençon et al., arxiv 2024. [paper]
  • Video Understanding with Large Language Models: A Survey, Tang et al., arxiv 2023. [paper][code]
  • Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey, Chen et al., arxiv 2024. [paper][code]

9. Other

  • Fuyu-8B: A Multimodal Architecture for AI Agents, Bavishi et al., Adept blog 2023. [blog][model]
  • Otter: A Multi-Modal Model with In-Context Instruction Tuning, Li et al., arxiv 2023. [paper][code]
  • OtterHD: A High-Resolution Multi-modality Model, Li et al., arxiv 2023. [paper][code][model]
  • CM3leon: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning, Yu et al., arxiv 2023. [paper][Unofficial Implementation]
  • MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer, Tian et al., arxiv 2024. [paper][code]
  • CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations, Qi et al., arxiv 2024. [paper][code]
  • SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models, Gao et al., arxiv 2024. [paper][code]
  • Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers, Gao et al., arxiv 2024. [paper][code]
  • Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining, Liu et al., arxiv 2024. [paper][code][Lumina-Video]
  • LWM: World Model on Million-Length Video And Language With RingAttention, Liu et al., arxiv 2024. [paper][code]
  • Chameleon: Mixed-Modal Early-Fusion Foundation Models, Chameleon Team, arxiv 2024. [paper][code][X-Prompt]
  • SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation, Ge et al., arxiv 2024. [paper][code][SEED][SEED-Story]

Reinforcement Learning

1. Basic for RL

2. LLM for decision making

  • Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al., NeurIPS 2021. [paper][code]
  • Trajectory Transformer: Offline Reinforcement Learning as One Big Sequence Modeling Problem, Janner et al., NeurIPS 2021. [paper][code]
  • Guiding Pretraining in Reinforcement Learning with Large Language Models, Du et al., ICML 2023. [paper][code]
  • Introspective Tips: Large Language Model for In-Context Decision Making, Chen et al., arxiv 2023. [paper]
  • Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, Chebotar et al., CoRL 2023. [paper][Unofficial Implementation]
  • Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods, Cao et al., arxiv 2024. [paper]
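
Decision Transformer conditions each timestep on the return-to-go (the sum of future rewards) rather than the immediate reward; a minimal sketch of that preprocessing step, assuming an undiscounted episodic setting:

  def returns_to_go(rewards, gamma=1.0):
      rtg, running = [], 0.0
      for r in reversed(rewards):
          running = r + gamma * running     # accumulate reward from the end of the episode backwards
          rtg.append(running)
      return list(reversed(rtg))

  print(returns_to_go([1.0, 0.0, 2.0, 1.0]))   # [4.0, 3.0, 3.0, 1.0]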

GNN

Survey for GNN


Transformer Architecture
