Awesome-AgenticLLM-RL-Papers

This repository serves as the official source for the survey paper 'The Landscape of Agentic Reinforcement Learning for LLMs: A Survey'. It provides an extensive overview of various algorithms, methods, and frameworks related to Agentic RL, including detailed information on different families of algorithms, their key mechanisms, objectives, and links to relevant papers and resources. The repository covers a wide range of tasks such as Search & Research Agent, Code Agent, Mathematical Agent, GUI Agent, RL in Vision Agents, RL in Embodied Agents, and RL in Multi-Agent Systems. Additionally, it includes information on environments, frameworks, and methods suitable for different tasks related to Agentic RL and LLMs.


This is the official repository for the survey paper: The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

ArXiv – https://arxiv.org/abs/2509.02547

HuggingFace – https://huggingface.co/papers/2509.02547

Citation

@misc{zhang2025landscapeagenticreinforcementlearning,
      title={The Landscape of Agentic Reinforcement Learning for LLMs: A Survey}, 
      author={Guibin Zhang and Hejia Geng and Xiaohang Yu and Zhenfei Yin and Zaibin Zhang and Zelin Tan and Heng Zhou and Zhongzhi Li and Xiangyuan Xue and Yijiang Li and Yifan Zhou and Yang Chen and Chen Zhang and Yutao Fan and Zihu Wang and Songtao Huang and Yue Liao and Hongru Wang and Mengyue Yang and Heng Ji and Michael Littman and Jun Wang and Shuicheng Yan and Philip Torr and Lei Bai},
      year={2025},
      eprint={2509.02547},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.02547}, 
}

Sec2.7 Agentic RL: Algorithms

Clip indicates whether the method clips the policy ratio so it cannot move far from 1, ensuring stable updates.
KL penalty indicates whether the method penalizes the KL divergence between the learned policy and a reference policy, keeping the policy aligned.
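As a minimal sketch (not code from the survey), the two stabilizers can be written per token/action as follows; `beta` and `eps` are illustrative hyperparameter names:

```python
def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO-style clipped objective: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

def kl_penalized_objective(ratio: float, advantage: float,
                           logp: float, ref_logp: float,
                           beta: float = 0.1, eps: float = 0.2) -> float:
    """Clipped surrogate minus a KL penalty toward the reference policy.

    (logp - ref_logp) is the common sample-based estimate of the per-token
    KL divergence used in RLHF-style training.
    """
    return clipped_surrogate(ratio, advantage, eps) - beta * (logp - ref_logp)
```

The clipping makes the objective flat once the ratio leaves [1-eps, 1+eps] in the direction the advantage favors, which is what prevents overly large policy updates.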

Method Year Objective Type Clip KL Penalty Key Mechanism Signal Link Resource
PPO family
PPO 2017 Policy gradient Yes No Policy ratio clipping Reward Paper -
VAPO 2025 Policy gradient Yes Adaptive Adaptive KL penalty + variance control Reward + variance signal Paper -
PF-PPO 2024 Policy gradient Yes Yes Policy filtration Noisy reward Paper Code
VinePPO 2024 Policy gradient Yes Yes Unbiased value estimates Reward Paper Code
PSGPO 2024 Policy gradient Yes Yes Process supervision Process Reward Paper -
DPO family
DPO 2023 Preference optimization No Yes Implicit reward parameterized by the policy Human preference Paper -
β-DPO 2024 Preference optimization No Adaptive Dynamic KL coefficient Human preference Paper Code
SimPO 2024 Preference optimization No Scaled Use avg log-prob of a sequence as implicit reward Human preference Paper Code
IPO 2024 Identity preference optimization No No Bounded preference objective that avoids DPO's overfitting Preference rank Paper -
KTO 2024 Kahneman-Tversky optimization No Yes Human-aware loss on binary feedback Binary desirability signal Paper Code Model
ORPO 2024 Odds ratio preference optimization No No Odds-ratio penalty on the SFT loss, reference-model-free Human preference Paper Code Model
Step-DPO 2024 Preference optimization No Yes Step-wise supervision Step-wise preference Paper Code Model
LCPO 2025 Preference optimization No Yes Length preference with limited data/training Reward Paper -
GRPO family
GRPO 2024 Policy gradient under group-based reward Yes Yes Group-based relative reward to eliminate value estimates Group-based reward Paper -
DAPO 2025 Surrogate of GRPO's Yes Yes Decoupled clip + dynamic sampling Dynamic group-based reward Paper Code Model Website
GSPO 2025 Surrogate of GRPO's Yes Yes Sequence-level clipping, rewarding, optimization Smooth group-based reward Paper -
GMPO 2025 Surrogate of GRPO's Yes Yes Geometric mean of token-level rewards Margin-based reward Paper Code
ProRL 2025 Same as GRPO's Yes Yes Reference policy reset Group-based reward Paper Model
Posterior-GRPO 2025 Same as GRPO's Yes Yes Reward only successful processes Process-based reward Paper -
Dr.GRPO 2025 Unbiased GRPO objective Yes Yes Eliminate bias in optimization Group-based reward Paper Code Model
Step-GRPO 2025 Same as GRPO's Yes Yes Rule-based reasoning rewards Step-wise reward Paper Code Model
SRPO 2025 Same as GRPO's Yes Yes Two-staged history-resampling Reward Paper Model
GRESO 2025 Same as GRPO's Yes Yes Pre-rollout filtering Reward Paper Code Website
StarPO 2025 Same as GRPO's Yes Yes Reasoning-guided actions for multi-turn interactions Group-based reward Paper Code Website
GHPO 2025 Policy gradient Yes Yes Adaptive prompt refinement Reward Paper Code
Skywork R1V2 2025 GRPO with hybrid reward signal Yes Yes Selective sample buffer Multimodal reward Paper Code Model
ASPO 2025 GRPO with shaped advantage Yes Yes Clipped bias to advantage Group-based reward Paper Code Model
TreePO 2025 Same as GRPO's Yes Yes Self-guided rollout, reduced compute burden Group-based reward Paper Code Model Website
EDGE-GRPO 2025 Same as GRPO's Yes Yes Entropy-driven advantage + error correction Group-based reward Paper Code Model
DARS 2025 Same as GRPO's Yes No Multi-stage rollout for hardest problems Group-based reward Paper Code Model
CHORD 2025 Weighted GRPO + SFT Yes Yes Auxiliary supervised loss Group-based reward Paper Code
PAPO 2025 Surrogate of GRPO's Yes Yes Implicit Perception Loss Group-based reward Paper Code Model Website
Pass@k Training 2025 Same as GRPO's Yes Yes Pass@k metric as reward Group-based reward Paper Code
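The mechanism shared across the GRPO family, replacing a learned value function with reward statistics over a group of G responses to the same prompt, can be sketched as follows (an illustrative implementation, not the survey's code):

```python
def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: z-score each reward within its group of G
    responses sampled for the same prompt, eliminating the critic/value model."""
    g = len(rewards)
    mean = sum(rewards) / g
    std = (sum((r - mean) ** 2 for r in rewards) / g) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are normalized within each group, only the relative quality of responses to the same prompt matters; the variants above mostly differ in how rewards are shaped, filtered, or aggregated before this step.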

Sec4.1 Task: Search & Research Agent

Method Category Base LLM Link Resource
Open Source Methods
DeepRetrieval External Qwen2.5-3B-Instruct, Llama-3.2-3B-Instruct Paper Code
Search-R1 External Qwen2.5-3B/7B-Base/Instruct Paper Code
R1-Searcher External Qwen2.5-7B, Llama3.1-8B-Instruct Paper Code
R1-Searcher++ External Qwen2.5-7B-Instruct Paper Code
ReSearch External Qwen2.5-7B/32B-Instruct Paper Code
StepSearch External Qwen2.5-3B/7B-Base/Instruct Paper Code
WebDancer External Qwen2.5-7B/32B, QWQ-32B Paper Code
WebThinker External QwQ-32B, DeepSeek-R1-Distilled-Qwen-7B/14B/32B, Qwen2.5-32B-Instruct Paper Code
WebSailor External Qwen2.5-3B/7B/32B/72B Paper Code
WebWatcher External Qwen2.5-VL-7B/32B Paper Code
ASearcher External Qwen2.5-7B/14B, QwQ-32B Paper Code
ZeroSearch Internal Qwen2.5-3B/7B-Base/Instruct Paper Code
SSRL Internal Qwen2.5-1.5B/3B/7B/14B/32B/72B-Instruct, Llama-3.2-1B/8B-Instruct, Llama-3.1-8B/70B-Instruct, Qwen3-0.6B/1.7B/4B/8B/14B/32B Paper Code
Closed Source Methods
OpenAI Deep Research External OpenAI Models Blog Website
Perplexity’s DeepResearch External - Blog Website
Google Gemini’s DeepResearch External Gemini Blog Website
Kimi-Researcher External Kimi K2 Blog Website
Grok AI DeepSearch External Grok3 Blog Website
Doubao with Deep Think External Doubao Blog Website

Sec4.2 Task: Code Agent

Method RL Reward Type Base LLM Link Resource
RL for Code Generation
AceCoder Outcome Qwen2.5-Coder-7B-Base/Instruct, Qwen2.5-7B-Instruct Paper Code
DeepCoder-14B Outcome Deepseek-R1-Distilled-Qwen-14B Blog Code
RLTF Outcome CodeGen-NL 2.7B, CodeT5 Paper Code
CURE Outcome Qwen2.5-7B/14B-Instruct, Qwen3-4B Paper Code
Absolute Zero Outcome Qwen2.5-7B/14B, Qwen2.5-Coder-3B/7B/14B, Llama-3.1-8B Paper Code
StepCoder Process DeepSeek-Coder-Instruct-6.7B Paper Code
Process Supervision-Guided PO Process - Paper -
CodeBoost Process Qwen2.5-Coder-7B-Instruct, Llama-3.1-8B-Instruct, Seed-Coder-8B-Instruct, Yi-Coder-9B-Chat Paper Code
PRLCoder Process CodeT5+, Unixcoder, T5-base Paper -
o1-Coder Process DeepSeek-1.3B-Instruct Paper Code
CodeFavor Process Mistral-NeMo-12B-Instruct, Gemma-2-9B-Instruct, Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3 Paper Code
Focused-DPO Process DeepSeek-Coder-6.7B-Base/Instruct, Magicoder-S-DS-6.7B, Qwen2.5-Coder-7B-Instruct Paper -
RL for Iterative Code Refinement
RLEF Outcome Llama-3.0-8B-Instruct, Llama-3.1-8B/70B-Instruct Paper -
μCode Outcome Llama-3.2-1B/8B-Instruct Paper Code
R1-Code-Interpreter Outcome Qwen2.5-7B/14B-Instruct-1M, Qwen2.5-3B-Instruct Paper Code
IterPref Process Deepseek-Coder-7B-Instruct, Qwen2.5-Coder-7B, StarCoder2-15B Paper -
LeDex Process StarCoder-15B, CodeLlama-7B/13B Paper -
CTRL Process Qwen2.5-Coder-7B/14B/32B-Instruct Paper Code
ReVeal Process DAPO-Qwen-32B, Qwen2.5-32B-Instruc(not-working) Paper -
Posterior-GRPO Process Qwen2.5-Coder-3B/7B-Base, Qwen2.5-Math-7B Paper -
Policy Filtration for RLHF Process DeepSeek-Coder-6.7B, Qwen1.5-7B Paper Code
RL for Automated Software Engineering (SWE)
DeepSWE Outcome Qwen3-32B Blog Code
SWE-RL Outcome Llama-3.3-70B-Instruct Paper Code
Satori-SWE Outcome Qwen-2.5-Math-7B Paper Code
RLCoder Outcome CodeLlama-7B, StarCoder-7B, StarCoder2-7B, DeepSeekCoder-1B/7B Paper Code
Qwen3-Coder Outcome - Paper Code
ML-Agent Outcome Qwen2.5-7B-Base/Instruct, DeepSeek-R1-Distill-Qwen-7B Paper Code
Golubev et al. Process Qwen2.5-72B-Instruct Paper -
SWEET-RL Process Llama-3.1-8B/70B-Instruct Paper Code

Sec4.3 Task: Mathematical Agent

Method Reward Link Resource
RL for Informal Mathematical Reasoning
ARTIST Outcome Paper -
ToRL Outcome Paper Code Model
ZeroTIR Outcome Paper Code Model
TTRL Outcome Paper Code
RENT Outcome Paper Code Website
Satori Outcome Paper Code Model Website
1-shot RLVR Outcome Paper Code Model
Prover-Verifier Games (legibility) Outcome Paper -
rStar2-Agent Outcome Paper Code
START Process Paper -
LADDER Process Paper -
SWiRL Process Paper -
RLoT Process Paper Code
RL for Formal Mathematical Reasoning
DeepSeek-Prover-v1.5 Outcome Paper Code Model
Leanabell-Prover Outcome Paper Code Model
Kimina-Prover (Preview) Outcome Paper Code Model
Seed-Prover Outcome Paper Code
DeepSeek-Prover-v2 Process Paper Code Model
ProofNet++ Process Paper -
Leanabell-Prover-v2 Process Paper Code
Hybrid
InternLM2.5-StepProver Hybrid Paper Code
Lean-STaR Hybrid Paper Code Model Website
STP Hybrid Paper Code Model

Sec4.4 Task: GUI Agent

Method Paradigm Environment Link Resource
Non-RL GUI Agents
MM-Navigator Vanilla VLM - Paper Code
SeeAct Vanilla VLM - Paper Code
TRISHUL Vanilla VLM - Paper -
InfiGUIAgent SFT - Paper Code Model Website
UI-AGILE SFT - Paper Code Model
TongUI SFT - Paper Code Model Website
RL-based GUI Agents
GUI-R1 RL Static Paper Code Model
UI-R1 RL Static Paper Code Model
InFiGUI-R1 RL Static Paper Code Model
AgentCPM RL Static Paper Code Model
WebAgent-R1 RL Interactive Paper -
Vattikonda et al. RL Interactive Paper -
UI-TARS RL Interactive Paper Code Model Website
DiGiRL RL Interactive Paper Code Model Website
ZeroGUI RL Interactive Paper Code
MobileGUI-RL RL Interactive Paper -

Sec4.5 Task: RL in Vision Agents

TO BE ADDED


Sec4.6 Task: RL in Embodied Agents

TO BE ADDED


Sec4.7 Task: RL in Multi-Agent Systems

“Dynamic” denotes whether the multi-agent system is task-dynamic, i.e., whether it processes different task queries with different configurations (agent count, topology, reasoning depth, prompts, etc.).
“Train” denotes whether the method trains the LLM backbone of the agents.
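To illustrate what "task-dynamic" means here, the following is a purely hypothetical sketch (names and routing rule are invented, not from any surveyed method): a dynamic system chooses a different configuration per query instead of one fixed setup.

```python
from dataclasses import dataclass

@dataclass
class AgentSystemConfig:
    """Per-query configuration a task-dynamic system may vary."""
    num_agents: int
    topology: str        # e.g. "chain", "star", "debate"
    reasoning_depth: int

def configure(query: str) -> AgentSystemConfig:
    # Toy routing rule for illustration: queries that look harder get
    # more agents, a debate topology, and deeper reasoning.
    hard = len(query.split()) > 20 or "prove" in query.lower()
    if hard:
        return AgentSystemConfig(num_agents=4, topology="debate", reasoning_depth=3)
    return AgentSystemConfig(num_agents=1, topology="chain", reasoning_depth=1)
```

A static system, by contrast, would return the same `AgentSystemConfig` for every query; RL-based methods in the table below learn either this routing, the agents' LLM backbone, or both.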

Method Dynamic Train RL Algorithm Link Resource
RL-Free Multi-Agent Systems (not exhaustive)
CAMEL - Paper Code Model
MetaGPT - Paper Code
MAD - Paper Code
MoA - Paper Code
AFlow - Paper Code
RL-Based Multi-Agent Training
GPTSwarm policy gradient Paper Code Website
MaAS policy gradient Paper Code
G-Designer policy gradient Paper Code
MALT DPO Paper -
MARFT MARFT Paper Code
MAPoRL PPO Paper Code
MLPO MLPO Paper -
ReMA MAMRP Paper Code
FlowReasoner GRPO Paper Code
LERO MLPO Paper -
CURE rule-based RL Paper Code Model
MMedAgent-RL GRPO Paper -

Sec4.8 Task: Other Tasks

TO BE ADDED

Sec5.1 Environments

The agent capabilities are denoted by:
① Reasoning, ② Planning, ③ Tool Use, ④ Memory, ⑤ Collaboration, ⑥ Self-Improve.

Environment / Benchmark Agent Capability Task Domain Modality Link Resource
LMRL-Gym ①, ④ Interaction Text Paper Code
ALFWorld ②, ① Embodied, Text Games Text Paper Code Website
TextWorld ②, ① Text Games Text Paper Code
ScienceWorld ①, ② Embodied, Science Text Paper Code Website
AgentGym ①, ④ Text Games Text Paper Code Website
AgentBench General Text, Visual Paper Code
InternBootcamp General, Coding, Logic Text Paper Code
LoCoMo Interaction Text Paper Code Website
MemoryAgentBench Interaction Text Paper Code
WebShop ②, ③ Web Text Paper Code Website
Mind2Web ②, ③ Web Text, Visual Paper Code Website
WebArena ②, ③ Web Text Paper Code Website
VisualWebArena ①, ②, ③ Web Text, Visual Paper Code Website
AppWorld ②, ③ App Text Paper Code Website
AndroidWorld ②, ③ GUI, App Text, Visual Paper Code
OSWorld ②, ③ GUI, OS Text, Visual Paper Code Website
Debug-Gym ①, ③ SWE Text Paper Code Website
MLE-Dojo ②, ① MLE Text Paper Code Website
τ-bench ①, ③ SWE Text Paper Code
TheAgentCompany ②, ③, ⑤ SWE Text Paper Code Website
MedAgentGym Science Text Paper Code
SecRepoBench ①, ③ Coding, Security Text Paper -
R2E-Gym ①, ② SWE Text Paper Code Website
HumanEval Coding Text Paper Code
MBPP Coding Text Paper Code
BigCodeBench Coding Text Paper Code Website
LiveCodeBench Coding Text Paper Code Website
SWE-bench ①, ③ SWE Text Paper Code Website
SWE-rebench ①, ③ SWE Text Paper Website
DevBench ②, ① SWE Text Paper Code
ProjectEval ②, ① SWE Text Paper Code Website
DA-Code ①, ③ Data Science, SWE Text Paper Code Website
ColBench ②, ①, ③ SWE, Web Dev Text Paper Code Website
NoCode-bench ②, ① SWE Text Paper Code Website
MLE-Bench ②, ①, ③ MLE Text Paper Code Website
PaperBench ②, ①, ③ MLE Text Paper Code Website
Crafter ②, ④ Game Visual Paper Code Website
Craftax ②, ④ Game Visual Paper Code
ELLM (Crafter variant) ②, ① Game Visual Paper Code Website
SMAC / SMAC-Exp ⑤, ② Game Visual Paper Code
Factorio ②, ① Game Visual Paper Code Website

Sec5.2 Frameworks

Framework Type Key Features Link Resource
Agentic RL Frameworks
Verifiers Agent RL / LLM RL Verifiable environment setup - Code
SkyRL-v0/v0.1 Agent RL Long-horizon real-world training Blog (v0) Blog (v0.1) Code
AREAL Agent RL / LLM RL Asynchronous training Paper Code
MARTI Multi-agent RL / LLM RL Integrated multi-agent training - Code
EasyR1 Agent RL / LLM RL Multimodal support - Code
AgentFly Agent RL Scalable asynchronous execution Paper Code
Agent Lightning Agent RL Decoupled hierarchical RL Paper Code
RLHF and LLM Fine-tuning Frameworks
OpenRLHF RLHF / LLM RL High-performance scalable RLHF Paper Code
TRL RLHF / LLM RL Hugging Face RLHF - Code
trlX RLHF / LLM RL Distributed large-model RLHF Paper Code
HybridFlow RLHF / LLM RL Streamlined experiment management Paper Code
SLiMe RLHF / LLM RL High-performance async RL - Code
General-purpose RL Frameworks
RLlib General RL / Multi-agent RL Production-grade scalable library Paper Code
Acme General RL Modular distributed components Paper Code
Tianshou General RL High-performance PyTorch platform Paper Code
Stable Baselines3 General RL Reliable PyTorch algorithms Paper Code
PFRL General RL Benchmarked prototyping algorithms Paper Code
