verl

verl: Volcano Engine Reinforcement Learning for LLMs

Stars: 19177


verl is a flexible and efficient RL training library for large language models (LLMs). It offers easy extension of diverse RL algorithms, seamless integration with existing LLM infra, flexible device mapping, and integration with popular Hugging Face models. The library provides state-of-the-art throughput, efficient actor model resharding, and supports various RL algorithms like PPO, GRPO, and more. It also supports model-based and function-based rewards for tasks like math and coding, vision-language models, and multi-modal RL. verl is used for tasks like training large language models, reasoning tasks, reinforcement learning with diverse algorithms, and multi-modal RL.

README:

👋 Hi, everyone! verl is an RL training library initiated by the ByteDance Seed team and maintained by the verl community.

verl: Volcano Engine Reinforcement Learning for LLMs

verl is a flexible, efficient and production-ready RL training library for large language models (LLMs).

verl is the open-source implementation of the paper HybridFlow: A Flexible and Efficient RLHF Framework.

verl is flexible and easy to use with:

Easy extension of diverse RL algorithms: The hybrid-controller programming model enables flexible representation and efficient execution of complex post-training dataflows. Build RL dataflows such as GRPO and PPO in a few lines of code.
Seamless integration of existing LLM infra with modular APIs: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks such as FSDP, Megatron-LM, vLLM, SGLang, etc.
Flexible device mapping: Supports various placements of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.
Ready integration with popular HuggingFace models
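As a rough illustration of what a GRPO-style dataflow computes, the sketch below normalizes rewards within a group of responses sampled for the same prompt. It is not verl's API: the function name `group_relative_advantages` and the `eps` parameter are hypothetical, and the formula is the commonly described group-relative advantage (reward minus group mean, divided by group standard deviation). Implementations may differ in details such as which standard deviation estimator they use.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other responses to the same prompt.

    Hypothetical helper for illustration only; not part of verl.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct responses receive a positive advantage and incorrect ones a negative advantage, so no separate value model is needed to form the baseline.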

verl is fast with:

State-of-the-art throughput: SOTA LLM training and inference engine integrations and SOTA RL throughput.
Efficient actor model resharding with 3D-HybridEngine: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.

News

[2026/01] verl has been migrated to the verl-project
[2026/01] verl's first meetup was successfully held in Shanghai on 01/10, hosted by Volcengine and NVIDIA. The slides have been uploaded to verl-data.
[2026/01] The recipe directory has been migrated to a dedicated repository: verl-recipe and added as a submodule. See https://github.com/volcengine/verl/pull/4795. It can be used as it was after git submodule update --init --recursive recipe. Note that transfer_queue, fully_async_policy, one_step_off_policy and vla are kept under verl/experimental since they are planned to be merged into the main library. Use them through verl.experimental.{module}.
[2025/12] Mind Lab successfully used verl and Megatron-bridge to train GRPO LoRA for a trillion-parameter model on 64 H800 GPUs. See their tech blog.
[2025/10] verl was presented at PyTorch Conference 2025.
[2025/08] verl was presented at the PyTorch Expert Exchange Webinar. Slides available.
[2025/07] The ReTool recipe is fully open sourced. Blog
[2025/07] The first verl meetup will be held at ICML Vancouver on July 16th! Please join us if you are at ICML! (onsite only)
[2025/06] verl with Megatron backend enables large MoE models such as DeepSeek-671B and Qwen3-235B.
[2025/03] DAPO is the open-sourced SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-32B pre-trained model, surpassing the previous SOTA achieved by DeepSeek's GRPO (DeepSeek-R1-Zero-Qwen-32B). DAPO's training is fully powered by verl and the reproduction code is available in recipe/dapo now.


[2025/04] [Seed-Thinking-v1.5](https://github.com/ByteDance-Seed/Seed-Thinking-v1.5/blob/main/seed-thinking-v1.5.pdf) tech report is released! Trained with verl, Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains.
[2025/07] verl keynote at [AWS AI Hours Singapore](https://pages.awscloud.com/aws-ai-hours-sg.html#agenda) on 7/8, verl & verl-agent project updates at [Agent for SWE meetup](https://lu.ma/e498qhsi) by LF AI & Data Singapore on 7/11.
[2025/06] verl team will provide latest project updates at [PyTorch Day China](https://www.lfasiallc.com/pytorch-day-china/) on June 7th. Meet our dev team in Beijing!
[2025/04] [VAPO](https://arxiv.org/pdf/2504.05118) (value-based augmented PPO) paper covers our latest RL method for reasoning models. Trained from Qwen-32B-base model, VAPO achieves 60.4 on AIME 2024, outperforming DAPO-32B.
[2025/05] [PF-PPO](https://arxiv.org/abs/2409.06957), accepted to ICML 2025, is now supported in verl! PF-PPO enhances policy learning efficiency and robustness by filtering potentially noisy reward signals and reusing high-quality experiences via a replay buffer.
[2025/04] We will give a tutorial about latest post-training techniques and programming guide for verl at [ICLR 2025 Expo](https://iclr.cc/virtual/2025/calendar?filter_events=Expo+Talk+Panel&filter_rooms=), [SCI-FM workshop](https://open-foundation-model.github.io/) and [LMSys afterparty](https://lu.ma/d23nyynm). Talk materials available [here](https://github.com/eric-haibin-lin/verl-community/tree/main/iclr25).
[2025/03] verl v0.3.0.post1 is released! See [release note](https://github.com/volcengine/verl/releases/) for details. It achieves [~1.4x speedup](https://tongyx361.github.io/blogs/posts/verl-intro/#/verl-flexible-and-efficient-rl-for-llms) compared to previous versions.
[2025/05] verl will be presented at [A2M Shanghai](https://a2m.msup.com.cn/home/?aid=4488&city=shanghai) on 5/16 - 5/17.
[2025/05] verl will be presented at [GOSIM x PyTorch Day 2025](https://paris2025.gosim.org/). See you in Paris!
[2025/03] We introduced the programming model of verl at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg) and [verl intro and updates](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl-lmsys-meetup.pdf) at the [SGLang-LMSYS Org Meetup](https://lu.ma/ntjrr7ig) in Sunnyvale mid-March.
[2025/03] We will present verl(HybridFlow) at EuroSys 2025. See you in Rotterdam!
[2025/02] verl v0.2.0.post2 is released!
[2025/02] We presented verl in the Bytedance/NVIDIA/Anyscale Ray Meetup. See you in San Jose!
[2025/01] [Doubao-1.5-pro](https://team.doubao.com/zh/special/doubao_1_5_pro) is released with SOTA-level performance on LLM & VLM. The RL scaling preview model is trained using verl, reaching OpenAI O1-level performance on math benchmarks (70.0 pass@1 on AIME).
[2024/12] verl is presented at Ray Forward 2024. Slides available here
[2024/12] The team presented Post-training LLMs: From Algorithms to Infrastructure at NeurIPS 2024. Slides and video available.
[2024/10] verl is presented at Ray Summit. Youtube video available.
[2024/08] HybridFlow (verl) is accepted to EuroSys 2025.

Key Features

FSDP, FSDP2 and Megatron-LM for training.
vLLM, SGLang and HF Transformers for rollout generation.
Compatible with Hugging Face Transformers and Modelscope Hub: Qwen-3, Qwen-2.5, Llama3.1, Gemma2, DeepSeek-LLM, etc.
Supervised fine-tuning.
Reinforcement learning with PPO, GRPO, GSPO, ReMax, REINFORCE++, RLOO, PRIME, DAPO, DrGRPO, KL_Cov & Clip_Cov etc.
- Support model-based reward and function-based reward (verifiable reward) for math, coding, etc
- Support vision-language models (VLMs) and multi-modal RL with Qwen2.5-vl, Kimi-VL
- Multi-turn with tool calling
LLM alignment recipes such as Self-play preference optimization (SPPO)
Flash attention 2, sequence packing, sequence parallelism support via DeepSpeed Ulysses, LoRA, Liger-kernel.
Scales up to 671B models and hundreds of GPUs with expert parallelism
Multi-GPU LoRA RL support to save memory.
Experiment tracking with wandb, swanlab, mlflow and tensorboard.
Hardware Support: Supports NVIDIA, AMD, Ascend
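To make the function-based (verifiable) reward feature concrete, here is a minimal sketch of a rule-based scorer for math answers. The function names, their arguments, and the `\boxed{...}` answer convention are assumptions made for illustration; consult the verl documentation for the exact reward-function interface your version expects.

```python
import re


def extract_boxed_answer(solution_str):
    """Return the content of the last \\boxed{...} in the text, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution_str)
    return matches[-1].strip() if matches else None


def compute_score(solution_str, ground_truth):
    """Hypothetical verifiable reward: 1.0 for an exact match, else 0.0."""
    answer = extract_boxed_answer(solution_str)
    if answer is not None and answer == ground_truth.strip():
        return 1.0
    return 0.0


print(compute_score("So the total is \\boxed{42}.", "42"))
print(compute_score("I think it is 41.", "42"))
```

A rule like this needs no reward model, which is why verifiable rewards are popular for math and coding tasks where correctness can be checked programmatically.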

Upcoming Features and Changes

Q3 Roadmap https://github.com/volcengine/verl/issues/2388
DeepSeek 671b optimizations with Megatron https://github.com/volcengine/verl/issues/1033
Multi-turn rollout and tools using optimizations https://github.com/volcengine/verl/issues/1882
Agent integration
Async and off-policy architecture https://github.com/volcengine/verl/pull/2231
List of breaking changes since v0.4 https://github.com/volcengine/verl/discussions/2270

Getting Started

Documentation

Quickstart:

Installation
Quickstart
Programming Guide & Tech Talk (in Chinese)
PPO in verl
GRPO in verl

Running a PPO example step-by-step:

Reproducible algorithm baselines:

RL performance on coding, math

For code explanation and advanced usage (extension):

PPO Trainer and Workers
Advanced Usage and Extension

Blogs from the community

Performance Tuning Guide

Performance is essential for on-policy RL algorithms. We have written a detailed performance tuning guide to help you optimize performance.

Upgrade to vLLM >= v0.8.2

verl now supports vLLM>=0.8.2 when using FSDP as the training backend. Please refer to this document for the installation guide and more information. Please avoid vllm 0.7.x, which contains bugs that may lead to OOMs and unexpected errors.
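A quick way to check an environment against this recommendation is sketched below. It uses only the standard library; the helper names `parse_release` and `is_recommended_vllm` are hypothetical, and the parser is deliberately simple (it ignores pre-release and post-release suffixes).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_str):
    """Turn '0.8.2' or '0.8.2.post1' into a tuple of leading integers."""
    parts = []
    for piece in version_str.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)


def is_recommended_vllm(version_str, minimum=(0, 8, 2)):
    """True if the version is at least the minimum named in the text above."""
    return parse_release(version_str) >= minimum


try:
    installed = version("vllm")
    status = "ok" if is_recommended_vllm(installed) else "please upgrade"
    print(f"vllm {installed}: {status}")
except PackageNotFoundError:
    print("vllm is not installed")
```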

Use Latest SGLang

SGLang is fully supported by verl, and the SGLang RL Group is working extensively on building unique features, including multi-turn agentic RL, VLM RLHF, server-based RL, and partial rollout. Please refer to this document for the installation guide and more information.

Upgrade to FSDP2

verl is fully embracing FSDP2! FSDP2 is recommended by the PyTorch distributed team, provides better throughput and memory usage, and is composable with other features (e.g. torch.compile). To enable FSDP2, simply use verl main and set the following options:

actor_rollout_ref.ref.strategy=fsdp2
actor_rollout_ref.actor.strategy=fsdp2
critic.strategy=fsdp2

Furthermore, FSDP2 CPU offloading is compatible with gradient accumulation. You can turn it on to save memory with actor_rollout_ref.actor.fsdp_config.offload_policy=True. For more details, see https://github.com/volcengine/verl/pull/1026
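The options above are `key=value` overrides, so one way to apply them is to append them to the training command. The sketch below only assembles and prints such a command. The option names are copied from the text above, while the entry point `python3 -m verl.trainer.main_ppo` and the helper `build_command` are assumptions for illustration; use whatever launch command your setup already relies on.

```python
import shlex

# Assumed entry point; replace with the launch command you actually use.
ENTRY_POINT = ["python3", "-m", "verl.trainer.main_ppo"]

# Option names as listed in this section.
FSDP2_OVERRIDES = {
    "actor_rollout_ref.ref.strategy": "fsdp2",
    "actor_rollout_ref.actor.strategy": "fsdp2",
    "critic.strategy": "fsdp2",
    # Optional: CPU offloading to save memory.
    "actor_rollout_ref.actor.fsdp_config.offload_policy": "True",
}


def build_command(overrides, entry_point=ENTRY_POINT):
    """Return the entry point followed by key=value override arguments."""
    return list(entry_point) + [f"{key}={value}" for key, value in overrides.items()]


command = build_command(FSDP2_OVERRIDES)
print(shlex.join(command))
```

Keeping the overrides in one dictionary makes it easy to toggle the offload option or reuse the same settings across launch scripts.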

AMD Support (ROCm Kernel)

On AMD GPUs, verl now supports FSDP as the training engine (Megatron support coming soon) and integrates with both vLLM and SGLang as inference engines. Please refer to this document for the installation guide and more information, and this document for the vLLM performance tuning for ROCm.

Citation and acknowledgement

If you find the project helpful, please cite:

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2409.19256}
}

verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and contributed to by Bytedance, Anyscale, LMSys.org, Alibaba Qwen team, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, University of Hong Kong, ke.com, All Hands AI, ModelBest, JD AI Lab, Microsoft Research, StepFun, Amazon, LinkedIn, Meituan, Camel-AI, OpenManus, Xiaomi, NVIDIA research, Baichuan, RedNote, SwissAI, Moonshot AI (Kimi), Baidu, Snowflake, Skywork.ai, JetBrains, IceSword Lab, and many more.

Awesome Projects Built with `verl`

You are welcome to register your awesome project built with verl for other developers' reference!

TinyZero: a reproduction of DeepSeek R1 Zero recipe for reasoning tasks
SkyThought: RL training for Sky-T1-7B by NovaSky AI team.
simpleRL-reason: SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Easy-R1: Multi-modal RL training framework
OpenManus-RL: LLM Agents RL tuning framework for multiple agent environments.
rllm: async RL training with verl-pipeline
RAGEN: a general-purpose reasoning agent training framework
Search-R1: RL with reasoning and searching (tool-call) interleaved LLMs
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Skywork-OR1: Skywork open reasoner series
ToRL: Scaling tool-integrated RL
Absolute Zero Reasoner: A self-play framework for reasoning that uses no human-curated data
verl-agent: A scalable training framework for long-horizon LLM/VLM agents, along with a new algorithm GiGPO
RL-Factory: An easy and efficient RL post-training framework for Agentic Learning
ReTool: ReTool: reinforcement learning for strategic tool use in LLMs. Code release is in progress...
verl-tool: A unified and easy-to-extend tool-agent training framework based on verl
PRIME: Process reinforcement through implicit rewards
MemAgent: MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent
POLARIS: A Post-training recipe for scaling RL on Advanced Reasoning models
GUI-R1: GUI-R1: A Generalist R1-style Vision-Language Action Model For GUI Agents
DeepRetrieval: RL Training of Search Agent with Search/Retrieval Outcome
Code-R1: Reproducing R1 for Code with Reliable Rewards
DeepResearcher: Scaling deep research via reinforcement learning in real-world environments
VAGEN: Training VLM agents with multi-turn reinforcement learning
RM-R1: RL training of reasoning reward models
LUFFY: Learning to Reason under Off-Policy Guidance
DeepMath: DeepMath-103K data and series models for math reasoning
PACS: Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Entropy Mechanism of RL: The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning
LLaSA-TTS-GRPO: TTS fine-tuning with GRPO optimization based on LLASA models
PF-PPO: Policy Filtration for PPO based on the reliability of reward signals for more efficient and robust RLHF.
RACRO: Build multi-modal reasoning models via decoupling it into query-conditioned captioning and text-only reasoning
Agent Lightning: A flexible and extensible framework that enables seamless agent optimization for any existing agent framework.
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use.
Kimina-Prover-RL: Training pipeline for formal theorem proving, based on a paradigm inspired by DeepSeek-R1.
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization.
rStar2-Agent: Using reinforcement learning with multi-step tool-calling for math tasks, rStar2-Agent-14B reaches frontier-level math reasoning in just 510 RL training steps
Vision-SR1: Self-Rewarding Vision-Language Model via Reasoning Decomposition
SimpleVLA-RL: SimpleVLA-RL: A Simple yet Effective Vision-Language Action Model for Reinforcement Learning
Table-R1: Table-R1: Inference-Time Scaling for Table Reasoning
Revisual-R1: Revisual-R1: Advancing Multimodal Reasoning From Optimized Cold Start to Staged Reinforcement Learning
ARES: ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
Meta-Bandit-LLM: Meta-Bandit-LLM: Long-horizon multiturn interactive training for meta-bandit agents
PokeeResearch: PokeeResearch: State-of-the-art 7B DeepResearch Agent that leverages web search and content reading capabilities to answer complex questions using the most up-to-date information available online.
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
OneThinker: All-in-one Reasoning Model for Image and Video
OpenTinker: Democratizing Agentic Reinforcement Learning as a Service
FlowRL: Matching reward distributions via flow balance for diverse exploration and generalizable reasoning
Logic-RL: a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset.
Seed-Coder: RL training of Seed-Coder boosts performance on competitive programming
all-hands/openhands-lm-32b-v0.1: A strong, open coding agent model, trained with multi-turn fine-tuning
s3: Efficient Yet Effective Search Agent Training via RL
Rec-R1: Bridging Generative Large Language Models and Recommendation Systems via Reinforcement Learning
Explore RL Data Scaling: Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
FIRE: Flaming-hot initiation with regular execution sampling for large language models
DQO: Enhancing multi-Step reasoning abilities of language models through direct Q-function optimization
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
cognition-engineering: Test time scaling drives cognition engineering.
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning.
AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
critic-rl: LLM critics for code generation
self-rewarding-reasoning-LLM: self-rewarding and correction with generative reward models
DeepEnlighten: Reproduce R1 with social reasoning tasks and analyze key findings
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
PURE: Credit assignment is the key to successful reinforcement fine-tuning using process reward model
cognitive-behaviors: Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
deepscaler: iterative context scaling with GRPO
DAPO: the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
SPEAR: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning (ICLR 2026)
RuleReasoner: RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling (ICLR 2026)

Contribution Guide

See contributions guide

About ByteDance Seed Team

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society. You can get to know ByteDance Seed better through the following channels👇

We are HIRING! Send us an email if you are interested in internship/FTE opportunities in RL for agents.


Alternative AI tools for verl

Similar Open Source Tools

verl

github

: 19.2k

offensive-ai-compilation

github

: 1.2k

db2rest

DB2Rest is a modern low-code REST DATA API platform that simplifies the development of intelligent applications. It seamlessly integrates existing and new databases with language models (LMs/LLMs) and vector stores, enabling the rapid delivery of context-aware, reasoning applications without vendor lock-in.

github

: 167

LMCache

LMCache is a serving engine extension designed to reduce time to first token (TTFT) and increase throughput, particularly in long-context scenarios. It stores key-value caches of reusable texts across different locations like GPU, CPU DRAM, and Local Disk, allowing the reuse of any text in any serving engine instance. By combining LMCache with vLLM, significant delay savings and GPU cycle reduction are achieved in various large language model (LLM) use cases, such as multi-round question answering and retrieval-augmented generation (RAG). LMCache provides integration with the latest vLLM version, offering both online serving and offline inference capabilities. It supports sharing key-value caches across multiple vLLM instances and aims to provide stable support for non-prefix key-value caches along with user and developer documentation.

github

: 6.9k

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable by co-designing the frontend language and the runtime system. The core features of SGLang include: - **A Flexible Front-End Language**: This allows for easy programming of LLM applications with multiple chained generation calls, advanced prompting techniques, control flow, multiple modalities, parallelism, and external interaction. - **A High-Performance Runtime with RadixAttention**: This feature significantly accelerates the execution of complex LLM programs by automatic KV cache reuse across multiple calls. It also supports other common techniques like continuous batching and tensor parallelism.

github

: 23.5k

big-AGI

big-AGI is an AI suite designed for professionals seeking function, form, simplicity, and speed. It offers best-in-class Chats, Beams, and Calls with AI personas, visualizations, coding, drawing, side-by-side chatting, and more, all wrapped in a polished UX. The tool is powered by the latest models from 12 vendors and open-source servers, providing users with advanced AI capabilities and a seamless user experience. With continuous updates and enhancements, big-AGI aims to stay ahead of the curve in the AI landscape, catering to the needs of both developers and AI enthusiasts.

github

: 6.3k

local-deep-research

Local Deep Research is a powerful AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches. It can be run locally for privacy or configured to use cloud-based LLMs for enhanced capabilities. The tool offers advanced research capabilities, flexible LLM support, rich output options, privacy-focused operation, enhanced search integration, and academic & scientific integration. It also provides a web interface, command line interface, and supports multiple LLM providers and search engines. Users can configure AI models, search engines, and research parameters for customized research experiences.

github

: 4.0k

Awesome-AI-Agents

Awesome-AI-Agents is a curated list of projects, frameworks, benchmarks, platforms, and related resources focused on autonomous AI agents powered by Large Language Models (LLMs). The repository showcases a wide range of applications, multi-agent task solver projects, agent society simulations, and advanced components for building and customizing AI agents. It also includes frameworks for orchestrating role-playing, evaluating LLM-as-Agent performance, and connecting LLMs with real-world applications through platforms and APIs. Additionally, the repository features surveys, paper lists, and blogs related to LLM-based autonomous agents, making it a valuable resource for researchers, developers, and enthusiasts in the field of AI.

github

: 526

deepchecks

Deepchecks is a holistic open-source solution for AI & ML validation needs, enabling thorough testing of data and models from research to production. It includes components for testing, CI & testing management, and monitoring. Users can install and use Deepchecks for testing and monitoring their AI models, with customizable checks and suites for tabular, NLP, and computer vision data. The tool provides visual reports, pythonic/json output for processing, and a dynamic UI for collaboration and monitoring. Deepchecks is open source, with premium features available under a commercial license for monitoring components.

github

: 3.6k

LLM-Powered-RAG-System

LLM-Powered-RAG-System is a comprehensive repository containing frameworks, projects, components, evaluation tools, papers, blogs, and other resources related to Retrieval-Augmented Generation (RAG) systems powered by Large Language Models (LLMs). The repository includes various frameworks for building applications with LLMs, data frameworks, modular graph-based RAG systems, dense retrieval models, and efficient retrieval augmentation and generation frameworks. It also features projects such as personal productivity assistants, knowledge-based platforms, chatbots, question and answer systems, and code assistants. Additionally, the repository provides components for interacting with documents, databases, and optimization methods using ML and LLM technologies. Evaluation frameworks, papers, blogs, and other resources related to RAG systems are also included.

github

: 162

denser-retriever

Denser Retriever is an enterprise-grade AI retriever designed to streamline AI integration into applications, combining keyword-based searches, vector databases, and machine learning rerankers using xgboost. It provides state-of-the-art accuracy on MTEB Retrieval benchmarking and supports various heterogeneous retrievers for end-to-end applications like chatbots and semantic search.

github

: 119

PPTAgent

PPTAgent is an innovative system that automatically generates presentations from documents. It employs a two-step process for quality assurance and introduces PPTEval for comprehensive evaluation. With dynamic content generation, smart reference learning, and quality assessment, PPTAgent aims to streamline presentation creation. The tool follows an analysis phase to learn from reference presentations and a generation phase to develop structured outlines and cohesive slides. PPTEval evaluates presentations based on content accuracy, visual appeal, and logical coherence.

github

: 3.3k

lancedb

LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering, and management of embeddings. The key features of LanceDB include: Production-scale vector search with no servers to manage. Store, query, and filter vectors, metadata, and multi-modal data (text, images, videos, point clouds, and more). Support for vector similarity search, full-text search, and SQL. Native Python and Javascript/Typescript support. Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure. GPU support in building vector index(*). Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB, and more on the way. LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.

github

: 7.6k

pyspur

PySpur is a graph-based editor designed for LLM (Large Language Models) workflows. It offers modular building blocks, node-level debugging, and performance evaluation. The tool is easy to hack, supports JSON configs for workflow graphs, and is lightweight with minimal dependencies. Users can quickly set up PySpur by cloning the repository, creating a .env file, starting docker services, and accessing the portal. PySpur can also work with local models served using Ollama, with steps provided for configuration. The roadmap includes features like canvas, async/batch execution, support for Ollama, new nodes, pipeline optimization, templates, code compilation, multimodal support, and more.

github

: 4.2k

xtuner

XTuner is an efficient, flexible, and full-featured toolkit for fine-tuning large models. It supports various LLMs (InternLM, Mixtral-8x7B, Llama 2, ChatGLM, Qwen, Baichuan, ...), VLMs (LLaVA), and various training algorithms (QLoRA, LoRA, full-parameter fine-tune). XTuner also provides tools for chatting with pretrained / fine-tuned LLMs and deploying fine-tuned LLMs with any other framework, such as LMDeploy.

github

: 4.8k

biochatter

Generative AI models have shown tremendous usefulness in increasing accessibility and automation of a wide range of tasks. This repository contains the `biochatter` Python package, a generic backend library for the connection of biomedical applications to conversational AI. It aims to provide a common framework for deploying, testing, and evaluating diverse models and auxiliary technologies in the biomedical domain. BioChatter is part of the BioCypher ecosystem, connecting natively to BioCypher knowledge graphs.

github

: 135

For similar tasks

verl

github

: 19.2k

llm-verified-with-monte-carlo-tree-search

This prototype synthesizes verified code with an LLM using Monte Carlo Tree Search (MCTS). It explores the space of possible generation of a verified program and checks at every step that it's on the right track by calling the verifier. This prototype uses Dafny, Coq, Lean, Scala, or Rust. By using this technique, weaker models that might not even know the generated language all that well can compete with stronger models.

github

: 270

flashinfer

FlashInfer is a library for Language Languages Models that provides high-performance implementation of LLM GPU kernels such as FlashAttention, PageAttention and LoRA. FlashInfer focus on LLM serving and inference, and delivers state-the-art performance across diverse scenarios.

github

: 5.0k

dolma

Dolma is a dataset and toolkit for curating large datasets for (pre)-training ML models. The dataset consists of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials. The toolkit provides high-performance, portable, and extensible tools for processing, tagging, and deduplicating documents. Key features of the toolkit include built-in taggers, fast deduplication, and cloud support.

github

: 1.0k

web-llm

WebLLM is a modular and customizable javascript package that directly brings language model chats directly onto web browsers with hardware acceleration. Everything runs inside the browser with no server support and is accelerated with WebGPU. WebLLM is fully compatible with OpenAI API. That is, you can use the same OpenAI API on any open source models locally, with functionalities including json-mode, function-calling, streaming, etc. We can bring a lot of fun opportunities to build AI assistants for everyone and enable privacy while enjoying GPU acceleration.

github

: 17.3k

nlp-llms-resources

The 'nlp-llms-resources' repository is a comprehensive resource list for Natural Language Processing (NLP) and Large Language Models (LLMs). It covers a wide range of topics including traditional NLP datasets, data acquisition, libraries for NLP, neural networks, sentiment analysis, optical character recognition, information extraction, semantics, topic modeling, multilingual NLP, domain-specific LLMs, vector databases, ethics, costing, books, courses, surveys, aggregators, newsletters, papers, conferences, and societies. The repository provides valuable information and resources for individuals interested in NLP and LLMs.

github

: 82

h2o-llmstudio

H2O LLM Studio is a framework and no-code GUI designed for fine-tuning state-of-the-art large language models (LLMs). With H2O LLM Studio, you can easily and effectively fine-tune LLMs without the need for any coding experience. The GUI is specially designed for large language models, and you can finetune any LLM using a large variety of hyperparameters. You can also use recent finetuning techniques such as Low-Rank Adaptation (LoRA) and 8-bit model training with a low memory footprint. Additionally, you can use Reinforcement Learning (RL) to finetune your model (experimental), use advanced evaluation metrics to judge generated answers by the model, track and compare your model performance visually, and easily export your model to the Hugging Face Hub and share it with the community.

github

: 4.1k

KULLM

KULLM (구름) is a Korean Large Language Model developed by Korea University NLP & AI Lab and HIAI Research Institute. It is based on the upstage/SOLAR-10.7B-v1.0 model and has been fine-tuned for instruction. The model has been trained on 8×A100 GPUs and is capable of generating responses in Korean language. KULLM exhibits hallucination and repetition phenomena due to its decoding strategy. Users should be cautious as the model may produce inaccurate or harmful results. Performance may vary in benchmarks without a fixed system prompt.

github

: 527

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

GitHub stars: 1.1k

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

GitHub stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

GitHub stars: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

GitHub stars: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks so that operators can focus on more complicated and time-consuming work, and it can identify security harms such as misuse (e.g., malware generation, jailbreaking) and privacy harms (e.g., identity theft). The goal is to give researchers a baseline of how well their model and entire inference pipeline are doing against different harm categories, and to let them compare that baseline to future iterations of their model. This gives them empirical data on how well their model is doing today and lets them detect any degradation of performance introduced by later changes.

GitHub stars: 2.9k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. Its key features: it is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a Cloud IDE); and it supports consumer-grade GPUs.

GitHub stars: 32.9k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

GitHub stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

GitHub stars: 675


Awesome Projects Built with `verl`