ai-infra-learning

This repository organizes materials, recordings, and schedules related to AI-infra learning meetings.

AI Infra Learning collects documentation, papers, videos, and blog posts on artificial intelligence infrastructure, covering topics such as large language model serving, memory management, decoding techniques, and text generation.

README:

AI Infra Learning Sessions

Minimal code sketches for several of the session topics appear after the table below.

| Topic | Date | Pre-reading Materials | Recording | Docs | Feedback & Review Questions |
| --- | --- | --- | --- | --- | --- |
| vLLM Quickstart | 2025-05-11 | Doc: vLLM | AI INFRA Study 01 - LLM Landscape Overview / vLLM Quickstart | 01-vllm-quickstart | |
| PagedAttention | 2025-05-25 | Blog: vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention<br>Paper: Efficient Memory Management for Large Language Model Serving with PagedAttention<br>Video: Fast LLM Serving with vLLM and PagedAttention | AI INFRA Study 02 - vLLM PagedAttention Paper Walkthrough | 02-pagedattention | 02-PagedAttention Feedback |
| Prefix Caching | 2025-06-08 | Doc: Automatic Prefix Caching<br>Design Doc: Automatic Prefix Caching<br>Paper: SGLang: Efficient Execution of Structured Language Model Programs | AI INFRA Study 03 - How Prefix Caching Works | 03-prefix-caching | |
| Speculative Decoding | 2025-06-22 | Doc: Speculative Decoding<br>Blog: How Speculative Decoding Boosts vLLM Performance by up to 2.8x<br>Video: Hacker's Guide to Speculative Decoding in vLLM<br>Video: Speculative Decoding in vLLM<br>Paper: Accelerating Large Language Model Decoding with Speculative Sampling<br>Paper: Fast Inference from Transformers via Speculative Decoding | AI INFRA Study 04 - Speculative Decoding Implementation Approaches | 04-speculative-decoding | |
| Chunked-Prefills | 2025-07-13 | Doc: vLLM Chunked Prefill<br>Paper: SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills<br>Paper: DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference<br>Paper: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | AI INFRA Study 05 - Chunked Prefill | 05-chunked-prefills | 05-Chunked-Prefills Feedback & Review Questions |
| Disaggregating Prefill and Decoding | 2025-09-21 | Doc: Disaggregated Prefilling<br>Doc: vLLM Production Stack Disaggregated Prefill<br>Paper: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving<br>Paper: Splitwise: Efficient generative LLM inference using phase splitting<br>Video: vLLM Office Hours - Disaggregated Prefill and KV Cache Storage in vLLM | AI INFRA Study 06 - Prefill/Decode Disaggregation Architecture Deep Dive | 06-disaggregating-prefill-and-decoding | 06-PD Disaggregation Feedback |
| LoRA Adapters | | Doc: LoRA Adapters<br>Paper: LoRA: Low-Rank Adaptation of Large Language Models | | | |
| Quantization | | | | | |
| Distributed Inference and Serving | | Doc: Distributed Inference and Serving | | | |
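
Example Sketches

The sketches below are minimal, unofficial illustrations of the session topics, written against vLLM's Python API. Parameter names can change between vLLM releases, so treat them as starting points and check the documentation for your installed version.

A minimal offline-inference example in the spirit of the vLLM Quickstart session, using a small model so it runs on a single modest GPU:

```python
from vllm import LLM, SamplingParams

# Load a small model; any Hugging Face model id works if it fits in GPU memory.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The future of AI infrastructure is"], params)
print(outputs[0].outputs[0].text)
```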
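
PagedAttention stores each sequence's KV cache in fixed-size blocks and keeps a per-sequence block table mapping logical token positions to physical blocks, so memory is allocated on demand instead of being reserved for the maximum sequence length. The class below is an illustrative toy of that mapping, not vLLM's internal data structure:

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (vLLM's default block size)

class BlockTable:
    """Toy logical-to-physical block mapping, the core idea of PagedAttention."""

    def __init__(self, free_blocks):
        self.free = list(free_blocks)  # pool of free physical block ids
        self.blocks = []               # logical block index -> physical block id

    def slot_for(self, position):
        """Return (physical block id, offset) for a sequentially appended token."""
        # Allocate a new physical block only when a block boundary is crossed.
        if position % BLOCK_SIZE == 0:
            self.blocks.append(self.free.pop())
        return self.blocks[position // BLOCK_SIZE], position % BLOCK_SIZE

table = BlockTable(free_blocks=range(1024))
for pos in range(40):                  # a 40-token sequence occupies 3 blocks
    physical_block, offset = table.slot_for(pos)
print(len(table.blocks))               # -> 3, not ceil(max_seq_len / BLOCK_SIZE)
```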
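
Prefix caching in vLLM is a single engine flag: requests that share a prompt prefix (for example a long system prompt) reuse the cached KV blocks instead of recomputing them. A sketch:

```python
from vllm import LLM, SamplingParams

# enable_prefix_caching reuses KV-cache blocks across requests sharing a prefix.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

system = "You are a concise assistant for AI-infra questions.\n"
params = SamplingParams(max_tokens=32)

# The second request hits the cached KV blocks of the shared system prompt.
llm.generate([system + "What is PagedAttention?"], params)
llm.generate([system + "What is chunked prefill?"], params)
```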
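
Speculative decoding pairs a small draft model that proposes several tokens with the target model that verifies them in a single forward pass. In vLLM this is a configuration option, but the parameter names have moved between releases (older versions took speculative_model= and num_speculative_tokens= as top-level arguments); this sketch assumes the newer speculative_config dictionary interface:

```python
from vllm import LLM

# Draft-model speculative decoding: the small model proposes 5 tokens per step,
# and the large model verifies them in one forward pass.
llm = LLM(
    model="facebook/opt-6.7b",
    speculative_config={
        "model": "facebook/opt-125m",  # draft model
        "num_speculative_tokens": 5,   # proposals per verification step
    },
)
print(llm.generate(["Speculative decoding speeds up inference because"])[0].outputs[0].text)
```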
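
Chunked prefill (the SARATHI idea) splits a long prompt's prefill into fixed-size chunks so ongoing decode requests can be batched alongside them, trading a little prefill throughput for smoother decode latency. A sketch of the relevant vLLM flags:

```python
from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",
    enable_chunked_prefill=True,
    # Token budget per scheduler step; prefills are chunked to fit this budget
    # together with pending decodes. Tune for your GPU.
    max_num_batched_tokens=2048,
)
```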
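
Prefill/decode disaggregation runs the two phases on separate workers and ships the KV cache between them. vLLM exposes this through a KV-transfer configuration whose exact fields vary by version; the sketch below follows the connector-style interface from vLLM's disaggregated prefill example and should be checked against your installed release. A matching kv_consumer instance would run on another GPU:

```python
from vllm import LLM
from vllm.config import KVTransferConfig

# Prefill worker: computes the KV cache and streams it to the decode worker.
prefill_llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    kv_transfer_config=KVTransferConfig(
        kv_connector="PyNcclConnector",
        kv_role="kv_producer",   # the decode worker uses kv_role="kv_consumer"
        kv_rank=0,
        kv_parallel_size=2,
    ),
)
```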
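
Multi-LoRA serving keeps one base model in memory and applies a per-request adapter. A sketch following vLLM's LoRA docs; the adapter name and path are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora lets one base model serve many adapters, selected per request.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

outputs = llm.generate(
    ["Write a SQL query that lists all active users."],
    SamplingParams(max_tokens=64),
    # LoRARequest(adapter name, adapter id, path to adapter weights)
    lora_request=LoRARequest("sql-adapter", 1, "/path/to/sql-lora"),
)
```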
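
Quantized serving mostly means loading a pre-quantized checkpoint; vLLM can infer the scheme from the checkpoint config, and quantization= makes the choice explicit. A sketch with an AWQ checkpoint (the model id is illustrative):

```python
from vllm import LLM

# 4-bit AWQ weights cut memory roughly 4x versus fp16, at some accuracy cost.
llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
```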
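
Distributed serving in vLLM starts with tensor parallelism, which shards every layer's weights and KV cache across the GPUs of one node; pipeline parallelism (pipeline_parallel_size) then splits layers across nodes. A single-node sketch:

```python
from vllm import LLM

# Shard each layer across 4 GPUs on this node.
llm = LLM(model="facebook/opt-13b", tensor_parallel_size=4)
```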

Discussion group (please state your purpose when requesting to join)

WeChat official account

