HighPerfLLMs2024

High Performance LLMs 2024 is a comprehensive course focused on building a high-performance Large Language Model (LLM) from scratch using Jax. The course covers various aspects such as training, inference, roofline analysis, compilation, sharding, profiling, and optimization techniques. Participants will gain a deep understanding of Jax and learn how to design high-performance computing systems that operate close to their physical limits.

README:

High Performance LLMs 2024

Build a full-scale, high-performance LLM from scratch in Jax! We’ll cover training and inference, roofline analysis, compilation, sharding, profiling and more. You’ll leave the class comfortable in Jax and confident in your ability to design high-performance computing systems that operate near their physical limits.

Link to the Discord: https://discord.gg/2AWcVatVAw

Syllabus. We will:

  • Build a Jax LLM Implementation From Scratch
  • Analyze Single Chip Rooflines And Compilation
  • Analyze Distributed Computing via Sharding
  • Optimize LLM Training – what happens under the hood, rooflines, sharding
  • Optimize LLM Inference – what happens under the hood, rooflines, sharding
  • Deep Dive into flash, vLLM, continuous batching, etc.
  • Some deep dives along the way:
    • Attention, Flash Attention, vLLM, continuous batching
    • ML: Quantization, Checkpointing, Data Loading, Numerics
    • Practical Tips: Debugging, Overlapping Jax Kernels
    • Larger scale: Goodput
    • Fancy stuff: Ahead of Time Compilation
    • Going deeper: shard map, pallas.
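
To give a flavor of the roofline analysis covered in the syllabus, here is a minimal sketch of how arithmetic intensity determines whether a matmul is compute-bound or memory-bound. The hardware numbers are illustrative assumptions, not figures from the course:

```python
# Sketch of single-chip roofline analysis for a matrix multiply.
# Hardware peaks below are assumed, illustrative values.

def matmul_arithmetic_intensity(m, k, n, bytes_per_elem=2):
    """FLOPs per byte moved for an (m,k) @ (k,n) matmul in bf16."""
    flops = 2 * m * k * n                                    # one multiply + one add per MAC
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C
    return flops / bytes_moved

PEAK_FLOPS = 275e12   # assumed peak bf16 FLOP/s for an accelerator
PEAK_BW = 1.2e12      # assumed HBM bandwidth in bytes/s
ridge = PEAK_FLOPS / PEAK_BW   # intensity needed to saturate compute

for size in (128, 1024, 8192):
    ai = matmul_arithmetic_intensity(size, size, size)
    bound = "compute-bound" if ai >= ridge else "memory-bound"
    print(f"{size}^3 matmul: {ai:.0f} FLOPs/byte -> {bound}")
```

Small matmuls fall below the ridge point (memory-bound), while large ones climb above it (compute-bound), which is why batch and model dimensions matter so much for utilization.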

Approximate Timing

3:30PM Pacific on Wednesdays, starting 2/21/2024. See below for links.

Session Timing, Slides, Videos and Take-Home Exercises

| Session | Time | Link to join (or recording) | Slides | Take-Home Exercises | Summary |
|---|---|---|---|---|---|
| 1 | 3:30PM US Pacific, 2/21/2024 | Youtube recording | slides | link | end-to-end Jax LLM |
| 2 | 3:30PM US Pacific, 2/28/2024 | Youtube recording | slides | link | single chip perf and rooflines |
| 3 | 3:30PM US Pacific, 3/13/2024 | Youtube recording | slides | link | multi chip perf and rooflines, 1 |
| 4 | 3:30PM US Pacific, 3/20/2024 | Youtube recording | slides | link | multi chip perf and rooflines, 1 |
| 5 | 3:30PM US Pacific, 3/27/2024 | Youtube recording | slides | link | attention |
| 6 | 3:30PM US Pacific, 4/10/2024 | Youtube recording | slides | link | optimized training |
| postponed | 3:30PM US Pacific, 4/17/2024 | postponed | | | |
| 7 | 3:30PM US Pacific, 4/24/2024 | Youtube recording | slides | link | training e2e, inference analysis |
| postponed | 3:30PM US Pacific, 5/01/2024 | postponed | | | |
| 8 | 3:30PM US Pacific, 5/08/2024 | Google Meet link | | | |

About me: I’m Rafi Witten, a tech lead on Cloud TPU/GPU Multipod. We develop MaxText and aim to push the frontier on Perf/TCO. In 2023, we executed the "Largest ML Job" ever demonstrated in public and pioneered “Accurate Quantized Training”, a technique for training with 8-bit integers.

Contact me via Discord https://discord.gg/2AWcVatVAw
