minimind-notes
🚀 [Building an LLM from scratch] A minimal guide to LLM training principles and practice, with core Transformer, pretraining, and SFT code plus comparative experiments. | A minimal, principle-first guide to understanding and building LLMs from scratch.
Stars: 54
MiniMind is a modular training guide for large language models (LLMs) that helps developers deeply understand how modern models such as Llama and GPT are trained, through concise code and comparative experiments. It puts principles before operations, backs each design choice with an experiment, and organizes the material into 6 independent modules running from basic components to the complete architecture, keeping the barrier to entry low. It suits people preparing for LLM-related roles, students and researchers in machine learning and deep learning, and developers with basic PyTorch knowledge who want a deep understanding of LLMs. It is not aimed at complete beginners, users who only want to deploy models quickly without caring about principles, or anyone looking for production-grade code and best practices.
README:
From 0 to 1: not a "copy-paste manual" but a principle-first experimental lab for understanding LLMs.
🎉 Full interactive documentation is live:
👉 https://minimind.wiki 👈
⚡ Quick Start • 🗺️ Learning Roadmap • 📦 Module Navigation • 🇺🇸 English Readme
MiniMind aims to help developers understand the training mechanics of large language models (LLMs) hands-on, through extremely concise code and comparative experiments. It doesn't just tell you "how", it uses experimental data to show you "why".
Why this project? Understand every design choice in LLM training through comparative experiments.
This is a modular LLM training curriculum that helps you understand how modern large language models such as Llama and GPT are trained.
Key features:
- ✅ Principles first: understand "why it's designed this way", not just "how to run it"
- ✅ Comparative experiments: every design choice answers "what happens if we don't do it this way?"
- ✅ Modular: 6 independent modules, from basic components to the full architecture
- ✅ Low barrier: learning-stage experiments run on a CPU in minutes; only full training needs a GPU
Based on: MiniMind, a complete tutorial for training a tiny language model from scratch
This project is especially suited to people entering the LLM field. By systematically studying LLM training principles, you will gain:
- ✅ Interview edge: a deep understanding of core mechanisms such as Transformer, Attention, and RoPE, so technical interviews hold no surprises
- ✅ Portfolio highlight: completed comparative experiments demonstrate real understanding of LLM principles and make your résumé more competitive
- ✅ Fast ramp-up: a from-scratch understanding of how modern LLMs (Llama, GPT) are trained, beyond just calling libraries
- ✅ Career growth: a solid foundation in LLM training principles for future work on large models
- 🎯 Students seeking LLM internships or jobs: learn training principles systematically and raise your technical interview pass rate
- 📚 ML/DL students: understand the internals of Transformers and LLMs instead of stopping at theory
- 🔬 Graduate and PhD students: a solid grounding in LLM training for research and paper writing
- 💡 Researchers: learn the design choices behind modern LLM architectures and the reasoning behind them
- 🤖 AI/ML engineers: move from "using frameworks" to "understanding principles" and tackle real problems with confidence
- 🌐 Full-stack developers: interested in LLMs and wanting to learn their training mechanics systematically
- ⚙️ Algorithm engineers: optimizing or improving an LLM training pipeline requires understanding the principles to make the right calls
- 📖 PyTorch users: familiar with basic deep learning concepts and ready to go deeper into LLMs
- 🛠️ Hands-on learners: understand principles through experiments and code, not theory alone
- 🔍 Depth seekers: not satisfied with "the code runs", you want to know "why it's designed this way"
- Complete beginners (we suggest learning PyTorch basics first)
- Users who only want to deploy models quickly and don't care about principles
- Users who need production-grade code and best practices (this project focuses on teaching)
💪 If you are preparing for LLM job interviews, or want to deeply understand how LLMs are trained, this project is for you! 🚀
Run three key experiments to understand the core design choices of LLMs:
# 1. Clone the repository
git clone https://github.com/joyehuang/minimind-notes.git
cd minimind-notes
# 2. Create and activate a virtual environment (Python 3.9+; 3.10/3.11 recommended)
python3 -m venv venv           # Windows: use python instead of python3
source venv/bin/activate       # Linux / macOS
# Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Experiment 1: why is normalization needed?
cd modules/01-foundation/01-normalization/experiments
python exp1_gradient_vanishing.py
# 5. Experiment 2: why RoPE position encoding?
cd ../../02-position-encoding/experiments
python exp1_rope_basics.py
# 6. Experiment 3: how does attention work?
cd ../../03-attention/experiments
python exp1_attention_basics.py

💡 Tip: learning-stage experiments only need a CPU, no GPU required. Training the full model requires an NVIDIA GPU (RTX 3090 or better recommended).
You will see:
- A visualization of gradient vanishing
- The principle behind RoPE's rotary encoding
- How attention weights are computed
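To preview the intuition behind the attention experiment, here is a minimal single-head scaled dot-product attention sketch. This is an illustrative toy (random weights, 4 tokens, dim 8), not the repo's `exp1_attention_basics.py`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy single-head attention: 4 tokens, model dim 8 (all values illustrative).
d = 8
x = torch.randn(4, d)                        # token embeddings
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

Q, K, V = x @ Wq, x @ Wk, x @ Wv             # project into query/key/value spaces
scores = Q @ K.T / d ** 0.5                  # scaled dot-product similarity
weights = F.softmax(scores, dim=-1)          # each row is a distribution over tokens
out = weights @ V                            # output = weighted mix of value vectors

print(weights.sum(dim=-1))                   # rows sum to 1 (up to float error)
```

The `1/√d` scaling keeps the dot products from growing with dimension, which would otherwise saturate the softmax.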
Next step: read ROADMAP.md to choose your learning path
Pick the path that fits your time and goals:
| Path | Time | Goal | Link |
|---|---|---|---|
| ⚡ Quick tour | 30 min | Understand the core design choices | Start |
| 📚 Systematic study | 6 hours | Master the basic components | Start |
| 🎓 Deep mastery | 30+ hours | Train a model from scratch | Start |
Detailed roadmap: ROADMAP.md
| Module | Core question | Experiments | Status |
|---|---|---|---|
| 01-normalization | Why normalize? Pre-LN vs Post-LN? | 2 | ✅ Complete |
| 02-position-encoding | Why RoPE? How does length extrapolation work? | 4 | 🟡 Experiments done |
| 03-attention | What is the intuition behind QKV? Why multi-head? | 3 | 🟡 Experiments done |
| 04-feedforward | What knowledge does the FFN store? Why expand the hidden dim? | 1 | 🟡 Experiments done |
| Module | Core question | Status |
|---|---|---|
| 01-residual-connection | Why residuals? How do they stabilize gradients? | 🔜 Planned |
| 02-transformer-block | How are the components assembled? Why this order? | 🔜 Planned |
Legend:
- ✅ Complete: teaching doc + experiment code + quiz
- 🟡 Experiments done: experiment code exists, docs pending
- 🔜 Planned: directory structure only
Detailed navigation: modules/README.md
Each module answers its core question through experiments.
Example: the normalization module
| Configuration | Converges? | Step where NaN appears | Final loss |
|---|---|---|---|
| ❌ NoNorm | No | ~500 | NaN |
| 🟡 Post-LN | Yes | - | 3.5 |
| ✅ Pre-LN + RMSNorm | Yes | - | 2.7 |
Conclusion: Pre-LN + RMSNorm is the most stable, which is why it is the standard choice in modern LLMs
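For reference, a Llama-style RMSNorm can be sketched in a few lines of PyTorch. This is an illustrative sketch of the general technique, not necessarily identical to the implementation under model/:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMS normalization: rescale by the root-mean-square, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # rsqrt of mean(x^2) normalizes each vector to unit RMS
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

x = torch.randn(2, 16)
y = RMSNorm(16)(x)
print(y.shape)  # torch.Size([2, 16])
```

Compared with LayerNorm, RMSNorm skips the mean subtraction and bias, which is cheaper and, per the experiment above, just as stable in the Pre-LN configuration.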
Experiment → Intuition → Theory → Code
    ↓           ↓          ↓        ↓
  10 min      20 min     30 min   10 min
Run the experiments first to build intuition, then study the theory to understand the principles, and finally read the source to master the implementation.
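As an example of this experiment-first flow, RoPE's key intuition, that positions become rotations so attention scores depend only on relative offsets, can be checked in a few lines. This is an illustrative sketch, not the repo's `exp1_rope_basics.py`:

```python
import torch

def rope_rotate(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0):
    """Rotate interleaved feature pairs by position-dependent angles (illustrative RoPE)."""
    d = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, d, 2).float() / d)  # per-pair frequencies, (d/2,)
    angles = pos[:, None] * inv_freq[None, :]                # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]                      # pair up adjacent dims
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                     # 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Key property: dot products depend only on the relative position offset.
q = torch.randn(1, 8)
s1 = rope_rotate(q, torch.tensor([3.])) @ rope_rotate(q, torch.tensor([5.])).T
s2 = rope_rotate(q, torch.tensor([10.])) @ rope_rotate(q, torch.tensor([12.])).T
print(torch.allclose(s1, s2, atol=1e-4))  # True: offset is 2 in both cases
```

Because rotations preserve norms and the score only sees the angle difference, RoPE gives relative position awareness without any learned position table, which is also what enables length extrapolation tricks.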
All experiments use TinyShakespeare (1 MB) or synthetic data:
- ✅ No GPU required (CPU or Apple MPS works)
- ✅ Each experiment takes < 10 minutes
- ✅ Total data < 100 MB
Each module contains:
01-normalization/
├── README.md          # module navigation
├── teaching.md        # teaching doc (Why/What/How)
├── code_guide.md      # source walkthrough (links into MiniMind)
├── quiz.md            # self-check questions
└── experiments/       # comparative experiments
    ├── exp1_*.py
    ├── exp2_*.py
    └── results/       # expected output
Document template (teaching.md):
- Why: problem setting + intuition
- What: mathematical definition + comparison tables
- How (to verify): experiment design + expected results
- Framework: PyTorch 2.0+
- Data: TinyShakespeare, TinyStories
- Visualization: Matplotlib, Seaborn
- Upstream project: MiniMind
Contributions of all kinds are welcome, especially:
- ✨ New comparative experiments
- 📊 Better visualizations
- 🌍 English translations
- 🐛 Bug fixes
- 📖 Documentation improvements
Getting started:
Before submitting, please make sure:
- [ ] The experiment runs standalone
- [ ] The code has thorough Chinese comments
- [ ] Results are reproducible (fixed random seeds)
- [ ] The existing file structure is followed
minimind-notes/
├── modules/              # modular lessons (new architecture)
│   ├── common/           # shared utilities
│   ├── 01-foundation/    # basic components
│   └── 02-architecture/  # architecture assembly
│
├── docs/                 # personal study records
│   ├── learning_log.md   # study log
│   ├── knowledge_base.md # knowledge base
│   └── notes.md          # index
│
├── model/                # original MiniMind code
├── trainer/              # training scripts
├── dataset/              # datasets
│
├── README.md             # this file
├── ROADMAP.md            # learning roadmap
└── CLAUDE.md             # AI assistant guide
This repository builds on the following project:
- MiniMind: core code and training pipeline
- All modules link to MiniMind's real implementation
Special thanks to @jingyaogong for open-sourcing MiniMind!
- minimind.wiki: full documentation and interactive content online
- Attention Is All You Need: the original Transformer paper
- RoFormer: rotary position embedding (RoPE)
- RMSNorm: root mean square layer normalization
- Bug reports and questions: GitHub Issues
- Upstream project: MiniMind
MIT License - see LICENSE for details