minimind-notes

🚀 [Build an LLM from Scratch] A minimal, principle-first guide to understanding and building LLMs: training principles and practice, with core code and controlled experiments for the Transformer, pretraining, and SFT.

Stars: 54

MiniMind is a modular training guide for Large Language Models (LLMs) that helps developers understand in depth how modern LLMs such as Llama and GPT are trained, using concise code and controlled experiments. It puts principles before operations, backs each design choice with an experiment, is organized into 6 independent modules that build from basic components to a complete architecture, and has a low barrier to entry. It suits people preparing for jobs in the LLM field, ML/DL students and researchers, developers, and learners with basic PyTorch knowledge who want a deep understanding of LLMs. It is not suited to complete beginners, users who only want to deploy models quickly without caring about the principles, or anyone looking for production-grade code and best practices.

README:

🧠 MiniMind | LLM Training Principles Course

From 0 to 1: not a "copy-paste" manual, but a principle-first experimental lab for LLMs.

🎉 Full interactive documentation is live
👉 https://minimind.wiki 👈

⚡ Quick Start • 🗺️ Learning Path • 📦 Module Guide • 🇺🇸 English Readme

📖 Introduction

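MiniMind uses extremely compact code and controlled experiments to help developers understand, through hands-on practice, how large language models (LLMs) are trained. It tells you not only "how to do it" but, through experimental data, "why it is done this way".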

Why this project? Understand every design choice in LLM training through comparative experiments.

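🎯 What is this?

This is a modular LLM training course that helps you understand how modern large language models (such as Llama and GPT) are trained.

Key features:

✅ Principles first: understand "why it is designed this way", not just "how to run it"
✅ Controlled experiments: every design choice is backed by an experiment that answers "what happens if we don't do this"
✅ Modular: 6 independent modules, from basic components to the complete architecture
✅ Low barrier: learning-stage experiments run on a CPU (in a few minutes); full training requires a GPU

Based on: MiniMind - a complete tutorial for training ultra-small language models from scratch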

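👥 Who Is This For

🎯 A must-read if you are looking for an internship or job in LLMs!

This project is especially well suited to anyone who wants to enter the LLM field. By studying LLM training principles systematically, you will:

✅ Do better in interviews: understand core mechanisms such as the Transformer, Attention, and RoPE in depth, and handle technical interviews with confidence
✅ Strengthen your portfolio: completing the controlled experiments demonstrates a deep understanding of how LLMs work and makes your resume more competitive
✅ Get up to speed fast: understand the training pipeline of modern LLMs (Llama, GPT) from 0 to 1, instead of just calling library functions
✅ Grow your career: mastering LLM training principles lays a solid foundation for future work on large models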

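🎓 Students and Researchers

🎯 Those seeking LLM internships or jobs: learn LLM training principles systematically and improve your chances in technical interviews
📚 ML/DL students: understand the internals of Transformers and LLMs in depth, beyond theory on paper
🔬 Master's and PhD students: build a solid understanding of LLM training as a foundation for research and paper writing
💡 Researchers: learn the design choices in modern LLM architectures and the reasoning behind them, to inspire research directions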

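💻 Developers

🤖 AI/ML engineers: move from "can use the framework" to "understands the principles", and tackle real problems with more confidence
🌐 Full-stack developers: interested in LLMs and want to learn their training mechanisms systematically to broaden your skill set
⚙️ Algorithm engineers: need to optimize or improve LLM training pipelines, where understanding the principles is what lets you make the right decisions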

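🚀 Learners

📖 PyTorch background: familiar with basic deep learning concepts and want to go deeper into LLMs
🛠️ Hands-on: prefer to understand principles through experiments and code rather than theory alone
🔍 Depth-seeking: not satisfied with "getting the code to run", want to know "why it is designed this way"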

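❌ Not For

Complete beginners (learn PyTorch basics first)
Users who only want to deploy models quickly and don't care about the principles
Users who need production-grade code and best practices (this project focuses on teaching)

💪 If you are preparing for LLM job interviews or want a deep understanding of LLM training, this project is for you! 🚀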

⚡ Quick Start

Experience the core design in 30 minutes

Run three key experiments to understand the core design choices of LLMs:

# 1. Clone the repository
git clone https://github.com/joyehuang/minimind-notes.git
cd minimind-notes

# 2. Create and activate a virtual environment (requires Python 3.9+; 3.10/3.11 recommended)
python3 -m venv venv          # On Windows, use python instead of python3
source venv/bin/activate      # Linux / macOS
# Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Experiment 1: Why do we need normalization?
cd modules/01-foundation/01-normalization/experiments
python exp1_gradient_vanishing.py

# 5. Experiment 2: Why use RoPE position encoding?
cd ../../02-position-encoding/experiments
python exp1_rope_basics.py

# 6. Experiment 3: How does attention work?
cd ../../03-attention/experiments
python exp1_attention_basics.py

💡 Tip: the learning-stage experiments need only a CPU, no GPU. Full model training requires an NVIDIA GPU (RTX 3090 or better recommended).

You will see:

A visualization of vanishing gradients
How RoPE's rotary encoding works
How attention weights are computed
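To make "how attention weights are computed" concrete, here is a minimal single-query sketch of scaled dot-product attention, softmax(q·kᵀ/√d)·V, in plain Python. It is an illustration only, not the repository's `exp1_attention_basics.py`: it assumes one query, no batching, no multi-head split, and no causal mask, and the helper names and toy vectors are hypothetical.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(q, keys):
    # Score = q . k / sqrt(d); the scaling keeps the score scale roughly independent of d
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    return softmax(scores)

def attend(q, keys, values):
    # Output = attention-weighted sum of the value vectors
    w = attention_weights(q, keys)
    return [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # aligned, orthogonal, opposite
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

print(attention_weights(q, keys))  # the aligned key gets the largest weight
print(attend(q, keys, values))
```

The weights always sum to 1, so the output is a convex mix of the value vectors, pulled toward the values whose keys best match the query.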

Next: read ROADMAP.md to choose your learning path

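📚 Learning Path

Choose a path based on your time and goals:

Path	Time	Goal	Link
⚡ Quick taste	30 min	Understand the core design choices	Start
📚 Systematic study	6 hours	Master the basic components	Start
🎓 Deep mastery	30+ hours	Train a model from scratch	Start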

Detailed roadmap: ROADMAP.md

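🧱 Module Guide

Tier 1: Foundation (basic components)

Module	Core questions	Experiments	Status
01-normalization	Why normalize? Pre-LN vs Post-LN?	2	✅ Complete
02-position-encoding	Why choose RoPE? How does length extrapolation work?	4	🟡 Experiments done
03-attention	What is the intuition behind Q, K, V? Why multiple heads?	3	🟡 Experiments done
04-feedforward	What knowledge does the FFN store? Why expand the hidden dimension?	1	🟡 Experiments done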
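The core idea behind RoPE fits in a few lines of plain Python: each pair of dimensions is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset. This is a hedged sketch, not MiniMind's implementation: it pairs adjacent dimensions as in the RoFormer paper (some codebases instead pair dimension i with i + d/2), assumes base 10000, and the function names are hypothetical.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[i], x[i+1]) by angle pos * base**(-i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(0, d, 2):
        # Early pairs rotate fast, later pairs rotate slowly
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

q = [1.0, 2.0, 0.5, -1.0]
k = [0.3, -0.7, 1.2, 0.4]

# Same relative offset (2) at different absolute positions
s1 = dot(rope(q, 3), rope(k, 5))
s2 = dot(rope(q, 10), rope(k, 12))
print(s1, s2)  # equal up to floating-point rounding
```

Because each rotation preserves vector length and rotation angles subtract inside the dot product, the attention score encodes relative position without any learned position table.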

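Tier 2: Architecture (assembling components)

Module	Core questions	Status
01-residual-connection	Why are residual connections needed? How do they stabilize gradients?	🔜 Planned
02-transformer-block	How are the components assembled? Why in this order?	🔜 Planned

Legend:

✅ Complete: teaching docs + experiment code + self-test quiz
🟡 Experiments done: experiment code available, docs still to be written
🔜 Planned: directory structure only

Detailed guide: modules/README.md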

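🔬 Experiment Highlights

1. Controlled experiment design

Each module answers its core questions through experiments:

Example: the normalization module

Configuration	Converges?	Step where NaN appears	Final loss
❌ NoNorm	No	~500	NaN
⚠️ Post-LN	Yes	-	3.5
✅ Pre-LN + RMSNorm	Yes	-	2.7

Conclusion: Pre-LN + RMSNorm is the most stable → the standard choice in modern LLMs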
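The Pre-LN vs Post-LN difference in the table comes down to where the normalization sits relative to the residual add. The plain-Python sketch below, with hypothetical function names, shows RMSNorm (divide by the root mean square, with no mean subtraction and no bias) and both block orderings; a toy scaling function stands in for attention or the FFN. It illustrates the wiring only and does not reproduce the experiment's numbers.

```python
import math

def rms_norm(x, weight=None, eps=1e-6):
    # Divide by the root mean square; no mean subtraction and no bias (unlike LayerNorm)
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if weight is None:
        weight = [1.0] * len(x)  # learnable gain, initialized to 1
    return [w * v / rms for w, v in zip(weight, x)]

def post_ln_block(x, sublayer, norm=rms_norm):
    # Post-LN (original Transformer): add the residual first, then normalize
    return norm([a + b for a, b in zip(x, sublayer(x))])

def pre_ln_block(x, sublayer, norm=rms_norm):
    # Pre-LN (Llama-style): normalize only the sublayer input; the residual path stays untouched
    return [a + b for a, b in zip(x, sublayer(norm(x)))]

def toy_sublayer(v):
    # Hypothetical stand-in for attention or the FFN
    return [0.5 * t for t in v]

x = [3.0, -4.0]
print(rms_norm(x))
print(post_ln_block(x, toy_sublayer))
print(pre_ln_block(x, toy_sublayer))
```

In the Pre-LN version the residual stream `x` is added back unnormalized, so the identity path runs straight through every block; one commonly cited explanation for Post-LN's weaker stability is that it puts a normalization on that path in every layer.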

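2. Progressive learning

Experiment (10 min) → Intuition (20 min) → Theory (30 min) → Code (10 min)

Run the experiments first to build intuition, then read the theory to understand the principles, and finally read the source code to master the implementation.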

3. Runs on a laptop

All experiments use TinyShakespeare (1 MB) or synthetic data:

✅ No GPU needed (CPU or MPS both work)
✅ Each experiment takes < 10 minutes
✅ Total data < 100 MB

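📖 Documentation Structure

Each module contains:

01-normalization/
├── README.md           # Module guide
├── teaching.md         # Teaching doc (Why/What/How)
├── code_guide.md       # Source walkthrough (links to MiniMind)
├── quiz.md             # Self-test questions
└── experiments/        # Controlled experiments
    ├── exp1_*.py
    ├── exp2_*.py
    └── results/        # Expected outputs

Documentation template (teaching.md):

Why: the problem scenario + intuition
What: mathematical definitions + comparison tables
How (verification): experiment design + expected results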

🛠️ Tech Stack

Framework: PyTorch 2.0+
Data: TinyShakespeare, TinyStories
Visualization: Matplotlib, Seaborn
Original project: MiniMind

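🤝 Contributing

Contributions of all kinds are welcome! We especially welcome:

✨ New controlled experiments
📊 Better visualizations
🌍 English translations
🐛 Error corrections
📖 Documentation improvements

Quick start:

Before submitting, please make sure:

[ ] The experiment runs standalone
[ ] The code has thorough comments in Chinese
[ ] Results are reproducible (fixed random seed)
[ ] It follows the existing file structure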
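For the "fixed random seed" item, a helper along these lines is one way to seed every RNG an experiment might touch. The name `set_seed` is hypothetical (not necessarily what `modules/common/` provides), and NumPy and PyTorch are seeded only if they are installed.

```python
import random

def set_seed(seed: int = 42) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    except ImportError:
        pass  # PyTorch not installed; nothing to seed

set_seed(42)
```

Call it once at the top of each experiment script, before building the model or data. Bitwise-identical GPU runs may additionally require `torch.use_deterministic_algorithms(True)`.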

📂 Repository Structure

minimind-notes/
├── modules/                    # Modular course (new architecture)
│   ├── common/                # Shared utilities
│   ├── 01-foundation/         # Basic components
│   └── 02-architecture/       # Architecture assembly
│
├── docs/                       # Personal study records
│   ├── learning_log.md        # Study log
│   ├── knowledge_base.md      # Knowledge base
│   └── notes.md               # Index
│
├── model/                      # Original MiniMind code
├── trainer/                    # Training scripts
├── dataset/                    # Datasets
│
├── README.md                   # This file
├── ROADMAP.md                  # Learning roadmap
└── CLAUDE.md                   # Guide for AI assistants

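📜 Acknowledgements

This repository is based on the following project:

MiniMind - core code and training pipeline
All modules link to MiniMind's actual implementation

Special thanks to @jingyaogong for open-sourcing MiniMind!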

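🔗 Related Resources

Website

minimind.wiki - full documentation and interactive content online

Papers

Attention Is All You Need - the original Transformer paper
RoFormer: RoPE - rotary position embedding
RMSNorm - root mean square layer normalization

Blogs

Videos

Andrej Karpathy - Let's build GPT

📞 Contact

Issue reports: GitHub Issues
Original project: MiniMind

📄 License

MIT License - see LICENSE

⭐ If this project helps you, please give it a Star!

🌐 Visit the website: https://minimind.wiki

Ready? Start your learning journey 🚀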

For Tasks:

understand transformer improve training process prepare for job interview deep dive into llm expand technical skills

For Jobs:

ai/ml engineer full stack developer algorithm engineer student researcher developer

Alternative AI tools for minimind-notes

Similar Open Source Tools

minimind-notes

GitHub stars: 54

PromptHub

PromptHub is a versatile tool for generating prompts and ideas to spark creativity and overcome writer's block. It provides a wide range of customizable prompts and exercises to inspire writers, artists, educators, and anyone looking to enhance their creative thinking. With PromptHub, users can access a diverse collection of prompts across various categories such as writing, drawing, brainstorming, and more. The tool offers a user-friendly interface and allows users to save and share their favorite prompts for future reference. Whether you're a professional writer seeking inspiration or a student looking to boost your creativity, PromptHub is the perfect companion to ignite your imagination and enhance your creative process.

GitHub stars: 545

Snap-Solver

Snap-Solver is a revolutionary AI tool for online exam solving, designed for students, test-takers, and self-learners. With just a keystroke, it automatically captures any question on the screen, analyzes it using AI, and provides detailed answers. Whether it's complex math formulas, physics problems, coding issues, or challenges from other disciplines, Snap-Solver offers clear, accurate, and structured solutions to help you better understand and master the subject matter.

GitHub stars: 74

vocotype-cli

VocoType is a free desktop voice input method designed for professionals who value privacy and efficiency. All recognition is done locally, ensuring offline operation and no data upload. The CLI open-source version of the VocoType core engine on GitHub is mainly targeted at developers.

GitHub stars: 177

ai_quant_trade

The ai_quant_trade repository is a comprehensive platform for stock AI trading, offering learning, simulation, and live trading capabilities. It includes features such as factor mining, traditional strategies, machine learning, deep learning, reinforcement learning, graph networks, and high-frequency trading. The repository provides tools for monitoring stocks, stock recommendations, and deployment tools for live trading. It also features new functionalities like sentiment analysis using StructBERT, reinforcement learning for multi-stock trading with a 53% annual return, automatic factor mining with 5000 factors, customized stock monitoring software, and local deep reinforcement learning strategies.

GitHub stars: 2.6k

cockpit-tools

Cockpit Tools is a versatile AI IDE account management tool that supports Antigravity, Codex, GitHub Copilot, Windsurf, and Kiro. It allows efficient management of multiple AI IDE accounts with features like one-click switching, quota monitoring, automatic wake-up, and parallel running of multiple instances. The tool supports 16 languages and provides functionalities such as dashboard overview, account management for each supported platform, multiple instance management, quota monitoring, wake-up tasks, device fingerprinting, and plugin integration.

GitHub stars: 649

Unity-Skills

UnitySkills is an AI-driven Unity editor automation engine based on REST API. It allows AI to directly control Unity scenes through Skills. The tool offers extreme efficiency with Result Truncation and SKILL.md slimming, a versatile tool library with 282 Skills supporting Batch operations, ensuring transactional safety with automatic rollback, multiple instance support for controlling multiple Unity projects simultaneously, deep integration with Antigravity Slash Commands for interactive experience, compatibility with popular AI terminals like Claude Code, Antigravity, Gemini CLI, and support for Cinemachine 2.x/3.x dual versions with advanced camera control features like MixingCamera, ClearShot, TargetGroup, and Spline.

GitHub stars: 141

LunaBox

LunaBox is a lightweight, fast, and feature-rich tool for managing and tracking visual novels, with the ability to customize game categories, automatically track playtime, generate personalized reports through AI analysis, import data from other platforms, backup data locally or on cloud services, and ensure privacy and security by storing sensitive data locally. The tool supports multi-dimensional statistics, offers a variety of customization options, and provides a user-friendly interface for easy navigation and usage.

GitHub stars: 178

llm-action

This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.

GitHub stars: 12.9k

All-Model-Chat

All Model Chat is a feature-rich, highly customizable web chat application designed specifically for the Google Gemini API family. It integrates dynamic model selection, multimodal file input, streaming responses, comprehensive chat history management, and extensive customization options to provide an unparalleled AI interactive experience.

GitHub stars: 744

vscode-antigravity-cockpit

VS Code extension for monitoring Google Antigravity AI model quotas. It provides a webview dashboard, QuickPick mode, quota grouping, automatic grouping, renaming, card view, drag-and-drop sorting, status bar monitoring, threshold notifications, and privacy mode. Users can monitor quota status, remaining percentage, countdown, reset time, progress bar, and model capabilities. The extension supports local and authorized quota monitoring, multiple account authorization, and model wake-up scheduling. It also offers settings customization, user profile display, notifications, and group functionalities. Users can install the extension from the Open VSX Marketplace or via VSIX file. The source code can be built using Node.js and npm. The project is open-source under the MIT license.

GitHub stars: 2.7k

chatless

Chatless is a modern AI chat desktop application built on Tauri and Next.js. It supports multiple AI providers, can connect to local Ollama models, supports document parsing and knowledge base functions. All data is stored locally to protect user privacy. The application is lightweight, simple, starts quickly, and consumes minimal resources.

GitHub stars: 212

torch-rechub

Torch-RecHub is a lightweight, efficient, and user-friendly PyTorch recommendation system framework. It provides easy-to-use solutions for industrial-level recommendation systems, with features such as generative recommendation models, modular design for adding new models and datasets, PyTorch-based implementation for GPU acceleration, a rich library of 30+ classic and cutting-edge recommendation algorithms, standardized data loading, training, and evaluation processes, easy configuration through files or command-line parameters, reproducibility of experimental results, ONNX model export for production deployment, cross-engine data processing with PySpark support, and experiment visualization and tracking with integrated tools like WandB, SwanLab, and TensorBoardX.

GitHub stars: 769

py-xiaozhi

py-xiaozhi is a Python-based XiaoZhi voice client designed for learning through code and experiencing AI XiaoZhi's voice functions without hardware conditions. The repository is based on the xiaozhi-esp32 port. It supports AI voice interaction, visual multimodal capabilities, IoT device integration, online music playback, voice wake-up, automatic conversation mode, graphical user interface, command-line mode, cross-platform support, volume control, session management, encrypted audio transmission, automatic captcha handling, automatic MAC address retrieval, code modularization, and stability optimization.

GitHub stars: 2.5k

Lim-Code

LimCode is a powerful VS Code AI programming assistant that supports multiple AI models, intelligent tool invocation, and modular architecture. It features support for various AI channels, a smart tool system for code manipulation, MCP protocol support for external tool extension, intelligent context management, session management, and more. Users can install LimCode from the plugin store or via VSIX, or build it from the source code. The tool offers a rich set of features for AI programming and code manipulation within the VS Code environment.

GitHub stars: 97

Saber-Translator

Saber-Translator is your exclusive AI comic translation tool, designed to effortlessly eliminate language barriers and enjoy the original comic fun. It offers features like translating comic images/PDFs, intelligent bubble detection and text recognition, powerful AI translation engine with multiple service providers, highly customizable translation effects, real-time preview and convenient operations, efficient image management and download, model recording and recommendation, and support for language learning with dual prompt word outputs.

GitHub stars: 2.7k

For similar tasks

minimind-notes

GitHub stars: 54

For similar jobs

second-brain-ai-assistant-course

This open-source course teaches how to build an advanced RAG and LLM system using LLMOps and ML systems best practices. It helps you create an AI assistant that leverages your personal knowledge base to answer questions, summarize documents, and provide insights. The course covers topics such as LLM system architecture, pipeline orchestration, large-scale web crawling, model fine-tuning, and advanced RAG features. It is suitable for ML/AI engineers and data/software engineers & data scientists looking to level up to production AI systems. The course is free, with minimal costs for tools like OpenAI's API and Hugging Face's Dedicated Endpoints. Participants will build two separate Python applications for offline ML pipelines and online inference pipeline.

GitHub stars: 539

knavigator

Knavigator is a project designed to analyze, optimize, and compare scheduling systems, with a focus on AI/ML workloads. It addresses various needs, including testing, troubleshooting, benchmarking, chaos engineering, performance analysis, and optimization. Knavigator interfaces with Kubernetes clusters to manage tasks such as manipulating with Kubernetes objects, evaluating PromQL queries, as well as executing specific operations. It can operate both outside and inside a Kubernetes cluster, leveraging the Kubernetes API for task management. To facilitate large-scale experiments without the overhead of running actual user workloads, Knavigator utilizes KWOK for creating virtual nodes in extensive clusters.

GitHub stars: 64

redb-open

reDB Node is a distributed, policy-driven data mesh platform that enables True Data Portability across various databases, warehouses, clouds, and environments. It unifies data access, data mobility, and schema transformation into one open platform. Built for developers, architects, and AI systems, reDB addresses the challenges of fragmented data ecosystems in modern enterprises by providing multi-database interoperability, automated schema versioning, zero-downtime migration, real-time developer data environments with obfuscation, quantum-resistant encryption, and policy-based access control. The project aims to build a foundation for future-proof data infrastructure.

GitHub stars: 55

minimind-notes

GitHub stars: 54

Interview-for-Algorithm-Engineer

This repository provides a collection of interview questions and answers for algorithm engineers. The questions are organized by topic, and each question includes a detailed explanation of the answer. This repository is a valuable resource for anyone preparing for an algorithm engineering interview.

GitHub stars: 1.4k

LLM-as-HH

LLM-as-HH is a codebase that accompanies the paper ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution. It introduces Language Hyper-Heuristics (LHHs) that leverage LLMs for heuristic generation with minimal manual intervention and open-ended heuristic spaces. Reflective Evolution (ReEvo) is presented as a searching framework that emulates the reflective design approach of human experts while surpassing human capabilities with scalable LLM inference, Internet-scale domain knowledge, and powerful evolutionary search. The tool can improve various algorithms on problems like Traveling Salesman Problem, Capacitated Vehicle Routing Problem, Orienteering Problem, Multiple Knapsack Problems, Bin Packing Problem, and Decap Placement Problem in both black-box and white-box settings.

GitHub stars: 78

universal

The Universal Numbers Library is a header-only C++ template library designed for universal number arithmetic, offering alternatives to native integer and floating-point for mixed-precision algorithm development and optimization. It tailors arithmetic types to the application's precision and dynamic range, enabling improved application performance and energy efficiency. The library provides fast implementations of special IEEE-754 formats like quarter precision, half-precision, and quad precision, as well as vendor-specific extensions. It supports static and elastic integers, decimals, fixed-points, rationals, linear floats, tapered floats, logarithmic, interval, and adaptive-precision integers, rationals, and floats. The library is suitable for AI, DSP, HPC, and HFT algorithms.

GitHub stars: 467

UmaAi

UmaAi is a tool designed for algorithm learning purposes, specifically focused on analyzing scenario mechanics in a game. It provides functionalities such as simulating scenarios, searching, handwritten-logic, and OCR integration. The tool allows users to modify settings in config.h for evaluating cardset strength, simulating games, and understanding game mechanisms through the source code. It emphasizes that it should not be used for illegal purposes and is intended for educational use only.

GitHub stars: 154
