llm-action
本项目旨在分享大模型相关技术原理以及实战经验。
Stars: 9546
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.
README:
🔥🔥🔥 最近正在做一个在 NVIDIA GPU、Ascend NPU 上训练大模型的简易工具unify-easy-llm,欢迎大家关注。
- 🐌 LLM训练
- 🐫 LLM训练实战
- 🐼 LLM参数高效微调技术原理
- 🐰 LLM参数高效微调技术实战
- 🐘 LLM分布式训练并行技术
- 🌋 分布式AI框架
- 📡 分布式训练网络通信
- 🌿 LLM训练优化技术
- ⌛ LLM对齐技术
- 🐎 LLM推理
- ♻️ LLM压缩
- 🌿 LLM测评
- 🌴 LLM数据工程
- 🌀 提示工程
- ♍️ LLM算法架构
- 🧩 LLM应用开发
- 🀄️ LLM国产化适配
- 🔯 AI编译器
- 🔘 AI基础设施
- 💟 LLMOps
- 🍄 LLM生态相关技术
- 💫 LLM面试题
- 🔨 服务器基础环境安装及常用工具
- 💬 LLM学习交流群
- 👥 微信公众号
- ⭐️ Star History
- 🔗 AI工程化课程推荐
下面汇总了我在大模型实践中训练相关的所有教程。从6B到65B,从全量微调到高效微调(LoRA,QLoRA,P-Tuning v2),再到RLHF(基于人工反馈的强化学习)。
LLM | 预训练/SFT/RLHF... | 参数 | 教程 | 代码 |
---|---|---|---|---|
Alpaca | full fine-turning | 7B | 从0到1复现斯坦福羊驼(Stanford Alpaca 7B) | 配套代码 |
Alpaca(LLaMA) | LoRA | 7B~65B | 1.足够惊艳,使用Alpaca-Lora基于LLaMA(7B)二十分钟完成微调,效果比肩斯坦福羊驼 2. 使用 LoRA 技术对 LLaMA 65B 大模型进行微调及推理 |
配套代码 |
BELLE(LLaMA/Bloom) | full fine-turning | 7B | 1.基于LLaMA-7B/Bloomz-7B1-mt复现开源中文对话大模型BELLE及GPTQ量化 2. BELLE(LLaMA-7B/Bloomz-7B1-mt)大模型使用GPTQ量化后推理性能测试 |
N/A |
ChatGLM | LoRA | 6B | 从0到1基于ChatGLM-6B使用LoRA进行参数高效微调 | 配套代码 |
ChatGLM | full fine-turning/P-Tuning v2 | 6B | 使用DeepSpeed/P-Tuning v2对ChatGLM-6B进行微调 | 配套代码 |
Vicuna(LLaMA) | full fine-turning | 7B | 大模型也内卷,Vicuna训练及推理指南,效果碾压斯坦福羊驼 | N/A |
OPT | RLHF | 0.1B~66B | 1.一键式 RLHF 训练 DeepSpeed Chat(一):理论篇 2. 一键式 RLHF 训练 DeepSpeed Chat(二):实践篇 |
配套代码 |
MiniGPT-4(LLaMA) | full fine-turning | 7B | 大杀器,多模态大模型MiniGPT-4入坑指南 | N/A |
Chinese-LLaMA-Alpaca(LLaMA) | LoRA(预训练+微调) | 7B | 中文LLaMA&Alpaca大语言模型词表扩充+预训练+指令精调 | 配套代码 |
LLaMA | QLoRA | 7B/65B | 高效微调技术QLoRA实战,基于LLaMA-65B微调仅需48G显存,真香 | 配套代码 |
LLaMA | GaLore | 60M/7B | 突破内存瓶颈,使用 GaLore 一张4090消费级显卡也能预训练LLaMA-7B | 配套代码 |
对于普通大众来说,进行大模型的预训练或者全量微调遥不可及。由此,催生了各种参数高效微调技术,让科研人员或者普通开发者有机会尝试微调大模型。
因此,该技术值得我们进行深入分析其背后的机理,本系列大体分七篇文章进行讲解。
- 大模型参数高效微调技术原理综述(一)-背景、参数高效微调简介
- 大模型参数高效微调技术原理综述(二)-BitFit、Prefix Tuning、Prompt Tuning
- 大模型参数高效微调技术原理综述(三)-P-Tuning、P-Tuning v2
- 大模型参数高效微调技术原理综述(四)-Adapter Tuning及其变体
- 大模型参数高效微调技术原理综述(五)-LoRA、AdaLoRA、QLoRA
- 大模型参数高效微调技术原理综述(六)-MAM Adapter、UniPELT
- 大模型参数高效微调技术原理综述(七)-最佳实践、总结
下面给大家分享大模型参数高效微调技术实战,该系列主要针对 HuggingFace PEFT 框架支持的一些高效微调技术进行讲解。
教程 | 代码 | 框架 |
---|---|---|
大模型参数高效微调技术实战(一)-PEFT概述及环境搭建 | N/A | HuggingFace PEFT |
大模型参数高效微调技术实战(二)-Prompt Tuning | 配套代码 | HuggingFace PEFT |
大模型参数高效微调技术实战(三)-P-Tuning | 配套代码 | HuggingFace PEFT |
大模型参数高效微调技术实战(四)-Prefix Tuning / P-Tuning v2 | 配套代码 | HuggingFace PEFT |
大模型参数高效微调技术实战(五)-LoRA | 配套代码 | HuggingFace PEFT |
大模型参数高效微调技术实战(六)-IA3 | 配套代码 | HuggingFace PEFT |
大模型微调实战(七)-基于LoRA微调多模态大模型 | 配套代码 | HuggingFace PEFT |
大模型微调实战(八)-使用INT8/FP4/NF4微调大模型 | 配套代码 | PEFT、bitsandbytes |
近年来,随着Transformer、MOE架构的提出,使得深度学习模型轻松突破上万亿规模参数,传统的单机单卡模式已经无法满足超大模型进行训练的要求。因此,我们需要基于单机多卡、甚至是多机多卡进行分布式大模型的训练。
而利用AI集群,使深度学习算法更好地从大量数据中高效地训练出性能优良的大模型是分布式机器学习的首要目标。为了实现该目标,一般需要根据硬件资源与数据/模型规模的匹配情况,考虑对计算任务、训练数据和模型进行划分,从而进行分布式训练。因此,分布式训练相关技术值得我们进行深入分析其背后的机理。
下面主要对大模型进行分布式训练的并行技术进行讲解,本系列大体分九篇文章进行讲解。
- 大模型分布式训练并行技术(一)-概述
- 大模型分布式训练并行技术(二)-数据并行
- 大模型分布式训练并行技术(三)-流水线并行
- 大模型分布式训练并行技术(四)-张量并行
- 大模型分布式训练并行技术(五)-序列并行
- 大模型分布式训练并行技术(六)-多维混合并行
- 大模型分布式训练并行技术(七)-自动并行
- 大模型分布式训练并行技术(八)-MOE并行
- 大模型分布式训练并行技术(九)-总结
-
PyTorch
- PyTorch 单机多卡训练
- PyTorch 多机多卡训练
-
Megatron-LM
- Megatron-LM 单机多卡训练
- Megatron-LM 多机多卡训练
- 基于Megatron-LM从0到1完成GPT2模型预训练、模型评估及推理
-
DeepSpeed
- DeepSpeed 单机多卡训练
- DeepSpeed 多机多卡训练
-
Megatron-DeepSpeed
- 基于 Megatron-DeepSpeed 从 0 到1 完成 LLaMA 预训练
- 基于 Megatron-DeepSpeed 从 0 到1 完成 Bloom 预训练
待更新...
- FlashAttention V1、V2
- 混合精度训练
- 重计算
- MQA / GQA
- 梯度累积
- PPO(近端策略优化)
- DPO
- ORPO
- 大模型推理框架概述
- 大模型的好伙伴,浅析推理加速引擎FasterTransformer
- 模型推理服务化框架Triton保姆式教程(一):快速入门
- 模型推理服务化框架Triton保姆式教程(二):架构解析
- 模型推理服务化框架Triton保姆式教程(三):开发实践
- TensorRT-LLM保姆级教程(一)-快速入门
- TensorRT-LLM保姆级教程(二)-离线环境搭建、模型量化及推理
- TensorRT-LLM保姆级教程(三)-使用Triton推理服务框架部署模型
- TensorRT-LLM保姆级教程(四)-新模型适配
- LLM推理优化技术概述
- 大模型推理优化技术-KV Cache
- 大模型推理服务调度优化技术-Continuous batching
- 大模型底显存推理优化-Offload技术
- 大模型推理优化技术-KV Cache量化
- 大模型推理优化技术-KV Cache优化方法综述
- 大模型吞吐优化技术-多LoRA推理服务
- 大模型推理服务调度优化技术-公平性调度
- 大模型访存优化技术-FlashAttention
- 大模型显存优化技术-PagedAttention
- 大模型解码优化-Speculative Decoding及其变体
- Flash Decoding
- FlashDecoding++
近年来,随着Transformer、MOE架构的提出,使得深度学习模型轻松突破上万亿规模参数,从而导致模型变得越来越大,因此,我们需要一些大模型压缩技术来降低模型部署的成本,并提升模型的推理性能。 模型压缩主要分为如下几类:
- 模型剪枝(Pruning)
- 知识蒸馏(Knowledge Distillation)
- 模型量化(Quantization)
- 低秩分解(Low-Rank Factorization)
本系列将针对一些常见大模型量化方案(GPTQ、LLM.int8()、SmoothQuant、AWQ等)进行讲述。
- 大模型量化概述
- 量化感知训练:
- 训练后量化:
- 大模型量化技术原理:总结
目前,大多数针对大模型模型的压缩技术都专注于模型量化领域,即降低单个权重的数值表示的精度。另一种模型压缩方法模型剪枝的研究相对较少,即删除网络元素,包括从单个权重(非结构化剪枝)到更高粒度的组件,如权重矩阵的整行/列(结构化剪枝)。
本系列将针对一些常见大模型剪枝方案(LLM-Pruner、SliceGPT、SparseGPT、Wanda等)进行讲述。
结构化剪枝:
- LLM-Pruner(LLM-Pruner: On the Structural Pruning of Large Language Models)
- LLM-Shearing(Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning)
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
- LoSparse
非结构化剪枝:
- SparseGPT(SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot)
- LoRAPrune(LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning)
- Wanda(A Simple and Effective Pruning Approach for Large Language Models)
- Flash-LLM(Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity)
Standard KD:
使学生模型学习教师模型(LLM)所拥有的常见知识,如输出分布和特征信息,这种方法类似于传统的KD。
- MINILLM
- GKD
EA-based KD:
不仅仅是将LLM的常见知识转移到学生模型中,还涵盖了蒸馏它们独特的涌现能力。具体来说,EA-based KD又分为了上下文学习(ICL)、思维链(CoT)和指令跟随(IF)。
In-Context Learning:
- In-Context Learning distillation
Chain-of-Thought:
- MT-COT
- Fine-tune-CoT
- DISCO
- SCOTT
- SOCRATIC CoT
Instruction Following:
- Lion
低秩分解旨在通过将给定的权重矩阵分解成两个或多个较小维度的矩阵,从而对其进行近似。低秩分解背后的核心思想是找到一个大的权重矩阵W的分解,得到两个矩阵U和V,使得W≈U V,其中U是一个m×k矩阵,V是一个k×n矩阵,其中k远小于m和n。U和V的乘积近似于原始的权重矩阵,从而大幅减少了参数数量和计算开销。
在LLM研究的模型压缩领域,研究人员通常将多种技术与低秩分解相结合,包括修剪、量化等。
- ZeroQuant-FP(低秩分解+量化)
- LoRAPrune(低秩分解+剪枝)
- C-Eval:全面的中文基础模型评估套件,涵盖了52个不同学科的13948个多项选择题,分为四个难度级别。
- CMMLU:一个综合性的中文评估基准,专门用于评估语言模型在中文语境下的知识和推理能力。CMMLU涵盖了从基础学科到高级专业水平的67个主题。它包括:需要计算和推理的自然科学,需要知识的人文科学和社会科学,以及需要生活常识的中国驾驶规则等。此外,CMMLU中的许多任务具有中国特定的答案,可能在其他地区或语言中并不普遍适用。因此是一个完全中国化的中文测试基准。
- IFEval: Instruction Following Eval/Paper:专注评估大模型遵循指令的能力,包含关键词检测、标点控制、输出格式要求等25种任务。
- SuperCLUE:一个综合性大模型评测基准,本次评测主要聚焦于大模型的四个能力象限,包括语言理解与生成、专业技能与知识、Agent智能体和安全性,进而细化为12项基础能力。
- AGIEval:用于评估基础模型在与人类认知和解决问题相关的任务中的能力。该基准源自 20 项面向普通考生的官方、公开、高标准的入学和资格考试,例如:普通大学入学考试(例如:中国高考(Gaokao)和美国 SAT)、法学院入学考试、数学竞赛、律师资格考试、国家公务员考试。
- OpenCompass:司南 2.0 大模型评测体系。支持的数据集如下:
语言 | 知识 | 推理 | 考试 |
字词释义
成语习语
语义相似度
指代消解
翻译
多语种问答
多语种总结
|
知识问答
|
文本蕴含
常识推理
数学推理
定理应用
综合推理
|
初中/高中/大学/职业考试
医学考试
|
理解 | 长文本 | 安全 | 代码 |
阅读理解
内容总结
内容分析
|
长文本理解
|
安全
健壮性
|
代码
|
LLM Data Engineering
- 数据收集
- 数据处理
- 去重
- 过滤
- 选择
- 组合
- LLM微调高效数据筛选技术原理-DEITA
- LLM微调高效数据筛选技术原理-MoDS
- LLM微调高效数据筛选技术原理-IFD
- LLM微调高效数据筛选技术原理-CaR
- LESS:仅选择5%有影响力的数据优于全量数据集进行目标指令微调
- LESS 实践:用少量的数据进行目标指令微调
- Zero-Shot Prompting
- Few-Shot Prompting
- Chain-of-Thought (CoT) Prompting
- Automatic Chain-of-Thought (Auto-CoT) Prompting
- Tree-of-Thoughts (ToT) Prompting
- ChatGLM / ChatGLM2 / ChatGLM3 大模型解析
- Bloom 大模型解析
- LLaMA / LLaMA2 大模型解析
- 百川智能开源大模型baichuan-7B技术剖析
- 百川智能开源大模型baichuan-13B技术剖析
- LLaMA3 技术剖析
- QWen 大模型剖析
大模型是基座,要想让其变成一款产品,我们还需要一些其他相关的技术,比如:向量数据库(Pinecone、Milvus、Vespa、Weaviate),LangChain等。
随着 ChatGPT 的现象级走红,引领了AI大模型时代的变革,从而导致 AI 算力日益紧缺。与此同时,中美贸易战以及美国对华进行AI芯片相关的制裁导致 AI 算力的国产化适配势在必行。本系列将对一些国产化 AI 加速卡进行讲解。
- 大模型国产化适配1-华为昇腾AI全栈软硬件平台总结
- 大模型国产化适配2-基于昇腾910使用ChatGLM-6B进行模型推理
-
大模型国产化适配3-基于昇腾910使用ChatGLM-6B进行模型训练
- MindRecord数据格式说明、全量微调、LoRA微调
- 大模型国产化适配4-基于昇腾910使用LLaMA-13B进行多机多卡训练
- 大模型国产化适配5-百度飞浆PaddleNLP大语言模型工具链总结
- 大模型国产化适配6-基于昇腾910B快速验证ChatGLM3-6B/BaiChuan2-7B模型推理
- 大模型国产化适配7-华为昇腾LLM落地可选解决方案(MindFormers、ModelLink、MindIE)
- MindIE 1.0.RC1 发布,华为昇腾终于推出了针对LLM的完整部署方案,结束小米加步枪时代
-
大模型国产化适配8-基于昇腾MindIE推理工具部署Qwen-72B实战(推理引擎、推理服务化)
- Qwen-72B、Baichuan2-7B、ChatGLM3-6B
- 大模型国产化适配9-LLM推理框架MindIE-Service性能基准测试
- 大模型国产化适配10-快速迁移大模型到昇腾910B保姆级教程(Pytorch版)
- 大模型国产化适配11-LLM训练性能基准测试(昇腾910B3)
- 国产知名AI芯片厂商产品大揭秘-昇腾、海光、天数智芯...
- 国内AI芯片厂商的计算平台大揭秘-昇腾、海光、天数智芯...
- 【LLM国产化】量化技术在MindIE推理框架中的应用
AI编译器是指将机器学习算法从开发阶段,通过变换和优化算法,使其变成部署状态。
- AI编译器技术剖析(一)-概述
- AI编译器技术剖析(二)-传统编译器
- AI编译器技术剖析(三)-树模型编译工具 Treelite 详解
- AI编译器技术剖析(四)-编译器前端
- AI编译器技术剖析(五)-编译器后端
- AI编译器技术剖析(六)-主流编译框架
- AI编译器技术剖析(七)-深度学习模型编译优化
- lleaves:使用 LLVM 编译梯度提升决策树将预测速度提升10+倍
框架:
- MLIR
- XLA
- TVM
- AI芯片技术原理剖析(一):国内外AI芯片概述
- AI芯片技术原理剖析(二):英伟达GPU
- AI芯片技术原理剖析(三):谷歌TPU
待更新...
待更新...
- 分布式训练网络通讯原语
- AI 集群通信软硬件
- 大模型词表扩充必备工具SentencePiece
- 大模型实践总结
- ChatGLM 和 ChatGPT 的技术区别在哪里?
- 现在为什么那么多人以清华大学的ChatGLM-6B为基座进行试验?
- 为什么很多新发布的大模型默认使用BF16而不是FP16?
- 大模型训练时ZeRO-2、ZeRO-3能否和Pipeline并行相结合?
- 一文详解模型权重存储新格式 Safetensors
- 一文搞懂大模型文件存储格式新宠GGUF
正在收集中...
基础环境安装:
常用工具:
- Linux 常见命令大全
- Conda 常用命令大全
- Poetry 常用命令大全
- Docker 常用命令大全
- Docker Dockerfile 指令大全
- Kubernetes 常用命令大全
- 集群环境 GPU 管理和监控工具 DCGM 常用命令大全
我创建了大模型相关的学习交流群,供大家一起学习交流大模型相关的最新技术,目前已有5个群,每个群都有上百人的规模,可加我微信进群(加微信请备注来意,如:进大模型学习交流群+GitHub,进大模型推理加速交流群+GitHub、进大模型应用开发交流群+GitHub、进大模型校招交流群+GitHub等)。一定要备注哟,否则不予通过。
PS:成都有个本地大模型交流群,想进可以另外单独备注下。
微信公众号:吃果冻不吐果冻皮,该公众号主要分享AI工程化(大模型、MLOps等)相关实践经验,免费电子书籍、论文等。
如今人工智能的发展可谓是如火如荼,ChatGPT、Sora、文心一言等AI大模型如雨后春笋般纷纷涌现。AI大模型优势在于它能处理复杂性问题;因此,越来越多的企业需要具备AI算法设计、AI应用开发、模型推理加速及模型压缩等AI工程化落地的能力。这就导致行业内的工程师,需要快速提升自身的技术栈,以便于在行业内站稳脚跟。我在llm-resource 和 ai-system梳理了一些大模型和AI工程化相关资料,同时,推荐一些AI工程化相关的课程,点击查看详情。
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for llm-action
Similar Open Source Tools
llm-action
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.
ChatGPT-Next-Web-Pro
ChatGPT-Next-Web-Pro is a tool that provides an enhanced version of ChatGPT-Next-Web with additional features and functionalities. It offers complete ChatGPT-Next-Web functionality, file uploading and storage capabilities, drawing and video support, multi-modal support, reverse model support, knowledge base integration, translation, customizations, and more. The tool can be deployed with or without a backend, allowing users to interact with AI models, manage accounts, create models, manage API keys, handle orders, manage memberships, and more. It supports various cloud services like Aliyun OSS, Tencent COS, and Minio for file storage, and integrates with external APIs like Azure, Google Gemini Pro, and Luma. The tool also provides options for customizing website titles, subtitles, icons, and plugin buttons, and offers features like voice input, file uploading, real-time token count display, and more.
Awesome-LLM-RAG-Application
Awesome-LLM-RAG-Application is a repository that provides resources and information about applications based on Large Language Models (LLM) with Retrieval-Augmented Generation (RAG) pattern. It includes a survey paper, GitHub repo, and guides on advanced RAG techniques. The repository covers various aspects of RAG, including academic papers, evaluation benchmarks, downstream tasks, tools, and technologies. It also explores different frameworks, preprocessing tools, routing mechanisms, evaluation frameworks, embeddings, security guardrails, prompting tools, SQL enhancements, LLM deployment, observability tools, and more. The repository aims to offer comprehensive knowledge on RAG for readers interested in exploring and implementing LLM-based systems and products.
ai_quant_trade
The ai_quant_trade repository is a comprehensive platform for stock AI trading, offering learning, simulation, and live trading capabilities. It includes features such as factor mining, traditional strategies, machine learning, deep learning, reinforcement learning, graph networks, and high-frequency trading. The repository provides tools for monitoring stocks, stock recommendations, and deployment tools for live trading. It also features new functionalities like sentiment analysis using StructBERT, reinforcement learning for multi-stock trading with a 53% annual return, automatic factor mining with 5000 factors, customized stock monitoring software, and local deep reinforcement learning strategies.
RAG-Retrieval
RAG-Retrieval provides full-chain RAG retrieval fine-tuning and inference code. It supports fine-tuning any open-source RAG retrieval models, including vector (embedding, graph a), delayed interactive models (ColBERT, graph d), interactive models (cross encoder, graph c). For inference, RAG-Retrieval focuses on ranking (reranker) and has developed a lightweight Python library rag-retrieval, providing a unified way to call any different RAG ranking models.
how-to-optim-algorithm-in-cuda
This repository documents how to optimize common algorithms based on CUDA. It includes subdirectories with code implementations for specific optimizations. The optimizations cover topics such as compiling PyTorch from source, NVIDIA's reduce optimization, OneFlow's elementwise template, fast atomic add for half data types, upsample nearest2d optimization in OneFlow, optimized indexing in PyTorch, OneFlow's softmax kernel, linear attention optimization, and more. The repository also includes learning resources related to deep learning frameworks, compilers, and optimization techniques.
llm-resource
llm-resource is a comprehensive collection of high-quality resources for Large Language Models (LLM). It covers various aspects of LLM including algorithms, training, fine-tuning, alignment, inference, data engineering, compression, evaluation, prompt engineering, AI frameworks, AI basics, AI infrastructure, AI compilers, LLM application development, LLM operations, AI systems, and practical implementations. The repository aims to gather and share valuable resources related to LLM for the community to benefit from.
HaE
HaE is a framework project in the field of network security (data security) that combines artificial intelligence (AI) large models to achieve highlighting and information extraction of HTTP messages (including WebSocket). It aims to reduce testing time, focus on valuable and meaningful messages, and improve vulnerability discovery efficiency. The project provides a clear and visual interface design, simple interface interaction, and centralized data panel for querying and extracting information. It also features built-in color upgrade algorithm, one-click export/import of data, and integration of AI large models API for optimized data processing.
99AI
99AI is a commercializable AI web application based on NineAI 2.4.2 (no authorization, no backdoors, no piracy, integrated front-end and back-end integration packages, supports Docker rapid deployment). The uncompiled source code is temporarily closed. Compared with the stable version, the development version is faster.
VLMEvalKit
VLMEvalKit is an open-source evaluation toolkit of large vision-language models (LVLMs). It enables one-command evaluation of LVLMs on various benchmarks, without the heavy workload of data preparation under multiple repositories. In VLMEvalKit, we adopt generation-based evaluation for all LVLMs, and provide the evaluation results obtained with both exact matching and LLM-based answer extraction.
DeepLearing-Interview-Awesome-2024
DeepLearning-Interview-Awesome-2024 is a repository that covers various topics related to deep learning, computer vision, big models (LLMs), autonomous driving, smart healthcare, and more. It provides a collection of interview questions with detailed explanations sourced from recent academic papers and industry developments. The repository is aimed at assisting individuals in academic research, work innovation, and job interviews. It includes six major modules covering topics such as large language models (LLMs), computer vision models, common problems in computer vision and perception algorithms, deep learning basics and frameworks, as well as specific tasks like 3D object detection, medical image segmentation, and more.
omnia
Omnia is a deployment tool designed to turn servers with RPM-based Linux images into functioning Slurm/Kubernetes clusters. It provides an Ansible playbook-based deployment for Slurm and Kubernetes on servers running an RPM-based Linux OS. The tool simplifies the process of setting up and managing clusters, making it easier for users to deploy and maintain their infrastructure.
activepieces
Activepieces is an open source replacement for Zapier, designed to be extensible through a type-safe pieces framework written in Typescript. It features a user-friendly Workflow Builder with support for Branches, Loops, and Drag and Drop. Activepieces integrates with Google Sheets, OpenAI, Discord, and RSS, along with 80+ other integrations. The list of supported integrations continues to grow rapidly, thanks to valuable contributions from the community. Activepieces is an open ecosystem; all piece source code is available in the repository, and they are versioned and published directly to npmjs.com upon contributions. If you cannot find a specific piece on the pieces roadmap, please submit a request by visiting the following link: Request Piece Alternatively, if you are a developer, you can quickly build your own piece using our TypeScript framework. For guidance, please refer to the following guide: Contributor's Guide
Awesome-ChatTTS
Awesome-ChatTTS is an official recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including official introduction, quick experience options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trends analysis. Users can explore various branches for different functionalities and enhancements related to ChatTTS.
NeuroAI_Course
Neuromatch Academy NeuroAI Course Syllabus is a repository that contains the schedule and licensing information for the NeuroAI course. The course is designed to provide participants with a comprehensive understanding of artificial intelligence in neuroscience. It covers various topics related to AI applications in neuroscience, including machine learning, data analysis, and computational modeling. The content is primarily accessed from the ebook provided in the repository, and the course is scheduled for July 15-26, 2024. The repository is shared under a Creative Commons Attribution 4.0 International License and software elements are additionally licensed under the BSD (3-Clause) License. Contributors to the project are acknowledged and welcomed to contribute further.
Awesome-AI
Awesome AI is a repository that collects and shares resources in the fields of large language models (LLM), AI-assisted programming, AI drawing, and more. It explores the application and development of generative artificial intelligence. The repository provides information on various AI tools, models, and platforms, along with tutorials and web products related to AI technologies.
For similar tasks
llm-action
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.
litgpt
LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs **on your own data**. It features highly-optimized training recipes for the world's most powerful open-source large-language-models (LLMs).
llm-engine
Scale's LLM Engine is an open-source Python library, CLI, and Helm chart that provides everything you need to serve and fine-tune foundation models, whether you use Scale's hosted infrastructure or do it in your own cloud infrastructure using Kubernetes.
LLaMA-Factory
LLaMA Factory is a unified framework for fine-tuning 100+ large language models (LLMs) with various methods, including pre-training, supervised fine-tuning, reward modeling, PPO, DPO and ORPO. It features integrated algorithms like GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, LoRA+, LoftQ and Agent tuning, as well as practical tricks like FlashAttention-2, Unsloth, RoPE scaling, NEFTune and rsLoRA. LLaMA Factory provides experiment monitors like LlamaBoard, TensorBoard, Wandb, MLflow, etc., and supports faster inference with OpenAI-style API, Gradio UI and CLI with vLLM worker. Compared to ChatGLM's P-Tuning, LLaMA Factory's LoRA tuning offers up to 3.7 times faster training speed with a better Rouge score on the advertising text generation task. By leveraging 4-bit quantization technique, LLaMA Factory's QLoRA further improves the efficiency regarding the GPU memory.
LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.
llm-foundry
LLM Foundry is a codebase for training, finetuning, evaluating, and deploying LLMs for inference with Composer and the MosaicML platform. It is designed to be easy-to-use, efficient _and_ flexible, enabling rapid experimentation with the latest techniques. You'll find in this repo: * `llmfoundry/` - source code for models, datasets, callbacks, utilities, etc. * `scripts/` - scripts to run LLM workloads * `data_prep/` - convert text data from original sources to StreamingDataset format * `train/` - train or finetune HuggingFace and MPT models from 125M - 70B parameters * `train/benchmarking` - profile training throughput and MFU * `inference/` - convert models to HuggingFace or ONNX format, and generate responses * `inference/benchmarking` - profile inference latency and throughput * `eval/` - evaluate LLMs on academic (or custom) in-context-learning tasks * `mcli/` - launch any of these workloads using MCLI and the MosaicML platform * `TUTORIAL.md` - a deeper dive into the repo, example workflows, and FAQs
EdgeChains
EdgeChains is an open-source chain-of-thought engineering framework tailored for Large Language Models (LLMs)- like OpenAI GPT, LLama2, Falcon, etc. - With a focus on enterprise-grade deployability and scalability. EdgeChains is specifically designed to **orchestrate** such applications. At EdgeChains, we take a unique approach to Generative AI - we think Generative AI is a deployment and configuration management challenge rather than a UI and library design pattern challenge. We build on top of a tech that has solved this problem in a different domain - Kubernetes Config Management - and bring that to Generative AI. Edgechains is built on top of jsonnet, originally built by Google based on their experience managing a vast amount of configuration code in the Borg infrastructure.
llm-on-openshift
This repository provides resources, demos, and recipes for working with Large Language Models (LLMs) on OpenShift using OpenShift AI or Open Data Hub. It includes instructions for deploying inference servers for LLMs, such as vLLM, Hugging Face TGI, Caikit-TGIS-Serving, and Ollama. Additionally, it offers guidance on deploying serving runtimes, such as vLLM Serving Runtime and Hugging Face Text Generation Inference, in the Single-Model Serving stack of Open Data Hub or OpenShift AI. The repository also covers vector databases that can be used as a Vector Store for Retrieval Augmented Generation (RAG) applications, including Milvus, PostgreSQL+pgvector, and Redis. Furthermore, it provides examples of inference and application usage, such as Caikit, Langchain, Langflow, and UI examples.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.