
litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Stars: 12731

LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs **on your own data**. It features highly-optimized training recipes for the world's most powerful open-source large-language-models (LLMs).
README:
20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.
✅ From scratch implementations ✅ No abstractions ✅ Beginner friendly ✅ Flash attention ✅ FSDP ✅ LoRA, QLoRA, Adapter ✅ Reduce GPU memory (fp4/8/16/32) ✅ 1-1000+ GPUs/TPUs ✅ 20+ LLMs
Quick start • Models • Finetune • Deploy • All workflows • Features • Recipes (YAML) • Lightning AI • Tutorials
Every LLM is implemented from scratch with no abstractions and full control, making them blazing fast, minimal, and performant at enterprise scale.
✅ Enterprise ready - Apache 2.0 for unlimited enterprise use. ✅ Developer friendly - Easy debugging with no abstraction layers and single file implementations. ✅ Optimized performance - Models designed to maximize performance, reduce costs, and speed up training. ✅ Proven recipes - Highly-optimized training/finetuning recipes tested at enterprise scale.
Install LitGPT
pip install 'litgpt[extra]'
Load and use any of the 20+ LLMs:
from litgpt import LLM
llm = LLM.load("microsoft/phi-2")
text = llm.generate("Fix the spelling: Every fall, the family goes to the mountains.")
print(text)
# Corrected Sentence: Every fall, the family goes to the mountains.
✅ Optimized for fast inference ✅ Quantization ✅ Runs on low-memory GPUs ✅ No layers of internal abstractions ✅ Optimized for production scale
Advanced install options
Install from source:
git clone https://github.com/Lightning-AI/litgpt
cd litgpt
pip install -e '.[all]'
Explore the full Python API docs.
Every model is written from scratch to maximize performance and remove layers of abstraction:
Model | Model size | Author | Reference |
---|---|---|---|
Llama 3, 3.1, 3.2, 3.3 | 1B, 3B, 8B, 70B, 405B | Meta AI | Meta AI 2024 |
Code Llama | 7B, 13B, 34B, 70B | Meta AI | Rozière et al. 2023 |
CodeGemma | 7B | Google Team, Google Deepmind | |
Gemma 2 | 2B, 9B, 27B | Google Team, Google Deepmind | |
Phi 4 | 14B | Microsoft Research | Abdin et al. 2024 |
Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | Qwen Team 2024 |
Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | Hui, Binyuan et al. 2024 |
R1 Distill Llama | 8B, 70B | DeepSeek AI | DeepSeek AI 2025 |
... | ... | ... | ... |
See full list of 20+ LLMs
Model | Model size | Author | Reference |
---|---|---|---|
CodeGemma | 7B | Google Team, Google Deepmind | |
Code Llama | 7B, 13B, 34B, 70B | Meta AI | Rozière et al. 2023 |
Falcon | 7B, 40B, 180B | TII UAE | TII 2023 |
Falcon 3 | 1B, 3B, 7B, 10B | TII UAE | TII 2024 |
FreeWilly2 (Stable Beluga 2) | 70B | Stability AI | Stability AI 2023 |
Function Calling Llama 2 | 7B | Trelis | Trelis et al. 2023 |
Gemma | 2B, 7B | Google Team, Google Deepmind | |
Gemma 2 | 9B, 27B | Google Team, Google Deepmind | |
Gemma 3 | 1B, 4B, 12B, 27B | Google Team, Google Deepmind | |
Llama 2 | 7B, 13B, 70B | Meta AI | Touvron et al. 2023 |
Llama 3.1 | 8B, 70B | Meta AI | Meta AI 2024 |
Llama 3.2 | 1B, 3B | Meta AI | Meta AI 2024 |
Llama 3.3 | 70B | Meta AI | Meta AI 2024 |
Mathstral | 7B | Mistral AI | Mistral AI 2024 |
MicroLlama | 300M | Ken Wang | MicroLlama repo |
Mixtral MoE | 8x7B | Mistral AI | Mistral AI 2023 |
Mistral | 7B, 123B | Mistral AI | Mistral AI 2023 |
Mixtral MoE | 8x22B | Mistral AI | Mistral AI 2024 |
OLMo | 1B, 7B | Allen Institute for AI (AI2) | Groeneveld et al. 2024 |
OpenLLaMA | 3B, 7B, 13B | OpenLM Research | Geng & Liu 2023 |
Phi 1.5 & 2 | 1.3B, 2.7B | Microsoft Research | Li et al. 2023 |
Phi 3 | 3.8B | Microsoft Research | Abdin et al. 2024 |
Phi 4 | 14B | Microsoft Research | Abdin et al. 2024 |
Phi 4 Mini Instruct | 3.8B | Microsoft Research | Microsoft 2025 |
Phi 4 Mini Reasoning | 3.8B | Microsoft Research | Xu, Peng et al. 2025 |
Phi 4 Reasoning | 3.8B | Microsoft Research | Abdin et al. 2025 |
Phi 4 Reasoning Plus | 3.8B | Microsoft Research | Abdin et al. 2025 |
Platypus | 7B, 13B, 70B | Lee et al. | Lee, Hunter, and Ruiz 2023 |
Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | Biderman et al. 2023 |
Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | Qwen Team 2024 |
Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | Hui, Binyuan et al. 2024 |
Qwen2.5 1M (Long Context) | 7B, 14B | Alibaba Group | Qwen Team 2025 |
Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | An, Yang et al. 2024 |
QwQ | 32B | Alibaba Group | Qwen Team 2025 |
QwQ-Preview | 32B | Alibaba Group | Qwen Team 2024 |
Qwen3 | 0.6B, 1.7B, 4B{Hybrid, Thinking-2507}, 8B, 14B, 32B | Alibaba Group | Qwen Team 2025 |
Qwen3 MoE | 30B{Hybrid, Thinking-2507}, 235B{Hybrid, Thinking-2507} | Alibaba Group | Qwen Team 2025 |
R1 Distill Llama | 8B, 70B | DeepSeek AI | DeepSeek AI 2025 |
SmolLM2 | 135M, 360M, 1.7B | Hugging Face | Hugging Face 2024 |
Salamandra | 2B, 7B | Barcelona Supercomputing Centre | BSC-LTC 2024 |
StableCode | 3B | Stability AI | Stability AI 2023 |
StableLM | 3B, 7B | Stability AI | Stability AI 2023 |
StableLM Zephyr | 3B | Stability AI | Stability AI 2023 |
TinyLlama | 1.1B | Zhang et al. | Zhang et al. 2023 |
Tip: You can list all available models by running the litgpt download list
command.
Finetune • Pretrain • Continued pretraining • Evaluate • Deploy • Test
Use the command line interface to run advanced workflows such as pretraining or finetuning on your own data.
After installing LitGPT, select the model and workflow to run (finetune, pretrain, evaluate, deploy, etc...):
# litgpt [action] [model]
litgpt serve meta-llama/Llama-3.2-3B-Instruct
litgpt finetune meta-llama/Llama-3.2-3B-Instruct
litgpt pretrain meta-llama/Llama-3.2-3B-Instruct
litgpt chat meta-llama/Llama-3.2-3B-Instruct
litgpt evaluate meta-llama/Llama-3.2-3B-Instruct
Finetuning is the process of taking a pretrained AI model and further training it on a smaller, specialized dataset tailored to a specific task or application.
# 0) setup your dataset
curl -L https://huggingface.co/datasets/ksaw008/finance_alpaca/resolve/main/finance_alpaca.json -o my_custom_dataset.json
# 1) Finetune a model (auto downloads weights)
litgpt finetune microsoft/phi-2 \
--data JSON \
--data.json_path my_custom_dataset.json \
--data.val_split_fraction 0.1 \
--out_dir out/custom-model
# 2) Test the model
litgpt chat out/custom-model/final
# 3) Deploy the model
litgpt serve out/custom-model/final
Deploy a pretrained or finetune LLM to use it in real-world applications. Deploy, automatically sets up a web server that can be accessed by a website or app.
# deploy an out-of-the-box LLM
litgpt serve microsoft/phi-2
# deploy your own trained model
litgpt serve path/to/microsoft/phi-2/checkpoint
Show code to query server:
Test the server in a separate terminal and integrate the model API into your AI product:
# 3) Use the server (in a separate Python session)
import requests, json
response = requests.post(
"http://127.0.0.1:8000/predict",
json={"prompt": "Fix typos in the following sentence: Example input"}
)
print(response.json()["output"])
Evaluate an LLM to test its performance on various tasks to see how well it understands and generates text. Simply put, we can evaluate things like how well would it do in college-level chemistry, coding, etc... (MMLU, Truthful QA, etc...)
litgpt evaluate microsoft/phi-2 --tasks 'truthfulqa_mc2,mmlu'
Read the full evaluation docs.
Test how well the model works via an interactive chat. Use the chat
command to chat, extract embeddings, etc...
Here's an example showing how to use the Phi-2 LLM:
litgpt chat microsoft/phi-2
>> Prompt: What do Llamas eat?
Full code:
# 1) List all supported LLMs
litgpt download list
# 2) Use a model (auto downloads weights)
litgpt chat microsoft/phi-2
>> Prompt: What do Llamas eat?
The download of certain models requires an additional access token. You can read more about this in the download documentation.
Pretraining is the process of teaching an AI model by exposing it to a large amount of data before it is fine-tuned for specific tasks.
Show code:
mkdir -p custom_texts
curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt
# 1) Download a tokenizer
litgpt download EleutherAI/pythia-160m \
--tokenizer_only True
# 2) Pretrain the model
litgpt pretrain EleutherAI/pythia-160m \
--tokenizer_dir EleutherAI/pythia-160m \
--data TextFiles \
--data.train_data_path "custom_texts/" \
--train.max_tokens 10_000_000 \
--out_dir out/custom-model
# 3) Test the model
litgpt chat out/custom-model/final
Read the full pretraining docs
Continued pretraining is another way of finetuning that specializes an already pretrained model by training on custom data:
Show code:
mkdir -p custom_texts
curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt
# 1) Continue pretraining a model (auto downloads weights)
litgpt pretrain EleutherAI/pythia-160m \
--tokenizer_dir EleutherAI/pythia-160m \
--initial_checkpoint_dir EleutherAI/pythia-160m \
--data TextFiles \
--data.train_data_path "custom_texts/" \
--train.max_tokens 10_000_000 \
--out_dir out/custom-model
# 2) Test the model
litgpt chat out/custom-model/final
Read the full continued pretraining docs
✅ State-of-the-art optimizations: Flash Attention v2, multi-GPU support via fully-sharded data parallelism, optional CPU offloading, and TPU and XLA support. ✅ Pretrain, finetune, and deploy ✅ Reduce compute requirements with low-precision settings: FP16, BF16, and FP16/FP32 mixed. ✅ Lower memory requirements with quantization: 4-bit floats, 8-bit integers, and double quantization. ✅ Configuration files for great out-of-the-box performance. ✅ Parameter-efficient finetuning: LoRA, QLoRA, Adapter, and Adapter v2. ✅ Exporting to other popular model weight formats. ✅ Many popular datasets for pretraining and finetuning, and support for custom datasets. ✅ Readable and easy-to-modify code to experiment with the latest research ideas.
LitGPT comes with validated recipes (YAML configs) to train models under different conditions. We've generated these recipes based on the parameters we found to perform the best for different training conditions.
Browse all training recipes here.
litgpt finetune \
--config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml
✅ Use configs to customize training
Configs let you customize training for all granular parameters like:
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf
# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama2-7b
# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true
...
✅ Example: LoRA finetuning config
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf
# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama2-7b
# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true
# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4
# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1
# How many nodes to use. (type: int, default: 1)
num_nodes: 1
# The LoRA rank. (type: int, default: 8)
lora_r: 32
# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16
# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05
# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true
# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false
# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true
# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false
# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: false
# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false
# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
class_path: litgpt.data.Alpaca2k
init_args:
mask_prompt: false
val_split_fraction: 0.05
prompt_style: alpaca
ignore_index: -100
seed: 42
num_workers: 4
download_dir: data/alpaca2k
# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
# Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
save_interval: 200
# Number of iterations between logging calls (type: int, default: 1)
log_interval: 1
# Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
global_batch_size: 8
# Number of samples per data-parallel rank (type: int, default: 4)
micro_batch_size: 2
# Number of iterations with learning rate warmup active (type: int, default: 100)
lr_warmup_steps: 10
# Number of epochs to train on (type: Optional[int], default: 5)
epochs: 4
# Total number of tokens to train on (type: Optional[int], default: null)
max_tokens:
# Limits the number of optimizer steps to run (type: Optional[int], default: null)
max_steps:
# Limits the length of samples (type: Optional[int], default: null)
max_seq_length: 512
# Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
tie_embeddings:
# (type: float, default: 0.0003)
learning_rate: 0.0002
# (type: float, default: 0.02)
weight_decay: 0.0
# (type: float, default: 0.9)
beta1: 0.9
# (type: float, default: 0.95)
beta2: 0.95
# (type: Optional[float], default: null)
max_norm:
# (type: float, default: 6e-05)
min_lr: 6.0e-05
# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
# Number of optimizer steps between evaluation calls (type: int, default: 100)
interval: 100
# Number of tokens to generate (type: Optional[int], default: 100)
max_new_tokens: 100
# Number of iterations (type: int, default: 100)
max_iters: 100
# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv
# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337
✅ Override any parameter in the CLI:
litgpt finetune \
--config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml \
--lora_r 4
LitGPT powers many great AI projects, initiatives, challenges and of course enterprises. Please submit a pull request to be considered for a feature.
📊 SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
The Samba project by researchers at Microsoft is built on top of the LitGPT code base and combines state space models with sliding window attention, which outperforms pure state space models.
🏆 NeurIPS 2023 Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
The LitGPT repository was the official starter kit for the NeurIPS 2023 LLM Efficiency Challenge, which is a competition focused on finetuning an existing non-instruction tuned LLM for 24 hours on a single GPU.
🦙 TinyLlama: An Open-Source Small Language Model
LitGPT powered the TinyLlama project and TinyLlama: An Open-Source Small Language Model research paper.
🍪 MicroLlama: MicroLlama-300M
MicroLlama is a 300M Llama model pretrained on 50B tokens powered by TinyLlama and LitGPT.
🔬 Pre-training Small Base LMs with Fewer Tokens
The research paper "Pre-training Small Base LMs with Fewer Tokens", which utilizes LitGPT, develops smaller base language models by inheriting a few transformer blocks from larger models and training on a tiny fraction of the data used by the larger models. It demonstrates that these smaller models can perform comparably to larger models despite using significantly less training data and resources.
We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.
🚀 Get started ⚡️ Finetuning, incl. LoRA, QLoRA, and Adapters 🤖 Pretraining 💬 Model evaluation 📘 Supported and custom datasets 🧹 Quantization 🤯 Tips for dealing with out-of-memory (OOM) errors 🧑🏽💻 Using cloud TPUs
This implementation extends on Lit-LLaMA and nanoGPT, and it's powered by Lightning Fabric ⚡.
- @karpathy for nanoGPT
- @EleutherAI for GPT-NeoX and the Evaluation Harness
- @TimDettmers for bitsandbytes
- @Microsoft for LoRA
- @tridao for Flash Attention 2
LitGPT is released under the Apache 2.0 license.
If you use LitGPT in your research, please cite the following work:
@misc{litgpt-2023,
author = {Lightning AI},
title = {LitGPT},
howpublished = {\url{https://github.com/Lightning-AI/litgpt}},
year = {2023},
}
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for litgpt
Similar Open Source Tools

litgpt
LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs **on your own data**. It features highly-optimized training recipes for the world's most powerful open-source large-language-models (LLMs).

aiocoap
aiocoap is a Python library that implements the Constrained Application Protocol (CoAP) using native asyncio methods in Python 3. It supports various CoAP standards such as RFC7252, RFC7641, RFC7959, RFC8323, RFC7967, RFC8132, RFC9176, RFC8613, and draft-ietf-core-oscore-groupcomm-17. The library provides features for clients and servers, including multicast support, blockwise transfer, CoAP over TCP, TLS, and WebSockets, No-Response, PATCH/FETCH, OSCORE, and Group OSCORE. It offers an easy-to-use interface for concurrent operations and is suitable for IoT applications.

Fast-LLM
Fast-LLM is an open-source library designed for training large language models with exceptional speed, scalability, and flexibility. Built on PyTorch and Triton, it offers optimized kernel efficiency, reduced overheads, and memory usage, making it suitable for training models of all sizes. The library supports distributed training across multiple GPUs and nodes, offers flexibility in model architectures, and is easy to use with pre-built Docker images and simple configuration. Fast-LLM is licensed under Apache 2.0, developed transparently on GitHub, and encourages contributions and collaboration from the community.

nekro-agent
Nekro Agent is an AI chat plugin and proxy execution bot that is highly scalable, offers high freedom, and has minimal deployment requirements. It features context-aware chat for group/private chats, custom character settings, sandboxed execution environment, interactive image resource handling, customizable extension development interface, easy deployment with docker-compose, integration with Stable Diffusion for AI drawing capabilities, support for various file types interaction, hot configuration updates and command control, native multimodal understanding, visual application management control panel, CoT (Chain of Thought) support, self-triggered timers and holiday greetings, event notification understanding, and more. It allows for third-party extensions and AI-generated extensions, and includes features like automatic context trigger based on LLM, and a variety of basic commands for bot administrators.

Applio
Applio is a VITS-based Voice Conversion tool focused on simplicity, quality, and performance. It features a user-friendly interface, cross-platform compatibility, and a range of customization options. Applio is suitable for various tasks such as voice cloning, voice conversion, and audio editing. Its key features include a modular codebase, hop length implementation, translations in over 30 languages, optimized requirements, streamlined installation, hybrid F0 estimation, easy-to-use UI, optimized code and dependencies, plugin system, overtraining detector, model search, enhancements in pretrained models, voice blender, accessibility improvements, new F0 extraction methods, output format selection, hashing system, model download system, TTS enhancements, split audio, Discord presence, Flask integration, and support tab.

ml-engineering
This repository provides a comprehensive collection of methodologies, tools, and step-by-step instructions for successful training of large language models (LLMs) and multi-modal models. It is a technical resource suitable for LLM/VLM training engineers and operators, containing numerous scripts and copy-n-paste commands to facilitate quick problem-solving. The repository is an ongoing compilation of the author's experiences training BLOOM-176B and IDEFICS-80B models, and currently focuses on the development and training of Retrieval Augmented Generation (RAG) models at Contextual.AI. The content is organized into six parts: Insights, Hardware, Orchestration, Training, Development, and Miscellaneous. It includes key comparison tables for high-end accelerators and networks, as well as shortcuts to frequently needed tools and guides. The repository is open to contributions and discussions, and is licensed under Attribution-ShareAlike 4.0 International.

infinity
Infinity is an AI-native database designed for LLM applications, providing incredibly fast full-text and vector search capabilities. It supports a wide range of data types, including vectors, full-text, and structured data, and offers a fused search feature that combines multiple embeddings and full text. Infinity is easy to use, with an intuitive Python API and a single-binary architecture that simplifies deployment. It achieves high performance, with 0.1 milliseconds query latency on million-scale vector datasets and up to 15K QPS.

llm-on-openshift
This repository provides resources, demos, and recipes for working with Large Language Models (LLMs) on OpenShift using OpenShift AI or Open Data Hub. It includes instructions for deploying inference servers for LLMs, such as vLLM, Hugging Face TGI, Caikit-TGIS-Serving, and Ollama. Additionally, it offers guidance on deploying serving runtimes, such as vLLM Serving Runtime and Hugging Face Text Generation Inference, in the Single-Model Serving stack of Open Data Hub or OpenShift AI. The repository also covers vector databases that can be used as a Vector Store for Retrieval Augmented Generation (RAG) applications, including Milvus, PostgreSQL+pgvector, and Redis. Furthermore, it provides examples of inference and application usage, such as Caikit, Langchain, Langflow, and UI examples.

llm_recipes
This repository showcases the author's experiments with Large Language Models (LLMs) for text generation tasks. It includes dataset preparation, preprocessing, model fine-tuning using libraries such as Axolotl and HuggingFace, and model evaluation.

enterprise-h2ogpte
Enterprise h2oGPTe - GenAI RAG is a repository containing code examples, notebooks, and benchmarks for the enterprise version of h2oGPTe, a powerful AI tool for generating text based on the RAG (Retrieval-Augmented Generation) architecture. The repository provides resources for leveraging h2oGPTe in enterprise settings, including implementation guides, performance evaluations, and best practices. Users can explore various applications of h2oGPTe in natural language processing tasks, such as text generation, content creation, and conversational AI.

alignment-handbook
The Alignment Handbook provides robust training recipes for continuing pretraining and aligning language models with human and AI preferences. It includes techniques such as continued pretraining, supervised fine-tuning, reward modeling, rejection sampling, and direct preference optimization (DPO). The handbook aims to fill the gap in public resources on training these models, collecting data, and measuring metrics for optimal downstream performance.

carla
CARLA is an open-source simulator for autonomous driving research. It provides open-source code, protocols, and digital assets (urban layouts, buildings, vehicles) for developing, training, and validating autonomous driving systems. CARLA supports flexible specification of sensor suites and environmental conditions.

PaddleNLP
PaddleNLP is an easy-to-use and high-performance NLP library. It aggregates high-quality pre-trained models in the industry and provides out-of-the-box development experience, covering a model library for multiple NLP scenarios with industry practice examples to meet developers' flexible customization needs.

chinese-llm-benchmark
The Chinese LLM Benchmark is a continuous evaluation list of large models in CLiB, covering a wide range of commercial and open-source models from various companies and research institutions. It supports multidimensional evaluation of capabilities including classification, information extraction, reading comprehension, data analysis, Chinese encoding efficiency, and Chinese instruction compliance. The benchmark not only provides capability score rankings but also offers the original output results of all models for interested individuals to score and rank themselves.

nncf
Neural Network Compression Framework (NNCF) provides a suite of post-training and training-time algorithms for optimizing inference of neural networks in OpenVINO™ with a minimal accuracy drop. It is designed to work with models from PyTorch, TorchFX, TensorFlow, ONNX, and OpenVINO™. NNCF offers samples demonstrating compression algorithms for various use cases and models, with the ability to add different compression algorithms easily. It supports GPU-accelerated layers, distributed training, and seamless combination of pruning, sparsity, and quantization algorithms. NNCF allows exporting compressed models to ONNX or TensorFlow formats for use with OpenVINO™ toolkit, and supports Accuracy-Aware model training pipelines via Adaptive Compression Level Training and Early Exit Training.

MarkFlowy
MarkFlowy is a lightweight and feature-rich Markdown editor with built-in AI capabilities. It supports one-click export of conversations, translation of articles, and obtaining article abstracts. Users can leverage large AI models like DeepSeek and Chatgpt as intelligent assistants. The editor provides high availability with multiple editing modes and custom themes. Available for Linux, macOS, and Windows, MarkFlowy aims to offer an efficient, beautiful, and data-safe Markdown editing experience for users.
For similar tasks

litgpt
LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs **on your own data**. It features highly-optimized training recipes for the world's most powerful open-source large-language-models (LLMs).

llm-engine
Scale's LLM Engine is an open-source Python library, CLI, and Helm chart that provides everything you need to serve and fine-tune foundation models, whether you use Scale's hosted infrastructure or do it in your own cloud infrastructure using Kubernetes.

LLaMA-Factory
LLaMA Factory is a unified framework for fine-tuning 100+ large language models (LLMs) with various methods, including pre-training, supervised fine-tuning, reward modeling, PPO, DPO and ORPO. It features integrated algorithms like GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, LoRA+, LoftQ and Agent tuning, as well as practical tricks like FlashAttention-2, Unsloth, RoPE scaling, NEFTune and rsLoRA. LLaMA Factory provides experiment monitors like LlamaBoard, TensorBoard, Wandb, MLflow, etc., and supports faster inference with OpenAI-style API, Gradio UI and CLI with vLLM worker. Compared to ChatGLM's P-Tuning, LLaMA Factory's LoRA tuning offers up to 3.7 times faster training speed with a better Rouge score on the advertising text generation task. By leveraging 4-bit quantization technique, LLaMA Factory's QLoRA further improves the efficiency regarding the GPU memory.

LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.

llm-foundry
LLM Foundry is a codebase for training, finetuning, evaluating, and deploying LLMs for inference with Composer and the MosaicML platform. It is designed to be easy-to-use, efficient _and_ flexible, enabling rapid experimentation with the latest techniques. You'll find in this repo: * `llmfoundry/` - source code for models, datasets, callbacks, utilities, etc. * `scripts/` - scripts to run LLM workloads * `data_prep/` - convert text data from original sources to StreamingDataset format * `train/` - train or finetune HuggingFace and MPT models from 125M - 70B parameters * `train/benchmarking` - profile training throughput and MFU * `inference/` - convert models to HuggingFace or ONNX format, and generate responses * `inference/benchmarking` - profile inference latency and throughput * `eval/` - evaluate LLMs on academic (or custom) in-context-learning tasks * `mcli/` - launch any of these workloads using MCLI and the MosaicML platform * `TUTORIAL.md` - a deeper dive into the repo, example workflows, and FAQs

EdgeChains
EdgeChains is an open-source chain-of-thought engineering framework tailored for Large Language Models (LLMs)- like OpenAI GPT, LLama2, Falcon, etc. - With a focus on enterprise-grade deployability and scalability. EdgeChains is specifically designed to **orchestrate** such applications. At EdgeChains, we take a unique approach to Generative AI - we think Generative AI is a deployment and configuration management challenge rather than a UI and library design pattern challenge. We build on top of a tech that has solved this problem in a different domain - Kubernetes Config Management - and bring that to Generative AI. Edgechains is built on top of jsonnet, originally built by Google based on their experience managing a vast amount of configuration code in the Borg infrastructure.

llm-action
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.

llm-on-openshift
This repository provides resources, demos, and recipes for working with Large Language Models (LLMs) on OpenShift using OpenShift AI or Open Data Hub. It includes instructions for deploying inference servers for LLMs, such as vLLM, Hugging Face TGI, Caikit-TGIS-Serving, and Ollama. Additionally, it offers guidance on deploying serving runtimes, such as vLLM Serving Runtime and Hugging Face Text Generation Inference, in the Single-Model Serving stack of Open Data Hub or OpenShift AI. The repository also covers vector databases that can be used as a Vector Store for Retrieval Augmented Generation (RAG) applications, including Milvus, PostgreSQL+pgvector, and Redis. Furthermore, it provides examples of inference and application usage, such as Caikit, Langchain, Langflow, and UI examples.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.