litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Stars: 11102
LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs **on your own data**. It features highly-optimized training recipes for the world's most powerful open-source large-language-models (LLMs).
README:
20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.
✅ From scratch implementations ✅ No abstractions ✅ Beginner friendly ✅ Flash attention ✅ FSDP ✅ LoRA, QLoRA, Adapter ✅ Reduce GPU memory (fp4/8/16/32) ✅ 1-1000+ GPUs/TPUs ✅ 20+ LLMs
Quick start • Models • Finetune • Deploy • All workflows • Features • Recipes (YAML) • Lightning AI • Tutorials
Every LLM is implemented from scratch with no abstractions and full control, making them blazing fast, minimal, and performant at enterprise scale.
✅ Enterprise ready - Apache 2.0 for unlimited enterprise use.
✅ Developer friendly - Easy debugging with no abstraction layers and single file implementations.
✅ Optimized performance - Models designed to maximize performance, reduce costs, and speed up training.
✅ Proven recipes - Highly-optimized training/finetuning recipes tested at enterprise scale.
Install LitGPT
pip install 'litgpt[all]'
Load and use any of the 20+ LLMs:
from litgpt import LLM
llm = LLM.load("microsoft/phi-2")
text = llm.generate("Fix the spelling: Every fall, the familly goes to the mountains.")
print(text)
# Corrected Sentence: Every fall, the family goes to the mountains.
✅ Optimized for fast inference
✅ Quantization
✅ Runs on low-memory GPUs
✅ No layers of internal abstractions
✅ Optimized for production scale
Advanced install options
Install from source:
git clone https://github.com/Lightning-AI/litgpt
cd litgpt
pip install -e '.[all]'
Explore the full Python API docs.
Every model is written from scratch to maximize performance and remove layers of abstraction:
Model | Model size | Author | Reference |
---|---|---|---|
Llama 3, 3.1, 3.2 | 1B, 3B, 8B, 70B, 405B | Meta AI | Meta AI 2024 |
Code Llama | 7B, 13B, 34B, 70B | Meta AI | Rozière et al. 2023 |
Mixtral MoE | 8x7B, 8x22B | Mistral AI | Mistral AI 2023 |
Mistral | 7B, 123B | Mistral AI | Mistral AI 2023 |
CodeGemma | 7B | Google Team, Google Deepmind | |
Gemma 2 | 2B, 9B, 27B | Google Team, Google Deepmind | |
Phi 3 & 3.5 | 3.8B | Microsoft | Abdin et al. 2024 |
... | ... | ... | ... |
See full list of 20+ LLMs
Model | Model size | Author | Reference |
---|---|---|---|
CodeGemma | 7B | Google Team, Google Deepmind | |
Code Llama | 7B, 13B, 34B, 70B | Meta AI | Rozière et al. 2023 |
Falcon | 7B, 40B, 180B | TII UAE | TII 2023 |
Falcon 3 | 1B, 3B, 7B, 10B | TII UAE | TII 2024 |
FreeWilly2 (Stable Beluga 2) | 70B | Stability AI | Stability AI 2023 |
Function Calling Llama 2 | 7B | Trelis | Trelis et al. 2023 |
Gemma | 2B, 7B | Google Team, Google Deepmind | |
Gemma 2 | 9B, 27B | Google Team, Google Deepmind | |
Llama 2 | 7B, 13B, 70B | Meta AI | Touvron et al. 2023 |
Llama 3.1 | 8B, 70B | Meta AI | Meta AI 2024 |
Llama 3.2 | 1B, 3B | Meta AI | Meta AI 2024 |
Llama 3.3 | 70B | Meta AI | Meta AI 2024 |
Mathstral | 7B | Mistral AI | Mistral AI 2024 |
MicroLlama | 300M | Ken Wang | MicroLlama repo |
Mixtral MoE | 8x7B | Mistral AI | Mistral AI 2023 |
Mistral | 7B, 123B | Mistral AI | Mistral AI 2023 |
Mixtral MoE | 8x22B | Mistral AI | Mistral AI 2024 |
OLMo | 1B, 7B | Allen Institute for AI (AI2) | Groeneveld et al. 2024 |
OpenLLaMA | 3B, 7B, 13B | OpenLM Research | Geng & Liu 2023 |
Phi 1.5 & 2 | 1.3B, 2.7B | Microsoft Research | Li et al. 2023 |
Phi 3 | 3.8B | Microsoft Research | Abdin et al. 2024 |
Phi 4 | 14B | Microsoft Research | Abdin et al. 2024 |
Platypus | 7B, 13B, 70B | Lee et al. | Lee, Hunter, and Ruiz 2023 |
Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | Biderman et al. 2023 |
Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | Qwen Team 2024 |
Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | Hui, Binyuan et al. 2024 |
Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | An, Yang et al. 2024 |
QwQ | 32B | Alibaba Group | Qwen Team 2024 |
SmolLM2 | 135M, 360M, 1.7B | Hugging Face | Hugging Face 2024 |
Salamandra | 2B, 7B | Barcelona Supercomputing Centre | BSC-LTC 2024 |
StableCode | 3B | Stability AI | Stability AI 2023 |
StableLM | 3B, 7B | Stability AI | Stability AI 2023 |
StableLM Zephyr | 3B | Stability AI | Stability AI 2023 |
TinyLlama | 1.1B | Zhang et al. | Zhang et al. 2023 |
Tip: You can list all available models by running the litgpt download list
command.
Finetune • Pretrain • Continued pretraining • Evaluate • Deploy • Test
Use the command line interface to run advanced workflows such as pretraining or finetuning on your own data.
After installing LitGPT, select the model and workflow to run (finetune, pretrain, evaluate, deploy, etc...):
# ligpt [action] [model]
litgpt serve meta-llama/Llama-3.2-3B-Instruct
litgpt finetune meta-llama/Llama-3.2-3B-Instruct
litgpt pretrain meta-llama/Llama-3.2-3B-Instruct
litgpt chat meta-llama/Llama-3.2-3B-Instruct
litgpt evaluate meta-llama/Llama-3.2-3B-Instruct
Finetuning is the process of taking a pretrained AI model and further training it on a smaller, specialized dataset tailored to a specific task or application.
# 0) setup your dataset
curl -L https://huggingface.co/datasets/ksaw008/finance_alpaca/resolve/main/finance_alpaca.json -o my_custom_dataset.json
# 1) Finetune a model (auto downloads weights)
litgpt finetune microsoft/phi-2 \
--data JSON \
--data.json_path my_custom_dataset.json \
--data.val_split_fraction 0.1 \
--out_dir out/custom-model
# 2) Test the model
litgpt chat out/custom-model/final
# 3) Deploy the model
litgpt serve out/custom-model/final
Deploy a pretrained or finetune LLM to use it in real-world applications. Deploy, automatically sets up a web server that can be accessed by a website or app.
# deploy an out-of-the-box LLM
litgpt serve microsoft/phi-2
# deploy your own trained model
litgpt serve path/to/microsoft/phi-2/checkpoint
Show code to query server:
Test the server in a separate terminal and integrate the model API into your AI product:
# 3) Use the server (in a separate Python session)
import requests, json
response = requests.post(
"http://127.0.0.1:8000/predict",
json={"prompt": "Fix typos in the following sentence: Exampel input"}
)
print(response.json()["output"])
Evaluate an LLM to test its performance on various tasks to see how well it understands and generates text. Simply put, we can evaluate things like how well would it do in college-level chemistry, coding, etc... (MMLU, Truthful QA, etc...)
litgpt evaluate microsoft/phi-2 --tasks 'truthfulqa_mc2,mmlu'
Read the full evaluation docs.
Test how well the model works via an interactive chat. Use the chat
command to chat, extract embeddings, etc...
Here's an example showing how to use the Phi-2 LLM:
litgpt chat microsoft/phi-2
>> Prompt: What do Llamas eat?
Full code:
# 1) List all supported LLMs
litgpt download list
# 2) Use a model (auto downloads weights)
litgpt chat microsoft/phi-2
>> Prompt: What do Llamas eat?
The download of certain models requires an additional access token. You can read more about this in the download documentation.
Pretraining is the process of teaching an AI model by exposing it to a large amount of data before it is fine-tuned for specific tasks.
Show code:
mkdir -p custom_texts
curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt
# 1) Download a tokenizer
litgpt download EleutherAI/pythia-160m \
--tokenizer_only True
# 2) Pretrain the model
litgpt pretrain EleutherAI/pythia-160m \
--tokenizer_dir EleutherAI/pythia-160m \
--data TextFiles \
--data.train_data_path "custom_texts/" \
--train.max_tokens 10_000_000 \
--out_dir out/custom-model
# 3) Test the model
litgpt chat out/custom-model/final
Read the full pretraining docs
Continued pretraining is another way of finetuning that specializes an already pretrained model by training on custom data:
Show code:
mkdir -p custom_texts
curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt
# 1) Continue pretraining a model (auto downloads weights)
litgpt pretrain EleutherAI/pythia-160m \
--tokenizer_dir EleutherAI/pythia-160m \
--initial_checkpoint_dir EleutherAI/pythia-160m \
--data TextFiles \
--data.train_data_path "custom_texts/" \
--train.max_tokens 10_000_000 \
--out_dir out/custom-model
# 2) Test the model
litgpt chat out/custom-model/final
Read the full continued pretraining docs
✅ State-of-the-art optimizations: Flash Attention v2, multi-GPU support via fully-sharded data parallelism, optional CPU offloading, and TPU and XLA support.
✅ Pretrain, finetune, and deploy
✅ Reduce compute requirements with low-precision settings: FP16, BF16, and FP16/FP32 mixed.
✅ Lower memory requirements with quantization: 4-bit floats, 8-bit integers, and double quantization.
✅ Configuration files for great out-of-the-box performance.
✅ Parameter-efficient finetuning: LoRA, QLoRA, Adapter, and Adapter v2.
✅ Exporting to other popular model weight formats.
✅ Many popular datasets for pretraining and finetuning, and support for custom datasets.
✅ Readable and easy-to-modify code to experiment with the latest research ideas.
LitGPT comes with validated recipes (YAML configs) to train models under different conditions. We've generated these recipes based on the parameters we found to perform the best for different training conditions.
Browse all training recipes here.
litgpt finetune \
--config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml
✅ Use configs to customize training
Configs let you customize training for all granular parameters like:
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf
# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama2-7b
# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true
...
✅ Example: LoRA finetuning config
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf
# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama2-7b
# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true
# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4
# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1
# How many nodes to use. (type: int, default: 1)
num_nodes: 1
# The LoRA rank. (type: int, default: 8)
lora_r: 32
# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16
# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05
# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true
# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false
# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true
# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false
# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: false
# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false
# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
class_path: litgpt.data.Alpaca2k
init_args:
mask_prompt: false
val_split_fraction: 0.05
prompt_style: alpaca
ignore_index: -100
seed: 42
num_workers: 4
download_dir: data/alpaca2k
# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
# Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
save_interval: 200
# Number of iterations between logging calls (type: int, default: 1)
log_interval: 1
# Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
global_batch_size: 8
# Number of samples per data-parallel rank (type: int, default: 4)
micro_batch_size: 2
# Number of iterations with learning rate warmup active (type: int, default: 100)
lr_warmup_steps: 10
# Number of epochs to train on (type: Optional[int], default: 5)
epochs: 4
# Total number of tokens to train on (type: Optional[int], default: null)
max_tokens:
# Limits the number of optimizer steps to run (type: Optional[int], default: null)
max_steps:
# Limits the length of samples (type: Optional[int], default: null)
max_seq_length: 512
# Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
tie_embeddings:
# (type: float, default: 0.0003)
learning_rate: 0.0002
# (type: float, default: 0.02)
weight_decay: 0.0
# (type: float, default: 0.9)
beta1: 0.9
# (type: float, default: 0.95)
beta2: 0.95
# (type: Optional[float], default: null)
max_norm:
# (type: float, default: 6e-05)
min_lr: 6.0e-05
# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
# Number of optimizer steps between evaluation calls (type: int, default: 100)
interval: 100
# Number of tokens to generate (type: Optional[int], default: 100)
max_new_tokens: 100
# Number of iterations (type: int, default: 100)
max_iters: 100
# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv
# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337
✅ Override any parameter in the CLI:
litgpt finetune \
--config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml \
--lora_r 4
LitGPT powers many great AI projects, initiatives, challenges and of course enterprises. Please submit a pull request to be considered for a feature.
📊 SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
The Samba project by researchers at Microsoft is built on top of the LitGPT code base and combines state space models with sliding window attention, which outperforms pure state space models.
🏆 NeurIPS 2023 Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
The LitGPT repository was the official starter kit for the NeurIPS 2023 LLM Efficiency Challenge, which is a competition focused on finetuning an existing non-instruction tuned LLM for 24 hours on a single GPU.
🦙 TinyLlama: An Open-Source Small Language Model
LitGPT powered the TinyLlama project and TinyLlama: An Open-Source Small Language Model research paper.
🍪 MicroLlama: MicroLlama-300M
MicroLlama is a 300M Llama model pretrained on 50B tokens powered by TinyLlama and LitGPT.
🔬 Pre-training Small Base LMs with Fewer Tokens
The research paper "Pre-training Small Base LMs with Fewer Tokens", which utilizes LitGPT, develops smaller base language models by inheriting a few transformer blocks from larger models and training on a tiny fraction of the data used by the larger models. It demonstrates that these smaller models can perform comparably to larger models despite using significantly less training data and resources.
We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.
🚀 Get started
⚡️ Finetuning, incl. LoRA, QLoRA, and Adapters
🤖 Pretraining
💬 Model evaluation
📘 Supported and custom datasets
🧹 Quantization
🤯 Tips for dealing with out-of-memory (OOM) errors
🧑🏽💻 Using cloud TPUs
This implementation extends on Lit-LLaMA and nanoGPT, and it's powered by Lightning Fabric ⚡.
- @karpathy for nanoGPT
- @EleutherAI for GPT-NeoX and the Evaluation Harness
- @TimDettmers for bitsandbytes
- @Microsoft for LoRA
- @tridao for Flash Attention 2
LitGPT is released under the Apache 2.0 license.
If you use LitGPT in your research, please cite the following work:
@misc{litgpt-2023,
author = {Lightning AI},
title = {LitGPT},
howpublished = {\url{https://github.com/Lightning-AI/litgpt}},
year = {2023},
}
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for litgpt
Similar Open Source Tools
litgpt
LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs **on your own data**. It features highly-optimized training recipes for the world's most powerful open-source large-language-models (LLMs).
KwaiAgents
KwaiAgents is a series of Agent-related works open-sourced by the [KwaiKEG](https://github.com/KwaiKEG) from [Kuaishou Technology](https://www.kuaishou.com/en). The open-sourced content includes: 1. **KAgentSys-Lite**: a lite version of the KAgentSys in the paper. While retaining some of the original system's functionality, KAgentSys-Lite has certain differences and limitations when compared to its full-featured counterpart, such as: (1) a more limited set of tools; (2) a lack of memory mechanisms; (3) slightly reduced performance capabilities; and (4) a different codebase, as it evolves from open-source projects like BabyAGI and Auto-GPT. Despite these modifications, KAgentSys-Lite still delivers comparable performance among numerous open-source Agent systems available. 2. **KAgentLMs**: a series of large language models with agent capabilities such as planning, reflection, and tool-use, acquired through the Meta-agent tuning proposed in the paper. 3. **KAgentInstruct**: over 200k Agent-related instructions finetuning data (partially human-edited) proposed in the paper. 4. **KAgentBench**: over 3,000 human-edited, automated evaluation data for testing Agent capabilities, with evaluation dimensions including planning, tool-use, reflection, concluding, and profiling.
nncf
Neural Network Compression Framework (NNCF) provides a suite of post-training and training-time algorithms for optimizing inference of neural networks in OpenVINO™ with a minimal accuracy drop. It is designed to work with models from PyTorch, TorchFX, TensorFlow, ONNX, and OpenVINO™. NNCF offers samples demonstrating compression algorithms for various use cases and models, with the ability to add different compression algorithms easily. It supports GPU-accelerated layers, distributed training, and seamless combination of pruning, sparsity, and quantization algorithms. NNCF allows exporting compressed models to ONNX or TensorFlow formats for use with OpenVINO™ toolkit, and supports Accuracy-Aware model training pipelines via Adaptive Compression Level Training and Early Exit Training.
MaskLLM
MaskLLM is a learnable pruning method that establishes Semi-structured Sparsity in Large Language Models (LLMs) to reduce computational overhead during inference. It is scalable and benefits from larger training datasets. The tool provides examples for running MaskLLM with Megatron-LM, preparing LLaMA checkpoints, pre-tokenizing C4 data for Megatron, generating prior masks, training MaskLLM, and evaluating the model. It also includes instructions for exporting sparse models to Huggingface.
pixeltable
Pixeltable is a Python library designed for ML Engineers and Data Scientists to focus on exploration, modeling, and app development without the need to handle data plumbing. It provides a declarative interface for working with text, images, embeddings, and video, enabling users to store, transform, index, and iterate on data within a single table interface. Pixeltable is persistent, acting as a database unlike in-memory Python libraries such as Pandas. It offers features like data storage and versioning, combined data and model lineage, indexing, orchestration of multimodal workloads, incremental updates, and automatic production-ready code generation. The tool emphasizes transparency, reproducibility, cost-saving through incremental data changes, and seamless integration with existing Python code and libraries.
langchain_dart
LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules: * **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers). * **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG). * **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task. The different components can be composed together using the LangChain Expression Language (LCEL).
pr-agent
PR-Agent is a tool that helps to efficiently review and handle pull requests by providing AI feedbacks and suggestions. It supports various commands such as generating PR descriptions, providing code suggestions, answering questions about the PR, and updating the CHANGELOG.md file. PR-Agent can be used via CLI, GitHub Action, GitHub App, Docker, and supports multiple git providers and models. It emphasizes real-life practical usage, with each tool having a single GPT-4 call for quick and affordable responses. The PR Compression strategy enables effective handling of both short and long PRs, while the JSON prompting strategy allows for modular and customizable tools. PR-Agent Pro, the hosted version by CodiumAI, provides additional benefits such as full management, improved privacy, priority support, and extra features.
FalkorDB
FalkorDB is the first queryable Property Graph database to use sparse matrices to represent the adjacency matrix in graphs and linear algebra to query the graph. Primary features: * Adopting the Property Graph Model * Nodes (vertices) and Relationships (edges) that may have attributes * Nodes can have multiple labels * Relationships have a relationship type * Graphs represented as sparse adjacency matrices * OpenCypher with proprietary extensions as a query language * Queries are translated into linear algebra expressions
airswap-protocols
AirSwap Protocols is a repository containing smart contracts for developers and traders on the AirSwap peer-to-peer trading network. It includes various packages for functionalities like server registry, atomic token swap, staking, rewards pool, batch token and order calls, libraries, and utils. The repository follows a branching and release process for contracts and tools, with steps for regular development process and individual package features or patches. Users can deploy and verify contracts using specific commands with network flags.
GIMP-ML
A.I. for GNU Image Manipulation Program (GIMP-ML) is a repository that provides Python plugins for using computer vision models in GIMP. The code base and models are continuously updated to support newer and more stable functionality. Users can edit images with text, outpaint images, and generate images from text using models like Dalle 2 and Dalle 3. The repository encourages citations using a specific bibtex entry and follows the MIT license for GIMP-ML and the original models.
imodelsX
imodelsX is a Scikit-learn friendly library that provides tools for explaining, predicting, and steering text models/data. It also includes a collection of utilities for getting started with text data. **Explainable modeling/steering** | Model | Reference | Output | Description | |---|---|---|---| | Tree-Prompt | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/tree_prompt) | Explanation + Steering | Generates a tree of prompts to steer an LLM (_Official_) | | iPrompt | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/iprompt) | Explanation + Steering | Generates a prompt that explains patterns in data (_Official_) | | AutoPrompt | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/autoprompt) | Explanation + Steering | Find a natural-language prompt using input-gradients (⌛ In progress)| | D3 | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/d3) | Explanation | Explain the difference between two distributions | | SASC | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/sasc) | Explanation | Explain a black-box text module using an LLM (_Official_) | | Aug-Linear | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/aug_linear) | Linear model | Fit better linear model using an LLM to extract embeddings (_Official_) | | Aug-Tree | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/aug_tree) | Decision tree | Fit better decision tree using an LLM to expand features (_Official_) | **General utilities** | Model | Reference | |---|---| | LLM wrapper| [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/llm) | Easily call different LLMs | | | Dataset wrapper| [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/data) | Download minimially processed huggingface datasets | | | Bag of Ngrams | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/bag_of_ngrams) | Learn a linear model of ngrams | | | Linear Finetune | [Reference](https://github.com/microsoft/AugML/tree/main/imodelsX/linear_finetune) | Finetune a single linear layer on top of LLM embeddings | | **Related work** * [imodels package](https://github.com/microsoft/interpretml/tree/main/imodels) (JOSS 2021) - interpretable ML package for concise, transparent, and accurate predictive modeling (sklearn-compatible). * [Adaptive wavelet distillation](https://arxiv.org/abs/2111.06185) (NeurIPS 2021) - distilling a neural network into a concise wavelet model * [Transformation importance](https://arxiv.org/abs/1912.04938) (ICLR 2020 workshop) - using simple reparameterizations, allows for calculating disentangled importances to transformations of the input (e.g. assigning importances to different frequencies) * [Hierarchical interpretations](https://arxiv.org/abs/1807.03343) (ICLR 2019) - extends CD to CNNs / arbitrary DNNs, and aggregates explanations into a hierarchy * [Interpretation regularization](https://arxiv.org/abs/2006.14340) (ICML 2020) - penalizes CD / ACD scores during training to make models generalize better * [PDR interpretability framework](https://www.pnas.org/doi/10.1073/pnas.1814225116) (PNAS 2019) - an overarching framewwork for guiding and framing interpretable machine learning
ColossalAI
Colossal-AI is a deep learning system for large-scale parallel training. It provides a unified interface to scale sequential code of model training to distributed environments. Colossal-AI supports parallel training methods such as data, pipeline, tensor, and sequence parallelism and is integrated with heterogeneous training and zero redundancy optimizer.
pipeline
Pipeline is a Python library designed for constructing computational flows for AI/ML models. It supports both development and production environments, offering capabilities for inference, training, and finetuning. The library serves as an interface to Mystic, enabling the execution of pipelines at scale and on enterprise GPUs. Users can also utilize this SDK with Pipeline Core on a private hosted cluster. The syntax for defining AI/ML pipelines is reminiscent of sessions in Tensorflow v1 and Flows in Prefect.
arcade-ai
Arcade AI is a developer-focused tooling and API platform designed to enhance the capabilities of LLM applications and agents. It simplifies the process of connecting agentic applications with user data and services, allowing developers to concentrate on building their applications. The platform offers prebuilt toolkits for interacting with various services, supports multiple authentication providers, and provides access to different language models. Users can also create custom toolkits and evaluate their tools using Arcade AI. Contributions are welcome, and self-hosting is possible with the provided documentation.
chat-your-doc
Chat Your Doc is an experimental project exploring various applications based on LLM technology. It goes beyond being just a chatbot project, focusing on researching LLM applications using tools like LangChain and LlamaIndex. The project delves into UX, computer vision, and offers a range of examples in the 'Lab Apps' section. It includes links to different apps, descriptions, launch commands, and demos, aiming to showcase the versatility and potential of LLM applications.
For similar tasks
litgpt
LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs **on your own data**. It features highly-optimized training recipes for the world's most powerful open-source large-language-models (LLMs).
llm-engine
Scale's LLM Engine is an open-source Python library, CLI, and Helm chart that provides everything you need to serve and fine-tune foundation models, whether you use Scale's hosted infrastructure or do it in your own cloud infrastructure using Kubernetes.
LLaMA-Factory
LLaMA Factory is a unified framework for fine-tuning 100+ large language models (LLMs) with various methods, including pre-training, supervised fine-tuning, reward modeling, PPO, DPO and ORPO. It features integrated algorithms like GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, LoRA+, LoftQ and Agent tuning, as well as practical tricks like FlashAttention-2, Unsloth, RoPE scaling, NEFTune and rsLoRA. LLaMA Factory provides experiment monitors like LlamaBoard, TensorBoard, Wandb, MLflow, etc., and supports faster inference with OpenAI-style API, Gradio UI and CLI with vLLM worker. Compared to ChatGLM's P-Tuning, LLaMA Factory's LoRA tuning offers up to 3.7 times faster training speed with a better Rouge score on the advertising text generation task. By leveraging 4-bit quantization technique, LLaMA Factory's QLoRA further improves the efficiency regarding the GPU memory.
LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.
llm-foundry
LLM Foundry is a codebase for training, finetuning, evaluating, and deploying LLMs for inference with Composer and the MosaicML platform. It is designed to be easy-to-use, efficient _and_ flexible, enabling rapid experimentation with the latest techniques. You'll find in this repo: * `llmfoundry/` - source code for models, datasets, callbacks, utilities, etc. * `scripts/` - scripts to run LLM workloads * `data_prep/` - convert text data from original sources to StreamingDataset format * `train/` - train or finetune HuggingFace and MPT models from 125M - 70B parameters * `train/benchmarking` - profile training throughput and MFU * `inference/` - convert models to HuggingFace or ONNX format, and generate responses * `inference/benchmarking` - profile inference latency and throughput * `eval/` - evaluate LLMs on academic (or custom) in-context-learning tasks * `mcli/` - launch any of these workloads using MCLI and the MosaicML platform * `TUTORIAL.md` - a deeper dive into the repo, example workflows, and FAQs
EdgeChains
EdgeChains is an open-source chain-of-thought engineering framework tailored for Large Language Models (LLMs)- like OpenAI GPT, LLama2, Falcon, etc. - With a focus on enterprise-grade deployability and scalability. EdgeChains is specifically designed to **orchestrate** such applications. At EdgeChains, we take a unique approach to Generative AI - we think Generative AI is a deployment and configuration management challenge rather than a UI and library design pattern challenge. We build on top of a tech that has solved this problem in a different domain - Kubernetes Config Management - and bring that to Generative AI. Edgechains is built on top of jsonnet, originally built by Google based on their experience managing a vast amount of configuration code in the Borg infrastructure.
llm-action
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.
llm-on-openshift
This repository provides resources, demos, and recipes for working with Large Language Models (LLMs) on OpenShift using OpenShift AI or Open Data Hub. It includes instructions for deploying inference servers for LLMs, such as vLLM, Hugging Face TGI, Caikit-TGIS-Serving, and Ollama. Additionally, it offers guidance on deploying serving runtimes, such as vLLM Serving Runtime and Hugging Face Text Generation Inference, in the Single-Model Serving stack of Open Data Hub or OpenShift AI. The repository also covers vector databases that can be used as a Vector Store for Retrieval Augmented Generation (RAG) applications, including Milvus, PostgreSQL+pgvector, and Redis. Furthermore, it provides examples of inference and application usage, such as Caikit, Langchain, Langflow, and UI examples.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.