llms-from-scratch-rs

A comprehensive Rust translation of the code from Sebastian Raschka's Build an LLM from Scratch book.

Stars: 144

Visit

This project provides Rust code that follows the text 'Build An LLM From Scratch' by Sebastian Raschka. It translates PyTorch code into Rust using the Candle crate, aiming to build a GPT-style LLM. Users can clone the repo, run examples/exercises, and access the same datasets as in the book. The project includes chapters on understanding large language models, working with text data, coding attention mechanisms, implementing a GPT model, pretraining unlabeled data, fine-tuning for classification, and fine-tuning to follow instructions.

README:

LLMs from scratch - Rust

This project aims to provide Rust code that follows the incredible text, Build An LLM From Scratch by Sebastian Raschka. The book provides arguably the most clearest step by step walkthrough for building a GPT-style LLM. Listed below are the titles for each of the 7 Chapters of the book.

Understanding large language models
Working with text data
Coding attention mechanisms
Implementing a GPT model from scratch to generate text
Pretraining an unlabeled data
Fine-tuning for classification
Fine-tuning to follow instructions

The code (see associated github repo) provided in the book is all written in PyTorch (understandably so). In this project, we translate all of the PyTorch code into Rust code by using the Candle crate, which is a minimalist ML Framework.

Usage

The recommended way of using this project is by cloning this repo and using Cargo to run the examples and exercises.

# SSH
git clone [email protected]:nerdai/llms-from-scratch-rs.git

# HTTPS
git clone https://github.com/nerdai/llms-from-scratch-rs.git

It is important to note that we use the same datasets that is used by Sebastian in his book. Use the command below to download the data in a subfolder called data/ which will eventually be used by the examples and exercises of the book.

mkdir -p 'data/'
wget 'https://raw.githubusercontent.com/rabst/LLMs-from-scratch/main/ch02/01_main-chapter-code/the-verdict.txt' -O 'data/the-verdict.txt'

Navigating the code

Users have the option of reading the code via their chosen IDE and the cloned repo, or by using the project's docs.

NOTE: The import style used in all of the examples and exercises modules are not by convention. Specifically, relevant imports are made under the main() method of every Example and Exercise implementation. This is done for educational purposes to assist the reader of the book in knowing precisely what imports are needed for the example/exercise at hand.

Running `Examples` and `Exercises`

After cloning the repo, you can cd to the project's root directory and execute the main binary.

# Run code for Example 05.07
cargo run example 05.07

# Run code for Exercise 5.5
cargo run exercise 5.5

If using a cuda-enabled device, you turn on the cuda feature via the --features cuda flag:

# Run code for Example 05.07
cargo run --features cuda example 05.07

# Run code for Exercise 5.5
cargo run --features cuda exercise 5.5

Listing `Examples`

To list the Examples, use the following command:

cargo run list --examples

A snippet of the output is pasted below.

EXAMPLES:
+-------+----------------------------------------------------------------------+
| Id    | Description                                                          |
+==============================================================================+
| 02.01 | Example usage of `listings::ch02::sample_read_text`                  |
|-------+----------------------------------------------------------------------|
| 02.02 | Use candle to generate an Embedding Layer.                           |
|-------+----------------------------------------------------------------------|
| 02.03 | Create absolute postiional embeddings.                               |
|-------+----------------------------------------------------------------------|
| 03.01 | Computing attention scores as a dot product.                         |
...
|-------+----------------------------------------------------------------------|
| 06.13 | Example usage of `train_classifier_simple` and `plot_values`         |
|       | function.                                                            |
|-------+----------------------------------------------------------------------|
| 06.14 | Loading fine-tuned model and calculate performance on whole train,   |
|       | val and test sets.                                                   |
|-------+----------------------------------------------------------------------|
| 06.15 | Example usage of `classify_review`.                                  |
+-------+----------------------------------------------------------------------+

Listing `Exercises`

One can similarly list the Exercises using:

cargo run list --exercises

# first few lines of output
EXERCISES:
+-----+------------------------------------------------------------------------+
| Id  | Statement                                                              |
+==============================================================================+
| 2.1 | Byte pair encoding of unknown words                                    |
|     |                                                                        |
|     | Try the BPE tokenizer from the tiktoken library on the unknown words   |
|     | 'Akwirw ier' and print the individual token IDs. Then, call the decode |
|     | function on each of the resulting integers in this list to reproduce   |
|     | the mapping shown in figure 2.11. Lastly, call the decode method on    |
|     | the token IDs to check whether it can reconstruct the original input,  |
|     | 'Akwirw ier.'                                                          |
|-----+------------------------------------------------------------------------|
| 2.2 | Data loaders with different strides and context sizes                  |
|     |                                                                        |
|     | To develop more intuition for how the data loader works, try to run it |
|     | with different settings such as `max_length=2` and `stride=2`, and     |
|     | `max_length=8` and `stride=2`.                                         |
|-----+------------------------------------------------------------------------|
...
|-----+------------------------------------------------------------------------|
| 6.2 | Fine-tuning the whole model                                            |
|     |                                                                        |
|     | Instead of fine-tuning just the final transformer block, fine-tune the |
|     | entire model and assess the effect on predictive performance.          |
|-----+------------------------------------------------------------------------|
| 6.3 | Fine-tuning the first vs. last token                                   |
|     |                                                                        |
|     | Try fine-tuning the first output token. Notice the changes in          |
|     | predictive performance compared to fine-tuning the last output token.  |
+-----+------------------------------------------------------------------------+

[Alternative Usage] Installing from `crates.io`

Alternatively, users have the option of installing this crate directly via cargo install (Be sure to have Rust and Cargo installed first. See here for installation instructions.):

cargo install llms-from-scratch-rs

Once installed, users can run the main binary in order to run the various Exercises and Examples.

# Run code for Example 05.07
cargo run example 05.07

# Run code for Exercise 5.5
cargo run exercsise 5.5

For Tasks:

Click tags to check more tools for each tasks

build model train model fine-tune model implement attention mechanisms work with text data

For Jobs:

machine learning engineer data scientist ai researcher software developer research scientist

Alternative AI tools for llms-from-scratch-rs

Similar Open Source Tools

llms-from-scratch-rs

github

: 144

ryoma

Ryoma is an AI Powered Data Agent framework that offers a comprehensive solution for data analysis, engineering, and visualization. It leverages cutting-edge technologies like Langchain, Reflex, Apache Arrow, Jupyter Ai Magics, Amundsen, Ibis, and Feast to provide seamless integration of language models, build interactive web applications, handle in-memory data efficiently, work with AI models, and manage machine learning features in production. Ryoma also supports various data sources like Snowflake, Sqlite, BigQuery, Postgres, MySQL, and different engines like Apache Spark and Apache Flink. The tool enables users to connect to databases, run SQL queries, and interact with data and AI models through a user-friendly UI called Ryoma Lab.

github

: 130

rank_llm

RankLLM is a suite of prompt-decoders compatible with open source LLMs like Vicuna and Zephyr. It allows users to create custom ranking models for various NLP tasks, such as document reranking, question answering, and summarization. The tool offers a variety of features, including the ability to fine-tune models on custom datasets, use different retrieval methods, and control the context size and variable passages. RankLLM is easy to use and can be integrated into existing NLP pipelines.

github

: 411

llm_processes

github

: 55

BetaML.jl

The Beta Machine Learning Toolkit is a package containing various algorithms and utilities for implementing machine learning workflows in multiple languages, including Julia, Python, and R. It offers a range of supervised and unsupervised models, data transformers, and assessment tools. The models are implemented entirely in Julia and are not wrappers for third-party models. Users can easily contribute new models or request implementations. The focus is on user-friendliness rather than computational efficiency, making it suitable for educational and research purposes.

github

: 90

AutoGPTQ

AutoGPTQ is an easy-to-use LLM quantization package with user-friendly APIs, based on GPTQ algorithm (weight-only quantization). It provides a simple and efficient way to quantize large language models (LLMs) to reduce their size and computational cost while maintaining their performance. AutoGPTQ supports a wide range of LLM models, including GPT-2, GPT-J, OPT, and BLOOM. It also supports various evaluation tasks, such as language modeling, sequence classification, and text summarization. With AutoGPTQ, users can easily quantize their LLM models and deploy them on resource-constrained devices, such as mobile phones and embedded systems.

github

: 4.4k

star-vector

StarVector is a multimodal vision-language model for Scalable Vector Graphics (SVG) generation. It can be used to perform image2SVG and text2SVG generation. StarVector works directly in the SVG code space, leveraging visual understanding to apply accurate SVG primitives. It achieves state-of-the-art performance in producing compact and semantically rich SVGs. The tool provides Hugging Face model checkpoints for image2SVG vectorization, with models like StarVector-8B and StarVector-1B. It also offers datasets like SVG-Stack, SVG-Fonts, SVG-Icons, SVG-Emoji, and SVG-Diagrams for evaluation. StarVector can be trained using Deepspeed or FSDP for tasks like Image2SVG and Text2SVG generation. The tool provides a demo with options for HuggingFace generation or VLLM backend for faster generation speed.

github

: 118

AQLM

AQLM is the official PyTorch implementation for Extreme Compression of Large Language Models via Additive Quantization. It includes prequantized AQLM models without PV-Tuning and PV-Tuned models for LLaMA, Mistral, and Mixtral families. The repository provides inference examples, model details, and quantization setups. Users can run prequantized models using Google Colab examples, work with different model families, and install the necessary inference library. The repository also offers detailed instructions for quantization, fine-tuning, and model evaluation. AQLM quantization involves calibrating models for compression, and users can improve model accuracy through finetuning. Additionally, the repository includes information on preparing models for inference and contributing guidelines.

github

: 1.2k

Qwen

Qwen is a series of large language models developed by Alibaba DAMO Academy. It outperforms the baseline models of similar model sizes on a series of benchmark datasets, e.g., MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, etc., which evaluate the models’ capabilities on natural language understanding, mathematic problem solving, coding, etc. Qwen models outperform the baseline models of similar model sizes on a series of benchmark datasets, e.g., MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, etc., which evaluate the models’ capabilities on natural language understanding, mathematic problem solving, coding, etc. Qwen-72B achieves better performance than LLaMA2-70B on all tasks and outperforms GPT-3.5 on 7 out of 10 tasks.

github

: 17.0k

factorio-learning-environment

Factorio Learning Environment is an open source framework designed for developing and evaluating LLM agents in the game of Factorio. It provides two settings: Lab-play with structured tasks and Open-play for building large factories. Results show limitations in spatial reasoning and automation strategies. Agents interact with the environment through code synthesis, observation, action, and feedback. Tools are provided for game actions and state representation. Agents operate in episodes with observation, planning, and action execution. Tasks specify agent goals and are implemented in JSON files. The project structure includes directories for agents, environment, cluster, data, docs, eval, and more. A database is used for checkpointing agent steps. Benchmarks show performance metrics for different configurations.

github

: 525

StableToolBench

StableToolBench is a new benchmark developed to address the instability of Tool Learning benchmarks. It aims to balance stability and reality by introducing features such as a Virtual API System with caching and API simulators, a new set of solvable queries determined by LLMs, and a Stable Evaluation System using GPT-4. The Virtual API Server can be set up either by building from source or using a prebuilt Docker image. Users can test the server using provided scripts and evaluate models with Solvable Pass Rate and Solvable Win Rate metrics. The tool also includes model experiments results comparing different models' performance.

github

: 59

llm-structured-output-benchmarks

Benchmark various LLM Structured Output frameworks like Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, LMFormatEnforcer, etc on tasks like multi-label classification, named entity recognition, synthetic data generation. The tool provides benchmark results, methodology, instructions to run the benchmark, add new data, and add a new framework. It also includes a roadmap for framework-related tasks, contribution guidelines, citation information, and feedback request.

github

: 111

optillm

optillm is an OpenAI API compatible optimizing inference proxy implementing state-of-the-art techniques to enhance accuracy and performance of LLMs, focusing on reasoning over coding, logical, and mathematical queries. By leveraging additional compute at inference time, it surpasses frontier models across diverse tasks.

github

: 2.1k

aiobotocore

aiobotocore is an async client for Amazon services using botocore and aiohttp/asyncio. It provides a mostly full-featured asynchronous version of botocore, allowing users to interact with various AWS services asynchronously. The library supports operations such as uploading objects to S3, getting object properties, listing objects, and deleting objects. It also offers context manager examples for managing resources efficiently. aiobotocore supports multiple AWS services like S3, DynamoDB, SNS, SQS, CloudFormation, and Kinesis, with basic methods tested for each service. Users can run tests using moto for mocked tests or against personal Amazon keys. Additionally, the tool enables type checking and code completion for better development experience.

github

: 1.2k

Consistency_LLM

Consistency Large Language Models (CLLMs) is a family of efficient parallel decoders that reduce inference latency by efficiently decoding multiple tokens in parallel. The models are trained to perform efficient Jacobi decoding, mapping any randomly initialized token sequence to the same result as auto-regressive decoding in as few steps as possible. CLLMs have shown significant improvements in generation speed on various tasks, achieving up to 3.4 times faster generation. The tool provides a seamless integration with other techniques for efficient Large Language Model (LLM) inference, without the need for draft models or architectural modifications.

github

: 293

FalkorDB

FalkorDB is the first queryable Property Graph database to use sparse matrices to represent the adjacency matrix in graphs and linear algebra to query the graph. Primary features: * Adopting the Property Graph Model * Nodes (vertices) and Relationships (edges) that may have attributes * Nodes can have multiple labels * Relationships have a relationship type * Graphs represented as sparse adjacency matrices * OpenCypher with proprietary extensions as a query language * Queries are translated into linear algebra expressions

github

: 929

For similar tasks

local-assistant-examples

The Local Assistant Examples repository is a collection of educational examples showcasing the use of large language models (LLMs). It was initially created for a blog post on building a RAG model locally, and has since expanded to include more examples and educational material. Each example is housed in its own folder with a dedicated README providing instructions on how to run it. The repository is designed to be simple and educational, not for production use.

github

: 281

llms-from-scratch-rs

github

: 144

dstack

Dstack is an open-source orchestration engine for running AI workloads in any cloud. It supports a wide range of cloud providers (such as AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, CUDO, RunPod, etc.) as well as on-premises infrastructure. With Dstack, you can easily set up and manage dev environments, tasks, services, and pools for your AI workloads.

github

: 1.7k

one-click-llms

The one-click-llms repository provides templates for quickly setting up an API for language models. It includes advanced inferencing scripts for function calling and offers various models for text generation and fine-tuning tasks. Users can choose between Runpod and Vast.AI for different GPU configurations, with recommendations for optimal performance. The repository also supports Trelis Research and offers templates for different model sizes and types, including multi-modal APIs and chat models.

github

: 139

starcoder2-self-align

StarCoder2-Instruct is an open-source pipeline that introduces StarCoder2-15B-Instruct-v0.1, a self-aligned code Large Language Model (LLM) trained with a fully permissive and transparent pipeline. It generates instruction-response pairs to fine-tune StarCoder-15B without human annotations or data from proprietary LLMs. The tool is primarily finetuned for Python code generation tasks that can be verified through execution, with potential biases and limitations. Users can provide response prefixes or one-shot examples to guide the model's output. The model may have limitations with other programming languages and out-of-domain coding tasks.

github

: 170

enhance_llm

The enhance_llm repository contains three main parts: 1. Vector model domain fine-tuning based on llama_index and qwen fine-tuning BGE vector model. 2. Large model domain fine-tuning based on PEFT fine-tuning qwen1.5-7b-chat, with sft and dpo. 3. High-order retrieval enhanced generation (RAG) system based on the above domain work, implementing a two-stage RAG system. It includes query rewriting, recall reordering, retrieval reordering, multi-turn dialogue, and more. The repository also provides hardware and environment configurations along with star history and licensing information.

github

: 142

fms-fsdp

The 'fms-fsdp' repository is a companion to the Foundation Model Stack, providing a (pre)training example to efficiently train FMS models, specifically Llama2, using native PyTorch features like FSDP for training and SDPA implementation of Flash attention v2. It focuses on leveraging FSDP for training efficiently, not as an end-to-end framework. The repo benchmarks training throughput on different GPUs, shares strategies, and provides installation and training instructions. It trained a model on IBM curated data achieving high efficiency and performance metrics.

github

: 148

CogVLM2

CogVLM2 is a new generation of open source models that offer significant improvements in benchmarks such as TextVQA and DocVQA. It supports 8K content length, image resolution up to 1344 * 1344, and both Chinese and English languages. The project provides basic calling methods, fine-tuning examples, and OpenAI API format calling examples to help developers quickly get started with the model.

github

: 83

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github

: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github

: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

github

: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

github

: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

github

: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

github

: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

github

: 675

llms-from-scratch-rs

README:

LLMs from scratch - Rust

Usage

Navigating the code

Running Examples and Exercises

Listing Examples

Listing Exercises

[Alternative Usage] Installing from crates.io

For Tasks:

For Jobs:

Alternative AI tools for llms-from-scratch-rs

Similar Open Source Tools

llms-from-scratch-rs

ryoma

rank_llm

llm_processes

BetaML.jl

AutoGPTQ

star-vector

AQLM

Qwen

factorio-learning-environment

StableToolBench

llm-structured-output-benchmarks

optillm

aiobotocore

Consistency_LLM

FalkorDB

For similar tasks

local-assistant-examples

llms-from-scratch-rs

dstack

one-click-llms

starcoder2-self-align

enhance_llm

fms-fsdp

CogVLM2

For similar jobs

weave

LLMStack

VisionCraft

kaito

PyRIT

tabby

spear

Magick

Running `Examples` and `Exercises`

Listing `Examples`

Listing `Exercises`

[Alternative Usage] Installing from `crates.io`