llm-twin-course
Learn for free how to build an end-to-end production-ready LLM & RAG system using LLMOps best practices: source code + 12 hands-on lessons
Stars: 3118
The LLM Twin Course is a free, end-to-end framework for building production-ready LLM systems. It teaches you how to design, train, and deploy a production-ready LLM twin of yourself powered by LLMs, vector DBs, and LLMOps good practices. The course is split into 12 hands-on written lessons (10 core lessons plus 2 bonus lessons) and the open-source code you can access on GitHub. You can read everything and try out the code at your own pace.
README:
By finishing the "LLM Twin: Building Your Production-Ready AI Replica" free course, you will learn how to design, train, and deploy a production-ready LLM twin of yourself powered by LLMs, vector DBs, and LLMOps good practices.
No more isolated scripts or Notebooks! Learn production ML by building and deploying an end-to-end production-grade LLM system.
You will learn how to architect and build a real-world LLM system from start to finish, from data collection to deployment.
You will also learn to leverage MLOps best practices, such as experiment trackers, model registries, prompt monitoring, and versioning.
The end goal? Build and deploy your own LLM twin.
What is an LLM Twin? It is an AI character that learns to write like a specific person by incorporating their style and personality into an LLM.
- Crawl your digital data from various social media platforms, such as Medium, Substack and GitHub.
- Clean, normalize and load the data to a Mongo NoSQL DB through a series of ETL pipelines.
- Send database changes to a RabbitMQ queue using the CDC pattern.
- Learn to package the crawlers as AWS Lambda functions.
- Consume messages in real-time from a queue through a Bytewax streaming pipeline.
- Every message will be cleaned, chunked, embedded and loaded into a Qdrant vector DB.
- In the bonus series, we refactor the cleaning, chunking, and embedding logic using Superlinked, a specialized vector compute engine. We will also load and index the vectors to a Redis vector DB.
- Create a custom instruction dataset based on your custom digital data to do SFT.
- Fine-tune an LLM using LoRA or QLoRA.
- Use Comet ML's experiment tracker to monitor the experiments.
- Evaluate the LLM using Opik.
- Save and version the best model to the Hugging Face model registry.
- Run and automate the training pipeline using AWS SageMaker.
- Load the fine-tuned LLM from the Hugging Face model registry.
- Deploy the LLM as a scalable REST API using AWS SageMaker inference endpoints.
- Enhance the prompts using advanced RAG techniques.
- Monitor the prompts and LLM-generated results using Opik.
- In the bonus series, we refactor the advanced RAG layer to write more optimal queries using Superlinked.
- Wrap up everything with a Gradio UI (as seen below) where you can start playing around with the LLM Twin to generate content that follows your writing style.
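The "cleaned, chunked" steps listed above can be sketched in plain Python. This is a minimal illustration, not the course's implementation: the function names `clean_text` and `chunk_words`, the cleaning rules, and the window sizes are all assumptions made for the example. In the course these steps run inside the Bytewax streaming pipeline, and the chunks are then embedded and loaded into Qdrant.

```python
import re


def clean_text(text: str) -> str:
    """Strip URLs and collapse whitespace (illustrative rules only)."""
    text = re.sub(r"https?://\S+", "", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into word windows of `chunk_size` that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    # Hypothetical crawled post used only for demonstration.
    post = "Read   my post at https://example.com about LLM twins\nand production RAG systems"
    for chunk in chunk_words(clean_text(post)):
        print(chunk)
```

Overlapping windows are a common choice because a sentence cut at a chunk boundary still appears whole in one of the neighboring chunks, which tends to help retrieval later on.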
Across the 4 microservices, you will learn to integrate 4 serverless tools:
- Comet ML as your experiment tracker and data registry;
- Qdrant as your vector DB;
- AWS SageMaker as your ML infrastructure;
- Opik as your prompt evaluation and monitoring tool.
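To make the vector DB's role concrete, here is a toy in-memory similarity search. It is only a sketch of the idea: the names `cosine_similarity` and `top_k` and the tiny two-dimensional vectors are assumptions for illustration, and a real vector DB such as Qdrant uses approximate indexes and its own client API rather than a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical embeddings keyed by chunk id.
    index = {"chunk-a": [1.0, 0.0], "chunk-b": [0.0, 1.0], "chunk-c": [1.0, 1.0]}
    for doc_id, score in top_k([1.0, 0.0], index):
        print(doc_id, round(score, 3))
```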
This course is ideal for:
- ML/AI engineers who want to learn to engineer production-ready LLM & RAG systems using LLMOps good principles
- Data Engineers, Data Scientists, and Software Engineers wanting to understand the engineering behind LLM & RAG systems
Note: This course focuses on engineering practices and end-to-end system implementation rather than theoretical model optimization or research.
Category | Requirements |
---|---|
Skills | Basic understanding of Python and Machine Learning |
Hardware | Any modern laptop/workstation will do the job, as the LLM fine-tuning and inference will be done on AWS SageMaker. |
Level | Intermediate |
All tools used throughout the course will stick to their free tier, except:
- OpenAI's API, which will cost ~$1
- AWS for fine-tuning and inference, which will cost < $10 depending on how much you play around with our scripts and your region.
As an open-source course, you don't have to enroll. Everything is self-paced, free of charge and with its resources freely accessible as follows:
- code: this GitHub repository
- articles: Decoding ML
The course contains 10 hands-on written lessons and the open-source code you can access on GitHub, showing how to build an end-to-end LLM system.
Also, it includes 2 bonus lessons on how to improve the RAG system.
You can read everything at your own pace.
This self-paced course consists of 12 comprehensive lessons covering theory, system design, and hands-on implementation.
Our recommendation for each module:
- Read the article
- Run the code to replicate our results
- Go deeper into the code by reading the `src` Python modules
[!NOTE] Check the INSTALL_AND_USAGE doc for a step-by-step installation and usage guide.
Lesson | Article | Category | Description | Source Code |
---|---|---|---|---|
1 | An End-to-End Framework for Production-Ready LLM Systems | System Design | Learn the overall architecture and design principles of production LLM systems. | No code |
2 | Data Crawling | Data Engineering | Learn to crawl and process social media content for LLM training. | src/data_crawling |
3 | CDC Magic | Data Engineering | Learn to implement Change Data Capture (CDC) for syncing two data sources. | src/data_cdc |
4 | Feature Streaming Pipelines | Feature Pipeline | Build real-time streaming pipelines for LLM and RAG data processing. | src/feature_pipeline |
5 | Advanced RAG Algorithms | Feature Pipeline | Implement advanced RAG techniques for better retrieval. | src/feature_pipeline |
6 | Generate Fine-Tuning Instruct Datasets | Training Pipeline | Create custom instruct datasets for LLM fine-tuning. | src/feature_pipeline/generate_dataset |
7 | LLM Fine-tuning Pipeline | Training Pipeline | Build an end-to-end LLM fine-tuning pipeline and deploy it to AWS SageMaker. | src/training_pipeline |
8 | LLM & RAG Evaluation | Training Pipeline | Learn to evaluate LLM and RAG system performance. | src/inference_pipeline/evaluation |
9 | Implement and Deploy the RAG Inference Pipeline | Inference Pipeline | Design, implement and deploy the RAG inference to AWS SageMaker. | src/inference_pipeline |
10 | Prompt Monitoring | Inference Pipeline | Build the prompt monitoring and production evaluation pipeline. | src/inference_pipeline |
11 | Refactor the RAG module using 74.3% Less Code | Bonus on RAG | Optimize the RAG system. | src/bonus_superlinked_rag |
12 | Multi-Index RAG Apps | Bonus on RAG | Build advanced multi-index RAG apps. | src/bonus_superlinked_rag |
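As a rough illustration of the Change Data Capture idea from Lesson 3, the sketch below emits one event per write to a queue that a downstream consumer drains. Everything here is a simplified assumption: `WatchedCollection`, `drain`, and the event fields are hypothetical names, the standard-library `queue.Queue` stands in for RabbitMQ, and an in-memory dict stands in for MongoDB. A real setup would typically read the database's own change feed instead of intercepting writes in application code.

```python
import json
import queue
from typing import Any, Optional


class WatchedCollection:
    """Hypothetical in-memory collection that emits a change event per write."""

    def __init__(self, name: str, events: queue.Queue) -> None:
        self.name = name
        self._docs: dict[str, dict[str, Any]] = {}
        self._events = events

    def _emit(self, operation: str, doc_id: str, document: Optional[dict]) -> None:
        # Serialize the event so the consumer only depends on the message format.
        event = {
            "collection": self.name,
            "operation": operation,
            "id": doc_id,
            "document": document,
        }
        self._events.put(json.dumps(event))

    def upsert(self, doc_id: str, document: dict[str, Any]) -> None:
        operation = "update" if doc_id in self._docs else "insert"
        self._docs[doc_id] = dict(document)
        self._emit(operation, doc_id, document)

    def delete(self, doc_id: str) -> None:
        # Only emit an event when something was actually removed.
        if self._docs.pop(doc_id, None) is not None:
            self._emit("delete", doc_id, None)


def drain(events: queue.Queue) -> list[dict]:
    """Consume every pending event, as a downstream pipeline would."""
    consumed = []
    while not events.empty():
        consumed.append(json.loads(events.get()))
    return consumed


if __name__ == "__main__":
    events: queue.Queue = queue.Queue()
    posts = WatchedCollection("posts", events)
    posts.upsert("1", {"title": "draft"})
    posts.upsert("1", {"title": "final"})
    posts.delete("1")
    for event in drain(events):
        print(event["operation"], event["id"])
```

Decoupling the writer from the consumer through a queue means the downstream pipeline can lag, restart, or scale independently without the data source needing to know about it.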
At Decoding ML we teach how to build production ML systems, thus the course follows the structure of a real-world Python project:
llm-twin-course/
├── src/                        # Source code for all the ML pipelines and services
│   ├── data_crawling/          # Data collection pipeline code
│   ├── data_cdc/               # Change Data Capture (CDC) pipeline code
│   ├── feature_pipeline/       # Feature engineering pipeline code
│   ├── training_pipeline/      # Training pipeline code
│   ├── inference_pipeline/     # Inference service code
│   └── bonus_superlinked_rag/  # Bonus RAG optimization code
├── .env.example                # Example environment variables template
├── Makefile                    # Commands to build and run the project
└── pyproject.toml              # Project dependencies
To understand how to install and run the LLM Twin code end-to-end, go to the INSTALL_AND_USAGE dedicated document.
[!NOTE] Even though you can run everything solely using the INSTALL_AND_USAGE dedicated document, we recommend that you read the articles to understand the LLM Twin system and design choices fully.
Have questions or running into issues? We're here to help!
Open a GitHub issue for:
- Questions about the course material
- Technical troubleshooting
- Clarification on concepts
As an open-source course, we may not be able to fix all the bugs that arise.
If you find any bugs and know how to fix them, support future readers by contributing to this course with your bug fix.
We will deeply appreciate your support for the AI community and future readers.
A big "Thank you" to all our contributors! This course is possible only because of their efforts.
Also, another big "Thank you" to all our sponsors who supported our work and made this course possible.
Comet | Opik | Bytewax | Qdrant | Superlinked |
Our LLM Engineer's Handbook inspired the open-source LLM Twin course.
Consider supporting our work by getting our book to learn a complete framework for building and deploying production LLM & RAG systems, from data to deployment.
Perfect for practitioners who want both theory and hands-on expertise by connecting the dots between DE, research, MLE and MLOps:
Buy the LLM Engineer's Handbook
This course is an open-source project released under the MIT license. Thus, as long as you distribute our LICENSE and acknowledge our work, you can safely clone or fork this project and use it as a source of inspiration for whatever you want (e.g., university projects, college degree projects, personal projects, etc.).
Alternative AI tools for llm-twin-course
Similar Open Source Tools
AmigaGPT
AmigaGPT is a versatile ChatGPT client for AmigaOS 3.x, 4.1, and MorphOS. It brings the capabilities of OpenAI's GPT to Amiga systems, enabling text generation, question answering, and creative exploration. AmigaGPT can generate images using DALL-E, supports speech output, and seamlessly integrates with AmigaOS. Users can customize the UI, choose fonts and colors, and enjoy a native user experience. The tool requires specific system requirements and offers features like state-of-the-art language models, AI image generation, speech capability, and UI customization.
AI-Gateway
The AI-Gateway repository explores the AI Gateway pattern through a series of experimental labs, focusing on Azure API Management for handling AI services APIs. The labs provide step-by-step instructions using Jupyter notebooks with Python scripts, Bicep files, and APIM policies. The goal is to accelerate experimentation of advanced use cases and pave the way for further innovation in the rapidly evolving field of AI. The repository also includes a Mock Server to mimic the behavior of the OpenAI API for testing and development purposes.
workbench-example-hybrid-rag
This NVIDIA AI Workbench project is designed for developing a Retrieval Augmented Generation application with a customizable Gradio Chat app. It allows users to embed documents into a locally running vector database and run inference locally on a Hugging Face TGI server, in the cloud using NVIDIA inference endpoints, or using microservices via NVIDIA Inference Microservices (NIMs). The project supports various models with different quantization options and provides tutorials for using different inference modes. Users can troubleshoot issues, customize the Gradio app, and access advanced tutorials for specific tasks.
synmetrix
Synmetrix is an open source data engineering platform and semantic layer for centralized metrics management. It provides a complete framework for modeling, integrating, transforming, aggregating, and distributing metrics data at scale. Key features include data modeling and transformations, semantic layer for unified data model, scheduled reports and alerts, versioning, role-based access control, data exploration, caching, and collaboration on metrics modeling. Synmetrix leverages Cube.js to consolidate metrics from various sources and distribute them downstream via a SQL API. Use cases include data democratization, business intelligence and reporting, embedded analytics, and enhancing accuracy in data handling and queries. The tool speeds up data-driven workflows from metrics definition to consumption by combining data engineering best practices with self-service analytics capabilities.
mlcraft
Synmetrix (prev. MLCraft) is an open source data engineering platform and semantic layer for centralized metrics management. It provides a complete framework for modeling, integrating, transforming, aggregating, and distributing metrics data at scale. Key features include data modeling and transformations, semantic layer for unified data model, scheduled reports and alerts, versioning, role-based access control, data exploration, caching, and collaboration on metrics modeling. Synmetrix leverages Cube (Cube.js) for flexible data models that consolidate metrics from various sources, enabling downstream distribution via a SQL API for integration into BI tools, reporting, dashboards, and data science. Use cases include data democratization, business intelligence, embedded analytics, and enhancing accuracy in data handling and queries. The tool speeds up data-driven workflows from metrics definition to consumption by combining data engineering best practices with self-service analytics capabilities.
VideoLingo
VideoLingo is an all-in-one video translation and localization dubbing tool designed to generate Netflix-level high-quality subtitles. It aims to eliminate stiff machine translation, multiple lines of subtitles, and can even add high-quality dubbing, allowing knowledge from around the world to be shared across language barriers. Through an intuitive Streamlit web interface, the entire process from video link to embedded high-quality bilingual subtitles and even dubbing can be completed with just two clicks, easily creating Netflix-quality localized videos. Key features and functions include using yt-dlp to download videos from Youtube links, using WhisperX for word-level timeline subtitle recognition, using NLP and GPT for subtitle segmentation based on sentence meaning, summarizing intelligent term knowledge base with GPT for context-aware translation, three-step direct translation, reflection, and free translation to eliminate strange machine translation, checking single-line subtitle length and translation quality according to Netflix standards, using GPT-SoVITS for high-quality aligned dubbing, and integrating package for one-click startup and one-click output in streamlit.
openrecall
OpenRecall is a fully open-source, privacy-first tool that captures your digital history through snapshots, making it searchable for quick access to specific information. It offers transparency, cross-platform support, privacy focus, and hardware compatibility. Features include time travel, local-first AI, semantic search, and full control over storage. The roadmap includes visual search capabilities and audio transcription. Users can easily install and run OpenRecall to enhance memory and productivity without compromising privacy.
mindnlp
MindNLP is an open-source NLP library based on MindSpore. It provides a platform for solving natural language processing tasks, containing many common approaches in NLP. It can help researchers and developers to construct and train models more conveniently and rapidly. Key features of MindNLP include: * Comprehensive data processing: Several classical NLP datasets are packaged into a friendly module for easy use, such as Multi30k, SQuAD, CoNLL, etc. * Friendly NLP model toolset: MindNLP provides various configurable components. It is friendly to customize models using MindNLP. * Easy-to-use engine: MindNLP simplified complicated training process in MindSpore. It supports Trainer and Evaluator interfaces to train and evaluate models easily. MindNLP supports a wide range of NLP tasks, including: * Language modeling * Machine translation * Question answering * Sentiment analysis * Sequence labeling * Summarization MindNLP also supports industry-leading Large Language Models (LLMs), including Llama, GLM, RWKV, etc. For support related to large language models, including pre-training, fine-tuning, and inference demo examples, you can find them in the "llm" directory. To install MindNLP, you can either install it from Pypi, download the daily build wheel, or install it from source. The installation instructions are provided in the documentation. MindNLP is released under the Apache 2.0 license. If you find this project useful in your research, please consider citing the following paper: @misc{mindnlp2022, title={{MindNLP}: a MindSpore NLP library}, author={MindNLP Contributors}, howpublished = {\url{https://github.com/mindlab-ai/mindnlp}}, year={2022} }
lm.rs
lm.rs is a tool that allows users to run inference on Language Models locally on the CPU using Rust. It supports LLama3.2 1B and 3B models, with a WebUI also available. The tool provides benchmarks and download links for models and tokenizers, with recommendations for quantization options. Users can convert models from Google/Meta on huggingface using provided scripts. The tool can be compiled with cargo and run with various arguments for model weights, tokenizer, temperature, and more. Additionally, a backend for the WebUI can be compiled and run to connect via the web interface.
reComputer-Jetson-for-Beginners
The reComputer Jetson Orin Beginner Guide is a comprehensive resource designed to help developers explore and harness the powerful AI computing capabilities of the NVIDIA Jetson Orin platform. The guide covers a wide range of topics, from basic tools and getting started to advanced applications in computer vision, generative AI, robotics, and more. With step-by-step tutorials and hands-on projects, users can learn to master NVIDIA's core technologies and popular AI frameworks, enabling them to innovate in AI and robotics. The guide is suitable for beginners looking to dive into AI development and build cutting-edge projects with Jetson Orin.
SurfSense
SurfSense is a tool designed to help users save and organize content from the internet into a personal Knowledge Graph. It allows users to capture web browsing sessions and webpage content using a Chrome extension, enabling easy retrieval and recall of saved information. SurfSense offers features like powerful search capabilities, natural language interaction with saved content, self-hosting options, and integration with GraphRAG for meaningful content relations. The tool eliminates the need for web scraping by directly reading data from the DOM, making it a convenient solution for managing online information.
ocular
Ocular is a set of modules and tools that allow you to build rich, reliable, and performant Generative AI-Powered Search Platforms without the need to reinvent Search Architecture. We help you build and spin up customized internal search in days, not months.
edgeai
Embedded inference of Deep Learning models is quite challenging due to high compute requirements. TI's Edge AI software product helps optimize and accelerate inference on TI's embedded devices. It supports heterogeneous execution of DNNs across cortex-A based MPUs, TI's latest generation C7x DSP, and DNN accelerator (MMA). The solution simplifies the product life cycle of DNN development and deployment by providing a rich set of tools and optimized libraries.
spaCy
spaCy is an industrial-strength Natural Language Processing (NLP) library in Python and Cython. It incorporates the latest research and is designed for real-world applications. The library offers pretrained pipelines supporting 70+ languages, with advanced neural network models for tasks such as tagging, parsing, named entity recognition, and text classification. It also facilitates multi-task learning with pretrained transformers like BERT, along with a production-ready training system and streamlined model packaging, deployment, and workflow management. spaCy is commercial open-source software released under the MIT license.
AI-Writer
AI-Writer is an AI content generation toolkit called Alwrity that automates and enhances the process of blog creation, optimization, and management. It integrates advanced AI models for text generation, image creation, and data analysis, offering features such as online research integration, long-form content generation, AI content planning, multilingual support, prevention of AI hallucinations, multimodal content generation, SEO optimization, and integration with platforms like Wordpress and Jekyll. The toolkit is designed for automated blog management and requires appropriate API keys and access credentials for full functionality.
For similar tasks
maxtext
MaxText is a high-performance, highly scalable, open-source LLM written in pure Python/Jax and targeting Google Cloud TPUs and GPUs for training and inference. MaxText achieves high MFUs and scales from single host to very large clusters while staying simple and "optimization-free" thanks to the power of Jax and the XLA compiler. MaxText aims to be a launching off point for ambitious LLM projects both in research and production. We encourage users to start by experimenting with MaxText out of the box and then fork and modify MaxText to meet their needs.
swift
SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) supports training, inference, evaluation and deployment of nearly **200 LLMs and MLLMs** (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by [PEFT](https://github.com/huggingface/peft), we also provide a complete **Adapters library** to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts. To facilitate use by users unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference, as well as accompanying deep learning courses and best practices for beginners. Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
ipex-llm
IPEX-LLM is a PyTorch library for running Large Language Models (LLMs) on Intel CPUs and GPUs with very low latency. It provides seamless integration with various LLM frameworks and tools, including llama.cpp, ollama, Text-Generation-WebUI, HuggingFace transformers, and more. IPEX-LLM has been optimized and verified on over 50 LLM models, including LLaMA, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM, Baichuan, Qwen, and RWKV. It supports a range of low-bit inference formats, including INT4, FP8, FP4, INT8, INT2, FP16, and BF16, as well as finetuning capabilities for LoRA, QLoRA, DPO, QA-LoRA, and ReLoRA. IPEX-LLM is actively maintained and updated with new features and optimizations, making it a valuable tool for researchers, developers, and anyone interested in exploring and utilizing LLMs.
Awesome-LLM-Inference
Awesome-LLM-Inference: A curated list of Awesome LLM Inference Papers with Codes; check Contents for more details. This repo is still updated frequently. Welcome to star or submit a PR to this repo!
lingo
Lingo is a lightweight ML model proxy that runs on Kubernetes, allowing you to run text-completion and embedding servers without changing OpenAI client code. It supports serving OSS LLMs, is compatible with OpenAI API, plug-and-play with messaging systems, scales from zero based on load, and has zero dependencies. Namespaced with no cluster privileges needed.
Awesome-LLM-Compression
Awesome LLM compression research papers and tools to accelerate LLM training and inference.
For similar jobs
db2rest
DB2Rest is a modern low-code REST DATA API platform that simplifies the development of intelligent applications. It seamlessly integrates existing and new databases with language models (LMs/LLMs) and vector stores, enabling the rapid delivery of context-aware, reasoning applications without vendor lock-in.
mage-ai
Mage is an open-source data pipeline tool for transforming and integrating data. It offers an easy developer experience, engineering best practices built-in, and data as a first-class citizen. Mage makes it easy to build, preview, and launch data pipelines, and provides observability and scaling capabilities. It supports data integrations, streaming pipelines, and dbt integration.
airbyte
Airbyte is an open-source data integration platform that makes it easy to move data from any source to any destination. With Airbyte, you can build and manage data pipelines without writing any code. Airbyte provides a library of pre-built connectors that make it easy to connect to popular data sources and destinations. You can also create your own connectors using Airbyte's no-code Connector Builder or low-code CDK. Airbyte is used by data engineers and analysts at companies of all sizes to build and manage their data pipelines.
labelbox-python
Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.
telemetry-airflow
This repository codifies the Airflow cluster that is deployed at workflow.telemetry.mozilla.org (behind SSO) and commonly referred to as "WTMO" or simply "Airflow". Some links relevant to users and developers of WTMO: * The `dags` directory in this repository contains some custom DAG definitions * Many of the DAGs registered with WTMO don't live in this repository, but are instead generated from ETL task definitions in bigquery-etl * The Data SRE team maintains a WTMO Developer Guide (behind SSO)
airflow
Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
airbyte-platform
Airbyte is an open-source data integration platform that makes it easy to move data from any source to any destination. With Airbyte, you can build and manage data pipelines without writing any code. Airbyte provides a library of pre-built connectors that make it easy to connect to popular data sources and destinations. You can also create your own connectors using Airbyte's low-code Connector Development Kit (CDK). Airbyte is used by data engineers and analysts at companies of all sizes to move data for a variety of purposes, including data warehousing, data analysis, and machine learning.
chronon
Chronon is a platform that simplifies and improves ML workflows by providing a central place to define features, ensuring point-in-time correctness for backfills, simplifying orchestration for batch and streaming pipelines, offering easy endpoints for feature fetching, and guaranteeing and measuring consistency. It offers benefits over other approaches by enabling the use of a broad set of data for training, handling large aggregations and other computationally intensive transformations, and abstracting away the infrastructure complexity of data plumbing.