rlhf-book
Textbook on reinforcement learning from human feedback
Stars: 531
RLHF Book is a work-in-progress textbook covering the fundamentals of Reinforcement Learning from Human Feedback (RLHF). It is built on the Pandoc book template and is meant for people with a basic ML and/or software background. The content of the book is licensed under the Creative Commons Attribution-NonCommercial license, CC BY-NC 4.0. The repository contains a simple template for building Pandoc documents, allowing users to compile markdown files into readable formats such as PDF, EPUB, and HTML.
README:
Built on Pandoc book template.
This is a work-in-progress textbook covering the fundamentals of Reinforcement Learning from Human Feedback (RLHF).
The code is licensed with the MIT license, but the content of the book found in chapters/ is licensed under the Creative Commons Attribution-NonCommercial license, CC BY-NC 4.0.
This is meant for people with a basic ML and/or software background.
To cite this book, please use the following format.
@book{rlhf2024,
author = {Nathan Lambert},
title = {Reinforcement Learning from Human Feedback},
year = {2024},
publisher = {Online},
url = {https://rlhfbook.com},
% Chapters can be optionally included as shown below:
% chapters = {Introduction, Background, Methods, Results, Discussion, Conclusion}
}
This repository contains a simple template for building Pandoc documents; Pandoc is a suite of tools to compile markdown files into readable formats (PDF, EPUB, HTML, ...).
TL;DR:
Run make to create files.
Run make files to move generated files (figures, the linked PDF, etc.) into place.
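A typical session, as a minimal sketch of the two targets described above:

make        # build the output files (PDF, EPUB, HTML, ...)
make files  # move generated files into place for figures, the linked PDF, etc.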
With the nested structure used for the website, the section links between chapters are broken in the PDF.
We opt for this in favor of a better web experience, but best practice is to not put any links to rlhfbook.com within the markdown files; non-HTML versions will not handle them well.
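For example (the chapter name below is hypothetical), prefer an internal anchor link:

For background, see the [Reward Models] chapter.

rather than an absolute link such as [Reward Models](https://rlhfbook.com/...), which only makes sense in the HTML version.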
Please check this page for more information. On Ubuntu, it can be installed as the pandoc package:
sudo apt-get install pandoc
This template uses make to build the output files, so don't forget to install it too:
sudo apt-get install make
To export to PDF files, make sure to install the following packages:
sudo apt-get install texlive-fonts-recommended texlive-xetex
On macOS, both can be installed with Homebrew:
brew install pandoc
brew install make
(See below for pandoc-crossref)
Here's a folder structure for a Pandoc book:
my-book/ # Root directory.
|- build/ # Folder used to store built (output) files.
|- chapters/ # Markdown files; one for each chapter.
|- images/ # Images folder.
| |- cover.png # Cover page for epub.
|- metadata.yml # Metadata content (title, author...).
|- Makefile # Makefile used for building our books.
Edit the metadata.yml file to set configuration data (note that it must start and end with ---):
---
title: My book title
author: Daniel Herzog
rights: MIT License
lang: en-US
tags: [pandoc, book, my-book, etc]
abstract: |
  Your summary.
mainfont: DejaVu Sans
# Filter preferences:
# - pandoc-crossref
linkReferences: true
---
You can find the list of all available keys on this page.
Creating a new chapter is as simple as creating a new markdown file in the chapters/ folder; you'll end up with something like this:
chapters/01-introduction.md
chapters/02-installation.md
chapters/03-usage.md
chapters/04-references.md
Pandoc and Make will join them automatically, ordered by name; that's why the numeric prefixes are used.
All you need to specify for each chapter is at least one title:
# Introduction
This is the first paragraph of the introduction chapter.
## First
This is the first subsection.
## Second
This is the second subsection.
Each title (#) will represent a chapter, while each subtitle (##) will represent a chapter's section. You can use as many levels of sections as markdown supports.
You may prefer to have manual control over page ordering instead of using numeric prefixes.
To do so, replace CHAPTERS = chapters/*.md in the Makefile with your own order. For example:
CHAPTERS += $(addprefix ./chapters/,\
01-introduction.md\
02-installation.md\
03-usage.md\
04-references.md\
)
Anchor links can be used to link chapters within the book:
// chapters/01-introduction.md
# Introduction
For more information, check the [Usage] chapter.
// chapters/03-usage.md
# Usage
...
If you want to rename the reference, use this syntax:
For more information, check [this](#usage) chapter.
Anchor names should be lowercased, and spaces, colons, semicolons, etc. should be replaced with hyphens.
Instead of Chapter title: A new era, you have: #chapter-title-a-new-era.
Linking between sections works the same way as anchor links:
# Introduction
## First
For more information, check the [Second] section.
## Second
...
Or, with an alternative name:
For more information, check [this](#second) section.
That's cool. What about images and tables?
Use Markdown syntax to insert an image with a caption:
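For example (the file name images/figure.png is just a placeholder):

![This is a caption](images/figure.png)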
Pandoc will automatically convert the image into a figure, using the title (the text between the brackets) as a caption.
If you want to resize the image, you may use this syntax, available since Pandoc 1.16:
![This is a caption](images/figure.png){ width=50% height=50% }
Use a markdown table, and use the Table: <Your table description> syntax to add a caption:
| Index | Name |
| ----- | ---- |
| 0 | AAA |
| 1 | BBB |
| ... | ... |
Table: This is an example table.
Wrap a LaTeX math equation between $ delimiters for inline (tiny) formulas:
This, $\mu = \sum_{i=0}^{N} \frac{x_i}{N}$, the mean equation, ...
Pandoc will transform them automatically into images using online services.
If you want to center the equation instead of inlining it, use double $$ delimiters:
$$\mu = \sum_{i=0}^{N} \frac{x_i}{N}$$
Here's an online equation editor.
Originally, this template used LaTeX labels for auto numbering on images, tables, equations or sections, like this:
Please, admire the gloriousness of Figure \ref{seagull_image}.
However, these references only work when exporting to a LaTeX-based format (e.g. PDF, LaTeX).
In case you need cross-reference support in other formats, this template now supports cross references using Pandoc filters. If you want to use them, pick a valid filter and use its own syntax.
Using pandoc-crossref is highly recommended, but there are other alternatives which use a similar syntax, like pandoc-xnos.
To install on Mac, run:
brew install pandoc-crossref
First, enable the filter in the Makefile by updating the FILTER_ARGS variable with your new filter(s):
FILTER_ARGS = --filter pandoc-crossref
Then, you may use the filter's cross references. For example, pandoc-crossref uses
{#<type>:<id>} for definitions and @<type>:<id> for referencing. Some examples:
List of references:
- Check @fig:seagull.
- Check @tbl:table.
- Check @eq:equation.
List of elements to reference:
![A cool seagull.](images/seagull.png){#fig:seagull}
$$ y = mx + b $$ {#eq:equation}
| Index | Name |
| ----- | ---- |
| 0 | AAA |
| 1 | BBB |
| ... | ... |
Table: This is an example table. {#tbl:table}
Check the desired filter settings and usage for more information (pandoc-crossref usage).
If you need to modify the markdown content before passing it to pandoc, you may use CONTENT_FILTERS.
When this Makefile variable is set, the markdown content is piped through the given filters before
being passed to pandoc. For example, to replace all occurrences of @pagebreak with
<div style="page-break-before: always;"></div> you may use a sed filter:
CONTENT_FILTERS = sed 's/@pagebreak/<div style="page-break-before: always;"><\/div>/g'
To use multiple filters, you may include multiple pipes in the CONTENT_FILTERS variable:
CONTENT_FILTERS = \
sed 's/@pagebreak/<div style="page-break-before: always;"><\/div>/g' | \
sed 's/@image/[Cool image](\/images\/image.png)/g'
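As a rough sketch of how CONTENT_FILTERS is presumably applied inside the Makefile (the actual rule in this template may differ, and PANDOC_ARGS here is hypothetical), the chapter markdown is piped through the filters before reaching pandoc:

# Hypothetical Makefile rule, for illustration only.
build/epub/book.epub: $(CHAPTERS)
	cat $(CHAPTERS) | $(CONTENT_FILTERS) | pandoc $(PANDOC_ARGS) -o $@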
This template uses a Makefile to automate the build process. Instead of using the pandoc CLI utility directly, we're going to use some make commands.
Please note that PDF file generation requires some extra dependencies (~ 800 MB):
sudo apt-get install texlive-xetex ttf-dejavu
After installing the dependencies, use this command:
make pdf
The generated file will be placed in build/pdf.
For an EPUB, use this command:
make epub
The generated file will be placed in build/epub.
For HTML, use this command:
make html
The generated file(s) will be placed in build/html.
For DOCX, use this command:
make docx
The generated file(s) will be placed in build/docx.
If you want to configure the output, you'll probably have to look at the Pandoc Manual for further information about PDF (LaTeX) generation, custom styles, etc., and modify the Makefile accordingly.
Output files are generated using pandoc templates. All templates are located under the templates/ folder and may be modified as you wish. Some basic format templates are already included in this repository, in case you need something to start with.
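As a hedged sketch of how a custom template is typically wired in (the file and output names below are hypothetical; check the actual Makefile for the real rule), pandoc's --template flag points at a template file:

pandoc chapters/*.md metadata.yml --template templates/pdf.latex -o build/pdf/book.pdf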
Similar Open Source Tools
gitingest
GitIngest is a tool that allows users to turn any Git repository into a prompt-friendly text ingest for LLMs. It provides easy code context by generating a text digest from a git repository URL or directory. The tool offers smart formatting for optimized output format for LLM prompts and provides statistics about file and directory structure, size of the extract, and token count. GitIngest can be used as a CLI tool on Linux and as a Python package for code integration. The tool is built using Tailwind CSS for frontend, FastAPI for backend framework, tiktoken for token estimation, and apianalytics.dev for simple analytics. Users can self-host GitIngest by building the Docker image and running the container. Contributions to the project are welcome, and the tool aims to be beginner-friendly for first-time contributors with a simple Python and HTML codebase.
fabric
Fabric is an open-source framework for augmenting humans using AI. It provides a structured approach to breaking down problems into individual components and applying AI to them one at a time. Fabric includes a collection of pre-defined Patterns (prompts) that can be used for a variety of tasks, such as extracting the most interesting parts of YouTube videos and podcasts, writing essays, summarizing academic papers, creating AI art prompts, and more. Users can also create their own custom Patterns. Fabric is designed to be easy to use, with a command-line interface and a variety of helper apps. It is also extensible, allowing users to integrate it with their own AI applications and infrastructure.
ComfyUI-mnemic-nodes
ComfyUI-mnemic-nodes is a repository hosting a collection of nodes developed for ComfyUI, providing useful components to enhance project functionality. The nodes include features like returning file paths, saving text files, downloading images from URLs, tokenizing text, cleaning strings, querying Groq language models, generating negative prompts, and more. Some nodes are experimental and marked with a 'Caution' label. Installation instructions and setup details are provided for each node, along with examples and presets for different tasks.
nextjs-openai-doc-search
This starter project is designed to process `.mdx` files in the `pages` directory to use as custom context within OpenAI Text Completion prompts. It involves building a custom ChatGPT style doc search powered by Next.js, OpenAI, and Supabase. The project includes steps for pre-processing knowledge base, storing embeddings in Postgres, performing vector similarity search, and injecting content into OpenAI GPT-3 text completion prompt.
openai_trtllm
OpenAI-compatible API for TensorRT-LLM and NVIDIA Triton Inference Server, which allows you to integrate with langchain
fish-ai
fish-ai is a tool that adds AI functionality to Fish shell. It can be integrated with various AI providers like OpenAI, Azure OpenAI, Google, Hugging Face, Mistral, or a self-hosted LLM. Users can transform comments into commands, autocomplete commands, and suggest fixes. The tool allows customization through configuration files and supports switching between contexts. Data privacy is maintained by redacting sensitive information before submission to the AI models. Development features include debug logging, testing, and creating releases.
AI-Video-Boilerplate-Simple
AI-video-boilerplate-simple is a free Live AI Video boilerplate for testing out live video AI experiments. It includes a simple Flask server that serves files, supports live video from various sources, and integrates with Roboflow for AI vision. Users can use this template for projects, research, business ideas, and homework. It is lightweight and can be deployed on popular cloud platforms like Replit, Vercel, Digital Ocean, or Heroku.
yoyak
Yoyak is a small CLI tool powered by LLM for summarizing and translating web pages. It provides shell completion scripts for bash, fish, and zsh. Users can set the model they want to use and summarize web pages with the 'yoyak summary' command. Additionally, translation to other languages is supported using the '-l' option with ISO 639-1 language codes. Yoyak supports various models for summarization and translation tasks.
Fabric
Fabric is an open-source framework designed to augment humans using AI by organizing prompts by real-world tasks. It addresses the integration problem of AI by creating and organizing prompts for various tasks. Users can create, collect, and organize AI solutions in a single place for use in their favorite tools. Fabric also serves as a command-line interface for those focused on the terminal. It offers a wide range of features and capabilities, including support for multiple AI providers, internationalization, speech-to-text, AI reasoning, model management, web search, text-to-speech, desktop notifications, and more. The project aims to help humans flourish by leveraging AI technology to solve human problems and enhance creativity.
llama.vim
llama.vim is a plugin that provides local LLM-assisted text completion for Vim users. It offers features such as auto-suggest on cursor movement, manual suggestion toggling, suggestion acceptance with Tab and Shift+Tab, control over text generation time, context configuration, ring context with chunks from open and edited files, and performance stats display. The plugin requires a llama.cpp server instance to be running and supports FIM-compatible models. It aims to be simple, lightweight, and provide high-quality and performant local FIM completions even on consumer-grade hardware.
mLoRA
mLoRA (Multi-LoRA Fine-Tune) is an open-source framework for efficient fine-tuning of multiple Large Language Models (LLMs) using LoRA and its variants. It allows concurrent fine-tuning of multiple LoRA adapters with a shared base model, efficient pipeline parallelism algorithm, support for various LoRA variant algorithms, and reinforcement learning preference alignment algorithms. mLoRA helps save computational and memory resources when training multiple adapters simultaneously, achieving high performance on consumer hardware.
shellChatGPT
ShellChatGPT is a shell wrapper for OpenAI's ChatGPT, DALL-E, Whisper, and TTS, featuring integration with LocalAI, Ollama, Gemini, Mistral, Groq, and GitHub Models. It provides text and chat completions, vision, reasoning, and audio models, voice-in and voice-out chatting mode, text editor interface, markdown rendering support, session management, instruction prompt manager, integration with various service providers, command line completion, file picker dialogs, color scheme personalization, stdin and text file input support, and compatibility with Linux, FreeBSD, MacOS, and Termux for a responsive experience.
chatgpt-cli
ChatGPT CLI provides a powerful command-line interface for seamless interaction with ChatGPT models via OpenAI and Azure. It features streaming capabilities, extensive configuration options, and supports various modes like streaming, query, and interactive mode. Users can manage thread-based context, sliding window history, and provide custom context from any source. The CLI also offers model and thread listing, advanced configuration options, and supports GPT-4, GPT-3.5-turbo, and Perplexity's models. Installation is available via Homebrew or direct download, and users can configure settings through default values, a config.yaml file, or environment variables.
lexido
Lexido is an innovative assistant for the Linux command line, designed to boost your productivity and efficiency. Powered by Gemini Pro 1.0 and utilizing the free API, Lexido offers smart suggestions for commands based on your prompts and importantly your current environment. Whether you're installing software, managing files, or configuring system settings, Lexido streamlines the process, making it faster and more intuitive.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.