rlhf-book
Textbook on reinforcement learning from human feedback
Stars: 94
RLHF Book is a work-in-progress textbook covering the fundamentals of Reinforcement Learning from Human Feedback (RLHF). It is built on the Pandoc book template and is meant for people with a basic ML and/or software background. The content for the book is licensed under the Creative Commons Non-Commercial Attribution License, CC BY-NC 4.0. The repository contains a simple template for building Pandoc documents, allowing users to compile markdown files into readable files such as PDF, EPUB, and HTML.
README:
Built on Pandoc book template.
This is a work-in-progress textbook covering the fundamentals of Reinforcement Learning from Human Feedback (RLHF).
The code is licensed with the MIT license, but the content for the book found in chapters/ is licensed under the Creative Commons Non-Commercial Attribution License, CC BY-NC 4.0.
This is meant for people with a basic ML and/or software background.
To cite this book, please use the following format.
```bibtex
@book{rlhf2024,
  author    = {Nathan Lambert},
  title     = {Reinforcement Learning from Human Feedback},
  year      = {2024},
  publisher = {Online},
  url       = {https://rlhfbook.com},
  % Chapters can be optionally included as shown below:
  % chapters = {Introduction, Background, Methods, Results, Discussion, Conclusion}
}
```
This repository contains a simple template for building Pandoc documents; Pandoc is a suite of tools to compile markdown files into readable files (PDF, EPUB, HTML...).
TL;DR:
Run make to create the output files.
Run make files to move the outputs into place (figures, the linked PDF, etc.).
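The two-step workflow as a short shell sketch (nothing beyond the two targets named above is assumed):

```sh
# Build every output format, then move the results into place
# for the website (figures, the linked PDF, and so on).
make
make files
```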
With the nested structure used for the website, the section links between chapters in the PDF are broken. We opted for this in favor of a better web experience, but best practice is to avoid putting any links to rlhfbook.com within the markdown files, since non-HTML versions are not well suited to them.
Please check this page for more information. On Ubuntu, pandoc can be installed via the pandoc package:
sudo apt-get install pandoc
This template uses make to build the output files, so don't forget to install it too:
sudo apt-get install make
To export to PDF files, make sure to install the following packages:
sudo apt-get install texlive-fonts-recommended texlive-xetex
On macOS, the equivalent dependencies can be installed with Homebrew:
brew install pandoc
brew install make
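Whichever route you take, a quick sanity check (not part of the template itself) confirms the tools are on your PATH before building:

```sh
# Both commands should print version information.
pandoc --version
make --version
```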
(See below for pandoc-crossref.)
Here's a folder structure for a Pandoc book:

```
my-book/         # Root directory.
|- build/        # Folder used to store built (output) files.
|- chapters/     # Markdown files; one for each chapter.
|- images/       # Images folder.
|  |- cover.png  # Cover page for EPUB.
|- metadata.yml  # Metadata content (title, author...).
|- Makefile      # Makefile used for building our books.
```
Edit the metadata.yml file to set configuration data (note that it must start and end with ---):
```yaml
---
title: My book title
author: Daniel Herzog
rights: MIT License
lang: en-US
tags: [pandoc, book, my-book, etc]
abstract: |
  Your summary.
mainfont: DejaVu Sans

# Filter preferences:
# - pandoc-crossref
linkReferences: true
---
```
You can find the list of all available keys on this page.
Creating a new chapter is as simple as creating a new markdown file in the chapters/ folder; you'll end up with something like this:
chapters/01-introduction.md
chapters/02-installation.md
chapters/03-usage.md
chapters/04-references.md
Pandoc and Make will join them automatically, ordered by name; that's why the numeric prefixes are used.
All you need to do is specify at least one title for each chapter:
# Introduction
This is the first paragraph of the introduction chapter.
## First
This is the first subsection.
## Second
This is the second subsection.
Each title (#) will represent a chapter, while each subtitle (##) will represent a chapter's section. You can use as many levels of sections as markdown supports.
You may prefer to have manual control over the chapter ordering instead of using numeric prefixes. To do so, replace CHAPTERS = chapters/*.md in the Makefile with your own order. For example:
```makefile
CHAPTERS += $(addprefix ./chapters/,\
	01-introduction.md\
	02-installation.md\
	03-usage.md\
	04-references.md\
)
```
Anchor links can be used to link chapters within the book:

```markdown
// chapters/01-introduction.md
# Introduction

For more information, check the [Usage] chapter.
```

```markdown
// chapters/02-installation.md
# Usage
...
```
If you want to rename the reference, use this syntax:
For more information, check [this](#usage) chapter.
Anchor names should be downcased, and spaces, colons, semicolons... should be replaced with hyphens. Instead of Chapter title: A new era, you have #chapter-title-a-new-era.
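For example, a link to that hypothetical chapter would be written as:

```markdown
See [the new era](#chapter-title-a-new-era) for details.
```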
Linking between sections works the same way as anchor links:
```markdown
# Introduction

## First

For more information, check the [Second] section.

## Second
...
```
Or, with an alternative name:
For more information, check [this](#second) section.
Text. That's cool. What about images and tables?
Use Markdown syntax to insert an image with a caption:
![A cool seagull.](images/seagull.png)
Pandoc will automatically convert the image into a figure, using the title (the text between the brackets) as a caption.
If you want to resize the image, you may use this syntax, available since Pandoc 1.16:
![A cool seagull.](images/seagull.png){ width=50% height=50% }
Use a markdown table, and use the Table: <Your table description> syntax to add a caption:
| Index | Name |
| ----- | ---- |
| 0 | AAA |
| 1 | BBB |
| ... | ... |
Table: This is an example table.
Wrap a LaTeX math equation between $ delimiters for inline (tiny) formulas:
This, $\mu = \sum_{i=0}^{N} \frac{x_i}{N}$, the mean equation, ...
Pandoc will transform them automatically into images using online services.
If you want to center the equation instead of inlining it, use double $$ delimiters:
$$\mu = \sum_{i=0}^{N} \frac{x_i}{N}$$
Here's an online equation editor.
Originally, this template used LaTeX labels for auto numbering of images, tables, equations, and sections, like this:
Please, admire the gloriousness of Figure \ref{seagull_image}.
![A cool seagull.\label{seagull_image}](images/seagull.png)
However, these references only work when exporting to a LaTeX-based format (i.e. PDF, LaTeX). In case you need cross-reference support in other formats, this template now supports cross references using Pandoc filters. If you want to use them, use a valid plugin with its own syntax.
Using pandoc-crossref is highly recommended, but there are other alternatives which use a similar syntax, like pandoc-xnos.
To install on Mac, run:
brew install pandoc-crossref
First, enable the filter in the Makefile by updating the FILTER_ARGS variable with your new filter(s):
FILTER_ARGS = --filter pandoc-crossref
Then, you may use the filter's cross references. For example, pandoc-crossref uses {#<type>:<id>} for definitions and @<type>:<id> for referencing. Some examples:
List of references:
- Check @fig:seagull.
- Check @tbl:table.
- Check @eq:equation.
List of elements to reference:
![A cool seagull](images/seagull.png){#fig:seagull}
$$ y = mx + b $$ {#eq:equation}
| Index | Name |
| ----- | ---- |
| 0 | AAA |
| 1 | BBB |
| ... | ... |
Table: This is an example table. {#tbl:table}
Check the desired filter settings and usage for more information (pandoc-crossref usage).
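As one more hedged example of the same syntax: pandoc-crossref can also label and reference sections, assuming pandoc is run with --number-sections (which its section references require):

```markdown
# Introduction {#sec:intro}

As discussed in @sec:intro, ...
```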
If you need to modify the markdown content before passing it to pandoc, you may use CONTENT_FILTERS. When this Makefile variable is set, the filters are applied to the markdown content before it is passed to pandoc. For example, to replace all occurrences of @pagebreak with <div style="page-break-before: always;"></div> you may use a sed filter:
CONTENT_FILTERS = sed 's/@pagebreak/"<div style=\"page-break-before: always;\"><\/div>"/g'
To use multiple filters, you may include multiple pipes in the CONTENT_FILTERS variable:
```makefile
CONTENT_FILTERS = \
	sed 's/@pagebreak/"<div style=\"page-break-before: always;\"><\/div>"/g' | \
	sed 's/@image/[Cool image](\/images\/image.png)/g'
```
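To make the mechanics concrete, here is a hedged sketch of the pipeline such a variable produces; the actual Makefile recipe in this repository may differ, and --pdf-engine=xelatex simply reflects the XeTeX dependency mentioned above:

```sh
# Roughly what happens when CONTENT_FILTERS is set: the markdown
# is piped through the filter chain before pandoc ever sees it.
cat chapters/*.md \
  | sed 's/@pagebreak/<div style="page-break-before: always;"><\/div>/g' \
  | pandoc --pdf-engine=xelatex -o build/pdf/book.pdf
```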
This template uses a Makefile to automate the building process. Instead of using the pandoc CLI utility directly, we're going to use some make commands.
Please note that PDF file generation requires some extra dependencies (~ 800 MB):
sudo apt-get install texlive-xetex ttf-dejavu
After installing the dependencies, build each format with its own make target; the output lands in the matching subfolder of build/:
- make pdf: the generated file will be placed in build/pdf.
- make epub: the generated file will be placed in build/epub.
- make html: the generated file(s) will be placed in build/html.
- make docx: the generated file(s) will be placed in build/docx.
If you want to configure the output, you'll probably have to look at the Pandoc Manual for further information about PDF (LaTeX) generation, custom styles, etc., and modify the Makefile accordingly.
Output files are generated using pandoc templates. All templates are located under the templates/ folder and may be modified as you wish. Some basic format templates are already included in this repository, in case you need something to start with.
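For reference, a hedged sketch of calling pandoc with one of these templates by hand; the template filename below is hypothetical, so check the templates/ folder for the real names:

```sh
# Build HTML with a custom template. "templates/html.html" is a
# placeholder name, not necessarily what this repository ships.
pandoc chapters/*.md metadata.yml \
  --template templates/html.html \
  -o build/html/book.html
```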
Similar Open Source Tools
ethereum-etl-airflow
This repository contains Airflow DAGs for extracting, transforming, and loading (ETL) data from the Ethereum blockchain into BigQuery. The DAGs use the Google Cloud Platform (GCP) services, including BigQuery, Cloud Storage, and Cloud Composer, to automate the ETL process. The repository also includes scripts for setting up the GCP environment and running the DAGs locally.
log10
Log10 is a one-line Python integration to manage your LLM data. It helps you log both closed and open-source LLM calls, compare and identify the best models and prompts, store feedback for fine-tuning, collect performance metrics such as latency and usage, and perform analytics and monitor compliance for LLM powered applications. Log10 offers various integration methods, including a python LLM library wrapper, the Log10 LLM abstraction, and callbacks, to facilitate its use in both existing production environments and new projects. Pick the one that works best for you. Log10 also provides a copilot that can help you with suggestions on how to optimize your prompt, and a feedback feature that allows you to add feedback to your completions. Additionally, Log10 provides prompt provenance, session tracking and call stack functionality to help debug prompt chains. With Log10, you can use your data and feedback from users to fine-tune custom models with RLHF, and build and deploy more reliable, accurate and efficient self-hosted models. Log10 also supports collaboration, allowing you to create flexible groups to share and collaborate over all of the above features.
openai_trtllm
OpenAI-compatible API for TensorRT-LLM and NVIDIA Triton Inference Server, which allows you to integrate with langchain
gemini-pro-bot
This Python Telegram bot utilizes Google's `gemini-pro` LLM API to generate creative text formats based on user input. It's designed to be an engaging and interactive way to explore the capabilities of large language models. Key features include generating various text formats like poems, code, scripts, and musical pieces. The bot supports real-time streaming of the generation process, allowing users to witness the text unfold. Additionally, it can respond to messages with Bard's creative output and handle image-based inputs for multimodal responses. User authentication is optional, and the bot can be easily integrated with Docker or installed via pipenv.
chatgpt-cli
ChatGPT CLI provides a powerful command-line interface for seamless interaction with ChatGPT models via OpenAI and Azure. It features streaming capabilities, extensive configuration options, and supports various modes like streaming, query, and interactive mode. Users can manage thread-based context, sliding window history, and provide custom context from any source. The CLI also offers model and thread listing, advanced configuration options, and supports GPT-4, GPT-3.5-turbo, and Perplexity's models. Installation is available via Homebrew or direct download, and users can configure settings through default values, a config.yaml file, or environment variables.
sandbox
Sandbox is an open-source cloud-based code editing environment with custom AI code autocompletion and real-time collaboration. It consists of a frontend built with Next.js, TailwindCSS, Shadcn UI, Clerk, Monaco, and Liveblocks, and a backend with Express, Socket.io, Cloudflare Workers, D1 database, R2 storage, Workers AI, and Drizzle ORM. The backend includes microservices for database, storage, and AI functionalities. Users can run the project locally by setting up environment variables and deploying the containers. Contributions are welcome following the commit convention and structure provided in the repository.
llm-vscode
llm-vscode is an extension designed for all things LLM, utilizing llm-ls as its backend. It offers features such as code completion with 'ghost-text' suggestions, the ability to choose models for code generation via HTTP requests, ensuring prompt size fits within the context window, and code attribution checks. Users can configure the backend, suggestion behavior, keybindings, llm-ls settings, and tokenization options. Additionally, the extension supports testing models like Code Llama 13B, Phind/Phind-CodeLlama-34B-v2, and WizardLM/WizardCoder-Python-34B-V1.0. Development involves cloning llm-ls, building it, and setting up the llm-vscode extension for use.
Discord-AI-Chatbot
Discord AI Chatbot is a versatile tool that seamlessly integrates into your Discord server, offering a wide range of capabilities to enhance your communication and engagement. With its advanced language model, the bot excels at imaginative generation, providing endless possibilities for creative expression. Additionally, it offers secure credential management, ensuring the privacy of your data. The bot's hybrid command system combines the best of slash and normal commands, providing flexibility and ease of use. It also features mention recognition, ensuring prompt responses whenever you mention it or use its name. The bot's message handling capabilities prevent confusion by recognizing when you're replying to others. You can customize the bot's behavior by selecting from a range of pre-existing personalities or creating your own. The bot's web access feature unlocks a new level of convenience, allowing you to interact with it from anywhere. With its open-source nature, you have the freedom to modify and adapt the bot to your specific needs.
llm-functions
LLM Functions is a project that enables the enhancement of large language models (LLMs) with custom tools and agents developed in bash, javascript, and python. Users can create tools for their LLM to execute system commands, access web APIs, or perform other complex tasks triggered by natural language prompts. The project provides a framework for building tools and agents, with tools being functions written in the user's preferred language and automatically generating JSON declarations based on comments. Agents combine prompts, function callings, and knowledge (RAG) to create conversational AI agents. The project is designed to be user-friendly and allows users to easily extend the capabilities of their language models.
Upscaler
Holloway's Upscaler is a consolidation of various compiled open-source AI image/video upscaling products for a CLI-friendly image and video upscaling program. It provides low-cost AI upscaling software that can run locally on a laptop, programmable for albums and videos, reliable for large video files, and works without GUI overheads. The repository supports hardware testing on various systems and provides important notes on GPU compatibility, video types, and image decoding bugs. Dependencies include ffmpeg and ffprobe for video processing. The user manual covers installation, setup pathing, calling for help, upscaling images and videos, and contributing back to the project. Benchmarks are provided for performance evaluation on different hardware setups.
AI-Video-Boilerplate-Simple
AI-video-boilerplate-simple is a free Live AI Video boilerplate for testing out live video AI experiments. It includes a simple Flask server that serves files, supports live video from various sources, and integrates with Roboflow for AI vision. Users can use this template for projects, research, business ideas, and homework. It is lightweight and can be deployed on popular cloud platforms like Replit, Vercel, Digital Ocean, or Heroku.
detoxify
Detoxify is a library that provides trained models and code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification. It includes models like 'original', 'unbiased', and 'multilingual' trained on different datasets to detect toxicity and minimize bias. The library aims to help in stopping harmful content online by interpreting visual content in context. Users can fine-tune the models on carefully constructed datasets for research purposes or to aid content moderators in flagging out harmful content quicker. The library is built to be user-friendly and straightforward to use.
pacha
Pacha is an AI tool designed for retrieving context for natural language queries using a SQL interface and Python programming environment. It is optimized for working with Hasura DDN for multi-source querying. Pacha is used in conjunction with language models to produce informed responses in AI applications, agents, and chatbots.
hqq
HQQ is a fast and accurate model quantizer that skips the need for calibration data. It's super simple to implement (just a few lines of code for the optimizer). It can crunch through quantizing the Llama2-70B model in only 4 minutes! 🚀
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.