rlhf-book
Textbook on reinforcement learning from human feedback
Stars: 94
RLHF Book is a work-in-progress textbook covering the fundamentals of Reinforcement Learning from Human Feedback (RLHF). It is built on the Pandoc book template and is meant for people with a basic ML and/or software background. The content for the book is licensed under the Creative Commons Non-Commercial Attribution License, CC BY-NC 4.0. The repository contains a simple template for building Pandoc documents, allowing users to compile markdown files into readable files such as PDF, EPUB, and HTML.
README:
Built on Pandoc book template.
This is a work-in-progress textbook covering the fundamentals of Reinforcement Learning from Human Feedback (RLHF).
The code is licensed with the MIT license, but the content for the book found in chapters/ is licensed under the Creative Commons Non-Commercial Attribution License, CC BY-NC 4.0.
This is meant for people with a basic ML and/or software background.
To cite this book, please use the following format.
```bibtex
@book{rlhf2024,
  author    = {Nathan Lambert},
  title     = {Reinforcement Learning from Human Feedback},
  year      = {2024},
  publisher = {Online},
  url       = {https://rlhfbook.com},
  % Chapters can be optionally included as shown below:
  % chapters = {Introduction, Background, Methods, Results, Discussion, Conclusion}
}
```
This repository contains a simple template for building Pandoc documents; Pandoc is a suite of tools to compile markdown files into readable files (PDF, EPUB, HTML...).
TL;DR:
Run make to create the output files.
Run make files to move the outputs into place (figures, the linked PDF, etc.).
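The two-step workflow as a short shell sketch (nothing beyond the two targets named above is assumed):

```sh
# Build every output format, then move the results into place
# for the website (figures, the linked PDF, and so on).
make
make files
```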
With the nested structure used for the website, the section links between chapters in the PDF are broken. We opted for this in favor of a better web experience, but best practice is to avoid putting any links to rlhfbook.com within the markdown files, since non-HTML versions are not well suited to them.
Please check this page for more information. On Ubuntu, pandoc can be installed via the pandoc package:
sudo apt-get install pandoc
This template uses make to build the output files, so don't forget to install it too:
sudo apt-get install make
To export to PDF files, make sure to install the following packages:
sudo apt-get install texlive-fonts-recommended texlive-xetex
On macOS, the equivalent dependencies can be installed with Homebrew:
brew install pandoc
brew install make
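Whichever route you take, a quick sanity check (not part of the template itself) confirms the tools are on your PATH before building:

```sh
# Both commands should print version information.
pandoc --version
make --version
```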
(See below for pandoc-crossref.)
Here's a folder structure for a Pandoc book:

```
my-book/         # Root directory.
|- build/        # Folder used to store built (output) files.
|- chapters/     # Markdown files; one for each chapter.
|- images/       # Images folder.
|  |- cover.png  # Cover page for EPUB.
|- metadata.yml  # Metadata content (title, author...).
|- Makefile      # Makefile used for building our books.
```
Edit the metadata.yml file to set configuration data (note that it must start and end with ---):
```yaml
---
title: My book title
author: Daniel Herzog
rights: MIT License
lang: en-US
tags: [pandoc, book, my-book, etc]
abstract: |
  Your summary.
mainfont: DejaVu Sans

# Filter preferences:
# - pandoc-crossref
linkReferences: true
---
```
You can find the list of all available keys on this page.
Creating a new chapter is as simple as creating a new markdown file in the chapters/ folder; you'll end up with something like this:
chapters/01-introduction.md
chapters/02-installation.md
chapters/03-usage.md
chapters/04-references.md
Pandoc and Make will join them automatically, ordered by name; that's why the numeric prefixes are used.
All you need to do is specify at least one title for each chapter:
# Introduction
This is the first paragraph of the introduction chapter.
## First
This is the first subsection.
## Second
This is the second subsection.
Each title (#) will represent a chapter, while each subtitle (##) will represent a chapter's section. You can use as many levels of sections as markdown supports.
You may prefer to have manual control over the chapter ordering instead of using numeric prefixes. To do so, replace CHAPTERS = chapters/*.md in the Makefile with your own order. For example:
```makefile
CHAPTERS += $(addprefix ./chapters/,\
	01-introduction.md\
	02-installation.md\
	03-usage.md\
	04-references.md\
)
```
Anchor links can be used to link chapters within the book:

```markdown
// chapters/01-introduction.md
# Introduction

For more information, check the [Usage] chapter.
```

```markdown
// chapters/02-installation.md
# Usage
...
```
If you want to rename the reference, use this syntax:
For more information, check [this](#usage) chapter.
Anchor names should be downcased, and spaces, colons, semicolons... should be replaced with hyphens. Instead of Chapter title: A new era, you have #chapter-title-a-new-era.
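For example, a link to that hypothetical chapter would be written as:

```markdown
See [the new era](#chapter-title-a-new-era) for details.
```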
Linking between sections works the same way as anchor links:
```markdown
# Introduction

## First

For more information, check the [Second] section.

## Second
...
```
Or, with an alternative name:
For more information, check [this](#second) section.
Text. That's cool. What about images and tables?
Use Markdown syntax to insert an image with a caption:
![A cool seagull.](images/seagull.png)
Pandoc will automatically convert the image into a figure, using the title (the text between the brackets) as a caption.
If you want to resize the image, you may use this syntax, available since Pandoc 1.16:
![A cool seagull.](images/seagull.png){ width=50% height=50% }
Use a markdown table, and use the Table: <Your table description> syntax to add a caption:
| Index | Name |
| ----- | ---- |
| 0 | AAA |
| 1 | BBB |
| ... | ... |
Table: This is an example table.
Wrap a LaTeX math equation between $ delimiters for inline (tiny) formulas:
This, $\mu = \sum_{i=0}^{N} \frac{x_i}{N}$, the mean equation, ...
Pandoc will transform them automatically into images using online services.
If you want to center the equation instead of inlining it, use double $$ delimiters:
$$\mu = \sum_{i=0}^{N} \frac{x_i}{N}$$
Here's an online equation editor.
Originally, this template used LaTeX labels for auto numbering of images, tables, equations, and sections, like this:
Please, admire the gloriousness of Figure \ref{seagull_image}.
![A cool seagull.\label{seagull_image}](images/seagull.png)
However, these references only work when exporting to a LaTeX-based format (i.e. PDF, LaTeX). In case you need cross-reference support in other formats, this template now supports cross references using Pandoc filters. If you want to use them, use a valid plugin with its own syntax.
Using pandoc-crossref is highly recommended, but there are other alternatives which use a similar syntax, like pandoc-xnos.
To install on Mac, run:
brew install pandoc-crossref
First, enable the filter in the Makefile by updating the FILTER_ARGS variable with your new filter(s):
FILTER_ARGS = --filter pandoc-crossref
Then, you may use the filter's cross references. For example, pandoc-crossref uses {#<type>:<id>} for definitions and @<type>:<id> for referencing. Some examples:
List of references:
- Check @fig:seagull.
- Check @tbl:table.
- Check @eq:equation.
List of elements to reference:
![A cool seagull](images/seagull.png){#fig:seagull}
$$ y = mx + b $$ {#eq:equation}
| Index | Name |
| ----- | ---- |
| 0 | AAA |
| 1 | BBB |
| ... | ... |
Table: This is an example table. {#tbl:table}
Check the desired filter settings and usage for more information (pandoc-crossref usage).
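As one more hedged example of the same syntax: pandoc-crossref can also label and reference sections, assuming pandoc is run with --number-sections (which its section references require):

```markdown
# Introduction {#sec:intro}

As discussed in @sec:intro, ...
```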
If you need to modify the markdown content before passing it to pandoc, you may use CONTENT_FILTERS. When this Makefile variable is set, the filters are applied to the markdown content before it is passed to pandoc. For example, to replace all occurrences of @pagebreak with <div style="page-break-before: always;"></div> you may use a sed filter:
CONTENT_FILTERS = sed 's/@pagebreak/"<div style=\"page-break-before: always;\"><\/div>"/g'
To use multiple filters, you may include multiple pipes in the CONTENT_FILTERS variable:
```makefile
CONTENT_FILTERS = \
	sed 's/@pagebreak/"<div style=\"page-break-before: always;\"><\/div>"/g' | \
	sed 's/@image/[Cool image](\/images\/image.png)/g'
```
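To make the mechanics concrete, here is a hedged sketch of the pipeline such a variable produces; the actual Makefile recipe in this repository may differ, and --pdf-engine=xelatex simply reflects the XeTeX dependency mentioned above:

```sh
# Roughly what happens when CONTENT_FILTERS is set: the markdown
# is piped through the filter chain before pandoc ever sees it.
cat chapters/*.md \
  | sed 's/@pagebreak/<div style="page-break-before: always;"><\/div>/g' \
  | pandoc --pdf-engine=xelatex -o build/pdf/book.pdf
```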
This template uses a Makefile to automate the building process. Instead of using the pandoc CLI utility directly, we're going to use some make commands.
Please note that PDF file generation requires some extra dependencies (~ 800 MB):
sudo apt-get install texlive-xetex ttf-dejavu
After installing the dependencies, build each format with its own make target; the output lands in the matching subfolder of build/:
- make pdf: the generated file will be placed in build/pdf.
- make epub: the generated file will be placed in build/epub.
- make html: the generated file(s) will be placed in build/html.
- make docx: the generated file(s) will be placed in build/docx.
If you want to configure the output, you'll probably have to look at the Pandoc Manual for further information about PDF (LaTeX) generation, custom styles, etc., and modify the Makefile accordingly.
Output files are generated using pandoc templates. All templates are located under the templates/ folder and may be modified as you wish. Some basic format templates are already included in this repository, in case you need something to start with.
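For reference, a hedged sketch of calling pandoc with one of these templates by hand; the template filename below is hypothetical, so check the templates/ folder for the real names:

```sh
# Build HTML with a custom template. "templates/html.html" is a
# placeholder name, not necessarily what this repository ships.
pandoc chapters/*.md metadata.yml \
  --template templates/html.html \
  -o build/html/book.html
```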
Similar Open Source Tools
ethereum-etl-airflow
This repository contains Airflow DAGs for extracting, transforming, and loading (ETL) data from the Ethereum blockchain into BigQuery. The DAGs use the Google Cloud Platform (GCP) services, including BigQuery, Cloud Storage, and Cloud Composer, to automate the ETL process. The repository also includes scripts for setting up the GCP environment and running the DAGs locally.
log10
Log10 is a one-line Python integration to manage your LLM data. It helps you log both closed and open-source LLM calls, compare and identify the best models and prompts, store feedback for fine-tuning, collect performance metrics such as latency and usage, and perform analytics and monitor compliance for LLM powered applications. Log10 offers various integration methods, including a python LLM library wrapper, the Log10 LLM abstraction, and callbacks, to facilitate its use in both existing production environments and new projects. Pick the one that works best for you. Log10 also provides a copilot that can help you with suggestions on how to optimize your prompt, and a feedback feature that allows you to add feedback to your completions. Additionally, Log10 provides prompt provenance, session tracking and call stack functionality to help debug prompt chains. With Log10, you can use your data and feedback from users to fine-tune custom models with RLHF, and build and deploy more reliable, accurate and efficient self-hosted models. Log10 also supports collaboration, allowing you to create flexible groups to share and collaborate over all of the above features.
openai_trtllm
OpenAI-compatible API for TensorRT-LLM and NVIDIA Triton Inference Server, which allows you to integrate with langchain
gemini-pro-bot
This Python Telegram bot utilizes Google's `gemini-pro` LLM API to generate creative text formats based on user input. It's designed to be an engaging and interactive way to explore the capabilities of large language models. Key features include generating various text formats like poems, code, scripts, and musical pieces. The bot supports real-time streaming of the generation process, allowing users to witness the text unfold. Additionally, it can respond to messages with Bard's creative output and handle image-based inputs for multimodal responses. User authentication is optional, and the bot can be easily integrated with Docker or installed via pipenv.
chatgpt-cli
ChatGPT CLI provides a powerful command-line interface for seamless interaction with ChatGPT models via OpenAI and Azure. It features streaming capabilities, extensive configuration options, and supports various modes like streaming, query, and interactive mode. Users can manage thread-based context, sliding window history, and provide custom context from any source. The CLI also offers model and thread listing, advanced configuration options, and supports GPT-4, GPT-3.5-turbo, and Perplexity's models. Installation is available via Homebrew or direct download, and users can configure settings through default values, a config.yaml file, or environment variables.
sandbox
Sandbox is an open-source cloud-based code editing environment with custom AI code autocompletion and real-time collaboration. It consists of a frontend built with Next.js, TailwindCSS, Shadcn UI, Clerk, Monaco, and Liveblocks, and a backend with Express, Socket.io, Cloudflare Workers, D1 database, R2 storage, Workers AI, and Drizzle ORM. The backend includes microservices for database, storage, and AI functionalities. Users can run the project locally by setting up environment variables and deploying the containers. Contributions are welcome following the commit convention and structure provided in the repository.
llm-vscode
llm-vscode is an extension designed for all things LLM, utilizing llm-ls as its backend. It offers features such as code completion with 'ghost-text' suggestions, the ability to choose models for code generation via HTTP requests, ensuring prompt size fits within the context window, and code attribution checks. Users can configure the backend, suggestion behavior, keybindings, llm-ls settings, and tokenization options. Additionally, the extension supports testing models like Code Llama 13B, Phind/Phind-CodeLlama-34B-v2, and WizardLM/WizardCoder-Python-34B-V1.0. Development involves cloning llm-ls, building it, and setting up the llm-vscode extension for use.
Discord-AI-Chatbot
Discord AI Chatbot is a versatile tool that seamlessly integrates into your Discord server, offering a wide range of capabilities to enhance your communication and engagement. With its advanced language model, the bot excels at imaginative generation, providing endless possibilities for creative expression. Additionally, it offers secure credential management, ensuring the privacy of your data. The bot's hybrid command system combines the best of slash and normal commands, providing flexibility and ease of use. It also features mention recognition, ensuring prompt responses whenever you mention it or use its name. The bot's message handling capabilities prevent confusion by recognizing when you're replying to others. You can customize the bot's behavior by selecting from a range of pre-existing personalities or creating your own. The bot's web access feature unlocks a new level of convenience, allowing you to interact with it from anywhere. With its open-source nature, you have the freedom to modify and adapt the bot to your specific needs.
llm-functions
LLM Functions is a project that enables the enhancement of large language models (LLMs) with custom tools and agents developed in bash, javascript, and python. Users can create tools for their LLM to execute system commands, access web APIs, or perform other complex tasks triggered by natural language prompts. The project provides a framework for building tools and agents, with tools being functions written in the user's preferred language and automatically generating JSON declarations based on comments. Agents combine prompts, function callings, and knowledge (RAG) to create conversational AI agents. The project is designed to be user-friendly and allows users to easily extend the capabilities of their language models.
Upscaler
Holloway's Upscaler is a consolidation of various compiled open-source AI image/video upscaling products for a CLI-friendly image and video upscaling program. It provides low-cost AI upscaling software that can run locally on a laptop, programmable for albums and videos, reliable for large video files, and works without GUI overheads. The repository supports hardware testing on various systems and provides important notes on GPU compatibility, video types, and image decoding bugs. Dependencies include ffmpeg and ffprobe for video processing. The user manual covers installation, setup pathing, calling for help, upscaling images and videos, and contributing back to the project. Benchmarks are provided for performance evaluation on different hardware setups.
AI-Video-Boilerplate-Simple
AI-video-boilerplate-simple is a free Live AI Video boilerplate for testing out live video AI experiments. It includes a simple Flask server that serves files, supports live video from various sources, and integrates with Roboflow for AI vision. Users can use this template for projects, research, business ideas, and homework. It is lightweight and can be deployed on popular cloud platforms like Replit, Vercel, Digital Ocean, or Heroku.
detoxify
Detoxify is a library that provides trained models and code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification. It includes models like 'original', 'unbiased', and 'multilingual' trained on different datasets to detect toxicity and minimize bias. The library aims to help in stopping harmful content online by interpreting visual content in context. Users can fine-tune the models on carefully constructed datasets for research purposes or to aid content moderators in flagging out harmful content quicker. The library is built to be user-friendly and straightforward to use.
pacha
Pacha is an AI tool designed for retrieving context for natural language queries using a SQL interface and Python programming environment. It is optimized for working with Hasura DDN for multi-source querying. Pacha is used in conjunction with language models to produce informed responses in AI applications, agents, and chatbots.
hqq
HQQ is a fast and accurate model quantizer that skips the need for calibration data. It's super simple to implement (just a few lines of code for the optimizer). It can crunch through quantizing the Llama2-70B model in only 4 minutes! 🚀
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.