Awesome-LLM-Safety

A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights into the safety implications, challenges, and advancements surrounding these powerful models.

Stars: 1564

Visit

Welcome to our Awesome-llm-safety repository! We've curated a collection of the latest, most comprehensive, and most valuable resources on large language model safety (llm-safety). But we don't stop there; included are also relevant talks, tutorials, conferences, news, and articles. Our repository is constantly updated to ensure you have the most current information at your fingertips.

README:

🛡️Awesome LLM-Safety🛡️

English | 中文

🤗Introduction

Welcome to our Awesome-llm-safety repository! 🥰🥰🥰

🔥 News

2024.05 update NAACL 2024 Papers Collection, thanks @zhrli324, @feqHe!

🧑‍💻 Our Work

We've curated a collection of the latest 😋, most comprehensive 😎, and most valuable 🤩 resources on large language model safety (llm-safety). But we don't stop there; included are also relevant talks, tutorials, conferences, news, and articles. Our repository is constantly updated to ensure you have the most current information at your fingertips.

If a resource is relevant to multiple subcategories, we place it under each applicable section. For instance, the "Awesome-LLM-Safety" repository will be listed under each subcategory to which it pertains🤩!.

✔️ Perfect for Majority

For beginners curious about llm-safety, our repository serves as a compass for grasping the big picture and diving into the details. Classic or influential papers retained in the README provide a beginner-friendly navigation through interesting directions in the field;
For seasoned researchers, this repository is a tool to keep you informed and fill any gaps in your knowledge. Within each subtopic, we are diligently updating all the latest content and continuously backfilling with previous work. Our thorough compilation and careful selection are time-savers for you.

🧭 How to Use this Guide

Quick Start: In the README, users can find a curated list of select information sorted by date, along with links to various consultations.
In-Depth Exploration: If you have a special interest in a particular subtopic, delve into the "subtopic" folder for more. Each item, be it an article or piece of news, comes with a brief introduction, allowing researchers to swiftly zero in on relevant content.

💼 How to Contribution

If you have completed an insightful work or carefully compiled conference papers, we would love to add your work to the repository.

For individual papers, you can raise an issue, and we will quickly add your paper under the corresponding subtopic.
If you have compiled a collection of papers for a conference, you are welcome to submit a pull request directly. We would greatly appreciate your contribution. Please note that these pull requests need to be consistent with our existing format.

📜Advertisement

🌱 If you would like more people to read your recent insightful work, please contact me via email. I can offer you a promotional spot here for up to one month.

Let’s start LLM Safety tutorial!

🚀Table of Contents

🛡️Awesome LLM-Safety🛡️
- 🤗Introduction
- 🚀Table of Contents
- [🔐Security & Discussion](#security & discussion)
- 🔏Privacy
- 📰Truthfulness & Misinformation
- 😈JailBreak & Attacks
- [🛡️Defenses & Mitigation](#️defenses & mitigation)
  - 📖Tutorials, Articles, Presentations and Talks
  - Other
- 💯Datasets & Benchmark
- 🧑‍🏫 Scholars 👩‍🏫
- 🧑‍🎓Author

🤔AI Safety & Security Discussions

Date	Link	Publication	Authors
2024/5/20	Managing extreme AI risks amid rapid progress	Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann	Science

🔐Security & Discussion

📑Papers

Date	Institute	Publication	Paper
20.10	Facebook AI Research	arxiv	Recipes for Safety in Open-domain Chatbots
22.03	OpenAI	NIPS2022	Training language models to follow instructions with human feedback
23.07	UC Berkeley	NIPS2023	Jailbroken: How Does LLM Safety Training Fail?
23.12	OpenAI	Open AI	Practices for Governing Agentic AI Systems

📖Tutorials, Articles, Presentations and Talks

Date	Type	Title	URL
22.02	Toxicity Detection API	Perspective API	link paper
23.07	Repository	Awesome LLM Security	link
23.10	Tutorials	Awesome-LLM-Safety	link
24.01	Tutorials	Awesome-LM-SSP	link

Other

👉Latest&Comprehensive Security Paper

🔏Privacy

📑Papers

Date	Institute	Publication	Paper
19.12	Microsoft	CCS2020	Analyzing Information Leakage of Updates to Natural Language Models
21.07	Google Research	ACL2022	Deduplicating Training Data Makes Language Models Better
21.10	Stanford	ICLR2022	Large language models can be strong differentially private learners
22.02	Google Research	ICLR2023	Quantifying Memorization Across Neural Language Models
22.02	UNC Chapel Hill	ICML2022	Deduplicating Training Data Mitigates Privacy Risks in Language Models

📖Tutorials, Articles, Presentations and Talks

Date	Type	Title	URL
23.10	Tutorials	Awesome-LLM-Safety	link
24.01	Tutorials	Awesome-LM-SSP	link

Other

👉Latest&Comprehensive Privacy Paper

📰Truthfulness & Misinformation

📑Papers

Date	Institute	Publication	Paper
21.09	University of Oxford	ACL2022	TruthfulQA: Measuring How Models Mimic Human Falsehoods
23.11	Harbin Institute of Technology	arxiv	A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
23.11	Arizona State University	arxiv	Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey

📖Tutorials, Articles, Presentations and Talks

Date	Type	Title	URL
23.07	Repository	llm-hallucination-survey	link
23.10	Repository	LLM-Factuality-Survey	link
23.10	Tutorials	Awesome-LLM-Safety	link

Other

👉Latest&Comprehensive Truthfulness&Misinformation Paper

😈JailBreak & Attacks

📑Papers

Date	Institute	Publication	Paper
20.12	Google	USENIX Security 2021	Extracting Training Data from Large Language Models
22.11	AE Studio	NIPS2022(ML Safety Workshop)	Ignore Previous Prompt: Attack Techniques For Language Models
23.06	Google	arxiv	Are aligned neural networks adversarially aligned?
23.07	CMU	arxiv	Universal and Transferable Adversarial Attacks on Aligned Language Models
23.10	University of Pennsylvania	arxiv	Jailbreaking Black Box Large Language Models in Twenty Queries

📖Tutorials, Articles, Presentations and Talks

Date	Type	Title	URL
23.01	Community	Reddit/ChatGPTJailbrek	link
23.02	Resource&Tutorials	Latest Jailbreak Prompts	link
23.10	Tutorials	Awesome-LLM-Safety	link
23.10	Article	Adversarial Attacks on LLMs(Author: Lilian Weng)	link
23.11	Video	[1hr Talk] Intro to Large Language Models From 45:45(Author: Andrej Karpathy)	link
24.09	Repo	awesome_LLM-harmful-fine-tuning-papers	link
12.10	Resource	Jailbreak Commuinities	link
12.10	Article	Jailbreak Techniques and Safeguards	link

Other

👉Latest&Comprehensive JailBreak & Attacks Paper

🛡️Defenses & Mitigation

📑Papers

Date	Institute	Publication	Paper
21.07	Google Research	ACL2022	Deduplicating Training Data Makes Language Models Better
22.04	Anthropic	arxiv	Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

📖Tutorials, Articles, Presentations and Talks

Date	Type	Title	URL
23.10	Tutorials	Awesome-LLM-Safety	link

Other

👉Latest&Comprehensive Defenses Paper

💯Datasets & Benchmark

📑Papers

Date	Institute	Publication	Paper
20.09	University of Washington	EMNLP2020(findings)	RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
21.09	University of Oxford	ACL2022	TruthfulQA: Measuring How Models Mimic Human Falsehoods
22.03	MIT	ACL2022	ToxiGen: A Large-Scale Machine-Generated datasets for Adversarial and Implicit Hate Speech Detection

📖Tutorials, Articles, Presentations and Talks

Date	Type	Title	URL
23.10	Tutorials	Awesome-LLM-Safety	link

📚Resource📚

Toxicity - RealToxicityPrompts datasets
Truthfulness - TruthfulQA datasets

Other

👉Latest&Comprehensive datasets & Benchmark Paper

🧑‍🎓Author

🤗If you have any questions, please contact our authors!🤗

✉️: ydyjya ➡️ [email protected]

💬: LLM Safety Discussion

Wechat Group | My Wechat

⬆ Back to ToC

For Tasks:

Click tags to check more tools for each tasks

research llm safety develop llm safety tools implement llm safety measures

For Jobs:

researcher developer engineer data scientist machine learning engineer

Alternative AI tools for Awesome-LLM-Safety

Similar Open Source Tools

Awesome-LLM-Safety

github

: 1.6k

alignment-handbook

The Alignment Handbook provides robust training recipes for continuing pretraining and aligning language models with human and AI preferences. It includes techniques such as continued pretraining, supervised fine-tuning, reward modeling, rejection sampling, and direct preference optimization (DPO). The handbook aims to fill the gap in public resources on training these models, collecting data, and measuring metrics for optimal downstream performance.

github

: 5.3k

LLMs-playground

LLMs-playground is a repository containing code examples and tutorials for learning and experimenting with Large Language Models (LLMs). It provides a hands-on approach to understanding how LLMs work and how to fine-tune them for specific tasks. The repository covers various LLM architectures, pre-training techniques, and fine-tuning strategies, making it a valuable resource for researchers, students, and practitioners interested in natural language processing and machine learning. By exploring the code and following the tutorials, users can gain practical insights into working with LLMs and apply their knowledge to real-world projects.

github

: 74

Awesome-LLM-Prune

This repository is dedicated to the pruning of large language models (LLMs). It aims to serve as a comprehensive resource for researchers and practitioners interested in the efficient reduction of model size while maintaining or enhancing performance. The repository contains various papers, summaries, and links related to different pruning approaches for LLMs, along with author information and publication details. It covers a wide range of topics such as structured pruning, unstructured pruning, semi-structured pruning, and benchmarking methods. Researchers and practitioners can explore different pruning techniques, understand their implications, and access relevant resources for further study and implementation.

github

: 262

open-webui-tools

Open WebUI Tools Collection is a set of tools for structured planning, arXiv paper search, Hugging Face text-to-image generation, prompt enhancement, and multi-model conversations. It enhances LLM interactions with academic research, image generation, and conversation management. Tools include arXiv Search Tool and Hugging Face Image Generator. Function Pipes like Planner Agent offer autonomous plan generation and execution. Filters like Prompt Enhancer improve prompt quality. Installation and configuration instructions are provided for each tool and pipe.

github

: 348

ml-retreat

ML-Retreat is a comprehensive machine learning library designed to simplify and streamline the process of building and deploying machine learning models. It provides a wide range of tools and utilities for data preprocessing, model training, evaluation, and deployment. With ML-Retreat, users can easily experiment with different algorithms, hyperparameters, and feature engineering techniques to optimize their models. The library is built with a focus on scalability, performance, and ease of use, making it suitable for both beginners and experienced machine learning practitioners.

github

: 2.2k

Fast-LLM

Fast-LLM is an open-source library designed for training large language models with exceptional speed, scalability, and flexibility. Built on PyTorch and Triton, it offers optimized kernel efficiency, reduced overheads, and memory usage, making it suitable for training models of all sizes. The library supports distributed training across multiple GPUs and nodes, offers flexibility in model architectures, and is easy to use with pre-built Docker images and simple configuration. Fast-LLM is licensed under Apache 2.0, developed transparently on GitHub, and encourages contributions and collaboration from the community.

github

: 224

Awesome-AI-Security

Awesome-AI-Security is a curated list of resources for AI security, including tools, research papers, articles, and tutorials. It aims to provide a comprehensive overview of the latest developments in securing AI systems and preventing vulnerabilities. The repository covers topics such as adversarial attacks, privacy protection, model robustness, and secure deployment of AI applications. Whether you are a researcher, developer, or security professional, this collection of resources will help you stay informed and up-to-date in the rapidly evolving field of AI security.

github

: 98

AI-and-competition

This repository provides baselines for various competitions, a few top solutions for some competitions, and independent deep learning projects. Baselines serve as entry guides for competitions, suitable for beginners to make their first submission. Top solutions are more complex and refined versions of baselines, with limited quantity but enhanced quality. The repository is maintained by a single author, yunsuxiaozi, offering code improvements and annotations for better understanding. Users can support the repository by learning from it and providing feedback.

github

: 51

God-Level-AI

A drill of scientific methods, processes, algorithms, and systems to build stories & models. An in-depth learning resource for humans. This repository is designed for individuals aiming to excel in the field of Data and AI, providing video sessions and text content for learning. It caters to those in leadership positions, professionals, and students, emphasizing the need for dedicated effort to achieve excellence in the tech field. The content covers various topics with a focus on practical application.

github

: 3.5k

awesome-LLM-resources

This repository is a curated list of resources for learning and working with Large Language Models (LLMs). It includes a collection of articles, tutorials, tools, datasets, and research papers related to LLMs such as GPT-3, BERT, and Transformer models. Whether you are a researcher, developer, or enthusiast interested in natural language processing and artificial intelligence, this repository provides valuable resources to help you understand, implement, and experiment with LLMs.

github

: 6.2k

RAG-To-Know

RAG-To-Know is a versatile tool for knowledge extraction and summarization. It leverages the RAG (Retrieval-Augmented Generation) framework to provide a seamless way to retrieve and summarize information from various sources. With RAG-To-Know, users can easily extract key insights and generate concise summaries from large volumes of text data. The tool is designed to streamline the process of information retrieval and summarization, making it ideal for researchers, students, journalists, and anyone looking to quickly grasp the essence of complex information.

github

: 58

mcp-fundamentals

The mcp-fundamentals repository is a collection of fundamental concepts and examples related to microservices, cloud computing, and DevOps. It covers topics such as containerization, orchestration, CI/CD pipelines, and infrastructure as code. The repository provides hands-on exercises and code samples to help users understand and apply these concepts in real-world scenarios. Whether you are a beginner looking to learn the basics or an experienced professional seeking to refresh your knowledge, mcp-fundamentals has something for everyone.

github

: 100

cs-self-learning

This repository serves as an archive for computer science learning notes, codes, and materials. It covers a wide range of topics including basic knowledge, AI, backend & big data, tools, and other related areas. The content is organized into sections and subsections for easy navigation and reference. Users can find learning resources, programming practices, and tutorials on various subjects such as languages, data structures & algorithms, AI, frameworks, databases, development tools, and more. The repository aims to support self-learning and skill development in the field of computer science.

github

: 53

meeting-minutes

An open-source AI assistant for taking meeting notes that captures live meeting audio, transcribes it in real-time, and generates summaries while ensuring user privacy. Perfect for teams to focus on discussions while automatically capturing and organizing meeting content without external servers or complex infrastructure. Features include modern UI, real-time audio capture, speaker diarization, local processing for privacy, and more. The tool also offers a Rust-based implementation for better performance and native integration, with features like live transcription, speaker diarization, and a rich text editor for notes. Future plans include database connection for saving meeting minutes, improving summarization quality, and adding download options for meeting transcriptions and summaries. The backend supports multiple LLM providers through a unified interface, with configurations for Anthropic, Groq, and Ollama models. System architecture includes core components like audio capture service, transcription engine, LLM orchestrator, data services, and API layer. Prerequisites for setup include Node.js, Python, FFmpeg, and Rust. Development guidelines emphasize project structure, testing, documentation, type hints, and ESLint configuration. Contributions are welcome under the MIT License.

github

: 7.6k

Awesome-Efficient-MoE

Awesome Efficient MoE is a GitHub repository that provides an implementation of Mixture of Experts (MoE) models for efficient deep learning. The repository includes code for training and using MoE models, which are neural network architectures that combine multiple expert networks to improve performance on complex tasks. MoE models are particularly useful for handling diverse data distributions and capturing complex patterns in data. The implementation in this repository is designed to be efficient and scalable, making it suitable for training large-scale MoE models on modern hardware. The code is well-documented and easy to use, making it accessible for researchers and practitioners interested in leveraging MoE models for their deep learning projects.

github

: 131

For similar tasks

Awesome-LLM-Safety

github

: 1.6k

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github

: 980

agentcloud

AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

github

: 583

oss-fuzz-gen

This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.

github

: 1.2k

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github

: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

github

: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

github

: 2.9k

Azure-Analytics-and-AI-Engagement

The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

github

: 136