Awesome-LLM-Safety
A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights into the safety implications, challenges, and advancements surrounding these powerful models.
Stars: 906
Welcome to our Awesome-llm-safety repository! We've curated a collection of the latest, most comprehensive, and most valuable resources on large language model safety (llm-safety). But we don't stop there; included are also relevant talks, tutorials, conferences, news, and articles. Our repository is constantly updated to ensure you have the most current information at your fingertips.
README:
English | 中文
Welcome to our Awesome-llm-safety repository! 🥰🥰🥰
🔥 News
🧑‍💻 Our Work
We've curated a collection of the latest 😋, most comprehensive 😎, and most valuable 🤩 resources on large language model safety (llm-safety). But we don't stop there; included are also relevant talks, tutorials, conferences, news, and articles. Our repository is constantly updated to ensure you have the most current information at your fingertips.
If a resource is relevant to multiple subcategories, we place it under each applicable section. For instance, the "Awesome-LLM-Safety" repository is listed under every subcategory to which it pertains 🤩!
✔️ Perfect for Most Readers
- For beginners curious about llm-safety, our repository serves as a compass for grasping the big picture and diving into the details. The classic and influential papers retained in the README offer beginner-friendly navigation through the field's most interesting directions;
- For seasoned researchers, this repository is a tool to keep you informed and to fill gaps in your knowledge. Within each subtopic, we diligently add the latest work and continuously backfill earlier publications. Our thorough compilation and careful selection save you time.
🧭 How to Use this Guide
- Quick Start: In the README, you can find a curated selection of resources sorted by date, along with links for further reading.
- In-Depth Exploration: If you have a special interest in a particular subtopic, delve into the "subtopic" folder for more. Each item, be it an article or piece of news, comes with a brief introduction, allowing researchers to swiftly zero in on relevant content.
💼 How to Contribute
If you have completed an insightful piece of work or carefully compiled the papers from a conference, we would love to add it to the repository.
- For individual papers, you can raise an issue, and we will quickly add your paper under the corresponding subtopic.
- If you have compiled a collection of papers for a conference, you are welcome to submit a pull request directly. We would greatly appreciate your contribution. Please note that these pull requests need to be consistent with our existing format.
📜Advertisement
🌱 If you would like more people to read your recent insightful work, please contact me via email. I can offer you a promotional spot here for up to one month.
Let's start the LLM Safety tutorial!
🛡️Awesome LLM-Safety🛡️
- 🤗Introduction
- 🚀Table of Contents
- 🔐Security & Discussion
- 🔏Privacy
- 📰Truthfulness & Misinformation
- 😈JailBreak & Attacks
- 🛡️Defenses & Mitigation
- 💯Datasets & Benchmark
- 🧑‍🏫 Scholars 👩‍🏫
- 🧑‍🎓 Author
Date | Title | Authors | Publication |
---|---|---|---|
2024/5/20 | Managing extreme AI risks amid rapid progress | Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann | Science |
Date | Institute | Publication | Paper |
---|---|---|---|
20.10 | Facebook AI Research | arxiv | Recipes for Safety in Open-domain Chatbots |
22.03 | OpenAI | NIPS2022 | Training language models to follow instructions with human feedback |
23.07 | UC Berkeley | NIPS2023 | Jailbroken: How Does LLM Safety Training Fail? |
23.12 | OpenAI | OpenAI | Practices for Governing Agentic AI Systems |
Date | Type | Title | URL |
---|---|---|---|
22.02 | Toxicity Detection API | Perspective API | link, paper |
23.07 | Repository | Awesome LLM Security | link |
23.10 | Tutorials | Awesome-LLM-Safety | link |
24.01 | Tutorials | Awesome-LM-SSP | link |
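As a quick illustration of the Perspective API row above, here is a minimal toxicity check. It assumes you have an API key from Google Cloud; the endpoint and request shape follow the public AnalyzeComment quickstart, so double-check the current docs before relying on it:

```python
# Minimal toxicity check against the Perspective API.
# Assumes a Google Cloud API key; endpoint and payload follow the public
# AnalyzeComment quickstart, but verify against the current documentation.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

payload = {
    "comment": {"text": "You are a wonderful person."},
    "requestedAttributes": {"TOXICITY": {}},
}

resp = requests.post(URL, json=payload, timeout=10)
resp.raise_for_status()
score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"Toxicity score: {score:.3f}")  # 0.0 (benign) to 1.0 (toxic)
```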
👉Latest&Comprehensive Security Paper
Date | Institute | Publication | Paper |
---|---|---|---|
19.12 | Microsoft | CCS2020 | Analyzing Information Leakage of Updates to Natural Language Models |
21.07 | Google Research | ACL2022 | Deduplicating Training Data Makes Language Models Better |
21.10 | Stanford | ICLR2022 | Large language models can be strong differentially private learners |
22.02 | Google Research | ICLR2023 | Quantifying Memorization Across Neural Language Models |
22.02 | UNC Chapel Hill | ICML2022 | Deduplicating Training Data Mitigates Privacy Risks in Language Models |
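The two deduplication papers above show that removing repeated training text both improves model quality and reduces memorization-based privacy leakage. As a toy illustration of the idea only (the papers themselves use suffix arrays for exact substrings and MinHash for near-duplicates, not simple hashing):

```python
# Toy illustration of training-data deduplication: drop exact duplicate
# documents via content hashing. Real pipelines use stronger machinery
# (suffix arrays, MinHash); this sketch only conveys the basic idea.
import hashlib

def dedup_exact(docs: list[str]) -> list[str]:
    seen: set[str] = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["The cat sat.", "the cat sat.", "A different sentence."]
print(dedup_exact(corpus))  # ['The cat sat.', 'A different sentence.']
```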
Date | Type | Title | URL |
---|---|---|---|
23.10 | Tutorials | Awesome-LLM-Safety | link |
24.01 | Tutorials | Awesome-LM-SSP | link |
👉Latest&Comprehensive Privacy Paper
Date | Institute | Publication | Paper |
---|---|---|---|
21.09 | University of Oxford | ACL2022 | TruthfulQA: Measuring How Models Mimic Human Falsehoods |
23.11 | Harbin Institute of Technology | arxiv | A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions |
23.11 | Arizona State University | arxiv | Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey |
Date | Type | Title | URL |
---|---|---|---|
23.07 | Repository | llm-hallucination-survey | link |
23.10 | Repository | LLM-Factuality-Survey | link |
23.10 | Tutorials | Awesome-LLM-Safety | link |
👉Latest&Comprehensive Truthfulness&Misinformation Paper
Date | Institute | Publication | Paper |
---|---|---|---|
20.12 | Google | USENIX Security 2021 | Extracting Training Data from Large Language Models |
22.11 | AE Studio | NIPS2022 (ML Safety Workshop) | Ignore Previous Prompt: Attack Techniques For Language Models |
23.06 | Google DeepMind | arxiv | Are aligned neural networks adversarially aligned? |
23.07 | CMU | arxiv | Universal and Transferable Adversarial Attacks on Aligned Language Models |
23.10 | University of Pennsylvania | arxiv | Jailbreaking Black Box Large Language Models in Twenty Queries |
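To make the black-box setting concrete, below is a minimal sketch of the attacker-judge refinement loop described in the PAIR paper above ("...in Twenty Queries"). All three helper functions are hypothetical stand-ins for real model calls; only the control flow mirrors the paper:

```python
# Sketch of the iterative black-box jailbreak loop from the PAIR paper.
# query_target, judge_score, and query_attacker are hypothetical stand-ins
# for real model calls; only the control flow follows the paper.

def query_target(prompt: str) -> str:
    """Stand-in for a black-box call to the target LLM."""
    return "I'm sorry, I can't help with that."

def judge_score(goal: str, response: str) -> int:
    """Stand-in for a judge LLM rating the response from 1 (refusal) to 10."""
    return 1

def query_attacker(goal: str, history: list[tuple[str, str]]) -> str:
    """Stand-in for an attacker LLM proposing a refined prompt."""
    return f"Rewrite attempt {len(history)} for goal: {goal}"

def pair_attack(goal: str, max_queries: int = 20) -> str | None:
    """Refine a prompt until the judge deems the target's response unsafe."""
    prompt = goal
    history: list[tuple[str, str]] = []
    for _ in range(max_queries):
        response = query_target(prompt)         # one black-box query
        if judge_score(goal, response) >= 10:   # judge says jailbreak succeeded
            return prompt
        history.append((prompt, response))
        prompt = query_attacker(goal, history)  # attacker refines the prompt
    return None                                 # query budget exhausted
```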
Date | Type | Title | URL |
---|---|---|---|
23.01 | Community | Reddit/ChatGPTJailbreak | link |
23.02 | Resource&Tutorials | Latest Jailbreak Prompts | link |
23.10 | Tutorials | Awesome-LLM-Safety | link |
23.10 | Article | Adversarial Attacks on LLMs (Author: Lilian Weng) | link |
23.11 | Video | [1hr Talk] Intro to Large Language Models, from 45:45 (Author: Andrej Karpathy) | link |
24.09 | Repository | awesome_LLM-harmful-fine-tuning-papers | link |
12.10 | Resource | Jailbreak Communities | link |
12.10 | Article | Jailbreak Techniques and Safeguards | link |
👉Latest&Comprehensive JailBreak & Attacks Paper
Date | Institute | Publication | Paper |
---|---|---|---|
21.07 | Google Research | ACL2022 | Deduplicating Training Data Makes Language Models Better |
22.04 | Anthropic | arxiv | Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback |
Date | Type | Title | URL |
---|---|---|---|
23.10 | Tutorials | Awesome-LLM-Safety | link |
👉Latest&Comprehensive Defenses Paper
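A recurring mitigation pattern in the defenses literature is to screen user input with a moderation classifier before it ever reaches the chat model. A minimal sketch using the OpenAI moderations endpoint as one concrete option; the calls follow the openai-python docs, though model names may change:

```python
# Input-filtering mitigation: run a moderation classifier on the user
# message first, and only forward it to the chat model if it is not flagged.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def safe_chat(user_message: str) -> str:
    mod = client.moderations.create(
        model="omni-moderation-latest",
        input=user_message,
    )
    if mod.results[0].flagged:               # classifier says unsafe
        return "Sorry, I can't help with that request."
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_message}],
    )
    return chat.choices[0].message.content

print(safe_chat("How do I bake sourdough bread?"))
```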
Date | Institute | Publication | Paper |
---|---|---|---|
20.09 | University of Washington | EMNLP2020 (Findings) | RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models |
21.09 | University of Oxford | ACL2022 | TruthfulQA: Measuring How Models Mimic Human Falsehoods |
22.03 | MIT | ACL2022 | ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection |
Date | Type | Title | URL |
---|---|---|---|
23.10 | Tutorials | Awesome-LLM-Safety | link |
- Toxicity - RealToxicityPrompts dataset
- Truthfulness - TruthfulQA dataset
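Both benchmark datasets are available on the Hugging Face Hub; a loading sketch, assuming the commonly used Hub IDs `truthful_qa` and `allenai/real-toxicity-prompts` (check the Hub if they have moved):

```python
# Load the two benchmark datasets above via Hugging Face `datasets`.
from datasets import load_dataset

# TruthfulQA ships only a validation split in its "generation" config.
truthfulqa = load_dataset("truthful_qa", "generation", split="validation")
print(truthfulqa[0]["question"])   # a common-misconception question

# Each RealToxicityPrompts record carries prompt text plus toxicity scores.
rtp = load_dataset("allenai/real-toxicity-prompts", split="train")
print(rtp[0]["prompt"]["text"])
```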
👉Latest&Comprehensive Datasets & Benchmark Paper
🤗If you have any questions, please contact our authors!🤗
✉️: ydyjya ➡️ [email protected]
💬: LLM Safety Discussion
Similar Open Source Tools
CogVLM2
CogVLM2 is a new generation of open source models that offer significant improvements on benchmarks such as TextVQA and DocVQA. It supports 8K context length, image resolutions up to 1344 × 1344, and both Chinese and English. The project provides basic calling methods, fine-tuning examples, and OpenAI-API-format calling examples to help developers get started quickly.
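Since the description mentions OpenAI-API-format calling examples, a generic client for such a locally hosted server might look like the sketch below; the base_url and model id are placeholders, not values taken from the CogVLM2 repo:

```python
# Generic client for an OpenAI-API-compatible local server.
# base_url and model id are placeholders for whatever the demo server exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="cogvlm2",  # placeholder model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```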
Model-References
The 'Model-References' repository contains examples for training and inference using Intel Gaudi AI Accelerator. It includes models for computer vision, natural language processing, audio, generative models, MLPerf™ training, and MLPerf™ inference. The repository provides performance data and model validation information for various frameworks like PyTorch. Users can find examples of popular models like ResNet, BERT, and Stable Diffusion optimized for Intel Gaudi AI accelerator.
EAGLE
Eagle is a family of vision-centric, high-resolution multimodal LLMs that enhance multimodal perception using a mix of vision encoders and input resolutions. The model features channel-concatenation-based fusion of vision experts with different architectures and knowledge, supporting input resolutions over 1K. It excels at resolution-sensitive tasks such as optical character recognition and document understanding.
Awesome-Model-Merging-Methods-Theories-Applications
A comprehensive repository focusing on 'Model Merging in LLMs, MLLMs, and Beyond', providing an exhaustive overview of model merging methods, theories, applications, and future research directions. The repository covers various advanced methods, applications in foundation models, different machine learning subfields, and tasks like pre-merging methods, architecture transformation, weight alignment, basic merging methods, and more.
kangaroo
Kangaroo is an AI-powered SQL client and admin tool for popular databases like SQLite, MySQL, PostgreSQL, etc. It supports various functionalities such as table design, query, model, sync, export/import, and more. The tool is designed to be comfortable, fun, and developer-friendly, with features like code intellisense and autocomplete. Kangaroo aims to provide a seamless experience for database management across different operating systems.
ai-samples
AI Samples for .NET is a repository containing various samples demonstrating how to use AI in .NET applications. It provides quickstarts using Semantic Kernel and Azure OpenAI SDK, covers LLM Core Concepts, End to End Examples, Local Models, Local Embedding Models, Tokenizers, Vector Databases, and Reference Examples. The repository showcases different AI-related projects and tools for developers to explore and learn from.
sane-airscan
sane-airscan is a SANE backend that supports driverless scanning using Apple AirScan (eSCL) and Microsoft WSD protocols. It automatically chooses between the two protocols and has been tested with various devices from Brother, Canon, Dell, Kyocera, Lexmark, Epson, HP, OKI, Panasonic, Pantum, Ricoh, Samsung, and Xerox. The backend allows for automatic and manual device discovery and configuration, supports scanning from platen and ADF in color and grayscale modes, and works with both IPv4 and IPv6. It does not require installation and does not conflict with vendor-provided proprietary software.
llm-awq
AWQ (Activation-aware Weight Quantization) is a tool designed for efficient and accurate low-bit weight quantization (INT3/4) for Large Language Models (LLMs). It supports instruction-tuned models and multi-modal LMs, providing features such as AWQ search for accurate quantization, pre-computed AWQ model zoo for various LLMs, memory-efficient 4-bit linear in PyTorch, and efficient CUDA kernel implementation for fast inference. The tool enables users to run large models on resource-constrained edge platforms, delivering more efficient responses with LLM/VLM chatbots through 4-bit inference.
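As a hedged sketch of what 4-bit AWQ quantization looks like in practice: the snippet below follows the quickstart of the community AutoAWQ package (a packaging of the AWQ method); the upstream llm-awq repo ships its own scripts, so treat the exact API as an assumption to verify against current docs:

```python
# Quantize a small model to INT4 with AWQ via the community AutoAWQ package.
# The upstream llm-awq repo uses its own scripts; this API is AutoAWQ's.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "facebook/opt-125m"  # small model for illustration
quant_config = {
    "zero_point": True,    # asymmetric quantization
    "q_group_size": 128,   # per-group scaling factors
    "w_bit": 4,            # INT4 weights
    "version": "GEMM",
}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)  # runs the AWQ search
model.save_quantized("opt-125m-awq")
```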
nntrainer
NNtrainer is a software framework for training neural network models on devices with limited resources. It enables on-device fine-tuning of neural networks using user data for personalization. NNtrainer supports various machine learning algorithms and provides examples for tasks such as few-shot learning, ResNet, VGG, and product rating. It is optimized for embedded devices and utilizes CBLAS and CUBLAS for accelerated calculations. NNtrainer is open source and released under the Apache License version 2.0.
ml-road-map
The Machine Learning Road Map is a comprehensive guide designed to take individuals from various levels of machine learning knowledge to a basic understanding of machine learning principles using high-quality, free resources. It aims to simplify the complex and rapidly growing field of machine learning by providing a structured roadmap for learning. The guide emphasizes the importance of understanding AI for everyone, the need for patience in learning machine learning due to its complexity, and the value of learning from experts in the field. It covers five different paths to learning about machine learning, catering to consumers, aspiring AI researchers, ML engineers, developers interested in building ML applications, and companies looking to implement AI solutions.
awesome-mobile-llm
Awesome Mobile LLMs is a curated list of Large Language Models (LLMs) and related studies focused on mobile and embedded hardware. The repository includes information on various LLM models, deployment frameworks, benchmarking efforts, applications, multimodal LLMs, surveys on efficient LLMs, training LLMs on device, mobile-related use-cases, industry announcements, and related repositories. It aims to be a valuable resource for researchers, engineers, and practitioners interested in mobile LLMs.
RAGHub
RAGHub is a community-driven project focused on cataloging new and emerging frameworks, projects, and resources in the Retrieval-Augmented Generation (RAG) ecosystem. It aims to help users stay ahead of changes in the field by providing a platform for the latest innovations in RAG. The repository includes information on RAG frameworks, evaluation frameworks, optimization frameworks, citation frameworks, engines, search reranker frameworks, projects, resources, and real-world use cases across industries and professions.
TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs. It proposes a set of principles for trustworthy LLMs spanning eight dimensions, establishes a benchmark across six of them (truthfulness, safety, fairness, robustness, privacy, and machine ethics), and presents an evaluation of 16 mainstream LLMs on over 30 datasets, closing with a discussion of open challenges and future directions. The documentation explains how to use the trustllm Python package to assess the trustworthiness of your own LLM quickly. For more details about TrustLLM, please refer to the project website.
auto-dev-vscode
AutoDev for VSCode is an AI-powered coding wizard with multilingual support, auto code generation, and a bug-slaying assistant. It offers customizable prompts and features like Auto Dev/Testing/Document/Agent. The tool aims to enhance coding productivity and efficiency by providing intelligent assistance and automation capabilities within the Visual Studio Code environment.
2024-AICS-EXP
This repository contains the complete archive of the 2024 version of the 'Intelligent Computing System' experiment at the University of Chinese Academy of Sciences. The experiment content for 2024 has undergone extensive adjustments to the knowledge system and experimental topics, including the transition from TensorFlow to PyTorch, significant modifications to previous code, and the addition of experiments with large models. The project is continuously updated in line with the course progress, currently up to the seventh experiment. Updates include the addition of experiments like YOLOv5 in Experiment 5-3, updates to theoretical teaching materials, and fixes for bugs in Experiment 6 code. The repository also includes experiment manuals, questions, and answers for various experiments, with some data sets hosted on Baidu Cloud due to size limitations on GitHub.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLMs) and benchmarks them via the `OSS-Fuzz` platform. It has successfully leveraged LLMs to generate valid fuzz targets (producing a non-zero coverage increase) for 160 C/C++ projects, with a maximum line-coverage increase of 29% over the existing human-written targets.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (containing a demo web application, Power BI reports, Synapse resources, AML Notebooks, etc.) that can be deployed in a customer's subscription using the CAPE tool within a few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.