awesome-llm-security

A curation of awesome tools, documents and projects about LLM Security.

Stars: 777

Awesome LLM Security is a curated collection of tools, documents, and projects related to Large Language Model (LLM) security. It covers various aspects of LLM security including white-box, black-box, and backdoor attacks, defense mechanisms, platform security, and surveys. The repository provides resources for researchers and practitioners interested in understanding and safeguarding LLMs against adversarial attacks. It also includes a list of tools specifically designed for testing and enhancing LLM security.

README:

Awesome LLM Security

Contributions are always welcome. Please read the Contribution Guidelines before contributing.

Papers

White-box attack

"Visual Adversarial Examples Jailbreak Large Language Models", 2023-06, AAAI(Oral) 24, multi-modal, [paper] [repo]
"Are aligned neural networks adversarially aligned?", 2023-06, NeurIPS(Poster) 23, multi-modal, [paper]
"(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs", 2023-07, multi-modal, [paper]
"Universal and Transferable Adversarial Attacks on Aligned Language Models", 2023-07, transfer, [paper] [repo] [page]
"Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models", 2023-07, multi-modal, [paper]
"Image Hijacking: Adversarial Images can Control Generative Models at Runtime", 2023-09, multi-modal, [paper] [repo] [site]
"Weak-to-Strong Jailbreaking on Large Language Models", 2024-04, token-prob, [paper] [repo]

Black-box attack

"Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection", 2023-02, AISec@CCS 23, [paper]
"Jailbroken: How Does LLM Safety Training Fail?", 2023-07, NeurIPS(Oral) 23, [paper]
"Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models", 2023-07, [paper] [repo]
"Effective Prompt Extraction from Language Models", 2023-07, prompt-extraction, [paper]
"Multi-step Jailbreaking Privacy Attacks on ChatGPT", 2023-04, EMNLP 23, privacy, [paper]
"LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?", 2023-07, [paper]
"Jailbreaking chatgpt via prompt engineering: An empirical study", 2023-05, [paper]
"Prompt Injection attack against LLM-integrated Applications", 2023-06, [paper] [repo]
"MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots", 2023-07, time-side-channel, [paper]
"GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher", 2023-08, ICLR 24, cipher, [paper] [repo]
"Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities", 2023-08, [paper]
"Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs", 2023-08, [paper] [repo] [dataset]
"Detecting Language Model Attacks with Perplexity", 2023-08, [paper]
"Open Sesame! Universal Black Box Jailbreaking of Large Language Models", 2023-09, gene-algorithm, [paper]
"Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!", 2023-10, ICLR(oral) 24, [paper] [repo] [site] [dataset]
"AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models", 2023-10, ICLR(poster) 24, gene-algorithm, new-criterion, [paper]
"Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations", 2023-10, CoRR 23, ICL, [paper]
"Multilingual Jailbreak Challenges in Large Language Models", 2023-10, ICLR(poster) 24, [paper] [repo]
"Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation", 2023-11, SoLaR(poster) 24, [paper]
"DeepInception: Hypnotize Large Language Model to Be Jailbreaker", 2023-11, [paper] [repo] [site]
"A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily", 2023-11, NAACL 24, [paper] [repo]
"AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models", 2023-10, [paper]
"Language Model Inversion", 2023-11, ICLR(poster) 24, [paper] [repo]
"An LLM can Fool Itself: A Prompt-Based Adversarial Attack", 2023-10, ICLR(poster) 24, [paper] [repo]
"GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts", 2023-09, [paper] [repo] [site]
"Many-shot Jailbreaking", 2024-04, [paper]
"Rethinking How to Evaluate Language Model Jailbreak", 2024-04, [paper] [repo]

Backdoor attack

"BITE: Textual Backdoor Attacks with Iterative Trigger Injection", 2022-05, ACL 23, defense, [paper]
"Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models", 2023-05, EMNLP 23, [paper]
"Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection", 2023-07, NAACL 24, [paper] [repo] [site]

Defense

"Baseline Defenses for Adversarial Attacks Against Aligned Language Models", 2023-09, [paper] [repo]
"LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked", 2023-08, ICLR 24 Tiny Paper, self-filtered, [paper] [repo] [site]
"Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM", 2023-09, random-mask-filter, [paper]
"Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models", 2023-12, [paper] [repo]
"AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks", 2024-03, [paper] [repo]
"Protecting Your LLMs with Information Bottleneck", 2024-04, [paper] [repo]
"PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition", 2024-05, ICML 24, [paper] [repo]
"Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs", 2024-06, [paper]

Platform Security

"LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins", 2023-09, [paper] [repo]

Survey

"Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks", 2023-10, ACL 24, [paper]
"Security and Privacy Challenges of Large Language Models: A Survey", 2024-02, [paper]
"Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models", 2024-03, [paper]

Tools

Plexiglass: a security toolbox for testing and safeguarding LLMs
PurpleLlama: a set of tools to assess and improve LLM security
Rebuff: a self-hardening prompt injection detector
Garak: an LLM vulnerability scanner
LLMFuzzer: a fuzzing framework for LLMs
LLM Guard: a security toolkit for LLM interactions
Vigil: an LLM prompt injection detection toolkit
jailbreak-evaluation: an easy-to-use Python package for language model jailbreak evaluation
Prompt Fuzzer: the open-source tool to help you harden your GenAI applications

Articles

Other Awesome Projects

0din GenAI Bug Bounty from Mozilla (https://0din.ai): The 0Day Investigative Network is a bug bounty program focusing on flaws within GenAI models. Vulnerability classes include Prompt Injection, Training Data Poisoning, DoS, and more.
Gandalf: a prompt injection wargame
LangChain vulnerable to code injection - CVE-2023-29374
Jailbreak Chat
Adversarial Prompting
Epivolis: a prompt injection aware chatbot designed to mitigate adversarial efforts
LLM Security Problems at DEFCON31 Quals: the world's top security competition
PromptBounty.io
PALLMs (Payloads for Attacking Large Language Models)

Other Useful Resources

Twitter: @llm_sec
Blog: LLM Security authored by @llm_sec
Blog: Embrace The Red
Blog: Kai's Blog
Newsletter: AI safety takes
Newsletter & Blog: Hackstery

For Tasks:

detect adversarial attacks, evaluate LLM vulnerabilities, develop defense strategies, conduct security assessments, mitigate prompt injection

For Jobs:

security analyst, AI researcher, cybersecurity consultant, machine learning engineer, data scientist

Alternative AI tools for awesome-llm-security

Similar Open Source Tools

Efficient-LLMs-Survey

This repository provides a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from the **model-centric**, **data-centric**, and **framework-centric** perspectives, respectively. We hope our survey and this GitHub repository can serve as valuable resources to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.

Stars: 1.1k

Awesome-Segment-Anything

Awesome-Segment-Anything is a curated list of papers, projects, and other resources related to the Segment Anything Model (SAM) for image segmentation.

Stars: 926

Awesome-TimeSeries-SpatioTemporal-LM-LLM

Awesome-TimeSeries-SpatioTemporal-LM-LLM is a curated list of Large (Language) Models and Foundation Models for Temporal Data, including Time Series, Spatio-temporal, and Event Data. The repository aims to summarize recent advances in Large Models and Foundation Models for Time Series and Spatio-Temporal Data with resources such as papers, code, and data. It covers various applications like General Time Series Analysis, Transportation, Finance, Healthcare, Event Analysis, Climate, Video Data, and more. The repository also includes related resources, surveys, and papers on Large Language Models, Foundation Models, and their applications in AIOps.

Stars: 944

Awesome_Mamba

Awesome Mamba is a curated collection of groundbreaking research papers and articles on Mamba Architecture, a pioneering framework in deep learning known for its selective state spaces and efficiency in processing complex data structures. The repository offers a comprehensive exploration of Mamba architecture through categorized research papers covering various domains like visual recognition, speech processing, remote sensing, video processing, activity recognition, image enhancement, medical imaging, reinforcement learning, natural language processing, 3D recognition, multi-modal understanding, time series analysis, graph neural networks, point cloud analysis, and tabular data handling.

Stars: 125

awesome-AIOps

awesome-AIOps is a curated list of academic research and industrial materials related to Artificial Intelligence for IT Operations (AIOps). It includes resources such as competitions, white papers, blogs, tutorials, benchmarks, tools, companies, academic materials, talks, workshops, papers, and courses covering various aspects of AIOps like anomaly detection, root cause analysis, incident management, microservices, dependency tracing, and more.

Stars: 163

rllm

rLLM (relationLLM) is a PyTorch library for Relational Table Learning (RTL) with LLMs. It breaks down state-of-the-art GNNs, LLMs, and TNNs as standardized modules and facilitates novel model building in a 'combine, align, and co-train' way using these modules. The library is LLM-friendly, processes various graphs as multiple tables linked by foreign keys, introduces new relational table datasets, and is supported by students and teachers from Shanghai Jiao Tong University and Tsinghua University.

Stars: 421

optscale

OptScale is an open-source FinOps and MLOps platform that provides cloud cost optimization for all types of organizations, along with MLOps capabilities such as experiment tracking, model versioning, and ML leaderboards.

Stars: 979

Awesome-LLM-Survey

This repository, Awesome-LLM-Survey, serves as a comprehensive collection of surveys related to Large Language Models (LLM). It covers various aspects of LLM, including instruction tuning, human alignment, LLM agents, hallucination, multi-modal capabilities, and more. Researchers are encouraged to contribute by updating information on their papers to benefit the LLM survey community.

Stars: 223

kan-gpt

The KAN-GPT repository is a PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling. It provides a model for generating text based on prompts, with a focus on improving performance compared to traditional MLP-GPT models. The repository includes scripts for training the model, downloading datasets, and evaluating model performance. Development tasks include integrating with other libraries, testing, and documentation.

Stars: 663

pro-chat

ProChat is a components library focused on quickly building large language model chat interfaces. It empowers developers to create rich, dynamic, and intuitive chat interfaces with features like automatic chat caching, streamlined conversations, message editing tools, auto-rendered Markdown, and programmatic controls. The tool also includes design evolution plans such as customized dialogue rendering, enhanced request parameters, personalized error handling, expanded documentation, and atomic component design.

Stars: 514

airi

Airi is a VTuber project heavily inspired by Neuro-sama. It is capable of various functions such as playing Minecraft, chatting in Telegram and Discord, audio input from browser and Discord, client side speech recognition, VRM and Live2D model support with animations, and more. The project also includes sub-projects like unspeech, hfup, Drizzle ORM driver for DuckDB WASM, and various other tools. Airi uses models like whisper-large-v3-turbo from Hugging Face and is similar to projects like z-waif, amica, eliza, AI-Waifu-Vtuber, and AIVTuber. The project acknowledges contributions from various sources and implements packages to interact with LLMs and models.

Stars: 329

arxiv-mcp-server

The ArXiv MCP Server acts as a bridge between AI assistants and arXiv's research repository, enabling AI models to search for and access papers programmatically through the Model Context Protocol (MCP). It offers features like paper search, access, listing, local storage, and research prompts. Users can install it via Smithery or manually for Claude Desktop. The server provides tools for paper search, download, listing, and reading, along with specialized prompts for paper analysis. Configuration can be done through environment variables, and testing is supported with a test suite. The tool is released under the MIT License and is developed by the Pearl Labs Team.

Stars: 125

openlrc

Open-Lyrics is a Python library that transcribes voice files using faster-whisper and translates/polishes the resulting text into `.lrc` files in the desired language using LLM, e.g. OpenAI-GPT, Anthropic-Claude. It offers well preprocessed audio to reduce hallucination and context-aware translation to improve translation quality. Users can install the library from PyPI or GitHub and follow the installation steps to set up the environment. The tool supports GUI usage and provides Python code examples for transcription and translation tasks. It also includes features like utilizing context and glossary for translation enhancement, pricing information for different models, and a list of todo tasks for future improvements.

Stars: 476

functionary

Functionary is a language model that interprets and executes functions/plugins. It determines when to execute functions, whether in parallel or serially, and understands their outputs. Function definitions are given as JSON Schema Objects, similar to OpenAI GPT function calls. It offers documentation and examples on functionary.meetkai.com. The newest model, meetkai/functionary-medium-v3.1, is ranked 2nd in the Berkeley Function-Calling Leaderboard. Functionary supports models with different context lengths and capabilities for function calling and code interpretation. It also provides grammar sampling for accurate function and parameter names. Users can deploy Functionary models serverlessly using Modal.com.

Stars: 1.5k

Remote-MCP

Stars: 83

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

Stars: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

Stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

Stars: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

Stars: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

Stars: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
* Self-contained, with no need for a DBMS or cloud service.
* OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
* Supports consumer-grade GPUs.

Stars: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

Stars: 675
