Open-Prompt-Injection

Open-Prompt-Injection

Prompt injection attacks and defenses in LLM-integrated applications

Stars: 90

Visit
 screenshot

OpenPromptInjection is an open-source toolkit for attacks and defenses in LLM-integrated applications, enabling easy implementation, evaluation, and extension of attacks, defenses, and LLMs. It supports various attack and defense strategies, including prompt injection, paraphrasing, retokenization, data prompt isolation, instructional prevention, sandwich prevention, perplexity-based detection, LLM-based detection, response-based detection, and know-answer detection. Users can create models, tasks, and apps to evaluate different scenarios. The toolkit currently supports PaLM2 and provides a demo for querying models with prompts. Users can also evaluate ASV for different scenarios by injecting tasks and querying models with attacked data prompts.

README:

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

This repo contains OpenPromptInjection, which is an open-source toolkit for attacks and defenses in LLM-integrated applications, which enables easy implementation, evaluation, and extension of attacks, defenses, and LLMs.

If you use this code, please cite the following paper:

@inproceedings{liu2024promptinjection,
  title={Formalizing and Benchmarking Prompt Injection Attacks and Defenses},
  author={Yupei Liu and Yuqi Jia and Runpeng Geng and Jinyuan Jia and Neil Zhenqiang Gong},
  booktitle={USENIX Security Symposium},
  year={2024}
}

Illustration of prompt injection attacks to LLM-integrated Apps

Required Python packages

python 3.9.0, scipy, numpy, torch, tqdm, datasets, rouge 1.0.1, google-generativeai

Implemented attacks, defenses, and LLMs

For attacks, clients can use one of the following key words: naive, escape, ignore, fake_comp, and combine. Each of they corresponds one attack strategy mentioned in the paper.

For defenses, specifying the following key words when creating the app:

  1. By default, "no" is used, meaning that there is no defense used.
  2. Paraphrasing: "paraphrasing"
  3. Retokenization: "retokenization"
  4. Data prompt isolation: "delimiters", "xml", or "random_seq"
  5. Instructional prevention: "instructional"
  6. Sandwich prevention: "sandwich"
  7. Perplexity-based detection: use "ppl-[window_size]-[threshold]". When this is for non-windowed PPL detection, use "ppl-all-[threshold]". For example, "ppl-all-3.0" means the PPL detector without using windows when the threshold is 3.0. Another example is that "ppl-5-3.5" means to use a windowed PPL detector with threshold being 3.5.
  8. LLM-based detection: "llm-based"
  9. Response-based detection: "response-based"
  10. Know-answer detection: "known-answer"

Clients are recommended to navigate to ./configs/model_configs/ to check the supported LLMs. Clients should also enter their own PaLM2 API keys in the corresponding areas in the model config. Supports for other models will be added later.

Usage

Caveat: this repo only supports PaLM2 for now. More models will be added into this repo later.

A simple demo

Before you start, go to './configs/model_configs/palm2_config.json' and replace the API keys with your real keys. Please refer to Google's official site for how to obtain an API key for PaLM2. For Meta's Llama models and OpenAI's GPT models, please also refer to their websites for registration details.

The following code snippet creates a model and queries the model with the prompt "Write a poem about monkeys":

import OpenPromptInjection as PI
from OpenPromptInjection.utils import open_config

model_config_path = './configs/model_configs/palm2_config.json'
model_config = open_config(config_path=model_config_path)
model = PI.create_model(config=model_config)
model.print_model_info()

msg = "Write a poem about monkeys"
model.query(msg)

Combined attack

The following code snippet evaluates the ASV of the scenario where the target task is sentiment analysis (i.e., the target data is sst2), the injected task is spam detection (i.e., the injected data is spam detection), the model is PaLM2, and no defense is applied:

import OpenPromptInjection as PI
from OpenPromptInjection.utils import open_config

# Create the target task
target_task = PI.create_task(open_config(config_path='./configs/task_configs/sst2_config.json'), 100)

# Create the model
model_config = open_config(config_path='./configs/model_configs/palm2_config.json')
model = PI.create_model(config=model_config)

# Create the injected task
inject_task = PI.create_task(open_config(config_path='./configs/task_configs/sms_spam_config.json'), 100, for_injection=True)
attacker = PI.create_attacker('combine', inject_task)

# Create the LLM-integrated App
target_app = PI.create_app(target_task, model, defense='no')

# Queries the model with the attacked data prompt and save the responses
attack_responses = list()
for i, (data_prompt, ground_truth_label) in enumerate(target_app):
    data_prompt_after_attack = attacker.inject(data_prompt, i, target_task=target_task.task)
    response = target_app.query(data_prompt_after_attack, verbose=1, idx=i, total=len(target_app))
    attack_responses.append(response)
    if i % 2 == 0:
        time.sleep(1)

# Create an evaluator to calculate the ASV
evaluator = PI.create_evaluator(
    None, 
    target_task=target_task,
    None, 
    injected_task=attacker.task,
    attack_responses=attack_responses
)

print(f"ASV = {evaluator.asv}")

To evaluate another defense (or attack strategy, target task, etc.), clients can change the config files and the path passed into the factory methods.

To run the experiments reported in the paper, please execute:

python3 run.py

after changing the configurations in the "run.py" file. This script will call "main.py" which is the main file for the experiments. Clients can also feel free to check how "main.py" uses the major classes, factory methods, and utils for their own use cases.

For Tasks:

Click tags to check more tools for each tasks

For Jobs:

Alternative AI tools for Open-Prompt-Injection

Similar Open Source Tools

For similar tasks

For similar jobs