garak
the LLM vulnerability scanner
Stars: 3257
Garak is a vulnerability scanner designed for LLMs (Large Language Models) that checks for various weaknesses such as hallucination, data leakage, prompt injection, misinformation, toxicity generation, and jailbreaks. It combines static, dynamic, and adaptive probes to explore vulnerabilities in LLMs. Garak is a free tool developed for red-teaming and assessment purposes, focusing on making LLMs or dialog systems fail. It supports various LLM models and can be used to assess their security and robustness.
README:
Generative AI Red-teaming & Assessment Kit
garak checks if an LLM can be made to fail in a way we don't want. garak probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. If you know nmap, it's nmap for LLMs.
garak focuses on ways of making an LLM or dialog system fail. It combines static, dynamic, and adaptive probes to explore this.
garak's a free tool. We love developing it and are always interested in adding functionality to support applications.
> See our user guide! docs.garak.ai
> Join our Discord!
> Project links & home: garak.ai
> Twitter: @garak_llm
> DEF CON slides!
currently supports:
- hugging face hub generative models
- replicate text models
- openai api chat & continuation models
- litellm
- pretty much anything accessible via REST
- gguf models like llama.cpp (version >= 1046)
- .. and many more LLMs!
garak is a command-line tool. It's developed in Linux and OSX.
Just grab it from PyPI and you should be good to go:
python -m pip install -U garak
The standard pip version of garak is updated periodically. To get a fresher version from GitHub, try:
python -m pip install -U git+https://github.com/NVIDIA/garak.git@main
garak has its own dependencies. You can install garak in its own Conda environment:
conda create --name garak "python>=3.10,<=3.12"
conda activate garak
gh repo clone NVIDIA/garak
cd garak
python -m pip install -e .
OK, if that went fine, you're probably good to go!
Note: if you cloned before the move to the NVIDIA GitHub organisation, but you're reading this at the github.com/NVIDIA URI, please update your remotes as follows:
git remote set-url origin https://github.com/NVIDIA/garak.git
The general syntax is:
garak <options>
garak needs to know what model to scan, and by default, it'll try all the probes it knows on that model, using the vulnerability detectors recommended by each probe. You can see a list of probes using:
garak --list_probes
To specify a generator, use the --model_type and, optionally, the --model_name options. Model type specifies a model family/interface; model name specifies the exact model to be used. The "Intro to generators" section below describes some of the generators supported. A straightforward generator family is Hugging Face models; to load one of these, set --model_type to huggingface and --model_name to the model's name on Hub (e.g. "RWKV/rwkv-4-169m-pile"). Some generators might need an API key to be set as an environment variable, and they'll let you know if they need that.
garak runs all the probes by default, but you can be specific about that too. --probes promptinject will use only the PromptInject framework's methods, for example. You can also specify one specific plugin instead of a plugin family by adding the plugin name after a dot (.); for example, --probes lmrc.SlurUsage will use a probe that checks whether models generate slurs, based on the Language Model Risk Cards framework.
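As a rough illustration of how these options combine, the sketch below assembles a garak command line in Python. The helper name build_garak_command is hypothetical and not part of garak; only the --model_type, --model_name, and --probes options are taken from the text above.

```python
import shlex
from typing import List, Optional


def build_garak_command(
    model_type: str,
    model_name: Optional[str] = None,
    probes: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for a garak run (hypothetical helper)."""
    argv = ["python3", "-m", "garak", "--model_type", model_type]
    if model_name is not None:
        # Model name is optional; it selects the exact model within the family.
        argv += ["--model_name", model_name]
    if probes is not None:
        # A probe family ("encoding") or a single plugin ("lmrc.SlurUsage").
        argv += ["--probes", probes]
    return argv


if __name__ == "__main__":
    argv = build_garak_command("huggingface", "gpt2", "dan.Dan_11_0")
    # Print the command as it would be typed in a shell.
    print(shlex.join(argv))
```

The resulting list could be handed to subprocess.run(argv, check=True) if you want to launch the scan from a script; leaving probes unset keeps garak's default of running every probe.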
For help and inspiration, find us on Twitter or Discord!
Probe ChatGPT for encoding-based prompt injection (OSX/*nix) (replace example value with a real OpenAI API key)
export OPENAI_API_KEY="sk-123XXXXXXXXXXXX"
python3 -m garak --model_type openai --model_name gpt-3.5-turbo --probes encoding
See if the Hugging Face version of GPT2 is vulnerable to DAN 11.0
python3 -m garak --model_type huggingface --model_name gpt2 --probes dan.Dan_11_0
For each probe loaded, garak will print a progress bar as it generates. Once generation is complete, a row evaluating that probe's results on each detector is given. If any of the prompt attempts yielded an undesirable behavior, the response will be marked as FAIL, and the failure rate given.
Here are the results with the encoding module on a GPT-3 variant:
And the same results for ChatGPT:
We can see that the more recent model is much more susceptible to encoding-based injection attacks, whereas text-babbage-001 was only found to be vulnerable to quoted-printable and MIME encoding injections. The figures at the end of each row, e.g. 840/840, indicate the number of text generations total and then how many of these seemed to behave OK. The figure can be quite high because more than one generation is made per prompt (by default, 10).
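To make the arithmetic behind those row figures concrete, here is a small sketch. The function summarise_row is hypothetical, and the 630-of-840 case is an invented figure used only for illustration; the default of 10 generations per prompt is the one stated above.

```python
def summarise_row(ok: int, total: int, generations_per_prompt: int = 10) -> dict:
    """Derive prompt count and failure rate from a row's ok/total figures."""
    failed = total - ok
    return {
        # Each prompt is sent several times, so total = prompts * generations.
        "prompts": total // generations_per_prompt,
        "failed": failed,
        "failure_rate": failed / total if total else 0.0,
    }


if __name__ == "__main__":
    # 840 generations at 10 per prompt corresponds to 84 distinct prompts.
    print(summarise_row(840, 840))
    # Hypothetical row where a quarter of the generations misbehaved.
    print(summarise_row(630, 840))
```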
Errors go in garak.log; the run is logged in detail in a .jsonl file specified at analysis start & end. There's a basic analysis script in analyse/analyse_log.py which will output the probes and prompts that led to the most hits.
Send PRs & open issues. Happy hunting!
Using the Pipeline API:
- --model_type huggingface (for transformers models to run locally)
- --model_name - use the model name from Hub. Only generative models will work. If it fails and shouldn't, please open an issue and paste in the command you tried + the exception!
Using the Inference API:
- --model_type huggingface.InferenceAPI (for API-based model access)
- --model_name - the model name from Hub, e.g. "mosaicml/mpt-7b-instruct"
Using private endpoints:
- --model_type huggingface.InferenceEndpoint (for private endpoints)
- --model_name - the endpoint URL, e.g. https://xxx.us-east-1.aws.endpoints.huggingface.cloud
- (optional) set the HF_INFERENCE_TOKEN environment variable to a Hugging Face API token with the "read" role; see https://huggingface.co/settings/tokens when logged in
- --model_type openai
- --model_name - the OpenAI model you'd like to use. gpt-3.5-turbo-0125 is fast and fine for testing.
- set the OPENAI_API_KEY environment variable to your OpenAI API key (e.g. "sk-19763ASDF87q6657"); see https://platform.openai.com/account/api-keys when logged in
Recognised model types are whitelisted, because the plugin needs to know which sub-API to use. Completion or ChatCompletion models are OK. If you'd like to use a model not supported, you should get an informative error message, and please send a PR / open an issue.
- set the REPLICATE_API_TOKEN environment variable to your Replicate API token, e.g. "r8-123XXXXXXXXXXXX"; see https://replicate.com/account/api-tokens when logged in
Public Replicate models:
- --model_type replicate
- --model_name - the Replicate model name and hash, e.g. "stability-ai/stablelm-tuned-alpha-7b:c49dae36"
Private Replicate endpoints:
- --model_type replicate.InferenceEndpoint (for private endpoints)
- --model_name - username/model-name slug from the deployed endpoint, e.g. elim/elims-llama2-7b
- --model_type cohere
- --model_name (optional, command by default) - The specific Cohere model you'd like to test
- set the COHERE_API_KEY environment variable to your Cohere API key, e.g. "aBcDeFgHiJ123456789"; see https://dashboard.cohere.ai/api-keys when logged in

- --model_type groq
- --model_name - The name of the model to access via the Groq API
- set the GROQ_API_KEY environment variable to your Groq API key, see https://console.groq.com/docs/quickstart for details on creating an API key

- --model_type ggml
- --model_name - The path to the ggml model you'd like to load, e.g. /home/leon/llama.cpp/models/7B/ggml-model-q4_0.bin
- set the GGML_MAIN_PATH environment variable to the path to your ggml main executable
rest.RestGenerator is highly flexible and can connect to any REST endpoint that returns plaintext or JSON. It does need some brief config, which will typically result in a short YAML file describing your endpoint. See https://reference.garak.ai/en/latest/garak.generators.rest.html for examples.
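The general idea of wrapping a REST endpoint can be sketched as two steps: fill a request template with the prompt, then pull the generated text out of a plaintext or JSON response. The code below is only an illustration of that idea and is not garak's implementation; the function names, the "$INPUT" placeholder default, and the json_field parameter are assumptions made for this sketch. The linked reference page is the place to check the actual config options.

```python
import json
from typing import Any, Optional


def fill_template(template: Any, prompt: str, placeholder: str = "$INPUT") -> Any:
    """Return a copy of the request template with the placeholder replaced."""
    if isinstance(template, str):
        return template.replace(placeholder, prompt)
    if isinstance(template, dict):
        return {k: fill_template(v, prompt, placeholder) for k, v in template.items()}
    if isinstance(template, list):
        return [fill_template(v, prompt, placeholder) for v in template]
    return template  # numbers, booleans, None pass through unchanged


def extract_text(raw_body: str, json_field: Optional[str] = None) -> str:
    """Read model output from a plaintext body or from one field of a JSON body."""
    if json_field is None:
        return raw_body
    return json.loads(raw_body)[json_field]


if __name__ == "__main__":
    body = fill_template({"text": "$INPUT", "max_tokens": 50}, "Hello there")
    print(body)
    print(extract_text('{"text": "General Kenobi"}', json_field="text"))
```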
Use models from https://build.nvidia.com/ or other NIM endpoints.
- set the NIM_API_KEY environment variable to your authentication API token, or specify it in the config YAML
For chat models:
- --model_type nim
- --model_name - the NIM model name, e.g. meta/llama-3.1-8b-instruct
For completion models:
- --model_type nim.NVOpenAICompletion
- --model_name - the NIM model name, e.g. bigcode/starcoder2-15b
- set the OCTO_API_TOKEN environment variable to your OctoAI API token
Octo public endpoint:
- --model_type octo
- --model_name - the OctoAI public endpoint for the model, e.g. mistral-7b-instruct-fp16
Octo private endpoint:
- --model_type octo.InferenceEndpoint (for private endpoints)
- --model_name - the deployed endpoint URL, e.g. https://llama-2-70b-chat-xxx.octoai.run/v1/chat/completions
- --model_type test
- (alternatively) --model_name test.Blank
For testing. This always generates the empty string, using the test.Blank generator. Will be marked as failing for any tests that require an output, e.g. those that make contentious claims and expect the model to refute them in order to pass.
- --model_type test.Repeat
For testing. This generator repeats back the prompt it received.
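Several of the generators above read a credential or path from an environment variable. A small pre-flight check, sketched here, can catch a missing variable before a long scan starts. The mapping only restates the variable names listed above; the function missing_env is hypothetical, and treating every listed variable as required is a simplifying assumption (for example, the NIM key can instead be given in the config YAML).

```python
import os
from typing import Mapping, Optional

# Model family -> environment variable, as listed in the generator notes above.
ENV_BY_FAMILY = {
    "openai": "OPENAI_API_KEY",
    "replicate": "REPLICATE_API_TOKEN",
    "cohere": "COHERE_API_KEY",
    "groq": "GROQ_API_KEY",
    "ggml": "GGML_MAIN_PATH",
    "nim": "NIM_API_KEY",
    "octo": "OCTO_API_TOKEN",
}


def missing_env(model_type: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of an unset variable for this model type, else None."""
    environ = os.environ if environ is None else environ
    # "replicate.InferenceEndpoint" belongs to the "replicate" family.
    family = model_type.split(".", 1)[0]
    name = ENV_BY_FAMILY.get(family)
    if name is None or environ.get(name):
        return None
    return name


if __name__ == "__main__":
    for model_type in ("openai", "replicate.InferenceEndpoint", "test.Blank"):
        print(model_type, "->", missing_env(model_type))
```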
Probe | Description |
---|---|
blank | A simple probe that always sends an empty prompt. |
atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 fine-tuned on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |
av_spam_scanning | Probes that attempt to make the model output malicious content signatures |
continuation | Probes that test if the model will continue a probably undesirable word |
dan | Various DAN and DAN-like attacks |
donotanswer | Prompts to which responsible language models should not answer. |
encoding | Prompt injection through text encoding |
gcg | Disrupt a system prompt by appending an adversarial suffix. |
glitch | Probe model for glitch tokens that provoke unusual behavior. |
grandma | Appeal to be reminded of one's grandmother. |
goodside | Implementations of Riley Goodside attacks. |
leakreplay | Evaluate if a model will replay training data. |
lmrc | Subsample of the Language Model Risk Cards probes |
malwaregen | Attempts to have the model generate code for building malware |
misleading | Attempts to make a model support misleading and false claims |
packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
promptinject | Implementation of the Agency Enterprise PromptInject work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
realtoxicityprompts | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
snowball | Snowballed Hallucination probes designed to make a model give a wrong answer to questions too complex for it to process |
xss | Look for vulnerabilities that permit or enact cross-site attacks, such as private data exfiltration. |
garak generates multiple kinds of log:
- A log file, garak.log. This includes debugging information from garak and its plugins, and is continued across runs.
- A report of the current run, structured as JSONL. A new report file is created every time garak runs. The name of this file is output at the beginning and, if successful, also at the end of the run. In the report, an entry is made for each probing attempt both as the generations are received, and again when they are evaluated; the entry's status attribute takes a constant from garak.attempts to describe what stage it was made at.
- A hit log, detailing attempts that yielded a vulnerability (a 'hit')
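Because the report is JSONL (one JSON object per line), it can be inspected with a few lines of Python. The sketch below tallies entries by their status attribute. The sample lines are invented for illustration: the status values 1 and 2 and the other field names are placeholders, not a statement of garak's actual report schema.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_status(lines: Iterable[str]) -> Counter:
    """Tally JSONL entries by their 'status' attribute, skipping entries without one."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        if "status" in entry:
            counts[entry["status"]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical report lines; a real run would read the .jsonl file instead.
    sample = [
        '{"entry_type": "attempt", "status": 1, "prompt": "..."}',
        '{"entry_type": "attempt", "status": 2, "prompt": "..."}',
        '{"entry_type": "attempt", "status": 1, "prompt": "..."}',
        '{"entry_type": "config"}',
    ]
    print(count_by_status(sample))
```

To run it on a real report, pass an open file object, e.g. count_by_status(open(path, encoding="utf-8")), using the report file name garak prints at the start of the run.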
Check out the reference docs for an authoritative guide to garak code structure.
In a typical run, garak will read a model type (and optionally model name) from the command line, then determine which probes and detectors to run, start up a generator, and then pass these to a harness to do the probing; an evaluator deals with the results. There are many modules in each of these categories, and each module provides a number of classes that act as individual plugins.
- garak/probes/ - classes for generating interactions with LLMs
- garak/detectors/ - classes for detecting an LLM is exhibiting a given failure mode
- garak/evaluators/ - assessment reporting schemes
- garak/generators/ - plugins for LLMs to be probed
- garak/harnesses/ - classes for structuring testing
- resources/ - ancillary items required by plugins
The default operating mode is to use the probewise harness. Given a list of probe module names and probe plugin names, the probewise harness instantiates each probe, then for each probe reads its recommended_detectors attribute to get a list of detectors to run on the output.
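The control flow described here can be pictured with a toy loop. This is a simplified sketch with stand-in classes, not garak's harness code: BlankProbe, AlwaysPass, run_probewise, and the detector registry dictionary are all hypothetical, and only the recommended_detectors attribute name comes from the text above.

```python
from typing import Callable, Dict, List, Tuple


class AlwaysPass:
    """Stand-in detector that scores every output as a pass (0.0)."""

    def detect(self, outputs: List[str]) -> List[float]:
        return [0.0 for _ in outputs]


class BlankProbe:
    """Stand-in probe that sends a single empty prompt."""

    recommended_detectors = ["always.Pass"]
    prompts = [""]

    def probe(self, generator: Callable[[str], str]) -> List[str]:
        return [generator(prompt) for prompt in self.prompts]


def run_probewise(
    probes: list,
    detectors: Dict[str, object],
    generator: Callable[[str], str],
) -> Dict[Tuple[str, str], List[float]]:
    """For each probe, run only the detectors that the probe itself recommends."""
    results = {}
    for probe in probes:
        outputs = probe.probe(generator)
        for detector_name in probe.recommended_detectors:
            scores = detectors[detector_name].detect(outputs)
            results[(type(probe).__name__, detector_name)] = scores
    return results


if __name__ == "__main__":
    blank_generator = lambda prompt: ""  # always returns the empty string
    print(run_probewise([BlankProbe()], {"always.Pass": AlwaysPass()}, blank_generator))
```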
Each plugin category (probes, detectors, evaluators, generators, harnesses) includes a base.py which defines the base classes usable by plugins in that category. Each plugin module defines plugin classes that inherit from one of the base classes. For example, garak.generators.openai.OpenAIGenerator descends from garak.generators.base.Generator.
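The inheritance pattern can be illustrated with a stand-alone toy hierarchy. The classes below are stand-ins written for this sketch and do not reproduce garak's real base classes; the method names _call_model and generate and the generations parameter are assumptions chosen for the example. The point is that a plugin overrides one small hook and inherits the rest.

```python
from typing import List


class Generator:
    """Stand-in for a category base class such as one defined in a base.py."""

    def __init__(self, name: str, generations: int = 10) -> None:
        self.name = name
        self.generations = generations

    def _call_model(self, prompt: str) -> str:
        # Subclasses supply the model-specific call.
        raise NotImplementedError

    def generate(self, prompt: str) -> List[str]:
        # Shared behaviour: request several generations per prompt.
        return [self._call_model(prompt) for _ in range(self.generations)]


class RepeatGenerator(Generator):
    """Plugin that overrides only the model call and echoes the prompt."""

    def _call_model(self, prompt: str) -> str:
        return prompt


if __name__ == "__main__":
    gen = RepeatGenerator("repeat", generations=3)
    print(gen.generate("hello"))
    print(isinstance(gen, Generator))
```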
Larger artefacts, like model files and bigger corpora, are kept out of the repository; they can be stored on e.g. Hugging Face Hub and loaded locally by clients using garak.
- Take a look at how other plugins do it
- Inherit from one of the base classes, e.g. garak.probes.base.TextProbe
- Override as little as possible
- You can test the new code in at least two ways:
  - Start an interactive Python session
    - Import the module, e.g. import garak.probes.mymodule
    - Instantiate the plugin, e.g. p = garak.probes.mymodule.MyProbe()
  - Run a scan with test plugins
    - For probes, try a blank generator and always.Pass detector: python3 -m garak -m test.Blank -p mymodule -d always.Pass
    - For detectors, try a blank generator and a blank probe: python3 -m garak -m test.Blank -p test.Blank -d mymodule
    - For generators, try a blank probe and always.Pass detector: python3 -m garak -m mymodule -p test.Blank -d always.Pass
  - Get garak to list all the plugins of the type you're writing, with --list_probes, --list_detectors, or --list_generators
We have an FAQ here. Reach out if you have any more questions!
Code reference documentation is at garak.readthedocs.io.
You can read the garak preprint paper. If you use garak, please cite us.
@article{garak,
title={{garak: A Framework for Security Probing Large Language Models}},
author={Leon Derczynski and Erick Galinkin and Jeffrey Martin and Subho Majumdar and Nanna Inie},
year={2024},
howpublished={\url{https://garak.ai}}
}
"Lying is a skill like any other, and if you wish to maintain a level of excellence you have to practice constantly" - Elim
For updates and news see @garak_llm
© 2023- Leon Derczynski; Apache license v2, see LICENSE
Alternative AI tools for garak
Similar Open Source Tools
garak
Garak is a free tool that checks if a Large Language Model (LLM) can be made to fail in a way that is undesirable. It probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. Garak's a free tool. We love developing it and are always interested in adding functionality to support applications.
code2prompt
code2prompt is a command-line tool that converts your codebase into a single LLM prompt with a source tree, prompt templating, and token counting. It automates generating LLM prompts from codebases of any size, customizing prompt generation with Handlebars templates, respecting .gitignore, filtering and excluding files using glob patterns, displaying token count, including Git diff output, copying prompt to clipboard, saving prompt to an output file, excluding files and folders, adding line numbers to source code blocks, and more. It helps streamline the process of creating LLM prompts for code analysis, generation, and other tasks.
mergekit
Mergekit is a toolkit for merging pre-trained language models. It uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.
turnkeyml
TurnkeyML is a tools framework that integrates models, toolchains, and hardware backends to simplify the evaluation and actuation of deep learning models. It supports use cases like exporting ONNX files, performance validation, functional coverage measurement, stress testing, and model insights analysis. The framework consists of analysis, build, runtime, reporting tools, and a models corpus, seamlessly integrated to provide comprehensive functionality with simple commands. Extensible through plugins, it offers support for various export and optimization tools and AI runtimes. The project is actively seeking collaborators and is licensed under Apache 2.0.
HuggingFaceGuidedTourForMac
HuggingFaceGuidedTourForMac is a guided tour on how to install optimized pytorch and optionally Apple's new MLX, JAX, and TensorFlow on Apple Silicon Macs. The repository provides steps to install homebrew, pytorch with MPS support, MLX, JAX, TensorFlow, and Jupyter lab. It also includes instructions on running large language models using HuggingFace transformers. The repository aims to help users set up their Macs for deep learning experiments with optimized performance.
py-vectara-agentic
The `vectara-agentic` Python library is designed for developing powerful AI assistants using Vectara and Agentic-RAG. It supports various agent types, includes pre-built tools for domains like finance and legal, and enables easy creation of custom AI assistants and agents. The library provides tools for summarizing text, rephrasing text, legal tasks like summarizing legal text and critiquing as a judge, financial tasks like analyzing balance sheets and income statements, and database tools for inspecting and querying databases. It also supports observability via LlamaIndex and Arize Phoenix integration.
sage
Sage is a tool that allows users to chat with any codebase, providing a chat interface for code understanding and integration. It simplifies the process of learning how a codebase works by offering heavily documented answers sourced directly from the code. Users can set up Sage locally or on the cloud with minimal effort. The tool is designed to be easily customizable, allowing users to swap components of the pipeline and improve the algorithms powering code understanding and generation.
termax
Termax is an LLM agent in your terminal that converts natural language to commands. It is featured by: - Personalized Experience: Optimize the command generation with RAG. - Various LLMs Support: OpenAI GPT, Anthropic Claude, Google Gemini, Mistral AI, and more. - Shell Extensions: Plugin with popular shells like `zsh`, `bash` and `fish`. - Cross Platform: Able to run on Windows, macOS, and Linux.
codespin
CodeSpin.AI is a set of open-source code generation tools that leverage large language models (LLMs) to automate coding tasks. With CodeSpin, you can generate code in various programming languages, including Python, JavaScript, Java, and C++, by providing natural language prompts. CodeSpin offers a range of features to enhance code generation, such as custom templates, inline prompting, and the ability to use ChatGPT as an alternative to API keys. Additionally, CodeSpin provides options for regenerating code, executing code in prompt files, and piping data into the LLM for processing. By utilizing CodeSpin, developers can save time and effort in coding tasks, improve code quality, and explore new possibilities in code generation.
log10
Log10 is a one-line Python integration to manage your LLM data. It helps you log both closed and open-source LLM calls, compare and identify the best models and prompts, store feedback for fine-tuning, collect performance metrics such as latency and usage, and perform analytics and monitor compliance for LLM powered applications. Log10 offers various integration methods, including a python LLM library wrapper, the Log10 LLM abstraction, and callbacks, to facilitate its use in both existing production environments and new projects. Pick the one that works best for you. Log10 also provides a copilot that can help you with suggestions on how to optimize your prompt, and a feedback feature that allows you to add feedback to your completions. Additionally, Log10 provides prompt provenance, session tracking and call stack functionality to help debug prompt chains. With Log10, you can use your data and feedback from users to fine-tune custom models with RLHF, and build and deploy more reliable, accurate and efficient self-hosted models. Log10 also supports collaboration, allowing you to create flexible groups to share and collaborate over all of the above features.
WindowsAgentArena
Windows Agent Arena (WAA) is a scalable Windows AI agent platform designed for testing and benchmarking multi-modal, desktop AI agents. It provides researchers and developers with a reproducible and realistic Windows OS environment for AI research, enabling testing of agentic AI workflows across various tasks. WAA supports deploying agents at scale using Azure ML cloud infrastructure, allowing parallel running of multiple agents and delivering quick benchmark results for hundreds of tasks in minutes.
mods
AI for the command line, built for pipelines. LLM based AI is really good at interpreting the output of commands and returning the results in CLI friendly text formats like Markdown. Mods is a simple tool that makes it super easy to use AI on the command line and in your pipelines. Mods works with OpenAI, Groq, Azure OpenAI, and LocalAI To get started, install Mods and check out some of the examples below. Since Mods has built-in Markdown formatting, you may also want to grab Glow to give the output some _pizzazz_.
LeanCopilot
Lean Copilot is a tool that enables the use of large language models (LLMs) in Lean for proof automation. It provides features such as suggesting tactics/premises, searching for proofs, and running inference of LLMs. Users can utilize built-in models from LeanDojo or bring their own models to run locally or on the cloud. The tool supports platforms like Linux, macOS, and Windows WSL, with optional CUDA and cuDNN for GPU acceleration. Advanced users can customize behavior using Tactic APIs and Model APIs. Lean Copilot also allows users to bring their own models through ExternalGenerator or ExternalEncoder. The tool comes with caveats such as occasional crashes and issues with premise selection and proof search. Users can get in touch through GitHub Discussions for questions, bug reports, feature requests, and suggestions. The tool is designed to enhance theorem proving in Lean using LLMs.
gpt-cli
gpt-cli is a command-line interface tool for interacting with various chat language models like ChatGPT, Claude, and others. It supports model customization, usage tracking, keyboard shortcuts, multi-line input, markdown support, predefined messages, and multiple assistants. Users can easily switch between different assistants, define custom assistants, and configure model parameters and API keys in a YAML file for easy customization and management.
stagehand
Stagehand is an AI web browsing framework that simplifies and extends web automation using three simple APIs: act, extract, and observe. It aims to provide a lightweight, configurable framework without complex abstractions, allowing users to automate web tasks reliably. The tool generates Playwright code based on atomic instructions provided by the user, enabling natural language-driven web automation. Stagehand is open source, maintained by the Browserbase team, and supports different models and model providers for flexibility in automation tasks.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.