PromptFuzz
PromptFuzz is an automated tool that generates high-quality fuzz drivers for libraries via a fuzz loop constructed on mutating LLMs' prompts.
Stars: 170
**Description:** PromptFuzz is an automated tool that generates high-quality fuzz drivers for libraries via a fuzz loop constructed on mutating LLMs' prompts. The fuzz loop guides the mutation of LLM prompts toward programs that cover more reachable code and explore complex API interrelationships, both of which make fuzzing more effective.

**Features:**

* **Multiple LLM support**: Supports general-purpose LLMs: Codex, Incoder, ChatGPT, and GPT4 (currently tested on ChatGPT).
* **Context-based Prompt**: Constructs LLM prompts with the automatically extracted library context.
* **Powerful Sanitization**: The program's syntax, semantics, behavior, and coverage are thoroughly analyzed to sanitize the problematic programs.
* **Prioritized Mutation**: Prioritizes mutating the library API combinations within LLM prompts to explore complex interrelationships, guided by code coverage.
* **Fuzz Driver Exploitation**: Infers API constraints using statistics and extends fixed API arguments to receive random bytes from fuzzers.
* **Fuzz engine integration**: Integrates with the grey-box fuzz engine LibFuzzer.

**Benefits:**

* **High branch coverage:** The fuzz drivers generated by PromptFuzz achieved a branch coverage of 40.12% on the tested libraries, which is 1.61x greater than _OSS-Fuzz_ and 1.67x greater than _Hopper_.
* **Bug detection:** PromptFuzz detected 33 valid security bugs from 49 unique crashes.
* **Wide range of bugs:** The fuzz drivers generated by PromptFuzz can detect a wide range of bugs, most of which are security bugs.
* **Unique bugs:** PromptFuzz detects uniquely interesting bugs that other fuzzers may miss.

**Usage:**

1. Build the library using the provided build scripts.
2. Export the LLM API key if using ChatGPT or GPT4.
3. Generate fuzz drivers using the `fuzzer` command.
4. Run the fuzz drivers using the `harness` command.
5. Deduplicate and analyze the reported crashes.

**Future Works:**

* **Custom LLMs support:** Support custom LLMs.
* **Closed-source libraries:** Apply PromptFuzz to closed-source libraries by fine-tuning LLMs on a private code corpus.
* **Performance**: Reduce the large time cost of eliminating erroneous programs.
README:
PromptFuzz is an automated tool that generates high-quality fuzz drivers for libraries via a fuzz loop constructed on mutating LLMs' prompts. The fuzz loop of PromptFuzz aims to guide the mutation of LLMs' prompts to generate programs that cover more reachable code and explore complex API interrelationships, which are effective for fuzzing.
PromptFuzz is currently regarded as the leading approach for generating fuzz drivers in both academia and industry. The fuzz drivers generated by PromptFuzz achieved a branch coverage of 40.12% on the tested libraries, which is 1.61x greater than OSS-Fuzz and 1.67x greater than Hopper. In addition, PromptFuzz detected 33 valid security bugs from 49 unique crashes.
- Multiple LLM support: Supports general-purpose LLMs: Codex, Incoder, ChatGPT, and GPT4 (currently tested on ChatGPT).
- Context-based Prompt: Constructs LLM prompts with the automatically extracted library context.
- Powerful Sanitization: The program's syntax, semantics, behavior, and coverage are thoroughly analyzed to sanitize the problematic programs.
- Prioritized Mutation: Prioritizes mutating the library API combinations within LLM's prompts to explore complex interrelationships, guided by code coverage.
- Fuzz Driver Exploitation: Infers API constraints using statistics and extends fixed API arguments to receive random bytes from fuzzers.
- Fuzz engine integration: Integrates with the grey-box fuzz engine LibFuzzer.
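To make the "Fuzz Driver Exploitation" idea concrete, here is a minimal sketch of replacing a fixed API argument with a value derived from fuzzer-provided bytes. The `ByteCursor` type, its methods, and the `[1, 100]` constraint are hypothetical and only illustrate the concept; they are not PromptFuzz's generated code or its inferred constraints.

```rust
/// Hypothetical helper that hands out values from the fuzzer's input bytes.
struct ByteCursor<'a> {
    data: &'a [u8],
}

impl<'a> ByteCursor<'a> {
    fn new(data: &'a [u8]) -> Self {
        ByteCursor { data }
    }

    /// Consume up to 4 bytes (little-endian); missing bytes count as zero.
    fn take_u32(&mut self) -> u32 {
        let n = self.data.len().min(4);
        let (head, tail) = self.data.split_at(n);
        self.data = tail;
        let mut buf = [0u8; 4];
        buf[..n].copy_from_slice(head);
        u32::from_le_bytes(buf)
    }

    /// Map the next value into [min, max], standing in for an inferred constraint.
    fn take_in_range(&mut self, min: u32, max: u32) -> u32 {
        assert!(min <= max);
        let span = (max - min) as u64 + 1;
        min + (self.take_u32() as u64 % span) as u32
    }
}

/// Stand-in for a library API whose argument was a fixed constant in the seed program.
fn set_quality(quality: u32) -> bool {
    (1..=100).contains(&quality)
}

fn main() {
    let input = [0x05u8, 0x00, 0x00, 0x00, 0xFF];
    let mut cursor = ByteCursor::new(&input);
    // Before: set_quality(75). After: the argument comes from fuzzer bytes.
    let quality = cursor.take_in_range(1, 100);
    println!("quality = {quality}, accepted = {}", set_quality(quality));
}
```

Clamping the value to a range keeps the fuzzer from spending its budget on arguments the API would reject outright.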
The fuzz drivers generated by PromptFuzz can detect a wide range of bugs, most of which are security bugs, for instance CVE-2023-6277, CVE-2023-52355, and CVE-2023-52356.
PromptFuzz detects uniquely interesting bugs:
ID | Library | Buggy Function | Bug Type | Status | Track Link |
---|---|---|---|---|---|
1. | libaom | highbd_8_variance_sse2 | SEGV | Confirmed | 3489 |
2. | libaom | av1_rc_update_framerate | Uninitialized Stack | Confirmed | 3509 |
3. | libaom | timebase_units_to_ticks | Integer Overflow | Confirmed | 3510 |
4. | libaom | encode_without_recode | SEGV | Confirmed | 3534 |
5. | libvpx | vp8_peek_si_internal | SEGV | Confirmed | 1817 |
6. | libvpx | update_fragments | Buffer Overflow | Confirmed | 1827 |
7. | libvpx | vp8e_encode | Integer Overflow | Confirmed | 1828 |
8. | libvpx | encode_mb_row | Integer Overflow | Confirmed | 1831 |
9. | libvpx | vpx_free_tpl_gop_stats | SEGV | Confirmed | 1837 |
10. | libmagic | apprentice_map | Buffer Overflow | Waiting | 481 |
11. | libmagic | magic_setparam | Buffer Overflow | Waiting | 482 |
12. | libmagic | check_buffer | Buffer Overflow | Confirmed | 483 |
13. | libmagic | mget | Integer Overflow | Waiting | 486 |
14. | libTIFF | TIFFOpen | OOM | Confirmed | 614 |
15. | libTIFF | PixarLogSetupDecode | OOM | Confirmed | 619 |
16. | libTIFF | TIFFReadEncodedStrip | OOM | Confirmed | 620 |
17. | libTIFF | TIFFReadRGBAImageOriented | OOM | Confirmed | 620 |
18. | libTIFF | TIFFRasterScanlineSize64 | OOM | Confirmed | 621 |
19. | libTIFF | TIFFReadRGBATileExt | SEGV | Confirmed | 622 |
20. | sqlite3 | sqlite3_unlock_notify | Null Pointer crash | Confirmed | e77a5 |
21. | sqlite3 | sqlite3_enable_load_extension | Null Pointer crash | Confirmed | 9ce83 |
22. | sqlite3 | sqlite3_db_config | Null Pointer crash | Confirmed | 5e3fc |
23. | c-ares | config_sortlist | Memory Leak | Confirmed | d62627 |
24. | c-ares | config_sortlist | Memory Leak | Confirmed | d62627 |
25. | libjpeg-turbo | tj3DecodeYUV8 | Integer Overflow | Confirmed | 78eaf0 |
26. | libjpeg-turbo | tj3LoadImage16 | OOM | Confirmed | 735 |
27. | libpcap | pcap_create | File Leak | Confirmed | 1233 |
28. | libpcap | pcapint_create_interface | Null Pointer crash | Confirmed | 1239 |
29. | libpcap | pcapint_fixup_pcap_pkthdr | Misaligned Address | Confirmed | - |
30. | cJSON | cJSON_SetNumberHelper | Error Cast | Confirmed | 805 |
31. | cJSON | cJSON_CreateNumber | Error Cast | Confirmed | 806 |
32. | cJSON | cJSON_DeleteItemFromObjectCaseSensitive | TimeOut | Confirmed | 807 |
33. | curl | parseurl | Assertion Failure | Confirmed | 12775 |
You can use the Dockerfile to build the environment:
docker build -t promptfuzz .
docker run -it promptfuzz bash
Before you apply this fuzzer to a new project, you must have an automatic build script that builds your project and prepares the required data (e.g., headers, link libraries, and fuzzing corpus), as OSS-Fuzz does. See Preparation.
We have prepared build scripts for some popular open-source libraries; you can refer to the data directory.
If you prefer to set up the environment locally instead of using Docker, you can follow the instructions below:
Requirements:
- Rust stable
- LLVM and Clang (built with compiler-rt)
- wllvm (installed by `pip3 install wllvm`)
You can download LLVM and Clang from this link or install them with llvm.sh.
For the exact dependencies, see the Dockerfile.
If you prefer to build LLVM manually, you can build Clang with compiler-rt and libcxx from source using the following configuration:
cmake -S llvm -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;compiler-rt" \
-DLIBCXX_ENABLE_STATIC_ABI_LIBRARY=ON \
-DLIBCXXABI_ENABLE_SHARED=OFF -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
Custom LLMs (e.g., Incoder) are not yet fully supported or tested in PromptFuzz. If you just want to use PromptFuzz, please choose ChatGPT or GPT4.
PromptFuzz is implemented primarily in Rust; a few Python scripts are used to invoke specific LLMs.
If you want to invoke self-hosted LLMs (e.g., Incoder), the following Python dependencies are required:
- pytorch (pip3 install torch)
- transformers (pip3 install transformers)
- yaml (pip3 install PyYAML)
- fastapi (pip3 install fastapi[all])
Run the library's build script in the prompt_fuzz/data directory to prepare the required data for that library.
After the build process finishes, the library's data is stored under prompt_fuzz/output/build/.
You must have an `OPENAI_API_KEY` in advance if you choose ChatGPT or GPT4. If you don't have a key, apply for one from OpenAI first.
user@ubuntu$ export OPENAI_API_KEY="<your_key>"
If you have difficulty accessing the OpenAI service from your IP location, you can use a proxy to redirect your requests as follows:
user@ubuntu$ export OPENAI_PROXY_BASE=https://openai.proxy.com/v1
Here, `openai.proxy.com` should be the location of your personal OpenAI service proxy.
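As a rough illustration of how these two environment variables could be consumed, the sketch below resolves the API base URL with a fallback to OpenAI's public endpoint. The function name `resolve_api_base` and the fallback logic are assumptions made for this example; they are not taken from the PromptFuzz source.

```rust
use std::env;

/// OpenAI's public REST base URL, used when no proxy is configured.
const DEFAULT_API_BASE: &str = "https://api.openai.com/v1";

/// Hypothetical helper: a non-empty proxy value wins, otherwise use the default.
/// Trailing slashes are trimmed so request paths can be appended uniformly.
fn resolve_api_base(proxy_base: Option<&str>) -> String {
    match proxy_base.map(str::trim) {
        Some(p) if !p.is_empty() => p.trim_end_matches('/').to_string(),
        _ => DEFAULT_API_BASE.to_string(),
    }
}

fn main() {
    let proxy = env::var("OPENAI_PROXY_BASE").ok();
    let base = resolve_api_base(proxy.as_deref());
    // Only report whether the key is present; never print the key itself.
    let has_key = env::var("OPENAI_API_KEY")
        .map(|k| !k.is_empty())
        .unwrap_or(false);
    println!("API base: {base}");
    println!("OPENAI_API_KEY set: {has_key}");
}
```

Running a check like this before a long fuzzing session is a cheap way to catch a missing key or a mistyped proxy URL.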
PromptFuzz generates fuzz drivers in a fuzz loop. There are several options that can be tuned in the configuration of promptfuzz.
Typically, the only options that need to be actively set are `-c` and `-r`. The `-c` option determines the number of cores to be used for sanitization. Enabling the `-r` option will periodically re-check the correctness of the seed programs, reducing false positives but also introducing some extra overhead.
For instance, the following command is sufficient to perform fuzzing on libaom:
cargo run --bin fuzzer -- libaom -c $(nproc) -r
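If you drive PromptFuzz from a script, the same invocation can be assembled programmatically. The sketch below only builds and prints the command line shown above; `fuzzer_args` is a hypothetical helper, and it uses only the `-c` and `-r` options described in this section.

```rust
use std::thread;

/// Build the arguments passed to `cargo` for a fuzz-driver generation run.
fn fuzzer_args(library: &str, cores: usize, recheck: bool) -> Vec<String> {
    let mut args: Vec<String> = ["run", "--bin", "fuzzer", "--", library, "-c"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    args.push(cores.to_string());
    if recheck {
        // -r: periodically re-check the correctness of the seed programs.
        args.push("-r".to_string());
    }
    args
}

fn main() {
    // Rough equivalent of `$(nproc)`; falls back to 1 if the count is unavailable.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let args = fuzzer_args("libaom", cores, true);
    println!("cargo {}", args.join(" "));
    // To launch the run, pass `args` to std::process::Command::new("cargo")
    // from the repository root.
}
```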
The detailed configuration options of promptfuzz can be listed with:
user@ubuntu$ cargo run --bin fuzzer -- --help
Once fuzz driver generation has finished, follow the steps below to run the fuzz drivers and detect bugs.
Taking libaom as an example, you can run this command to fuse the programs into a fuzz driver that can be fuzzed:
cargo run --bin harness -- libaom fuse-fuzzer
Then you can execute the fuzzers you fused:
cargo run --bin harness -- libaom fuzzer-run
Note that promptfuzz implements a mechanism to detect which program crashed inside the fused fuzz driver. If a program's crash is detected, promptfuzz disables the code of the crashed program, which allows fuzzing to continue. Therefore, make sure you execute the fuzz drivers through PromptFuzz.
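The idea of disabling a crashed program so that the remaining ones keep being fuzzed can be sketched as follows. This is a simplified model under stated assumptions, not PromptFuzz's implementation: a crash is modeled as an `Err` value rather than a real signal, and `FusedDriver`, `Program`, and the first-byte selector are hypothetical.

```rust
use std::collections::HashSet;

/// One generated program inside the fused driver (hypothetical signature).
type Program = fn(&[u8]) -> Result<(), String>;

struct FusedDriver {
    programs: Vec<Program>,
    disabled: HashSet<usize>,
}

impl FusedDriver {
    fn new(programs: Vec<Program>) -> Self {
        FusedDriver { programs, disabled: HashSet::new() }
    }

    /// The first input byte selects a program; the rest is passed to it.
    /// Returns the index of the program that ran, or None if nothing ran.
    fn run_one(&mut self, data: &[u8]) -> Option<usize> {
        let (&selector, rest) = data.split_first()?;
        if self.programs.is_empty() {
            return None;
        }
        let idx = selector as usize % self.programs.len();
        if self.disabled.contains(&idx) {
            return None; // this program crashed earlier, so skip it
        }
        let program = self.programs[idx];
        if let Err(reason) = program(rest) {
            eprintln!("program {idx} crashed ({reason}); disabling it");
            self.disabled.insert(idx);
        }
        Some(idx)
    }
}

fn healthy(_input: &[u8]) -> Result<(), String> {
    Ok(())
}

fn crashy(input: &[u8]) -> Result<(), String> {
    if input.first() == Some(&0xFF) {
        Err("simulated crash".to_string())
    } else {
        Ok(())
    }
}

fn main() {
    let mut driver = FusedDriver::new(vec![healthy as Program, crashy as Program]);
    let inputs: [&[u8]; 3] = [&[1, 0xFF], &[1, 0x00], &[0, 0xFF]];
    for input in inputs {
        println!("{:?} -> {:?}", input, driver.run_one(input));
    }
}
```

Once the second program is disabled, inputs that select it are skipped, while inputs that select the first program still run.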
After 24 hours of execution, you should deduplicate the crashes reported by PromptFuzz:
cargo run --bin harness -- libaom sanitize-crash
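For intuition, one common way to deduplicate crashes is to group them by bug type plus the top few stack frames. The sketch below shows that generic technique; it is an assumption for illustration and not necessarily how `sanitize-crash` works. The `Crash` struct and all frame names are made up.

```rust
use std::collections::BTreeMap;

/// A reported crash (hypothetical shape, for illustration only).
struct Crash {
    id: String,
    kind: String,        // e.g. "SEGV" or "OOM"
    frames: Vec<String>, // stack frames, innermost first
}

/// Signature = bug type plus the innermost `depth` stack frames.
fn signature(crash: &Crash, depth: usize) -> String {
    let top: Vec<&str> = crash
        .frames
        .iter()
        .take(depth)
        .map(String::as_str)
        .collect();
    format!("{}|{}", crash.kind, top.join(">"))
}

/// Group crash ids by signature; each group is treated as one unique crash.
fn deduplicate(crashes: &[Crash], depth: usize) -> BTreeMap<String, Vec<&str>> {
    let mut groups: BTreeMap<String, Vec<&str>> = BTreeMap::new();
    for crash in crashes {
        groups
            .entry(signature(crash, depth))
            .or_default()
            .push(crash.id.as_str());
    }
    groups
}

/// Small constructor to keep the example data readable.
fn crash(id: &str, kind: &str, frames: &[&str]) -> Crash {
    Crash {
        id: id.to_string(),
        kind: kind.to_string(),
        frames: frames.iter().map(|f| f.to_string()).collect(),
    }
}

fn main() {
    let crashes = vec![
        crash("crash-1", "SEGV", &["decode_block", "decode_frame", "entry_a"]),
        crash("crash-2", "SEGV", &["decode_block", "decode_frame", "entry_b"]),
        crash("crash-3", "OOM", &["alloc_buffer", "open_image", "entry_a"]),
    ];
    for (sig, ids) in deduplicate(&crashes, 2) {
        println!("{sig}: {ids:?}");
    }
}
```

A shallow depth merges more aggressively and a deeper one keeps more groups, so the depth is a tuning choice rather than a fixed rule.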
Then, you can collect and report the code coverage of your fuzzers with:
cargo run --bin harness -- libaom coverage collect
and
cargo run --bin harness -- libaom coverage report
We also provide a tool named `harness` to give you access to some core components of PromptFuzz.
Here are the subcommands accepted by `harness`:
#[derive(Subcommand, Debug)]
enum Commands {
    /// check a program whether is correct.
    Check { program: PathBuf },
    /// Recheck the seeds whether are correct.
    ReCheck,
    /// transform a program to a fuzzer.
    Transform {
        program: PathBuf,
        #[arg(short, default_value = "true")]
        use_cons: bool,
        /// corpora used to perform transform check
        #[arg(short = 'p', default_value = "None")]
        corpora: Option<PathBuf>,
    },
    /// Fuse the programs in seeds to fuzzers.
    FuseFuzzer {
        /// transform fuzzer with constraints
        #[arg(short, default_value = "true")]
        use_cons: bool,
        /// the number of condensed fuzzer you want to fuse
        #[arg(short, default_value = "1")]
        n_fuzzer: usize,
        /// the count of cpu cores you could use
        #[arg(short, default_value = "10")]
        cpu_cores: usize,
        seed_dir: Option<PathBuf>,
    },
    /// Run a synthesized fuzzer in the fuzz dir.
    FuzzerRun {
        /// which fuzzer you want to run. default is "output/$Library/fuzzers"
        #[arg(short = 'u', default_value = "true")]
        use_cons: bool,
        /// the amount of time you wish your fuzzer to run. The default is 86400s (24 hours), the unit is second. 0 is for unlimit.
        time_limit: Option<u64>,
        /// whether minimize the fuzzing corpus before running
        minimize: Option<bool>,
    },
    /// collect code coverage
    Coverage {
        /// Coverage kind to collect
        kind: CoverageKind,
        /// -u means the exploit fuzzers
        #[arg(short = 'u', default_value = "true")]
        exploit: bool,
    },
    Compile {
        kind: Compile,
        #[arg(short = 'u', default_value = "true")]
        exploit: bool,
    },
    /// infer constraints
    Infer,
    /// Minimize the seeds by unique branches.
    Minimize,
    /// Sanitize duplicate and spurious crashes
    SanitizeCrash {
        #[arg(short = 'u', default_value = "true")]
        exploit: bool,
    },
    /// archive the results
    Archive { suffix: Option<String> },
    /// Build ADG from seeds
    Adg {
        /// ADG kind to build: sparse or dense
        kind: ADGKind,
        /// The path of target programs to build the ADG.
        target: Option<PathBuf>,
    },
}
- Custom LLMs support: Support custom LLMs.
- Closed-source libraries: Apply PromptFuzz to closed-source libraries by fine-tuning LLMs on a private code corpus.
- Generalization: Generalize PromptFuzz to binary programs.