pentest-agent

PentestAgent is a novel LLM-driven penetration testing framework to automate intelligence gathering, vulnerability analysis, and exploitation stages, reducing manual intervention. For more information, read our paper at https://dl.acm.org/doi/10.1145/3708821.3733882

Stars: 71

Pentest Agent is a lightweight and versatile tool designed for conducting penetration testing on network systems. It provides a user-friendly interface for scanning, identifying vulnerabilities, and generating detailed reports. The tool is highly customizable, allowing users to define specific targets and parameters for testing. Pentest Agent is suitable for security professionals and ethical hackers looking to assess the security posture of their systems and networks.

README:

PentestAgent

Overview

PentestAgent is a novel LLM-driven penetration testing framework to automate intelligence gathering, vulnerability analysis, and exploitation stages, reducing manual intervention.

The framework is modular and consists of the following components:

Reconnaissance Agent: Gathers intelligence about the target system.
Planning Agent: Identifies and prioritizes vulnerabilities and potential exploits.
Execution Agent: Attempts to execute selected exploits in a controlled environment.
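The hand-off between the three components can be sketched as a simple pipeline. This is only an illustration of the data flow described above: the class names, function names, and sample values are hypothetical and do not come from the PentestAgent code base, and every stage is a stub that touches no network.

```python
from dataclasses import dataclass, field


@dataclass
class ReconReport:
    """Hypothetical record of what reconnaissance found."""
    target_ip: str
    services: dict = field(default_factory=dict)  # port -> service description


@dataclass
class AttackPlan:
    """Hypothetical ranked list of candidate vulnerabilities."""
    candidates: list = field(default_factory=list)  # identifiers, best first


def recon(target_ip):
    # Stub: a real agent would probe the host; this returns fixed sample data.
    return ReconReport(target_ip, {8080: "example-app 1.0"})


def plan(report):
    # Stub: a real agent would look up CVEs and exploits for each service.
    # The identifier below is a placeholder, not a real CVE.
    found = ["CVE-XXXX-0001"] if report.services else []
    return AttackPlan(found)


def execute(attack_plan, target_ip):
    # Stub: only describes the action; nothing is sent to the target.
    if not attack_plan.candidates:
        return f"no candidates for {target_ip}"
    return f"would attempt {attack_plan.candidates[0]} against {target_ip}"


def run_pipeline(target_ip):
    # Each stage consumes the previous stage's output.
    report = recon(target_ip)
    return execute(plan(report), target_ip)


print(run_pipeline("192.0.2.10"))
```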

For further information, please refer to our paper.

🔧 Installation & Setup

Note: We recommend deploying this project on a Kali Linux environment for better compatibility with penetration testing tools and workflows.

1. Clone the Repository

git clone https://github.com/nbshenxm/pentest-agent.git
cd pentest-agent

2. Set Environment Variables

Several environment variables need to be filled in. If you are not familiar with environment variables, set them in the .env file.

Required:

PDCP_API_KEY: ProjectDiscovery API key for accessing CVE data and vulnerability information.
GITLAB_TOKEN: GitLab token for ExploitDB access.
GITHUB_KEY: GitHub token for searching repositories and issues.
INDEX_STORAGE_DIR: Directory to store vector indexes for RAG.
PLANNING_OUTPUT_DIR: Directory to save planning results.
LOG_DIR: Directory to store logs.

Optional:

http_proxy, https_proxy: If using a proxy or VPN.
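Before running any agent, it can help to confirm that the required variables are actually set. The following is a minimal pre-flight sketch, not part of the project: the helper name `missing_env_vars` is hypothetical, and the variable names are taken from the list above.

```python
import os

# Required variable names, copied from the list above.
REQUIRED_VARS = [
    "PDCP_API_KEY",
    "GITLAB_TOKEN",
    "GITHUB_KEY",
    "INDEX_STORAGE_DIR",
    "PLANNING_OUTPUT_DIR",
    "LOG_DIR",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"PDCP_API_KEY": "placeholder", "LOG_DIR": "/tmp/logs"}
print(missing_env_vars(example_env))
```

Calling `missing_env_vars()` with no argument checks the current process environment instead.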

3. Install Python Dependencies

Python version: 3.12

Use a virtual environment:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt

or with Conda:

conda create -n pentest python=3.12
conda activate pentest
python -m pip install -r requirements.txt

4. Install CVEMAP

CVEMAP, a ProjectDiscovery tool, is needed to fetch CVE-related information. Follow the CVEMAP project's installation instructions.

⚙️ Configuration

File: `pentest_agent/configs/config.yaml`

(1) models

Specify the LLM provider, model name, temperature, and API key.

(2) cve

Set the model used for parsing CVE entries and its generation temperature.

(3) cve_scoring

Scoring criteria for evaluating CVEs:

Vulnerability type
Exploit maturity
Remote exploitability
Attack complexity
Source weighting (ExploitDB, GitHub, Google)
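One way to picture how such criteria might combine is a weighted sum. The sketch below is an assumption for illustration only: the weight values, the key names, and the `score_cve` function are hypothetical, and the actual scoring logic and values in `config.yaml` may differ.

```python
# Hypothetical weights; the real values are defined in config.yaml.
WEIGHTS = {
    "vuln_type": 0.30,
    "exploit_maturity": 0.25,
    "remote_exploitability": 0.20,
    "attack_complexity": 0.15,
    "source": 0.10,
}

# Hypothetical trust level per exploit source.
SOURCE_WEIGHTS = {"exploitdb": 1.0, "github": 0.8, "google": 0.5}


def score_cve(ratings, source):
    """Combine per-criterion ratings in [0, 1] into one score in [0, 1]."""
    rated = dict(ratings, source=SOURCE_WEIGHTS.get(source, 0.0))
    return sum(weight * rated.get(name, 0.0) for name, weight in WEIGHTS.items())


# Rank two made-up candidates, highest score first.
candidates = {
    "candidate-a": ({"vuln_type": 1.0, "exploit_maturity": 0.9,
                     "remote_exploitability": 1.0, "attack_complexity": 0.8}, "exploitdb"),
    "candidate-b": ({"vuln_type": 0.4, "exploit_maturity": 0.2,
                     "remote_exploitability": 0.0, "attack_complexity": 0.5}, "google"),
}
ranked = sorted(candidates, key=lambda name: score_cve(*candidates[name]), reverse=True)
print(ranked)
```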

(4) runtime

Reconnaissance Agent:

current_topic: Topic identifier for current CVE task.
target_ip: IP address of the target host.

Planning Agent:

model: LLM model used for searching exploits and analyzing vulnerability data.
keyword, app, version: Target application details.
vuln_type: Type of vulnerability to focus on.
cvemap_fuzzy_search: Enable fuzzy search for CVE matching.
output_dir: Directory to save analysis results.

Execution Agent:

current_topic: Task/topic identifier.
doc_dir: Directory containing exploit scripts or documents.
target_ip, target_port: IP and port of target host.
attacker_ip: IP of attacker's machine.
command_to_execute: Payload to validate exploitation.
model: LLM model used for exploit execution guidance.
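A quick completeness check on the runtime settings might look like the sketch below. The field names come from the lists above, but the section keys (`recon`, `planning`, `execution`), the nesting, and the `validate_runtime` helper are assumptions; the real layout of `config.yaml` may differ. The sample values use documentation-only IP addresses.

```python
import ipaddress

# Field names from the lists above; the section keys are hypothetical.
EXPECTED_FIELDS = {
    "recon": ["current_topic", "target_ip"],
    "planning": ["model", "keyword", "app", "version", "vuln_type",
                 "cvemap_fuzzy_search", "output_dir"],
    "execution": ["current_topic", "doc_dir", "target_ip", "target_port",
                  "attacker_ip", "command_to_execute", "model"],
}


def validate_runtime(runtime):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for section, fields in EXPECTED_FIELDS.items():
        values = runtime.get(section, {})
        for name in fields:
            if name not in values:
                problems.append(f"{section}.{name} is missing")
            elif name.endswith("_ip"):
                try:
                    ipaddress.ip_address(values[name])
                except ValueError:
                    problems.append(f"{section}.{name} is not a valid IP address")
    return problems


# Made-up values standing in for a parsed runtime section.
sample = {
    "recon": {"current_topic": "demo", "target_ip": "192.0.2.10"},
    "planning": {"model": "some-model", "keyword": "demo", "app": "example-app",
                 "version": "1.0", "vuln_type": "rce",
                 "cvemap_fuzzy_search": True, "output_dir": "out"},
    "execution": {"current_topic": "demo", "doc_dir": "docs",
                  "target_ip": "192.0.2.10", "target_port": 8080,
                  "attacker_ip": "192.0.2.20", "command_to_execute": "id",
                  "model": "some-model"},
}
print(validate_runtime(sample))
```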

🚀 Running the Agents (Manual)

Reconnaissance Agent

File: pentest_agent/agents/recon_agent.py
Function: Given a target IP, gathers system and service info.
Usage: Set the topic, LLM model, and IP, then run the script.

python pentest_agent/agents/recon_agent.py

Planning Agent

File: pentest_agent/agents/planning_agent.py
Function: Identifies relevant CVEs and associated exploits from multiple sources.
Sources:
- GitHub repositories and issues
- ExploitDB entries
- Google search results
Features: Multi-source intelligence aggregation with configurable LLM backends
Usage: Set the model and application information.

python pentest_agent/agents/planning_agent.py

Execution Agent

File: pentest_agent/agents/execution_agent.py
Function: Executes selected exploits based on previous analysis and collected context.
Usage: Set the topic, exploit document path, and target info.

python pentest_agent/agents/execution_agent.py
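The three manual invocations above could be chained from a small driver script. This is a sketch under two assumptions: that the agents are meant to run in the order reconnaissance, planning, execution, and that `config.yaml` already holds the settings for all three. The `build_commands` and `run_all` helpers are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Script paths as listed above, in the assumed run order.
AGENT_SCRIPTS = [
    "pentest_agent/agents/recon_agent.py",
    "pentest_agent/agents/planning_agent.py",
    "pentest_agent/agents/execution_agent.py",
]


def build_commands(python=sys.executable):
    """Return one argument list per agent, using the current interpreter."""
    return [[python, script] for script in AGENT_SCRIPTS]


def run_all(runner=subprocess.run):
    """Run each agent in order; check=True stops at the first failure."""
    for command in build_commands():
        runner(command, check=True)


# Show the commands without executing them.
for command in build_commands("python"):
    print(" ".join(command))
```

Call `run_all()` from the repository root to execute the agents. Passing a different `runner` makes the sequence easy to dry-run.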

🐳 Docker Deployment

PentestAgent provides Docker support for isolated execution of each agent.

0. Pre-Configuration

Step 1: Edit `pentest_agent/configs/config.yaml`

Configure all agent parameters under the models, cve, cve_scoring, and runtime sections.

Step 2: Configure `.env` in `pentest_agent/docker`

Example .env content:

GITHUB_KEY=your_github_token
OPENAI_API_KEY=your_openai_key
HUGGING_FACE_TOKEN=your_hf_token
INDEX_STORAGE_DIR=/path/to/indexes
PLANNING_OUTPUT_DIR=/path/to/output
LOG_DIR=/path/to/logs
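A small check can catch placeholder values that were never replaced in the `.env` file. The parser below is a simplified sketch that handles only plain `KEY=VALUE` lines; Docker Compose applies its own, richer parsing rules. The helper names and the placeholder prefixes (`your_`, `/path/to`) are assumptions based on the example above.

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_keys(values):
    """Return keys whose values still look like the example placeholders."""
    return [key for key, value in values.items()
            if value.startswith("your_") or value.startswith("/path/to")]


example = """\
# credentials
GITHUB_KEY=your_github_token
LOG_DIR=/var/log/pentest
"""
parsed = parse_dotenv(example)
print(unfilled_keys(parsed))
```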

1. Start and Run Reconnaissance Agent

cd pentest_agent/docker
docker-compose up --build -d recon

2. Start and Run Planning Agent

cd pentest_agent/docker
docker-compose up --build -d planning

3. Start and Run Execution Agent

cd pentest_agent/docker
docker-compose up --build -d execution

📊 Benchmark & Evaluation

Infrastructure

We adopt Vulhub for evaluating the system. Vulhub provides Docker-based vulnerable environments with real-world CVEs.

Target Selection

We select vulnerabilities based on the following criteria:

Must have a valid CVE ID
Must include a CVSS v3.x score
Additional labels include:
- CWE ID
- Exploitability sub-score
- Difficulty levels derived from the CVSS vector
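For reference, the exploitability sub-score can be computed from a CVSS v3.1 vector with the formula 8.22 × AV × AC × PR × UI from the CVSS specification. The sketch below implements that formula with the standard metric weights. The `difficulty` bucketing at the end uses hypothetical thresholds purely for illustration; it is not the rule used in the PentestAgent evaluation.

```python
# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
PR_SCOPE_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_SCOPE_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def exploitability(vector):
    """Return the exploitability sub-score, rounded to one decimal place."""
    # Skip the leading "CVSS:3.x" part, then map metric name -> value.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    pr_table = PR_SCOPE_CHANGED if metrics["S"] == "C" else PR_SCOPE_UNCHANGED
    raw = (8.22 * AV[metrics["AV"]] * AC[metrics["AC"]]
           * pr_table[metrics["PR"]] * UI[metrics["UI"]])
    return round(raw, 1)


def difficulty(vector):
    """Hypothetical bucketing; the thresholds are assumptions, not the paper's."""
    score = exploitability(vector)
    if score >= 3.0:
        return "easy"
    if score >= 1.5:
        return "medium"
    return "hard"


vector = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
print(exploitability(vector), difficulty(vector))
```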

Our results

It has been a while since we performed our evaluation. We are adding new scenarios to the benchmark beyond Vulhub and evaluating PentestAgent on a variety of advanced LLM backbones. We will publish our benchmark results once this work is finished.

🤝 Contribution

Feel free to open an issue if you:

Encounter any bugs
Have suggestions for improvement
Would like to contribute features or benchmarks

We welcome community contributions!
