
Awesome-LLM4Cybersecurity
An overview of LLMs for cybersecurity.
Stars: 681

The repository 'Awesome-LLM4Cybersecurity' provides a comprehensive overview of the applications of Large Language Models (LLMs) in cybersecurity. It includes a systematic literature review covering topics such as constructing cybersecurity-oriented domain LLMs, potential applications of LLMs in cybersecurity, and research directions in the field. The repository analyzes various benchmarks, datasets, and applications of LLMs in cybersecurity tasks such as threat intelligence, fuzzing, vulnerability detection, insecure code generation, program repair, anomaly detection, and LLM-assisted attacks.
README:
When LLMs Meet Cybersecurity: A Systematic Literature Review
Updates
[2025-03-03] We have updated the related papers up to Feb 28th, with 33 new papers added (2025.01.01-2025.02.28).
[2025-01-21] We have updated the related papers up to Dec 31st, with 74 new papers added (2024.09.01-2024.12.31).
[2025-01-08] We have included the publication venues for each paper.
[2024-09-21] We have updated the related papers up to Aug 31st, with 75 new papers added (2024.06.01-2024.08.31).
- When LLMs Meet Cybersecurity: A Systematic Literature Review
- Updates
- Introduction
- Features
- Literatures
- BibTeX
Introduction
We are excited to present "When LLMs Meet Cybersecurity: A Systematic Literature Review," a comprehensive overview of LLM applications in cybersecurity.
We seek to address three key questions:
- RQ1: How to construct cybersecurity-oriented domain LLMs?
- RQ2: What are the potential applications of LLMs in cybersecurity?
- RQ3: What are the existing challenges and further research directions for applying LLMs in cybersecurity?
Features
(2024.08.20) Our study encompasses an analysis of over 300 works, spanning 25+ LLMs and more than 10 downstream scenarios.
Literatures
RQ1: How to construct cybersecurity-oriented domain LLMs?
Cybersecurity Evaluation Benchmarks
- CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity | arXiv | 2024.02.12 | Paper Link
- SecEval: A Comprehensive Benchmark for Evaluating Cybersecurity Knowledge of Foundation Models | Github | 2023 | Paper Link
- SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security | arXiv | 2023.12.26 | Paper Link
- Securityeval dataset: mining vulnerability examples to evaluate machine learning-based code generation techniques. | Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security | 2022.11.09 | Paper Link
- Can llms patch security issues? | arXiv | 2024.02.19 | Paper Link
- DebugBench: Evaluating Debugging Capability of Large Language Models | ACL Findings | 2024.01.11 | Paper Link
- An empirical study of netops capability of pre-trained large language models. | arXiv | 2023.09.19 | Paper Link
- OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models | arXiv | 2024.02.16 | Paper Link
- Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models | arXiv | 2023.12.07 | Paper Link
- LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations | IEEE/ACM International Conference on Mining Software Repositories | 2023.03.16 | Paper Link
- Can LLMs Understand Computer Networks? Towards a Virtual System Administrator | arXiv | 2024.04.22 | Paper Link
- Assessing Cybersecurity Vulnerabilities in Code Large Language Models | arXiv | 2024.04.29 | Paper Link
- SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory | arXiv | 2024.05.30 | Paper Link
- NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | arXiv | 2024.06.09 | Paper Link
- eyeballvul: a future-proof benchmark for vulnerability detection in the wild | arXiv | 2024.07.11 | Paper Link
- CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models | arXiv | 2024.08.03 | Paper Link
- AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset | arXiv | 2024.08.09 | Paper Link
- CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity | arXiv | 2024.11.25 | Paper Link
- AI Cyber Risk Benchmark: Automated Exploitation Capabilities | arXiv | 2024.12.09 | Paper Link
- SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | arXiv | 2024.12.31 | Paper Link
- Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training | arXiv | 2025.02.16 | Paper Link
Fine-tuned Domain LLMs for Cybersecurity
- SecureFalcon: The Next Cyber Reasoning System for Cyber Security | arXiv | 2023.07.13 | Paper Link
- Owl: A Large Language Model for IT Operations | ICLR | 2023.09.17 | Paper Link
- HackMentor: Fine-tuning Large Language Models for Cybersecurity | TrustCom | 2023.09 | Paper Link
- Large Language Models for Test-Free Fault Localization | ICSE | 2023.10.03 | Paper Link
- Finetuning Large Language Models for Vulnerability Detection | arXiv | 2024.02.29 | Paper Link
- RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair | arXiv | 2024.03.11 | Paper Link
- Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding | ISSRE | 2023.10.06 | Paper Link
- Instruction Tuning for Secure Code Generation | ICML | 2024.02.14 | Paper Link
- Nova+: Generative Language Models for Binaries | arXiv | 2023.11.27 | Paper Link
- Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns | arXiv | 2024.04.30 | Paper Link
- Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models | arXiv | 2024.06.02 | Paper Link
- Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models | arXiv | 2024.06.09 | Paper Link
- A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Automated Program Repair | arXiv | 2024.06.09 | Paper Link
- IoT-LM: Large Multisensory Language Models for the Internet of Things | arXiv | 2024.07.13 | Paper Link
- CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions | arXiv | 2024.08.18 | Paper Link
- Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments | arXiv | 2024.09.17 | Paper Link
- AttackQA: Development and Adoption of a Dataset for Assisting Cybersecurity Operations using Fine-tuned and Open-Source LLMs | arXiv | 2024.11.02 | Paper Link
- Fine-tuning Large Language Models for DGA and DNS Exfiltration Detection | arXiv | 2024.11.07 | Paper Link
RQ2: What are the potential applications of LLMs in cybersecurity?
Threat Intelligence
- LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge | arXiv | 2024.01.18 | Paper Link
- AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation | BigData | 2023.10.04 | Paper Link
- On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions | arXiv | 2023.08.22 | Paper Link
- Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation | arXiv | 2024.01.12 | Paper Link
- An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures | Proceedings of the 2023 Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses | 2023.08.09 | Paper Link
- ChatGPT, Llama, can you write my report? An experiment on assisted digital forensics reports written using (Local) Large Language Models | Forensic Sci. Int. Digit. Investig. | 2023.12.22 | Paper Link
- Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild | arXiv | 2023.07.14 | Paper Link
- Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection | arXiv | 2023.08.27 | Paper Link
- HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for Root Cause Analysis with GPT-assisted Mitigation Suggestion | arXiv | 2023.12.21 | Paper Link
- Cyber Sentinel: Exploring Conversational Agents in Streamlining Security Tasks with GPT-4 | arXiv | 2023.09.28 | Paper Link
- Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness | Expert Syst. Appl. | 2024.03.13 | Paper Link
- Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models | arXiv | 2024.03.01 | Paper Link
- SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence | arXiv | 2024.05.06 | Paper Link
- AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models | EuroS&P Workshop | 2024.05.08 | Paper Link
- Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models | arXiv | 2024.06.30 | Paper Link
- LLMCloudHunter: Harnessing LLMs for Automated Extraction of Detection Rules from Cloud-Based CTI | arXiv | 2024.07.06 | Paper Link
- Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers | arXiv | 2024.07.18 | Paper Link
- Psychological Profiling in Cybersecurity: A Look at LLMs and Psycholinguistic Features | arXiv | 2024.08.09 | Paper Link
- The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums | arXiv | 2024.08.08 | Paper Link
- A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | arXiv | 2024.08.12 | Paper Link
- Usefulness of data flow diagrams and large language models for security threat validation: a registered report | arXiv | 2024.08.14 | Paper Link
- KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment | arXiv | 2024.08.15 | Paper Link
- Evaluating the Usability of LLMs in Threat Intelligence Enrichment | arXiv | 2024.09.23 | Paper Link
- Cyber Knowledge Completion Using Large Language Models | arXiv | 2024.09.24 | Paper Link
- AI-Driven Cyber Threat Intelligence Automation | arXiv | 2024.10.27 | Paper Link
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity | arXiv | 2024.10.28 | Paper Link
- IntellBot: Retrieval Augmented LLM Chatbot for Cyber Threat Knowledge Delivery | arXiv | 2024.11.08 | Paper Link
- Labeling NIDS Rules with MITRE ATT&CK Techniques: Machine Learning vs. Large Language Models | arXiv | 2024.12.16 | Paper Link
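Many of the threat-intelligence papers above first pull structured indicators out of unstructured CTI text before handing it to an LLM for enrichment or knowledge-graph construction. The sketch below is an illustrative assumption of what that pre-processing step can look like; the regexes, field names, and sample report are this example's inventions, not any listed system's design.

```python
import re

# Hypothetical IOC extractor: pulls IPv4 addresses, SHA-256 hashes, and
# "defanged" domains (evil[.]com) out of a free-text CTI report.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")
DEFANGED_DOMAIN = re.compile(r"\b[\w.-]+\[\.\][\w.-]+\b")

def extract_iocs(report: str) -> dict:
    """Return IOCs grouped by type; 'refang' defanged domains for lookup."""
    return {
        "ipv4": sorted(set(IPV4.findall(report))),
        "sha256": sorted({h.lower() for h in SHA256.findall(report)}),
        "domains": sorted({d.replace("[.]", ".") for d in DEFANGED_DOMAIN.findall(report)}),
    }

report = ("The dropper beacons to evil-cdn[.]net from 203.0.113.7; "
          "payload SHA-256: " + "a" * 64)
iocs = extract_iocs(report)
```

A real pipeline would feed `iocs` alongside the raw report into an LLM prompt so the model can label roles (C2 server, payload hash) rather than re-derive the indicators itself.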
Fuzzing
- Augmenting Greybox Fuzzing with Generative AI | arXiv | 2023.06.11 | Paper Link
- How well does LLM generate security tests? | arXiv | 2023.10.03 | Paper Link
- Fuzz4All: Universal Fuzzing with Large Language Models | ICSE | 2024.01.15 | Paper Link
- CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models | ICSE | 2023.07.26 | Paper Link
- Understanding Large Language Model Based Fuzz Driver Generation | arXiv | 2023.07.24 | Paper Link
- Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models | ISSTA | 2023.06.07 | Paper Link
- Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT | arXiv | 2023.04.04 | Paper Link
- Large language model guided protocol fuzzing | NDSS | 2024.02.26 | Paper Link
- Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing | USENIX | 2024.03.06 | Paper Link
- When Fuzzing Meets LLMs: Challenges and Opportunities | ACM International Conference on the Foundations of Software Engineering | 2024.04.25 | Paper Link
- An Exploratory Study on Using Large Language Models for Mutation Testing | arXiv | 2024.06.14 | Paper Link
- FuzzCoder: Byte-level Fuzzing Test via Large Language Model | arXiv | 2024.09.03 | Paper Link
- AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing | arXiv | 2024.11.05 | Paper Link
- ChatHTTPFuzz: Large Language Model-Assisted IoT HTTP Fuzzing | arXiv | 2024.11.18 | Paper Link
- Harnessing Large Language Models for Seed Generation in Greybox Fuzzing | arXiv | 2024.11.27 | Paper Link
- Large Language Model assisted Hybrid Fuzzing | arXiv | 2024.12.19 | Paper Link
- Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models | arXiv | 2025.01.08 | Paper Link
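A pattern shared by several of the fuzzing papers above is a division of labor: the LLM proposes well-formed seed inputs, and a conventional mutator perturbs them to explore nearby program states. The sketch below is a minimal, hedged illustration of that loop; the prompt wording and the byte-flip mutator are generic assumptions, not any specific paper's design, and the actual LLM call is omitted.

```python
import random

def seed_prompt(format_desc: str, examples: list[str]) -> str:
    """Build a prompt asking an LLM (call not shown) for new seed inputs."""
    shots = "\n".join(f"- {e}" for e in examples)
    return (
        f"You are generating fuzzing seeds for: {format_desc}\n"
        f"Existing seeds:\n{shots}\n"
        "Propose 5 new, syntactically valid but unusual inputs, one per line."
    )

def mutate(seed: bytes, rng: random.Random, n_flips: int = 2) -> bytes:
    """Classic byte-flip mutation applied to an (LLM-proposed) seed.
    Distinct indices guarantee the mutant differs from the seed."""
    out = bytearray(seed)
    for i in rng.sample(range(len(out)), n_flips):
        out[i] ^= 1 << rng.randrange(8)  # flip one random bit in this byte
    return bytes(out)

rng = random.Random(0)  # fixed seed for reproducibility
prompt = seed_prompt("HTTP/1.1 request lines", ["GET / HTTP/1.1"])
mutant = mutate(b"GET / HTTP/1.1", rng)
```

In a real greybox loop, mutants that reach new coverage would be kept as fresh seeds and fed back into the prompt's example list.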
Vulnerability Detection
- Evaluation of ChatGPT Model for Vulnerability Detection | arXiv | 2023.04.12 | Paper Link
- Detecting software vulnerabilities using Language Models | CSR | 2023.02.23 | Paper Link
- Software Vulnerability Detection using Large Language Models | ISSRE Workshop | 2023.09.02 | Paper Link
- Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities | arXiv | 2023.11.16 | Paper Link
- Software Vulnerability and Functionality Assessment using LLMs | arXiv | 2024.03.13 | Paper Link
- Finetuning Large Language Models for Vulnerability Detection | arXiv | 2024.03.01 | Paper Link
- The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models | arXiv | 2023.11.15 | Paper Link
- DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism | arXiv | 2023.09.27 | Paper Link
- Prompt-Enhanced Software Vulnerability Detection Using ChatGPT | ICSE | 2023.08.24 | Paper Link
- Using ChatGPT as a Static Application Security Testing Tool | arXiv | 2023.08.28 | Paper Link
- LLbezpeky: Leveraging Large Language Models for Vulnerability Detection | arXiv | 2024.01.13 | Paper Link
- Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives | TPS-ISA | 2023.10.16 | Paper Link
- Software Vulnerability Detection with GPT and In-Context Learning | DSC | 2024.01.08 | Paper Link
- GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis | ICSE | 2023.12.25 | Paper Link
- VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model | arXiv | 2023.08.09 | Paper Link
- LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning | arXiv | 2024.01.29 | Paper Link
- Large Language Models for Test-Free Fault Localization | ICSE | 2023.10.03 | Paper Link
- Multi-role Consensus through LLMs Discussions for Vulnerability Detection | arXiv | 2024.03.21 | Paper Link
- How ChatGPT is Solving Vulnerability Management Problem | arXiv | 2023.11.11 | Paper Link
- DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | International Symposium on Research in Attacks, Intrusions and Defenses | 2023.08.09 | Paper Link
- The FormAI Dataset: Generative AI in Software Security through the Lens of Formal Verification | International Conference on Predictive Models and Data Analytics in Software Engineering | 2023.09.02 | Paper Link
- How Far Have We Gone in Vulnerability Detection Using Large Language Models | arXiv | 2023.12.22 | Paper Link
- Large Language Model for Vulnerability Detection and Repair: Literature Review and Roadmap | arXiv | 2024.04.04 | Paper Link
- DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection | Journal of Systems and Software | 2024.05.02 | Paper Link
- Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study | arXiv | 2024.05.24 | Paper Link
- LLM-Assisted Static Analysis for Detecting Security Vulnerabilities | arXiv | 2024.05.27 | Paper Link
- Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning | ACL Findings | 2024.06.06 | Paper Link
- Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG | arXiv | 2024.06.19 | Paper Link
- MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization | arXiv | 2024.06.26 | Paper Link
- Assessing the Effectiveness of LLMs in Android Application Vulnerability Analysis | arXiv | 2024.06.27 | Paper Link
- Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models | Information Security and Privacy | 2024.07.12 | Paper Link
- Static Detection of Filesystem Vulnerabilities in Android Systems | arXiv | 2024.07.16 | Paper Link
- SCoPE: Evaluating LLMs for Software Vulnerability Detection | arXiv | 2024.07.19 | Paper Link
- Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection | arXiv | 2024.07.23 | Paper Link
- Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models | arXiv | 2024.08.08 | Paper Link
- Harnessing the Power of LLMs in Source Code Vulnerability Detection | arXiv | 2024.08.07 | Paper Link
- Exploring RAG-based Vulnerability Augmentation with LLMs | arXiv | 2024.08.08 | Paper Link
- LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions | arXiv | 2024.08.14 | Paper Link
- ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data | arXiv | 2024.08.28 | Paper Link
- Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection | European symposium on research in computer security | 2024.08.29 | Paper Link
- SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection | arXiv | 2024.09.02 | Paper Link
- Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches | arXiv | 2024.09.11 | Paper Link
- Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models | arXiv | 2024.09.16 | Paper Link
- VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching | arXiv | 2024.09.17 | Paper Link
- Boosting Cybersecurity Vulnerability Scanning based on LLM-supported Static Application Security Testing | arXiv | 2024.09.24 | Paper Link
- Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning | arXiv | 2024.09.27 | Paper Link
- RealVul: Can We Detect Vulnerabilities in Web Applications with LLM? | arXiv | 2024.10.10 | Paper Link
- ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs | arXiv | 2024.10.22 | Paper Link
- Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | arXiv | 2024.11.07 | Paper Link
- LProtector: An LLM-driven Vulnerability Detection System | arXiv | 2024.11.04 | Paper Link
- Beyond Static Tools: Evaluating Large Language Models for Cryptographic Misuse Detection | arXiv | 2024.11.14 | Paper Link
- CryptoFormalEval: Integrating LLMs and Formal Verification for Automated Cryptographic Protocol Vulnerability Detection | arXiv | 2024.11.20 | Paper Link
- EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code | arXiv | 2024.11.25 | Paper Link
- CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics | arXiv | 2024.11.26 | Paper Link
- ChatNVD: Advancing Cybersecurity Vulnerability Assessment with Large Language Models | arXiv | 2024.12.06 | Paper Link
- Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection | arXiv | 2024.12.16 | Paper Link
- Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study | arXiv | 2024.12.24 | Paper Link
- Leveraging Large Language Models and Machine Learning for Smart Contract Vulnerability Detection | arXiv | 2025.01.04 | Paper Link
- CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | arXiv | 2025.01.08 | Paper Link
- Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis | arXiv | 2025.01.07 | Paper Link
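A recurring recipe in the prompt-based detection studies above is to wrap the target function in a structured prompt and then parse the model's free-form reply into a verdict. The sketch below is a hedged illustration of that recipe; the prompt wording and the `VERDICT:` convention are assumptions of this example, not an API or any paper's exact protocol, and the model call itself is omitted.

```python
def detection_prompt(code: str, language: str = "C") -> str:
    """Build a zero-shot vulnerability-detection prompt (illustrative wording)."""
    return (
        f"Review the following {language} function for security vulnerabilities "
        "(e.g. CWE-787 out-of-bounds write, CWE-89 SQL injection).\n"
        f"```{language.lower()}\n{code}\n```\n"
        "Answer on the last line with 'VERDICT: VULNERABLE <CWE-ID>' or 'VERDICT: SAFE'."
    )

def parse_verdict(reply: str):
    """Map the model's free-form reply to (is_vulnerable, cwe_or_None)."""
    for line in reversed(reply.strip().splitlines()):
        if line.startswith("VERDICT:"):
            parts = line.split()
            if "VULNERABLE" in parts:
                cwe = next((p for p in parts if p.startswith("CWE-")), None)
                return True, cwe
            return False, None
    return None, None  # model did not follow the convention
```

The parser's `(None, None)` fallback matters in practice: several of the benchmarking papers above report that models frequently ignore output-format instructions, so the harness must distinguish "said safe" from "unparseable".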
Insecure Code Generation
- Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants | USENIX | 2023.02.27 | Paper Link
- Bugs in Large Language Models Generated Code | arXiv | 2024.03.18 | Paper Link
- Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions | S&P | 2021.12.16 | Paper Link
- The Effectiveness of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis | arXiv | 2023.08.29 | Paper Link
- No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT | IEEE Trans. Software Eng. | 2023.08.09 | Paper Link
- Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code | arXiv | 2023.11.01 | Paper Link
- Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation | NeurIPS | 2023.10.30 | Paper Link
- Can Large Language Models Identify And Reason About Security Vulnerabilities? Not Yet | arXiv | 2023.12.19 | Paper Link
- A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages | arXiv | 2023.08.08 | Paper Link
- How Secure is Code Generated by ChatGPT? | SMC | 2023.04.19 | Paper Link
- Large Language Models for Code: Security Hardening and Adversarial Testing | ACM SIGSAC Conference on Computer and Communications Security | 2023.09.29 | Paper Link
- Pop Quiz! Can a Large Language Model Help With Reverse Engineering? | arXiv | 2022.02.02 | Paper Link
- LLM4Decompile: Decompiling Binary Code with Large Language Models | EMNLP | 2024.03.08 | Paper Link
- Large Language Models for Code Analysis: Do LLMs Really Do Their Job? | USENIX | 2024.03.05 | Paper Link
- Understanding Programs by Exploiting (Fuzzing) Test Cases | ACL Findings | 2023.01.12 | Paper Link
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures | arXiv | 2023.08.07 | Paper Link
- Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 | arXiv | 2023.12.13 | Paper Link
- Using ChatGPT to Analyze Ransomware Messages and to Predict Ransomware Threats | Research Square | 2023.11.21 | Paper Link
- Shifting the Lens: Detecting Malware in npm Ecosystem with Large Language Models | arXiv | 2024.03.18 | Paper Link
- DebugBench: Evaluating Debugging Capability of Large Language Models | ACL Findings | 2024.01.11 | Paper Link
- Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions | ICSE | 2023.10.24 | Paper Link
- FLAG: Finding Line Anomalies (in code) with Generative AI | arXiv | 2023.07.22 | Paper Link
- Evolutionary Large Language Models for Hardware Security: A Comparative Survey | arXiv | 2024.04.25 | Paper Link
- Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models | arXiv | 2024.04.29 | Paper Link
- LLM Security Guard for Code | International Conference on Evaluation and Assessment in Software Engineering | 2024.05.03 | Paper Link
- Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | arXiv | 2024.05.30 | Paper Link
- DistiLRR: Transferring Code Repair for Low-Resource Programming Languages | arXiv | 2024.06.20 | Paper Link
- Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval | arXiv | 2024.07.04 | Paper Link
- An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation | arXiv | 2024.08.17 | Paper Link
- ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart Contracts | arXiv | 2024.09.15 | Paper Link
Program Repair
- Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs | arXiv | 2023.11.06 | Paper Link
- An Analysis of the Automatic Bug Fixing Performance of ChatGPT | APR@ICSE | 2023.01.20 | Paper Link
- AI-powered patching: the future of automated vulnerability fixes | Google | 2024.01.31 | Paper Link
- Practical Program Repair in the Era of Large Pre-trained Language Models | arXiv | 2022.10.25 | Paper Link
- Security Code Review by LLMs: A Deep Dive into Responses | arXiv | 2024.01.29 | Paper Link
- Examining Zero-Shot Vulnerability Repair with Large Language Models | SP | 2022.08.15 | Paper Link
- How Effective Are Neural Networks for Fixing Security Vulnerabilities | ISSTA | 2023.05.29 | Paper Link
- Can LLMs Patch Security Issues? | arXiv | 2024.02.19 | Paper Link
- InferFix: End-to-End Program Repair with LLMs | ESEC/FSE | 2023.03.13 | Paper Link
- ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching | arXiv | 2023.08.24 | Paper Link
- DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection | arXiv | 2023.08.14 | Paper Link
- Fixing Hardware Security Bugs with Large Language Models | arXiv | 2023.02.02 | Paper Link
- A Study of Vulnerability Repair in JavaScript Programs with Large Language Models | WWW | 2023.03.19 | Paper Link
- Enhanced Automated Code Vulnerability Repair using Large Language Models | Eng. Appl. Artif. Intell. | 2024.01.08 | Paper Link
- Teaching Large Language Models to Self-Debug | ICLR | 2023.10.05 | Paper Link
- Better Patching Using LLM Prompting, via Self-Consistency | ASE | 2023.08.16 | Paper Link
- Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair | ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering | 2023.11.08 | Paper Link
- LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward | arXiv | 2024.02.22 | Paper Link
- ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs | arXiv | 2024.03.07 | Paper Link
- When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done? | ICSE | 2023.03.01 | Paper Link
- Aligning LLMs for FL-free Program Repair | arXiv | 2024.04.13 | Paper Link
- Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs | arXiv | 2024.04.22 | Paper Link
- How Far Can We Go with Practical Function-Level Program Repair? | arXiv | 2024.04.19 | Paper Link
- Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models | arXiv | 2024.03.23 | Paper Link
- A Systematic Literature Review on Large Language Models for Automated Program Repair | arXiv | 2024.05.12 | Paper Link
- Automated Repair of AI Code with Large Language Models and Formal Verification | arXiv | 2024.05.14 | Paper Link
- A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback | Proceedings of the 1st ACM International Conference on AI-Powered Software | 2024.05.24 | Paper Link
- Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis | arXiv | 2024.06.04 | Paper Link
- Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models | ACM/IEEE International Symposium on Machine Learning for CAD | 2024.07.04 | Paper Link
- ThinkRepair: Self-Directed Automated Program Repair | ACM SIGSOFT International Symposium on Software Testing and Analysis | 2024.07.30 | Paper Link
- Revisiting Evolutionary Program Repair via Code Language Model | arXiv | 2024.08.20 | Paper Link
- RePair: Automated Program Repair with Process-based Feedback | ACL Findings | 2024.08.21 | Paper Link
- Enhancing LLM-Based Automated Program Repair with Design Rationales | ASE | 2024.08.22 | Paper Link
- Automated Software Vulnerability Patching using Large Language Models | arXiv | 2024.08.24 | Paper Link
- MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair | arXiv | 2024.08.26 | Paper Link
- Fixing Code Generation Errors for Large Language Models | arXiv | 2024.09.01 | Paper Link
- APOLLO: A GPT-based tool to detect phishing emails and generate explanations that warn users | arXiv | 2024.10.10 | Paper Link
- The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks | arXiv | 2024.10.20 | Paper Link
- A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation | arXiv | 2024.11.12 | Paper Link
- Fixing Security Vulnerabilities with AI in OSS-Fuzz | arXiv | 2024.11.21 | Paper Link
- Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair | arXiv | 2024.12.05 | Paper Link
- From Defects to Demands: A Unified, Iterative, and Heuristically Guided LLM-Based Framework for Automated Software Repair and Requirement Realization | arXiv | 2024.12.06 | Paper Link
- LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models | arXiv | 2025.01.07 | Paper Link
Anomaly Detection
- Benchmarking Large Language Models for Log Analysis, Security, and Interpretation | J. Netw. Syst. Manag. | 2023.11.24 | Paper Link
- Log-based Anomaly Detection based on EVT Theory with feedback | arXiv | 2023.09.30 | Paper Link
- LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection | HPCC/DSS/SmartCity/DependSys | 2023.09.14 | Paper Link
- LogGPT: Log Anomaly Detection via GPT | BigData | 2023.12.11 | Paper Link
- Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies | ICPC | 2024.01.26 | Paper Link
- Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging | arXiv | 2024.03.02 | Paper Link
- Web Content Filtering through knowledge distillation of Large Language Models | WI-IAT | 2023.05.10 | Paper Link
- Application of Large Language Models to DDoS Attack Detection | International Conference on Security and Privacy in Cyber-Physical Systems and Smart Vehicles | 2024.02.05 | Paper Link
- An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach | arXiv | 2023.11.12 | Paper Link
- Evaluating the Performance of ChatGPT for Spam Email Detection | arXiv | 2024.02.23 | Paper Link
- Prompted Contextual Vectors for Spear-Phishing Detection | arXiv | 2024.02.14 | Paper Link
- Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models | arXiv | 2023.11.30 | Paper Link
- Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection | arXiv | 2023.10.30 | Paper Link
- Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices | IEEE Access | 2024.02.08 | Paper Link
- HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) | arXiv | 2023.09.27 | Paper Link
- ChatGPT for digital forensic investigation: The good, the bad, and the unknown | Forensic Science International: Digital Investigation | 2023.07.10 | Paper Link
- Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance | arXiv | 2024.04.23 | Paper Link
- LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing | ICSE | 2024.04.27 | Paper Link
- DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS | arXiv | 2024.05.12 | Paper Link
- Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection | arXiv | 2024.05.17 | Paper Link
- Log Parsing with Self-Generated In-Context Learning and Self-Correction | arXiv | 2024.06.05 | Paper Link
- Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks | arXiv | 2024.06.06 | Paper Link
- ULog: Unsupervised Log Parsing with Large Language Models through Log Contrastive Units | arXiv | 2024.06.11 | Paper Link
- Anomaly Detection on Unstable Logs with GPT Models | arXiv | 2024.06.11 | Paper Link
- Defending Against Social Engineering Attacks in the Age of LLMs | EMNLP | 2024.06.18 | Paper Link
- LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis | arXiv | 2024.07.02 | Paper Link
- Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection | arXiv | 2024.07.12 | Paper Link
- Towards Explainable Network Intrusion Detection using Large Language Models | arXiv | 2024.08.08 | Paper Link
- Utilizing Large Language Models to Optimize the Detection and Explainability of Phishing Websites | arXiv | 2024.08.11 | Paper Link
- Multimodal Large Language Models for Phishing Webpage Detection and Identification | arXiv | 2024.08.12 | Paper Link
- Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | arXiv | 2024.08.14 | Paper Link
- Automated Phishing Detection Using URLs and Webpages | arXiv | 2024.08.16 | Paper Link
- LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models | arXiv | 2024.08.25 | Paper Link
- XG-NID: Dual-Modality Network Intrusion Detection using a Heterogeneous Graph Neural Network and Large Language Model | arXiv | 2024.08.27 | Paper Link
- LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models | arXiv | 2024.09.03 | Paper Link
- A Comparative Study on Large Language Models for Log Parsing | arXiv | 2024.09.04 | Paper Link
- Using Large Language Models for Template Detection from Security Event Logs | arXiv | 2024.09.08 | Paper Link
- LogLLM: Log-based Anomaly Detection Using Large Language Models | arXiv | 2024.11.13 | Paper Link
- LogLM: From Task-based to Instruction-based Automated Log Analysis | arXiv | 2024.10.12 | Paper Link
- Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction | arXiv | 2024.12.03 | Paper Link
- Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware | arXiv | 2025.01.08 | Paper Link
- Confront Insider Threat: Precise Anomaly Detection in Behavior Logs Based on LLM Fine-Tuning | COLING | 2024 | Paper Link
LLM Assisted Attack
- Identifying and Mitigating the Security Risks of Generative AI | Foundations and Trends in Privacy and Security | 2023.12.29 | Paper Link
- Impact of Big Data Analytics and ChatGPT on Cybersecurity | I3CS | 2023.05.22 | Paper Link
- From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy | IEEE Access | 2023.07.03 | Paper Link
- LLMs Killed the Script Kiddie: How Agents Supported by Large Language Models Change the Landscape of Network Threat Testing | arXiv | 2023.10.10 | Paper Link
- Malla: Demystifying Real-world Large Language Model Integrated Malicious Services | USENIX | 2024.01.06 | Paper Link
- Evaluating LLMs for Privilege-Escalation Scenarios | arXiv | 2023.10.23 | Paper Link
- Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions | arXiv | 2023.08.21 | Paper Link
- Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT | CNS | 2023.09.19 | Paper Link
- From Chatbots to PhishBots? - Preventing Phishing scams created using ChatGPT, Google Bard and Claude | arXiv | 2024.03.10 | Paper Link
- From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads | arXiv | 2023.05.24 | Paper Link
- PentestGPT: An LLM-empowered Automatic Penetration Testing Tool | USENIX | 2023.08.13 | Paper Link
- AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks | arXiv | 2024.03.02 | Paper Link
- RatGPT: Turning online LLMs into Proxies for Malware Attacks | arXiv | 2023.09.07 | Paper Link
- Getting pwn'd by AI: Penetration Testing with Large Language Models | ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering | 2023.08.17 | Paper Link
- Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method | arXiv | 2024.06.18 | Paper Link
- Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models | arXiv | 2024.07.11 | Paper Link
- The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure | arXiv | 2024.07.22 | Paper Link
- From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM | arXiv | 2024.07.24 | Paper Link
- PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation | Proceedings of the Workshop on Autonomous Cybersecurity | 2024.07.25 | Paper Link
- Practical Attacks against Black-box Code Completion Engines | arXiv | 2024.08.05 | Paper Link
- Using Retriever Augmented Large Language Models for Attack Graph Generation | arXiv | 2024.08.11 | Paper Link
- CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher | Sensors | 2024.08.21 | Paper Link
- Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks | arXiv | 2024.08.23 | Paper Link
- Hacking, The Lazy Way: LLM Augmented Pentesting | arXiv | 2024.09.14 | Paper Link
- On the Feasibility of Fully AI-automated Vishing Attacks | arXiv | 2024.09.20 | Paper Link
- Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | arXiv | 2024.10.25 | Paper Link
- AutoPenBench: Benchmarking Generative Agents for Penetration Testing | arXiv | 2024.10.28 | Paper Link
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? | arXiv | 2024.11.02 | Paper Link
- PentestAgent: Incorporating LLM Agents to Automated Penetration Testing | arXiv | 2024.11.07 | Paper Link
- Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks | arXiv | 2024.11.18 | Paper Link
- Adapting to Cyber Threats: A Phishing Evolution Network (PEN) Framework for Phishing Generation and Analyzing Evolution Patterns using Large Language Models | arXiv | 2024.11.18 | Paper Link
- Next-Generation Phishing: How LLM Agents Empower Cyber Attackers | arXiv | 2024.11.22 | Paper Link
- AI-Augmented Ethical Hacking: A Practical Examination of Manual Exploitation and Privilege Escalation in Linux Environments | arXiv | 2024.11.26 | Paper Link
- Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs | arXiv | 2024.11.27 | Paper Link
- Hacking CTFs with Plain Agents | arXiv | 2024.12.03 | Paper Link
- HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing | arXiv | 2024.12.02 | Paper Link
- RapidPen: Fully Automated IP-to-Shell Penetration Testing with LLM-based Agents | arXiv | 2025.02.23 | Paper Link
Others
- An LLM-based Framework for Fingerprinting Internet-connected Devices | ACM on Internet Measurement Conference | 2023.10.24 | Paper Link
- Anatomy of an AI-powered malicious social botnet | arXiv | 2023.07.30 | Paper Link
- Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation | arXiv | 2023.12.12 | Paper Link
- LLM for SoC Security: A Paradigm Shift | IEEE Access | 2023.10.09 | Paper Link
- Harnessing the Power of LLM to Support Binary Taint Analysis | arXiv | 2023.10.12 | Paper Link
- Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations | arXiv | 2023.12.07 | Paper Link
- LLM in the Shell: Generative Honeypots | EuroS&P Workshop | 2024.02.09 | Paper Link
- Employing LLMs for Incident Response Planning and Review | arXiv | 2024.03.02 | Paper Link
- Enhancing Network Management Using Code Generated by Large Language Models | Proceedings of the 22nd ACM Workshop on Hot Topics in Networks | 2023.08.11 | [Paper Link](https://arxiv.org/abs/2308.06261)
- Prompting Is All You Need: Automated Android Bug Replay with Large Language Models | ICSE | 2023.07.18 | Paper Link
- Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions | CHI | 2024.02.07 | Paper Link
- How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models | arXiv | 2024.04.16 | Paper Link
- Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models | arXiv | 2024.04.24 | Paper Link
- AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering | arXiv | 2024.04.29 | Paper Link
- Large Language Models for Cyber Security: A Systematic Literature Review | arXiv | 2024.05.08 | Paper Link
- Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities | arXiv | 2024.05.08 | Paper Link
- LLMPot: Automated LLM-based Industrial Protocol and Physical Process Emulation for ICS Honeypots | arXiv | 2024.05.10 | Paper Link
- A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions | arXiv | 2024.05.23 | Paper Link
- Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering | arXiv | 2024.06.09 | Paper Link
- Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications | arXiv | 2024.06.16 | Paper Link
- On Large Language Models in National Security Applications | arXiv | 2024.07.03 | Paper Link
- Disassembling Obfuscated Executables with LLM | arXiv | 2024.07.12 | Paper Link
- MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation | arXiv | 2024.07.22 | Paper Link
- MistralBSM: Leveraging Mistral-7B for Vehicular Networks Misbehavior Detection | arXiv | 2024.07.26 | Paper Link
- Beyond Detection: Leveraging Large Language Models for Cyber Attack Prediction in IoT Networks | arXiv | 2024.08.26 | Paper Link
- ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via Automated Bash Script Generation, Assessment, and Refinement | arXiv | 2024.09.12 | Paper Link
- LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | arXiv | 2024.09.15 | Paper Link
- Contextualized AI for Cyber Defense: An Automated Survey using LLMs | arXiv | 2024.09.20 | Paper Link
- Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models | arXiv | 2024.09.25 | Paper Link
- CmdCaliper: A Semantic-Aware Command-Line Embedding Model and Dataset for Security Research | EMNLP | 2024.10.02 | Paper Link
- Integrating Large Language Models with Internet of Things Applications | arXiv | 2024.10.25 | Paper Link
- Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education | arXiv | 2024.12.10 | Paper Link
- Emerging Security Challenges of Large Language Models | arXiv | 2024.12.23 | Paper Link
- Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense | arXiv | 2024.12.30 | Paper Link
- BARTPredict: Empowering IoT Security with LLM-Driven Cyber Threat Prediction | arXiv | 2025.01.03 | Paper Link
- Empowering AIOps: Leveraging Large Language Models for IT Operations Management | arXiv | 2025.01.21 | Paper Link
RQ3: What are the existing challenges and further research directions about the application of LLMs in cybersecurity?
Further Research: Agent4Cybersecurity
- Cybersecurity Issues and Challenges | Handbook of Research on Cybersecurity Issues and Challenges for Business and FinTech Applications | 2022.08 | Paper Link
- A unified cybersecurity framework for complex environments | Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists | 2018.09.26 | Paper Link
- LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution | arXiv | 2024.02.20 | Paper Link
- Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments | ICAART | 2023.08.28 | Paper Link
- LLM Agents can Autonomously Hack Websites | arXiv | 2024.02.16 | Paper Link
- Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides | ECAI | 2024.02.27 | Paper Link
- TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | arXiv | 2023.11.07 | Paper Link
- The Rise and Potential of Large Language Model Based Agents: A Survey | arXiv | 2023.09.19 | Paper Link
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | ICLR | 2023.10.03 | Paper Link
- From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs | arXiv | 2024.02.28 | Paper Link
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | arXiv | 2024.01.08 | Paper Link
- TaskWeaver: A Code-First Agent Framework | arXiv | 2023.12.01 | Paper Link
- Large Language Models for Networking: Applications, Enabling Techniques, and Challenges | arXiv | 2023.11.29 | Paper Link
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | EMNLP Findings | 2024.02.18 | Paper Link
- WIPI: A New Web Threat for LLM-Driven Web Agents | arXiv | 2024.02.26 | Paper Link
- InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | ACL Findings | 2024.03.25 | Paper Link
- LLM Agents can Autonomously Exploit One-day Vulnerabilities | arXiv | 2024.04.17 | Paper Link
- Large Language Models for Networking: Workflow, Advances and Challenges | arXiv | 2024.04.29 | Paper Link
- Generative AI in Cybersecurity | arXiv | 2024.05.02 | Paper Link
- Generative AI and Large Language Models for Cyber Security: All Insights You Need | arXiv | 2024.05.21 | Paper Link
- Teams of LLM Agents can Exploit Zero-Day Vulnerabilities | arXiv | 2024.06.02 | Paper Link
- Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers | arXiv | 2024.07.18 | Paper Link
- PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection | arXiv | 2024.08.20 | Paper Link
- BreachSeek: A Multi-Agent Automated Penetration Tester | arXiv | 2024.08.31 | Paper Link
- MarsCode Agent: AI-native Automated Bug Fixing | arXiv | 2024.09.04 | Paper Link
- LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild | arXiv | 2024.10.17 | Paper Link
- Multi-Agent Collaboration in Incident Response with Large Language Models | arXiv | 2024.12.03 | Paper Link
- VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework | arXiv | 2025.01.23 | Paper Link
πBibTeX
@misc{zhang2024llms,
title={When LLMs Meet Cybersecurity: A Systematic Literature Review},
author={Jie Zhang and Haoyu Bu and Hui Wen and Yu Chen and Lun Li and Hongsong Zhu},
year={2024},
eprint={2405.03644},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
π₯ Updates
π[2025-03-03] We have updated the related papers up to Feb 28th, with 33 new papers added (2025.01.01-2025.02.28).
π[2025-01-21] We have updated the related papers up to Dec 31st, with 74 new papers added (2024.09.01-2024.12.31).
π[2025-01-08] We have included the publication venues for each paper.
π[2024-09-21] We have updated the related papers up to Aug 31st, with 75 new papers added (2024.06.01-2024.08.31).
- When LLMs Meet Cybersecurity: A Systematic Literature Review
- π₯ Updates
- π Introduction
- π© Features
- π Literatures
- πBibTeX
π Introduction
We are excited to present "When LLMs Meet Cybersecurity: A Systematic Literature Review," a comprehensive overview of LLM applications in cybersecurity.
We seek to address three key questions:
- RQ1: How to construct cyber security-oriented domain LLMs?
- RQ2: What are the potential applications of LLMs in cybersecurity?
- RQ3: What are the existing challenges and further research directions about the application of LLMs in cybersecurity?
π© Features
(2024.08.20) Our study encompasses an analysis of over 300 works, spanning across 25+ LLMs and more than 10 downstream scenarios.
π Literatures
RQ1: How to construct cybersecurity-oriented domain LLMs?
Cybersecurity Evaluation Benchmarks
-
CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity | arXiv | 2024.02.12 | Paper Link
-
SecEval: A Comprehensive Benchmark for Evaluating Cybersecurity Knowledge of Foundation Models | Github | 2023 | Paper Link
-
SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security | arXiv | 2023.12.26 | Paper Link
-
Securityeval dataset: mining vulnerability examples to evaluate machine learning-based code generation techniques. | Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security | 2022.11.09 | Paper Link
-
Can llms patch security issues? | arXiv | 2024.02.19 | Paper Link
-
DebugBench: Evaluating Debugging Capability of Large Language Models | ACL Findings | 2024.01.11 | Paper Link
-
An empirical study of netops capability of pre-trained large language models. | arXiv | 2023.09.19 | Paper Link
-
OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models | arXiv | 2024.02.16 | Paper Link
-
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models | arXiv | 2023.12.07 | Paper Link
-
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations | IEEE/ACM International Conference on Mining Software Repositories | 2023.03.16 | Paper Link
-
Can LLMs Understand Computer Networks? Towards a Virtual System Administrator | arXiv | 2024.04.22 | Paper Link
-
Assessing Cybersecurity Vulnerabilities in Code Large Language Models | arXiv | 2024.04.29 | Paper Link
-
SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory | arXiv | 2024.05.30 | Paper Link
-
NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | arXiv | 2024.06.09 | Paper Link
-
eyeballvul: a future-proof benchmark for vulnerability detection in the wild | arXiv | 2024.07.11 | Paper Link
-
CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models | arXiv | 2024.08.03 | Paper Link
-
AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset | arXiv | 2024.08.09 | Paper Link
-
CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity | arXiv | 2024.11.25 | Paper Link
-
AI Cyber Risk Benchmark: Automated Exploitation Capabilities | arXiv | 2024.12.09 | Paper Link
-
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | arXiv | 2024.12.31 | Paper Link
-
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training | arXiv | 2025.02.16 | Paper Link
Fine-tuned Domain LLMs for Cybersecurity
-
SecureFalcon: The Next Cyber Reasoning System for Cyber Security | arXiv | 2023.07.13 | Paper Link
-
Owl: A Large Language Model for IT Operations | ICLR | 2023.09.17 | Paper Link
-
HackMentor: Fine-tuning Large Language Models for Cybersecurity | TrustCom | 2023.09 | Paper Link
-
Large Language Models for Test-Free Fault Localization | ICSE | 2023.10.03 | Paper Link
-
Finetuning Large Language Models for Vulnerability Detection | arXiv | 2024.02.29 | Paper Link
-
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair | arXiv | 2024.03.11 | Paper Link
-
Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding | ISSRE | 2023.10.06 | Paper Link
-
Instruction Tuning for Secure Code Generation | ICML | 2024.02.14 | Paper Link
-
Nova+: Generative Language Models for Binaries | arXiv | 2023.11.27 | Paper Link
-
Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns | arXiv | 2024.04.30 | Paper Link
-
Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models | arXiv | 2024.06.02 | Paper Link
-
Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models | arXiv | 2024.06.09 | Paper Link
-
A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Automated Program Repair | arXiv | 2024.06.09 | Paper Link
-
IoT-LM: Large Multisensory Language Models for the Internet of Things | arXiv | 2024.07.13 | Paper Link
-
CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions | arXiv | 2024.08.18 | Paper Link
-
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments | arXiv | 2024.09.17 | Paper Link
-
AttackQA: Development and Adoption of a Dataset for Assisting Cybersecurity Operations using Fine-tuned and Open-Source LLMs | arXiv | 2024.11.02 | Paper Link
-
Fine-tuning Large Language Models for DGA and DNS Exfiltration Detection | arXiv | 2024.11.07 | Paper Link
RQ2: What are the potential applications of LLMs in cybersecurity?
Threat Intelligence
-
LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge | arXiv | 2024.01.18 | Paper Link
-
AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation | BigData | 2023.10.04 | Paper Link
-
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions | arXiv | 2023.08.22 | Paper Link
-
Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation | arXiv | 2024.01.12 | Paper Link
-
An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures | Proceedings of the 2023 Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses | 2023.08.09 | Paper Link
-
ChatGPT, Llama, can you write my report? An experiment on assisted digital forensics reports written using (Local) Large Language Models | Forensic Sci. Int. Digit. Investig. | 2023.12.22 | Paper Link
-
Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild | arXiv | 2023.07.14 | Paper Link
-
Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection | arXiv | 2023.08.27 | Paper Link
-
HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for Root Cause Analysis with GPT-assisted Mitigation Suggestion | arXiv | 2023.12.21 | Paper Link
-
Cyber Sentinel: Exploring Conversational Agents in Streamlining Security Tasks with GPT-4 | arXiv | 2023.09.28 | Paper Link
-
Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness | Expert Syst. Appl. | 2024.03.13 | Paper Link
-
Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models | arXiv | 2024.03.01 | Paper Link
-
SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence | arXiv | 2024.05.06 | Paper Link
-
AttacKG+:Boosting Attack Knowledge Graph Construction with Large Language Models | EuroS&P Workshop | 2024.05.08 | Paper Link
-
Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models | arXiv | 2024.06.30 | Paper Link
-
LLMCloudHunter: Harnessing LLMs for Automated Extraction of Detection Rules from Cloud-Based CTI | arXiv | 2024.07.06 | Paper Link
-
Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers | arXiv | 2024.07.18 | Paper Link
-
Psychological Profiling in Cybersecurity: A Look at LLMs and Psycholinguistic Features | arXiv | 2024.08.09 | Paper Link
-
The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums | arXiv | 2024.08.08 | Paper Link
-
A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | arXiv | 2024.08.12 | Paper Link
-
Usefulness of data flow diagrams and large language models for security threat validation: a registered report | arXiv | 2024.08.14 | Paper Link
-
KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment | arXiv | 2024.08.15 | Paper Link
-
Evaluating the Usability of LLMs in Threat Intelligence Enrichment | arXiv | 2024.09.23 | Paper Link
-
Cyber Knowledge Completion Using Large Language Models | arXiv | 2024.09.24 | Paper Link
-
AI-Driven Cyber Threat Intelligence Automation | arXiv | 2024.10.27 | Paper Link
-
CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity | arXiv | 2024.10.28 | Paper Link
-
IntellBot: Retrieval Augmented LLM Chatbot for Cyber Threat Knowledge Delivery | arXiv | 2024.11.08| Paper Link
-
Labeling NIDS Rules with MITRE ATT&CK Techniques: Machine Learning vs. Large Language Models | arXiv | 2024.12.16 | Paper Link
FUZZ
-
Augmenting Greybox Fuzzing with Generative AI | arXiv | 2023.06.11 | Paper Link
-
How well does LLM generate security tests? | arXiv | 2023.10.03 | Paper Link
-
Fuzz4All: Universal Fuzzing with Large Language Models | ICSE | 2024.01.15 | Paper Link
-
CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models | ICSE | 2023.07.26 | Paper Link
-
Understanding Large Language Model Based Fuzz Driver Generation | arXiv | 2023.07.24 | Paper Link
-
Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models | ISSTA | 2023.06.07 | Paper Link
-
Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT | arXiv | 2023.04.04 | Paper Link
-
Large language model guided protocol fuzzing | NDSS | 2024.02.26 | Paper Link
-
Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing | USENIX | 2024.03.06 | Paper Link
-
When Fuzzing Meets LLMs: Challenges and Opportunities | ACM International Conference on the Foundations of Software Engineering | 2024.04.25 | Paper Link
-
An Exploratory Study on Using Large Language Models for Mutation Testing | arXiv | 2024.06.14 | Paper Link
-
FuzzCoder: Byte-level Fuzzing Test via Large Language Model | arXiv | 2024.09.03 | Paper Link
-
AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing | arXiv | 2024.11.05 | Paper Link
-
ChatHTTPFuzz: Large Language Model-Assisted IoT HTTP Fuzzing | arXiv | 2024.11.18 | Paper Link
-
Harnessing Large Language Models for Seed Generation in Greybox Fuzzing | arXiv | 2024.11.27 | Paper Link
-
Large Language Model assisted Hybrid Fuzzing | arXiv | 2024.12.19 | Paper Link
-
Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models | arXiv | 2025.01.08 | Paper Link
Vulnerabilities Detection
-
Evaluation of ChatGPT Model for Vulnerability Detection | arXiv | 2023.04.12 | Paper Link
-
Detecting software vulnerabilities using Language Models | CSR | 2023.02.23 | Paper Link
-
Software Vulnerability Detection using Large Language Models | ISSRE Workshop | 2023.09.02 | Paper Link
-
Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities | arXiv | 2023.11.16 | Paper Link
-
Software Vulnerability and Functionality Assessment using LLMs | arXiv | 2024.03.13 | Paper Link
-
Finetuning Large Language Models for Vulnerability Detection | arXiv | 2024.03.01 | Paper Link
-
The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models | arXiv | 2023.11.15 | Paper Link
-
DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism | arXiv | 2023.09.27 | Paper Link
-
Prompt-Enhanced Software Vulnerability Detection Using ChatGPT | ICSE | 2023.08.24 | Paper Link
-
Using ChatGPT as a Static Application Security Testing Tool | arXiv | 2023.08.28 | Paper Link
-
LLbezpeky: Leveraging Large Language Models for Vulnerability Detection | arXiv | 2024.01.13 | Paper Link
-
Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives | TPS-ISA | 2023.10.16 | Paper Link
-
Software Vulnerability Detection with GPT and In-Context Learning | DSC | 2024.01.08 | Paper Link
-
GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis | ICSE | 2023.12.25 | Paper Link
-
VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model | arXiv | 2023.08.09 | Paper Link
-
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning | arXiv | 2024.01.29 | Paper Link
-
Large Language Models for Test-Free Fault Localization | ICSE | 2023.10.03 | Paper Link
-
Multi-role Consensus through LLMs Discussions for Vulnerability Detection | arXiv | 2024.03.21 | Paper Link
-
How ChatGPT is Solving Vulnerability Management Problem | arXiv | 2023.11.11 | Paper Link
-
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | International Symposium on Research in Attacks, Intrusions and Defenses | 2023.08.09 | Paper Link
-
The FormAI Dataset: Generative AI in Software Security through the Lens of Formal Verification | International Conference on Predictive Models and Data Analytics in Software Engineering | 2023.09.02 | Paper Link
-
How Far Have We Gone in Vulnerability Detection Using Large Language Models | arXiv | 2023.12.22 | Paper Link
-
Large Language Model for Vulnerability Detection and Repair: Literature Review and Roadmap | arXiv | 2024.04.04 | Paper Link
-
DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection | Journal of Systems and Software | 2024.05.02 | Paper Link
-
Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study | arXiv | 2024.05.24 | Paper Link
-
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities | arXiv | 2024.05.27 | Paper Link
-
Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning | ACL Findings | 2024.06.06 | Paper Link
-
Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG | arXiv | 2024.06.19 | Paper Link
-
MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization | arXiv | 2024.06.26 | Paper Link
-
Assessing the Effectiveness of LLMs in Android Application Vulnerability Analysis | arXiv | 2024.06.27 | Paper Link
-
Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models | Information Security and Privacy | 2024.07.12 | Paper Link
-
Static Detection of Filesystem Vulnerabilities in Android Systems | arXiv | 2024.07.16 | Paper Link
-
- SCoPE: Evaluating LLMs for Software Vulnerability Detection | arXiv | 2024.07.19 | Paper Link
- Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection | arXiv | 2024.07.23 | Paper Link
- Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models | arXiv | 2024.08.08 | Paper Link
- Harnessing the Power of LLMs in Source Code Vulnerability Detection | arXiv | 2024.08.07 | Paper Link
- Exploring RAG-based Vulnerability Augmentation with LLMs | arXiv | 2024.08.08 | Paper Link
- LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions | arXiv | 2024.08.14 | Paper Link
- ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data | arXiv | 2024.08.28 | Paper Link
- Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection | European Symposium on Research in Computer Security | 2024.08.29 | Paper Link
- SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection | arXiv | 2024.09.02 | Paper Link
- Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches | arXiv | 2024.09.11 | Paper Link
- Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models | arXiv | 2024.09.16 | Paper Link
- VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching | arXiv | 2024.09.17 | Paper Link
- Boosting Cybersecurity Vulnerability Scanning based on LLM-supported Static Application Security Testing | arXiv | 2024.09.24 | Paper Link
- Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning | arXiv | 2024.09.27 | Paper Link
- RealVul: Can We Detect Vulnerabilities in Web Applications with LLM? | arXiv | 2024.10.10 | Paper Link
- ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs | arXiv | 2024.10.22 | Paper Link
- Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | arXiv | 2024.11.07 | Paper Link
- LProtector: An LLM-driven Vulnerability Detection System | arXiv | 2024.11.04 | Paper Link
- Beyond Static Tools: Evaluating Large Language Models for Cryptographic Misuse Detection | arXiv | 2024.11.14 | Paper Link
- CryptoFormalEval: Integrating LLMs and Formal Verification for Automated Cryptographic Protocol Vulnerability Detection | arXiv | 2024.11.20 | Paper Link
- EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code | arXiv | 2024.11.25 | Paper Link
- CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics | arXiv | 2024.11.26 | Paper Link
- ChatNVD: Advancing Cybersecurity Vulnerability Assessment with Large Language Models | arXiv | 2024.12.06 | Paper Link
- Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection | arXiv | 2024.12.16 | Paper Link
- Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study | arXiv | 2024.12.24 | Paper Link
- Leveraging Large Language Models and Machine Learning for Smart Contract Vulnerability Detection | arXiv | 2025.01.04 | Paper Link
- CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | arXiv | 2025.01.08 | Paper Link
- Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis | arXiv | 2025.01.07 | Paper Link
Insecure Code Generation
- Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants | USENIX | 2023.02.27 | Paper Link
- Bugs in Large Language Models Generated Code | arXiv | 2024.03.18 | Paper Link
- Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions | S&P | 2021.12.16 | Paper Link
- The Effectiveness of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis | arXiv | 2023.08.29 | Paper Link
- No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT | IEEE Trans. Software Eng. | 2023.08.09 | Paper Link
- Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code | arXiv | 2023.11.01 | Paper Link
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | NeurIPS | 2023.10.30 | Paper Link
- Can Large Language Models Identify And Reason About Security Vulnerabilities? Not Yet | arXiv | 2023.12.19 | Paper Link
- A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages | arXiv | 2023.08.08 | Paper Link
- How Secure is Code Generated by ChatGPT? | SMC | 2023.04.19 | Paper Link
- Large Language Models for Code: Security Hardening and Adversarial Testing | ACM SIGSAC Conference on Computer and Communications Security | 2023.09.29 | Paper Link
- Pop Quiz! Can a Large Language Model Help With Reverse Engineering? | arXiv | 2022.02.02 | Paper Link
- LLM4Decompile: Decompiling Binary Code with Large Language Models | EMNLP | 2024.03.08 | Paper Link
- Large Language Models for Code Analysis: Do LLMs Really Do Their Job? | USENIX | 2024.03.05 | Paper Link
- Understanding Programs by Exploiting (Fuzzing) Test Cases | ACL Findings | 2023.01.12 | Paper Link
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures | arXiv | 2023.08.07 | Paper Link
- Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 | arXiv | 2023.12.13 | Paper Link
- Using ChatGPT to Analyze Ransomware Messages and to Predict Ransomware Threats | Research Square | 2023.11.21 | Paper Link
- Shifting the Lens: Detecting Malware in npm Ecosystem with Large Language Models | arXiv | 2024.03.18 | Paper Link
- DebugBench: Evaluating Debugging Capability of Large Language Models | ACL Findings | 2024.01.11 | Paper Link
- Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions | ICSE | 2023.10.24 | Paper Link
- FLAG: Finding Line Anomalies (in code) with Generative AI | arXiv | 2023.07.22 | Paper Link
- Evolutionary Large Language Models for Hardware Security: A Comparative Survey | arXiv | 2024.04.25 | Paper Link
- Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models | arXiv | 2024.04.29 | Paper Link
- LLM Security Guard for Code | International Conference on Evaluation and Assessment in Software Engineering | 2024.05.03 | Paper Link
- Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | arXiv | 2024.05.30 | Paper Link
- DistiLRR: Transferring Code Repair for Low-Resource Programming Languages | arXiv | 2024.06.20 | Paper Link
- Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval | arXiv | 2024.07.04 | Paper Link
- An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation | arXiv | 2024.08.17 | Paper Link
- ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart Contracts | arXiv | 2024.09.15 | Paper Link
Program Repair
- Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs | arXiv | 2023.11.06 | Paper Link
- An Analysis of the Automatic Bug Fixing Performance of ChatGPT | APR@ICSE | 2023.01.20 | Paper Link
- AI-powered patching: the future of automated vulnerability fixes | Google | 2024.01.31 | Paper Link
- Practical Program Repair in the Era of Large Pre-trained Language Models | arXiv | 2022.10.25 | Paper Link
- Security Code Review by LLMs: A Deep Dive into Responses | arXiv | 2024.01.29 | Paper Link
- Examining Zero-Shot Vulnerability Repair with Large Language Models | S&P | 2022.08.15 | Paper Link
- How Effective Are Neural Networks for Fixing Security Vulnerabilities | ISSTA | 2023.05.29 | Paper Link
- Can LLMs Patch Security Issues? | arXiv | 2024.02.19 | Paper Link
- InferFix: End-to-End Program Repair with LLMs | ESEC/FSE | 2023.03.13 | Paper Link
- ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching | arXiv | 2023.08.24 | Paper Link
- DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection | arXiv | 2023.08.14 | Paper Link
- Fixing Hardware Security Bugs with Large Language Models | arXiv | 2023.02.02 | Paper Link
- A Study of Vulnerability Repair in JavaScript Programs with Large Language Models | WWW | 2023.03.19 | Paper Link
- Enhanced Automated Code Vulnerability Repair using Large Language Models | Eng. Appl. Artif. Intell. | 2024.01.08 | Paper Link
- Teaching Large Language Models to Self-Debug | ICLR | 2023.10.05 | Paper Link
- Better Patching Using LLM Prompting, via Self-Consistency | ASE | 2023.08.16 | Paper Link
- Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair | ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering | 2023.11.08 | Paper Link
- LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward | arXiv | 2024.02.22 | Paper Link
- ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs | arXiv | 2024.03.07 | Paper Link
- When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done? | ICSE | 2023.03.01 | Paper Link
- Aligning LLMs for FL-free Program Repair | arXiv | 2024.04.13 | Paper Link
- Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs | arXiv | 2024.04.22 | Paper Link
- How Far Can We Go with Practical Function-Level Program Repair? | arXiv | 2024.04.19 | Paper Link
- Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models | arXiv | 2024.03.23 | Paper Link
- A Systematic Literature Review on Large Language Models for Automated Program Repair | arXiv | 2024.05.12 | Paper Link
- Automated Repair of AI Code with Large Language Models and Formal Verification | arXiv | 2024.05.14 | Paper Link
- A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback | Proceedings of the 1st ACM International Conference on AI-Powered Software | 2024.05.24 | Paper Link
- Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis | arXiv | 2024.06.04 | Paper Link
- Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models | ACM/IEEE International Symposium on Machine Learning for CAD | 2024.07.04 | Paper Link
- ThinkRepair: Self-Directed Automated Program Repair | ACM SIGSOFT International Symposium on Software Testing and Analysis | 2024.07.30 | Paper Link
- Revisiting Evolutionary Program Repair via Code Language Model | arXiv | 2024.08.20 | Paper Link
- RePair: Automated Program Repair with Process-based Feedback | ACL Findings | 2024.08.21 | Paper Link
- Enhancing LLM-Based Automated Program Repair with Design Rationales | ASE | 2024.08.22 | Paper Link
- Automated Software Vulnerability Patching using Large Language Models | arXiv | 2024.08.24 | Paper Link
- MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair | arXiv | 2024.08.26 | Paper Link
- Fixing Code Generation Errors for Large Language Models | arXiv | 2024.09.01 | Paper Link
- APOLLO: A GPT-based tool to detect phishing emails and generate explanations that warn users | arXiv | 2024.10.10 | Paper Link
- The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks | arXiv | 2024.10.20 | Paper Link
- A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation | arXiv | 2024.11.12 | Paper Link
- Fixing Security Vulnerabilities with AI in OSS-Fuzz | arXiv | 2024.11.21 | Paper Link
- Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair | arXiv | 2024.12.05 | Paper Link
- From Defects to Demands: A Unified, Iterative, and Heuristically Guided LLM-Based Framework for Automated Software Repair and Requirement Realization | arXiv | 2024.12.06 | Paper Link
- LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models | arXiv | 2025.01.07 | Paper Link
Anomaly Detection
- Benchmarking Large Language Models for Log Analysis, Security, and Interpretation | J. Netw. Syst. Manag. | 2023.11.24 | Paper Link
- Log-based Anomaly Detection based on EVT Theory with feedback | arXiv | 2023.09.30 | Paper Link
- LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection | HPCC/DSS/SmartCity/DependSys | 2023.09.14 | Paper Link
- LogGPT: Log Anomaly Detection via GPT | BigData | 2023.12.11 | Paper Link
- Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies | ICPC | 2024.01.26 | Paper Link
- Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging | arXiv | 2024.03.02 | Paper Link
- Web Content Filtering through knowledge distillation of Large Language Models | WI-IAT | 2023.05.10 | Paper Link
- Application of Large Language Models to DDoS Attack Detection | International Conference on Security and Privacy in Cyber-Physical Systems and Smart Vehicles | 2024.02.05 | Paper Link
- An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach | arXiv | 2023.11.12 | Paper Link
- Evaluating the Performance of ChatGPT for Spam Email Detection | arXiv | 2024.02.23 | Paper Link
- Prompted Contextual Vectors for Spear-Phishing Detection | arXiv | 2024.02.14 | Paper Link
- Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models | arXiv | 2023.11.30 | Paper Link
- Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection | arXiv | 2023.10.30 | Paper Link
- Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices | IEEE Access | 2024.02.08 | Paper Link
- HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) | arXiv | 2023.09.27 | Paper Link
- ChatGPT for digital forensic investigation: The good, the bad, and the unknown | Forensic Science International: Digital Investigation | 2023.07.10 | Paper Link
- Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance | arXiv | 2024.04.23 | Paper Link
- LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing | ICSE | 2024.04.27 | Paper Link
- DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS | arXiv | 2024.05.12 | Paper Link
- Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection | arXiv | 2024.05.17 | Paper Link
- Log Parsing with Self-Generated In-Context Learning and Self-Correction | arXiv | 2024.06.05 | Paper Link
- Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks | arXiv | 2024.06.06 | Paper Link
- ULog: Unsupervised Log Parsing with Large Language Models through Log Contrastive Units | arXiv | 2024.06.11 | Paper Link
- Anomaly Detection on Unstable Logs with GPT Models | arXiv | 2024.06.11 | Paper Link
- Defending Against Social Engineering Attacks in the Age of LLMs | EMNLP | 2024.06.18 | Paper Link
- LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis | arXiv | 2024.07.02 | Paper Link
- Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection | arXiv | 2024.07.12 | Paper Link
- Towards Explainable Network Intrusion Detection using Large Language Models | arXiv | 2024.08.08 | Paper Link
- Utilizing Large Language Models to Optimize the Detection and Explainability of Phishing Websites | arXiv | 2024.08.11 | Paper Link
- Multimodal Large Language Models for Phishing Webpage Detection and Identification | arXiv | 2024.08.12 | Paper Link
- Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | arXiv | 2024.08.14 | Paper Link
- Automated Phishing Detection Using URLs and Webpages | arXiv | 2024.08.16 | Paper Link
- LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models | arXiv | 2024.08.25 | Paper Link
- XG-NID: Dual-Modality Network Intrusion Detection using a Heterogeneous Graph Neural Network and Large Language Model | arXiv | 2024.08.27 | Paper Link
- LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models | arXiv | 2024.09.03 | Paper Link
- A Comparative Study on Large Language Models for Log Parsing | arXiv | 2024.09.04 | Paper Link
- Using Large Language Models for Template Detection from Security Event Logs | arXiv | 2024.09.08 | Paper Link
- LogLLM: Log-based Anomaly Detection Using Large Language Models | arXiv | 2024.11.13 | Paper Link
- LogLM: From Task-based to Instruction-based Automated Log Analysis | arXiv | 2024.10.12 | Paper Link
- Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction | arXiv | 2024.12.03 | Paper Link
- Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware | arXiv | 2025.01.08 | Paper Link
- Confront Insider Threat: Precise Anomaly Detection in Behavior Logs Based on LLM Fine-Tuning | COLING | 2024 | Paper Link
LLM-Assisted Attack
- Identifying and Mitigating the Security Risks of Generative AI | Foundations and Trends in Privacy and Security | 2023.12.29 | Paper Link
- Impact of Big Data Analytics and ChatGPT on Cybersecurity | I3CS | 2023.05.22 | Paper Link
- From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy | IEEE Access | 2023.07.03 | Paper Link
- LLMs Killed the Script Kiddie: How Agents Supported by Large Language Models Change the Landscape of Network Threat Testing | arXiv | 2023.10.10 | Paper Link
- Malla: Demystifying Real-world Large Language Model Integrated Malicious Services | USENIX | 2024.01.06 | Paper Link
- Evaluating LLMs for Privilege-Escalation Scenarios | arXiv | 2023.10.23 | Paper Link
- Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions | arXiv | 2023.08.21 | Paper Link
- Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT | CNS | 2023.09.19 | Paper Link
- From Chatbots to PhishBots? - Preventing Phishing scams created using ChatGPT, Google Bard and Claude | arXiv | 2024.03.10 | Paper Link
- From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads | arXiv | 2023.05.24 | Paper Link
- PentestGPT: An LLM-empowered Automatic Penetration Testing Tool | USENIX | 2023.08.13 | Paper Link
- AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks | arXiv | 2024.03.02 | Paper Link
- RatGPT: Turning online LLMs into Proxies for Malware Attacks | arXiv | 2023.09.07 | Paper Link
- Getting pwn'd by AI: Penetration Testing with Large Language Models | ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering | 2023.08.17 | Paper Link
- Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method | arXiv | 2024.06.18 | Paper Link
- Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models | arXiv | 2024.07.11 | Paper Link
- The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure | arXiv | 2024.07.22 | Paper Link
- From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM | arXiv | 2024.07.24 | Paper Link
- PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation | Proceedings of the Workshop on Autonomous Cybersecurity | 2024.07.25 | Paper Link
- Practical Attacks against Black-box Code Completion Engines | arXiv | 2024.08.05 | Paper Link
- Using Retriever Augmented Large Language Models for Attack Graph Generation | arXiv | 2024.08.11 | Paper Link
- CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher | Sensors | 2024.08.21 | Paper Link
- Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks | arXiv | 2024.08.23 | Paper Link
- Hacking, The Lazy Way: LLM Augmented Pentesting | arXiv | 2024.09.14 | Paper Link
- On the Feasibility of Fully AI-automated Vishing Attacks | arXiv | 2024.09.20 | Paper Link
- Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | arXiv | 2024.10.25 | Paper Link
- AutoPenBench: Benchmarking Generative Agents for Penetration Testing | arXiv | 2024.10.28 | Paper Link
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? | arXiv | 2024.11.02 | Paper Link
- PentestAgent: Incorporating LLM Agents to Automated Penetration Testing | arXiv | 2024.11.07 | Paper Link
- Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks | arXiv | 2024.11.18 | Paper Link
- Adapting to Cyber Threats: A Phishing Evolution Network (PEN) Framework for Phishing Generation and Analyzing Evolution Patterns using Large Language Models | arXiv | 2024.11.18 | Paper Link
- Next-Generation Phishing: How LLM Agents Empower Cyber Attackers | arXiv | 2024.11.22 | Paper Link
- AI-Augmented Ethical Hacking: A Practical Examination of Manual Exploitation and Privilege Escalation in Linux Environments | arXiv | 2024.11.26 | Paper Link
- Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs | arXiv | 2024.11.27 | Paper Link
- Hacking CTFs with Plain Agents | arXiv | 2024.12.03 | Paper Link
- HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing | arXiv | 2024.12.02 | Paper Link
- RapidPen: Fully Automated IP-to-Shell Penetration Testing with LLM-based Agents | arXiv | 2025.02.23 | Paper Link
Others
- An LLM-based Framework for Fingerprinting Internet-connected Devices | ACM on Internet Measurement Conference | 2023.10.24 | Paper Link
- Anatomy of an AI-powered malicious social botnet | arXiv | 2023.07.30 | Paper Link
- Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation | arXiv | 2023.12.12 | Paper Link
- LLM for SoC Security: A Paradigm Shift | IEEE Access | 2023.10.09 | Paper Link
- Harnessing the Power of LLM to Support Binary Taint Analysis | arXiv | 2023.10.12 | Paper Link
- Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations | arXiv | 2023.12.07 | Paper Link
- LLM in the Shell: Generative Honeypots | EuroS&P Workshop | 2024.02.09 | Paper Link
- Employing LLMs for Incident Response Planning and Review | arXiv | 2024.03.02 | Paper Link
- Enhancing Network Management Using Code Generated by Large Language Models | Proceedings of the 22nd ACM Workshop on Hot Topics in Networks | 2023.08.11 | [Paper Link](https://arxiv.org/abs/2308.06261)
- Prompting Is All You Need: Automated Android Bug Replay with Large Language Models | ICSE | 2023.07.18 | Paper Link
- Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions | CHI | 2024.02.07 | Paper Link
- How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models | arXiv | 2024.04.16 | Paper Link
- Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models | arXiv | 2024.04.24 | Paper Link
- AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering | arXiv | 2024.04.29 | Paper Link
- Large Language Models for Cyber Security: A Systematic Literature Review | arXiv | 2024.05.08 | Paper Link
- Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities | arXiv | 2024.05.08 | Paper Link
- LLMPot: Automated LLM-based Industrial Protocol and Physical Process Emulation for ICS Honeypots | arXiv | 2024.05.10 | Paper Link
- A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions | arXiv | 2024.05.23 | Paper Link
- Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering | arXiv | 2024.06.09 | Paper Link
- Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications | arXiv | 2024.06.16 | Paper Link
- On Large Language Models in National Security Applications | arXiv | 2024.07.03 | Paper Link
- Disassembling Obfuscated Executables with LLM | arXiv | 2024.07.12 | Paper Link
- MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation | arXiv | 2024.07.22 | Paper Link
- MistralBSM: Leveraging Mistral-7B for Vehicular Networks Misbehavior Detection | arXiv | 2024.07.26 | Paper Link
- Beyond Detection: Leveraging Large Language Models for Cyber Attack Prediction in IoT Networks | arXiv | 2024.08.26 | Paper Link
- ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via Automated Bash Script Generation, Assessment, and Refinement | arXiv | 2024.09.12 | Paper Link
- LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | arXiv | 2024.09.15 | Paper Link
- Contextualized AI for Cyber Defense: An Automated Survey using LLMs | arXiv | 2024.09.20 | Paper Link
- Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models | arXiv | 2024.09.25 | Paper Link
- CmdCaliper: A Semantic-Aware Command-Line Embedding Model and Dataset for Security Research | EMNLP | 2024.10.02 | Paper Link
- Integrating Large Language Models with Internet of Things Applications | arXiv | 2024.10.25 | Paper Link
- Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education | arXiv | 2024.12.10 | Paper Link
- Emerging Security Challenges of Large Language Models | arXiv | 2024.12.23 | Paper Link
- Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense | arXiv | 2024.12.30 | Paper Link
- BARTPredict: Empowering IoT Security with LLM-Driven Cyber Threat Prediction | arXiv | 2025.01.03 | Paper Link
- Empowering AIOps: Leveraging Large Language Models for IT Operations Management | arXiv | 2025.01.21 | Paper Link
RQ3: What are further research directions about the application of LLMs in cybersecurity?
Further Research: Agent4Cybersecurity
- Cybersecurity Issues and Challenges | Handbook of Research on Cybersecurity Issues and Challenges for Business and FinTech Applications | 2022.08 | Paper Link
- A unified cybersecurity framework for complex environments | Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists | 2018.09.26 | Paper Link
- LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution | arXiv | 2024.02.20 | Paper Link
- Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments | ICAART | 2023.08.28 | Paper Link
- LLM Agents can Autonomously Hack Websites | arXiv | 2024.02.16 | Paper Link
- Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides | ECAI | 2024.02.27 | Paper Link
- TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | arXiv | 2023.11.07 | Paper Link
- The Rise and Potential of Large Language Model Based Agents: A Survey | arXiv | 2023.09.19 | Paper Link
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | ICLR | 2023.10.03 | Paper Link
- From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs | arXiv | 2024.02.28 | Paper Link
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | arXiv | 2024.01.08 | Paper Link
- TaskWeaver: A Code-First Agent Framework | arXiv | 2023.12.01 | Paper Link
- Large Language Models for Networking: Applications, Enabling Techniques, and Challenges | arXiv | 2023.11.29 | Paper Link
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | EMNLP Findings | 2024.02.18 | Paper Link
- WIPI: A New Web Threat for LLM-Driven Web Agents | arXiv | 2024.02.26 | Paper Link
- InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | ACL Findings | 2024.03.25 | Paper Link
- LLM Agents can Autonomously Exploit One-day Vulnerabilities | arXiv | 2024.04.17 | Paper Link
- Large Language Models for Networking: Workflow, Advances and Challenges | arXiv | 2024.04.29 | Paper Link
- Generative AI in Cybersecurity | arXiv | 2024.05.02 | Paper Link
- Generative AI and Large Language Models for Cyber Security: All Insights You Need | arXiv | 2024.05.21 | Paper Link
- Teams of LLM Agents can Exploit Zero-Day Vulnerabilities | arXiv | 2024.06.02 | Paper Link
- Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers | arXiv | 2024.07.18 | Paper Link
- PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection | arXiv | 2024.08.20 | Paper Link
- BreachSeek: A Multi-Agent Automated Penetration Tester | arXiv | 2024.08.31 | Paper Link
- MarsCode Agent: AI-native Automated Bug Fixing | arXiv | 2024.09.04 | Paper Link
- LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild | arXiv | 2024.10.17 | Paper Link
- Multi-Agent Collaboration in Incident Response with Large Language Models | arXiv | 2024.12.03 | Paper Link
- VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework | arXiv | 2025.01.23 | Paper Link
BibTeX
@misc{zhang2024llms,
title={When LLMs Meet Cybersecurity: A Systematic Literature Review},
author={Jie Zhang and Haoyu Bu and Hui Wen and Yu Chen and Lun Li and Hongsong Zhu},
year={2024},
eprint={2405.03644},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
A comprehensive repository focusing on 'Model Merging in LLMs, MLLMs, and Beyond', providing an exhaustive overview of model merging methods, theories, applications, and future research directions. The repository covers various advanced methods, applications in foundation models, different machine learning subfields, and tasks like pre-merging methods, architecture transformation, weight alignment, basic merging methods, and more.

data-prep-kit
Data Prep Kit accelerates unstructured data preparation for LLM app developers. It allows developers to cleanse, transform, and enrich unstructured data for pre-training, fine-tuning, instruct-tuning LLMs, or building RAG applications. The kit provides modules for Python, Ray, and Spark runtimes, supporting Natural Language and Code data modalities. It offers a framework for custom transforms and uses Kubeflow Pipelines for workflow automation. Users can install the kit via PyPi and access a variety of transforms for data processing pipelines.
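The transform-pipeline idea behind such kits can be sketched generically in a few lines. The record format and transform names here are hypothetical, not Data Prep Kit's actual API:

```python
def run_pipeline(records, transforms):
    """Apply a sequence of transform functions to each record,
    dropping any record a transform filters out by returning None."""
    out = []
    for rec in records:
        for t in transforms:
            rec = t(rec)
            if rec is None:
                break
        else:
            out.append(rec)
    return out

# Two illustrative transforms: whitespace cleanup and empty-record filtering.
strip_ws = lambda r: {**r, "text": r["text"].strip()}
drop_empty = lambda r: r if r["text"] else None

records = [{"text": "  hello "}, {"text": "   "}]
print(run_pipeline(records, [strip_ws, drop_empty]))
# → [{'text': 'hello'}]
```

Frameworks like the kit's Ray and Spark runtimes distribute exactly this per-record map/filter pattern across workers.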

llm-engineer-toolkit
The LLM Engineer Toolkit is a curated repository containing over 120 LLM libraries categorized for various tasks such as training, application development, inference, serving, data extraction, data generation, agents, evaluation, monitoring, prompts, structured outputs, safety, security, embedding models, and other miscellaneous tools. It includes libraries for fine-tuning LLMs, building applications powered by LLMs, serving LLM models, extracting data, generating synthetic data, creating AI agents, evaluating LLM applications, monitoring LLM performance, optimizing prompts, handling structured outputs, ensuring safety and security, embedding models, and more. The toolkit covers a wide range of tools and frameworks to streamline the development, deployment, and optimization of large language models.

Awesome-Neuro-Symbolic-Learning-with-LLM
The Awesome-Neuro-Symbolic-Learning-with-LLM repository is a curated collection of papers and resources focusing on improving reasoning and planning capabilities of Large Language Models (LLMs) and Multi-Modal Large Language Models (MLLMs) through neuro-symbolic learning. It covers a wide range of topics such as neuro-symbolic visual reasoning, program synthesis, logical reasoning, mathematical reasoning, code generation, visual reasoning, geometric reasoning, classical planning, game AI planning, robotic planning, AI agent planning, and more. The repository provides a comprehensive overview of tutorials, workshops, talks, surveys, papers, datasets, and benchmarks related to neuro-symbolic learning with LLMs and MLLMs.

writing
The LLM Creative Story-Writing Benchmark evaluates large language models based on their ability to incorporate a set of 10 mandatory story elements in a short narrative. It measures constraint satisfaction and literary quality by grading models on character development, plot structure, atmosphere, storytelling impact, authenticity, and execution. The benchmark aims to assess how well models can adapt to rigid requirements, remain original, and produce cohesive stories using all assigned elements.
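The constraint-satisfaction half of such a benchmark can be approximated with a naive element-coverage check. This is purely illustrative: the actual benchmark uses model-based grading, and substring matching misses paraphrases:

```python
def element_coverage(story, required_elements):
    """Report which mandatory elements appear in the story text,
    using a naive case-insensitive substring match."""
    text = story.lower()
    found = {e for e in required_elements if e.lower() in text}
    return found, set(required_elements) - found

story = "A lighthouse keeper finds a broken compass during the storm."
required = ["lighthouse", "compass", "storm", "violin"]
found, missing = element_coverage(story, required)
print(sorted(found))    # ['compass', 'lighthouse', 'storm']
print(sorted(missing))  # ['violin']
```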
For similar tasks

watchtower
AIShield Watchtower is a tool designed to fortify the security of AI/ML models and Jupyter notebooks by automating model and notebook discoveries, conducting vulnerability scans, and categorizing risks into 'low,' 'medium,' 'high,' and 'critical' levels. It supports scanning of public GitHub repositories, Hugging Face repositories, AWS S3 buckets, and local systems. The tool generates comprehensive reports, offers a user-friendly interface, and aligns with industry standards like OWASP, MITRE, and CWE. It aims to address the security blind spots surrounding Jupyter notebooks and AI models, providing organizations with a tailored approach to enhancing their security efforts.
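Risk bucketing of this kind is commonly derived from CVSS base scores. The sketch below follows the published CVSS v3 qualitative rating bands; Watchtower's actual scoring logic may differ:

```python
def cvss_to_risk(score):
    """Map a CVSS v3 base score (0.0-10.0) to a risk bucket.
    Band edges follow the CVSS v3 qualitative severity scale;
    CVSS also defines a 'none' rating for a score of exactly 0.0."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS score must be in [0.0, 10.0]")
    if score == 0.0:
        return "none"
    if score <= 3.9:
        return "low"
    if score <= 6.9:
        return "medium"
    if score <= 8.9:
        return "high"
    return "critical"

print(cvss_to_risk(9.8))  # critical
print(cvss_to_risk(5.0))  # medium
```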

LLM-PLSE-paper
LLM-PLSE-paper is a repository focused on the applications of Large Language Models (LLMs) in Programming Language and Software Engineering (PL/SE) domains. It covers a wide range of topics including bug detection, specification inference and verification, code generation, fuzzing and testing, code model and reasoning, code understanding, IDE technologies, prompting for reasoning tasks, and agent/tool usage and planning. The repository provides a comprehensive collection of research papers, benchmarks, empirical studies, and frameworks related to the capabilities of LLMs in various PL/SE tasks.

invariant
Invariant Analyzer is an open-source scanner designed for LLM-based AI agents to find bugs, vulnerabilities, and security threats. It scans agent execution traces to identify issues like looping behavior, data leaks, prompt injections, and unsafe code execution. The tool offers a library of built-in checkers, an expressive policy language, data flow analysis, real-time monitoring, and extensible architecture for custom checkers. It helps developers debug AI agents, scan for security violations, and prevent security issues and data breaches during runtime. The analyzer leverages deep contextual understanding and a purpose-built rule matching engine for security policy enforcement.

OpenRedTeaming
OpenRedTeaming is a repository focused on red teaming for generative models, specifically large language models (LLMs). The repository provides a comprehensive survey on potential attacks on GenAI and robust safeguards. It covers attack strategies, evaluation metrics, benchmarks, and defensive approaches. The repository also implements over 30 auto red teaming methods. It includes surveys, taxonomies, attack strategies, and risks related to LLMs. The goal is to understand vulnerabilities and develop defenses against adversarial attacks on large language models.

quark-engine
Quark Engine is an AI-powered tool for analyzing Android APK files. It enhances the detection process with auto-suggestion, enabling users to create detection workflows without writing code via an intuitive drag-and-drop interface for workflow adjustments and updates. Quark Agent, the core component, generates Quark Script code from natural language input and feedback. The project is committed to a user-friendly experience for designing detection workflows through both textual and visual methods; various features are still under development and will be rolled out gradually.

vulnerability-analysis
The NVIDIA AI Blueprint for Vulnerability Analysis for Container Security showcases accelerated analysis of common vulnerabilities and exposures (CVEs) at enterprise scale, reducing mitigation time from days to seconds. It enables security analysts to determine software package vulnerabilities using large language models (LLMs) and retrieval-augmented generation (RAG). The blueprint is designed for security analysts, IT engineers, and AI practitioners in cybersecurity. It requires an NVAIE developer license and API keys for vulnerability databases, search engines, and LLM model services. Hardware requirements include an L40 GPU for pipeline operation, with optional LLM NIM and Embedding NIM microservices. The workflow uses an LLM pipeline for CVE impact analysis, with planner, agent, and summarization nodes, and builds on NVIDIA NIM microservices and the Morpheus Cybersecurity AI SDK.

CodeAsk
CodeAsk is a code analysis tool designed to tackle thorny problems such as code that seems to self-replicate, cryptic comments left by predecessors, messy and unclear code, and long-lived "temporary" solutions. It offers intelligent code organization and analysis, security vulnerability detection, code quality assessment, and other helpful prompts so users can understand and work with legacy code more efficiently. The tool aims to translate "legacy code mountains" into understandable language, helping readers build comprehension and easing knowledge transfer to new team members.
For similar jobs

ciso-assistant-community
CISO Assistant helps organizations manage their cybersecurity posture and compliance. It provides a centralized platform for managing security controls, threats, and risks, and includes a library of pre-built frameworks and tools for quickly and easily implementing best practices.

PurpleLlama
Purple Llama is an umbrella project that aims to provide tools and evaluations to support responsible development and usage of generative AI models. It encompasses components for cybersecurity and input/output safeguards, with plans to expand in the future. The project emphasizes a collaborative approach, borrowing the concept of purple teaming from cybersecurity, to address potential risks and challenges posed by generative AI. Components within Purple Llama are licensed permissively to foster community collaboration and standardize the development of trust and safety tools for generative AI.

vpnfast.github.io
VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.

taranis-ai
Taranis AI is an advanced Open-Source Intelligence (OSINT) tool that leverages Artificial Intelligence to revolutionize information gathering and situational analysis. It navigates through diverse data sources like websites to collect unstructured news articles, utilizing Natural Language Processing and Artificial Intelligence to enhance content quality. Analysts then refine these AI-augmented articles into structured reports that serve as the foundation for deliverables such as PDF files, which are ultimately published.

NightshadeAntidote
Nightshade Antidote is an image forensics tool used to analyze digital images for signs of manipulation or forgery. It implements several common techniques used in image forensics including metadata analysis, copy-move forgery detection, frequency domain analysis, and JPEG compression artifacts analysis. The tool takes an input image, performs analysis using the above techniques, and outputs a report summarizing the findings.
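Of the techniques listed, copy-move forgery detection is the easiest to sketch: hash every small patch of the image and flag patches that occur in more than one place. This is a deliberately naive version; real tools work on DCT coefficients or feature descriptors and filter out flat regions, which also collide here:

```python
from collections import defaultdict

def find_duplicate_blocks(pixels, block=2):
    """Naive copy-move detection: record every block x block patch of
    a grayscale image and report patches appearing more than once."""
    h, w = len(pixels), len(pixels[0])
    seen = defaultdict(list)
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            patch = tuple(tuple(pixels[y + dy][x + dx] for dx in range(block))
                          for dy in range(block))
            seen[patch].append((y, x))
    return {p: locs for p, locs in seen.items() if len(locs) > 1}

# 4x4 grayscale image where the top-left 2x2 patch is copied to bottom-right.
img = [
    [10, 20,  0,  0],
    [30, 40,  0,  0],
    [ 0,  0, 10, 20],
    [ 0,  0, 30, 40],
]
dups = find_duplicate_blocks(img)
print(dups[((10, 20), (30, 40))])  # [(0, 0), (2, 2)]
```

Note that the all-zero background patch is also reported as a duplicate, which is why practical detectors suppress low-variance blocks before matching.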

h4cker
This repository is a comprehensive collection of cybersecurity-related references, scripts, tools, code, and other resources, carefully curated and maintained by Omar Santos. It serves as supplemental material for several books, video courses, and live training created by the author, and encompasses over 10,000 references instrumental to both offensive and defensive security professionals honing their skills.

AIMr
AIMr is an AI aimbot written in Python that leverages modern computer-vision techniques to remain undetected while keeping a pleasing appearance. It works on any game that uses human-shaped models. For best performance, users should build OpenCV with CUDA support. For Valorant, additional perks in the Discord and an Arduino Leonardo R3 are required.

admyral
Admyral is an open-source Cybersecurity Automation & Investigation Assistant that provides a unified console for investigations and incident handling, workflow automation creation, automatic alert investigation, and next step suggestions for analysts. It aims to tackle alert fatigue and automate security workflows effectively by offering features like workflow actions, AI actions, case management, alert handling, and more. Admyral combines security automation and case management to streamline incident response processes and improve overall security posture. The tool is open-source, transparent, and community-driven, allowing users to self-host, contribute, and collaborate on integrations and features.