llmxcpg

Source code for the LLMxCPG paper.


LLMxCPG is a framework for vulnerability detection using Code Property Graphs (CPGs) and Large Language Models (LLMs). It follows a two-phase process: in Slice Construction, an LLM generates queries over a CPG to extract a code slice; in Vulnerability Detection, a second LLM classifies that slice as vulnerable or safe. The repository includes implementations of baseline models, information on the datasets, scripts for running the models, prompt templates, query generation examples, and configurations for fine-tuning the models.

README:

LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models

📣 News

  • [2025.11] LLMxCPG wins the MENA CSAW Applied Research Competition (ARC) 2025.
  • [2025.10] LLMxCPG is accepted to the MENA CSAW Applied Research Competition (ARC) 2025.
  • [2025.08] LLMxCPG is presented at USENIX Security 2025.

Overview

This repository contains the source code for LLMxCPG, a framework for vulnerability detection using Code Property Graphs (CPGs) and Large Language Models (LLMs).
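A Code Property Graph combines a program's abstract syntax tree, control-flow graph, and program-dependence graph into a single graph. As a rough illustration of why such a graph makes slicing cheap, the minimal Python sketch below keeps only data-dependence edges over a few hypothetical C statements and computes a backward slice from a sink, i.e. every statement the sink transitively depends on. It is not Joern's data model: the statements, the edge list, and the `backward_slice` helper are illustrative assumptions, and the edge name merely mirrors Joern's `REACHING_DEF` label.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# Hypothetical toy function: node id -> C statement.
STATEMENTS: Dict[int, str] = {
    1: "char buf[16];",
    2: "char *src = argv[1];",
    3: "int len = strlen(src);",
    4: "int unused = 42;",
    5: 'printf("%d", unused);',
    6: "memcpy(buf, src, len);",  # sink: potential buffer overflow
}

# Data-dependence edges (definition -> use), named after Joern's REACHING_DEF edges.
REACHING_DEF: List[Tuple[int, int]] = [(1, 6), (2, 3), (2, 6), (3, 6), (4, 5)]


def backward_slice(edges: List[Tuple[int, int]], sink: int) -> List[int]:
    """Return sorted ids of all nodes the sink transitively depends on, including the sink."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    seen = {sink}
    queue = deque([sink])
    while queue:  # breadth-first walk over incoming dependence edges
        node = queue.popleft()
        for pred in preds[node]:
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return sorted(seen)


if __name__ == "__main__":
    for node_id in backward_slice(REACHING_DEF, sink=6):
        print(STATEMENTS[node_id])
```

Slicing from the `memcpy` sink keeps statements 1, 2, 3 and 6 and drops the unrelated `unused` variable and its `printf`, the kind of minimal, relevant slice the Slice Construction phase aims to extract.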

The core methodology involves a two-phase process:

  1. Slice Construction: An LLM generates specific queries for a Code Property Graph to extract a minimal, relevant "slice" of code that may contain a vulnerability.

  2. Vulnerability Detection: A second LLM analyzes the extracted code slice to classify it as either vulnerable or safe.
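The two phases can be wired together roughly as sketched below, assuming each component is a plain Python callable. The names `detect`, `generate_queries`, `run_query`, and `classify` are hypothetical placeholders, not this repository's API: in LLMxCPG the query-generation role is played by LLMxCPG-Q emitting CPGQL queries and the classification role by LLMxCPG-D. Lambda stubs stand in for both models so the example runs without any LLM or Joern installation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    slice_code: str
    vulnerable: bool


def detect(
    source: str,
    generate_queries: Callable[[str], List[str]],  # phase 1: LLM proposes CPG queries (hypothetical)
    run_query: Callable[[str], List[str]],  # executes one query, returns code lines (hypothetical)
    classify: Callable[[str], bool],  # phase 2: LLM labels the slice (hypothetical)
) -> Verdict:
    # Phase 1: slice construction. Union of the lines returned by every query,
    # de-duplicated while preserving first-seen order.
    lines: List[str] = []
    for query in generate_queries(source):
        for line in run_query(query):
            if line not in lines:
                lines.append(line)
    slice_code = "\n".join(lines)
    # Phase 2: vulnerability detection sees only the slice, not the whole program.
    return Verdict(slice_code=slice_code, vulnerable=classify(slice_code))


if __name__ == "__main__":
    program = "char buf[8];\nint x = 0;\nstrcpy(buf, argv[1]);"
    verdict = detect(
        program,
        generate_queries=lambda src: ['cpg.call.name("strcpy")'],  # stand-in CPGQL query
        run_query=lambda q: [ln for ln in program.splitlines() if "buf" in ln],  # stand-in executor
        classify=lambda s: "strcpy" in s,  # stand-in for the detection model
    )
    print(verdict.vulnerable)
    print(verdict.slice_code)
```

Passing the components in as callables keeps the orchestration independent of how queries are executed and of which model performs each phase.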

Citation

If you use this codebase in your research, please cite the associated paper:

@inproceedings{lekssays2025llmxcpg,
  title={{LLMxCPG}: {Context-Aware} Vulnerability Detection Through Code Property {Graph-Guided} Large Language Models},
  author={Lekssays, Ahmed and Mouhcine, Hamza and Tran, Khang and Yu, Ting and Khalil, Issa},
  booktitle={34th USENIX Security Symposium (USENIX Security 25)},
  pages={489--507},
  year={2025}
}

Issues

If you encounter any issues with our codebase, please open an issue in the repository. This is the most effective way for us to assist you.

Repository Structure

.
├── baselines/      # Implementations of baseline models for comparison.
├── data/           # Information on datasets used.
├── inference/      # Scripts for running the LLMxCPG-Q and LLMxCPG-D models.
├── prompts/        # Prompt templates for query generation and classification.
├── queries/        # LLMxCPG-Q generation process and examples of generated CPGQL queries.
├── training/       # Scripts and configurations for fine-tuning the models.
└── README.md

Getting Started

Models

Our fine-tuned models (LLMxCPG-Q and LLMxCPG-D) are available on Hugging Face in the 🤗 LLMxCPG Collection.

Prerequisites

  • Docker

  • Python 3.8+

  • Joern, for CPG generation and querying (tested with v4.0.408)
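A quick way to confirm the prerequisites above is a small Python check like the one below. It is a hypothetical helper, not part of the repository: it only verifies the interpreter version and that executables named `docker` and `joern` are on `PATH` (assuming Joern's launcher is installed under that name), and it does not compare Joern's version against the tested v4.0.408.

```python
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether each listed prerequisite appears to be available (hypothetical helper)."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        # Only checks that the executables are on PATH, not their versions.
        "docker": shutil.which("docker") is not None,
        "joern": shutil.which("joern") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        status = "OK" if ok else "MISSING"
        print(f"{status:8} {name}")
```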

Installation

  1. Clone the repository:

    git clone https://github.com/qcri/llmxcpg
    cd llmxcpg
    
  2. Install Python dependencies:

    pip install -r requirements.txt
    

Training

The models can be fine-tuned using the scripts provided in the training/ directory.

  • Query Generation Model (LLMxCPG-Q): Fine-tuned from Qwen2.5-Coder-32B-Instruct.

  • Detection Model (LLMxCPG-D): Fine-tuned from QwQ-32B-Preview.

The training process uses the Unsloth framework and employs Low-Rank Adaptation (LoRA) for efficient fine-tuning. Refer to the scripts and configurations in the training/ directory for details.
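LoRA freezes a pretrained weight matrix W of shape d × k and learns a low-rank update ΔW = B A, with B of shape d × r and A of shape r × k, scaled by alpha / r. The pure-Python sketch below illustrates that forward pass and the resulting reduction in trainable parameters. It is not Unsloth's implementation, and the dimensions, rank, and alpha are arbitrary example values rather than the settings used in training/.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Matrix, alpha: float, r: int) -> Matrix:
    """h = W x + (alpha / r) * B (A x); W stays frozen, only A and B would be trained."""
    base = matmul(W, x)
    update = matmul(B, matmul(A, x))
    scale = alpha / r
    return [[base[i][0] + scale * update[i][0]] for i in range(len(base))]


def trainable_params(d: int, k: int, r: int) -> Tuple[int, int]:
    """Full fine-tuning updates d*k weights of this matrix; LoRA trains r*(d + k)."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    d, k, r, alpha = 4, 3, 2, 16
    rng = random.Random(0)
    W = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
    A = [[rng.gauss(0, 0.02) for _ in range(k)] for _ in range(r)]  # small random init
    B = [[0.0] * r for _ in range(d)]  # zero init, so Delta W = 0 at the start
    x = [[1.0], [2.0], [3.0]]
    # With B = 0 the adapted layer initially reproduces the frozen layer exactly.
    assert lora_forward(W, A, B, x, alpha, r) == matmul(W, x)
    print(trainable_params(4096, 4096, 16))
```

For a hypothetical 4096 × 4096 projection at rank 16, this gives 131,072 trainable parameters instead of 16,777,216 (about 0.8%), which is why LoRA makes fine-tuning 32B-parameter models far cheaper.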
