LLM-PLSE-paper

Stars: 125

LLM-PLSE-paper is a repository focused on the applications of Large Language Models (LLMs) in Programming Language and Software Engineering (PL/SE) domains. It covers a wide range of topics including bug detection, specification inference and verification, code generation, fuzzing and testing, code model and reasoning, code understanding, IDE technologies, prompting for reasoning tasks, and agent/tool usage and planning. The repository provides a comprehensive collection of research papers, benchmarks, empirical studies, and frameworks related to the capabilities of LLMs in various PL/SE tasks.

README:

Reference

PL/SE Applications

Bug Detection

Benchmark and Empirical Study

  • LLMs: Understanding Code Syntax and Semantics for Code Analysis, arxiv 2024, Link

  • Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection, arxiv 2024, Link

  • LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. S&P 2024, Link

  • Vulnerability Detection with Code Language Models: How Far Are We? arxiv 2024, Link

  • A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection, arxiv 2024, Link

  • How Far Have We Gone in Vulnerability Detection Using Large Language Models, arxiv, Link

  • Large Language Models for Code Analysis: Do LLMs Really Do Their Job?, Usenix Security 2023, Link

  • Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities, arxiv 2023, Link

  • Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection, arXiv, Link

  • DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection, RAID 2023, Link

  • SkipAnalyzer: An Embodied Agent for Code Analysis with Large Language Models, Link

General Analysis

  • Program Slicing in the Era of Large Language Models, arxiv 2024, Link

  • LLMDFA: Analyzing Dataflow in Code with Large Language Models, NeurIPS 2024, Link

  • Sanitizing Large Language Models in Bug Detection with Data-Flow, Findings of EMNLP 2024, Link

  • LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs’ Vulnerability Reasoning, arxiv, Link

  • Predictive Program Slicing via Execution Knowledge-Guided Dynamic Dependence Learning, FSE 2024, Link

  • Large Language Models for Test-Free Fault Localization, ICSE 2024, Link

  • Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection, arxiv, Link

  • Finetuning Large Language Models for Vulnerability Detection, arxiv, Link

  • A Learning-Based Approach to Static Program Slicing. OOPSLA 2024, Link

  • Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection. ICSE 2024, Link

  • E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. arXiv, Link

Domain-Specific Bug Detection (Domain-Specific Program & Bug Type)

  • Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications, ICSE 2025, Link

  • Interleaving Static Analysis and LLM Prompting, SOAP 2024, Link

  • Using an LLM to Help With Code Understanding, ICSE 2024, Link

  • Code Linting using Language Models, arxiv 2024, Link

  • LLM-Assisted Static Analysis for Detecting Security Vulnerabilities, arxiv, Link

  • SMARTINV: Multimodal Learning for Smart Contract Invariant Inference, S&P 2024, Link

  • LLM-based Resource-Oriented Intention Inference for Static Resource Detection, arxiv, Link

  • Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach, OOPSLA 2024, Link

  • Do you still need a manual smart contract audit? Link

  • Harnessing the Power of LLM to Support Binary Taint Analysis, arxiv, Link

  • Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. arXiv, Link

  • GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. ICSE 2024 Link

  • Continuous Learning for Android Malware Detection, USENIX Security 2023, Link

  • Beware of the Unexpected: Bimodal Taint Analysis, ISSTA 2023, Link

Specification Inference and Verification

  • SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications, arxiv 2024/09, Link

  • Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? FSE 2024, Link

  • Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification, CAV 2024, Link

  • SpecGen: Automated Generation of Formal Program Specifications via Large Language Models, Link

  • Lemur: Integrating Large Language Models in Automated Program Verification, ICLR 2024, Link

  • Zero and Few-shot Semantic Parsing with Ambiguous Inputs, ICLR 2024, Link

  • Finding Inductive Loop Invariants using Large Language Models, Link

  • Can ChatGPT support software verification? arXiv, Link

  • Impact of Large Language Models on Generating Software Specifications, Link

  • Can Large Language Models Reason about Program Invariants?, ICML 2023, Link

  • Ranking LLM-Generated Loop Invariants for Program Verification, Link

Code Generation (Program Repair, Code Completion, and Program Synthesis)

  • Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search, NeurIPS 2024, Link

  • EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories, arxiv 2024/03, Link

  • CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

  • AutoCodeRover: Autonomous Program Improvement, ISSTA 2024, Link

  • Exploring and Unleashing the Power of Large Language Models in Automated Code Translation, FSE 2024, Link

  • Rectifier: Code Translation with Corrector via LLMs, arxiv 2024, Link

  • RepairAgent: An Autonomous, LLM-Based Agent for Program Repair, Link

  • LongCoder: A Long-Range Pre-trained Language Model for Code Completion, ICML 2023, Link

  • Learning Performance-Improving Code Edits, ICLR 2024, Link

  • PyDex: Repairing Bugs in Introductory Python Assignments using LLMs, OOPSLA 2024, Link

  • Automatic Programming: Large Language Models and Beyond, arxiv 2024, (Mark) Link

  • Towards AI-Assisted Synthesis of Verified Dafny Methods, FSE 2024, Link

  • Enabling Memory Safety of C Programs using LLMs, arxiv, Link

  • CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules, ICLR 2024, Link

  • Is Self-Repair a Silver Bullet for Code Generation? ICLR 2024, Link

  • Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search Link

  • Hypothesis Search: Inductive Reasoning with Language Models, ICLR 2024, Link

  • CodePlan: Repository-level Coding using LLMs and Planning, FMDM @ NeurIPS 2023, Link

  • Repository-Level Prompt Generation for Large Language Models of Code. ICML 2023, Link

  • Refactoring Programs Using Large Language Models with Few-Shot Examples. arXiv, Link

  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Link

  • Guess & Sketch: Language Model Guided Transpilation, ICLR 2024, Link

  • Optimal Neural Program Synthesis from Multimodal Specifications, EMNLP 2021, Link

  • CodeTrek: Flexible Modeling of Code using an Extensible Relational Representation, ICLR 2022, Link

  • Sporq: An Interactive Environment for Exploring Code Using Query-by-Example, UIST 2021, Link

  • Data Extraction via Semantic Regular Expression Synthesis, OOPSLA 2023, Link

  • Web Question Answering with Neurosymbolic Program Synthesis, PLDI 2021, Link

  • Active Inductive Logic Programming for Code Search, ICSE 2019, Link

Fuzzing, Testing, and Debugging

  • Effective Large Language Model Debugging with Best-first Tree Search, Link

  • Teaching Large Language Models to Self-Debug, ICLR 2024, Link

  • When Fuzzing Meets LLMs: Challenges and Opportunities, FSE 2024, Link

  • An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation, TSE 2024, Link

  • LLMorpheus: Mutation Testing using Large Language Models, arxiv 2024, Frank Tip, Link

  • Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation, ASE 2024, Link

  • Evaluating Offensive Security Capabilities of Large Language Models, Google 2024/06, Link

  • Prompt Fuzzing for Fuzz Driver Generation, CCS 2024, Link

  • Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing via Cross-DBMS SQL Transfer. ICSE 2024. Link

  • LLM4FUZZ: Guided Fuzzing of Smart Contracts with Large Language Models Link

  • Large Language Model guided Protocol Fuzzing, NDSS 2024, Link

  • Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models, ISSTA 2023, Link

  • Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models, arxiv 2024, Link

  • Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag, MASEC@NeurIPS 2023, Link

  • Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting, Link

  • LPR: Large Language Models-Aided Program Reduction. ISSTA 2024, Link

Code Model and Code Reasoning

  • SemCoder: Training Code Language Models with Comprehensive Semantics, NeurIPS 2024, Link

  • Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases, NeurIPS 2024, Link

  • CodeMind: A Framework to Challenge Large Language Models for Code Reasoning, arxiv, Link

  • CodeFort: Robust Training for Code Generation Models, EMNLP Findings 2024, Link

  • Meta Large Language Model Compiler: Foundation Models of Compiler Optimization, Meta, Link

  • Constrained Decoding for Secure Code Generation, arxiv, Link

  • Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks, OOPSLA 2024, Link

  • Detecting Misuse of Security APIs: A Systematic Review, arxiv 2024, Link

  • An Investigation into Misuse of Java Security APIs by Large Language Models, ASIACCS 2024, Link

  • Large Language Models for Code: Security Hardening and Adversarial Testing, CCS 2023, Link, Code

  • Instruction Tuning for Secure Code Generation, ICML 2024, Link

  • jTrans: jump-aware transformer for binary code similarity detection, ISSTA 2022, Link

  • Enhancing Code Understanding for Impact Analysis by Combining Transformers and Program Dependence Graphs, FSE 2024.

  • Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? ICSE 2024, Link

  • Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs, arxiv, Link

  • CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking, FSE 2024, Link

  • FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations, ICSE 2024, Link

  • Symmetry-Preserving Program Representations for Learning Code Semantics Link

  • ReSym: Harnessing LLMs to Recover Variable and Data Structure Symbols from Stripped Binaries, CCS 2024, Link

  • LmPa: Improving Decompilation by Synergy of Large Language Model and Program Analysis, Link

  • When Do Program-of-Thoughts Work for Reasoning? AAAI 2024, Link

  • Grounded Copilot: How Programmers Interact with Code-Generating Models, OOPSLA 2023, Link

  • Extracting Training Data from Large Language Models, USENIX Security 2023, Link

  • How could Neural Networks understand Programs? ICML 2021, Link

  • ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations, ICML 2021, Link

  • GraphCodeBERT: Pre-training Code Representations with Data Flow, ICLR 2021, Link

  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages, EMNLP 2020, Link

  • Neural Code Comprehension: A Learnable Representation of Code Semantics, NeurIPS 2018, Link

Prompting (for Reasoning Tasks) and Hallucinations

  • GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, Apple, Link

  • Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models, OOPSLA 2024, Link

  • Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024, Link

  • LeanDojo: Theorem Proving with Retrieval-Augmented Language Models, NeurIPS 2023, Link

  • Large Language Models for Automatic Equation Discovery, arxiv, Link

  • Self-Evaluation Guided Beam Search for Reasoning, NeurIPS 2023, Link

  • Self-Consistency Improves Chain of Thought Reasoning in Language Models. ICLR 2023, Link

  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023, Link

  • Cumulative Reasoning With Large Language Models, Link

  • Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting, EMNLP 2023, Link

  • Complementary Explanations for Effective In-Context Learning, ACL 2023, Link

  • WeChat Post: The Road to Mathematics for Large Language Models (大语言模型的数学之路), Link

  • Blog: Prompt Engineering Link

Agent, Tool Using, and Planning

  • Steering Large Language Models between Code Execution and Textual Reasoning, Microsoft, Link

  • Don’t Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs, Meta, Link

  • Natural Language Commanding via Program Synthesis, Microsoft Link

  • Chain of Code: Reasoning with a Language Model-Augmented Code Emulator, Fei-Fei Li, Google, Link

  • Real-world practices of AI Agents, Link

  • Cognitive Architectures for Language Agents, Link

  • The Rise and Potential of Large Language Model Based Agents: A Survey, Link

  • ReAct: Synergizing Reasoning and Acting in Language Models Link

  • Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023, Link

  • WeChat Post: AutoGen, Link

  • SATLM: Satisfiability-Aided Language Models Using Declarative Prompting, NeurIPS 2023, Link

  • Awesome things about LLM-powered agents: Papers, Repos, and Blogs, Link

  • ChatDev: Mastering the Virtual Social Realm, Shaping the Future of Intelligent Interactions. Link

  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Link

Model and Framework

  • LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All. Link

  • codellama: Inference code for CodeLlama models, Link

  • CodeFuse: LLM for Code from Ant Group, Link

  • Owl-LM: Large Language Model for Blockchain, Link

Survey

  • Large Language Model-Based Agents for Software Engineering: A Survey Link

  • A Survey on Large Language Models for Code Generation, arxiv 2024, Link

  • Comprehensive Outline of Large Language Model-based Multi-Agent Research, Tsinghua NLP Group, Link

  • If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents, Link

  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing, Link

  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, Link

  • Large Language Models for Software Engineering: A Systematic Literature Review, Link
