LLM-PLSE-paper

Stars: 125

LLM-PLSE-paper is a repository focused on the applications of Large Language Models (LLMs) in Programming Language and Software Engineering (PL/SE) domains. It covers a wide range of topics including bug detection, specification inference and verification, code generation, fuzzing and testing, code model and reasoning, code understanding, IDE technologies, prompting for reasoning tasks, and agent/tool usage and planning. The repository provides a comprehensive collection of research papers, benchmarks, empirical studies, and frameworks related to the capabilities of LLMs in various PL/SE tasks.

README:

Reference

PL/SE Applications

Bug Detection

Benchmark and Empirical Study

LLMs: Understanding Code Syntax and Semantics for Code Analysis, arxiv 2024, Link
Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection, arxiv 2024, Link
LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. S&P 2024, Link
Vulnerability Detection with Code Language Models: How Far Are We? arxiv 2024, Link
A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection, arxiv 2024, Link
How Far Have We Gone in Vulnerability Detection Using Large Language Models, arxiv, Link
Large Language Models for Code Analysis: Do LLMs Really Do Their Job?, Usenix Security 2023, Link
Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities, arxiv 2023, Link
Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection, arXiv, Link
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection, RAID 2023, Link
SkipAnalyzer: An Embodied Agent for Code Analysis with Large Language Models, Link

General Analysis

Program Slicing in the Era of Large Language Models, arxiv 2024, Link
LLMDFA: Analyzing Dataflow in Code with Large Language Models, NeurIPS 2024, Link
Sanitizing Large Language Models in Bug Detection with Data-Flow, Findings of EMNLP 2024, Link
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs’ Vulnerability Reasoning, arxiv, Link
Predictive Program Slicing via Execution Knowledge-Guided Dynamic Dependence Learning, FSE 2024, Link
Large Language Models for Test-Free Fault Localization, ICSE 2024, Link
Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection, arxiv, Link
Finetuning Large Language Models for Vulnerability Detection, arxiv, Link
A Learning-Based Approach to Static Program Slicing. OOPSLA 2024, Link
Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection. ICSE 2024, Link
E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. arXiv, Link

Domain-Specific Bug Detection (Domain-Specific Program & Bug Type)

Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications, ICSE 2025, Link
Interleaving Static Analysis and LLM Prompting, SOAP 2024, Link
Using an LLM to Help With Code Understanding, ICSE 2024, Link
Code Linting using Language Models, arxiv 2024, Link
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities, arxiv, Link
SMARTINV: Multimodal Learning for Smart Contract Invariant Inference, S&P 2024, Link
LLM-based Resource-Oriented Intention Inference for Static Resource Detection, arxiv, Link
Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach, OOPSLA 2024, Link
Do you still need a manual smart contract audit? Link
Harnessing the Power of LLM to Support Binary Taint Analysis, arxiv, Link
Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. arXiv, Link
GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. ICSE 2024 Link
Continuous Learning for Android Malware Detection, USENIX Security 2023, Link
Beware of the Unexpected: Bimodal Taint Analysis, ISSTA 2023, Link

Specification Inference and Verification

SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications, arxiv 2024/09, Link
Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? FSE 2024, Link
Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification, CAV 2024, Link
SpecGen: Automated Generation of Formal Program Specifications via Large Language Models, Link
Lemur: Integrating Large Language Models in Automated Program Verification, ICLR 2024, Link
Zero and Few-shot Semantic Parsing with Ambiguous Inputs, ICLR 2024, Link
Finding Inductive Loop Invariants using Large Language Models, Link
Can ChatGPT support software verification? arXiv, Link
Impact of Large Language Models on Generating Software Specifications, Link
Can Large Language Models Reason about Program Invariants?, ICML 2023, Link
Ranking LLM-Generated Loop Invariants for Program Verification, Link

Code Generation (Program Repair, Code Completion, and Program Synthesis)

Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search, NeurIPS 2024, Link
EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories, arxiv 2024/03, Link
CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
AutoCodeRover: Autonomous Program Improvement, ISSTA 2024, Link
Exploring and Unleashing the Power of Large Language Models in Automated Code Translation, FSE 2024, Link
Rectifier: Code Translation with Corrector via LLMs, arxiv 2024, Link
RepairAgent: An Autonomous, LLM-Based Agent for Program Repair, Link
LongCoder: A Long-Range Pre-trained Language Model for Code Completion, ICML 2023, Link
Learning Performance-Improving Code Edits, ICLR 2024, Link
PyDex: Repairing Bugs in Introductory Python Assignments using LLMs, OOPSLA 2024, Link
Automatic Programming: Large Language Models and Beyond, arxiv 2024, (Mark) Link
Towards AI-Assisted Synthesis of Verified Dafny Methods, FSE 2024, Link
Enabling Memory Safety of C Programs using LLMs, arxiv, Link
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules, ICLR 2024, Link
Is Self-Repair a Silver Bullet for Code Generation? ICLR 2024, Link
Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search Link
Hypothesis Search: Inductive Reasoning with Language Models, ICLR 2024, Link
CodePlan: Repository-level Coding using LLMs and Planning, FMDM @ NeurIPS 2023, Link
Repository-Level Prompt Generation for Large Language Models of Code. ICML 2023, Link
Refactoring Programs Using Large Language Models with Few-Shot Examples. arXiv, Link
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Link
Guess & Sketch: Language Model Guided Transpilation, ICLR 2024, Link
Optimal Neural Program Synthesis from Multimodal Specifications, EMNLP 2021, Link
CodeTrek: Flexible Modeling of Code using an Extensible Relational Representation, ICLR 2022, Link
Sporq: An Interactive Environment for Exploring Code Using Query-by-Example, UIST 2021, Link
Data Extraction via Semantic Regular Expression Synthesis, OOPSLA 2023, Link
Web Question Answering with Neurosymbolic Program Synthesis, PLDI 2021, Link
Active Inductive Logic Programming for Code Search, ICSE 2019, Link

Fuzzing, Testing, and Debugging

Effective Large Language Model Debugging with Best-first Tree Search, Link
Teaching Large Language Models to Self-Debug, ICLR 2024, Link
When Fuzzing Meets LLMs: Challenges and Opportunities, FSE 2024, Link
An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation, TSE 2024, Link
LLMorpheus: Mutation Testing using Large Language Models, arxiv 2024, Frank Tip, Link
Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation, ASE 2024, Link
Evaluating Offensive Security Capabilities of Large Language Models, Google 2024/06, Link
Prompt Fuzzing for Fuzz Driver Generation, CCS 2024, Link
Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing via Cross-DBMS SQL Transfer. ICSE 2024. Link
LLM4FUZZ: Guided Fuzzing of Smart Contracts with Large Language Models Link
Large Language Model guided Protocol Fuzzing, NDSS 2024, Link
Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models, ISSTA 2023, Link
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models, arxiv 2024, Link
Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag, MASEC@NeurIPS 2023, Link
Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting, Link
LPR: Large Language Models-Aided Program Reduction. ISSTA 2024, Link

Code Model and Code Reasoning

SemCoder: Training Code Language Models with Comprehensive Semantics, NeurIPS 2024, Link
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases, NeurIPS 2024, Link
CodeMind: A Framework to Challenge Large Language Models for Code Reasoning, arxiv, Link
CodeFort: Robust Training for Code Generation Models, EMNLP Findings 2024, Link
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization, Meta, Link
Constrained Decoding for Secure Code Generation, arxiv, Link
Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks, OOPSLA 2024, Link
Detecting Misuse of Security APIs: A Systematic Review, arxiv 2024, Link
An Investigation into Misuse of Java Security APIs by Large Language Models, ASIACCS 2024, Link
Large Language Models for Code: Security Hardening and Adversarial Testing, CCS 2023, Link, Code
Instruction Tuning for Secure Code Generation, ICML 2024, Link
jTrans: jump-aware transformer for binary code similarity detection, ISSTA 2022, Link
Enhancing Code Understanding for Impact Analysis by Combining Transformers and Program Dependence Graphs, FSE 2024.
Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? ICSE 2024, Link
Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs, arxiv, Link
CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking, FSE 2024, Link
FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations, ICSE 2024, Link
Symmetry-Preserving Program Representations for Learning Code Semantics Link
ReSym: Harnessing LLMs to Recover Variable and Data Structure Symbols from Stripped Binaries, CCS 2024, Link
LmPa: Improving Decompilation by Synergy of Large Language Model and Program Analysis, Link
When Do Program-of-Thoughts Work for Reasoning? AAAI 2024, Link
Grounded Copilot: How Programmers Interact with Code-Generating Models, OOPSLA 2023, Link
Extracting Training Data from Large Language Models, USENIX Security 2023, Link
How could Neural Networks understand Programs? ICML 2021, Link
ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations, ICML 2021, Link
GraphCodeBERT: Pre-training Code Representations with Data Flow, ICLR 2021, Link
CodeBERT: A Pre-Trained Model for Programming and Natural Languages, EMNLP 2020, Link
Neural Code Comprehension: A Learnable Representation of Code Semantics, NeurIPS 2018, Link

Prompting (for Reasoning Tasks) and Hallucinations

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, Apple, Link
Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models, OOPSLA 2024, Link
Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024, Link
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models, NeurIPS 2023, Link
Large Language Models for Automatic Equation Discovery, arxiv, Link
Self-Evaluation Guided Beam Search for Reasoning, NeurIPS 2023, Link
Self-consistency improves chain of thought reasoning in language models. NeurIPS 2022, Link
Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023, Link
Cumulative Reasoning With Large Language Models, Link
Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting, EMNLP 2023, Link
Complementary Explanations for Effective In-Context Learning, ACL 2023, Link
Wechat Post: 大语言模型的数学之路 Link
Blog: Prompt Engineering Link

Agent, Tool Using, and Planning

Steering Large Language Models between Code Execution and Textual Reasoning, Microsoft, Link
Don’t Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs, Meta, Link
Natural Language Commanding via Program Synthesis, Microsoft Link
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator, Feifei Li, Google Link
Real-world practices of AI Agents, Link
Cognitive Architectures for Language Agents, Link
The Rise and Potential of Large Language Model Based Agents: A Survey, Link
ReAct: Synergizing Reasoning and Acting in Language Models Link
Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023, Link
Wechat Post: AutoGen, Link
SATLM: Satisfiability-Aided Language Models Using Declarative Prompting, NeurIPS 2023, Link
Awesome things about LLM-powered agents: Papers, Repos, and Blogs, Link
ChatDev: Mastering the Virtual Social Realm, Shaping the Future of Intelligent Interactions. Link
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Link

Model and Framework

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All. Link
codellama: Inference code for CodeLlama models, Link
CodeFuse: LLM for Code from Ant Group, Link
Owl-LM: Large Language Model for Blockchain, Link

Survey

Large Language Model-Based Agents for Software Engineering: A Survey Link
A Survey on Large Language Models for Code Generation, arxiv 2024, Link
Comprehensive Outline of Large Language Model-based Multi-Agent Research, Tsinghua NLP Group, Link
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents, Link
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing, Link
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, Link
Large Language Models for Software Engineering: A Systematic Literature Review, Link

For Tasks:

detect vulnerabilities, generate code, repair programs, infer specifications, fuzz testing

For Jobs:

software engineer, security analyst, data scientist, research scientist, ai engineer

Alternative AI tools for LLM-PLSE-paper

Similar Open Source Tools

LLMEvaluation

The LLMEvaluation repository is a comprehensive compendium of evaluation methods for Large Language Models (LLMs) and LLM-based systems. It aims to assist academics and industry professionals in creating effective evaluation suites tailored to their specific needs by reviewing industry practices for assessing LLMs and their applications. The repository covers a wide range of evaluation techniques, benchmarks, and studies related to LLMs, including areas such as embeddings, question answering, multi-turn dialogues, reasoning, multi-lingual tasks, ethical AI, biases, safe AI, code generation, summarization, software performance, agent LLM architectures, long text generation, graph understanding, and various unclassified tasks. It also includes evaluations for LLM systems in conversational systems, copilots, search and recommendation engines, task utility, and verticals like healthcare, law, science, financial, and others. The repository provides a wealth of resources for evaluating and understanding the capabilities of LLMs in different domains.

Stars: 94

interpret

InterpretML is an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof. With this package, you can train interpretable glassbox models and explain blackbox systems. InterpretML helps you understand your model's global behavior, or understand the reasons behind individual predictions. Interpretability is essential for:
- Model debugging - Why did my model make this mistake?
- Feature Engineering - How can I improve my model?
- Detecting fairness issues - Does my model discriminate?
- Human-AI cooperation - How can I understand and trust the model's decisions?
- Regulatory compliance - Does my model satisfy legal requirements?
- High-risk applications - Healthcare, finance, judicial, ...

Stars: 6.4k

awesome-sound_event_detection

The 'awesome-sound_event_detection' repository is a curated reading list focusing on sound event detection and Sound AI. It includes research papers covering various sub-areas such as learning formulation, network architecture, pooling functions, missing or noisy audio, data augmentation, representation learning, multi-task learning, few-shot learning, zero-shot learning, knowledge transfer, polyphonic sound event detection, loss functions, audio and visual tasks, audio captioning, audio retrieval, audio generation, and more. The repository provides a comprehensive collection of papers, datasets, and resources related to sound event detection and Sound AI, making it a valuable reference for researchers and practitioners in the field.

Stars: 147

Agent

Agent is an LLM-based quiz tool specialized in RustSBI domain knowledge. It extracts domain knowledge from sources such as Rust Documentation, RISC-V Documentation, Bouffalo Docs, Bouffalo SDK, and Xiangshan Docs. It also provides resources for LLM prompt engineering and RAG engineering, including guides and existing projects related to retrieval-augmented generation (RAG) systems.

Stars: 101

LMOps

LMOps is a research initiative focusing on fundamental research and technology for building AI products with foundation models, particularly enabling AI capabilities with Large Language Models (LLMs) and Generative AI models. The project explores various aspects such as prompt optimization, longer context handling, LLM alignment, acceleration of LLMs, LLM customization, and understanding in-context learning. It also includes tools like Promptist for automatic prompt optimization, Structured Prompting for efficient long-sequence prompts consumption, and X-Prompt for extensible prompts beyond natural language. Additionally, LLMA accelerators are developed to speed up LLM inference by referencing and copying text spans from documents. The project aims to advance technologies that facilitate prompting language models and enhance the performance of LLMs in various scenarios.

Stars: 3.6k

AI-PhD-S25

AI-PhD-S25 is a mono-repo for the DOTE 6635 course on AI for Business Research at CUHK Business School. The course aims to provide a fundamental understanding of ML/AI concepts and methods relevant to business research, explore applications of ML/AI in business research, and discover cutting-edge AI/ML technologies. The course resources include Google CoLab for code distribution, Jupyter Notebooks, Google Sheets for group tasks, an Overleaf template for lecture notes, replication projects, and access to HPC server compute resources. The course covers topics like AI/ML in business research, deep learning basics, attention mechanisms, transformer models, LLM pretraining, posttraining, causal inference fundamentals, and more.

Stars: 64

AI-PhD-S24

AI-PhD-S24 is a mono-repo for the PhD course 'AI for Business Research' at CUHK Business School in Spring 2024. The course aims to provide a basic understanding of machine learning and artificial intelligence concepts/methods used in business research, showcase how ML/AI is utilized in business research, and introduce state-of-the-art AI/ML technologies. The course includes scribed lecture notes, class recordings, and covers topics like AI/ML fundamentals, DL, NLP, CV, unsupervised learning, and diffusion models.

Stars: 90

generative-ai-on-aws

Generative AI on AWS by O'Reilly Media provides a comprehensive guide on leveraging generative AI models on the AWS platform. The book covers various topics such as generative AI use cases, prompt engineering, large-language models, fine-tuning techniques, optimization, deployment, and more. Authors Chris Fregly, Antje Barth, and Shelbee Eigenbrode offer insights into cutting-edge AI technologies and practical applications in the field. The book is a valuable resource for data scientists, AI enthusiasts, and professionals looking to explore generative AI capabilities on AWS.

Stars: 417

LLMs-from-scratch

This repository contains the code for coding, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book Build a Large Language Model (From Scratch). In _Build a Large Language Model (From Scratch)_, you'll discover how LLMs work from the inside out. In this book, I'll guide you step by step through creating your own LLM, explaining each stage with clear text, diagrams, and examples. The method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT.

Stars: 43.7k

FATE-LLM

FATE-LLM is a framework supporting federated learning for large and small language models. It promotes training efficiency of federated LLMs using Parameter-Efficient methods, protects the IP of LLMs using FedIPR, and ensures data privacy during training and inference through privacy-preserving mechanisms.

Stars: 135

LLMs-at-DoD

This repository contains tutorials for using Large Language Models (LLMs) in the U.S. Department of Defense. The tutorials utilize open-source frameworks and LLMs, allowing users to run them in their own cloud environments. The repository is maintained by the Defense Digital Service and welcomes contributions from users.

Stars: 67

awesome-ml-gen-ai-elixir

A curated list of Machine Learning (ML) and Generative AI (GenAI) packages and resources for the Elixir programming language. It includes core tools for data exploration, traditional machine learning algorithms, deep learning models, computer vision libraries, generative AI tools, livebooks for interactive notebooks, and various resources such as books, videos, and articles. The repository aims to provide a comprehensive overview for experienced Elixir developers and ML/AI practitioners exploring different ecosystems.

Stars: 89

GOLEM

GOLEM is an open-source AI framework focused on optimization and learning of structured graph-based models using meta-heuristic methods. It emphasizes the potential of meta-heuristics in complex problem spaces where gradient-based methods are not suitable, and the importance of structured models in various problem domains. The framework offers features like structured model optimization, metaheuristic methods, multi-objective optimization, constrained optimization, extensibility, interpretability, and reproducibility. It can be applied to optimization problems represented as directed graphs with defined fitness functions. GOLEM has applications in areas like AutoML, Bayesian network structure search, differential equation discovery, geometric design, and neural architecture search. The project structure includes packages for core functionalities, adapters, graph representation, optimizers, genetic algorithms, utilities, serialization, visualization, examples, and testing. Contributions are welcome, and the project is supported by ITMO University's Research Center Strong Artificial Intelligence in Industry.

Stars: 53

evalkit

EvalKit is an open-source TypeScript library for evaluating and improving the performance of large language models (LLMs). It helps developers ensure the reliability, accuracy, and trustworthiness of their AI models. The library provides various metrics such as Bias Detection, Coherence, Faithfulness, Hallucination, Intent Detection, and Semantic Similarity. EvalKit is designed to be user-friendly with detailed documentation, tutorials, and recipes for different use cases and LLM providers. It requires Node.js 18+ and an OpenAI API Key for installation and usage. Contributions from the community are welcome under the Apache 2.0 License.

Stars: 70

aws-machine-learning-university-responsible-ai

This repository contains slides, notebooks, and data for the Machine Learning University (MLU) Responsible AI class. The mission is to make Machine Learning accessible to everyone, covering widely used ML techniques and applying them to real-world problems. The class includes lectures, final projects, and interactive visuals to help users learn about Responsible AI and core ML concepts.

Stars: 60

For similar tasks

watchtower

AIShield Watchtower is a tool designed to fortify the security of AI/ML models and Jupyter notebooks by automating model and notebook discoveries, conducting vulnerability scans, and categorizing risks into 'low,' 'medium,' 'high,' and 'critical' levels. It supports scanning of public GitHub repositories, Hugging Face repositories, AWS S3 buckets, and local systems. The tool generates comprehensive reports, offers a user-friendly interface, and aligns with industry standards like OWASP, MITRE, and CWE. It aims to address the security blind spots surrounding Jupyter notebooks and AI models, providing organizations with a tailored approach to enhancing their security efforts.

Stars: 187

invariant

Invariant Analyzer is an open-source scanner designed for LLM-based AI agents to find bugs, vulnerabilities, and security threats. It scans agent execution traces to identify issues like looping behavior, data leaks, prompt injections, and unsafe code execution. The tool offers a library of built-in checkers, an expressive policy language, data flow analysis, real-time monitoring, and extensible architecture for custom checkers. It helps developers debug AI agents, scan for security violations, and prevent security issues and data breaches during runtime. The analyzer leverages deep contextual understanding and a purpose-built rule matching engine for security policy enforcement.

Stars: 143

OpenRedTeaming

OpenRedTeaming is a repository focused on red teaming for generative models, specifically large language models (LLMs). The repository provides a comprehensive survey on potential attacks on GenAI and robust safeguards. It covers attack strategies, evaluation metrics, benchmarks, and defensive approaches. The repository also implements over 30 auto red teaming methods. It includes surveys, taxonomies, attack strategies, and risks related to LLMs. The goal is to understand vulnerabilities and develop defenses against adversarial attacks on large language models.

Stars: 68

Awesome-LLM4Cybersecurity

The repository 'Awesome-LLM4Cybersecurity' provides a comprehensive overview of the applications of Large Language Models (LLMs) in cybersecurity. It includes a systematic literature review covering topics such as constructing cybersecurity-oriented domain LLMs, potential applications of LLMs in cybersecurity, and research directions in the field. The repository analyzes various benchmarks, datasets, and applications of LLMs in cybersecurity tasks like threat intelligence, fuzzing, vulnerability detection, insecure code generation, program repair, anomaly detection, and LLM-assisted attacks.

Stars: 681

quark-engine

Quark Engine is an AI-powered tool designed for analyzing Android APK files. It focuses on enhancing the detection process for auto-suggestion, enabling users to create detection workflows without coding. The tool offers an intuitive drag-and-drop interface for workflow adjustments and updates. Quark Agent, the core component, generates Quark Script code based on natural language input and feedback. The project is committed to providing a user-friendly experience for designing detection workflows through textual and visual methods. Various features are still under development and will be rolled out gradually.

Stars: 1.4k

vulnerability-analysis

The NVIDIA AI Blueprint for Vulnerability Analysis for Container Security showcases accelerated analysis of common vulnerabilities and exposures (CVEs) at enterprise scale, reducing mitigation time from days to seconds. It enables security analysts to determine software package vulnerabilities using large language models (LLMs) and retrieval-augmented generation (RAG). The blueprint is designed for security analysts, IT engineers, and AI practitioners in cybersecurity. It requires an NVAIE developer license and API keys for vulnerability databases, search engines, and LLM model services. Hardware requirements include an L40 GPU for pipeline operation, plus optional LLM NIM and Embedding NIM. The workflow is an LLM pipeline for CVE impact analysis that uses LLM planner, agent, and summarization nodes. The blueprint uses NVIDIA NIM microservices and the Morpheus Cybersecurity AI SDK for vulnerability analysis.

Stars: 86

CodeAsk

CodeAsk is a code analysis tool designed to tackle complex issues such as code that seems to self-replicate, cryptic comments left by predecessors, messy and unclear code, and long-lasting temporary solutions. It offers intelligent code organization and analysis, security vulnerability detection, code quality assessment, and other interesting prompts to help users understand and work with legacy code more efficiently. The tool aims to translate 'legacy code mountains' into understandable language, creating an illusion of comprehension and facilitating knowledge transfer to new team members.

Stars: 820

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

Stars: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

Stars: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

Stars: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

Stars: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

Stars: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM:
* Set LLM usage limits for users on different pricing tiers
* Track LLM usage on a per user and per organization basis
* Block or redact requests containing PIIs
* Improve LLM reliability with failovers, retries and caching
* Distribute API keys with rate limits and cost limits for internal development/production use cases
* Distribute API keys with rate limits and cost limits for students

Stars: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

Stars: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

Stars: 2.2k