vulnerability-analysis
Rapidly identify and mitigate container security vulnerabilities with generative AI.
The NVIDIA AI Blueprint for Vulnerability Analysis for Container Security showcases accelerated analysis of common vulnerabilities and exposures (CVEs) at an enterprise scale, reducing mitigation time from days to seconds. It enables security analysts to determine software package vulnerabilities using large language models (LLMs) and retrieval-augmented generation (RAG). The blueprint is designed for security analysts, IT engineers, and AI practitioners in cybersecurity. It requires an NVAIE developer license and API keys for vulnerability databases, search engines, and LLM model services. Hardware requirements include an L40 GPU for pipeline operation and optional GPUs for self-hosted LLM and Embedding NIMs. The workflow uses an LLM pipeline for CVE impact analysis, with LLM planner, agent, and summarization nodes, built on NVIDIA NIM microservices and the Morpheus Cybersecurity AI SDK.
README:
- Table of Contents
- Overview
- Software components
- Target audience
- Prerequisites
- Hardware requirements
- API definition
- Use case description
- Getting started
- Running the workflow
- Customizing the Workflow
- Troubleshooting
- Testing and validation
- License
- Terms of Use
This repository is what powers the build experience, showcasing vulnerability analysis for container security using NVIDIA NIM microservices and NVIDIA Morpheus.
The NVIDIA AI Blueprint demonstrates accelerated analysis on common vulnerabilities and exposures (CVE) at an enterprise scale, reducing mitigation from days and hours to just seconds. While traditional methods require substantial manual effort to pinpoint solutions for vulnerabilities, these technologies enable quick, automatic, and actionable CVE risk analysis using large language models (LLMs) and retrieval-augmented generation (RAG). With this blueprint, security analysts can expedite the process of determining whether a software package includes exploitable and vulnerable components using LLMs and event-driven RAG triggered by the creation of a new software package or the detection of a CVE.
The following are used by this blueprint:
- NIM of meta/llama-3.1-70b-instruct
- NIM of nvidia/nv-embedqa-e5-v5
- NVIDIA Morpheus Cybersecurity AI SDK
This blueprint is for:
- Security analysts and IT engineers: People analyzing vulnerabilities and ensuring the security of containerized environments.
- AI practitioners in cybersecurity: People applying AI to enhance cybersecurity, particularly those interested in using the Morpheus SDK and NIMs for faster vulnerability detection and analysis.
Prerequisites:
- NVAIE developer license
- API keys for vulnerability databases, search engines, and LLM model service(s).
- Details can be found in this later section: Obtain API keys
Below are the hardware requirements for each component of the vulnerability analysis pipeline.
The overall hardware requirements depend on selected pipeline configuration. At a minimum, the hardware requirements for pipeline operation must be met. The LLM NIM and Embedding NIM hardware requirements only need to be met if self-hosting these components. See Using self-hosted NIMs, Customizing the LLM models and Customizing the embedding model sections for more information.
- (Required) Pipeline operation: 1x L40 GPU or similar recommended
- (Optional) LLM NIM: Meta Llama 3.1 70B Instruct Support Matrix
  - For improved parallel performance, we recommend 8x or more H100s for LLM inference.
  - The pipeline can share the GPU with the LLM NIM, but a separate GPU for the LLM NIM is recommended for optimal performance.
- (Optional) Embedding NIM: NV-EmbedQA-E5-v5 Support Matrix
  - The pipeline can share the GPU with the Embedding NIM, but a separate GPU for the Embedding NIM is recommended for optimal performance.
Determining the impact of a documented CVE on a specific project or container is a labor-intensive and manual task, especially as the rate of new reports into the CVE database accelerates. This process involves the collection, comprehension, and synthesis of various pieces of information to ascertain whether immediate remediation is necessary upon the identification of a new CVE.
Current challenges in CVE analysis:
- Information collection: The process involves significant manual labor to collect and synthesize relevant information.
- Decision complexity: Decisions on whether to update a library impacted by a CVE often hinge on various considerations, including:
- Scan false positives: Occasionally, vulnerability scans may incorrectly flag a library as vulnerable, leading to a false alarm.
- Mitigating factors: In some cases, existing safeguards within the environment may reduce or negate the risk posed by a CVE.
- Lack of required environments or dependencies: For an exploit to succeed, specific conditions must be met. The absence of these necessary elements can render a vulnerability irrelevant.
- Manual documentation: Once an analyst has determined the library is not affected, a Vulnerability Exploitability eXchange (VEX) document must be created to standardize and distribute the results.
The efficiency of this process can be significantly enhanced through the deployment of an automated LLM agent pipeline, leveraging generative AI to improve vulnerability defense while decreasing the load on security teams.
The workflow operates using a Plan-and-Execute-style LLM pipeline for CVE impact analysis. The process begins with an LLM planner that generates a context-sensitive task checklist. This checklist is then executed by an LLM agent equipped with Retrieval-Augmented Generation (RAG) capabilities. The gathered information and the agent's findings are subsequently summarized and categorized by additional LLM nodes to provide a final verdict.
[!TIP] The pipeline is adaptable, supporting various LLM services that conform to the `LLMService` interface, including OpenAI, NeMo, or local execution with llama-cpp-python.
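To make the Plan-and-Execute flow concrete, here is a minimal Python sketch of the control loop described above. The helper functions are trivial stubs invented for this illustration; the real planner, agent, summarization, and justification logic lives in the `src/cve/nodes/` modules referenced in the architecture breakdown below.

```python
# Illustrative sketch of the Plan-and-Execute CVE analysis loop.
# The helpers below are stubs standing in for the LLM planner, agent,
# summarization, and justification nodes under src/cve/nodes/.

def generate_checklist(cve_id, intel):
    # Stub: a real planner would prompt an LLM with the gathered intel.
    return [f"Is the code affected by {cve_id} present in the image?",
            "Is the affected code path reachable at runtime?"]

def run_agent_on_item(item, tools):
    # Stub: a real agent would iterate over RAG/search tools here.
    return "not found in SBOM or indexed source"

def summarize_findings(cve_id, findings):
    # Stub: a real node would produce a human-readable paragraph.
    return f"{cve_id}: " + "; ".join(f["answer"] for f in findings)

def assign_justification(summary):
    # Stub: a real node would pick one of the predefined VEX categories.
    return "code_not_present"

def analyze_cve(cve_id, intel, tools):
    checklist = generate_checklist(cve_id, intel)
    findings = [{"item": i, "answer": run_agent_on_item(i, tools)} for i in checklist]
    summary = summarize_findings(cve_id, findings)
    return {"cve": cve_id, "summary": summary,
            "justification": assign_justification(summary)}

print(analyze_cve("CVE-2023-43804", intel={}, tools=[]))
```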
The detailed architecture consists of the following components:
- Security scan result: The workflow begins by taking the identified CVEs from a container security scan as input. This can be generated from a container image scanner of your choosing, such as Anchore.
- PreProcessing: All of the actions below are encapsulated by multiple Morpheus preprocessing pipeline stages to prepare the data for use with the LLM engine. (See `src/cve/pipeline/input.py`.)
  - Code repository and documentation: The blueprint pulls code repositories and documentation provided by the user. These repositories are processed through an embedding model, and the resulting embeddings are stored in vector databases (VDBs) for the agent's reference.
    - Vector database: Various vector databases can be used for the embedding. We currently utilize FAISS for the VDB because it does not require an external service and is simple to use. Any vector store can be used, such as NVIDIA cuVS, which would provide accelerated indexing and search.
    - Lexical search: As an alternative, a lexical search is available for use cases where creating an embedding is impractical due to a large number of source files in the target container.
  - Software Bill of Materials (SBOM): The provided SBOM document is processed into a software-ingestible format for the agent's reference. SBOMs can be generated for any container using the open-source tool Syft.
  - Web vulnerability intel: The system collects detailed information about each CVE through web scraping and data retrieval from various public security databases, including GHSA, Red Hat, Ubuntu, and NIST CVE records, as well as tailored threat intelligence feeds.
- Core LLM engine: (See `src/cve/pipeline/engine.py`.)
  - Checklist generation: Leveraging the gathered information about each vulnerability, the checklist generation node creates a tailored, context-sensitive task checklist designed to guide the impact analysis. (See `src/cve/nodes/cve_checklist_node.py`.)
  - Task agent: At the core of the process is an LLM agent iterating through each item in the checklist. For each item, the agent answers the question using a set of tools which provide information about the target container. The tools tap into various data sources (web intel, vector DB, search, etc.), retrieving relevant information to address each checklist item. The loop continues until the agent resolves each checklist item satisfactorily. (See `src/cve/nodes/cve_langchain_agent_node.py`.)
  - Summarization: Once the agent has compiled findings for each checklist item, these results are condensed by the summarization node into a concise, human-readable paragraph. (See `src/cve/nodes/cve_summary_node.py`.)
  - Justification assignment: Given the summary, the justification status categorization node then assigns a resulting VEX (Vulnerability Exploitability eXchange) status to the CVE. We provide a set of predefined categories for the model to choose from. (See `src/cve/nodes/cve_justification_node.py`.) If the CVE is deemed exploitable, the reasoning category is "vulnerable." If it is not exploitable, there are 10 different reasoning categories to explain why the vulnerability is not exploitable in the given environment:
    - `false_positive`
    - `code_not_present`
    - `code_not_reachable`
    - `requires_configuration`
    - `requires_dependency`
    - `requires_environment`
    - `protected_by_compiler`
    - `protected_at_runtime`
    - `protected_by_perimeter`
    - `protected_by_mitigating_control`
- Output: At the end of the pipeline run, an output file including all the gathered and generated information is prepared for security analysts for a final review. (See `src/cve/pipeline/output.py`.)
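If downstream tooling needs to consume these labels programmatically, they can be captured in a small enum. This is an illustrative helper written for this document, not code from the blueprint's source tree:

```python
from enum import Enum

class VexJustification(str, Enum):
    """VEX justification labels emitted by the justification node (illustrative helper)."""
    VULNERABLE = "vulnerable"
    FALSE_POSITIVE = "false_positive"
    CODE_NOT_PRESENT = "code_not_present"
    CODE_NOT_REACHABLE = "code_not_reachable"
    REQUIRES_CONFIGURATION = "requires_configuration"
    REQUIRES_DEPENDENCY = "requires_dependency"
    REQUIRES_ENVIRONMENT = "requires_environment"
    PROTECTED_BY_COMPILER = "protected_by_compiler"
    PROTECTED_AT_RUNTIME = "protected_at_runtime"
    PROTECTED_BY_PERIMETER = "protected_by_perimeter"
    PROTECTED_BY_MITIGATING_CONTROL = "protected_by_mitigating_control"

# Example: map a label from the pipeline output back to the enum.
label = VexJustification("code_not_reachable")
print(label is VexJustification.CODE_NOT_REACHABLE)  # True
```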
[!WARNING] All output should be vetted by a security analyst before being used in a cybersecurity application.
The Morpheus SDK can utilize various embedding model and LLM endpoints, and is optimized to use NVIDIA NIM microservices (NIMs). NIMs are pre-built containers for the latest AI models that provide industry-standard APIs and optimized inference for the given model and hardware. Using NIMs enables easy deployment and scaling for self-hosted model inference.
The current default embedding NIM model is `nv-embedqa-e5-v5`, which was selected to balance speed and overall pipeline accuracy. The current default LLM model is the `llama-3.1-70b-instruct` NIM, with specifically tailored prompt engineering and edge case handling. Other models can be substituted for either the embedding or LLM model, such as smaller, fine-tuned NIM LLM models or other external LLM inference services. Subsequent updates will provide more details about fine-tuning and data flywheel techniques.
[!NOTE] The LangChain library is employed to deploy all LLM agents within a Morpheus pipeline, streamlining efficiency and reducing the need for duplicative efforts.
[!TIP] Routinely checked validation datasets are critical to ensuring proper and consistent outputs. Learn more about our test-driven development approach in the section on testing and validation.
- git
- git-lfs
- Since the workflow uses an NVIDIA Morpheus pipeline, the Morpheus requirements also need to be installed.
To run the pipeline you need to obtain API keys for the following APIs. These will be needed in a later step to Set up the environment file.
- Required API Keys: These APIs are required by the pipeline to retrieve vulnerability information from databases, perform online searches, and execute LLM queries.
  - GitHub Security Advisory (GHSA) Database
    - Follow these instructions to create a personal access token. No repository access or permissions are required for this API.
    - This will be used in the `GHSA_API_KEY` environment variable.
  - National Vulnerability Database (NVD)
    - Follow these instructions to create an API key.
    - This will be used in the `NVD_API_KEY` environment variable.
  - SerpApi
    - Go to https://serpapi.com/ and create a SerpApi account. Once signed in, navigate to Your Account > Api Key.
    - This will be used in the `SERPAPI_API_KEY` environment variable.
  - NVIDIA Inference Microservices (NIM)
    - There are two possible methods to generate an API key for NIM:
      - Sign in to the NVIDIA Build portal with your email. Click on any model, then click "Get API Key", and finally click "Generate Key".
      - Sign in to the NVIDIA NGC portal with your email. Select your organization from the dropdown menu after logging in (you must select an organization which has NVIDIA AI Enterprise (NVAIE) enabled). Click on your account in the top right, select "Setup" from the dropdown, then click the "Generate Personal Key" option and the "+ Generate Personal Key" button to create your API key.
    - This will be used in the `NVIDIA_API_KEY` environment variable.
The workflow can be configured to use other LLM services as well, see the Customizing the LLM models section for more info.
Clone the repository and set an environment variable for the path to the repository root.
export REPO_ROOT=$(git rev-parse --show-toplevel)
All commands are run from the repository root unless otherwise specified.
First we need to create a `.env` file in the `REPO_ROOT` and add the API keys you created in the earlier Obtain API keys step.
cd $REPO_ROOT
cat <<EOF > .env
GHSA_API_KEY="your GitHub personal access token"
NVD_API_KEY="your National Vulnerability Database API key"
NVIDIA_API_KEY="your NVIDIA Inference Microservices API key"
SERPAPI_API_KEY="your SerpApi API key"
EOF
These variables need to be exported to the environment:
export $(cat .env | xargs)
In order to pull images required by the workflow from NGC, you must first authenticate Docker with NGC. You can use the same NVIDIA API Key obtained in the Obtain API keys section (saved as `NVIDIA_API_KEY` in the `.env` file).
echo "${NVIDIA_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin
If no customizations were made to the source code, you can proceed to Starting the Docker containers, where the `docker compose up` step will automatically pull the pre-built Blueprint Docker container from NGC: `nvcr.io/nvidia/morpheus/morpheus-vuln-analysis:24.10`.
If any customizations are made to the source code, we will need to build the container from source using the following command:
cd $REPO_ROOT
# Build the morpheus-vuln-analysis container
docker compose build morpheus-vuln-analysis
There are two supported configurations for starting the Docker containers. Both configurations utilize `docker compose` to start the service:
- NVIDIA-hosted NIMs: The workflow is run with all computation being performed by NIMs hosted in NVIDIA GPU Cloud. This is the default configuration and is recommended for most users getting started with the workflow.
  - When using NVIDIA-hosted NIMs, only the `docker-compose.yml` configuration file is required.
- Self-hosted NIMs: The workflow is run using self-hosted LLM NIM services. This configuration is more advanced and requires additional setup to run the NIM services locally.
  - When using self-hosted NIMs, both the `docker-compose.yml` and `docker-compose.nim.yml` configuration files are required.
These two configurations are illustrated by the following diagram:
Before beginning, ensure that the environment variables are set correctly. Both configurations require the same environment variables to be set. More information on setting these variables can be found in the Obtain API keys section.
[!TIP] The container binds to port 8080 by default. If you encounter a port collision error (e.g. `Bind for 0.0.0.0:8080 failed: port is already allocated`), you can set the environment variable `NGINX_HOST_HTTP_PORT` to specify a custom port before launching `docker compose`. For example: `export NGINX_HOST_HTTP_PORT=8081`
When running the workflow in this configuration, only the `morpheus-vuln-analysis` service needs to be started since we will utilize NIMs hosted by NVIDIA. The `morpheus-vuln-analysis` container can be started using the following command:
cd ${REPO_ROOT}
docker compose up -d
The command above starts the container in the background using detached mode (`-d`). We can confirm the container is running via the following command:
docker compose ps
Next, we need to attach to the `morpheus-vuln-analysis` container to access the environment where the workflow command line tool and dependencies are installed.
docker compose exec -it morpheus-vuln-analysis bash
Continue to the Running the workflow section to run the workflow.
To run the workflow using self-hosted NIMs, we use a second `docker compose` configuration file, `docker-compose.nim.yml`, which adds the self-hosted NIM services to the workflow. Utilizing a second configuration file allows for easy switching between the two configurations while keeping the base configuration file the same.
[!NOTE] The self-hosted NIM services require additional GPU resources to run. With this configuration, the LLM NIM, embedding model NIM, and the `morpheus-vuln-analysis` service will all be launched on the same machine. Ensure that you have the necessary hardware requirements for all three services before proceeding (multiple services can share the same GPU).
To use multiple configuration files, we need to specify both configuration files when running the `docker compose` command. You will need to specify both configuration files for every `docker compose` command. For example:
docker compose -f docker-compose.yml -f docker-compose.nim.yml [NORMAL DOCKER COMPOSE COMMAND]
For example, to start the `morpheus-vuln-analysis` service with the self-hosted NIMs, you would run:
cd ${REPO_ROOT}
docker compose -f docker-compose.yml -f docker-compose.nim.yml up -d
Next, we need to attach to the `morpheus-vuln-analysis` container to access the environment where the workflow command line tool and dependencies are installed.
docker compose -f docker-compose.yml -f docker-compose.nim.yml exec -it morpheus-vuln-analysis bash
Continue to the Running the workflow section to run the workflow.
Once the services have been started, the workflow can be run using either the Quick start user guide notebook for an interactive step-by-step process, or directly from the command line.
To run the workflow in an interactive notebook, connect to the Jupyter notebook at http://localhost:8000/lab. Once connected, navigate to the notebook located at `quick_start/quick_start_guide.ipynb` and follow the instructions.
[!TIP] If you are running the workflow on a remote machine, you can forward the port to your local machine using SSH. For example, to forward port 8000 from the remote machine to your local machine, you can run the following command from your local machine:
ssh -L 8000:127.0.0.1:8000 <remote_host_name>
The vulnerability analysis workflow is designed to be run using the command line tool installed within the `morpheus-vuln-analysis` container. This section describes how to get started using the command line tool. For more detailed information about the command line interface, see the Command line interface (CLI) reference section.
The pipeline settings are controlled using configuration files. These are JSON files that define various pipeline settings, such as the input data, the LLM models used, and the output format. Several example configuration files are located in the `configs/` folder. A brief description of each configuration file is as follows:
- `from_manual.json`: This configuration file starts the pipeline using manually provided data. All pipeline inputs are specified directly in the configuration file.
- `from_file.json`: This configuration file starts the pipeline using data fetched from a file. The pipeline reads the input data from a file and processes it. This is very similar to `from_manual.json`, but the input data is read from a file instead of being specified directly in the config file.
- `from_http.json`: This configuration file starts an HTTP server to turn the workflow into a microservice. The pipeline fetches the input data from an HTTP source and processes it. To trigger the pipeline, you can send a POST request to the `/scan` endpoint with the input data in the request body.
There are two main modalities in which the pipeline can be run. When using the `from_file.json` or `from_manual.json` configuration files, the pipeline processes the input data and then shuts down once it is complete; this modality is suitable for rapid iteration during testing and development. When using `from_http.json`, the pipeline is turned into a microservice that runs indefinitely, which is suitable for production use.
For a breakdown of the configuration file and available options, see the Configuration file reference section. To customize the configuration files for your use case, see Customizing the workflow.
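As a quick sanity check before launching a run, the snippet below loads one of the example configuration files and prints which input and output types it uses. It relies only on the top-level `input._type` and `output._type` fields described in the configuration file reference; treat it as an illustrative helper rather than part of the workflow.

```python
import json
from pathlib import Path

# Inspect which input/output types an example config uses (illustrative).
config_path = Path("configs/from_manual.json")
config = json.loads(config_path.read_text())

print("input type: ", config.get("input", {}).get("_type"))
print("output type:", config.get("output", {}).get("_type"))
```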
The workflow pipeline can be started using the following command:
python src/main.py --log_level DEBUG \
cve pipeline --config_file=${CONFIG_FILE}
In the command, `${CONFIG_FILE}` is the path to the configuration file you want to use. For example, to run the pipeline with the `from_manual.json` configuration file, you would run:
python src/main.py --log_level DEBUG \
cve pipeline --config_file=configs/from_manual.json
When the pipeline runs to completion, you should see logs similar to the following:
Message elapsed time: 28.240849 sec
Vulnerability 'GHSA-3f63-hfp8-52jq' affected status: FALSE. Label: code_not_reachable
Vulnerability 'CVE-2023-50782' affected status: FALSE. Label: requires_configuration
Vulnerability 'CVE-2023-36632' affected status: FALSE. Label: code_not_present
Vulnerability 'CVE-2023-43804' affected status: TRUE. Label: vulnerable
Vulnerability 'GHSA-cxfr-5q3r-2rc2' affected status: TRUE. Label: vulnerable
Vulnerability 'GHSA-554w-xh4j-8w64' affected status: TRUE. Label: vulnerable
Vulnerability 'GHSA-3ww4-gg4f-jr7f' affected status: FALSE. Label: requires_configuration
Vulnerability 'CVE-2023-31147' affected status: FALSE. Label: code_not_present
Source[Complete]: 7 messages [00:14, 2.02s/ messages]
LLM[Complete]: 7 messages [00:42, 6.05s/ messages]
====Pipeline Complete====
Total time: 45.61 sec
Pipeline runtime: 42.69 sec
[!WARNING] The output you receive from the pipeline may not be identical to the output in the example above. The output may vary due to the non-deterministic nature of the LLM models.
The full pipeline JSON output is stashed by default at `.tmp/output.json`. The output JSON includes the following top-level fields:
- `input`: contains the inputs that were provided to the pipeline, such as the container and repo source information, the list of vulnerabilities to scan, etc.
- `info`: contains additional information collected by the pipeline for decision making. This includes paths to the generated VDB files, intelligence from various vulnerability databases, the list of SBOM packages, and any vulnerable dependencies that were identified.
- `output`: contains the output from the core LLM Engine, including the generated checklist, analysis summary, and justification assignment.
In addition to the raw JSON output, you can also view a Markdown-formatted report for each CVE in the `.tmp/vulnerability_markdown_reports` directory. This view is helpful for human analysts reviewing the results.
[!TIP] To return detailed steps taken by the LLM agent in the output, set `return_intermediate_steps` to `true` in the configuration file. This can be helpful for explaining the output, and for troubleshooting unexpected results.
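As a convenience for skimming the saved results, a short script like the one below can be used. It only assumes the three documented top-level fields (`input`, `info`, `output`); the inner structure of each field may differ from run to run, so treat this as an illustrative sketch.

```python
import json
from pathlib import Path

# Skim the pipeline output file (default location: .tmp/output.json).
report = json.loads(Path(".tmp/output.json").read_text())

# The three documented top-level fields.
for key in ("input", "info", "output"):
    value = report.get(key)
    size = len(value) if hasattr(value, "__len__") else "n/a"
    print(f"{key}: {type(value).__name__} (size: {size})")
```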
Similarly, to run the pipeline with the `from_http.json` configuration file, you would run:
python src/main.py --log_level DEBUG cve pipeline --config_file=configs/from_http.json
This command starts an HTTP server that listens on port `26466` and runs the workflow indefinitely, waiting for incoming data to process. This is useful if you want to trigger the workflow on demand via HTTP requests.
Once the server is running, you can send a `POST` request to the `/scan` endpoint with the input parameters in the request body. The pipeline will process the input data and write the output to the terminal and to the output path given in the config file.
Here's an example using `curl` to send a `POST` request. From a new terminal outside of the container, go to the root of the cloned git repository, and run:
curl -X POST http://localhost:26466/scan -d @data/input_messages/morpheus:24.03-runtime.json
In this command:
- `http://localhost:26466/scan` is the URL of the server and endpoint.
- The `-d` option specifies the data file being sent in the request body. In this case, it's pointing to the input file `morpheus:24.03-runtime.json` under the `data/input_messages/` directory. You can refer to this file as an example of the expected data format.
  - Since it uses a relative path, it's important to run the `curl` command from the root of the git repository. Alternatively, you can modify the relative path in the command to directly reference the example JSON file.
Note that the results of the pipeline are not returned to the curl request. After processing the request, the server will save the results to the output path specified in the configuration file. The server will also display log and summary results from the workflow as it's running. Additional submissions to the server will append the results to the specified output file.
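If you prefer Python over `curl`, the same submission can be made with the `requests` library. This is an illustrative alternative, assuming the server is reachable at `http://localhost:26466` and the script is run from the repository root:

```python
import requests

# Submit a scan request to the pipeline's HTTP endpoint (illustrative
# alternative to the curl command above). Run from the repository root.
with open("data/input_messages/morpheus:24.03-runtime.json", "rb") as f:
    payload = f.read()

resp = requests.post("http://localhost:26466/scan", data=payload, timeout=60)
print(resp.status_code)
# Note: results are written to the configured output path, not returned here.
```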
The top-level entrypoint to each of the LLM example pipelines is `src/main.py`. The main entrypoint is a CLI tool with built-in documentation using the `--help` command. For example, to see what commands are available, you can run:
(morpheus) root@58145366033a:/workspace# python src/main.py --help
Usage: morpheus_llm [OPTIONS] COMMAND [ARGS]...
Main entrypoint for the Vulnerability Analysis for Container Security
Options:
--log_level [CRITICAL|FATAL|ERROR|WARN|WARNING|INFO|DEBUG]
Specify the logging level to use. [default:
INFO]
--use_cpp BOOLEAN Whether or not to use C++ node and message
types or to prefer python. Only use as a
last resort if bugs are encountered
[default: True]
--version Show the version and exit.
--help Show this message and exit.
Commands:
cve Run the Vulnerability Analysis for Container Security pipeline
It's common to want to override some options in the configuration file on the command line. This is useful for reusing a common configuration while tweaking the pipeline's input/output or changing a single setting without modifying the original config file.
For example, to override the `max_retries` option in the configuration file, you can run:
python src/main.py --log_level=DEBUG \
cve pipeline --config_file=configs/from_manual.json \
config \
general --max_retries=3
This will run the pipeline with the `from_manual.json` configuration file, but with the `max_retries` option set to `3`.
It's also possible to change the input type. For example, to use a different input message with the `from_file.json` configuration file, you can run:
python src/main.py --log_level=DEBUG \
cve pipeline --config_file=configs/from_file.json \
config \
input-file --file=data/input_messages/morpheus:24.03-runtime.json
It's possible to combine multiple overrides in a single command. For example, to run the pipeline with the `from_manual.json` configuration file, but with the `max_retries` option set to `3`, the input message set to `morpheus:24.03-runtime.json`, and the output destinations set to `.tmp/output_morpheus.json` and `.tmp/morpheus_reports`, you can run:
python src/main.py --log_level=DEBUG \
cve pipeline --config_file=configs/from_manual.json \
config \
general --max_retries=3 \
input-file --file=data/input_messages/morpheus:24.03-runtime.json \
output-file --file_path=.tmp/output_morpheus.json --markdown_dir=.tmp/morpheus_reports
For the full list of possible options, use the `--help` option from the CLI.
The configuration defines how the workflow operates, including model settings, input sources, and output options.
- Schema
  - `$schema`: Specifies the schema for validating the configuration file. This ensures the correct structure and data types are used throughout the config file.
- LLM engine configuration (`engine`): The `engine` section configures various models for the LLM nodes.
  - LLM processing nodes: `agent`, `checklist_model`, `justification_model`, `summary_model`
    - `model_name`: The name of the LLM model used by the node.
    - `prompt`: Manually set the prompt for the specific model in the configuration. The prompt can either be passed in as a string of text or as a path to a text file containing the desired prompting.
    - `service`: Specifies the service for running the LLM inference. (Set to `nvfoundation` if using NIM.)
    - `max_tokens`: Defines the maximum number of tokens that can be generated in one output step.
    - `temperature`: Controls randomness in the output. A lower temperature produces more deterministic results.
    - `top_p`: Limits the diversity of token sampling based on cumulative probability.
    - Settings specific to the `agent` node:
      - `verbose`: Toggles detailed logging for the agent during execution.
      - `return_intermediate_steps`: Controls whether to return intermediate steps taken by the agent, and include them in the output file. Helpful for troubleshooting agent responses.
      - `return_source_documents`: Controls whether to return source documents from the VDB tools, and include them in the intermediate steps output. Helpful for identifying the source files used in agent responses.
        - Note: enabling this will also include source documents in the agent's memory and increase the agent's prompt length.
  - Embedding model for generating VDB for RAG: `rag_embedding`
    - `_type`: Defines the source of the model used for generating embeddings (e.g., `nim`, `huggingface`, `openai`).
    - Other model-dependent parameters, such as `model`/`model_name`, `api_key`, `truncate`, or `encode_kwargs`: see the embedding model customization section below for more details.
- General configuration: The `general` section contains settings that influence the workflow's general behavior, including cache settings, batch sizes, and retry policies.
  - `cache_dir`: The directory where the node's cache should be stored. If None, caching is not used.
  - `base_vdb_dir`: The directory used for storing vector database files.
  - `base_git_dir`: The directory for storing pulled git repositories used for code analysis.
  - `max_retries`: Sets the number of retry attempts for failed operations.
  - `model_max_batch_size`: Specifies the maximum number of messages to send to the model for inference in a single batch.
  - `pipeline_batch_size`: Determines the number of messages per batch for the pipeline.
  - `use_uvloop`: Toggles the use of `uvloop`, an optimized event loop for improved performance.
  - `code_search_tool`: Enables or disables the use of the code search tool.
- Input configuration: The `input` section defines how and where the input data (container images and vulnerabilities) is sourced.
  - `_type`: Defines the input type
    - `manual`: input data is provided manually in the config file
    - `http`: input data is provided through an HTTP `POST` call pointing to an input source file
    - `file`: input data is fetched from a provided source file
  - `message`: Contains details about the input image and its associated vulnerabilities. Required only for the `manual` input type.
    - `image`: Specifies the container image to be analyzed.
      - `name`: Specifies the name of the container image.
      - `tag`: Specifies the tag of the container image.
      - `source_info`: Specifies the sources (e.g., git repositories) used to retrieve code or documentation for analysis.
        - `include`: Specifies the file patterns to be included in the analysis.
        - `exclude`: Specifies the file patterns to exclude from analysis.
      - `sbom_info`: Specifies the Software Bill of Materials (SBOM) file to be analyzed.
    - `scan`: Provides a list of vulnerabilities (by ID) to be analyzed for the input image.
- Output configuration: The `output` section defines where the output results of the workflow should be stored and how they should be managed.
  - `_type`: Specifies the output type. Use `file` to write the results to a file.
  - `file_path`: Defines the path to the file where the output will be saved.
  - `markdown_dir`: Defines the path to the directory where the output will be saved in individual navigable markdown files per CVE-ID.
  - `overwrite`: Indicates whether the output file should be overwritten when the pipeline starts if it already exists. Will throw an error if set to `False` and the file already exists. Note that the overwrite behavior only occurs on pipeline initialization. For pipelines started in HTTP mode, each new request will append to the existing file until the pipeline is restarted.
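Putting the sections above together, the following sketch assembles a minimal configuration as a Python dictionary and writes it to JSON. It is illustrative only: the field names follow the reference above, but the authoritative schema is the `$schema` referenced by the shipped `configs/*.json` files, so start from one of those when building a real configuration.

```python
import json
from pathlib import Path

# Illustrative minimal configuration assembled from the sections documented
# above. The shipped configs/*.json files are the real starting point.
config = {
    "engine": {
        "checklist_model": {
            "service": {"_type": "nvfoundation"},
            "model_name": "meta/llama-3.1-70b-instruct",
            "temperature": 0,
            "max_tokens": 2000,
        },
        "rag_embedding": {"_type": "nim", "model": "nvidia/nv-embedqa-e5-v5"},
    },
    "general": {"max_retries": 3, "cache_dir": ".cache"},
    "input": {
        "_type": "file",
        "file": "data/input_messages/morpheus:24.03-runtime.json",
    },
    "output": {"_type": "file", "file_path": ".tmp/output.json"},
}

Path(".tmp").mkdir(exist_ok=True)
Path(".tmp/custom_config.json").write_text(json.dumps(config, indent=2))
```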
The docker compose file includes an `nginx-cache` proxy server container that enables caching for API requests made by the workflow. It is highly recommended to route API requests through the proxy server to reduce API calls for duplicate requests and improve workflow speed. This is especially useful when running the pipeline multiple times with the same configuration (e.g., for debugging) and can help keep costs down when using paid APIs.
The NGINX proxy server is started by default when running the `morpheus-vuln-analysis` service. However, it can be started separately using the following command:
cd ${REPO_ROOT}
docker compose up --detach nginx-cache
To use the proxy server for API calls in the workflow, you can set environment variables for each base URL used by the workflow to point to `http://localhost:${NGINX_HOST_HTTP_PORT}/`. These are set automatically when running the `morpheus-vuln-analysis` service, but can be set manually in the `.env` file as follows:
CVE_DETAILS_BASE_URL="http://localhost:8080/cve-details"
CWE_DETAILS_BASE_URL="http://localhost:8080/cwe-details"
DEPSDEV_BASE_URL="http://localhost:8080/depsdev"
FIRST_BASE_URL="http://localhost:8080/first"
GHSA_BASE_URL="http://localhost:8080/ghsa"
NGC_API_BASE="http://localhost:8080/nemo/v1"
NIM_EMBED_BASE_URL="http://localhost:8080/nim_embed/v1"
NVD_BASE_URL="http://localhost:8080/nvd"
NVIDIA_API_BASE="http://localhost:8080/nim_llm/v1"
OPENAI_API_BASE="http://localhost:8080/openai/v1"
OPENAI_BASE_URL="http://localhost:8080/openai/v1"
RHSA_BASE_URL="http://localhost:8080/rhsa"
SERPAPI_BASE_URL="http://localhost:8080/serpapi"
UBUNTU_BASE_URL="http://localhost:8080/ubuntu"
The primary method for customizing the workflow is to generate a new configuration file with new options. The configuration file defines the pipeline settings, such as the input data, the LLM models used, and the output format. The configuration file is a JSON file that can be modified to suit your needs.
Currently, there are 3 types of input sources supported by the pipeline:
- Manual input: The input data is directly specified in the configuration file.
- File input: The input data is read from a file.
- HTTP input: The input data is fetched from an HTTP source.
To customize the input, modify the configuration file accordingly. In any configuration file, locate the `input` section to see the input source used by the pipeline. For example, in the configuration file `configs/from_manual.json`, the following snippet defines the input source as manual:
"input": {
"_type": "manual",
"message": {
...Contents of the input message...
}
}
To use a file as the input source, update the JSON object in the config file to:
"input": {
"_type": "file",
"file": "data/input_messages/morpheus:23.11-runtime.json"
}
To use an HTTP source as the input, update the JSON object in the config file to:
"input": {
"_type": "http",
"address": "127.0.0.1",
"endpoint": "/scan",
"http_method": "POST",
"port": 26466
}
Vector databases are used by the agent to fetch relevant information for impact analysis investigations. The embedding model used to vectorize your documents can significantly affect the agent's performance. The default embedding model used by the pipeline is the NIM nvidia/nv-embedqa-e5-v5 model, but you can experiment with different embedding models of your choice.
To test a custom embedding model, modify the configuration file in the `engine.rag_embedding` section. For example, in the `from_manual.json` configuration file, the following snippet defines the settings for the default embedding model:
"rag_embedding": {
"_type": "nim",
"model": "nvidia/nv-embedqa-e5-v5",
"truncate": "END",
"max_batch_size": 128
},
- `rag_embedding._type`: specifies the embedding provider. The currently supported options are `nim`, `huggingface`, and `openai`.
- `model_name`: specifies the model name for the embedding provider. Refer to the embedding provider's documentation to determine the available models.
- `truncate`: specifies how inputs longer than the maximum token length of the model are handled. Passing `START` discards the start of the input, while `END` discards the end of the input. In both cases, input is discarded until the remaining input is exactly the maximum input token length for the model. If `NONE` is selected, an error is returned when the input exceeds the maximum input token length.
- `max_batch_size`: specifies the batch size to use when generating embeddings. We recommend setting this to 128 (the default) or lower when using the cloud-hosted embedding NIM. When using a local NIM, this value can be tuned based on throughput/memory performance on your hardware.
Steps to configure an alternate embedding provider
1. If using OpenAI embeddings, first obtain an API key, then update the `.env` file with the auth and base URL environment variables for the service as indicated in the Supported LLM Services table. Otherwise, proceed to step 2.
2. Update the `rag_embedding` section of the config file as described above.
   - Example HuggingFace embedding configuration: `"rag_embedding": { "_type": "huggingface", "model_name": "intfloat/e5-large-v2", "encode_kwargs": { "batch_size": 128 } }`
   - Example OpenAI embedding configuration: `"rag_embedding": { "_type": "openai", "model_name": "text-embedding-3-small", "encode_kwargs": { "max_retries": 5, "chunk_size": 256 } }`
   - For HuggingFace embeddings, all parameters from LangChain's HuggingFaceEmbeddings class are supported. However, for OpenAI models, only a subset of parameters is supported. The full set of available parameters can be found in the config definitions here. Any non-supported parameters provided in the configuration will be ignored.
The current pipeline uses FAISS to create the vector databases. Interested users can customize the source code to use other vector databases such as cuVS.
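For reference, the kind of index the pipeline builds with FAISS can be reproduced in a few lines. The sketch below uses random vectors purely for illustration (the embedding dimension is an assumption); in the pipeline, the vectors come from the configured embedding model, and swapping in another vector store such as cuVS would replace this indexing and search step.

```python
import faiss
import numpy as np

# Minimal FAISS example: build a flat L2 index and query it.
# Random vectors stand in for document embeddings from the embedding model.
dim = 1024                      # assumed embedding size for illustration
docs = np.random.rand(1000, dim).astype("float32")
query = np.random.rand(1, dim).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact (brute-force) L2 search
index.add(docs)

distances, ids = index.search(query, 5)
print(ids[0])                   # indices of the 5 nearest document chunks
```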
The configuration file also allows customizing the LLM model and parameters for each component of the workflow, as well as which LLM service is used when invoking the model.
In any configuration file, locate the `engine` section to see the current settings. For example, in the `from_manual.json` configuration file, the following snippet defines the LLM used for the checklist model:
"checklist_model": {
"service": {
"_type": "nvfoundation"
},
"model_name": "meta/llama-3.1-70b-instruct",
"temperature": 0,
"max_tokens": 2000,
"top_p": 0.01
}
- `service._type`: specifies the LLM service to use. Refer to the Supported LLM Services table for available options.
- `model_name`: specifies the model name within the LLM service. Refer to the service's API documentation to determine the available models.
- `temperature`, `max_tokens`, `top_p`, ...: specify the model parameters. Note that by default, the config supports only a subset of the parameters provided by each LLM service. The available parameters can be found in the configuration object's definition here. Any non-supported parameters provided in the configuration will be ignored.
| Name | `_type` | Auth Env Var(s) | Base URL Env Var(s) | Proxy Server Route |
|---|---|---|---|---|
| NVIDIA Inference Microservices (NIMs) (Default) | `nvfoundation` | `NVIDIA_API_KEY` | `NVIDIA_API_BASE` | `/nim_llm/v1` |
| NVIDIA GPU Cloud (NGC) | `nemo` | `NGC_API_KEY`, `NGC_ORG_ID` | `NGC_API_BASE` | `/nemo/v1` |
| OpenAI | `openai` | `OPENAI_API_KEY` | `OPENAI_API_BASE` (used by `langchain`), `OPENAI_BASE_URL` (used by `openai`) | `/openai/v1` |
Steps to configure an LLM model
1. Obtain an API key and any other required auth info for the selected service.
2. Update the `.env` file with the auth and base URL environment variables for the service as indicated in the Supported LLM Services table.
3. Update the config file as described above. For example, if you want to use OpenAI's `gpt-4o` model for checklist generation, update the above JSON object in the config file to:
"checklist_model": {
"service": {
"_type": "openai"
},
"model_name": "gpt-4o",
"temperature": 0,
"top_p": 0.01,
"seed": 0,
"max_retries": 5
},
Please note that the prompts have been tuned to work best with the Llama 3.1 70B NIM and that when using other LLM models it may be necessary to adjust the prompting.
Currently, there are 3 types of outputs supported by the pipeline:
- File output: The output data is written to a file in JSON format.
- HTTP output: The output data is posted to an HTTP endpoint.
- Print output: The output data is printed to the console.
To customize the output, modify the configuration file accordingly. In any configuration file, locate the `output` section to see the output destination used by the pipeline. For example, in the configuration file `configs/from_manual.json`, the following snippet defines the output destination as a single JSON file and individual markdown files per CVE-ID:
"output": {
"_type": "file",
"file_path": ".tmp/output.json",
"markdown_dir": ".tmp/vulnerability_markdown_reports"
}
To post the output to an HTTP endpoint, update the JSON object in the config file as follows, replacing the domain, port, and endpoint with the desired destination (note the trailing slash in the "url" field). The output will be sent as JSON data.
"output": {
"type": "http",
"url": "http://<domain>:<port>/",
"endpoint": "<endpoint>"
}
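To try the HTTP output locally, you can stand up a throwaway receiver with Python's standard library and point the `url` and `endpoint` fields at it. This is a test harness sketch, not part of the workflow; the port and path below are arbitrary examples.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class ScanResultReceiver(BaseHTTPRequestHandler):
    """Accept POSTed pipeline output and print a short summary (test harness only)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
            print(f"POST {self.path}: JSON with top-level keys {list(payload)[:5]}")
        except json.JSONDecodeError:
            print(f"POST {self.path}: {len(body)} bytes (not JSON)")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # Example only: listen on port 9000. Point the config at it with
    # "url": "http://localhost:9000/" and "endpoint": "/results".
    HTTPServer(("0.0.0.0", 9000), ScanResultReceiver).serve_forever()
```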
To print the output without saving to a file, update the JSON object in the config file to:
"output": {
"_type": "print"
}
Additional output options will be added in the future.
The workflow is configured by default to use an NVIDIA-hosted LLM NIM, for which an NVIDIA API key (`NVIDIA_API_KEY`) is required. The workflow can also be used with a self-hosted LLM NIM. You can start here for more information on how to pull and run the NIM locally.
Once the NIM is deployed, update your pipeline configuration (e.g. `from_manual.json`) to use your NIM. For every component under `engine` that uses `nvfoundation`, update it to use `openai` and the model name of your NIM (i.e. meta/llama-3.1-70b-instruct). For example:
"agent": {
"model": {
"model_name": "meta/llama-3.1-70b-instruct",
"service": {
"_type": "openai"
},
"max_tokens": 2000
},
"verbose": true
}
If using the nginx cache, update `openai_upstream` in nginx_cache.conf to point to your NIM URL:
set $openai_upstream http://llm-nim:8000;
Here `llm-nim` is the configured service name for the NIM when using the Helm chart or docker compose. Otherwise, set it to the actual host name or IP address of your NIM.
Now set the `OPENAI_BASE_URL` environment variable to the NIM or nginx URL, depending on your configuration. The `OPENAI_API_KEY` variable must also be set, but only to prevent a check error (not for authentication); it can be set to any string.
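Before running the pipeline against a self-hosted NIM, it can help to confirm that the OpenAI-compatible endpoint is reachable. The sketch below lists the models served at `OPENAI_BASE_URL`; it assumes the NIM exposes the standard `/models` route under that base URL, as OpenAI-compatible servers generally do.

```python
import os
import requests

# Quick connectivity check against a self-hosted NIM's OpenAI-compatible API.
# Assumes the standard /models route under the configured base URL.
base_url = os.environ.get("OPENAI_BASE_URL", "http://llm-nim:8000/v1")
api_key = os.environ.get("OPENAI_API_KEY", "dummy")  # any string; not used for auth

resp = requests.get(f"{base_url.rstrip('/')}/models",
                    headers={"Authorization": f"Bearer {api_key}"},
                    timeout=10)
resp.raise_for_status()
print([m.get("id") for m in resp.json().get("data", [])])
```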
Several common issues can arise when running the pipeline. Here are some common issues and their solutions.
If you encounter issues with Git LFS, ensure that you have Git LFS installed and that it is enabled for the repository. You can check if Git LFS is enabled by running the following command:
git lfs install
Verifying that all files are being tracked by Git LFS can be done by running the following command:
git lfs ls-files
Files which are missing will show a `-` next to their name. To ensure all LFS files have been pulled correctly, you can run the following commands:
git lfs fetch --all
git lfs checkout *
When building containers for self-hosted NIMs, certain issues may occur. Below are common troubleshooting steps to help resolve them.
If you encounter an error resembling the following during the container build process for self-hosted NIMs:
nvidia-container-cli: device error: {n}: unknown device: unknown
This error typically indicates that the container is attempting to access GPUs that are either unavailable or non-existent on the host. To resolve this, verify the GPU count specified in the docker-compose.nim.yml configuration file:
- Navigate to the `deploy.resources.reservations.devices` section and check the `count` parameter.
- Set the environment variable `NIM_LLM_GPU_COUNT` to the actual number of GPUs available on the host machine before building the container. Note that the default value is set to 4.
This adjustment ensures the container accurately matches the available GPU resources, preventing access errors during deployment.
If you encounter an error resembling the following when building containers for self-hosted NIMs:
1 error(s) decoding:
* error decoding 'Deploy.Resources.Reservations.devices[0]': invalid string value for 'count' (the only value allowed is 'all')
This is likely caused by an outdated Docker Compose version. Please upgrade Docker Compose to at least `v2.21.0`.
Because the workflow makes such heavy use of the caching server to speed up API requests, it is important to ensure that the server is running correctly. If you encounter issues with the caching server, you can reset the cache.
To reset the entire cache, you can run the following command:
docker compose down -v
This will delete all the volumes associated with the containers, including the cache.
If you want to reset just the LLM cache or the services cache, you can run the following commands:
docker compose down
# To remove the LLM cache
docker volume rm ${COMPOSE_PROJECT_NAME:-morpheus_vuln_analysis}_llm-cache
# To remove the services cache
docker volume rm ${COMPOSE_PROJECT_NAME:-morpheus_vuln_analysis}_service-cache
We've integrated VDB and embedding creation directly into the pipeline with caching included for expediency. However, in a production environment, it's better to use a separately managed VDB service.
NVIDIA offers optimized models and tools like NIMs (build.nvidia.com/explore/retrieval) and cuVS (github.com/rapidsai/cuvs).
These typically resolve on their own. Please wait and try running the pipeline again later. Example errors:
404
Error requesting [1/10]: (Retry 0.1 sec) https://services.nvd.nist.gov/rest/json/cves/2.0: 404, message='', url=URL('https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=CVE-2023-6709')
503
Error requesting [1/10]: (Retry 0.1 sec) https://services.nvd.nist.gov/rest/json/cves/2.0: 503, message='Service Unavailable', url=URL('https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=CVE-2023-50447')
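If you want to check whether a transient NVD error has cleared before re-running the whole pipeline, a small standalone probe with exponential backoff can help. This is an illustrative script, not part of the workflow code; the endpoint and `cveId` parameter match the URLs shown in the errors above.

```python
import time
import requests

# Probe the NVD API for a CVE with exponential backoff (illustrative).
def fetch_cve(cve_id: str, attempts: int = 5, base_delay: float = 1.0):
    url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
    for attempt in range(attempts):
        resp = requests.get(url, params={"cveId": cve_id}, timeout=30)
        if resp.status_code == 200:
            return resp.json()
        # 404/503 responses from NVD are often transient; wait and retry.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"NVD still failing for {cve_id} after {attempts} attempts")

print(fetch_cve("CVE-2023-50447").get("totalResults"))
```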
If you run out of credits for the NVIDIA API Catalog, you will need to obtain more credits to continue using the API. Please contact your NVIDIA representative to get more credits added.
Test-driven development is essential for building reliable LLM-based agentic systems, especially when deploying or scaling them in production environments.
In our development process, we use the Morpheus public container as a case study. We perform security scans and collaborate with developers and security analysts to assess the exploitability of identified CVEs. Each CVE is labeled as either vulnerable or not vulnerable. For non-vulnerable CVEs, we provide a justification based on one of the ten VEX statuses. Team members document their investigative steps and findings to validate and compare results at different stages of the system.
We have collected labels for 38 CVEs, which serve several purposes:
- Human-generated checklists, findings, and summaries are used as ground truth during various stages of prompt engineering to refine LLM output.
- The justification status for each CVE is used as a label to measure end-to-end pipeline accuracy. Every time there is a change to the system, such as adding a new agent tool, modifying a prompt, or introducing an engineering optimization, we run the labeled dataset through the updated pipeline to detect performance regressions.
As a next step, we plan to integrate this process into our CI/CD pipeline to automate testing. While LLMs' non-deterministic nature makes it difficult to assert exact results for each test case, we can adopt a statistical approach, where we run the pipeline multiple times and ensure that the average accuracy stays within an acceptable range.
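A regression check of that kind could look like the pytest-style sketch below. The `run_pipeline_accuracy` helper is hypothetical; in practice it would run the labeled CVE dataset through the pipeline and score the predicted justification labels against the ground truth, and the thresholds shown are placeholders.

```python
import statistics

# Hypothetical helper: runs the labeled CVE dataset through the pipeline
# and returns the fraction of justification labels that match ground truth.
def run_pipeline_accuracy(config_file: str) -> float:
    ...  # invoke the pipeline and score its output against the labels
    return 0.85  # placeholder value for illustration

def test_average_accuracy_within_tolerance():
    # LLM output is non-deterministic, so average over several runs and
    # assert the mean accuracy stays above an agreed threshold.
    runs = [run_pipeline_accuracy("configs/from_file.json") for _ in range(5)]
    assert statistics.mean(runs) >= 0.80
```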
We recommend that teams looking to test or optimize their CVE analysis system curate a similar dataset for testing and validation. Note that in test-driven development, it's important that the model has not achieved perfect accuracy on the test set, as this may indicate overfitting or that the set lacks sufficient complexity to expose areas for improvement. The test set should be representative of the problem space, covering both scenarios where the model performs well and where further refinement is needed. Investing in a robust dataset ensures long-term reliability and drives continued performance improvements.
By using this software or microservice, you are agreeing to the terms and conditions of the license and acceptable use policy.
GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for AI Products; and use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement.
ADDITIONAL Terms: Meta Llama 3.1 Community License, Built with Meta Llama 3.1.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for vulnerability-analysis
Similar Open Source Tools
vulnerability-analysis
The NVIDIA AI Blueprint for Vulnerability Analysis for Container Security showcases accelerated analysis on common vulnerabilities and exposures (CVE) at an enterprise scale, reducing mitigation time from days to seconds. It enables security analysts to determine software package vulnerabilities using large language models (LLMs) and retrieval-augmented generation (RAG). The blueprint is designed for security analysts, IT engineers, and AI practitioners in cybersecurity. It requires NVAIE developer license and API keys for vulnerability databases, search engines, and LLM model services. Hardware requirements include L40 GPU for pipeline operation and optional LLM NIM and Embedding NIM. The workflow involves LLM pipeline for CVE impact analysis, utilizing LLM planner, agent, and summarization nodes. The blueprint uses NVIDIA NIM microservices and Morpheus Cybersecurity AI SDK for vulnerability analysis.
ScreenAgent
ScreenAgent is a project focused on creating an environment for Visual Language Model agents (VLM Agent) to interact with real computer screens. The project includes designing an automatic control process for agents to interact with the environment and complete multi-step tasks. It also involves building the ScreenAgent dataset, which collects screenshots and action sequences for various daily computer tasks. The project provides a controller client code, configuration files, and model training code to enable users to control a desktop with a large model.
LongRAG
This repository contains the code for LongRAG, a framework that enhances retrieval-augmented generation with long-context LLMs. LongRAG introduces a 'long retriever' and a 'long reader' to improve performance by using a 4K-token retrieval unit, offering insights into combining RAG with long-context LLMs. The repo provides instructions for installation, quick start, corpus preparation, long retriever, and long reader.
eval-dev-quality
DevQualityEval is an evaluation benchmark and framework designed to compare and improve the quality of code generation of Language Model Models (LLMs). It provides developers with a standardized benchmark to enhance real-world usage in software development and offers users metrics and comparisons to assess the usefulness of LLMs for their tasks. The tool evaluates LLMs' performance in solving software development tasks and measures the quality of their results through a point-based system. Users can run specific tasks, such as test generation, across different programming languages to evaluate LLMs' language understanding and code generation capabilities.
aiid
The Artificial Intelligence Incident Database (AIID) is a collection of incidents involving the development and use of artificial intelligence (AI). The database is designed to help researchers, policymakers, and the public understand the potential risks and benefits of AI, and to inform the development of policies and practices to mitigate the risks and promote the benefits of AI. The AIID is a collaborative project involving researchers from the University of California, Berkeley, the University of Washington, and the University of Toronto.
OlympicArena
OlympicArena is a comprehensive benchmark designed to evaluate advanced AI capabilities across various disciplines. It aims to push AI towards superintelligence by tackling complex challenges in science and beyond. The repository provides detailed data for different disciplines, allows users to run inference and evaluation locally, and offers a submission platform for testing models on the test set. Additionally, it includes an annotation interface and encourages users to cite their paper if they find the code or dataset helpful.
0chain
Züs is a high-performance cloud on a fast blockchain offering privacy and configurable uptime. It uses erasure code to distribute data between data and parity servers, allowing flexibility for IT managers to design for security and uptime. Users can easily share encrypted data with business partners through a proxy key sharing protocol. The ecosystem includes apps like Blimp for cloud migration, Vult for personal cloud storage, and Chalk for NFT artists. Other apps include Bolt for secure wallet and staking, Atlus for blockchain explorer, and Chimney for network participation. The QoS protocol challenges providers based on response time, while the privacy protocol enables secure data sharing. Züs supports hybrid and multi-cloud architectures, allowing users to improve regulatory compliance and security requirements.
PolyMind
PolyMind is a multimodal, function calling powered LLM webui designed for various tasks such as internet searching, image generation, port scanning, Wolfram Alpha integration, Python interpretation, and semantic search. It offers a plugin system for adding extra functions and supports different models and endpoints. The tool allows users to interact via function calling and provides features like image input, image generation, and text file search. The application's configuration is stored in a `config.json` file with options for backend selection, compatibility mode, IP address settings, API key, and enabled features.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
atomic-agents
The Atomic Agents framework is a modular and extensible tool designed for creating powerful applications. It leverages Pydantic for data validation and serialization. The framework follows the principles of Atomic Design, providing small and single-purpose components that can be combined. It integrates with Instructor for AI agent architecture and supports various APIs like Cohere, Anthropic, and Gemini. The tool includes documentation, examples, and testing features to ensure smooth development and usage.
LLMeBench
LLMeBench is a flexible framework designed for accelerating benchmarking of Large Language Models (LLMs) in the field of Natural Language Processing (NLP). It supports evaluation of various NLP tasks using model providers like OpenAI, HuggingFace Inference API, and Petals. The framework is customizable for different NLP tasks, LLM models, and datasets across multiple languages. It features extensive caching capabilities, supports zero- and few-shot learning paradigms, and allows on-the-fly dataset download and caching. LLMeBench is open-source and continuously expanding to support new models accessible through APIs.
ReasonablePlanningAI
Reasonable Planning AI is a robust design and data-driven AI solution for game developers. It provides an AI Editor that allows creating AI without Blueprints or C++. The AI can think for itself, plan actions, adapt to the game environment, and act dynamically. It consists of Core components like RpaiGoalBase, RpaiActionBase, RpaiPlannerBase, RpaiReasonerBase, and RpaiBrainComponent, as well as Composer components for easier integration by Game Designers. The tool is extensible, cross-compatible with Behavior Trees, and offers debugging features like visual logging and heuristics testing. It follows a simple path of execution and supports versioning for stability and compatibility with Unreal Engine versions.
VoiceStreamAI
VoiceStreamAI is a Python 3 server and JavaScript client solution for near-real-time audio streaming and transcription over WebSocket. It employs Hugging Face's Voice Activity Detection (VAD) and OpenAI's Whisper model for accurate speech recognition. The system features real-time audio streaming, a modular design for easy integration of VAD and ASR technologies, customizable audio chunk processing strategies, support for multilingual transcription, and secure sockets support. It uses factory and strategy patterns for flexible component management and provides a unit testing framework for robust development.
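The client/server exchange can be pictured with a generic WebSocket sketch like the one below; the URL, chunk size, and message framing are assumptions for illustration and do not reflect VoiceStreamAI's actual protocol:

```python
import asyncio
import websockets  # pip install websockets

async def stream_audio(path: str, url: str = "ws://localhost:8765", chunk_bytes: int = 32000):
    """Send raw audio in fixed-size chunks and print transcription messages as they arrive."""
    async with websockets.connect(url) as ws:
        with open(path, "rb") as audio:
            while chunk := audio.read(chunk_bytes):
                await ws.send(chunk)      # server-side VAD/ASR decides when to transcribe
        async for message in ws:          # assumed: server pushes text results back on the same socket
            print("transcript:", message)

asyncio.run(stream_audio("sample.raw"))
```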
nx_open
The `nx_open` repository contains open-source components for the Network Optix Meta Platform, used to build products like Nx Witness Video Management System. It includes source code, specifications, and a Desktop Client. The repository is licensed under Mozilla Public License 2.0. Users can build the Desktop Client and customize it using a zip file. The build environment supports Windows, Linux, and macOS platforms with specific prerequisites. The repository provides scripts for building, signing executable files, and running the Desktop Client. Compatibility with VMS Server versions is crucial, and automatic VMS updates are disabled for the open-source Desktop Client.
warc-gpt
WARC-GPT is an experimental retrieval-augmented generation pipeline for web archive collections. It allows users to interact with WARC files, extract text, generate text embeddings, visualize embeddings, and interact with a web UI and API. The tool is highly customizable, supporting various LLMs, providers, and embedding models. Users can configure the application using environment variables, ingest WARC files, start the server, and use the web UI and API to search for content and generate text completions. WARC-GPT is designed for exploration and experimentation with web archives using AI.
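The text-extraction step can be approximated outside WARC-GPT with the `warcio` library; this is a standalone sketch of reading HTML payloads from a WARC file, not WARC-GPT's own ingestion CLI or chunking logic:

```python
from warcio.archiveiterator import ArchiveIterator  # pip install warcio

def iter_html_payloads(warc_path: str):
    """Yield (url, raw_html_bytes) for HTML response records in a WARC file."""
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            content_type = record.http_headers.get_header("Content-Type") or ""
            if "text/html" in content_type:
                url = record.rec_headers.get_header("WARC-Target-URI")
                yield url, record.content_stream().read()

# Hypothetical file name for illustration.
for url, html in iter_html_payloads("collection.warc.gz"):
    print(url, len(html), "bytes")
```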
mosec
Mosec is a high-performance and flexible model serving framework for building ML model-enabled backends and microservices. It bridges the gap between any machine learning model you just trained and an efficient online service API.
- **Highly performant**: web layer and task coordination built with Rust 🦀, offering blazing speed and efficient CPU utilization powered by async I/O
- **Ease of use**: user interface purely in Python 🐍, so users can serve their models in an ML-framework-agnostic manner using the same code as they do for offline testing
- **Dynamic batching**: aggregates requests from different users for batched inference and distributes the results back
- **Pipelined stages**: spawns multiple processes for pipelined stages to handle CPU/GPU/IO-mixed workloads
- **Cloud friendly**: designed to run in the cloud, with model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration system
- **Do one thing well**: focuses on online serving so users can concentrate on model optimization and business logic
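A minimal serving sketch in mosec's typical Python pattern follows; model loading and actual inference are stubbed out, and the batch size is illustrative:

```python
from mosec import Server, Worker

class Inference(Worker):
    """Stub worker: with max_batch_size > 1, mosec passes a list of requests to forward()."""
    def forward(self, data: list) -> list:
        # Replace with real model inference; here each request is simply echoed back.
        return [{"echo": item} for item in data]

if __name__ == "__main__":
    server = Server()
    # Dynamic batching: aggregate up to 8 concurrent requests per forward() call.
    server.append_worker(Inference, num=1, max_batch_size=8)
    server.run()
```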
For similar tasks
Academic_LLM_Sec_Papers
Academic_LLM_Sec_Papers is a curated collection of academic papers on LLM security applications. The repository includes papers sorted by conference name and publication year, covering topics such as large language models for blockchain security, software engineering, machine learning, and more. Developers and researchers are welcome to contribute additional published papers to the list. The repository also provides information on listed conferences and journals related to security, networking, software engineering, and cryptography. The papers cover a wide range of topics including privacy risks, ethical concerns, vulnerabilities, threat modeling, code analysis, fuzzing, and more.
HackBot
HackBot is an AI-powered cybersecurity chatbot designed to provide accurate answers to cybersecurity-related queries and to conduct code and scan analysis. It utilizes a Meta LLaMA 2 model through the 'LlamaCpp' library to respond coherently. The chatbot offers features like local AI/Runpod deployment support, cybersecurity chat assistance, an interactive interface, clear output presentation, static code analysis, and vulnerability analysis. Users interact with HackBot through a command-line interface and can use it for various cybersecurity tasks.
watchtower
AIShield Watchtower is a tool designed to fortify the security of AI/ML models and Jupyter notebooks by automating model and notebook discoveries, conducting vulnerability scans, and categorizing risks into 'low,' 'medium,' 'high,' and 'critical' levels. It supports scanning of public GitHub repositories, Hugging Face repositories, AWS S3 buckets, and local systems. The tool generates comprehensive reports, offers a user-friendly interface, and aligns with industry standards like OWASP, MITRE, and CWE. It aims to address the security blind spots surrounding Jupyter notebooks and AI models, providing organizations with a tailored approach to enhancing their security efforts.
LLM-PLSE-paper
LLM-PLSE-paper is a repository focused on the applications of Large Language Models (LLMs) in Programming Language and Software Engineering (PL/SE) domains. It covers a wide range of topics including bug detection, specification inference and verification, code generation, fuzzing and testing, code model and reasoning, code understanding, IDE technologies, prompting for reasoning tasks, and agent/tool usage and planning. The repository provides a comprehensive collection of research papers, benchmarks, empirical studies, and frameworks related to the capabilities of LLMs in various PL/SE tasks.
invariant
Invariant Analyzer is an open-source scanner designed for LLM-based AI agents to find bugs, vulnerabilities, and security threats. It scans agent execution traces to identify issues like looping behavior, data leaks, prompt injections, and unsafe code execution. The tool offers a library of built-in checkers, an expressive policy language, data flow analysis, real-time monitoring, and extensible architecture for custom checkers. It helps developers debug AI agents, scan for security violations, and prevent security issues and data breaches during runtime. The analyzer leverages deep contextual understanding and a purpose-built rule matching engine for security policy enforcement.
OpenRedTeaming
OpenRedTeaming is a repository focused on red teaming for generative models, specifically large language models (LLMs). It provides a comprehensive survey of potential attacks on generative AI and of robust safeguards, covering attack strategies, taxonomies, evaluation metrics, benchmarks, risks, and defensive approaches. The repository also implements over 30 automated red teaming methods. The goal is to understand vulnerabilities and develop defenses against adversarial attacks on large language models.
Awesome-LLM4Cybersecurity
The 'Awesome-LLM4Cybersecurity' repository provides a comprehensive overview of the applications of Large Language Models (LLMs) in cybersecurity. It includes a systematic literature review covering topics such as constructing cybersecurity-oriented domain LLMs, potential applications of LLMs in cybersecurity, and research directions in the field. The repository analyzes various benchmarks, datasets, and applications of LLMs in cybersecurity tasks like threat intelligence, fuzzing, vulnerability detection, insecure code generation, program repair, anomaly detection, and LLM-assisted attacks.
For similar jobs
ciso-assistant-community
CISO Assistant is a tool that helps organizations manage their cybersecurity posture and compliance. It provides a centralized platform for managing security controls, threats, and risks. CISO Assistant also includes a library of pre-built frameworks and tools to help organizations quickly and easily implement best practices.
PurpleLlama
Purple Llama is an umbrella project that aims to provide tools and evaluations to support responsible development and usage of generative AI models. It encompasses components for cybersecurity and input/output safeguards, with plans to expand in the future. The project emphasizes a collaborative approach, borrowing the concept of purple teaming from cybersecurity, to address potential risks and challenges posed by generative AI. Components within Purple Llama are licensed permissively to foster community collaboration and standardize the development of trust and safety tools for generative AI.
vpnfast.github.io
VPNFast is a lightweight and fast VPN service provider that offers secure and private internet access. With VPNFast, users can protect their online privacy, bypass geo-restrictions, and secure their internet connection from hackers and snoopers. The service provides high-speed servers in multiple locations worldwide, ensuring a reliable and seamless VPN experience for users. VPNFast is easy to use, with a user-friendly interface and simple setup process. Whether you're browsing the web, streaming content, or accessing sensitive information, VPNFast helps you stay safe and anonymous online.
taranis-ai
Taranis AI is an advanced Open-Source Intelligence (OSINT) tool that leverages Artificial Intelligence to revolutionize information gathering and situational analysis. It navigates through diverse data sources like websites to collect unstructured news articles, utilizing Natural Language Processing and Artificial Intelligence to enhance content quality. Analysts then refine these AI-augmented articles into structured reports that serve as the foundation for deliverables such as PDF files, which are ultimately published.
NightshadeAntidote
Nightshade Antidote is an image forensics tool used to analyze digital images for signs of manipulation or forgery. It implements several common techniques used in image forensics including metadata analysis, copy-move forgery detection, frequency domain analysis, and JPEG compression artifacts analysis. The tool takes an input image, performs analysis using the above techniques, and outputs a report summarizing the findings.
h4cker
This repository is a comprehensive collection of cybersecurity-related references, scripts, tools, code, and other resources. It is carefully curated and maintained by Omar Santos. The repository serves as a supplemental material provider to several books, video courses, and live training created by Omar Santos. It encompasses over 10,000 references that are instrumental for both offensive and defensive security professionals in honing their skills.
AIMr
AIMr is an AI aimbot written in Python that uses modern techniques to stay undetected while presenting a polished appearance. It works with any game that uses human-shaped models. To optimize its performance, users should build OpenCV with CUDA support. Valorant support additionally requires perks available in the project's Discord and an Arduino Leonardo R3.