ctinexus
CTINexus is a framework that leverages optimized in-context learning of LLMs to enable data-efficient extraction of cyber threat intelligence and the construction of high-quality cybersecurity knowledge graphs.
Stars: 70
CTINexus is a framework that leverages optimized in-context learning of large language models to automatically extract cyber threat intelligence from unstructured text and construct cybersecurity knowledge graphs. It processes threat intelligence reports to extract cybersecurity entities, identify relationships between security concepts, and construct knowledge graphs with interactive visualizations. The framework requires minimal configuration, with no extensive training data or parameter tuning needed.
README:
[2025/10] CTINexus Python package released! Install with `pip install ctinexus` for seamless integration into your Python projects.
[2025/07] CTINexus now features an intuitive Gradio interface! Submit threat intelligence text and instantly view the extracted graph as an interactive visualization.
[2025/04] We released the camera-ready paper on arXiv.
[2025/02] CTINexus was accepted at the 2025 IEEE European Symposium on Security and Privacy (Euro S&P).
- Overview
- Features
- Supported AI Providers
- Getting Started
- Command Line Interface
- Contributing
- Citation
- License
CTINexus is a framework that leverages optimized in-context learning (ICL) of large language models (LLMs) to automatically extract cyber threat intelligence (CTI) from unstructured text and construct cybersecurity knowledge graphs (CSKG).
The framework processes threat intelligence reports to:
- Extract cybersecurity entities (malware, vulnerabilities, tactics, IOCs)
- Identify relationships between security concepts
- Construct knowledge graphs with interactive visualizations
- Require minimal configuration: no extensive training data or parameter tuning needed
- Intelligence Extraction (IE)
  - Automatically extracts cybersecurity entities and relationships from unstructured text
  - Uses optimized prompt construction and demonstration retrieval
- Hierarchical Entity Alignment
  - Entity Typing (ET): Classifies entities by semantic type
  - Entity Merging (EM): Canonicalizes entities and removes redundancy with IOC protection
- Link Prediction (LP)
  - Predicts and adds missing relationships to complete the knowledge graph
- Interactive Visualization
  - Network graph visualization of the constructed cybersecurity knowledge graph
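To make the Entity Merging step more concrete, here is a minimal sketch of threshold-based merging with IOC protection. It is an illustration only, not the CTINexus implementation: the function names, the regex-based IOC check, and the use of `difflib` string similarity (standing in for embedding similarity) are all assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical IOC patterns (IPv4 addresses and CVE IDs), for illustration only.
IOC_PATTERNS = [
    re.compile(r"^\d{1,3}(\.\d{1,3}){3}$"),
    re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE),
]


def is_ioc(name: str) -> bool:
    """Return True if the entity name looks like an indicator of compromise."""
    return any(pattern.match(name) for pattern in IOC_PATTERNS)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for embedding similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_entities(names: list[str], threshold: float = 0.6) -> dict[str, str]:
    """Map each entity name to a canonical name.

    Names are merged greedily into the first sufficiently similar canonical
    name. IOCs are never merged, because near-identical IOCs are distinct.
    """
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for name in names:
        if is_ioc(name):
            mapping[name] = name
            continue
        match = next(
            (c for c in canonical if similarity(name, c) >= threshold), None
        )
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping


if __name__ == "__main__":
    entities = ["APT29", "apt 29", "192.168.1.100", "192.168.1.101", "PowerShell"]
    for original, merged in merge_entities(entities).items():
        print(f"{original} -> {merged}")
```

In this sketch the two IP addresses stay separate even though they differ by one character, which is the point of protecting IOCs from merging.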
CTINexus supports multiple AI providers for flexibility:
| Provider | Models | Setup Required |
|---|---|---|
| OpenAI | GPT-4, GPT-4o, o1, o3, etc. | API Key |
| Google Gemini | Gemini 2.0, 2.5 Flash, etc. | API Key |
| AWS Bedrock | Claude, Nova, Llama, DeepSeek, etc. | AWS Credentials |
| Ollama | Llama, Mistral, Qwen, Gemma, etc. | Local Installation (FREE) |
Note: When using Ollama models, follow the Ollama Setup Guide.
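The `provider` argument is described as auto-detected when omitted. One plausible way to do that is to check which credentials are present in the environment. The sketch below is hypothetical: the environment variable names and the priority order are assumptions, and the actual names are the ones listed in `.env.example`.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from provider name to the environment variable that
# signals it is configured. The real variable names are in .env.example.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "aws": "AWS_ACCESS_KEY_ID",
    "ollama": "OLLAMA_BASE_URL",
}


def detect_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider whose credential variable is set, else None."""
    env = os.environ if env is None else env
    for provider, variable in PROVIDER_ENV_VARS.items():
        if env.get(variable):
            return provider
    return None


if __name__ == "__main__":
    print(detect_provider({"GEMINI_API_KEY": "example-key"}))  # gemini
```

Passing `provider` explicitly avoids any ambiguity when credentials for several providers are configured.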
pip install ctinexus

Create a `.env` file in your project directory with credentials for at least one provider. See `.env.example` for reference.
To route requests through a custom OpenAI-compatible gateway, set:
- `CUSTOM_BASE_URL` (for example, `https://gateway.example.com/v1`)
- `CUSTOM_API_KEY` (if needed)
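Before running a job, it can help to confirm that the gateway variables are well formed. The helper below is a small sketch for that purpose. `gateway_settings` is a hypothetical name and is not part of the CTINexus API; only the two variable names come from the text above.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse


def gateway_settings(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Read the custom gateway settings, or return None if no gateway is set.

    Raises ValueError when CUSTOM_BASE_URL is set but is not an http(s) URL.
    """
    env = os.environ if env is None else env
    base_url = env.get("CUSTOM_BASE_URL", "").strip()
    if not base_url:
        return None
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"CUSTOM_BASE_URL is not a valid URL: {base_url!r}")
    # The API key is optional, matching the "if needed" note above.
    return {"base_url": base_url, "api_key": env.get("CUSTOM_API_KEY") or None}


if __name__ == "__main__":
    print(gateway_settings({"CUSTOM_BASE_URL": "https://gateway.example.com/v1"}))
```

Calling `gateway_settings()` with no argument reads from the process environment, so it should be called after `load_dotenv()` when the values live in `.env`.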
from ctinexus import process_cti_report
from dotenv import load_dotenv
# Load API credentials
load_dotenv()
# Process threat intelligence text
text = """
APT29 used PowerShell to download additional malware from command-and-control
server at 192.168.1.100. The attack exploited CVE-2023-1234 in Microsoft Exchange.
"""
result = process_cti_report(
    text=text,
    provider="openai",  # optional: auto-detected if not specified
    model="gpt-4",  # optional: uses default if not specified
    similarity_threshold=0.6,
    output="results.json",  # optional: save results to file
)
# Access results
print(f"Graph saved to: {result['entity_relation_graph']}")
# Open the HTML file in your browser to view the interactive graph
# Or process from a CTI report/blog URL
result = process_cti_report(
    source_url="https://example.com/threat-report",
    provider="openai",
    model="gpt-4",
)

API Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | str | None | Threat intelligence text to process (required if `source_url` is not provided) |
| `source_url` | str | None | CTI report/blog URL to ingest and process (required if `text` is not provided) |
| `provider` | str | Auto-detect | "openai", "gemini", "aws", or "ollama" |
| `model` | str | Provider default | Model name (e.g., "gpt-4o", "gemini-2.0-flash") |
| `embedding_model` | str | Provider default | Embedding model for entity alignment |
| `similarity_threshold` | float | 0.6 | Entity similarity threshold (0.0-1.0) |
| `output` | str | None | Path to save JSON results |
Note: text and source_url are mutually exclusive. Provide exactly one input source.
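When the input source is chosen at runtime (for example, from user input), the exactly-one rule can be checked before calling the API. The helper below is a minimal sketch. `select_input` is a hypothetical name introduced here for illustration and is not part of the package.

```python
from typing import Optional


def select_input(text: Optional[str] = None, source_url: Optional[str] = None) -> dict:
    """Return the keyword arguments for exactly one input source.

    Raises ValueError if both or neither of text and source_url are given.
    """
    has_text = bool(text and text.strip())
    has_url = bool(source_url and source_url.strip())
    if has_text == has_url:
        raise ValueError("Provide exactly one of text or source_url")
    return {"text": text} if has_text else {"source_url": source_url}


if __name__ == "__main__":
    print(select_input(source_url="https://example.com/threat-report"))
```

The returned dictionary can then be unpacked into the call, for example `process_cti_report(**select_input(text=report_text), provider="openai")`.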
Return Value:
The function returns a dictionary with complete analysis results:
{
    "text": "Original input text",
    "IE": {"triplets": [...]},  # Extracted entities and relationships
    "ET": {"typed_triplets": [...]},  # Entities with type classifications
    "EA": {"aligned_triplets": [...]},  # Canonicalized entities
    "LP": {"predicted_links": [...]},  # Predicted relationships
    "entity_relation_graph": "path/to/graph.html"  # Interactive visualization
}

git clone https://github.com/peng-gao-lab/CTINexus.git
cd CTINexus
# Create and activate virtual environment
python -m venv .venv
# Activate (macOS/Linux)
source .venv/bin/activate
# Activate (Windows)
# .venv\Scripts\activate
# Install the package
pip install -e .

# Copy the example environment file
cp .env.example .env
# Edit .env with your credentials

1. Launch the application:
ctinexus
2. Access the web interface:
Open your browser to: http://127.0.0.1:7860
3. Process threat intelligence:
- Paste threat intelligence text into the input area
- Select your AI provider and model from dropdowns
- Click "Run" to analyze
- View extracted entities, relationships, and interactive graph
- Export results as JSON or save graph images
Prerequisites:
- Install Docker Desktop
Setup:
# Clone the repository
git clone https://github.com/peng-gao-lab/CTINexus.git
cd CTINexus
# Copy environment template
cp .env.example .env
# Edit .env with your credentials

1. Build and start:
# Run in foreground
docker compose up --build
# OR run in background (detached mode)
docker compose up -d --build
# View logs (if running in background)
docker compose logs -f

2. Access the application:
Open your browser to: http://localhost:8000
3. Process threat intelligence:
- Paste threat intelligence text into the input area
- Select your AI provider and model from dropdowns
- Click "Run" to analyze
- View extracted entities, relationships, and interactive graph
- Export results as JSON or save graph images
The CLI works with any installation method and is well suited to automation and batch processing.
# Process a file
ctinexus --input-file report.txt
# Process text directly
ctinexus --text "APT29 exploited CVE-2023-1234 using PowerShell..."
# Specify provider and model
ctinexus -i report.txt --provider openai --model gpt-4o
# Save to custom location
ctinexus -i report.txt --output results/analysis.json

Complete CLI Documentation: detailed examples and all available options.
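For batch processing, the CLI can be driven from a short script. The sketch below uses only the `--input-file` and `--output` flags shown above and assumes the `ctinexus` executable is on `PATH`. The function names and the `reports`/`results` directory layout are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(report: Path, output_dir: Path) -> list[str]:
    """Build a ctinexus command that writes <report stem>.json to output_dir."""
    output = output_dir / f"{report.stem}.json"
    return ["ctinexus", "--input-file", str(report), "--output", str(output)]


def process_directory(report_dir: Path, output_dir: Path) -> list[Path]:
    """Run ctinexus on every .txt report; return the reports that failed."""
    output_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for report in sorted(report_dir.glob("*.txt")):
        # A non-zero exit code is treated as a failure for that report.
        completed = subprocess.run(build_command(report, output_dir))
        if completed.returncode != 0:
            failed.append(report)
    return failed
```

A call such as `process_directory(Path("reports"), Path("results"))` would process each report in turn and collect failures instead of stopping at the first error.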
We warmly welcome contributions from the community, whether you want to:
- Fix bugs or add features
- Improve documentation
- Enhance the UI/UX
- Add tests or examples

Please check out our Contributing Guide for detailed information on how to get started, development setup, and submission guidelines.
If you use CTINexus in your research, please cite our paper:
@inproceedings{cheng2025ctinexusautomaticcyberthreat,
  title={CTINexus: Automatic Cyber Threat Intelligence Knowledge Graph Construction Using Large Language Models},
  author={Yutong Cheng and Osama Bajaber and Saimon Amanuel Tsegai and Dawn Song and Peng Gao},
  booktitle={2025 IEEE European Symposium on Security and Privacy (EuroS\&P)},
  year={2025},
  organization={IEEE}
}

The source code is licensed under the MIT License. We warmly welcome industry collaboration. If you're interested in building on CTINexus or exploring joint initiatives, please email [email protected] or [email protected]. We'd be happy to set up a brief call to discuss ideas.


