
VeritasGraph
VeritasGraph: Enterprise-Grade Graph RAG for Secure, On-Premise AI with Verifiable Attribution
Stars: 83

VeritasGraph is an enterprise-grade graph RAG framework designed for secure, on-premise AI applications. It leverages a knowledge graph to perform complex, multi-hop reasoning, providing transparent, auditable reasoning paths with full source attribution. The framework excels at answering complex questions that traditional vector search engines struggle with, ensuring trust and reliability in enterprise AI. VeritasGraph offers full control over data and AI models, verifiable attribution for every claim, advanced graph reasoning capabilities, and open-source deployment with sovereignty and customization.
README:
Enterprise-Grade Graph RAG for Secure, On-Premise AI with Verifiable Attribution
VeritasGraph is a production-ready, end-to-end framework for building advanced question-answering and summarization systems that operate entirely within your private infrastructure.
It is architected to overcome the fundamental limitations of traditional vector-search-based Retrieval-Augmented Generation (RAG) by leveraging a knowledge graph to perform complex, multi-hop reasoning.
Baseline RAG systems excel at finding direct answers but falter when faced with questions that require connecting disparate information or understanding a topic holistically. VeritasGraph addresses this challenge directly, providing not just answers, but transparent, auditable reasoning paths with full source attribution for every generated claim, establishing a new standard for trust and reliability in enterprise AI.
Maintain 100% control over your data and AI models, ensuring maximum security and privacy.
Every generated claim is traced back to its source document, guaranteeing transparency and accountability.
Answer complex, multi-hop questions that go beyond the capabilities of traditional vector search engines.
Build a sovereign knowledge asset, free from vendor lock-in, with full ownership and customization.
A brief video demonstrating the core functionality of VeritasGraph, from data ingestion to multi-hop querying with full source attribution.
The following diagram illustrates the end-to-end pipeline of the VeritasGraph system:
graph TD
subgraph "Indexing Pipeline (One-Time Process)"
A[Source Documents] --> B{Document Chunking};
B --> C{"LLM-Powered Extraction<br/>(Entities & Relationships)"};
C --> D[Vector Index];
C --> E[Knowledge Graph];
end
subgraph "Query Pipeline (Real-Time)"
F[User Query] --> G{Hybrid Retrieval Engine};
G -- "1. Vector Search for Entry Points" --> D;
G -- "2. Multi-Hop Graph Traversal" --> E;
G --> H{Pruning & Re-ranking};
H -- "Rich Reasoning Context" --> I{LoRA-Tuned LLM Core};
I -- "Generated Answer + Provenance" --> J{Attribution & Provenance Layer};
J --> K[Attributed Answer];
end
style A fill:#f2f2f2,stroke:#333,stroke-width:2px
style F fill:#e6f7ff,stroke:#333,stroke-width:2px
style K fill:#e6ffe6,stroke:#333,stroke-width:2px
I'm using Ollama (llama3.1) on Windows for generation and Ollama (nomic-embed-text) for text embeddings.
Please don't use WSL if you use LM Studio for embeddings, because it will have trouble connecting to services running on Windows (LM Studio).
Ollama's default context length is 2048, which can truncate the input and output when indexing.
I'm using a 12k context here (12*1024 = 12288); I tried 10k before, but the results still got truncated.
Truncated input/output can give you a completely out-of-context report in local search!
Note that if you change the model in settings.yaml and try to reindex, it will restart the whole indexing run!
First, pull the models we need to use
ollama serve
# in another terminal
ollama pull llama3.1
ollama pull nomic-embed-text
Then build the model with the Modelfile in this repo:
ollama create llama3.1-12k -f ./Modelfile
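For reference, a Modelfile that raises the context window to 12k looks roughly like this (a minimal sketch; the repo's actual Modelfile may set additional parameters):
FROM llama3.1
PARAMETER num_ctx 12288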
First, create and activate the conda environment:
conda create -n rag python=3.11  # any version below 3.12 works
conda activate rag
Clone this project, then cd into its directory:
cd graphrag-ollama-config
Then pull the graphrag code (a local fix of graphrag is used here) and install the package:
cd graphrag-ollama
pip install -e ./
You can skip this step if you used this repo; it initializes the graphrag folder:
pip install sympy
pip install future
pip install ollama
python -m graphrag.index --init --root .
Create your .env file:
cp .env.example .env
Move your input text to ./input/
Double-check the parameters in .env and settings.yaml; in settings.yaml, make sure it says "community_reports" instead of "community_report".
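For orientation, the Ollama-related part of settings.yaml typically looks something like this (a sketch only, not the full file; the api_base values assume Ollama's default local endpoints, and the model names match the ones pulled above):
llm:
  api_key: ${GRAPHRAG_API_KEY}  # dummy value, Ollama does not check it
  type: openai_chat
  model: llama3.1-12k           # the 12k-context model built above
  api_base: http://localhost:11434/v1
embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/api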
Then fine-tune the prompts (this is important: it produces much better results).
You can find more about how to tune prompts here
python -m graphrag.prompt_tune --root . --domain "Christmas" --method random --limit 20 --language English --max-tokens 2048 --chunk-size 256 --no-entity-types --output ./prompts
Then you can start the indexing
python -m graphrag.index --root .
You can check the logs in ./output/<timestamp>/reports/indexing-engine.log for errors.
Test a global query
python -m graphrag.query \
--root . \
--method global \
"What are the top themes in this story?"
First, make sure requirements are installed
pip install -r requirements.txt
Then run the app using
gradio app.py
To use the app, visit http://127.0.0.1:7860/
- Core Capabilities
- The Architectural Blueprint
- Beyond Semantic Search
- Secure On-Premise Deployment Guide
- API Usage & Examples
- Project Philosophy & Future Roadmap
- Acknowledgments & Citations
VeritasGraph integrates four critical components into a cohesive, powerful, and secure system:
- Multi-Hop Graph Reasoning – Move beyond semantic similarity to traverse complex relationships within your data.
- Efficient LoRA-Tuned LLM – Fine-tuned using Low-Rank Adaptation for efficient, powerful on-premise deployment.
- End-to-End Source Attribution – Every statement is linked back to specific source documents and reasoning paths.
- Secure & Private On-Premise Architecture – Fully deployable within your infrastructure, ensuring data sovereignty.
The VeritasGraph pipeline transforms unstructured documents into a structured knowledge graph for attributable reasoning.
- Document Chunking – Segment input documents into granular TextUnits.
- Entity & Relationship Extraction – LLM extracts structured triplets (head, relation, tail).
- Graph Assembly – Nodes + edges stored in a graph database (e.g., Neo4j).
- Query Analysis & Entry-Point Identification – Vector search finds relevant entry nodes.
- Contextual Expansion via Multi-Hop Traversal – Graph traversal uncovers hidden relationships.
- Pruning & Re-Ranking – Removes noise, keeps most relevant facts for reasoning.
- Augmented Prompting – Context formatted with query, sources, and instructions.
- LLM Generation – Locally hosted, LoRA-tuned open-source model generates attributed answers.
- LoRA Fine-Tuning – Specialization for reasoning + attribution with efficiency.
- Metadata Propagation – Track source IDs, chunks, and graph nodes.
- Traceable Generation – Model explicitly cites sources.
- Structured Attribution Output – JSON object with provenance + reasoning trail (see the sketch below).
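To make that concrete, an attribution payload might be shaped like this (an illustrative sketch only; the field names are hypothetical, not the framework's actual schema):
{
  "answer": "...",
  "claims": [
    {
      "text": "...",
      "source_chunks": ["doc_17:chunk_04"],
      "reasoning_path": ["Engineer X", "LED", "Project Alpha"]
    }
  ]
}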
Traditional RAG fails at complex reasoning (e.g., linking an engineer across projects and patents).
VeritasGraph succeeds by combining the following (a short code sketch follows the list):
- Semantic search → finds entry points.
- Graph traversal → connects the dots.
- LLM reasoning → synthesizes the final answer with citations.
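The multi-hop expansion step can be illustrated with a toy knowledge graph (purely illustrative Python using networkx; the node names and graph contents are hypothetical, and VeritasGraph's real traversal runs against its graph database):
import networkx as nx

# Toy graph of (head, relation, tail) triplets produced by the indexing stage.
g = nx.DiGraph()
g.add_edge("Engineer X", "Project Alpha", relation="led")
g.add_edge("Project Alpha", "Patent P-1", relation="produced")

def expand_context(graph: nx.DiGraph, entry_nodes, max_hops: int = 2):
    """Collect all triplets reachable within max_hops of the entry nodes."""
    facts, frontier = [], set(entry_nodes)
    for _ in range(max_hops):
        next_frontier = set()
        for node in frontier:
            for _, tail, data in graph.out_edges(node, data=True):
                facts.append((node, data["relation"], tail))
                next_frontier.add(tail)
        frontier = next_frontier
    return facts

# Entry points would normally come from vector search over the user query.
print(expand_context(g, ["Engineer X"]))
# [('Engineer X', 'led', 'Project Alpha'), ('Project Alpha', 'produced', 'Patent P-1')]
Two hops are enough here to connect the engineer to the patent via the project, which is exactly the kind of link a flat vector search misses.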
Hardware
- CPU: 16+ cores
- RAM: 64GB+ (128GB recommended)
- GPU: NVIDIA GPU with 24GB+ VRAM (A100, H100, RTX 4090)
Software
- Docker & Docker Compose
- Python 3.10+
- NVIDIA Container Toolkit
- Copy .env.example → .env
- Populate it with environment-specific values
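Under the Docker prerequisites above, bringing the stack up would typically look like this (shown for illustration, assuming the repo provides a docker-compose.yml):
cp .env.example .env
# edit .env with your environment-specific values
docker compose up -d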
VeritasGraph is founded on the principle that the most powerful AI systems should also be the most transparent, secure, and controllable.
The project's philosophy is a commitment to democratizing enterprise-grade AI, providing organizations with the tools to build their own sovereign knowledge assets.
This stands in contrast to reliance on opaque, proprietary, cloud-based APIs, empowering organizations to maintain full control over their data and reasoning processes.
Planned future enhancements include:
- Expanded Database Support – Integration with more graph databases and vector stores.
- Advanced Graph Analytics – Community detection and summarization for holistic dataset insights (inspired by Microsoft's GraphRAG).
- Agentic Framework – Multi-step reasoning tasks, breaking down complex queries into sub-queries.
- Visualization UI – A web interface for graph exploration and attribution path inspection.
This project builds upon the foundational research and open-source contributions of the AI community.
We acknowledge the influence of the following works:
- HopRAG – pioneering research on graph-structured RAG and multi-hop reasoning.
- Microsoft GraphRAG – a comprehensive approach to knowledge graph extraction and community-based reasoning.
- LangChain & LlamaIndex – robust ecosystems that accelerate modular RAG system development.
- Neo4j – foundational graph database technology enabling scalable Graph RAG implementations.
Alternative AI tools for VeritasGraph
Similar Open Source Tools


MM-RLHF
MM-RLHF is a comprehensive project for aligning Multimodal Large Language Models (MLLMs) with human preferences. It includes a high-quality MLLM alignment dataset, a Critique-Based MLLM reward model, a novel alignment algorithm MM-DPO, and benchmarks for reward models and multimodal safety. The dataset covers image understanding, video understanding, and safety-related tasks with model-generated responses and human-annotated scores. The reward model generates critiques of candidate texts before assigning scores for enhanced interpretability. MM-DPO is an alignment algorithm that achieves performance gains with simple adjustments to the DPO framework. The project enables consistent performance improvements across 10 dimensions and 27 benchmarks for open-source MLLMs.

MARBLE
MARBLE (Multi-Agent Coordination Backbone with LLM Engine) is a modular framework for developing, testing, and evaluating multi-agent systems leveraging Large Language Models. It provides a structured environment for agents to interact in simulated environments, utilizing cognitive abilities and communication mechanisms for collaborative or competitive tasks. The framework features modular design, multi-agent support, LLM integration, shared memory, flexible environments, metrics and evaluation, industrial coding standards, and Docker support.

resume-job-matcher
Resume Job Matcher is a Python script that automates the process of matching resumes to a job description using AI. It leverages the Anthropic Claude API or OpenAI's GPT API to analyze resumes and provide a match score along with personalized email responses for candidates. The tool offers comprehensive resume processing, advanced AI-powered analysis, in-depth evaluation & scoring, comprehensive analytics & reporting, enhanced candidate profiling, and robust system management. Users can customize font presets, generate PDF versions of unified resumes, adjust logging level, change scoring model, modify AI provider, and adjust AI model. The final score for each resume is calculated based on AI-generated match score and resume quality score, ensuring content relevance and presentation quality are considered. Troubleshooting tips, best practices, contribution guidelines, and required Python packages are provided.

Vodalus-Expert-LLM-Forge
Vodalus Expert LLM Forge is a tool designed for crafting datasets and efficiently fine-tuning models using free open-source tools. It includes components for data generation, LLM interaction, RAG engine integration, model training, fine-tuning, and quantization. The tool is suitable for users at all levels and is accompanied by comprehensive documentation. Users can generate synthetic data, interact with LLMs, train models, and optimize performance for local execution. The tool provides detailed guides and instructions for setup, usage, and customization.

APOLLO
APOLLO is a memory-efficient optimizer designed for large language model (LLM) pre-training and full-parameter fine-tuning. It offers SGD-like memory cost with AdamW-level performance. The optimizer integrates low-rank approximation and optimizer state redundancy reduction to achieve significant memory savings while maintaining or surpassing the performance of Adam(W). Key contributions include structured learning rate updates for LLM training, approximated channel-wise gradient scaling in a low-rank auxiliary space, and minimal-rank tensor-wise gradient scaling. APOLLO aims to optimize memory efficiency during training large language models.

ai-flow
AI Flow is an open-source, user-friendly UI application that empowers you to seamlessly connect multiple AI models together, specifically leveraging the capabilities of multiple AI APIs such as OpenAI, StabilityAI and Replicate. In a nutshell, AI Flow provides a visual platform for crafting and managing AI-driven workflows, thereby facilitating diverse and dynamic AI interactions.

CortexON
CortexON is an open-source, multi-agent AI system designed to automate and simplify everyday tasks. It integrates specialized agents like Web Agent, File Agent, Coder Agent, Executor Agent, and API Agent to accomplish user-defined objectives. CortexON excels at executing complex workflows, research tasks, technical operations, and business process automations by dynamically coordinating the agents' unique capabilities. It offers advanced research automation, multi-agent orchestration, integration with third-party APIs, code generation and execution, efficient file and data management, and personalized task execution for travel planning, market analysis, educational content creation, and business intelligence.

koog
Koog is a Kotlin-based framework for building and running AI agents entirely in idiomatic Kotlin. It allows users to create agents that interact with tools, handle complex workflows, and communicate with users. Key features include pure Kotlin implementation, MCP integration, embedding capabilities, custom tool creation, ready-to-use components, intelligent history compression, powerful streaming API, persistent agent memory, comprehensive tracing, flexible graph workflows, modular feature system, scalable architecture, and multiplatform support.

omniscient
Omniscient is an advanced AI Platform offered as a SaaS, empowering projects with cutting-edge artificial intelligence capabilities. Seamlessly integrating with Next.js 14, React, Typescript, and APIs like OpenAI and Replicate, it provides solutions for code generation, conversation simulation, image creation, music composition, and video generation.

multi-agent-orchestrator
Multi-Agent Orchestrator is a flexible and powerful framework for managing multiple AI agents and handling complex conversations. It intelligently routes queries to the most suitable agent based on context and content, supports dual language implementation in Python and TypeScript, offers flexible agent responses, context management across agents, extensible architecture for customization, universal deployment options, and pre-built agents and classifiers. It is suitable for various applications, from simple chatbots to sophisticated AI systems, accommodating diverse requirements and scaling efficiently.

preswald
Preswald is a full-stack platform for building, deploying, and managing interactive data applications in Python. It simplifies the process by combining ingestion, storage, transformation, and visualization into one lightweight SDK. With Preswald, users can connect to various data sources, customize app themes, and easily deploy apps locally. The platform focuses on code-first simplicity, end-to-end coverage, and efficiency by design, making it suitable for prototyping internal tools or deploying production-grade apps with reduced complexity and cost.

eliza
Eliza is a versatile AI agent operating system designed to support various models and connectors, enabling users to create chatbots, autonomous agents, handle business processes, create video game NPCs, and engage in trading. It offers multi-agent and room support, document ingestion and interaction, retrievable memory and document store, and extensibility to create custom actions and clients. Eliza is easy to use and provides a comprehensive solution for AI agent development.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

DevoxxGenieIDEAPlugin
Devoxx Genie is a Java-based IntelliJ IDEA plugin that integrates with local and cloud-based LLM providers to aid in reviewing, testing, and explaining project code. It supports features like code highlighting, chat conversations, and adding files/code snippets to context. Users can modify REST endpoints and LLM parameters in settings, including support for cloud-based LLMs. The plugin requires IntelliJ version 2023.3.4 and JDK 17. Building and publishing the plugin is done using Gradle tasks. Users can select an LLM provider, choose code, and use commands like review, explain, or generate unit tests for code analysis.

pyloid
Pyloid is a Python backend version of Electron and Tauri, simplifying desktop application development. Built on QtWebEngine and PySide6, it offers seamless integration with Python features, enabling easy creation of powerful applications. It provides web-based GUI generation, system tray icon support, multi-window management, bridge API between Python and JavaScript, single/multi-instance application support, comprehensive desktop app features, clean code structure, live UI development experience, cross-platform support, integration with frontend libraries, window customization, direct utilization of PySide6 features, and detailed Numpy-style docstrings.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.