HuixiangDou2

HuixiangDou2: A Robustly Optimized GraphRAG Approach

Stars: 78

HuixiangDou2 is a robustly optimized GraphRAG approach that integrates multiple open-source projects to improve graph-based retrieval-augmented generation. Its comparative experiments raised the test score from 60 to 74.5, yielding a GraphRAG implementation whose performance is recognized by human domain experts. The repository provides code improvements, dense retrieval for querying entities and relationships, tests on real domain knowledge, and an analysis of how each stage and parameter affects accuracy.

README:

Introduction

GraphRAG has many tunable components, which makes it hard to tell whether a performance gain comes from parameter adjustments or from pipeline optimizations. Moreover, RAG test data may already be present in LLM training sets, and the tokens in the LLM input affect generation probabilities (background: phi-4 technical report). It is therefore unclear whether a precision improvement originates from key tokens appearing in the input or from the retrieval itself.

Thus, HuixiangDou2 does not introduce new methods. Instead, it integrates multiple open-source projects (HuixiangDou, KAG, LightRAG, and DB-GPT, totaling 18k lines of code) and runs comparative experiments on a test set where Qwen2.5-7B-Instruct underperforms. The score rose from 60 to 74.5. The result is a GraphRAG implementation whose performance is recognized by human domain experts. Here is the report.

Note: The impact of open source varies across fields and industries. Due to licensing restrictions, we can only release the code and test conclusions; the test data cannot be provided.

Version Description

Compared to HuixiangDou1, this repo improves accuracy and refactors the code to be asynchronous:

  1. Graph Schema. Dense retrieval is only for querying similar entities and relationships.

  2. Ported/merged multiple open-source implementations, with code differences of nearly 18k lines:

    • Data. Organized a set of real domain knowledge that LLMs have not fully seen, for testing (gpt accuracy < 0.6)
    • Ablation. Confirmed the impact of different stages and parameters on accuracy
    • Improvement. As shown below.

  3. API remains compatible.
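
To make item 1 more concrete, here is a minimal sketch of using dense retrieval only to match a query against entity names, then following graph edges without any further vector search. It is an illustration under stated assumptions, not the repository's code: `toy_embed`, `EntityIndex`, `expand`, the vector size, and the sample entities are all hypothetical, and the hashed-trigram encoder merely stands in for a real embedding model.

```python
import math
import zlib
from typing import Dict, List, Tuple

DIM = 256  # hypothetical vector size for the toy encoder


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: hashed character trigrams, L2-normalized."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        # crc32 is deterministic across runs, unlike the built-in hash() for str
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class EntityIndex:
    """Hypothetical dense index over entity names only (no document chunks)."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, name: str) -> None:
        self._items.append((name, toy_embed(name)))

    def search(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k entity names ranked by cosine similarity to the query."""
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity
        scored = [(name, sum(a * b for a, b in zip(q, vec))) for name, vec in self._items]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Adjacency list: head entity -> list of (relation, tail entity)
Graph = Dict[str, List[Tuple[str, str]]]


def expand(graph: Graph, entity: str) -> List[Tuple[str, str, str]]:
    """Follow outgoing edges of a matched entity; no dense search is involved here."""
    return [(entity, rel, tail) for rel, tail in graph.get(entity, [])]
```

A caller would first run `EntityIndex.search` on the user question to find the closest entity names, then pass each hit to `expand` to collect the triples that go into the LLM prompt. Keeping the vector search confined to the first step is what the sketch is meant to show; how the actual project ranks or filters the triples is not covered here.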

If it is useful to you, please star it ⭐

Documentation

Acknowledgements

  • SiliconCloud: abundant LLM APIs, some models are free
  • KAG: graph retrieval based on reasoning
  • DB-GPT: LLM tool collection
  • LightRAG: simple and efficient graph retrieval solution

Citation

@misc{huixiangdou2,
  author = {Huanjun Kong},
  title = {HuixiangDou2: A Graph-based Augmented Generation Approach},
  howpublished = {\url{https://github.com/tpoisonooo/HuixiangDou2}},
  year = {2025}
}
