Empowering GraphRAG with Knowledge Filtering and Integration
Problem Statement
LLMs suffer from knowledge gaps and hallucinations, and GraphRAG was proposed to address this by integrating structured external knowledge graphs. However, existing GraphRAG systems face two problems: the retrieved graph information is often noisy or irrelevant and can degrade performance, and over-reliance on retrieved content suppresses the model's own parametric reasoning, limiting practical reliability.
Key Novelty
- Two-stage filtering mechanism (GraphRAG-Filtering) that refines retrieved graph knowledge to remove noise and irrelevant information before it reaches the LLM
- Logits-based selection strategy (GraphRAG-Integration) that dynamically balances external graph knowledge against the LLM's intrinsic reasoning to prevent over-reliance on retrievals
- A unified GraphRAG-FI framework that jointly addresses both retrieval quality and knowledge integration, demonstrated across multiple backbone LLMs on knowledge graph QA benchmarks
Evaluation Highlights
- GraphRAG-FI significantly improves reasoning performance on knowledge graph QA tasks compared to standard GraphRAG baselines across multiple backbone models
- The framework establishes consistent gains over both vanilla LLM inference and existing GraphRAG methods, demonstrating robustness across different model architectures
Breakthrough Assessment
Methodology
- Step 1 – Retrieval: Given a query, retrieve relevant subgraphs or triples from an external knowledge graph using standard GraphRAG retrieval mechanisms
- Step 2 – Two-stage filtering (GraphRAG-Filtering): Apply a two-stage filtering process to the retrieved graph information to remove noisy, redundant, or irrelevant content before it is passed to the LLM
- Step 3 – Logits-based integration (GraphRAG-Integration): At inference time, use a logits-based selection strategy to weigh and merge the LLM's intrinsic next-token predictions with those conditioned on the filtered external knowledge, adaptively reducing over-reliance on retrieved content
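The logits-level balancing in Step 3 can be sketched as follows. This is a minimal illustration, not the paper's exact rule: the linear interpolation of the two per-token logit vectors and the confidence-based `adaptive_alpha` heuristic are assumptions introduced here for clarity.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def blend_logits(intrinsic, augmented, alpha=0.5):
    """Interpolate per-token logits from the plain LLM (intrinsic)
    and the graph-conditioned LLM (augmented); alpha is the weight
    placed on external knowledge."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(intrinsic, augmented)]

def adaptive_alpha(intrinsic, floor=0.2, ceil=0.8):
    """Illustrative heuristic: trust external knowledge more when the
    LLM's own distribution is uncertain (low max probability)."""
    conf = max(softmax(intrinsic))
    return min(ceil, max(floor, 1.0 - conf))
```

With `alpha=0` the model falls back entirely on its parametric knowledge, and with `alpha=1` it follows the retrieved context; the adaptive weight keeps it between those extremes depending on how confident the unaugmented model already is.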
System Components
A two-stage filtering mechanism applied to graph-retrieved information that prunes noisy and irrelevant triples or subgraphs before they are incorporated into the LLM prompt, improving signal-to-noise ratio of external context
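A two-stage filter of this shape can be sketched as below. The lexical-overlap scorer, the `top_k`/`threshold` parameters, and the pluggable `fine_scorer` are stand-ins invented for illustration; the actual system would use its own retrieval scores and, e.g., an LLM-based relevance judge in the second stage.

```python
def overlap_score(query, triple):
    """Coarse relevance proxy: fraction of query tokens that also
    appear in the (subject, relation, object) triple."""
    q = set(query.lower().split())
    t = set(" ".join(triple).lower().replace("_", " ").split())
    return len(q & t) / max(len(q), 1)

def two_stage_filter(query, triples, top_k=5, threshold=0.4, fine_scorer=None):
    """Stage 1: cheap coarse ranking, keep the top-k candidates.
    Stage 2: stricter per-triple relevance check before anything
    reaches the LLM prompt."""
    ranked = sorted(triples, key=lambda t: overlap_score(query, t),
                    reverse=True)[:top_k]
    score = fine_scorer or (lambda t: overlap_score(query, t))
    return [t for t in ranked if score(t) >= threshold]
```

For example, for the query "what is the capital of France", `("Paris", "capital_of", "France")` survives both stages while an off-topic triple like `("Berlin", "capital_of", "Germany")` is cut by the second-stage threshold.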
A logits-based selection strategy that operates at the token probability level to balance predictions derived from external graph knowledge against the LLM's own parametric knowledge, preventing over-reliance on potentially imperfect retrievals
Results
| Metric/Benchmark | Baseline (GraphRAG) | GraphRAG-FI | Delta |
|---|---|---|---|
| KG QA Accuracy (multiple backbones) | Standard GraphRAG performance | Significantly improved | Positive across all backbones |
| Reasoning reliability vs. noisy retrieval | Degrades with noisy graphs | Robust to noisy retrieval | Notable improvement |
| Over-reliance on external knowledge | High suppression of intrinsic LLM reasoning | Balanced integration | Improved intrinsic reasoning preservation |
Key Takeaways
- Filtering retrieved graph knowledge before feeding it to an LLM is crucial—noisy graph triples can actively hurt performance compared to no retrieval at all, making pre-integration filtering a necessary component in any GraphRAG pipeline
- Operating at the logits level to blend external and intrinsic knowledge is a practical strategy for preventing retrieval-augmented models from blindly trusting external sources, and can be applied as a post-hoc addition to existing GraphRAG systems
- Evaluating GraphRAG methods across multiple backbone LLMs is important for demonstrating robustness; practitioners should validate that retrieval augmentation strategies generalize beyond a single model family before deployment
Abstract
In recent years, large language models (LLMs) have revolutionized the field of natural language processing. However, they often suffer from knowledge gaps and hallucinations. Graph retrieval-augmented generation (GraphRAG) enhances LLM reasoning by integrating structured knowledge from external graphs. However, we identify two key challenges that plague GraphRAG: (1) Retrieving noisy and irrelevant information can degrade performance and (2) Excessive reliance on external knowledge suppresses the model's intrinsic reasoning. To address these issues, we propose GraphRAG-FI (Filtering and Integration), consisting of GraphRAG-Filtering and GraphRAG-Integration. GraphRAG-Filtering employs a two-stage filtering mechanism to refine retrieved information. GraphRAG-Integration employs a logits-based selection strategy to balance external knowledge from GraphRAG with the LLM's intrinsic reasoning, reducing over-reliance on retrievals. Experiments on knowledge graph QA tasks demonstrate that GraphRAG-FI significantly improves reasoning performance across multiple backbone models, establishing a more reliable and effective GraphRAG framework.