
Empowering GraphRAG with Knowledge Filtering and Integration

Kai Guo, Harry Shomer, Shenglai Zeng, Haoyu Han, Yu Wang, Jiliang Tang
Conference on Empirical Methods in Natural Language Processing | 2025
GraphRAG-FI addresses two critical failure modes of Graph Retrieval-Augmented Generation—noisy retrieval and over-reliance on external knowledge—by introducing a two-stage filtering mechanism and a logits-based integration strategy that balances external graph knowledge with the LLM's intrinsic reasoning.

Problem Statement

LLMs suffer from knowledge gaps and hallucinations, and GraphRAG was proposed to address this by integrating structured external knowledge graphs. However, existing GraphRAG systems retrieve noisy and irrelevant graph information that can degrade performance, and they excessively suppress the model's own parametric reasoning by over-relying on retrieved content, limiting their practical reliability.

Key Novelty

  • Two-stage filtering mechanism (GraphRAG-Filtering) that refines retrieved graph knowledge to remove noise and irrelevant information before it reaches the LLM
  • Logits-based selection strategy (GraphRAG-Integration) that dynamically balances external graph knowledge against the LLM's intrinsic reasoning to prevent over-reliance on retrievals
  • A unified GraphRAG-FI framework that jointly addresses both retrieval quality and knowledge integration, demonstrated across multiple backbone LLMs on knowledge graph QA benchmarks

Evaluation Highlights

  • GraphRAG-FI significantly improves reasoning performance on knowledge graph QA tasks compared to standard GraphRAG baselines across multiple backbone models
  • The framework establishes consistent gains over both vanilla LLM inference and existing GraphRAG methods, demonstrating robustness across different model architectures

Breakthrough Assessment

5/10. GraphRAG-FI makes a solid, practical contribution by identifying and systematically addressing two well-motivated failure modes in GraphRAG, but the individual components (filtering and logit-based fusion) are evolutionary extensions of known techniques rather than fundamentally new paradigms.

Methodology

  1. Step 1 – Retrieval: Given a query, retrieve relevant subgraphs or triples from an external knowledge graph using standard GraphRAG retrieval mechanisms
  2. Step 2 – Two-stage filtering (GraphRAG-Filtering): Apply a two-stage filtering process to the retrieved graph information to remove noisy, redundant, or irrelevant content before it is passed to the LLM
  3. Step 3 – Logits-based integration (GraphRAG-Integration): At inference time, use a logits-based selection strategy to weigh and merge the LLM's intrinsic next-token predictions with those conditioned on the filtered external knowledge, adaptively reducing over-reliance on retrieved content
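The three steps above can be sketched as a minimal end-to-end pipeline. Every name here (retrieve_triples, filter_triples, answer, the toy knowledge graph) is a hypothetical stand-in for illustration, not the paper's actual API:

```python
def retrieve_triples(query, kg):
    """Step 1: standard GraphRAG retrieval. As a toy stand-in, keep any
    triple whose head or tail entity is mentioned in the query."""
    q = query.lower()
    return [t for t in kg if t[0].lower() in q or t[2].lower() in q]

def filter_triples(query, triples):
    """Step 2: two-stage filtering placeholder; a real system would
    prune noisy or irrelevant triples here (detail omitted in this sketch)."""
    return triples

def answer(query, kg):
    """Step 3: hand the filtered context plus the bare query to the LLM
    and blend the two next-token distributions at the logits level.
    Here we just return the filtered context that would be used."""
    context = filter_triples(query, retrieve_triples(query, kg))
    return context  # an actual system would decode with blended logits

kg = [("Marie Curie", "born_in", "Warsaw"),
      ("Warsaw", "capital_of", "Poland")]
print(answer("Where was Marie Curie born?", kg))
```

The skeleton only shows how the three stages compose; the filtering and integration stages are expanded under System Components below.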

System Components

GraphRAG-Filtering

A two-stage filtering mechanism applied to graph-retrieved information that prunes noisy and irrelevant triples or subgraphs before they are incorporated into the LLM prompt, improving signal-to-noise ratio of external context
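A minimal sketch of what such two-stage filtering could look like. The summary does not specify the paper's exact stages, so both stages here are assumptions: a coarse threshold prune followed by a fine-grained rerank, with a simple lexical-overlap score standing in for embedding or LLM-based relevance:

```python
def lexical_overlap(query, triple):
    """Coarse relevance score: fraction of query words that appear in
    the verbalized triple. A stand-in for embedding similarity."""
    q = set(query.lower().split())
    t = set(" ".join(triple).lower().split())
    return len(q & t) / max(len(q), 1)

def two_stage_filter(query, triples, threshold=0.2, top_k=2):
    """Two-stage filtering sketch (assumed stages, not the paper's design):
    Stage 1 drops triples below a coarse relevance threshold;
    Stage 2 reranks the survivors and keeps only the top-k, so that
    only the most relevant facts reach the LLM prompt."""
    # Stage 1: coarse pruning by threshold
    survivors = [t for t in triples if lexical_overlap(query, t) >= threshold]
    # Stage 2: fine-grained rerank and truncation
    survivors.sort(key=lambda t: lexical_overlap(query, t), reverse=True)
    return survivors[:top_k]

query = "Where was Marie Curie born?"
triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "physics"),
    ("Warsaw", "capital_of", "Poland"),
    ("Pierre Curie", "spouse", "Marie Curie"),
]
print(two_stage_filter(query, triples))  # the off-topic capital_of triple is pruned
```

In a real pipeline the second stage would typically use a stronger signal than the first (e.g. an LLM relevance judgment after an embedding prune), which is what makes the two stages complementary rather than redundant.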

GraphRAG-Integration

A logits-based selection strategy that operates at the token probability level to balance predictions derived from external graph knowledge against the LLM's own parametric knowledge, preventing over-reliance on potentially imperfect retrievals
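One way such token-level balancing can work is to run two forward passes (with and without the retrieved context) and mix their next-token logits. The confidence-based mixing weight below is a hypothetical heuristic for illustration, not the paper's exact formula:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def integrate_logits(intrinsic_logits, augmented_logits):
    """Blend next-token logits from the LLM alone (intrinsic) with
    logits conditioned on filtered graph context (augmented).

    The mixing weight alpha comes from the intrinsic pass's own
    confidence: when the LLM is already sure, its parametric knowledge
    is trusted more; otherwise the retrieval-conditioned distribution
    dominates. (Assumed heuristic, not the paper's formula.)"""
    p_intr = softmax(intrinsic_logits)
    alpha = max(p_intr)  # intrinsic confidence
    return [alpha * a + (1 - alpha) * b
            for a, b in zip(intrinsic_logits, augmented_logits)]

# Toy vocabulary of 4 tokens: the LLM alone is near-uniform (unsure),
# while the retrieval-conditioned pass strongly prefers token 2.
intrinsic = [1.0, 1.1, 0.9, 1.0]
augmented = [0.0, 0.0, 5.0, 0.0]
blended = integrate_logits(intrinsic, augmented)
print(blended.index(max(blended)))  # prints 2: low confidence -> follow retrieval
```

Because the blend happens per decoding step, the same mechanism lets the model fall back on its parametric knowledge for tokens where retrieval is uninformative, which is exactly the over-reliance failure mode the paper targets.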

Results

| Metric/Benchmark | Baseline (GraphRAG) | GraphRAG-FI | Delta |
| --- | --- | --- | --- |
| KG QA accuracy (multiple backbones) | Standard GraphRAG performance | Significantly improved | Positive across all backbones |
| Reasoning reliability vs. noisy retrieval | Degrades with noisy graphs | Robust to noisy retrieval | Notable improvement |
| Over-reliance on external knowledge | High suppression of intrinsic LLM reasoning | Balanced integration | Intrinsic reasoning better preserved |

Key Takeaways

  • Filtering retrieved graph knowledge before feeding it to an LLM is crucial—noisy graph triples can actively hurt performance compared to no retrieval at all, making pre-integration filtering a necessary component in any GraphRAG pipeline
  • Operating at the logits level to blend external and intrinsic knowledge is a practical strategy for preventing retrieval-augmented models from blindly trusting external sources, and can be applied as a post-hoc addition to existing GraphRAG systems
  • Evaluating GraphRAG methods across multiple backbone LLMs is important for demonstrating robustness; practitioners should validate that retrieval augmentation strategies generalize beyond a single model family before deployment

Abstract

In recent years, large language models (LLMs) have revolutionized the field of natural language processing. However, they often suffer from knowledge gaps and hallucinations. Graph retrieval-augmented generation (GraphRAG) enhances LLM reasoning by integrating structured knowledge from external graphs. However, we identify two key challenges that plague GraphRAG: (1) retrieving noisy and irrelevant information can degrade performance and (2) excessive reliance on external knowledge suppresses the model's intrinsic reasoning. To address these issues, we propose GraphRAG-FI (Filtering and Integration), consisting of GraphRAG-Filtering and GraphRAG-Integration. GraphRAG-Filtering employs a two-stage filtering mechanism to refine retrieved information. GraphRAG-Integration employs a logits-based selection strategy to balance external knowledge from GraphRAG with the LLM's intrinsic reasoning, reducing over-reliance on retrievals. Experiments on knowledge graph QA tasks demonstrate that GraphRAG-FI significantly improves reasoning performance across multiple backbone models, establishing a more reliable and effective GraphRAG framework.

Generated on 2026-02-21 using Claude