
Ontology Learning and Knowledge Graph Construction: A Comparison of Approaches and Their Impact on RAG Performance

T. Cruz, Bernardo Tavares, Francisco Belo
arXiv.org | 2025
Different Knowledge Graph construction strategies significantly impact RAG system performance, and ontology-guided KGs built from relational databases offer a cost-effective alternative to text-derived ontologies while achieving competitive retrieval quality.

Problem Statement

RAG systems underperform when knowledge is represented only as flat vector embeddings, which lose structural and relational context. Existing graph-based approaches such as GraphRAG are costly because they call an LLM repeatedly, and text-based ontology extraction adds complexity because the extracted ontologies must be merged. No systematic comparison exists of KG construction strategies and their downstream effect on RAG quality.

Key Novelty

  • Systematic empirical comparison of multiple KG construction strategies for RAG: vector-based, GraphRAG, and ontology-guided KGs from both relational databases and text corpora
  • Demonstration that ontology-guided KGs incorporating chunk information achieve competitive performance with state-of-the-art GraphRAG frameworks at lower cost
  • Evidence that relational database-derived ontologies are a viable and cost-efficient alternative to text-extracted ontologies, requiring only a one-time ontology learning process and avoiding ontology merging complexity

Evaluation Highlights

  • Ontology-guided KGs with chunk information substantially outperform vector retrieval baselines in RAG performance benchmarks
  • Relational database-derived ontology KGs perform competitively with text-derived ontology KGs while significantly reducing LLM API usage and associated costs

Breakthrough Assessment

5/10. The paper provides a solid and practically valuable comparative study with a useful finding about cost-efficient KG construction from relational databases, but it is primarily an empirical comparison rather than a fundamentally new technique or architecture.

Methodology

  1. Construct multiple KG variants: (1) standard vector-based RAG baseline, (2) GraphRAG using Microsoft's framework, (3) ontology-guided KG with ontology extracted from a relational database, and (4) ontology-guided KG with ontology extracted from textual corpora
  2. Integrate chunk-level document information into ontology-guided KGs to preserve local context alongside structured relational knowledge, enabling hybrid retrieval
  3. Evaluate all approaches on a common RAG benchmark, measuring retrieval quality and generation accuracy while tracking LLM usage cost across construction pipelines

System Components

Vector-based RAG Baseline

Standard dense embedding retrieval over document chunks using similarity search, serving as the performance floor
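
As a rough illustration of the baseline described above, the sketch below ranks chunks by cosine similarity to a query. It is a minimal sketch under stated assumptions: the `embed` function is a toy bag-of-words stand-in for a real dense embedding model, and the names `embed`, `cosine`, and `retrieve` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a dense embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "alice manages the billing team",
    "the billing team owns the invoice service",
    "bob likes hiking on weekends",
]
top = retrieve("who manages the billing team", chunks, k=2)
```

Each chunk is scored independently here, so a relation that spans two chunks (who manages the team that owns the invoice service) is never represented explicitly. That is the structural gap the graph-based variants aim to close.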

GraphRAG

Microsoft's GraphRAG framework that builds a knowledge graph from text using LLMs and performs graph-augmented retrieval

Relational DB Ontology Learner

One-time process that extracts an ontology schema from an existing relational database, avoiding repeated LLM calls during KG construction
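
To make the idea of a schema-derived ontology concrete, the sketch below reads a SQLite schema once and maps it to ontology elements. The mapping rule (tables become classes, plain columns become datatype properties, foreign keys become object properties) is an assumption chosen for illustration; the summary does not describe the paper's actual ontology learning procedure, and `ontology_from_sqlite` is a hypothetical helper.

```python
import sqlite3


def ontology_from_sqlite(conn: sqlite3.Connection) -> dict:
    """Derive a simple ontology from a SQLite schema (assumed rule-based mapping)."""
    ontology = {"classes": [], "datatype_properties": [], "object_properties": []}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    for table in tables:
        ontology["classes"].append(table)
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = {row[3]: row[2]
               for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            column, col_type = row[1], row[2]
            if column in fks:
                # A foreign key links two classes, so treat it as an object property.
                ontology["object_properties"].append((table, column, fks[column]))
            else:
                ontology["datatype_properties"].append((table, column, col_type))
    return ontology


# Illustrative schema, not from the paper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    total REAL,
    customer_id INTEGER REFERENCES customer(id)
);
""")
onto = ontology_from_sqlite(conn)
```

Because the schema is read once and is already consistent, nothing needs to be merged afterwards, which matches the cost and complexity argument made for this component.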

Text-based Ontology Extractor

LLM-driven process that derives ontologies from textual corpora, requiring ontology merging across multiple extractions
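
The merging step mentioned above can be pictured with a deliberately naive sketch. It assumes each extraction yields a mapping from class name to a set of property names, and it reconciles classes by a crude normalized key. Both the data shape and the `normalize` heuristic are assumptions for illustration; real ontology merging has to handle synonyms, hierarchy conflicts, and property alignment, none of which this covers.

```python
def normalize(name: str) -> str:
    """Crude canonical key: lowercase, keep alphanumerics, drop a trailing 's'."""
    key = "".join(ch for ch in name.lower() if ch.isalnum())
    return key[:-1] if key.endswith("s") and len(key) > 3 else key


def merge_ontologies(extractions: list[dict[str, set[str]]]) -> dict[str, dict]:
    """Union classes from several extractions that share a normalized name."""
    merged: dict[str, dict] = {}
    for extraction in extractions:
        for class_name, properties in extraction.items():
            entry = merged.setdefault(
                normalize(class_name), {"labels": set(), "properties": set()}
            )
            entry["labels"].add(class_name)   # remember every surface form seen
            entry["properties"] |= properties  # union the property sets
    return merged


# Two hypothetical extractions from different documents.
extractions = [
    {"Customer": {"name"}, "Invoice": {"total"}},
    {"customers": {"email"}, "invoice": {"due_date"}},
]
merged = merge_ontologies(extractions)
```

Even in this toy case the merge logic needs its own heuristics, which hints at why the text-based route carries extra complexity compared with reading a single database schema.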

Chunk-augmented KG

Ontology-guided knowledge graph enriched with document chunk nodes to preserve local textual context alongside structured relational triples
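
One possible way to picture a chunk-augmented KG is a triple store in which every entity also points back to the chunks that mention it. The sketch below is an assumed layout, not the paper's implementation: the class `ChunkAugmentedKG`, its methods, and the entity and chunk identifiers are all hypothetical.

```python
from collections import defaultdict


class ChunkAugmentedKG:
    """Triples plus entity-to-chunk links, so retrieval returns both."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        self.chunks: dict[str, str] = {}
        self.mentions: dict[str, set[str]] = defaultdict(set)  # entity -> chunk ids

    def add_chunk(self, chunk_id: str, text: str,
                  triples: list[tuple[str, str, str]]) -> None:
        """Store a chunk and link each triple's subject and object to it."""
        self.chunks[chunk_id] = text
        for subj, pred, obj in triples:
            self.triples.add((subj, pred, obj))
            self.mentions[subj].add(chunk_id)
            self.mentions[obj].add(chunk_id)

    def retrieve(self, entity: str) -> dict:
        """Return structured facts about the entity and the chunks mentioning it."""
        facts = sorted(t for t in self.triples if entity in (t[0], t[2]))
        chunk_ids = sorted(self.mentions.get(entity, set()))
        return {"triples": facts, "chunks": [self.chunks[c] for c in chunk_ids]}


kg = ChunkAugmentedKG()
kg.add_chunk("c1", "Alice manages the billing team.",
             [("Alice", "manages", "BillingTeam")])
kg.add_chunk("c2", "The billing team owns the invoice service.",
             [("BillingTeam", "owns", "InvoiceService")])
context = kg.retrieve("BillingTeam")
```

A query about `BillingTeam` gets both relational facts and the original wording of both chunks, which is the hybrid of structured and local textual context that this component is meant to provide.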

Results

Approach | Baseline (Vector RAG) | This Paper (Ontology KG + Chunks) | Delta
RAG Performance | Lower (baseline) | Substantially higher | Significant improvement
vs. GraphRAG | N/A | Competitive | Comparable quality
LLM Cost (DB Ontology vs Text Ontology) | Text-based (higher, repeated) | DB-based (one-time only) | Substantially reduced
Ontology Complexity | Text-based (requires merging) | DB-based (no merging needed) | Reduced complexity

Key Takeaways

  • If you have an existing relational database, use it to derive your ontology for KG-based RAG: it is cheaper, avoids merging complexity, and performs on par with text-extracted ontologies
  • Augmenting ontology-guided KGs with document chunk nodes is a key design decision that bridges structured and unstructured retrieval, yielding competitive performance against GraphRAG
  • In this comparison, ontology-guided KGs with chunk information substantially outperform pure vector-based RAG, which suggests KG investment is worthwhile for high-stakes or knowledge-intensive applications

Abstract

Retrieval-Augmented Generation (RAG) systems combine Large Language Models (LLMs) with external knowledge, and their performance depends heavily on how that knowledge is represented. This study investigates how different Knowledge Graph (KG) construction strategies influence RAG performance. We compare a variety of approaches: standard vector-based RAG, GraphRAG, and retrieval over KGs built from ontologies derived either from relational databases or textual corpora. Results show that ontology-guided KGs incorporating chunk information achieve competitive performance with state-of-the-art frameworks, substantially outperforming vector retrieval baselines. Moreover, the findings reveal that ontology-guided KGs built from relational databases perform competitively to ones built with ontologies extracted from text, with the benefit of offering a dual advantage: they require a one-time-only ontology learning process, substantially reducing LLM usage costs; and avoid the complexity of ontology merging inherent to text-based approaches.

Generated on 2026-03-02 using Claude