Ontology Learning and Knowledge Graph Construction: A Comparison of Approaches and Their Impact on RAG Performance

Different Knowledge Graph construction strategies significantly impact RAG system performance, and ontology-guided KGs built from relational databases offer a cost-effective alternative to text-derived ontologies while achieving competitive retrieval quality.

Problem Statement

RAG systems perform suboptimally when knowledge is represented as flat vector embeddings, because that representation loses structural and relational context. Existing graph-based approaches like GraphRAG are costly due to repeated LLM usage, and text-based ontology extraction adds complexity because separately extracted ontologies must be merged. Systematic comparisons of KG construction strategies and their downstream effect on RAG quality are lacking.

Key Novelty

Systematic empirical comparison of multiple KG construction strategies for RAG: vector-based, GraphRAG, and ontology-guided KGs from both relational databases and text corpora
Demonstration that ontology-guided KGs incorporating chunk information achieve competitive performance with state-of-the-art GraphRAG frameworks at lower cost
Evidence that relational database-derived ontologies are a viable and cost-efficient alternative to text-extracted ontologies, requiring only a one-time ontology learning process and avoiding ontology merging complexity

Evaluation Highlights

Ontology-guided KGs with chunk information substantially outperform vector retrieval baselines in RAG performance benchmarks
Relational database-derived ontology KGs perform competitively with text-derived ontology KGs while significantly reducing LLM API usage and associated costs

Signal Assessment

5/10: The paper provides a solid and practically valuable comparative study with a useful finding about cost-efficient KG construction from relational databases, but it is primarily an empirical comparison rather than a fundamentally new technique or architecture.

Methodology

Construct multiple KG variants: (1) standard vector-based RAG baseline, (2) GraphRAG using Microsoft's framework, (3) ontology-guided KG with ontology extracted from a relational database, and (4) ontology-guided KG with ontology extracted from textual corpora
Integrate chunk-level document information into ontology-guided KGs to preserve local context alongside structured relational knowledge, enabling hybrid retrieval
Evaluate all approaches on a common RAG benchmark, measuring retrieval quality and generation accuracy while tracking LLM usage cost across construction pipelines

System Components

Vector-based RAG Baseline

Standard dense embedding retrieval over document chunks using similarity search, serving as the performance floor
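
As a rough illustration of this baseline, the sketch below ranks chunks by cosine similarity to the query. The `embed` function is a toy bag-of-words stand-in for a real dense embedding model, and the function names and sample chunks are hypothetical; none of this is taken from the paper.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a dense embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical document chunks, for illustration only.
chunks = [
    "orders reference customers through a foreign key",
    "the ontology defines classes and properties",
    "vector search ranks chunks by similarity",
]
top = retrieve("how are chunks ranked by similarity", chunks, k=1)
print(top)
```

Each chunk is scored in isolation, so nothing in the ranking reflects how entities in different chunks relate to one another, which is the gap the graph-based variants aim to close.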

GraphRAG

Microsoft's GraphRAG framework that builds a knowledge graph from text using LLMs and performs graph-augmented retrieval

Relational DB Ontology Learner

One-time process that extracts an ontology schema from an existing relational database, avoiding repeated LLM calls during KG construction
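
One way to picture a one-time schema-to-ontology step is a rule-based direct mapping: tables become classes, plain columns become datatype properties, and foreign keys become object properties. The sketch below does this for a SQLite database; it is an assumption-laden illustration, not the paper's learner, and the `learn_ontology` function, the output dictionary layout, and the sample schema are all hypothetical.

```python
import sqlite3

def learn_ontology(conn):
    """Map tables to classes, columns to datatype properties,
    and foreign keys to object properties (simple direct mapping)."""
    ontology = {"classes": [], "datatype_properties": [], "object_properties": []}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        ontology["classes"].append(table)
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        fk_columns = {fk[3] for fk in fks}
        for fk in fks:
            # (domain class, linking column, range class)
            ontology["object_properties"].append((table, fk[3], fk[2]))
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for col in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            name, col_type = col[1], col[2]
            if name not in fk_columns:
                ontology["datatype_properties"].append((table, name, col_type))
    return ontology

# Hypothetical two-table schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (id INTEGER PRIMARY KEY, total REAL,
    customer_id INTEGER REFERENCES customer(id));
""")
ontology = learn_ontology(conn)
print(ontology["object_properties"])
```

Because the schema is read once and the resulting ontology is reused for all later KG construction, a step like this does not need to be repeated per document, which is consistent with the cost argument made above.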

Text-based Ontology Extractor

LLM-driven process that derives ontologies from textual corpora, requiring ontology merging across multiple extractions

Chunk-augmented KG

Ontology-guided knowledge graph enriched with document chunk nodes to preserve local textual context alongside structured relational triples
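
The idea of keeping chunk nodes next to relational triples can be sketched with a small in-memory structure. The class name, its methods, and the sample data are hypothetical, and the lookup by exact entity name is a simplification; the paper's storage and retrieval logic may differ.

```python
from collections import defaultdict

class ChunkAugmentedKG:
    """Triples plus links from each entity to the chunks that mention it."""

    def __init__(self):
        self.triples = []                  # (subject, predicate, object)
        self.chunks = {}                   # chunk_id -> chunk text
        self.mentions = defaultdict(set)   # entity -> set of chunk_ids

    def add_chunk(self, chunk_id, text):
        self.chunks[chunk_id] = text

    def add_triple(self, subject, predicate, obj, chunk_id):
        """Store a triple and record which chunk it was extracted from."""
        self.triples.append((subject, predicate, obj))
        self.mentions[subject].add(chunk_id)
        self.mentions[obj].add(chunk_id)

    def retrieve(self, entity):
        """Return structured facts and the source chunks for one entity."""
        facts = [t for t in self.triples if entity in (t[0], t[2])]
        chunk_ids = sorted(self.mentions.get(entity, ()))
        return facts, [self.chunks[c] for c in chunk_ids]

# Hypothetical data, for illustration only.
kg = ChunkAugmentedKG()
kg.add_chunk("c1", "Alice placed purchase 42 in March.")
kg.add_chunk("c2", "Purchase 42 contained a laptop.")
kg.add_triple("Alice", "placed", "Purchase42", "c1")
kg.add_triple("Purchase42", "contains", "Laptop", "c2")

facts, texts = kg.retrieve("Purchase42")
print(facts)
print(texts)
```

Returning both the triples and the originating chunk text gives the generator structured relations together with the local wording they came from, which is the hybrid retrieval described in the methodology.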

Results

Metric	Reference	Compared approach	Delta
RAG performance	Vector RAG: lower (baseline)	Ontology KG + chunks: substantially higher	Significant improvement
Quality vs. GraphRAG	N/A	Ontology KG + chunks: competitive	Comparable quality
LLM cost (DB ontology vs. text ontology)	Text-based: higher, repeated	DB-based: one-time only	Substantially reduced
Ontology complexity	Text-based: requires merging	DB-based: no merging needed	Reduced complexity

Key Takeaways

If you have an existing relational database, use it to derive your ontology for KG-based RAG: it is cheaper, avoids merging complexity, and yields KGs that perform on par with those built from text-extracted ontologies
Augmenting ontology-guided KGs with document chunk nodes is a key design decision that bridges structured and unstructured retrieval, yielding competitive performance against GraphRAG
Pure vector-based RAG is substantially outperformed by chunk-augmented, ontology-guided KGs in this study, which makes the KG investment worthwhile for high-stakes or knowledge-intensive applications

Abstract

Retrieval-Augmented Generation (RAG) systems combine Large Language Models (LLMs) with external knowledge, and their performance depends heavily on how that knowledge is represented. This study investigates how different Knowledge Graph (KG) construction strategies influence RAG performance. We compare a variety of approaches: standard vector-based RAG, GraphRAG, and retrieval over KGs built from ontologies derived either from relational databases or textual corpora. Results show that ontology-guided KGs incorporating chunk information achieve competitive performance with state-of-the-art frameworks, substantially outperforming vector retrieval baselines. Moreover, the findings reveal that ontology-guided KGs built from relational databases perform competitively with those built from ontologies extracted from text, while offering a dual advantage: they require a one-time-only ontology learning process, which substantially reduces LLM usage costs, and they avoid the complexity of ontology merging inherent to text-based approaches.