Ontology Learning and Knowledge Graph Construction: A Comparison of Approaches and Their Impact on RAG Performance
Problem Statement
RAG systems perform suboptimally when knowledge is represented only as flat vector embeddings, which discard structural and relational context. Existing graph-based approaches such as GraphRAG are costly because they rely on repeated LLM calls, and extracting ontologies from text adds complexity because the ontologies extracted from different parts of a corpus must then be merged. Systematic comparisons of KG construction strategies and their downstream effect on RAG quality are lacking.
Key Novelty
- Systematic empirical comparison of multiple KG construction strategies for RAG: vector-based, GraphRAG, and ontology-guided KGs from both relational databases and text corpora
- Demonstration that ontology-guided KGs incorporating chunk information perform competitively with state-of-the-art frameworks such as GraphRAG at lower cost
- Evidence that relational database-derived ontologies are a viable and cost-efficient alternative to text-extracted ontologies, requiring only a one-time ontology learning process and avoiding ontology merging complexity
Evaluation Highlights
- Ontology-guided KGs with chunk information substantially outperform vector retrieval baselines in RAG performance benchmarks
- Relational database-derived ontology KGs perform competitively with text-derived ontology KGs while significantly reducing LLM API usage and associated costs
Methodology
- Construct multiple KG variants: (1) standard vector-based RAG baseline, (2) GraphRAG using Microsoft's framework, (3) ontology-guided KG with ontology extracted from a relational database, and (4) ontology-guided KG with ontology extracted from textual corpora
- Integrate chunk-level document information into ontology-guided KGs to preserve local context alongside structured relational knowledge, enabling hybrid retrieval
- Evaluate all approaches on a common RAG benchmark, measuring retrieval quality and generation accuracy while tracking LLM usage cost across construction pipelines
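The comparison loop described above can be sketched as follows. This is a minimal illustration, not the paper's harness. It assumes that each pipeline is a callable from question to answer, that scoring is exact match after case and whitespace normalisation (the paper's actual metrics are not specified here), and that construction cost is recorded as a simple count of LLM calls. `PipelineReport` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineReport:
    """Accuracy and construction cost for one KG/RAG pipeline variant."""
    name: str
    construction_llm_calls: int  # LLM calls spent building the index or KG
    correct: int = 0
    total: int = 0

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def normalise_answer(text: str) -> str:
    # Lowercase and collapse whitespace before comparing answers.
    return " ".join(text.lower().split())


def evaluate(
    name: str,
    construction_llm_calls: int,
    answer_fn: Callable[[str], str],
    benchmark: list[tuple[str, str]],
) -> PipelineReport:
    """Score one pipeline on a shared benchmark using exact match (an assumed metric)."""
    report = PipelineReport(name, construction_llm_calls)
    for question, gold in benchmark:
        report.total += 1
        if normalise_answer(answer_fn(question)) == normalise_answer(gold):
            report.correct += 1
    return report


if __name__ == "__main__":
    # Toy benchmark, toy pipelines and toy call counts: not results from the paper.
    benchmark = [("Where does Alice work?", "Acme"), ("Which city is Acme in?", "Berlin")]
    answers = {"Where does Alice work?": "acme", "Which city is Acme in?": "Berlin"}
    baseline = evaluate("vector-rag", 0, lambda q: "Acme", benchmark)
    kg = evaluate("ontology-kg", 1, lambda q: answers.get(q, ""), benchmark)
    print(baseline.accuracy, kg.accuracy)  # 0.5 1.0
```

Keeping construction cost next to accuracy in one report makes the cost/quality trade-off between pipelines visible in a single table.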
System Components
Vector RAG baseline: Standard dense-embedding retrieval over document chunks using similarity search; serves as the performance floor.
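The baseline's ranking step can be sketched as top-k cosine similarity. This is a minimal sketch under a loud assumption: a real system would use a learned embedding model and usually an approximate nearest-neighbour index, whereas here a bag-of-words count vector stands in for the dense embedding so the example runs with the standard library alone. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a dense embedding model: a sparse bag-of-words vector.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Rank chunks by cosine similarity to the query and keep the k best."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = ["Alice works at Acme.", "Acme is located in Berlin.", "Bob prefers tea."]
print(top_k("Where does Alice work?", chunks, k=1))  # [(0.25, 'Alice works at Acme.')]
```

Each chunk is scored independently, so nothing in this step connects "Alice works at Acme" with "Acme is located in Berlin". That missing relational context is what the graph-based variants add.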
GraphRAG: Microsoft's framework, which uses LLMs to build a knowledge graph from text and then performs graph-augmented retrieval.
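The cost profile of LLM-built graphs can be illustrated with a generic sketch. It shows one extraction call per chunk to build the graph, followed by entity-neighbourhood expansion at query time. This is not GraphRAG's API, and it omits the framework's further stages, such as community detection and summarisation. `fake_llm_extract` is a hypothetical deterministic stub standing in for a real LLM prompt.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


class LLMCallCounter:
    """Tracks how many (stubbed) LLM calls indexing performs."""

    def __init__(self) -> None:
        self.calls = 0


def fake_llm_extract(chunk: str, counter: LLMCallCounter) -> list[Triple]:
    # Hypothetical stub for one LLM extraction call. To stay deterministic,
    # each chunk here is pre-formatted as "subject relation object".
    counter.calls += 1
    subject, relation, obj = chunk.split()
    return [(subject, relation, obj)]


def build_graph(chunks: list[str], counter: LLMCallCounter) -> dict[str, set[tuple[str, str]]]:
    graph: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for chunk in chunks:  # one LLM call per chunk, so cost grows with corpus size
        for s, r, o in fake_llm_extract(chunk, counter):
            graph[s].add((r, o))
            graph[o].add((f"inverse_{r}", s))
    return graph


def neighbourhood(graph: dict[str, set[tuple[str, str]]], entity: str, hops: int = 2) -> set[Triple]:
    """Collect all facts reachable from `entity` within `hops` edges."""
    frontier, seen, facts = {entity}, {entity}, set()
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for relation, other in graph.get(node, ()):
                facts.add((node, relation, other))
                if other not in seen:
                    seen.add(other)
                    next_frontier.add(other)
        frontier = next_frontier
    return facts
```

With the chunks `"Alice works_at Acme"` and `"Acme located_in Berlin"`, a two-hop expansion from `Alice` reaches `("Acme", "located_in", "Berlin")`, even though no single chunk links Alice to Berlin. The counter also ends at 2: one LLM call per chunk, repeated whenever the corpus is re-indexed.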
Relational-database ontology learning: One-time process that extracts an ontology schema from an existing relational database, avoiding repeated LLM calls during KG construction.
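The paper's exact schema-to-ontology procedure is not described here; it may, for instance, use an LLM once to refine names. As an assumption, the sketch below uses a purely rule-based mapping in the spirit of the W3C Direct Mapping of relational data to RDF: tables become classes, ordinary columns become datatype properties, and foreign keys become object properties. It relies only on Python's built-in `sqlite3` module and SQLite's `table_info` and `foreign_key_list` pragmas. The output dictionary layout and the example schema are hypothetical.

```python
import sqlite3


def learn_ontology(conn: sqlite3.Connection) -> dict[str, list]:
    """Rule-based schema mapping (an assumption, not the paper's method):
    tables -> classes, plain columns -> datatype properties,
    foreign keys -> object properties. Makes no LLM calls."""
    ontology: dict[str, list] = {"classes": [], "datatype_properties": [], "object_properties": []}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        ontology["classes"].append(table)
        fk_columns = set()
        # Rows: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            ref_table, from_col = fk[2], fk[3]
            fk_columns.add(from_col)
            ontology["object_properties"].append((table, from_col, ref_table))
        # Rows: (cid, name, type, notnull, dflt_value, pk)
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            name, col_type = col[1], col[2]
            if name not in fk_columns:
                ontology["datatype_properties"].append((table, name, col_type))
    return ontology


# Hypothetical example schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           company_id INTEGER REFERENCES company(id));
""")
print(learn_ontology(conn)["object_properties"])  # [('employee', 'company_id', 'company')]
```

Because the schema already encodes entity types and their relations, this step runs once per database rather than once per document. That is the source of the cost advantage the paper reports.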
Text-based ontology learning: LLM-driven process that derives ontologies from textual corpora and requires merging the ontologies produced by multiple extractions.
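The merging step can be illustrated with a naive sketch. It assumes each per-chunk LLM call returns a small ontology fragment of class labels and (domain, relation, range) tuples. The fragments below are hypothetical, and the string normalisation is a deliberately simple stand-in for whatever alignment method a real pipeline would use.

```python
import re


def normalise(label: str) -> str:
    """Naive canonical key: lowercase, keep alphanumerics, strip a plural 's'."""
    key = re.sub(r"[^a-z0-9]", "", label.lower())
    if key.endswith("s") and not key.endswith("ss") and len(key) > 3:
        key = key[:-1]
    return key


def merge_fragments(fragments: list[dict]) -> dict:
    """Union per-chunk ontology fragments, unifying labels by normalised key."""
    classes: dict[str, str] = {}
    relations: set[tuple[str, str, str]] = set()
    for fragment in fragments:
        for label in fragment["classes"]:
            classes.setdefault(normalise(label), label)  # first-seen label wins
        for domain, relation, range_ in fragment["relations"]:
            relations.add((normalise(domain), normalise(relation), normalise(range_)))
    return {"classes": classes, "relations": relations}


# Hypothetical outputs of three per-chunk LLM extraction calls.
fragments = [
    {"classes": ["Person", "Company"], "relations": [("Person", "worksAt", "Company")]},
    {"classes": ["persons", "Organisation"], "relations": [("Persons", "works_at", "Organisation")]},
    {"classes": ["Company", "City"], "relations": [("Company", "locatedIn", "City")]},
]
merged = merge_fragments(fragments)
print(sorted(merged["classes"]))  # ['city', 'company', 'organisation', 'person']
```

String normalisation unifies `Person`/`persons` and `worksAt`/`works_at`, but `Company` and `Organisation` remain separate classes. Recognising them as equivalent needs semantic alignment, which is the merging complexity that DB-derived ontologies avoid.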
Chunk-augmented ontology KG: Ontology-guided knowledge graph enriched with document-chunk nodes, preserving local textual context alongside structured relational triples.
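The chunk-augmented design can be sketched in memory under stated assumptions. The paper's exact graph schema and storage backend are not given here. This sketch assumes each triple records the chunk it was extracted from, so chunk nodes are linked to the entities they mention, and that retrieval returns both the matching facts and those chunks. `ChunkAugmentedKG` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Optional

Triple = tuple[str, str, str]


class ChunkAugmentedKG:
    """Ontology-guided triples plus chunk nodes linked to the entities they mention."""

    def __init__(self) -> None:
        self.triples: list[Triple] = []
        self.chunks: dict[str, str] = {}  # chunk id -> raw text
        self.mentions: dict[str, set[str]] = defaultdict(set)  # entity -> chunk ids

    def add_chunk(self, chunk_id: str, text: str) -> None:
        self.chunks[chunk_id] = text

    def add_triple(self, subject: str, relation: str, obj: str,
                   source_chunk: Optional[str] = None) -> None:
        self.triples.append((subject, relation, obj))
        if source_chunk is not None:
            # Link both endpoints to the chunk the fact was extracted from.
            self.mentions[subject].add(source_chunk)
            self.mentions[obj].add(source_chunk)

    def retrieve(self, entities: set[str]) -> tuple[list[Triple], list[str]]:
        """Hybrid retrieval: facts touching the query entities plus their source chunks."""
        facts = [t for t in self.triples if t[0] in entities or t[2] in entities]
        chunk_ids = sorted(set().union(*(self.mentions.get(e, set()) for e in entities)))
        return facts, [self.chunks[c] for c in chunk_ids]


# Toy data for illustration only.
kg = ChunkAugmentedKG()
kg.add_chunk("c1", "Alice joined Acme in 2019 as a data engineer.")
kg.add_chunk("c2", "Acme moved its headquarters to Berlin.")
kg.add_triple("Alice", "worksAt", "Acme", source_chunk="c1")
kg.add_triple("Acme", "locatedIn", "Berlin", source_chunk="c2")
facts, context = kg.retrieve({"Alice"})
print(facts)    # [('Alice', 'worksAt', 'Acme')]
print(context)  # ['Alice joined Acme in 2019 as a data engineer.']
```

The triple alone drops details such as "2019" and "data engineer". Returning the linked chunk restores that local context, which is the motivation for adding chunk nodes to the ontology-guided KG.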
Results
| Comparison | Baseline | This Paper | Delta |
|---|---|---|---|
| RAG performance | Vector RAG (lower) | Ontology KG + chunks (substantially higher) | Substantial improvement |
| RAG performance vs. GraphRAG | GraphRAG (state of the art) | Ontology KG + chunks (competitive) | Comparable quality |
| LLM cost of ontology learning | Text-derived ontology (repeated extraction) | DB-derived ontology (one-time only) | Substantially reduced |
| Ontology complexity | Text-derived ontology (requires merging) | DB-derived ontology (no merging needed) | Reduced complexity |
Key Takeaways
- If you have an existing relational database, use it to derive your ontology for KG-based RAG — it's cheaper, avoids merging complexity, and performs on par with text-extracted ontologies
- Augmenting ontology-guided KGs with document chunk nodes is a key design decision that bridges structured and unstructured retrieval, yielding competitive performance against GraphRAG
- In this study, pure vector-based RAG was substantially outperformed by chunk-augmented ontology-guided KGs, suggesting that investing in KG construction can pay off for high-stakes or knowledge-intensive applications
Abstract
Retrieval-Augmented Generation (RAG) systems combine Large Language Models (LLMs) with external knowledge, and their performance depends heavily on how that knowledge is represented. This study investigates how different Knowledge Graph (KG) construction strategies influence RAG performance. We compare a variety of approaches: standard vector-based RAG, GraphRAG, and retrieval over KGs built from ontologies derived either from relational databases or from textual corpora. Results show that ontology-guided KGs incorporating chunk information perform competitively with state-of-the-art frameworks while substantially outperforming vector retrieval baselines. Moreover, ontology-guided KGs built from relational databases perform competitively with those built from text-extracted ontologies while offering a dual advantage: they require only a one-time ontology learning process, which substantially reduces LLM usage costs, and they avoid the complexity of ontology merging inherent to text-based approaches.