Retrieval-Augmented Generation (RAG) Chatbots for Education: A Survey of Applications

Jakub Swacha, Michał Gracel
Applied Sciences | 2025
This survey systematically reviews the adoption of Retrieval-Augmented Generation (RAG) chatbots in education in the five years since RAG's introduction, analyzing 47 papers across application types, knowledge domains, LLM choices, and evaluation methods. RAG is positioned as the key architectural solution to the hallucination problems that previously hindered LLM chatbot adoption in educational settings.

Problem Statement

LLM-based chatbots in education face a critical trust and reliability barrier due to hallucinations, which can propagate misinformation to learners. While RAG architectures offer a practical mitigation strategy by grounding responses in retrieved documents, there has been no comprehensive survey mapping how the educational community has actually adopted and applied RAG chatbots. Five years after RAG's introduction, a structured landscape analysis is needed to identify gaps, trends, and best practices across diverse educational use cases.

Key Novelty

  • First dedicated systematic survey of RAG chatbot applications specifically in the educational domain, covering 47 papers across diverse use cases
  • Multi-dimensional taxonomy analyzing chatbots by support target (students, teachers, institutions), thematic knowledge scope, underlying LLM, and evaluation methodology
  • Consolidated landscape view enabling identification of underexplored educational niches and gaps in RAG chatbot deployment and evaluation practices

Evaluation Highlights

  • 47 papers identified and analyzed across multiple dimensions including application character, support target, knowledge scope, LLM choice, and evaluation approach
  • Qualitative synthesis of evaluation methods used across studies, revealing variability in rigor and standardization of RAG chatbot assessment in education

Breakthrough Assessment

3/10. This is a survey/literature review offering organizational and taxonomic value rather than a technical or algorithmic advance; it provides useful consolidation for practitioners but does not introduce new methods, models, or empirical results in RAG or educational AI.

Methodology

  1. Systematic literature search identifying papers focused on RAG-based chatbots deployed or evaluated in educational contexts, yielding a corpus of 47 relevant papers
  2. Multi-dimensional analysis of each paper across axes: application character (research prototype vs. deployed system), support target (students/teachers/administrators), thematic knowledge scope, and underlying LLM used
  3. Synthesis and classification of evaluation approaches used across studies to assess chatbot quality, reliability, and educational effectiveness

System Components

RAG Architecture

Combines a retrieval module (fetching relevant documents from a knowledge base) with an LLM generator, grounding outputs in source material to reduce hallucinations
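The retrieve-then-generate loop described above can be sketched minimally. The bag-of-words retriever and echo-style generator stub below are illustrative stand-ins only (a real system would use a vector encoder and an actual LLM call); the course-related knowledge-base entries are hypothetical:

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding': token counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, knowledge_base, k=2):
    """Rank knowledge-base passages by similarity to the query, return top k."""
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def answer(query, knowledge_base, llm):
    """RAG core: ground the generator's prompt in retrieved passages."""
    context = "\n".join(retrieve(query, knowledge_base))
    prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)

# Hypothetical course-specific knowledge base
kb = [
    "The final exam covers chapters 4 through 9.",
    "Office hours are held on Tuesdays at 3 pm.",
    "Late homework loses 10 percent per day.",
]

# Stub generator standing in for a real LLM call: echoes the top retrieved passage
echo_llm = lambda prompt: prompt.splitlines()[1]

print(answer("When is the final exam and what does it cover?", kb, echo_llm))
```

Because the generator only sees retrieved passages, answers stay traceable to the knowledge base, which is the property that mitigates hallucination in educational deployments.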

Taxonomy of Support Targets

Classification framework distinguishing whether chatbots are designed to support students, educators, or institutional administrative functions

Knowledge Scope Classification

Categorization of chatbots by the thematic domain of their grounded knowledge base (e.g., course-specific, institutional policy, general curriculum)

Evaluation Method Analysis

Assessment of how each surveyed paper measures chatbot performance, including automated metrics, human evaluation, and task-based assessments

Results

| Dimension | Prior State | This Survey's Finding | Insight |
| --- | --- | --- | --- |
| Coverage of RAG-edu papers | No prior survey | 47 papers identified | Growing but still nascent field |
| Support target diversity | Unknown distribution | Students most common target | Teacher/admin support underexplored |
| Evaluation rigor | Assumed inconsistent | Highly variable across studies | Lack of standardized benchmarks |
| LLM diversity | Unknown | Multiple LLMs used (GPT variants dominant) | Limited exploration of open-source alternatives |

Key Takeaways

  • RAG is the de facto architectural choice for deploying trustworthy LLM chatbots in education, and practitioners should default to RAG-based designs over vanilla LLM chatbots to mitigate hallucination risks in high-stakes learning environments
  • The educational RAG chatbot space lacks standardized evaluation benchmarks — ML practitioners building such systems should invest in rigorous, reproducible evaluation frameworks (e.g., RAGAs-style metrics) rather than ad-hoc assessments
  • There are significant underexplored niches in RAG-for-education (e.g., teacher support tools, administrative assistants, multilingual or low-resource educational settings), representing open research and deployment opportunities for practitioners
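To make the second takeaway concrete, here is a crude lexical faithfulness proxy: the fraction of answer tokens supported by the retrieved context. This is only a sketch of the idea behind RAGAs-style faithfulness scoring, which in practice uses an LLM judge over extracted claims rather than token overlap; the example answers and context are hypothetical:

```python
import re

def faithfulness(answer, contexts):
    """Lexical faithfulness proxy: share of answer tokens that also appear
    in the retrieved contexts. (RAGAs-style metrics judge claim support
    with an LLM instead of token overlap.)"""
    tokens = re.findall(r"[a-z0-9]+", answer.lower())
    support = set(re.findall(r"[a-z0-9]+", " ".join(contexts).lower()))
    if not tokens:
        return 0.0
    return sum(t in support for t in tokens) / len(tokens)

ctx = ["The final exam covers chapters 4 through 9."]
grounded = "The exam covers chapters 4 through 9."
hallucinated = "The exam is optional for graduate students."

print(round(faithfulness(grounded, ctx), 2))      # fully supported by the context
print(round(faithfulness(hallucinated, ctx), 2))  # mostly unsupported claims
```

Even a simple, reproducible score like this is preferable to the ad-hoc, one-off assessments the survey found dominating the literature, since it can be run identically across systems and datasets.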

Abstract

Retrieval-Augmented Generation (RAG) overcomes the main barrier for the adoption of LLM-based chatbots in education: hallucinations. The uncomplicated architecture of RAG chatbots makes it relatively easy to implement chatbots that serve specific purposes and thus are capable of addressing various needs in the educational domain. With five years having passed since the introduction of RAG, the time has come to check the progress attained in its adoption in education. This paper identifies 47 papers dedicated to RAG chatbots’ uses for various kinds of educational purposes, which are analyzed in terms of their character, the target of the support provided by the chatbots, the thematic scope of the knowledge accessible via the chatbots, the underlying large language model, and the character of their evaluation.

Generated on 2026-03-03 using Claude