Retrieval Augmented Generation with Collaborative Filtering for Personalized Text Generation
Problem Statement
Existing personalized RAG methods only retrieve from a single user's history, ignoring the rich signal available from similar users' histories. This limits personalization quality, especially for users with sparse interaction histories. The work addresses the lack of collaborative inter-user information in RAG-based LLM personalization pipelines.
Key Novelty
- First framework to integrate collaborative filtering into personalized RAG, enabling cross-user history retrieval for LLM generation
- Contrastive learning-based user embedding training to identify similar users without requiring explicit user similarity labels
- LLM-feedback-driven fine-tuning of a personalized retriever and reranker that jointly considers user preferences and generation quality
Evaluation Highlights
- Validated on the LaMP (Language Model Personalization) benchmark, where CFRAG outperforms existing personalized RAG baselines
- Ablation analysis confirms that collaborative information (cross-user retrieval) provides measurable gains beyond single-user RAG baselines
Methodology
- Train user embeddings via contrastive learning using interaction histories as supervision signals, enabling retrieval of similar users without explicit similarity labels
- Retrieve top-k documents from the combined histories of the current user and identified similar users using a personalized retriever that encodes user preference context
- Rerank retrieved documents with a personalized reranker, then fine-tune both retriever and reranker using feedback from the LLM's generation quality as a reward signal
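The contrastive step above can be sketched with a standard InfoNCE objective. Note the specifics here are illustrative assumptions, not the paper's exact design: the encoder is a toy mean-pooling over document vectors, the two "views" come from sub-sampling a user's history, and the temperature value is arbitrary.

```python
import numpy as np

def info_nce_loss(view_a, view_b, temperature=0.1):
    """InfoNCE over two augmented views of each user's history embedding.

    view_a, view_b: (num_users, dim) L2-normalized embeddings; row i of
    each view comes from the same user, so (i, i) pairs are positives and
    all other rows in the batch serve as negatives.
    """
    # Cosine similarities between every cross-view pair, scaled by temperature.
    logits = view_a @ view_b.T / temperature                       # (U, U)
    # Log-softmax over each row; the diagonal entries are the positives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def embed_history(history_vectors):
    """Toy user encoder (assumption): mean-pool document vectors, L2-normalize."""
    v = np.mean(history_vectors, axis=0)
    return v / np.linalg.norm(v)

# Two views per user, e.g. from dropping different history items.
rng = np.random.default_rng(0)
histories = [rng.normal(size=(8, 16)) for _ in range(4)]   # 4 toy users
view_a = np.stack([embed_history(h[:6]) for h in histories])
view_b = np.stack([embed_history(h[2:]) for h in histories])
loss = info_nce_loss(view_a, view_b)
```

Because the positives share most of their history items, minimizing this loss pulls same-user views together and pushes different users apart, which is what makes nearest-neighbor user lookup meaningful without similarity labels.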
System Components
- User embedding module: encodes users into a shared embedding space using contrastive learning on interaction histories, enabling similarity-based user retrieval without labeled pairs
- Similar-user retrieval: identifies the top-N similar users by embedding similarity, expanding the candidate document pool beyond the target user's own history
- Personalized retriever: retrieves the top-k documents from the expanded user pool, conditioned on the current user's preference representation and query context
- Personalized reranker: reorders retrieved documents by relevance to both the query and the user's personal preferences to surface the most generation-supportive documents
- LLM feedback fine-tuning: uses downstream LLM generation quality as a feedback signal to iteratively fine-tune the retriever and reranker, aligning retrieval with generation needs
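The retrieval side of this pipeline can be sketched as below. In CFRAG the retriever and reranker are separate trained models; here, as a simplification, a single linear score with an assumed mixing weight `alpha` stands in for preference-aware scoring, and all embeddings are toy vectors.

```python
import numpy as np

def top_n_similar_users(user_embs, uid, n=2):
    """Return indices of the n users closest to user `uid` (excluding itself)."""
    sims = user_embs @ user_embs[uid]   # cosine-like similarity to every user
    sims[uid] = -np.inf                 # never retrieve the user themselves
    return np.argsort(-sims)[:n]

def retrieve_and_rerank(query, user_pref, doc_pool, k=3, alpha=0.5):
    """Score each candidate document by a mix of query relevance and
    user-preference relevance, then keep the top-k.
    `alpha` is an assumed hyperparameter balancing the two signals."""
    scores = alpha * (doc_pool @ query) + (1 - alpha) * (doc_pool @ user_pref)
    return np.argsort(-scores)[:k]

# Demo: find user 0's nearest neighbour, then pick documents from the
# (notionally pooled) histories for a given query.
user_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
neighbours = top_n_similar_users(user_embs, uid=0, n=1)

docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
picked = retrieve_and_rerank(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                             docs, k=2, alpha=1.0)
```

In a real system `doc_pool` would be the union of the current user's history and the retrieved neighbours' histories, which is what supplies the collaborative signal for sparse-history users.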
Results
| Benchmark | Best Baseline (Personalized RAG) | CFRAG | Delta |
|---|---|---|---|
| LaMP (overall) | Competitive personalized RAG baseline | Best reported on LaMP tasks | Positive improvement across tasks |
| Collaborative info ablation | Single-user RAG (no collaborative info) | CFRAG with collaborative filtering | Meaningful gain confirming collaborative signal utility |
Key Takeaways
- When building personalized LLM systems, incorporating histories from similar users (collaborative filtering) can meaningfully improve generation quality beyond single-user RAG, especially for sparse-history users
- Contrastive learning is a practical way to learn user similarity without needing explicit labels, making CFRAG deployable in real-world settings where user similarity ground truth is unavailable
- Closing the loop between retrieval and generation via LLM feedback fine-tuning is an effective strategy to align retrieved context with what actually helps downstream generation, and is worth adopting in RAG system design
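One common way to close this retrieval-generation loop is a REPLUG-style distillation objective: push the retriever's distribution over candidate documents toward a target distribution derived from how much each document improved LLM generation quality. The KL loss below illustrates that general technique; it is an assumption for illustration, not necessarily CFRAG's exact training loss.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def feedback_loss(retriever_scores, gen_quality, temperature=1.0):
    """KL(Q || P): align the retriever's distribution P over candidate
    documents with a target distribution Q derived from per-document
    LLM generation quality (e.g. answer accuracy with that document
    in context). Minimizing this teaches the retriever to prefer
    documents that actually help generation."""
    p = softmax(np.asarray(retriever_scores, dtype=float) / temperature)
    q = softmax(np.asarray(gen_quality, dtype=float) / temperature)
    return float(np.sum(q * (np.log(q) - np.log(p))))

# When retriever scores already match the feedback, the loss is ~0;
# a mismatch yields a positive loss that gradients can reduce.
aligned = feedback_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
misaligned = feedback_loss([3.0, 1.0, 1.0], [0.0, 0.0, 5.0])
```

In training, `retriever_scores` would be differentiable model outputs and the loss would be backpropagated through the retriever (and analogously the reranker), while the LLM itself stays frozen.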
Abstract
Recently, the personalization of Large Language Models (LLMs) to generate content that aligns with individual user preferences has garnered widespread attention. Personalized Retrieval-Augmented Generation (RAG), which retrieves relevant documents from the user's history to reflect their preferences and enhance LLM generation, is one commonly used approach for personalization. However, existing personalized RAG methods do not consider that the histories of similar users can also assist in personalized generation for the current user, meaning that collaborative information between users can also benefit personalized generation. Inspired by the application of collaborative filtering in recommender systems, we propose a method called CFRAG, which adapts Collaborative Filtering to RAG for personalized text generation. However, this presents two challenges: (1) how to incorporate collaborative information without explicit user similarity labels? (2) how to retrieve documents that support personalized LLM generation? For Challenge 1, we use contrastive learning to train user embeddings to retrieve similar users and introduce collaborative information. For Challenge 2, we design a personalized retriever and reranker to retrieve the top-k documents from these users' histories. We take into account the user's preference during retrieval and reranking. Then we leverage feedback from the LLM to fine-tune the personalized retriever and reranker, enabling them to retrieve documents that meet the personalized generation needs of the LLM. Experimental results on the Language Model Personalization (LaMP) benchmark validate the effectiveness of CFRAG. Further analysis confirms the importance of incorporating collaborative information.