Monte Carlo Planning with Large Language Model for Text-Based Game Agents
Problem Statement
Traditional planning-then-learning approaches like MCTS+RL are computationally expensive due to extensive iterative rollouts. These methods rely on uncertainty-driven exploration but lack the semantic understanding needed to reason about natural language action spaces. There is a gap between the exploratory power of tree search and the language comprehension capabilities needed for text-based game environments.
Key Novelty
- Introduction of MC-DML: a hybrid algorithm combining MCTS with LLM-guided action evaluation, replacing uncertainty-only heuristics with language-aware reasoning
- In-trial memory mechanism that allows the LLM to accumulate and leverage observations within a single planning episode for dynamic action re-evaluation
- Cross-trial memory mechanism that enables the agent to learn from past game episodes and transfer experiential knowledge to improve future planning iterations
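The shift from uncertainty-only exploration to language-aware action evaluation can be made concrete at the node-selection rule. Below is a minimal sketch contrasting plain UCT with a PUCT-style score in which an LLM-derived prior scales the exploration bonus; the function and parameter names are illustrative, not the paper's notation.

```python
import math

def uct_score(value, visits, parent_visits, c=1.4):
    # Plain UCT: exploration driven purely by visit counts (uncertainty),
    # with no notion of which actions are semantically plausible.
    if visits == 0:
        return float("inf")
    return value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def llm_puct_score(value, visits, parent_visits, llm_prior, c=1.5):
    # PUCT-style variant: the LLM's prior probability for an action scales
    # the exploration bonus, steering search toward plausible moves.
    exploit = value / visits if visits else 0.0
    return exploit + c * llm_prior * math.sqrt(parent_visits) / (1 + visits)
```

Under this rule an unvisited action with a high LLM prior is explored before an equally unvisited but implausible one, which plain UCT cannot distinguish.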
Evaluation Highlights
- MC-DML significantly outperforms strong contemporary MCTS+RL baselines on the Jericho benchmark suite across multiple text-based games at the initial planning phase
- The algorithm achieves competitive or superior performance without requiring multiple costly RL training iterations, demonstrating greater sample and time efficiency
Methodology
- Construct a Monte Carlo Tree Search planning framework where the LLM serves as the policy/value estimator, replacing or augmenting traditional neural network rollout policies with language-grounded action scoring
- Equip the LLM with an in-trial memory module that records observations, actions, and outcomes within the current game episode, enabling the model to dynamically refine action evaluations as new information is gathered
- Implement a cross-trial memory module that persists and retrieves relevant experiences across game episodes, allowing the agent to ground its reasoning in historical successes and failures during the MCTS selection and expansion phases
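The steps above can be sketched as a single planning loop. This is a minimal, self-contained illustration rather than the paper's implementation: the LLM call is stubbed out with a uniform prior (in MC-DML it would be prompted with the observation plus in-trial and cross-trial memories), and a one-step reward stands in for a full simulation rollout. All names are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    state: str
    parent: Optional["Node"] = None
    action: Optional[str] = None
    children: dict = field(default_factory=dict)   # action -> child Node
    priors: dict = field(default_factory=dict)     # action -> LLM prior prob
    visits: int = 0
    value: float = 0.0

def llm_action_priors(state, actions, in_trial_memory):
    # Stand-in for the LLM call: MC-DML would prompt the model with the
    # current observation plus its memories and ask it to score each
    # candidate action. A uniform prior keeps this sketch runnable.
    return {a: 1.0 / len(actions) for a in actions}

def select(root, c_puct=1.5):
    # Descend via a PUCT-style rule: the LLM prior for each action
    # scales the exploration bonus.
    node = root
    while node.children:
        parent = node
        node = max(
            parent.children.values(),
            key=lambda ch: (ch.value / ch.visits if ch.visits else 0.0)
            + c_puct * parent.priors[ch.action]
            * math.sqrt(parent.visits) / (1 + ch.visits),
        )
    return node

def expand(node, env, memory):
    actions = env.valid_actions(node.state)
    node.priors = llm_action_priors(node.state, actions, memory)
    for a in actions:
        node.children[a] = Node(state=env.step(node.state, a),
                                parent=node, action=a)

def backpropagate(node, reward):
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def plan(env, root_state, memory, n_iter=60):
    root = Node(state=root_state, visits=1)
    expand(root, env, memory)
    for _ in range(n_iter):
        leaf = select(root)
        if leaf.visits > 0:            # expand already-evaluated leaves
            expand(leaf, env, memory)
            if leaf.children:
                leaf = next(iter(leaf.children.values()))
        # Placeholder for the simulation phase: score the leaf directly.
        backpropagate(leaf, env.reward(leaf.state))
    # Commit to the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)
```

Given any toy environment exposing `valid_actions`, `step`, and `reward`, `plan` returns the root action with the highest visit count after the search budget is spent.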
System Components
- MCTS planner: provides the structured exploratory planning framework, managing the selection, expansion, simulation, and backpropagation phases of the search tree
- LLM action evaluator: replaces traditional heuristic or learned neural-network evaluators with an LLM that scores candidate actions using natural language understanding and commonsense reasoning
- In-trial memory: stores observations, actions, and rewards within the current planning episode, allowing the LLM to dynamically update its action evaluations based on accumulating context
- Cross-trial memory: maintains a persistent memory store of experiences across multiple game episodes, retrieved at planning time to inform the LLM with relevant historical knowledge
- Jericho benchmark: a collection of diverse text-based games used to evaluate agent performance across varied interactive fiction environments
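The two memory components above might be structured as follows. This is a hedged sketch, not the paper's data format: in-trial memory is modeled as a bounded episode buffer serialized into the LLM prompt, and cross-trial memory as a persistent store retrieved by naive keyword overlap (a real system would likely use embedding-based retrieval). All class and method names are illustrative.

```python
from collections import deque

class InTrialMemory:
    # Rolling record of (observation, action, reward) within the current
    # episode; included in each LLM prompt so that action evaluations can
    # shift as context accumulates during planning.
    def __init__(self, max_steps=20):
        self.steps = deque(maxlen=max_steps)

    def record(self, obs, action, reward):
        self.steps.append((obs, action, reward))

    def to_prompt(self):
        return "\n".join(f"saw: {o} | did: {a} | reward: {r}"
                         for o, a, r in self.steps)

class CrossTrialMemory:
    # Persistent store of past-episode summaries, retrieved at planning
    # time so the LLM can condition on historical successes and failures.
    def __init__(self):
        self.episodes = []  # list of (summary, final_score)

    def add_episode(self, summary, score):
        self.episodes.append((summary, score))

    def retrieve(self, query, k=2):
        # Naive keyword-overlap retrieval, purely for illustration.
        scored = sorted(
            self.episodes,
            key=lambda e: len(set(query.lower().split())
                              & set(e[0].lower().split())),
            reverse=True,
        )
        return [s for s, _ in scored[:k]]
```

At planning time the agent would concatenate `InTrialMemory.to_prompt()` with the top-k summaries from `CrossTrialMemory.retrieve(...)` when asking the LLM to score candidate actions.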
Results
| Aspect | MCTS+RL Baseline | MC-DML (Ours) | Delta |
|---|---|---|---|
| Jericho (avg. across games) | Strong contemporary multi-iteration baseline | Outperforms at initial planning phase | Significant improvement without multi-iteration RL |
| Planning Efficiency | Requires extensive RL iterations | Single/few planning phases sufficient | Substantially reduced compute cost |
| Language Reasoning | Uncertainty-driven only, no semantic understanding | LLM-guided semantic action evaluation | Qualitative leap in action scoring quality |
Key Takeaways
- LLMs can serve as effective plug-in policy and value estimators within classical tree search frameworks, reducing reliance on expensive RL training loops while improving semantic action reasoning
- Memory mechanisms (both within-episode and across-episodes) are critical for LLM agents to improve over time in interactive environments — stateless LLM calls are insufficient for complex sequential decision-making
- For practitioners building game or simulation agents, the MC-DML pattern offers a practical template: use MCTS for structured exploration, LLMs for language-grounded evaluation, and episodic memory for continual improvement without full retraining
Abstract
Text-based games provide valuable environments for language-based autonomous agents. However, planning-then-learning paradigms, such as those combining Monte Carlo Tree Search (MCTS) and reinforcement learning (RL), are notably time-consuming due to extensive iterations. Additionally, these algorithms perform uncertainty-driven exploration but lack language understanding and reasoning abilities. In this paper, we introduce the Monte Carlo planning with Dynamic Memory-guided Large language model (MC-DML) algorithm. MC-DML leverages the language understanding and reasoning capabilities of Large Language Models (LLMs) alongside the exploratory advantages of tree search algorithms. Specifically, we enhance LLMs with in-trial and cross-trial memory mechanisms, enabling them to learn from past experiences and dynamically adjust action evaluations during planning. We conduct experiments on a series of text-based games from the Jericho benchmark. Our results demonstrate that the MC-DML algorithm significantly enhances performance across various games at the initial planning phase, outperforming strong contemporary methods that require multiple iterations. This demonstrates the effectiveness of our algorithm, paving the way for more efficient language-grounded planning in complex environments.