ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning
Problem Statement
Existing LLM agent methods like ReAct rely on a single unified trajectory that entangles all past decisions and observations, making them brittle on long-horizon, multi-step tasks. This monolithic approach suffers from context bloat, compounding errors, and an inability to modularly reuse or coordinate sub-strategies. There is a need for structured decomposition that mirrors how humans break complex goals into manageable subgoals.
Key Novelty
- Hierarchical agent tree architecture where each node is an LLM agent capable of reasoning, acting, and dynamically expanding child nodes for subgoal decomposition
- Control flow nodes (analogous to programming constructs like conditionals/loops) that coordinate execution strategies among agent nodes within the tree
- Dual complementary memory systems: episodic memory for retrieving goal-specific subgoal-level examples per agent node, and working memory for sharing environment-specific observations across the tree
Evaluation Highlights
- On WAH-NL benchmark, ReAcTree achieves 61% goal success rate with Qwen 2.5 72B, nearly doubling ReAct's 31% baseline
- ReAcTree consistently outperforms ReAct and other strong task-planning baselines across diverse LLMs on both WAH-NL and ALFRED benchmarks
Methodology
- Given a complex goal, a root LLM agent node reasons about the goal and decomposes it into subgoals, dynamically spawning child agent nodes in a tree structure
- Control flow nodes (e.g., sequential, conditional, loop) are inserted between agent nodes to coordinate how subgoals are executed, enabling structured branching and iteration
- Each agent node executes its subgoal using ReAct-style reasoning while retrieving relevant examples from episodic memory and accessing shared environment observations via working memory
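The execution scheme above can be sketched as a small tree interpreter. This is a minimal illustration, not the authors' implementation: the `AgentNode`, `SequenceNode`, and `FallbackNode` names are hypothetical, and a toy `policy` callable stands in for the actual LLM reasoning step.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class AgentNode:
    subgoal: str
    # Stand-in for an LLM call: returns "success"/"failure" for a leaf
    # action, or a list of child subgoals to decompose into.
    policy: Callable[[str], Union[str, List[str]]]

    def execute(self) -> bool:
        result = self.policy(self.subgoal)
        if isinstance(result, list):
            # Dynamic expansion: spawn child nodes, coordinated here
            # by a sequential control flow node.
            children = [AgentNode(sg, self.policy) for sg in result]
            return SequenceNode(children).execute()
        return result == "success"

@dataclass
class SequenceNode:
    """Control flow node: run children in order, failing fast."""
    children: List[AgentNode]

    def execute(self) -> bool:
        return all(child.execute() for child in self.children)

@dataclass
class FallbackNode:
    """Condition-like control flow node: try children until one succeeds."""
    children: List[AgentNode]

    def execute(self) -> bool:
        return any(child.execute() for child in self.children)
```

With a toy policy that decomposes "set the table" into fetch subgoals, `AgentNode("set the table", policy).execute()` walks the resulting tree depth-first, which is the structured alternative to a single flat ReAct trajectory.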
System Components
Agent node: An LLM-based node responsible for reasoning about a subgoal, taking actions in the environment, and optionally expanding the tree by spawning child nodes
Control flow node: A non-LLM structural node (e.g., sequence, condition, loop) that governs how child agent nodes are scheduled and coordinated during execution
Episodic memory: A retrieval system that provides each agent node with goal-specific, subgoal-level demonstration examples to guide its reasoning and action selection
Working memory: A shared memory buffer that propagates environment-specific observations across agent nodes in the tree, reducing redundant perception and improving coordination
Dynamic tree construction: The mechanism by which the agent tree is built incrementally at runtime, with nodes added as subgoals are identified during task execution
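The two memory systems can be illustrated with a minimal sketch. The `EpisodicMemory` and `WorkingMemory` classes below are assumptions for illustration, as is the word-overlap retrieval (a real system would more plausibly rank examples by embedding similarity).

```python
class EpisodicMemory:
    """Stores (subgoal, demonstration) pairs and retrieves the examples
    whose subgoals best match a query (here, by simple word overlap)."""
    def __init__(self):
        self.entries = []  # list of (subgoal, example_trajectory)

    def add(self, subgoal, example):
        self.entries.append((subgoal, example))

    def retrieve(self, query, k=2):
        query_words = set(query.split())
        def overlap(entry):
            return len(query_words & set(entry[0].split()))
        ranked = sorted(self.entries, key=overlap, reverse=True)
        return [example for _, example in ranked[:k]]

class WorkingMemory:
    """Shared observation buffer visible to every agent node in the tree,
    so one node's perception does not have to be repeated by siblings."""
    def __init__(self):
        self.observations = {}  # entity -> latest observation

    def update(self, entity, observation):
        self.observations[entity] = observation

    def snapshot(self):
        return dict(self.observations)
```

The design point this sketch captures is the separation of concerns: episodic memory is read per node to build a focused prompt, while working memory is written once and shared tree-wide.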
Results
| Metric/Benchmark | Baseline (ReAct) | ReAcTree | Delta |
|---|---|---|---|
| Goal Success Rate (WAH-NL, Qwen 2.5 72B) | 31% | 61% | +30 pp (~2x improvement) |
| Task Planning Performance (ALFRED) | Lower baseline | Consistently higher | Positive across all tested LLMs |
| Robustness across LLMs (WAH-NL + ALFRED) | Varies by model | Consistent outperformance | Stable gains across diverse LLMs |
Key Takeaways
- Hierarchical decomposition with explicit control flow dramatically improves LLM agent performance on long-horizon tasks — practitioners should consider tree-structured planning over flat ReAct-style loops for complex embodied tasks
- Separating episodic memory (subgoal-level examples) from working memory (shared observations) is a practical design pattern that reduces context overload and improves per-node reasoning quality
- ReAcTree's gains hold across multiple LLM backbones (including Qwen 2.5 72B), suggesting the architectural approach is model-agnostic and can be adopted without requiring frontier-scale models
Abstract
Recent advancements in large language models (LLMs) have enabled significant progress in decision-making and task planning for embodied autonomous agents. However, most existing methods struggle with complex, long-horizon tasks because they rely on a monolithic trajectory that entangles all past decisions and observations to solve the entire task in a single unified process. To address this limitation, we propose ReAcTree, a hierarchical task-planning method that decomposes a complex goal into manageable subgoals within a dynamically constructed agent tree. Each subgoal is handled by an LLM agent node capable of reasoning, acting, and further expanding the tree, while control flow nodes coordinate the execution strategies of agent nodes. In addition, we integrate two complementary memory systems: each agent node retrieves goal-specific, subgoal-level examples from episodic memory and shares environment-specific observations through working memory. Experiments on the WAH-NL and ALFRED benchmarks show ReAcTree consistently outperforms strong task-planning baselines such as ReAct across diverse LLMs. Notably, on WAH-NL, ReAcTree achieves a 61% goal success rate with Qwen 2.5 72B, nearly doubling ReAct's 31%. The code is available at https://github.com/Choi-JaeWoo/ReAcTree.git.