ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning
Problem Statement
Existing LLM agent methods like ReAct rely on a single unified trajectory that entangles all past decisions and observations, making them brittle on long-horizon, multi-step tasks. This monolithic approach suffers from context bloat, compounding errors, and an inability to modularly reuse or coordinate sub-strategies. There is a need for structured decomposition that mirrors how humans break complex goals into manageable subgoals.
Key Novelty
- Hierarchical agent tree architecture where each node is an LLM agent capable of reasoning, acting, and dynamically expanding child nodes for subgoal decomposition
- Control flow nodes (analogous to programming constructs like conditionals/loops) that coordinate execution strategies among agent nodes within the tree
- Dual complementary memory systems: episodic memory for retrieving goal-specific subgoal-level examples per agent node, and working memory for sharing environment-specific observations across the tree
Evaluation Highlights
- On WAH-NL benchmark, ReAcTree achieves 61% goal success rate with Qwen 2.5 72B, nearly doubling ReAct's 31% baseline
- ReAcTree consistently outperforms ReAct and other strong task-planning baselines across diverse LLMs on both WAH-NL and ALFRED benchmarks
Methodology
- Given a complex goal, a root LLM agent node reasons about the goal and decomposes it into subgoals, dynamically spawning child agent nodes in a tree structure
- Control flow nodes (e.g., sequential, conditional, loop) are inserted between agent nodes to coordinate how subgoals are executed, enabling structured branching and iteration
- Each agent node executes its subgoal using ReAct-style reasoning while retrieving relevant examples from episodic memory and accessing shared environment observations via working memory
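The execution scheme above can be sketched as a small tree interpreter. This is a minimal illustration, not the authors' implementation: the `AgentNode`, `SequenceNode`, and `FallbackNode` names are hypothetical, and a toy `policy` callable stands in for the actual LLM reasoning step.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class AgentNode:
    subgoal: str
    # Stand-in for an LLM call: returns "success"/"failure" for a leaf
    # action, or a list of child subgoals to decompose into.
    policy: Callable[[str], Union[str, List[str]]]

    def execute(self) -> bool:
        result = self.policy(self.subgoal)
        if isinstance(result, list):
            # Dynamic expansion: spawn child nodes, coordinated here
            # by a sequential control flow node.
            children = [AgentNode(sg, self.policy) for sg in result]
            return SequenceNode(children).execute()
        return result == "success"

@dataclass
class SequenceNode:
    """Control flow node: run children in order, failing fast."""
    children: List[AgentNode]

    def execute(self) -> bool:
        return all(child.execute() for child in self.children)

@dataclass
class FallbackNode:
    """Condition-like control flow node: try children until one succeeds."""
    children: List[AgentNode]

    def execute(self) -> bool:
        return any(child.execute() for child in self.children)
```

With a toy policy that decomposes "set the table" into fetch subgoals, `AgentNode("set the table", policy).execute()` walks the resulting tree depth-first, which is the structured alternative to a single flat ReAct trajectory.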
System Components
Agent node: An LLM-based node responsible for reasoning about a subgoal, taking actions in the environment, and optionally expanding the tree by spawning child nodes
Control flow node: A non-LLM structural node (e.g., sequence, condition, loop) that governs how child agent nodes are scheduled and coordinated during execution
Episodic memory: A retrieval system that provides each agent node with goal-specific, subgoal-level demonstration examples to guide its reasoning and action selection
Working memory: A shared memory buffer that propagates environment-specific observations across agent nodes in the tree, reducing redundant perception and improving coordination
Dynamic tree construction: The mechanism by which the agent tree is built incrementally at runtime, with nodes added as subgoals are identified during task execution
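The two memory systems can be illustrated with a minimal sketch. The `EpisodicMemory` and `WorkingMemory` classes below are assumptions for illustration, as is the word-overlap retrieval (a real system would more plausibly rank examples by embedding similarity).

```python
class EpisodicMemory:
    """Stores (subgoal, demonstration) pairs and retrieves the examples
    whose subgoals best match a query (here, by simple word overlap)."""
    def __init__(self):
        self.entries = []  # list of (subgoal, example_trajectory)

    def add(self, subgoal, example):
        self.entries.append((subgoal, example))

    def retrieve(self, query, k=2):
        query_words = set(query.split())
        def overlap(entry):
            return len(query_words & set(entry[0].split()))
        ranked = sorted(self.entries, key=overlap, reverse=True)
        return [example for _, example in ranked[:k]]

class WorkingMemory:
    """Shared observation buffer visible to every agent node in the tree,
    so one node's perception does not have to be repeated by siblings."""
    def __init__(self):
        self.observations = {}  # entity -> latest observation

    def update(self, entity, observation):
        self.observations[entity] = observation

    def snapshot(self):
        return dict(self.observations)
```

The design point this sketch captures is the separation of concerns: episodic memory is read per node to build a focused prompt, while working memory is written once and shared tree-wide.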
Results
| Metric/Benchmark | Baseline (ReAct) | ReAcTree | Delta |
|---|---|---|---|
| Goal Success Rate (WAH-NL, Qwen 2.5 72B) | 31% | 61% | +30 pp (~2x improvement) |
| Task Planning Performance (ALFRED) | Lower baseline | Consistently higher | Positive across all tested LLMs |
| Robustness across LLMs (WAH-NL + ALFRED) | Varies by model | Consistent outperformance | Stable gains across diverse LLMs |
Key Takeaways
- Hierarchical decomposition with explicit control flow dramatically improves LLM agent performance on long-horizon tasks — practitioners should consider tree-structured planning over flat ReAct-style loops for complex embodied tasks
- Separating episodic memory (subgoal-level examples) from working memory (shared observations) is a practical design pattern that reduces context overload and improves per-node reasoning quality
- ReAcTree's gains hold across multiple LLM backbones (including Qwen 2.5 72B), suggesting the architectural approach is model-agnostic and can be adopted without requiring frontier-scale models
Abstract
Recent advancements in large language models (LLMs) have enabled significant progress in decision-making and task planning for embodied autonomous agents. However, most existing methods struggle with complex, long-horizon tasks because they rely on a monolithic trajectory that entangles all past decisions and observations to solve the entire task in a single unified process. To address this limitation, we propose ReAcTree, a hierarchical task-planning method that decomposes a complex goal into manageable subgoals within a dynamically constructed agent tree. Each subgoal is handled by an LLM agent node capable of reasoning, acting, and further expanding the tree, while control flow nodes coordinate the execution strategies of agent nodes. In addition, we integrate two complementary memory systems: each agent node retrieves goal-specific, subgoal-level examples from episodic memory and shares environment-specific observations through working memory. Experiments on the WAH-NL and ALFRED benchmarks show ReAcTree consistently outperforms strong task-planning baselines such as ReAct across diverse LLMs. Notably, on WAH-NL, ReAcTree achieves a 61% goal success rate with Qwen 2.5 72B, nearly doubling ReAct's 31%. The code is available at https://github.com/Choi-JaeWoo/ReAcTree.git.