ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents
Problem Statement
Long-horizon tasks requiring multi-step reasoning expose critical weaknesses in LLM agents: sequential prompting suffers from context drift and goal forgetting, while hierarchical approaches either break cross-level continuity or incur prohibitive computational overhead. Existing methods also struggle with recurrent failure cycles, where the model cannot recover from early mistakes. These limitations make it difficult to deploy autonomous LLM agents reliably on complex, dynamic tasks.
Key Novelty
- Plan-ahead decomposition: generates a full subtask list upfront, executes the first item, then iteratively refines the remaining subtasks based on execution feedback rather than committing to a rigid plan (see the sketch after this list)
- Structured parent-plan re-injection: explicitly re-introduces higher-level context (parent goals and plans) when returning from recursive sub-calls, preserving multi-level goal coherence throughout execution
- Memory-efficient recursive execution: bounds the active prompt size so computational cost scales linearly with task depth rather than quadratically, making deep recursion practical
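The first of these mechanisms can be illustrated with a minimal sketch. The `llm` and `execute` callables below are hypothetical stand-ins (a model client and an environment step, neither specified by the paper), so treat this as an assumption-laden outline rather than the authors' implementation:

```python
from typing import List

def llm(prompt: str) -> str:
    """Hypothetical model call; expected to return newline-separated subtasks."""
    raise NotImplementedError("plug in a real LLM client")

def execute(subtask: str) -> str:
    """Hypothetical executor; returns an observation string."""
    raise NotImplementedError("plug in a real environment")

def plan_ahead(goal: str, max_steps: int = 20) -> None:
    # Generate the complete ordered subtask list up front (lookahead).
    plan: List[str] = llm(f"Goal: {goal}\nList ordered subtasks, one per line:").splitlines()
    for _ in range(max_steps):
        if not plan:
            break
        subtask, remainder = plan[0], plan[1:]
        outcome = execute(subtask)
        # Refine only the *remaining* subtasks in light of the outcome,
        # rather than committing to the original rigid list.
        plan = llm(
            f"Goal: {goal}\nExecuted: {subtask}\nOutcome: {outcome}\n"
            "Remaining plan:\n" + "\n".join(remainder) +
            "\nRevise the remaining subtasks, one per line:"
        ).splitlines()
```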
Evaluation Highlights
- 32% improvement in task success rate on the synchronous Robotouille benchmark under the strict pass@1 protocol, compared to the best prior methods
- 29% improvement on asynchronous Robotouille (a more complex, concurrent variant) under the same protocol, demonstrating robustness to dynamic re-planning demands
Methodology
- Plan-ahead decomposition phase: given a high-level goal, the LLM generates a complete ordered subtask list, then executes the first subtask while treating the remaining list as a dynamic, refinable plan
- Recursive execution with structured context re-injection: each subtask is handled recursively; upon returning from a sub-call, the parent plan and goal context are explicitly re-injected into the prompt to prevent context drift across recursion levels
- Memory-efficient prompt management: only the active execution path and a bounded context window are maintained in the prompt at any time, discarding completed-subtree details to keep prompt length and cost linear in task depth
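A minimal sketch of this recursion follows, reusing the hypothetical `llm` and `execute` stubs from the earlier snippet; `is_primitive` is our own illustrative heuristic, not the paper's decomposition criterion:

```python
from typing import List

def is_primitive(subtask: str) -> bool:
    """Illustrative placeholder: decide whether a subtask is a direct action."""
    return len(subtask.split()) <= 4

def recap(goal: str, ancestors: List[str], depth: int = 0, max_depth: int = 4) -> None:
    # The prompt carries only the active path (ancestor goals), so its
    # size is bounded by recursion depth, not by the full history.
    header = "\n".join(ancestors + [f"Current goal: {goal}"])
    plan = llm(header + "\nList ordered subtasks, one per line:").splitlines()
    while plan:
        subtask = plan.pop(0)
        if is_primitive(subtask) or depth >= max_depth:
            execute(subtask)
        else:
            recap(subtask, ancestors + [f"Parent goal: {goal}"], depth + 1, max_depth)
        # Structured re-injection: on returning, restate the parent goal
        # and plan before refining whatever remains at this level.
        plan = llm(
            header + f"\nCompleted: {subtask}\nRemaining plan:\n" + "\n".join(plan) +
            "\nRevise the remaining subtasks, one per line:"
        ).splitlines()
```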
System Components
- Plan-ahead decomposer: generates a full subtask list at each planning level before execution begins, enabling lookahead and dynamic refinement of remaining steps as earlier steps complete
- Recursive context manager: manages the call stack of planning levels, re-injecting parent-plan summaries and goal information when execution returns up the hierarchy to maintain multi-level coherence
- Bounded prompt manager: enforces a bounded active prompt by pruning completed-subtree context, keeping token usage and API cost linear in task depth rather than growing with the full interaction history (see the sketch after this list)
- Dynamic plan refiner: updates the remaining subtask list after each subtask execution based on observed outcomes, allowing the agent to adapt to unexpected results without restarting from scratch
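The bounded prompt manager can be made concrete with a hypothetical context stack in which finished subtrees collapse to one-line summaries; the class and field names below are ours, not the paper's:

```python
class ContextStack:
    """Keeps only the active planning path plus brief results in the prompt."""

    def __init__(self) -> None:
        self.frames: list[dict] = []  # one frame per active planning level

    def push(self, goal: str, plan: list[str]) -> None:
        self.frames.append({"goal": goal, "plan": plan, "done": []})

    def pop(self, summary: str) -> None:
        # Discard the finished subtree entirely; keep only a short
        # summary in the parent frame for later plan refinement.
        self.frames.pop()
        if self.frames:
            self.frames[-1]["done"].append(summary)

    def prompt(self) -> str:
        # Render the ancestor goals, compact summaries of completed
        # siblings, and the current level's remaining plan. Length is
        # proportional to depth, not to the number of steps taken.
        lines: list[str] = []
        for frame in self.frames:
            lines.append(f"Goal: {frame['goal']}")
            lines.extend(f"  done: {s}" for s in frame["done"])
        if self.frames:
            lines.extend(f"  todo: {t}" for t in self.frames[-1]["plan"])
        return "\n".join(lines)
```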
Results
| Benchmark (strict pass@1) | Baseline | ReCAP Gain |
|---|---|---|
| Synchronous Robotouille | Best prior hierarchical/sequential method | +32% success rate |
| Asynchronous Robotouille | Best prior hierarchical/sequential method | +29% success rate |
Key Takeaways
- Re-injecting parent-level goals explicitly at each recursive return is a low-cost, high-impact technique to prevent goal drift in hierarchical LLM agents; practitioners should incorporate it into any multi-level planning pipeline (see the sketch after this list)
- Bounding the active prompt to linear depth scaling is essential for deploying recursive LLM agents on long-horizon tasks at reasonable cost; unbounded context accumulation is a practical bottleneck worth addressing architecturally
- Plan-ahead decomposition with dynamic refinement outperforms both rigid upfront planning and purely reactive sequential prompting, suggesting a middle-ground strategy is most robust for real-world agentic tasks with uncertainty
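As an illustration of the first takeaway, a re-injection prompt might be assembled roughly as follows; the exact field wording is our assumption, not the paper's template:

```python
def reinjection_prompt(ancestor_goals: list[str], finished: str, remaining: list[str]) -> str:
    # Restate every ancestor goal, top-down, before asking the model to continue.
    parts = [f"Level {i} goal: {g}" for i, g in enumerate(ancestor_goals)]
    parts.append(f"Just completed: {finished}")
    parts.append("Remaining plan:")
    parts.extend(f"- {t}" for t in remaining)
    parts.append("Continue with the next subtask, staying consistent with all goals above.")
    return "\n".join(parts)
```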
Abstract
Long-horizon tasks requiring multi-step reasoning and dynamic re-planning remain challenging for large language models (LLMs). Sequential prompting methods are prone to context drift, loss of goal information, and recurrent failure cycles, while hierarchical prompting methods often weaken cross-level continuity or incur substantial runtime overhead. We introduce ReCAP (Recursive Context-Aware Reasoning and Planning), a hierarchical framework with shared context for reasoning and planning in LLMs. ReCAP combines three key mechanisms: (i) plan-ahead decomposition, in which the model generates a full subtask list, executes the first item, and refines the remainder; (ii) structured re-injection of parent plans, maintaining consistent multi-level context during recursive return; and (iii) memory-efficient execution, bounding the active prompt so costs scale linearly with task depth. Together these mechanisms align high-level goals with low-level actions, reduce redundant prompting, and preserve coherent context updates across recursion. Experiments demonstrate that ReCAP substantially improves subgoal alignment and success rates on various long-horizon reasoning benchmarks, achieving a 32% gain on synchronous Robotouille and a 29% improvement on asynchronous Robotouille under the strict pass@1 protocol.