ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents
Problem Statement
Long-horizon tasks requiring multi-step reasoning expose critical weaknesses in LLM agents: sequential prompting suffers from context drift and goal forgetting, while hierarchical approaches either break cross-level continuity or incur prohibitive computational overhead. Existing methods also struggle with recurrent failure cycles, where the model cannot recover from early mistakes. These limitations make it difficult to deploy autonomous LLM agents reliably on complex, dynamic tasks.
Key Novelty
- Plan-ahead decomposition: generates a full subtask list upfront, executes the first item, then iteratively refines the remaining subtasks based on execution feedback rather than committing to a rigid plan (see the sketch after this list)
- Structured parent-plan re-injection: explicitly re-introduces higher-level context (parent goals and plans) when returning from recursive sub-calls, preserving multi-level goal coherence throughout execution
- Memory-efficient recursive execution: bounds the active prompt size so computational cost scales linearly with task depth rather than quadratically, making deep recursion practical
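The first of these mechanisms can be illustrated with a minimal sketch. The `llm` and `execute` callables below are hypothetical stand-ins (a model client and an environment step, neither specified by the paper), so treat this as an assumption-laden outline rather than the authors' implementation:

```python
from typing import List

def llm(prompt: str) -> str:
    """Hypothetical model call; expected to return newline-separated subtasks."""
    raise NotImplementedError("plug in a real LLM client")

def execute(subtask: str) -> str:
    """Hypothetical executor; returns an observation string."""
    raise NotImplementedError("plug in a real environment")

def plan_ahead(goal: str, max_steps: int = 20) -> None:
    # Generate the complete ordered subtask list up front (lookahead).
    plan: List[str] = llm(f"Goal: {goal}\nList ordered subtasks, one per line:").splitlines()
    for _ in range(max_steps):
        if not plan:
            break
        subtask, remainder = plan[0], plan[1:]
        outcome = execute(subtask)
        # Refine only the *remaining* subtasks in light of the outcome,
        # rather than committing to the original rigid list.
        plan = llm(
            f"Goal: {goal}\nExecuted: {subtask}\nOutcome: {outcome}\n"
            "Remaining plan:\n" + "\n".join(remainder) +
            "\nRevise the remaining subtasks, one per line:"
        ).splitlines()
```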
Evaluation Highlights
- 32% improvement in task success rate on the synchronous Robotouille benchmark under the strict pass@1 protocol, compared to the best prior methods
- 29% improvement on asynchronous Robotouille (a more complex, concurrent variant) under the same protocol, demonstrating robustness to dynamic re-planning demands
Methodology
- Plan-ahead decomposition phase: given a high-level goal, the LLM generates a complete ordered subtask list, then executes the first subtask while treating the remaining list as a dynamic, refinable plan
- Recursive execution with structured context re-injection: each subtask is handled recursively; upon returning from a sub-call, the parent plan and goal context are explicitly re-injected into the prompt to prevent context drift across recursion levels
- Memory-efficient prompt management: only the active execution path and a bounded context window are maintained in the prompt at any time, discarding completed-subtree details to keep prompt length and cost linear in task depth
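A minimal sketch of this recursion follows, reusing the hypothetical `llm` and `execute` stubs from the earlier snippet; `is_primitive` is our own illustrative heuristic, not the paper's decomposition criterion:

```python
from typing import List

def is_primitive(subtask: str) -> bool:
    """Illustrative placeholder: decide whether a subtask is a direct action."""
    return len(subtask.split()) <= 4

def recap(goal: str, ancestors: List[str], depth: int = 0, max_depth: int = 4) -> None:
    # The prompt carries only the active path (ancestor goals), so its
    # size is bounded by recursion depth, not by the full history.
    header = "\n".join(ancestors + [f"Current goal: {goal}"])
    plan = llm(header + "\nList ordered subtasks, one per line:").splitlines()
    while plan:
        subtask = plan.pop(0)
        if is_primitive(subtask) or depth >= max_depth:
            execute(subtask)
        else:
            recap(subtask, ancestors + [f"Parent goal: {goal}"], depth + 1, max_depth)
        # Structured re-injection: on returning, restate the parent goal
        # and plan before refining whatever remains at this level.
        plan = llm(
            header + f"\nCompleted: {subtask}\nRemaining plan:\n" + "\n".join(plan) +
            "\nRevise the remaining subtasks, one per line:"
        ).splitlines()
```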
System Components
- Plan-ahead decomposer: generates a full subtask list at each planning level before execution begins, enabling lookahead and dynamic refinement of remaining steps as earlier steps complete
- Recursive context manager: manages the call stack of planning levels, re-injecting parent-plan summaries and goal information when execution returns up the hierarchy to maintain multi-level coherence
- Bounded prompt manager: enforces a bounded active prompt by pruning completed-subtree context, keeping token usage and API cost linear in task depth rather than growing with the full interaction history (see the sketch after this list)
- Dynamic plan refiner: updates the remaining subtask list after each subtask execution based on observed outcomes, allowing the agent to adapt to unexpected results without restarting from scratch
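The bounded prompt manager can be made concrete with a hypothetical context stack in which finished subtrees collapse to one-line summaries; the class and field names below are ours, not the paper's:

```python
class ContextStack:
    """Keeps only the active planning path plus brief results in the prompt."""

    def __init__(self) -> None:
        self.frames: list[dict] = []  # one frame per active planning level

    def push(self, goal: str, plan: list[str]) -> None:
        self.frames.append({"goal": goal, "plan": plan, "done": []})

    def pop(self, summary: str) -> None:
        # Discard the finished subtree entirely; keep only a short
        # summary in the parent frame for later plan refinement.
        self.frames.pop()
        if self.frames:
            self.frames[-1]["done"].append(summary)

    def prompt(self) -> str:
        # Render the ancestor goals, compact summaries of completed
        # siblings, and the current level's remaining plan. Length is
        # proportional to depth, not to the number of steps taken.
        lines: list[str] = []
        for frame in self.frames:
            lines.append(f"Goal: {frame['goal']}")
            lines.extend(f"  done: {s}" for s in frame["done"])
        if self.frames:
            lines.extend(f"  todo: {t}" for t in self.frames[-1]["plan"])
        return "\n".join(lines)
```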
Results
| Benchmark (strict pass@1) | Baseline | ReCAP Gain |
|---|---|---|
| Synchronous Robotouille | Best prior hierarchical/sequential method | +32% success rate |
| Asynchronous Robotouille | Best prior hierarchical/sequential method | +29% success rate |
Key Takeaways
- Re-injecting parent-level goals explicitly at each recursive return is a low-cost, high-impact technique to prevent goal drift in hierarchical LLM agents; practitioners should incorporate it into any multi-level planning pipeline (see the sketch after this list)
- Bounding the active prompt to linear depth scaling is essential for deploying recursive LLM agents on long-horizon tasks at reasonable cost; unbounded context accumulation is a practical bottleneck worth addressing architecturally
- Plan-ahead decomposition with dynamic refinement outperforms both rigid upfront planning and purely reactive sequential prompting, suggesting a middle-ground strategy is most robust for real-world agentic tasks with uncertainty
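As an illustration of the first takeaway, a re-injection prompt might be assembled roughly as follows; the exact field wording is our assumption, not the paper's template:

```python
def reinjection_prompt(ancestor_goals: list[str], finished: str, remaining: list[str]) -> str:
    # Restate every ancestor goal, top-down, before asking the model to continue.
    parts = [f"Level {i} goal: {g}" for i, g in enumerate(ancestor_goals)]
    parts.append(f"Just completed: {finished}")
    parts.append("Remaining plan:")
    parts.extend(f"- {t}" for t in remaining)
    parts.append("Continue with the next subtask, staying consistent with all goals above.")
    return "\n".join(parts)
```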
Abstract
Long-horizon tasks requiring multi-step reasoning and dynamic re-planning remain challenging for large language models (LLMs). Sequential prompting methods are prone to context drift, loss of goal information, and recurrent failure cycles, while hierarchical prompting methods often weaken cross-level continuity or incur substantial runtime overhead. We introduce ReCAP (Recursive Context-Aware Reasoning and Planning), a hierarchical framework with shared context for reasoning and planning in LLMs. ReCAP combines three key mechanisms: (i) plan-ahead decomposition, in which the model generates a full subtask list, executes the first item, and refines the remainder; (ii) structured re-injection of parent plans, maintaining consistent multi-level context during recursive return; and (iii) memory-efficient execution, bounding the active prompt so costs scale linearly with task depth. Together these mechanisms align high-level goals with low-level actions, reduce redundant prompting, and preserve coherent context updates across recursion. Experiments demonstrate that ReCAP substantially improves subgoal alignment and success rates on various long-horizon reasoning benchmarks, achieving a 32% gain on synchronous Robotouille and a 29% improvement on asynchronous Robotouille under the strict pass@1 protocol.