iCLP: Large Language Model Reasoning with Implicit Cognition Latent Planning
Problem Statement
Explicit textual planning in LLMs suffers from hallucinations and poor generalization across diverse task-specific questions, making step-by-step reasoning unreliable. Current chain-of-thought approaches require verbose intermediate plans that are difficult to generate accurately and add computational overhead. There is a gap between how humans leverage subconscious, generalized reasoning patterns and how LLMs perform deliberate, explicit planning.
Key Novelty
- Introduction of latent plans (LPs) as discrete, compact encodings of reasoning instructions learned via a vector-quantized autoencoder with a codebook, enabling planning in latent space while reasoning in language space
- A three-stage pipeline: distilling explicit plans from reasoning trajectories, learning discrete representations via VQ-autoencoder, and fine-tuning LLMs on paired latent plan–reasoning step data
- Strong cross-domain generalization of latent plans across different task types (math reasoning and code generation) while preserving chain-of-thought interpretability
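The core of the novelty is the vector-quantization step: snapping each continuous plan embedding to its nearest entry in a finite codebook, which yields the discrete latent-plan codes. A minimal sketch of that step, with a random codebook standing in for the learned one (all sizes here are illustrative, not the paper's):

```python
# Sketch of the vector-quantization step behind latent plans: each
# continuous plan embedding is mapped to its nearest codebook entry,
# yielding a discrete latent-plan code. The codebook is random here
# for illustration; in iCLP it is learned with the autoencoder.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 16))   # 64 latent-plan codes, dim 16

def quantize(z):
    """Map continuous plan embeddings z of shape (n, 16) to discrete codes."""
    # Squared Euclidean distance from each embedding to each code vector
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)             # discrete latent-plan indices
    return idx, codebook[idx]          # indices and quantized vectors

z = rng.normal(size=(3, 16))           # e.g. encoder outputs for 3 plans
idx, z_q = quantize(z)
```

Because the output is an index into a finite vocabulary, the LLM can later treat each plan as a single discrete token rather than a verbose text span.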
Evaluation Highlights
- Significant accuracy improvements over baselines on mathematical reasoning benchmarks, demonstrating that latent planning outperforms both no-planning and explicit textual planning approaches
- Improved efficiency (reduced token overhead) compared to explicit textual plan generation, with strong cross-domain transfer between mathematical reasoning and code generation tasks
Methodology
- Step 1 - Plan Distillation: Extract and distill explicit textual plans from existing step-by-step reasoning trajectories (e.g., chain-of-thought solutions) to create a supervised plan corpus
- Step 2 - Latent Representation Learning: Train a vector-quantized (VQ) autoencoder to encode explicit plans into discrete latent codes stored in a codebook, learning compact generalizable plan representations
- Step 3 - LLM Fine-tuning with Latent Plans: Fine-tune the LLM on paired data (latent plan tokens + corresponding reasoning steps), teaching the model to generate appropriate latent plan codes before producing reasoning chains
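The output of Step 3 is paired supervision in which a discrete plan code precedes the reasoning it guides. A hedged sketch of what one such training example might look like; the `<LP_k>` special-token naming and the exact pairing format are assumptions, not the paper's specification:

```python
# Illustrative sketch of Step 3's data pairing (the "<LP_k>" special
# tokens are an assumed format, not the paper's exact one): each
# distilled plan is replaced by its discrete codebook index, and the
# fine-tuning target interleaves the latent-plan token with the
# original reasoning steps.

def make_training_pair(question, plan_code, reasoning_steps):
    """Build one supervised example: latent plan token first, then reasoning."""
    lp_token = f"<LP_{plan_code}>"        # one special token per codebook entry
    target = lp_token + " " + " ".join(reasoning_steps)
    return {"prompt": question, "target": target}

pair = make_training_pair(
    "What is 12 * 7?",
    plan_code=5,                          # index produced by the VQ codebook
    reasoning_steps=["12 * 7 = 84.", "The answer is 84."],
)
```

Fine-tuning on such pairs teaches the model to emit a plan code before its chain of thought, so at inference time planning happens in latent space while the reasoning itself stays in language space.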
System Components
- Implicit Cognition (IC): conceptual framework borrowed from cognitive science, referring to subconscious, generalizable decision-guiding patterns that do not require explicit verbalization; motivates compact latent planning
- Plan distillation module: extracts structured, explicit plans from existing step-by-step reasoning trajectories to create training supervision for the latent space
- VQ-autoencoder: encodes explicit textual plans into discrete latent codes via a learned codebook, enabling compact and generalizable plan representations
- Codebook: a finite set of learned discrete latent vectors that serve as the vocabulary of latent plans, allowing the LLM to select and compose reasoning strategies
- Latent-plan-conditioned LLM: the LLM fine-tuned to emit latent plan tokens before generating reasoning steps, enabling implicit planning that guides chain-of-thought without explicit text plans
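How the components fit together can be shown with a toy end-to-end view; the tiny string "codebook", the lookup functions, and the exact-match encoding are all illustrative stand-ins for the learned VQ encoder and decoder, not the paper's artifacts:

```python
# Toy end-to-end view of the components above (all names and the tiny
# "codebook" are illustrative stand-ins): a plan is encoded to a
# discrete code, the fine-tuned LLM would emit that code as a special
# token before reasoning, and the code can be decoded back to plan
# text for inspection, which is what keeps latent planning interpretable.

PLAN_CODEBOOK = {                      # discrete code -> distilled plan text
    0: "identify knowns; set up equation; solve; verify",
    1: "write function signature; handle edge cases; implement; test",
}

def encode_plan(plan_text):
    """Exact-match lookup, standing in for the learned VQ encoder."""
    # The real encoder maps many similar plans onto one shared code.
    for code, text in PLAN_CODEBOOK.items():
        if text == plan_text:
            return code
    raise KeyError("plan not in codebook")

def decode_code(code):
    """Recover the plan a latent token stands for (interpretability)."""
    return PLAN_CODEBOOK[code]

code = encode_plan("identify knowns; set up equation; solve; verify")
```

The decode direction is what distinguishes this design from fully opaque latent reasoning: any emitted plan token can be mapped back to a human-readable plan.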
Results
| Metric/Benchmark | Baseline (Explicit Text Plan / CoT) | iCLP (Latent Plan) | Delta |
|---|---|---|---|
| Mathematical Reasoning Accuracy | Standard CoT baseline | Significant improvement | Positive, substantial |
| Code Generation Accuracy | Standard CoT baseline | Significant improvement | Positive, substantial |
| Cross-domain Generalization | Poor transfer | Strong transfer | Large qualitative gain |
| Planning Token Efficiency | High (verbose text plans) | Low (compact latent codes) | Reduced overhead |
Key Takeaways
- Replacing verbose textual plans with compact discrete latent codes (via VQ-autoencoders) is a viable and effective way to inject planning capability into LLMs while reducing hallucination risk in plan generation
- The three-stage pipeline (distill → encode → fine-tune) is modular and could be applied to other reasoning domains beyond math and code, making it a reusable recipe for practitioners looking to improve LLM reasoning
- Latent planning preserves chain-of-thought interpretability in the output while abstracting the planning process, offering a practical middle ground between opaque latent reasoning and fragile explicit textual planning
Abstract
Large language models (LLMs), when guided by explicit textual plans, can perform reliable step-by-step reasoning during problem-solving. However, generating accurate and effective textual plans remains challenging due to LLM hallucinations and the high diversity of task-specific questions. To address this, we draw inspiration from human Implicit Cognition (IC), the subconscious process by which decisions are guided by compact, generalized patterns learned from past experiences without requiring explicit verbalization. We propose iCLP, a novel framework that enables LLMs to adaptively generate latent plans (LPs), which are compact encodings of effective reasoning instructions. iCLP first distills explicit plans from existing step-by-step reasoning trajectories. It then learns discrete representations of these plans via a vector-quantized autoencoder coupled with a codebook. Finally, by fine-tuning LLMs on paired latent plans and corresponding reasoning steps, the models learn to perform implicit planning during reasoning. Experimental results on mathematical reasoning and code generation tasks demonstrate that, with iCLP, LLMs can plan in latent space while reasoning in language space. This approach yields significant improvements in both accuracy and efficiency and, crucially, demonstrates strong cross-domain generalization while preserving the interpretability of chain-of-thought reasoning.