TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning
Problem Statement
Existing LLM-based table reasoning methods rely on single-turn, single-pass paradigms that flatten tables into text, causing context overflow on large tables and weak numerical sensitivity, and leaving no mechanism for iterative tool use or self-correction. These limitations prevent models from handling complex, real-world table reasoning tasks reliably, motivating a trained agentic approach rather than training-free workflow orchestration.
Key Novelty
- Two-stage training strategy (SFT + RL) that teaches a lightweight LLM to autonomously plan, act, and reflect on table reasoning tasks—rather than relying on prompt engineering or frozen large models
- Multi-perspective reward scheme in the RL stage that guides precise code generation for numerical operations, going beyond simple correctness signals
- Construction and filtering pipeline for high-quality structured table reasoning data to bootstrap SFT, enabling internalization of programmatic reasoning patterns
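The construction-and-filtering idea above can be sketched as an execution-based filter: keep only trajectories whose final code runs cleanly and reproduces the gold answer. This is a minimal illustration, not the paper's actual pipeline; the function names and trajectory fields (`final_code`, `gold_answer`) are assumptions.

```python
# Hypothetical SFT data-filtering step: retain only trajectories whose final
# generated code executes without error and matches the gold answer.
import contextlib
import io


def run_code(code: str) -> tuple[bool, str]:
    """Execute a candidate code snippet and capture its printed output."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return True, buf.getvalue().strip()
    except Exception:
        return False, ""


def filter_trajectories(trajectories: list[dict]) -> list[dict]:
    """Keep trajectories that are both executable and answer-consistent."""
    kept = []
    for traj in trajectories:
        ok, output = run_code(traj["final_code"])
        if ok and output == str(traj["gold_answer"]):
            kept.append(traj)
    return kept


sample = [
    {"final_code": "print(2 + 3)", "gold_answer": 5},          # executable, correct
    {"final_code": "print(2 + 3)", "gold_answer": 6},          # executable, wrong
    {"final_code": "print(undefined_var)", "gold_answer": 5},  # crashes
]
print(len(filter_trajectories(sample)))  # -> 1: only the first survives
```

A real pipeline would add further quality gates (trajectory length, reasoning-format checks, deduplication), but execution-plus-answer consistency is the core filter this kind of bootstrapping depends on.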
Evaluation Highlights
- TableMind consistently outperforms previous baselines across diverse table reasoning benchmarks, demonstrating generalization of the autonomous agent approach
- Ablations of the two-stage training (SFT + RL) show that each stage contributes incrementally to overall performance over single-stage or training-free baselines
Methodology
- Stage 1 – Supervised Fine-Tuning (SFT): Construct and filter high-quality multi-turn programmatic reasoning trajectories for table tasks, then fine-tune a lightweight LLM to internalize planning, code generation, and reflection behaviors
- Stage 2 – Reinforcement Learning (RL): Apply a multi-perspective reward scheme (covering correctness, code quality, and reasoning coherence) with a novel optimization objective to refine the model's code generation precision and tool-use policy
- Inference – Autonomous Multi-Turn Execution: The trained agent iteratively decomposes table questions, generates and executes code via tools, observes results, and reflects/corrects until a final answer is produced
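The inference loop in Stage 3 can be sketched with a scripted stand-in for the trained policy: the agent emits code actions, a Python-interpreter tool executes them in a persistent environment, and each observation is fed back for reflection until a final answer is produced. All names (`scripted_agent`, the `ACT:`/`ANSWER:` protocol, the toy table column) are illustrative assumptions, not the paper's interface.

```python
# Minimal plan-act-observe-reflect loop with a mock policy and a Python tool.
import contextlib
import io


def python_tool(code: str, env: dict) -> str:
    """Execute code in a persistent environment; return stdout or the error."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, env)
        return buf.getvalue().strip()
    except Exception as e:
        return f"ERROR: {e}"


def scripted_agent(history: list[str]) -> str:
    """Stand-in policy: load a column, aggregate it, then answer."""
    turn = len(history)
    if turn == 0:
        return "ACT: sales = [120, 80, 200]"   # plan: extract relevant column
    if turn == 1:
        return "ACT: print(sum(sales))"        # act: numerical operation
    return f"ANSWER: {history[-1]}"            # reflect on last observation


def run_agent(max_turns: int = 5) -> str:
    env, history = {}, []
    for _ in range(max_turns):
        step = scripted_agent(history)
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER: ")
        observation = python_tool(step.removeprefix("ACT: "), env)
        history.append(observation)  # observation becomes reflection input
    return "no answer"


print(run_agent())  # -> 400
```

In the trained system the scripted policy is replaced by the fine-tuned LLM, and error observations (`ERROR: ...`) trigger the self-correction behavior: the next turn revises the offending code rather than answering.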
System Components
A lightweight LLM fine-tuned to operate as an autonomous agent that generates code snippets as actions, calls external tools (e.g., Python interpreter), and processes tool outputs across multiple turns
A pipeline that generates and curates high-quality multi-turn reasoning traces over tables, bootstrapping the model with structured planning and code generation capabilities
An RL reward design evaluating code execution correctness, answer accuracy, and potentially reasoning coherence to provide rich training signal beyond binary outcome rewards
A custom objective function for the RL stage tailored to the code generation setting in table reasoning, improving precision of numerical and logical operations
An internalized self-correction capability allowing the agent to detect errors from tool feedback and revise its reasoning or code in subsequent turns
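The multi-perspective reward component above can be illustrated as a weighted blend of execution success, answer accuracy, and a coherence signal, in contrast to a binary outcome reward. The weights and the coherence proxy here are assumptions for illustration, not the paper's actual reward design.

```python
# Illustrative multi-perspective RL reward: blends execution, answer, and
# coherence terms into one scalar instead of a binary correctness signal.
def multi_perspective_reward(executed_ok: bool,
                             predicted: str,
                             gold: str,
                             reasoning: str,
                             w_exec: float = 0.2,
                             w_answer: float = 0.6,
                             w_coherence: float = 0.2) -> float:
    r_exec = 1.0 if executed_ok else 0.0
    r_answer = 1.0 if predicted.strip() == gold.strip() else 0.0
    # Toy coherence proxy: reward non-empty, step-structured reasoning.
    r_coherence = 1.0 if reasoning and "step" in reasoning.lower() else 0.0
    return w_exec * r_exec + w_answer * r_answer + w_coherence * r_coherence


# A correct, executable, step-structured rollout earns the full reward;
# a crashed rollout with a lucky right answer still loses the execution term.
print(multi_perspective_reward(True, "400", "400", "Step 1: sum sales"))  # 1.0
print(multi_perspective_reward(False, "400", "400", ""))                  # 0.6
```

The point of the decomposition is gradient shaping: partial credit for executable code or coherent plans gives the RL stage a denser signal than answer-only rewards, which matters most early in training when fully correct rollouts are rare.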
Results
| Benchmark | Best Prior Baseline | TableMind | Delta |
|---|---|---|---|
| WikiTableQuestions | Strong LLM-based baseline | State-of-the-art (consistent improvement) | Positive |
| TAT-QA (numerical) | Tool-augmented baseline | Outperforms (precise numerical ops) | Positive |
| HiTab (hierarchical tables) | Single-turn LLM method | Consistent improvement | Positive |
| Overall (diverse benchmarks) | Previous SOTA | Consistently higher | Positive across all |
Key Takeaways
- Training a lightweight LLM end-to-end as an agent (via SFT + RL) is more effective for table reasoning than prompting large frozen models with agentic workflows—internalization of planning and reflection matters
- A multi-perspective reward in RL is critical for code generation quality; single-signal rewards (e.g., only answer correctness) are insufficient for precise numerical table operations
- Data quality for SFT bootstrapping is a key bottleneck—investing in rigorous construction and filtering of multi-turn reasoning traces pays significant dividends before RL fine-tuning
Abstract
Table reasoning requires models to jointly perform comprehensive semantic understanding and precise numerical operations. Although recent large language model (LLM)-based methods have achieved promising results, most of them still rely on a single-turn reasoning paradigm that processes flattened tables in a single forward pass. This paradigm suffers from inherent limitations, including context overflow on large tables, weak sensitivity to continuous numerical values, and the absence of explicit tool use and reflection. In this paper, we propose TableMind, a tuning-based autonomous programmatic table agent that simulates the human-like cognitive schema of multi-turn interaction within a lightweight LLM. Instead of adopting a training-free workflow design, TableMind learns to internalize planning, action, and reflection through a principled two-stage training strategy. To bootstrap structured table reasoning capabilities, we construct and filter high-quality reasoning data for the supervised fine-tuning (SFT) stage. To enable precise code generation, we introduce a carefully designed multi-perspective reward scheme and a novel optimization objective in the reinforcement learning (RL) stage. Extensive experiments on diverse benchmarks demonstrate that TableMind consistently outperforms previous baselines, validating the effectiveness of training autonomous agents to improve overall performance.