TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning
Problem Statement
Existing LLM-based table reasoning methods rely on single-turn, single-pass paradigms that flatten tables into text, causing context overflow on large tables and weak numerical sensitivity, and leaving no mechanism for iterative tool use or self-correction. These limitations prevent models from handling complex, real-world table reasoning tasks reliably, motivating a trained agentic approach rather than training-free workflow orchestration.
Key Novelty
- Two-stage training strategy (SFT + RL) that teaches a lightweight LLM to autonomously plan, act, and reflect on table reasoning tasks—rather than relying on prompt engineering or frozen large models
- Multi-perspective reward scheme in the RL stage that guides precise code generation for numerical operations, going beyond simple correctness signals
- Construction and filtering pipeline for high-quality structured table reasoning data to bootstrap SFT, enabling internalization of programmatic reasoning patterns
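The construction-and-filtering idea above can be sketched as an execution-based filter: keep only trajectories whose final code runs cleanly and reproduces the gold answer. This is a minimal illustration, not the paper's actual pipeline; the function names and trajectory fields (`final_code`, `gold_answer`) are assumptions.

```python
# Hypothetical SFT data-filtering step: retain only trajectories whose final
# generated code executes without error and matches the gold answer.
import contextlib
import io


def run_code(code: str) -> tuple[bool, str]:
    """Execute a candidate code snippet and capture its printed output."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return True, buf.getvalue().strip()
    except Exception:
        return False, ""


def filter_trajectories(trajectories: list[dict]) -> list[dict]:
    """Keep trajectories that are both executable and answer-consistent."""
    kept = []
    for traj in trajectories:
        ok, output = run_code(traj["final_code"])
        if ok and output == str(traj["gold_answer"]):
            kept.append(traj)
    return kept


sample = [
    {"final_code": "print(2 + 3)", "gold_answer": 5},          # executable, correct
    {"final_code": "print(2 + 3)", "gold_answer": 6},          # executable, wrong
    {"final_code": "print(undefined_var)", "gold_answer": 5},  # crashes
]
print(len(filter_trajectories(sample)))  # -> 1: only the first survives
```

A real pipeline would add further quality gates (trajectory length, reasoning-format checks, deduplication), but execution-plus-answer consistency is the core filter this kind of bootstrapping depends on.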
Evaluation Highlights
- TableMind consistently outperforms previous baselines across diverse table reasoning benchmarks, demonstrating generalization of the autonomous agent approach
- Ablations of the two-stage training (SFT + RL) show that each stage contributes incrementally to overall performance over single-stage or training-free baselines
Methodology
- Stage 1 – Supervised Fine-Tuning (SFT): Construct and filter high-quality multi-turn programmatic reasoning trajectories for table tasks, then fine-tune a lightweight LLM to internalize planning, code generation, and reflection behaviors
- Stage 2 – Reinforcement Learning (RL): Apply a multi-perspective reward scheme (covering correctness, code quality, and reasoning coherence) with a novel optimization objective to refine the model's code generation precision and tool-use policy
- Inference – Autonomous Multi-Turn Execution: The trained agent iteratively decomposes table questions, generates and executes code via tools, observes results, and reflects/corrects until a final answer is produced
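The inference loop in Stage 3 can be sketched with a scripted stand-in for the trained policy: the agent emits code actions, a Python-interpreter tool executes them in a persistent environment, and each observation is fed back for reflection until a final answer is produced. All names (`scripted_agent`, the `ACT:`/`ANSWER:` protocol, the toy table column) are illustrative assumptions, not the paper's interface.

```python
# Minimal plan-act-observe-reflect loop with a mock policy and a Python tool.
import contextlib
import io


def python_tool(code: str, env: dict) -> str:
    """Execute code in a persistent environment; return stdout or the error."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, env)
        return buf.getvalue().strip()
    except Exception as e:
        return f"ERROR: {e}"


def scripted_agent(history: list[str]) -> str:
    """Stand-in policy: load a column, aggregate it, then answer."""
    turn = len(history)
    if turn == 0:
        return "ACT: sales = [120, 80, 200]"   # plan: extract relevant column
    if turn == 1:
        return "ACT: print(sum(sales))"        # act: numerical operation
    return f"ANSWER: {history[-1]}"            # reflect on last observation


def run_agent(max_turns: int = 5) -> str:
    env, history = {}, []
    for _ in range(max_turns):
        step = scripted_agent(history)
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER: ")
        observation = python_tool(step.removeprefix("ACT: "), env)
        history.append(observation)  # observation becomes reflection input
    return "no answer"


print(run_agent())  # -> 400
```

In the trained system the scripted policy is replaced by the fine-tuned LLM, and error observations (`ERROR: ...`) trigger the self-correction behavior: the next turn revises the offending code rather than answering.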
System Components
A lightweight LLM fine-tuned to operate as an autonomous agent that generates code snippets as actions, calls external tools (e.g., Python interpreter), and processes tool outputs across multiple turns
A pipeline that generates and curates high-quality multi-turn reasoning traces over tables, bootstrapping the model with structured planning and code generation capabilities
An RL reward design evaluating code execution correctness, answer accuracy, and potentially reasoning coherence to provide rich training signal beyond binary outcome rewards
A custom objective function for the RL stage tailored to the code generation setting in table reasoning, improving precision of numerical and logical operations
An internalized self-correction capability allowing the agent to detect errors from tool feedback and revise its reasoning or code in subsequent turns
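The multi-perspective reward component above can be illustrated as a weighted blend of execution success, answer accuracy, and a coherence signal, in contrast to a binary outcome reward. The weights and the coherence proxy here are assumptions for illustration, not the paper's actual reward design.

```python
# Illustrative multi-perspective RL reward: blends execution, answer, and
# coherence terms into one scalar instead of a binary correctness signal.
def multi_perspective_reward(executed_ok: bool,
                             predicted: str,
                             gold: str,
                             reasoning: str,
                             w_exec: float = 0.2,
                             w_answer: float = 0.6,
                             w_coherence: float = 0.2) -> float:
    r_exec = 1.0 if executed_ok else 0.0
    r_answer = 1.0 if predicted.strip() == gold.strip() else 0.0
    # Toy coherence proxy: reward non-empty, step-structured reasoning.
    r_coherence = 1.0 if reasoning and "step" in reasoning.lower() else 0.0
    return w_exec * r_exec + w_answer * r_answer + w_coherence * r_coherence


# A correct, executable, step-structured rollout earns the full reward;
# a crashed rollout with a lucky right answer still loses the execution term.
print(multi_perspective_reward(True, "400", "400", "Step 1: sum sales"))  # 1.0
print(multi_perspective_reward(False, "400", "400", ""))                  # 0.6
```

The point of the decomposition is gradient shaping: partial credit for executable code or coherent plans gives the RL stage a denser signal than answer-only rewards, which matters most early in training when fully correct rollouts are rare.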
Results
| Benchmark | Best Prior Baseline | TableMind | Delta |
|---|---|---|---|
| WikiTableQuestions | Strong LLM-based baseline | State-of-the-art (consistent improvement) | Positive |
| TAT-QA (numerical) | Tool-augmented baseline | Outperforms (precise numerical ops) | Positive |
| HiTab (hierarchical tables) | Single-turn LLM method | Consistent improvement | Positive |
| Overall (diverse benchmarks) | Previous SOTA | Consistently higher | Positive across all |
Key Takeaways
- Training a lightweight LLM end-to-end as an agent (via SFT + RL) is more effective for table reasoning than prompting large frozen models with agentic workflows—internalization of planning and reflection matters
- A multi-perspective reward in RL is critical for code generation quality; single-signal rewards (e.g., only answer correctness) are insufficient for precise numerical table operations
- Data quality for SFT bootstrapping is a key bottleneck—investing in rigorous construction and filtering of multi-turn reasoning traces pays significant dividends before RL fine-tuning
Abstract
Table reasoning requires models to jointly perform comprehensive semantic understanding and precise numerical operations. Although recent large language model (LLM)-based methods have achieved promising results, most of them still rely on a single-turn reasoning paradigm that processes flattened tables in a single forward pass. This paradigm suffers from inherent limitations, including context overflow on large tables, weak sensitivity to continuous numerical values, and the absence of explicit tool use and reflection. In this paper, we propose TableMind, a tuning-based autonomous programmatic table agent that simulates the human-like cognitive schema of multi-turn interaction within a lightweight LLM. Instead of adopting a training-free workflow design, TableMind learns to internalize planning, action, and reflection through a principled two-stage training strategy. To bootstrap structured table reasoning capabilities, we construct and filter high-quality reasoning data for the supervised fine-tuning (SFT) stage. To enable precise code generation, we introduce a carefully designed multi-perspective reward scheme and a novel optimization objective in the reinforcement learning (RL) stage. Extensive experiments on diverse benchmarks demonstrate that TableMind consistently outperforms previous baselines, validating the effectiveness of training autonomous agents to improve overall performance.