
Synthesizing question answering data from financial documents: An End-to-End Multi-Agent Approach

Chetan Harsha, Karmvir Singh Phogat, Sridhar Dasaratha, Shashishekar Ramakrishna
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track) | 2026
An end-to-end multi-agent pipeline automatically synthesizes high-quality question-answer pairs from unstructured financial documents, enabling cost-effective fine-tuning of small language models (SLMs) for numerical reasoning tasks without manual annotation.

Problem Statement

Deploying LLMs for financial numerical reasoning is expensive and slow at enterprise scale, while SLMs require high-quality domain-specific fine-tuning data that traditionally demands costly manual expert annotation. This bottleneck limits the practical adoption of smaller, more efficient models in financial NLP applications. The scattered and heterogeneous nature of financial documents further complicates automated data extraction and QA generation.

Key Novelty

  • Modular, scalable agentic pipeline that autonomously extracts, selects, and structures relevant content from unstructured financial documents end-to-end
  • Automated synthetic QA data generation tailored for numerical reasoning over financial documents, replacing the manual annotation bottleneck
  • Demonstrated that SLMs fine-tuned on pipeline-generated synthetic data achieve competitive in-distribution performance and superior out-of-distribution generalization compared to models trained on manually curated data

Evaluation Highlights

  • One SLM trained on synthetic data achieved competitive in-distribution performance relative to models trained on prior manually generated datasets
  • All tested SLMs fine-tuned on synthetic data demonstrated superior generalization (out-of-distribution performance) compared to counterparts trained on manual data

Breakthrough Assessment

5/10. The work makes a solid, practically valuable contribution by automating financial QA data synthesis with a multi-agent framework, but the core ideas (agentic pipelines, synthetic data generation for fine-tuning) are established paradigms applied to a specific domain rather than a fundamental methodological advance.

Methodology

  1. Step 1 – Content Extraction & Selection: Agents parse unstructured financial documents, identify numerically relevant sections (tables, narratives), and select content pertinent to complex reasoning queries
  2. Step 2 – QA Pair Generation: A generation agent synthesizes diverse, high-quality question-answer pairs requiring numerical reasoning from the selected content, mimicking expert annotation patterns
  3. Step 3 – SLM Fine-tuning & Evaluation: Small language models are fine-tuned on the synthetic QA dataset and evaluated on both in-distribution and out-of-distribution benchmarks against models trained on manually annotated data
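The paper does not publish the agents' implementations, but the data flow of Steps 1–2 can be illustrated with a minimal Python sketch. This uses simple regex heuristics as stand-ins where the paper uses LLM agents, and omits Step 3 (fine-tuning); all function and class names (`extract_numeric_passages`, `select_content`, `generate_qa`, `QAPair`) are hypothetical, not from the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    context: str


def extract_numeric_passages(document: str) -> list[str]:
    """Step 1 stand-in: keep sentences that contain at least one number."""
    sentences = re.split(r"(?<=[.!?])\s+", document)
    return [s for s in sentences if re.search(r"\d", s)]


def select_content(passages: list[str], max_passages: int = 3) -> list[str]:
    """Step 1 stand-in: rank passages by how many numeric tokens they carry."""
    return sorted(
        passages,
        key=lambda s: len(re.findall(r"\d[\d,.]*", s)),
        reverse=True,
    )[:max_passages]


def generate_qa(passages: list[str]) -> list[QAPair]:
    """Step 2 stand-in: template a question about the first figure in each passage."""
    pairs = []
    for p in passages:
        m = re.search(r"\d[\d,.]*", p)
        if m:
            pairs.append(QAPair(
                question=f"What figure is reported in: '{p}'?",
                answer=m.group(),
                context=p,
            ))
    return pairs


if __name__ == "__main__":
    doc = "Revenue grew to 120 million in 2023. The CEO was pleased. Costs were 80 million."
    for pair in generate_qa(select_content(extract_numeric_passages(doc))):
        print(pair.question, "->", pair.answer)
```

In the actual system each of these functions would be an LLM-backed agent; the sketch only shows how content narrows at each stage before QA pairs are emitted.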

System Components

Document Parsing Agent

Extracts and structures raw content (text, tables, figures) from unstructured financial documents such as annual reports and earnings filings

Content Selection Agent

Identifies and filters the most relevant numerical and contextual information needed for complex financial reasoning questions

QA Generation Agent

Generates diverse question-answer pairs requiring multi-step numerical reasoning from the selected financial content

Quality Control Module

Ensures generated QA pairs meet domain-specific quality standards before inclusion in the fine-tuning dataset
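The paper describes a quality-control stage but not its internals. One common check for numerical-reasoning QA data is to recompute the stated answer from a derivation program and reject mismatches; the sketch below is a hypothetical verifier of that kind (safe arithmetic evaluation via Python's `ast`), not the authors' implementation.

```python
import ast
import operator

# Whitelisted binary operators for safe arithmetic evaluation.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str) -> float:
    """Evaluate a pure-arithmetic expression; reject anything else."""
    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)


def passes_numeric_check(program: str, stated_answer: float, tol: float = 1e-6) -> bool:
    """Accept a QA pair only if re-executing its derivation matches the answer."""
    try:
        return abs(safe_eval(program) - stated_answer) <= tol
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False
```

For example, a generated pair claiming a margin change of 0.5 with derivation `(120 - 80) / 80` would pass, while an answer inconsistent with its own arithmetic would be filtered out before fine-tuning.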

SLM Fine-tuning Pipeline

Orchestrates supervised fine-tuning of small language models using the synthetic QA dataset to enable cost-effective deployment
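One way the components above could be wired together is as a chain of agents that each read and augment a shared state. The following is a hypothetical sketch of that modular pattern with trivial stand-ins for each agent; the real system's agents, interfaces, and state are not described at this level in the paper.

```python
from typing import Callable

# Each agent reads the shared state dict and returns it augmented.
Agent = Callable[[dict], dict]


def parsing_agent(state: dict) -> dict:
    """Document Parsing Agent stand-in: naive sentence split."""
    state["passages"] = state["raw"].split(". ")
    return state


def selection_agent(state: dict) -> dict:
    """Content Selection Agent stand-in: keep numeric passages."""
    state["selected"] = [p for p in state["passages"] if any(c.isdigit() for c in p)]
    return state


def qa_agent(state: dict) -> dict:
    """QA Generation Agent stand-in: template a question per passage."""
    state["qa"] = [
        {
            "question": f"What amount is reported in '{p}'?",
            "answer": next(
                (tok.strip(".,") for tok in p.split() if tok.strip(".,").isdigit()),
                None,
            ),
            "context": p,
        }
        for p in state["selected"]
    ]
    return state


def quality_agent(state: dict) -> dict:
    """Quality Control stand-in: drop pairs without a recoverable answer."""
    state["qa"] = [pair for pair in state["qa"] if pair["answer"]]
    return state


def run_pipeline(agents: list[Agent], raw: str) -> dict:
    """Run agents in sequence over a shared state."""
    state = {"raw": raw}
    for agent in agents:
        state = agent(state)
    return state
```

The appeal of this structure, and plausibly part of what the paper means by "modular," is that any single agent can be swapped (e.g., a stronger parser) without touching the rest of the chain.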

Results

Metric/Benchmark | Baseline (Manual Data) | This Paper (Synthetic Data) | Delta
In-distribution performance (best SLM) | Competitive reference | Competitive (matches baseline) | ~Neutral
Out-of-distribution generalization (all SLMs) | Lower generalization | Superior generalization | Positive improvement
Manual annotation effort | High (expert required) | Eliminated (automated) | Significant reduction

Key Takeaways

  • Multi-agent pipelines can effectively replace costly manual annotation for domain-specific QA datasets, making SLM fine-tuning more accessible for enterprise financial NLP
  • Synthetic data generated by agentic pipelines can yield better generalization than manually curated data, suggesting that diversity and scale from automation may outweigh the precision of human annotation
  • Practitioners should consider modular, agent-based data synthesis as a first-pass strategy when entering new financial sub-domains where labeled data is scarce or expensive to acquire

Abstract

Answering complex questions that require numerical reasoning over financial documents is challenging due to the diverse and scattered nature of relevant information. While large language models (LLMs) excel at financial reasoning, their enterprise deployment is often limited by cost and latency. Small language models (SLMs) present a cost-effective alternative but need to be fine-tuned with high-quality, domain-specific question-answer (QA) data. Acquiring such data requires manual expert annotation, presenting a bottleneck to the wider application of SLMs. This work introduces a modular, scalable end-to-end agentic pipeline that extracts and selects relevant content from unstructured financial documents and then generates QA pairs from the selected content for SLM fine-tuning. Compared to the same models trained on previous manually generated data for the task, one of the models trained on our pipeline-produced synthetic data achieved competitive in-distribution performance, and all tested models demonstrated superior generalization. The framework thus demonstrates considerable potential to accelerate the deployment of smaller, cost-effective models by reducing manual data creation efforts.

Generated on 2026-04-01 using Claude