Efficient Text Classification with Conformal In-Context Learning
Problem Statement
LLM-based text classification is expensive due to long prompts with many candidate classes, and performance is highly sensitive to prompt design. Existing few-shot prompting approaches scale poorly with the number of classes and incur high computational costs. CICLe was previously validated only in a single domain, leaving its generalizability and efficiency benefits unproven.
Key Novelty
- Comprehensive multi-benchmark evaluation of CICLe across diverse NLP classification tasks, establishing its generalizability beyond a single domain
- Demonstration that CICLe reduces prompt length by up to 25.16% and the number of in-context shots by up to 34.45% through adaptive class-set reduction via Conformal Prediction
- Empirical evidence that CICLe is especially advantageous for highly class-imbalanced datasets and enables competitive performance with smaller LLMs
Evaluation Highlights
- CICLe reduces the number of in-context shots and prompt length by up to 34.45% and 25.16% respectively compared to full few-shot prompting baselines
- CICLe consistently outperforms its base classifier and few-shot prompting baselines when sufficient training data is available, and matches them in low-data regimes
Methodology
- Train a lightweight base classifier (e.g., logistic regression or small neural model) on available labeled data for the target classification task
- Apply Conformal Prediction to the base classifier's output to construct a prediction set that is statistically guaranteed to contain the true class with a user-specified probability, shrinking the candidate class set for each input
- Use only the reduced candidate classes to construct a shorter, more focused few-shot LLM prompt, then query the LLM for the final classification decision
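The calibration step at the core of this pipeline can be sketched with split conformal prediction using the LAC nonconformity score (1 minus the probability assigned to the true class). The simulated classifier below is a stand-in for illustration only; it is not CICLe's actual base model or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_cal, n_test = 10, 500, 500

def simulate_proba(y_true):
    """Stand-in for a trained base classifier: softmax scores
    that are (noisily) peaked on the true class."""
    logits = rng.normal(size=(len(y_true), n_classes))
    logits[np.arange(len(y_true)), y_true] += 2.0
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# 1. Calibrate on held-out labeled data (LAC score: 1 - p of the true class).
y_cal = rng.integers(0, n_classes, n_cal)
scores = 1.0 - simulate_proba(y_cal)[np.arange(n_cal), y_cal]

alpha = 0.1  # target coverage 1 - alpha = 90%
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(scores, q_level, method="higher")

# 2. At inference, keep every class whose score stays under the threshold.
def prediction_set(p_row):
    return np.flatnonzero(1.0 - p_row <= qhat)

y_test = rng.integers(0, n_classes, n_test)
p_test = simulate_proba(y_test)
sets = [prediction_set(p) for p in p_test]
coverage = np.mean([y in s for y, s in zip(y_test, sets)])
avg_size = np.mean([len(s) for s in sets])
print(f"coverage={coverage:.2f}, avg set size={avg_size:.2f} of {n_classes}")
```

On this toy setup the empirical coverage lands near the 90% target while the average prediction set is far smaller than the full label space, which is exactly the property CICLe exploits to shorten prompts.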
System Components
- Base classifier: a traditional or small ML model trained on labeled data that provides initial class probability estimates for each input
- Conformal predictor: a statistically grounded uncertainty quantification method that converts the base classifier's scores into prediction sets with guaranteed coverage, filtering out unlikely classes
- Prompt constructor: builds few-shot prompts using only the conformal prediction set of candidate classes, reducing prompt length and computational cost while maintaining accuracy
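The prompt-construction step can be sketched as follows. The template, function name, and shot format here are illustrative assumptions, not CICLe's actual prompt design:

```python
def build_prompt(text, candidate_classes, shots_by_class, k_per_class=2):
    """Assemble a few-shot prompt restricted to the conformal candidate set.
    `shots_by_class` maps class name -> list of example texts (hypothetical
    format; the real CICLe template may differ)."""
    parts = [
        "Classify the text into exactly one of: "
        + ", ".join(candidate_classes) + "."
    ]
    for cls in candidate_classes:  # include shots only for surviving classes
        for example in shots_by_class.get(cls, [])[:k_per_class]:
            parts.append(f"Text: {example}\nLabel: {cls}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)

# Pruning the candidate set shrinks both the shot count and the prompt.
shots = {f"class_{i}": [f"example {i}a", f"example {i}b"] for i in range(10)}
full = build_prompt("new input", [f"class_{i}" for i in range(10)], shots)
reduced = build_prompt("new input", ["class_1", "class_2"], shots)
print(len(reduced), "<", len(full))
```

Because shots are drawn only for classes that survive conformal filtering, the prompt shrinks roughly in proportion to the prediction set size, which is where the reported shot and length reductions come from.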
Results
| Metric / setting | Few-shot LLM baseline | CICLe | Delta |
|---|---|---|---|
| Number of shots | Full shot set | Up to 34.45% fewer shots | -34.45% |
| Prompt length | Full prompt | Up to 25.16% shorter | -25.16% |
| Classification accuracy (sufficient data) | Few-shot baseline | Outperforms baseline | Positive |
| Classification accuracy (low-data regime) | Few-shot baseline | Comparable performance | ~0 |
| Class-imbalanced tasks | Few-shot baseline | Particularly advantageous | Positive |
Key Takeaways
- CICLe is a practical drop-in framework for reducing LLM API costs in text classification by pruning irrelevant classes before prompting, making it appealing for production deployments with many classes
- Practitioners with sufficient labeled data (enough to train a base classifier) should prefer CICLe over naive few-shot prompting, especially for tasks with many or imbalanced classes
- CICLe enables the use of smaller, cheaper LLMs while maintaining competitive accuracy, offering a cost-quality tradeoff lever that is highly relevant for resource-constrained settings
Abstract
Large Language Models (LLMs) demonstrate strong in-context learning abilities, yet their effectiveness in text classification depends heavily on prompt design and incurs substantial computational cost. Conformal In-Context Learning (CICLe) has been proposed as a resource-efficient framework that integrates a lightweight base classifier with Conformal Prediction to guide LLM prompting by adaptively reducing the set of candidate classes. However, its broader applicability and efficiency benefits beyond a single domain have not yet been systematically explored. In this paper, we present a comprehensive evaluation of CICLe across diverse NLP classification benchmarks. The results show that CICLe consistently improves over its base classifier and outperforms few-shot prompting baselines when the sample size is sufficient for training the base classifier, and performs comparably in low-data regimes. In terms of efficiency, CICLe reduces the number of shots and prompt length by up to 34.45% and 25.16%, respectively, and enables the use of smaller models with competitive performance. Furthermore, CICLe is particularly advantageous for text classification tasks with high class imbalance. These findings highlight CICLe as a practical and scalable approach for efficient text classification, combining the robustness of traditional classifiers with the adaptability of LLMs, and achieving substantial gains in data and computational efficiency.