Efficient Text Classification with Conformal In-Context Learning
Problem Statement
LLM-based text classification is expensive due to long prompts with many candidate classes, and performance is highly sensitive to prompt design. Existing few-shot prompting approaches scale poorly with the number of classes and incur high computational costs. CICLe was previously validated only in a single domain, leaving its generalizability and efficiency benefits unproven.
Key Novelty
- Comprehensive multi-benchmark evaluation of CICLe across diverse NLP classification tasks, establishing its generalizability beyond a single domain
- Demonstration that CICLe reduces prompt length by up to 25.16% and the number of in-context shots by up to 34.45% through adaptive class-set reduction via Conformal Prediction
- Empirical evidence that CICLe is especially advantageous for highly class-imbalanced datasets and enables competitive performance with smaller LLMs
Evaluation Highlights
- CICLe reduces the number of in-context shots and prompt length by up to 34.45% and 25.16% respectively compared to full few-shot prompting baselines
- CICLe consistently outperforms its base classifier and few-shot prompting baselines when sufficient training data is available, and matches them in low-data regimes
Methodology
- Train a lightweight base classifier (e.g., logistic regression or small neural model) on available labeled data for the target classification task
- Apply Conformal Prediction to the base classifier's output to construct a prediction set that is statistically guaranteed to contain the true class with a user-specified probability, shrinking the candidate class set for each input
- Use only the reduced candidate classes to construct a shorter, more focused few-shot LLM prompt, then query the LLM for the final classification decision
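The calibration step at the core of this pipeline can be sketched with split conformal prediction using the LAC nonconformity score (1 minus the probability assigned to the true class). The simulated classifier below is a stand-in for illustration only; it is not CICLe's actual base model or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_cal, n_test = 10, 500, 500

def simulate_proba(y_true):
    """Stand-in for a trained base classifier: softmax scores
    that are (noisily) peaked on the true class."""
    logits = rng.normal(size=(len(y_true), n_classes))
    logits[np.arange(len(y_true)), y_true] += 2.0
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# 1. Calibrate on held-out labeled data (LAC score: 1 - p of the true class).
y_cal = rng.integers(0, n_classes, n_cal)
scores = 1.0 - simulate_proba(y_cal)[np.arange(n_cal), y_cal]

alpha = 0.1  # target coverage 1 - alpha = 90%
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(scores, q_level, method="higher")

# 2. At inference, keep every class whose score stays under the threshold.
def prediction_set(p_row):
    return np.flatnonzero(1.0 - p_row <= qhat)

y_test = rng.integers(0, n_classes, n_test)
p_test = simulate_proba(y_test)
sets = [prediction_set(p) for p in p_test]
coverage = np.mean([y in s for y, s in zip(y_test, sets)])
avg_size = np.mean([len(s) for s in sets])
print(f"coverage={coverage:.2f}, avg set size={avg_size:.2f} of {n_classes}")
```

On this toy setup the empirical coverage lands near the 90% target while the average prediction set is far smaller than the full label space, which is exactly the property CICLe exploits to shorten prompts.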
System Components
- Base classifier: a traditional or small ML model trained on labeled data that provides initial class probability estimates for each input
- Conformal predictor: a statistically grounded uncertainty quantification method that converts the base classifier's scores into prediction sets with guaranteed coverage, filtering out unlikely classes
- Prompt constructor: builds few-shot prompts using only the conformal prediction set of candidate classes, reducing prompt length and computational cost while maintaining accuracy
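The prompt-construction step can be sketched as follows. The template, function name, and shot format here are illustrative assumptions, not CICLe's actual prompt design:

```python
def build_prompt(text, candidate_classes, shots_by_class, k_per_class=2):
    """Assemble a few-shot prompt restricted to the conformal candidate set.
    `shots_by_class` maps class name -> list of example texts (hypothetical
    format; the real CICLe template may differ)."""
    parts = [
        "Classify the text into exactly one of: "
        + ", ".join(candidate_classes) + "."
    ]
    for cls in candidate_classes:  # include shots only for surviving classes
        for example in shots_by_class.get(cls, [])[:k_per_class]:
            parts.append(f"Text: {example}\nLabel: {cls}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)

# Pruning the candidate set shrinks both the shot count and the prompt.
shots = {f"class_{i}": [f"example {i}a", f"example {i}b"] for i in range(10)}
full = build_prompt("new input", [f"class_{i}" for i in range(10)], shots)
reduced = build_prompt("new input", ["class_1", "class_2"], shots)
print(len(reduced), "<", len(full))
```

Because shots are drawn only for classes that survive conformal filtering, the prompt shrinks roughly in proportion to the prediction set size, which is where the reported shot and length reductions come from.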
Results
| Metric / setting | Few-shot LLM baseline | CICLe | Delta |
|---|---|---|---|
| Number of shots | Full shot set | Up to 34.45% fewer shots | -34.45% |
| Prompt length | Full prompt | Up to 25.16% shorter | -25.16% |
| Classification accuracy (sufficient data) | Few-shot baseline | Outperforms baseline | Positive |
| Classification accuracy (low-data regime) | Few-shot baseline | Comparable performance | ~0 |
| Class-imbalanced tasks | Few-shot baseline | Particularly advantageous | Positive |
Key Takeaways
- CICLe is a practical drop-in framework for reducing LLM API costs in text classification by pruning irrelevant classes before prompting, making it appealing for production deployments with many classes
- Practitioners with sufficient labeled data (enough to train a base classifier) should prefer CICLe over naive few-shot prompting, especially for tasks with many or imbalanced classes
- CICLe enables the use of smaller, cheaper LLMs while maintaining competitive accuracy, offering a cost-quality tradeoff lever that is highly relevant for resource-constrained settings
Abstract
Large Language Models (LLMs) demonstrate strong in-context learning abilities, yet their effectiveness in text classification depends heavily on prompt design and incurs substantial computational cost. Conformal In-Context Learning (CICLe) has been proposed as a resource-efficient framework that integrates a lightweight base classifier with Conformal Prediction to guide LLM prompting by adaptively reducing the set of candidate classes. However, its broader applicability and efficiency benefits beyond a single domain have not yet been systematically explored. In this paper, we present a comprehensive evaluation of CICLe across diverse NLP classification benchmarks. The results show that CICLe consistently improves over its base classifier and outperforms few-shot prompting baselines when the sample size is sufficient for training the base classifier, and performs comparably in low-data regimes. In terms of efficiency, CICLe reduces the number of shots and prompt length by up to 34.45% and 25.16%, respectively, and enables the use of smaller models with competitive performance. Furthermore, CICLe is particularly advantageous for text classification tasks with high class imbalance. These findings highlight CICLe as a practical and scalable approach for efficient text classification, combining the robustness of traditional classifiers with the adaptability of LLMs, and achieving substantial gains in data and computational efficiency.