Talk Less, Call Right: Enhancing Role-Play LLM Agents with Automatic Prompt Optimization and Role Prompting
Problem Statement
Tool-augmented LLM dialogue agents in persona-grounded settings suffer from two key failure modes: over-speaking (excessively long in-character responses) and under-acting (poor tool use, including hallucinated or unnecessary function calls). Existing prompting strategies lack the structured behavioral constraints needed to reliably align agent output with both persona requirements and tool-use protocols. This work addresses the gap between elaborate automated optimization methods and the practical need for reliable, interpretable prompting solutions in competitive dialogue system settings.
Key Novelty
- Character-card/scene-contract design: a structured prompt format that explicitly defines persona attributes and behavioral rules as a 'contract' the agent must follow during dialogue
- Strict function-calling enforcement: explicit prompt-level rules that constrain when and how tools may be invoked, reducing hallucinated and unnecessary API calls
- Open-sourced APO tool and best-performing prompts for the CPDC 2025 API track, enabling reproducibility and community reuse
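The character-card/scene-contract idea can be sketched as a small prompt builder. Field names, rule wording, and the persona below are illustrative assumptions, not the paper's exact format:

```python
# Hypothetical sketch of a character-card/scene-contract prompt builder.
# Field names and rule wording are illustrative, not the authors' exact format.

def build_system_prompt(card: dict, contract: list[str]) -> str:
    """Render a persona 'character card' plus a behavioral 'scene contract'."""
    lines = ["## Character Card"]
    for key, value in card.items():
        lines.append(f"- {key}: {value}")
    lines.append("## Scene Contract (rules the agent must follow)")
    for i, rule in enumerate(contract, 1):
        lines.append(f"{i}. {rule}")
    return "\n".join(lines)

prompt = build_system_prompt(
    card={"name": "Mira", "role": "innkeeper", "speech": "brief, warm"},
    contract=[
        "Reply in at most two sentences (avoid over-speaking).",
        "Call a tool only when the user's request requires external data.",
        "Never invent function names outside the provided tool list.",
    ],
)
print(prompt)
```

The point of the "contract" framing is that persona attributes and behavioral limits live in one explicit, inspectable block rather than being scattered through free-form instructions.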
Evaluation Highlights
- Rule-based role prompting (RRP) achieved an overall score of 0.571 on the CPDC 2025 API track, compared to a zero-shot baseline of 0.519 (+0.052 absolute improvement)
- RRP outperformed all other tested approaches including basic role prompting, improved role prompting, and automatic prompt optimization (APO), demonstrating that structured rule-based constraints are more effective than automated optimization for this task
Methodology
- Identified two core failure modes in tool-augmented role-play agents (over-speaking and under-acting) through analysis of the CPDC 2025 API track task and baseline system outputs
- Designed and evaluated four prompting strategies in sequence: basic role prompting → improved role prompting → automatic prompt optimization (APO) → rule-based role prompting (RRP), iterating based on observed failure patterns
- Developed the RRP approach using character-card/scene-contract prompt structures and explicit function-calling rules, then evaluated all approaches on the CPDC 2025 benchmark to compare overall dialogue agent scores
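The four-strategy comparison above can be sketched as a simple evaluation harness. `run_agent`, `score_dialogue`, and the toy scores are stand-ins for the CPDC evaluation; only the ordering (RRP best, at 0.571) comes from the paper:

```python
# Illustrative harness for comparing the four prompting strategies.
# run_agent and score_dialogue are assumed stand-ins for the CPDC pipeline.

STRATEGIES = ["basic_role", "improved_role", "apo", "rule_based_role"]

def compare(strategies, run_agent, score_dialogue):
    """Score each strategy and return all scores plus the best one."""
    scores = {s: score_dialogue(run_agent(s)) for s in strategies}
    best = max(scores, key=scores.get)
    return scores, best

# Toy scores for demonstration only; intermediate values are invented,
# only rule_based_role's 0.571 and its top rank match the paper.
toy = {"basic_role": 0.53, "improved_role": 0.55,
       "apo": 0.56, "rule_based_role": 0.571}
scores, best = compare(STRATEGIES, run_agent=lambda s: s, score_dialogue=toy.get)
print(best)
```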
System Components
Character card / scene contract: a structured prompt template that encodes persona attributes, behavioral expectations, and situational rules as an explicit 'contract,' ensuring the agent consistently embodies the role and respects dialogue context
Strict function-calling rules: prompt-level instructions that define the precise conditions under which tools should or should not be called, preventing hallucinated function names and unnecessary pre-answer tool invocations
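The paper enforces these rules at the prompt level; a programmatic guard is an assumed complement, not the authors' implementation. The tool names below are hypothetical:

```python
# Illustrative guard against hallucinated or redundant tool calls.
# The paper's enforcement is prompt-level; this check is an assumed
# complement, and the tool names are hypothetical.

ALLOWED_TOOLS = {"get_menu", "check_room_availability"}

def validate_call(name: str, called_so_far: set[str]) -> tuple[bool, str]:
    """Reject undeclared tool names and repeated calls to the same tool."""
    if name not in ALLOWED_TOOLS:
        return False, f"hallucinated function: {name!r} is not a declared tool"
    if name in called_so_far:
        return False, f"unnecessary repeat call to {name!r}"
    return True, "ok"

ok, reason = validate_call("book_flight", set())
print(ok, reason)
```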
APO tool: an automated system for iteratively refining prompts based on performance feedback, open-sourced for community use despite being outperformed by RRP in this setting
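The iterative refinement loop behind APO can be sketched as follows; `score_fn` and `propose_edit` stand in for an evaluator and an LLM-based prompt rewriter, and neither reflects the open-sourced tool's actual API:

```python
# Minimal sketch of an automatic prompt optimization (APO) loop.
# score_fn and propose_edit are assumed stand-ins for an evaluator and
# an LLM-based prompt editor; this is not the released tool's interface.

def optimize(prompt: str, score_fn, propose_edit, rounds: int = 3) -> str:
    """Greedy hill-climbing: keep a candidate edit only if it scores higher."""
    best_prompt, best_score = prompt, score_fn(prompt)
    for _ in range(rounds):
        candidate = propose_edit(best_prompt)
        score = score_fn(candidate)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt

# Toy demo: the score favors shorter prompts; the "editor" trims the last word.
result = optimize(
    "You are a helpful concise innkeeper agent",
    score_fn=lambda p: -len(p),
    propose_edit=lambda p: p.rsplit(" ", 1)[0],
)
print(result)
```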
Role-prompting baselines: a progression of strategies (basic → improved) used as baselines and stepping stones toward the final RRP design, providing ablation-style insight into which prompt elements matter most
Results
| Approach | Overall Score |
|---|---|
| Zero-shot baseline | 0.519 |
| Basic role prompting | > 0.519, < 0.571 (exact score not reported) |
| Improved role prompting | > 0.519, < 0.571 (exact score not reported) |
| Automatic prompt optimization (APO) | > 0.519, < 0.571 (exact score not reported) |
| Rule-based role prompting (RRP, this paper) | 0.571 (+0.052 over baseline) |
Key Takeaways
- Explicit, rule-based prompt structures (character cards + scene contracts + strict tool-use rules) are more reliable than automated prompt optimization for constraining LLM behavior in structured role-play and tool-use settings
- Over-speaking and under-acting are distinct, addressable failure modes in persona-grounded dialogue agents—targeted prompt constraints for each issue yield measurable performance gains without complex training or fine-tuning
- Open-sourcing optimized prompts and APO tooling enables practitioners to bootstrap persona prompt development for similar dialogue agent tasks, reducing iteration cost in competition and production settings
Abstract
This report investigates approaches for prompting a tool-augmented large language model (LLM) to act as a role-playing dialogue agent in the API track of the Commonsense Persona-grounded Dialogue Challenge (CPDC) 2025. In this setting, dialogue agents often produce overly long in-character responses (over-speaking) while failing to use tools effectively according to the persona (under-acting), such as generating function calls that do not exist or making unnecessary tool calls before answering. We explore four prompting approaches to address these issues: 1) basic role prompting, 2) improved role prompting, 3) automatic prompt optimization (APO), and 4) rule-based role prompting. The rule-based role prompting (RRP) approach achieved the best performance through two novel techniques, character-card/scene-contract design and strict enforcement of function calling, which led to an overall score of 0.571, improving on the zero-shot baseline score of 0.519. These findings demonstrate that RRP design can substantially improve the effectiveness and reliability of role-playing dialogue agents compared with more elaborate methods such as APO. To support future efforts in developing persona prompts, we are open-sourcing all of our best-performing prompts and the APO tool. Source code is available at https://github.com/scb-10x/apo