BPMN Assistant: An LLM-Based Approach to Business Process Modeling
Problem Statement
Direct XML generation for BPMN diagrams is verbose, slow, and error-prone, especially during complex modifications where LLMs must regenerate or patch large XML structures. Existing approaches lack a structured intermediate representation that maps naturally to atomic edit operations, leading to high latency and frequent syntax errors. This limits the practical usability of LLM-based tools for interactive, iterative business process modeling.
Key Novelty
- Specialized JSON-based intermediate representation for BPMN that enables atomic editing operations (add, remove, modify elements) rather than full document regeneration
- Function calling integration that maps natural language editing intents to discrete, structured BPMN manipulation operations
- Systematic comparative evaluation of JSON vs. XML generation approaches across state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet, DeepSeek V3) using conformance checking with F1 scoring
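The summary does not reproduce the paper's actual JSON schema, but the idea can be illustrated with a small sketch. The field names below (`elements`, `flows`, `id`, `type`, `label`, `condition`) are assumptions for illustration, not the tool's real format:

```python
import json

# Hypothetical compact JSON-style encoding of a tiny BPMN process.
# Field names are illustrative assumptions, not the paper's actual schema.
process = {
    "elements": [
        {"id": "start", "type": "startEvent"},
        {"id": "t1", "type": "task", "label": "Review application"},
        {"id": "gw1", "type": "exclusiveGateway", "label": "Approved?"},
        {"id": "t2", "type": "task", "label": "Notify applicant"},
        {"id": "end", "type": "endEvent"},
    ],
    "flows": [
        {"from": "start", "to": "t1"},
        {"from": "t1", "to": "gw1"},
        {"from": "gw1", "to": "t2", "condition": "yes"},
        {"from": "gw1", "to": "end", "condition": "no"},
    ],
}

# The serialized form stays far terser than the equivalent BPMN XML,
# which helps explain the reported output-token and latency savings.
print(len(json.dumps(process)))
```

An edit such as "rename the review task" then only needs to touch one entry in `elements`, rather than regenerating an entire XML document.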
Evaluation Highlights
- The JSON approach achieves editing success rates higher than or equal to direct XML across all evaluated models, with an average conformance F1 of 0.72 (JSON) vs. 0.69 (XML)
- JSON approach reduces generation latency by ~43% and output token count by >75% compared to direct XML, despite requiring more input context tokens
Breakthrough Assessment
Methodology
- Define a compact JSON schema representing BPMN elements and relationships, along with a set of atomic edit functions (add node, remove edge, update property, etc.) exposed via LLM function calling
- Route natural language user instructions through an LLM that selects and parameterizes the appropriate atomic edit function(s), applying changes incrementally to the current diagram state rather than regenerating the full document
- Evaluate correctness via BPMN conformance checking (precision, recall, F1 against reference models) and measure latency and token usage across multiple frontier LLMs, comparing against a direct XML generation baseline
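The conformance metrics in the last step can be sketched with a toy set-based check. The paper's actual conformance checker aligns models structurally; intersecting element identifiers, as below, is a deliberate simplification for illustration:

```python
def conformance_f1(generated: set, reference: set) -> tuple:
    """Toy precision/recall/F1 over matched model elements.

    A real BPMN conformance check aligns elements structurally; here we
    simply intersect element identifiers as an illustrative stand-in.
    """
    matched = generated & reference
    precision = len(matched) / len(generated) if generated else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1

# Example: 3 of 4 generated elements match a 5-element reference model.
p, r, f1 = conformance_f1({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"})
print(round(p, 2), round(r, 2), round(f1, 3))  # 0.75 0.6 0.667
```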
System Components
- Intermediate representation: a compact, LLM-friendly schema encoding BPMN elements (tasks, gateways, events, flows), designed to support incremental atomic edits rather than full-document serialization
- Edit operations: a set of structured tool/function definitions exposed to the LLM, enabling it to perform discrete BPMN edit operations (insert, delete, update) in response to natural language commands
- Evaluation module: compares generated BPMN models against reference process models using precision, recall, and F1 to verify that executable semantics are preserved
- Interactive UI layer: accepts natural language process modeling instructions and renders the resulting BPMN diagrams in real time
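A minimal sketch of how the edit-operation layer might dispatch a structured tool call (a name plus JSON arguments, as an LLM emits via function calling) to an atomic edit over the diagram state. The operation names, signatures, and dispatch mechanics here are assumptions for illustration, not the tool's actual API:

```python
# Illustrative atomic edit operations over a JSON-style diagram state.
# Names and signatures are assumptions for this sketch.
def add_node(state, node_id, node_type, label=None):
    state["elements"].append({"id": node_id, "type": node_type, "label": label})

def remove_edge(state, source, target):
    state["flows"] = [
        f for f in state["flows"]
        if not (f["from"] == source and f["to"] == target)
    ]

def update_property(state, node_id, key, value):
    for el in state["elements"]:
        if el["id"] == node_id:
            el[key] = value

TOOLS = {
    "add_node": add_node,
    "remove_edge": remove_edge,
    "update_property": update_property,
}

def apply_tool_call(state, name, arguments):
    """Dispatch one structured tool call to the matching atomic edit,
    mutating the current diagram state incrementally."""
    TOOLS[name](state, **arguments)

# Usage: rename a task without regenerating the whole document.
state = {"elements": [{"id": "t1", "type": "task", "label": "Review"}],
         "flows": []}
apply_tool_call(state, "update_property",
                {"node_id": "t1", "key": "label", "value": "Review application"})
print(state["elements"][0]["label"])  # Review application
```

Because each call changes only the affected entries, the output the model must produce per edit is a short tool call rather than a full document, which is the mechanism behind the reported latency and token savings.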
Results
| Metric | Direct XML Baseline | JSON Approach (This Paper) | Delta |
|---|---|---|---|
| Conformance F1 (avg) | 0.69 | 0.72 | +0.03 |
| Editing Task Success Rate | Lower or equivalent | Higher or equivalent (all models) | Favors JSON |
| Generation Latency | Baseline (100%) | ~57% of baseline | -43% |
| Output Token Count | Baseline (100%) | <25% of baseline | >75% reduction |
| Precision (frontier models) | Higher with direct XML (GPT-5.1, Claude 4.5 Sonnet) | Slightly lower | XML edge on precision only |
Key Takeaways
- When building LLM-based structured document editors, designing a compact intermediate representation with atomic operations via function calling dramatically reduces token usage and latency compared to full-document regeneration — a pattern generalizable beyond BPMN to any schema-driven editing task.
- Frontier models (GPT-5.1, Claude 4.5 Sonnet) can achieve high precision with direct XML for generation tasks, but the JSON + function calling approach consistently wins on editing tasks across all model tiers, making it the safer default for interactive agentic workflows.
- Conformance checking (precision/recall/F1 against reference models) is a meaningful evaluation framework for LLM-generated structured artifacts and should be adopted more broadly when assessing semantic correctness of generated process or workflow models.
Abstract
This paper presents BPMN Assistant, a tool that leverages Large Language Models for natural language-based creation and editing of BPMN diagrams. While direct XML generation is common, it is verbose, slow, and prone to syntax errors during complex modifications. We introduce a specialized JSON-based intermediate representation designed to facilitate atomic editing operations through function calling. We evaluate our approach against direct XML manipulation using a suite of state-of-the-art models, including GPT-5.1, Claude 4.5 Sonnet, and DeepSeek V3. Results demonstrate that the JSON-based approach significantly outperforms direct XML in editing tasks, achieving higher or equivalent success rates across all evaluated models. Conformance checking evaluation confirms that generated models preserve executable semantics, with JSON achieving an average F1 score of 0.72 compared to 0.69 for XML, though frontier models like GPT-5.1 and Claude 4.5 Sonnet demonstrated superior precision with direct XML generation. Furthermore, despite requiring more input context, our approach reduces generation latency by approximately 43% and output token count by over 75%, offering a more reliable and responsive solution for interactive process modeling.