Qwen3-Coder-Next Technical Report
Problem Statement
Most high-performing coding models require large active parameter counts at inference time, making deployment expensive and slow. Existing models are often trained on static code datasets without grounding in executable environments, limiting their ability to act as autonomous coding agents. This work investigates how far efficient sparse models can be pushed using agentic training signals derived directly from real execution feedback.
Key Novelty
- MoE architecture with 80B total but only 3B active parameters, enabling strong capability at low inference cost for coding agents
- Large-scale synthesis of verifiable coding tasks paired with executable environments, enabling mid-training and RL directly from environment execution feedback
- Open release of both base and instruction-tuned variants of an agent-specialized coding model, supporting community research and deployment
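The sparsity claim above — 3B active out of 80B total parameters — comes from mixture-of-experts routing: each token activates only a small top-k subset of expert MLPs. The sketch below illustrates the mechanism with toy sizes; the expert count, top-k, and hidden size are hypothetical, since the report summary does not specify them.

```python
import numpy as np

# Toy top-k mixture-of-experts routing. NUM_EXPERTS, TOP_K, and D_MODEL
# are illustrative placeholders, not Qwen3-Coder-Next's actual config.
rng = np.random.default_rng(0)

NUM_EXPERTS = 64   # hypothetical expert count
TOP_K = 4          # hypothetical experts activated per token
D_MODEL = 8        # toy hidden size

def moe_layer(x, gate_w, experts):
    """Route a token to its top-k experts and mix their outputs."""
    logits = x @ gate_w                         # gating scores, one per expert
    top = np.argsort(logits)[-TOP_K:]           # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                    # softmax over selected experts
    # Only TOP_K of NUM_EXPERTS expert MLPs run for this token.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

gate_w = rng.normal(size=(D_MODEL, NUM_EXPERTS))
experts = [lambda x, W=rng.normal(size=(D_MODEL, D_MODEL)): x @ W
           for _ in range(NUM_EXPERTS)]

y = moe_layer(rng.normal(size=D_MODEL), gate_w, experts)
print(y.shape, f"active fraction = {TOP_K / NUM_EXPERTS:.2%}")
```

With these toy numbers only 1/16 of the expert parameters run per token, which is the same lever that lets an 80B-total model pay roughly 3B-active inference cost.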
Evaluation Highlights
- Competitive performance on SWE-Bench (agentic software-engineering benchmark) relative to its 3B active-parameter footprint
- Strong results on Terminal-Bench, demonstrating agent-centric capabilities in terminal/shell-based coding tasks
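Terminal-style benchmarks exercise a propose-execute-observe loop: the model emits a shell command, the harness runs it, and the output is fed back as the next observation. The sketch below shows that loop shape with a hard-coded two-step "plan" standing in for a model call; `propose_command` and the file path are made up for illustration.

```python
import subprocess

# Minimal agent loop of the kind terminal benchmarks exercise.
# `propose_command` is a hypothetical stand-in for conditioning a
# model on the interaction history.
def propose_command(history):
    # A real agent would call the model here; we hard-code a plan.
    plan = ["echo hello > /tmp/agent_demo.txt",
            "cat /tmp/agent_demo.txt"]
    return plan[len(history)] if len(history) < len(plan) else None

history = []
while (cmd := propose_command(history)) is not None:
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    history.append((cmd, out.stdout))   # observation fed back next step

print(history[-1][1].strip())
```

The key property is that every step's observation comes from real execution, not from the model's own prediction of what the command would print.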
Methodology
- Pre-train an 80B MoE model with 3B active parameters on large-scale code and general data to establish a strong base
- Perform mid-training on synthetically generated verifiable coding tasks with paired executable environments, allowing the model to receive direct correctness signals from code execution
- Apply reinforcement learning (policy optimization) using environment feedback as reward signal to further align the model toward agentic coding task completion
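The RL step above can be sketched in miniature: a policy samples an action (e.g., a candidate patch), the environment returns a binary reward (tests pass or fail), and rewarded actions are reinforced. The report says only "policy optimization", so the REINFORCE-style update below is an illustrative stand-in, not the actual algorithm used.

```python
import numpy as np

# Toy REINFORCE update with a binary environment reward. The 3-action
# "policy" and the hard-coded passing patch are illustrative only.
rng = np.random.default_rng(1)
theta = np.zeros(3)  # toy policy logits over 3 candidate patches

def environment_reward(action):
    """Stand-in for executing the patch in a sandboxed environment."""
    return 1.0 if action == 2 else 0.0  # pretend patch #2 passes the tests

LR = 0.5
for _ in range(200):
    probs = np.exp(theta) / np.exp(theta).sum()
    a = rng.choice(len(theta), p=probs)
    grad = -probs
    grad[a] += 1.0                  # grad of log softmax prob of action a
    theta += LR * environment_reward(a) * grad  # reinforce rewarded actions

print(np.argmax(theta))  # the rewarded patch comes to dominate
```

Because the reward is grounded in execution rather than imitation, the policy can improve past whatever behaviors were present in the supervised data.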
System Components
- MoE backbone: 80 billion total parameters with only ~3 billion activated per forward pass via mixture-of-experts routing, drastically reducing inference cost while retaining large-model capacity
- Verifiable task synthesis: large-scale automated generation of coding tasks with ground-truth verifiable outcomes (e.g., passing test suites), each paired with a sandboxed executable environment
- Mid-training phase: supervised or self-improvement training on execution feedback from the synthesized tasks, teaching the model multi-step coding-agent behaviors before RL
- RL phase: the model interacts with executable coding environments and receives reward signals based on task completion, enabling policy improvement beyond imitation
- Instruction-tuned release: a fine-tuned version of the base model optimized for instruction following in real-world coding-agent deployments, released as open weights
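The core of a "verifiable coding task" is that correctness is decided by execution, not by a judge model. A minimal sketch, assuming the simplest possible setup: a candidate solution is written to a temporary sandbox alongside a ground-truth test file, run in a subprocess, and the exit code becomes the binary signal. The task, file names, and timeout are all made up for illustration; the actual synthesis pipeline is not specified at this level of detail.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Ground-truth tests for a made-up toy task ("implement add").
TASK_TESTS = textwrap.dedent("""
    from solution import add
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
""")

def verify(candidate_code: str) -> bool:
    """Run candidate_code against the task's tests; True iff they pass."""
    with tempfile.TemporaryDirectory() as d:
        Path(d, "solution.py").write_text(candidate_code)
        Path(d, "test_solution.py").write_text(TASK_TESTS)
        result = subprocess.run([sys.executable, "test_solution.py"],
                                cwd=d, capture_output=True, timeout=10)
        return result.returncode == 0  # exit code is the correctness signal

assert verify("def add(a, b):\n    return a + b")      # correct solution
assert not verify("def add(a, b):\n    return a - b")  # buggy solution
```

A subprocess with a temp directory is the bare minimum; a production pipeline would add real isolation (containers, resource limits, network restrictions) around the same pass/fail contract.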
Results
| Benchmark | Comparable Open Models (similar active params) | Qwen3-Coder-Next | Delta |
|---|---|---|---|
| SWE-Bench (agentic SW engineering) | Competitive small/mid active-param models | Competitive (top tier for 3B active params) | Positive relative to active param class |
| Terminal-Bench (agentic terminal tasks) | Prior open coding agents | Strong performance | Positive relative to active param class |
| Inference Efficiency | Dense 7B-13B models (comparable active params) | 3B active params from 80B MoE | Lower inference cost at higher effective capacity |
Key Takeaways
- MoE sparsity is a practical path to building powerful coding agents without paying full inference costs — 3B active parameters from an 80B model can rival dense models many times larger in active compute
- Training coding agents with verifiable tasks in executable environments (rather than static corpora) is critical for developing true agentic capabilities like multi-step debugging and tool use
- The open release of both base and instruction-tuned versions makes this directly useful for practitioners building coding assistants, autonomous agents, or fine-tuning on domain-specific coding tasks
Abstract
We present Qwen3-Coder-Next, an open-weight language model specialized for coding agents. Qwen3-Coder-Next is an 80-billion-parameter model that activates only 3 billion parameters during inference, enabling strong coding capability with efficient inference. In this work, we explore how far strong training recipes can push the capability limits of models with small parameter footprints. To achieve this, we perform agentic training through large-scale synthesis of verifiable coding tasks paired with executable environments, allowing learning directly from environment feedback via mid-training and reinforcement learning. Across agent-centric benchmarks including SWE-Bench and Terminal-Bench, Qwen3-Coder-Next achieves competitive performance relative to its active parameter count. We release both base and instruction-tuned open-weight versions to support research and real-world coding agent development.