
Qwen3-Coder-Next Technical Report

Ruisheng Cao, Mouxiang Chen, Jiawei Chen, Zeyu Cui, Yunlong Feng, Binyuan Hui, Yuheng Jing, Kaixin Li, Mingze Li, Junyang Lin, Zeyao Ma, Kashun Shum, Xuwu Wang, Jinxi Wei, Jiaxi Yang, Jiajun Zhang, Lei Zhang, Zongmeng Zhang, Wenting Zhao, Fan Zhou
2026
Qwen3-Coder-Next is an 80B sparse MoE language model (3B active parameters) trained specifically for coding agents via large-scale verifiable task synthesis and reinforcement learning from environment feedback, achieving strong agentic coding performance with a minimal active parameter footprint.

Problem Statement

Most high-performing coding models require large active parameter counts at inference time, making deployment expensive and slow. Existing models are often trained on static code datasets without grounding in executable environments, limiting their ability to act as autonomous coding agents. This work investigates how far efficient sparse models can be pushed using agentic training signals derived directly from real execution feedback.

Key Novelty

  • MoE architecture with 80B total but only 3B active parameters, enabling strong capability at low inference cost for coding agents
  • Large-scale synthesis of verifiable coding tasks paired with executable environments, enabling mid-training and RL directly from environment execution feedback
  • Open release of both base and instruction-tuned variants of an agent-specialized coding model, supporting community research and deployment

Evaluation Highlights

  • Competitive performance on SWE-Bench (software engineering agentic benchmark) relative to its 3B active parameter footprint
  • Strong results on Terminal-Bench, demonstrating agent-centric capabilities in terminal/shell-based coding tasks

Breakthrough Assessment

6/10. The paper makes a solid engineering and training contribution by demonstrating that MoE sparsity combined with agentic RL training can yield competitive coding-agent performance at very low active parameter counts, but the core techniques (MoE, RL from execution feedback) are not individually novel; the value lies in their effective combination and scale.

Methodology

  1. Pre-train an 80B MoE model with 3B active parameters on large-scale code and general data to establish a strong base
  2. Perform mid-training on synthetically generated verifiable coding tasks with paired executable environments, allowing the model to receive direct correctness signals from code execution
  3. Apply reinforcement learning (policy optimization) using environment feedback as reward signal to further align the model toward agentic coding task completion
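Steps 2 and 3 both hinge on the same verifiable signal: running a candidate solution against a ground-truth test suite. The toy sketch below illustrates that idea only; the "tasks" and "candidates" here are stand-ins, not the paper's actual pipeline or data.

```python
# Toy illustration of learning signals from verifiable tasks.
# Everything here is a hypothetical stand-in for the paper's pipeline.

def run_tests(candidate, tests):
    """Fraction of tests the candidate passes: the verifiable signal."""
    return sum(t(candidate) for t in tests) / len(tests)

# A synthetic task: implement absolute value. Its tests are the ground truth.
tests = [lambda f: f(-3) == 3, lambda f: f(4) == 4, lambda f: f(0) == 0]

# "Samples" from a toy policy: a pool of candidate solutions.
candidates = [abs, lambda x: x, lambda x: -x, lambda x: x * x]

# Mid-training can keep only fully verified solutions (rejection sampling)...
verified = [c for c in candidates if run_tests(c, tests) == 1.0]
# ...while the RL stage uses the graded pass rate as a scalar reward.
rewards = [run_tests(c, tests) for c in candidates]
print(len(verified), rewards)
```

The key property is that correctness comes from execution, not from a learned judge, so the reward is hard to game.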

System Components

Sparse MoE Architecture (80B/3B)

80 billion total parameters with only ~3 billion activated per forward pass via mixture-of-experts routing, drastically reducing inference cost while retaining large model capacity
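The 80B-total / 3B-active split comes from top-k expert routing: a gate scores all experts per token, but only the k highest-scoring experts actually run. A minimal sketch with toy sizes (the real model's expert count, dimensions, and k are not specified here):

```python
# Minimal top-k mixture-of-experts routing sketch. Sizes are toy values,
# not Qwen3-Coder-Next's real configuration.
import numpy as np

def route(token, experts, gate, k=2):
    """Send a token through only the top-k experts chosen by the gate."""
    logits = gate @ token                       # one score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                    # renormalized softmax over top-k
    # Only k experts execute: active compute is roughly k/len(experts) of total.
    return sum(w * (experts[i] @ token) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate = rng.normal(size=(n_experts, d))
out = route(rng.normal(size=d), experts, gate, k=2)
print(out.shape)  # (8,)
```

Total capacity scales with the number of experts while per-token cost scales only with k, which is exactly the trade the 80B/3B design exploits.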

Verifiable Task Synthesis Pipeline

Large-scale automated generation of coding tasks that have ground-truth verifiable outcomes (e.g., passing test suites), each paired with a sandboxed executable environment
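One plausible shape for such a task record pairs an instruction with a sandboxed environment and the test commands that define success. All field names and values below are illustrative assumptions, not the paper's schema:

```python
# Hypothetical record for a synthesized verifiable coding task.
from dataclasses import dataclass

@dataclass
class VerifiableTask:
    instruction: str            # natural-language problem statement
    repo_snapshot: str          # reference to the sandboxed environment
    setup_commands: list[str]   # how to prepare the environment
    test_commands: list[str]    # ground-truth checks: pass/fail is the label

task = VerifiableTask(
    instruction="Fix the off-by-one error in pagination",
    repo_snapshot="sandbox://example-repo@abc123",   # hypothetical reference
    setup_commands=["pip install -e ."],
    test_commands=["pytest tests/test_pagination.py"],
)
print(task.instruction)
```

Because the label is "do the tests pass after running the agent's patch", the same record serves both mid-training filtering and RL reward computation.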

Agentic Mid-Training

Supervised or self-improvement training phase using execution feedback from synthesized tasks to teach the model multi-step coding agent behaviors before RL

Reinforcement Learning from Environment Feedback

RL phase where the model interacts with executable coding environments and receives reward signals based on task completion, enabling policy improvement beyond imitation
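The mechanics can be sketched with a minimal REINFORCE-style loop in which the reward is the environment's test pass rate. This is purely illustrative; the report's actual policy-optimization algorithm is not specified at this level of detail, and the "environment" below is a toy stand-in.

```python
# REINFORCE-style sketch of RL from environment feedback (toy setting).
import numpy as np

def pass_rate(action):
    # Stand-in environment: only action 2 passes the task's tests.
    return 1.0 if action == 2 else 0.0

rng = np.random.default_rng(0)
logits = np.zeros(4)               # policy over 4 candidate actions
lr = 0.5
for _ in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(4, p=probs)     # sample an action from the policy
    r = pass_rate(a)               # environment feedback as reward
    grad = -probs
    grad[a] += 1.0                 # gradient of log pi(a)
    logits += lr * r * grad        # REINFORCE update (baseline omitted)

print(np.argmax(logits))  # 2: probability mass shifts to the rewarded action
```

The point of moving beyond imitation is visible even in this toy: the policy improves on actions never demonstrated, purely because execution rewarded them.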

Instruction-Tuned Variant

Fine-tuned version of the base model optimized for instruction following in real-world coding agent deployments, released as open weights

Results

| Benchmark | Comparable Open Models (similar active params) | Qwen3-Coder-Next | Delta |
| --- | --- | --- | --- |
| SWE-Bench (agentic SW engineering) | Competitive small/mid active-param models | Competitive (top tier for 3B active params) | Positive relative to active-param class |
| Terminal-Bench (agentic terminal tasks) | Prior open coding agents | Strong performance | Positive relative to active-param class |
| Inference efficiency | Dense 7B-13B models (comparable active params) | 3B active params from 80B MoE | Lower inference cost at higher effective capacity |
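The efficiency comparison can be made concrete with the standard back-of-envelope approximation that decoding costs roughly 2 FLOPs per active parameter per token (this estimate is ours, not a figure from the report):

```python
# Rough per-token decode cost using the common ~2 * active-params
# FLOPs-per-token approximation (ignores attention, KV cache, etc.).
def flops_per_token(active_params):
    return 2 * active_params

moe_active = 3e9   # Qwen3-Coder-Next: 3B active out of 80B total
dense = 7e9        # a dense 7B model activates every parameter
print(flops_per_token(dense) / flops_per_token(moe_active))  # ≈ 2.33x cheaper
```

By this estimate the MoE model decodes at less than half the cost of even the smallest dense models in the comparison class, while drawing on 80B parameters of capacity.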

Key Takeaways

  • MoE sparsity is a practical path to building powerful coding agents without paying full inference costs — 3B active parameters from an 80B model can rival dense models many times larger in active compute
  • Training coding agents with verifiable tasks in executable environments (rather than static corpora) is critical for developing true agentic capabilities like multi-step debugging and tool use
  • The open release of both base and instruction-tuned versions makes this directly useful for practitioners building coding assistants, autonomous agents, or fine-tuning on domain-specific coding tasks

Abstract

We present Qwen3-Coder-Next, an open-weight language model specialized for coding agents. Qwen3-Coder-Next is an 80-billion-parameter model that activates only 3 billion parameters during inference, enabling strong coding capability with efficient inference. In this work, we explore how far strong training recipes can push the capability limits of models with small parameter footprints. To achieve this, we perform agentic training through large-scale synthesis of verifiable coding tasks paired with executable environments, allowing learning directly from environment feedback via mid-training and reinforcement learning. Across agent-centric benchmarks including SWE-Bench and Terminal-Bench, Qwen3-Coder-Next achieves competitive performance relative to its active parameter count. We release both base and instruction-tuned open-weight versions to support research and real-world coding agent development.

Generated on 2026-04-01 using Claude