A.X K1 Technical Report
Problem Statement
Large language models often require separate models for different inference modes (e.g., chain-of-thought reasoning vs. fast inference), creating deployment complexity and resource overhead. Existing open-source models also lack strong multilingual support, particularly for Korean, limiting their utility in non-English enterprise contexts. Balancing reasoning depth against inference efficiency at scale thus remains an open challenge for real-world deployment.
Key Novelty
- Think-Fusion training recipe: a unified training approach enabling explicit user-controlled switching between 'thinking' (extended reasoning) and 'non-thinking' (fast inference) modes within a single model
- Scaling law-guided optimization of both training configurations and vocabulary size under fixed computational budgets for a 519B MoE architecture
- Multi-stage data processing pipeline curating ~10T tokens with specialized curation for Korean-language data, achieving state-of-the-art performance on Korean benchmarks
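The report summarized here does not spell out the pipeline's individual stages, but deduplication is one stage it names explicitly. As an illustration only, a minimal exact-deduplication pass over normalized text might look like the sketch below; the function names are hypothetical and not taken from the A.X K1 pipeline, which presumably also uses fuzzy (near-duplicate) methods at 10T-token scale.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical copies hash equally."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def dedup_exact(docs):
    """Keep only the first occurrence of each normalized document."""
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

corpus = [
    "안녕하세요, 세계!",   # Korean: "Hello, world!"
    "Hello   world",
    "hello world",          # duplicate of the previous line after normalization
]
print(dedup_exact(corpus))  # the third document is dropped
```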
Evaluation Highlights
- A.X K1 achieves performance competitive with leading open-source models (e.g., DeepSeek-, Qwen-, and LLaMA-scale MoE models) on general reasoning and language benchmarks
- Establishes a distinctive advantage over competing models on Korean-language benchmarks, demonstrating superior multilingual specialization for Korean
Methodology
- Apply scaling laws to determine optimal model architecture (519B MoE), vocabulary size, and training hyperparameters under a fixed compute budget before training begins
- Pre-train on ~10T tokens curated via a multi-stage data pipeline emphasizing quality filtering, deduplication, and Korean-language data enrichment
- Apply Think-Fusion post-training recipe to fine-tune the model to support controllable reasoning, allowing users to explicitly toggle between extended thinking mode and direct response mode at inference time
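The summary does not specify how the user's mode choice is actually encoded at inference time (a special token, a chat-template flag, or a system instruction are all plausible). The sketch below is a hypothetical illustration of the controllable-reasoning interface; the `/think` and `/no_think` tags and the template format are placeholders, not the actual A.X K1 mechanism.

```python
def build_prompt(user_msg: str, thinking: bool) -> str:
    """Assemble a chat prompt with a hypothetical mode-control tag.

    The real A.X K1 control mechanism is not described in this summary;
    '/think' and '/no_think' are illustrative placeholders for whatever
    signal the Think-Fusion recipe trains the model to follow.
    """
    mode = "/think" if thinking else "/no_think"
    return f"<|system|>{mode}<|user|>{user_msg}<|assistant|>"

# Thinking mode: the model would emit an extended reasoning trace first.
print(build_prompt("Prove that sqrt(2) is irrational.", thinking=True))
# Non-thinking mode: the model answers directly for low-latency serving.
print(build_prompt("What is the capital of Korea?", thinking=False))
```

The point of the unified recipe is that both calls hit the same weights; only the prompt-level signal changes.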
System Components
- Mixture-of-Experts model with 519B total parameters, activating a sparse subset per token to balance capacity with computational efficiency during inference
- A unified post-training methodology that teaches the model to switch between chain-of-thought reasoning (thinking mode) and direct answer generation (non-thinking mode) based on user instruction, eliminating the need for separate models
- A curated data processing system for constructing the ~10T token pre-training corpus, with dedicated stages for quality filtering, deduplication, domain balancing, and Korean-language data enrichment
- A framework that uses empirical scaling laws to determine the optimal training configuration (model size, learning rate, batch size, vocabulary size) given a fixed FLOPs budget
- Specialized data curation and training focus on Korean-language content, enabling benchmark-leading performance on Korean NLP tasks
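The "sparse subset per token" behavior of the MoE component is implemented by a learned router. A.X K1's routing specifics (expert count, top-k, load-balancing losses) are not given in this summary, so the following is a generic top-k gating sketch, not the model's actual router:

```python
import numpy as np

def topk_gate(logits: np.ndarray, k: int = 2):
    """Softmax over the top-k expert logits; all other experts get weight 0.

    Generic sparse-MoE routing sketch. The expert count (4 here) and
    k=2 are illustrative, not A.X K1's configuration.
    """
    topk = np.argsort(logits)[-k:]                   # indices of the k largest logits
    weights = np.zeros_like(logits)
    exp = np.exp(logits[topk] - logits[topk].max())  # numerically stable softmax
    weights[topk] = exp / exp.sum()
    return topk, weights

router_scores = np.array([0.1, 2.0, -1.0, 1.5])  # one token, 4 experts
experts, w = topk_gate(router_scores, k=2)
# Only the two highest-scoring experts receive nonzero weight, so only
# their feed-forward blocks execute for this token — the source of the
# capacity/efficiency trade-off described above.
```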
Results
| Benchmark | Leading Open-Source Baseline | A.X K1 | Delta |
|---|---|---|---|
| Korean Language Benchmarks | Competitive open-source models (e.g., Qwen, LLaMA) | State-of-the-art among open-source | Distinctive advantage |
| General Reasoning Benchmarks | Leading open-source MoE models | Competitive / on-par | Neutral to slight improvement |
| Thinking Mode Tasks | Single-mode reasoning models | Unified model matches dedicated reasoning models | Efficiency gain (1 model vs. 2) |
| Non-Thinking Mode Tasks | Single-mode fast inference models | Competitive with fast-inference specialists | No quality degradation from unification |
Key Takeaways
- Think-Fusion offers a practical deployment pattern: a single MoE model can replace two separate models (reasoning and non-reasoning), reducing infrastructure complexity and memory footprint for production LLM serving
- Scaling law-guided vocabulary and architecture optimization before training is a cost-effective practice — ML teams building large models from scratch should invest in scaling law experiments to avoid suboptimal compute allocation
- For organizations targeting non-English markets (especially Korean), A.X K1 demonstrates that language-specific data curation within a large MoE framework can yield measurable benchmark advantages without sacrificing general capability
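A.X K1's fitted scaling laws are not reproduced in this summary, so as a hedged illustration of what "scaling law-guided compute allocation" means in practice, the sketch below uses the widely cited Chinchilla-style approximations (training compute C ≈ 6·N·D and roughly 20 tokens per parameter). These constants are assumptions, and for an MoE model N would refer to active rather than total parameters:

```python
import math

def compute_optimal(C: float, tokens_per_param: float = 20.0):
    """Split a FLOPs budget C between parameters N and training tokens D.

    Assumes C ≈ 6*N*D and a Chinchilla-style tokens-per-parameter ratio r,
    so C = 6*N*(r*N)  =>  N = sqrt(C / (6*r)).  A.X K1's actual fitted law
    (including vocabulary-size and MoE adjustments) is not given here.
    """
    N = math.sqrt(C / (6.0 * tokens_per_param))
    D = tokens_per_param * N
    return N, D

# Example: a 1e24-FLOP budget.
N, D = compute_optimal(1e24)
print(f"params ~ {N:.2e}, tokens ~ {D:.2e}")
```

The value of running such fits before training is that mis-allocating a fixed budget between model size and data size is unrecoverable after the fact.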
Abstract
We introduce A.X K1, a 519B-parameter Mixture-of-Experts (MoE) language model trained from scratch. Our design leverages scaling laws to optimize training configurations and vocabulary size under fixed computational budgets. A.X K1 is pre-trained on a corpus of approximately 10T tokens, curated by a multi-stage data processing pipeline. Designed to bridge the gap between reasoning capability and inference efficiency, A.X K1 supports explicitly controllable reasoning to facilitate scalable deployment across diverse real-world scenarios. We propose a simple yet effective Think-Fusion training recipe, enabling user-controlled switching between thinking and non-thinking modes within a single unified model. Extensive evaluations demonstrate that A.X K1 achieves performance competitive with leading open-source models, while establishing a distinctive advantage in Korean-language benchmarks.