
Alibaba PAI Open-Sources AgenticQwen: "Dual Data Flywheels" Propel 8B Model to Rival 235B Performance

Alibaba’s Platform for AI (PAI) team has officially released and open-sourced AgenticQwen, a new series of small language models engineered specifically for industrial-grade tool use. Available in 8B and 30B-A3B parameter sizes, the models are trained with an innovative "Dual Data Flywheel" reinforcement learning framework. According to the team, this training approach lets them deliver agentic capabilities comparable to hundred-billion-parameter models while significantly reducing inference costs.

The Core Mechanism: Dual Data Flywheels

Traditional synthetic data methods often produce homogeneous data, which causes model performance to plateau. AgenticQwen addresses this limitation with a dynamic training approach built on two distinct flywheels:
  • Reasoning Flywheel: This mechanism automatically generates harder variants of problems based on the model's previous errors, forcing continuous improvement in logic and reasoning.

  • Agent Flywheel: Instead of simple linear workflows (e.g., a straightforward booking process), this flywheel expands execution trajectories into complex behavior trees. It simulates real-world decision-making by introducing constraints, rejection scenarios, and adversarial conditions.
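
The article does not publish the training pipeline, so the following is only a toy Python sketch of the two flywheel ideas as described above. Everything in it is hypothetical: `Problem`, `harder_variant`, and `reasoning_flywheel_step` stand in for an LLM-driven generator that rewrites failed problems, and `expand_trajectory` turns a linear three-step booking flow into a tree by adding a constraint, rejection, and adversarial branch at every step.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# --- Reasoning Flywheel (toy model, all names hypothetical) ---


@dataclass(frozen=True)
class Problem:
    text: str
    difficulty: int  # hypothetical difficulty score


def harder_variant(p: Problem) -> Problem:
    """Hypothetical generator; a real pipeline would prompt an LLM to rewrite p."""
    return Problem(f"{p.text} +harder", p.difficulty + 1)


def reasoning_flywheel_step(
    pool: list[Problem], solved: Callable[[Problem], bool]
) -> list[Problem]:
    """Keep the pool and add a harder variant of every problem the model failed."""
    return pool + [harder_variant(p) for p in pool if not solved(p)]


# --- Agent Flywheel (toy model, all names hypothetical) ---

PERTURBATIONS = ("constraint", "rejection", "adversarial")


@dataclass
class Node:
    action: str
    children: list[Node] = field(default_factory=list)


def expand_trajectory(steps: list[str]) -> Node:
    """Turn a linear trajectory into a tree: at every step, branch into the
    normal action plus one perturbed variant per PERTURBATIONS entry."""
    root = Node("start")
    current = root
    for step in steps:
        main = Node(step)
        current.children.append(main)
        for kind in PERTURBATIONS:
            # In this toy, perturbed branches end here (they become leaves)
            current.children.append(Node(f"{step} [{kind}]"))
        current = main  # only the normal path continues deeper
    return root


def count_leaves(node: Node) -> int:
    """Each leaf is the end of one distinct root-to-leaf trajectory."""
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)


if __name__ == "__main__":
    pool = [Problem("a", 1), Problem("b", 3)]
    # Simulated model that solves anything with difficulty <= 2
    print(reasoning_flywheel_step(pool, lambda p: p.difficulty <= 2))
    print(count_leaves(expand_trajectory(["search", "select", "pay"])))  # 10
```

Expanding the three-step booking flow yields 10 distinct trajectories instead of one, which illustrates the kind of diversity the Agent Flywheel is meant to add.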

Benchmark Performance

Evaluations indicate that AgenticQwen delivers strong results on real-world tool-use benchmarks such as TAU-2 and BFCL-V4:
  • AgenticQwen-8B: Achieved an average score of 47.4, vastly outperforming the base Qwen3-8B (23.8) and closely approaching the performance of the massive Qwen3-235B (52.0).

  • AgenticQwen-30B-A3B: Using a Mixture-of-Experts (MoE) architecture that activates only about 3B of its roughly 30B parameters during inference, this model reached a score of 50.2.
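
The "A3B" in the name refers to the parameters that are active per token. The pure-Python sketch below shows the general top-k routing idea behind MoE layers: a router scores every expert, but only the k best-scoring experts are actually evaluated, and their outputs are blended with renormalized softmax weights. The expert count, k, the fixed router logits, and the scalar "experts" are illustrative assumptions, not AgenticQwen's actual configuration.

```python
import math
from typing import Callable, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts; softmax over just their logits
    (equivalent to renormalizing the full softmax over the chosen k)."""
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(
    x: float,
    experts: List[Callable[[float], float]],
    router_logits: List[float],
    k: int,
) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run.
    In a real MoE layer the logits come from a learned router applied to x."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    # 8 toy "experts": expert i just multiplies its input by i (illustrative only)
    experts = [lambda x, i=i: i * x for i in range(8)]
    logits = [0.1, 2.0, -1.0, 3.0, 0.0, 0.5, -2.0, 1.0]
    print(top_k_route(logits, k=2))  # experts 3 and 1 are selected
    print(moe_forward(1.0, experts, logits, k=2))
```

With roughly 3B of 30B parameters active, each token uses about a tenth of the model's parameters, which is where the inference savings of this design come from.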

Industrial Application & Limitations

The model has already been deployed in internal production systems similar to Manus, demonstrating a significant reduction in end-to-end inference time compared with larger models. However, the team notes that, because of their native 40K-token context window, small models like AgenticQwen still face limitations in deep-search tasks that require retaining extensive context.
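
To make the 40K-token limit concrete: in a multi-step agent loop, every tool result is appended to the conversation, so long deep-search sessions can outgrow the window. The sketch below shows one simple, hypothetical mitigation: dropping the oldest messages until the history fits. The 4-characters-per-token estimate and the 2,000-token output reserve are assumptions; a real system would count tokens with the model's own tokenizer, and the article does not say how the PAI team handles this.

```python
from typing import List

CONTEXT_WINDOW = 40_000  # native context window reported for AgenticQwen (tokens)


def approx_tokens(text: str) -> int:
    """Rough estimate (assumption): ~4 characters per token.
    Replace with the model's real tokenizer in practice."""
    return max(1, len(text) // 4)


def fit_to_window(
    system: str,
    messages: List[str],
    window: int = CONTEXT_WINDOW,
    reserve: int = 2_000,  # hypothetical budget held back for the model's reply
) -> List[str]:
    """Drop the oldest messages until system prompt + history + reserve fit."""
    limit = window - reserve - approx_tokens(system)
    kept = list(messages)
    total = sum(approx_tokens(m) for m in kept)
    while kept and total > limit:
        total -= approx_tokens(kept.pop(0))  # oldest first
    return kept


if __name__ == "__main__":
    # Ten tool results of ~5,000 tokens each: ~50,000 tokens, too many for 40K
    history = [chr(ord("a") + i) * 20_000 for i in range(10)]
    kept = fit_to_window("x" * 400, history)
    print(len(kept))  # 7
```

Dropping whole messages from the front is the bluntest option; summarizing old tool outputs keeps more information but costs extra model calls. Either way, the window bounds how much evidence an agent can keep in view at once, which is the limitation the team describes.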

