Alibaba Unveils Qwen3-Max-Thinking: A Trillion-Parameter AI Model That Outperforms GPT-5.2, Claude O
On the evening of January 26, Alibaba officially launched Qwen3-Max-Thinking, the flagship reasoning model in its Qwen series. The model has set global records on multiple key benchmarks, outperforming leading models such as GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro, and significantly pushing the boundaries of AI reasoning capabilities.
By scaling up total parameters, reinforcement learning, and reasoning computation, Qwen3-Max-Thinking achieves a substantial performance leap. It now holds world-leading scores on critical benchmarks including GPQA Diamond (scientific knowledge), IMO-AnswerBench (mathematical reasoning), and LiveCodeBench (code generation).
A key innovation is its novel test-time scaling mechanism, which boosts reasoning performance while maintaining cost efficiency. According to Alibaba, the model has over one trillion parameters, benefits from large-scale reinforcement learning fine-tuning, and incorporates a suite of reasoning innovations. It also demonstrates significantly enhanced native agent capabilities, autonomously invoking tools while it reasons, much as a human expert would. Alibaba also reports markedly reduced hallucination rates, laying a solid foundation for tackling complex real-world tasks.
The model is now available for free public access via the Qwen desktop and web platforms, with mobile app integration coming soon.