AI Flash

Ant Group Open-Sources Ling-2.6-1T: A Trillion-Parameter MoE Model with Fast-Thinking Capability

Apr 30, 2026 · 11:12
Ant Group's Inclusion AI has officially released the weights for its flagship large language model, Ling-2.6-1T. This trillion-parameter MoE (Mixture of Experts) model is now available to the public under a permissive MIT license. While featuring a total parameter count of 1 trillion, it activates only 63 billion parameters during inference, balancing immense capability with computational efficiency. The model supports a context window of up to 256K tokens.
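To see why an MoE model can be far cheaper to run than its total parameter count suggests, consider a toy top-k gating sketch. The expert counts and scores below are purely illustrative and are not Ling-2.6-1T's actual expert layout:

```python
import math

def top_k_experts(gate_scores, k):
    """Return indices of the k highest-scoring experts for one token."""
    return sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:k]

# Hypothetical MoE layer: 256 experts, each token routed to just 8 of them.
gate_scores = [math.sin(i * 0.7) for i in range(256)]  # stand-in gating logits
chosen = top_k_experts(gate_scores, 8)

# With equal-sized experts, per-token compute scales with the chosen
# fraction of experts, not with the total expert count.
active_fraction = len(chosen) / 256
print(chosen)
print(active_fraction)
```

The same principle, applied at trillion-parameter scale, is how a 1T-parameter model can run inference with only a 63B-parameter slice active per token.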

Architecture and Core Features

Ling-2.6-1T shares the same advanced architecture as the recently open-sourced lightweight "flash" version (104B total / 7.4B active parameters). It employs a hybrid attention mechanism combining MLA (Multi-head Latent Attention) and Lightning Linear Attention.
A key innovation in the 1T version is the introduction of a "Fast-Thinking" training strategy. By incorporating a "Contextual Process Redundancy Suppression" reward during post-training, the model is optimized to compress verbose chain-of-thought outputs. This approach significantly reduces token consumption without compromising the quality of reasoning or task execution.
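Ant Group has not published the reward formula, but the general idea of trading answer quality against reasoning length can be sketched as a length-penalized reward. The function name, the `lam` coefficient, and the numbers below are all hypothetical:

```python
def length_penalized_reward(task_score, num_cot_tokens, lam=0.001):
    """Toy reward: task quality minus a penalty on chain-of-thought length.

    A trace that reaches the same task score with fewer reasoning tokens
    earns a higher reward, nudging the policy toward compact reasoning.
    """
    return task_score - lam * num_cot_tokens

# Two traces with identical task quality but different verbosity:
verbose = length_penalized_reward(0.9, 2000)  # heavy length penalty
compact = length_penalized_reward(0.9, 300)   # light length penalty
assert compact > verbose
```

Under a reward of this shape, verbose but no-better reasoning is strictly dominated, which is the behavior the article describes: shorter outputs at equal task quality.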

Performance Benchmarks

The model demonstrates state-of-the-art performance among open-source non-reasoning models on several complex task benchmarks:
  • SWE-bench Verified: Achieved a score of 72.2%, a notable improvement over the flash version's 61.2%. For context, some open-source reasoning models like DeepSeek V4 Pro have surpassed 80%.

  • Other Benchmarks: The model reportedly reaches top-tier performance on execution-focused tests including AIME 2026, BFCL-V4, TAU2-Bench, and IFBench.

  • Efficiency: On the Artificial Analysis Intelligence Index, the model scored 34 while consuming approximately 16 million output tokens for the full evaluation, highlighting its token efficiency.

Deployment and Accessibility

Ling-2.6-1T is designed for integration into complex workflows and is compatible with mainstream agent frameworks such as Claude Code, openclaw, and OpenCode.
  • API Access: A free API is currently available via OpenRouter.

  • Self-Hosting: For developers looking to self-host, deployment requires a minimum of 8 GPUs. The model supports popular inference engines like SGLang and vLLM.
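For the API route, OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so a request is just a JSON payload. The sketch below only builds and prints that payload; the model slug `inclusionai/ling-2.6-1t` is an assumption, so check OpenRouter's model catalog for the actual identifier:

```python
import json

# Hypothetical model slug; verify against OpenRouter's model list.
MODEL = "inclusionai/ling-2.6-1t"

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Summarize MoE routing in one sentence."}
    ],
    "max_tokens": 256,
}

# To actually send it (requires an OpenRouter API key), POST the payload to
# https://openrouter.ai/api/v1/chat/completions with an
# "Authorization: Bearer <key>" header, e.g. via the requests library.
print(json.dumps(payload, indent=2))
```

Because the endpoint is OpenAI-compatible, existing OpenAI-style client code typically only needs the base URL and model name swapped to target Ling-2.6-1T.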


