
Why Volcano Engine's MaaS Market Share Keeps Growing Despite Fiercer Competition


China's Model-as-a-Service (MaaS) market is expanding rapidly, evolving from a small, narrow segment into a high-potential business growth area. According to the latest data from market research firm IDC, enterprise-level MaaS large model call volume in China grew 16 times year-on-year in 2025, reaching 1,941 trillion tokens, with even faster growth projected for 2026. Throughout 2025, especially in the second half, Chinese cloud computing providers and large model companies entered the market en masse, pouring more computing power, sales resources, and product investment into raising the priority of their MaaS businesses, intensifying competition significantly.

Conventionally, when latecomers flood a rapidly expanding emerging market, the early leader's share tends to be diluted. This logic seemed especially applicable to MaaS, where many outsiders believed large model APIs struggle to build stickiness: developers merely need to change a few lines of code to swap out the underlying model or switch cloud platforms. Yet IDC's latest data reveals a counterintuitive outcome. In 2025, Volcano Engine's share of China's MaaS market remained remarkably solid, rising from 49.2% in the first half to 49.5% for the full year. This means that during the most competitive phase in the second half, Volcano Engine was not diluted by newcomers; instead, it further widened its lead as the market expanded. For every two large model tokens generated on China's public cloud, nearly one runs on Volcano Engine.

Observers often attribute this to aggressive pricing. When Volcano Engine launched its Doubao large model MaaS service in May 2024, it cut prices to 99.3% below the prevailing industry level. But subsidies alone cannot explain the sustained market share expansion: other industry players quickly lowered their MaaS prices to comparable levels. What truly determines whether low pricing is sustainable is call-volume scale and inference engineering capability.

Model capability is equally critical. The rapid expansion of the MaaS market primarily stems from new application scenarios unlocked by improving model capabilities: enhanced programming abilities drive the rise of vibe coding and AI Agents, while video generation models enter production pipelines for short dramas, comic-adapted series, and advertising, continuously amplifying token consumption. This suggests MaaS is essentially a speed competition within an incremental market. Whoever can productize model capabilities faster and provide cost-effective, stable services can quickly capture newly emerging scenarios and continue to expand market share as the market grows.

From the Doubao large language model to the Seedance video generation model, the Doubao model family continues to iterate and improve. Building on this foundation, Volcano Engine is accelerating the transformation of its accumulated token scale into more comprehensive competitiveness: lower inference costs, higher engineering efficiency, and the infrastructure required for Agent operations. A cloud computing flywheel for the large model era is taking shape.

Low Prices Backed by Scale and Engineering Prowess

Cloud computing is a classic high-fixed-cost, low-marginal-cost industry. Servers, networks, R&D, and operational systems all require substantial upfront investment, but the marginal cost of each additional call steadily declines. Greater scale makes it easier to amortize R&D and infrastructure spending. Scale also magnifies the value of engineering optimization. Volcano Engine President Tan Dai once offered an analogy: "Optimizing utilization by one percentage point for 10,000 servers versus one million servers yields a hundredfold difference in returns. You can build a powerful team to do it much better."
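Tan Dai's analogy reduces to simple arithmetic. The sketch below uses illustrative numbers only (not Volcano Engine figures): the absolute capacity reclaimed by a one-point utilization gain grows linearly with fleet size, so the same engineering effort pays back a hundredfold on the larger fleet.

```python
# Illustrative only: the compute reclaimed by a 1-point utilization gain
# scales linearly with fleet size, so a fixed engineering investment
# returns 100x more on 1,000,000 servers than on 10,000.
def reclaimed_capacity(servers: int, utilization_gain: float = 0.01) -> float:
    """Capacity freed up, in server-equivalents."""
    return servers * utilization_gain

small = reclaimed_capacity(10_000)     # 100 server-equivalents
large = reclaimed_capacity(1_000_000)  # 10,000 server-equivalents
print(large / small)  # 100.0
```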

Scale was the most critical variable when Volcano Engine focused its efforts on MaaS: the goal was not simply to sell model interfaces, but to rapidly scale up token call volume. To that end, Volcano Engine made token consumption a core business metric and adjusted its sales team's performance evaluation system accordingly. For the same revenue amount, MaaS products carry several times the incentive weight of traditional cloud services in internal assessments.

Alongside elevated business priority, Volcano Engine increased its technical investment in model inference. MaaS costs largely depend on token generation efficiency. When server utilization, cache hit rates, and computing power scheduling efficiency improve, costs have room to fall. "Lower costs can spur more applications and expand the overall market," Tan Dai later explained about the pricing strategy at the time, adding that upon realizing technology could drive costs down, "we decided to cut them thoroughly in one go."

Key technologies underpinning Volcano Engine's price reduction included the early large-scale application of Prefill-Decode (PD) separation and KV Cache. PD separation splits large model inference into its "understanding the question" (Prefill) and "generating the answer" (Decode) stages, matching each with more suitable computing units. KV Cache stores the intermediate attention states generated as the model processes context, avoiding repeated computation of prior context with each new output, thereby saving memory bandwidth and inference costs. However, these technologies depend on scale: at low call volumes, maintaining complex caching and scheduling systems incurs costs that may even offset the computing power saved.
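The KV Cache idea can be illustrated with a toy single-head attention model in Python (a minimal sketch of the general technique, not Volcano Engine's implementation): each decode step appends only the newest token's key/value vectors to a cache and reuses the stored entries, instead of recomputing keys and values for every prior token.

```python
# Toy illustration of KV caching in autoregressive decoding.
# All weights are random stand-ins for a trained model's parameters.
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size of the toy model

# Fixed random projections stand in for learned K/V/Q weight matrices.
Wk, Wv, Wq = (rng.standard_normal((D, D)) for _ in range(3))

def attend(q, K, V):
    """Single-head attention of one query over all cached keys/values."""
    scores = K @ q / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

kv_cache = {"K": np.empty((0, D)), "V": np.empty((0, D))}

def decode_step(x):
    """Compute K/V only for the new token; prior tokens come from cache."""
    kv_cache["K"] = np.vstack([kv_cache["K"], x @ Wk])
    kv_cache["V"] = np.vstack([kv_cache["V"], x @ Wv])
    return attend(x @ Wq, kv_cache["K"], kv_cache["V"])

# Without the cache, step t would recompute K/V for all t prior tokens.
for _ in range(5):
    out = decode_step(rng.standard_normal(D))

print(kv_cache["K"].shape)  # one cached K row per generated token
```

The memory the cache occupies is the price paid for the saved recomputation, which is why the technique only pays off at sufficient call volume.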

As PD separation, KV Cache, and similar techniques spread throughout the industry, token prices gradually converged. Followers lacking scale effects face greater cost pressure when matching low prices and may even incur losses. Volcano Engine, with its larger call volume, faces less cost pressure and has more room to continue optimizing inference technology, forming a sustainable low-price capability.

Beyond technology and engineering, Volcano Engine also seeks cost reduction opportunities elsewhere: on one hand, it offers differentiated pricing based on context length tiers, giving clients more choice; on the other, it introduced a "Savings Plan" that consolidates clients' usage across different models, such as language models and video generation. Scale discounts accumulated on language models can offset trial-and-error costs for new ventures like video generation.
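The pooled consolidation logic can be sketched as follows; the tier thresholds and discount rates here are invented for illustration and do not reflect Volcano Engine's actual Savings Plan pricing.

```python
# Hypothetical sketch of a pooled savings plan: tier discounts are
# computed on a client's combined token volume across model families,
# so scale earned on language models also lowers the effective rate
# paid for newer workloads like video generation. Numbers are invented.
TIERS = [  # (minimum combined tokens, discount off list price)
    (0, 0.00),
    (1_000_000_000, 0.10),
    (10_000_000_000, 0.20),
]

def pooled_discount(usage_by_model: dict) -> float:
    """Discount rate earned by total usage across all model families."""
    total = sum(usage_by_model.values())
    return max(d for floor, d in TIERS if total >= floor)

usage = {"language_model": 9_500_000_000, "video_generation": 600_000_000}
# Video generation alone would earn no discount, but the pooled total
# (10.1B tokens) crosses the hypothetical 10B tier.
print(pooled_discount(usage))  # 0.2
```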

The latest IDC China MaaS report notes that Volcano Engine holds the highest market share by call volume, and its revenue share also ranks first, albeit a few percentage points lower than its call volume share. The price per token on Volcano Engine is below the industry average. Notably, IDC's statistics on China's MaaS market primarily cover enterprise model calls on public clouds, excluding AI applications developed by ByteDance such as Doubao and Jimeng, as well as tokens generated when internal services like Douyin and Feishu deploy large models. While these call volumes are not counted in IDC's market share statistics, they similarly influence Volcano Engine's cost structure and engineering efficiency.

Agents Turn MaaS into an Infrastructure Business

OpenAI CEO Sam Altman recently remarked in an interview that the next phase of AI will shift from "a user provides a piece of text, and the large model returns a piece of text or code" to "Agents truly running inside companies, completing various types of work." He also mentioned that OpenAI is collaborating with AWS on a product akin to a "virtual colleague." In this vision, MaaS is evolving from supplying standardized model interfaces into enterprise infrastructure with stronger stickiness. For an enterprise Agent to truly operate, it requires components such as identity authentication, permission controls, memory systems, tool invocation, sandbox environments, logging, security governance, and connections to internal enterprise systems.

This underlies the recent emphasis within the large model industry on the "Agent Harness." The term "harness" originally referred to the tack used to control a horse; in the Agent context, it signifies the engineering system that works in concert with the base model. MaaS supplies stable model capabilities; the harness handles turning inference into a workflow that is constrainable, traceable, and sustainably operational. Consequently, how cloud platforms provide large model services is changing. Whether it is Anthropic's partnerships with multiple cloud providers or OpenAI's collaboration with AWS announced in April this year, the approach goes beyond simply placing model interfaces on cloud platforms. Instead, APIs are encapsulated within the cloud platform's native Agent environment, enabling enterprises to develop and operate production-grade Agents within that environment.

Volcano Engine's product evolution over the past few years can be understood within this trend: while enhancing MaaS competitiveness, it is extending large model services into infrastructure covering the development and operation of Agents. "We were the first in China to launch a full suite of Agent products that simplify Agent development," Tan Dai said in an end-of-year interview, explaining that clients can build a complex Agent with just a few lines of code, "much like developing a complex website before," only now requiring new AI middleware. In his assessment, writing code previously meant essentially writing if-else statements to define workflows; now, when developing Agents based on models, developers write more prompts, while processes like plan orchestration, task decomposition, and sub-Agent creation are increasingly handled by the model itself. This is also the underlying working logic of products like OpenClaw.

Therefore, Volcano Engine was able to rapidly launch ArkClaw, its OpenClaw product, earlier this year while simultaneously supporting activities like the CCTV Spring Festival Gala. Alongside enhancing security capabilities, it open-sourced OpenViking, a context database designed for Agent long-term memory, to make ArkClaw more user-friendly. Volcano Engine defines the "ArkClaw Personal Edition" as an "agile Agent": it allows employees to quickly experiment with ideas that improve business efficiency, then validates effective capabilities and consolidates them into "stable Agents." The latter corresponds to HiAgent, the Agent development and operations platform Volcano Engine launched in 2024. By April of this year, the number of enterprises on Volcano Engine that had cumulatively consumed over one trillion tokens had grown from 100 at the end of last year to 140. An increasing number of large MaaS clients are forming deeper partnerships with Volcano Engine.

The AI Cloud Flywheel Begins to Spin

In business analysis, the flywheel effect is the core logic used to explain the success of AWS, the world's largest cloud platform: scale amortizes costs, lower prices attract more customers, and customer growth brings more feedback, cash flow, and a stronger ecosystem, propelling further iteration of technology and services. Volcano Engine is building a similar flywheel in the AI era, but its flywheel does not entirely follow the logic of the traditional cloud computing industry. The traditional cloud flywheel primarily revolves around computing power, storage, networking, and software ecosystems; the MaaS flywheel adds model capabilities, token usage patterns, Agent scenarios, and real business feedback.

The first layer of Volcano Engine's flywheel is the cycle between model capability, call volume scale, and inference cost. ByteDance's internal model research team, Seed, consistently supplies Volcano Engine with first-tier models. Stronger models more easily expand call volume; larger call volumes better leverage engineering technology to reduce costs; and lower costs attract more customers. This is a scale flywheel closely resembling traditional cloud computing, only the unit of measurement has shifted from servers, storage, and bandwidth to tokens.

The second layer of the flywheel comes from feedback from real-world scenarios. Within the ByteDance ecosystem, Doubao, used daily by hundreds of millions of people, the rapidly growing Jimeng, dozens of internal business lines such as Douyin and Feishu, and external clients all develop and use large model capabilities through Volcano Engine, providing high-frequency, complex, and genuine product feedback. This feedback flows in one direction to the Seed model team to aid the continued iteration of base models, and in the other direction to Volcano Engine's Agent team to help enhance product capabilities. Agent products are particularly dependent on such feedback. Anthropic has also noted in multiple technical articles that improving Agent capabilities does not rely solely on model advancements. Internal employees, external users, production monitoring, A/B testing, user research, and client deployment needs collectively drive the iteration of products like Claude Code.

Volcano Engine capturing nearly half of China's MaaS market share in 2025 is merely a milestone result of its flywheel beginning to spin. Now, the Agent boom continues to drive market demand higher, with the industry periodically experiencing computing power shortages. Some companies have chosen to raise prices to optimize short-term financial performance. Volcano Engine has indicated it will not follow suit. This pricing restraint stems from Volcano Engine's assessment of the industry's stage: more important than capturing higher short-term profits is expanding call volume, lowering barriers to usage, increasing real-world application scenarios, and allowing the flywheel to continue accelerating.
