Is the "Free" Token Era Here? A Conversation with the Founder of Agnes AI on the Business and Ambition Behind Free Multimodal AI
Anyone who has been genuinely using AI over the past year will share the same feeling: AI is getting more expensive. A $20 monthly subscription used to be more than enough, but since the rise of Agents and Vibe Coding, tokens burn like water. Let a Coding Agent run for an afternoon, and the bill stacks up fast. Consequently, users have learned to be frugal, constantly weighing whether a task is worth the run or if a block of code truly needs an AI rewrite. Many creative ideas are snuffed out the moment they arise, suppressed by the nagging question, "How many tokens will this burn?"
AI was supposed to empower everyone to create freely, yet using it has become a metered, cost-saving exercise.
Now, one company is letting you use three AI models—for text, image, and video—without spending a dime. This is not a seven-day trial or a one-time credit that runs out; it’s an unlimited offering. Is this the "Cyber Bodhisattva" of the AI age?
On June 1st, a startup called Agnes AI made the API tokens for its text, image, and video models completely free. The news broke, and within days, over a dozen community groups were flooded to capacity. In the first week alone, calls to Agnes-2.0-Flash soared past 1 trillion tokens. Agnes-Image-2.1-flash generated over 2 million images, and Agnes-Video-2.0 produced more than 2 million seconds of video. The earliest to arrive were almost all geeks rushing in overnight to "test the experience."
But the tone in the groups quickly shifted.
Someone used it to generate a minutes-long video. Others connected it to workflows to produce entire sets of materials. One user even stitched together footage of his two daughters growing up, paired with an AI voiceover. Given past pricing, he likely would never have been willing to try a video of that length. This is the most intriguing aspect of "free": what it truly unlocks is not the money saved, but the ideas you previously didn't even dare to try because they were too expensive.
Even more rare, while most companies focus on a single model modality, Agnes is tackling text, image, and video simultaneously—and making all of them free.
The excitement continues. This week, Agnes AI is set to roll out a 1M super-long context window and a 4K ultra-high-definition image model.
Of course, questions follow: Does "free" mean the models are inferior? How on Earth do you drive costs down enough to sustain such massive usage? Without charging, how does the team survive?
And most importantly, what is the Agnes AI team ultimately aiming for?
With these questions in mind, GeekPark spoke with Bruce Yang, the founder of Agnes AI. Below is a summary of the conversation.
Higher-priced models are better, and lower-priced models underperform—this is a huge misconception. DeepSeek is also very cheap, yet it matches or even surpasses many more expensive models on numerous metrics.
What token-free truly unlocks isn't the money saved, but the ideas you previously didn't dare to try because of the cost. A user's potential should never be limited by cost.
Precisely because the harness acts as a constraint, the gap between models is actually shrinking. The harness serves two roles: first, it narrows the performance gap between models, and second, it provides clearer direction for model upgrades and optimization.
We want to seize this moment to be the first to raise the banner of "free," to get a seat at the table and become a significant player early on.
A decade ago, if you couldn't speak Chinese or English, you might have been considered illiterate. A decade from now, if you don't understand AI, you might be the new illiterate. The real fear is not of AI itself; it is the fear that those who do not understand AI will have of those who do, the feeling that they could be replaced at any moment.
01
The Misconception: Cheaper Models Are Worse
GeekPark: Let's start with an introduction to yourself and the Agnes AI team.
Bruce Yang: I'll begin with my background. I went abroad at 15 to study at Raffles Institution in Singapore, then went to the University of California, Berkeley, for college, majoring in Computer Science and Mathematics. I was very fortunate to study under two Turing Award winners, Richard Karp and David Patterson. Ion Stoica, who taught us operating systems, went on to co-found Databricks. Later, I worked in Silicon Valley at Microsoft and LinkedIn, started a company there, returned to China, and am now based in Singapore. A turning point came during the pandemic lockdowns in China; I returned to Singapore to pursue a Ph.D. at the National University of Singapore in the School of Computing, focusing on AI. That experience gave me a lot of inspiration and was a crucial foundation for founding Agnes.
Agnes truly got started around late 2024 or early 2025. We are a very young company. We were building models from the start, but last year was more about applications. Since our models weren’t good enough yet, we first built our own harness, which is our current agent, and gradually optimized capabilities on top of that. Initially, we relied heavily on external APIs from the "Big Three," but the costs were consistently high, especially after we surpassed 10 million total users. It became unsustainable, so we accelerated our proprietary model development, what you might call a domestic substitution. By the end of last year, the story was still mostly about product and substitution.
At the beginning of this year, we found our models were performing quite well, even showing advantages over some closed-source models in certain areas. So we made a bold move, gradually opening up our model APIs, from a small scope to full multimodal, and now we’ve gone all-in with free, multimodal access. We announced it on June 1st, and it’s only been three days of official operation, but we already have over a dozen community groups, each with hundreds of geek users. Our daily token consumption surpassed 100 billion yesterday. Reaching that number in three days is decent, and it’s expected to multiply three or four times over by the weekend. So far, it's within our expectations.
GeekPark: It’s not just domestic users; the whole world gets excited about "free." But people will question if the product is free because it's not that impressive. What is the actual standard of your three models right now?
Bruce Yang: I think this is a misconception, and it applies not just to free models but to low-cost, high-value models in general. There's a persistent belief that more expensive models are better and that cheaper ones must have compromised performance, hence the low price. But look at DeepSeek; its price is very low, yet it rivals or outperforms many pricier models on many metrics.
While our models are currently free, that does not signify any compromise on performance. Based on our current results, our text model places us among the top 10 AI labs globally in agentic benchmarks like PinchBench and ClawEval. Our image and video models also place us among the top 10 AI labs globally on Artificial Analysis, the world's most authoritative blind evaluation benchmark.
The models are continuously improving. A new version will be released this month, and likely every month thereafter. Our standard for ourselves is not necessarily to immediately match the very top SOTA model in strength, but to keep up quickly and stay within one generation of capability. For instance, when a new version is released, we aim to have the capabilities of its predecessor. Achieving this is not easy, and combined with being free, I believe we can win the favor of many users.
GeekPark: You sound confident in your models. Why don't you show us and explain a few demos created by Agnes's models?
Bruce Yang: There are already quite a few reviews of our models online. We've looked, and 95% of them weren't provided by us. After the first day of the free announcement, user-generated promotion exploded. The reviews are all quite balanced, and they point out some of our issues, but overall, people acknowledge our capabilities.
For example, this particle effect was a key indicator for testing text model capability when Gemini first came out. Another user built an entire operating system using the text model, complete with a little airplane-flying game inside.
Besides text, our image and video models are also decent, especially images. We've done a good job optimizing for high information density content. Of course, compared to Nano Banana or GPT's image model, there's still a gap. Some high-density text details aren't fully optimized yet, but overall, we should be among the top domestic models.
For video, we support simultaneous audio and video output; characters can speak in the video in both Chinese and English, though some small details still need refinement. We plan to launch the next version of our video model around the second half of this month, aiming to approach the level of HappyHorse. There’s still a gap with Seedance. But overall, as a commercialized model, "free" does not mean it lacks commercial value. We've already reached the capability level of many closed-source models and can unlock substantial commercial potential.
GeekPark: For the tasks you just showed, were they completed end-to-end by a single model, or did they involve collaboration among multiple agents behind the scenes?
Bruce Yang: Our API provides only three models: text, image, and video. We haven't unified these three APIs yet but plan to release a unified version next week, because many users get confused during configuration. Many harnesses don't natively support direct uploading or downloading of images and videos; they need to be loaded as Skills. So currently, they are three separate models. The content you saw was basically all completed on top of a harness. This harness could be our own Agnes harness, or it could be Codex, OpenClaw, or Claude Code. By connecting to our single model, these capabilities can be realized.
Currently, we aren't using multiple text models or multiple image/video models to support a harness's work. However, during execution, a harness might, based on its own understanding, needs, and dependencies, dispatch multiple agents at a certain moment, which we do support.
GeekPark: How does the computational cost for these tasks compare to popular models and tools today?
Bruce Yang: Let's talk about our pricing first. Although it's free now, we had a pricing structure before, and a token plan still exists. For the text model, generally, only output tokens are linked to cost; input tokens are essentially zero cost for the model company.
Our input token price was $0.15 per million, roughly 1/100th of GPT and Anthropic, and about half the price of DeepSeek's flash version. We still had some margin there. Images were $3 per 1,000 images, or $0.003 per image, which is quite extreme. The actual cost for video is about $0.30 per minute, which is roughly one cent RMB per second. This cost is about 1/100th of the quoted price from leading market models.
That was our old pricing. Now it's free, so everyone can use it freely. We only have slight rate limits on QPS (Queries Per Second) and RPM (Requests Per Minute), but the limits are generous, allowing 20 requests per minute. We haven't yet seen a case where a normal individual developer finds that volume insufficient.
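To make the old price list concrete, the quoted figures can be turned into a small estimator. This is only a minimal sketch: the three constants are the numbers given in the interview, while the function names and the sample workload are hypothetical and chosen purely for illustration.

```python
# Figures quoted in the interview (old, pre-free price list), in US dollars.
TEXT_PRICE_PER_MILLION_INPUT_TOKENS = 0.15  # quoted input-token price
IMAGE_PRICE_PER_THOUSAND = 3.00             # i.e. $0.003 per image
VIDEO_COST_PER_MINUTE = 0.30                # quoted as the actual cost of video


def text_cost(input_tokens: int) -> float:
    """Dollar cost of a number of input tokens at the quoted rate."""
    return input_tokens / 1_000_000 * TEXT_PRICE_PER_MILLION_INPUT_TOKENS


def image_cost(images: int) -> float:
    """Dollar cost of a number of generated images at the quoted rate."""
    return images / 1_000 * IMAGE_PRICE_PER_THOUSAND


def video_cost(seconds: float) -> float:
    """Dollar cost of a number of seconds of generated video at the quoted rate."""
    return seconds / 60 * VIDEO_COST_PER_MINUTE


# Hypothetical workload: 50M input tokens, 200 images, a 5-minute video.
total = text_cost(50_000_000) + image_cost(200) + video_cost(300)
print(f"estimated total under the old price list: ${total:.2f}")
```

Under these assumptions the five-minute video, the kind of project users describe later in the conversation, is the smallest line item after the images, which helps explain why longer videos became an obvious thing to try.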
GeekPark: "Free" makes people worry about whether the team can actually sustain it. Very few teams do all three modalities. Agnes isn't a giant company, so why build all three types of models together?
Bruce Yang: The pressure is definitely there. Our research team now has over 100 people. There aren't many companies with labs that rank in the global top 10 for text, image, and video simultaneously. Overseas, there are Google and OpenAI. Domestically, perhaps Alibaba and ByteDance. Not many others do all three.
We didn't think too much about it at first. Our own harness product already supported text, image, and video, and our usage data showed a pretty balanced demand. So the first step was a domestic substitution. During that process, we found synergies between the three models.
When Nano Banana came out, they mentioned that one reason for its strong instruction-following capability was using their then-flagship model, Gemini 2.5 Pro, for visual content parsing, resulting in very strong reverse prompting ability. It's the same for video models. Anyone who has actually trained one knows the first prerequisite is strong text and image models. Video models also require vast amounts of data, much of it from film and TV clips. After slicing these clips, you need text to describe that video segment well, and those descriptions can be used for reverse training, a process that also heavily relies on a text model. So, there's a certain dependency among the three models during training. There are also emerging pathways, like image models starting to use autoregressive methods that combine understanding and generation capabilities.
So overall, there are two reasons. First, it's driven by real-world usage needs. For a One-Person Company or small workshop, configuring interfaces from three different providers is quite challenging. If we can combine them into a single Omni-mode API, it significantly lowers the usage cost and barrier. Second, there's synergy in training. A text model with better multimodal understanding better supports image and video generation, and they reinforce each other. Multimodal scenarios also generate a lot of new data, which is very helpful for our synthetic data efforts and further training, especially since image and video models need a text harness for prompt enhancement.
Only by integrating the three models and simultaneously building an environment that lets users continuously explore can we truly understand the direction for the next model upgrade.
From another perspective, the difference between training three models and training one depends on a company's vision and understanding. A company like Anthropic or OpenAI envisions using the strongest text model to quickly achieve a qualitative leap in capability, to achieve AGI. But our understanding of AGI is slightly different. We hope our AGI is used by the widest range of users in the largest number of scenarios, a broader AGI. On that path, we might not be the absolute best in every single model, but we want to remain in the front ranks, perhaps the top 10, never falling behind by a generation. Simultaneously, we hope the models' capabilities reinforce each other and progress together. We also want more and more users to use our products, building an ecosystem that fuels our progress, helps us understand market needs, and shows us how to lower usage barriers.
Because our vision, starting point, and technical approach are different, we will choose a path others might not take. But this doesn't mean any downgrade or compromise on performance; we will always remain at the global forefront.
GeekPark: What special charm do you possess to bring together top talent for text, image generation, and video into one place?
Bruce Yang: Actually, we have four teams. One each for text, image, and video—each with about a dozen or so people. There's also a dedicated team for performance optimization, figuring out how to drive costs down further. Actually, "cost" isn't the best word; "efficiency" is better—how to achieve staggering results, like 1% of the typical inference cost, during both the training and inference phases.
A core logic for us is that from day one, we've been solving an optimization problem with very strong constraints, but our constraints are different from others. Many are constrained by having enough resources to maximize capability; our constraint from day one was that our resources weren't that large. That's why we needed a cross-cutting performance optimization team spanning the three verticals. From the GPU and codex level to the algorithmic level, the goal is to use the smallest possible parameters to achieve the perfect match between user satisfaction and performance. When we first started this, we honestly weren't sure it would work.
As for personal charm, we are a latecomer. Both our Singapore and China teams are latecomers, as most model companies aren't in these regions. But as we started to show some promise, we attracted a large number of outstanding local students, from NUS and NTU in Singapore, and from Nanjing University, Southeast University, USTC, and even Tsinghua in China, many of whom chose to join us.
The entire research team is now nearly 100 people, all very smart and outstanding, fighting for a great vision. On June 1st, we made a big move, releasing our accumulated capabilities and research findings. Next week, we'll open-source some new discoveries. The team is highly motivated, wanting to be not just recipients in the AI era but also builders. That's our company culture.
02
Giving Away Hundreds of Billions of Tokens in Three Days: Going Free to Get a "Seat at the Table"
GeekPark: In your view, in which scenarios can these three types of models truly enter production and commercialization, where a user can start making money as soon as they integrate them? Are there clear use cases?
Bruce Yang: It goes back to my earlier point: paid, expensive models are not necessarily better. Users in the groups who have tested our models say they are on par with any paid SOTA model, even when compared to Gemini or Claude. Of course, we know internally there's still a gap. Because of this very misconception, simply reducing prices is meaningless. If you lower prices, many will assume it's because your performance is inferior and still won't use it, preferring the Big Three.
The way to break this deadlock and change this stereotype is to first let people try it boldly and find moments of pleasant surprise along the way. In three days since opening up, we've attracted a dozen groups with thousands of users—actually far more, as only about 10% of users scan the QR code from the API key page on our website to join.
From the feedback, we can handle the vast majority of functions they use with paid premium models. Even where we fall short, like particularly complex instruction following or very long-chain agentic tasks where there's some deviation, these can be fixed, optimized in the next version, perhaps in just two weeks, like some tool-calling capabilities. The big picture is, we can now fulfill 90% of the scenarios people are using.
If I have to specify a focus, we've spent more time optimizing agentic capabilities, which is why I care about PinchBench and ClawEval. The next version will further optimize coding. We’re currently working on SWE benchmarks to upgrade coding ability, hoping to also break into the global top 10 there; there seems to be an opportunity. For text, we concentrate more on Agent and Coding, where user volume is most concentrated. For images, I feel we are quite capable. Though there's a gap with GPT's image model, we're competitive among domestic models. The gap in video is slightly larger; we're behind Seedance and HappyHorse. But whether free or at our original price, the cost-performance ratio is definitely excellent. You can look forward to this month’s next version; I hope it gets close.
To summarize the three models, even where there's a gap with some SOTA closed-source models, we know how to close that distance. Our mission is to drive research relentlessly towards infinitely approaching the capability of closed-source models.
GeekPark: Without the current wave of agent enthusiasm, tokens might not have drawn so much attention. But now, as soon as you get tokens, you burn through them fast.
Bruce Yang: Exactly. For coding, agent harnesses like OpenClaw, Hermes, Codex, and Claude Code actually have quite similar architectures. Precisely because the harness acts as a constraint, the gap between models is shrinking.
I recently went horseback riding in Xinjiang specifically to feel the harness. I rode several different horses. The first was very obedient but slow. The second was very fast but not so obedient. But when the reins were in my hands, I found the gap wasn't that big. The slow one, I'd nudge with the stirrups, and it sped up. The disobedient one, I'd pull the reins, and it listened. So, the harness plays two roles: first, it inherently narrows the gap between models, and second, it provides clearer direction for model upgrades and optimization.
What we need to do more is not to train a wild horse without tack, but to train a horse that is wearing a harness. With the harness on, many dimensions are effectively compressed, and the directions for progress become very clear. There was a third horse, fast and obedient, that I didn't ride—it was the guide's horse, a thoroughbred, not for me. That's the SOTA model. What I need to do now is take the less naturally gifted horse, train it with the harness as a foundation, and make it infinitely approach the SOTA model.
GeekPark: Going horseback riding just to experience the concept of a harness is quite impressive. Claude Code is so powerful not only because of Anthropic's model but also because its entire harness is brilliantly designed, filled with things to learn from.
Bruce Yang: Comparing Claude Code with OpenClaw, I see two main advantages for Claude Code. First, its handling and compression of Memory is much stronger than OpenClaw's; it has done a lot of optimization for long-term memory capability. Second is its optimization of KV Cache, which can reduce token usage and increase the cache hit rate for tokens.
A cache hit is essentially zero cost for the model company. Even though the user is charged, for the model company, it's zero cost. Input tokens are also zero cost. That's why we often see companies slashing the price of cache-hit tokens and input tokens so aggressively—because everyone's cost items are mainly in the output tokens, in the output layer.
GeekPark: After the June 1st free launch, you built over a dozen groups. What's the current situation? How are users spending their free tokens?
Bruce Yang: They've helped us uncover many problems we couldn't find when building the product ourselves—various stress-testing methods, usage scenarios, configurations for adapting different harnesses, and error logs. Our original testing team of seven or eight people couldn't catch these issues, but now many active users in the groups are finding them and offering excellent advice. Many are developers and ops engineers who have even pointed out bottlenecks in our gateway.
Second, and more touching to me, is the variety of scenarios being explored. Originally, our video model was used for short clips, 5 or 10 seconds, because that was all the model supported. But users, with their own harnesses and skills written specifically for us, and some even creating ComfyUI workflows, are producing and sharing videos that are several minutes, 3 to 5 minutes long, in the groups.
I saw one user post a short video montage of his two daughters growing up, accompanied by a very moving narration done with TTS, stitching the video together. My first reaction was surprise—wondering if our model really made that, and thinking it looked pretty good. Many people making 5-minute videos probably wouldn't have even tried if they had to pay the normal cost. We've essentially opened up a new scenario, a new kind of right. We have a saying at our company: a user's potential should not be limited by cost. We have given them the right to unleash that potential.
Another touching point: we previously tried emailing OpenClaw, saying, "The models you integrate by default are all very famous, and we rank well in benchmarks. Could you also include our model?"
GeekPark: What did OpenClaw say?
Bruce Yang: They replied with an email saying, "We do not allow and will not integrate models with no reputation." Yet today, I searched GitHub for OpenClaw and Agnes, and from June 1st to 3rd, there were dozens of comments every day asking, "Why don't you support Agnes AI? Why do I have to configure it myself?" So, what we gave away has brought back incredibly moving rewards.
GeekPark: I previously spoke with Yang Pan from SiliconFlow, and he gave me a piece of advice: subscribe to the $200 plan. You'll find that when you have unlimited tokens, your ambition grows.
Bruce Yang: Yes, that's our idea too. Before pushing for free access, even internally, we hadn't fully figured out the next steps or the business model, only a rough concept. But we had one big understanding: when you push something to an extreme, like dropping the price to zero, it will inevitably open up a new way of working for the entire ecosystem, a paradigm shift that will burst forth with new scenarios. We don't need to envision all those scenarios now; many users will think of better ones than we can because the power of the crowd is limitless. We are already seeing this, with some seeds starting to bloom.
GeekPark: Are you worried about people not only free-riding but also setting up something like a relay station, channeling your free tokens to more users and starting to charge for it themselves? Are you concerned about these resellers?
Bruce Yang: We've set a limit on RPM, requests per minute, at about 20. This is definitely fine for individual users, but very difficult for enterprise users. If you give a 20 RPM product to 10 users, they’ll quickly find it limiting. So, for enterprise users, a paid model is still likely in the future. The price will still be very low, and they can use the free tier for POC and pilot projects.
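A client that wants to stay under a per-minute cap like this can throttle itself before sending requests. The following is a minimal sketch of a sliding-window limiter, assuming the cap is enforced over any 60-second window; how Agnes actually enforces its limit is not described in the interview, and the `RpmLimiter` class and its parameters are hypothetical names used only for illustration.

```python
import time
from collections import deque


class RpmLimiter:
    """Client-side sliding-window throttle: at most `rpm` calls per 60 seconds.

    This is an illustrative assumption about how a client might pace itself,
    not a description of the server-side limit.
    """

    WINDOW_SECONDS = 60.0

    def __init__(self, rpm=20, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.sent = deque()   # timestamps of calls inside the current window

    def _evict_expired(self, now):
        # Drop timestamps that have fallen out of the 60-second window.
        while self.sent and now - self.sent[0] >= self.WINDOW_SECONDS:
            self.sent.popleft()

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict_expired(now)
        if len(self.sent) < self.rpm:
            return 0.0
        return self.WINDOW_SECONDS - (now - self.sent[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            self.sleep(delay)
        now = self.clock()
        self._evict_expired(now)
        self.sent.append(now)
```

Calling `limiter.acquire()` before each API request keeps a single key within its budget. It also makes the reseller point concrete: ten users sharing one 20 RPM key would average only two requests per minute each.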
GeekPark: In a CLI environment, what’s the rule of thumb for an individual to maximize economic efficiency, choosing which tasks for paid models and which for Agnes's free one?
Bruce Yang: For the vast majority of people, unless you are a true geek... I'd say there are two user types that might want to be slightly cautious. The first are absolute geeks who need multiple Codex instances running continuously for 3 to 4 hours. Our support for that isn't quite there yet, though we are optimizing for these long-duration, multi-instance scenarios, working alongside our coding harness. The second are professionals making short dramas. It's not that our model is unusable, but in certain scenarios, like very complex motion or extreme consistency requirements, you might want to pair it with some higher-end models.
Other than that, our models should currently handle over 95% of the market's scenarios, which has been validated in our dozen-plus WeChat groups. About 80% of users say we are comparable to other models they've seen. Another portion raises issues, which fall into two categories: those we can resolve quickly and those that are temporarily unsolvable. Of the 10-20% of users who raise issues, we can quickly resolve 80% of them. When you do the math, the scenarios and problems that are truly unresolved, with no known solution right now, are perhaps only about 1%. Coupled with dropping the usage barrier to free, I think it’s a very attractive and worthwhile direction to explore.
GeekPark: How did Agnes drive down the costs of these three model modalities to sustain free access? You’ve given away hundreds of billions of tokens in just three days.
Bruce Yang: Yes. And those hundreds of billions of tokens only represent about one-fifth of our reserved card capacity. Based on current daily consumption, we can handle five times that volume. I’ve also prepared a second batch of cards. Everyone can feel free to use them heavily, until we can't take it anymore.
The logic is this. First, we are solving an optimization problem, but with different constraints than others. Mainstream companies largely believe in scaling law, letting parameters and data scale together as compute allows. But it doesn't answer the question of marginal utility: often a 10x increase in parameters only yields a few percentage points on a benchmark. Plus, reverse distillation is common now, like Gemini using Pro to distill Flash. With a 10x parameter reduction, the gap on most benchmarks isn't huge.
So, from day one, we set a crucial hypothesis: we won't work on models over 200B parameters. We only optimize within 200B, searching for the sweet spot. We rely on environmental stability, synthetic data, and real online data from our own product to continuously expand, along with augmenting data on benchmark problems, which is quite a mature practice now. We will open-source some of our synthetic data methods soon.
On top of that, we're placing only two big bets: agent and coding, aiming to be on par with SOTA models. We're strategically abandoning other areas, not because they're unimportant, but because they aren't the first problem to solve. Right now, massive token consumption is definitely happening in coding harnesses or white-collar office harnesses.
There's also a slightly more forward-looking attempt. We published an article on our website about how to approximate the effect of larger models by recursively calling layers of a Transformer without increasing parameters or depth. We call it a recurrent depth transformer. In small-scale validation, one cycle reduced PPL by 10%, which means a 10% improvement in parameter utilization efficiency. For the same MoE model, multiple calls can better leverage the capability of each unit parameter. This is a key area for future experiments. The long-term vision is to continuously optimize performance within 200B, approaching SOTA. Resources are limited, but so far, it seems effective.
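The idea of calling the same layers repeatedly can be illustrated with a toy example. This is only a sketch of weight sharing across passes and is not the architecture from the Agnes article: the residual `tanh` block standing in for a Transformer layer, the helper names, and the `recurrences` parameter are all assumptions made for illustration.

```python
import math
import random


def make_block(dim, seed=0):
    """One toy layer: a single dim x dim weight matrix (stand-in for a Transformer layer)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, x):
    """Residual update x + tanh(W x), computed with plain Python lists."""
    return [
        xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
        for xi, row in zip(x, weights)
    ]


def param_count(blocks):
    """Number of stored weights; this does not depend on how often they are reused."""
    return sum(len(row) for block in blocks for row in block)


def forward(blocks, x, recurrences=1):
    """Run the whole stack `recurrences` times, reusing the same weights on every pass."""
    for _ in range(recurrences):
        for block in blocks:
            x = apply_block(block, x)
    return x


blocks = [make_block(4, seed=i) for i in range(2)]
x0 = [0.1, -0.2, 0.3, 0.4]
one_pass = forward(blocks, x0, recurrences=1)
two_passes = forward(blocks, x0, recurrences=2)
print(param_count(blocks), one_pass != two_passes)
```

The second pass costs extra compute but no extra parameters, which is the trade-off the passage describes. Whether the extra passes actually improve perplexity depends on training, which this toy does not attempt.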
Images and videos are a bit different. They haven't broken through scaling law yet; basically, more data leads to better results. When many products don't achieve a desired effect, it's not a capability issue but a data issue, and synthetic data is very complex. For example, if you need 100 million video clips, scraping and slicing them yourself could take months. By the time you're done, the window of opportunity has passed.
So the question is, how do you get the data you need in the shortest time? And through what pipeline should you train that data? How can the image model empower the video model? Along the way, do you choose a DiT or autoregressive technical route? There are many small know-hows in this process that are more critical than one-time big conceptual upgrades. It's a bit like what Yann Dubois, a post-training lead at OpenAI, said: training models is more of a craft, not a conclusion that can be systematically deduced.
Over the past year plus, our hundred-plus research colleagues have made many innovations, fully leveraging the power of academia and open source. That's why we are also giving back to the open-source ecosystem. For instance, the paper on recurrent depth transformers has already been open-sourced. Next week, we'll open-source a VAE module capability for optimizing text in images. Later, on the video model side, our most important work is on how to synthesize data, and we will progressively open-source that too.
This ecosystem has been very helpful to us. Many ingredients and recipes are already out there, but the question is whether you have a sufficiently large and strong team and enough confidence to invest and cook the dish. I'd say we've cooked it quite well so far.
GeekPark: What's the business thinking behind "Token Free"?
Bruce Yang: We have a rough idea but haven't mapped it all out completely. I can share some parts. First, the numbers. We processed hundreds of billions of tokens in a few days. I checked, and the number one model on OpenRouter is currently DeepSeek V4 Flash, at about 3 trillion tokens per week. I calculated that if we reached that weekly usage volume, our actual server cost would be a few million RMB, which is not a huge number at all. A very important reason is that we've compressed costs to the absolute extreme. I currently don't see anyone else in the market achieving our cost structure; it's a bit outrageous.
What's the target for this free period? The goal is to reach double the scale of the number one model on OpenRouter. Once we hit that, we might continue to support new users depending on our funding situation, but up to twice that volume, we can fully support it. Currently, our capacity sits at the scale of the top-ranked model on OpenRouter, mainly serving individual users. We haven't yet done massive promotion to enterprise customers; they can run a proof of concept (POC), but the requests-per-minute (RPM) limit we provide isn't that high. Even if our volume reaches twice that of the largest model on OpenRouter, the free access is completely sustainable. Rather than saving that cost, we want more users to experience our models, to like them, and to become loyal users. It's absolutely worth it.
The next step for commercialization has a few threads.
The first is enterprise users. Sales can be tiring, but if you open up a free tier for them to try and let them come to us proactively, it's much faster. That's a very important commercialization path for us.
Second, looking at OpenAI and Anthropic, their fastest-growing segment in B2B is their harnesses, namely Claude Code and Codex. So, we will soon launch our own harness product. I’ll keep the details under wraps for now, but that’s another very important commercialization path.
Third, super-heavy individual users, the geeks, aren't a primary focus. When we further upgrade our models to be truly SOTA, say top 3 in the market, we might consider a small-scale charge, or offer priority access to paying users, who could get it for free after paying for a period of time. But none of these is the highest priority; the first two matter much more.
GeekPark: Today it's "Token Free." Could the next step be "paying users to use tokens"?
Bruce Yang: It's a possibility. But overall, in the AI era, maintaining a barrier or moat for even a year or two is very difficult. Now, while we have the capability, with our multimodal models all reaching a usable state and our lab ranking in the global top 10, we want to be the first to raise the banner of "free." We hope to push this out first, because it aligns with our vision. There aren't many companies on the market right now that can fully match this: multimodal, similar capability, and free. Most choose to focus their efforts on one domain; while they may slowly branch into others, it takes time.
So we want to seize this opportunity to quickly get a seat at the table and become a significant player. We have follow-up moves. When others try to match us, we have other cards yet to play. The harness product is what we’re intensely preparing now. I can't say exactly when or what product, but there are new growth curves ahead.
GeekPark: Will the tech giants follow suit, making their older models free, for instance?
Bruce Yang: It depends on how quickly they can match this move. I think it will be difficult, especially since they already have so many paying users. As a new entrant, we don't have that baggage, that huge base of enterprise and recurring paid users, so we can pivot quickly. But for many larger companies, it's hard to turn a big ship. Their planning, budgets, and annual plans would all need adjusting. The decision-making path in a large company isn't that fast.
03
AI Equality Is the Underlying Ethos Behind "Free"
GeekPark: You mentioned ordinary users using free tokens to create memory videos of their daughters. Is that a sentiment shared by you and the team—a hope to give AI freely as a tool, allowing everyone to unleash their creativity and make life better?
Bruce Yang: Let me share a bit more about my background. I grew up in a fourth-tier city in China. Through academic competitions and my middle school exam results, I earned a scholarship to Raffles Institution in Singapore, essentially the best high school there. There, I met many peers from across Southeast Asia, from families that weren't wealthy but who excelled academically, and gained many new perspectives. I won gold medals and ranked top in the nation in Singapore's math, physics, and chemistry Olympiads, and also served on the student council. That portfolio helped me earn a leadership scholarship to attend UC Berkeley.
People in Silicon Valley say that rich kids go to Stanford, and poor kids go to Berkeley. The student body at Berkeley is a real slice of society, not stereotypically elite, but everyone is smart, full of ideas, and very pure and unvarnished.
Later, I started companies in Silicon Valley, and when I returned to Singapore for my Ph.D., I also received the President's Scholarship. I've been incredibly lucky. Coming from a fourth-tier city, with parents who weren't wealthy, I've had a path paved with scholarships and support. Many of my achievements today are built on that accumulation, plus a defiant heart. Even as a latecomer, I'm willing to challenge the current market players. But AI is becoming less egalitarian now because of cost. Many creative people watch their token consumption and don't dare use AI at scale, which makes them less creative and less efficient than they could be.
Reflecting on my own journey, whether it was those scholarship students at Raffles or the affordable tuition at Berkeley that allowed smart kids from ordinary California families to attend, that seed was gifted to me. I've reached a point where I need to give back to society, to pass that torch on. That is equality: equality of capability and equality of value.
In this era, AI equality is the core principle. A decade ago, if you couldn't speak Chinese or English, you might have been considered illiterate. A decade from now, if you don't understand AI, you might be the new illiterate.
Many of my friends in Silicon Valley are very anti-AI, very afraid of it. It's not really a fear of AI; it's the fear that people who don't understand AI have of those who do, a fear of being replaced at any moment. The solution is not to suppress AI but to make it a more egalitarian capability, letting everyone know how to use AI to create more. This is a very important vision for our company: making world-class AI belong to everyone. What we can do may be minuscule, but this is a vision that will endure.
GeekPark: Many big companies have stopped open-sourcing, but you are still doing it. Beyond AI equality, what other thoughts are behind it?
Bruce Yang: Many companies now are trying open source, but they only open-source the parameters, not the methods. The one that open-s