Which AI Glasses Will Be the First to Achieve Million-Level Shipments? Industry Insights on the Path-AI Topic

As the boundaries between AI, AR, and XR become increasingly blurred, the industry must first clarify a fundamental question: What exactly do we mean by "AI glasses"? Generally speaking, AI glasses integrate AI capabilities, focusing on voice and smart perception to enable functions like translation, recognition, and voice assistants. They can be display-free, with typical use cases including meeting minutes. AR glasses use optical see-through technology to overlay digital information onto the real world; their core capability is the fusion of virtual and real environments for remote collaboration, requiring optical transparency, with typical applications in industrial maintenance. XR glasses encompass AR/VR/MR, providing immersive or mixed reality experiences; their core lies in virtual screens and fully immersive interaction, using either closed or video see-through solutions, with virtual offices as a typical use case. These three categories differ significantly in their technological pathways, hardware forms, and commercialization timelines. The industry consensus for 2026 is to focus on AI glasses, the lightest and most consumer-friendly category.

To understand the current position of AI glasses in the industry, it is necessary to look back at industry forecasts from over a year ago. In April 2025, during a roundtable discussion at a wearable technology seminar, VeriSilicon presented six topics. The on-site voting results outlined a clear industry evolution map: over 40% (44.83%) of guests believed that smart glasses with cameras but no displays (like Ray-Ban Meta) would be more popular in the short term. Regarding human-computer interaction, voice (35.29%) and displays (19.61%) were selected as the two most suitable methods. In terms of chip solutions, 45.83% predicted that ISP+MCU would become the mainstream within two years, while a staggering 93.33% favored custom chip solutions. For system power consumption and interface standards, the AR Processor Interface (ARPI), developed through a collaboration between VeriSilicon and Digital Light Chip, is suitable for video transmission in three-color light-combining systems. Compared to MIPI interfaces, it saves two-thirds of the bandwidth. The three-chip color-combining Micro LED + optical waveguide solution was predicted to be the mainstream AR solution for the next 5-10 years. In smart computing architecture, 50% of guests chose a "glasses-side + phone-side + cloud-side" three-end linkage model. Regarding market competition, nearly 60% (58.33%) believed that smartphone manufacturers like Xiaomi, vivo, Meizu, and Huawei would capture a larger market share in AI glasses.

Based on these industry predictions, VeriSilicon further proposed five forecasts at its CEO Forum in February 2026: The AI glasses category capable of achieving tens of millions in shipments within two years of mass production should feature a "display-free design, cameras serving environmental perception and lightweight recording, total weight under 30 grams (excluding lenses), 12-hour standby time, and a price under 2,000 RMB." Dr. Wayne Dai, Chairman of VeriSilicon, summarized this as "Smartphone++": not a replacement for smartphones, but a natural extension of smartphone capabilities. According to Omdia, global AI/AR glasses shipments in 2025 were 8.7 million units, with Meta dominating the market with 7.4 million units (85.2% share). Global AI branded glasses shipments are expected to reach 15 million units in 2026.

Against this industry backdrop, on June 3, during the 16th Songshan Lake China IC Innovation Summit, a roundtable discussion titled "The Path to the Industrialization of AI Glasses" unfolded enthusiastically. Core industry players gathered to debate seven major topics: killer applications, million-level shipments, privacy boundaries, edge models, chip architecture, viewing experience, and capital pathways.

What is the Killer Application Most Attractive to Consumers?
"What is the killer application for AI glasses targeting the mass market?" Dr. Dai's first voting question struck at the core of the industry. The on-site voting results showed voice assistants leading at 27.36%, followed by shooting and sharing at 21.39%, and information prompts at 13.93%. AR navigation (10.95%), immersive audio-visual entertainment (10.45%), AR real-time encyclopedia (7.96%), office efficiency (7.46%), and others (0.5%) followed.

Mr. Shi Qing, General Manager of Emdoor VR, voted for voice assistants, shooting and sharing, and office efficiency. He revealed that clients he onboarded last year have already used AI glasses as an entry point for meeting records, feeding multimodal data into large models to generate meeting minutes. "Even Robin Li is using it," he noted, adding that this productivity tool attribute will be a key selling point.

Mr. Gao Kang, Vice President of Business Development at Bestechnic, ranked information prompts first, voice assistants second, and shooting and sharing third. He stated bluntly that products from various manufacturers are currently converging in voice and shooting capabilities; the true differentiation lies in the completeness of information prompts. "Qwen's glasses are selling better than those that only take photos. Deep analysis shows that their optical waveguide prompts provide better information feedback." Gao emphasized that while cameras and audio are indispensable for glasses, research indicates that "information prompts" are the most frequently used feature on smartwatches, a logic that will be replicated in glasses.

From an industry mapping perspective, the layout of international giants validates these judgments. Apple positions its AI glasses as an AI accessory for the iPhone. Its screenless design focuses on lightweight wear and ecosystem integration, with core functions including Visual Intelligence, an upgraded Siri, and first-person shooting. However, it has been delayed until late 2027, with a target price of $200-$500. Google is taking an independent AI wearable platform route, deeply integrating Gemini AI, with Audio and Display versions; the Audio version is expected to launch in Fall 2026. Meta has built the most complete product matrix: Ray-Ban Meta (starting at $379), Oakley Vanguard (starting at $499), and a Display flagship model (starting at $799), covering fashion, sports, and high-end display scenarios.

More imaginative are vertical innovations. In March 2026, Beijing Nidejia released AI zoom reading glasses, utilizing AI algorithms and optical technology to dynamically adjust lens diopters, automatically switching between near and far vision. This upgrades traditional optical correction to an active myopia management system, targeting myopia prevention in children and adolescents. In the hearing aid sector, Cearvol Lyra uses an AI neural engine to optimize sound in real time, and Nuance Audio achieves clear dialogue in noisy environments through AI beamforming. Lingban Technology is exploring proactive understanding and assistance via AI super-agents. AI glasses are transitioning from "general-purpose terminals" to "medical-grade health entry points."

Which AI Glasses Will Be the First to Achieve Million-Level Shipments?
"Within two years, in the Chinese market, which single AI glasses model will be the first to achieve million-level commercial deployment?" This was Dr. Dai’s "soul-searching question." The on-site voting yielded a somewhat unexpected result: display-free AI glasses with cameras took first place at 35%, followed by full-color AI glasses at 30%, and monochrome AI glasses at 25%. Display-free glasses without cameras received only 6.67%, and other vertical applications accounted for 3.33%.

Mr. Ma Chao, Chief AR Platform Engineer at Lingban Technology, was surprised by this result. He admitted he originally thought the Meta effect would give display-free glasses an overwhelming advantage. However, with full-color and monochrome combined reaching 55%, "it shows that the vast majority believe displays are important." Nevertheless, full-color glasses are not yet ready in terms of maturity, cost, and power consumption. Monochrome glasses are more likely to break through the million-unit mark within two years. He provided clear price anchors: display-free products without cameras should be priced under 600-700 RMB; for display-free products with cameras, using a multi-in-one SoC from major manufacturers would cost 1,500-2,000 RMB, while traditional ISP+MCU solutions could achieve 1,000-1,500 RMB; monochrome products could be controlled at 2,000-3,000 RMB; full-color products currently have high costs, priced at 4,000-5,000 RMB, and will need to drop to 3,000-4,000 RMB upon maturity to be accepted—"they cannot exceed the promotional price of an Apple iPhone."

Mr. Zhang Junjie, CTO of Rayneo, also chose monochrome. He believes display-free camera glasses are "sharing-oriented" products, while display glasses are "AI-related creation and sharing-oriented" products. Both have the opportunity to quickly surpass a million units, but "the AI opportunity is greater." From a technical perspective, the threshold for displays is being broken, and their value far exceeds the cost burden. He further differentiated the sub-scenarios for display glasses: business-oriented ones might not have cameras, while tech enthusiasts might accept versions with cameras.

Gao Kang issued a warning from the supply chain perspective: the cost of glasses with cameras has been "rising continuously" from this year to next. Camera modules face dual challenges in supply and pricing, which will exert realistic pressure on the pricing strategies for million-level shipments.

In fact, the division of labor for cameras in AI glasses represents differentiation between technological pathways: shooting/livestreaming dedicated cameras (e.g., Ray-Ban Meta 12MP, Rokid 12MP) only work when activated; CV spatial perception cameras (e.g., 4 on Meta Aria Gen 2) are always on, responsible for SLAM, gesture recognition, and real-time mapping; eye-tracking cameras (e.g., 2 on Meta Aria prototype) actively perceive to assist interaction and optimize power consumption; low-power wake-up cameras achieve 24/7 environmental detection and keyword wake-up at about 1mW power consumption. This "multi-camera division of labor" architecture is the hardware foundation supporting million-level shipments.

Balancing Always-On Perception and User Privacy
The tension between the always-on perception capabilities of AI glasses and user privacy is an ethical and technical threshold that industrialization must cross. Ma Chao proposed a systematic tiered perception solution. He believes future camera data should be divided into two categories: one for edge-side model perception, which cannot be accessed by user apps; and another for taking photos and videos, which cannot be cracked, must be accompanied by a lit indicator light, and is not allowed to have any backdoors to turn it off.

Technically, he suggests dividing always-on perception into three levels: Level 1 performs scene classification and time perception on the environmental sensing side, with power consumption controlled within 3-5 milliwatts. Level 2, triggered by preset audio, enters a low-power performance module (approx. 20-30 milliwatts) for more precise user intent understanding. Level 3, when a true rigid demand scenario is triggered, allows the AI to capture images for tokenization, converting privacy data into text or encrypted forms before uploading to the cloud as large model memory, with power consumption around 200-300 milliwatts. "Through this step-by-step tier switching, we solve both power consumption concerns and societal worries about privacy."
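The three-level switching described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the power figures are midpoints of the ranges quoted in the text, while the `Event` flags, the fall-back-to-Level-1 rule, and the equal-length sensing windows are hypothetical choices made for the example, not details given by the speaker.

```python
from dataclasses import dataclass

# Midpoints of the power ranges quoted in the text, in milliwatts.
LEVEL_POWER_MW = {1: 4.0, 2: 25.0, 3: 250.0}


@dataclass
class Event:
    """One sensing window; both flags are hypothetical trigger signals."""
    audio_trigger: bool = False  # preset audio cue heard (Level 1 -> 2)
    rigid_demand: bool = False   # a real need is confirmed (Level 2 -> 3)


def next_level(level: int, event: Event) -> int:
    """Escalate one level at a time; otherwise drop back to Level 1."""
    if level == 1:
        return 2 if event.audio_trigger else 1
    if level == 2:
        return 3 if event.rigid_demand else 1
    return 1  # assumed: Level 3 handles one capture, then returns to idle


def average_power_mw(events: list[Event]) -> float:
    """Mean power over equal-length windows, starting from Level 1."""
    level, total = 1, 0.0
    for event in events:
        total += LEVEL_POWER_MW[level]   # power spent during this window
        level = next_level(level, event)  # level used for the next window
    return total / len(events) if events else 0.0


if __name__ == "__main__":
    idle_day = [Event()] * 100
    print(average_power_mw(idle_day))  # stays at the Level 1 figure
```

Because the device spends most windows at Level 1, the average power in such a model is dominated by the few-milliwatt tier, which is the intuition behind the speaker's claim that tiering addresses power consumption as well as privacy.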

Zhang Junjie revealed that Rayneo has had in-depth exchanges with China's MIIT regarding this. The core consensus is that facial data must be desensitized; in the always-on era, raw data "absolutely cannot be uploaded to the cloud." Edge devices must have blurring capabilities or rely on strict encryption via smartphones. "Current MIIT regulations state facial data cannot go to the cloud, and whether encrypted transmission to a phone for processing is feasible remains uncertain; ideally, it shouldn't exist at all."

Shi Qing proposed a new perspective from the angle of data sovereignty: privacy is divided into "others' privacy" and "one's own privacy." The state regulates the former, while users should control the latter through private clouds (like AI NAS or edge entity storage). He revealed that Emdoor VR is advancing related projects, allowing multimodal data collected by glasses to ultimately be stored on home servers. For example, if a user "takes a photo in the US, their mother can see it on a digital photo frame at home in China," achieving a data closed loop.

Edge-Side Tokenization and Vertical Deployment of Small Models on Glasses
"What kind of small models do AI glasses need, and what are they mainly used for?" Dr. Dai steered the topic toward edge computing power. From a system architecture perspective, the development process of AI glasses exhibits a typical "three-end linkage" characteristic. As a wearable perception entry point, the glasses receive three inputs: voice/sound captured by microphones, images/videos captured by cameras, and motion/action data recorded by IMUs. This multimodal data undergoes preliminary processing in the environmental perception model, interacts with the smartphone through tokenization and encoding/routing, and finally forms a data closed loop with the cloud-based large language model and storage. Google AI glasses (Gemini), Meta Ray-Ban (Meta intelligence), and third-party APIs based on ChatGPT are representative products of this architecture.

Wang Zhiwei, Executive Vice President of VeriSilicon and General Manager of the Custom Chip Platform Business Unit, pointed out that AI glasses face an "impossible triangle" of "lightweight, low power consumption, and long battery life," but always-on NPUs and small models are indispensable. During the always-on phase, small models handle intelligent sensor processing: voice wake-up, visual wake-up, voice processing (which can use small-parameter vertical models), and visual analysis. The key innovation is tokenization: by generating voice tokens and image tokens, raw data is abstracted, which mitigates privacy leakage risks to a certain extent. "Data should go out as much as possible and not be concentrated on the glasses. But for issues like latency or privacy, we try to keep them on the glasses."
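The placement rule in Wang's closing quote can be expressed as a one-line routing decision. The sketch below is only an illustration of that rule: the `Task` fields, the `Target` names, and the way the example tasks are classified as latency- or privacy-sensitive are assumptions made here, not anything specified by VeriSilicon.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    GLASSES = "glasses"
    OFFLOAD = "phone_or_cloud"


@dataclass
class Task:
    name: str
    latency_sensitive: bool  # hypothetical flag: needs an immediate response
    privacy_sensitive: bool  # hypothetical flag: touches raw personal data


def route(task: Task) -> Target:
    """Offload by default; keep latency- or privacy-critical work on the glasses."""
    if task.latency_sensitive or task.privacy_sensitive:
        return Target.GLASSES
    return Target.OFFLOAD


if __name__ == "__main__":
    # The classification of each task below is illustrative only.
    tasks = [
        Task("voice wake-up", latency_sensitive=True, privacy_sensitive=False),
        Task("raw image tokenization", latency_sensitive=False, privacy_sensitive=True),
        Task("long-form summarization", latency_sensitive=False, privacy_sensitive=False),
    ]
    for task in tasks:
        print(f"{task.name}: {route(task).value}")
```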

Google's recently open-sourced Gemma 3 series provides a viable technical benchmark for this pathway. Gemma 3 270M (Tiny Gemma) is a compact model with only 270 million parameters, designed specifically for fine-tuning for specific tasks. It has built-in powerful instruction-following and text structuring capabilities. At the edge deployment level, after INT4 quantization, this model requires only 240MB of memory. Running 25 rounds of dialogue on a Pixel 9 Pro consumes only 0.75% battery, demonstrating astonishing energy efficiency.
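The figures quoted above can be put into per-round terms with simple arithmetic. The sketch below uses only the numbers in the text (0.75% battery for 25 rounds, 270 million parameters, INT4 quantization); the function names are made up for the example. Note two caveats: the battery figure is for a phone, so it does not transfer directly to the much smaller battery of a pair of glasses, and the raw INT4 weight size is smaller than the quoted 240MB, whose remaining components the source does not break down.

```python
def rounds_per_full_charge(battery_pct_used: float, rounds: int) -> float:
    """Extrapolate how many dialogue rounds a full (100%) charge would cover."""
    pct_per_round = battery_pct_used / rounds
    return 100.0 / pct_per_round


def int4_weight_megabytes(num_params: int) -> float:
    """Raw weight storage at 4 bits (0.5 bytes) per parameter, in decimal MB."""
    return num_params * 0.5 / 1e6


if __name__ == "__main__":
    # 0.75% per 25 rounds is 0.03% per round, or roughly 3,333 rounds per charge.
    print(round(rounds_per_full_charge(0.75, 25)))
    # Weights alone at INT4 for 270 million parameters.
    print(int4_weight_megabytes(270_000_000))
```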

VeriSilicon's strategic cooperation with Google is based on this technical foundation. Based on the Open Se Cura open-source project, they are jointly creating an ultra-low-power Coral NPU IP for edge-side LLMs. Google provides open-source technology, while VeriSilicon provides enterprise-grade IP, chip design, and mass production services, offering "lightweight, always-on, ultra-low energy consumption" computing power for smart glasses, wearables, and AI toys. Coral NPU adopts a collaborative architecture of Scalar, Matrix, and Vector, distinguishing it from traditional CPU+ML general solutions, becoming a key component of VeriSilicon's NPU IP product matrix, empowering nearly a hundred existing and new customers. Leveraging the Google ecosystem and its own design capabilities, VeriSilicon is locking in its leading position in AI ASIC chip design.

Wang further revealed that by reusing and fine-tuning Google's 270M parameter model, VeriSilicon can achieve daily life scenario conversations in two or more languages. For complex image and video AI processing, VeriSilicon's self-developed NPU can provide 1 TOPS or even 4-5 TOPS of computing power. "Whether it's the low-power, lightweight NPU developed with Google for always-on applications or our self-developed NPU, we can configure different parameters to support varying performance and computing power needs, thereby meeting the custom chip requirements for AI glasses."

How to Choose Chip Architecture? Competition and Cooperation Between General SoCs and Custom ASICs
"Which AI glasses need general SoCs? Which need custom ASICs, and what kind of ASICs?" Dr. Dai shifted the topic from the model layer to the hardware layer. Currently, mainstream AI glasses chip solutions present three technical routes: System-level SoC solutions, achieving full-function coverage of audio, video, WiFi, etc., through high integration; MCU+ISP solutions, meeting pure recognition or lightweight perception needs at lower costs; and SoC+MCU dual-chip solutions, balancing performance and power consumption through functional separation. These three routes correspond to different product positioning and cost ranges.

Gao Kang believes that any mass-market consumer electronics product will eventually face demands for differentiation. "Just as wheels must be round and Bluetooth must follow Bluetooth protocols, personalized needs require ASICs." He revealed that Bestechnic currently offers two tiers: one requires higher integration, handling audio, video, WiFi, and everything else within a single chip; the other is a pure recognition glasses solution, using a suitable ISP chip in a dual-SoC solution. "Initially, there are many customization needs in vertical industries. In the public consumer electronics sector, due to cost considerations, people are currently using more generic products."

Wang Zhiwei further argued that AI glasses have extreme differentiation—how many cameras and displays are supported, whether dual-eye full color is needed, video codec resolution, etc. General chips inevitably lead to large die sizes, high power consumption, and high prices, failing to solve diverse needs. When product sales reach the 100,000 to 1 million level, the economic benefits of custom chips will become prominent. He revealed that VeriSilicon helped Google design and tape out AI glasses chips as early as 2022, and recently taped out ASIC glasses chips for overseas clients. "In the current stage, as some speakers have shared, the demand for AI glasses products is highly differentiated, so general chips cannot solve the market's diverse needs."
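The volume threshold Wang mentions follows the usual break-even logic for custom silicon: a one-time design cost is recovered through a lower per-unit cost. The sketch below illustrates that arithmetic only. The function name and every input value are hypothetical placeholders chosen for the example; the source gives no cost figures, so the result should not be read as confirming the 100,000 to 1 million range.

```python
import math


def break_even_units(nre_cost: float, general_unit_cost: float,
                     asic_unit_cost: float) -> int:
    """Smallest volume at which the custom chip's total cost is no higher
    than buying a general-purpose chip for every unit."""
    saving_per_unit = general_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        raise ValueError("the custom chip must be cheaper per unit to break even")
    return math.ceil(nre_cost / saving_per_unit)


if __name__ == "__main__":
    # Hypothetical inputs in an arbitrary currency: one-time design cost of
    # 5,000,000, general chip at 20 per unit, custom chip at 10 per unit.
    print(break_even_units(5_000_000, 20.0, 10.0))  # 500000
```

This simple model counts only chip cost. The speakers' argument also rests on die size, power consumption, and response speed, which a cost-only break-even does not capture.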

He Jun, General Manager and CEO of Nanjing CoreSight, judged that the custom and general pathways for AI glasses will be similar to smartphones, but glasses are more stringent, requiring lower power consumption and faster response. "Meta's AI glasses are customized in collaboration with Qualcomm. But customization will eventually move toward generalization; for example, when everyone has the same functional requirements, that part becomes a general function. In the next 2-3 years, customization needs will be even greater than we imagine."

On-site voting also validated the value perception of custom ASICs. In the vote on "What features can custom ASICs bring to AI glasses," faster response ranked first at 23.73%, followed by longer battery life at 20.34%. Dedicated computing power (16.1%) and higher integration (16.1%) tied for third, followed by stronger real-time performance (13.56%) and reduced costs (10.17%). What guests valued most was not merely cost reduction, but response speed, battery life, and dedicated computing power; these are the core advantages of ASICs in edge AI devices.

Rendering Algorithms and Insufficient Computing Power: The Biggest Bottlenecks for Viewing Glasses
Regarding the viewing glasses niche, Zhou Zhenhong, CEO of Pixelworks, demonstrated the value of independent visual chips. Founded in the US in 1997, Pixelworks has focused on visual image processing for nearly 30 years, investing 12 years and developing 8 chips in the mobile phone discrete display chip field, somewhat akin to "embedding a graphics card into a smartphone." In the last 2-3 years, its chips have entered the glasses sector, with companies like Rayneo using Pixelworks chips to achieve HDR10 and HDR10+ experiences.

Zhou revealed that Pixelworks' new-generation X8 chip possesses single-frame modeling capabilities, extracting 3D models from captured footage through the chip. "This is a very important technology for future glasses and robots." This capability simultaneously solves two pain points: privacy localization (transmitting only 3D models rather than raw video streams) and bandwidth compression (significantly reducing transmitted data volume). Additionally, Pixelworks' 30 years of accumulated optical image correction algorithms in projectors can be directly applied to optical distortion correction in glasses.

From a user experience perspective, Wang Zhiwei pointed out that while OLED screens can meet basic needs for viewing glasses, power consumption remains a bottleneck—"watching a one-hour movie basically requires a smartphone or laptop for power supply." He shared an interesting survey: several female attendees at this forum felt that the biggest deterrents to the popularity of viewing glasses are them being "ugly" and "heavy," with power consumption coming second. This means viewing glasses must achieve breakthroughs in fashion and lightweight design to expand sales.

On-site voting also revealed the industry's consensus on the bottlenecks of viewing glasses. In the vote on "the biggest bottleneck currently restricting the visual experience of viewing glasses," insufficient rendering algorithms and computing power ranked first at 35.71%, closely followed by battery power consumption restricting high frame rates at 32.14%. Optical module yield and cost accounted for 28.57%, while micro-display brightness and lifespan accounted for only 3.57%. This result is profound: the industry believes that hardware display technology itself is no longer the biggest shortcoming. The true bottleneck lies in the game between algorithm rendering capabilities and battery power consumption, as well as the engineering costs of optical modules.

Will There Be AI Glasses Companies Listed on the Hong Kong Stock Exchange in the Coming Years?
Regarding "How can domestic AI glasses brands achieve an IPO on the HKEX?", Liu Weijun, Investment Director of the Strategic Investment Department at Goertek Microelectronics, provided quantitative and qualitative analysis from an investment perspective. Qualitatively, he believes displays, photography, and personal assistants are the three core applications, but the prerequisite for capital market buy-in is a "single-point breakthrough": either a major breakthrough in voice assistants by a large manufacturer; or achieving the ultimate in displays (he mentioned a company whose product, using a microphone + display + cloud input + ring to achieve high-end business scenarios, is priced at 8,000-9,000 HKD for the whole set); or solving the paradox of getting women—the primary demographic for photography—to willingly wear AI glasses to take photos.

Quantitatively, global mobile phone shipments are 2 billion units annually, while smartwatches and TWS are around 200-300 million. Glasses last year (including Meta) were under 10 million units. To support a Hong Kong IPO, "one million units shipped annually is enough"—single product sales exceeding 600,000, with 200,000-300,000 for other secondary products, is sufficient to prove market buy-in.

Liu Weijun revealed that three companies (including AR/VR-related firms) have already applied for Hong Kong listings in the first half of this year. He cited cases like Garmin (sports niche, 1-2 million units shipped annually) and Shokz (bone conduction niche, commercialized for 6 years) to prove that capturing niche scenarios and achieving million-level volume will be recognized by capital markets.

On-site voting results highly aligned with Liu's judgment. In the vote on "Time expectations for domestic AI glasses brands to list on the HKEX," a massive 72% of guests chose "within 1-2
