Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, has officially unveiled a research preview of its groundbreaking "Interaction Models." Moving away from the traditional approach of stitching together speech and text via external tools, the new system natively processes real-time audio and video interactions.
The core innovation lies in its ability to continuously receive information through 200ms "micro-turns," enabling the AI to listen, watch, and speak simultaneously while allowing users to interrupt in real time. The debut model, TML-Interaction-Small, uses a Mixture-of-Experts (MoE) architecture with 276 billion total parameters, activating 12 billion per inference step.
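To make the micro-turn idea concrete, here is a minimal sketch of what a 200ms full-duplex loop might look like. Everything here is an assumption for illustration: the names (MicroTurn, DummyModel, interaction_loop) and the I/O callbacks are hypothetical, not Thinking Machines' actual API.

```python
import time
from dataclasses import dataclass, field

MICRO_TURN_MS = 200  # each micro-turn ingests ~200 ms of input

@dataclass
class MicroTurn:
    audio: bytes               # ~200 ms of microphone samples
    video_frame: bytes         # most recent camera frame
    t: float = field(default_factory=time.monotonic)

class DummyModel:
    """Hypothetical stand-in for the interaction model."""
    def step(self, turn: MicroTurn) -> bytes:
        # A real model would update its state from audio + video and
        # decide whether to speak, stay silent, or yield to the user.
        return b"" if not turn.audio else b"\x00" * len(turn.audio)

def interaction_loop(model, read_audio, read_frame, play_audio, steps=10):
    """Full-duplex loop: on every micro-turn the model both perceives
    and (optionally) speaks, so a user can barge in at any 200 ms boundary."""
    for _ in range(steps):
        start = time.monotonic()
        turn = MicroTurn(audio=read_audio(), video_frame=read_frame())
        out = model.step(turn)        # perceive and respond in one step
        if out:
            play_audio(out)
        # sleep off whatever remains of the 200 ms budget
        time.sleep(max(0.0, MICRO_TURN_MS / 1000 - (time.monotonic() - start)))

if __name__ == "__main__":
    interaction_loop(DummyModel(),
                     read_audio=lambda: b"\x01" * 6400,  # ~200 ms @ 16 kHz, 16-bit mono
                     read_frame=lambda: b"",             # fake camera frame
                     play_audio=lambda a: None)          # fake speaker
```

The property the sketch tries to capture is that perception and generation share the same clocked loop, so interruption handling is a feature of every step rather than a special "listening mode."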
To overcome the limitation of traditional large models, which stop perceiving the external world while generating responses, the development team has engineered a split front-end and back-end system. The front-end model is dedicated to maintaining uninterrupted dialogue, while the back-end model handles complex reasoning, web searches, or UI generation in parallel, seamlessly streaming results back into the conversation.
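This split resembles a familiar concurrency pattern: a lightweight foreground loop stays responsive while heavyweight work runs alongside it and streams partial results back. The sketch below illustrates that pattern only; the asyncio framing, the queue, and all names are assumptions, not TML's implementation.

```python
import asyncio

async def backend_task(query: str, results: asyncio.Queue):
    """Slow path: reasoning / web search / UI generation.
    Streams partial results instead of blocking the dialogue."""
    for i in range(3):
        await asyncio.sleep(1.0)                  # pretend to think or search
        await results.put(f"partial result {i} for {query!r}")
    await results.put(None)                       # sentinel: work is done

async def frontend_loop(results: asyncio.Queue):
    """Fast path: keeps the conversation alive every micro-turn,
    weaving in back-end results as they arrive."""
    while True:
        try:
            item = results.get_nowait()
        except asyncio.QueueEmpty:
            item = ""
        if item is None:
            print("[frontend] back-end finished; wrapping up the answer")
            return
        if item:
            print(f"[frontend] streaming into dialogue: {item}")
        else:
            print("[frontend] acknowledgement / small talk")  # stays responsive
        await asyncio.sleep(0.2)                  # 200 ms micro-turn cadence

async def main():
    results: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        backend_task("best flight to Tokyo", results),
        frontend_loop(results),
    )

asyncio.run(main())
```

The design point is that the front-end never awaits the back-end directly; it only polls a queue, so dialogue latency is decoupled from reasoning latency.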
This architecture delivers a significant leap in responsiveness, surpassing competing systems from Murati's former employer. Official data indicates a voice turn-taking latency of just 0.40 seconds. The model also scored 77.8 on FD-Bench v1.5, outperforming both GPT-Realtime-2.0 and Gemini 3.1 Flash Live on both core metrics.
Continuous processing of audio and video consumes context capacity quickly and relies heavily on network stability for low-latency performance. Even so, Thinking Machines plans to open a limited preview of the system in the coming months.