AI News
Real Time

The Age of Async Agents: Cognition’s Walden Yan & OpenInspect’s Cole Murray on Background AI Agents


The new AI Engineer (AIE) website is now live. Calls for Proposals (CFPs) close in two days, and the first New Engineer Orientation will take place this weekend. Tickets are expected to sell out, so early booking is recommended. Additionally, participants in the AI Engineering Survey will receive over $2,000 in credits and free tickets to the AIE World’s Fair.

A central tension exists in the AI agents industry: major decacorn agent labs such as Sierra, Decagon, Notion, and Cursor are scaling rapidly. At the same time, building do-it-yourself (DIY) agents has never been easier, thanks to a growing number of agent frameworks, including LangGraph, Pydantic, and Flue, as well as managed agents from Anthropic, Gemini, and Amazon. A wave of companies, including Shopify, Stripe, Paradigm, and Razorpay, are building their own background agents. Notably, Cognition’s partners at Ramp have built their own coding agent in collaboration with Modal.

Despite these developments, Cognition remains confident: its recently announced $1 billion Series D round was significantly oversubscribed. Walden Yan, Cognition’s co-founder and Chief Product Officer and the person who coined the term "context engineering," joined Cole Murray, creator of OpenInspect, to discuss why "Devin is in the Details."

In retrospect, investing in async agents was one of the most ambitious artificial general intelligence (AGI) bets of 2024. At that time, models were not yet capable enough for effective "vibe coding," public trust in autonomous AI was limited, and no one, including Cognition in its early days, was certain about the optimal form factors.

Today, the evolution is clear:

  • The first wave of AI coding tools (e.g., Copilot and Cursor’s tab autocomplete) made developers faster but kept them heavily in the loop. The workflow remained centered on the developer’s local environment: an integrated development environment (IDE) where the developer watches the model, accepts or rejects changes, and pushes code one interaction at a time.

  • The second wave introduced local agents: Claude Code, Windsurf, and Cursor’s agents pane. In this wave, developers ran more and more agent terminals concurrently.

  • The current Age of Async Agents points to a different future, focused on agent orchestration that drives end-to-end development.

According to previous guest Steve Yegge, there are eight finer-grained levels of agent adoption, but this discussion collapses them into three primary stages.

As Cursor’s Michael Truell stated in "The third era of AI software development": "Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work."

The agent should not sit solely inside the developer’s flow; instead, it should work in the background. Developers can give it a task, a repository, a machine, a shell, a browser, tests, memory, and review loops, and let it do the work elsewhere.

In less than a year, sentiment has shifted from avoiding multi-agent systems to embracing them. From coining "context engineering" to building the infrastructure behind Devin’s 7x growth in merged pull requests (PRs) and the rise in its share of commits across Cognition repositories from 16% to 80%, Walden Yan has witnessed the background-agent shift firsthand. In this episode, Yan joins swyx alongside Cole Murray, creator of OpenInspect, to discuss why organizations are building their own Devin-like systems, what changed after the December 2025 model inflection, and why "spec to pull request" is now a practical production workflow.

The discussion dives deep into the architecture of background agents: harness-in-the-box versus out-of-the-box configurations, why Devin separates the "brain" from the machine, why repository setup remains one of the hardest challenges, why Docker alone is insufficient, and how full virtual machines (VMs), snapshots, scoped secrets, GitHub bots, Slack integrations, and video-based testing all fit together. Yan and Murray also explore memory, MCP limitations, multi-agent orchestration, AI code review, SRE auto-triage, product managers shipping code from Slack, Windsurf 2.0, hybrid frontier/sub-frontier systems, and the real failure mode of uncontrolled vibe coding: a codebase regressing to the level of its worst engineer.

Key topics discussed include:

  • Why the engineering world is adopting background agents and cloud agents

  • The December 2025 model inflection that made spec-to-PR workflows practical

  • Devin’s 7x growth in merged PRs and its rise from 16% to 80% of commits

  • Why Cole built OpenInspect as an open-source background-agent system

  • The economics of $20/seat agent products and the challenges of monetization

  • What Cognition sells beyond Devin: infrastructure, onboarding, integrations, and adoption

  • Harness-in-the-box vs. out-of-the-box architecture and its significance

  • Why Devin separates the brain from the machine for security and permissions

  • Repository setup, scoped secrets, Docker Compose, and agent-ready dev environments

  • Why full VMs are necessary for agents to run and test real applications

  • Support for Android, macOS, Windows, nested virtualization, and machine-specific agent work

  • Why testing is more complex than "computer use"

  • Screenshots, video verification, and the "I know it works" merge moment

  • GitHub UX, Devin Review, AI reviewers, and agents responding to PR comments

  • Why MCP alone is insufficient for first-class Slack and enterprise integrations

  • Memory, knowledge, Skills, CLAUDE.md, and the unsolved challenge of retrieval

  • Devin’s auto-generated memories and the difficulty of memory pruning

  • Always-on agents as permanent product managers for issues, tickets, and product areas

  • Sub-agents, Meta-Devin management, and the real benefits of multi-agent systems

  • Why pure auto-merge vibe coding breaks down after approximately two weeks

  • AI code smells, lint rules, reward hacking, and using Semgrep for agent-written code

  • GitAI, inline context, and preserving the reasoning behind code changes

  • Local testing, mock servers, legacy codebases, and preparing companies for agents

  • Windsurf 2.0 and the handoff between local foreground agents and cloud background agents

  • SRE auto-triage, support workflows, and agents as first responders

  • PMs, marketers, and non-engineers creating pull requests from Slack

  • AI agent budgets ($1k–$5k per engineer) and hybrid frontier/sub-frontier systems

  • The rise of autonomous coding factories and who Cognition is hiring


Timestamps:
00:00:00 Introduction
00:00:43 Why Everyone Is Building Their Own Devin
00:01:57 Devin’s 2025 Ramp: 7x PR Growth and 80% of Commits
00:03:49 OpenInspect and the Rise of Open-Source Background Agents
00:07:59 What Cognition Actually Sells Beyond Devin
00:09:56 Background Agent Architecture: Harness In vs. Out of the Box
00:12:08 Separating the Brain from the Machine
00:14:07 Repo Setup, Secrets, Docker, and Full VMs
00:19:13 Why Testing Is Harder Than Computer Use
00:22:40 Video Verification and the “I Know It Works” Merge Moment
00:23:19 GitHub UX, Devin Review, and AI Code Review
00:25:42 MCP, Slack, and Enterprise Agent Integrations
00:28:59 Memory, Knowledge, and Always-On Agents
00:36:16 Sub-Agents, Multi-Agent Orchestration, and Meta-Devin
00:43:55 Vibe Coding, Auto-Merge, and Codebase Decay
00:48:38 Agent Infra, VPCs, Cloud Providers, and Fast VM Restore
00:52:25 AI Code Smells, Reward Hacking, and Code Review Systems
00:56:10 Making Codebases Agent-Ready
00:58:30 Windsurf 2.0 and the Local-to-Cloud Agent Handoff
01:01:15 SRE Auto-Triage, PMs Shipping Code, and Agent Use Cases
01:04:32 Agent Budgets, Hybrid Models, and Autonomous Coding Factories
01:06:51 Hiring at Cognition and OpenInspect Consulting
01:07:45 Outro
