The new AI Engineer (AIE) website is now live. Calls for Proposals (CFPs) close in two days, and the first New Engineer Orientation will take place this weekend. Tickets are expected to sell out, so early booking is recommended. Additionally, participants in the AI Engineering Survey will receive over $2,000 in credits and free tickets to the AIE World Fair.
A central tension exists in the AI agents industry: decacorn agent labs such as Sierra, Decagon, Notion, and Cursor are scaling rapidly. At the same time, building do-it-yourself (DIY) agents has never been easier, thanks to a growing number of agent frameworks, including LangGraph, Pydantic, and Flue, as well as managed agents from Anthropic, Gemini, and Amazon. A wave of companies, including Shopify, Stripe, Paradigm, and Razorpay, is building its own background agents. Notably, Cognition's partners at Ramp have built their own coding agent in collaboration with Modal.
Despite these developments, Cognition remains confident: its recently announced $1 billion Series D round was significantly oversubscribed. Walden Yan, Co-founder and Chief Product Officer of Cognition and the person who coined the term "context engineering," joined Cole Murray, creator of OpenInspect, to discuss why "Devin is in the Details."
In retrospect, investing in async agents was one of the most ambitious bets on artificial general intelligence (AGI) made in 2024. At that time, models were not yet good enough for effective "vibe coding," public trust in autonomous AI was limited, and no one, including Cognition in its early days, was certain about the optimal form factors.
Today, the evolution is clear:
The first wave of AI coding tools (e.g., Copilot and Cursor’s tab autocomplete) made developers faster but kept them heavily in the loop. The workflow remained centered on the developer’s local environment: an integrated development environment (IDE) where the developer watches the model, accepts or rejects changes, and pushes code one interaction at a time.
The second wave introduced local agents: Claude Code, Windsurf, and Cursor's agents pane. In this wave, developers ran a growing number of terminals concurrently.
The current Age of Async Agents points to a different future, focused on agent orchestration that drives end-to-end development.
According to previous guest Steve Yegge, there are eight finer-grained levels of agent adoption, but this discussion collapses them into three primary stages.
As Cursor’s Michael Truell stated in "The third era of AI software development": "Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work."
The agent should not sit solely inside the developer's flow. Instead, it should work in the background. Developers assign it a task and equip it with a repository, a machine, a shell, a browser, tests, memory, and review loops, so the agent can do the work elsewhere.
In less than a year, sentiment has shifted from avoiding multi-agent systems to embracing them. From coining "context engineering" to building the infrastructure behind Devin's 7x growth in pull requests (PRs) and its rise from 16% to 80% of commits across Cognition repositories, Walden Yan has witnessed the background-agent shift firsthand. In this episode, Yan joins swyx alongside Cole Murray, creator of OpenInspect, to discuss why organizations are building their own Devin-like systems, what changed after the December 2025 model inflection, and why "spec to pull request" is now a practical production workflow.
The discussion dives deep into the architecture of background agents: harness-in-the-box versus out-of-the-box configurations, why Devin separates the "brain" from the machine, why repository setup remains one of the hardest challenges, why Docker alone is insufficient, and how full virtual machines (VMs), snapshots, scoped secrets, GitHub bots, Slack integrations, and video-based testing all fit together. Yan and Murray also explore memory, MCP limitations, multi-agent orchestration, AI code review, SRE auto-triage, product managers shipping code from Slack, Windsurf 2.0, hybrid frontier/sub-frontier systems, and the real failure mode of uncontrolled vibe coding: a codebase that regresses to the level of its worst engineer.
Key topics discussed include:
Why the engineering world is adopting background agents and cloud agents
The December 2025 model inflection that made spec-to-PR workflows practical
Devin’s 7x growth in merged PRs and increase from 16% to 80% of commits
Why Cole built OpenInspect as an open-source background-agent system
The economics of $20/seat agent products and the challenges of monetization
What Cognition sells beyond Devin: infrastructure, onboarding, integrations, and adoption
Harness-in-the-box vs. out-of-the-box architecture and its significance
Why Devin separates the brain from the machine for security and permissions
Repository setup, scoped secrets, Docker Compose, and agent-ready dev environments
Why full VMs are necessary for agents to run and test real applications
Support for Android, macOS, Windows, nested virtualization, and machine-specific agent work
Why testing is more complex than "computer use"
Screenshots, video verification, and the "I know it works" merge moment
GitHub UX, Devin Review, AI reviewers, and agents responding to PR comments
Why MCP alone is insufficient for first-class Slack and enterprise integrations
Memory, knowledge, Skills, CLAUDE.md, and the unsolved challenge of retrieval
Devin’s auto-generated memories and the difficulty of memory pruning
Always-on agents as permanent product managers for issues, tickets, and product areas
Sub-agents, Meta-Devin management, and the real benefits of multi-agent systems
Why pure auto-merge vibe coding breaks down after approximately two weeks
AI code smells, lint rules, reward hacking, and using Semgrep for agent-written code
GitAI, inline context, and preserving the reasoning behind code changes
Local testing, mock servers, legacy codebases, and preparing companies for agents
Windsurf 2.0 and the handoff between local foreground agents and cloud background agents
SRE auto-triage, support workflows, and agents as first responders
PMs, marketers, and non-engineers creating pull requests from Slack
AI agent budgets ($5k per engineer) and hybrid frontier/sub-frontier systems
The rise of autonomous coding factories and who Cognition is hiring
Speakers and Links:
Walden Yan – X: https://x.com/walden_yan , LinkedIn: https://www.linkedin.com/in/waldenyan/
Cole Murray – X: https://x.com/_colemurray , LinkedIn: https://www.linkedin.com/in/colemurray/
OpenInspect / Background Agents: https://github.com/ColeMurray/background-agents
Timestamps:
00:00:00 Introduction
00:00:43 Why Everyone Is Building Their Own Devin
00:01:57 Devin’s 2025 Ramp: 7x PR Growth and 80% of Commits
00:03:49 OpenInspect and the Rise of Open-Source Background Agents
00:07:59 What Cognition Actually Sells Beyond Devin
00:09:56 Background Agent Architecture: Harness In vs. Out of the Box
00:12:08 Separating the Brain from the Machine
00:14:07 Repo Setup, Secrets, Docker, and Full VMs
00:19:13 Why Testing Is Harder Than Computer Use
00:22:40 Video Verification and the “I Know It Works” Merge Moment
00:23:19 GitHub UX, Devin Review, and AI Code Review
00:25:42 MCP, Slack, and Enterprise Agent Integrations
00:28:59 Memory, Knowledge, and Always-On Agents
00:36:16 Sub-Agents, Multi-Agent Orchestration, and Meta-Devin
00:43:55 Vibe Coding, Auto-Merge, and Codebase Decay
00:48:38 Agent Infra, VPCs, Cloud Providers, and Fast VM Restore
00:52:25 AI Code Smells, Reward Hacking, and Code Review Systems
00:56:10 Making Codebases Agent-Ready
00:58:30 Windsurf 2.0 and the Local-to-Cloud Agent Handoff
01:01:15 SRE Auto-Triage, PMs Shipping Code, and Agent Use Cases
01:04:32 Agent Budgets, Hybrid Models, and Autonomous Coding Factories
01:06:51 Hiring at Cognition and OpenInspect Consulting
01:07:45 Outro