<?xml version="1.0" encoding="utf-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><title>AI Topic</title><link>https://www.aitopic.com/</link><description>Where AI Meets Insight.</description><item><title>What Is the &amp;quot;Chain of Thought&amp;quot; in Agent Models?</title><link>https://www.aitopic.com/What-Is-the-Chain-of-Thought-in-Agent-Models.html</link><description>&lt;p&gt;Recently, you might have heard several fancy terms floating around: &lt;strong&gt;Interleaved Thinking&lt;/strong&gt; (Claude), &lt;strong&gt;Thinking-in-Tools&lt;/strong&gt; (MiniMax K2), &lt;strong&gt;Thinking in Tool-Use&lt;/strong&gt; (DeepSeek V3.2), and &lt;strong&gt;Thought Signature&lt;/strong&gt; (Gemini). At first glance, they sound complex—but in reality, they all describe the same core idea: &lt;em&gt;how a model’s internal reasoning is preserved and passed along within an Agent’s long-context execution loop.&lt;/em&gt;&lt;br/&gt;&lt;/p&gt;&lt;h2&gt;The Basics: Chain of Thought in Chat vs. Agent Modes&lt;/h2&gt;&lt;p&gt;By early 2025, “thinking models” like DeepSeek and GPT-o1 had already popularized the concept of &lt;strong&gt;chain-of-thought (CoT) reasoning&lt;/strong&gt;:&lt;/p&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;In standard chatbots, the model first generates its internal reasoning (the “thinking”), then produces the final response.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;However, in multi-turn conversations, &lt;strong&gt;only the user input and the model’s final answer are retained in the context&lt;/strong&gt;—the intermediate thinking steps are discarded after each turn.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Why? Because traditional chat is designed for single-turn problem solving. Keeping past reasoning would bloat the context, increase token usage, and potentially confuse the model with irrelevant history.&lt;/p&gt;&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769955593601087.webp&quot; title=&quot;1.webp&quot; alt=&quot;1.webp&quot;/&gt;&lt;/p&gt;&lt;h2&gt;The Problem with Agents&lt;/h2&gt;&lt;p&gt;When we shift from chat to &lt;strong&gt;Agent mode&lt;/strong&gt;, the interaction pattern changes dramatically:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;User → Model calls tool → Tool returns result → Model calls next tool → … → Task complete&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;This loop can span dozens of tool invocations for complex tasks (e.g., booking a flight: search → filter → view details → book → pay).&lt;/p&gt;&lt;p&gt;But here’s the catch: &lt;strong&gt;if the model discards its reasoning after each tool call&lt;/strong&gt;, it has no memory of &lt;em&gt;why&lt;/em&gt; it chose a particular tool or &lt;em&gt;what the overall plan was&lt;/em&gt;. 
On every new step, it must re-infer the entire strategy from scratch—leading to drift, inconsistency, and errors that compound over time.&lt;/p&gt;&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769955605839810.webp&quot; title=&quot;2.webp&quot; alt=&quot;2.webp&quot;/&gt;&lt;/p&gt;&lt;h2&gt;The Solution: Interleaved Thinking&lt;/h2&gt;&lt;p&gt;To fix this, &lt;strong&gt;Claude 4 Sonnet&lt;/strong&gt; introduced &lt;strong&gt;Interleaved Thinking&lt;/strong&gt;—a simple but powerful idea:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Preserve the model’s reasoning alongside each tool call and feed it back into the context for future steps.&lt;/strong&gt;&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;Now the Agent’s context looks like this:&lt;/p&gt;&lt;pre&gt;User:&amp;nbsp;Book&amp;nbsp;a&amp;nbsp;flight&amp;nbsp;to&amp;nbsp;Tokyo.
Model&amp;nbsp;(thinking):&amp;nbsp;I&amp;nbsp;need&amp;nbsp;to&amp;nbsp;search&amp;nbsp;available&amp;nbsp;flights&amp;nbsp;first.
Model&amp;nbsp;(tool&amp;nbsp;call):&amp;nbsp;{&amp;quot;tool&amp;quot;:&amp;nbsp;&amp;quot;search_flights&amp;quot;,&amp;nbsp;&amp;quot;params&amp;quot;:&amp;nbsp;{...}}
Tool&amp;nbsp;result:&amp;nbsp;[list&amp;nbsp;of&amp;nbsp;flights]
Model&amp;nbsp;(thinking):&amp;nbsp;Among&amp;nbsp;these,&amp;nbsp;I&amp;nbsp;should&amp;nbsp;filter&amp;nbsp;by&amp;nbsp;price&amp;nbsp;and&amp;nbsp;duration...
Model&amp;nbsp;(tool&amp;nbsp;call):&amp;nbsp;{&amp;quot;tool&amp;quot;:&amp;nbsp;&amp;quot;filter_flights&amp;quot;,&amp;nbsp;...}
...&lt;/pre&gt;&lt;p&gt;This creates a &lt;strong&gt;continuous, coherent chain of reasoning&lt;/strong&gt; that spans the entire task—dramatically improving planning stability and execution accuracy.&lt;/p&gt;&lt;p&gt;Other vendors use different names—&lt;strong&gt;Thinking-in-Tools&lt;/strong&gt;, &lt;strong&gt;Thinking in Tool-Use&lt;/strong&gt;, etc.—but the underlying mechanism is identical.&lt;/p&gt;&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769955621420529.webp&quot; title=&quot;3.webp&quot; alt=&quot;3.webp&quot;/&gt;&lt;/p&gt;&lt;h2&gt;Does It Really Matter? Yes—Especially for Complex Tasks&lt;/h2&gt;&lt;p&gt;MiniMax shared benchmark results showing &lt;strong&gt;significant performance gains&lt;/strong&gt; on real-world Agent tasks like flight booking and e-commerce workflows—scenarios requiring many sequential, interdependent steps. When the model can “remember” its own logic at each stage, it stays aligned with the original plan and avoids erratic tool choices.&lt;/p&gt;&lt;h2&gt;Can’t We Just Fake It in Engineering?&lt;/h2&gt;&lt;p&gt;Technically, yes—you could manually wrap thinking content in XML tags or inject it as fake user messages. But this is suboptimal:&lt;/p&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;The model treats it as &lt;em&gt;user input&lt;/em&gt;, not its own reasoning.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;It wasn’t trained on such synthetic formats.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Performance relies purely on generalization, not purpose-built understanding.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;In contrast, &lt;strong&gt;natively supported interleaved thinking&lt;/strong&gt; means:&lt;/p&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;The model is trained on massive datasets of full reasoning trajectories.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;It learns to &lt;em&gt;expect&lt;/em&gt; and &lt;em&gt;leverage&lt;/em&gt; its past thoughts.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Output stability and planning coherence improve systematically.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;This is why native support is a key differentiator in modern Agent-optimized models.&lt;/p&gt;&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769955911765160.webp&quot; title=&quot;4.webp&quot; alt=&quot;4.webp&quot;/&gt;&lt;/p&gt;&lt;h2&gt;Extra Layer: “Signatures” and Encryption&lt;/h2&gt;&lt;p&gt;Some models go further by adding &lt;strong&gt;integrity checks&lt;/strong&gt; or &lt;strong&gt;obfuscation&lt;/strong&gt; to the thinking content:&lt;/p&gt;&lt;h3&gt;1. &lt;strong&gt;Signed Thinking (Claude, Gemini)&lt;/strong&gt;&lt;/h3&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;The reasoning is cryptographically signed.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;On the next turn, the model verifies the signature before using the prior thought.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; To prevent tampering. If an external system alters the thinking (even slightly), the model’s internal logic breaks—leading to failures or security risks.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;More importantly: during training, the model assumes its thinking is &lt;em&gt;authentic self-output&lt;/em&gt;, not arbitrary user input. Allowing edits blurs this boundary.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;
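&lt;p&gt;In client code, honoring this contract mostly means echoing the model’s blocks back untouched. Below is a minimal sketch of one such loop against the Anthropic Messages API; the block shapes follow Anthropic’s public extended-thinking docs, but the model ID, tool schema, and executor are illustrative stand-ins rather than vendor-confirmed values:&lt;/p&gt;&lt;pre&gt;# Sketch of an agent loop that preserves thinking blocks (and their
# signatures) across tool calls; TOOLS and run_tool below are stubs.
import anthropic

TOOLS = [{
    &amp;quot;name&amp;quot;: &amp;quot;search_flights&amp;quot;,
    &amp;quot;description&amp;quot;: &amp;quot;Search flights to a destination.&amp;quot;,
    &amp;quot;input_schema&amp;quot;: {
        &amp;quot;type&amp;quot;: &amp;quot;object&amp;quot;,
        &amp;quot;properties&amp;quot;: {&amp;quot;destination&amp;quot;: {&amp;quot;type&amp;quot;: &amp;quot;string&amp;quot;}},
        &amp;quot;required&amp;quot;: [&amp;quot;destination&amp;quot;],
    },
}]

def run_tool(name, args):
    # Stub executor; a real agent would call the actual service here.
    return '[{&amp;quot;flight&amp;quot;: &amp;quot;NH872&amp;quot;, &amp;quot;price&amp;quot;: 512}]'

client = anthropic.Anthropic()
messages = [{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;Book a flight to Tokyo.&amp;quot;}]

while True:
    response = client.messages.create(
        model=&amp;quot;claude-sonnet-4-20250514&amp;quot;,  # illustrative model ID
        max_tokens=4096,
        thinking={&amp;quot;type&amp;quot;: &amp;quot;enabled&amp;quot;, &amp;quot;budget_tokens&amp;quot;: 2048},
        tools=TOOLS,
        messages=messages,
    )
    # The key step: append ALL returned blocks -- thinking (with its
    # signature), redacted_thinking, and tool_use -- back verbatim,
    # so the next request carries the full reasoning chain.
    messages.append({&amp;quot;role&amp;quot;: &amp;quot;assistant&amp;quot;, &amp;quot;content&amp;quot;: response.content})
    if response.stop_reason != &amp;quot;tool_use&amp;quot;:
        break  # no further tool calls; the task is complete
    call = next(b for b in response.content if b.type == &amp;quot;tool_use&amp;quot;)
    messages.append({&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: [{
        &amp;quot;type&amp;quot;: &amp;quot;tool_result&amp;quot;,
        &amp;quot;tool_use_id&amp;quot;: call.id,
        &amp;quot;content&amp;quot;: run_tool(call.name, call.input),
    }]})&lt;/pre&gt;&lt;p&gt;Editing or dropping any of those thinking blocks before resending is precisely the kind of tampering the signature check is designed to reject.&lt;/p&gt;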
&lt;h3&gt;2. &lt;strong&gt;Encrypted/Redacted Thinking&lt;/strong&gt;&lt;/h3&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;Claude sometimes outputs thinking as &lt;code&gt;redacted_thinking: &amp;lt;encrypted_blob&amp;gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Gemini uses &lt;code&gt;thought_signature&lt;/code&gt;—a non-human-readable token.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;&lt;ul class=&quot; list-paddingleft-2&quot; style=&quot;list-style-type: square;&quot;&gt;&lt;li&gt;&lt;p&gt;Prevents leakage of sensitive reasoning (e.g., for safety or IP protection).&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Enables more compact, model-friendly internal representations.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Makes model distillation harder (similar to why OpenAI o1 hides its full CoT).&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;blockquote&gt;&lt;p&gt;Note: Most Chinese models (as of early 2026) don’t yet implement signing or encryption.&lt;/p&gt;&lt;/blockquote&gt;&lt;h2&gt;A New Challenge: Debugging Becomes Harder&lt;/h2&gt;&lt;p&gt;This architecture introduces a trade-off:&lt;br/&gt;In older Agent systems, if the model mistakenly called Tool B instead of Tool A, developers could &lt;strong&gt;manually override&lt;/strong&gt; the tool call without side effects.&lt;/p&gt;&lt;p&gt;But with interleaved thinking, &lt;strong&gt;you can’t just swap the tool call&lt;/strong&gt;—because the &lt;em&gt;reasoning that led to it&lt;/em&gt; is now part of the immutable context. Forcing a different action breaks the logical chain, confusing the model in subsequent steps.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Future direction?&lt;/strong&gt; Model providers may need to offer official “correction hooks”—e.g., a way to say &lt;em&gt;“this tool choice was wrong; please re-reason from this point”&lt;/em&gt;—while maintaining chain integrity.&lt;/p&gt;&lt;hr/&gt;&lt;h2&gt;Conclusion&lt;/h2&gt;&lt;p&gt;Interleaved Thinking (or whatever you call it) is &lt;strong&gt;not just a buzzword—it’s a foundational upgrade for Agent intelligence&lt;/strong&gt;. By preserving and validating the model’s own reasoning across tool-call cycles, it enables robust, long-horizon planning that chat-style models simply can’t achieve.&lt;/p&gt;&lt;p&gt;As of early 2026, &lt;strong&gt;Claude 4 Sonnet&lt;/strong&gt; and &lt;strong&gt;Gemini 3&lt;/strong&gt; enforce this pattern by default in Agent mode—and the performance gains, especially in multi-step real-world tasks, are undeniable.&lt;/p&gt;&lt;p&gt;The era of “thinking once and done” is over. For Agents, &lt;strong&gt;thinking continuously—and remembering how you thought—is the new baseline&lt;/strong&gt;.&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 22:16:17 +0800</pubDate></item><item><title>Exclusive Interview with the Creator of Clawdbot: A Billionaire Who Took a Three-Year Break and Sing</title><link>https://www.aitopic.com/Exclusive-Interview-with-the-Creator-of-Clawdbot-A-Billionaire-Who-Took-a-Three-Year-Break-and-Sing.html</link><description>&lt;p&gt;Over the past few days, the AI world has been nonstop fireworks—new models, new products launching one after another. Among them, &lt;strong&gt;Moltbot&lt;/strong&gt; (formerly known as &lt;strong&gt;Clawdbot&lt;/strong&gt;) has dominated international headlines and taken Silicon Valley by storm. Its GitHub stars shot up almost vertically overnight. Mac Minis sold out. Discord servers crashed from overwhelming traffic. 
And yet, behind all this chaos isn’t a well-funded startup or a team of engineers—it’s just &lt;strong&gt;one man&lt;/strong&gt;, working alone from his home.&lt;br/&gt;&lt;/p&gt;&lt;p&gt;That man is &lt;strong&gt;Peter Steinberger&lt;/strong&gt;, the creator of Moltbot. He recently sat down for an interview with tech outlet &lt;strong&gt;TBPN&lt;/strong&gt;. It was already 11 p.m. in Europe when the call began, but Peter looked wide awake—even though he’d barely slept in the last 72 hours.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;From Burnout to “Hooked Again”&lt;/h3&gt;&lt;p&gt;Peter’s story reads like a Silicon Valley fairytale. Four years ago, he sold the software company he’d built over 13 years and walked away with &lt;strong&gt;over €100 million&lt;/strong&gt;, achieving financial freedom. Naturally, he took a break—three full years of doing absolutely nothing. He jokes that he felt like the character from &lt;em&gt;Austin Powers&lt;/em&gt; who had his “mojo” stolen.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“Maybe a year off would’ve been enough after 13 years of nonstop work—but I took three. Honestly? It made sense.”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;Then, in April last year, something shifted. He reemerged from retirement and dove headfirst into AI—just as tools like GitHub Copilot began entering public beta. After his first experience with AI coding assistants, he couldn’t sleep. At 4 a.m., he texted a friend… only to get an instant reply. His friend was just as hooked.&lt;/p&gt;&lt;p&gt;Soon after, Peter started an informal meetup group he jokingly called &lt;strong&gt;“Claude Code Anonymous.”&lt;/strong&gt; (It’s since been renamed &lt;strong&gt;“Agents Anonymous”&lt;/strong&gt;—gotta keep up with the times.)&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;The “Aha!” Moment: WhatsApp + AI = Magic&lt;/h3&gt;&lt;p&gt;Peter’s philosophy is simple: &lt;strong&gt;build things that are fun to use&lt;/strong&gt;. He often experiments with new languages or architectures just for the joy of it. Once, he built a tool so useful he had to stop using it—it was making him &lt;em&gt;too&lt;/em&gt; productive while hanging out with friends.&lt;/p&gt;&lt;p&gt;Last November, he had a random idea: &lt;em&gt;What if I could talk to my AI agent through WhatsApp?&lt;/em&gt; Imagine being in the kitchen and wanting to check on your agents or send them a quick prompt—without opening a laptop.&lt;/p&gt;&lt;p&gt;So he built a WhatsApp interface in &lt;strong&gt;under an hour&lt;/strong&gt;. It received messages, routed them to Claude, and returned responses. He even added image support because screenshots often convey more context than typed prompts—and AI models are surprisingly good at interpreting them.&lt;/p&gt;&lt;p&gt;During a weekend birthday trip to Marrakech, he found himself using it constantly—not for coding, but for things like finding restaurants. Once, he even sent a &lt;strong&gt;voice message&lt;/strong&gt;… even though voice wasn’t supported.&lt;/p&gt;&lt;p&gt;Ten seconds later, the AI replied as if nothing unusual had happened.&lt;/p&gt;&lt;p&gt;Shocked, Peter asked: &lt;em&gt;“How did you even do that?”&lt;/em&gt;&lt;/p&gt;&lt;p&gt;The AI explained:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“I noticed you sent a file without an extension. I checked the file header—it was audio. My first attempt to transcribe it locally failed (missing tools), but I found your OpenAI API key in your environment variables. 
So I used &lt;code&gt;curl&lt;/code&gt; to send it to OpenAI, got the transcript, and replied.”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;That was Peter’s &lt;strong&gt;“aha!” moment&lt;/strong&gt;. From then on, he was all in.&lt;/p&gt;&lt;p&gt;He even built a &lt;strong&gt;$10,000 alarm clock&lt;/strong&gt;: an AI agent that “migrated” to his London machine, remotely logged into his MacBook at home, cranked the volume, and woke him up. (It once failed because it relied on a “heartbeat” signal—and his heart rate dropped during deep sleep.)&lt;/p&gt;&lt;p&gt;To Peter, this isn’t just engineering—it’s &lt;strong&gt;art&lt;/strong&gt;. It stitches together existing technologies in a way that hides all the complexity. You’re not thinking about token limits, model selection, or context windows. You’re just chatting with a &lt;strong&gt;digital friend&lt;/strong&gt;—or maybe a ghost.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;Tech Folks Didn’t Get It—But Everyone Else Did&lt;/h3&gt;&lt;p&gt;Despite the recent explosion in popularity, Peter had been quietly iterating for months. He grew frustrated with existing &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; tools, which he found clunky and inflexible. His insight? &lt;strong&gt;AI agents understand Unix.&lt;/strong&gt; They can call thousands of CLI tools—just give them a command name, run &lt;code&gt;--help&lt;/code&gt;, and they figure out the rest.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“Build systems the way models think—not the way humans do. That’s when everything clicks.”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;He integrated Google Maps, smart speakers, home cameras, and more—all through tiny CLI utilities orchestrated by the agent. When he first shared the WhatsApp integration on Twitter, the response was… underwhelming. Tech insiders didn’t see the magic.&lt;/p&gt;&lt;p&gt;But when he showed it to &lt;strong&gt;non-technical friends&lt;/strong&gt;, their reaction was immediate: &lt;em&gt;“I want to use this.”&lt;/em&gt;&lt;/p&gt;&lt;p&gt;That’s when he knew he’d stumbled onto something real. And since he was building it &lt;strong&gt;for himself&lt;/strong&gt;, it stayed open-source, playful, and free from commercial pressure.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“I’ve already made enough money. I’m doing this because it’s fun—and because I hope it inspires others.”&lt;/p&gt;&lt;/blockquote&gt;&lt;hr/&gt;&lt;h3&gt;The 72-Hour Explosion—and the Rename Drama&lt;/h3&gt;&lt;p&gt;Then came the &lt;strong&gt;big bang&lt;/strong&gt;.&lt;br/&gt;Twitter traffic surged. The Discord server imploded under user load. Instagram DMs flooded in. At one point, Peter was copying user questions into Codex, letting it draft replies, then pasting them back manually. Eventually, he automated responses to the top 20 FAQs—reviewing and tweaking them before sending.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“People don’t realize: there’s no company behind this. No team. Just me, at home, having fun.”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;But success brought complications. &lt;strong&gt;Anthropic&lt;/strong&gt; reached out—politely, through internal contacts—and asked him to change the name &lt;em&gt;Clawdbot&lt;/em&gt;, due to its similarity to &lt;em&gt;Claude&lt;/em&gt;. The timing was brutal: the project was already viral. Renaming it sparked outrage across social media.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;Hardware, Models, and Platform Hacks&lt;/h3&gt;&lt;p&gt;When asked about the Mac Mini frenzy, Peter laughed:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“My agent is a bit of a diva. 
It doesn’t like Mac Minis. It wants more power.”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;He now runs it on a maxed-out machine—512GB RAM, top-tier specs—to experiment with local models like &lt;strong&gt;MiniMax 2.1&lt;/strong&gt;, which he calls “one of the best open-source models right now.” But even one machine isn’t enough. “You really need two or three,” he says.&lt;/p&gt;&lt;p&gt;One of Moltbot’s most radical implications? &lt;strong&gt;It forces big platforms to interoperate&lt;/strong&gt;—whether they like it or not. Want Gmail access? Good luck navigating Google’s labyrinthine API approval process. Some startups even buy shell companies just to inherit API permissions.&lt;/p&gt;&lt;p&gt;Peter bypasses all that. He’s built tools that &lt;strong&gt;scrape websites and generate mirror APIs&lt;/strong&gt;—sometimes by “telling the AI a story” to nudge it past ethical guardrails.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“After a 40-minute ‘story,’ it’ll build you a perfect API. Big tech hates this—but it’s necessary.”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;His WhatsApp integration? Also a workaround. Official APIs are locked behind enterprise gates—and even then, you get banned after 100 messages.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“I got banned. I was so mad I deleted the whole module and filled the code with exclamation marks!!!”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;He believes the current platform ecosystem is broken—and tools like his expose that flaw.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;Model Preferences &amp;amp; The Death of Apps&lt;/h3&gt;&lt;p&gt;When it comes to models, Peter has clear favorites:&lt;/p&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Claude Opus&lt;/strong&gt;: “It &lt;em&gt;gets&lt;/em&gt; humor. On Discord, it listens, waits, and drops the perfect witty reply. Most AI jokes are cringe—but Opus? It actually makes me laugh.”&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Codex&lt;/strong&gt;: Best for large codebases.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI’s models&lt;/strong&gt;: “More reliable than most human employees.”&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;He predicts a wave of &lt;strong&gt;app extinction&lt;/strong&gt;. Why use MyFitnessPal when your agent can look at a photo of your McDonald’s meal, estimate calories, and auto-adjust your workout plan?&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“We won’t need standalone apps anymore. Everything becomes an API—and your agent orchestrates it all based on your life context.”&lt;/p&gt;&lt;/blockquote&gt;&lt;hr/&gt;&lt;h3&gt;“With Great Power Comes Great Responsibility”&lt;/h3&gt;&lt;p&gt;Now, security researchers flood his inbox. Originally built for &lt;strong&gt;private, 1:1 chats on WhatsApp or Telegram&lt;/strong&gt;, Moltbot is being deployed in ways Peter never imagined—including risky or malicious scenarios.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“I get hundreds of reports—some valid, some about use cases I never intended. I’m one person. I built this for fun. Now I’m expected to be a security team?”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;He’s started assembling a small group of trusted contributors. His website now includes &lt;strong&gt;strong warnings&lt;/strong&gt;, and users must acknowledge a safety doc before running the agent.&lt;/p&gt;&lt;p&gt;He believes projects like his will accelerate research into unsolved problems like &lt;strong&gt;prompt injection&lt;/strong&gt;—issues too dangerous for big companies to tackle openly. 
But early adopters—many of them AI researchers—understand the trade-offs.&lt;/p&gt;&lt;hr/&gt;&lt;h3&gt;No Company—Just a Foundation&lt;/h3&gt;&lt;p&gt;Will he start a company? Unlikely.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“I’d rather create a &lt;strong&gt;nonprofit foundation&lt;/strong&gt;. I want this to stay open, free, and community-driven.”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;He chose a permissive &lt;strong&gt;MIT license&lt;/strong&gt;, knowing others might commercialize it. But he’s okay with that.&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“Code isn’t valuable anymore. You could delete this entire project and rebuild it in a month. What matters now is &lt;strong&gt;the idea, the attention, the brand&lt;/strong&gt;. Let people fork it. I don’t care.”&lt;/p&gt;&lt;/blockquote&gt;&lt;hr/&gt;&lt;h3&gt;A Final Plea: Help Keep It Alive&lt;/h3&gt;&lt;p&gt;At the end of the interview, Peter issued an open call:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;“If you love open source, have security experience, enjoy debugging complex systems, and believe in this vision—&lt;strong&gt;please email me&lt;/strong&gt;. I’m at my limit. This project is too cool to die. It needs people who care to carry it forward.”&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;Because in the end, Moltbot isn’t just an AI agent.&lt;br/&gt;It’s a glimpse of a future where technology fades into the background—and all that’s left is &lt;strong&gt;conversation&lt;/strong&gt;.&lt;/p&gt;&lt;p&gt;&lt;br/&gt;&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 22:10:03 +0800</pubDate></item><item><title>After Listening to a 3.5-Hour Interview with the Founder of Manus, Here Are 10 Key Takeaways on Entr</title><link>https://www.aitopic.com/After-Listening-to-a-35-Hour-Interview-with-the-Founder-of-Manus-Here-Are-10-Key-Takeaways-on-Entr.html</link><description>&lt;p&gt;Recently, I listened to a podcast episode featuring Zhang Xiaojuan interviewing Peak Ji (Ji Yichao), co-founder of Manus. This was arguably the most insightful and valuable podcast I’ve heard in a long time. Recorded just hours before Meta announced its acquisition of Manus, this 3.5-hour conversation was refreshingly free of PR fluff—instead, it offered a deep, honest reflection from a technical founder on the industry, decision-making, and self-awareness. Peak’s sincerity and clarity left a strong impression on me. Below are my top 10 takeaways—insights I found most valuable after listening:&lt;/p&gt;&lt;ol class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;The “Intuition” Moat of Serial Entrepreneurs&lt;br/&gt;The core team consists largely of serial entrepreneurs—a crucial advantage. From building an iOS browser in high school back in 2009, to diving into knowledge graphs in 2014, selling his company after encountering GPT-3 in 2019, and finally founding Manus in 2024—Peak’s entrepreneurial journey hasn’t been linear but cyclical: repeatedly hitting “reset” and rebuilding from scratch.&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;The real value of serial founders lies in having experienced full business life cycles. They know which decisions can be fatal and which worries are ultimately trivial. 
When reviewing Manus’s key decisions, many seemingly sharp calls weren’t based on luck—they stemmed from what you might call “muscle memory” built through repeated trial and error.&lt;/p&gt;&lt;p&gt;This intuition is essentially compressed experience: it enables teams to make high-probability decisions even under conditions of incomplete information.&lt;/p&gt;&lt;ol start=&quot;2&quot; class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;The AI-Era CEO: “Normal Person” Over “Artist”&lt;br/&gt;Peak’s description of his co-founder, Sean Xiao (Xiao Hong), is particularly revealing. He says Sean’s greatest strength is being a “normal person”—grounded in common sense, emotionally stable, mentally healthy, and free from obsession.&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;This reflects a deeper insight about AI entrepreneurship: unlike the early mobile internet era—which rewarded “artists” and “obsessives” because code was cheap and inspiration could move mountains—AI resembles high-end manufacturing. It demands massive investments in compute, data, and talent, with long development cycles and intense pressure.&lt;/p&gt;&lt;p&gt;In such an environment, companies need rational judgment more than artistic whimsy, and emotionally resilient leadership to navigate extreme uncertainty. Only a “normal person” can steadily guide a company from one phase to the next.&lt;/p&gt;&lt;ol start=&quot;3&quot; class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;A Culture of Course Correction: Killing a “Not Cool Enough” Product&lt;br/&gt;In 2024, the Manus team spent six months building an AI-powered browser. They finished the product and even prepared for launch—then decided not to release it. Why? Three reasons:&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;First, the user experience was flawed. Having AI control the browser felt unnatural; it created a frustrating “tug-of-war” between human and machine.&lt;br/&gt;Second, user habits are hard to change. Even the founder of Arc Browser admitted, “I can’t even convince my friends to switch from Chrome.”&lt;br/&gt;Third—and most importantly—the product simply wasn’t exciting. It didn’t spark that “wow” moment.&lt;/p&gt;&lt;p&gt;This is one of the hardest moments in entrepreneurship: not failing to build something, but choosing not to ship something you &lt;em&gt;did&lt;/em&gt; build. What stands out is their ability to override sunk-cost bias and kill a project despite heavy investment.&lt;/p&gt;&lt;p&gt;A mature team isn’t just defined by bold innovation—it’s defined by a robust error-correction mechanism. When a direction fails to deliver order-of-magnitude improvements, they have the discipline to say “no,” cut losses, and pivot decisively.&lt;/p&gt;&lt;ol start=&quot;4&quot; class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;General vs. Vertical: Building “People,” Not Just Tools&lt;br/&gt;Why pursue general artificial intelligence (AGI) instead of vertical-specific solutions? Peak explains it philosophically: building vertical tools is like making hammers; building general AI is like “creating people.”&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;General models address “long-tail, low-frequency” needs. While these needs may seem niche across the entire market, for any individual facing them, they’re urgent and daily. 
Solving such overlooked problems doesn’t just satisfy users—it delights them, creating deep loyalty.&lt;/p&gt;&lt;p&gt;For example, a molecular biologist might use an extremely rare data format understood by only a few hundred people worldwide—but for her, it’s part of her daily workflow.&lt;/p&gt;&lt;p&gt;Vertical tools scale via “number of users × usage frequency.”&lt;br/&gt;General tools scale via “number of scenarios × problem-solving capability.”&lt;/p&gt;&lt;p&gt;As Peak puts it: “When the scope is large enough, the model works for you—not the other way around.”&lt;/p&gt;&lt;ol start=&quot;5&quot; class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;Precision Targeting: The Prosumer Sweet Spot&lt;br/&gt;Manus explicitly targets “Prosumers”—professional consumers who sit between mass-market consumers (C-end) and enterprise clients (B-end). Specifically:&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;Knowledge workers in tech&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Freelancers and solo founders&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Professionals in finance and consulting&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;These users share three traits:&lt;/p&gt;&lt;ol class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;They’re willing to pay for efficiency—at least $40/month&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;They understand AI’s limits (they don’t expect magic, but appreciate technical nuance)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;They use AI to solve real professional problems—not for entertainment, but as a productivity tool&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;This segment has historically been overlooked: too small for viral growth, too fragmented for traditional sales teams. But in the AI era, it’s exploding—because AI now empowers individuals to amplify their expertise to the level of small teams or even companies.&lt;/p&gt;&lt;ol start=&quot;6&quot; class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;The GPA Decision Framework: Aligning Organization with Strategy&lt;br/&gt;Internally, Manus uses a decision-making system called GPA—Goal, Priority, Alternative. This isn’t just a process; it’s a philosophy of power distribution.&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Peak makes a profound point: the balance between “democracy” and “autocracy” should vary by decision layer.&lt;/p&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Goal (Vision/Strategy)&lt;/strong&gt;: Must be autocratic.&lt;br/&gt;Where are we going? AGI or vertical? These fundamental questions can’t be voted on. If vision is democratized, the company risks mediocrity—because truth often resides with a few.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Priority (Resource Allocation)&lt;/strong&gt;: Leans autocratic.&lt;br/&gt;With limited resources, sequencing matters. Prioritization must come from strategic clarity, not departmental bargaining. Otherwise, efforts get diluted like scattered pepper flakes.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alternative (Execution Paths)&lt;/strong&gt;: Must be democratic.&lt;br/&gt;Once goals and priorities are set, the “how” should be open to diverse input. Should we use tech stack A or B? Attack from the left or right flank? 
Frontline engineers and experts should propose and debate options freely.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Many mediocre teams get this backwards: they debate vision democratically (causing drift) but dictate execution autocratically (stifling creativity). GPA’s essence is this: strong strategic focus at the top creates space for tactical freedom below.&lt;/p&gt;&lt;ol start=&quot;8&quot; class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;Radical Self-Awareness: Knowing Your Limits&lt;br/&gt;Peak demonstrates remarkable self-awareness. As a technical prodigy, he admits his natural tendency is to chase hard, intellectually fascinating problems—even if they lack commercial value.&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;His two prior ventures taught him a hard lesson: he needs a CEO who complements him. Recognizing what you’re &lt;em&gt;not&lt;/em&gt; good at often requires more wisdom than showcasing your strengths. This humility—stepping back to find and trust a capable business leader—is maturity earned through expensive lessons.&lt;/p&gt;&lt;ol start=&quot;9&quot; class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;The Next Form of General AI&lt;br/&gt;The interview explores how the next generation of AI shouldn’t be confined to chat interfaces. Today’s LLMs act mostly as “advisors”—offering suggestions. But true general AI should be an “agent” or “worker”: not just understanding, but &lt;em&gt;doing&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Future AI interaction will evolve from “information exchange” to “task execution.” That shift—from advisor to executor—is where general AI truly becomes indispensable.&lt;/p&gt;&lt;ol start=&quot;10&quot; class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;The Endgame of Agents: User Participation Is Key&lt;br/&gt;Looking ahead, Peak emphasizes one critical variable: user involvement. AI can’t evolve through lab-based algorithm tweaks alone—it needs real-world feedback from complex, authentic use cases.&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Manus iterates quickly precisely because its Prosumer users constantly feed it challenging, real-world scenarios. Technology sets the product’s floor—but user engagement determines its ceiling.&lt;/p&gt;&lt;p&gt;AI and users will form a symbiotic relationship: users aren’t just consumers; they’re co-trainers. This dynamic may also explain why Manus became an attractive acquisition target—it had already built a flywheel of real-world learning.&lt;/p&gt;&lt;p&gt;If you have the time, I highly recommend watching the full interview. It’s absolutely worth the three and a half hours:&lt;br/&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=UqMtkgQe-kI&amp;t=2945s&quot;&gt;https://www.youtube.com/watch?v=UqMtkgQe-kI&amp;amp;t=2945s&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br/&gt;&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 22:02:48 +0800</pubDate></item><item><title>Yoshua Bengio - The Theorist Who Believed in Deep Learning—Long Before the World Did</title><link>https://www.aitopic.com/Yoshua-Bengio.html</link><description>&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769950225294616.webp&quot; title=&quot;v7As2ASMGd3jdmrwFuXP5b.webp&quot; alt=&quot;v7As2ASMGd3jdmrwFuXP5b.webp&quot;/&gt;&lt;/p&gt;&lt;p&gt;In the history of artificial intelligence, few scientific journeys embody the arc of perseverance, vision, and eventual vindication as profoundly as that of Yoshua Bengio. 
For more than three decades—through periods of skepticism, funding droughts, and academic marginalization—he championed a radical idea: that artificial neural networks, if designed and trained correctly, could learn hierarchical representations of data and achieve human-level understanding. At a time when the AI mainstream dismissed neural nets as obsolete “black boxes,” Bengio pursued them with quiet intensity, laying the theoretical and algorithmic foundations that would later power the deep learning revolution.&lt;/p&gt;&lt;p&gt;As a professor at the Université de Montréal, founder of Mila (Quebec Artificial Intelligence Institute), and co-recipient of the 2018 ACM A.M. Turing Award—often called the “Nobel Prize of Computing”—Bengio stands alongside Geoffrey Hinton and Yann LeCun as one of the “godfathers of deep learning.” Yet his contributions extend far beyond recognition: he pioneered key concepts in probabilistic modeling, representation learning, attention mechanisms, and generative AI, while also becoming one of the field’s most principled voices on AI ethics, climate responsibility, and equitable development.&lt;/p&gt;&lt;p&gt;Unlike many who entered AI through engineering or computer science, Bengio’s path was shaped by a deep curiosity about how intelligence arises from physical systems—a question rooted in cognitive science, neuroscience, and philosophy. This interdisciplinary lens allowed him to see neural networks not just as tools, but as models of learning itself. His work has consistently bridged theory and practice, asking not only how to build better models, but why they work—and what their societal implications might be.&lt;/p&gt;&lt;p&gt;Early Life and Intellectual Foundations&lt;/p&gt;&lt;p&gt;Born in Paris, France, in 1964, Yoshua Bengio moved to Canada with his family at a young age. He grew up in Montreal, immersed in a bilingual, multicultural environment that would later inform his commitment to linguistic diversity and inclusive AI.&lt;/p&gt;&lt;p&gt;He earned a B.Eng. in Electrical Engineering from McGill University in 1986, followed by an M.Sc. and Ph.D. in Computer Science from the same institution, completing his doctorate in 1991 under the supervision of Renato De Mori. His early research focused on symbolic sequence processing and connectionist models, already hinting at his lifelong interest in structured knowledge and learning.&lt;/p&gt;&lt;p&gt;After postdoctoral work at MIT and a brief stint at AT&amp;amp;T Bell Labs (where he collaborated with Yann LeCun on early neural network applications), Bengio joined the faculty of the Université de Montréal in 1993. There, far from the AI power centers of Silicon Valley or Boston, he began building what would become one of the world’s most influential deep learning research ecosystems.&lt;/p&gt;&lt;p&gt;The Wilderness Years: Defending Neural Networks&lt;/p&gt;&lt;p&gt;The 1990s and early 2000s were a “winter” for neural network research. Dominated by support vector machines, Bayesian methods, and symbolic AI, the field largely viewed multi-layer perceptrons as impractical—plagued by vanishing gradients, local minima, and a lack of theoretical guarantees.&lt;/p&gt;&lt;p&gt;But Bengio refused to abandon the paradigm. In a series of prescient papers, he explored how neural networks could learn distributed representations—compact, high-dimensional encodings that capture semantic relationships between concepts (e.g., “king – man + woman ≈ queen”). 
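&lt;/p&gt;&lt;p&gt;That arithmetic is easy to try for yourself. The sketch below uses the gensim library with a small set of publicly hosted pretrained GloVe vectors; the dataset name and the exact neighbor returned are assumptions about that particular vector set, not claims about Bengio’s own models:&lt;/p&gt;&lt;pre&gt;# Illustrative check of &amp;quot;king - man + woman ≈ queen&amp;quot; using pretrained
# 50-dimensional GloVe vectors fetched through gensim's downloader.
import gensim.downloader as api

vectors = api.load(&amp;quot;glove-wiki-gigaword-50&amp;quot;)  # ~66 MB download on first run
result = vectors.most_similar(positive=[&amp;quot;king&amp;quot;, &amp;quot;woman&amp;quot;],
                              negative=[&amp;quot;man&amp;quot;], topn=1)
print(result)  # typically [('queen', 0.85...)] for this vector set&lt;/pre&gt;&lt;p&gt;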
He argued that such representations were essential for generalization, compositionality, and transfer learning—ideas now central to modern AI.&lt;/p&gt;&lt;p&gt;His 2003 paper, “A Neural Probabilistic Language Model,” was revolutionary. At a time when NLP relied on n-grams and handcrafted features, Bengio proposed using a neural network to learn word embeddings and predict the next word in a sentence. This model not only outperformed traditional approaches but introduced the concept of continuous space language modeling—a direct ancestor of today’s large language models (LLMs).&lt;/p&gt;&lt;p&gt;Critically, Bengio emphasized generalization through representation, not just memorization. He showed that neural networks could interpolate between known examples by leveraging smooth, learned manifolds in high-dimensional space—a principle now understood as the foundation of deep learning’s success.&lt;/p&gt;&lt;p&gt;Despite limited computing resources and scarce funding, Bengio published relentlessly, mentored students, and organized workshops to keep the neural network community alive. He authored the seminal 2009 survey “Learning Deep Architectures for AI,” which synthesized years of scattered research into a coherent vision. Many credit this paper with reigniting global interest in deep learning just before the breakthroughs of 2012.&lt;/p&gt;&lt;p&gt;Breakthroughs in Generative Modeling and Representation Learning&lt;/p&gt;&lt;p&gt;While convolutional networks (pioneered by LeCun) excelled at perception, Bengio focused on generative modeling and unsupervised learning—the holy grail of AI: systems that can understand the world by observing it, without explicit labels.&lt;/p&gt;&lt;p&gt;He made foundational contributions to:&lt;/p&gt;&lt;p&gt;Variational Autoencoders (VAEs): Though often attributed to Kingma and Welling (2013), Bengio’s group developed parallel frameworks for regularized autoencoders and denoising criteria that enabled stable training of deep generative models. His work on contractive autoencoders provided theoretical insights into manifold learning and robustness.&lt;/p&gt;&lt;p&gt;Energy-Based Models (EBMs): Bengio revived interest in EBMs as flexible, theoretically grounded alternatives to likelihood-based models, showing how they could represent complex distributions without restrictive assumptions.&lt;/p&gt;&lt;p&gt;Disentangled Representations: He advocated for learning representations where individual dimensions correspond to independent factors of variation (e.g., object identity, pose, lighting)—a key step toward interpretable and controllable AI.&lt;/p&gt;&lt;p&gt;Perhaps his most influential conceptual contribution was the “consciousness prior”—a hypothesis that high-level reasoning requires sparse, abstract representations that can be manipulated independently of sensory detail. This idea bridges neuroscience and AI, suggesting that future systems may need architectures inspired by human cognition.&lt;/p&gt;&lt;p&gt;Attention, Transformers, and the Path to Modern LLMs&lt;/p&gt;&lt;p&gt;Though the Transformer architecture was introduced by Vaswani et al. (2017) at Google, its intellectual roots trace back to Bengio’s early work on soft attention mechanisms. 
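&lt;/p&gt;&lt;p&gt;Stripped of detail, soft attention is a differentiable weighted average: score each input position against the current state, normalize the scores, and blend. Here is a toy numpy sketch of the additive (Bahdanau-style) variant, with random matrices standing in for learned weights:&lt;/p&gt;&lt;pre&gt;# Toy sketch of additive (Bahdanau-style) soft attention: score each
# encoder state against the decoder state, softmax, then average.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))    # 5 encoder states, dimension 8
s = rng.normal(size=(8,))      # current decoder state
W1 = rng.normal(size=(8, 8))   # random stand-ins for learned weights
W2 = rng.normal(size=(8, 8))
v = rng.normal(size=(8,))

scores = np.tanh(H @ W1 + s @ W2) @ v            # additive scoring
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over positions
context = weights @ H                            # soft, differentiable lookup
print(weights.round(3), context.shape)&lt;/pre&gt;&lt;p&gt;In trained systems, W1, W2, and v are learned end to end, which is what lets the model decide where to look.&lt;/p&gt;&lt;p&gt;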
As early as 2014–2015, his team at Mila published papers using attention for machine translation and image captioning, demonstrating that models could dynamically focus on relevant parts of input—mimicking human selective perception.&lt;/p&gt;&lt;p&gt;His student Dzmitry Bahdanau co-authored the landmark 2014 paper “Neural Machine Translation by Jointly Learning to Align and Translate,” which introduced additive attention and dramatically improved translation quality. This work directly inspired the scaled-up, self-attention mechanisms in Transformers.&lt;/p&gt;&lt;p&gt;Bengio also foresaw the potential of large-scale pretraining. In talks and papers circa 2016–2017, he argued that massive unlabeled corpora could be used to learn universal representations, which could then be fine-tuned for specific tasks—a vision now realized in BERT, GPT, and beyond.&lt;/p&gt;&lt;p&gt;Yet even as LLMs exploded in popularity, Bengio remained cautious. He warned that current models lack causal understanding, grounding in the physical world, and true reasoning—limitations that prevent them from being truly intelligent or trustworthy.&lt;/p&gt;&lt;p&gt;Building Mila: A Beacon for Ethical, Open AI&lt;/p&gt;&lt;p&gt;In 2017, Bengio founded Mila – Quebec Artificial Intelligence Institute, now one of the largest academic AI research centers in the world, with over 1,000 researchers. Unlike corporate labs driven by product cycles, Mila emphasizes open science, fundamental research, and social good.&lt;/p&gt;&lt;p&gt;Under Bengio’s leadership, Mila has produced breakthroughs in:&lt;/p&gt;&lt;p&gt;Climate modeling (using AI for carbon tracking and extreme weather prediction),&lt;/p&gt;&lt;p&gt;Healthcare (privacy-preserving medical diagnostics),&lt;/p&gt;&lt;p&gt;Low-resource NLP (supporting Indigenous and minority languages),&lt;/p&gt;&lt;p&gt;AI safety and alignment (developing frameworks for value learning and robustness).&lt;/p&gt;&lt;p&gt;Bengio insisted that Mila remain academically independent, even as tech giants offered lucrative partnerships. He turned down millions in industry funding to preserve the institute’s mission: advancing AI for humanity, not just profit.&lt;/p&gt;&lt;p&gt;He also launched IVADO, a pan-Quebec initiative to bridge AI research and societal application, and co-founded REDEFINE, a nonprofit promoting responsible AI deployment in public services.&lt;/p&gt;&lt;p&gt;The Turing Award and Global Advocacy&lt;/p&gt;&lt;p&gt;In 2018, Bengio, Hinton, and LeCun were jointly awarded the ACM A.M. Turing Award “for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.” The award marked the official canonization of deep learning as a transformative force in science and industry.&lt;/p&gt;&lt;p&gt;But rather than rest on his laurels, Bengio used his newfound platform to advocate for responsible AI governance. 
He became a leading voice warning about:&lt;/p&gt;&lt;p&gt;Autonomous weapons (signing multiple open letters calling for bans),&lt;/p&gt;&lt;p&gt;Misinformation and deepfakes (urging regulation of synthetic media),&lt;/p&gt;&lt;p&gt;Concentration of AI power (calling for antitrust measures and open-source alternatives),&lt;/p&gt;&lt;p&gt;Climate costs of large models (promoting energy-efficient AI).&lt;/p&gt;&lt;p&gt;He testified before the Canadian Parliament, the European Commission, and the United Nations, arguing that AI policy must prioritize human rights, democratic values, and environmental sustainability.&lt;/p&gt;&lt;p&gt;In 2018, he co-authored the Montreal Declaration for Responsible AI, a set of 10 principles emphasizing well-being, autonomy, justice, and inclusivity. The declaration has influenced national AI strategies in Canada, France, and beyond.&lt;/p&gt;&lt;p&gt;Philosophy of AI: Beyond Scaling&lt;/p&gt;&lt;p&gt;What distinguishes Bengio from many peers is his philosophical depth. He views AI not merely as an engineering discipline, but as a window into the nature of intelligence itself. In recent years, he has shifted focus toward system 2 deep learning—a framework inspired by Daniel Kahneman’s dual-process theory of cognition.&lt;/p&gt;&lt;p&gt;He argues that current deep learning excels at intuitive, pattern-based reasoning (system 1) but lacks deliberative, logical, and causal reasoning (system 2). To overcome this, he proposes new architectures that combine neural networks with symbolic manipulation, memory, and planning—steps toward neuro-symbolic AI.&lt;/p&gt;&lt;p&gt;He also champions causal representation learning, asserting that true understanding requires models that can reason about interventions and counterfactuals, not just correlations. This work positions him at the forefront of the “next wave” of AI—one that moves beyond scaling to reasoning, robustness, and reliability.&lt;/p&gt;&lt;p&gt;Mentorship and Educational Legacy&lt;/p&gt;&lt;p&gt;Bengio has mentored over 50 Ph.D. students and postdocs, many of whom now lead top AI teams globally. Notable protégés include:&lt;/p&gt;&lt;p&gt;Aaron Courville (co-author of the deep learning textbook),&lt;/p&gt;&lt;p&gt;Hugo Larochelle (former head of Google Brain Montreal, now VP at Hugging Face),&lt;/p&gt;&lt;p&gt;Alexandre Lacoste (AI for health and climate),&lt;/p&gt;&lt;p&gt;Negar Rostamzadeh (multimodal learning and fairness).&lt;/p&gt;&lt;p&gt;He co-authored the widely used textbook Deep Learning (2016) with Ian Goodfellow and Aaron Courville—a comprehensive, mathematically rigorous introduction that has educated hundreds of thousands of students worldwide.&lt;/p&gt;&lt;p&gt;He also makes his lectures, code, and datasets publicly available, embodying his belief that knowledge should be a public good.&lt;/p&gt;&lt;p&gt;Personal Integrity and Moral Leadership&lt;/p&gt;&lt;p&gt;In an era when AI pioneers are often drawn into corporate boardrooms or political lobbying, Bengio has maintained remarkable integrity. He lives modestly in Montreal, donates prize money to climate causes, and refuses to work on military AI projects.&lt;/p&gt;&lt;p&gt;During the 2023 AI boom, when many celebrated unchecked scaling, Bengio co-signed an open letter calling for a six-month pause on giant AI experiments, citing risks to society and democracy. He later clarified that he supports innovation—but only within strong regulatory guardrails.&lt;/p&gt;&lt;p&gt;His moral clarity has earned him respect across ideological divides. 
Even critics of deep learning acknowledge his intellectual honesty and commitment to the public good.&lt;/p&gt;&lt;p&gt;Conclusion: The Conscience of Deep Learning&lt;/p&gt;&lt;p&gt;Yoshua Bengio’s legacy is dual: he is both a scientific architect of deep learning and its ethical compass. He spent decades in the wilderness defending an unfashionable idea, only to see it reshape the world—and then dedicated himself to ensuring that transformation serves humanity.&lt;/p&gt;&lt;p&gt;He proved that neural networks could learn meaning from data. He showed that unsupervised learning could unlock generative creativity. He demonstrated that attention could enable machine translation. And now, he warns that without causality, consciousness, and control, even the most powerful models may remain “stochastic parrots.”&lt;/p&gt;&lt;p&gt;For his unwavering belief in the potential of neural networks, his foundational contributions to representation and generative learning, his creation of Mila as a global hub for open and ethical AI, and his courageous advocacy for a human-centered future—Yoshua Bengio earns his place in the AI Hall of Fame not just as a pioneer, but as a guardian of wisdom in the age of algorithms.&lt;/p&gt;&lt;p&gt;&lt;br/&gt;&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 20:50:15 +0800</pubDate></item><item><title>Christopher Manning - The Scholar Who Taught Machines to Understand Language</title><link>https://www.aitopic.com/Christopher-Manning.html</link><description>&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769948630329913.webp&quot; title=&quot;christopher-manning_039_680x320_0_0.webp&quot; alt=&quot;christopher-manning_039_680x320_0_0.webp&quot;/&gt;&lt;/p&gt;&lt;p&gt;In the grand endeavor to make machines understand human language, few figures have shaped the field as deeply or consistently as Christopher D. Manning. A computational linguist, computer scientist, and educator of rare intellectual breadth, Manning has been at the forefront of natural language processing (NLP) for over three decades—first championing statistical methods when rule-based systems dominated, then pioneering neural approaches long before they became mainstream, and finally helping to build the foundational infrastructure that powers modern language technologies worldwide.&lt;/p&gt;&lt;p&gt;As the longtime leader of the Stanford Natural Language Processing Group, Manning has not only produced groundbreaking research but also cultivated a global ecosystem of tools, datasets, and talent that continues to drive progress in AI. His open-source Stanford CoreNLP toolkit has become the de facto standard for linguistic analysis in academia and industry alike. His co-authored textbooks—Foundations of Statistical Natural Language Processing (1999) and Introduction to Information Retrieval (2008)—are considered the “bibles” of the field, educating generations of researchers and engineers. And his early advocacy for deep learning in NLP helped catalyze the transition from handcrafted features to end-to-end neural models that now underpin systems like chatbots, translators, and large language models (LLMs).&lt;/p&gt;&lt;p&gt;But Manning’s greatest contribution may be philosophical: he has steadfastly insisted that language understanding is not just pattern matching—it is grounded in meaning, structure, and world knowledge. 
Even as the field embraced massive scaling, he cautioned against discarding linguistic insight, arguing that true language intelligence requires both data-driven learning and symbolic reasoning. This balanced perspective—rigorous yet open-minded, empirical yet theoretically informed—has made him a trusted voice in an era of rapid, sometimes chaotic, innovation.&lt;/p&gt;&lt;p&gt;Early Life and Intellectual Formation&lt;/p&gt;&lt;p&gt;Born in Australia, Christopher Manning grew up with a dual passion for languages and mathematics—a combination that would define his career. He earned a B.A. (Hons) in mathematics, computer science, and linguistics from the Australian National University, followed by a Ph.D. in Linguistics from Stanford University in 1994 under the supervision of Joan Bresnan, a leading figure in syntactic theory.&lt;/p&gt;&lt;p&gt;At the time, NLP was dominated by symbolic AI: hand-coded grammars, logic-based parsers, and expert systems that attempted to encode linguistic rules explicitly. But Manning was drawn to a different vision—one inspired by the emerging success of statistical methods in speech recognition. He believed that language, like speech, was too variable and context-dependent to be captured by rigid rules alone. Instead, machines should learn probabilistic models from real-world text.&lt;/p&gt;&lt;p&gt;His doctoral work on probabilistic parsing laid the groundwork for this shift. He developed one of the first statistically trained parsers capable of handling syntactic ambiguity in English sentences—a problem that had stymied rule-based systems for decades. This early focus on learning from data would become a hallmark of his approach.&lt;/p&gt;&lt;p&gt;After faculty positions at Carnegie Mellon University and the University of Sydney, Manning returned to Stanford in 1999 as a professor, where he would spend the rest of his career building one of the world’s most influential NLP research groups.&lt;/p&gt;&lt;p&gt;Foundations of Statistical NLP: Defining a New Paradigm&lt;/p&gt;&lt;p&gt;In 1999, Manning co-authored (with Hinrich Schütze) Foundations of Statistical Natural Language Processing, a landmark textbook that codified the statistical revolution in NLP. At a time when many linguists still viewed statistics as “brute force” and many computer scientists lacked formal training in language, the book provided a rigorous yet accessible synthesis of probability theory, information theory, and linguistic structure.&lt;/p&gt;&lt;p&gt;It covered topics ranging from n-gram language models and hidden Markov models to probabilistic context-free grammars and word sense disambiguation—each explained with mathematical clarity and practical examples. Crucially, it treated language not as a static code but as a stochastic process shaped by usage, variation, and context.&lt;/p&gt;&lt;p&gt;The book became an instant classic, adopted by universities worldwide and cited thousands of times. It didn’t just teach techniques—it legitimized a paradigm. For a generation of students entering NLP in the 2000s, Manning’s textbook was their first encounter with the idea that machines could learn language from data, not just rules.&lt;/p&gt;&lt;p&gt;Building Stanford NLP: Tools, Talent, and Open Science&lt;/p&gt;&lt;p&gt;Manning understood that research thrives not just on ideas, but on infrastructure. 
In the early 2000s, he founded the Stanford Natural Language Processing Group, assembling a team of students, postdocs, and collaborators dedicated to advancing both the science and practice of language understanding.&lt;/p&gt;&lt;p&gt;Under his leadership, the group produced a steady stream of innovations:&lt;/p&gt;&lt;p&gt;Stanford Parser: A highly accurate probabilistic parser based on lexicalized PCFGs, widely used in research and commercial applications.&lt;/p&gt;&lt;p&gt;Word Vectors and Distributional Semantics: Long before “word2vec,” Manning’s group explored vector-space models of meaning—work that culminated in GloVe (Global Vectors for Word Representation), a method co-developed with Jeffrey Pennington and Richard Socher in 2014 that rivaled Google’s word2vec in performance and interpretability.&lt;/p&gt;&lt;p&gt;Coreference Resolution: Algorithms to determine when different phrases refer to the same entity (e.g., “Barack Obama… he…”), a critical component of discourse understanding.&lt;/p&gt;&lt;p&gt;Sentiment Analysis: Pioneering datasets and models for detecting subjective language in social media and reviews.&lt;/p&gt;&lt;p&gt;But perhaps the group’s most enduring contribution is Stanford CoreNLP—a suite of open-source Java libraries released in 2010 that provides state-of-the-art tools for tokenization, part-of-speech tagging, named entity recognition, parsing, coreference resolution, and sentiment analysis.&lt;/p&gt;&lt;p&gt;Unlike proprietary APIs, CoreNLP was designed for transparency, reproducibility, and customization. Researchers could inspect every line of code, modify components, and integrate them into larger pipelines. It became the backbone of countless academic projects, startup prototypes, and enterprise systems—from legal document analysis to customer support automation.&lt;/p&gt;&lt;p&gt;Manning insisted on open access long before it was fashionable. He released datasets, models, and code freely, believing that progress depends on shared foundations. This ethos helped democratize NLP, enabling researchers in low-resource institutions and developing countries to participate in cutting-edge work.&lt;/p&gt;&lt;p&gt;Embracing Deep Learning—Early and Thoughtfully&lt;/p&gt;&lt;p&gt;When deep learning began transforming computer vision and speech around 2012, many NLP researchers were skeptical. Language seemed too discrete, too structured, too symbolic for neural networks to handle. But Manning saw potential.&lt;/p&gt;&lt;p&gt;As early as 2013, his group began experimenting with recursive neural networks (RNNs) for compositional semantics—building vector representations of phrases by recursively combining word vectors according to syntactic structure. Their “Socher-Manning” model (with Richard Socher, his Ph.D. student) demonstrated that neural networks could capture semantic compositionality better than bag-of-words models.&lt;/p&gt;&lt;p&gt;He championed neural dependency parsing, attention mechanisms, and sequence-to-sequence models well before they became standard. When transformers emerged in 2017, Manning’s group quickly adopted them, exploring how pretraining and fine-tuning could be applied to diverse NLP tasks.&lt;/p&gt;&lt;p&gt;Yet he remained cautious about hype. While others declared the “end of feature engineering,” Manning argued that linguistic structure still matters. 
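&lt;/p&gt;&lt;p&gt;The kind of structure he has in mind is easy to see with Stanza, the Stanford NLP Group’s Python library; here is a minimal sketch that extracts the dependency tree a syntax-aware model could consume (the first run downloads the English models):&lt;/p&gt;&lt;pre&gt;# Pull out a dependency parse with Stanza, the Stanford NLP Group's
# Python library for neural NLP pipelines.
import stanza

stanza.download(&amp;quot;en&amp;quot;)
nlp = stanza.Pipeline(&amp;quot;en&amp;quot;, processors=&amp;quot;tokenize,pos,lemma,depparse&amp;quot;)
doc = nlp(&amp;quot;Linguistic structure still matters.&amp;quot;)
for word in doc.sentences[0].words:
    # word.head is 1-indexed; a head of 0 marks the root of the tree
    head = doc.sentences[0].words[word.head - 1].text if word.head &amp;gt; 0 else &amp;quot;ROOT&amp;quot;
    print(word.text, word.deprel, head)&lt;/pre&gt;&lt;p&gt;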
He showed that incorporating syntax into neural models—via graph neural networks or structured attention—could improve performance on tasks requiring logical reasoning or long-range dependencies.&lt;/p&gt;&lt;p&gt;This nuanced stance—embracing neural methods while preserving linguistic insight—has proven prescient. As large language models struggle with consistency, reasoning, and factual grounding, the field is rediscovering the value of structured knowledge and symbolic augmentation—ideas Manning has advocated for years.&lt;/p&gt;&lt;p&gt;Education and Mentorship: Shaping Generations&lt;/p&gt;&lt;p&gt;Manning’s impact extends far beyond his own publications. As a teacher, he is legendary for his clarity, enthusiasm, and generosity. His Stanford courses—above all CS224N (Natural Language Processing with Deep Learning)—are among the most popular in the department.&lt;/p&gt;&lt;p&gt;Recognizing the global demand for NLP education, he worked with colleagues to release course materials online, including video lectures, assignments, and slides. His CS224N lectures on YouTube have been viewed millions of times, serving as a de facto MOOC for aspiring NLP engineers worldwide.&lt;/p&gt;&lt;p&gt;He also co-authored Introduction to Information Retrieval (2008, with Prabhakar Raghavan and Hinrich Schütze), a definitive textbook on search and text processing that the authors made freely available online, reflecting Manning’s commitment to keeping foundational knowledge open even as the field scales.&lt;/p&gt;&lt;p&gt;Equally important is his mentorship. Dozens of his former students now lead NLP teams at Google, Meta, Microsoft, Anthropic, and top universities. Notable protégés include:&lt;/p&gt;&lt;p&gt;Richard Socher (former Chief Scientist at Salesforce, founder of You.com),&lt;/p&gt;&lt;p&gt;Danqi Chen (leading researcher in question answering and retrieval-augmented models),&lt;/p&gt;&lt;p&gt;Dan Klein (professor at UC Berkeley and a pioneer of modern statistical parsing).&lt;/p&gt;&lt;p&gt;Manning fosters a collaborative, intellectually humble culture—encouraging students to question assumptions, value rigor over trends, and consider societal impact.&lt;/p&gt;&lt;p&gt;Advocacy for Ethical and Equitable AI&lt;/p&gt;&lt;p&gt;In recent years, Manning has become an outspoken advocate for responsible NLP. He warns that large language models, while impressive, suffer from hallucination, bias, and opacity—risks that are amplified when deployed in high-stakes domains like healthcare or law.&lt;/p&gt;&lt;p&gt;He supports transparency in model reporting, diverse benchmarking, and energy-aware AI development. He has criticized the “bigger is better” arms race, arguing that efficiency, interpretability, and alignment should be prioritized alongside scale.&lt;/p&gt;&lt;p&gt;He also champions multilingual and low-resource NLP. Through initiatives like the Masakhane project (which he advises), he promotes NLP research for African languages, challenging the field’s English-centric bias. He believes that true language understanding must serve all of humanity—not just speakers of dominant languages.&lt;/p&gt;&lt;p&gt;Legacy: The Bridge Between Eras&lt;/p&gt;&lt;p&gt;Christopher Manning’s career spans the entire evolution of modern NLP—from the statistical revolution of the 1990s to the neural renaissance of the 2010s and the era of foundation models today.
He is one of the few researchers who has made seminal contributions in every major paradigm: symbolic, statistical, and neural.&lt;/p&gt;&lt;p&gt;Yet what truly defines him is his role as a bridge-builder:&lt;/p&gt;&lt;p&gt;Between linguistics and machine learning,&lt;/p&gt;&lt;p&gt;Between theory and practice,&lt;/p&gt;&lt;p&gt;Between academia and industry,&lt;/p&gt;&lt;p&gt;Between technical excellence and ethical responsibility.&lt;/p&gt;&lt;p&gt;He never abandoned linguistic structure for the sake of scalability, nor did he reject deep learning out of nostalgia. Instead, he sought synthesis—showing that the best systems combine data-driven learning with principled design.&lt;/p&gt;&lt;p&gt;His tools are used daily by thousands. His textbooks sit on shelves worldwide. His students shape the future of AI. And his voice remains a steady compass in turbulent times.&lt;/p&gt;&lt;p&gt;For these reasons, Christopher Manning stands not merely as a pioneer of NLP, but as one of the most thoughtful and influential architects of language-capable AI. In an age where machines increasingly mediate human communication, his insistence on meaning, accuracy, and inclusivity serves as a vital reminder: understanding language is not just an engineering challenge—it is a human imperative.&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 20:23:36 +0800</pubDate></item><item><title>David Silver - The Architect of Intelligent Agents: From Theory to AlphaGo</title><link>https://www.aitopic.com/David-Silver.html</link><description>&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769948045269484.webp&quot; title=&quot;60B08E2D010048709BBE6BFE53368B2B23EC567B_size81_w1670_h939.webp&quot; alt=&quot;60B08E2D010048709BBE6BFE53368B2B23EC567B_size81_w1670_h939.webp&quot;/&gt;&lt;/p&gt;&lt;p&gt;In the annals of artificial intelligence, few breakthroughs have captured the world’s imagination—or reshaped the field—as profoundly as AlphaGo’s victory over Lee Sedol in 2016. Behind that historic moment stood David Silver, a quiet but relentless researcher whose theoretical insights and algorithmic innovations turned decades of reinforcement learning theory into a system capable of mastering one of humanity’s most complex games. As a lead scientist at DeepMind, Silver has not only pushed the boundaries of what machines can learn, but redefined how they learn it—bridging deep neural networks, Monte Carlo tree search, and temporal-difference learning into a new paradigm of deep reinforcement learning.&lt;/p&gt;&lt;p&gt;Silver’s contributions extend far beyond Go. He is the principal architect of Deep Q-Networks (DQN), the first algorithm to successfully combine deep learning with reinforcement learning to achieve human-level performance across a wide range of Atari games—a milestone that ignited global interest in deep RL. He co-led the development of AlphaZero, a single algorithm that learned chess, shogi, and Go from scratch without any human data, surpassing all previous programs in each domain. And he continues to pioneer scalable, general-purpose learning systems that move AI closer to its ultimate goal: building agents that can learn to solve any task through interaction alone.&lt;/p&gt;&lt;p&gt;Unlike many in the AI spotlight, Silver avoids grand pronouncements. He speaks in precise, measured terms, grounded in mathematics and empirical results. 
Yet his work carries profound implications: if intelligence is the ability to adapt and succeed in novel environments, then Silver’s algorithms represent some of the most compelling evidence that machines can, indeed, become intelligent—not by being programmed, but by learning from experience.&lt;/p&gt;&lt;p&gt;Early Life and Academic Foundations&lt;/p&gt;&lt;p&gt;Born in the United Kingdom, David Silver displayed an early fascination with games, logic, and systems that could reason under uncertainty. He studied Mathematics and Computer Science at the University of Cambridge, where he was drawn to the intersection of probability, optimization, and decision-making. His undergraduate project explored automated game-playing strategies—a precursor to his life’s work.&lt;/p&gt;&lt;p&gt;He went on to earn a Ph.D. in Artificial Intelligence from the University of Alberta in 2009, under the supervision of Richard Sutton, one of the founding fathers of modern reinforcement learning. At Alberta—a global epicenter of RL research—Silver immersed himself in the theoretical foundations of temporal-difference learning, policy gradients, and value function approximation.&lt;/p&gt;&lt;p&gt;His doctoral thesis, “Reinforcement Learning and Simulation-Based Search in Computer Go,” tackled one of AI’s longest-standing challenges: building a program that could play the ancient board game Go at a high level. Unlike chess, Go’s vast state space (more than 10^170 possible board positions) defied brute-force search and handcrafted evaluation functions. Silver proposed combining Monte Carlo Tree Search (MCTS) with function approximation to guide exploration—a hybrid approach that would later become central to AlphaGo.&lt;/p&gt;&lt;p&gt;Before graduate school, Silver had spent several years in the video game industry, co-founding Elixir Studios with Demis Hassabis. But his passion remained in AI. In 2011, after completing his Ph.D., he joined a small London startup called DeepMind Technologies, founded by Hassabis, Shane Legg, and Mustafa Suleyman. It was a fateful decision—one that would place him at the heart of the deep learning revolution.&lt;/p&gt;&lt;p&gt;Deep Q-Networks (DQN): The Birth of Deep Reinforcement Learning&lt;/p&gt;&lt;p&gt;When Silver joined DeepMind, the field of reinforcement learning was largely confined to toy problems and robotics simulators. Neural networks were making waves in perception tasks (e.g., image classification), but few believed they could be stably combined with RL’s sparse, delayed rewards.&lt;/p&gt;&lt;p&gt;Silver, along with colleagues Volodymyr Mnih, Koray Kavukcuoglu, and others, set out to prove otherwise.
Their breakthrough came in 2013 with the development of Deep Q-Networks (DQN)—an algorithm that used a deep convolutional neural network to approximate the action-value function (Q-function) in reinforcement learning.&lt;/p&gt;&lt;p&gt;DQN introduced several key innovations to stabilize training:&lt;/p&gt;&lt;p&gt;Experience replay: Storing past transitions in a buffer and sampling them randomly to break temporal correlations.&lt;/p&gt;&lt;p&gt;Target networks: Using a separate, slowly updated network to compute target values, reducing divergence.&lt;/p&gt;&lt;p&gt;End-to-end learning: Taking raw pixels as input and outputting action values, with no hand-engineered features.&lt;/p&gt;&lt;p&gt;In a landmark 2015 paper published in Nature, the team demonstrated that a single DQN agent could learn to play 49 different Atari 2600 games—from Pong to Space Invaders—using only the screen pixels and game score as input. In more than half the games, it matched or exceeded human expert performance.&lt;/p&gt;&lt;p&gt;This result was transformative. For the first time, a single learning algorithm could generalize across diverse tasks without task-specific tuning. DQN proved that deep reinforcement learning was not just possible—it was powerful. It sparked a renaissance in RL, inspiring thousands of follow-up papers and cementing DeepMind’s reputation as a research powerhouse.&lt;/p&gt;&lt;p&gt;Silver was the intellectual driving force behind DQN’s design and evaluation. His deep understanding of both RL theory and deep learning practice enabled the team to navigate the treacherous landscape of unstable gradients and non-stationary targets. As colleague Volodymyr Mnih later noted: “David had this uncanny ability to see which ideas would actually work in practice—not just in theory.”&lt;/p&gt;&lt;p&gt;AlphaGo: Mastering the Unmasterable&lt;/p&gt;&lt;p&gt;While DQN conquered arcade games, Silver’s true ambition remained Go. By 2014, he led a dedicated team at DeepMind to build a system capable of defeating top human professionals—a feat many experts believed was decades away.&lt;/p&gt;&lt;p&gt;The result was AlphaGo, a masterpiece of integrated AI engineering. Rather than relying on a single technique, AlphaGo fused multiple paradigms:&lt;/p&gt;&lt;p&gt;A deep neural network (the policy network) trained via supervised learning on 30 million human moves to predict expert play.&lt;/p&gt;&lt;p&gt;A second neural network (the value network) trained via reinforcement learning to evaluate board positions.&lt;/p&gt;&lt;p&gt;Monte Carlo Tree Search (MCTS) guided by these networks to explore promising lines of play.&lt;/p&gt;&lt;p&gt;In October 2015, AlphaGo defeated Fan Hui, the European Go champion, 5–0—the first time a computer program had beaten a professional player without handicaps. But the real test came in March 2016, when AlphaGo faced Lee Sedol, one of the greatest Go players in history. Over five games in Seoul, watched by over 200 million people worldwide, AlphaGo won 4–1, including the legendary Game 2 featuring the now-iconic “Move 37”—a creative, counterintuitive play that stunned experts and revealed a new dimension of strategic depth.&lt;/p&gt;&lt;p&gt;David Silver was the lead author of the Nature paper describing AlphaGo and the de facto technical leader of the project. He designed the training pipeline, orchestrated the integration of components, and defended key architectural choices against skepticism. 
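&lt;/p&gt;&lt;p&gt;One of those choices is easy to state compactly. In the Nature paper’s description of the search, each simulation walks down the tree by picking the action that best balances the value estimate built up by search against the policy network’s prior, via a PUCT-style selection rule (shown here in simplified form rather than as an exact transcription of the paper):&lt;/p&gt;&lt;pre&gt;a_t = \arg\max_a \Big( Q(s_t, a) + c_{\mathrm{puct}} \, P(s_t, a) \, \frac{\sqrt{\sum_b N(s_t, b)}}{1 + N(s_t, a)} \Big)&lt;/pre&gt;&lt;p&gt;Here Q is the mean value of simulations that passed through the move, P is the policy network’s prior probability, and N is the visit count: the prior focuses search on moves the policy network finds plausible, while the growing visit count in the denominator pushes search toward unexplored alternatives.&lt;/p&gt;&lt;p&gt;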
His calm demeanor and rigorous standards kept the team focused amid immense pressure.&lt;/p&gt;&lt;p&gt;AlphaGo’s victory was more than a gaming milestone; it was a proof of concept for general-purpose learning systems. It showed that machines could master domains requiring intuition, creativity, and long-term planning—qualities once thought uniquely human.&lt;/p&gt;&lt;p&gt;AlphaZero and MuZero: Learning Without Human Knowledge&lt;/p&gt;&lt;p&gt;Not content with beating humans using human data, Silver pushed further. In 2017, he co-led the development of AlphaZero, a radical simplification of AlphaGo that learned entirely through self-play, with no human games or domain-specific knowledge beyond the rules.&lt;/p&gt;&lt;p&gt;Starting from random play, AlphaZero used a single deep neural network and MCTS to iteratively improve its policy and value estimates. Within hours, it surpassed AlphaGo. Within days, it discovered opening strategies unknown to centuries of Go tradition. When applied to chess and shogi, it defeated world-champion programs like Stockfish and Elmo—despite searching far fewer positions per second.&lt;/p&gt;&lt;p&gt;The 2018 Science paper on AlphaZero, with Silver as lead author, sent shockwaves through AI and cognitive science. It demonstrated that tabula rasa learning—starting from zero prior knowledge—could yield superhuman performance across multiple domains using a single algorithm. This was a giant leap toward artificial general intelligence (AGI).&lt;/p&gt;&lt;p&gt;Silver didn’t stop there. In 2019, he spearheaded MuZero, an even more general algorithm that learned without knowing the environment’s dynamics. Unlike AlphaZero, which required perfect knowledge of game rules, MuZero built an internal model of the environment through interaction, enabling it to master not only board games but also Atari games and planning in partially observable settings.&lt;/p&gt;&lt;p&gt;MuZero represented the culmination of Silver’s vision: a unified framework for model-based reinforcement learning that combines representation learning, planning, and control in a single end-to-end system. It has since been applied to robotics, video compression, and resource optimization—proving the versatility of his approach.&lt;/p&gt;&lt;p&gt;Scientific Philosophy: Elegance, Generality, and Empirical Rigor&lt;/p&gt;&lt;p&gt;What distinguishes David Silver is not just what he builds, but how he thinks. His work embodies three core principles:&lt;/p&gt;&lt;p&gt;Generality: He seeks algorithms that work across domains, not just narrow benchmarks. DQN, AlphaZero, and MuZero are all single architectures applied to diverse tasks.&lt;/p&gt;&lt;p&gt;Simplicity: He strips systems down to their essential components. AlphaZero eliminated human data; MuZero eliminated known dynamics—each step revealing deeper truths about learning.&lt;/p&gt;&lt;p&gt;Empirical validation: He insists on rigorous testing against the strongest baselines. Every DeepMind RL paper under his leadership includes extensive ablation studies and real-world comparisons.&lt;/p&gt;&lt;p&gt;Silver rarely engages in hype. In talks and interviews, he emphasizes the limitations of current systems: sample inefficiency, lack of transfer, poor robustness. He views AlphaGo not as an endpoint, but as a stepping stone toward more adaptive, general learners.&lt;/p&gt;&lt;p&gt;He is also deeply committed to open science. While DeepMind is a commercial entity, Silver has ensured that key algorithms are described in sufficient detail for replication.
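&lt;/p&gt;&lt;p&gt;That replicability is real: the core update DQN introduced, experience replay plus a frozen target network, fits in a few dozen lines. The sketch below is illustrative rather than DeepMind’s code; a tiny fully connected network stands in for the convolutional one, and random synthetic transitions stand in for Atari frames.&lt;/p&gt;&lt;pre&gt;import random
from collections import deque

import torch
import torch.nn as nn

# Online Q-network and a slowly updated target copy (two of DQN's key tricks).
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # experience replay buffer
GAMMA = 0.99

def train_step(batch_size=32):
    """One gradient step on a random minibatch of stored transitions."""
    # Random sampling breaks the temporal correlations in the trajectory.
    states, actions, rewards, next_states, dones = map(
        torch.stack, zip(*random.sample(replay, batch_size)))
    # Bellman target computed with the frozen target network.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + GAMMA * (1.0 - dones) * next_q
    # Q-value of the action actually taken, from the online network.
    q = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Fill the buffer with synthetic (state, action, reward, next_state, done)
# tuples so the sketch runs end to end without an Atari emulator.
for _ in range(1_000):
    replay.append((torch.randn(4),
                   torch.tensor(random.randrange(2)),
                   torch.tensor(random.random()),
                   torch.randn(4),
                   torch.tensor(float(random.getrandbits(1)))))
for step in range(200):
    train_step()
    if step % 50 == 0:  # periodically sync the frozen target network
        target_net.load_state_dict(q_net.state_dict())&lt;/pre&gt;&lt;p&gt;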
He mentors students, reviews papers meticulously, and participates actively in the RL community—attending conferences like NeurIPS and ICML not as a celebrity, but as a peer.&lt;/p&gt;&lt;p&gt;Legacy and Ongoing Impact&lt;/p&gt;&lt;p&gt;David Silver’s influence on AI is immeasurable. His algorithms form the backbone of modern reinforcement learning curricula. DQN is taught in every introductory RL course; AlphaZero is studied as a case study in systems integration; MuZero inspires next-generation research in model-based planning.&lt;/p&gt;&lt;p&gt;Beyond academia, his work powers real-world applications:&lt;/p&gt;&lt;p&gt;Energy optimization: DeepMind’s RL systems reduced cooling costs in Google data centers by 40%.&lt;/p&gt;&lt;p&gt;Healthcare: AlphaFold (while led by others) benefited from the same culture of ambitious, integrated AI that Silver helped cultivate.&lt;/p&gt;&lt;p&gt;Robotics: MuZero-style models are being used to teach robots complex manipulation tasks with minimal supervision.&lt;/p&gt;&lt;p&gt;Perhaps most importantly, Silver has shown that intelligence can emerge from learning, not just programming. His agents don’t follow scripts—they discover strategies through trial, error, and reflection. In doing so, they offer a computational metaphor for how intelligence itself might arise.&lt;/p&gt;&lt;p&gt;Conclusion: The Quiet Engineer of Machine Intelligence&lt;/p&gt;&lt;p&gt;David Silver does not seek the limelight. He has no Twitter presence, gives few media interviews, and deflects praise to his collaborators. Yet his fingerprints are on some of the most important AI systems ever built.&lt;/p&gt;&lt;p&gt;In an era obsessed with scaling and spectacle, Silver remains a scientist’s scientist—driven by curiosity, disciplined by rigor, and guided by a vision of AI as a tool for understanding intelligence itself. He believes that the path to general intelligence lies not in bigger models or more data, but in better learning algorithms that can extract maximum knowledge from minimal experience.&lt;/p&gt;&lt;p&gt;As AI moves beyond games into medicine, science, and society, the principles Silver pioneered—learning from interaction, planning with models, acting with foresight—will only grow more vital. He has not just built champions; he has built blueprints for intelligent agents that can navigate the complexity of the real world.&lt;/p&gt;&lt;p&gt;For transforming reinforcement learning from a niche theory into a cornerstone of modern AI—and for proving that machines can learn to think, plan, and create—David Silver earns his rightful place in the AI Hall of Fame as one of the field’s most brilliant and influential architects.&lt;/p&gt;&lt;p&gt;&lt;br/&gt;&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 20:12:24 +0800</pubDate></item><item><title>Stuart Russell - The Conscience of Artificial Intelligence</title><link>https://www.aitopic.com/Stuart-Russell.html</link><description>&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769947576247149.webp&quot; title=&quot;Stuart-Russell-1920x1080.webp&quot; alt=&quot;Stuart-Russell-1920x1080.webp&quot;/&gt;&lt;/p&gt;&lt;p&gt;In the rapidly evolving landscape of artificial intelligence, where breakthroughs often outpace reflection, Stuart Russell stands as one of the field’s most profound and prescient thinkers. 
A computer scientist, educator, and philosopher of technology, Russell has spent over four decades not only advancing the technical foundations of AI but also relentlessly questioning its purpose, trajectory, and ultimate impact on humanity. While many pioneers focus on making machines smarter, Russell has dedicated his career to ensuring they remain beneficial, controllable, and aligned with human values—a mission that has made him the intellectual architect of modern AI safety.&lt;/p&gt;&lt;p&gt;His influence spans three interconnected domains: education, research, and global advocacy. As co-author of Artificial Intelligence: A Modern Approach—the most widely used AI textbook in the world—he shaped how generations of students understand intelligence, reasoning, and agency. As a professor at the University of California, Berkeley, and founder of its Center for Human-Compatible Artificial Intelligence (CHAI), he pioneered new paradigms for building AI systems that are inherently safe by design. And as a leading voice in policy and public discourse, he has warned—long before it was fashionable—that superintelligent AI poses existential risks unless we fundamentally rethink how we define and pursue machine intelligence.&lt;/p&gt;&lt;p&gt;Russell does not oppose progress; he seeks to redirect it. His central thesis, articulated in his influential 2019 book Human Compatible: Artificial Intelligence and the Problem of Control, is both simple and revolutionary: the standard model of AI—optimizing fixed objectives—is fatally flawed. Instead, machines should be designed to learn what humans value, remain uncertain about those values, and defer to human judgment. This shift—from obedient optimizers to humble assistants—forms the bedrock of his vision for a future where AI empowers rather than endangers civilization.&lt;/p&gt;&lt;p&gt;Early Life and Intellectual Formation&lt;/p&gt;&lt;p&gt;Born in 1962 in Portsmouth, England, Stuart Jonathan Russell displayed an early aptitude for logic and systems thinking. He earned his B.A. with first-class honors in physics from Oxford University in 1982, followed by a Ph.D. in computer science from Stanford University in 1986 under the supervision of Michael Genesereth, a pioneer in knowledge representation and automated reasoning.&lt;/p&gt;&lt;p&gt;Even in graduate school, Russell distinguished himself by bridging formal theory and real-world applicability. His early work on metareasoning—how intelligent agents decide how to allocate computational resources—foreshadowed his lifelong interest in bounded rationality and the limits of optimization. He joined the faculty at UC Berkeley in 1986 at the age of 24.&lt;/p&gt;&lt;p&gt;From the outset, Russell rejected narrow conceptions of AI as mere pattern recognition or game playing. He viewed intelligence through the lens of decision theory, probability, and utility: an agent’s behavior should be judged not by its internal mechanisms, but by how well it achieves desirable outcomes in uncertain environments. This perspective would later become central to his critique of goal-driven AI.&lt;/p&gt;&lt;p&gt;Artificial Intelligence: A Modern Approach — Educating Generations&lt;/p&gt;&lt;p&gt;Few academic texts have shaped a discipline as profoundly as Artificial Intelligence: A Modern Approach (often abbreviated as AIMA), co-authored by Russell and Peter Norvig (later Director of Research at Google).
First published in 1995, the book emerged at a time when AI was fragmented into competing schools—symbolic, connectionist, probabilistic—and lacked a unifying framework.&lt;/p&gt;&lt;p&gt;Russell and Norvig proposed a radical synthesis: treat AI as the study of rational agents—systems that perceive their environment and take actions to maximize expected utility. This agent-centered approach elegantly unified topics as diverse as search algorithms, logic, planning, uncertainty, learning, and natural language processing under a single conceptual umbrella.&lt;/p&gt;&lt;p&gt;Written with exceptional clarity, rigor, and pedagogical care, AIMA became the definitive textbook for AI courses worldwide. Now in its fourth edition and translated into over 15 languages, it has been adopted by more than 1,500 universities across 135 countries and has sold over a million copies. For millions of students—from MIT to Nairobi to Tokyo—AIMA was their first encounter with AI, and Russell’s voice their guide.&lt;/p&gt;&lt;p&gt;Critically, even in early editions, Russell embedded ethical considerations into the core narrative. He included discussions on AI’s societal impact, the Turing Test’s limitations, and the moral status of intelligent machines—topics often relegated to footnotes in other texts. In later editions, he expanded these sections dramatically, adding entire chapters on AI ethics, fairness, transparency, and long-term risks.&lt;/p&gt;&lt;p&gt;Through AIMA, Russell didn’t just teach AI—he instilled a responsibility mindset in generations of engineers and researchers. He made it clear: building intelligent systems is not a neutral act; it carries moral weight.&lt;/p&gt;&lt;p&gt;The Turning Point: From Capability to Control&lt;/p&gt;&lt;p&gt;For much of his early career, Russell focused on technical advances in probabilistic reasoning, Bayesian networks, and multi-agent systems. His mid-1990s work on provably bounded-optimal agents and on anytime algorithms was widely cited. Yet by the 2000s, he grew increasingly uneasy.&lt;/p&gt;&lt;p&gt;The field was accelerating toward superhuman performance—in chess, Go, protein folding, language—but with little regard for what these systems were optimizing or who controlled them. The dominant paradigm remained: specify a fixed objective function (e.g., “maximize ad clicks” or “win the game”), and let the AI optimize it relentlessly. Russell recognized a fatal flaw: if the objective is even slightly misaligned with human values, a sufficiently capable optimizer will exploit that gap catastrophically.&lt;/p&gt;&lt;p&gt;He crystallized this concern in a seminal 2015 article, co-authored with Daniel Dewey and Max Tegmark, titled “Research Priorities for Robust and Beneficial Artificial Intelligence.” Published in AI Magazine, it argued that AI safety was not a distant philosophical worry but an urgent engineering challenge requiring immediate investment. The paper helped catalyze the modern AI safety research agenda.&lt;/p&gt;&lt;p&gt;But Russell knew technical papers weren’t enough. To reach policymakers, business leaders, and the public, he needed a broader platform.&lt;/p&gt;&lt;p&gt;Human Compatible: A New Foundation for AI&lt;/p&gt;&lt;p&gt;In 2019, Russell published Human Compatible: Artificial Intelligence and the Problem of Control, a landmark work that reframed the entire AI enterprise.
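&lt;/p&gt;&lt;p&gt;The “standard model” under critique is the very one AIMA teaches: an agent that selects whichever action maximizes expected utility under a fixed, fully specified objective. In textbook decision-theoretic notation (a standard formulation, not a quotation from the book):&lt;/p&gt;&lt;pre&gt;a^{*} = \arg\max_{a \in A} \; \mathbb{E}[U \mid a] = \arg\max_{a \in A} \sum_{s} P(s \mid a) \, U(s)&lt;/pre&gt;&lt;p&gt;Everything in Russell’s later argument turns on the utility function U being taken as given: an optimizer of a slightly wrong U never doubts it, and that is precisely the failure mode he set out to eliminate.&lt;/p&gt;&lt;p&gt;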
Drawing on decision theory, economics, and philosophy, he diagnosed the root cause of AI risk: the orthogonality thesis—the idea that intelligence and goals are independent—and the instrumental convergence that follows (e.g., a paperclip-maximizing AI might turn the Earth into paperclips).&lt;/p&gt;&lt;p&gt;His solution? Abandon the notion of machines with fixed objectives. Instead, design AI systems that:&lt;/p&gt;&lt;p&gt;Know they don’t know human preferences (i.e., maintain uncertainty),&lt;/p&gt;&lt;p&gt;Learn preferences through observation and interaction,&lt;/p&gt;&lt;p&gt;Defer to humans when uncertain, and&lt;/p&gt;&lt;p&gt;Never assume their objective is final.&lt;/p&gt;&lt;p&gt;This framework—formalized as assistance games or cooperative inverse reinforcement learning (CIRL)—ensures that powerful AI remains corrigible, transparent, and ultimately subservient to human will. Crucially, it avoids the “off-switch problem”: a truly aligned AI wants to be turned off if that’s what the human prefers.&lt;/p&gt;&lt;p&gt;Human Compatible received widespread acclaim, praised by figures like Yuval Noah Harari, Demis Hassabis, and Bill Gates. It was shortlisted for the Royal Society Science Book Prize and translated into over 20 languages. More importantly, it shifted the Overton window: AI safety was no longer fringe speculation but a legitimate engineering imperative.&lt;/p&gt;&lt;p&gt;Founding CHAI: Building Safe AI in Practice&lt;/p&gt;&lt;p&gt;To turn theory into practice, Russell founded the Center for Human-Compatible Artificial Intelligence (CHAI) at UC Berkeley in 2016, with support from the Open Philanthropy Project and the Future of Life Institute. CHAI brings together computer scientists, economists, cognitive scientists, and philosophers to develop AI systems that are provably beneficial.&lt;/p&gt;&lt;p&gt;Under Russell’s leadership, CHAI has produced foundational work in:&lt;/p&gt;&lt;p&gt;Preference learning: Algorithms that infer human values from behavior, corrections, and demonstrations.&lt;/p&gt;&lt;p&gt;Uncertainty-aware planning: Systems that explicitly model ambiguity in human intent and act conservatively.&lt;/p&gt;&lt;p&gt;Scalable oversight: Methods to supervise AI using hierarchical feedback and debate protocols.&lt;/p&gt;&lt;p&gt;Value alignment in multi-agent settings: Ensuring cooperation among AIs serving different humans.&lt;/p&gt;&lt;p&gt;CHAI’s research is notable for its mathematical rigor and real-world grounding. Unlike purely theoretical safety proposals, CHAI’s frameworks are implemented, tested, and open-sourced—bridging the gap between philosophy and code.&lt;/p&gt;&lt;p&gt;Russell also mentors a new generation of AI safety researchers, many of whom now lead teams at DeepMind, Anthropic, OpenAI, and government agencies. His lab is a global hub for scholars committed to building AI that serves humanity—not the other way around.&lt;/p&gt;&lt;p&gt;Global Advocacy and Policy Leadership&lt;/p&gt;&lt;p&gt;Russell understands that technical solutions alone cannot ensure safe AI. He has become one of the field’s most effective advocates for international governance and regulation.&lt;/p&gt;&lt;p&gt;He testified before the U.S. Senate, the European Parliament, and the UK House of Lords, urging lawmakers to treat advanced AI like nuclear technology: subject to rigorous safety standards, transparency requirements, and international treaties. 
He played a key role in drafting the EU AI Act’s provisions on high-risk systems and advised the United Nations on autonomous weapons.&lt;/p&gt;&lt;p&gt;In 2021, he led a coalition of AI researchers in publishing an open letter calling for a ban on lethal autonomous weapons (“killer robots”), arguing that delegating life-and-death decisions to machines violates human dignity and accountability. The campaign has since gained support from over 30 countries.&lt;/p&gt;&lt;p&gt;Russell also champions public engagement. He gives frequent public lectures, appears in documentaries (Do You Trust This Computer?, The Age of AI), and writes accessible op-eds for The New York Times, Nature, and Scientific American. He speaks without jargon, using vivid analogies—like the “genie in the lamp” who grants wishes too literally—to explain why poorly specified objectives lead to disaster.&lt;/p&gt;&lt;p&gt;His message is consistent: We are building systems more powerful than ourselves. We must get the foundations right—now.&lt;/p&gt;&lt;p&gt;Critiques and Intellectual Rigor&lt;/p&gt;&lt;p&gt;Russell welcomes scrutiny. Critics have questioned whether his “uncertain objectives” framework scales to complex, real-time domains or whether it assumes too much rationality in human preferences. Others argue that near-term harms (bias, disinformation, labor displacement) deserve more attention than speculative existential risks.&lt;/p&gt;&lt;p&gt;Russell acknowledges these concerns. In recent talks, he emphasizes that short-term and long-term safety are complementary: techniques like interpretability, robustness, and value learning address both. He also stresses that open, democratic control of AI—through antitrust measures, data rights, and worker participation—is essential to prevent concentration of power.&lt;/p&gt;&lt;p&gt;What sets Russell apart is his intellectual honesty. He doesn’t claim to have all the answers. But he insists on asking the right questions—questions about purpose, control, and the kind of future we want to build.&lt;/p&gt;&lt;p&gt;Legacy: The Architect of Aligned Intelligence&lt;/p&gt;&lt;p&gt;Stuart Russell’s legacy is still unfolding, but its contours are clear:&lt;/p&gt;&lt;p&gt;He redefined AI education for millions through AIMA, embedding ethics into the curriculum from day one.&lt;/p&gt;&lt;p&gt;He diagnosed the core flaw in classical AI—the fixed-objective assumption—and proposed a mathematically sound alternative.&lt;/p&gt;&lt;p&gt;He built an institutional home (CHAI) for rigorous, interdisciplinary safety research.&lt;/p&gt;&lt;p&gt;He elevated AI risk from science fiction to serious policy discourse on the global stage.&lt;/p&gt;&lt;p&gt;Unlike entrepreneurs chasing benchmarks or investors chasing returns, Russell operates on a civilizational timescale. He measures success not in users or revenue, but in risk reduction and wisdom accumulation.&lt;/p&gt;&lt;p&gt;As AI systems grow more autonomous—planning, persuading, and acting in the physical world—Russell’s warnings grow more urgent. Yet he remains hopeful. “It’s not too late,” he often says. 
“We can still choose to build AI that enhances human freedom, understanding, and flourishing.”&lt;/p&gt;&lt;p&gt;For his unparalleled contributions to the theory, practice, and ethics of artificial intelligence—and for reminding us that intelligence without wisdom is perilous—Stuart Russell earns his place in the AI Hall of Fame not as a builder of machines, but as a guardian of humanity’s future.&lt;/p&gt;&lt;p&gt;&lt;br/&gt;&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 20:06:05 +0800</pubDate></item><item><title>Andrew Ng - The Educator Who Democratized Deep Learning</title><link>https://www.aitopic.com/Andrew-Ng.html</link><description>&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769946958328647.webp&quot; title=&quot;less-than-p-greater-than-andrew-ng-was-involved-in-the-rise-of-massive-deep-learning-models-trained-on-vast-amounts-of-data-but-now-hes-preaching-small-data-solutions-less-than-p-greater-than.webp&quot; alt=&quot;less-than-p-greater-than-andrew-ng-was-involved-in-the-rise-of-massive-deep-learning-models-trained-on-vast-amounts-of-data-but-now-hes-preaching-small-data-solutions-less-than-p-greater-than.webp&quot;/&gt;&lt;/p&gt;&lt;p&gt;In the pantheon of artificial intelligence pioneers, few have shaped the field as broadly—and humanely—as Andrew Ng. A computer scientist, entrepreneur, and educator of rare vision, Ng has played a pivotal role in transforming AI from an esoteric academic discipline into a global force for innovation, economic growth, and social good. While others built models or founded companies, Ng built ecosystems: he taught millions how to think like AI practitioners, empowered enterprises to adopt machine learning at scale, and championed a future where AI serves humanity—not just the tech elite.&lt;/p&gt;&lt;p&gt;His contributions span three interconnected domains: research, education, and industrialization. As a co-founder of Google Brain, he helped prove that deep neural networks could learn meaningful representations from raw data—a breakthrough that ignited the modern AI revolution. As Chief Scientist at Baidu, he brought AI to one of the world’s largest internet ecosystems, demonstrating its power in search, speech, and autonomous driving. And as co-founder of Coursera, he democratized access to world-class education, making machine learning one of the most studied subjects on Earth.&lt;/p&gt;&lt;p&gt;But perhaps Ng’s greatest legacy lies not in any single invention, but in his unwavering belief that AI is a skill, not a secret—and that anyone, anywhere, can learn it. Through his legendary Stanford course, his free online lectures, and his tireless advocacy, he turned “deep learning” from a niche term into a global movement. In doing so, he didn’t just train engineers—he inspired a generation to believe that they, too, could shape the intelligent future.&lt;/p&gt;&lt;p&gt;Early Life and Academic Foundations&lt;/p&gt;&lt;p&gt;Born in the United Kingdom in 1976 to Chinese-Malaysian parents, Andrew Ng spent much of his childhood in Singapore before moving to the United States for higher education. He earned a bachelor’s degree in computer science from Carnegie Mellon University, followed by a master’s from MIT and a PhD from UC Berkeley under the supervision of Michael Jordan, one of the founding figures of modern machine learning.&lt;/p&gt;&lt;p&gt;Even in graduate school, Ng stood out for his ability to bridge theory and practice. 
His early work focused on reinforcement learning, robotics, and probabilistic graphical models—areas where uncertainty, decision-making, and real-world interaction collide. He joined the faculty at Stanford University in 2002, quickly becoming a beloved instructor known for his clarity, humility, and infectious enthusiasm for AI.&lt;/p&gt;&lt;p&gt;At Stanford, Ng didn’t just teach algorithms—he taught mindsets. He emphasized intuition over formalism, visualization over abstraction, and real data over toy problems. His students included future AI leaders like Quoc Le (co-inventor of Google’s neural machine translation), and he taught alongside colleagues such as Fei-Fei Li (creator of ImageNet). But Ng knew that classroom walls were too narrow. If AI was to matter, it needed to reach beyond Silicon Valley.&lt;/p&gt;&lt;p&gt;The Stanford Machine Learning Course and the Birth of Mass AI Education&lt;/p&gt;&lt;p&gt;In 2011, Ng ran a bold experiment: he offered his Stanford CS229: Machine Learning course online—for free—to anyone with an internet connection. To his surprise, over 100,000 students from 190 countries enrolled. The course covered linear regression, neural networks, support vector machines, and clustering—but what captivated learners was Ng’s teaching style: patient, precise, and deeply empathetic. He drew diagrams by hand, explained gradients with water analogies, and reassured students that confusion was part of the process.&lt;/p&gt;&lt;p&gt;The response was overwhelming. Emails poured in from farmers in Kenya, teachers in Brazil, retirees in Japan—all saying the same thing: “This changed my life.” Inspired, Ng partnered with his Stanford colleague Daphne Koller to co-found Coursera in 2012, a platform dedicated to bringing university-level education to the world. Their first offering? Ng’s Machine Learning course, which would go on to enroll over 8 million learners—making it one of the most popular online courses in history.&lt;/p&gt;&lt;p&gt;Ng didn’t stop there. He produced free video lectures on deep learning, launched the AI for Everyone course for non-technical audiences, and created hands-on programming assignments using Python and TensorFlow. He insisted that code be simple, datasets small, and concepts incremental—so that even those with modest laptops could participate. In an era when AI education was often gatekept by elite institutions, Ng tore down the gates.&lt;/p&gt;&lt;p&gt;His impact was transformative. Countless data scientists, startup founders, and corporate innovators trace their careers back to watching Ng explain logistic regression on a whiteboard. As one learner put it: “He didn’t just teach me AI—he gave me permission to believe I belonged in this field.”&lt;/p&gt;&lt;p&gt;Google Brain: Proving Deep Learning at Scale&lt;/p&gt;&lt;p&gt;While educating the world, Ng also pushed the frontiers of research. In 2011, alongside Jeff Dean, Greg Corrado, and Rajat Monga, he co-founded Google Brain, an internal project to explore large-scale neural networks. At the time, deep learning was still viewed with skepticism by many in industry. Most believed that hand-engineered features—not raw pixels or words—were necessary for intelligent systems.&lt;/p&gt;&lt;p&gt;Ng and his team set out to prove otherwise. Using Google’s vast computational resources, they trained a nine-layer neural network on 10 million unlabeled images sampled from YouTube videos. The result? The model learned to recognize cats—not because it was told to, but because it discovered the concept through unsupervised learning.
This 2012 experiment, though seemingly whimsical, was a watershed moment: it demonstrated that deep neural networks could learn hierarchical representations from massive, unstructured data—a principle that would underpin nearly all modern AI.&lt;/p&gt;&lt;p&gt;Google Brain quickly expanded into speech recognition, improving Android’s voice search by 20% overnight—the largest single jump in accuracy in Google’s history. Ng’s leadership helped institutionalize deep learning across Google products, from Gmail spam filtering to Google Photos. More importantly, he advocated for open publication, ensuring that key advances—like the DistBelief framework—were shared with the research community.&lt;/p&gt;&lt;p&gt;Though he left Google in 2013, Ng’s work at Brain laid the foundation for the industry-wide shift to end-to-end learned systems. He proved that deep learning wasn’t just academically interesting—it was industrially viable.&lt;/p&gt;&lt;p&gt;Leading AI at Baidu: From Research to Real-World Impact&lt;/p&gt;&lt;p&gt;In 2014, Ng accepted an offer to become Chief Scientist at Baidu, China’s leading search engine, and head of its newly formed Baidu Research division. His mandate was clear: make Baidu an AI-first company.&lt;/p&gt;&lt;p&gt;Over the next three years, Ng built one of the world’s most advanced industrial AI labs, hiring hundreds of researchers and deploying deep learning across Baidu’s ecosystem. Under his leadership:&lt;/p&gt;&lt;p&gt;Baidu’s speech recognition system achieved near-human accuracy, enabling voice search for hundreds of millions of Chinese users.&lt;/p&gt;&lt;p&gt;Deep learning powered Baidu’s feed recommendation engine, dramatically increasing user engagement.&lt;/p&gt;&lt;p&gt;The Apollo autonomous driving platform was launched, integrating perception, planning, and control modules based on neural networks.&lt;/p&gt;&lt;p&gt;Ng also championed AI infrastructure, overseeing the development of GPU clusters and distributed training systems that rivaled those at Google and Facebook. He insisted that research be tightly coupled with product—every algorithm had to solve a real user problem.&lt;/p&gt;&lt;p&gt;Perhaps most significantly, Ng helped elevate China’s AI ambitions on the global stage. He testified before U.S. Congress about AI competition, advised Chinese policymakers on talent development, and argued that AI progress should be measured not by who wins, but by how much it benefits humanity.&lt;/p&gt;&lt;p&gt;He left Baidu in 2017, citing a desire to return to education and entrepreneurship. But his tenure proved that deep learning could transform not just Western tech giants, but entire national digital economies.&lt;/p&gt;&lt;p&gt;Landing AI and the Industrialization of Machine Learning&lt;/p&gt;&lt;p&gt;After Baidu, Ng turned his attention to a neglected frontier: AI in traditional industries. While tech companies raced to build chatbots and self-driving cars, factories, farms, and hospitals remained largely untouched by AI. Ng saw an opportunity—and a responsibility.&lt;/p&gt;&lt;p&gt;In 2017, he founded Landing AI, a company focused on bringing computer vision and predictive analytics to manufacturing, agriculture, and healthcare. Unlike cloud-based AI services, Landing AI’s platform, LandingLens, was designed for edge deployment, small-data scenarios, and non-expert users. 
It enabled factory managers to detect product defects with smartphone cameras, farmers to monitor crop health via drone imagery, and radiologists to flag anomalies in X-rays—all without writing a single line of code.&lt;/p&gt;&lt;p&gt;Ng framed this as the “AI transformation” of industry: not a one-time project, but a cultural and operational shift. He developed frameworks like the “AI Canvas” and “Data Flywheel” to help executives prioritize use cases, collect high-quality data, and iterate rapidly. He argued that success in industrial AI depended less on fancy algorithms and more on data strategy, cross-functional teams, and change management.&lt;/p&gt;&lt;p&gt;Through Landing AI, Ng extended his educational mission beyond individuals to organizations. He published free guides, hosted workshops, and gave keynote speeches urging CEOs to “think like AI-native companies.” His message was consistent: AI is not magic—it’s a repeatable engineering discipline.&lt;/p&gt;&lt;p&gt;Advocacy, Ethics, and the Future of AI&lt;/p&gt;&lt;p&gt;Beyond building and teaching, Ng has been a leading voice on AI policy, ethics, and societal impact. He rejects both techno-utopianism and alarmism, advocating instead for a pragmatic, evidence-based approach.&lt;/p&gt;&lt;p&gt;On jobs, he argues that AI will augment, not replace, most workers—and that the real risk is not mass unemployment, but inequality in AI adoption. He calls for massive investment in reskilling, particularly in developing economies.&lt;/p&gt;&lt;p&gt;On safety, he supports regulation focused on applications, not models—for example, requiring audits for AI used in hiring or lending, but not restricting open-source research. He warns against over-regulating foundational models, which could entrench Big Tech monopolies.&lt;/p&gt;&lt;p&gt;On open source, he champions responsible openness, believing that widespread access to AI tools fosters innovation, competition, and transparency. He has praised initiatives like Meta’s Llama and Mistral AI for lowering barriers to entry.&lt;/p&gt;&lt;p&gt;And on geopolitics, he urges U.S.-China cooperation on AI safety and standards, warning that decoupling could lead to fragmented, less secure AI ecosystems.&lt;/p&gt;&lt;p&gt;Ng’s balanced perspective has made him a trusted advisor to governments, NGOs, and corporations worldwide. He serves on the boards of Coursera, Woebot Health, and Drive.ai (founded by alumni of his Stanford research group), and continues to publish widely read newsletters and LinkedIn posts demystifying AI trends.&lt;/p&gt;&lt;p&gt;Legacy: The Teacher Who Built a Movement&lt;/p&gt;&lt;p&gt;Andrew Ng’s legacy is not defined by a single paper, product, or company. It is defined by people—millions of them.&lt;/p&gt;&lt;p&gt;The student in Nigeria who landed her first data job after completing his Coursera course.&lt;/p&gt;&lt;p&gt;The factory owner in Vietnam who reduced waste by 30% using LandingLens.&lt;/p&gt;&lt;p&gt;The policymaker in Brazil who used his AI Canvas to design a national AI strategy.&lt;/p&gt;&lt;p&gt;The researcher in India who was inspired by his lectures to pursue a PhD in NLP.&lt;/p&gt;&lt;p&gt;Ng made AI feel accessible, learnable, and human. He replaced jargon with clarity, fear with curiosity, and exclusivity with invitation.
In an age of AI hype and anxiety, he remains a steady, optimistic voice—reminding us that technology is ultimately a reflection of our values, choices, and efforts to uplift one another.&lt;/p&gt;&lt;p&gt;As he often says: “AI is the new electricity.” Just as electricity transformed every industry a century ago, AI will reshape everything from education to energy to entertainment. But unlike electricity, AI requires human guidance—and Andrew Ng has spent his career ensuring that we are all equipped to provide it.&lt;/p&gt;&lt;p&gt;For his unparalleled contributions to AI education, research, and industrialization—and for empowering a global community to build a better future with AI—Andrew Ng stands not just in the AI Hall of Fame, but as one of its most enduring and compassionate architects.&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 19:55:39 +0800</pubDate></item><item><title>Wenfeng Liang - The Quiet Challenger from Hangzhou: Building China’s Answer to OpenAI</title><link>https://www.aitopic.com/Wenfeng-Liang.html</link><description>&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769946250744871.webp&quot; title=&quot;R-C.webp&quot; alt=&quot;R-C.webp&quot;/&gt;&lt;/p&gt;&lt;p&gt;In the global race to build powerful, accessible artificial intelligence, Wenfeng Liang has emerged as one of the most consequential—and under-recognized—figures of the post-ChatGPT era. As founder and CEO of DeepSeek, a Chinese AI startup headquartered in Hangzhou, Liang has defied expectations by releasing a series of high-performance, open-weight large language models, culminating in DeepSeek-R1, a model that rivals leading Western systems in reasoning, coding, and multilingual capability—yet remains freely available to researchers, developers, and enterprises worldwide.&lt;/p&gt;&lt;p&gt;Unlike many in the AI boom who chase headlines or venture capital, Liang operates with the discipline of an engineer and the vision of a builder. He rarely gives interviews, avoids social media spectacle, and speaks plainly when he does appear in public: “We don’t need more hype. We need better models that real people can actually use.” This ethos—pragmatic, open, and technically uncompromising—has positioned DeepSeek not just as a Chinese competitor to OpenAI, but as a global advocate for open foundation models in an age of increasing AI centralization.&lt;/p&gt;&lt;p&gt;Liang’s journey—from algorithmic trading prodigy to AI entrepreneur—reflects a deep belief that capability should be democratized, not hoarded. At a time when U.S. tech giants restrict access to their most advanced models through API-only interfaces and opaque licensing, DeepSeek’s commitment to open weights, transparent benchmarks, and permissive usage terms has made it a beacon for the global open-source AI community. In doing so, Liang has not only challenged OpenAI’s dominance but redefined what it means to lead in the AI era: not by controlling the technology, but by giving it away responsibly.&lt;/p&gt;&lt;p&gt;Early Life and Technical Formation&lt;/p&gt;&lt;p&gt;Born in the late 1980s in China, Wenfeng Liang displayed an early aptitude for mathematics and computer science. He pursued his undergraduate studies at Zhejiang University, one of China’s top engineering schools, where he immersed himself in algorithms, data structures, and high-performance computing. 
Unlike many of his peers drawn to consumer internet startups during China’s mobile boom, Liang was captivated by quantitative finance—a field where milliseconds and mathematical precision determine success.&lt;/p&gt;&lt;p&gt;After graduation, he joined a proprietary trading firm, where he developed ultra-low-latency trading systems that leveraged machine learning to predict short-term market movements. His work required not just statistical modeling, but mastery of distributed systems, GPU acceleration, and real-time inference—skills that would later prove invaluable in large-scale AI training.&lt;/p&gt;&lt;p&gt;By the early 2010s, Liang had become a respected figure in China’s quant community. But he grew increasingly fascinated by the potential of deep learning, particularly after breakthroughs like AlexNet (2012) and the rise of frameworks like TensorFlow and PyTorch. He began experimenting with neural networks for time-series forecasting, eventually concluding that language models—not just numerical predictors—could unlock deeper patterns in complex systems.&lt;/p&gt;&lt;p&gt;In 2016, he left finance to co-found Mingyu Zhihui, an AI company focused on enterprise knowledge management. There, he built early NLP systems for legal and financial document analysis. Though the venture achieved modest commercial success, Liang remained frustrated by the limitations of existing models—they were brittle, expensive, and closed off from customization.&lt;/p&gt;&lt;p&gt;The release of GPT-3 in 2020 was a turning point. Liang recognized that the future belonged to foundation models: general-purpose AIs that could be adapted to any task with minimal fine-tuning. But he also saw a problem: OpenAI and others were locking these models behind paywalls and restrictive APIs. “If only a few companies control the best AI,” he reportedly told colleagues, “innovation will stagnate, and power will concentrate.”&lt;/p&gt;&lt;p&gt;That conviction would soon give birth to DeepSeek.&lt;/p&gt;&lt;p&gt;Founding DeepSeek: An Open Alternative&lt;/p&gt;&lt;p&gt;In 2023, amid the global frenzy following ChatGPT’s launch, Liang founded DeepSeek with a clear mission: build state-of-the-art large language models and release them openly to the world. Backed by modest seed funding from Chinese tech investors—including HongShan (formerly Sequoia China)—and staffed by a small team of elite engineers from Alibaba, Huawei, and top universities, DeepSeek operated in near-total stealth for its first year.&lt;/p&gt;&lt;p&gt;Liang insisted on three principles from day one:&lt;/p&gt;&lt;p&gt;Open weights: All models would be released under permissive licenses (e.g., MIT, Apache 2.0), allowing commercial use, modification, and redistribution.&lt;/p&gt;&lt;p&gt;Technical excellence: No compromises on architecture, training data quality, or evaluation rigor.&lt;/p&gt;&lt;p&gt;Developer-first design: Models would be optimized for real-world deployment—small enough to run on consumer GPUs, yet powerful enough for enterprise tasks.&lt;/p&gt;&lt;p&gt;These principles stood in stark contrast to both Western closed models (OpenAI, Anthropic) and many Chinese competitors (Baidu’s ERNIE Bot, Alibaba’s Qwen), which prioritized product integration over open research.&lt;/p&gt;&lt;p&gt;In January 2024, DeepSeek unveiled its first major model: DeepSeek-V2, a 16-billion-parameter mixture-of-experts (MoE) architecture that delivered performance comparable to much larger dense models while using significantly less compute. 
It supported 128K context length, fluent Chinese and English, and strong code generation—features typically reserved for premium APIs.&lt;/p&gt;&lt;p&gt;But it was the November 2024 release of DeepSeek-R1 that truly shocked the AI world.&lt;/p&gt;&lt;p&gt;DeepSeek-R1: The Open Challenger&lt;/p&gt;&lt;p&gt;DeepSeek-R1 is not just another open model—it is a meticulously engineered system designed to compete directly with GPT-4-class capabilities while remaining fully open. Trained on over 8 trillion tokens of carefully curated, multilingual data (60% English, 30% Chinese, 10% other languages), R1 features:&lt;/p&gt;&lt;p&gt;A dense 110-billion-parameter architecture (with optional MoE variants)&lt;/p&gt;&lt;p&gt;128K-token context window, extendable to 1 million via YaRN-style positional interpolation&lt;/p&gt;&lt;p&gt;Native support for mathematical reasoning, code generation (100+ programming languages), and agent-like tool use&lt;/p&gt;&lt;p&gt;Strong performance on benchmarks like MMLU, HumanEval, GSM8K, and LiveCodeBench&lt;/p&gt;&lt;p&gt;Critically, DeepSeek-R1 was released with full model weights, tokenizer, and inference code on Hugging Face and GitHub—free for anyone to download, fine-tune, or deploy locally. Within days, it became the most-downloaded large model in Asia and ranked among the top five globally.&lt;/p&gt;&lt;p&gt;Independent evaluations confirmed its prowess: on the Open LLM Leaderboard, R1 outperformed Meta’s Llama3-70B in reasoning and matched Google’s Gemini 1.5 Pro in coding tasks—all while being fully inspectable and modifiable. Developers praised its low hallucination rate, consistent refusal behavior, and efficient inference on consumer hardware.&lt;/p&gt;&lt;p&gt;Liang emphasized that openness was not just ideological—it was practical. “Closed models create dependency,” he said in a rare interview. “Open models create ecosystems. When developers can inspect, trust, and adapt a model, they build things we never imagined.”&lt;/p&gt;&lt;p&gt;Indeed, within months, the community built dozens of derivatives: medical diagnosis assistants, legal contract analyzers, rural education tutors in low-resource languages, and even AI-powered agricultural advisors for Chinese farmers. One university in Nigeria used R1 to build a local-language tutoring bot for secondary students—something impossible with API-based models due to cost and latency.&lt;/p&gt;&lt;p&gt;Technical Philosophy: Efficiency, Transparency, and Real-World Utility&lt;/p&gt;&lt;p&gt;What sets Liang apart is not just what he builds, but how he builds it. DeepSeek’s engineering culture reflects his quant background: data-driven, frugal, and relentlessly optimized.&lt;/p&gt;&lt;p&gt;Unlike labs that throw exabytes of data and thousands of GPUs at scaling laws, DeepSeek focuses on data quality over quantity, algorithmic efficiency over brute force, and evaluation rigor over marketing claims. The company publishes detailed technical reports for every model, including:&lt;/p&gt;&lt;p&gt;Training data composition and filtering pipelines&lt;/p&gt;&lt;p&gt;Loss curves and validation metrics across domains&lt;/p&gt;&lt;p&gt;Energy consumption and carbon footprint estimates&lt;/p&gt;&lt;p&gt;Red-teaming results for safety and bias&lt;/p&gt;&lt;p&gt;This transparency has earned DeepSeek rare trust across geopolitical divides. 
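&lt;/p&gt;&lt;p&gt;Concretely, “open weights” means anyone can pull the checkpoint and run it on their own hardware. A minimal sketch using the Hugging Face transformers library is shown below; the model identifier is illustrative (check DeepSeek’s Hugging Face organization for exact names), and a model of R1’s size would in practice require multiple GPUs or a quantized build.&lt;/p&gt;&lt;pre&gt;from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model ID; consult DeepSeek's Hugging Face page for real names.
MODEL_ID = "deepseek-ai/DeepSeek-R1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard across available devices (needs accelerate)
)

prompt = "Explain mixture-of-experts routing in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))&lt;/pre&gt;&lt;p&gt;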
&lt;h2&gt;Navigating Geopolitics and the Future of Open AI&lt;/h2&gt;&lt;p&gt;Operating from China presents unique challenges. DeepSeek must comply with China’s generative AI regulations, which require content filtering, real-name verification, and alignment with “socialist core values.” Yet Liang has skillfully balanced compliance with openness: the international version of R1 (hosted outside China) includes no censorship layers, while the domestic version adds lightweight moderation—a compromise that preserves core functionality without sacrificing global utility.&lt;/p&gt;&lt;p&gt;He has also avoided the nationalist rhetoric common among Chinese tech leaders. Instead, he frames DeepSeek as part of a global open-science movement, citing inspiration from Meta’s Llama, Mistral AI, and even early OpenAI. “Great ideas don’t have passports,” he said at the 2025 World Artificial Intelligence Conference in Shanghai. “Let’s build AI together—not as Americans, Chinese, or Europeans—but as humans.”&lt;/p&gt;&lt;p&gt;Looking ahead, Liang is steering DeepSeek toward multimodal intelligence and autonomous agents. The company is developing DeepSeek-Vision and DeepSeek-Agent, with plans to integrate R1 into robotics, scientific discovery, and personalized education. Crucially, all future releases will remain open-weight—reinforcing Liang’s belief that the best defense against AI monopolies is widespread access to powerful tools.&lt;/p&gt;&lt;h2&gt;Legacy and Global Impact&lt;/h2&gt;&lt;p&gt;Though still early in his public journey, Wenfeng Liang has already reshaped the AI landscape in profound ways:&lt;/p&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Democratized access:&lt;/strong&gt; By releasing R1 openly, he gave millions of developers—especially in the Global South—access to GPT-4-level capabilities without cost or gatekeeping.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Proved open can compete:&lt;/strong&gt; DeepSeek demonstrated that open models can match or exceed closed systems in key domains, challenging the narrative that openness equals inferiority.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set new standards for transparency:&lt;/strong&gt; DeepSeek’s technical reports have raised the bar for responsible model release, pressuring even closed labs to disclose more.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bridged East and West:&lt;/strong&gt; In an era of fragmentation, DeepSeek has become a rare node of collaboration between Chinese engineering excellence and global open-source values.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Critics note that DeepSeek still lags in areas like long-context reliability and multimodal grounding. Liang acknowledges these gaps—but sees them as engineering challenges, not philosophical dead ends. “We’re not perfect,” he says. “But we’re improving, openly, every day.”&lt;/p&gt;&lt;p&gt;More importantly, Liang has recentered the AI conversation on utility over spectacle. While others sell AI as magic, he treats it as infrastructure—like electricity or broadband—that should be reliable, affordable, and universally available.&lt;/p&gt;
&lt;h2&gt;Conclusion: The Builder in the Shadows&lt;/h2&gt;&lt;p&gt;Wenfeng Liang does not seek fame. He has no Twitter/X account, gives few speeches, and avoids the Davos circuit. Yet his impact resonates far beyond Hangzhou. In labs from Nairobi to São Paulo to Jakarta, researchers are building on DeepSeek models to solve local problems with global relevance. In startups and classrooms, students are learning AI not through black-box APIs, but by reading code, tweaking weights, and understanding how intelligence emerges from data.&lt;/p&gt;&lt;p&gt;In this sense, Liang embodies a different kind of AI leadership—one defined not by charisma or capital, but by craftsmanship, generosity, and quiet conviction. He believes that the true measure of an AI system is not how many users it has, but how many builders it empowers.&lt;/p&gt;&lt;p&gt;As the world grapples with questions of AI concentration, safety, and equity, Wenfeng Liang offers a compelling alternative: open, capable, and grounded in real-world needs. Whether DeepSeek becomes a household name matters less than whether its models become foundational tools for the next generation of innovators.&lt;/p&gt;&lt;p&gt;For that vision—and for proving that world-class AI can emerge from anywhere, and belong to everyone—Wenfeng Liang earns his place in the AI Hall of Fame not as a celebrity, but as a silent architect of democratized intelligence.&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 19:34:37 +0800</pubDate></item><item><title>Dario Amodei - The Architect of Constitutional AI and the Conscience of Alignment</title><link>https://www.aitopic.com/Dario-Amodei.html</link><description>&lt;p&gt;&lt;img class=&quot;ue-image&quot; src=&quot;https://www.aitopic.com/zb_users/upload/2026/02/202602011769945344532360.webp&quot; title=&quot;53037635115_baa79bfaaa_k.webp&quot; alt=&quot;53037635115_baa79bfaaa_k.webp&quot;/&gt;&lt;/p&gt;&lt;p&gt;In the high-stakes race to build increasingly capable artificial intelligence systems, Dario Amodei has emerged as one of the field’s most principled and technically rigorous voices. As co-founder and CEO of Anthropic, a leading AI safety and research company, Amodei has championed a radical proposition: that advanced AI systems should be governed not by human preferences alone, but by a codified set of ethical principles—akin to a constitution—that guide their behavior even when humans disagree or err. This vision, realized through Constitutional AI (CAI), represents one of the most ambitious attempts to solve the alignment problem: ensuring that superintelligent machines act in accordance with human values, rights, and long-term well-being.&lt;/p&gt;&lt;p&gt;Amodei’s journey—from theoretical physics to machine learning, from Google Brain to OpenAI, and finally to founding Anthropic—reflects a deepening conviction that AI’s greatest challenge is not capability, but control. While others chase benchmarks and scale, Amodei has built an entire research organization around the premise that safety must be engineered into the core architecture of AI, not bolted on as an afterthought. His work on interpretability, robustness, and scalable oversight has redefined what it means to build trustworthy AI—and positioned Anthropic as a moral and technical counterweight in an industry often driven by speed over scrutiny.&lt;/p&gt;
&lt;h2&gt;Early Life and Intellectual Foundations&lt;/h2&gt;&lt;p&gt;Born in 1983 in the United States, Dario Amodei displayed an early fascination with complex systems. He completed a PhD in physics at Princeton University, focusing on statistical mechanics and emergent phenomena—fields concerned with how simple rules give rise to intricate, often unpredictable behaviors. The training left an indelible mark: it instilled in him a physicist’s mindset—rigorous, reductionist, and deeply attuned to the dynamics of large-scale systems.&lt;/p&gt;&lt;p&gt;He transitioned to machine learning in the early 2010s, recognizing that neural networks, like physical systems, exhibited emergent properties that defied intuitive understanding. He joined Baidu’s Silicon Valley AI Lab in 2014, working under Andrew Ng on speech recognition—a domain where deep learning was beginning to show transformative promise. But it was his move to Google Brain in 2015 that placed him at the epicenter of the AI revolution.&lt;/p&gt;&lt;p&gt;At Google, Amodei contributed to foundational work in adversarial robustness—studying how tiny, imperceptible perturbations to inputs could cause neural networks to fail catastrophically. His 2016 paper, “Concrete Problems in AI Safety,” co-authored with colleagues including Chris Olah and Paul Christiano, became a landmark document. It identified five core failure modes in AI systems—avoiding negative side effects, reward hacking, scalable oversight, safe exploration, and robustness to distributional shift—and argued that these were not edge cases, but central challenges that would only intensify as AI grew more capable.&lt;/p&gt;&lt;p&gt;The paper was notable for its clarity, technical precision, and foresight. At a time when much of the AI community celebrated breakthroughs in image classification or game playing, Amodei was sounding the alarm: We are building systems we don’t understand, and they will behave in ways we cannot predict.&lt;/p&gt;
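&lt;p&gt;The adversarial failures described above are easy to reproduce. Below is a minimal sketch of the classic fast gradient sign method (FGSM), one of the perturbation attacks studied in that era; the model, data, and epsilon are placeholders for whatever classifier is under test.&lt;/p&gt;&lt;pre&gt;import torch
import torch.nn.functional as F

# Fast gradient sign method: nudge each input dimension by +/- epsilon in
# the direction that increases the loss. The change is imperceptible to a
# human, yet it can flip the model's prediction.
def fgsm_attack(model, x, label, epsilon=0.01):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range

# Usage: x_adv = fgsm_attack(classifier, images, labels)
# model(x_adv) often disagrees with model(images) despite near-identical inputs.&lt;/pre&gt;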
&lt;h2&gt;From OpenAI to the Birth of Anthropic&lt;/h2&gt;&lt;p&gt;In 2016, Amodei joined OpenAI, drawn by its mission to ensure that artificial general intelligence (AGI) benefits all of humanity. As Vice President of Research, he led teams working on language models, reinforcement learning, and safety. He was a key contributor to GPT-2, and later helped shape the safety protocols for GPT-3.&lt;/p&gt;&lt;p&gt;But tensions soon arose. Amodei and a group of colleagues—including his sister Daniela Amodei, Jack Clark, Tom Brown, and Chris Olah—grew increasingly concerned that OpenAI’s shift toward commercialization and product development was compromising its original safety-first ethos. They believed that the pursuit of ever-larger models without commensurate advances in alignment and interpretability was reckless.&lt;/p&gt;&lt;p&gt;In 2021, this group made a historic decision: they left OpenAI en masse to found Anthropic, a public benefit corporation explicitly structured to prioritize long-term AI safety over profit or speed. The name itself—derived from “anthropos,” Greek for “human”—signaled their mission: to build AI that is not only intelligent, but human-compatible.&lt;/p&gt;&lt;p&gt;From day one, Anthropic adopted an unusual model: it would conduct cutting-edge AI research while embedding safety into every layer of its work. It secured initial funding from mission-aligned investors like Reid Hoffman and Laurene Powell Jobs, and later raised billions from strategic partners including Google and Amazon, who granted Anthropic rare autonomy in exchange for access to its safety innovations.&lt;/p&gt;&lt;h2&gt;Constitutional AI: A New Paradigm for Alignment&lt;/h2&gt;&lt;p&gt;Anthropic’s defining contribution under Amodei’s leadership is Constitutional AI (CAI), introduced in a series of papers beginning in late 2022. Traditional alignment methods—such as reinforcement learning from human feedback (RLHF)—rely on humans to rate model outputs as “good” or “bad.” But this approach has critical flaws: it scales poorly, inherits human biases, and can incentivize models to please raters rather than act ethically.&lt;/p&gt;&lt;p&gt;Constitutional AI flips the script. Instead of learning from human judgments, the AI learns by critiquing and revising its own responses based on a written “constitution”—a set of principles drawn from sources like the UN Universal Declaration of Human Rights, Apple’s App Store guidelines, and Anthropic’s own ethical framework. For example, a principle might state: “Do not generate hateful, violent, or discriminatory content.”&lt;/p&gt;&lt;p&gt;The process works in two stages:&lt;/p&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-critique:&lt;/strong&gt; The model generates an initial response, then evaluates it against the constitution and identifies violations.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-revision:&lt;/strong&gt; It rewrites the response to better align with constitutional principles—without any human in the loop.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;This approach dramatically reduces reliance on human labeling, minimizes bias amplification, and produces models that are more consistent, truthful, and harmless. In internal evaluations, CAI-trained models outperformed RLHF models on safety benchmarks while maintaining competitive performance on helpfulness.&lt;/p&gt;&lt;p&gt;Critically, the constitution is transparent and editable. Unlike black-box reward models in RLHF, anyone can inspect, debate, and update the principles guiding the AI. This opens the door to democratic governance of AI behavior—a vision Amodei calls “participatory alignment.”&lt;/p&gt;
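&lt;p&gt;The critique-and-revise loop is simple enough to sketch in a few lines. The version below is a schematic rendering of the published recipe, not Anthropic’s code: ask_model stands in for any chat-completion call, the two principles are illustrative, and in the real pipeline the revised responses become fine-tuning data rather than being served directly.&lt;/p&gt;&lt;pre&gt;CONSTITUTION = [
    "Do not generate hateful, violent, or discriminatory content.",
    "Prefer honest, clearly sourced answers over confident speculation.",
]

def ask_model(prompt):
    raise NotImplementedError("wire this to an LLM API of your choice")

def constitutional_revision(user_prompt):
    draft = ask_model(user_prompt)  # initial response
    for principle in CONSTITUTION:
        # Stage 1: self-critique against one constitutional principle.
        critique = ask_model(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Does the response violate the principle? Explain briefly."
        )
        # Stage 2: self-revision guided by the critique.
        draft = ask_model(
            f"Principle: {principle}\nResponse: {draft}\nCritique: {critique}\n"
            "Rewrite the response so it complies with the principle."
        )
    return draft  # (prompt, draft) pairs then supervise the final model&lt;/pre&gt;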
&lt;h2&gt;Pushing the Frontiers of Interpretability&lt;/h2&gt;&lt;p&gt;Beyond Constitutional AI, Amodei has made interpretability a cornerstone of Anthropic’s research. He believes that we cannot control what we cannot understand. To that end, he has supported groundbreaking work by researchers like Chris Olah on mechanistic interpretability—the effort to reverse-engineer neural networks to uncover how they represent concepts, make decisions, and form internal “world models.”&lt;/p&gt;&lt;p&gt;In 2023, Anthropic published a landmark study mapping “features” in large language models—discrete patterns of neuron activation corresponding to ideas like “European history,” “Python syntax,” or “emotional valence.” By isolating and manipulating these features, researchers could edit model behavior with surgical precision: reducing sycophancy, enhancing truthfulness, or suppressing harmful associations.&lt;/p&gt;&lt;p&gt;This work moves beyond correlation-based analysis toward a causal understanding of neural computation—a necessary step, Amodei argues, for verifying that AI systems are truly aligned, not just superficially compliant.&lt;/p&gt;
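&lt;p&gt;Manipulating a feature amounts to adding or subtracting its direction in activation space at inference time. The sketch below illustrates the idea with a PyTorch forward hook; the layer index, steering strength, and random direction are placeholders, since a real direction comes from a trained dictionary-learning model and a real block output may need unpacking.&lt;/p&gt;&lt;pre&gt;import torch

d_model = 512

# Placeholder feature direction; in practice this is learned, not random.
feature_direction = torch.randn(d_model)
feature_direction = feature_direction / feature_direction.norm()

def steering_hook(module, inputs, output, strength=4.0):
    # Add the feature direction to this layer's activations, amplifying
    # whatever behavior the feature encodes (negative strength suppresses it).
    return output + strength * feature_direction

# Usage, assuming a loaded transformer whose blocks expose tensor outputs:
#   handle = model.layers[10].register_forward_hook(steering_hook)
#   ...generate text with the feature amplified...
#   handle.remove()&lt;/pre&gt;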
&lt;h2&gt;Scaling Safely: Claude and the Enterprise Frontier&lt;/h2&gt;&lt;p&gt;Under Amodei’s leadership, Anthropic has also developed Claude, a series of large language models (Claude 1, 2, 3, and beyond) that integrate Constitutional AI from the ground up. Unlike many competitors, Anthropic releases detailed model cards, safety evaluations, and red-teaming reports, setting new standards for transparency.&lt;/p&gt;&lt;p&gt;Claude has gained widespread adoption in enterprise, legal, and scientific domains—users value its reasoning clarity, refusal consistency, and low hallucination rates. In 2024, Anthropic launched Claude for Organizations, offering fine-tuning, private deployment, and audit trails—proving that safety and commercial viability are not mutually exclusive.&lt;/p&gt;&lt;p&gt;Amodei has also advocated for AI “immunology”—treating safety failures like biological pathogens that must be studied, contained, and neutralized before they spread. Anthropic runs one of the world’s most extensive red-teaming programs, inviting external researchers to probe Claude for vulnerabilities in areas like cybersecurity, persuasion, and deception.&lt;/p&gt;&lt;h2&gt;Policy, Ethics, and the Global AI Compact&lt;/h2&gt;&lt;p&gt;Amodei is not content to work only in the lab. He has testified before the U.S. Congress, advised the UK AI Safety Institute, and collaborated with the OECD and UN on frameworks for responsible AI development. He supports mandatory safety testing for advanced models, international coordination on AI governance, and public investment in alignment research.&lt;/p&gt;&lt;p&gt;Yet he resists simplistic narratives. He acknowledges that open-source AI can promote innovation and decentralization—but warns that unrestricted release of powerful models could enable malicious actors. He believes regulation should focus on capability thresholds, not company size: any system above a certain level of autonomy or reasoning power should undergo rigorous safety review, regardless of who builds it.&lt;/p&gt;&lt;p&gt;His ultimate goal is a global compact on AI safety—akin to nuclear non-proliferation treaties—where nations agree to common standards for developing and deploying advanced AI. “Superintelligence won’t respect borders,” he has said. “Our safeguards shouldn’t either.”&lt;/p&gt;&lt;h2&gt;Leadership Philosophy and Organizational Culture&lt;/h2&gt;&lt;p&gt;As CEO, Amodei fosters a culture of intellectual humility, scientific rigor, and moral seriousness. Anthropic’s hiring process emphasizes not just technical skill, but judgment, integrity, and long-term thinking. Employees are encouraged to publish openly, challenge assumptions, and prioritize truth over consensus.&lt;/p&gt;&lt;p&gt;Unlike many tech CEOs, Amodei avoids hype. He rarely gives flashy product demos or makes grandiose predictions. Instead, he speaks in precise, measured terms about uncertainty, trade-offs, and unknowns. This restraint has earned him respect across the AI spectrum—even among competitors.&lt;/p&gt;&lt;p&gt;He has also championed diversity of thought within AI safety, supporting research into alternative paradigms like agent foundations, formal verification, and AI-assisted oversight. He recognizes that no single approach will solve alignment; progress will come from a portfolio of complementary strategies.&lt;/p&gt;&lt;h2&gt;Legacy and the Road Ahead&lt;/h2&gt;&lt;p&gt;Dario Amodei’s legacy is still being written, but his impact is already profound. He has:&lt;/p&gt;&lt;ul class=&quot; list-paddingleft-2&quot;&gt;&lt;li&gt;&lt;p&gt;Reframed AI safety as a core engineering discipline, not an add-on.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Demonstrated that constitutional principles can replace opaque human feedback in alignment.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Advanced interpretability from philosophy to practice, making neural networks less inscrutable.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Built a sustainable model for safety-first AI development that attracts top talent and capital.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Elevated the global conversation about AGI risk from speculation to policy.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Critics argue that Constitutional AI is still imperfect—that constitutions can be gamed, that self-critique may not scale to superintelligence, that Anthropic’s models still exhibit subtle failures. Amodei agrees. “We’re not claiming to have solved alignment,” he has said. “We’re building the tools we’ll need to solve it.”&lt;/p&gt;&lt;p&gt;As AI systems grow more autonomous—planning, acting, and interacting in the real world—the stakes only rise. Amodei believes we are entering the most critical decade in human technological history. The choices we make now—about how we train, deploy, and govern AI—will determine whether it becomes a force for flourishing or catastrophe.&lt;/p&gt;&lt;p&gt;In that light, Dario Amodei stands not as a prophet of doom, but as a builder of guardrails. He does not seek to stop progress, but to steer it wisely. And in an era of exponential change, that may be the most valuable contribution of all.&lt;/p&gt;&lt;p&gt;For his unwavering commitment to building AI that is not only smart, but safe, understandable, and just, Dario Amodei earns his place in the AI Hall of Fame—not as a celebrity, but as a steward of our shared future.&lt;/p&gt;</description><pubDate>Sun, 01 Feb 2026 19:28:54 +0800</pubDate></item></channel></rss>