One camp focused solely on updating the Harness without touching model weights.
The other camp focused solely on updating model weights without altering the Harness.
1. Two Research Factions: Why Were They Isolated for So Long?
The logic of the Harness-only faction is that the model is already powerful enough and the problem lies in a poorly configured peripheral engineering system. A representative work is Microsoft's SkillOpt, which was analyzed recently. It uses a Meta-Agent to repeatedly optimize the Task Agent's SKILL.md (tool descriptions, prompts, retry logic, and search strategies) while the model's weights remain completely frozen.
Core Assumption: As long as the "shell" (Harness) is tuned correctly, the model can unleash its full potential.
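A minimal sketch of Harness-only optimization, under heavy assumptions: the harness fields, the toy `evaluate` scorer standing in for the frozen Task Agent, and the random-mutation `propose_patch` standing in for the Meta-Agent are all hypothetical (SkillOpt itself uses an LLM to rewrite SKILL.md). The point it illustrates is that only the harness changes; no weight is ever touched.

```python
import random

# Hypothetical harness fields, standing in for the contents of SKILL.md.
HARNESS_OPTIONS = {
    "max_retries": [0, 1, 2, 3],
    "search_strategy": ["none", "keyword", "semantic"],
    "tool_description": ["terse", "detailed"],
}

def evaluate(harness):
    """Toy stand-in for running the frozen Task Agent on a benchmark.
    The model weights never change; only the harness affects the score."""
    score = 0.5
    score += 0.05 * harness["max_retries"]
    score += {"none": 0.0, "keyword": 0.1, "semantic": 0.2}[harness["search_strategy"]]
    score += 0.1 if harness["tool_description"] == "detailed" else 0.0
    return round(score, 4)

def propose_patch(harness, rng):
    """Hypothetical Meta-Agent: rewrites one harness field at a time."""
    field = rng.choice(sorted(HARNESS_OPTIONS))
    patched = dict(harness)
    patched[field] = rng.choice(HARNESS_OPTIONS[field])
    return patched

def optimize_harness(steps=50, seed=0):
    rng = random.Random(seed)
    harness = {"max_retries": 0, "search_strategy": "none", "tool_description": "terse"}
    best = evaluate(harness)
    for _ in range(steps):
        candidate = propose_patch(harness, rng)
        score = evaluate(candidate)
        if score > best:  # greedy: keep a patch only if it improves the score
            harness, best = candidate, score
    return harness, best

best_harness, best_score = optimize_harness()
```

Because each run only costs evaluations of the fixed model, this style of loop needs API calls rather than gradient computation.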
The weights-only faction follows the traditional reinforcement learning (RL) route: feed in data, apply gradient descent, and update the weights. It fine-tunes the model on task feedback through an RL pipeline while the Harness stays completely fixed.
The Problem: Although effective on some tasks, this approach requires a hand-written RL pipeline for every new task, resulting in high training costs and limited generalization.
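For contrast, a minimal sketch of the weights-only route using REINFORCE, a standard policy-gradient method chosen here purely for illustration (the source does not name a specific RL algorithm). The "model" is a vector of logits over four hypothetical answers, the harness is implicit and fixed, and the reward function is made up.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(reward_fn, n_actions, steps=2000, lr=0.1, seed=0):
    """Weights-only update: the 'model' is a logit vector; the harness never changes."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = reward_fn(action)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # running-average baseline reduces variance
        for i in range(n_actions):
            # d log pi(action) / d logit_i = one_hot(action)[i] - probs[i]
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad  # gradient ascent on expected reward
    return softmax(logits)

# Hypothetical task: answer index 2 is the correct one.
probs = reinforce(lambda a: 1.0 if a == 2 else 0.0, n_actions=4)
```

Even in this toy, the reward function and training loop are specific to one task, which is the per-task pipeline cost described above.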
Ultimately, these were two different "improvement levers" competing against each other:
Harness improvements change "how to use this model."
Weight improvements change "the model itself."
Both directions had limitations, yet no one had asked: why not change both at the same time?
2. SIA's Core Innovation
SIA introduces a Feedback-Agent that, on every iteration, does three things:
Observe the performance of the Task Agent.
Analyze the causes of failure.
Simultaneously output two types of modifications: a new Harness configuration and a new weight-update direction.
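The loop can be sketched as a toy under loudly stated assumptions: the Task Agent is a closed-form scorer, the harness has a single hypothetical knob (`search_budget`), the "weights" are one scalar, and `feedback_agent` returns a rule-based harness patch plus the exact gradient of a toy loss. In SIA the Feedback-Agent is an LLM and the weight update is real fine-tuning; what the sketch shows is only the structure, where one feedback step yields both kinds of update and both are applied before the next run.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Harness:
    search_budget: int = 1  # hypothetical knob: search calls the Task Agent may make

TARGET_WEIGHT = 3.0  # toy "domain knowledge" that only the weights can acquire
MAX_BUDGET = 5

def run_task_agent(harness, weight):
    """Toy Task Agent score: harness quality plus how well the weight fits the domain."""
    harness_term = min(harness.search_budget, MAX_BUDGET) / MAX_BUDGET
    weight_term = -(weight - TARGET_WEIGHT) ** 2
    return harness_term + weight_term

def feedback_agent(harness, weight):
    """Hypothetical Feedback-Agent: returns BOTH a harness patch and a weight gradient."""
    patch = {}
    if harness.search_budget < MAX_BUDGET:
        patch["search_budget"] = harness.search_budget + 1
    grad = 2.0 * (weight - TARGET_WEIGHT)  # d/dw of the toy loss (w - target)^2
    return patch, grad

def sia_loop(lr=0.25, tol=1e-6, max_iters=100):
    harness, weight = Harness(), 0.0
    score = run_task_agent(harness, weight)
    for it in range(max_iters):
        patch, grad = feedback_agent(harness, weight)
        harness = replace(harness, **patch)  # apply the Harness Patch
        weight -= lr * grad                  # apply the weight update (gradient descent)
        new_score = run_task_agent(harness, weight)
        if abs(new_score - score) < tol:     # stop once the score plateaus
            return harness, weight, new_score, it + 1
        score = new_score
    return harness, weight, score, max_iters

harness, weight, score, iters = sia_loop()
```

Neither lever alone reaches the maximum here: freezing the weight caps the score below 1.0, and freezing the harness caps the harness term at 0.2.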
Task Agent executes task → Feedback-Agent analyzes feedback → simultaneously generates Harness Patch + Weight Gradient Update → Task Agent re-executes with new config & weights → loop iterates until convergence.
3. SOTA Performance Across Three Tasks
Task: Automatically classify criminal case descriptions into the correct legal charges.
Result: SIA-W+H improved upon the previous SOTA by 25.1%.
Analysis: This is an understanding-intensive task that requires both legal domain knowledge and reasoning. Neither Harness optimization alone nor weight updates alone could achieve an improvement of this magnitude; it requires the synergy of both.
Task: Given a computational target, automatically generate faster CUDA/GPU kernel code.
Result: The runtime of the generated kernel dropped from 1,161 μs to 1,017 μs, a 12.4% reduction in runtime (roughly a 1.14× speedup).
Analysis: This task demands extremely high precision and hardware intuition. Harness updates tell the Agent "where to look," while weight updates enable it to truly understand "how to write it correctly."
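A quick check of the kernel numbers, since "percent faster" can mean two different things: the fraction of runtime removed, or the ratio of old to new runtime. The function name below is illustrative.

```python
def runtime_change(before_us, after_us):
    reduction = (before_us - after_us) / before_us  # fraction of runtime removed
    speedup = before_us / after_us                  # old runtime / new runtime
    return reduction, speedup

reduction, speedup = runtime_change(1161, 1017)
print(f"runtime reduction: {reduction:.1%}, speedup: {speedup:.3f}x")
# prints "runtime reduction: 12.4%, speedup: 1.142x"
```

So the 12.4% figure is a runtime reduction; expressed as throughput, the kernel runs about 14% faster.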
Task: Recover true signals from highly noisy single-cell RNA sequencing data.
Result: SIA-W+H improved upon the previous SOTA by 20.4%.
Analysis: Data annotation in this field is extremely expensive, so the model must learn from very few labeled samples. The Harness shapes the Agent's behavioral strategy, while weight updates inject domain intuition; both are indispensable.
4. Why Is Using Both Levers Simultaneously Effective?
"Harness updates make the model agentic, shaping how it searches and acts, while weight updates build the domain intuition that no prompt or scaffold can instil."
Harness updates make the model "agentic"—deciding where to search and how to act.
Weight updates build domain intuition—something no prompt or scaffold can teach.
Harness Update = Teaching a person "the correct way to use Google."
Weight Update = Teaching a person "the background knowledge of this field."
5. Relationship with SkillOpt: Same Origin, Different Paths
| Dimension | SkillOpt (Microsoft) | SIA (arXiv:2605.27276) |
|---|---|---|
| Optimization Target | Harness (SKILL.md) | Harness + Model Weights |
| Optimization Method | Meta-Agent generates new prompts | Feedback-Agent outputs Harness patches + Weight gradients |
| Modifies Model? | No | Yes |
| Training Cost | Low (API calls only) | High (Requires gradient computation) |
| Applicable Scenarios | General tasks; already-strong models | Vertical domains; extremely high precision requirements |
6. Limitations of This Research
Performing weight updates at the same time requires GPUs and gradient computation, unlike SkillOpt, which runs purely on prompt generation. The deployment threshold is therefore higher.
The Feedback-Agent itself is an LLM. The quality of its analysis directly affects the direction of improvement. If the base model's capability is insufficient, the analysis itself may be flawed.
The paper does not provide clear convergence conditions; in practice, human judgment may be needed to determine when to stop iterating.
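One common way to replace that human judgment is a plateau (early-stopping) rule. The sketch below is an assumption, not something the paper specifies; `should_stop`, `patience`, and `min_delta` are hypothetical names and defaults.

```python
def should_stop(history, patience=3, min_delta=0.001):
    """Hypothetical stopping rule: stop when the best score of the last `patience`
    iterations beats the best earlier score by less than `min_delta`."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent - best_before < min_delta

# Illustrative score trace from an improvement loop (made-up numbers).
scores = [0.50, 0.62, 0.70, 0.7004, 0.7006, 0.7002]
history = []
stopped_at = None
for it, s in enumerate(scores, start=1):
    history.append(s)
    if should_stop(history):
        stopped_at = it
        break
```

Here the loop stops after iteration 6, once three rounds in a row fail to improve on 0.70 by at least 0.001. Choosing `min_delta` still requires knowing how noisy the task's evaluation is.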
This is new work. The GitHub repository (sumanth-077/SIA) has few stars and forks so far, and its engineering maturity remains to be verified.
7. What Does This Mean?
| Research | Core Idea | What is Improved |
|---|---|---|
| SkillOpt (Microsoft) | Meta-Agent automates SKILL.md optimization | Harness only |
| SIA (arXiv:2605.27276) | Feedback-Agent outputs Harness patches + Weight gradients | Harness + Weights |
| Traditional RL | Hand-written pipeline fine-tuning | Weights only |
One-sentence conclusion: On the three tasks tested, SIA shows that "pulling two levers simultaneously" is far more effective than "pulling just one." AI self-improvement is moving from single-point breakthroughs toward systems engineering.