Harness Engineering and AI-First Organizations: Building Systems Where AI Leads and Humans Trust

When the Harness Era Arrives: From Trusting People to Trusting AI, an Organizational Blueprint for AI-First

"Harness Engineering" is emerging as a new consensus in Silicon Valley, with companies like Anthropic and OpenAI exploring this engineering paradigm. However, few truly understand what Harness means. Not long ago, an article titled "Why Your 'AI-First' Strategy Is Probably Wrong" garnered over a million views and sparked heated discussion on X. The author, Peter Pang from Silicon Valley-based CreaoAI, demonstrated the extreme efficiency unleashed by the Harness Agent system: 99% of code is completed by AI, with an average of 3 to 8 production deployments per day, condensing a product workflow that previously took six weeks into a single day.

In this episode of the Silicon Valley 101 podcast, host Hong Jun invited Creao's three founders to discuss the company's practice of Harness and their deep thoughts on organizational AI-First transformation. The guests pointed out that AI-First does not mean "using AI." If you want to improve efficiency by 100 or 1,000 times, you cannot just treat AI as a tool; AI must become the master of all productivity. The most difficult step in organizational transformation lies in whether all employees can learn to trust AI.

The conversation yielded some interesting observations. At Creao, the marketing department no longer needs to chase the engineering department for requirements, because the pace of development has far exceeded the market's capacity to absorb it. After a large amount of alignment work was taken over by AI, removing the product manager role actually greatly improved team efficiency. Junior engineers adapt to the AI era transition better than senior ones. Although expertise accumulated over the past decade is rapidly depreciating, senior engineers remain competitive, because the core competency of the future is no longer writing code but "finding flaws in AI planning" and "judging what is valuable."

Here are the highlights of the conversation:

01
Harness Engineering Explained: How to Squeeze the Limit Out of Large Models
Hong Jun: Peter, let's start with you. What exactly is Harness engineering?
Peter: The concept of Harness can be traced back to the early days of large language models, when many people were talking about Prompt Engineering, which later evolved into context engineering. At that stage, the focus was more on how to interact with the large model itself.

But for Harness, we are "taming" a general-purpose system. So, in terms of scope, it is much broader than Prompt and Context Engineering. It involves the use of tooling, the architectural design of your sandbox, how your host services interact with each other, how these interactions can be secure, what your sandbox startup time is, and what the latency is. All of this is part of Harness.
Hong Jun: Can we understand that the engineering capability of Harness determines how to "squeeze" the best performance ceiling out of a large model? I remember Kai mentioned that an Agent could replace the workflow of three people doing SEO overnight. At the same time, there was a content pipeline that ran for two days before someone realized it was all garbage. The huge gap between these two outcomes is a victory for Harness in one case and a failure in the other.
Peter: I think this perfectly illustrates why we need Harness. The essence of Harness lies in how we can continuously improve a system. When a system produces poor results, does it need human feedback to improve, or can the system itself self-heal and self-improve? This is precisely the core of Harness.

A critical aspect of Harness is how to enable an Agent to scale during the reasoning phase, including how to provide it with more context and tool chains to allow it to think longer and complete a task over an extended period. If your Harness is not well-built at this stage, it is easy to produce hallucinations or context overflow, and your model's capability will degrade. So, Harness is a very complex task that requires significant experience.
Hong Jun: What are the consensus and non-consensus views on Harness in the market today?
Peter: Many people think of Harness as static—developing a supporting system to leverage the advantages of an LLM. But we believe it is a dynamic process: how your system can truly come alive from a static state, be able to self-improve, continuously adapt to various signals from the market, product, and users, and enable rapid iteration. I think this is a point that many people have yet to realize.
Hong Jun: And this iteration is primarily AI-driven, not human-driven?
Peter: Exactly, it is AI-driven iteration. The thing humans need to do is figure out how to feed various signals to the AI.

02
From Six Weeks to One Day: How Fast Is an AI-Driven Development Process?
Hong Jun: You had a very popular tweet thread about your 25-person company, where 99% of the code is written by AI. A feature was written at 10 a.m., A/B tested at noon, partially cut based on data feedback at 3 p.m., and rewritten into a better version by 5 p.m. This is a single day's work rhythm, a process that in traditional product development would take six weeks. This is the approach you explored using Harness.
Peter: From our perspective, Harness is divided into two parts: one is the Harness for Creao's own Agent system, and the other is how to help users Harness their own Agents when they build with Creao. In a traditional development process, iterating on a feature might take two to three months. Now, with AI-assisted coding, implementation only takes one or two hours. If you still spend a long time on design and testing, it becomes meaningless. So, integrating design, planning, and testing into the entire Harness process is crucial for whether a company can truly transform into an AI-First Organization.
Clark: I want to express a concept first: if you want to achieve a state of being truly AI-First or AI-Native, it's not about using AI tools within existing processes. You must rebuild your workflow and organizational structure around AI capabilities.

Image Source: Peter Pang@intuitiveml

For a long time, every engineer was using AI to write code, every product manager was using AI to write PRDs, and every designer was using AI to create graphics. But this didn't actually increase our efficiency. Instead, as everyone's work progress and rhythm became different, our alignment costs became extremely high, especially since we operate fully remotely.

Therefore, we had to rethink how to truly enable AI to run autonomously in the company's operational process. That's what led Peter to design a new development process and rebuild both the system architecture and the product architecture, resulting in the self-healing Agent Harness mentioned in the article.
Hong Jun: Can you give an example of which directions changed when you reshaped the organizational structure? Where were the bottlenecks?
Peter: The first problem to solve is the human element—whether everyone can accept a new way of working. We spent a lot of time aligning mindsets. Previously, such a transformation usually required an architect or engineer spending several months to demonstrate that the new workflow was better, making the transformation cost very high.

Now, with AI assistance, this process is much faster. It might only take one to two weeks to refactor the entire system, including the front-end, back-end, architecture, and infrastructure, and then show everyone that it works more efficiently. Whether it's the deployment frequency, reliability, or final results, there is a huge improvement over the previous way of working. This allows us to align mindsets in a very short time and get everyone quickly integrated into the new development process.
Kai: Actually, Harness is more about building a system that truly enables an AI-First organization to run efficiently. Many people in organizations find it hard to change their thinking; they think using AI to improve efficiency is enough. But what AI-First requires is letting AI drive the direction of your entire company. Your daily work style could be driven by AI, which is a completely different concept.
Hong Jun: Does the AI assign tasks to you?
Kai: Yes. If you still treat AI as a tool for increasing efficiency, the user's efficiency gain might be at most 10 times, because a person can only work up to 24 hours a day. If you hope for a 100-fold or 1,000-fold efficiency increase, you cannot just be the tool user; AI should become the master of all productivity. The human role changes, focusing more on reviewing the quality of outcomes, and on how to cooperate with this system when I am not the one doing the actual work. This is something many companies don't realize or struggle to achieve during their transformation.
Hong Jun: Can you give an example of how your system works in cooperation with people? I feel that a major pain point in traditional team product development is the need for inter-team alignment and synchronizing information to everyone. If anyone misses a piece of information, they might not know what the last update was when they work on the product. Can all these tasks be handed over to AI now, or can it be done automatically within the process?
Kai: I think the core issue here is trust. Many people distrust the system, so alignment costs remain very high. In an AI-First setup, alignment is led by AI. For instance, AI tells the marketing team what features the engineers are releasing tomorrow. The marketing team no longer needs to repeatedly ask the engineers.
Hong Jun: How does the AI know that the engineering team can finish all the work by tomorrow?
Peter: In the AI mindset, during the product iteration process, we focus more on whether a new feature can improve the product's top-line metrics or generate real user usage data. So, in this process, our core focus is on how to build the entire data chain. Once we set up this chain, the Agents use this data to decide whether a feature is useful, whether we should roll it out, or fall back from it.
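As a rough illustration of the data-driven decision Peter describes, the sketch below compares usage data from two arms of an A/B test and returns a rollout decision. It is not Creao's implementation: the `ArmStats` schema, the `decide` function, and the `min_users` and `min_lift` thresholds are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Usage data for one arm of an A/B test (hypothetical schema)."""
    users: int
    conversions: int

    @property
    def rate(self) -> float:
        # Conversion rate; 0.0 when the arm has no users yet.
        return self.conversions / self.users if self.users else 0.0


def decide(control: ArmStats, variant: ArmStats,
           min_users: int = 500, min_lift: float = 0.02) -> str:
    """Return 'roll_out', 'roll_back', or 'keep_testing'.

    Thresholds are illustrative assumptions, not values from the interview.
    """
    if min(control.users, variant.users) < min_users:
        return "keep_testing"  # not enough data in one of the arms
    lift = variant.rate - control.rate
    if lift >= min_lift:
        return "roll_out"
    if lift <= -min_lift:
        return "roll_back"
    return "keep_testing"  # difference too small to act on


print(decide(ArmStats(1000, 100), ArmStats(1000, 130)))
```

A production system would likely use a proper significance test rather than a fixed lift threshold; the point here is only that the decision is computed from the data chain rather than negotiated between teams.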
Hong Jun: Does that mean that after an engineer finishes writing code, they don't need to manually tell the AI, "I'm done"? The AI can now automatically make judgments based on the overall code quality and progress.
Peter: Yes, this exists in traditional engineering too, known as the CI/CD process. However, in traditional CI/CD, many steps are rule-based or driven by unit testing. But with AI, we can have many AI-driven tests. For example, tools like Playwright, which are quite popular now, can perform complete AI-driven end-to-end testing, ensuring that the code we release has no obvious bugs that could break the product. In this process, AI-driven testing is very important. Signals like whether there are errors or incidents in the logs after the code is released are all fed back to the AI to assess the code's quality.
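The feedback loop described here, where test results and post-release log signals are fed back to the AI, could be sketched roughly as follows. This is a minimal example under stated assumptions: the `ReleaseSignals` schema, the convention that log lines begin with their level, and the 1% error-rate threshold are all hypothetical, and no real testing tool's API is used.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseSignals:
    """Signals gathered for one release (hypothetical schema)."""
    e2e_failed: int  # number of failed end-to-end test cases
    log_lines: list[str] = field(default_factory=list)  # post-release logs


def error_rate(log_lines: list[str]) -> float:
    """Fraction of log lines whose level prefix is ERROR."""
    if not log_lines:
        return 0.0
    errors = sum(1 for line in log_lines if line.startswith("ERROR"))
    return errors / len(log_lines)


def assess_release(signals: ReleaseSignals,
                   max_error_rate: float = 0.01) -> dict:
    """Summarize release health in a form an agent could consume."""
    rate = error_rate(signals.log_lines)
    return {
        "healthy": signals.e2e_failed == 0 and rate <= max_error_rate,
        "e2e_failed": signals.e2e_failed,
        "error_rate": rate,
    }


logs = ["INFO ok"] * 98 + ["ERROR payment timeout"] * 2
print(assess_release(ReleaseSignals(e2e_failed=0, log_lines=logs)))
```

The returned dictionary is the kind of structured signal that could be handed to an agent to decide whether a release needs attention.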
Hong Jun: Regarding letting AI write code, how do you guarantee its quality? Peter's article mentioned that under normal circumstances, you spend one day writing code but three days fixing bugs. What new methods are available now to avoid spending so much time on fixes?
Peter: I think bugs are inevitable in any engineering process, whether written by AI or humans. This is because Harness is not a static state; it's not that you have a system, and you only need to maintain it, and it will have no bugs and need no improvement.

The core of the Harness process lies in being able to find bugs in this system. As we just discussed, in the CI/CD process, we run a series of regression tests to prevent some bugs from being released into production and breaking the system. That's the first step. The second step is that, even if some corner cases or race conditions slip into the system, how can we identify these bugs in the shortest time and fix them promptly?

Traditionally, both steps were human-driven, but with Agent Harness, we have an Agent system driving them. So, we developed an Agent-driven CI/CD system and an Agent-driven bug triage system. It triages issues based on the problems in the system and assigns them to engineers for fixing.
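The triage step can be pictured as ranking issues by severity and routing each one to an owner. The sketch below is an assumption-laden illustration, not the system Peter built: the `Issue` fields, the severity scale, the component-to-owner map, and the `on-call` fallback are all invented for this example.

```python
from dataclasses import dataclass

# Lower number means more urgent (hypothetical severity scale).
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Issue:
    id: str
    component: str
    severity: str


def triage(issues: list[Issue], owners: dict[str, str],
           fallback: str = "on-call") -> list[tuple[str, str]]:
    """Return (issue id, assignee) pairs, most severe first.

    Issues with an unknown severity sort last; issues in a component
    with no registered owner go to the fallback assignee.
    """
    ranked = sorted(
        issues,
        key=lambda i: SEVERITY_ORDER.get(i.severity, len(SEVERITY_ORDER)),
    )
    return [(i.id, owners.get(i.component, fallback)) for i in ranked]


issues = [
    Issue("B-1", "billing", "low"),
    Issue("B-2", "sandbox", "critical"),
    Issue("B-3", "search", "high"),
]
owners = {"billing": "alice", "sandbox": "bob"}
print(triage(issues, owners))
```

In an agent-driven setup, the severity and component labels would themselves be produced by an agent reading logs and incident reports; only the routing logic is shown here.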

Image Source: Peter Pang@intuitiveml

Hong Jun: How much has your efficiency improved since you introduced these two systems?
Peter: Since much of it is Agent-driven, it can run in parallel. Many Agents can simultaneously identify issues. It takes only 1-2 minutes to discover a bug and a few seconds to assign it to an engineer. The engineer then uses another Agent to investigate and propose a solution. The whole cycle takes about 1-2 hours. In contrast, previously, identifying, fixing a bug, and releasing the patch could have taken a week.
Clark: Right, there's a very interesting phenomenon here. We used to have a very long feature wish list and a bug list with many items to fix. The marketing, product, and engineering teams would always debate whether to fix bugs first or develop features. Now, neither of those lists exists. Bugs are found and fixed promptly, and the number of features now far exceeds what we need.
Peter: We now have an auto-fixing system. For issues that need fixing, if they are located in low-risk folders, the AI automatically submits a pull request. The engineer just needs to briefly approve it for release. More than 50% of issues are now resolved through auto-fixing.
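One way to express the "low-risk folders" rule is a path check that decides whether a proposed fix may go out as an automatic pull request. The sketch below is only an illustration: the folder names in `LOW_RISK_DIRS` and the `auto_pr` and `human_fix` labels are assumptions, since the interview does not say which folders Creao treats as low risk.

```python
from pathlib import PurePosixPath

# Hypothetical list of folders where automatic fixes are allowed.
LOW_RISK_DIRS = ("docs", "web/copy", "tests")


def is_low_risk(path: str, low_risk_dirs=LOW_RISK_DIRS) -> bool:
    """True if a repo-relative path sits inside a low-risk folder."""
    p = PurePosixPath(path)
    # Reject absolute paths and '..' segments so a path cannot
    # escape a low-risk folder by traversal.
    if p.is_absolute() or ".." in p.parts:
        return False
    return any(p.is_relative_to(d) for d in low_risk_dirs)


def route_fix(changed_files: list[str]) -> str:
    """Auto-submit a PR only when every changed file is low risk."""
    if changed_files and all(is_low_risk(f) for f in changed_files):
        return "auto_pr"
    return "human_fix"


print(route_fix(["docs/setup.md", "tests/test_api.py"]))
print(route_fix(["docs/setup.md", "auth/login.py"]))
```

Even on the `auto_pr` path, the engineer's approval Peter mentions remains the final gate; the check only decides who drafts the fix.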

03
The Architect: The Core Role in Harness
Hong Jun: Since I don't understand coding, let me use writing an article as an analogy. Suppose I am editing a draft. Even if only a small error appears, I might need to review the entire article, which takes about the same time as rewriting it myself. If the Agent sets up a very good technical Framework for you, but a major error occurs at the base layer requiring an engineer to solve it, wouldn't the engineer need to relearn the entire system?
Peter: That's a great question. In my previous article, I also discussed that engineering teams in the AI environment might split into two types of people: architects and operators. The architect's role in building this whole system is extremely important. For example, in building Creao's entire system, the overall agent architecture—like how the sandbox and the host interact—is still decided by the architect.

If the Agent directly gives you a solution through AI Coding or Vibe Coding, that solution often has security or latency risks. How to optimize this system is still the architect's call. The difference is that, traditionally, a team building an Agent system might have needed 10 to 20 people. Now, building such a system requires just one architect, who can complete it in a week.
Clark: Let me add to that. AI's capabilities have been strong for a while, but why hasn't it met people's expectations? People feel AI didn't do a good job, and they still have to compensate for its errors. A conceptual shift is needed here: we must treat AI as a system, not just an intelligence. When the system makes a mistake, don't try to correct the intelligence; instead, think about how to compensate for the system. This is the most significant difference between what we do with Harness and the current common understanding. It's a process of dynamic change and improvement, not a static, fixed shackle to restrain the intelligence. We need to give it space to grow, much like raising a child, helping it improve within a set of rules.
Hong Jun: That's an interesting perspective. Previously, it was "fixing problems," but now it's "fixing problems in the system."
Clark: There's an even more radical idea here. The content we publish might not even target human audiences in the future. For example, marketing collateral might not look aesthetically pleasing to a human. But when you release it to the market, you might find that an Agent is reading this article or image, and the data feedback you get could be better.

In the future, maybe Agents will do the purchasing, and Agents will subscribe to newspapers. So, is your advertisement for Agents or for people? You need to be clear about who will consume your work's results. Considering the value of that, we need to reconsider whether we are improving the system or going back to the most primal human creation process to correct errors.
Hong Jun: I believe that in future consumer decision-making, Agents might indeed be the primary consumers. But it's arriving much faster than I expected. The whole evolution is incredibly fast.
Peter: From another perspective, we can also clearly see the transformation of SaaS products now. Because previously, many SaaS products needed a dashboard for people to view and manage. But at least at our current stage, when our team uses task management, we care more about whether an Agent can "see" and prioritize these tasks. So, we look at whether these task management products have better MCP and API interfaces for our Agents to use.
Hong Jun: So, the overall evolution is indeed very fast.
Kai: Actually, the question you just raised is the first one many companies consider when undergoing an AI transformation. They feel that when using AI, they still need to review the output, which seems like no time or cost saving compared to doing it manually. But if you truly build an AI system and calculate carefully, you'll see a huge improvement in both time and cost. It's just that this process requires the entire team to have a shared goal. As long as someone thinks it's better to do it manually, the transformation time will be extended. This is a challenge most enterprises face on the organizational level.

04
The Hardest Step in Transformation: From Trusting People to Trusting AI
Hong Jun: Were you AI-First from day one, or did you figure it out later?
Kai: Our company also went through a process. You have to realize who will be the core role of productivity in the future. In the first half of 2025, people still saw AI as assisting human work, with humans dominating. But by the second half of the year, we realized that if we kept that mindset, the improvement in enterprise efficiency would remain very limited. The core issue was that we had not shifted the user of the productivity tool from a human to AI. This shift takes time; our marketing and engineering teams even spent one to two months going back and forth, discussing what the better way of working was.
Peter: I think this is also related to the improvement in AI capabilities. Over the past year, AI transitioned from an auxiliary role to participating in development and now, relatively speaking, to leading the development process. This role shift is tied to the enhancement of foundational model capabilities and the evolution of Agent architecture and infrastructure. A year ago, having AI lead development wasn't technically feasible. But during our reconstruction, when we realized AI had reached that tipping point, the speed and effect of our reconstruction far exceeded what anyone could have imagined a year prior.
Hong Jun: When did you start this reconstruction? What were the core things you did?
Peter: We realized the need for reconstruction around August or September last year. The most time spent was on shifting everyone's mindset. The actual reconstruction of the code architecture and development process began in January this year, and it took about two weeks to rebuild the entire architecture, including the product you see now.
Hong Jun: What were the things that AI couldn't solve before but can now?
Peter: Before, the main thing AI couldn't solve well was the planning phase. For instance, if we consider a perfect plan as 100 points, it can now give me a 90-point plan. When I see this plan, I can offer some criticisms, and it can give me a revised plan without me having to actually change it manually. A year ago, its plan might only have been worth 50 points, and I would have needed to manually modify the plan and the entire architecture.
Hong Jun: Do you feel AI's coding ability is above your own now?
Peter: First, AI's coding ability is definitely above mine now; I haven't written a single line of code in 2026. But regarding planning ability, the value of an architect, whether you are a CTO or a tech lead, lies in being able to find the flaws in AI's planning. AI planning still has flaws today. For example, its initial plan for me might have security flaws or latency issues. My value, based on my previous architecture experience, is in being able to judge or question it, enabling further improvement.
Hong Jun: You taught it about latency and security. Has its new planning gotten much better now?
Peter: This is the core of Harness. Before, I taught it what principles to follow regarding security when designing the relationship between the sandbox and the host. Now, I can turn this into a Skill. Next time, I just need to reference this Skill, and it will speak on my behalf with more specific content, making it a very easy process. Other engineers on our team can also reference this Skill.
Hong Jun: Can you give the audience a tangible sense of scale? Now that AI is the main producer, how many employees would it take to do what it achieves?
Peter: If we go back a year, when AI wasn't in the lead, developing a product like Creao's current version would have required, I think, a team of at least 100 people, spending four to five months on research and development. If you look at the scale of other general Agent companies, it's probably in a similar range. Now, our company has 25 people, with fewer than 10 on the engineering team, and we deployed the product's first phase in just about two weeks.
Kai: From the perspective of operating a SaaS product, you get a very intuitive feeling. In the traditional software era, the sales team's vision was 4-5 months ahead of the product. Now, it's reversed; the technical team is 4-5 months ahead of the marketing team. Marketing is playing catch-up with the features being developed. This leads to completely different operational and organizational structures.
Hong Jun: Clark, on your end, in marketing, are you also using more Agents?
Clark: Our go-to-market team uses our own company's product to build AI-First workflows. Of course, there are many pitfalls. I think the biggest confusion is that engineering is relatively easy to evaluate; it has clear metrics to tell if you did a good job or not. But from a go-to-market perspective, whether you're writing an article or making a video, everyone has their own judgment of its value, which is quite subjective. How we build this system to better turn these subjective judgments into signals that allow our system to run autonomously is a major challenge. We don't claim to be 100% Agent-driven in decision-making today, but we let many Agent results through and then have humans judge whether the result is good or bad. We have many features that we believe the market isn't ready for, so we don't release them yet.
Hong Jun: I'm curious, what are some of your forward-thinking ideas that the market has yet to accept?

05
Why Do Large Enterprises Struggle to Transform? The AI Dividend Period for SMBs
Clark: Every person's Agent has full read-and-write access, which is a relatively bold move. We hope that in the future, to make your organization more efficient, much of your data should be open to your Agents and to every individual. But this probably requires better technical support, including how to restrict permissions for each person or Agent, and how to ensure the Agent doesn't make mistakes when reading this data. For example, if I previously wanted to ask, "Peter, how many users do we have today exhibiting a certain behavior?" I might have had to find a data colleague or engineer to build a new table for me. Now, I just need to ask the Agent my question, and it gives me an answer within three seconds.
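The per-Agent permission restriction Clark mentions as an open need could, in its simplest form, look like a grant table that is checked before any read or write. The sketch below is purely illustrative: the agent names, dataset names, and `GRANTS` table are hypothetical, and the interview does not describe how Creao actually handles this.

```python
from enum import Flag, auto


class Access(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()


# Hypothetical grants: (agent, dataset) -> allowed access.
GRANTS = {
    ("marketing-agent", "analytics"): Access.READ,
    ("eng-agent", "analytics"): Access.READ | Access.WRITE,
}


def allowed(agent: str, dataset: str, needed: Access,
            grants: dict = GRANTS) -> bool:
    """True only if every requested access bit has been granted.

    Unknown (agent, dataset) pairs default to no access.
    """
    granted = grants.get((agent, dataset), Access.NONE)
    return (granted & needed) == needed


print(allowed("marketing-agent", "analytics", Access.READ))
print(allowed("marketing-agent", "analytics", Access.WRITE))
```

Defaulting to `Access.NONE` for unknown pairs is a deliberate choice in this sketch: opening data to Agents is safer when access is explicitly granted rather than implicitly assumed.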
Peter: I think the main challenge right now isn't whether the market can accept this working method, but that the market doesn't know this method exists or doesn't know how to efficiently use an Agent to help them complete their own work. So, in Creao's entire process, we've done a lot of work to allow users to access an Agent easily without complex setup, helping them complete their work.
Hong Jun: So, what scenarios are you targeting, and who are your main customers? Are you targeting workflow automation in large enterprises, or entrepreneurs, individuals, or the general public?
Kai: All of the above, actually. But our true target is the so-called SMBs, because they might be the first to adopt AI. The larger the enterprise, the more compliance issues there are, alongside numerous "human" factors, making the entire process very difficult. But if a company doesn't have too many compliance issues or heavy legacy systems, it is naturally among the first group that can most easily undergo this transformation.

Image Source: Pixabay

Hong Jun: For a company wanting to make such a transformation, does its founder also need a core belief, which is trusting AI?
Kai: Yes. More importantly, they need to understand what this transformation ultimately represents. For example, when large models first emerged in 2023, many SaaS companies' immediate thought was, "How can I integrate AI as a feature into my product?" The essence of this misunderstanding is not grasping that if you want to build such a feature, your original product architecture might not support integrating these latest AI capabilities. Your database might not meet the requirements, and your interaction model might not be the future way. Therefore, what you might need is a complete product reconstruction. If you can't accept that outcome, this transformation might be very difficult.

06
When AI Leads Production, What is Humanity's New Role?
Hong Jun: Can you talk about how your organizational structure has changed? You've given many examples, but I still want to see the whole picture.
Kai: Looking at the entire organizational structure change, I think it fundamentally involves shifts in many roles. First, if you're really going to undertake this transformation, the initial change needed is in trust. Previously, the organization trusted people. But when you bring AI in, you must solve the first problem: can you trust your AI to make decisions or execute tasks? This trust is essentially the same reason we build a Harness system—we need many guardrails and mechanisms to ensure that all the AI's work, whether decisions, plans, or execution, ultimately produces outcomes that people can trust.

Then come changes to specific positions within your organization. We no longer have a dedicated product manager role. That function has been broken down and distributed among individual engineers and someone like Peter who handles engineering management. I don't think this role is unimportant; rather, it's usually the point of most intense friction. The product manager has to communicate with both marketing and development simultaneously. A massive amount of alignment cost occurs within the product manager role. But when you remove this role, you find that the alignment cost actually becomes lower in many cases.
Hong Jun: Do you think the product manager role still has value now? Or, if you were to rehire someone for a similar role, what new skills would they need?
Kai: Will product managers be needed in the future? Absolutely, they will be needed. One future possibility is that many of the powers of a product manager role will be distributed among the members of the development team. If every one of your development team members, with AI assistance, can better grasp product concepts, then the existence of the role itself becomes less crucial. I think the product manager historically solved alignment costs and helped the company lower development costs.
Hong Jun: So, can I understand it this way: Peter's engineering team is now playing the role of product manager to some extent. But conversely, a good product manager who has great ideas about the market and the product could also, given how low the development cost now is, easily turn an idea into a good product quickly using AI's execution capabilities. They might not even need an engineering team. So, these two roles are merging to some degree, becoming one role. Whether this role is filled by a technical team or a product team doesn't matter, as AI is in the lead. The idea is more important.
Kai: Yes. In the future, this role itself might not be a single person but the entire team collectively acting as the product manager. The role itself will be "organizationalized." In the traditional software era, there was a lot of individual heroism, where a specific product manager or a visionary soul made a product exceptionally popular. But in the future, it might be an organization that makes a great product accepted by the market.
Peter: I think there's a very clear trend: compound talents, or more generalist individuals, can thrive much better in the AI environment. Whether it's an engineer with product and market sense, or a product manager with implementation capability, it will be extremely important. Another trend is that UX and UI designers will become very important, but they need the execution capability to turn their ideas into products. If you need to pass your ideas to another person, the communication or alignment cost might far exceed the execution cost, making it less efficient in the entire AI working environment.
Hong Jun: Peter, I was very struck by one of your articles where you said junior engineers adapt to the AI environment better than senior ones. Why is that?
Peter: Because junior engineers typically have less technical debt or fewer mental constraints. They are more willing to expand their scope: not just acting as an engineer, but also taking part in product design and, after a feature is deployed, doing some analysis and making judgments based on that analysis.

Conversely, more senior engineers are often very specialized. For example, if you are an infrastructure engineer or a back-end engineer, in a traditional work environment, you might not care about what happens after your code is released. But in the AI environment, the scope of an engineer needs to be much broader than before. It's not just about finishing writing the code, but about how you can add your judgment before coding and judge the impact after it's released. This entire process is very important.

Generally, junior engineers can better accept such a working state. For instance, if you tell a senior engineer that, beyond the process where their code has been committed, they also need to be responsible for the subsequent steps, they often can't understand or immediately shift their mindset. The cost of aligning this mindset is usually much higher.

Actually, as a senior engineer or a specialist in a specific field—whether infrastructure or front-end—your knowledge was very valuable in the past software development process, because you knew how to write the most concise code and design the best architecture, which might take two to three months. But AI coding is already very strong at this stage and will become even stronger, making your specialty increasingly less relevant. So, many people may struggle to accept a state where the knowledge they spent 10 or 20 years accumulating might not be as important in the future.
Hong Jun: So, what kind of people are you more inclined to hire now?
Peter: Although I mentioned in my article that transforming a senior engineer is harder than a relatively junior engineer, from a value perspective, a senior engineer's value is still irreplaceable now. So, finding a senior engineer who can also embrace the AI mindset, and additionally possesses product sense and some market knowledge, is extremely difficult but very valuable for the company. The good news is that before, we might have needed many such people, but now we might only need one or two.
Hong Jun: Now, we're saying AI has become the main force, and AI can iterate products. Everything is dynamic. What do you all think the core human capability will be in the future?
Peter: I think the most needed ability for humans is system architecture capability. This is a shift from implementing features to architecting this AI system and maintaining it. Whether you are an engineer or on the marketing side, the core process of go-to-market also involves building a marketing system of Agents that can run autonomously, not just producing standalone marketing content.
Kai: If we look at human value from a very long-term perspective, just as with the entire history of technological development, the direction of technology will always be determined by human needs and societal needs. As long as the human species exists, the value of humans defining the direction of needs and judging the final outcome is irreplaceable.
