Harness Engineering from the Product Side
OpenAI just named what product managers should have been building all along
In February, OpenAI published a piece on "harness engineering" — designing environments, constraints, and feedback loops that let AI agents produce reliable work at scale. A small team built roughly a million lines of production code without typing source code directly. The engineers built the system that made AI-written software trustworthy. They designed the constraints, not the code.
Martin Fowler's team at Thoughtworks picked it up immediately, breaking the concept into three pillars: context engineering, architectural constraints, and entropy management. Within weeks, the industry conversation moved from "how do we prompt better" to "how do we constrain better."
I'd been building exactly this kind of system around Claude Code for the past year — a memory pipeline, pre-tool hooks, background agents, domain-specific context modules — without a name for it. When I mapped my setup against OpenAI's three pillars, the correspondence was almost exact.
The three pillars, mapped to practice
OpenAI's pillars describe what harness engineering looks like from the engineering side. Here is what each one looks like from the perspective of a solo practitioner who built a working harness before the term existed.
Context engineering. OpenAI uses AGENTS.md files, cross-linked design specs, and dynamic telemetry to give agents decision-making context. I use a CLAUDE.md file that compounds over time, domain modules that load based on project type, and a diary-to-reflection pipeline that converts session corrections into documented rules. The shared principle: give the agent enough context to make good decisions without constant supervision.
Architectural constraints. OpenAI enforces dependency layering through structural tests and deterministic linters. I enforce standards through pre-commit hooks, pnpm-only package management, conventional commit formatting, and a writing guard that rejects prose patterns I've flagged. The shared principle: make certain classes of mistakes structurally impossible.
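A writing guard of this kind is easy to picture as a pre-commit script: scan staged prose, exit nonzero on a flagged pattern, and the commit is blocked. The pattern list below is illustrative — the article doesn't enumerate the phrases I've actually flagged.

```python
import re
import sys

# Illustrative stand-ins for flagged prose patterns.
BANNED_PATTERNS = [
    (re.compile(r"\bdelve\b", re.I), "avoid 'delve'"),
    (re.compile(r"\bgame.changer\b", re.I), "avoid 'game-changer'"),
    (re.compile(r"\bin today's fast-paced world\b", re.I), "stock opener"),
]

def check_text(text: str) -> list[str]:
    """Return one violation message per banned pattern found."""
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, reason in BANNED_PATTERNS:
            if pattern.search(line):
                violations.append(f"line {lineno}: {reason}")
    return violations

def main(paths: list[str]) -> int:
    """Pre-commit entry point: a nonzero exit code blocks the commit."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for msg in check_text(f.read()):
                print(f"{path}: {msg}")
                failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

This is what "structurally impossible" means in practice: the flagged phrase never reaches the repository, regardless of who or what wrote it.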
Entropy management. OpenAI runs periodic "garbage collection" agents that detect documentation inconsistencies and architectural violations. I run a curation cycle — diary entries feed reflections, reflections feed pattern extraction, patterns above threshold promote to active rules, stale rules get pruned. The shared principle: prevent the system from rotting as it grows.
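The promote-or-prune step of that cycle reduces to a small decision function. The thresholds below are assumptions — the article says only "above threshold" and "stale," not the actual numbers.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Illustrative thresholds; the real values are a matter of tuning.
PROMOTE_AT = 3                    # corrections seen before promotion to a rule
STALE_AFTER = timedelta(days=90)  # unused this long -> pruned

@dataclass
class Pattern:
    text: str
    occurrences: int = 0
    last_seen: date = field(default_factory=date.today)

def curate(patterns: list[Pattern], today: date) -> tuple[list[Pattern], list[Pattern]]:
    """Split extracted patterns into (active rules, pruned) per the curation cycle."""
    active, pruned = [], []
    for p in patterns:
        if today - p.last_seen > STALE_AFTER:
            pruned.append(p)       # stale: drop from the active rule set
        elif p.occurrences >= PROMOTE_AT:
            active.append(p)       # recurring correction: promote to rule
    return active, pruned
```

Patterns that are neither stale nor above threshold simply stay in the candidate pool — that middle state is what keeps one-off corrections from polluting the rule set.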
The underlying thesis is the same across all three: design the environment so the agent's default behavior is the correct behavior.
What the engineering framing misses
OpenAI's three pillars answer an important question: how do we make agents write reliable code? But the pillars are scoped to a single team. Most organizations operate across many teams, and that is where the framing breaks down.
I've written before about what happens when individual AI setups collide at the team level — context redundancy, context decay, and capability silos. Harness engineering, as OpenAI defines it, doesn't address any of these. Each engineer's harness is locally optimal. When their AI-generated code meets in a pull request, the collision creates bugs that neither harness anticipated.
The four-layer context model I proposed in that earlier piece maps cleanly onto harness engineering. OpenAI's dynamic context engineering corresponds to Layer 1 (live infrastructure) — but scoped to one team rather than the organization. Their static context engineering maps to Layer 2 (project context). Their architectural constraints partially cover Layer 3 (team capabilities), though they treat constraints as project-scoped when most real constraints are organization-scoped. Layer 4 (personal preferences) is the only layer OpenAI's framing handles well, because it's the only one that doesn't require coordination.
The organizational gap matters because harness engineering will scale the same way every other development practice scales: messily, with coordination problems that pure engineering can't solve. Someone has to decide which constraints are project-level and which are organization-level. Someone has to design the handoffs between teams with different harnesses. Those are product decisions.
The harness needs humans inside it
There is a second gap in the engineering framing. OpenAI's harness is designed for autonomous agent execution — humans set constraints, agents execute within them. That works for code generation, where output quality is mechanically verifiable through tests and linters.
It fails for everything else.
I learned this when I automated my job search. Maximum automation produced a 2% response rate across 150 applications. When I rebuilt with four human quality gates — each requiring my explicit review before anything shipped — the response rate hit 40% across 35 applications.
The twenty-fold improvement came from a specific architectural choice: the system didn't generate cover letters for me to approve. It prepared research, surfaced relevant context, and drafted materials that I then rewrote. The cognitive load fell on making decisions rather than evaluating someone else's decisions. That distinction is structural, and it applies directly to harness design.
For code, constraints and tests are sufficient quality gates. For anything involving judgment — writing, prioritization, communication, product decisions — the harness needs points where humans do the work themselves, armed with AI-prepared materials. A harness that only constrains autonomous execution is half a harness.
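The quality-gate structure can be sketched abstractly. Everything here is a simplified illustration of the shape, not the actual job-search system: each gate hands AI-prepared materials to a human, and only what the human returns moves forward.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Materials:
    """AI-prepared inputs for one gate: research plus a starting draft."""
    research: str
    draft: str

# A gate gives the materials to a human and returns what the human
# actually produced -- or None to stop the pipeline.
Gate = Callable[[Materials], Optional[str]]

def run_gated_pipeline(prepare: Callable[[str], Materials],
                       gates: list[Gate],
                       task: str) -> Optional[str]:
    """Each stage: AI prepares, a human decides. No gate passed, nothing ships."""
    materials = prepare(task)
    output = None
    for gate in gates:
        output = gate(materials)
        if output is None:
            return None  # human rejected: the pipeline stops here
        # The human's output becomes the draft for the next gate.
        materials = Materials(research=materials.research, draft=output)
    return output
```

The structural point is in the type signature: the pipeline's output is whatever the human wrote at the last gate, so the AI draft is raw material by construction rather than a finished product awaiting approval.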
Harness engineering is product work
Here is the claim I want to make plainly: harness engineering is product management with a new name.
The core skill — designing systems where the right behavior is the default behavior — is what product managers do with choice architecture, default settings, and guardrails. Making the easy path and the correct path the same path is a design problem. OpenAI's contribution is applying that discipline to AI agent environments. The discipline itself has existed for decades.
The questions that matter for a well-designed harness are product questions: What should the agent never do? What context does it need to make good decisions? How will we know when it's drifting? How do we keep the system from rotting over time? And — the one OpenAI's framing skips — where do humans need to be inside the loop, not just around it?
Product managers who recognize harness engineering as their territory will shape how organizations adopt AI agents. The ones who cede it to engineering will watch constraints get designed without user judgment, organizational context, or coordination across teams.
OpenAI gave the discipline an engineering name. The work was always product work. Now it needs product people doing it.