The Steersman

Chapter 3

The previous chapter dealt with the team — what infrastructure you need when AI stops being one person's trick and becomes an organizational capability. This chapter zooms in on the person. When the team's production system is running, what does the individual actually do?

An engineer opens the laptop and finds three pull requests waiting.

One fixes a flaky test. One rewrites a service method. One failed because the agent missed a contract edge case. The engineer does not start by typing code. They review the diffs, tighten a test, reject one approach, rewrite the constraint that produced the bad result, and kick off another run.

A few years ago, that would have sounded strange. Now it sounds increasingly normal.

The deeper change AI is creating in software is not better autocomplete. It is a shift in where the human sits relative to the work. The engineer is moving out of the middle of the execution loop and up into direction, constraint-setting, and correction.

Norbert Wiener had a word for this kind of role: steersman.

Closing the loop

Wiener helped pioneer cybernetics while working on anti-aircraft fire control during World War II. The key move was not training humans to react faster. It was closing the loop between sensing, prediction, and action so the system could correct itself continuously. The human still chose targets and intervened when needed. The mechanism handled the tracking.

Software is moving in the same direction.

At companies already using background agents in production, parts of the development loop now run without constant human supervision. Agents pick up tasks, make changes inside isolated environments, run tests, and open pull requests. Humans review the result, adjust the rules, and decide what deserves another cycle.

That is a different role than "person who writes most of the code by hand."

The shift is visible in how organizations describe their own engineers. OpenAI writes that humans now "interact with the system almost entirely through prompts: an engineer describes a task, runs the agent, and allows it to open a pull request." Stripe's engineering blog talks about agents that produce end-to-end pull requests — more than a thousand merged per week — with humans reviewing, not writing. Spotify has documented background agents handling migrations across hundreds of repositories while engineers define the transformation and verify the output.

Different orgs, same loop structure. The machine acts. The human steers.

What changed on the ground

The strongest evidence is operational, not philosophical.

I have run seven onboarding sessions across two companies and a personal network over the last three months, introducing people to AI-assisted development. The sessions ranged from individual engineers to a Center of Excellence with 65 people. Eight patterns emerged, and most of them point directly at the steersman role — or at the gap between knowing it exists and being able to occupy it.

The first pattern is that the adoption curve is real and predictable. People move through stages: individual agent use, then swarms of agents, then orchestration across agents, then system design that shapes how agents operate. Most people I worked with were between stage zero and stage one. They had the tools. They had not yet internalized the new posture.

The second pattern — and I think the most important — is what I have started calling the blank input problem. The primary blocker to effective AI use is not model quality. It is not cost. It is not access. It is that people do not know what to ask.

A PM with access to the best tools in the world sits there with a blinking cursor and freezes. An engineer who can write code fluently stares at the chat window because specifying intent — knowing what you actually want, clearly enough for a machine to do it — turns out to be a different skill than doing the work yourself. I watched a CoE with 65 people at a major enterprise where everyone had tool access and nobody knew what to hand off.

This is the blank input problem, and it is the steersman's first real challenge. You cannot steer if you cannot articulate where you want to go. The craft model let you think with your hands — you discovered the solution in the act of building it. The steersman model requires you to know (or at least roughly frame) what you want before the machine starts working. That is a harder cognitive task than most people expect.

The third pattern reinforces the steersman framing from a different angle: constraints enable exploration; they do not slow it. The sessions where people adopted fastest were the ones where I set up guardrails first — version control, sandboxed environments, reversible actions. Once people trusted that mistakes were cheap to undo, they tried more things. The steersman needs a system that is safe to steer aggressively. Without that, people default to the craft model because it feels more controllable.
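One concrete way to make mistakes cheap to undo is to give every agent run its own disposable git worktree. This is a minimal sketch of that idea, not any particular product's implementation; `run_in_sandbox` and the `task` callable are hypothetical names for whatever actually executes the agent's work:

```python
import os
import subprocess
import tempfile

def run_in_sandbox(repo_path: str, branch: str, task) -> str:
    """Run an agent task inside a throwaway git worktree.

    The main checkout is never touched; undoing the agent's work means
    deleting the worktree and its branch, nothing more.
    """
    workdir = os.path.join(tempfile.mkdtemp(prefix="agent-"), "wt")
    subprocess.run(
        ["git", "-C", repo_path, "worktree", "add", "-b", branch, workdir],
        check=True, capture_output=True,
    )
    try:
        task(workdir)   # hypothetical: the agent edits files in here
        return workdir  # left in place for human review
    except Exception:
        # Reversal is two commands: drop the worktree, drop the branch.
        subprocess.run(
            ["git", "-C", repo_path, "worktree", "remove", "--force", workdir],
            check=True,
        )
        subprocess.run(
            ["git", "-C", repo_path, "branch", "-D", branch], check=True
        )
        raise
```

The point of the design is asymmetry: the cost of trying something is one branch, and the cost of abandoning it is one deletion. That asymmetry is what lets people steer aggressively.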

What the steersman keeps

The human still matters. The question is where.

I think the remaining work clusters in three areas. This clustering is already visible in how the most advanced organizations describe what their engineers actually do, and it maps cleanly onto what Wiener was describing: the human retains the parts of the loop that require judgment about the world outside the system.

Problem selection

Which problems deserve attention? Which ones are symptoms and which ones are causes? Which tradeoff matters right now?

Agents can surface issues. They can summarize logs. They can rank failing tests. They still struggle with the social and product judgment wrapped around priority. Someone has to decide what is worth aiming at.

OpenAI's description of their engineers' work after adopting background agents is instructive. Their people spend time "breaking down goals into building blocks" and "identifying missing capabilities when agents struggle." The question they ask is not "how do I implement this?" but "what capability is missing, and how do I make it legible and enforceable for the agent?" That is problem selection operating at a different altitude.

Constraint definition

The agent needs a shape for good work before it can produce good work consistently.

What architecture should this preserve? What performance budget matters? What should never be touched? What is acceptable debt and what is not?

A lot of engineering judgment lives here already. AI just makes that surface explicit. Things senior engineers used to carry tacitly now have to be encoded in docs, tests, linters, review standards, and prompts that survive contact with the rest of the team.

Spotify's approach to background agents illustrates this well. They deliberately limit tool access — a verify tool, a git tool with restricted subcommands, and a strict bash allowlist. They prefer large static prompts that are version-controlled and testable. The constraint surface is narrow on purpose, because "the more tools you have, the more dimensions of unpredictability you introduce." The steersman at Spotify is not giving the agent maximum freedom. They are giving it a tight channel that produces predictable, mergeable pull requests across thousands of repositories.
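Spotify's actual implementation is not public, but the shape of a strict allowlist is easy to picture. A sketch, with illustrative command and subcommand names, of the check that would run before any shell command the agent proposes:

```python
import shlex

# Illustrative allowlist: only these binaries may ever run.
BASH_ALLOWLIST = {"ls", "cat", "grep", "sed", "mvn", "npm"}

# Restricted git surface: subcommands that push or rewrite history
# are deliberately absent.
GIT_SUBCOMMANDS = {"status", "diff", "add", "commit", "checkout"}

def is_allowed(command: str) -> bool:
    """Return True only if the agent's shell command stays inside the channel."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    if tokens[0] == "git":
        return len(tokens) > 1 and tokens[1] in GIT_SUBCOMMANDS
    return tokens[0] in BASH_ALLOWLIST
```

Everything not explicitly listed is denied by default. That is the "tight channel" in code: the agent's unpredictability is bounded by the small set of actions it can take at all.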

Stripe takes the opposite approach — nearly 500 internal tools exposed to the agent through a single MCP server — but the principle is the same. Someone designed that tool surface. Someone decided what the agent could reach and what it could not. The constraint definition is just wider.

OpenAI goes further. They describe encoding "human taste" into the system continuously: review comments become documentation updates, documentation updates become linter rules, linter rules become architectural enforcement. "When documentation falls short, we promote the rule into code." That is constraint definition as a ratchet — judgment gets sanded into the system one correction at a time, and the system gets better at producing work that matches the team's standards without being told each time.
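The mechanics of "promoting a rule into code" can be as small as a custom lint check. A hedged sketch, not OpenAI's actual tooling: suppose reviewers keep leaving the comment "use timezone-aware datetimes." That comment becomes a rule the system enforces, so no human has to repeat it:

```python
import ast

def check_no_utcnow(source: str) -> list[int]:
    """Flag lines that call datetime.utcnow(), which returns a naive datetime.

    A recurring review comment ("use timezone-aware datetimes") promoted
    into an enforceable rule: from now on the linter repeats it, not a person.
    """
    tree = ast.parse(source)
    offending = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "utcnow"
        ):
            offending.append(node.lineno)
    return offending
```

Each promoted rule is one click of the ratchet: the correction happens once in review, then lives in the system forever.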

Drift correction

This is the part people underestimate.

An agent can produce code that compiles, passes tests, and is still wrong. It can solve the local problem while harming the system. It can obey the letter of the request while missing the intent.

The steersman's job is noticing the gap between output and intention, then correcting the course. Sometimes that means editing code. More often it means editing the environment that produced the code: the spec, the tests, the docs, the guardrails.

Spotify's verification system is built around this insight. They identify three failure modes in order of severity. An agent that fails to produce a PR is a minor annoyance — easily retried. An agent that produces a PR failing CI is frustrating but visible. The most dangerous failure is an agent that produces a PR that passes CI but is functionally incorrect. It looks right. It compiles. It merges. And it breaks something downstream that nobody catches until production.

Their response is layered verification: deterministic verifiers that activate automatically based on codebase content, then an LLM-as-judge that evaluates whether the agent stayed within scope. The judge vetoes about 25% of sessions. When vetoed, agents self-correct roughly half the time.

That 25% veto rate is the steersman in action — not writing code, but catching drift before it compounds.

OpenAI's team experienced this at scale. Their code throughput became so high that the bottleneck shifted to human QA capacity. They used to spend every Friday — 20% of the week — cleaning up what they called "AI slop" manually. That did not scale. So they automated the correction too: recurring background agents that scan for deviations, update quality grades, and open targeted refactoring PRs. Drift correction feeding back into the system that produces the drift.
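A recurring drift scan does not need to be sophisticated to be useful. A toy sketch of the pattern — grade every module against an encoded standard, queue the ones that fall below threshold for a targeted refactoring pass; the grading heuristic here is deliberately trivial, a stand-in for whatever a real grader would aggregate:

```python
import ast

def grade(source: str) -> float:
    """Toy quality grade: fraction of functions with docstrings.

    A real grader would aggregate linter findings, complexity metrics,
    architectural checks, and so on.
    """
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if not funcs:
        return 1.0
    documented = sum(1 for f in funcs if ast.get_docstring(f))
    return documented / len(funcs)

def scan(modules: dict[str, str], threshold: float = 0.8) -> list[str]:
    """Return the modules that should get a targeted refactoring PR."""
    return [name for name, src in modules.items() if grade(src) < threshold]
```

Run on a schedule, the output of `scan` becomes the input queue for refactoring agents: the correction loop feeding back into the system that produces the drift.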

What to practice now

If this role shift is real, the skill stack changes with it.

The engineers who get stronger here are not the ones who merely type faster with AI. They are the ones who get better at:

  • framing a problem crisply enough that a machine can act on it
  • writing constraints that survive execution across diverse codebases
  • reviewing generated work for correctness and fit, not just compilation
  • deciding what needs human judgment and what does not
  • turning repeated review comments into reusable system rules

That last point matters a lot. In the craft model, taste often stayed inside the person — an accumulated intuition about what "good" looked like, never fully externalized. In the steersman model, good teams gradually push taste into the system. Every correction becomes a potential rule. Every rule that sticks reduces the need for that correction next time.

The blank input problem suggests another skill worth developing: the ability to decompose intent. People who freeze at the empty prompt often have the judgment — they know what good work looks like — but lack the habit of articulating it in a form the machine can use. That is a trainable skill. It is also, I think, the main thing onboarding programs should focus on. Not "how to use the tool" but "how to know what to hand the tool."

Boundaries

Engineers do not disappear.

Implementation does not stop mattering. Someone still has to know whether the implementation is sound. The steersman who cannot read code is not steering — they are guessing.

And this is not the same thing as turning engineering into management. Steering is not generic supervision. It is technical judgment applied at a different layer. OpenAI's engineers are not managing agents the way a project manager manages people. They are identifying missing capabilities, encoding architectural invariants, and designing the verification systems that catch bad output. That requires deep technical knowledge. It just deploys that knowledge differently.

The boundary will not hold forever either. Some of what I am calling "human work" today will automate further. That is already happening. Problem selection is getting more instrumented — agents can surface signals from monitoring and user feedback. Constraint definition is getting more codified — linters and architectural tests encode what used to be judgment calls. Drift correction is getting better tooling — LLM-as-judge systems already catch a quarter of bad output before a human sees it.

Each of those three areas is partially automating. The layers keep collapsing. The steersman model is useful because it describes the current transition honestly. It should not be mistaken for a permanent settlement.

The real adjustment

The hardest part is psychological.

For a long time, being a strong engineer meant being deep inside the loop. You knew the code because you wrote it. You trusted the path because you walked it yourself. AI breaks that identity before it breaks the job.

That is why some of the resistance feels emotional even when people use technical language. The discomfort includes quality concerns, but proximity is the deeper issue. People are being pushed one layer up the stack and are not sure whether that is advancement, loss, or both.

I think it is both.

You lose some intimacy with implementation. You gain influence over a larger system. The question is whether you can operate comfortably at that new level without pretending the old one still defines the job.

I saw this in every onboarding session. The people who adopted fastest were not the most technically skilled. They were the ones most willing to let go of the feeling that they should be the one typing. The ones who stalled were often excellent engineers who could not shake the sense that delegating to a machine meant losing something important about their work. They were not wrong about the loss. They were wrong about what it meant.

Tomorrow morning there will still be pull requests waiting. The difference is that more engineers will meet them the way a steersman meets a current: by setting direction, correcting drift, and deciding what deserves force in the first place.