Humans and Agents in Software Engineering Loops
Summary
Software engineering is shifting from manual code inspection ("human in the loop") to managing the automated systems that govern agentic workflows ("human on the loop"). This transition focuses on "Harness Engineering," where developers build the specifications and quality checks that direct AI agents to produce reliable software.
Key Points
- The "Why Loop" represents the human-driven process of defining goals and turning ideas into software outcomes.
- The "How Loop" encompasses the technical implementation, including the creation and management of code, tests, infrastructure, and documentation.
- "Human in the loop" refers to developers acting as gatekeepers by manually inspecting or correcting agent-generated artifacts, which can create productivity bottlenecks.
- "Human on the loop" involves "Harness Engineering," the practice of designing the specifications, quality checks, and workflow guidance that control the agents.
- The "Agentic Flywheel" is a self-improving cycle where agents analyze performance data, logs, and test results to recommend and implement improvements to the harness.
- The "How Loop" is hierarchical, consisting of nested loops ranging from high-level architectural implementation to the innermost loop of code generation and testing.
Technical Details
The "How Loop" functions as a multi-layered hierarchy. The outermost loop delivers the software required for the "Why Loop," while the innermost loop handles the immediate generation and testing of code. Intermediate artifacts, such as Architecture Decision Records (ADRs), technical designs, and tests, serve as the connective tissue between these layers.
The core of the "on the loop" approach is the "agent harness." This harness is a collection of constraints and evaluators—including unit tests, performance benchmarks, and compliance checks—that define the boundaries for agentic execution. In an "Agentic Flywheel" configuration, agents are integrated with telemetry, such as production logs, user journey data, and pipeline stages. The agents use this data to identify failures or inefficiencies, subsequently proposing updates to the harness itself. This process can be automated: as confidence in the agent's recommendations grows, updates to the harness can be applied automatically based on pre-defined risk and cost scores.
Impact / Why It Matters
Developers must transition from writing implementation-level code to engineering the evaluation frameworks and harnesses that ensure agentic outputs meet functional and operational requirements. This shift moves the focus from manual debugging to the design of robust, self-improving software delivery systems.