AGENTIC-WORKFLOWS PUB_DATE: 2026.04.21

STOP BLAMING THE MODEL: BUILD A BETTER AGENT HARNESS

A thoughtful write-up argues that reliable AI agents come from the harness around the LLM, not from the model itself.

The piece explains why agents fail in production even when benchmarks look great: LLMs are stochastic, so consistency, tool usage, context control, and recovery must live outside the model. The author calls this the harness — the system that interprets outputs, validates structure, retries, and orchestrates tools.

Read the argument and patterns in the DEV post: Harness Engineering: The Most Important Part of AI Agents. The takeaway: treat the model as an unreliable dependency and engineer compensating controls like any other flaky external system.
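The retry-and-validate loop at the core of that idea can be sketched in a few lines. This is a minimal illustration, not the post's implementation: `call_model`, `validate`, and the fake model below are all hypothetical names, and the structural check here is plain JSON parsing plus a caller-supplied rule.

```python
import json

class HarnessError(Exception):
    """Raised when the model fails validation after all retries."""

def run_with_retries(call_model, validate, max_attempts=3):
    """Call an LLM, validate its output, and retry on failure.

    The model is treated as a flaky dependency: every response is
    parsed and checked before anything downstream may use it.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = call_model(attempt)
        try:
            parsed = json.loads(raw)   # structural check: must be JSON
            validate(parsed)           # semantic check: caller's rules
            return parsed
        except (json.JSONDecodeError, ValueError) as exc:
            last_error = exc           # record the failure and retry
    raise HarnessError(f"model failed after {max_attempts} attempts: {last_error}")

# Simulated flaky model: garbage on the first call, valid JSON on the second.
responses = iter(['not json', '{"action": "search", "query": "docs"}'])

def fake_model(attempt):
    return next(responses)

def must_have_action(payload):
    if "action" not in payload:
        raise ValueError("missing 'action' field")

result = run_with_retries(fake_model, must_have_action)
```

The point is where the code lives: parsing, validation, and retry policy sit outside the model call, so swapping models changes nothing but `call_model`.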

[ WHY_IT_MATTERS ]
01.

Reliability, cost, and safety come from harness design — retries, validation, tooling, and observability — more than from swapping models.

02.

Backend and data teams already own these muscles: orchestration, schemas, idempotency, tracing, and SLOs.

[ WHAT_TO_TEST ]
  • terminal

    Add JSON Schema validation with auto-retry-on-parse-fail; compare success rate, tail latency, and token spend before vs. after.

  • terminal

    Introduce tool-call timeouts and circuit breakers; chaos-test flaky APIs to verify idempotency, rollback paths, and complete tracing.
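The circuit-breaker half of that test can be prototyped without any framework. A minimal sketch, with illustrative names throughout; timeouts are omitted for brevity, and the clock is injectable so the trip/fail-fast behavior is deterministic under test.

```python
import time

class CircuitBreaker:
    """Fail fast once a tool has failed `threshold` times in a row;
    stay open for `cooldown` seconds before letting a call through again."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock          # injectable for deterministic tests
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, tool, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # cooldown elapsed: allow one trial call
            self.failures = 0
        try:
            result = tool(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()   # trip the breaker
            raise
        self.failures = 0           # success resets the streak
        return result

# Chaos-test stand-in: a tool that is always down.
def flaky_api():
    raise IOError("connection refused")

breaker = CircuitBreaker(threshold=2, cooldown=10.0, clock=lambda: 0.0)
outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky_api)
    except IOError:
        outcomes.append("tool_error")   # real failure reached the tool
    except RuntimeError:
        outcomes.append("fast_fail")    # breaker open: tool never called
```

After the second consecutive failure the breaker opens, so the third call fails fast instead of hammering the broken API; that fast-fail path is exactly what the chaos test should exercise.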

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

  • 01.

    Wrap existing LLM calls behind a single harness service with standardized prompts, schemas, retries, and tracing to reduce drift and hidden couplings.

  • 02.

    Centralize external tool execution with idempotent jobs and rate limiting; publish metrics for failure modes and backoff behavior.
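Item 02 combines three concerns: idempotency keys, rate limiting, and failure-mode metrics. A toy `ToolExecutor` (a hypothetical name, as are its fields) can show how they compose; a real service would back the cache and metrics with durable storage and a metrics pipeline.

```python
import time

class ToolExecutor:
    """Centralized tool execution: idempotency cache, token-bucket
    rate limit, and counters for the main failure modes."""

    def __init__(self, rate_per_sec=5, clock=time.monotonic):
        self.clock = clock
        self.rate = rate_per_sec
        self.tokens = float(rate_per_sec)
        self.last_refill = clock()
        self.results = {}            # idempotency cache: key -> result
        self.metrics = {"calls": 0, "cached": 0, "throttled": 0, "errors": 0}

    def execute(self, key, tool, *args):
        if key in self.results:      # duplicate request: replay cached result
            self.metrics["cached"] += 1
            return self.results[key]
        now = self.clock()           # refill the token bucket
        self.tokens = min(self.rate, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens < 1:
            self.metrics["throttled"] += 1
            raise RuntimeError("rate limit exceeded")
        self.tokens -= 1
        self.metrics["calls"] += 1
        try:
            result = tool(*args)
        except Exception:
            self.metrics["errors"] += 1
            raise
        self.results[key] = result   # only successes are cached
        return result

# Replaying the same idempotency key must not re-run the side effect.
side_effects = []
def double(x):
    side_effects.append(x)
    return x * 2

executor = ToolExecutor(rate_per_sec=100, clock=lambda: 0.0)
first = executor.execute("k1", double, 3)
second = executor.execute("k1", double, 3)
```

Caching only successful results means a retried agent step replays the prior outcome instead of executing the side effect twice, which is the property the chaos tests above should verify.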

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

  • 01.

    Design the harness first: choose a simple state machine or workflow engine for steps, memory, and error paths; keep the model swappable.

  • 02.

    Define contracts early: input/output schemas, tool specs, and evaluation metrics; gate releases with harness-level tests.
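The "harness first, model swappable" design above can be reduced to a small driver loop. A sketch under assumed names (`run_state_machine`, the `plan`/`act` handlers, and the canned `model` are all illustrative); a production system would use a real workflow engine and persist the context between steps.

```python
def run_state_machine(handlers, start="plan", max_steps=10):
    """Drive an agent through named steps. Each handler takes the shared
    context and returns (next_state, context); "done" and "error" are
    terminal, and exhausting the step budget counts as an error path."""
    state, context = start, {}
    for _ in range(max_steps):
        if state in ("done", "error"):
            return state, context
        state, context = handlers[state](context)
    return "error", context

# Illustrative handlers; `model` stands in for any swappable LLM call.
def model(prompt):
    return "search(docs)"                    # canned completion for the sketch

def plan(ctx):
    ctx["tool_call"] = model("what next?")   # model output stays behind the handler
    return "act", ctx

def act(ctx):
    ctx["result"] = f"executed {ctx['tool_call']}"
    return "done", ctx

final_state, ctx = run_state_machine({"plan": plan, "act": act})
```

Because the model call is confined to one handler, swapping models (or stubbing them out, as here) never touches the step logic, memory, or error paths, which is what makes harness-level release gating possible.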

Enjoying_this_story?

Get daily AGENTIC-WORKFLOWS + SDLC updates.

  • Practical tactics you can ship tomorrow
  • Tooling, workflows, and architecture notes
  • One short email each weekday

FREE_FOREVER. TERMINATE_ANYTIME. View an example issue.

GET_DAILY_EMAIL
AI + SDLC // 5 MIN DAILY