AGENTIC-WORKFLOWS PUB_DATE: 2026.04.21

STOP BLAMING THE MODEL: BUILD A BETTER AGENT HARNESS

A thoughtful write-up argues that reliable AI agents come from the harness around the LLM, not from the model itself.

The piece explains why agents fail in production even when benchmarks look great: LLMs are stochastic, so consistency, tool usage, context control, and recovery must live outside the model. The author calls this the harness — the system that interprets outputs, validates structure, retries, and orchestrates tools.

Read the argument and patterns in the DEV post: Harness Engineering: The Most Important Part of AI Agents. The takeaway: treat the model as an unreliable dependency and engineer compensating controls like any other flaky external system.
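The retry-and-validate loop at the core of that idea can be sketched in a few lines. This is a minimal illustration, not the post's implementation: `call_model`, `validate`, and the fake model below are all hypothetical names, and the structural check here is plain JSON parsing plus a caller-supplied rule.

```python
import json

class HarnessError(Exception):
    """Raised when the model fails validation after all retries."""

def run_with_retries(call_model, validate, max_attempts=3):
    """Call an LLM, validate its output, and retry on failure.

    The model is treated as a flaky dependency: every response is
    parsed and checked before anything downstream may use it.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = call_model(attempt)
        try:
            parsed = json.loads(raw)   # structural check: must be JSON
            validate(parsed)           # semantic check: caller's rules
            return parsed
        except (json.JSONDecodeError, ValueError) as exc:
            last_error = exc           # record the failure and retry
    raise HarnessError(f"model failed after {max_attempts} attempts: {last_error}")

# Simulated flaky model: garbage on the first call, valid JSON on the second.
responses = iter(['not json', '{"action": "search", "query": "docs"}'])

def fake_model(attempt):
    return next(responses)

def must_have_action(payload):
    if "action" not in payload:
        raise ValueError("missing 'action' field")

result = run_with_retries(fake_model, must_have_action)
```

The point is where the code lives: parsing, validation, and retry policy sit outside the model call, so swapping models changes nothing but `call_model`.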

[ WHY_IT_MATTERS ]
01.

Reliability, cost, and safety come from harness design — retries, validation, tooling, and observability — more than from swapping models.

02.

Backend and data teams already own these muscles: orchestration, schemas, idempotency, tracing, and SLOs.

[ WHAT_TO_TEST ]
  • terminal

    Add JSON Schema validation with auto-retry-on-parse-fail; compare success rate, tail latency, and token spend before vs. after.

  • terminal

    Introduce tool-call timeouts and circuit breakers; chaos-test flaky APIs to verify idempotency, rollback paths, and complete tracing.
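The circuit-breaker half of that test can be prototyped without any framework. A minimal sketch, with illustrative names throughout; timeouts are omitted for brevity, and the clock is injectable so the trip/fail-fast behavior is deterministic under test.

```python
import time

class CircuitBreaker:
    """Fail fast once a tool has failed `threshold` times in a row;
    stay open for `cooldown` seconds before letting a call through again."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock          # injectable for deterministic tests
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, tool, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # cooldown elapsed: allow one trial call
            self.failures = 0
        try:
            result = tool(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()   # trip the breaker
            raise
        self.failures = 0           # success resets the streak
        return result

# Chaos-test stand-in: a tool that is always down.
def flaky_api():
    raise IOError("connection refused")

breaker = CircuitBreaker(threshold=2, cooldown=10.0, clock=lambda: 0.0)
outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky_api)
    except IOError:
        outcomes.append("tool_error")   # real failure reached the tool
    except RuntimeError:
        outcomes.append("fast_fail")    # breaker open: tool never called
```

After the second consecutive failure the breaker opens, so the third call fails fast instead of hammering the broken API; that fast-fail path is exactly what the chaos test should exercise.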

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

  • 01.

    Wrap existing LLM calls behind a single harness service with standardized prompts, schemas, retries, and tracing to reduce drift and hidden couplings.

  • 02.

    Centralize external tool execution with idempotent jobs and rate limiting; publish metrics for failure modes and backoff behavior.
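Item 02 combines three concerns: idempotency keys, rate limiting, and failure-mode metrics. A toy `ToolExecutor` (a hypothetical name, as are its fields) can show how they compose; a real service would back the cache and metrics with durable storage and a metrics pipeline.

```python
import time

class ToolExecutor:
    """Centralized tool execution: idempotency cache, token-bucket
    rate limit, and counters for the main failure modes."""

    def __init__(self, rate_per_sec=5, clock=time.monotonic):
        self.clock = clock
        self.rate = rate_per_sec
        self.tokens = float(rate_per_sec)
        self.last_refill = clock()
        self.results = {}            # idempotency cache: key -> result
        self.metrics = {"calls": 0, "cached": 0, "throttled": 0, "errors": 0}

    def execute(self, key, tool, *args):
        if key in self.results:      # duplicate request: replay cached result
            self.metrics["cached"] += 1
            return self.results[key]
        now = self.clock()           # refill the token bucket
        self.tokens = min(self.rate, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens < 1:
            self.metrics["throttled"] += 1
            raise RuntimeError("rate limit exceeded")
        self.tokens -= 1
        self.metrics["calls"] += 1
        try:
            result = tool(*args)
        except Exception:
            self.metrics["errors"] += 1
            raise
        self.results[key] = result   # only successes are cached
        return result

# Replaying the same idempotency key must not re-run the side effect.
side_effects = []
def double(x):
    side_effects.append(x)
    return x * 2

executor = ToolExecutor(rate_per_sec=100, clock=lambda: 0.0)
first = executor.execute("k1", double, 3)
second = executor.execute("k1", double, 3)
```

Caching only successful results means a retried agent step replays the prior outcome instead of executing the side effect twice, which is the property the chaos tests above should verify.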

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

  • 01.

    Design the harness first: choose a simple state machine or workflow engine for steps, memory, and error paths; keep the model swappable.

  • 02.

    Define contracts early: input/output schemas, tool specs, and evaluation metrics; gate releases with harness-level tests.
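The "harness first, model swappable" design above can be reduced to a small driver loop. A sketch under assumed names (`run_state_machine`, the `plan`/`act` handlers, and the canned `model` are all illustrative); a production system would use a real workflow engine and persist the context between steps.

```python
def run_state_machine(handlers, start="plan", max_steps=10):
    """Drive an agent through named steps. Each handler takes the shared
    context and returns (next_state, context); "done" and "error" are
    terminal, and exhausting the step budget counts as an error path."""
    state, context = start, {}
    for _ in range(max_steps):
        if state in ("done", "error"):
            return state, context
        state, context = handlers[state](context)
    return "error", context

# Illustrative handlers; `model` stands in for any swappable LLM call.
def model(prompt):
    return "search(docs)"                    # canned completion for the sketch

def plan(ctx):
    ctx["tool_call"] = model("what next?")   # model output stays behind the handler
    return "act", ctx

def act(ctx):
    ctx["result"] = f"executed {ctx['tool_call']}"
    return "done", ctx

final_state, ctx = run_state_machine({"plan": plan, "act": act})
```

Because the model call is confined to one handler, swapping models (or stubbing them out, as here) never touches the step logic, memory, or error paths, which is what makes harness-level release gating possible.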

Enjoying_this_story?

Get daily AGENTIC-WORKFLOWS + SDLC updates.

  • Practical tactics you can ship tomorrow
  • Tooling, workflows, and architecture notes
  • One short email each weekday

FREE_FOREVER. TERMINATE_ANYTIME. View an example issue.

GET_DAILY_EMAIL
AI + SDLC // 5 MIN DAILY