ETH-ZURICH PUB_DATE: 2026.03.07

STUDY: LLM-GENERATED AGENTS.MD HURTS AGENT SUCCESS AND RAISES COST

A new ETH Zurich and LogicStar.ai study finds that LLM-generated repository context files like AGENTS.md reduce coding agent success and raise inference costs by over 20%.

Researchers from ETH Zurich and LogicStar.ai built AGENTBENCH, a suite of 138 real-world Python tasks, to measure how repository-level context files affect coding agents. They compared three conditions: no context file, an LLM-generated file, and a human-written file. According to both the study summary and the paper, LLM-generated context files reduced task success while raising inference costs by over 20%.

The authors argue that broad, global prompts push agents into aimless exploration instead of focused execution. The practical takeaway is to favor task-scoped prompts and retrieval of local code context over monolithic guides. If you use AGENTS.md, measure its effect on success rate, token usage, and wall-clock time before making it a default agent input.

[ WHY_IT_MATTERS ]
01.

Defaulting agents to consume AGENTS.md can slow delivery and increase spend without improving fix rates.

02.

Agent prompting needs guardrails that emphasize local, task-relevant context over global narratives.

[ WHAT_TO_TEST ]
  • 01.

    A/B test agent runs with no context file vs LLM-generated vs human-written, tracking success rate, tokens, and wall-clock time.

  • 02.

    Measure agent trace depth and tool calls to detect distraction from broad prompts versus task-scoped retrieval.
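
The two tests above can share one aggregation step. The following is a minimal sketch, not the study's harness: the `RunResult` record, its field names, and the condition labels (`"none"`, `"llm_generated"`, `"human_written"`) are assumptions made for illustration, and you would populate the records from whatever logs your own agent runner emits.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One agent run. All field names are hypothetical, chosen for this sketch."""
    condition: str        # e.g. "none", "llm_generated", "human_written"
    solved: bool          # did the run pass the task's tests?
    tokens: int           # total tokens consumed by the run
    wall_clock_s: float   # elapsed seconds
    tool_calls: int       # number of tool invocations in the trace


def summarize(runs):
    """Group runs by condition and average each tracked metric."""
    by_condition = {}
    for run in runs:
        by_condition.setdefault(run.condition, []).append(run)

    summary = {}
    for condition, group in by_condition.items():
        summary[condition] = {
            "runs": len(group),
            "success_rate": mean([1.0 if r.solved else 0.0 for r in group]),
            "mean_tokens": mean([r.tokens for r in group]),
            "mean_wall_clock_s": mean([r.wall_clock_s for r in group]),
            "mean_tool_calls": mean([r.tool_calls for r in group]),
        }
    return summary


def relative_change(summary, metric, condition, baseline="none"):
    """Fractional change of a metric versus the baseline condition."""
    base = summary[baseline][metric]
    return (summary[condition][metric] - base) / base
```

Calling `relative_change(summary, "mean_tokens", "llm_generated")` gives the token overhead of the LLM-generated condition as a fraction of the no-context baseline. Run every condition on the same task set, ideally with several seeds per task, so that differences reflect the context file rather than task mix.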

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

  • 01.

    Pause auto-feeding AGENTS.md to agents and run controlled trials to confirm net benefit before keeping it.

  • 02.

    If AGENTS.md exists, trim to task-critical facts and gate via retrieval instead of stuffing into the system prompt.
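
One way to gate AGENTS.md behind retrieval is to split it into sections and pass along only those that share terms with the task. The sketch below is a rough illustration under stated assumptions: it assumes the file uses markdown `#` headings, and it uses crude keyword overlap as the relevance score. The function names are hypothetical, and a production setup would more likely use embeddings or an existing search index.

```python
import re


def split_sections(markdown_text):
    """Split a markdown document into (heading, body) pairs at '#' headings.

    Limitation: lines starting with '#' inside fenced code are also
    treated as headings in this simple sketch.
    """
    sections = []
    heading, body = "", []
    for line in markdown_text.splitlines():
        if re.match(r"#{1,6}\s", line):
            if heading or any(part.strip() for part in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if heading or any(part.strip() for part in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


def tokenize(text):
    """Lowercase the text and return its set of word-like terms."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def select_sections(markdown_text, task_description, max_sections=2):
    """Return up to max_sections sections that share terms with the task."""
    task_terms = tokenize(task_description)
    scored = []
    for heading, body in split_sections(markdown_text):
        overlap = len(task_terms & tokenize(heading + " " + body))
        if overlap > 0:
            scored.append((overlap, heading, body))
    # Highest overlap first; the sort is stable, so ties keep file order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(heading, body) for _, heading, body in scored[:max_sections]]
```

If no section overlaps with the task, the function returns an empty list and the agent receives nothing from the file, which matches the goal of keeping it out of the default prompt. Common words such as "the" will inflate overlap scores, so a stop-word filter is a sensible first refinement.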

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

  • 01.

    Skip default AGENTS.md in agent inputs and design retrieval to pull only the code and config needed per task.

  • 02.

    Keep human-facing repo guides separate from agent prompts and enforce short, task-scoped system messages.
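
To enforce short system messages mechanically, a prompt builder can reject anything over a budget. This is a minimal sketch under assumptions: the function name, the `max_words` default of 200, and the use of a whitespace word count as a stand-in for tokens are all illustrative choices, not values from the study.

```python
def build_system_message(task_summary, context_snippets, max_words=200):
    """Join a task summary with retrieved snippets, failing if over budget.

    The word count is a rough proxy for tokens; swap in your model's
    tokenizer if you need an exact limit.
    """
    parts = [task_summary.strip()]
    parts += [snippet.strip() for snippet in context_snippets if snippet.strip()]
    message = "\n\n".join(parts)

    word_count = len(message.split())
    if word_count > max_words:
        raise ValueError(
            f"system message has {word_count} words, budget is {max_words}"
        )
    return message
```

Failing loudly, rather than silently truncating, makes an oversized prompt show up in CI or in agent logs, where someone can decide which context to drop.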
