STUDY: LLM-GENERATED AGENTS.MD HURTS AGENT SUCCESS AND RAISES COST
A new ETH Zurich and LogicStar.ai study finds that LLM-generated repository context files like AGENTS.md reduce coding agent success and raise inference costs by over 20%.
Researchers from ETH Zurich and LogicStar.ai built AGENTBENCH, a suite of 138 real-world Python tasks, to measure how repository-level context files affect coding agents. They compared runs with no context file, an LLM-generated file, and a human-written file. They report that the LLM-generated context files reduced task success while raising inference costs by over 20%.
The authors argue that broad, global prompts push agents into aimless exploration instead of focused execution. In practice, that means favoring task-scoped prompts and retrieval of local code context over monolithic guides. If you use AGENTS.md, measure its effect on success rate, token usage, and wall-clock time before making it a default agent input.
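One way to run that audit is to log every agent run under each condition and compare aggregates. The sketch below is a minimal illustration, not the study's harness: the `RunResult` record, its field names, and the condition labels are all hypothetical, and it assumes you already have a way to decide whether a run succeeded.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One agent run on one task. All field names here are hypothetical."""
    condition: str        # e.g. "none", "llm_generated", "human_written"
    task_id: str
    success: bool         # did the run resolve the task?
    tokens: int           # total input + output tokens for the run
    wall_clock_s: float   # end-to-end run time in seconds


def summarize(runs: list[RunResult]) -> dict[str, dict[str, float]]:
    """Aggregate success rate, mean tokens, and mean wall-clock time per condition."""
    by_condition: dict[str, list[RunResult]] = {}
    for run in runs:
        by_condition.setdefault(run.condition, []).append(run)
    return {
        condition: {
            "success_rate": mean(1.0 if r.success else 0.0 for r in group),
            "mean_tokens": mean(r.tokens for r in group),
            "mean_wall_clock_s": mean(r.wall_clock_s for r in group),
        }
        for condition, group in by_condition.items()
    }


def relative_change(baseline: float, treatment: float) -> float:
    """Fractional change of treatment versus baseline (0.2 means +20%)."""
    return (treatment - baseline) / baseline
```

Running each task under every condition, ideally several times, keeps the comparison paired. `relative_change` applied to `mean_tokens` for the no-file and with-file conditions gives a cost delta you can weigh against any change in `success_rate`.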
Defaulting agents to consume AGENTS.md can slow delivery and increase spend without improving fix rates.
Agent prompting needs guardrails that emphasize local, task-relevant context over global narratives.
- A/B test agent runs with no context file vs LLM-generated vs human-written, tracking success rate, tokens, and wall-clock time.
- Measure agent trace depth and tool calls to detect distraction from broad prompts versus task-scoped retrieval.
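Trace depth and tool-call counts can be computed from whatever step log your agent framework emits. The following is a rough sketch under stated assumptions: the step schema (a dict with a `type` key and, for tool calls, a `tool` key) and the tool names in `EXPLORATION_TOOLS` are hypothetical, so map them to your own logs.

```python
from collections import Counter

# Hypothetical tool names; substitute the ones your agent framework logs.
EXPLORATION_TOOLS = {"list_dir", "read_file", "search"}


def trace_metrics(trace: list[dict]) -> dict[str, float]:
    """Summarize one agent trace.

    Each step is assumed to look like {"type": "tool_call", "tool": "read_file"}
    or {"type": "message"}. This schema is an assumption, not a standard.
    """
    tool_calls = [step["tool"] for step in trace if step.get("type") == "tool_call"]
    counts = Counter(tool_calls)
    # Counter returns 0 for tools that never appear, so missing names are safe.
    exploration = sum(counts[tool] for tool in EXPLORATION_TOOLS)
    return {
        "depth": len(trace),                # total steps in the trace
        "tool_calls": len(tool_calls),      # steps that invoked a tool
        "exploration_share": exploration / len(tool_calls) if tool_calls else 0.0,
    }
```

A higher `exploration_share` or `depth` on the same task when a context file is present, with no gain in success, is one possible signal of the distraction described above.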
Legacy codebase integration strategies...
- 01. Pause auto-feeding AGENTS.md to agents and run controlled trials to confirm net benefit before keeping it.
- 02. If AGENTS.md exists, trim to task-critical facts and gate via retrieval instead of stuffing into the system prompt.
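Gating via retrieval can be as simple as splitting the file into sections and passing along only those that overlap with the task description. The sketch below uses plain keyword overlap as a stand-in for a real retriever; the function names are hypothetical, and the splitter treats any line starting with `#` as a heading, so it would also split on `#` comments inside fenced code.

```python
import re


def split_sections(markdown: str) -> list[str]:
    """Split markdown text into sections, each starting at a '#' heading line."""
    sections: list[list[str]] = []
    for line in markdown.splitlines():
        if line.startswith("#") or not sections:
            sections.append([])
        sections[-1].append(line)
    joined = ["\n".join(lines).strip() for lines in sections]
    return [section for section in joined if section]


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude proxy for a real embedding or BM25 index."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def select_sections(markdown: str, task: str, top_k: int = 2) -> list[str]:
    """Return up to top_k sections sharing at least one word with the task."""
    task_words = tokenize(task)
    scored = [(len(tokenize(s) & task_words), s) for s in split_sections(markdown)]
    relevant = [(score, s) for score, s in scored if score > 0]
    # sort() is stable, so ties keep their original document order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [section for _, section in relevant[:top_k]]
```

When nothing overlaps, the function returns an empty list and the agent receives no guide text at all for that task. Common words such as "the" can produce spurious matches, so a stop-word filter or a proper retriever would be a sensible next step.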
Fresh architecture paradigms...
- 01. Skip default AGENTS.md in agent inputs and design retrieval to pull only the code and config needed per task.
- 02. Keep human-facing repo guides separate from agent prompts and enforce short, task-scoped system messages.