ETH-ZURICH PUB_DATE: 2026.03.07

STUDY: LLM-GENERATED AGENTS.MD HURTS AGENT SUCCESS AND RAISES COST

A new ETH Zurich and LogicStar.ai study finds that LLM-generated repository context files like AGENTS.md reduce coding agent success and raise inference costs by over 20%.

Researchers from ETH Zurich and LogicStar.ai built AGENTBENCH, a suite of 138 real-world Python tasks, to measure how repository-level context files affect coding agents. They compared three conditions: no context file, an LLM-generated file, and a human-written file. Both the study summary and the paper report that LLM-generated context files lowered task success while raising inference costs by over 20%.

The authors argue that broad, global prompts push agents into aimless exploration instead of focused execution. The practical takeaway is to favor task-scoped prompts and retrieval of local code context over monolithic guides. If you use AGENTS.md, measure its effect on success rate, token usage, and wall-clock time before making it a default agent input.

[ WHY_IT_MATTERS ]

01.

Defaulting agents to consume AGENTS.md can slow delivery and increase spend without improving fix rates.

02.

Agent prompting needs guardrails that emphasize local, task-relevant context over global narratives.

[ WHAT_TO_TEST ]

A/B test agent runs with no context file vs LLM-generated vs human-written, tracking success rate, tokens, and wall-clock time.
Measure agent trace depth and tool calls to detect distraction from broad prompts versus task-scoped retrieval.
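
The comparison described above can be sketched as a small aggregation over per-run records. This is a minimal illustration, not the study's harness: the `RunResult` fields, the condition labels, and the helper names are all assumptions made for this example, and the demo numbers are invented placeholders rather than results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One agent run. All field names are hypothetical, chosen for this sketch."""
    condition: str       # e.g. "none", "llm_generated", "human_written"
    success: bool        # did the task's tests pass?
    tokens: int          # total tokens consumed by the run
    wall_clock_s: float  # elapsed time in seconds
    tool_calls: int      # number of tool invocations in the trace


def summarize(runs):
    """Group runs by condition and compute the mean of each tracked metric."""
    by_condition = defaultdict(list)
    for run in runs:
        by_condition[run.condition].append(run)

    summary = {}
    for condition, group in by_condition.items():
        summary[condition] = {
            "n": len(group),
            "success_rate": mean(1.0 if r.success else 0.0 for r in group),
            "mean_tokens": mean(r.tokens for r in group),
            "mean_wall_clock_s": mean(r.wall_clock_s for r in group),
            "mean_tool_calls": mean(r.tool_calls for r in group),
        }
    return summary


def relative_change(summary, baseline, treatment, metric):
    """Fractional change of `metric` in `treatment` relative to `baseline`."""
    base = summary[baseline][metric]
    return (summary[treatment][metric] - base) / base


# Invented placeholder data, only to show the call pattern.
demo_runs = [
    RunResult("none", True, 1000, 10.0, 4),
    RunResult("none", False, 1000, 12.0, 6),
    RunResult("llm_generated", False, 1200, 15.0, 9),
    RunResult("llm_generated", True, 1300, 14.0, 8),
]
demo_summary = summarize(demo_runs)
token_delta = relative_change(demo_summary, "none", "llm_generated", "mean_tokens")
```

Run every condition on the same task set, ideally with several repetitions per task, so that differences reflect the context file and not task difficulty or sampling noise. A rising `mean_tool_calls` without a matching rise in `success_rate` is one possible signal of the distraction effect the article describes.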

[ BROWNFIELD_PERSPECTIVE ]

01.
Pause auto-feeding AGENTS.md to agents and run controlled trials to confirm net benefit before keeping it.
02.
If AGENTS.md exists, trim it to task-critical facts and gate it behind retrieval instead of stuffing the whole file into the system prompt.
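
One way to gate an existing AGENTS.md behind retrieval is to split it into sections and pass along only those that overlap with the task description. The sketch below uses plain keyword overlap as a stand-in for a real retriever; the function names, the `k` and `min_overlap` parameters, and the sample file content are assumptions for illustration, not something the study prescribes.

```python
import re


def split_sections(markdown_text):
    """Split a markdown document into (heading, body) pairs at '#' headings."""
    sections = []
    heading, body = "preamble", []
    for line in markdown_text.splitlines():
        if line.startswith("#"):
            # Close the previous section, skipping an empty leading preamble.
            if body or heading != "preamble":
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    return sections


def tokenize(text):
    """Lowercase word set; a crude stand-in for a real retrieval model."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def select_sections(markdown_text, task, k=2, min_overlap=1):
    """Return up to k sections sharing at least min_overlap words with the task."""
    task_words = tokenize(task)
    scored = []
    for heading, body in split_sections(markdown_text):
        overlap = len(task_words & tokenize(heading + " " + body))
        if overlap >= min_overlap:
            scored.append((overlap, heading, body))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(heading, body) for _, heading, body in scored[:k]]


# Hypothetical file content and task, only to show the call pattern.
sample_agents_md = (
    "# Testing\nRun pytest with tox.\n"
    "# Deployment\nUse docker compose.\n"
)
relevant = select_sections(sample_agents_md, "fix failing pytest in parser")
```

Keyword overlap will match on common words such as "in" or "the", so a real setup would want stopword filtering or an embedding-based retriever. The point of the sketch is the shape of the gate: the agent sees a few relevant sections per task, not the whole file on every run.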

[ GREENFIELD_PERSPECTIVE ]

01.
Leave AGENTS.md out of default agent inputs and design retrieval to pull only the code and config each task needs.
02.
Keep human-facing repo guides separate from agent prompts and enforce short, task-scoped system messages.
