STATEFUL CODING AGENTS ARE MATURING—PRODUCTION SRE STILL TRIPS THEM UP
Anthropic is shifting Claude Code from ephemeral to persistent Tasks, with DAG dependencies, local filesystem state (~/.claude/tasks), and cross‑session orchestration via CLAUDE_CODE_TASK_LIST_ID, while also extending MCP with a UI framework for app‑like agent tooling (VentureBeat [1], The New Stack [2]). OpenAI published a rare deep dive on the Codex CLI agent loop, detailing prompt construction, tool‑calling cycles, and bottlenecks like quadratic prompt growth and cache misses (Ars Technica [3]). But production SRE remains a weak spot: OTelBench shows frontier LLMs top out at a 29% pass rate on OpenTelemetry instrumentation across 23 tasks, highlighting the gap between codegen and cross‑cutting operational work (Courier‑Journal press release [4]).
[1] Adds: specifics on Tasks (DAGs, durable state in ~/.claude/tasks, env‑var sharing) and enterprise stability focus.
[2] Adds: Anthropic extends MCP with a UI/app framework enabling richer agent tool UX.
[3] Adds: technical breakdown of Codex CLI’s agent loop, design trade‑offs, and known bottlenecks/bugs.
[4] Adds: independent benchmark quantifying poor LLM performance on OpenTelemetry instrumentation (29% pass rate across 23 tasks).
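The "quadratic prompt growth" bottleneck mentioned above can be illustrated with a little arithmetic. The sketch below is not taken from the Codex CLI write-up; it assumes a simplified loop in which every turn appends a fixed number of tokens (the hypothetical `tokens_per_turn`) and every request resends the entire history.

```python
def cumulative_prompt_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total tokens sent over a session when each request resends the full history."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # the new turn is appended to the history
        total += history            # the whole history is sent again
    return total


def closed_form(turns: int, tokens_per_turn: int) -> int:
    """Same quantity as an arithmetic series: k * n * (n + 1) / 2."""
    return tokens_per_turn * turns * (turns + 1) // 2


if __name__ == "__main__":
    for n in (10, 100):
        print(n, cumulative_prompt_tokens(n, 500))
```

Under these assumptions, 10 turns of 500 tokens send 27,500 tokens in total, while 100 turns send 2,525,000: ten times the turns, roughly ninety times the tokens. Reusing an unchanged prompt prefix from a cache is the usual way to soften this, which would explain why cache misses are listed next to it.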
Agents are gaining the statefulness and orchestration needed for real multi-step workflows and auditability.
LLMs still struggle with observability and cross-cutting SRE tasks, demanding guardrails and human review.
- Pilot DAG-structured Tasks on a real multi-repo change (build→test→deploy) and measure recovery across agent restarts and sessions.
- Run an OTelBench-style scenario: require end-to-end trace context propagation and gate it with automated trace assertions in CI.
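To make the first pilot concrete, here is a minimal sketch of a build→test→deploy dependency graph whose progress survives a restart. It does not reproduce Claude Code's Tasks implementation or its on-disk format; the `TASKS` graph, the JSON state file, and the `run_task` callback are all hypothetical, chosen only to show the recovery behavior worth measuring.

```python
import json
from graphlib import TopologicalSorter
from pathlib import Path

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASKS = {"build": set(), "test": {"build"}, "deploy": {"test"}}


def run_pipeline(state_file: Path, run_task, tasks=TASKS) -> list[str]:
    """Run tasks in dependency order, skipping those already recorded as done."""
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    executed = []
    for name in TopologicalSorter(tasks).static_order():
        if name in done:
            continue  # completed in an earlier session
        run_task(name)  # may raise; the state file then lists only finished tasks
        done.add(name)
        state_file.write_text(json.dumps(sorted(done)))  # persist after every task
        executed.append(name)
    return executed
```

If `run_task` fails during `test`, the state file contains only `build`, and the next call resumes at `test` rather than starting over. A pilot can count how many tasks are re-executed after a forced restart as a simple recovery metric.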
Legacy codebase integration strategies...
- 01. Version-control ~/.claude/tasks and set CLAUDE_CODE_TASK_LIST_ID in dev/CI to make agent plans auditable without app changes.
- 02. Start by instrumenting one latency-critical path with OpenTelemetry and create golden-trace tests before letting agents modify code.
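A golden-trace test of the kind suggested above can start very small. The sketch below uses only the standard library and assumes spans have already been exported as plain dictionaries with hypothetical keys `name`, `trace_id`, `span_id`, and `parent_id`; a real setup would read them from an OpenTelemetry exporter. The header layout follows the W3C Trace Context `traceparent` format, and the validation is deliberately simplified.

```python
import re

# W3C Trace Context header: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> tuple[str, str]:
    """Return (trace_id, parent_span_id); raise ValueError if the header is malformed."""
    match = TRACEPARENT.match(header)
    # All-zero trace or parent IDs are invalid in the W3C format.
    if not match or set(match.group(2)) == {"0"} or set(match.group(3)) == {"0"}:
        raise ValueError(f"invalid traceparent: {header!r}")
    return match.group(2), match.group(3)


def check_trace(spans: list[dict]) -> list[str]:
    """Return propagation problems; an empty list means the trace is connected."""
    problems = []
    if len({s["trace_id"] for s in spans}) != 1:
        problems.append("spans do not share one trace_id")
    roots = [s for s in spans if s["parent_id"] is None]
    if len(roots) != 1:
        problems.append(f"expected 1 root span, found {len(roots)}")
    span_ids = {s["span_id"] for s in spans}
    for s in spans:
        if s["parent_id"] is not None and s["parent_id"] not in span_ids:
            problems.append(f"{s['name']}: parent {s['parent_id']} not in trace")
    return problems
```

In CI, a test would assert `check_trace(spans) == []` for the instrumented path, so that an agent edit which drops the context header between services fails the build instead of silently orphaning spans.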
Fresh architecture paradigms...
- 01. Design repos for agentability: small scripts, explicit tool wrappers, and standardized tracing middleware from day one.
- 02. Prefer MCP-integrated tools and a single task list per feature branch to reduce coordination tax and context thrash.
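One way to get "a single task list per feature branch" is to derive the value of CLAUDE_CODE_TASK_LIST_ID from the branch name. The sketch below is an assumption-heavy illustration: the slug convention and both helper functions are hypothetical, and the sources above do not confirm which ID formats Claude Code accepts.

```python
import os
import re


def task_list_id_for_branch(branch: str) -> str:
    """Derive a stable, filesystem-safe ID from a branch name (hypothetical convention)."""
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    if not slug:
        raise ValueError("branch name has no usable characters")
    return slug


def agent_env(branch: str, base=None) -> dict:
    """Copy an environment mapping and set CLAUDE_CODE_TASK_LIST_ID for one branch."""
    env = dict(os.environ if base is None else base)
    env["CLAUDE_CODE_TASK_LIST_ID"] = task_list_id_for_branch(branch)
    return env
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when a dev script or CI job launches the agent, so every session on the same branch shares one list.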