PYTHON PUB_DATE: 2026.03.06

AGENTIC MANUAL TESTING PATTERNS FOR CODING AGENTS

Have coding agents execute and manually test the code they write, using quick scripts and API exploration, to catch real-world failures that unit tests miss.

Simon Willison’s agentic manual testing guide argues that coding agents should run the code they generate, not just rely on unit tests. Tests help, but real behavior can still break, so agents should verify outcomes by exercising the system directly.
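A minimal sketch of the kind of direct check meant here. The `parse_price` helper is hypothetical, standing in for code an agent just wrote; the probe loop is what the agent might run in seconds before trusting the unit tests.

```python
# Hypothetical helper standing in for freshly written code.
def parse_price(text):
    """Parse a price like '$1,234.56' into integer cents; None if malformed."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    try:
        return round(float(cleaned) * 100)
    except ValueError:
        return None

# Edge cases a unit-test author might not have thought of yet.
for case in ["$1,234.56", "0", "", "  $9.99 ", "abc"]:
    print(repr(case), "->", parse_price(case))
```

The same loop collapses naturally into a one-line `python -c` invocation, which is what makes it cheap enough to run on every change.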

He outlines practical moves: use python -c to probe edge cases, write throwaway demos in /tmp, and spin up a dev server to explore JSON endpoints with curl. Encourage the agent to “explore” an API surface to cover more paths and surface gaps before merge.
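The dev-server move can be sketched end to end. Everything below (the handler, routes, and payloads) is a self-contained stand-in so the sketch runs anywhere; in practice the agent would start your real app and probe its routes with curl instead.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in dev server so this sketch is self-contained.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"path": self.path, "items": []}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

# "Explore" the API surface: walk several paths, including ones that may not
# exist, and inspect status codes and payload shapes.
for path in ["/api/items", "/api/items?page=2", "/api/nonexistent"]:
    with urllib.request.urlopen(base + path) as resp:
        print(resp.status, path, "->", json.loads(resp.read()))
```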

[ WHY_IT_MATTERS ]
01.

Manual exercise by agents exposes integration and runtime issues that slip past unit tests.

02.

Quick, scriptable checks shorten feedback loops and reduce back-and-forth debugging.

[ WHAT_TO_TEST ]
  • Have agents run python -c snippets and curl calls as part of PR validation for new endpoints.

  • Track agent-found failures and fold them into regression tests to harden coverage.
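One way that fold-in might look. The `slug` helper is hypothetical: imagine an agent's `python -c 'print(slug(""))'` probe originally crashed on the empty string, and the fix is now pinned by regression tests.

```python
import unittest

# Hypothetical helper; an earlier version crashed on empty input, which an
# agent's manual probe caught before any unit test did.
def slug(title):
    return "-".join(title.lower().split()) or "untitled"

class AgentFoundRegressions(unittest.TestCase):
    """Failures first caught by agent-run manual probes, hardened into tests."""

    def test_empty_title_gets_fallback(self):
        self.assertEqual(slug(""), "untitled")

    def test_whitespace_only_title_gets_fallback(self):
        self.assertEqual(slug("   "), "untitled")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(AgentFoundRegressions)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("pinned regressions:", result.testsRun)
```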

[ BROWNFIELD_PERSPECTIVE ]

Strategies for folding these patterns into an existing codebase:

  • 01.

    Add agent-driven smoke scripts that hit existing APIs and edge cases without changing your test harness.

  • 02.

    Use /tmp throwaway demos to validate risky refactors before touching legacy suites.
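A sketch of such a throwaway demo, assuming a hypothetical `normalize_record` function as the refactored code under scrutiny. The demo exercises it in isolation and writes its evidence to a disposable directory, so nothing in the legacy suite is touched.

```python
import json
import pathlib
import tempfile

# Hypothetical function standing in for the risky refactor being validated.
def normalize_record(raw):
    return {"id": int(raw["id"]), "name": raw.get("name", "").strip()}

# Disposable working directory, in the spirit of a /tmp throwaway demo.
demo_dir = pathlib.Path(tempfile.mkdtemp(prefix="refactor-demo-"))
samples = [{"id": "1", "name": " Ada "}, {"id": "2"}]  # includes a missing key
results = [normalize_record(r) for r in samples]

out = demo_dir / "results.json"
out.write_text(json.dumps(results, indent=2))
print("wrote", out, "->", results)
```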

[ GREENFIELD_PERSPECTIVE ]

Practices to build in from the start of a new codebase:

  • 01.

    Bake agent-run python -c and curl probes into your dev containers and CI from day one.

  • 02.

    Adopt a test-first flow but require agent-led manual exploration before merging features.
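A rough sketch of a CI-friendly probe runner for the flow above. The probe list is a placeholder: real projects would add `python -c` snippets that exercise new modules, plus curl calls against the dev server.

```python
import subprocess
import sys

# Placeholder probes; extend with project-specific `python -c` snippets and
# curl calls (the curl entry is commented out because it needs a running server).
probes = [
    [sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"],
    # ["curl", "-sf", "http://localhost:8000/api/items"],
]

failed = False
for cmd in probes:
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.returncode, cmd[-1])
    if result.returncode != 0:
        failed = True

if failed:
    sys.exit(1)
print("all probes passed")
```

Wiring this into CI makes the agent's manual exploration repeatable instead of a one-off, while leaving the existing test suite untouched.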
