howtonotcode.com

Terminal Bench

Service

Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., doing business as DeepSeek, is a Chinese artificial intelligence (AI) company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, DeepSeek is owned and funded by the Chinese hedge fund High-Flyer. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as CEO of both companies. The company launched an eponymous chatbot alongside its DeepSeek-R1 model in January 2025.

3 stories · First seen: 2026-02-09 · Last seen: 2026-02-17 · Website · Wikipedia

Resources

Links to check for updates: homepage, feed, or git repo.

Homepage

Stories

Showing 1-3 of 3

Open-weight "AI engineer" models arrive: Qwen 3.5, GLM-5, MiniMax M2.5

A new wave of open-weight frontier models now rivals closed systems on coding and long-horizon agent tasks, making self-hosted AI-engineer workflows practical for backend and data teams. Alibaba's Qwen 3.5 ships as an open‑weights Mixture‑of‑Experts model (397B total parameters, 17B active) with multimodal input and a 256K context, alongside a hosted Qwen3.5‑Plus variant offering 1M context and built‑in tools; details and early impressions are summarized in Simon Willison's write‑up of the [Qwen 3.5 release](https://simonwillison.net/2026/Feb/17/qwen35/#atom-everything) and the official [Qwen blog](https://qwen.ai/blog?id=qwen3.5). Z.ai's GLM‑5 launched open source with top open-model scores on SWE‑bench‑Verified (77.8) and Terminal Bench 2.0 (56.2), plus advances in long‑context handling and RL‑driven agent training; see the announcement at [BusinessWire](https://www.businesswire.com/news/home/20260215030665/en/GLM-5-Launch-Signals-a-New-Era-in-AI-When-Models-Become-Engineers) and the code in the [GitHub repo](https://github.com/zai-org/GLM-5). MiniMax M2.5 claims state‑of‑the‑art coding/agent performance (e.g., 80.2% on SWE‑Bench Verified) and aggressive pricing and speed on its [Hugging Face card](https://huggingface.co/unsloth/MiniMax-M2.5), while hands‑on videos compare real coding runs between GLM‑5 and M2.5; free models can also be trialed quickly via [OpenRouter's free router](https://openrouter.ai/openrouter/free).

2026-02-17
qwen35-397b-a17b qwen35-plus qwen-chat alibaba-cloud glm-5
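For readers who want to trial these models via OpenRouter as the story suggests, a minimal sketch is below. OpenRouter exposes an OpenAI-compatible chat-completions endpoint; the model ID `openrouter/free` mirrors the slug in the link above and is an assumption — substitute any model ID listed on OpenRouter, and set `OPENROUTER_API_KEY` before making a live call.

```python
import json
import os
import urllib.request

# OpenRouter's OpenAI-compatible chat-completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def trial_model(model: str, prompt: str) -> str:
    """Send a single prompt; requires OPENROUTER_API_KEY in the environment."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-style response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Dry run: print the payload without sending a network request.
    # "openrouter/free" is a placeholder model ID (see lead-in).
    payload = build_chat_request("openrouter/free", "Write a haiku about terminals.")
    print(json.dumps(payload, indent=2))
```

The same payload works against any OpenAI-compatible gateway, which is what makes quick side-by-side trials of these open-weight releases cheap to script.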

Codex 5.3 vs Opus 4.6: agentic speed vs long‑context depth

OpenAI's GPT-5.3 Codex and Anthropic's Claude Opus 4.6 arrive with distinct strengths — Codex favors faster agentic execution while Opus excels at long-context reasoning and consistency — so choose based on workflow fit, not hype. Independent hands-on comparisons report that Codex 5.3 is snappier and stronger at end-to-end coding actions, while Opus 4.6 is more reliable with context and needs less babysitting on routine repo tasks, with benchmark numbers and capability notes outlining the trade-offs in real projects ([Interconnects](https://www.interconnects.ai/p/opus-46-vs-codex-53)[^1], [Tensorlake](https://www.tensorlake.ai/blog/claude-opus-4-6-vs-gpt-5-3-codex)[^2]). Opus adds agent teams, a 1M-token context (beta), and adaptive effort controls, while Codex claims ~25% speed gains and agentic improvements, underscoring a shift toward practical, multi-step workflows ([Elephas](https://elephas.app/resources/claude-opus-4-6-vs-gpt-5-3-codex)[^3]).

[^1]: Adds usability differences from field use: Opus needs less supervision on mundane tasks, while Codex 5.3 has improved but can still misplace or skip files.

[^2]: Adds concrete benchmarks (SWE Bench Pro, Terminal Bench 2.0, OSWorld) and scenario-based comparisons for UI/data workflows.

[^3]: Adds feature deltas (Agent Teams, 1M context, adaptive thinking) and speed claims/timing details across both launches.

2026-02-09
openai anthropic gpt-53-codex claude-opus-46 claude-code

Hands-on: Claude Opus 4.6 nails non‑agentic coding; GPT‑5.3 Codex lacks API

A 48-hour hands-on test found Claude Opus 4.6 delivering flawless non-agentic coding results, while GPT‑5.3 Codex looks strong in benchmarks but still lacks the API access needed for independent validation. In this test run, Opus 4.6 scored 100% across 11 single-shot coding tasks (including 3D layout, SVG composition, and legal-move chess), contradicting popular benchmark narratives, while Codex's results couldn't be reproduced because no API access is available yet, per this report: [I Spent 48 Hours Testing Claude Opus 4.6 & GPT-5.3 Codex](https://medium.com/@info.booststash/i-spent-48-hours-testing-claude-opus-4-6-gpt-5-3-codex-004adc046312)[^1].

[^1]: Adds hands-on results, examples, benchmark context, and a note on GPT‑5.3 Codex API unavailability.

2026-02-07
claude-opus-46 gpt-53-codex anthropic openai terminal-bench