Promptfoo
AI Tool
Promptfoo is an open-source tool for testing and evaluating large-language-model prompts across providers, with automated metrics and regression suites. It helps developers and security teams catch prompt failures, jailbreaks, and other unwanted model behaviors before deploying to production.
7 stories
First: 2026-03-10
Last: 2026-04-29
Stories
Completed digest stories linked to this service.
- Promptfoo joins OpenAI with a practical playbook for evaluating coding agents (2026-04-29): Promptfoo is now part of OpenAI and published a hands-on guide that reframes how to evaluate coding agents in ...
- SWE-bench Verified is out; evals shift to deployment-grounded signals (2026-04-28): OpenAI retired SWE-bench Verified after audit results showed contamination and flawed tests, pushing teams tow...
- Agent evals are now system tests, not model tests (2026-04-25): Coding AI moved from single-shot prompts to agents you must evaluate as full systems. The new [Promptfoo agen...
- Agentic dev is outrunning your tests: here’s how teams are catching up (2026-04-24): Agentic coding is forcing teams to rethink test coverage and evaluation, with new guidance, real workflows, an...
- Sandboxed coding agents: OpenAI updates its Agents SDK, and there’s a clear way ... (2026-04-19): OpenAI’s Agents SDK now includes sandboxing and a model harness, and there’s a practical way to benchmark agen...
- Agentic coding moves from hype to ops: evals, observability, and resilience land... (2026-04-18): A cluster of releases and guides tightens the nuts and bolts of running coding agents in production. Promptfo...
- OpenAI courts OSS maintainers: free Codex/ChatGPT access, Codex Security preview... (2026-03-10): OpenAI is pushing into open-source maintenance and AI security with a support program, a new Codex Security ag...