Promptfoo

AI Tool

Promptfoo is an open-source tool for testing and evaluating large-language-model prompts across providers, with automated metrics and regression suites. It helps developers and security teams catch prompt failures, jailbreaks, and other unwanted model behaviors before deploying to production.

7 stories | First: 2026-03-10 | Last: 2026-06-12

Stories

Completed digest stories linked to this service.

Promptfoo joins OpenAI with a practical playbook for evaluating coding agents

2026-04-29

Promptfoo is now part of OpenAI and published a hands-on guide that reframes how to evaluate coding agents in ...

SWE-bench Verified is out; evals shift to deployment-grounded signals

2026-04-28

OpenAI retired SWE-bench Verified after audit results showed contamination and flawed tests, pushing teams tow...

Agent evals are now system tests, not model tests

2026-04-25

Coding AI moved from single-shot prompts to agents you must evaluate as full systems. The new [Promptfoo agen...

Agentic dev is outrunning your tests: here’s how teams are catching up

2026-04-24

Agentic coding is forcing teams to rethink test coverage and evaluation, with new guidance, real workflows, an...

Sandboxed coding agents: OpenAI updates its Agents SDK, and there’s a clear way ...

2026-04-19

OpenAI’s Agents SDK now includes sandboxing and a model harness, and there’s a practical way to benchmark agen...

Agentic coding moves from hype to ops: evals, observability, and resilience land...

2026-04-18

A cluster of releases and guides tightens the nuts and bolts of running coding agents in production. Promptfo...

OpenAI courts OSS maintainers: free Codex/ChatGPT access, Codex Security preview...

2026-03-10

OpenAI is pushing into open‑source maintenance and AI security with a support program, a new Codex Security ag...