business

BrowseComp

Term

BrowseComp is a benchmark, introduced by OpenAI, that measures how well AI agents built on large language models can locate hard-to-find information by browsing the web. Model developers cite BrowseComp scores alongside SWE-bench and other evaluations as evidence of real-world agentic performance.

1 story | First: 2026-03-04 | Last: 2026-04-24 | Wikipedia

Stories

Completed digest stories linked to this term.

MiniMax-M2.5 launches with SOTA coding claims; verify SWE-bench results

2026-03-04

MiniMax launched MiniMax-M2.5, a fast, low-cost coding and agentic model, but teams should validate its headli...