OPENAI
30 days · UTC
Sandboxed coding agents: OpenAI updates its Agents SDK, and there’s a clear way to evaluate them
OpenAI’s Agents SDK now includes sandboxing and a model harness, and there’s a practical way to benchmark agentic coding and SRE bots. OpenAI shipped...
Agentic coding moves from hype to ops: evals, observability, and resilience land across the stack
A cluster of releases and guides tightens the nuts and bolts of running coding agents in production. Promptfoo’s guide breaks down why agent evals di...
Multi-model AI solidifies around OpenAI-compatible gateways as Mozilla debuts a sovereign client
Teams are coalescing around OpenAI-compatible APIs and multi-model gateways, with a fresh push toward self-hosted, sovereign AI clients. A DEV piece ...
Agents grow up: sandboxed execution and first-class memory land in production stacks
OpenAI and Cloudflare shipped safety and memory primitives that make agentic systems more production-ready. OpenAI upgraded its Agents SDK with sandb...
OpenAI turns Codex into a multi‑agent superapp with background computer control
OpenAI expanded Codex from a coding helper into a multi‑agent, do‑the‑work app with background computer control, a built‑in browser, memory, and autom...
LangChain ships SSRF hardening and safer inputs across libs, plus a timely reminder: chunking can sink your RAG
LangChain shipped SSRF-hardening and safer defaults across core and partner packages, while a new piece stresses production-grade RAG chunking. Core ...
RAG isn’t enough: add a context layer, strict schemas, and data-quality gates
RAG alone breaks under real workloads; you need a context layer, strict output schemas, and data-quality gates to keep LLM apps reliable. A detailed ...
Cloudflare Agent Cloud + Codex: enterprise-ready agents on GPT-5.4, with some early quirks
OpenAI and Cloudflare made it easier to run enterprise-grade coding and workflow agents with GPT-5.4 and Codex, while early users report a few glitche...
Frontier AI crosses into practical offensive capability; vendors move to lock down access and channel it to defense
Independent tests and a new industry initiative signal that frontier models can autonomously hack real targets, and vendors are gating access to use t...
AI agents just got real: autonomy is near, but ops and unit economics will decide who wins
AI agents are moving from flashy demos to production, and the bottlenecks are reliability, orchestration, and unit economics. The big labs are steeri...
Build dependable document QA: production RAG patterns, the right long‑context model, and safer behavior shaping
If you’re shipping document QA, combine a solid RAG spine with model choice tuned for structure and tactics that stabilize behavior. A deep, opiniona...
Codex 0.120 adds background agent streaming; GPT‑5.4 pitched for end‑to‑end coding amid mixed model feedback
OpenAI shipped Codex updates for agents and tooling while positioning GPT‑5.4 for real multi‑step coding work, but some users report reasoning regress...
Before You Migrate to OpenAI’s Responses API, Read This
OpenAI’s new Responses API simplifies agentic workflows, but you give up determinism and tight orchestration control you had with Chat Completions. A...
LangChain Core 1.3.0a1 alpha: faster streaming, safer templates, Bedrock mappings, prompt API deprecations
LangChain released an alpha of langchain-core 1.3.0a1 with streaming performance tweaks, safer templating, Bedrock model mapping, and prompt API depre...
OpenAI reportedly slows o3 rollout over cybersecurity risk; expect tighter gating of advanced model capabilities
OpenAI is reportedly slowing the release of its o3 model over concerns it could materially assist cyberattacks. According to a report, OpenAI’s inter...
Codex 0.119–0.120: Realtime V2 progress streaming, stronger MCP, and sturdier remote/sandbox runs
OpenAI Codex shipped 0.119 and 0.120, adding Realtime V2 progress streaming and major stability fixes for remote and sandboxed workflows. The release...
OpenAI drops ChatGPT Pro to $100 and leans into Codex for power users
OpenAI repositioned ChatGPT Pro at $100 per month with bigger Codex allocations, turning up the heat on Anthropic for developer wallets. According to...
AI security pivots to defense: restricted LLMs, risky code assistants, and practical guardrails
Vendors are shifting from open access to locked-down, defense-first AI as code assistants prove easy to abuse. A report says OpenAI is prepping a res...
OpenAI Python v2.31.0: short‑lived tokens and raw WebSocket streaming land amid logging glitches
OpenAI’s Python SDK v2.31.0 adds short-lived token auth and raw WebSocket streaming, while developers report dashboard logging glitches. The new rele...
OpenAI Agents and Realtime look shiny on paper, but dev threads flag reliability and billing gotchas
OpenAI’s Agents/Realtime docs around GPT-5.4 arrived as community reports flag reliability bugs and billing glitches that complicate production use.
Claude Mythos posts record SWE-bench numbers, but it’s gated; tighten your evals and fix your AI test blind spots
Anthropic’s Claude Mythos preview claims record SWE-bench results, but it isn’t publicly available and public leaderboards don’t reflect it yet. A de...
Agent harnesses, not more agents: how teams are actually getting AI to production
Enterprises are shipping reliable agentic AI by building a hardened “agent harness” and resisting unnecessary multi-agent sprawl. Real deployments st...
OpenAI’s $122B raise signals massive infra buildout while devs still hit rate limits and rough edges
OpenAI reportedly closed a $122B round at an $852B valuation, promising scale while developer pain points still show up in the trenches. Reports say ...