OpenAI Codex-Spark debuts on Cerebras for near-instant agentic coding
OpenAI has launched GPT-5.3-Codex-Spark, a fast, steerable coding model served on Cerebras hardware to deliver near-instant responses for real-time agentic development. Announced jointly with Cerebras as a research preview, Codex-Spark targets live, iterative coding: it generates more than 1,000 tokens/s on the Cerebras Wafer-Scale Engine and is designed to keep developers “in the loop” during agentic work [Cerebras announcement](https://www.cerebras.ai/blog/openai-codexspark).

Independent coverage frames this as OpenAI’s first major inference move beyond Nvidia, positioning Cerebras for ultra-low-latency workloads while acknowledging capability tradeoffs versus the full GPT-5.3-Codex on autonomous engineering benchmarks [VentureBeat](https://venturebeat.com/technology/openai-deploys-cerebras-chips-for-15x-faster-code-generation-in-first-major), with broader speed-focused reporting from [The New Stack](https://thenewstack.io/openais-new-codex-spark-is-optimized-for-speed/).

On the tooling front, the openai/codex v0.99.0 release adds app-server APIs for steering active turns, enterprise controls via requirements.toml (e.g., web search modes, network constraints), improved TUI flows, and concurrent shell command execution, all useful for orchestrating agent runs with more control and safety [GitHub release notes](https://github.com/openai/codex/releases/tag/rust-v0.99.0).

For adoption patterns, a practical guide outlines “agent-first engineering” using Codex CLI/IDE, cloud sandboxes for parallel tasks, an SDK for programmatic control, and GitHub Actions to plug agents into CI/CD with clear definitions of “done” [agentic workflow guide](https://www.gend.co/fr/blog/codex-agent-first-engineering).
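
To put the 1,000 tokens/s figure cited earlier in perspective, the back-of-the-envelope sketch below estimates how long it takes to stream a response of a given size at that rate. The function name `stream_seconds` and the edit sizes are hypothetical, chosen only for illustration. The estimate assumes a steady decode rate and ignores time-to-first-token, network latency, and tool-call overhead, so real interactive latency will be higher.

```python
def stream_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate the seconds needed to stream num_tokens at a steady decode rate.

    Simplifying assumption: ignores time-to-first-token and network latency.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical response sizes; 1,000 tokens/s is the lower bound quoted in the announcement.
RATE = 1_000.0
for label, tokens in [("small patch", 200), ("function rewrite", 1_000), ("file rewrite", 5_000)]:
    print(f"{label}: ~{stream_seconds(tokens, RATE):.1f} s for {tokens} tokens")
```

At this rate even a multi-thousand-token rewrite streams in a few seconds, which is the property that makes interrupting and steering a turn mid-flight practical.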