GENERAL PUB_DATE: 2026.W01

LOCALAI 3.9.0 ADDS AGENT JOBS AND SMARTER GPU MEMORY MANAGEMENT

LocalAI 3.9.0 introduces an Agent Jobs panel and API to schedule background agent tasks (cron, webhooks, MCP) and adds a Smart Memory Reclaimer with LRU model e...

LocalAI 3.9.0 adds Agent Jobs and smarter GPU memory management

LocalAI 3.9.0 introduces an Agent Jobs panel and API to schedule background agent tasks (cron, webhooks, MCP) and adds a Smart Memory Reclaimer with LRU model eviction to prevent OOM by auto-unloading unused models. It also adds MLX and CUDA 13 support, improving compatibility across Apple Silicon and newer NVIDIA stacks. The release focuses on stability and resource efficiency for local multi-model orchestration.

[ WHY_IT_MATTERS ]
01.

Reduces OOM failures and improves reliability for on-prem inference workloads.

02.

Enables scheduled evaluations, reports, and automation without external schedulers.

[ WHAT_TO_TEST ]
  • terminal

    Schedule Agent Jobs via cron and API with webhook callbacks to validate idempotency, retries, and CI/CD integration.

  • terminal

    Stress-test the Memory Reclaimer under concurrent model loads to tune LRU thresholds and measure latency impact.