LOCALAI PUB_DATE: 2025.12.26

LOCALAI 3.9.0 ADDS AGENT JOBS AND SMARTER GPU MEMORY MANAGEMENT

LocalAI 3.9.0 introduces an Agent Jobs panel and API to schedule background agent tasks (cron, webhooks, MCP) and adds a Smart Memory Reclaimer with LRU model eviction to prevent OOM by auto-unloading unused models. It also adds MLX and CUDA 13 support, improving compatibility across Apple Silicon and newer NVIDIA stacks. The release focuses on stability and resource efficiency for local multi-model orchestration.

[ WHY_IT_MATTERS ]
01. Reduces OOM failures and improves reliability for on-prem inference workloads.

02. Enables scheduled evaluations, reports, and automation without external schedulers.

[ WHAT_TO_TEST ]
  • Schedule Agent Jobs via cron and API with webhook callbacks to validate idempotency, retries, and CI/CD integration.

  • Stress-test the Memory Reclaimer under concurrent model loads to tune LRU thresholds and measure latency impact.

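The first test above can be sketched as a small client-side helper that builds an idempotent job payload. Note this is a hedged illustration: the field names, cron format, and idempotency-key convention are assumptions for the sketch, not the documented LocalAI 3.9.0 Agent Jobs schema.

```python
import hashlib
import json

# Hypothetical Agent Jobs payload builder. The field names below are
# assumptions for illustration, not the documented LocalAI API schema.
def build_agent_job(name: str, cron: str, webhook_url: str, task: str) -> dict:
    payload = {
        "name": name,
        "schedule": cron,        # standard 5-field cron expression
        "webhook": webhook_url,  # callback invoked when the job finishes
        "task": task,
    }
    # Derive a deterministic idempotency key from the payload so that a
    # retried submission (e.g. from a CI pipeline) maps to one job, not two.
    canonical = json.dumps(payload, sort_keys=True).encode()
    payload["idempotency_key"] = hashlib.sha256(canonical).hexdigest()[:16]
    return payload

job = build_agent_job(
    name="nightly-eval",
    cron="0 3 * * *",
    webhook_url="http://ci.internal/hooks/eval-done",
    task="run evaluation suite and post a report",
)
# Submitting the same payload twice yields the same idempotency key, which
# a server or a client-side dedupe step can use to drop the retry.
```

Validating retries then reduces to checking that repeated submissions with the same key produce a single scheduled job and a single webhook delivery.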
[ BROWNFIELD_PERSPECTIVE ]

Strategies for integrating with an existing scheduling and inference stack:

  • 01. Map existing Airflow/cron jobs to Agent Jobs via API to avoid duplicate scheduling and ensure clear ownership.

  • 02. Pin CUDA/MLX versions and validate long-running services with LRU eviction to avoid unexpected model unloads.

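The mapping in point 01 can be sketched as a dedupe pass over existing cron entries: each source entry gets a stable name, and anything already present on the Agent Jobs side is skipped. The `migrated-` naming convention and the payload shape are hypothetical, not LocalAI's documented API.

```python
# Sketch of migrating existing cron entries to Agent Jobs without double
# scheduling: each entry is keyed by a stable name, and names already
# present on the target side are skipped so ownership stays unambiguous.
def plan_migration(cron_entries, existing_job_names):
    plans = []
    for entry in cron_entries:
        name = f"migrated-{entry['id']}"  # stable ownership marker (assumed convention)
        if name in existing_job_names:
            continue  # already migrated; avoid a duplicate schedule
        plans.append({"name": name, "schedule": entry["cron"], "task": entry["cmd"]})
    return plans

cron_entries = [
    {"id": "report", "cron": "0 6 * * *", "cmd": "generate daily report"},
    {"id": "evals", "cron": "*/30 * * * *", "cmd": "run smoke evals"},
]
# "migrated-report" already exists on the target, so only the evals
# entry ends up in the migration plan.
plans = plan_migration(cron_entries, existing_job_names={"migrated-report"})
```

Running the pass repeatedly is then safe: once a job exists under its migrated name, subsequent runs plan nothing for it.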
[ GREENFIELD_PERSPECTIVE ]

Patterns for building on LocalAI from the start:

  • 01. Use LocalAI as the local inference orchestrator, wiring Agent Jobs + webhooks into pipeline triggers from day one.

  • 02. Design deployments around modest VRAM by leveraging LRU eviction and threshold tuning to maximize model concurrency.
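The LRU behavior that point 02 designs around can be reasoned about with a small simulation. This models the general technique of budget-driven least-recently-used eviction, not LocalAI's exact Smart Memory Reclaimer; the budget and model sizes are illustrative.

```python
from collections import OrderedDict

# Minimal LRU model cache with a VRAM budget: when loading a model would
# exceed the budget, the least-recently-used models are unloaded first.
class VramLru:
    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        self.loaded = OrderedDict()  # name -> size_mb, oldest first

    def use(self, name: str, size_mb: int) -> list:
        """Load or touch a model; return the names evicted to make room."""
        evicted = []
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return evicted
        # Evict oldest entries until the new model fits under the budget.
        while self.loaded and sum(self.loaded.values()) + size_mb > self.budget_mb:
            old, _ = self.loaded.popitem(last=False)
            evicted.append(old)
        self.loaded[name] = size_mb
        return evicted

cache = VramLru(budget_mb=8000)
cache.use("llama-7b", 4500)
cache.use("whisper", 1500)
cache.use("llama-7b", 4500)        # touch: whisper becomes least recently used
evicted = cache.use("sdxl", 3500)  # over budget -> evicts whisper first
```

Tuning then means picking a budget (threshold) so hot models stay resident while cold ones yield VRAM, and measuring the reload latency cost of each eviction under a realistic access pattern.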