LOCALAI PUB_DATE: 2025.12.26

LOCALAI 3.9.0 ADDS AGENT JOBS AND SMARTER GPU MEMORY MANAGEMENT

LocalAI 3.9.0 introduces an Agent Jobs panel and API to schedule background agent tasks (cron, webhooks, MCP) and adds a Smart Memory Reclaimer with LRU model eviction to prevent OOM by auto-unloading unused models. It also adds MLX and CUDA 13 support, improving compatibility across Apple Silicon and newer NVIDIA stacks. The release focuses on stability and resource efficiency for local multi-model orchestration.

[ WHY_IT_MATTERS ]
01. Reduces OOM failures and improves reliability for on-prem inference workloads.

02. Enables scheduled evaluations, reports, and automation without external schedulers.

[ WHAT_TO_TEST ]
  • Schedule Agent Jobs via cron and API with webhook callbacks to validate idempotency, retries, and CI/CD integration.

  • Stress-test the Memory Reclaimer under concurrent model loads to tune LRU thresholds and measure latency impact.

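The first test above can be sketched as a small client-side helper that builds an idempotent job payload. Note this is a hedged illustration: the field names, cron format, and idempotency-key convention are assumptions for the sketch, not the documented LocalAI 3.9.0 Agent Jobs schema.

```python
import hashlib
import json

# Hypothetical Agent Jobs payload builder. The field names below are
# assumptions for illustration, not the documented LocalAI API schema.
def build_agent_job(name: str, cron: str, webhook_url: str, task: str) -> dict:
    payload = {
        "name": name,
        "schedule": cron,        # standard 5-field cron expression
        "webhook": webhook_url,  # callback invoked when the job finishes
        "task": task,
    }
    # Derive a deterministic idempotency key from the payload so that a
    # retried submission (e.g. from a CI pipeline) maps to one job, not two.
    canonical = json.dumps(payload, sort_keys=True).encode()
    payload["idempotency_key"] = hashlib.sha256(canonical).hexdigest()[:16]
    return payload

job = build_agent_job(
    name="nightly-eval",
    cron="0 3 * * *",
    webhook_url="http://ci.internal/hooks/eval-done",
    task="run evaluation suite and post a report",
)
# Submitting the same payload twice yields the same idempotency key, which
# a server or a client-side dedupe step can use to drop the retry.
```

Validating retries then reduces to checking that repeated submissions with the same key produce a single scheduled job and a single webhook delivery.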
[ BROWNFIELD_PERSPECTIVE ]

Strategies for integrating with an existing scheduling and inference stack:

  • 01. Map existing Airflow/cron jobs to Agent Jobs via API to avoid duplicate scheduling and ensure clear ownership.

  • 02. Pin CUDA/MLX versions and validate long-running services with LRU eviction to avoid unexpected model unloads.

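The mapping in point 01 can be sketched as a dedupe pass over existing cron entries: each source entry gets a stable name, and anything already present on the Agent Jobs side is skipped. The `migrated-` naming convention and the payload shape are hypothetical, not LocalAI's documented API.

```python
# Sketch of migrating existing cron entries to Agent Jobs without double
# scheduling: each entry is keyed by a stable name, and names already
# present on the target side are skipped so ownership stays unambiguous.
def plan_migration(cron_entries, existing_job_names):
    plans = []
    for entry in cron_entries:
        name = f"migrated-{entry['id']}"  # stable ownership marker (assumed convention)
        if name in existing_job_names:
            continue  # already migrated; avoid a duplicate schedule
        plans.append({"name": name, "schedule": entry["cron"], "task": entry["cmd"]})
    return plans

cron_entries = [
    {"id": "report", "cron": "0 6 * * *", "cmd": "generate daily report"},
    {"id": "evals", "cron": "*/30 * * * *", "cmd": "run smoke evals"},
]
# "migrated-report" already exists on the target, so only the evals
# entry ends up in the migration plan.
plans = plan_migration(cron_entries, existing_job_names={"migrated-report"})
```

Running the pass repeatedly is then safe: once a job exists under its migrated name, subsequent runs plan nothing for it.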
[ GREENFIELD_PERSPECTIVE ]

Patterns for building on LocalAI from the start:

  • 01. Use LocalAI as the local inference orchestrator, wiring Agent Jobs + webhooks into pipeline triggers from day one.

  • 02. Design deployments around modest VRAM by leveraging LRU eviction and threshold tuning to maximize model concurrency.
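The LRU behavior that point 02 designs around can be reasoned about with a small simulation. This models the general technique of budget-driven least-recently-used eviction, not LocalAI's exact Smart Memory Reclaimer; the budget and model sizes are illustrative.

```python
from collections import OrderedDict

# Minimal LRU model cache with a VRAM budget: when loading a model would
# exceed the budget, the least-recently-used models are unloaded first.
class VramLru:
    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        self.loaded = OrderedDict()  # name -> size_mb, oldest first

    def use(self, name: str, size_mb: int) -> list:
        """Load or touch a model; return the names evicted to make room."""
        evicted = []
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return evicted
        # Evict oldest entries until the new model fits under the budget.
        while self.loaded and sum(self.loaded.values()) + size_mb > self.budget_mb:
            old, _ = self.loaded.popitem(last=False)
            evicted.append(old)
        self.loaded[name] = size_mb
        return evicted

cache = VramLru(budget_mb=8000)
cache.use("llama-7b", 4500)
cache.use("whisper", 1500)
cache.use("llama-7b", 4500)        # touch: whisper becomes least recently used
evicted = cache.use("sdxl", 3500)  # over budget -> evicts whisper first
```

Tuning then means picking a budget (threshold) so hot models stay resident while cold ones yield VRAM, and measuring the reload latency cost of each eviction under a realistic access pattern.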