pymupdf4llm

Repo

pymupdf4llm is an open-source Python library that converts PDF documents into structured, chunked data optimized for retrieval-augmented generation (RAG) and other large-language-model workflows. It builds on the PyMuPDF library to provide geometry- and layout-aware JSON output for downstream AI pipelines.

1 story | First: 2026-01-06 | Last: 2026-01-06

Stories

Completed digest stories linked to this service.

Structured PDF extractor for RAG claims ~300 pages/s on CPU

2026-01-06

A new C-based PDF extractor with Python bindings outputs structured JSON (geometry, typography, headings) and ...