pymupdf4llm
pymupdf4llm is an open-source Python library that converts PDF documents into structured, chunked data optimized for retrieval-augmented generation and other large-language-model workflows. It builds on the PyMuPDF renderer to provide geometry- and layout-aware JSON output for downstream AI pipelines.
1 story
First: 2026-01-06
Last: 2026-01-06
Stories
Completed digest stories linked to this service.
- Structured PDF extractor for RAG claims ~300 pages/s on CPU (2026-01-06)
  A new C-based PDF extractor with Python bindings outputs structured JSON (geometry, typography, headings) and ...