The Best Local OCR Models: Tests and Comparison on Real Documents

Published: June 3, 2026
Channel: ServerFlow AI Lab - R&D in AI and LLMs

In this video, we test specialized OCR models and a VLM on recognizing text in documents, scans, PDFs, tables, mathematical formulas, and handwriting. We compare PaddleOCR-VL, MinerU, GLM-OCR, Chandra OCR 2, olmOCR 2, and Qwen3.6 35B.

We examine how well the models preserve document structure, reading order, Markdown markup, tables, and formulas. We pay special attention to complex cases where OCR accuracy, recognition quality, and avoiding the loss of key data are crucial.

Finally, we compare the results on text accuracy, character error rate (CER), and unit-test pass rate to determine which models are best suited for OCR, RAG pipelines, document processing, and text extraction.
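For readers unfamiliar with CER: it is usually computed as the character-level edit distance (substitutions + deletions + insertions) between the model output and a reference transcript, divided by the reference length. The sketch below is a minimal illustration under that standard definition; the video does not state how its CER was normalized (for example, whether whitespace or Markdown markup was stripped first), so the helper names `levenshtein` and `cer` are hypothetical and not the authors' evaluation code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length (assumed plain, un-normalized text)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One dropped character in a 13-character reference gives CER = 1/13.
    print(cer("invoice total", "invoce total"))
```

Lower is better: a CER of 0 means the output matches the reference exactly. Because it counts every character equally, CER alone can hide structural errors such as broken tables or wrong reading order, which is why structure-aware checks like unit tests are useful alongside it.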

Telegram - https://t.me/serverflowofficial
ServerFlow Blog - https://serverflow.ru/blog/

00:00 - Introduction
02:08 - Model 1. MinerU2.5-Pro-1.2B
02:54 - Model 2. PaddleOCR-VL-1.5
03:37 - Model 3. GLM-OCR
04:16 - Model 4. Chandra-OCR-2
05:00 - Model 5. olmOCR-2-7B
05:28 - Model 6. Qwen3.6-35B
06:03 - Overall Analysis of Model Results
06:16 - Analysis of Test 1 Results
06:22 - Analysis of Test 2 Results
06:34 - Analysis of Test 3 Results
06:42 - Analysis of Test 4 Results
06:51 - Analysis of Test 5 Results
07:04 - Analysis of Test 6 Results
07:31 - Summary