DeepSeek-OCR: The AI That Makes Images Cheaper Than Text

Published: 21 May 2026
Channel: GeoTech Pulse

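Explains DeepSeek-OCR, a model in which a page rendered as an image can be encoded with fewer vision tokens than the text tokens it contains, making visual input computationally cheaper than raw text.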
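The "cheaper than text" claim comes down to a token ratio: how many text tokens a page contains versus how many vision tokens the encoder spends on the rendered image. A minimal sketch of that arithmetic is below. `compression_ratio` is a hypothetical helper, and the 1,000-token page and 100-token image budget are illustrative assumptions, not measurements.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token; higher means more compression."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Illustrative numbers only (assumed): a page of ~1,000 text tokens
# encoded as 100 vision tokens.
ratio = compression_ratio(text_tokens=1000, vision_tokens=100)
print(f"{ratio:.1f}x")  # 10.0x
```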
Key concepts covered:
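About 10x compression (text tokens per vision token) at roughly 97% decoding precision
Unified end-to-end VLM architecture: DeepEncoder + DeepSeek-3B MoE decoder
Staged encoder: SAM (window attention) for local detail, then CLIP (global attention) for overall layout
Memory solution: 16x token downsampling before global attention, so the costly all-to-all stage sees far fewer tokens
MoE decoder: a 3B-parameter mixture-of-experts model that activates only about 570M parameters per token, giving large-model capacity at small-model cost
Gundam mode: dynamic tiling (local tiles plus a global view) for ultra-high-resolution images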
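The token arithmetic behind the staged encoder and Gundam mode can be sketched as follows. This assumes 16x16-pixel patches, a 16x token reduction before global attention, and, for Gundam mode, a simplified tiling that cuts the page into 640x640 tiles and adds one 1024x1024 global view. The real tiling rules may differ, and all function names here are hypothetical.

```python
import math

PATCH = 16     # assumed patch size in pixels
COMPRESS = 16  # assumed token reduction before the global-attention stage


def patch_tokens(width: int, height: int, patch: int = PATCH) -> int:
    """Tokens seen by the local, window-attention stage: one per patch."""
    return (width // patch) * (height // patch)


def vision_tokens(width: int, height: int) -> int:
    """Tokens left for the global-attention stage after 16x downsampling."""
    return patch_tokens(width, height) // COMPRESS


def attention_pairs(n_tokens: int) -> int:
    """Global attention scores every token against every token: n^2 pairs."""
    return n_tokens * n_tokens


def gundam_tokens(width: int, height: int, tile: int = 640, global_view: int = 1024) -> int:
    """Hypothetical Gundam-style budget: 640x640 local tiles plus one global view."""
    n_tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return n_tiles * vision_tokens(tile, tile) + vision_tokens(global_view, global_view)


print(patch_tokens(1024, 1024))                       # 4096
print(vision_tokens(1024, 1024))                      # 256
print(attention_pairs(4096) // attention_pairs(256))  # 256
print(gundam_tokens(1920, 1280))                      # 856
```

Because global attention cost grows with the square of the token count, cutting 4,096 patch tokens to 256 shrinks the number of attention pairs by 256x, which is the memory saving the staged design relies on.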
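The "large model power, small model efficiency" trade-off of a mixture-of-experts decoder comes from routing: the model holds many experts, but each token runs only a few of them. A toy top-k router in plain Python illustrates the idea. It is a generic sketch, not DeepSeek's actual router, and the experts are trivial scaling functions chosen for readability.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(x, router_weights, experts, k=2):
    """Route vector x to the top-k experts and mix their outputs.

    router_weights: one weight vector per expert (hypothetical linear router).
    experts: callables mapping list[float] -> list[float].
    Returns (output, indices of the experts that actually ran).
    """
    # Router: one score per expert, a dot product with the input.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Keep only the k highest-scoring experts; the others are never called.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * y_j for o, y_j in zip(out, y)]
    return out, top


# Toy setup: 8 experts, expert i scales its input by (i + 1).
x = [1.0, 1.0]
router_weights = [[float(i), 0.0] for i in range(8)]
experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(8)]
out, top = moe_forward(x, router_weights, experts, k=2)
print(top)               # [7, 6]
print(round(out[0], 4))  # 7.7311
```

Here 8 experts exist but only 2 execute per input, so per-token compute scales with k rather than with the total number of experts.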