This video demonstrates an end-to-end Image Text Extraction & Quality Reasoning Pipeline built using OCR, classical computer vision, and LLM-based reasoning.
The system extracts text from images, computes interpretable visual quality features (blur, brightness, edge density), and uses a lightweight LLM to reason over these structured features and generate clean, explainable JSON outputs.
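As a rough illustration of the three quality features, here is a minimal sketch. The project itself uses OpenCV, where cv2.Laplacian and cv2.Canny would be the usual tools. This version swaps in plain NumPy so it runs without extra dependencies. The function name, the feature keys, and the edge threshold of 50 are assumptions for illustration, not values taken from the repository.

```python
import numpy as np

def quality_features(gray, edge_threshold=50.0):
    """Compute three quality features for a 2-D grayscale array (values 0-255).

    Hypothetical helper: names and threshold are illustrative assumptions.
    """
    g = gray.astype(np.float64)

    # 4-neighbour Laplacian on interior pixels; a low variance suggests blur.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])

    # Gradient magnitude from finite differences, used as a simple edge map.
    gy, gx = np.gradient(g)
    magnitude = np.hypot(gx, gy)

    return {
        "blur_score": float(lap.var()),    # higher means sharper
        "brightness": float(g.mean()),     # mean intensity, 0-255
        "edge_density": float((magnitude > edge_threshold).mean()),  # fraction of edge pixels
    }
```

Each value is a plain float, so the result can be serialized to JSON and passed to the LLM as-is.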
A key design choice is that the LLM never sees the image directly. It operates only on the OCR text and the structured features, which improves interpretability and control and keeps the pipeline robust under real-world API constraints.
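A sketch of what "structured features in, JSON out" can look like is shown here. The prompt wording, the output keys ("quality", "explanation"), and both function names are assumptions for illustration. The actual Gemini call is deliberately left out, since the client code is not part of this description.

```python
import json

def build_prompt(ocr_text, features):
    """Build a text-only prompt; no image data is included (hypothetical helper)."""
    payload = {"ocr_text": ocr_text, "features": features}
    return (
        "You are given OCR output and image quality features. "
        "Respond with a single JSON object with keys "
        '"quality" and "explanation".\n\n'
        "INPUT:\n" + json.dumps(payload, indent=2, sort_keys=True)
    )

def parse_reply(reply):
    """Parse the model reply as JSON, tolerating a Markdown code fence around it."""
    text = reply.strip()
    if text.startswith("```"):
        # Drop the opening fence line (for example ```json) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("```", 1)[0]
    return json.loads(text)
```

Because the prompt contains only text and numbers, the same request can be logged, replayed, or sent to a different model without touching the image.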
🧠 What This Project Covers
OCR-based text extraction from images
Classical computer vision feature engineering
Structured LLM prompting and reasoning
JSON-based, machine-readable outputs
Real-world constraints like API quotas and rate limits
Honest discussion of limitations and future improvements
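For the quota and rate-limit point above, one common way to cope is retrying with exponential backoff. This is a generic sketch and not necessarily what the project does. RateLimitError is a hypothetical stand-in for whatever exception the LLM client raises, and the attempt count and delays are arbitrary.

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the quota error raised by an LLM client."""

def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with delays of 1x, 2x, 4x base_delay."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

The sleep function is a parameter so the retry logic can be exercised without actually waiting.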
🛠 Tech Stack
Python
OpenCV
Tesseract OCR
Google Gemini (Flash)
Jupyter Notebook
📂 GitHub Repository
🔗 https://github.com/Nihal108-bi/Image-...
🎯 Why This Matters
This project focuses on engineering clarity over model size, demonstrating how to combine deterministic ML techniques with LLM reasoning in a production-oriented, explainable way.