Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers | Paper Presentation

Published: 17 June 2026
Channel: Bhavesh Goyal

"Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers"
Authors: Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
Affiliations: SketchX, CVSSP, University of Surrey & iFlyTek-Surrey Joint Research Centre on AI

📖 Abstract
In this paper, the authors explore how text-to-image diffusion models, which have achieved state-of-the-art results in generative tasks, can be repurposed for zero-shot sketch-based image retrieval (ZS-SBIR), a challenging task in which a hand-drawn sketch is used as the query to retrieve matching real-life photos. They report that, without any task-specific training, diffusion models can outperform previous state-of-the-art ZS-SBIR methods by producing highly representative embeddings of both sketches and images.
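
To make the retrieval step concrete, here is a minimal Python sketch of how a query sketch could be matched against a photo gallery once both have been turned into embeddings. It is an illustration only, not the authors' implementation: the function names `cosine_similarity` and `retrieve` are hypothetical, the three-dimensional vectors are made-up toy values, and no diffusion model is run here. In practice the embeddings would come from a feature extractor such as the one described in the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(sketch_embedding, photo_embeddings, top_k=3):
    """Rank gallery photos by similarity to the query sketch embedding."""
    scored = [
        (name, cosine_similarity(sketch_embedding, emb))
        for name, emb in photo_embeddings.items()
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy, hand-made embeddings (hypothetical values, not taken from the paper)
gallery = {
    "cat_photo": [0.9, 0.1, 0.0],
    "car_photo": [0.0, 0.2, 0.9],
    "dog_photo": [0.7, 0.6, 0.1],
}
sketch_of_cat = [1.0, 0.2, 0.0]

ranking = retrieve(sketch_of_cat, gallery, top_k=2)
print([name for name, _ in ranking])  # ['cat_photo', 'dog_photo']
```

Cosine similarity is used here as an assumption because it is a common choice for comparing embeddings; the paper may use a different distance or training objective.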

🧠 Key Concepts Covered:
1. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR)
2. Text-to-Image Diffusion Models
3. Image & Sketch Encoding via CLIP
4. Cross-modal Embedding Alignment
5. Diffusion Model's Transferability
6. Comparison with VGG and other baselines

🎓 Presented by Group 24
Contributors: Aahan Rupal, Bhavesh Goyal, Keval Patel, Nikhil Garg, Lakshya Godara, and Aditi Singh