"Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers"
Authors: Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
Affiliations: SketchX, CVSSP, University of Surrey & iFlyTek-Surrey Joint Research Centre on AI
📖 Abstract
In this paper, the authors explore how text-to-image diffusion models, which achieve state-of-the-art results in generative tasks, can be repurposed for zero-shot sketch-based image retrieval (ZS-SBIR), a challenging task in which a hand-drawn sketch is used to retrieve matching real-world photos. They show that, without any task-specific training, diffusion models can outperform previous state-of-the-art ZS-SBIR methods by producing highly representative embeddings for both sketches and photos.
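At query time, ZS-SBIR comes down to ranking gallery photos by how close their embeddings are to the embedding of the query sketch. In the zero-shot setting, those photos belong to categories never seen during training, so the ranking depends entirely on how well the embeddings generalize. The sketch below is a minimal illustration under stated assumptions: the embeddings are taken as already extracted (the toy vectors are hypothetical stand-ins for diffusion-model features), and cosine similarity is used as the matching score, which is a common choice rather than a detail confirmed by the summary above.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(
    sketch_emb: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Rank gallery photos by similarity to the sketch embedding (highest first)."""
    scored = [(photo_id, cosine_similarity(sketch_emb, emb)) for photo_id, emb in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Hypothetical toy embeddings standing in for features extracted by the model.
gallery = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.2, 0.8, 0.1],
    "car_photo": [0.0, 0.1, 0.95],
}
sketch = [0.85, 0.2, 0.05]  # hypothetical embedding of a cat sketch
print(retrieve(sketch, gallery, top_k=2))
```

Because both modalities live in one shared embedding space, the same ranking function serves any sketch query. Swapping in a different similarity measure only requires changing `cosine_similarity`, a hypothetical helper introduced here for illustration.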
🧠 Key Concepts Covered:
1. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR)
2. Text-to-Image Diffusion Models
3. Image & Sketch Encoding via CLIP
4. Cross-Modal Embedding Alignment
5. Transferability of Diffusion Models
6. Comparison with VGG and Other Baselines
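Cross-modal embedding alignment means pulling a sketch's embedding toward its matching photo and pushing it away from non-matching photos. A widely used objective for this in sketch-based retrieval is the triplet loss, max(0, d(a, p) - d(a, n) + margin), where the sketch is the anchor a, the matching photo is the positive p, and a non-matching photo is the negative n. The sketch below is a minimal, assumption-laden illustration: it uses Euclidean distance, a hypothetical default margin of 0.2, and toy vectors. It does not claim to reproduce the paper's exact training objective.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # hypothetical margin, chosen only for illustration
) -> float:
    """Hinge loss that is zero once the positive is closer than the negative by `margin`."""
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy example: sketch anchor, matching photo, non-matching photo (hypothetical values).
sketch = [1.0, 0.0]
matching_photo = [0.9, 0.1]
other_photo = [0.0, 1.0]

print(triplet_loss(sketch, matching_photo, other_photo))  # well aligned, so the loss is 0.0
print(triplet_loss(sketch, other_photo, matching_photo))  # misaligned, so the loss is positive
```

The margin keeps the model from settling for positives that are only marginally closer than negatives. Once a triplet satisfies the margin, it contributes no gradient, so training effort concentrates on the pairs that are still confused.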
🎓 Presented by Group 24
Contributors: Aahan Rupal, Bhavesh Goyal, Keval Patel, Nikhil Garg, Lakshya Godara, and Aditi Singh