This video explores an interesting paper examining how easily (in terms of fine-tuning effort) pre-trained visual embeddings can be combined with text captions in BERT for visual question generation. The video covers the authors' approach, applications of vision-language models, and question generation.
Paper Links:
BERT Can See Out of the Box: https://arxiv.org/pdf/2002.10832.pdf
BERT: https://arxiv.org/pdf/1810.04805.pdf
ImageBERT: https://arxiv.org/pdf/2001.07966.pdf
Training QA Models from Synthetic Data: https://arxiv.org/pdf/2002.09599.pdf
AI2 BREAK: / break-mapping-natural-language-questions-t...
Google Street View Panoramas for Language Grounding Tasks: https://ai.googleblog.com/2020/02/enh...
Google Open Images V6 with Localized Narratives: https://ai.googleblog.com/2020/02/enh...
Salesforce Learning Reasoning Paths: https://blog.einstein.ai/learning-to-...
Thanks for Watching! Please Subscribe!