New embedding model: Contextual Document Embeddings

Published: 20 November 2025
Channel: Weaviate vector database

Traditional document embeddings have a significant limitation: they encode documents independently, without considering their context or neighboring documents.

This means they must settle on a single global weighting for terms, which can miss important contextual nuances or overweight terms that occur frequently in a particular dataset. This becomes a problem when the same model embeds documents from different domains or contexts.
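To make the term-weighting point concrete, here is a small sketch using inverse document frequency (IDF) as an analogy. It is not the CDE model; the function name `idf_weights` and both toy corpora are made up for illustration. It only shows that the same word deserves a different weight depending on the corpus it sits in.

```python
import math
from collections import Counter

def idf_weights(corpus):
    """Smoothed inverse document frequency for every term in a corpus."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        # Count each term once per document.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}

# Hypothetical corpora: "patient" appears everywhere in the first, rarely in the second.
medical = [
    "patient reports chest pain",
    "patient denies fever",
    "patient prescribed aspirin",
]
general = [
    "the weather is sunny today",
    "a patient was taken to hospital",
    "stocks fell on monday",
]

w_medical = idf_weights(medical)["patient"]
w_general = idf_weights(general)["patient"]
print(f"'patient' weight in medical corpus: {w_medical:.3f}")
print(f"'patient' weight in general corpus: {w_general:.3f}")
```

In the medical corpus "patient" carries almost no signal, while in the general corpus it is distinctive. A model with one fixed global weighting cannot reflect both cases.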

✨ The Solution: Contextual Document Embeddings (CDE) ✨

CDE operates in two stages:
1️⃣ Adversarial contrastive learning: batch and embed related context from neighboring documents
2️⃣ Embed the target document while considering the contextual embeddings of the related document batch
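The two-stage flow above can be sketched with a toy, standard-library-only example. This is not the CDE architecture or its API: `embed_plain`, `embed_context`, and `embed_contextual` are hypothetical helpers, the "encoder" is a hashed bag of words, and the conditioning step (removing the direction shared with the context) is a simple stand-in for what the real model learns.

```python
import hashlib
import math

DIM = 32  # toy embedding size (assumption for this sketch)

def normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def embed_plain(text):
    """Context-free toy embedding: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    return normalize(vec)

def embed_context(context_docs):
    """Stage 1: embed the neighboring documents and summarize them as one vector."""
    vectors = [embed_plain(doc) for doc in context_docs]
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return normalize(mean)

def embed_contextual(text, context_vec):
    """Stage 2: embed the target while taking the context summary into account.

    Here that means removing the component the target shares with the context,
    so terms that are common across the neighborhood contribute less.
    """
    vec = embed_plain(text)
    overlap = sum(a * b for a, b in zip(vec, context_vec))
    return normalize([a - overlap * b for a, b in zip(vec, context_vec)])

context_docs = [
    "patient denies fever",
    "patient prescribed aspirin",
    "patient discharged after observation",
]
target = "patient reports chest pain"

context_vec = embed_context(context_docs)          # stage 1
doc_vec = embed_contextual(target, context_vec)    # stage 2
plain_vec = embed_plain(target)                    # context-free baseline

print(len(doc_vec), doc_vec != plain_vec)
```

Note that the contextual vector has the same dimension as the context-free one, so in this sketch the index still stores one fixed-size vector per document and search remains an ordinary vector comparison.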

CDE can:
Improve performance in domain-specific scenarios
Handle out-of-domain queries better

It also has these benefits:
No additional storage requirements during retrieval
Fast search capabilities are maintained

The approach has achieved state-of-the-art results on the MTEB benchmark: https://huggingface.co/spaces/mteb/le...

Want to dive deeper? Check out the full research paper: https://arxiv.org/abs/2410.02525
Or try it out with this notebook: https://github.com/weaviate/recipes/b...


▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT WITH US ▬▬▬▬▬▬▬▬▬▬▬▬

Visit http://weaviate.io/
Star us on GitHub https://github.com/weaviate/weaviate
Stay updated and subscribe to our newsletter: https://newsletter.weaviate.io/
Try out Weaviate Cloud for free here: https://console.weaviate.cloud/

Got a question?
Forum: https://forum.weaviate.io/
Slack: https://weaviate.io/slack

Connect with us on
Twitter: weaviate_io
LinkedIn: weaviate-io