The Beep

Top Embedding Model for VectorDB

Unordered Embedding Model List

Alamhanz
Feb 22, 2024

Image credit: alina grubnyak / unsplash.com

A VectorDB is a database of vectors: it indexes our whole collection of vectors and lets us query it by similarity. As long as we can transform an entity (an image, a text, or a sound, for example) into a vector, also called an embedding, we can use a VectorDB on that entity. Now that artificial neural networks run well on capable hardware, turning objects into embeddings is not hard. Below we list some embedding models, with the details worth considering before choosing one for a VectorDB project.
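To make "index and query" concrete, here is a minimal brute-force sketch of what a VectorDB does at its core, assuming cosine similarity as the distance metric. Production VectorDBs use approximate nearest-neighbour indexes to scale; the `TinyVectorIndex` class and its methods are hypothetical and are not the API of any real VectorDB product.

```python
import math
from typing import Dict, List, Tuple


class TinyVectorIndex:
    """Hypothetical brute-force, in-memory stand-in for a VectorDB."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: Dict[str, List[float]] = {}

    @staticmethod
    def _normalize(vec: List[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            raise ValueError("cannot index or query a zero vector")
        return [x / norm for x in vec]

    def add(self, key: str, vec: List[float]) -> None:
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} dims, got {len(vec)}")
        # Store unit-length vectors so a dot product equals cosine similarity.
        self._vectors[key] = self._normalize(vec)

    def query(self, vec: List[float], k: int = 3) -> List[Tuple[str, float]]:
        q = self._normalize(vec)
        # Score every stored vector (brute force: O(n * dim) per query).
        scored = [
            (key, sum(a * b for a, b in zip(q, stored)))
            for key, stored in self._vectors.items()
        ]
        # Highest cosine similarity first; keep the top-k matches.
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


# Toy 3-dimensional "embeddings"; a real model would produce e.g. 768 dims.
index = TinyVectorIndex(dim=3)
index.add("cat", [0.9, 0.1, 0.0])
index.add("dog", [0.8, 0.2, 0.1])
index.add("car", [0.0, 0.1, 0.9])
print(index.query([1.0, 0.0, 0.0], k=2))  # "cat" first, then "dog"
```

Swapping the embedding model only changes where the vectors come from and their dimension; the indexing and querying steps stay the same, which is why any entity that can be embedded can be served by a VectorDB.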

RoBERTa

  • Vector Size: 768

  • Model Size: 476 MB

  • Description: A RoBERTa model trained on data up to December 2022; a natural choice for users already familiar with the BERT model family.

  • Paper: https://arxiv.org/pdf/1907.11692.pdf

  • Use for: General Text Embedding

  • Limitations: Inputs longer than 512 tokens are truncated.

  • Source: https://huggingface.co/olm/olm-roberta-base-dec-2022
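Because the model truncates inputs past 512 tokens, one common workaround is to split long texts into overlapping windows, embed each window, and average the window vectors. The sketch below assumes whitespace-separated words as a stand-in for RoBERTa's subword tokens (a real pipeline should count tokens with the model's own tokenizer) and assumes mean pooling is acceptable for your retrieval task. `chunk_tokens` and `mean_pool` are hypothetical helpers, not part of the model release.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_len: int = 510, stride: int = 128) -> List[List[str]]:
    """Split tokens into overlapping windows of at most max_len tokens.

    max_len defaults to 510 to leave room for RoBERTa's <s> and </s>
    special tokens inside the 512-token limit; consecutive windows
    share `stride` tokens of overlap.
    """
    if max_len <= 0 or not 0 <= stride < max_len:
        raise ValueError("need max_len > 0 and 0 <= stride < max_len")
    if not tokens:
        return []
    step = max_len - stride
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):  # last window reached the end
            break
        start += step
    return chunks


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Average per-window embeddings into one document vector."""
    if not vectors:
        raise ValueError("no vectors to pool")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Whitespace "tokens" stand in for real RoBERTa tokens here (assumption).
tokens = ("word " * 1000).split()
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])  # 3 [510, 510, 236]

# Each chunk would be embedded by the model; toy 2-dim vectors shown instead.
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Storing each window as its own vector instead of pooling is another common choice: individual passages stay retrievable, at the cost of more rows in the VectorDB. Either way, the vector size drives storage; if stored as float32, each 768-dimensional vector takes 768 × 4 = 3,072 bytes before any index overhead.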

© 2025 Andreas Chandra and Alamsyah Hanz