A VectorDB is a database of vectors: it indexes a collection of vectors and lets us query it. As long as we can transform an entity, such as an image, text, or sound, into a vector (an embedding), we can use a VectorDB on it. Today, with artificial neural networks in wide use and capable hardware readily available, transforming objects into embeddings is not hard. Here we list some embedding models, with details to consider before choosing one for a VectorDB project.
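To make "index and query" concrete, here is a minimal sketch of the idea in plain Python. The `TinyVectorIndex` class and its method names are hypothetical and written only for illustration; a real VectorDB uses approximate nearest-neighbor indexes rather than the brute-force scan shown here, and the three-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


class TinyVectorIndex:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        """Store a vector under an identifier."""
        self._items.append((item_id, list(vector)))

    def query(self, vector, top_k=1):
        """Return the ids of the top_k stored vectors most similar to `vector`."""
        scored = [
            (cosine_similarity(vector, stored), item_id)
            for item_id, stored in self._items
        ]
        # Brute-force ranking: highest similarity first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item_id for _, item_id in scored[:top_k]]


# Toy 3-dimensional "embeddings" (made up for this example).
index = TinyVectorIndex()
index.add("cat", [1.0, 0.0, 0.1])
index.add("dog", [0.9, 0.1, 0.2])
index.add("car", [0.0, 1.0, 0.0])

nearest = index.query([1.0, 0.0, 0.0], top_k=2)
print(nearest)
```

Whatever produces the vectors, the query side stays the same: embed the query with the same model used for the stored items, then rank by similarity.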
RoBERTa
Vector Size: 768
Model Size: 476 MB
Description: A RoBERTa model trained on data up to December 2022, for users who are familiar with the BERT model family.
Use for: General Text Embedding
Limitations: Text longer than 512 tokens is truncated, so anything past that limit does not affect the embedding.
Source: https://huggingface.co/olm/olm-roberta-base-dec-2022
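The 512-token limit matters when indexing long documents. The sketch below illustrates the effect under a loud assumption: it splits on whitespace as a stand-in for the model's real subword tokenizer, so the counts will not match the actual model. The `truncate` helper is hypothetical and exists only to show the behavior.

```python
MAX_TOKENS = 512  # limit taken from the listing above


def truncate(text, max_tokens=MAX_TOKENS):
    """Keep only the first max_tokens tokens of the text.

    Assumption: whitespace splitting approximates tokenization here;
    the real model uses a subword tokenizer with different counts.
    """
    tokens = text.split()
    return tokens[:max_tokens]


# Two long texts that differ only after the limit.
shared_prefix = " ".join(["word"] * 600)
doc_a = shared_prefix + " ending A"
doc_b = shared_prefix + " ending B"

kept_a = truncate(doc_a)
kept_b = truncate(doc_b)

print(len(kept_a))
print(kept_a == kept_b)
```

Because both documents look identical to the model after truncation, they would receive the same embedding and be indistinguishable in the VectorDB.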