Vector databases power recommendation and search engines, but they are ultimately still data science products, which means they require constant iteration, analysis and feedback to improve. However, modern vector databases are not easy to experiment with.
Typical Vector Database Workflow
Vector databases are, at their core, designed for production use, without first enabling developers to test which vectors work best for their purposes.
Vectors require a significant amount of fine-tuning, experimentation and testing.
Vectors/embeddings rarely work as well as we want out of the box. To succeed, they often require careful model selection, fine-tuning to a specific domain, experimentation with different search methods and clever data processing.
Naturally, when you are experimenting with vector databases, you will come across a number of problems:
While these range from model problems to data problems, one thing is clear:
The quality of your embeddings is proportional to the number of experiments you are able to try.
An experimentation-first vector database allows developers to do just that: iterate on their vectors.
Experimentation-first Vector Database Workflow
Through our experience in developing vector-based applications, we understand the current vector search workflow and want to ensure practitioners and researchers are able to use the best model and the best data for search, recommendation and nearest-neighbor identification.
At a high level, a good vector database comprises a few components:
First-class metadata storage with the vectors
Infrastructure to help compare vector search results
Flexible vector search querying to allow you to search across multiple models and multiple fields in one line of code
Good integration with, and support for, traditional search and hybrid search functionality
Support for more complex data structures for your vectors to capture structural complexity
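To make a few of these components concrete, here is a minimal in-memory sketch, not the API of any real vector database. It assumes a hypothetical `Document` record that stores metadata next to its vectors, a hypothetical `"<field>:<model>"` naming scheme for vector keys, a `search` helper that averages cosine similarity over every queried key, and an `overlap` helper that compares two result lists. The model names and toy vectors are made up for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record: metadata is stored alongside the vectors."""
    doc_id: str
    metadata: dict
    # Maps an assumed "<field>:<model>" key to that field's embedding.
    vectors: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(docs, query_vectors, top_k=2):
    """Rank docs by mean cosine similarity over every queried vector key."""
    scored = []
    for doc in docs:
        sims = [
            cosine(query, doc.vectors[key])
            for key, query in query_vectors.items()
            if key in doc.vectors
        ]
        if sims:
            scored.append((sum(sims) / len(sims), doc.doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


def overlap(results_a, results_b):
    """Jaccard overlap between two result lists (1.0 means identical sets)."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Toy data: one text field embedded by two hypothetical models.
docs = [
    Document("d1", {"category": "shoes"},
             {"title:model_a": [1.0, 0.0], "title:model_b": [0.9, 0.1]}),
    Document("d2", {"category": "shoes"},
             {"title:model_a": [0.0, 1.0], "title:model_b": [0.8, 0.2]}),
    Document("d3", {"category": "hats"},
             {"title:model_a": [0.7, 0.7], "title:model_b": [0.0, 1.0]}),
]

# Same query run against each model separately, then against both at once.
results_a = search(docs, {"title:model_a": [1.0, 0.0]})
results_b = search(docs, {"title:model_b": [1.0, 0.0]})
combined = search(docs, {"title:model_a": [1.0, 0.0],
                         "title:model_b": [1.0, 0.0]})

print(results_a, results_b, combined, overlap(results_a, results_b))
```

A low overlap between `results_a` and `results_b` is a signal that the two models disagree on this query and that the results deserve a closer look. Averaging similarities in `search` is only one possible way to combine models and fields; a weighted sum would be an equally reasonable choice.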