Dimensionality reduction

If you have one or more large vectors, you can reduce them to n_components dimensions with a single call. This is useful for ensembling different vectors, and also for interpretation, by projecting them onto a 2D plane or into 3D space.

from relevanceai import Client
client = Client(token=YOUR_ACTIVATION_TOKEN)
ds = client.Dataset("quickstart")

vector_fields = ["text_vector"]
ds.reduce_dims(vector_fields=vector_fields, n_components=3)

The default model used is PCA from scikit-learn.
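
To give a feel for what this reduction does to the vectors, here is a minimal local sketch that uses scikit-learn's PCA directly. It is an illustration only, not the SDK's internal code: the random 768-dimensional vectors are hypothetical stand-ins for the contents of a field such as "text_vector".

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for stored embeddings: 100 vectors of 768 dimensions.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 768))

# Fit PCA and project every vector onto the top 3 principal components.
pca = PCA(n_components=3)
reduced = pca.fit_transform(vectors)

print(reduced.shape)  # (100, 3)
```

Each row of reduced is the 3-dimensional counterpart of the corresponding input vector, which is the shape of data you would plot for interpretation.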

By default, the result is written to the field "{alias}{vector_field}_vector". You can change this with output_field.

ds.reduce_dims(vector_fields=vector_fields, n_components=3, output_field="pca_vector_")

Many other dimensionality reduction algorithms are supported out of the box.

Support for UMAP

https://umap-learn.readthedocs.io/en/latest/

pip install umap-learn

Then select UMAP by passing model="umap" as a string:

# Passing the string "umap" is enough; no UMAP object needs to be built here.
ds.reduce_dims(vector_fields=vector_fields, n_components=3,
               model="umap", output_field="umap_vector_")

Ivis and t-SNE are also supported out of the box. Just pass model="ivis" or model="tsne".

Combine with filters

Just like many functions in Relevance, dimensionality reduction can be combined with filters.

# filters must be defined beforehand as a list of Relevance filters.
ds.reduce_dims(vector_fields=vector_fields, n_components=3,
               output_field="pca_vector_", filters=filters)
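
The call above expects a filters variable that already exists. As a rough sketch, it might be built as below. This assumes Relevance's list-of-dictionaries filter schema with the keys field, filter_type, condition and condition_value; the specific field name and values are hypothetical, so check them against the filter reference before relying on them.

```python
# Hypothetical filter: only reduce documents in which "text_vector" exists.
# The key names follow the filter schema assumed in the paragraph above.
filters = [
    {
        "field": "text_vector",
        "filter_type": "exists",
        "condition": "==",
        "condition_value": " ",
    }
]

# With a dataset handle `ds` as in the earlier snippets, the call would be:
# ds.reduce_dims(vector_fields=["text_vector"], n_components=3,
#                output_field="pca_vector_", filters=filters)
```

Several filter dictionaries can be placed in the same list when more than one condition is needed.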