Text embedding & indexing similarities for fast comparison

Selvan
2 min read · May 20, 2023


Text embeddings help us measure the relatedness of pieces of text (words, sentences, or paragraphs) in the context of an LLM.

LLM hosts such as OpenAI provide an API endpoint that turns a given text into an embedding. The response of this embeddings API is a list of floats (a vector).

We could very well store the vector returned by the LLM API in a column of a database table for persistence.
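As a minimal sketch of this idea, here is one way to persist a vector in a plain relational table by serializing it as JSON text. SQLite and the table/column names are my own illustrative choices, not anything prescribed by the article.

```python
import json
import sqlite3

# Illustrative schema: store the embedding as JSON in a TEXT column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)"
)

vector = [0.12, -0.45, 0.89]  # pretend this came from the embeddings API
conn.execute(
    "INSERT INTO documents (body, embedding) VALUES (?, ?)",
    ("hello world", json.dumps(vector)),
)

# Read the row back and deserialize the column into a list of floats.
row = conn.execute("SELECT embedding FROM documents WHERE id = 1").fetchone()
restored = json.loads(row[0])
print(restored)  # → [0.12, -0.45, 0.89]
```

A dedicated vector column type (as offered by vector-aware databases) works the same way conceptually: the vector is just a value stored per row.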

Since embeddings capture relatedness in the context of an LLM, we can measure how close two texts are to each other by computing the vector distance between their embeddings.
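A common choice of vector distance for embeddings is cosine distance; here is a small self-contained version in plain Python (the function name is my own):

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity.
    # 0 means the vectors point the same way; larger means less related.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Identical directions → distance 0; orthogonal directions → distance 1.
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # → 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # → 1.0
```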

Since the vectors live in a database table, comparing an input vector against the stored vectors means computing the vector distance between the input and every row.
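That row-by-row scan is the brute-force baseline: one distance computation per stored vector, O(n) per query. A sketch, with made-up row data and Euclidean distance for simplicity:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_nearest(query, rows):
    # O(n): compute the distance to every stored row, keep the closest.
    return min(rows, key=lambda r: euclidean(query, r[1]))

# Illustrative (id, vector) rows, as if fetched from the table.
rows = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [5.0, 5.0])]

print(brute_force_nearest([0.9, 1.1], rows))  # → (2, [1.0, 1.0])
```

This works, but the cost grows linearly with the number of rows, which motivates the indexing question below.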

Is there a way to create an index on vector values, so that vector comparisons are quicker than the brute-force approach of comparing against every row of vectors in the table?

It’s quite common to create indexes on columns to speed up comparison, and thus fetching. It’s easy to create indexes on numeric, boolean, or string columns.

How do we create an index on a column of vectors?

Enter Approximate Nearest Neighbor (ANN). ANN helps us create an index on vectors, and thus accelerate vector comparison. Let’s look at how an ANN index works.
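One family of ANN techniques is locality-sensitive hashing with random hyperplanes: vectors are hashed into buckets by the sign of their projections, so a query only inspects its own bucket instead of every row. This toy sketch is my own illustration of the idea, not any particular library's implementation; production systems use libraries such as dedicated ANN indexes instead.

```python
import random

random.seed(42)  # deterministic hyperplanes for the example

def make_hyperplanes(dim, n_planes):
    # Each random hyperplane contributes one bit of the hash key.
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

def lsh_key(vec, planes):
    # Bucket key = sign pattern of the vector's projections onto the planes.
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

planes = make_hyperplanes(dim=3, n_planes=4)

# Illustrative vectors: "a" and "b" are near each other, "c" is far away.
vectors = {
    "a": [1.0, 0.9, 1.1],
    "b": [1.1, 1.0, 0.9],
    "c": [-1.0, -1.0, -1.0],
}

# Build the "index": group vector names by their hash bucket.
index = {}
for name, vec in vectors.items():
    index.setdefault(lsh_key(vec, planes), []).append(name)

# A query only looks inside its matching bucket — not at every row.
candidates = index.get(lsh_key([1.0, 1.0, 1.0], planes), [])
print(candidates)
```

The "approximate" in ANN is visible here: nearby vectors usually share a bucket, but a hyperplane can occasionally separate them, so recall is traded for speed.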

Use case walk through

Mechanics of ANN — https://www.youtube.com/watch?v=DRbjpuqOsjk
