Embedder#

The Embedder service embeds text into its vector representation. This allows us to search for similar text using vector search.

To convert text into its embedding, we first get an instance of Embedder using the get_embedder method, passing it the Embedder service identifier service_name.

Then, we invoke the embed method, passing it a list of Node objects to enrich the nodes with their embeddings.

Let’s try it out.

Setup#

  1. Ensure bodhilib is installed.

  2. We are going to use the sentence_transformers embedder for embedding the data. Ensure the bodhiext.sentence_transformers plugin is installed as well.

[1]:
!pip install -q bodhilib bodhiext.sentence_transformers
[2]:
# Load the Paul Graham essays from the data/data-loader directory using the `file` DataLoader
# and then split them into Nodes using the `text_splitter` Splitter
import os
from pathlib import Path
from bodhilib import get_data_loader, get_splitter

# Get data directory path and add it to data_loader
current_dir = Path(os.getcwd())
data_dir = current_dir / ".." / "data" / "data-loader"
data_loader = get_data_loader('file')
data_loader.add_resource(dir=str(data_dir))
docs = data_loader.load()
splitter = get_splitter("text_splitter", max_len=300, overlap=30)
nodes = splitter.split(docs)
[3]:
# Get instance of Embedder

from bodhilib import get_embedder

embedder = get_embedder("sentence_transformers")
[4]:
# Enrich the nodes with embeddings

_ = embedder.embed(nodes)
[5]:
import reprlib
# inspect the nodes to verify they are enriched with embeddings
print(reprlib.repr(nodes[0].embedding))
[-0.046271130442619324, -0.0969996377825737, 0.09419207274913788, 0.014190234243869781, 0.02280914969742298, -0.01539283711463213, ...]

🎉 We just created the vector embeddings for our nodes.
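Before moving to a VectorDB, it helps to see what "searching for similar text" means at the vector level. Below is a minimal, self-contained sketch of cosine similarity over plain float lists (the same shape as `node.embedding` above); it is pure Python with no bodhilib dependency, and the toy vectors are made-up stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    # cosine similarity = dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# toy vectors standing in for node embeddings
v1 = [0.1, 0.3, -0.2]
v2 = [0.1, 0.3, -0.2]
v3 = [-0.3, 0.0, 0.4]

print(cosine_similarity(v1, v2))  # identical vectors give similarity ≈ 1.0
print(cosine_similarity(v1, v3))  # dissimilar vectors score lower
```

A vector database performs essentially this comparison (or a distance-based variant) at scale, returning the stored nodes whose embeddings score highest against the query embedding.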

Next, let’s see how to insert these embeddings into VectorDB.