Composability#

The bodhilib library is designed with composability in mind. It takes many ideas from pure functional languages like Haskell to design and implement its interface.
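
The composition primitive used throughout comes from fn.py (installed below): an `F` wrapper whose `>>` operator chains functions left to right, so `(F(f) >> F(g))(x)` evaluates as `g(f(x))`. A minimal sketch with toy functions (not part of bodhilib):

from fn import F

# compose two steps: increment, then double
inc_then_double = F(lambda x: x + 1) >> F(lambda x: x * 2)
inc_then_double(3)  # (3 + 1) * 2 == 8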

Using the bodhilib library, you can simplify the ingestion phase of your RAG process as follows:

Setup#

  1. Ensure bodhilib is installed, along with the LLM plugin (bodhiext.openai), the Embedder plugin (bodhiext.sentence_transformers), and the VectorDB plugin (bodhiext.qdrant).

  2. Ensure fn.py is installed for the functional composition methods.

  3. Ensure OPENAI_API_KEY is set as an environment variable.

[1]:
!pip install -q bodhilib bodhiext.openai bodhiext.sentence_transformers bodhiext.qdrant fn.py python-dotenv
[2]:
import os
from getpass import getpass
from dotenv import load_dotenv

load_dotenv()
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key:")
[3]:
# load the components
from bodhilib import get_llm, get_data_loader, get_embedder, get_vector_db, get_splitter

data_loader = get_data_loader("file")
splitter = get_splitter("text_splitter")
embedder = get_embedder("sentence_transformers")
vector_db = get_vector_db("qdrant", location=":memory:")
llm = get_llm("openai_chat", model="gpt-3.5-turbo")
[4]:
# recreate the "test" collection, sized to the embedder's output dimension
vector_db.delete_collection("test")
vector_db.create_collection("test", dimension=embedder.dimension, distance="cosine")
[4]:
True
[5]:
vector_db.get_collections()
[5]:
['test']
[6]:
from fn import F
[7]:
# queue all files under ../data for loading
data_loader.add_resource(dir="../data", recursive=True)

# compose the ingestion pipeline: load -> split -> embed -> upsert
f = (
    F(data_loader.load)
    >> F(splitter.split)
    >> F(embedder.embed)
    >> F(lambda nodes: vector_db.upsert(collection_name="test", nodes=nodes))
)
[8]:
records = f()
len(records)
[8]:
26
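
Since `>>` is plain left-to-right composition, the pipeline is just a readable spelling of nested calls. As a sketch, `f()` above is equivalent to:

# each stage's output feeds the next stage
records = vector_db.upsert(
    collection_name="test",
    nodes=embedder.embed(splitter.split(data_loader.load())),
)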

And to query your VectorDB, you can compose it like:

[9]:
from bodhilib import PromptTemplate

template = """Below are the text chunks from a blog/article.
1. Read and understand the text chunks
2. After the text chunks, there is a list of questions starting with `Question:`
3. Answer the questions from the information given in the text chunks
4. If you don't find the answer in the provided text chunks, say 'I couldn't find the answer to this question in the given text'

{% for text in texts %}
### START
{{ text }}
### END
{% endfor %}

Question: {{ query }}
Answer:
"""
prompt_template = PromptTemplate(template=template, format='jinja2')

input_query = "According to Paul Graham, how to tackle when you are in doubt?"
[11]:
import textwrap

answer = (
    # embed the input query
    F(embedder.embed)
    # retrieve the 5 most similar nodes from the vector db
    >> F(
        lambda e: vector_db.query(
            collection_name="test", embedding=e[0].embedding, limit=5
        )
    )
    # build the prompt from the retrieved node texts
    >> F(
        lambda nodes: prompt_template.to_prompts(
            query=input_query, texts=[node.text for node in nodes]
        )
    )
    # generate the answer with the LLM
    >> F(llm.generate)
)

response = answer(input_query)

print(textwrap.fill(response.text, width=100, replace_whitespace=False))
According to Paul Graham, when you are in doubt about what to work on, you should optimize for
interestingness. He suggests trying lots of things, meeting lots of people, reading lots of books,
and asking lots of questions.
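
The query pipeline unrolls the same way. As a sketch, using the components loaded earlier, `answer(input_query)` is equivalent to the nested calls below. Note that input_query plays two roles: it is embedded for the similarity search, and it is also interpolated into the prompt template.

embedding = embedder.embed(input_query)[0].embedding
nodes = vector_db.query(collection_name="test", embedding=embedding, limit=5)
prompts = prompt_template.to_prompts(query=input_query, texts=[node.text for node in nodes])
response = llm.generate(prompts)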