Components#

The core components of bodhilib are -

  1. DataLoader

  2. Splitter

  3. Embedder

  4. PromptSource

  5. VectorDB

  6. LLM

DataLoader#

DataLoader is used to load documents from various sources. These sources can be a local file, a URL, or a database.

A DataLoader is configured using the add_resource method. Once configured, it can either be iterated to fetch the resources as Document on-demand, or eagerly fetched using the load method to get them as a List[Document].

class DataLoader#

class DataLoader(Iterable[Document], abc.ABC):
    @abc.abstractmethod
    def add_resource(self, **kwargs: Dict[str, Any]) -> None: ...

    @abc.abstractmethod
    def __iter__(self) -> Iterator[Document]: ...

    def load(self) -> List[Document]: ...
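
To make this concrete, here is a hedged sketch of the flow. How you obtain a concrete DataLoader, and the arguments accepted by add_resource, depend on the installed plugin; the dir argument below is a hypothetical example.

# loader is assumed to be a concrete DataLoader implementation from a plugin
loader.add_resource(dir="/path/to/docs")  # hypothetical loader-specific argument

# lazy: iterate the resources as Document on-demand
for document in loader:
    ...  # process each Document as it is loaded

# eager: fetch everything at once as a List[Document]
documents = loader.load()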

Splitter#

Splitter is used to split a Document into right-sized, processable chunks. For flexibility and composability, it takes in SerializedInput and returns a list of Node, with each Node's text corresponding to a split produced by the implementation.

Ideally, you pass in a Document or a list of Document to get back a list of Node split into processable chunks.

class Splitter#

class Splitter(abc.ABC):
    @abc.abstractmethod
    def split(self, inputs: SerializedInput) -> List[Node]: ...
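
As a hedged sketch, assuming splitter is a concrete Splitter implementation (for example, a sentence-based splitter plugin) and documents is the List[Document] from the DataLoader above:

# split documents into right-sized Node chunks
nodes = splitter.split(documents)

# a single Document is equally valid SerializedInput
nodes = splitter.split(documents[0])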

Embedder#

Embedder embeds a given text and returns its vector representation.

Ideally, Embedder takes in a Node or a list of Node, and returns the nodes enriched with embeddings by populating each Node's embedding field.

class Embedder#

class Embedder(abc.ABC):
    @abc.abstractmethod
    def embed(self, inputs: SerializedInput) -> List[Node]: ...
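
A hedged sketch, assuming embedder is a concrete Embedder implementation and nodes is the List[Node] produced by the Splitter above:

# enrich each Node with its vector representation
nodes = embedder.embed(nodes)

# the embedding field of each Node is now populated
print(nodes[0].embedding)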

PromptSource#

PromptSource provides an interface to browse and search through a collection of effective prompts. This way, you can test multiple prompt templates for your use-case and find the one that works for you.

class PromptSource#

class PromptSource(abc.ABC):
    @abc.abstractmethod
    def find(self, keywords: str | List[str]) -> List[PromptTemplate]: ...

    @abc.abstractmethod
    def list_all(self) -> List[PromptTemplate]: ...
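
A hedged sketch, assuming prompt_source is a concrete PromptSource implementation; the keywords are hypothetical:

# search by a single keyword, or by a list of keywords
templates = prompt_source.find("summarization")
templates = prompt_source.find(["rag", "qna"])

# or browse the full collection
all_templates = prompt_source.list_all()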

VectorDB#

VectorDB defines the interface to interact with various vector databases. VectorDB has two main methods - upsert and query.

upsert takes in a list of Node, and inserts or updates the underlying vector database with the text, metadata, and embedding of each Node object. These can later be queried using property filters or vector search.

The query method allows you to query the underlying vector database with the given embedding and property filters. The property filters use the MongoDB query syntax and are not tied to a specific vector database; each VectorDB implementation transforms them into its database-specific filters.

class VectorDB#

class VectorDB(abc.ABC):
    @abc.abstractmethod
    def upsert(self, collection_name: str, nodes: List[Node]) -> List[Node]: ...

    @abc.abstractmethod
    def query(
        self, collection_name: str, embedding: Embedding, filter: Optional[Dict[str, Any]], **kwargs: Dict[str, Any]
    ) -> List[Node]: ...
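
A hedged sketch of both methods. The collection name and the metadata field used in the filter are hypothetical, and query_embedding is a placeholder for an Embedding produced for the query text; the filter itself uses the documented MongoDB query syntax:

# insert or update the embedded nodes in a collection
nodes = vector_db.upsert(collection_name="articles", nodes=nodes)

# vector search narrowed by a MongoDB-style property filter,
# which the VectorDB implementation translates to its native filter format
results = vector_db.query(
    collection_name="articles",
    embedding=query_embedding,  # an Embedding produced for the query text
    filter={"category": {"$eq": "science"}},
)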

LLM#

LLM defines the interface to interact with a Large Language Model.

The generate method takes in the flexible input type SerializedInput and returns a response generated by the LLM.

class LLM#

class LLM(abc.ABC):
    @abc.abstractmethod
    def generate(
        self,
        prompt_input: SerializedInput,
        *,
        stream: Optional[bool] = None,
        **kwargs: Dict[str, Any],
    ) -> Union[Prompt, PromptStream]: ...

LLM flexible input types#

LLM takes in SerializedInput, so all of the calls below are valid -

llm.generate("tell me a joke")
llm.generate(["tell me a joke", "joke should be related to architects"])
llm.generate(Prompt("tell me a joke"))
llm.generate([Prompt("you are a helpful AI assistant.", role="system"), Prompt("tell me a joke")])
llm.generate({"text": "tell me a joke", "role": "user"})
llm.generate([{"text": "you are a helpful AI assistant.", "role": "system"}, {"text": "tell me a joke", "role": "user"}])

LLM asynchronous PromptStream#

By default, the LLM generate method returns a Prompt object, which is a synchronous response from the LLM.

If you pass stream=True, the generate method returns a PromptStream, an asynchronous response model that gives you the ability to receive and process the response as it arrives.
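
As a hedged sketch, this section does not spell out how a PromptStream is consumed; the code below assumes it can be iterated to yield partial Prompt responses:

# stream=True switches the return type from Prompt to PromptStream
stream = llm.generate("tell me a joke", stream=True)
for partial in stream:  # assumption: iterating yields partial Prompt responses
    print(partial.text, end="", flush=True)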

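Putting the components together, a typical ingest-and-query flow chains them end-to-end. This is a sketch only: it reuses the hypothetical instances and arguments from the examples above, and assumes a plain string is valid SerializedInput for embed.

# ingest: load -> split -> embed -> store
documents = loader.load()
nodes = embedder.embed(splitter.split(documents))
vector_db.upsert(collection_name="articles", nodes=nodes)

# query: embed the question, fetch similar nodes, ask the LLM
question = "what do the documents say about embeddings?"
query_embedding = embedder.embed(question)[0].embedding  # assumption: str is valid SerializedInput
results = vector_db.query(collection_name="articles", embedding=query_embedding, filter=None)

context = "\n".join(node.text for node in results)
response = llm.generate(f"Answer using the context:\n{context}\n\nQuestion: {question}")
print(response.text)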

🎉 We just got familiar with the components of bodhilib.

Next, let’s see the functional paradigms that guided the design of bodhilib, and how they help with Composability.