Hybrid And Rerank With Langchain

Hybrid and rerank

Google Colab preparation (optional)

This step is optional. It is needed only if you want to run this notebook on Google Colab.

Please set your VOYAGE_API_KEY and OPENAI_API_KEY as environment variables.

If you are running this notebook on Google Colab, restart the session by pressing Cmd/Ctrl + M and then the period key (.) so that the environment changes take effect.

Get started

Prepare the data

We use the LangChain WebBaseLoader to load documents from blog sources and split them into chunks using the RecursiveCharacterTextSplitter.
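To make the chunking step concrete, here is a minimal pure-Python sketch of recursive splitting. It is a simplified, hypothetical stand-in and not LangChain's implementation: the function name `split_text`, the separator list, and the default `chunk_size` are assumptions chosen for illustration, and chunk overlap is left out.

```python
def split_text(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Separators are tried from coarse (paragraphs) to fine (single words);
    an empty separator means "cut at a fixed width" as a last resort.
    Hypothetical helper for illustration only.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at fixed-width boundaries.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging small pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with a finer separator.
            chunks.extend(split_text(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "para one is here.\n\n"
    "para two is a bit longer than the first.\n\n" + "x" * 50
)
chunks = split_text(sample, chunk_size=30)
```

Trying paragraph breaks before line breaks and spaces keeps related sentences together whenever they fit in one chunk, which is the motivation for splitting recursively instead of cutting at a fixed width.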

[1]

Build the chain

We load the docs into a Milvus vector store and build a Milvus retriever.
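Conceptually, a vector retriever embeds the query and returns the documents whose embeddings are closest to it. The brute-force sketch below illustrates that idea with hand-written toy vectors. It is not how Milvus works internally (Milvus uses index structures and configurable metrics), and the names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k most similar document vectors."""
    ranked = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in ranked[:k]]


# Toy 2-dimensional "embeddings"; real embeddings come from a model.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
nearest = top_k([1.0, 0.0], doc_vecs, k=2)
```

The default `k=4` is an assumption chosen to mirror the four documents the Milvus retriever returns in the test output later in this notebook.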

[2]

We also build a BM25 retriever from the docs.
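BM25 ranks documents by exact term overlap with the query, weighted by how rare each term is and normalized by document length. The sketch below is a minimal Okapi BM25 scorer written from the standard formula. The whitespace tokenizer, the non-negative IDF variant, and the defaults `k1=1.5` and `b=0.75` are assumptions; the retriever used in the notebook may differ in all of these details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant: rarer terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Made-up toy documents, for illustration only.
toy_docs = [
    "HuggingGPT uses ChatGPT to call external tool APIs",
    "Memory can be short term or long term",
    "Planning decomposes tasks into subgoals",
]
scores = bm25_scores("tool use", toy_docs)
```

Only the first toy document contains the literal token "tool", so it is the only one with a non-zero score. A purely lexical scorer like this rewards exact keyword matches regardless of how an embedding model would place the documents.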

[3]

Build a vanilla RAG chain.
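A vanilla RAG chain has three stages: retrieve documents for the question, format them into a prompt, and pass the prompt to the LLM. The sketch below shows that flow with plain Python callables. The prompt wording, `format_docs`, and `make_rag_chain` are hypothetical, and the retriever and LLM are stubs standing in for the real components.

```python
# Hypothetical prompt template; the notebook's actual prompt may differ.
PROMPT = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def format_docs(docs):
    """Join retrieved document texts into a single context string."""
    return "\n\n".join(docs)


def make_rag_chain(retriever, llm):
    """Compose retriever -> prompt -> llm into one callable."""
    def chain(question):
        docs = retriever(question)
        prompt = PROMPT.format(context=format_docs(docs), question=question)
        return llm(prompt)
    return chain


# Stubs: a retriever that always returns two texts, and an "LLM" that
# simply echoes its prompt in upper case so the data flow is visible.
stub_retriever = lambda question: ["doc A", "doc B"]
stub_llm = lambda prompt: prompt.upper()

chain = make_rag_chain(stub_retriever, stub_llm)
answer = chain("what uses tools?")
```

Because the retriever is just an argument, the same chain shape can be reused with a different retriever without changing the prompt or the LLM.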

[4]

Prepare hybrid_and_rerank_retriever.
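The hybrid step can be sketched as: collect the results of both retrievers, drop duplicates, then let a reranker order the merged candidates. In the sketch below, `hybrid_and_rerank`, `rerank_score`, and `top_n=4` are hypothetical names and values; `rerank_score` stands in for whatever reranking model is actually called, and documents are plain strings for simplicity.

```python
def hybrid_and_rerank(query, retrievers, rerank_score, top_n=4):
    """Merge results from several retrievers, deduplicate, and rerank.

    retrievers   -- callables mapping a query to a list of documents
    rerank_score -- callable (query, doc) -> relevance score (stand-in
                    for a real reranking model; an assumption here)
    """
    seen, unique_docs = set(), []
    for retrieve in retrievers:
        for doc in retrieve(query):
            if doc not in seen:  # keep the first occurrence only
                seen.add(doc)
                unique_docs.append(doc)
    ranked = sorted(
        unique_docs,
        key=lambda doc: rerank_score(query, doc),
        reverse=True,
    )
    return unique_docs, ranked[:top_n]


# Stub retrievers with overlapping results: 4 + 4 documents, 6 unique.
dense = lambda query: ["a", "b", "c", "d"]
sparse = lambda query: ["c", "d", "e", "f"]
# Toy scorer: later letters count as "more relevant".
toy_score = lambda query, doc: ord(doc)

unique_docs, reranked = hybrid_and_rerank("q", [dense, sparse], toy_score, top_n=3)
```

Deduplicating before reranking means each candidate is scored once, and the reranker decides the final order regardless of which retriever found a document.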

[5]

Build hybrid_and_rerank_chain.

[6]

Test the chain

[7]
len(milvus_retrieved_doc) = 4
len(bm25_retrieved_doc) = 4
len(unique_documents) = 6

[vanilla_result]:
The models that use tools are TALM (Tool Augmented Language Models; Parisi et al. 2022), Toolformer (Schick et al. 2023), and MRKL (Modular Reasoning, Knowledge and Language; Karpas et al. 2022). These models are fine-tuned to learn to use external tool APIs and other external modules for additional capabilities.

[hybrid_and_rerank_result]:
The models that use tools are ChatGPT Plugins and OpenAI API for function calling, HuggingGPT, TALM (Tool Augmented Language Models), Toolformer, and MRKL (Modular Reasoning, Knowledge and Language). These models are equipped with the capability to use external tools or APIs to extend their functionalities.

[hybrid_and_rerank_result] covers more of the ground truth. It additionally names:

ChatGPT Plugins and OpenAI API function calling, HuggingGPT

none of which appear in [vanilla_result].

[8]

[11]

We can see that the hybrid retrieved results include the document that mentions ChatGPT and HuggingGPT. That document contains the word "tool", which matches a word in the query exactly, and this kind of exact keyword match is where BM25 does better.

Therefore, BM25 can be used as a supplement to the vector retriever to improve recall.
