Build RAG With Milvus And EmbedAnything
Building RAG with Milvus and EmbedAnything
EmbedAnything is a blazing-fast, lightweight embedding pipeline built in Rust that supports text, PDFs, images, audio, and more.
In this tutorial, we’ll demonstrate how to build a Retrieval-Augmented Generation (RAG) pipeline using EmbedAnything together with Milvus. Rather than tightly coupling with any specific database, EmbedAnything uses a pluggable adapter system — adapters serve as wrappers that define how embeddings are formatted, indexed, and stored in the target vector store.
By pairing EmbedAnything with a Milvus adapter, you can generate embeddings from diverse file types and efficiently store them in Milvus in just a few lines of code.
⚠️ Note: While the adapter in EmbedAnything handles insertion into Milvus, it does not support search out of the box. To build a full RAG pipeline, you’ll also need to instantiate a MilvusClient separately and implement the retrieval logic (e.g., similarity search over vectors) as part of your application.
Preparation
Dependencies and Environment
If you are using Google Colab, to enable dependencies just installed, you may need to restart the runtime (click on the "Runtime" menu at the top of the screen, and select "Restart session" from the dropdown menu).
Clone the Repository and Load Adapter
Next, we’ll clone the EmbedAnything repo and add the examples/adapters directory to the Python path. This is where we store the custom Milvus adapter implementation, which allows EmbedAnything to communicate with Milvus for vector insertion.
✅ EmbedAnything cloned and adapter path added.
We will use OpenAI as the LLM in this RAG pipeline. You should prepare the api key OPENAI_API_KEY as an environment variable.
Build RAG
Initialize Milvus
Before we embed any files, we need to prepare two components that interact with Milvus:
MilvusVectorAdapter– This is the Milvus adapter for EmbedAnything, and is used only for vector ingestion (i.e., inserting embeddings and creating indexes). It currently does not support search operations.MilvusClient– This is the official client frompymilvus, which enables full access to Milvus capabilities including vector search, filtering, and collection management.
To avoid confusion:
- Think of
MilvusVectorAdapteras your "write-only" tool for storing vectors. - Think of
MilvusClientas your "read-and-search" engine to actually perform queries and retrieve documents for RAG.
Ok - Milvus DB connection established. Collection 'embed_anything_milvus_collection' created with index.
As for the argument of
MilvusVectorAdapterandMilvusClient:
- Setting the
urias a local file, e.g../milvus.db, is the most convenient method, as it automatically utilizes Milvus Lite to store all data in this file.- If you have large scale of data, say more than a million vectors, you can set up a more performant Milvus server on Docker or Kubernetes. In this setup, please use the server address and port as your uri, e.g.
http://localhost:19530. If you enable the authentication feature on Milvus, use "<your_username>:<your_password>" as the token, otherwise don't set the token.- If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the
uriandtoken, which correspond to the Public Endpoint and Api key in Zilliz Cloud.
Initialize Embedding Model and Embed PDF Document
Now we'll initialize the embedding model. We'll use the all-MiniLM-L12-v2 model from the sentence-transformers library, which is a lightweight yet powerful model for generating text embeddings. It produces 384-dimensional embeddings, so this aligns with our Milvus collection dimension being set to 384. This alignment is crucial and ensures compatibility between the vector dimensions stored in Milvus and those generated by the model.
EmbedAnything supports a lot more embedding models. For more details, please refer to the official documentation.
Now, let's embed a PDF file. EmbedAnything makes it easy to process PDF (and many more) documents and store their embeddings directly in Milvus.
Converted 12 embeddings for insertion. Successfully inserted 12 embeddings.
Retrieve and Generate Response
Again, the MilvusVectorAdapter from EmbedAnything currently is a lightweight abstraction for vector ingestion and indexing only. It does not support search or retrieval queries. Therefore, for search relevant documents to build our RAG pipeline, we must directly use the MilvusClient instance (milvus_client) to query our Milvus vector store.
Define a function to retrieve relevant documents from Milvus.
Define a function to generate a response using the retrieved documents in the RAG pipeline.
Let's test the RAG pipeline with a sample question.
Question: How does Milvus search for similar documents? Answer: Milvus searches for similar documents primarily through Approximate Nearest Neighbor (ANN) search, which finds the top K vectors closest to a given query vector. It also supports various other types of searches, such as filtering search under specified conditions, range search within a specified radius, hybrid search based on multiple vector fields, and keyword search based on BM25. Additionally, it can perform reranking to adjust the order of search results based on additional criteria, refining the initial ANN search results.