Build RAG With Milvus And Exa
Building a Dual-Source RAG Agent with Exa and Milvus
This tutorial demonstrates how to build an agent that searches both the public web (via Exa) and a private knowledge base (via Milvus), then synthesizes a unified answer. The agent uses OpenAI's function calling to automatically decide which source to query based on the user's question.
Exa is a search API designed for AI applications, which is proudly powered by Zilliz Cloud (fully managed Milvus). Unlike traditional keyword-based search engines, Exa supports semantic (neural) search — you describe what you want in natural language and it understands your intent. It also provides content extraction, highlights, and category-based filtering. Milvus is an open-source vector database built for scalable similarity search. By combining them with an LLM agent, you can build a system that retrieves both internal proprietary data and up-to-date web information in a single workflow.
Prerequisites
Before running this notebook, make sure you have the following dependencies installed:
Initialize Clients
Set up the Exa, OpenAI, and Milvus clients. We use OpenAI's text-embedding-3-small model to generate vector embeddings, and Milvus Lite for local vector storage with zero infrastructure setup.
As for the argument of
MilvusClient:
- Setting the
urias a local file, e.g../milvus.db, is the most convenient method, as it automatically utilizes Milvus Lite to store all data in this file.- If you have large scale of data, you can set up a more performant Milvus server on docker or kubernetes. In this setup, please use the server uri, e.g.
http://localhost:19530, as youruri.- If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the
uriandtoken, which correspond to the Public Endpoint and Api key in Zilliz Cloud.
Define a helper function to generate embeddings. We will reuse this across the notebook for both indexing and querying:
Build the Private Knowledge Base (Milvus)
We simulate a set of internal company documents — product specs, policies, earnings reports, and API docs — that would not appear on the public web. In a real scenario, these could come from your internal wikis, databases, or document management systems.
Create the Milvus collection with an explicit schema, embed the documents, and insert them:
Inserted 5 documents into Milvus.
Let's verify the retrieval works with a quick test query:
[score=0.665] (return-policy.md) Our return policy allows customers to return any product within 30 days of purchase for a full refund. After 30 days, on... [score=0.119] (q3-earnings.pdf) Q3 2025 revenue was $4.2M, up 18% from Q2. The growth was primarily driven by enterprise customers adopting Widget Pro. ...
Explore Exa Search Capabilities
Before building the agent, let's explore Exa's search features. Exa supports multiple search modes that are useful for different scenarios.
Semantic search with content extraction — Exa can return not only links but also the article text, key highlights, and AI-generated summaries in a single request:
[The AI Trends Shaping 2026. A month into the new year is as good a… | by ODSC - Open Data Science | Mar, 2026 | Medium] URL: https://odsc.medium.com/the-ai-trends-shaping-2026-34078dad4d49 Highlight: ahead. January brought Claude CoWork, Anthropic’s “AI coworker” that turns agents into desktop collaborators; OpenClaw (formerly Moltbot, formerly Cl... [AI agent trends 2026 report] URL: https://cloud.google.com/resources/content/ai-agent-trends-2026 Highlight: >. The era of simple prompts is over. We're witnessing the agent leap—where AI orchestrates complex, end-to-end workflows semi-autonomously. For enter... [The Rise of Agentic AI: Why 2026 is the Year AI Started 'Doing'] URL: https://www.marketdrafts.com/2026/02/rise-of-agentic-ai-2026-trends.html?m=1 Highlight: The era of "Generative AI" (which creates content) is being superseded by "Agentic AI" (which executes actions). We are witnessing a fundamental arch...
Category-based filtering — you can restrict results to specific content types such as "research paper", "news", "company", or "tweet". This is useful when you want high-quality sources and want to avoid noise:
- 10 RAG examples and use cases from real companies https://www.evidentlyai.com/blog/rag-examples - Implementing Retrieval-Augmented Generation (RAG) with Real-World Constraints https://dev.to/dextralabs/implementing-retrieval-augmented-generation-rag-with-real-world-constraints-3ajm - https://www.arxiv.org/pdf/2502.14930
Find similar articles — given a URL, Exa can find other articles with similar content. This is helpful for expanding research from a good starting point:
Articles similar to: https://odsc.medium.com/the-ai-trends-shaping-2026-34078dad4d49 - AI Trends 2026: From Agent Demos to Production Reality https://opendatascience.com/the-ai-trends-shaping-2026/ - The Most Important AI Trends to Watch in 2026 https://medium.com/the-ai-studio/the-most-important-ai-trends-to-watch-in-2026-54af64d45021
Define the Agent Tools
Now we define the two tool functions that the agent will use. The private KB tool searches Milvus using vector similarity, while the web tool searches the public internet via Exa:
Build the Agent
The agent uses OpenAI's function calling to decide which tool(s) to invoke. It follows a simple loop: the LLM receives the user query, decides which tools to call (if any), executes them, and then synthesizes a final answer from the retrieved context.
Demo
Now let's test the agent with three scenarios that demonstrate different routing behaviors.
Scenario A: Internal question (routes to Milvus)
Ask about an internal policy — the agent should call search_private_kb and retrieve the answer from our private documents:
User: What is the return policy for Acme products? -> Calling search_private_kb(query='return policy Acme products') Agent: The Acme products return policy allows customers to return any product within 30 days of purchase for a full refund. After 30 days, only store credit is offered. It's important to note that damaged items must be reported within 48 hours of receipt ([source: return-policy.md]).
"The Acme products return policy allows customers to return any product within 30 days of purchase for a full refund. After 30 days, only store credit is offered. It's important to note that damaged items must be reported within 48 hours of receipt ([source: return-policy.md])."
Scenario B: External question (routes to Exa)
Ask about external trends — the agent should call search_web to fetch up-to-date information from the public internet:
User: What are the latest AI agent frameworks trending in 2026? -> Calling search_web(query='latest AI agent frameworks 2026') Agent: In 2026, several AI agent frameworks are trending, each offering unique features and capabilities that cater to various needs. Here are some of the most prominent ones: 1. **LangChain and LangGraph**: These frameworks remain highly popular for building large language model (LLM)-powered applications. LangGraph, in particular, models agents as state graphs, which is useful for action-oriented workflows. LangChain continues to dominate due to its comprehensive feature set for production-grade control and orchestration. 2. **LangSmith Agent Builder**: Released into general availability in 2026, this tool allows teams to create AI agents using natural language, simplifying the process of agent development. 3. **Semantic Kernel and AutoGen**: These have been integrated into Azure AI Foundry, creating a unified framework. Semantic Kernel uses a plugin-based middleware pattern, enhancing existing applications with AI capabilities efficiently. 4. **OpenClaw**: An open-source framework that operates locally, OpenClaw transforms your computer into an autonomous agent host, differing from cloud-based solutions by keeping data and operations localized. This framework supports a large community and includes extensive skills for customization. These frameworks cater to various requirements, whether it's production-grade solutions, open-source options, or frameworks focused on local deployment. Each framework has its strengths, depending on the use case and the existing ecosystem it fits into. Sources: - [Agentic AI Frameworks: The Complete Guide (2026)](https://aiagentskit.com/blog/agentic-ai-frameworks/) - [OpenClaw: The Open-Source AI Agent Framework That Runs Your Life Locally](https://www.clawbot.blog/blog/openclaw-the-open-source-ai-agent-framework-that-runs-your-life-locally) - [The Best AI Agent Frameworks for 2026](https://medium.com/data-science-collective/the-best-ai-agent-frameworks-for-2026-tier-list-b3a4362fac0d)
"In 2026, several AI agent frameworks are trending, each offering unique features and capabilities that cater to various needs. Here are some of the most prominent ones:\n\n1. **LangChain and LangGraph**: These frameworks remain highly popular for building large language model (LLM)-powered applications. LangGraph, in particular, models agents as state graphs, which is useful for action-oriented workflows. LangChain continues to dominate due to its comprehensive feature set for production-grade control and orchestration.\n\n2. **LangSmith Agent Builder**: Released into general availability in 2026, this tool allows teams to create AI agents using natural language, simplifying the process of agent development.\n\n3. **Semantic Kernel and AutoGen**: These have been integrated into Azure AI Foundry, creating a unified framework. Semantic Kernel uses a plugin-based middleware pattern, enhancing existing applications with AI capabilities efficiently.\n\n4. **OpenClaw**: An open-source framework that operates locally, OpenClaw transforms your computer into an autonomous agent host, differing from cloud-based solutions by keeping data and operations localized. This framework supports a large community and includes extensive skills for customization.\n\nThese frameworks cater to various requirements, whether it's production-grade solutions, open-source options, or frameworks focused on local deployment. Each framework has its strengths, depending on the use case and the existing ecosystem it fits into.\n\nSources:\n- [Agentic AI Frameworks: The Complete Guide (2026)](https://aiagentskit.com/blog/agentic-ai-frameworks/)\n- [OpenClaw: The Open-Source AI Agent Framework That Runs Your Life Locally](https://www.clawbot.blog/blog/openclaw-the-open-source-ai-agent-framework-that-runs-your-life-locally)\n- [The Best AI Agent Frameworks for 2026](https://medium.com/data-science-collective/the-best-ai-agent-frameworks-for-2026-tier-list-b3a4362fac0d)"
Scenario C: Hybrid question (routes to both)
Ask a question that requires both internal specs and external benchmarks — the agent should call both tools and synthesize a comparison:
User: How does our Widget Pro's throughput compare to open-source alternatives on the market? -> Calling search_private_kb(query='Widget Pro throughput comparison') -> Calling search_web(query='open-source widget throughput comparison') Agent: The throughput of our Widget Pro is quite competitive when compared to open-source alternatives on the market. Here's a detailed comparison: ### Widget Pro - **Concurrent Connections**: Supports up to 10,000 concurrent connections. - **Compression**: Utilizes AcmeZip v3, a proprietary compression algorithm that reduces payload size by 72% compared to gzip (source: [product-spec.pdf]). - **API Rate Limits**: Offers different tiers: - Free tier: 100 requests/minute. - Pro tier: 5,000 requests/minute. - Enterprise tier: 50,000 requests/minute (source: [api-docs.md]). ### Open-Source Alternatives From the available resources, open-source widget solutions such as Chatwoot and Tiledesk are popular in handling customer engagement with a flexible and customizable approach (source: [ChatMaxima article](https://chatmaxima.com/blog/15-open-source-free-live-chat-widget-solutions-to-boost-your-customer-engagement-in-2024/)). However, specific throughput metrics such as maximum concurrent connections or API limits are generally not highlighted in open-source product descriptions unless directly benchmarked. These alternatives often emphasize customization, control, and integration with AI-driven capabilities but do not always specify throughput in terms comparable with Widget Pro. They might be more suited for organizations looking to tailor solutions to specific needs rather than focusing solely on throughput efficiency. In conclusion, Widget Pro appears to offer high throughput suitable for enterprises with robust API support, while open-source options offer flexibility and customization with varying degrees of performance metrics.
"The throughput of our Widget Pro is quite competitive when compared to open-source alternatives on the market. Here's a detailed comparison:\n\n### Widget Pro\n\n- **Concurrent Connections**: Supports up to 10,000 concurrent connections.\n- **Compression**: Utilizes AcmeZip v3, a proprietary compression algorithm that reduces payload size by 72% compared to gzip (source: [product-spec.pdf]).\n- **API Rate Limits**: Offers different tiers:\n - Free tier: 100 requests/minute.\n - Pro tier: 5,000 requests/minute.\n - Enterprise tier: 50,000 requests/minute (source: [api-docs.md]).\n\n### Open-Source Alternatives\n\nFrom the available resources, open-source widget solutions such as Chatwoot and Tiledesk are popular in handling customer engagement with a flexible and customizable approach (source: [ChatMaxima article](https://chatmaxima.com/blog/15-open-source-free-live-chat-widget-solutions-to-boost-your-customer-engagement-in-2024/)). However, specific throughput metrics such as maximum concurrent connections or API limits are generally not highlighted in open-source product descriptions unless directly benchmarked.\n\nThese alternatives often emphasize customization, control, and integration with AI-driven capabilities but do not always specify throughput in terms comparable with Widget Pro. They might be more suited for organizations looking to tailor solutions to specific needs rather than focusing solely on throughput efficiency.\n\nIn conclusion, Widget Pro appears to offer high throughput suitable for enterprises with robust API support, while open-source options offer flexibility and customization with varying degrees of performance metrics."
Cleanup
When you are done, drop the collection to free resources.
Conclusion
In this tutorial, we built a dual-source RAG agent that combines Milvus for private knowledge retrieval with Exa for public web search. The key components are:
- Milvus stores and retrieves internal documents via vector similarity search, ensuring proprietary data stays private and searchable.
- Exa provides semantic web search with features like category filtering, content extraction, and similar article discovery.
- OpenAI function calling enables the LLM to automatically route queries to the right source — or both — based on the question's intent.
This pattern is applicable to enterprise use cases where an AI assistant needs access to both confidential internal documents and real-time external information.