RAG With llama-index + Milvus + Qwen - Part 2
Step-1: Configuration
[1]
✅ Found NEBIUS_API_KEY in environment, using it
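A minimal sketch of what the configuration step might look like, assuming the API key comes from the environment and the database path is the `./rag.db` file shown in the later outputs. The `RagConfig` class, the `load_config` helper, and the collection name are illustrative names, not taken from the notebook.

```python
import os
from dataclasses import dataclass


@dataclass
class RagConfig:
    """Settings shared by the later steps (field names are illustrative)."""
    nebius_api_key: str
    db_uri: str = "./rag.db"        # Milvus Lite file, as seen in the outputs
    collection_name: str = "pdfs"   # hypothetical collection name


def load_config(env=os.environ) -> RagConfig:
    """Read the API key from the environment and fail early if it is missing."""
    api_key = env.get("NEBIUS_API_KEY")
    if not api_key:
        raise RuntimeError("NEBIUS_API_KEY not found; export it before running")
    print("Found NEBIUS_API_KEY in environment, using it")
    return RagConfig(nebius_api_key=api_key)
```

Passing the environment as an argument keeps the helper easy to exercise with a plain dictionary, for example `load_config({"NEBIUS_API_KEY": "..."})`.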
Step-2: Setup Embedding Model
We can either run the embedding model locally (fast) or call one hosted in the cloud.
If running locally:
- choose a smaller model
- expect faster responses but lower accuracy
If running in the cloud:
- we can use large models (billions of parameters)
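The trade-off above could be captured as a small switch. This is a sketch under stated assumptions: the two model names and their vector sizes are examples chosen for illustration (the notebook's actual choices are not shown), and `EmbeddingChoice`, `pick_embedding`, and `build_embed_model` are hypothetical helpers. The llama-index integrations are imported lazily, so each option only needs its own package installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbeddingChoice:
    model_name: str
    dim: int            # size of the vectors the model produces
    runs_locally: bool


# Assumed example models; swap in whatever the ingestion step used.
LOCAL = EmbeddingChoice("BAAI/bge-small-en-v1.5", 384, True)
CLOUD = EmbeddingChoice("Qwen/Qwen3-Embedding-8B", 4096, False)


def pick_embedding(run_locally: bool) -> EmbeddingChoice:
    """Small local model, or a large hosted one."""
    return LOCAL if run_locally else CLOUD


def build_embed_model(choice: EmbeddingChoice, api_key: Optional[str] = None):
    """Create the llama-index embedding object for the chosen option."""
    if choice.runs_locally:
        from llama_index.embeddings.huggingface import HuggingFaceEmbedding
        return HuggingFaceEmbedding(model_name=choice.model_name)
    from llama_index.embeddings.nebius import NebiusEmbedding
    return NebiusEmbedding(model_name=choice.model_name, api_key=api_key)
```

Whichever option is picked, queries have to be embedded with the same model that was used when the documents were indexed; otherwise the stored vectors and the query vectors are not comparable.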
[2]
Step-3: Connect to Milvus
[3]
✅ Connected to Milvus instance: ./rag.db
[4]
✅ Connected Llama-index to Milvus instance: ./rag.db
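A sketch of how the llama-index side of this connection might be made with `MilvusVectorStore`. The helper name `connect_vector_store` and its arguments are illustrative. The file check is an added safeguard, on the assumption that this notebook should only read a database built in an earlier part: Milvus Lite creates a new empty file when the path does not exist, which would hide a missing ingestion step.

```python
from pathlib import Path


def connect_vector_store(db_uri: str, collection_name: str, dim: int):
    """Open an existing Milvus Lite file and wrap it for llama-index."""
    # Fail loudly rather than silently creating an empty database.
    if not Path(db_uri).is_file():
        raise FileNotFoundError(f"{db_uri} not found; run the ingestion step first")

    # Imported lazily so the check above runs even without the package.
    from llama_index.vector_stores.milvus import MilvusVectorStore

    return MilvusVectorStore(
        uri=db_uri,
        collection_name=collection_name,
        dim=dim,            # must match the embedding model's vector size
        overwrite=False,    # keep the vectors that were indexed earlier
    )
```

Setting `overwrite=False` matters here: with `overwrite=True` the existing collection would be dropped and the queries would run against an empty index.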
Step-4: Load Document Index from DB
[5]
✅ Loaded index from vector db: ./rag.db
CPU times: user 94 ms, sys: 20.7 ms, total: 115 ms
Wall time: 113 ms
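Loading is quick because nothing is re-embedded; the index object only wraps the vectors already stored in Milvus. A rough sketch, assuming `vector_store` and `embed_model` come from the earlier steps. `load_index` and `wall_time` are hypothetical helpers, and the timer is a plain-Python stand-in for the notebook's timing output.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_time(label: str):
    """Measure elapsed wall-clock time for the enclosed block."""
    timing = {}
    start = time.perf_counter()
    try:
        yield timing
    finally:
        timing["seconds"] = time.perf_counter() - start
        print(f"{label}: {timing['seconds'] * 1000:.0f} ms")


def load_index(vector_store, embed_model):
    """Build an index view over vectors that already exist in the store."""
    from llama_index.core import VectorStoreIndex
    return VectorStoreIndex.from_vector_store(
        vector_store=vector_store,
        embed_model=embed_model,   # used to embed incoming queries
    )
```

Typical use would be `with wall_time("load index"): index = load_index(vector_store, embed_model)`.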
Step-5: Setup LLM
[6]
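One way the LLM setup could look, using the `NebiusLLM` integration for llama-index. The model id below is an assumption (the title only says a Qwen model is used), so check it against the models available to your account. `make_llm` is an illustrative helper name.

```python
# Assumed model id for illustration; the notebook's actual choice is not shown.
DEFAULT_MODEL = "Qwen/Qwen3-30B-A3B"


def make_llm(api_key: str, model: str = DEFAULT_MODEL):
    """Create the hosted LLM and register it as the llama-index default."""
    if not api_key:
        raise ValueError("api_key is empty; set NEBIUS_API_KEY first")

    # Imported lazily so the validation above runs without the packages.
    from llama_index.core import Settings
    from llama_index.llms.nebius import NebiusLLM

    llm = NebiusLLM(model=model, api_key=api_key)
    Settings.llm = llm   # query engines created later pick this up
    return llm
```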
Step-6: Query
[7]
Uber’s revenue for 2020 was $11.139 billion.
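A query like the one above is typically issued with `index.as_query_engine().query(...)`. The sketch below shows one way to print the answer together with the chunks it was drawn from, which helps confirm that a figure such as the revenue number really comes from the documents. `summarize_response` is a hypothetical helper, and the metadata keys `file_name` and `page_label` are assumptions about how the PDFs were ingested.

```python
def summarize_response(response, max_sources: int = 3) -> str:
    """Format an answer plus the top retrieved chunks behind it."""
    lines = [f"Answer: {response}"]
    for hit in response.source_nodes[:max_sources]:
        meta = hit.node.metadata
        file_name = meta.get("file_name", "?")    # assumed metadata key
        page = meta.get("page_label", "?")        # assumed metadata key
        score = "n/a" if hit.score is None else f"{hit.score:.3f}"
        lines.append(f"  source: {file_name} p.{page} (score={score})")
    return "\n".join(lines)


# Usage, assuming `index` was loaded in Step-4:
#   query_engine = index.as_query_engine()
#   print(summarize_response(query_engine.query("What was Uber's revenue for 2020?")))
```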
Making sure the model uses context
Let's ask a generic factual question: "When was the moon landing?"
The model almost certainly knows the answer from its training data.
But since we are querying documents, we want the model to answer only from within the documents.
It should come back with something like "provided context does not have information about moon landing".
[8]
The provided context does not contain information about the date of the moon landing.
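This check can be automated in a crude way. The sketch below assumes a refusal can be recognised by a few tell-tale phrases; the phrase list and the helper names `is_context_refusal` and `check_out_of_scope` are illustrative, and a phrase match is only a heuristic, not a guarantee that the model ignored its own knowledge.

```python
# Phrases that suggest the model declined to answer from outside the context.
REFUSAL_MARKERS = (
    "does not contain",
    "does not have information",
    "does not provide",
    "not mentioned in the",
    "no information",
)


def is_context_refusal(answer: str) -> bool:
    """Return True if the answer looks like an 'not in the context' reply."""
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def check_out_of_scope(ask, question: str = "When was the moon landing?"):
    """Ask an off-topic question; `ask` is any callable returning the answer."""
    answer = str(ask(question))
    return is_context_refusal(answer), answer
```

With a llama-index query engine, `check_out_of_scope(query_engine.query)` would return a pair such as `(True, "The provided context does not contain ...")` when the model stays within the documents.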