LlamaIndex Basic RAG
RAG Example Using NVIDIA API Catalog and LlamaIndex
This notebook introduces how to use LlamaIndex to interact with NVIDIA hosted NIM microservices like chat, embedding, and reranking models to build a simple retrieval-augmented generation (RAG) application.
Alternatively, for a more interactive experience with a graphical user interface, you can refer to our code and YouTube video for a Gradio-based RAG Q&A reference application that also uses NVIDIA hosted NIM microservices.
Terminology
RAG
- RAG is a technique for augmenting LLM knowledge with additional data.
- LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data they were trained on, up to a specific cutoff date.
- If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs.
- The process of bringing the appropriate information and inserting it into the model prompt is known as retrieval augmented generation (RAG).
The preceding summary of RAG originates in the Build a RAG App tutorial in the LangChain v0.2 documentation.
For comprehensive information, refer to the LlamaIndex documentation for Building an LLM Application.
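The last step in the summary above, inserting retrieved information into the model prompt, can be sketched in a few lines. This is a minimal illustration and not how LlamaIndex builds its prompts internally; the function name `build_rag_prompt` and the prompt wording are assumptions made for this example.

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Insert retrieved chunks into a prompt ahead of the user's question."""
    # Number the chunks so an answer can point back to its source passage.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_rag_prompt(
    "What is the capital of Sweden?",
    ["The capital and largest city is Stockholm."],
)
print(prompt)
```

The retrieval step decides which chunks are passed in; the LLM only ever sees the assembled prompt.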
NIM
- NIM microservices are containerized microservices that simplify the deployment of generative AI models like LLMs and are optimized to run on NVIDIA GPUs.
- NIM microservices support models across domains like chat, embedding, reranking, and more from both the community and NVIDIA.
NVIDIA API Catalog
- NVIDIA API Catalog is a hosted platform for accessing a wide range of microservices online.
- You can test models on the catalog and then export them with an NVIDIA AI Enterprise license for on-premises or cloud deployment.
LlamaIndex Concepts
- Data connectors ingest your existing data from their native source and format.
- Data indexes structure your data in intermediate representations that are easy and performant for LLMs to consume.
- Engines provide natural language access to your data for building context-augmented LLM apps.
LlamaIndex also provides integrations like llms-nvidia, embeddings-nvidia & nvidia-rerank to work with NVIDIA microservices.
Installation and Requirements
Create a Python environment (preferably with Conda) using Python version 3.10.14. To install Jupyter Lab, refer to the installation page.
Getting Started!
To get started, you need an NVIDIA_API_KEY to use NVIDIA AI Foundation models:
- Create a free account with NVIDIA.
- Click on your model of choice.
- Under Input, select the Python tab, click Get API Key, and then click Generate Key.
- Copy and save the generated key as NVIDIA_API_KEY. From there, you should have access to the endpoints.
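One way to make the key available to the notebook is to read it from the environment and prompt for it only when it is missing. The sketch below is a possible approach, not a required one; the helper name `ensure_nvidia_api_key` is hypothetical, and the check for an `nvapi-` prefix is an assumption about the key format.

```python
import getpass
import os


def ensure_nvidia_api_key(env=None, prompt_fn=getpass.getpass) -> str:
    """Return the key from the environment, prompting for it once if needed."""
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "")
    if not key.startswith("nvapi-"):  # assumed prefix of API Catalog keys
        key = prompt_fn("Enter your NVIDIA API key: ")
        if not key.startswith("nvapi-"):
            raise ValueError("The value entered does not look like an NVIDIA API key")
        env["NVIDIA_API_KEY"] = key
    return key
```

Calling `ensure_nvidia_api_key()` once at the top of the notebook stores the key in `os.environ`, so later cells do not need to pass it around explicitly.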
Enter your NVIDIA API key: ········
RAG Example using LLM and Embedding
1) Initialize the LLM
llama-index-llms-nvidia, also known as NVIDIA's LLM connector, allows you to connect to and generate from compatible models available on the NVIDIA API catalog.
Here we will use mixtral-8x7b-instruct-v0.1.
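A minimal sketch of this step, assuming the `llama-index-llms-nvidia` package is installed and `NVIDIA_API_KEY` is set in the environment. The `mistralai/` prefix on the model ID and the wrapper function `init_llm` are assumptions for illustration.

```python
LLM_MODEL = "mistralai/mixtral-8x7b-instruct-v0.1"  # assumed full catalog ID


def init_llm(model: str = LLM_MODEL):
    """Register the NVIDIA-hosted chat model as LlamaIndex's default LLM."""
    # Imported lazily so this cell can be defined before the package is installed.
    from llama_index.core import Settings
    from llama_index.llms.nvidia import NVIDIA

    # The connector reads NVIDIA_API_KEY from the environment.
    Settings.llm = NVIDIA(model=model)
    return Settings.llm
```

Setting `Settings.llm` makes the model the default for every index and query engine created afterwards.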
2) Initialize the embedding
We selected NV-Embed-QA as the embedding model.
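The embedding model can be registered in the same way. This sketch assumes the `llama-index-embeddings-nvidia` package is installed; the wrapper function `init_embedding` is hypothetical.

```python
EMBED_MODEL = "NV-Embed-QA"


def init_embedding(model: str = EMBED_MODEL):
    """Register the NVIDIA-hosted embedding model as LlamaIndex's default."""
    # Imported lazily so this cell can be defined before the package is installed.
    from llama_index.core import Settings
    from llama_index.embeddings.nvidia import NVIDIAEmbedding

    Settings.embed_model = NVIDIAEmbedding(model=model)
    return Settings.embed_model
```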
3) Obtain some toy text dataset
Here we load toy data from a text document; in real applications, data can be loaded from various sources.
Real-world documents can be very long, which makes them hard to fit in the context window of many models. Even models whose context window could hold the full document can struggle to find information in very long inputs.
To handle this, we'll split the document into chunks for embedding and vector storage.
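To make the idea of chunking concrete, here is a simplified, word-based splitter with overlapping windows. It is only an illustration: LlamaIndex's own splitters work on tokens and sentence boundaries, and the function name and default sizes below are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("one two three four five six seven", chunk_size=4, overlap=1)
print(chunks)  # ['one two three four', 'four five six seven']
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.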
Note:
SimpleDirectoryReader takes care of storing basic file information such as the filename, filepath, and file type as metadata by default. This metadata can be used to keep track of the source file, allowing us to use it later for citation or metadata filtering.
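Loading and chunk configuration might look like the following sketch. The `./data` directory, the chunk size of 400, and the wrapper function `load_documents` are hypothetical choices for illustration; adjust them to your own files.

```python
CHUNK_SIZE = 400  # hypothetical chunk size, in tokens


def load_documents(input_dir: str = "./data"):
    """Read every file in `input_dir` and set the default chunking strategy."""
    # Imported lazily so this cell can be defined before the package is installed.
    from llama_index.core import Settings, SimpleDirectoryReader
    from llama_index.core.node_parser import SentenceSplitter

    documents = SimpleDirectoryReader(input_dir=input_dir).load_data()
    # Indexes built later will split documents with this splitter.
    Settings.text_splitter = SentenceSplitter(chunk_size=CHUNK_SIZE)
    return documents
```

After loading, `documents[0].metadata` can be inspected to see the file information that the reader attached.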
4) Process the documents into VectorStoreIndex
In RAG, your data is loaded and prepared for queries or "indexed". User queries act on the index, which filters your data down to the most relevant context. This context and your query then go to the LLM along with a prompt, and the LLM provides a response.
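The filtering step described above can be illustrated with a toy example: each chunk is stored with an embedding vector, and a query returns the chunks whose vectors are most similar to the query vector. This is a conceptual sketch with made-up two-dimensional vectors, not the actual implementation of VectorStoreIndex.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose embeddings are closest to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy 2-D "embeddings"; a real index stores vectors from the embedding model.
toy_index = [
    ("Stockholm is the capital of Sweden.", [0.9, 0.1]),
    ("Kanal 5 is a television channel.", [0.1, 0.9]),
    ("Sweden borders Norway and Finland.", [0.8, 0.3]),
]
print(top_k([1.0, 0.0], toy_index, k=2))
```

Only the selected chunks, together with the query, are sent to the LLM.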
5) Create a Query Engine to ask questions over your data
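A sketch of building the index and a query engine on top of it, assuming the LLM and embedding model have already been registered in `Settings`. The value of 20 for `similarity_top_k` and the function names are assumptions for illustration.

```python
TOP_K = 20  # hypothetical number of chunks to retrieve per query


def build_index(documents):
    """Chunk, embed, and store the documents in an in-memory vector index."""
    # Imported lazily so this cell can be defined before the package is installed.
    from llama_index.core import VectorStoreIndex

    return VectorStoreIndex.from_documents(documents)


def build_query_engine(index, similarity_top_k: int = TOP_K):
    """Create a query engine that retrieves the top-k chunks for each question."""
    if similarity_top_k < 1:
        raise ValueError("similarity_top_k must be at least 1")
    return index.as_query_engine(similarity_top_k=similarity_top_k)
```

With an index in hand, `build_query_engine(index).query("your question")` returns a response object that can be printed directly.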
Sweden is a Northern European country, occupying the eastern part of the Scandinavian Peninsula. It shares borders with Norway to the west and north, Finland to the east, and is linked to Denmark in the southwest by the Öresund Bridge. Sweden is the largest country in Northern Europe and the fifth largest in Europe, with a total area of 449,964 km2. The country stretches between latitudes 55° and 70° N, and mostly between longitudes 11° and 25° E. Sweden's diverse climate is influenced by its varied topography, which includes a long coastline, numerous lakes, vast forested areas, and the Scandes mountain range that separates it from Norway. The capital and largest city is Stockholm. Sweden has a population of approximately 10.5 million people, with the majority residing in urban areas. The country is known for its extensive coastline, numerous lakes, and vast forested areas, as well as its commitment to social welfare, gender equality, and environmental sustainability. Historically, Sweden has maintained a policy of neutrality and non-participation in military alliances. However, it has recently moved towards cooperation with NATO. Sweden is a highly developed country, ranked seventh in the Human Development Index. It is a constitutional monarchy and parliamentary democracy, with legislative power vested in the 349-member unicameral Riksdag. The country is known for its high standard of living, universal health care, and tertiary education for its citizens. The official language of Sweden is Swedish, a North Germanic language closely related to Danish and Norwegian. English is widely spoken and understood by a majority of Swedes. Sweden's economy is mixed and largely service-oriented, with a strong emphasis on engineering, telecommunications, automotive, and pharmaceutical industries. The country is home to several multinational corporations, including IKEA, Volvo, Ericsson, and H&M. 
In summary, Sweden is a highly developed, forested country located in Northern Europe, known for its extensive coastline, high standard of living, commitment to social welfare, and diverse climate.
RAG Example with LLM, Embedding & Reranking
I don't have information about a "Nordic Channel" in the context of your query. However, I can share that the Swedish broadcasting landscape has seen significant developments. Radio broadcasts started in 1925, and in response to pirate radio stations, a second and third network were established in 1954 and 1962, respectively. In 1989, a satellite service known as Kanal 5 began broadcasting, which might be the service you're referring to, although it's not specifically labeled as "Nordic Channel" in the information provided.
Enhancing accuracy for single data sources
This example demonstrates how a re-ranking model can be used to combine retrieval results and improve accuracy during retrieval of documents.
Typically, reranking is a critical piece of high-accuracy, efficient retrieval pipelines. Generally, there are two important use cases:
- Combining results from multiple data sources
- Enhancing accuracy for single data sources
Here, we focus on demonstrating only the second use case.
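For the single-source case, a reranker can be attached to the query engine as a node postprocessor: the retriever first fetches a broad candidate set, and the reranking model keeps only the most relevant few. The sketch below assumes the `llama-index-postprocessor-nvidia-rerank` package is installed and that `index` is an existing VectorStoreIndex; the function name and the values 20 and 5 are hypothetical.

```python
def build_reranking_query_engine(index, similarity_top_k: int = 20, top_n: int = 5):
    """Retrieve `similarity_top_k` chunks, then keep the `top_n` best after reranking."""
    if not 0 < top_n <= similarity_top_k:
        raise ValueError("top_n must be between 1 and similarity_top_k")
    # Imported lazily so this cell can be defined before the package is installed.
    from llama_index.postprocessor.nvidia_rerank import NVIDIARerank

    return index.as_query_engine(
        similarity_top_k=similarity_top_k,
        node_postprocessors=[NVIDIARerank(top_n=top_n)],
    )
```

Retrieving more candidates than are finally kept gives the reranker room to promote a relevant chunk that embedding similarity alone ranked low.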
The Nordic Channel was a Swedish-language satellite service that was launched in 1989. It is now known as Kanal 5.
Note:
- In this notebook, we used NVIDIA NIM microservices from the NVIDIA API Catalog.
- The above APIs, NVIDIA (llms), NVIDIAEmbedding, and NVIDIARerank, also support self-hosted microservices.
- Change the base_url to your deployed NIM URL.
- Example: NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
- NIM can be hosted locally using Docker, following the NVIDIA NIM for LLMs documentation.
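Switching to a self-hosted NIM could be wrapped as follows. This is a sketch under the assumption that the NIM listens on port 8000 and serves its API under `/v1`, as in the example URL above; the helper names are hypothetical.

```python
def nim_base_url(host: str, port: int = 8000) -> str:
    """Build the /v1 endpoint URL for a self-hosted NIM (port 8000 assumed)."""
    return f"http://{host}:{port}/v1"


def connect_to_local_llm(host: str, model: str = "meta/llama3-8b-instruct"):
    """Create an LLM client that talks to a self-hosted NIM instead of the catalog."""
    # Imported lazily so this cell can be defined before the package is installed.
    from llama_index.llms.nvidia import NVIDIA

    return NVIDIA(model=model, base_url=nim_base_url(host))


print(nim_base_url("your-nim-host-address"))
```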