RAG Chain Server API Client
Press Release Chat Bot
As part of this generative AI workflow, we create an NVIDIA PR chatbot that answers questions about NVIDIA news and blog posts from 2022 and 2023. For this, we have created a FastAPI REST server that wraps llama-index. The API server exposes two methods, upload_document and generate. The upload_document method takes a document from the user's computer, splits it into chunks, embeds the chunks, and uploads them to a Milvus vector database. The generate method produces an answer to the provided prompt, optionally sourcing information from the vector database.
Step-1: Load the PDF files from the dataset folder.
You can upload the PDF files containing the NVIDIA blogs to the query:8081/uploadDocument API endpoint.
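The upload step can be sketched with the Python requests library (assumed installed via pip). The multipart form field name "file" and the exact response schema are assumptions; check the server's auto-generated FastAPI docs (e.g. query:8081/docs) for the actual contract.

```python
import glob
import os

import requests  # third-party: pip install requests

# Host/port come from this guide; adjust if your deployment differs.
UPLOAD_URL = "http://query:8081/uploadDocument"


def find_pdfs(folder: str) -> list:
    """Collect the PDF paths in the dataset folder, in a stable order."""
    return sorted(glob.glob(os.path.join(folder, "*.pdf")))


def upload_pdf(path: str, url: str = UPLOAD_URL) -> int:
    """POST one PDF as multipart form data and return the HTTP status code.

    The form field name "file" is an assumption -- verify it against the
    server's /docs page before relying on it.
    """
    with open(path, "rb") as f:
        resp = requests.post(
            url,
            files={"file": (os.path.basename(path), f, "application/pdf")},
        )
    return resp.status_code


if __name__ == "__main__":
    for pdf in find_pdfs("dataset"):
        print(pdf, "->", upload_pdf(pdf))
```

Each uploaded document is then split, chunked, and embedded server-side before landing in Milvus, so the client only needs to send the raw file.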
Step-2: Ask a question without referring to the knowledge base
Ask the TensorRT-LLM Llama 2 13B model a question about "the NVIDIA Grace superchip" without consulting the vector DB/knowledge base by setting use_knowledge_base to false.
Now ask it the same question with use_knowledge_base set to true.
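Both calls above can be sketched against the generate method, again with the requests library. The endpoint path "/generate", the request field names (question, use_knowledge_base, num_tokens), and the plain-text response handling are assumptions; the server may stream its output, so verify the schema on the /docs page.

```python
import requests  # third-party: pip install requests

# Endpoint path and field names are assumptions -- check query:8081/docs.
GENERATE_URL = "http://query:8081/generate"


def build_payload(question: str, use_knowledge_base: bool,
                  num_tokens: int = 256) -> dict:
    """Assemble the JSON request body for the generate method."""
    return {
        "question": question,
        "use_knowledge_base": use_knowledge_base,
        "num_tokens": num_tokens,
    }


def ask(question: str, use_knowledge_base: bool) -> str:
    """Send one question and return the raw response body as text."""
    resp = requests.post(
        GENERATE_URL,
        json=build_payload(question, use_knowledge_base),
    )
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    q = "What is the NVIDIA Grace superchip?"
    # First: the model's parametric answer, no retrieval.
    print(ask(q, use_knowledge_base=False))
    # Then: the same question grounded in the Milvus knowledge base.
    print(ask(q, use_knowledge_base=True))
```

Comparing the two answers shows the effect of retrieval: with use_knowledge_base set to true, the response should cite details drawn from the uploaded 2022-2023 press releases rather than the model's training data alone.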
Next steps
We have set up a playground UI for you to upload files and get answers from. The UI is available on the same IP address as the notebooks: host_ip:8090/converse