Question Answering
This notebook demonstrates how Pinecone's similarity search as a service helps you build a question answering application. We will index a set of questions and retrieve the most similar stored questions for a new (unseen) question. That way, we can link a new question to answers we might already have.
You can build a question answering application with Pinecone in three steps:
- Represent questions as vector embeddings so that semantically similar questions are in close proximity within the same vector space.
- Index vectors using Pinecone.
- Given a new question, query the index to fetch similar questions. Any answers stored with those questions can then be returned for the new question.
In this notebook, we focus on indexing a set of questions and retrieving similar questions for a new, unseen question.
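The three steps above can be illustrated with a toy, fully in-memory sketch. It is not Pinecone's API: `embed` is a hypothetical stand-in that uses bag-of-words counts instead of a real embedding model, and a plain dict plays the role of the index. The point is only the flow: embed the stored questions, keep them in an index, and rank them by cosine similarity against a new question.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for an embedding model: bag-of-words counts.
# Real sentence embeddings capture meaning far better than word counts.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Steps 1 and 2: embed each stored question and keep it in an in-memory "index".
stored = {
    "q1": "What is the best way to make money online?",
    "q2": "How do I create an e-commerce website?",
    "q3": "How do I start self-learning ethical hacking?",
}
index = {qid: embed(text) for qid, text in stored.items()}

# Step 3: embed the new question and rank stored questions by similarity.
def most_similar(question: str, top_k: int = 2) -> list[tuple[str, float]]:
    query_vec = embed(question)
    scored = [(qid, cosine(query_vec, vec)) for qid, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

Calling `most_similar("best way to make money online")` ranks `q1` first because it shares the most words with the query. A word-count representation cannot tell that "create" and "build" are related, which is the gap that learned embeddings close.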
Dependencies
Pinecone Installation and Setup
We now need a place to store the question embeddings and run efficient vector search over them. For that we use Pinecone. You can get a free API key and enter it below, where we initialize our connection to Pinecone and create a new index.
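A common way to supply the key is to read it at runtime rather than pasting it into the notebook. The sketch below assumes the key is stored in an environment variable named `PINECONE_API_KEY`; the helper name `get_api_key` is hypothetical.

```python
import os

def get_api_key(env_var: str = "PINECONE_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message rather than sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Pinecone API key."
        )
    return api_key
```

With recent versions of the Pinecone Python client, the returned key is then passed when creating the client, as in `Pinecone(api_key=get_api_key())`; check this against the client version you have installed.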
Next, we set up the index specification, which defines the cloud provider and region where the index is deployed. The Pinecone documentation lists all available providers and regions.
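The sketch below models an index specification as a plain dataclass to show which settings have to agree before an index is created. `IndexSpec` and `InMemoryIndexRegistry` are hypothetical names, not part of the Pinecone client. The values `"aws"`, `"us-east-1"`, and a dimension of 300 are assumptions: 300 matches GloVe-based average word embeddings such as `average_word_embeddings_glove.6B.300d`, but the dimension must equal whatever model you actually use. The metric names mirror the ones Pinecone documents (`cosine`, `dotproduct`, `euclidean`).

```python
from dataclasses import dataclass

SUPPORTED_METRICS = {"cosine", "dotproduct", "euclidean"}

# Hypothetical stand-in for an index specification; not the Pinecone client's API.
@dataclass(frozen=True)
class IndexSpec:
    cloud: str       # cloud provider, e.g. "aws" (assumption)
    region: str      # deployment region, e.g. "us-east-1" (assumption)
    dimension: int   # must equal the embedding size produced by the model
    metric: str = "cosine"

class InMemoryIndexRegistry:
    """Hypothetical registry that validates a spec before 'creating' an index."""

    def __init__(self) -> None:
        self.indexes: dict[str, IndexSpec] = {}

    def create_index(self, name: str, spec: IndexSpec) -> None:
        if spec.dimension <= 0:
            raise ValueError("dimension must be positive")
        if spec.metric not in SUPPORTED_METRICS:
            raise ValueError(f"unsupported metric: {spec.metric}")
        if name in self.indexes:
            raise ValueError(f"index {name!r} already exists")
        self.indexes[name] = spec

registry = InMemoryIndexRegistry()
registry.create_index(
    "question-answering",  # hypothetical index name
    IndexSpec(cloud="aws", region="us-east-1", dimension=300),
)
```

The key design point is that the dimension is fixed when the index is created, so it is safer to derive it from the embedding model than to type it by hand.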
Create the index:
Uploading Questions
The dataset used in this notebook is the Quora Question Pairs Dataset.
Let's download the dataset and load the data.
   qid1    question1
0  216488  I would love to give a TED talk. What do I do?
1  424959  Do all caps titles on YouTube videos attract more viewers than normal titles?
2  300233  How do I start self-learning ethical hacking?
3  302677  Should learning musical instruments in schools be made compulsory?
4  468590  Does the success of a self proclaimed Acharya Pankaj Pathak in Assam prove that we, as a state, are regressing back instead of progressing?
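One way to load the questions without extra dependencies is sketched below with the standard `csv` module. It assumes a tab-separated file with a header row containing `qid1` and `question1` columns, like the preview above; the function name `load_unique_questions` and the sample rows are illustrative only. Questions are de-duplicated by `qid1` because, in a question-pairs dataset, the same question can appear in several pairs.

```python
import csv
import io

def load_unique_questions(tsv_text: str) -> list[tuple[int, str]]:
    """Return (qid1, question1) pairs in file order, keeping the first row per qid1."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen: set[int] = set()
    questions: list[tuple[int, str]] = []
    for row in reader:
        qid = int(row["qid1"])
        if qid in seen:
            continue  # same question already collected from an earlier pair
        seen.add(qid)
        questions.append((qid, row["question1"]))
    return questions

# Hypothetical sample in the assumed column layout.
sample = (
    "id\tqid1\tqid2\tquestion1\tquestion2\tis_duplicate\n"
    "0\t1\t2\tWhat is X?\tWhat's X?\t1\n"
    "1\t1\t3\tWhat is X?\tDefine X\t0\n"
    "2\t5\t6\tHow do I Y?\tHow to Y?\t1\n"
)
questions = load_unique_questions(sample)
```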
Define the model
We will use the Average Word Embeddings Model for this example. This model is fast to compute but produces relatively low-quality embeddings. To improve embedding quality, consider other sentence embedding models, such as the Sentence Embeddings Models trained on Paraphrases.
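An average word embeddings model represents a sentence as the element-wise mean of the vectors of its words. The sketch below shows that pooling step with tiny hypothetical 2-dimensional word vectors; a real model uses pretrained vectors (for example 300-dimensional GloVe vectors) and its own tokenizer, so the regex tokenization here is a simplification. It also shows why the approach is fast but coarse: averaging ignores word order, so reordered sentences get identical vectors.

```python
import re

def average_word_embedding(
    text: str, word_vectors: dict[str, list[float]], dim: int
) -> list[float]:
    """Mean-pool the vectors of known words; unknown words are skipped."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical toy vocabulary.
toy_vectors = {
    "make": [1.0, 0.0],
    "money": [0.0, 1.0],
    "online": [1.0, 1.0],
}

forward = average_word_embedding("Make money online?", toy_vectors, dim=2)
reordered = average_word_embedding("online money make", toy_vectors, dim=2)
```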
Creating Vector Embeddings
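Embeddings are typically computed in fixed-size batches, which bounds the work and memory used per step. sentence-transformers' `encode` method already batches internally through its `batch_size` argument; the sketch below only makes the idea explicit. `encode_batch` stands in for any function that maps a list of strings to a list of vectors, and the default batch size of 64 is an arbitrary assumption.

```python
from typing import Callable, Iterator, Sequence

Vector = list[float]

def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def encode_in_batches(
    questions: Sequence[str],
    encode_batch: Callable[[list[str]], list[Vector]],
    batch_size: int = 64,
) -> list[Vector]:
    """Encode questions batch by batch, preserving input order."""
    embeddings: list[Vector] = []
    for batch in batched(questions, batch_size):
        embeddings.extend(encode_batch(list(batch)))
    return embeddings
```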
Index the Vectors
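Indexing amounts to sending (id, vector, metadata) records to the index. Pinecone's upsert inserts a vector when its id is new and overwrites it when the id already exists, and vector ids are strings, so numeric question ids are converted with `str`. `InMemoryVectorStore` and `to_records` are hypothetical stand-ins that reproduce only those semantics; storing the question text as metadata is one assumed way to show it next to each match later.

```python
from typing import Any, Sequence

Record = tuple[str, list[float], dict[str, Any]]

def to_records(
    qids: Sequence[int],
    questions: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> list[Record]:
    """Zip ids, texts and vectors into upsert-ready records with string ids."""
    if not (len(qids) == len(questions) == len(embeddings)):
        raise ValueError("qids, questions and embeddings must have the same length")
    return [
        (str(qid), list(vec), {"question": text})
        for qid, text, vec in zip(qids, questions, embeddings)
    ]

class InMemoryVectorStore:
    """Hypothetical stand-in for an index: upsert means insert or overwrite by id."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.vectors: dict[str, tuple[list[float], dict[str, Any]]] = {}

    def upsert(self, records: Sequence[Record]) -> int:
        # Validate every record first so a bad vector leaves the store unchanged.
        for vec_id, values, _ in records:
            if len(values) != self.dimension:
                raise ValueError(
                    f"vector {vec_id!r} has dimension {len(values)}, expected {self.dimension}"
                )
        for vec_id, values, metadata in records:
            self.vectors[vec_id] = (values, metadata)
        return len(records)
```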
Search
Once the vectors are indexed, querying the index is straightforward. Follow these steps:
- Select a set of questions to query with.
- Use the Average Word Embeddings Model to transform the questions into embeddings.
- Send each question vector to the Pinecone index and retrieve the most similar indexed questions.
Original question : What is best way to make money online?
Most similar questions based on pinecone vector search:
id question score
0 57 What is best way to make money online? 1.000000
1 297469 What is the best way to make money online? 1.000000
2 55585 What is the best way for making money online? 0.989930
3 28280 What are the best ways to make money online? 0.981526
4 157045 What is the best way to make money on the internet? 0.978538
Original question : How can i build an e-commerce website?
Most similar questions based on pinecone vector search:
id question score
0 119383 How can I develop an e-commerce website? 0.925466
1 1713 How would I develop an e-commerce website? 0.925466
2 1714 How do I create an e-commerce website? 0.919407
3 79063 How do I build and host an e-commerce website? 0.918379
4 245780 What is the best platform to build an e-commerce website? 0.894444
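The query step can be sketched as a top-k search with cosine similarity. Normalizing vectors to unit length once, when they are stored, turns cosine similarity into a plain dot product; a score of 1.0 means the two vectors point in the same direction. `normalize`, `query`, and the two-entry toy index are hypothetical. With the Pinecone client, the search runs server-side through the index's query call (in recent client versions, `index.query(vector=..., top_k=..., include_metadata=True)`).

```python
import heapq
import math
from typing import Sequence

def normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def query(
    index: dict[str, tuple[list[float], str]],
    query_vec: Sequence[float],
    top_k: int = 5,
) -> list[tuple[str, str, float]]:
    """Return (id, question, score) for the top_k stored unit vectors by dot product."""
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, vec)), vec_id, text)
        for vec_id, (vec, text) in index.items()
    )
    return [(vec_id, text, score) for score, vec_id, text in heapq.nlargest(top_k, scored)]

# Hypothetical toy index: vectors are normalized once, when they are stored.
raw = {
    "57": ([3.0, 4.0], "What is best way to make money online?"),
    "1714": ([0.0, 1.0], "How do I create an e-commerce website?"),
}
toy_index = {vec_id: (normalize(vec), text) for vec_id, (vec, text) in raw.items()}
matches = query(toy_index, [6.0, 8.0], top_k=2)
```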
Delete the Index
Delete the index once you are sure that you do not want to use it anymore. Once it is deleted, you cannot reuse it.
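Because deletion is irreversible, it can help to guard it behind an explicit confirmation. The sketch below is a hypothetical pattern over a plain dict of indexes, not Pinecone's API: it deletes only when the caller retypes the index name. With the Pinecone Python client, the deletion itself goes through the client's delete call (`delete_index` with the index name, in recent versions).

```python
from typing import Any

def delete_index(registry: dict[str, Any], name: str, confirm: str) -> bool:
    """Delete `name` from `registry` only if `confirm` repeats it exactly.

    Returns True if an index was deleted, False otherwise.
    """
    if confirm != name:
        # Mismatched confirmation: leave everything in place.
        return False
    if name not in registry:
        return False
    del registry[name]
    return True
```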