Towhee
Question Answering Using Milvus, Towhee and Hugging Face
In this notebook we go over how to search for the best answer to questions using Milvus as the Vector Database, Towhee as the pipeline, and Hugging Face as the model.
Packages
We first begin with importing the required packages. In this example, the only non-builtin packages are datasets, towhee and pymilvus. Datasets is the Hugging Face packages to load in the data, towhee is the pipelining application, and pymilvus is the client for Zilliz Cloud. If not present on your system, these packages can be installed using pip install towhee datasets pymilvus.
/Users/filiphaltmayer/miniconda3/envs/openai/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm
Parameters
Here we can find the main parameters that need to be modified for running with your own accounts. Beside each is a description of what it is.
Milvus
This segment deals with Milvus and setting up the database for this use case. Within Milvus we need to setup a collection and index the collection.
Insert Data
Once we have the collection setup we need to start inserting our data. This is done in three steps: tokenizing the original question, embedding the tokenized question, and inserting the embedding, original question, and answer. In our example we use Hugging Face Datasets to load in the dataset and then feed that into a Towhee pipeline for embedding and inserting.
2023-02-10 15:02:46,071 - 140704288082112 - builder.py-builder:785 - WARNING: Found cached dataset squad (/Users/filiphaltmayer/.cache/huggingface/datasets/squad/plain_text/1.0.0/d6ec3ceb99ca480ce37cdd35555d6cb2511d223b9150cce08a837ef62ffea453) 100%|██████████| 99/99 [00:00<00:00, 2055.91ex/s]
2023-02-10 15:02:46.444223: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Question: What was Orsini known for? Answer, Distance, Original Question: ['Napoleon III', 13.175783157348633, 'Who did Orsini try to assassinate?'] ['retracted his decision', 24.993133544921875, 'What did Nasser do after mass demonstrations?'] ['1865', 25.8225154876709, 'When did Palmerston die?'] ['US$320,000,000', 26.799659729003906, 'How much money did Nasser spend on weapons?'] ['bid for statehood', 27.64345932006836, 'Why was this constitutional convention held?'] ['Oba Kosoko', 27.844146728515625, 'Which Lagos king had supported the slave trade?'] ['2001', 28.558818817138672, 'When was the Doha Declaration adopted?'] ['James River and Kanawha Canal', 29.44527816772461, 'What man-made body of water was designed in part by George Washington?'] ['in a state of self-sacrifice', 29.550865173339844, 'How did the Cathars live?'] ['1748 with the signing of the Treaty of Aix-la-Chapelle', 29.62041473388672, 'What was the end of the War of the Austrian Succession?'] Question: What does the finding of gold cause? Answer, Distance, Original Question: ['gold rush', 17.19445037841797, 'What did the finding of gold in Victoria cause?'] ['Dirección Nacional de Transporte', 20.836410522460938, 'What does DNT stand for?'] ['Community Based Natural Resource Management', 21.718856811523438, 'What does CBNRM stand for?'] ['rendering of the Old Testament into Greek', 22.101085662841797, 'What is one of the first known instances of translation in the West?'] ['James River and Kanawha Canal', 23.819223403930664, 'What man-made body of water was designed in part by George Washington?'] ['when transporting large goods or moving groups of people between certain floors', 24.325286865234375, 'At what times is Independant service best utilized?'] ['obligations established by agreement (express or implied) between private parties', 25.668529510498047, 'What is contract law?'] ['an earthquake', 25.91346549987793, 'What occurrence is measured by a seismometer?'] ['Neo-Historicism/Revivalism, Traditionalism', 25.926233291625977, 'What is another name for New Classical Architecture?'] ['high', 26.297353744506836, 'What do dysfunctional people perceive the severity of their pain to be?']