TwelveLabs Weaviate RAG Colab


Twelve Labs Video RAG with Weaviate

Set Up Our Environment

Install Dependencies

[ ]
[ ]
[ ]

Set Up Twelve Labs and Weaviate SDKs

[ ]
[ ]
[ ]

Setting Up Our Video Data

[ ]

Some of our videos are too low resolution to use in the embedding engine, so we will double their resolution with upscale_video.

read_video_pyav comes directly from the LLaVa-NeXT-Video Colab notebook. It formats videos in the correct numpy representation for inference.
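The notebook's own upscale_video implementation is not shown here. As a minimal sketch of the doubling step, assuming the ffmpeg command-line tool is used (an assumption, not necessarily what the notebook does), the helper below only builds the argument list. The function name and file paths are hypothetical.

```python
def build_upscale_command(src: str, dst: str, factor: int = 2) -> list[str]:
    """Build an ffmpeg argument list that multiplies width and height by `factor`.

    Hypothetical helper: this illustrates one way to double resolution,
    not the notebook's actual upscale_video function.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    # ffmpeg's scale filter accepts expressions; iw/ih are input width/height.
    scale_filter = f"scale=iw*{factor}:ih*{factor}"
    # -y overwrites the destination file if it already exists.
    return ["ffmpeg", "-y", "-i", src, "-vf", scale_filter, dst]


cmd = build_upscale_command("videos/clip.mp4", "videos/clip_upscaled.mp4")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.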

[ ]

Here we upscale all of our videos.

[ ]

Compare Pegasus and LLaVA-NeXT-Video on a Single Video

We will start by comparing Pegasus and LLaVA-NeXT-Video on generating insights from a single video.

Using Pegasus to Chat with our Video

To chat with our video, we first need to have Pegasus index it.

We will create an index named sports_videos and then upload our video to this index to be indexed before chatting with it. We only need to do this once per video.

In more complex workflows with multiple videos, all of the uploads can be done well ahead of time to reduce overhead and speed up the end-to-end workflow.

First we create the index.

[ ]

Then we create a function to upload our video to be indexed.
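Because indexing is asynchronous, an upload function usually has to wait until the video is ready before it can be queried. The sketch below shows a generic polling loop. The `get_status` callable and the status strings "ready" and "failed" are assumptions standing in for whatever the SDK reports, so this is not the notebook's actual upload code.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll `get_status` until it returns a terminal status or the timeout passes.

    `get_status` is a hypothetical callable wrapping the SDK's task lookup;
    the terminal status names below are assumptions.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("ready", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexing still '{status}' after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

Passing the status lookup in as a callable keeps the loop independent of any particular client library and makes it easy to exercise with a fake.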

[ ]
[ ]

Next, we'll upload our video.

[ ]
[ ]

Finally we'll query it.

[ ]

Here is Pegasus' response:

The video showcases a pivotal moment in a football game between the New York Giants and the New England Patriots. Eli Manning, the Giants' quarterback, throws a pass that David Tyree catches spectacularly by pinning the ball against his helmet as he falls out of bounds. Multiple angles replay the catch, emphasizing its difficulty and precision. Tyree briefly celebrates after the play, and the video ends with him and other players walking off the field.

From the above response, we can see that Pegasus 1.2 can respond coherently to the question. Now, let's check whether we can get a similar response from the open-source model.

[ ]

Using LLaVa-NeXT-Video to Chat with our Video

For the open-source model, we need to set up video sampling for the model to consume, load the model from the Hugging Face Hub, format the input for inference, and then run the model on our inputs. We will modify the LLaVa-NeXT-Video sampling code to get a uniform sample of 40 frames for each video.
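The uniform sampling described above comes down to choosing evenly spaced frame indices. The following is a minimal sketch in plain Python; the function name is hypothetical, and the notebook's modified sampling code may differ in detail.

```python
def uniform_frame_indices(total_frames: int, num_samples: int = 40) -> list[int]:
    """Return up to `num_samples` evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    # Short videos cannot supply more distinct frames than they contain.
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


indices = uniform_frame_indices(400)
```

The indices can then be used to pick which decoded frames are stacked into the array that the model consumes.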

[ ]
[ ]

Here we'll set up our LLaVa-NeXT-Video model.

[ ]

Next, we'll create a function to query our model.

[ ]
[ ]

Here is LLaVa-NeXT-Video's output:

What is happening in this video? Be concise ASSISTANT: The video shows a football game in progress, with various players on the field. It appears to be the Super Bowl III between the New York Giants and the New England Patriots, judging by the jersey numbers and the old-fashioned helmets worn by some players. One player is in mid-action, grabbing the ball and getting tackled by another player, while a referee is signaling a first down. There are also coaches and other game

While this model does recognize that there is a football game happening between the Giants and the Patriots, it tends to hallucinate other facts. For example, it labels the game Super Bowl III, which was not a Giants-Patriots matchup.

[ ]

RAG for Segment-Level Queries on a Single Video

We see that Pegasus is the clear winner on time and accuracy for this query when querying the entire video.

The open-source model would likely perform better if we could restrict the video in question to a smaller segment. We can do this by creating queries that only need a subset of the video, and using RAG to get the relevant subset.

This is where the Marengo model will come in. We can use it to create embeddings for each segment of the video, and then use RAG to get the most relevant segment based on our queries.

We will start by creating embeddings for each segment of the video.

[ ]

Using Marengo to Create Full Video and Video Clip Embeddings

Marengo allows us to retrieve embeddings for the entire video and for clips at a set clip length all in one call.

[ ]

We'll save the task ID for use later when uploading our embeddings to Weaviate.

[ ]
[ ]

Prepare Video Segments for RAG

Next, we will split this video up into segments that mirror the timestamps for each embedding. This lets us later submit only the relevant video chunk to our model for a RAG use case.
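To illustrate how segments can line up with fixed-length clip embeddings, here is a small sketch that computes (start, end) boundaries from a video duration and a clip length. The function name and the example values are hypothetical. When the embedding response already carries its own start and end offsets, those should be used instead of recomputing them.

```python
def segment_bounds(duration_sec: float, clip_length_sec: float) -> list[tuple[float, float]]:
    """Split [0, duration_sec] into consecutive clips of at most clip_length_sec."""
    if duration_sec <= 0 or clip_length_sec <= 0:
        raise ValueError("duration_sec and clip_length_sec must be positive")
    duration = float(duration_sec)
    bounds = []
    start = 0.0
    while start < duration:
        # The final clip is shortened so it never runs past the end of the video.
        end = min(start + clip_length_sec, duration)
        bounds.append((start, end))
        start = end
    return bounds


bounds = segment_bounds(16, 6)
```

Each pair can then drive whatever tool cuts the video, so that every stored embedding has a matching clip file.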

[ ]
[ ]

Next, we'll upload the video segments to Pegasus to get their video IDs. We will upload these IDs to Weaviate along with the embeddings, so we can easily chat with the returned video. This is a great way to speed up results when you have videos that users will chat with.

Here we'll create and populate a dictionary mapping file names to Pegasus video IDs.

[ ]
[ ]

We'll also add the video ID for the full video that we retrieved earlier.

[ ]
[ ]

We'll also sample all of our videos for use with the LLaVa-NeXT-Video model.

[ ]

Uploading Embeddings to Weaviate

Now we'll create a function to prepare our data to be uploaded to Weaviate.
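A preparation step like this typically pairs each embedding with the metadata needed to find and chat with its clip later. The sketch below uses plain dictionaries. All field names (`file_name`, `start`, `end`, `scope`, `vector`, `pegasus_video_id`) are hypothetical, and the notebook's actual schema may differ.

```python
def prepare_objects(video_name: str, segments: list[dict], video_ids: dict[str, str]) -> list[dict]:
    """Combine segment embeddings with their metadata into upload-ready records.

    `segments` holds one dict per embedding; `video_ids` maps each segment's
    file name to its Pegasus video ID. Both layouts are assumptions.
    """
    objects = []
    for seg in segments:
        objects.append({
            "properties": {
                "video_name": video_name,
                "start_time": seg["start"],
                "end_time": seg["end"],
                "scope": seg["scope"],
                "pegasus_video_id": video_ids[seg["file_name"]],
            },
            # The embedding is kept apart from the properties so it can be
            # stored as the object's vector.
            "vector": list(seg["vector"]),
        })
    return objects


example_segments = [
    {"file_name": "catch_0_6.mp4", "start": 0.0, "end": 6.0, "scope": "clip", "vector": [0.1, 0.2]},
    {"file_name": "catch.mp4", "start": 0.0, "end": 12.0, "scope": "video", "vector": [0.3, 0.4]},
]
example_ids = {"catch_0_6.mp4": "vid_clip_1", "catch.mp4": "vid_full"}
objects = prepare_objects("catch", example_segments, example_ids)
```

Records in this shape map naturally onto a batch import, where each object is added with its properties and its vector.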

[ ]
[ ]

Now, we'll upload the data to our collection

[ ]

Testing the Vector Search

Now that we have everything in the collection, we can test and see that it properly returns the correct sample 'video_name':5.0

[ ]

Querying our Vector Database with Text Embeddings

To query the database, first we'll embed our text query with Marengo's text embedding feature. Then we will query the Weaviate database for the clip embedding that best matches our question embedding. We will then use the Pegasus video ID stored with that clip to ask our question about it.
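Conceptually, the lookup above is a nearest-neighbor search between the query embedding and the stored clip embeddings. The sketch below shows that idea with a brute-force cosine similarity in plain Python. It is an illustration only: a vector database uses an index rather than a full scan, and the record layout here is hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_clip(query_vector: list[float], records: list[dict]) -> dict:
    """Return the record whose 'vector' is most similar to the query vector."""
    return max(records, key=lambda r: cosine_similarity(query_vector, r["vector"]))


records = [
    {"pegasus_video_id": "vid_a", "vector": [1.0, 0.0]},
    {"pegasus_video_id": "vid_b", "vector": [0.0, 1.0]},
]
best = nearest_clip([0.9, 0.1], records)
```

The `pegasus_video_id` on the winning record is what would then be passed to Pegasus to ask the question about that clip.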

[ ]
[ ]
[ ]

Chatting with our Video Segment: Pegasus vs LLaVa-NeXT-Video

Pegasus:

[ ]

LLaVa-NeXT-Video:

[ ]
[ ]

Multi Video RAG with Marengo, Weaviate, and Pegasus

Now that we know how Marengo embeddings perform on individual clips from a single video, we will show how to use embeddings across multiple videos for a more realistic RAG use case.

Get Marengo Embeddings for All Videos

[ ]
[ ]

Split our Remaining Videos into Segments

[ ]

Get Pegasus Video IDs for All Videos and their Segments

Finally, we will upload the full videos and their segments to Pegasus so we can chat with them. We will parallelize this task to speed it up.
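Uploads are I/O-bound, so a thread pool is one straightforward way to parallelize them. The sketch below assumes a hypothetical `upload_one` callable that takes a file path and returns its video ID. It shows the pattern only, not the notebook's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def upload_all(
    paths: list[str],
    upload_one: Callable[[str], str],
    max_workers: int = 4,
) -> dict[str, str]:
    """Upload files concurrently and return a mapping of path -> video ID.

    `upload_one` is a hypothetical stand-in for the function that uploads a
    single file and waits for its ID.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `paths`.
        video_ids = list(pool.map(upload_one, paths))
    return dict(zip(paths, video_ids))
```

Keeping `max_workers` modest helps stay within any rate limits the service enforces.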

[ ]
[ ]

Upload Data to Weaviate

First we'll prepare our data to be uploaded.

[ ]

Then, we will upload it to our collection.

[ ]

RAG Questions

We now have Marengo embeddings and Pegasus video IDs uploaded to Weaviate.

We can assess the performance of running queries on the clips and the full video in terms of answer accuracy and speed.
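For the speed side of this comparison, a small timing wrapper is enough to measure each query end to end. The helper below is a hypothetical sketch and works with any callable, such as a function that queries a clip or one that queries the full video.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call `fn` and return its result together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


total, seconds = timed(sum, [1, 2, 3])
```

Recording the elapsed time next to each answer makes it easy to compare clip-level and full-video queries side by side.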

[ ]

Multi Video RAG with Pegasus

[ ]
[ ]
[ ]

Multi Video RAG with LLaVa-NeXT-Video

Now we can run our model on the full video, which outputs some more interesting answers.

First we'll sample the rest of our video segments.

[ ]
[ ]
[ ]
[ ]