Using Arize with Couchbase

This guide shows you how to create a retrieval-augmented generation (RAG) chatbot with Couchbase and evaluate its performance with Arize. RAG is typically used to answer queries from a specified set of documents rather than from the LLM's own training data, which reduces hallucinations and incorrect generations.

We'll go through the following steps:

  • Create a RAG chatbot with Langchain and Couchbase

  • Trace the retrieval and LLM calls using Arize

  • Create a dataset to benchmark performance

  • Evaluate performance using LLM as a judge

Much of the code in this tutorial is adapted from the Langchain Couchbase Tutorial.

Create a RAG chatbot using Langchain and Couchbase

Let's start with all of our boilerplate setup:

  1. Install packages for tracing and retrieval
  2. Set up our API keys
  3. Set up Arize for tracing
  4. Set up Couchbase
  5. Create our Langchain RAG query engine
  6. See your results in Arize

Install packages for tracing and retrieval

[ ]
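
The install cell is not reproduced here, so the following is only a sketch of what it might do from Python. The package list is an assumption based on the libraries this tutorial names (Langchain, Couchbase, Arize tracing, and the Phoenix evaluators); adjust it to match your environment.

```python
import subprocess
import sys

# Assumed package set; the notebook's original install cell is not shown here.
PACKAGES = [
    "langchain",
    "langchain-openai",
    "langchain-couchbase",
    "arize-otel",
    "openinference-instrumentation-langchain",
    "arize-phoenix-evals",
]


def build_pip_command(packages):
    """Return the pip invocation for the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", "--quiet", *packages]


def install(packages=PACKAGES):
    """Run pip in a subprocess; raises CalledProcessError if the install fails."""
    subprocess.check_call(build_pip_command(packages))
```

Calling `install()` once at the top of the notebook installs everything into the same interpreter the kernel uses, which avoids the common mismatch between a shell `pip` and the notebook's Python.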

Set up our API keys

[ ]
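
One common way to handle keys in a notebook is to read them from environment variables and prompt only when they are missing. The helper below is a minimal sketch of that pattern; the function name `require_env` and the variable names in the usage note are assumptions, not part of the original notebook.

```python
import os
from getpass import getpass


def require_env(name, prompt=None):
    """Return os.environ[name], prompting once (without echo) if it is unset."""
    value = os.environ.get(name)
    if not value:
        value = getpass(prompt or f"Enter {name}: ")
        # Store the value so libraries that read the environment can find it.
        os.environ[name] = value
    return value
```

For example, `require_env("OPENAI_API_KEY")` returns the existing value when the variable is already exported and otherwise asks for it interactively. The same helper could be used for hypothetical `ARIZE_SPACE_ID` and `ARIZE_API_KEY` variables.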

Set up Arize for tracing

To follow along with this tutorial, you'll need to sign up for Arize and get your API key. You can see the guide here.

[ ]
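
As a rough sketch of the tracing setup, the function below registers an OpenTelemetry tracer provider with Arize and instruments Langchain through OpenInference. It assumes the `arize-otel` and `openinference-instrumentation-langchain` packages are installed; the project name is hypothetical, and parameter names may differ between package versions.

```python
def setup_arize_tracing(space_id, api_key, project_name="couchbase-rag"):
    """Register a tracer provider with Arize and instrument Langchain."""
    # Imported lazily so this cell loads even before the packages are installed.
    from arize.otel import register
    from openinference.instrumentation.langchain import LangChainInstrumentor

    tracer_provider = register(
        space_id=space_id,
        api_key=api_key,
        project_name=project_name,  # hypothetical project name
    )
    # After this call, Langchain chains, retrievers, and LLM calls emit spans.
    LangChainInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider
```

Instrumentation should run before any chains are built or invoked, so that every retrieval and LLM call is captured.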

Set up Couchbase

You'll need to set up your Couchbase cluster by doing the following:

  1. Create an account at Couchbase Cloud
  2. Create a free cluster
  3. Create cluster access credentials
  4. Allow access to the cluster from your local machine
  5. Create a bucket to store your documents

Screenshots below:

Create our Langchain RAG query engine

Once you've set up your cluster, you can connect to it using Langchain's Couchbase package.

[ ]
[ ]
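
A connection cell along these lines is typical for the Couchbase Python SDK. This is a sketch only: the bucket, scope, collection, and index names are hypothetical placeholders, and the connection string and credentials come from your own cluster.

```python
from datetime import timedelta

# Hypothetical names; they must match the bucket, scope, collection, and
# search index you create in the Couchbase UI.
BUCKET_NAME = "rag-demo"
SCOPE_NAME = "_default"
COLLECTION_NAME = "_default"
SEARCH_INDEX_NAME = "vector-index"


def connect_to_couchbase(connection_string, username, password, timeout_s=5):
    """Open a cluster connection and wait until it is ready to serve requests."""
    # Imported lazily so this cell loads even before the SDK is installed.
    from couchbase.auth import PasswordAuthenticator
    from couchbase.cluster import Cluster
    from couchbase.options import ClusterOptions

    auth = PasswordAuthenticator(username, password)
    cluster = Cluster(connection_string, ClusterOptions(auth))
    cluster.wait_until_ready(timedelta(seconds=timeout_s))
    return cluster
```

If `wait_until_ready` times out, the usual causes are a missing IP allow-list entry or incorrect cluster access credentials (steps 3 and 4 in the setup list).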

Before creating the vector store, you must also create a search index. You can do this by going to the Couchbase UI and clicking on the "Search" tab. Make sure the bucket, scope, collection, and index names match the ones we've defined above.

Instructions: https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html

[ ]
[ ]
[ ]
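
With the cluster connected and the search index in place, the vector store can be built roughly as follows. This sketch assumes the `langchain-couchbase` and `langchain-openai` packages and uses OpenAI embeddings as an assumed choice; newer releases of `langchain-couchbase` may expose the class under a different name.

```python
def build_vector_store(cluster, bucket_name, scope_name, collection_name, index_name):
    """Wrap a Couchbase collection and its search index as a Langchain vector store."""
    # Imported lazily so this cell loads even before the packages are installed.
    from langchain_couchbase.vectorstores import CouchbaseVectorStore
    from langchain_openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment
    return CouchbaseVectorStore(
        cluster=cluster,
        bucket_name=bucket_name,
        scope_name=scope_name,
        collection_name=collection_name,
        embedding=embeddings,
        index_name=index_name,  # must match the search index created in the UI
    )
```

The embedding model used here must produce vectors of the same dimension that the search index was configured with, otherwise queries will fail or return nothing.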

We can test the vector search directly with the following code:

[ ]
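
To make clear what a vector search returns, here is a small, self-contained illustration of brute-force top-k retrieval by cosine similarity. It is a toy stand-in, not how Couchbase implements its index; on a real Langchain vector store the equivalent call is typically `similarity_search`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """indexed is a list of (doc_id, vector); return the k best (doc_id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy two-dimensional "embeddings" for three documents.
toy_index = [
    ("doc-a", [1.0, 0.0]),
    ("doc-b", [0.0, 1.0]),
    ("doc-c", [1.0, 1.0]),
]
results = top_k([1.0, 0.1], toy_index, k=2)
print([doc_id for doc_id, _ in results])  # ['doc-a', 'doc-c']
```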

We can load additional documents into the vector store for testing. Each document carries a metadata.source field, which lets us filter documents by source independently of the vector query.

[ ]
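
The idea of filtering on `metadata.source` separately from the vector query can be sketched with plain dictionaries. The document texts and source names below are made up for illustration; in the notebook these would be Langchain documents stored in Couchbase.

```python
def filter_by_source(documents, source):
    """Keep only the documents whose metadata.source equals the given value."""
    return [doc for doc in documents if doc["metadata"].get("source") == source]


# Toy documents with hypothetical sources.
documents = [
    {"text": "How to create a bucket.", "metadata": {"source": "docs"}},
    {"text": "Release notes for the SDK.", "metadata": {"source": "changelog"}},
    {"text": "How to create a search index.", "metadata": {"source": "docs"}},
]

docs_only = filter_by_source(documents, "docs")
print(len(docs_only))  # 2
```

Applying the metadata filter first narrows the candidate set, and the vector similarity ranking then runs only over documents from the chosen source.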

You may need to tag the embedding field as a vector field in the search index settings. See image below:

Let's try the vector search using the Langchain retriever interface across our new documents.

[ ]

Let's run an entire RAG query with the Langchain RAG query engine.

[ ]
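
A full RAG query usually chains four steps: retrieve documents, format them into a context string, fill a prompt, and call the LLM. The sketch below composes these with Langchain's runnable syntax. The prompt wording is an assumption, and `retriever` and `llm` are whatever objects you built earlier (for example `vector_store.as_retriever()` and a chat model).

```python
def format_docs(docs):
    """Join the page_content of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_rag_chain(retriever, llm):
    """Compose retriever -> prompt -> LLM -> string output into one runnable."""
    # Imported lazily so this cell loads even before langchain-core is installed.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough

    # Hypothetical prompt; the notebook's own template is not shown here.
    prompt = ChatPromptTemplate.from_template(
        "Answer the question using only the context below.\n\n"
        "Context:\n{context}\n\nQuestion: {question}"
    )
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
```

Calling `build_rag_chain(retriever, llm).invoke("your question")` runs the whole pipeline, and with tracing enabled each stage appears as its own span.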

See your results in the Arize UI

Once you've run a single query, you can see the trace in the Arize UI, with each step taken by the retriever, the embedding model, and the LLM call.

Click through the queries to better understand how the query engine is performing. Arize can be used to understand and troubleshoot your RAG app by surfacing:

  • Application latency
  • Token usage
  • Runtime exceptions
  • Retrieved documents
  • Embeddings
  • LLM parameters
  • Prompt templates
  • Tool descriptions
  • LLM function calls
  • And more!

Create synthetic dataset of questions

Using the template below, we're going to generate a dataframe of 25 questions we can use to test our customer support agent.

[ ]
[ ]
[ ]
[ ]
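
The generation step can be sketched as two small helpers: one that fills a question-generation template and one that turns the model's numbered reply into rows for a dataframe. The template text and the `input` column name are assumptions for illustration, not the notebook's own template.

```python
import re

# Hypothetical template; the notebook's own prompt is not reproduced here.
QUESTION_TEMPLATE = (
    "You are helping test a documentation chatbot.\n"
    "Write {n} distinct questions a user might ask about the following text.\n"
    "Return one question per line, numbered 1 to {n}.\n\n"
    "Text:\n{context}"
)


def build_question_prompt(context, n=25):
    """Fill the template with the source text and the number of questions."""
    return QUESTION_TEMPLATE.format(n=n, context=context)


def parse_questions(llm_output):
    """Turn a numbered list returned by the model into dataset rows."""
    rows = []
    for line in llm_output.splitlines():
        # Strip a leading "1." or "1)" marker, then surrounding whitespace.
        text = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
        if text:
            rows.append({"input": text})
    return rows


sample_output = "1. How do I create a bucket?\n2) What is a search index?\n\n"
rows = parse_questions(sample_output)
print(rows)
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame(rows)` to get the dataframe of test questions.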

Now let's run it and manually inspect the traces!

[ ]
[ ]

Evaluating your RAG app

Now that we have a set of test cases, we can create evaluators to measure performance. This way, we don't have to manually inspect every single trace to see if the LLM is doing the right thing.

[ ]

We will create an LLM-as-a-judge evaluator from the prompt templates above: we take the spans recorded by Phoenix and label them with the llm_classify function. This function uses an LLM to evaluate your LLM calls and returns a label and an explanation for each one. You can read more detail here.
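
As a hedged sketch of such an evaluator, the function below calls `llm_classify` with Phoenix's built-in QA correctness template, swapped in here for the notebook's own templates, which are not shown. It assumes the dataframe has `input`, `reference`, and `output` columns and that `judge_model` is a Phoenix evals model object you have already constructed.

```python
def judge_qa_correctness(spans_df, judge_model):
    """Label each row with Phoenix's QA template and return labels plus explanations."""
    # Imported lazily so this cell loads even before arize-phoenix-evals is installed.
    from phoenix.evals import QA_PROMPT_RAILS_MAP, QA_PROMPT_TEMPLATE, llm_classify

    # Rails restrict the judge's answer to a fixed set of allowed labels.
    rails = list(QA_PROMPT_RAILS_MAP.values())
    return llm_classify(
        spans_df,
        model=judge_model,
        template=QA_PROMPT_TEMPLATE,
        rails=rails,
        provide_explanation=True,
    )
```

The returned dataframe is indexed like the input and carries `label` and `explanation` columns, so it can be joined back onto the spans for inspection.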

[ ]

Let's inspect the results of our evaluation!

[ ]
[ ]
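
A quick way to summarize the evaluation is to compute the share of each label. The helper below is a minimal sketch that works on any iterable of labels, such as the `label` column of the evaluation dataframe; the example labels are made up.

```python
from collections import Counter


def label_fractions(labels):
    """Return the fraction of rows carrying each label (empty dict for no rows)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


example_labels = ["correct", "correct", "incorrect", "correct"]
summary = label_fractions(example_labels)
print(summary)  # {'correct': 0.75, 'incorrect': 0.25}
```

Reading a handful of the explanations for the minority label is usually the fastest way to see whether failures come from retrieval or from generation.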

Experiment with different k-values, chunk sizes, and chunk overlaps

Let's change the number of documents retrieved from the vector store, the size of the chunks loaded into the vector store, and the chunk overlaps.

[ ]
[ ]
[ ]
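
To show how chunk size and overlap interact, here is a simplified character-based chunker and a small grid of experiment settings. It is a stand-in for a Langchain text splitter such as `RecursiveCharacterTextSplitter`, and the specific k-values, sizes, and overlaps are hypothetical.

```python
from itertools import product


def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Hypothetical grid of settings to compare.
K_VALUES = [2, 4]
CHUNK_SIZES = [500, 1000]
CHUNK_OVERLAPS = [0, 100]

experiments = [
    {"k": k, "chunk_size": size, "chunk_overlap": overlap}
    for k, size, overlap in product(K_VALUES, CHUNK_SIZES, CHUNK_OVERLAPS)
]

print(chunk_text("0123456789", chunk_size=4, chunk_overlap=2))
# ['0123', '2345', '4567', '6789']
print(len(experiments))  # 8
```

Each experiment re-chunks and re-indexes the documents with its own size and overlap, then queries with its own k, so the comparison isolates the effect of those three settings.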

Let's set up our evaluators to see how the performance changes.

[ ]

Let's log these results to Arize and see how they compare.

First we'll create a dataset to store our questions.

[ ]

Next, we'll define which columns of our dataframe are mapped to outputs and which are mapped to evaluation labels and explanations.

[ ]

Now let's run it for each of our experiments.

[ ]
[ ]

You can see the experiment results in the Arize UI and see how each RAG method performs.