Arize AI
Sentiment Classification Tutorial

Root-Cause Analysis for a Drifting Sentiment Classification Model

Imagine you're in charge of maintaining a model that takes as input online reviews of your U.S.-based product and classifies the sentiment of each review as positive, negative, or neutral. Your model initially performs well in production, but its performance gradually degrades over time.

Phoenix helps you surface the reason for this regression by analyzing the embeddings representing the text of each review. Your model was trained on English reviews, but as you'll discover, it's encountering Spanish reviews in production that it can't correctly classify.

In this tutorial, you will:

  • Download curated datasets of embeddings and predictions
  • Define a schema to describe the format of your data
  • Launch Phoenix to visually explore your embeddings
  • Investigate problematic clusters to identify the root cause of your model performance issue

⚠️ This notebook runs slowly without a GPU. If you don't have access to a GPU, you can still use Phoenix by skipping the cells preceded by the 💬 emoji.

Let's get started!

Install Dependencies and Import Libraries

Install Phoenix.

Import dependencies.

Download the Data

Download training and production data from a model that classifies the sentiment of product reviews as positive, negative, or neutral.

View a few training data points.

The columns of the dataframe are:

  • prediction_ts: the Unix timestamps of your predictions
  • review_age, reviewer_gender, product_category, language: the features of your model
  • text: the text of each product review
  • text_vector: the embedding vectors representing each review
  • pred_label: the label your model predicted
  • label: the ground-truth label for each review
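Because each row carries both pred_label and label, accuracy can be broken down by any feature column. The sketch below does this for the language column using plain Python. The rows are hypothetical and invented purely for illustration (including the language values); only the column names come from the list above.

```python
from collections import defaultdict

# Hypothetical rows, invented for illustration. Only the column names
# (language, pred_label, label) come from the tutorial's dataframe.
rows = [
    {"language": "english", "pred_label": "positive", "label": "positive"},
    {"language": "english", "pred_label": "negative", "label": "negative"},
    {"language": "english", "pred_label": "neutral", "label": "positive"},
    {"language": "spanish", "pred_label": "neutral", "label": "negative"},
    {"language": "spanish", "pred_label": "positive", "label": "negative"},
]


def accuracy_by(rows, key):
    """Return {group value: fraction of rows where pred_label equals label}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[key]] += 1
        correct[row[key]] += int(row["pred_label"] == row["label"])
    return {group: correct[group] / total[group] for group in total}


accuracy = accuracy_by(rows, "language")
for group, value in accuracy.items():
    print(f"{group}: {value:.2f}")
```

A breakdown like this only helps when a feature happens to capture the problem. The embedding analysis later in the tutorial is useful precisely because it does not depend on such a feature being available.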

Compute Embeddings

💬 Compute embeddings using a DistilBERT model fine-tuned on a dataset of product reviews.

Launch Phoenix

Define a schema to tell Phoenix what each column of your dataframe represents (predictions, actuals, embeddings, etc.). See the docs for guides on defining your own schema and for the API reference on phoenix.Schema and phoenix.EmbeddingColumnNames.
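To make the idea concrete, here is a plain-Python stand-in for what a schema expresses: a mapping from roles to column names, plus a check that every referenced column exists. This is not the Phoenix API. The key names are assumptions modeled on the parameters of phoenix.Schema and phoenix.EmbeddingColumnNames and should be checked against the API reference; the helper referenced_columns is hypothetical.

```python
# Columns of the tutorial's dataframe, as listed earlier.
columns = [
    "prediction_ts", "review_age", "reviewer_gender", "product_category",
    "language", "text", "text_vector", "pred_label", "label",
]

# Role -> column mapping. Key names are assumptions modeled on phoenix.Schema
# and phoenix.EmbeddingColumnNames; verify them against the API reference.
schema_roles = {
    "timestamp_column_name": "prediction_ts",
    "prediction_label_column_name": "pred_label",
    "actual_label_column_name": "label",
    "feature_column_names": [
        "review_age", "reviewer_gender", "product_category", "language",
    ],
    "embedding_feature_column_names": {
        # "text_embedding" is the embedding name that appears in the Phoenix UI.
        "text_embedding": {
            "vector_column_name": "text_vector",
            "raw_data_column_name": "text",
        },
    },
}


def referenced_columns(roles):
    """Flatten the role mapping into the list of columns it refers to."""
    names = []
    for value in roles.values():
        if isinstance(value, str):
            names.append(value)
        elif isinstance(value, list):
            names.extend(value)
        else:  # dict of embedding name -> {role: column}
            for embedding in value.values():
                names.extend(embedding.values())
    return names


referenced = referenced_columns(schema_roles)
missing = [name for name in referenced if name not in columns]
unused = [name for name in columns if name not in referenced]
print("missing:", missing)
print("unused:", unused)
```

Checking for missing columns before building the real schema catches typos in column names early, which is a common source of confusing errors.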

Create Phoenix datasets that wrap your dataframes with schemas that describe them.

Launch Phoenix. Follow the instructions in the cell output to open the Phoenix UI in your notebook or in a separate browser tab.

Find the Root Cause of Your Model Performance Issue

Click on "text_embedding" in the "Embeddings" section.

In the Euclidean distance graph at the top of the page, click a point where the Euclidean distance is high. A high value marks a period in which the production embeddings have drifted away from the training embeddings.
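One way to build intuition for this graph is to compute such a distance by hand. The sketch below assumes the plotted value is the Euclidean distance between the centroid of the production embeddings in a time window and the centroid of the training embeddings; that is an interpretation, not something this tutorial states. The two-dimensional vectors are made up so the arithmetic is easy to follow.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up 2-D "embeddings"; real text embeddings have hundreds of dimensions.
training = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
early_production = [[1.0, 0.0], [1.0, 2.0]]  # centered where training is
late_production = [[4.0, 5.0], [4.0, 5.0]]   # shifted away from training

training_centroid = centroid(training)
early_drift = euclidean_distance(centroid(early_production), training_centroid)
late_drift = euclidean_distance(centroid(late_production), training_centroid)

print("early window:", early_drift)
print("late window:", late_drift)
```

Under this assumption, a window whose points sit in a new region of embedding space, as reviews in an unseen language would, pulls the production centroid away from the training centroid and raises the plotted distance.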

Click on the top cluster in the panel on the left.

Use the panel at the bottom to examine the data points in this cluster.

What do you notice about the text in this cluster? Select other clusters and compare the text. Do you notice a difference?

It turns out that your model is seeing Spanish product reviews in production, but the training data is all in English. Congrats! You've identified the root cause of the issue. As an actionable next step, you should extend your model to support other languages, e.g., by fine-tuning on Spanish product reviews.