Evaluate Toxicity Classifications
Toxicity Classification Evals
Arize provides tooling to evaluate LLM applications, including tools to determine whether a model's generation (or a user's response) is toxic. This detection can flag responses that are racist, biased, or derogatory, or that contain bad language or anger.
The purpose of this notebook is:
- to evaluate the performance of LLM-assisted toxicity detection
- to provide an experimental framework for users to iterate and improve on the default classification template.
Note: This notebook was last updated on May 30, 2025.
Install Dependencies and Import Libraries
ℹ️ To enable async request submission in notebook environments like Jupyter or Google Colab, optionally use nest_asyncio. nest_asyncio globally patches asyncio to enable event loops to be re-entrant. This is not required for non-notebook environments.
Without nest_asyncio, eval submission can be much slower, depending on your organization's rate limits. With it, speedups of about 5x are typical.
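The benefit of concurrent submission can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation: `fake_eval_request` is a hypothetical stand-in for a single LLM call, and the concurrency cap of 5 is an arbitrary assumption. Run as a plain script it works as written; inside Jupyter or Colab, where an event loop is already running, `asyncio.run` would raise a `RuntimeError` unless `nest_asyncio.apply()` has been called first.

```python
import asyncio


async def fake_eval_request(text: str) -> str:
    """Hypothetical stand-in for one LLM call; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return "toxic" if "idiot" in text.lower() else "non-toxic"


async def run_evals(texts, max_concurrency=5):
    """Submit all requests concurrently, capped by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text):
        async with semaphore:
            return await fake_eval_request(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(t) for t in texts))


def classify_all(texts, max_concurrency=5):
    # In Jupyter or Colab, call nest_asyncio.apply() once before this,
    # because asyncio.run cannot normally start inside a running loop.
    return asyncio.run(run_evals(texts, max_concurrency))
```

The semaphore plays the role of a rate limit: requests overlap while they wait on the network, but no more than `max_concurrency` are in flight at once.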
Download Benchmark Dataset
We'll evaluate the evaluation system, which consists of an LLM, its settings, and an evaluation prompt template, against a benchmark dataset of toxic and non-toxic text with ground-truth labels. Currently supported datasets include:
- "wiki_toxic"
Display Toxicity Classification Template
View the default template used to classify toxicity. You can tweak this template and evaluate its performance relative to the default.
The template variables are:
- input: the text to be classified
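To make the role of the `input` variable concrete, here is a minimal sketch of how a template with a single `{input}` placeholder might be filled in. `EXAMPLE_TEMPLATE` is a hypothetical template written for illustration, not the library's default wording, and `render_prompt` is an assumed helper name.

```python
# Hypothetical template for illustration; not the library's default wording.
EXAMPLE_TEMPLATE = (
    "Decide whether the text below is toxic.\n"
    "[BEGIN TEXT]\n{input}\n[END TEXT]\n"
    'Answer with exactly one word: "toxic" or "non-toxic".'
)


def render_prompt(template: str, text: str) -> str:
    """Substitute the text to classify into the template's {input} slot."""
    return template.format(input=text)


prompt = render_prompt(EXAMPLE_TEMPLATE, "Thanks for fixing the citation.")
```

When tweaking a template, keeping the placeholder name `input` unchanged means the same benchmark column can be substituted into both the default and the modified version.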
Configure the LLM
Configure your OpenAI API key.
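One common pattern, sketched here under the assumption that the key is supplied through the `OPENAI_API_KEY` environment variable, is to read the variable and prompt for the key only when it is missing. `ensure_openai_api_key` is a hypothetical helper name.

```python
import os
from getpass import getpass


def ensure_openai_api_key() -> str:
    """Return the key from OPENAI_API_KEY, prompting only if it is unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        # getpass hides the typed key so it does not appear in notebook output
        key = getpass("Enter your OpenAI API key: ")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

Reading the key from the environment keeps it out of the notebook file itself, which matters if the notebook is shared or committed.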
Benchmark Dataset Sample
Sample size determines run time. We recommend iterating on a small sample (about 100 rows) first, then increasing to a larger test set.
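A reproducible subset can be drawn as in the sketch below. The rows here are hypothetical placeholders standing in for the benchmark data, and the seed value is arbitrary; fixing it means repeated runs score the same examples, so changes in metrics reflect template changes rather than sampling noise.

```python
import random


def sample_benchmark(rows, n_samples=100, seed=42):
    """Return a reproducible random subset of at most n_samples rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n_samples, len(rows)))


# Hypothetical rows standing in for the benchmark; real rows come from the download step.
rows = [{"text": f"comment {i}", "toxic": i % 10 == 0} for i in range(1000)]
subset = sample_benchmark(rows, n_samples=100)
```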
Instantiate the LLM and set parameters.
LLM Evals: Toxicity Evals Classifications GPT-4
Instantiate the LLM and set parameters. Run toxicity classifications against a subset of the data.
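Raw LLM completions rarely arrive as clean labels, so a classification run usually normalizes each one to a fixed label set. The following is a minimal sketch of that step, not the library's own logic: the labels `"toxic"` and `"non-toxic"` and the `"NOT_PARSABLE"` fallback are assumptions, and `snap_to_rails` is a hypothetical helper name.

```python
RAILS = ("toxic", "non-toxic")  # assumed output labels


def snap_to_rails(raw_output: str, rails=RAILS, fallback="NOT_PARSABLE") -> str:
    """Map a raw LLM completion onto one allowed label, or the fallback."""
    # Drop surrounding whitespace, quotes, and trailing periods, then lowercase.
    cleaned = raw_output.strip().strip("\"'.").lower()
    # Exact match only: substring matching is unsafe here because
    # "non-toxic" contains "toxic".
    return cleaned if cleaned in rails else fallback
```

Keeping unparsable outputs as a separate value, rather than forcing them into one of the two classes, lets them be counted and inspected when comparing models or templates.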
Evaluate the predictions against human-labeled ground-truth toxicity labels.
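The comparison against ground truth reduces to counting agreements and disagreements per class. Below is a hand-rolled sketch of the usual binary metrics, treating `"toxic"` as the positive class; the labels in the example are made up for illustration, and in practice a library such as scikit-learn would typically compute these.

```python
def binary_metrics(y_true, y_pred, positive="toxic"):
    """Compute confusion counts, precision, recall, F1, and accuracy."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Made-up labels for illustration only.
y_true = ["toxic", "toxic", "non-toxic", "non-toxic"]
y_pred = ["toxic", "non-toxic", "non-toxic", "toxic"]
metrics = binary_metrics(y_true, y_pred)
```

Precision and recall are worth reading separately for toxicity: a low recall means toxic text slips through, while a low precision means benign text is being flagged.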
LLM Evals: Toxicity Evals Classifications GPT-3.5
Instantiate the LLM and set parameters. Run toxicity classifications against a subset of the data.
LLM Evals: Toxicity Evals Classifications GPT-4 Turbo
Instantiate the LLM and set parameters. Run toxicity classifications against a subset of the data.