Arize AI
DSPy Tracing Tutorial


Tracing and Evaluating a DSPy Application

DSPy is a framework for automatically prompting and fine-tuning language models. It provides:

  • Composable and declarative APIs that allow developers to describe the architecture of their LLM application in the form of a "module" (inspired by PyTorch's nn.Module),
  • Compilers known as "teleprompters" that optimize a user-defined module for a particular task. The term "teleprompter" is meant to evoke "prompting at a distance," and could involve selecting few-shot examples, generating prompts, or fine-tuning language models.

Phoenix makes your DSPy applications observable by visualizing the underlying structure of each call to your compiled DSPy module and surfacing problematic spans of execution based on latency, token count, or other evaluation metrics.

In this tutorial, you will:

  • Build and compile a DSPy module that uses retrieval-augmented generation to answer questions over the HotpotQA dataset,
  • Instrument your application using OpenInference, an open standard for recording LLM telemetry data,
  • Inspect the traces and spans of your application to understand the inner workings of a DSPy forward pass.

ℹ️ This notebook requires an OpenAI API key.

1. Install Dependencies and Import Libraries

Install Phoenix, DSPy, and other dependencies.
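The installation cell might look like the following sketch; the package names assume the current PyPI distributions of DSPy, Phoenix, and the OpenInference instrumentors:

```shell
# From inside a notebook, prefix this command with "!".
# The LiteLLM instrumentor is used later to capture token counts on LLM spans.
pip install -q dspy arize-phoenix openinference-instrumentation-dspy openinference-instrumentation-litellm
```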


Import libraries.


2. Configure Your OpenAI API Key

Set your OpenAI API key if it is not already set as an environment variable.
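A common pattern for this cell (a sketch; adjust to your environment) is to prompt for the key only when the environment variable is missing:

```python
import os
from getpass import getpass

# Prompt for the key only if it is not already set in the environment.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
```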


3. Configure Module Components

A module consists of components such as a language model (in this case, OpenAI's GPT-4), akin to the layers of a PyTorch module, and a retriever (in this case, ColBERTv2).


4. Load Data

Load a subset of the HotpotQA dataset.
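This cell might mirror the DSPy intro tutorial (the split sizes below are illustrative), marking `question` as the input field:

```python
from dspy.datasets import HotPotQA

# Download a small train/dev split of HotpotQA (sizes are illustrative).
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy which field is the input; the remaining fields are labels/metadata.
trainset = [example.with_inputs("question") for example in dataset.train]
devset = [example.with_inputs("question") for example in dataset.dev]
```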


Each example in our training set has a question and a human-annotated answer.


Examples in the dev set have a third field containing titles of relevant Wikipedia articles.


5. Define Your RAG Module

Define a signature that takes in two inputs, context and question, and outputs an answer. The signature provides:

  • A description of the sub-task the language model is supposed to solve.
  • A description of the input fields to the language model.
  • A description of the output fields the language model must produce.

Define your module by subclassing dspy.Module and overriding the forward method.


This module uses retrieval-augmented generation (using the previously configured ColBERTv2 retriever) in tandem with chain of thought in order to generate the final answer to the user.

6. Compile Your RAG Module

In this case, we'll use the default BootstrapFewShot teleprompter, which selects good demonstrations from the training dataset for inclusion in the final prompt.
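The compilation step might look like the sketch below; the exact-match metric comes from `dspy.evaluate`, and `RAG` and `trainset` are the module and data from the earlier cells. Note that compilation makes LLM calls and therefore requires your API key:

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Accept a demonstration only if the predicted answer matches the gold answer.
def validate_answer(example, pred, trace=None):
    return dspy.evaluate.answer_exact_match(example, pred)

teleprompter = BootstrapFewShot(metric=validate_answer)
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
```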


7. Instrument DSPy and Launch Phoenix

Now that we've compiled our RAG program, let's see what's going on under the hood.

Launch Phoenix, which will run in the background and collect spans and traces from your instrumented DSPy application.


Then instrument your application with OpenInference, an open standard built atop OpenTelemetry that captures and stores LLM application executions. OpenInference provides telemetry data to help you understand the invocation of your LLMs and the surrounding application context, including retrieval from vector stores, the use of external tools or APIs, and more.

DSPy uses LiteLLM under the hood to invoke LLMs. We add the LiteLLMInstrumentor here so we can get token counts for LLM spans.
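The instrumentation cell might look like this, using `phoenix.otel.register` to point an OpenTelemetry tracer provider at the local Phoenix collector:

```python
from openinference.instrumentation.dspy import DSPyInstrumentor
from openinference.instrumentation.litellm import LiteLLMInstrumentor
from phoenix.otel import register

# Route spans to the local Phoenix collector started by px.launch_app().
tracer_provider = register()

# Instrument DSPy itself, plus LiteLLM so LLM spans include token counts.
DSPyInstrumentor().instrument(tracer_provider=tracer_provider)
LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)
```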


8. Run Your Application

Let's run our DSPy application on the dev set.
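With `compiled_rag` and `devset` from the earlier cells, the loop is straightforward; each call to the module produces one trace in Phoenix:

```python
# Each forward pass is captured as a trace by the instrumentation above.
for example in devset:
    prediction = compiled_rag(question=example.question)
    print("Question:", example.question)
    print("Predicted answer:", prediction.answer)
```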


Check the Phoenix UI to inspect the architecture of your DSPy module.


A few things to note:

  • The spans in each trace correspond to the steps in the forward method of our custom subclass of dspy.Module,
  • The call to ColBERTv2 appears as a retriever span with retrieved documents and scores displayed for each forward pass,
  • The LLM span includes the fully-formatted prompt containing few-shot examples computed by DSPy during compilation.

[Figure: a tour of your traces and spans in DSPy, highlighting retriever and LLM spans in particular]

Congrats! You've used DSPy to bootstrap a multishot prompt with hard negative passages and chain of thought, and you've used Phoenix to observe the inner workings of DSPy and understand the internals of the forward pass.