W&B Dataset Visualization

W&B Datasets & Predictions is currently in the early-access phase. You can use it in our production service at wandb.ai, with some limitations. APIs are subject to change. We'd love to hear questions, comments, and ideas! Drop us a line at feedback@wandb.com.

WandB Dataset Visualization Demo

This notebook demonstrates WandB's dataset visualization features. In particular, we will show how WandB Artifacts can be used to visualize datasets and predictions, with a focus on image data. We will also track model and data lineage and perform interactive model analysis on the resulting datasets. The overall flow will be:

  1. Create a dataset
  2. Split the dataset into train and test
  3. Train a model to make predictions on the transformed dataset
  4. Log predictions from the model against the training and evaluation sets
  5. Analyze the model in WandB's UI

Step 0: Setup

Install requirements & utils

For brevity, we put utility functions for working with the dataset in util.py.

Login to wandb
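
The login cells are not reproduced on this page. As a minimal sketch, assuming the `wandb` package is installed: `wandb.login()` uses the `WANDB_API_KEY` environment variable when it is set and otherwise prompts for a key. The `wandb_login` helper and its `require_key` flag are hypothetical additions for non-interactive runs, where a prompt would block.

```python
import os


def wandb_login(require_key: bool = False) -> bool:
    """Log in to W&B, optionally refusing to prompt when no API key is configured.

    `require_key` is a hypothetical flag for non-interactive jobs (e.g. CI),
    where waiting on an interactive key prompt would hang the job.
    """
    if require_key and not os.environ.get("WANDB_API_KEY"):
        # No key in the environment: skip login instead of blocking on a prompt.
        return False
    import wandb  # imported lazily so this helper can be defined without wandb installed

    # wandb.login() reads WANDB_API_KEY if it is set, otherwise it prompts for a key.
    return wandb.login()
```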

Download the data

Before we get started, we will download an example dataset to our local machine. This is a big dataset, so please be patient if you are on a slow connection. After the download is complete, we will show an example of the data.

Note: if you see the error "AttributeError: module 'PIL.TiffTags' has no attribute 'IFD'", this is likely a Colab issue which can be solved by restarting your runtime (header menu > Runtime > Restart runtime).
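
The download helper itself lives in util.py and is not shown here. The sketch below is a hypothetical stand-in illustrating a common pattern for large downloads: reuse a cached copy when one exists, and write to a temporary file first so an interrupted transfer is never mistaken for a complete one. The function name and arguments are assumptions, and the notebook's actual dataset URL is not given on this page.

```python
import os
import urllib.request


def download_if_missing(url: str, dest: str) -> str:
    """Fetch `url` to `dest`, reusing an existing non-empty file instead of re-downloading.

    Hypothetical stand-in for the download helper in util.py.
    """
    if os.path.exists(dest) and os.path.getsize(dest) > 0:
        return dest  # cached copy: skip the slow download
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    tmp_path = dest + ".part"
    urllib.request.urlretrieve(url, tmp_path)  # download under a temporary name...
    os.replace(tmp_path, dest)  # ...then rename, so a partial file never looks complete
    return dest
```

Calling it a second time with the same `dest` returns immediately, which is convenient when re-running notebook cells.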

Step 1: Build the dataset

First, let's build a dataset for use in the rest of this project. We will do this in the context of a wandb.Run. A Run is an isolated process which can optionally depend on upstream artifacts as well as optionally produce artifacts for later consumption. In this step, we will create a wandb.Table during our run and output it in an artifact. This table will contain all of our raw data for later use. Moreover, W&B offers rich tools to analyze and visualize such Tables in the interactive UI.
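
A minimal sketch of this step, assuming a dataset of image files with one label each (the notebook's real dataset carries richer annotations such as boxes and masks). It uses the `wandb.Table`, `wandb.Image`, and `wandb.Artifact` APIs; the column names, project name, and the `example_id` and `log_raw_data` helpers are hypothetical, while the artifact name `raw_data` and entry name `raw_examples` mirror what the dashboard shows. A stable per-example id is worth having because later steps need a key to join tables on.

```python
import os


def example_id(path: str) -> str:
    """Stable per-example key: the file name without its extension (an assumption)."""
    return os.path.splitext(os.path.basename(path))[0]


def log_raw_data(image_paths, labels, project="dataset-viz-demo"):
    """Log images and labels as a wandb.Table inside a "raw_data" dataset artifact.

    Column names and the project name are illustrative assumptions.
    """
    import wandb  # lazy import: example_id() above works without wandb installed

    with wandb.init(project=project, job_type="upload") as run:
        table = wandb.Table(columns=["id", "image", "label"])
        for path, label in zip(image_paths, labels):
            table.add_data(example_id(path), wandb.Image(path), label)
        artifact = wandb.Artifact("raw_data", type="dataset")
        artifact.add(table, "raw_examples")  # stored in the artifact as raw_examples.table.json
        run.log_artifact(artifact)
```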

Review the dataset in the Dashboard

Great, now if you click on the URL above, you should land on a run page. Since we did not log any metrics, there are no charts. Click the database icon (it looks like a stack of hockey pucks) on the left panel to see this run's artifacts. You should see something similar to the following:

Click on the "raw_data" row and navigate to the "Files" tab. It should look like this:

Clicking on the "raw_examples.table.json" entry will launch an interactive data explorer to review the table we just built:

Step 2: Splitting the data into train and test

Next, we will split the data into a train and a test dataset. As before, we will launch a Run to perform this operation. Remember, this new execution could happen on a different machine, since the needed resources are loaded dynamically. In particular, we will load the raw dataset from the last run and output two new datasets.
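
One way to implement this split, sketched under assumptions: the raw table has an `id` column, and the split artifacts are named `train_data` and `test_data` (both hypothetical, as are the helper names). Hashing each id instead of shuffling makes the assignment deterministic, so rerunning the step on another machine yields the same split. `run.use_artifact(...)` records the dependency for the lineage graph, and `.get(...)` reconstructs the stored Table locally.

```python
import hashlib


def assign_split(example_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign an example to "train" or "test" by hashing its id."""
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact and the result lies in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "test" if bucket < test_fraction else "train"


def split_raw_data(project="dataset-viz-demo", test_fraction=0.2):
    """Split the raw table into train/test Tables and log each as its own artifact.

    Assumes an "id" column; the project and artifact names are hypothetical.
    """
    import wandb  # lazy import: assign_split() works without wandb installed

    with wandb.init(project=project, job_type="split") as run:
        # use_artifact() links this run to its input; get() rebuilds the stored Table.
        raw = run.use_artifact("raw_data:latest").get("raw_examples")
        id_col = raw.columns.index("id")
        splits = {name: wandb.Table(columns=raw.columns) for name in ("train", "test")}
        for row in raw.data:
            splits[assign_split(str(row[id_col]), test_fraction)].add_data(*row)
        for name, table in splits.items():
            artifact = wandb.Artifact(f"{name}_data", type="dataset")
            artifact.add(table, f"{name}_examples")
            run.log_artifact(artifact)
```

Because the assignment depends only on the id, an example keeps its split even if new examples are added to the raw table later.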

Review the splits in the Dashboard

Notice that in this step the raw_data wandb.Table was reinstantiated, and the data, images, etc. came along for the ride. This makes it easy for ML practitioners on a team to share data and assets. To manage this, you can see that an artifacts directory was created to save local data.

Now we have two new datasets. Feel free to browse them similar to our last step. However, this time, click "Graph View" rather than "Files" to see the lineage of the artifact:

We will come back to this graph view later on!

Step 3: Model Training

Now we will train a model to predict bounding boxes. For the sake of simplicity, we will "train" a model which splits the image into its grayscale quantiles and assigns labels to each patch. As you can imagine, this model's performance leaves a lot of room for improvement.
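
The toy model can be sketched in plain Python. This is one hypothetical reading of "split the image into its grayscale quantiles and assign labels to each patch": fitting computes intensity cut points at the quantiles of all training pixels, then gives each quantile bin the ground-truth label that occurs most often in it. Images and masks are assumed to be 2-D lists; the `QuantileModel` and `log_model` names and the artifact name `trained_model` are assumptions, while the `model` artifact type and the `model.pkl` file name match the dashboard walkthrough in Step 5.

```python
import pickle
from bisect import bisect_right
from collections import Counter


class QuantileModel:
    """Toy segmenter: bucket grayscale pixels by quantile, label each bucket by majority vote."""

    def __init__(self, n_bins: int = 4):
        self.n_bins = n_bins
        self.thresholds = []  # n_bins - 1 intensity cut points
        self.bin_labels = []  # majority ground-truth label for each bin

    def _bin(self, pixel):
        # bisect_right maps a pixel intensity to a bin index in [0, n_bins - 1].
        return bisect_right(self.thresholds, pixel)

    def fit(self, images, masks):
        pixels = sorted(p for img in images for row in img for p in row)
        self.thresholds = [pixels[len(pixels) * k // self.n_bins] for k in range(1, self.n_bins)]
        votes = [Counter() for _ in range(self.n_bins)]
        for img, mask in zip(images, masks):
            for img_row, mask_row in zip(img, mask):
                for pixel, label in zip(img_row, mask_row):
                    votes[self._bin(pixel)][label] += 1
        # A bin that received no training pixels falls back to label 0 (an assumption).
        self.bin_labels = [v.most_common(1)[0][0] if v else 0 for v in votes]
        return self

    def predict(self, image):
        return [[self.bin_labels[self._bin(p)] for p in row] for row in image]


def log_model(model, project="dataset-viz-demo"):
    """Pickle the model and log it as a "model" artifact (artifact name is hypothetical)."""
    import wandb  # lazy import: the model above works without wandb installed

    with wandb.init(project=project, job_type="train") as run:
        run.use_artifact("train_data:latest")  # record the training data in the lineage graph
        with open("model.pkl", "wb") as f:
            pickle.dump(model, f)
        artifact = wandb.Artifact("trained_model", type="model")
        artifact.add_file("model.pkl")
        run.log_artifact(artifact)
```

For example, `QuantileModel(n_bins=2).fit([[[0, 10], [200, 250]]], [[[0, 0], [1, 1]]]).predict([[5, 255]])` returns `[[0, 1]]`: the single cut point is 200, the dark bin learns label 0, and the bright bin learns label 1.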

Step 4: Model Evaluation

Now that we have a trained model, we want to score it on the test data, which was held out in Step 2. This code is very similar to the training step, with the exception of slightly different naming. The important difference is that we load the saved model from the artifact.
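
A sketch of the evaluation side, under stated assumptions: `load_model` fetches a pickled model from a model artifact inside an active run (the artifact name and helper names are hypothetical), using `Artifact.download()`, which returns the local directory holding the artifact's files. `mean_iou` is one common way to score predicted label masks against ground truth; the notebook may use a different metric. Unpickling requires the model's class to be importable in the evaluating process, and pickles should only be loaded from trusted sources.

```python
import os
import pickle
from collections import Counter


def load_model(run, artifact_name="trained_model:latest"):
    """Download a model artifact inside an active W&B run and unpickle its model.pkl."""
    model_dir = run.use_artifact(artifact_name).download()  # local directory path
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


def mean_iou(pred, truth):
    """Mean intersection-over-union across labels present in either 2-D mask."""
    inter, union = Counter(), Counter()
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                inter[p] += 1  # pixel lies in both the predicted and true region of p
                union[p] += 1
            else:
                union[p] += 1  # counted in the union for the predicted label...
                union[t] += 1  # ...and for the true label
    if not union:
        return 1.0  # two empty masks agree trivially (a convention)
    return sum(inter[k] / union[k] for k in union) / len(union)
```

For instance, `mean_iou([[0, 0], [1, 1]], [[0, 1], [1, 1]])` gives (1/2 + 2/3) / 2 = 7/12 ≈ 0.583.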

Step 5: Model Analysis

This is where it all comes together. In this step, we join the train and test scoring results with the original dataset and output corresponding artifacts. The new idea introduced here is a wandb.JoinedTable which allows you to join two Tables for further analysis in the UI.
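
Conceptually, a join matches rows from two tables that share a key. The pure-Python `join_on_key` below illustrates inner-join semantics on plain row lists; on the W&B side, `wandb.JoinedTable(table1, table2, join_key)` describes the join and is stored in an artifact for the UI to render. The `test_predictions` artifact and entry names, the `id` key, the `analysis` artifact type, and the helper names are hypothetical; `summary_results` matches the artifact name used in the dashboard walkthrough. How the UI handles unmatched rows is not stated here, so treat the helper as an illustration of the idea only.

```python
def join_on_key(left_cols, left_rows, right_cols, right_rows, key="id"):
    """Inner-join two tables given as (columns, rows) on a shared key column."""
    left_idx = left_cols.index(key)
    right_idx = right_cols.index(key)
    # Index the right-hand rows by key so each lookup is a dict access.
    by_key = {}
    for row in right_rows:
        by_key.setdefault(row[right_idx], []).append(row)
    # Keep the key column once by dropping it from the right-hand side.
    extra = [i for i, c in enumerate(right_cols) if c != key]
    columns = list(left_cols) + [right_cols[i] for i in extra]
    rows = [
        list(row) + [match[i] for i in extra]
        for row in left_rows
        for match in by_key.get(row[left_idx], [])
    ]
    return columns, rows


def log_joined_results(project="dataset-viz-demo"):
    """Join raw examples with test predictions and log the result (names are hypothetical)."""
    import wandb  # lazy import: join_on_key() works without wandb installed

    with wandb.init(project=project, job_type="analysis") as run:
        raw = run.use_artifact("raw_data:latest").get("raw_examples")
        preds = run.use_artifact("test_predictions:latest").get("test_predictions")
        joined = wandb.JoinedTable(raw, preds, "id")
        artifact = wandb.Artifact("summary_results", type="analysis")
        artifact.add(joined, "test_results")
        run.log_artifact(artifact)
```

For example, joining `(["id", "image"], [[1, "a.png"], [2, "b.png"]])` with `(["id", "score"], [[2, 0.9]])` yields the columns `["id", "image", "score"]` and the single row `[2, "b.png", 0.9]`.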

Review the model analysis in the Dashboard

Now, click the Project page link above (the second link). It will look like the following:

Click on the database icon, as previously, to see the artifacts. This time, you are seeing the artifacts for the entire project, with counts of their versions:

Go ahead and click the "model" artifact type, "Files", and "model.pkl". The viewer will provide different renderings based on the file type. For a pickled class, you get the following image. For deep networks saved as .h5 files, you can see all the layers and their attributes.

Next, head back to the artifact page, click Database type, expand summary_results, and select your most recent version. Click "Files" and select one of the join tables:

Exploring a bit, you can toggle the bounding boxes and masks, and group, filter, and sort the data:

Finally, click "Graph View" and then "Explode". Now you can visualize the entire process end-to-end: