
Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

Automated Machine Learning

Continuous retraining using Pipelines and Time-Series TabularDataset

Contents

  1. Introduction
  2. Setup
  3. Compute
  4. Run Configuration
  5. Data Ingestion Pipeline
  6. Training Pipeline
  7. Publish Retraining Pipeline and Schedule
  8. Test Retraining

Introduction

In this example, we use AutoML and Pipelines to enable continuous retraining of a model as its training dataset is updated. We will create two pipelines: the first demonstrates a training dataset that gets updated over time, leveraging the time-series capabilities of TabularDataset; the second uses a pipeline Schedule to trigger continuous retraining. Make sure you have executed the configuration notebook before running this notebook. In this notebook you will learn how to:

  • Create an Experiment in an existing Workspace.
  • Configure AutoML using AutoMLConfig.
  • Create a data ingestion pipeline to update a time-series based TabularDataset.
  • Create a training pipeline to prepare data, run AutoML, register the model, and set up pipeline triggers.

Setup

As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments.

[ ]

This sample notebook may use features that are not available in previous versions of the Azure ML SDK.

Accessing the Azure ML workspace requires authentication with Azure.

The default authentication is interactive authentication using the default tenant. Executing the ws = Workspace.from_config() line in the cell below will prompt for authentication the first time that it is run.

If you have multiple Azure tenants, you can specify the tenant by replacing the ws = Workspace.from_config() line in the cell below with the following:

	from azureml.core.authentication import InteractiveLoginAuthentication
	auth = InteractiveLoginAuthentication(tenant_id='mytenantid')
	ws = Workspace.from_config(auth=auth)

If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the ws = Workspace.from_config() line in the cell below with the following:

	from azureml.core.authentication import ServicePrincipalAuthentication
	auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')
	ws = Workspace.from_config(auth=auth)

For more details, see aka.ms/aml-notebook-auth
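Putting the setup together, the workspace and experiment creation might look like the following sketch (the experiment name is an assumption for illustration):

```python
from azureml.core import Workspace, Experiment

# Authenticate and load the workspace from the local config.json
# created by the configuration notebook
ws = Workspace.from_config()

# A named Experiment in the Workspace groups the runs produced
# by the retraining pipeline
experiment = Experiment(workspace=ws, name="retrain-noaa-weather")

print(ws.name, ws.resource_group, ws.location, sep="\n")
```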

[ ]

Compute

Create or Attach existing AmlCompute

You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.

Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.

Creation of AmlCompute takes approximately 5 minutes.

If the AmlCompute with that name is already in your workspace this code will skip the creation process. As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota.
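The create-or-attach pattern described above can be sketched as follows; the cluster name and VM size are assumptions, so substitute values that fit your workspace and quota:

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cluster_name = "cpu-cluster"  # assumed name; reuse an existing cluster if you have one

try:
    # If a compute target with this name already exists, attach to it
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing cluster, using it.")
except ComputeTargetException:
    # Otherwise provision a new AmlCompute cluster (takes ~5 minutes)
    config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS12_V2",  # assumed size; pick one within your quota
        max_nodes=4,
    )
    compute_target = ComputeTarget.create(ws, cluster_name, config)
    compute_target.wait_for_completion(show_output=True)
```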

[ ]

Run Configuration

[ ]
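The run configuration pins the environment that the pipeline step scripts execute in. A minimal sketch, assuming the ingestion and training scripts need the AutoML SDK, Azure Open Datasets client, and pandas (the package list is an assumption):

```python
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

# Run the steps in a Docker-based Python environment
conda_run_config = RunConfiguration(framework="python")
conda_run_config.environment.docker.enabled = True

# Packages the step scripts are assumed to need
cd = CondaDependencies.create(
    pip_packages=["azureml-sdk[automl]", "azureml-opendatasets", "pandas"],
)
conda_run_config.environment.python.conda_dependencies = cd
```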

Data Ingestion Pipeline

For this demo, we will use NOAA weather data from Azure Open Datasets. You can replace this with your own dataset, or you can skip this pipeline if you already have a time-series based TabularDataset.

[ ]

Upload Data Step

The data ingestion pipeline has a single step with a script to query the latest weather data and upload it to the blob store. During the first run, the script will create and register a time-series based TabularDataset with the past one week of weather data. For each subsequent run, the script will create a partition in the blob store by querying NOAA for new weather data since the last modified time of the dataset (dataset.data_changed_time) and creating a data.csv file.
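The single-step ingestion pipeline described above might be assembled like this; the script and folder names, dataset name, and the `compute_target`/`conda_run_config` variables from the earlier cells are assumptions:

```python
from azureml.pipeline.core import Pipeline, PipelineParameter
from azureml.pipeline.steps import PythonScriptStep

# Name under which the time-series TabularDataset is registered (assumed)
dataset_name = PipelineParameter(name="ds_name", default_value="noaa-weather-ds")

upload_step = PythonScriptStep(
    name="upload_weather_data",
    script_name="upload_weather_data.py",  # assumed script name
    source_directory="scripts",            # assumed folder
    compute_target=compute_target,
    runconfig=conda_run_config,
    arguments=["--ds_name", dataset_name],
    allow_reuse=False,  # always query NOAA for fresh data
)

data_pipeline = Pipeline(workspace=ws, steps=[upload_step])
```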

[ ]

Submit Pipeline Run

[ ]
[ ]

Training Pipeline

Prepare Training Data Step

This step runs a script that checks whether new data has become available since the model was last trained. If no new data is available, the script cancels the remaining pipeline steps. The allow_reuse flag must be set to False so that the step runs even when its inputs don't change. The step also needs the name of the model in order to look up when the model was last trained.
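The data-preparation step might be defined as below; the script, folder, dataset, and model names are assumptions, as are the `compute_target` and `conda_run_config` variables from earlier cells:

```python
from azureml.pipeline.steps import PythonScriptStep

model_name = "noaa-weather-model"  # assumed model name

data_prep_step = PythonScriptStep(
    name="check_data",
    script_name="check_data.py",  # assumed script that cancels the run if no new data
    source_directory="scripts",
    compute_target=compute_target,
    runconfig=conda_run_config,
    arguments=["--ds_name", "noaa-weather-ds", "--model_name", model_name],
    allow_reuse=False,  # must re-run even when inputs are unchanged
)
```

Inside the script, one way to implement the check is to fetch the step's context with Run.get_context(), compare the dataset's data_changed_time against the registered model's creation time, and call run.parent.cancel() when there is nothing new to train on.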

[ ]
[ ]
[ ]

AutoMLStep

Create an AutoMLConfig and a training step.
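A sketch of the AutoML configuration and training step; the task, metric, label column, and dataset name are assumptions for illustration, and `compute_target` comes from the earlier compute cell:

```python
from azureml.core import Dataset
from azureml.train.automl import AutoMLConfig
from azureml.pipeline.steps import AutoMLStep

# The time-series TabularDataset registered by the ingestion pipeline (assumed name)
train_ds = Dataset.get_by_name(ws, "noaa-weather-ds")

automl_config = AutoMLConfig(
    task="regression",                # assumed task
    primary_metric="r2_score",        # assumed metric
    experiment_timeout_hours=0.25,
    training_data=train_ds,
    label_column_name="temperature",  # assumed label column
    compute_target=compute_target,
    n_cross_validations=3,
)

automl_step = AutoMLStep(
    name="automl_training",
    automl_config=automl_config,
    allow_reuse=False,  # retrain on every triggered run
)
```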

[ ]
[ ]
[ ]

Register Model Step

This step runs a script that registers the trained model to the workspace.
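The body of the registration script might look like the following sketch; the argument names and file name are assumptions:

```python
# register_model.py -- sketch of the registration step script
import argparse

from azureml.core import Run
from azureml.core.model import Model

parser = argparse.ArgumentParser()
parser.add_argument("--model_name", type=str)
parser.add_argument("--model_path", type=str)
args = parser.parse_args()

# Reach the workspace through the step's run context
run = Run.get_context()
ws = run.experiment.workspace

# Registering under an existing name creates a new model version
model = Model.register(
    workspace=ws,
    model_name=args.model_name,
    model_path=args.model_path,
)
print(f"Registered {model.name}, version {model.version}")
```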

[ ]

Submit Pipeline Run

[ ]
[ ]
[ ]

Publish Retraining Pipeline and Schedule

Once we are happy with the pipeline, we can publish the training pipeline to the workspace and create a schedule to trigger on blob change. The schedule polls the blob store where the data is being uploaded and runs the retraining pipeline if there is a data change. A new version of the model will be registered to the workspace once the run is complete.
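The publish-and-schedule step described above might be sketched as follows; the pipeline, schedule, and experiment names, the datastore path, and the `training_pipeline` variable from the earlier cells are assumptions:

```python
from azureml.pipeline.core import Schedule

published_pipeline = training_pipeline.publish(
    name="Retraining-Pipeline",  # assumed name
    description="Retrains the model when new weather data lands in the blob store",
)

# Poll the datastore path where the ingestion pipeline writes new partitions;
# a data change there triggers the published retraining pipeline
schedule = Schedule.create(
    ws,
    name="RetrainingSchedule",
    pipeline_id=published_pipeline.id,
    experiment_name="retrain-noaa-weather",  # assumed experiment name
    datastore=ws.get_default_datastore(),
    path_on_datastore="noaa-weather",        # assumed upload path
    polling_interval=5,                      # minutes between polls
)
```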

[ ]
[ ]

Test Retraining

Here we set up the data ingestion pipeline to run on a schedule, so we can verify that the retraining pipeline runs as expected.

Note:

  • The NOAA weather data in Azure Open Datasets is updated daily, and retraining will not trigger if no new data is available.
  • Depending on the polling interval set in the schedule, retraining may take some time to trigger after the data ingestion pipeline completes.
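A recurrence-based schedule for the ingestion pipeline might look like this sketch; the names and the `data_pipeline` variable from the earlier cells are assumptions:

```python
from azureml.pipeline.core import Schedule, ScheduleRecurrence

published_data_pipeline = data_pipeline.publish(name="DataIngestion-Pipeline")

# Run the ingestion pipeline hourly so new partitions appear in the blob store,
# which in turn triggers the retraining schedule
recurrence = ScheduleRecurrence(frequency="Hour", interval=1)
ingestion_schedule = Schedule.create(
    ws,
    name="DataIngestionSchedule",
    pipeline_id=published_data_pipeline.id,
    experiment_name="retrain-noaa-weather",  # assumed experiment name
    recurrence=recurrence,
)
```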
[ ]
[ ]