AutoML Regression
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Introduction
In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The goal is to predict the performance of particular combinations of hardware parts.
If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, if you have not done so already, go through the configuration notebook first to establish your connection to the AzureML Workspace.
In this notebook you will learn how to:
- Create an Experiment in an existing Workspace.
- Configure AutoML using AutoMLConfig.
- Train the model on AmlCompute.
- Explore the results.
- Test the best fitted model.
Setup
As part of the setup you have already created an Azure ML Workspace object. For Automated ML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments.
This sample notebook may use features that are not available in previous versions of the Azure ML SDK.
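A minimal sketch of this setup step, assuming the v1 Azure ML Python SDK (`azureml-core`) is installed and a `config.json` from the configuration notebook is available. The experiment name is a hypothetical placeholder, and the imports sit inside the function so the snippet loads even where the SDK is missing.

```python
EXPERIMENT_NAME = "automl-regression-hardware"  # hypothetical name, for illustration only


def get_experiment(name=EXPERIMENT_NAME):
    """Return an Experiment in the Workspace described by the local config.json."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    import azureml.core
    from azureml.core import Experiment, Workspace

    # Printing the version helps diagnose features missing from older SDK releases.
    print("Azure ML SDK version:", azureml.core.VERSION)

    # Assumes config.json was written by the configuration notebook.
    ws = Workspace.from_config()
    return Experiment(ws, name)
```

Calling `get_experiment()` in a configured environment would return the Experiment object used in the rest of the notebook.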
Using AmlCompute
You will need to create a compute target for your AutoML run. In this tutorial, you use AmlCompute as your training compute resource.
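One possible way to obtain the compute target is to reuse a cluster if it already exists and provision it otherwise. The cluster name, VM size, and node count below are assumptions, not values taken from this notebook.

```python
def get_or_create_cluster(ws, cluster_name="reg-cluster",
                          vm_size="STANDARD_DS12_V2", max_nodes=4):
    """Return an AmlCompute target, creating it when it does not exist yet.

    The default name, VM size, and node count are illustrative assumptions.
    """
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Look up an existing cluster with this name in the workspace.
        target = ComputeTarget(workspace=ws, name=cluster_name)
    except ComputeTargetException:
        # Not found: describe the cluster and ask the workspace to provision it.
        config = AmlCompute.provisioning_configuration(
            vm_size=vm_size, max_nodes=max_nodes
        )
        target = ComputeTarget.create(ws, cluster_name, config)

    # Block until the cluster is ready to accept runs.
    target.wait_for_completion(show_output=True)
    return target
```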
Data
Load Data
Load the hardware dataset from a CSV file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we split the data using random_split and keep the training portion for the model.
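The loading and splitting steps might look like the sketch below. The CSV location is left as a parameter because the notebook text does not give it, and the 80/20 split and seed are assumptions.

```python
def load_dataset(csv_path):
    """Create a TabularDataset from a CSV file (local path or URL)."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from azureml.core import Dataset

    return Dataset.Tabular.from_delimited_files(path=csv_path)


def split_dataset(dataset, train_fraction=0.8, seed=223):
    """Randomly split a dataset into (train, test) parts.

    The split is approximate: random_split assigns records to the two
    parts at random according to the requested percentage.
    """
    train_data, test_data = dataset.random_split(
        percentage=train_fraction, seed=seed
    )
    return train_data, test_data
```

Fixing the seed keeps the split reproducible between runs.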
Train
Instantiate an AutoMLConfig object to specify the settings and data used to run the experiment.
| Property | Description |
|---|---|
| task | classification, regression or forecasting |
| primary_metric | The metric that you want to optimize. Regression supports the following primary metrics: spearman_correlation, normalized_root_mean_squared_error, r2_score, normalized_mean_absolute_error |
| n_cross_validations | Number of cross validation splits. |
| training_data | Input dataset, containing both the features and the label column. |
| label_column_name | The name of the label column. |
More information about primary metrics is available in the Azure Machine Learning documentation.
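The properties in the table can be combined roughly as follows. This is a sketch rather than the notebook's exact configuration: the number of cross-validation folds is an assumed value, and `spearman_correlation` is chosen because the results discussion refers to it.

```python
# Settings that do not depend on the data or the compute target.
automl_settings = {
    "task": "regression",
    "primary_metric": "spearman_correlation",
    "n_cross_validations": 3,  # assumed value, not taken from the notebook
}


def build_automl_config(training_data, label_column_name, compute_target=None):
    """Build an AutoMLConfig from the settings above.

    training_data is the dataset holding features and label;
    label_column_name is the name of the label column in that dataset.
    """
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(
        training_data=training_data,
        label_column_name=label_column_name,
        compute_target=compute_target,
        **automl_settings,
    )
```

The caller supplies the training dataset, the label column name, and optionally the compute target, so the sketch does not assume any column names for the hardware dataset.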
Call the submit method on the experiment object and pass the run configuration. Execution of remote runs is asynchronous, and depending on the data and the number of iterations a run can take a while. Setting show_output=True makes the call synchronous and prints validation errors and the current status as the run progresses.
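A small sketch of the submission step. The helper name `submit_automl` is hypothetical; it only wraps the `submit` call described above.

```python
def submit_automl(experiment, automl_config, show_output=False):
    """Submit an AutoML configuration and return the resulting run.

    With show_output=False the call returns right away for a remote run;
    with show_output=True it blocks and streams status to the console.
    """
    return experiment.submit(automl_config, show_output=show_output)
```

When submitting asynchronously, the returned run can be waited on later with `remote_run.wait_for_completion()`.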
Results
Widget for Monitoring Runs
The widget will first report a "loading" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.
Note: The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details.
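Displaying the widget could look like this, assuming the `azureml-widgets` package is installed and the code runs inside a Jupyter environment. The helper name is hypothetical.

```python
def show_run_widget(remote_run):
    """Render the run-monitoring widget for an AutoML run in a notebook."""
    # Imported lazily: azureml.widgets is a separate, optional package.
    from azureml.widgets import RunDetails

    RunDetails(remote_run).show()
```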
Retrieve the Best Model
Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration.
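As a sketch, retrieving the best run and model by the primary metric takes a single call. The helper name `get_best` is hypothetical.

```python
def get_best(remote_run):
    """Return (best_run, fitted_model) according to the primary metric."""
    best_run, fitted_model = remote_run.get_output()
    # Printing both gives a quick look at the run and the pipeline steps.
    print(best_run)
    print(fitted_model)
    return best_run, fitted_model
```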
Best Model Based on Any Other Metric
Show the run and the model that has the smallest root_mean_squared_error value (which turned out to be the same as the one with largest spearman_correlation value):
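A sketch of that lookup, passing the metric name to `get_output`. The helper name is hypothetical.

```python
def get_best_by_metric(remote_run, metric="root_mean_squared_error"):
    """Return (run, model) for the child run that is best on the given logged metric."""
    best_run, fitted_model = remote_run.get_output(metric=metric)
    return best_run, fitted_model
```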
Model from a Specific Iteration
Show the run and the model from the third iteration:
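A sketch of retrieving a specific iteration. The value 3 follows the wording above; whether the SDK counts iterations from 0 or from 1 is not stated here, so treat the exact index as an assumption.

```python
def get_by_iteration(remote_run, iteration=3):
    """Return (run, model) for one particular AutoML iteration."""
    run, model = remote_run.get_output(iteration=iteration)
    return run, model
```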