Tensorflow2 California Housing Sagemaker Pipelines Deploy Endpoint


SageMaker Pipelines California Housing - Taking different steps based on model performance


This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.



This notebook illustrates how to take different actions based on model performance in a SageMaker Pipeline.

The steps in this pipeline include:

  • Preprocess the California Housing dataset.
  • Train a TensorFlow2 Artificial Neural Network (ANN) model.
  • Evaluate model performance using mean squared error (MSE).
  • If the MSE is higher than the threshold, use a Lambda step to send an E-Mail to the Data Science team.
  • If the MSE is lower than the threshold, register the model in the Model Registry, and use a Lambda step to deploy the model to a SageMaker Endpoint.

Prerequisites

Add AmazonSageMakerPipelinesIntegrations policy

The notebook execution role should have policies which enable the notebook to create a Lambda function. The Amazon managed policy AmazonSageMakerPipelinesIntegrations can be added to the notebook execution role.

The policy description is:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction",
                "lambda:DeleteFunction",
                "lambda:InvokeFunction",
                "lambda:UpdateFunctionCode"
            ],
            "Resource": [
                "arn:aws:lambda:*:*:function:*sagemaker*",
                "arn:aws:lambda:*:*:function:*sageMaker*",
                "arn:aws:lambda:*:*:function:*SageMaker*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "sqs:CreateQueue",
                "sqs:SendMessage"
            ],
            "Resource": [
                "arn:aws:sqs:*:*:*sagemaker*",
                "arn:aws:sqs:*:*:*sageMaker*",
                "arn:aws:sqs:*:*:*SageMaker*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": [
                        "lambda.amazonaws.com"
                    ]
                }
            }
        }
    ]
}
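
The Lambda statement in this policy only covers functions whose name contains "sagemaker", "sageMaker", or "SageMaker". The sketch below approximates that check with Python's case-sensitive `fnmatchcase`; it is an illustration of the wildcard patterns, not the IAM evaluation engine, and the account ID and function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Resource patterns copied from the Lambda statement of the policy above.
LAMBDA_RESOURCE_PATTERNS = [
    "arn:aws:lambda:*:*:function:*sagemaker*",
    "arn:aws:lambda:*:*:function:*sageMaker*",
    "arn:aws:lambda:*:*:function:*SageMaker*",
]


def is_covered_by_policy(function_arn: str) -> bool:
    """Return True if the ARN matches at least one resource pattern.

    fnmatchcase is used here as a case-sensitive approximation of the
    "*" wildcard in IAM resource ARNs.
    """
    return any(fnmatchcase(function_arn, p) for p in LAMBDA_RESOURCE_PATTERNS)


# Hypothetical ARNs, for illustration only.
covered = is_covered_by_policy(
    "arn:aws:lambda:us-west-2:123456789012:function:sagemaker-demo-function"
)
not_covered = is_covered_by_policy(
    "arn:aws:lambda:us-west-2:123456789012:function:my-function"
)
print(covered, not_covered)
```

If a function name does not match any of these patterns, the managed policy alone would not allow the notebook role to create or invoke it.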
    

Add inline policy to enable creation of IAM role required for the Lambda Function

The notebook execution role should have an inline policy which enables the notebook to create the IAM role required for the Lambda function. An inline policy can be added to the notebook execution role.

The policy description is:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:CreateRole",
                "iam:AttachRolePolicy"
            ],
            "Resource": "*"
        }
    ]
}

Download California Housing dataset and upload to Amazon S3

We use the California housing dataset.

More info on the dataset:

This dataset was obtained from the StatLib repository. http://lib.stat.cmu.edu/datasets/

The target variable is the median house value for California districts.

This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).


Processing Step

The first step in the pipeline preprocesses the data to prepare it for training. We create a SKLearnProcessor object similar to the one above, but now parameterized, so the job configuration can be tracked and changed separately as needed, for example to use a larger instance type or a higher instance count to accommodate a growing dataset.
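
As a rough, dependency-free sketch of what a preprocessing script run by such a processor might do, the code below splits rows into train, validation, and test sets and standardizes the features. The split fractions, the standardization, and all function names are assumptions for illustration; the notebook's actual script is not shown here.

```python
import random
from statistics import mean, pstdev


def split_rows(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def fit_scaler(rows):
    """Compute per-column mean and standard deviation from the given rows."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # avoid dividing by zero
    return means, stds


def transform(rows, means, stds):
    """Standardize rows using previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy feature rows standing in for the real dataset.
rows = [[float(i), float(i * 2 + 1)] for i in range(20)]
train, val, test = split_rows(rows)
means, stds = fit_scaler(train)  # fit on the training split only
train_scaled = transform(train, means, stds)
val_scaled = transform(val, means, stds)
test_scaled = transform(test, means, stds)
print(len(train), len(val), len(test))
```

Fitting the scaler on the training split only, and reusing its statistics for the other splits, keeps information from the validation and test data out of training.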


Train model step

In the second step, the train and validation outputs from the previous processing step are used to train a model.


Evaluate model step

When a model is trained, it's common to evaluate the model on unseen data before registering it with the model registry. This ensures the model registry isn't cluttered with poorly performing model versions. To evaluate the model, create a ScriptProcessor object and use it in a ProcessingStep.

Note that a separate preprocessed test dataset is used to evaluate the model, and not the output of the processing step. This is only for demo purposes, to ensure the second run of the pipeline creates a model with better performance. In a real-world scenario, the test output of the processing step would be used.
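
The core of an evaluation script can be sketched as follows: compute the MSE on the test set and write it to an evaluation.json report that later steps can read. The report layout (`regression_metrics` / `mse` / `value`) and the function names are assumptions for this sketch, and the toy values stand in for real model predictions.

```python
import json
import tempfile
from pathlib import Path


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def write_evaluation_report(mse, output_dir):
    """Write the MSE to evaluation.json (assumed report layout)."""
    report = {"regression_metrics": {"mse": {"value": mse}}}
    path = Path(output_dir) / "evaluation.json"
    path.write_text(json.dumps(report))
    return path


# Toy targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
mse = mean_squared_error(y_true, y_pred)

with tempfile.TemporaryDirectory() as tmp:
    report_path = write_evaluation_report(mse, tmp)
    saved = json.loads(report_path.read_text())

print(saved)
```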


Send E-Mail Lambda Step

When defining the LambdaStep, the SageMaker Lambda helper class provides helper functions for creating the Lambda function. Users can either use the lambda_func argument to provide the function ARN to an already deployed Lambda function OR use the Lambda class to create a Lambda function by providing a script, function name and role for the Lambda function.

When passing inputs to the Lambda, the inputs argument can be used and within the Lambda function's handler, the event argument can be used to retrieve the inputs.

The dictionary response from the Lambda function is parsed through the LambdaOutput objects provided to the outputs argument. The output_name in LambdaOutput corresponds to the dictionary key in the Lambda's return dictionary.

Define the Lambda function

Users can choose to leverage the Lambda helper class to create a Lambda function and provide that function object to the LambdaStep. Alternatively, users can use a pre-deployed Lambda function and provide the function ARN to the Lambda helper class in the Lambda step.

Here, if the MSE is higher than the threshold, an E-Mail will be sent to the Data Science team.

Note that the E-Mail sending part is left for you to implement with the framework you choose.
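
A handler for this branch might look like the sketch below. It reads its inputs from `event` and returns a dictionary whose keys (`statusCode`, `body`) are what the `output_name` values of the LambdaOutput objects would refer to. The event key `evaluation_s3_uri`, the report layout, and the helper names are assumptions; boto3 is imported lazily so the pure helpers can run without AWS access.

```python
import json
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def build_message(report):
    """Format the notification text from an evaluation report (assumed layout)."""
    mse = report["regression_metrics"]["mse"]["value"]
    return f"Model evaluation MSE was {mse}, which did not meet the threshold."


def read_report_from_s3(uri):
    """Download and parse evaluation.json from S3."""
    import boto3  # lazy import: only needed when running against AWS

    bucket, key = parse_s3_uri(uri)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)


def lambda_handler(event, context, read_report=read_report_from_s3):
    """Entry point: inputs arrive in `event`, outputs are the returned keys."""
    report = read_report(event["evaluation_s3_uri"])  # hypothetical input key
    message = build_message(report)
    # Sending the e-mail is intentionally left out; plug in your own service.
    print(message)
    return {"statusCode": 200, "body": message}
```

The `read_report` parameter exists only so the handler can be exercised locally with a stub; Lambda itself calls the handler with `event` and `context`.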


IAM Role

The Lambda function needs an IAM role that will allow it to read the evaluation.json from S3. The role ARN must be provided in the LambdaStep.

A helper function in iam_helper.py is available to create the Lambda function role. Please note that the role uses the Amazon managed policy AmazonS3ReadOnlyAccess. This should be replaced with a least-privilege IAM policy, as per AWS IAM best practices.
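
The contents of iam_helper.py are not shown here, so the following is only a sketch of how such a helper could work: it builds a trust policy that lets Lambda assume the role, creates the role, and attaches a managed policy. The function names, role name, and description are hypothetical, and a small recording stand-in replaces the boto3 IAM client so the example runs without AWS access.

```python
import json


def lambda_trust_policy():
    """Trust policy allowing the Lambda service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_lambda_role(role_name, managed_policy_arn, iam=None):
    """Create the role, attach one managed policy, and return the role ARN."""
    if iam is None:
        import boto3  # lazy import: only needed when running against AWS

        iam = boto3.client("iam")
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
        Description="Role for a Lambda function used by a SageMaker pipeline",
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=managed_policy_arn)
    return response["Role"]["Arn"]


class RecordingIam:
    """Stand-in for the IAM client that records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def create_role(self, **kwargs):
        self.calls.append(("create_role", kwargs))
        arn = f"arn:aws:iam::123456789012:role/{kwargs['RoleName']}"
        return {"Role": {"Arn": arn}}

    def attach_role_policy(self, **kwargs):
        self.calls.append(("attach_role_policy", kwargs))


fake_iam = RecordingIam()
role_arn = create_lambda_role(
    "lambda-sagemaker-demo-role",  # hypothetical role name
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    iam=fake_iam,
)
print(role_arn)
```

The two IAM calls used here correspond to the `iam:CreateRole` and `iam:AttachRolePolicy` actions granted by the inline policy in the prerequisites.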


Create the Lambda Function step


Register model step

If the trained model meets the model performance requirements, a new model version is registered with the model registry for further analysis. To attach model metrics to the model version, create a ModelMetrics object using the evaluation report created in the evaluation step. Then, create the RegisterModel step.


Create the model

The model is created and the name of the model is provided to the Lambda function for deployment. The CreateModelStep dynamically assigns a name to the model.


Deploy model to SageMaker Endpoint Lambda Step


Define the Lambda function

Here, the Lambda Function will deploy the model to SageMaker Endpoint.
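
A deployment handler along these lines could create an endpoint configuration for the model and then the endpoint itself. This is a sketch under assumptions: the event keys (`model_name`, `endpoint_config_name`, `endpoint_name`, `instance_type`), the variant name, and the default instance type are hypothetical. A recording stand-in replaces the boto3 SageMaker client so the example runs without AWS access.

```python
def lambda_handler(event, context, sm_client=None):
    """Create an endpoint config and an endpoint for the given model."""
    if sm_client is None:
        import boto3  # lazy import: only needed when running against AWS

        sm_client = boto3.client("sagemaker")

    # Hypothetical input keys passed in by the pipeline's Lambda step.
    model_name = event["model_name"]
    endpoint_config_name = event["endpoint_config_name"]
    endpoint_name = event["endpoint_name"]

    sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": event.get("instance_type", "ml.m5.large"),
            }
        ],
    )
    sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )
    return {"statusCode": 200, "body": f"Creating endpoint {endpoint_name}"}


class RecordingSageMaker:
    """Stand-in for the SageMaker client that records calls."""

    def __init__(self):
        self.calls = []

    def create_endpoint_config(self, **kwargs):
        self.calls.append(("create_endpoint_config", kwargs))

    def create_endpoint(self, **kwargs):
        self.calls.append(("create_endpoint", kwargs))


fake_sm = RecordingSageMaker()
response = lambda_handler(
    {
        "model_name": "demo-model",
        "endpoint_config_name": "demo-endpoint-config",
        "endpoint_name": "demo-endpoint",
    },
    None,
    sm_client=fake_sm,
)
print(response)
```

Endpoint creation is asynchronous, so a handler like this returns as soon as the request is accepted rather than waiting for the endpoint to be in service.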


IAM Role

The Lambda function needs an IAM role that will allow it to deploy a SageMaker Endpoint. The role ARN must be provided in the LambdaStep.

A helper function in iam_helper.py is available to create the Lambda function role. Please note that the role uses the Amazon managed policy AmazonSageMakerFullAccess. This should be replaced with a least-privilege IAM policy, as per AWS IAM best practices.


Accuracy condition step

Adding conditions to the pipeline is done with a ConditionStep. In this case, the new model version is registered with the model registry and deployed only if its MSE is lower than the threshold; otherwise, the E-Mail Lambda step runs instead.
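
The decision the condition encodes can be written out in plain Python. This sketch is not the ConditionStep API; the step labels, the report layout, and the handling of an MSE exactly equal to the threshold (treated as passing here) are assumptions.

```python
def choose_branch(report, mse_threshold):
    """Return the labels of the steps that would run for a given report."""
    mse = report["regression_metrics"]["mse"]["value"]  # assumed layout
    if mse <= mse_threshold:  # assumption: equality counts as passing
        return ["register_model", "create_model", "deploy_endpoint"]
    return ["send_email"]


# Toy reports, for illustration only.
good = choose_branch({"regression_metrics": {"mse": {"value": 0.3}}}, 0.5)
bad = choose_branch({"regression_metrics": {"mse": {"value": 0.3}}}, 0.2)
print(good)
print(bad)
```

The same evaluation result can land on either branch depending on the threshold, which is why the threshold is worth exposing as a pipeline parameter.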


Pipeline Creation: Orchestrate all steps

Now that all pipeline steps are defined, they can be combined into a pipeline.


Execute the Pipeline

List the execution steps to check out the status and artifacts:


Submit pipeline


Execute pipeline using the default parameters


Wait for pipeline to complete


Visualize SageMaker Pipeline - MSE lower than the threshold

In SageMaker Studio, choose SageMaker Components and registries in the left pane and, under Pipelines, click the pipeline that was created. All pipeline executions are then shown, and the one just created should have a status of Succeeded. After selecting that execution, the different pipeline steps can be tracked as they execute.

You can see that the Register-California-Housing-Model step was executed.

Start a pipeline with 2 epochs to trigger the send-email-to-ds-team-lambda Lambda Function

Run the pipeline again, but this time, with only 2 epochs and a lower MSE Threshold of 0.2. This will result in a higher MSE value on model evaluation, and will cause the send-email-to-ds-team-lambda Lambda Function to be triggered.


Visualize SageMaker Pipeline - MSE higher than the threshold

In SageMaker Studio, choose SageMaker Components and registries in the left pane and, under Pipelines, click the pipeline that was created. All pipeline executions are then shown, and the one just created should have a status of Succeeded. After selecting that execution, the different pipeline steps can be tracked as they execute.

You can see that the Send-Email-To-DS-Team step was executed.

Clean up (optional)

Stop / Close the Endpoint

You should delete the endpoint before you close the notebook if you don't need to keep the endpoint running for serving real-time predictions.
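
A cleanup helper for the endpoint might look like the sketch below. Deleting the endpoint configuration as well is an addition not mentioned in the text, and the function and resource names are hypothetical. A recording stand-in replaces the boto3 SageMaker client so the example runs without AWS access.

```python
def delete_endpoint_resources(endpoint_name, endpoint_config_name, sm_client=None):
    """Delete an endpoint and its endpoint configuration."""
    if sm_client is None:
        import boto3  # lazy import: only needed when running against AWS

        sm_client = boto3.client("sagemaker")
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    # Assumption: the config is no longer needed once the endpoint is gone.
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)


class RecordingSageMaker:
    """Stand-in for the SageMaker client that records calls."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, **kwargs):
        self.calls.append(("delete_endpoint", kwargs))

    def delete_endpoint_config(self, **kwargs):
        self.calls.append(("delete_endpoint_config", kwargs))


cleanup_client = RecordingSageMaker()
delete_endpoint_resources("demo-endpoint", "demo-endpoint-config", sm_client=cleanup_client)
print(cleanup_client.calls)
```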


Delete the model registry and the pipeline to keep the Studio environment tidy.


Delete the Lambda functions.


Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

Regions tested: us-east-1, us-east-2, us-west-1, ca-central-1, sa-east-1, eu-west-1, eu-west-2, eu-west-3, eu-central-1, eu-north-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, ap-south-1.