Notebooks
A
Amazon Web Services
Xgboost Multi Model Endpoint Home Value

Xgboost Multi Model Endpoint Home Value

data-scienceinferencearchivedamazon-sagemaker-examplesreinforcement-learningmachine-learningawsexamplesdeep-learningsagemakerjupyter-notebooktrainingmlops

Amazon SageMaker Multi-Model Endpoints using XGBoost


This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable


With Amazon SageMaker multi-model endpoints, customers can create an endpoint that seamlessly hosts up to thousands of models. These endpoints are well suited to use cases where any one of a large number of models, which can be served from a common inference container to save inference costs, needs to be invokable on-demand and where it is acceptable for infrequently invoked models to incur some additional latency. For applications which require consistently low inference latency, an endpoint deploying a single model is still the best choice.

At a high level, Amazon SageMaker manages the loading and unloading of models for a multi-model endpoint, as they are needed. When an invocation request is made for a particular model, Amazon SageMaker routes the request to an instance assigned to that model, downloads the model artifacts from S3 onto that instance, and initiates loading of the model into the memory of the container. As soon as the loading is complete, Amazon SageMaker performs the requested invocation and returns the result. If the model is already loaded in memory on the selected instance, the downloading and loading steps are skipped and the invocation is performed immediately.

To demonstrate how multi-model endpoints are created and used, this notebook provides an example using a set of XGBoost models that each predict housing prices for a single location. This domain is used as a simple example to easily experiment with multi-model endpoints.

The Amazon SageMaker multi-model endpoint capability is designed to work across with Mxnet, PyTorch and Scikit-Learn machine learning frameworks (TensorFlow coming soon), SageMaker XGBoost, KNN, and Linear Learner algorithms.

In addition, Amazon SageMaker multi-model endpoints are also designed to work with cases where you bring your own container that integrates with the multi-model server library. An example of this can be found here and documentation here.

Generate synthetic data

The code below contains helper functions to generate synthetic data in the form of a 1x7 numpy array representing the features of a house.

The first entry in the array is the randomly generated price of a house. The remaining entries are the features (i.e. number of bedroom, square feet, number of bathrooms, etc.).

These functions will be used to generate synthetic data for training, validation, and testing. It will also allow us to submit synthetic payloads for inference to test our multi-model endpoint.

[12]
[2]
2.42.1
[3]
[4]
[5]
[6]
[7]

Train multiple house value prediction models

In the follow section, we are setting up the code to train a house price prediction model for each of 4 different cities.

As such, we will launch multiple training jobs asynchronously, using the XGBoost algorithm.

In this notebook, we will be using the AWS Managed XGBoost Image for both training and inference - this image provides native support for launching multi-model endpoints.

[8]

Split a given dataset into train, validation, and test

The code below will generate 3 sets of data. 1 set to train, 1 set for validation and 1 for testing.

[9]

Launch a single training job for a given housing location

There is nothing specific to multi-model endpoints in terms of the models it will host. They are trained in the same way as all other SageMaker models. Here we are using the XGBoost estimator and not waiting for the job to complete.

[10]

Kick off a model training job for each housing location

[11]
[12]
Training data uploaded: s3://sagemaker-us-west-2-688520471316/XGBOOST_BOSTON_HOUSING/model_prep/NewYork_NY
Training data uploaded: s3://sagemaker-us-west-2-688520471316/XGBOOST_BOSTON_HOUSING/model_prep/LosAngeles_CA
Training data uploaded: s3://sagemaker-us-west-2-688520471316/XGBOOST_BOSTON_HOUSING/model_prep/Chicago_IL
Training data uploaded: s3://sagemaker-us-west-2-688520471316/XGBOOST_BOSTON_HOUSING/model_prep/Houston_TX

4 training jobs launched: ['xgb-NewYork-NY-2021-05-28-20-27-47-850', 'xgb-LosAngeles-CA-2021-05-28-20-27-48-350', 'xgb-Chicago-IL-2021-05-28-20-27-51-370', 'xgb-Houston-TX-2021-05-28-20-27-53-744']

Wait for all model training to finish

[13]
[14]
Waiting for job: xgb-NewYork-NY-2021-05-28-20-27-47-850
xgb-NewYork-NY-2021-05-28-20-27-47-850 job status: InProgress
xgb-NewYork-NY-2021-05-28-20-27-47-850 job status: InProgress
xgb-NewYork-NY-2021-05-28-20-27-47-850 job status: InProgress
xgb-NewYork-NY-2021-05-28-20-27-47-850 job status: InProgress
DONE. Status for xgb-NewYork-NY-2021-05-28-20-27-47-850 is Completed

Waiting for job: xgb-LosAngeles-CA-2021-05-28-20-27-48-350
DONE. Status for xgb-LosAngeles-CA-2021-05-28-20-27-48-350 is Completed

Waiting for job: xgb-Chicago-IL-2021-05-28-20-27-51-370
DONE. Status for xgb-Chicago-IL-2021-05-28-20-27-51-370 is Completed

Waiting for job: xgb-Houston-TX-2021-05-28-20-27-53-744
DONE. Status for xgb-Houston-TX-2021-05-28-20-27-53-744 is Completed

Create the multi-model endpoint with the SageMaker SDK

Create a SageMaker Model from one of the Estimators

[15]

Create the Amazon SageMaker MultiDataModel entity

We create the multi-model endpoint using the MultiDataModel class.

You can create a MultiDataModel by directly passing in a sagemaker.model.Model object - in which case, the Endpoint will inherit information about the image to use, as well as any environmental variables, network isolation, etc., once the MultiDataModel is deployed.

In addition, a MultiDataModel can also be created without explictly passing a sagemaker.model.Model object. Please refer to the documentation for additional details.

[16]
[17]
[18]

Deploy the Multi Model Endpoint

You need to consider the appropriate instance type and number of instances for the projected prediction workload across all the models you plan to host behind your multi-model endpoint. The number and size of the individual models will also drive memory requirements.

[19]
-------------------!

Our endpoint has launched! Let's look at what models are available to the endpoint!

By 'available', what we mean is, what model artfiacts are currently stored under the S3 prefix we defined when setting up the MultiDataModel above i.e. model_data_prefix.

Currently, since we have no artifacts (i.e. tar.gz files) stored under our defined S3 prefix, our endpoint, will have no models 'available' to serve inference requests.

We will demonstrate how to make models 'available' to our endpoint below.

[20]
[]

Lets deploy model artifacts to be found by the endpoint

We are now using the .add_model() method of the MultiDataModel to copy over our model artifacts from where they were initially stored, during training, to where our endpoint will source model artifacts for inference requests.

model_data_source refers to the location of our model artifact (i.e. where it was deposited on S3 after training completed)

model_data_path is the relative path to the S3 prefix we specified above (i.e. model_data_prefix) where our endpoint will source models for inference requests.

Since this is a relative path, we can simply pass the name of what we wish to call the model artifact at inference time (i.e. Chicago_IL.tar.gz)

Dynamically deploying additional models

It is also important to note, that we can always use the .add_model() method, as shown below, to dynamically deploy more models to the endpoint, to serve up inference requests as needed.

[21]

We have added the 4 model artifacts from our training jobs!

We can see that the S3 prefix we specified when setting up MultiDataModel now has 4 model artifacts. As such, the endpoint can now serve up inference requests for these models.

[22]
['Chicago_IL.tar.gz',
, 'Houston_TX.tar.gz',
, 'LosAngeles_CA.tar.gz',
, 'NewYork_NY.tar.gz']

Get predictions from the endpoint

Recall that mme.deploy() returns a RealTimePredictor that we saved in a variable called predictor.

We will use predictor to submit requests to the endpoint.

XGBoost supports text/csv for the content type and accept type. For more information on XGBoost Input/Output Interface, please see here.

Since the default RealTimePredictor does not have a serializer or deserializer set for requests, we will also set these.

This will allow us to submit a python list for inference, and get back a float response.

[23]
[24]

Invoking models on a multi-model endpoint

Notice the higher latencies on the first invocation of any given model. This is due to the time it takes SageMaker to download the model to the Endpoint instance and then load the model into the inference container. Subsequent invocations of the same model take advantage of the model already being loaded into the inference container.

[25]
$395,909.25, took 1,469 ms

[26]
$372,641.38, took 23 ms

[27]
$344,676.03, took 1,145 ms

[28]
$479,065.41, took 19 ms

Updating a model

To update a model, you would follow the same approach as above and add it as a new model. For example, if you have retrained the NewYork_NY.tar.gz model and wanted to start invoking it, you would upload the updated model artifacts behind the S3 prefix with a new name such as NewYork_NY_v2.tar.gz, and then change the target_model field to invoke NewYork_NY_v2.tar.gz instead of NewYork_NY.tar.gz. You do not want to overwrite the model artifacts in Amazon S3, because the old version of the model might still be loaded in the containers or on the storage volume of the instances on the endpoint. Invocations to the new model could then invoke the old version of the model.

Alternatively, you could stop the endpoint and re-deploy a fresh set of models.

Using Boto APIs to invoke the endpoint

While developing interactively within a Jupyter notebook, since .deploy() returns a RealTimePredictor it is a more seamless experience to start invoking your endpoint using the SageMaker SDK. You have more fine grained control over the serialization and deserialization protocols to shape your request and response payloads to/from the endpoint.

This is great for iterative experimentation within a notebook. Furthermore, should you have an application that has access to the SageMaker SDK, you can always import RealTimePredictor and attach it to an existing endpoint - this allows you to stick to using the high level SDK if preferable.

Additional documentation on RealTimePredictor can be found here.

The lower level Boto3 SDK may be preferable if you are attempting to invoke the endpoint as a part of a broader architecture.

Imagine an API gateway frontend that uses a Lambda Proxy in order to transform request payloads before hitting a SageMaker Endpoint - in this example, Lambda does not have access to the SageMaker Python SDK, and as such, Boto3 can still allow you to interact with your endpoint and serve inference requests.

Boto3 allows for quick injection of ML intelligence via SageMaker Endpoints into existing applications with minimal/no refactoring to existing code.

Boto3 will submit your requests as a binary payload, while still allowing you to supply your desired Content-Type and Accept headers with serialization being handled by the inference container in the SageMaker Endpoint.

Additional documentation on .invoke_endpoint() can be found here.

[29]
[ ]

Clean up

Here, to be sure we are not billed for endpoints we are no longer using, we clean up.

[31]

Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable

This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable