1 Create Endpoint


SageMaker Inference Components and Managed Instance Scaling

You can deploy models for real-time inference using SageMaker Hosting. With inference components you can manage multiple ML models deployed to a single endpoint, as well as the number of copies of each model. You can also set fine-grained policies to scale each workload, by scaling both the copies of a model and the number of compute instances. This is the first notebook in a series of five; it creates the endpoint that you will deploy three models against in the following notebooks. The last notebook will show you other available APIs and clean up the artifacts created.


This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

(CI test badge for us-west-2)


Overview

This is a simple set of example notebooks for using SageMaker inference components and managed instance scaling. The first notebook (this notebook) creates an endpoint for you. The following notebooks show you how to deploy inference components to that endpoint using different models (these are all prefixed with "2_"). Please note that notebook "2c_meta-llama7b-lmi-autoscaling.ipynb" also shows you how to set up autoscaling for your inference components and use managed instance scaling to scale your endpoint. Finally, the last notebook (prefix "3_") looks at examples of some miscellaneous functions and cleans up to delete your resources.

Tested using the Python 3 (Data Science) kernel on SageMaker Studio and conda_python3 kernel on SageMaker Notebook Instance.

General Setup

Install dependencies

Upgrade the SageMaker Python SDK.

[ ]
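The upgrade cell typically contains a single pip command along the lines of the following (the exact packages pinned in the original cell are an assumption here):

```shell
# Upgrade the SageMaker Python SDK and boto3 to the latest versions.
pip install --upgrade --quiet sagemaker boto3
```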

Import libraries

[ ]

Set configurations

We begin by creating what we will need for our notebook: in particular, we use the boto3 library to create the various clients needed to interact with SageMaker, and define other variables that will be referenced later in the notebook.

[ ]
[ ]

Create SageMaker Endpoint Configuration

There are a few parameters we want to set up for our endpoint. We start by setting the variant name and the instance type we want our endpoint to use. We also set model_data_download_timeout_in_seconds and container_startup_health_check_timeout_in_seconds to provide guardrails for when we deploy inference components to our endpoint. In addition, we will use Managed Instance Scaling, which allows SageMaker to scale the number of instances based on the scaling requirements of your inference components. We set MinInstanceCount and MaxInstanceCount variables to size the endpoint according to the workload you want to serve while also maintaining controls around cost. Lastly, we set the RoutingStrategy for the endpoint to optimally tune how requests are routed to instances and inference components for the best performance.

[ ]

Create SageMaker Endpoint

We can now use the EndpointConfiguration created in the last step to create an endpoint with SageMaker.

[ ]

Wait for the endpoint to be in "InService" state.

[ ]

That's it! Your endpoint is now ready, and we can reference it in the following notebooks to deploy inference components. Now that the endpoint is in service, you can start associating it with models by creating one or many inference components. In the next three notebooks, denoted with a prefix of "2_", we show how you can deploy different models/inference components. Furthermore, in the notebook "2_meta-llama2-7b-lmi-autoscaling.ipynb" we show how you can attach autoscaling policies to a Llama2-7B model.

In the following step we will store the endpoint_name as a variable so it can be used in later notebooks.

[ ]
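In a Jupyter notebook this is typically done with the `%store` magic, which persists a variable across notebook sessions; a portable alternative, sketched below with a hypothetical file name, writes the value to a small JSON file that later notebooks can read:

```python
import json
from pathlib import Path

endpoint_name = "ic-scaling-demo-endpoint"  # hypothetical

# The original notebook would typically use the IPython magic:
#   %store endpoint_name
# A portable alternative is a small JSON file shared between notebooks:
Path("shared_vars.json").write_text(json.dumps({"endpoint_name": endpoint_name}))

print(json.loads(Path("shared_vars.json").read_text())["endpoint_name"])
```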

Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

(CI test badges for us-east-1, us-east-2, us-west-1, ca-central-1, sa-east-1, eu-west-1, eu-west-2, eu-west-3, eu-central-1, eu-north-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, ap-northeast-2, and ap-south-1)