Get Started Mnist Deploy
Deploy a Trained TensorFlow V2 Model
This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.
In this notebook, we walk through the process of deploying a trained model to a SageMaker endpoint. If you recently ran the training notebook and saved model_data with the %store magic, it can be restored here. Otherwise, we retrieve the
model artifact from a public S3 bucket.
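A minimal sketch of that restore step, assuming the training notebook stored model_data with %store; the fallback URI below is a placeholder, not the actual public bucket path:

```python
# Restore model_data saved by the training notebook's %store magic, if available.
%store -r model_data

try:
    model_data
except NameError:
    # Placeholder fallback; replace with the public S3 URI of the trained model artifact.
    model_data = "s3://<public-bucket>/<prefix>/model.tar.gz"

print(model_data)
```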
TensorFlow Model Object
The TensorFlowModel class allows you to define an environment for running inference with your
model artifact. Like the TensorFlow estimator class we discussed
in the training notebook, it is a high-level API used to set up a Docker image for your model hosting service.
Once it is properly configured, it can be used to create a SageMaker endpoint on an EC2 instance. The SageMaker endpoint is a containerized environment that uses your trained model to perform inference on incoming data via RESTful API calls.
Some common parameters used to initialize the TensorFlowModel class are:
- role: an IAM role used to make AWS service requests
- model_data: the S3 URI of the compressed model artifact. It can be a path to a local file if the endpoint is deployed on the SageMaker instance you are using to run this notebook (local mode)
- framework_version: the version of the TensorFlow package to be used
- py_version: the Python version to be used
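For reference, a minimal sketch of constructing the model object; the framework version shown is illustrative and should match the version used during training (depending on your SDK version, py_version may also need to be supplied).

```python
import sagemaker
from sagemaker.tensorflow import TensorFlowModel

# IAM role used by SageMaker to access the model artifact and create the endpoint.
role = sagemaker.get_execution_role()

# Illustrative version; match it to the framework used at training time.
model = TensorFlowModel(
    role=role,
    model_data=model_data,
    framework_version="2.3.1",
)
```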
Execute the Inference Container
Once the TensorFlowModel class is initialized, we can call its deploy method to run the container for the hosting
service. Some common parameters needed to call the deploy method are:
- initial_instance_count: the number of SageMaker instances to be used to run the hosting service.
- instance_type: the type of SageMaker instance to run the hosting service. Set it to local if you want to run the hosting service on the local SageMaker instance. Local mode is typically used for debugging.
Note: local mode is not supported in SageMaker Studio
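A hedged example of the deploy call; the instance type and count below are illustrative choices, not requirements.

```python
# Deploy the model to a real-time endpoint.
# Use instance_type="local" for local mode (not supported in SageMaker Studio).
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
)
```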
Making Predictions Against a SageMaker Endpoint
Once you have the Predictor instance returned by model.deploy(...), you can send prediction requests to your endpoint. In this case, the model accepts normalized
batch images in depth-minor convention.
The formats of the input and output data correspond directly to the request and response
formats of the Predict method in the TensorFlow Serving REST API. For example, the key of the array
passed to the model in dummy_inputs needs to be called instances. Moreover, the input data needs to have a batch dimension.
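For illustration, a sketch of such a dummy request, assuming predictor is the object returned by model.deploy(...):

```python
import numpy as np

# A dummy batch of two random 28x28x1 images in depth-minor (channels-last) convention.
# The top-level key must be "instances", matching the TensorFlow Serving REST Predict API.
dummy_inputs = {"instances": np.random.rand(2, 28, 28, 1).tolist()}

res = predictor.predict(dummy_inputs)
print(res)
```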
Now, let's use real MNIST test data to test the endpoint. We use helper functions defined in code.utils to
download the MNIST dataset and normalize the input data.
Since the model accepts normalized input, you will need to normalize the samples before sending them to the endpoint.
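A sketch of that workflow, assuming code.utils exposes helpers along these lines (the exact names and signatures are assumptions and may differ in the repository):

```python
import random
import numpy as np

# Hypothetical helper names; adjust to the actual functions in code.utils.
from code.utils import mnist_to_numpy, normalize

# Download the MNIST test split and convert it to numpy arrays (path is illustrative).
X, Y = mnist_to_numpy(data_dir="/tmp/data", train=False)

# Pick a small random batch and normalize it, since the model expects normalized input.
mask = random.sample(range(X.shape[0]), 16)
samples = normalize(X[mask], axis=(1, 2))

# Add the channel dimension expected by the depth-minor input format, then predict.
predictions = predictor.predict(np.expand_dims(samples, 3))["predictions"]
print(np.argmax(predictions, axis=1), Y[mask])
```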
(Optional) Clean up
If you do not plan to use the endpoint, you should delete it to free up compute resources. If you used local mode, you will need to manually delete the Docker container bound to port 8080 (the port that listens for incoming requests).
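A minimal cleanup sketch:

```python
# Delete the hosted endpoint once you no longer need it.
predictor.delete_endpoint()

# In local mode, also remove the serving container manually, e.g.:
#   docker rm -f $(docker ps -q --filter "publish=8080")
```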
Notebook CI Test Results
This notebook was tested in multiple regions. The test results are as follows, except for us-west-2, which is shown at the top of the notebook.