
Building an image-to-text generative AI application using CLIP and BLIP models on Amazon SageMaker


This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.


In this notebook, we'll showcase how to build an image-to-text generative AI application using two models, CLIP and BLIP. Let's first see what these models are used for:

  • CLIP (Contrastive Language-Image Pre-training) is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image classification. CLIP was trained on a dataset of 400 million (image, text) pairs collected from a variety of publicly available sources on the Internet. The model architecture consists of an image encoder and a text encoder.

  • BLIP (Bootstrapping Language-Image Pre-training) is a pre-training framework for unified vision-language understanding and generation, which achieves state-of-the-art results on a wide range of vision-language tasks.

The CLIP Interrogator is a prompt engineering tool that combines CLIP and BLIP to optimize text prompts to match a given image. You can then use the resulting prompts with text-to-image models like Stable Diffusion XL on Amazon SageMaker to create cool art!

We'll demonstrate how to use the SageMaker large model inference (LMI) container to host the CLIP Interrogator solution on Amazon SageMaker using DJLServing. DJLServing is a high-performance universal model serving solution powered by the Deep Java Library (DJL) that is programming language agnostic. To learn more about DJL and DJLServing, you can refer to the blog post.

Finally, we'll use prompt engineering to restyle images with the Stable Diffusion model.

Table of contents

Part 1 - Run the CLIP Interrogator model

  1. Setup for Model Deployment
  2. Create SageMaker compatible model artifacts
  3. Create a model.py with custom inference code for CLIP Interrogator deployment
  4. Create the Tarball and then upload to S3 location
  5. Create the SageMaker Model and SageMaker endpoint
  6. Run Inference

Part 2 - Use prompt engineering to restyle image with Stable Diffusion model

  1. Deploy Stable Diffusion XL on a SageMaker endpoint
  2. Examples of using Prompt Engineering to restyle images

Cleanup

1. Setup for Model Deployment

As a first step, we'll import the relevant libraries and configure several global variables such as the hosting image that will be used and the S3 location to store the relevant artifacts.
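The setup can be sketched as follows. This is a minimal illustration only: the bucket name, account id, and prefix are placeholders, and in the actual notebook the session, role, and bucket come from the SageMaker SDK, which requires an AWS environment to run.

```python
# Hypothetical setup values. In the notebook these come from the SageMaker
# SDK: sagemaker.Session(), sagemaker.get_execution_role(), and
# sess.default_bucket().
region = "us-west-2"
bucket = f"sagemaker-{region}-111122223333"   # placeholder account id
s3_prefix = "clip-interrogator"               # hypothetical key prefix

# S3 locations for the code tarball and (optionally) model weights.
s3_code_location = f"s3://{bucket}/{s3_prefix}/code"
s3_model_location = f"s3://{bucket}/{s3_prefix}/model"

# The notebook retrieves the LMI (DJLServing) hosting image with
# sagemaker.image_uris.retrieve(); the value below is only a placeholder.
inference_image_uri = "<LMI DJLServing container image URI>"
```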


Upload the model artifacts to S3. If you choose a large BLIP-2 model, the cell below may take a few minutes to upload the files.


2. Create SageMaker compatible model artifacts

In order to prepare our model for deployment to a SageMaker endpoint for hosting, we need to prepare a few things for SageMaker and our container. We will use a local folder as the location for these files, including a serving.properties file that defines parameters for the LMI container and a requirements.txt that lists the dependencies to install.

In the serving.properties file, we define the engine to use and the model to host. Note the tensor_parallel_degree parameter, which is set to 1 in this scenario. In this example, we will use the Salesforce/blip-image-captioning-large and ViT-L-14/openai models. Since both models can fit on a single GPU, we do not have to divide the model into multiple parts. In this case we will use an ml.g5.2xlarge instance, which provides one GPU. Be careful not to specify a tensor parallel degree larger than the number of GPUs the instance provides, or your deployment will fail. If you choose a bigger model that needs more GPU resources, update the instance type and tensor parallel degree accordingly.
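Writing these two files might look like the following sketch. The folder name and the exact set of options are illustrative; `engine=Python` with `option.tensor_parallel_degree=1` matches the single GPU of an ml.g5.2xlarge as described above.

```python
import os

# Create the local code folder (name is a hypothetical choice).
os.makedirs("code_clip", exist_ok=True)

# serving.properties: parameters for the LMI (DJLServing) container.
with open("code_clip/serving.properties", "w") as f:
    f.write(
        "engine=Python\n"
        "option.tensor_parallel_degree=1\n"
    )

# requirements.txt: dependencies installed in the container at startup.
with open("code_clip/requirements.txt", "w") as f:
    f.write("clip-interrogator\n")
```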


3. Create a model.py with custom inference code

SageMaker allows you to bring your own script for inference. Here we create our model.py file with the appropriate code for the solution. This code was adapted from the CLIP Interrogator GitHub example and modified to work with a SageMaker endpoint. Note that, in this example, we will download the model artifacts directly from Hugging Face when creating the endpoint.

You can choose different CLIP or BLIP models by passing the caption model name and the CLIP model name through the model_name.json file created below.
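Creating that file might look like the sketch below. The key names and folder are assumptions for illustration; the two values shown are the model choices this notebook uses by default.

```python
import json
import os

os.makedirs("code_clip", exist_ok=True)  # hypothetical code folder

# Which caption (BLIP) and CLIP models the inference script should load.
# Key names here are illustrative assumptions.
model_names = {
    "caption_model_name": "blip-large",    # Salesforce/blip-image-captioning-large
    "clip_model_name": "ViT-L-14/openai",
}
with open("code_clip/model_name.json", "w") as f:
    json.dump(model_names, f)
```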


The inference script model.py contains a handle function; DJL Serving processes each request by invoking this function. The Config object lets you configure CLIP Interrogator's processing. The BLIP and CLIP models are loaded via the load_caption_model() and load_clip_model() functions during the initialization of the Interrogator object. The predefined label tables stored in the data folder are also loaded. These predefined lists of labels represent different types of captions/keywords in different domains, such as artists and mediums. You can add more labels by adding additional text files with predefined categories of labels and modifying the model.py file to load the additional label table in the code.

The entry point script below first deserializes the image data received in the invocation, along with the model parameters, and passes them to the Interrogator object through the interrogate() function to generate prompts based on the input image. The image caption is generated using the caption model (the BLIP model), and the image features are generated using the image_to_features() function with the CLIP model. The solution calculates the similarities between the image and the different labels (categories) to select the most similar, or optimal, prompts based on the captions and labels in the different categories.
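A heavily condensed sketch of such a model.py, written to disk the way the notebook's %%writefile cell would, is shown below. The Config and Interrogator names come from the clip-interrogator package and handle() is the DJL Serving entry point; the lazy-loading pattern and the response shape are assumptions, and the real script also loads the extra label tables and model parameters described above.

```python
import os

# Sketch of the custom inference script (a simplification of the real one).
model_py = '''
import io
import json

from PIL import Image
from clip_interrogator import Config, Interrogator
from djl_python import Output

_interrogator = None

def get_interrogator():
    """Load the BLIP caption model and CLIP model once, on first use."""
    global _interrogator
    if _interrogator is None:
        with open("model_name.json") as f:
            names = json.load(f)
        _interrogator = Interrogator(Config(
            caption_model_name=names["caption_model_name"],
            clip_model_name=names["clip_model_name"],
        ))
    return _interrogator

def handle(inputs):
    """Entry point invoked by DJL Serving for every request."""
    if inputs.is_empty():
        return None  # warm-up request from the model server
    image = Image.open(io.BytesIO(inputs.get_as_bytes())).convert("RGB")
    prompt = get_interrogator().interrogate(image)
    return Output().add_as_json({"outputs": prompt})
'''

os.makedirs("code_clip", exist_ok=True)
with open("code_clip/model.py", "w") as f:
    f.write(model_py)
```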


4. Create the Tarball and then upload to S3 location

Next, we will package our artifacts as a *.tar.gz file and upload it to S3 for SageMaker to use for deployment.
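The packaging step can be sketched with the standard library's tarfile module. For self-containment this sketch creates the folder and a stand-in serving.properties; in the notebook the folder already holds the real files, and the upload would then go through the SageMaker session (e.g. sess.upload_data).

```python
import os
import tarfile

# Stand-in artifacts so this sketch runs on its own.
os.makedirs("code_clip", exist_ok=True)
with open("code_clip/serving.properties", "w") as f:
    f.write("engine=Python\n")

# Package the code directory as model.tar.gz; arcname="." places the
# files at the root of the archive, where the LMI container expects them.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("code_clip", arcname=".")
```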


5. Create the SageMaker Model and SageMaker endpoint

Now that we have uploaded the model artifacts to S3, we can create a SageMaker endpoint using the high-level Python SDK.
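For orientation, the equivalent low-level boto3 request payloads might look like the sketch below. The notebook itself uses the high-level sagemaker SDK, and every ARN, image URI, and S3 path here is a placeholder.

```python
# Hypothetical boto3-style request payloads; all identifiers are placeholders.
model_name = "clip-interrogator"

create_model_request = {
    "ModelName": model_name,
    "ExecutionRoleArn": "arn:aws:iam::111122223333:role/SageMakerRole",
    "PrimaryContainer": {
        "Image": "<LMI DJLServing image URI>",
        "ModelDataUrl": "s3://<bucket>/clip-interrogator/model.tar.gz",
    },
}

endpoint_config_request = {
    "EndpointConfigName": model_name + "-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": "ml.g5.2xlarge",  # one GPU, enough for both models
        "InitialInstanceCount": 1,
    }],
}

# With AWS credentials you would then call, in order:
# sm = boto3.client("sagemaker")
# sm.create_model(**create_model_request)
# sm.create_endpoint_config(**endpoint_config_request)
# sm.create_endpoint(EndpointName=model_name,
#                    EndpointConfigName=model_name + "-config")
```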


6. Run Inference

Once the endpoint is deployed, we can invoke the endpoint to test the solution.
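An invocation sketch is shown below. The endpoint name, image filename, and content type are assumptions (the sketch assumes the endpoint accepts raw image bytes), and the actual call needs AWS credentials, so it is commented out.

```python
import json

endpoint_name = "clip-interrogator-endpoint"  # hypothetical name

def parse_response(body: bytes) -> str:
    """Extract the generated prompt from the endpoint's JSON response."""
    return json.loads(body)["outputs"]

# With credentials and a deployed endpoint:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# with open("croissant.png", "rb") as f:           # hypothetical test image
#     resp = runtime.invoke_endpoint(
#         EndpointName=endpoint_name,
#         ContentType="application/x-image",
#         Body=f.read(),
#     )
# print(parse_response(resp["Body"].read()))
```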

CPU times: user 17.2 ms, sys: 1.27 ms, total: 18.4 ms
Wall time: 21 s
{'outputs': 'croissant on a plate, pexels contest winner, aspect ratio 16:9, cgsocietywlop, 8 h, golden cracks, the artist has used bright, picture of a loft in morning, object features, stylized border, pastry, french emperor'}

Part 2 - Use prompt engineering to restyle image with Stable Diffusion model

We first deploy a Stable Diffusion XL endpoint to use for inference.

⚠️ ATTENTION ⚠️: This model is currently available only in selected regions, and you need to subscribe to the StabilityAI marketplace offering to be able to deploy it from JumpStart.

1. Deploy Stable Diffusion XL on SageMaker endpoint

We use the StabilityAI SDK to deploy our model from Amazon SageMaker JumpStart.


We use the generated text from the section above as the prompt input for our Stable Diffusion model, and the generated images look similar to the original one.

croissant on a plate, pexels contest winner, aspect ratio 16:9, cgsocietywlop, 8 h, golden cracks, the artist has used bright, picture of a loft in morning, object features, stylized border, pastry, french emperor
[Generated image output]

2. Use Prompt Engineering to modify and restyle images

We then apply our own styles with some simple prompt engineering: we chose two famous artists and one work from each to apply two different styles.
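The prompt engineering here is just string prefixing, sketched below using the prompt generated in Part 1 and the two artist/work pairs chosen in this section.

```python
# Base prompt: the CLIP Interrogator output from Part 1.
base_prompt = (
    "croissant on a plate, pexels contest winner, aspect ratio 16:9, "
    "cgsocietywlop, 8 h, golden cracks, the artist has used bright, "
    "picture of a loft in morning, object features, stylized border, "
    "pastry, french emperor"
)

# The two styles applied in this section.
styles = [
    ("Van Gogh", "The Starry Night"),
    ("Hokusai", "The Great Wave off Kanagawa"),
]

# Prefix each style description onto the base prompt.
styled_prompts = [
    f"This scene is a {artist} painting with {work} style, {base_prompt}"
    for artist, work in styles
]
```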

This scene is a Van Gogh painting with The Starry Night style, croissant on a plate, pexels contest winner, aspect ratio 16:9, cgsocietywlop, 8 h, golden cracks, the artist has used bright, picture of a loft in morning, object features, stylized border, pastry, french emperor
[Generated image output]

This scene is a Hokusai painting with The Great Wave off Kanagawa style, croissant on a plate, pexels contest winner, aspect ratio 16:9, cgsocietywlop, 8 h, golden cracks, the artist has used bright, picture of a loft in morning, object features, stylized border, pastry, french emperor