Images Interpolation with Stable Diffusion

Authored by: Rustam Akimov

This notebook shows how to use Stable Diffusion to interpolate between images. Image interpolation using Stable Diffusion is the process of creating intermediate images that smoothly transition from one given image to another, using a generative model based on diffusion.

Here are some use cases for image interpolation with Stable Diffusion:

  • Data Augmentation: Stable Diffusion can augment training data for machine learning models by generating synthetic images that lie between existing data points. This can improve the generalization and robustness of machine learning models, especially in tasks like image generation, classification or object detection.
  • Product Design and Prototyping: Stable Diffusion can aid in product design by generating variations of product designs or prototypes with subtle differences. This can be useful for exploring design alternatives, conducting user studies, or visualizing design iterations before committing to physical prototypes.
  • Content Generation for Media Production: In media production, such as film and video editing, Stable Diffusion can be used to generate intermediate frames between key frames, enabling smoother transitions and enhancing visual storytelling. This can save time and resources compared to manual frame-by-frame editing.

In the context of image interpolation, Stable Diffusion models are often used to navigate through a high-dimensional latent space. Each dimension represents a specific feature that has been learned by the model. By walking through this latent space and interpolating between different latent representations of images, the model can generate a sequence of intermediate images that shows a smooth transition between the original images. There are two types of latents in Stable Diffusion: prompt (text-embedding) latents and image (diffusion) latents.

Latent space walking involves moving through a latent space along a path defined by two or more points (representing images). By carefully selecting these points and the path between them, it is possible to control the features of the generated images, such as style, content, and other visual aspects.

In this Notebook, we will explore examples of image interpolation using Stable Diffusion and demonstrate how latent space walking can be implemented and utilized to create smooth transitions between images. We'll provide code snippets and visualizations that illustrate this process in action, allowing for a deeper understanding of how generative models can manipulate and morph image representations in meaningful ways.

First, let's install all the required modules.

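A minimal install cell for this notebook might look like the following (the package list is a best-effort guess; pin versions or add `xformers` as needed):

```python
# Install the libraries used in this notebook
!pip install -q diffusers transformers accelerate torch numpy pillow
```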

Import modules

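For example, the following imports cover everything used below:

```python
import os
import time

import numpy as np
import torch
from diffusers import LMSDiscreteScheduler, StableDiffusionPipeline
from IPython import display as IPdisplay
from PIL import Image
```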

Let's check if CUDA is available.

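A simple check, which also defines the device used throughout the rest of the notebook:

```python
# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```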

These settings are used to optimize the performance of PyTorch models on CUDA-enabled GPUs, especially when using mixed precision training or inference, which can be beneficial in terms of speed and memory usage.
Source: https://huggingface.co/docs/diffusers/optimization/fp16#memory-efficient-attention

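For instance, the two flags below are a common way to apply such optimizations (the exact settings used originally may differ; both trade a small amount of numerical precision for speed):

```python
# Let cuDNN benchmark convolution algorithms for the current input sizes
torch.backends.cudnn.benchmark = True
# Allow TF32 matrix multiplications on Ampere and newer GPUs
torch.backends.cuda.matmul.allow_tf32 = True
```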

Model

The runwayml/stable-diffusion-v1-5 model and the LMSDiscreteScheduler scheduler were chosen to generate images. Despite its age, SD 1.5 remains popular thanks to its fast performance, minimal memory requirements, and the large number of community fine-tuned models built on top of it. However, you are free to experiment with other models and schedulers to compare the results.

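Something along these lines loads the pipeline in half precision and swaps in the scheduler (half precision assumes you are running on a GPU):

```python
model_id = "runwayml/stable-diffusion-v1-5"

# Load the pipeline in float16 to keep VRAM usage low
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to(device)

# Replace the default scheduler with the LMS discrete scheduler
pipe.scheduler = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
```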

These methods are designed to reduce the memory consumed by the GPU. If you have enough VRAM, you can skip this cell.

More detailed information can be found here: https://huggingface.co/docs/diffusers/en/optimization/opt_overview
In particular, information about the following methods can be found here: https://huggingface.co/docs/diffusers/optimization/memory

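For example (the offloading options are commented out because they require extra packages and should not be combined with the `.to(device)` call above):

```python
# Compute attention in slices rather than all at once (lower peak memory, slightly slower)
pipe.enable_attention_slicing()
# Decode latents through the VAE one image at a time
pipe.enable_vae_slicing()

# Heavier-handed alternatives, if the above is not enough:
# pipe.enable_model_cpu_offload()                    # requires `accelerate`
# pipe.enable_xformers_memory_efficient_attention()  # requires `xformers`
```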

The display_images function converts a list of image arrays into a GIF, saves it to a specified path and returns the GIF object for display. It names the GIF file using the current time and handles any errors by printing them out.

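One possible implementation matching that description (the frame duration and filename format are arbitrary choices):

```python
def display_images(images, save_path):
    try:
        # Accept either PIL images or raw arrays and normalize everything to PIL frames
        frames = [
            img if isinstance(img, Image.Image) else Image.fromarray(np.asarray(img, dtype=np.uint8))
            for img in images
        ]

        # Name the file after the current time so repeated runs don't overwrite each other
        filename = time.strftime("%H_%M_%S", time.localtime()) + ".gif"
        gif_path = os.path.join(save_path, filename)

        # Write all frames into a single looping GIF (duration is per frame, in milliseconds)
        frames[0].save(gif_path, save_all=True, append_images=frames[1:], duration=100, loop=0)

        # Return an object the notebook can render inline
        return IPdisplay.Image(filename=gif_path)
    except Exception as e:
        print(e)
```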

Generation parameters

  • seed: This variable is used to set a specific random seed for reproducibility.
  • generator: This is set to a PyTorch random number generator object if a seed is provided, otherwise it is None. It ensures that the operations using it have reproducible outcomes.
  • guidance_scale: This parameter controls the extent to which the model should follow the prompt in text-to-image generation tasks, with higher values leading to stronger adherence to the prompt.
  • num_inference_steps: This specifies the number of steps the model takes to generate an image. More steps can lead to a higher quality image but take longer to generate.
  • num_interpolation_steps: This determines the number of steps used when interpolating between two points in the latent space, affecting the smoothness of transitions in generated animations.
  • height: The height of the generated images in pixels.
  • width: The width of the generated images in pixels.
  • save_path: The file system path where the generated GIFs will be saved.
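For example (the concrete values below are illustrative, not prescriptive):

```python
seed = 42  # any integer; set to None for non-deterministic runs
generator = torch.Generator(device=device).manual_seed(seed) if seed is not None else None

guidance_scale = 8            # how strongly the image should follow the prompt
num_inference_steps = 15      # denoising steps per generated image
num_interpolation_steps = 30  # frames between the interpolation end points
height = 512
width = 512

save_path = "output"
os.makedirs(save_path, exist_ok=True)
```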

Example 1: Prompt interpolation

In this example, interpolating between the positive and negative prompt embeddings lets us explore the space between the two conceptual points defined by those prompts, producing a variety of images that gradually blend the characteristics dictated by each prompt. Here, the interpolation works by adding scaled deltas to the original embeddings, creating a series of new embeddings that will later be used to generate images with smooth transitions between different states based on the original prompt.


First of all, we need to tokenize and obtain embeddings for both positive and negative text prompts. The positive prompt guides the image generation towards the desired characteristics, while the negative prompt steers it away from unwanted features.

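A cell along these lines does that; the prompts themselves are placeholders, so substitute your own:

```python
# Hypothetical example prompts
prompt = "Epic shot of a misty mountain lake at sunrise, ultra detailed, cozy vintage mood"
negative_prompt = "blurry, low quality, deformed, ugly"

# Tokenize and encode the positive prompt
text_inputs = pipe.tokenizer(
    prompt,
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
prompt_embeds = pipe.text_encoder(text_inputs.input_ids.to(device))[0]

# Tokenize and encode the negative prompt in the same way
negative_inputs = pipe.tokenizer(
    negative_prompt,
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
negative_prompt_embeds = pipe.text_encoder(negative_inputs.input_ids.to(device))[0]
```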

Now let's look at the part of the code that generates a random initial latent vector from a normal distribution, shaped to match the dimensions expected by the diffusion model's UNet. Passing a random number generator makes the result reproducible. After creating the initial vector, the code performs a series of interpolation steps on the two embeddings (positive and negative prompts) by incrementally adding a small, scaled delta at each iteration. The results are stored in a list named walked_embeddings.

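For example (the `step_size` value here is an arbitrary choice; larger values walk further from the original prompt):

```python
# Random starting latent with the shape the UNet expects: (batch, channels, height/8, width/8)
latents = torch.randn(
    (1, pipe.unet.config.in_channels, height // 8, width // 8),
    generator=generator,
    device=device,
    dtype=pipe.unet.dtype,
)

# Walk away from the original embeddings by adding a scaled delta at every step
step_size = 0.001  # hypothetical value; tune to taste

walked_embeddings = []
for i in range(num_interpolation_steps):
    walked_embeddings.append(
        [prompt_embeds + i * step_size, negative_prompt_embeds + i * step_size]
    )
```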

Finally, let's generate a series of images from the interpolated embeddings and then display them. We'll iterate over the array of embeddings, using each one to generate an image with the specified height, width, and other generation parameters, and collect the results into a list. Once generation is complete, we'll call the display_images function to save and display these images as a GIF at the given save path.

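For instance:

```python
images = []
for prompt_embed, negative_embed in walked_embeddings:
    images.append(
        pipe(
            height=height,
            width=width,
            prompt_embeds=prompt_embed,
            negative_prompt_embeds=negative_embed,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            latents=latents,  # reuse the same starting noise for every frame
            generator=generator,
        ).images[0]
    )

display_images(images, save_path)
```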

Example 2: Diffusion latents interpolation for a single prompt

Unlike the first example, in this one we interpolate between two latents of the diffusion model itself rather than between prompt embeddings. Please note that in this case we use the slerp function for the interpolation; however, there is nothing stopping us from simply adding a constant value to one latent instead.


The function presented below implements spherical linear interpolation (slerp), a method of interpolation on the surface of a sphere. It is commonly used in computer graphics to animate rotations smoothly, and it can also be used to interpolate between high-dimensional data points in machine learning, such as the latent vectors used in generative models.

The source is from Andrej Karpathy's gist: https://gist.github.com/karpathy/00103b0037c5aaea32fe1da1af553355.
A more detailed explanation of this method can be found at: https://en.wikipedia.org/wiki/Slerp.
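A sketch of that helper, adapted from the gist above: it takes two tensors and a number of steps and returns the stack of interpolated tensors.

```python
def slerp(v0, v1, num, t0=0, t1=1):
    """Spherical linear interpolation between two tensors, returning `num` steps."""
    v0 = v0.detach().cpu().float().numpy()
    v1 = v1.detach().cpu().float().numpy()

    def interpolation(t, v0, v1, DOT_THRESHOLD=0.9995):
        # Cosine of the angle between the two (flattened) vectors
        dot = np.sum(v0 * v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
        if np.abs(dot) > DOT_THRESHOLD:
            # Vectors are almost parallel: fall back to plain linear interpolation
            return (1 - t) * v0 + t * v1
        theta_0 = np.arccos(dot)
        sin_theta_0 = np.sin(theta_0)
        theta_t = theta_0 * t
        s0 = np.sin(theta_0 - theta_t) / sin_theta_0
        s1 = np.sin(theta_t) / sin_theta_0
        return s0 * v0 + s1 * v1

    ts = np.linspace(t0, t1, num)
    interpolated = np.stack([interpolation(t, v0, v1) for t in ts])
    return torch.from_numpy(interpolated)
```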

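Putting it together for this example might look like the following (the prompt is a placeholder):

```python
# Hypothetical prompt for this example
prompt = "Sci-fi digital painting of an alien landscape with otherworldly plants and strange creatures"

text_inputs = pipe.tokenizer(
    prompt,
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
prompt_embeds = pipe.text_encoder(text_inputs.input_ids.to(device))[0]

# Two independent random latents; the animation travels between them
latents = torch.randn(
    (2, pipe.unet.config.in_channels, height // 8, width // 8),
    generator=generator,
    device=device,
)

# Spherically interpolate between the two latents
interpolated_latents = slerp(latents[0], latents[1], num_interpolation_steps)

images = []
for latent in interpolated_latents:
    images.append(
        pipe(
            prompt_embeds=prompt_embeds,  # no negative embeddings: the pipeline falls back to ""
            height=height,
            width=width,
            latents=latent[None, ...].to(device, dtype=pipe.unet.dtype),
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=generator,
        ).images[0]
    )

display_images(images, save_path)
```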

Example 3: Interpolation between multiple prompts

In contrast to the first example, where we walked away from a single prompt embedding, in this example we will interpolate between any number of prompts. To do so, we will take consecutive pairs of prompts, create smooth transitions between them, and then combine the interpolations of these consecutive pairs before instructing the model to generate images from them. For the interpolation we will use the slerp function, as in the second example.


Once again, let's tokenize and obtain embeddings but this time for multiple positive and negative text prompts.

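For example, with a hypothetical list of three prompts:

```python
# Placeholder prompts to travel through; use as many as you like
prompts = [
    "A serene mountain lake at sunrise",
    "The same lake at midday under a bright blue sky",
    "The lake at night beneath the northern lights",
]
# One negative prompt per positive prompt
negative_prompts = ["blurry, low quality, deformed"] * len(prompts)

# Encode all positive prompts in a single batch
text_inputs = pipe.tokenizer(
    prompts,
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
prompt_embeds = pipe.text_encoder(text_inputs.input_ids.to(device))[0]

# And all negative prompts in another batch
negative_inputs = pipe.tokenizer(
    negative_prompts,
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
negative_prompt_embeds = pipe.text_encoder(negative_inputs.input_ids.to(device))[0]
```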

As stated earlier, we will take consecutive pairs of prompts and create smooth transitions between them with slerp function.

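For instance:

```python
interpolated_prompt_embeds = []
interpolated_negative_prompt_embeds = []

# Interpolate every consecutive pair of prompts and chain the segments together
for i in range(len(prompts) - 1):
    interpolated_prompt_embeds.append(
        slerp(prompt_embeds[i], prompt_embeds[i + 1], num_interpolation_steps)
    )
    interpolated_negative_prompt_embeds.append(
        slerp(negative_prompt_embeds[i], negative_prompt_embeds[i + 1], num_interpolation_steps)
    )

interpolated_prompt_embeds = torch.cat(interpolated_prompt_embeds).to(device, dtype=pipe.unet.dtype)
interpolated_negative_prompt_embeds = torch.cat(interpolated_negative_prompt_embeds).to(device, dtype=pipe.unet.dtype)
```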

Finally, we need to generate images based on the embeddings.

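Reusing a single starting latent so that only the prompt changes between frames, something like:

```python
latents = torch.randn(
    (1, pipe.unet.config.in_channels, height // 8, width // 8),
    generator=generator,
    device=device,
    dtype=pipe.unet.dtype,
)

images = []
for prompt_embed, negative_embed in zip(
    interpolated_prompt_embeds, interpolated_negative_prompt_embeds
):
    images.append(
        pipe(
            height=height,
            width=width,
            prompt_embeds=prompt_embed[None, ...],
            negative_prompt_embeds=negative_embed[None, ...],
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            latents=latents,
            generator=generator,
        ).images[0]
    )

display_images(images, save_path)
```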

Example 4: Circular walk through the diffusion latent space for a single prompt

This example was taken from: https://keras.io/examples/generative/random_walks_with_stable_diffusion/

Let's imagine that we have two noise components, which we'll call x and y. We walk an angle from 0 to 2π, and at each step we combine cos(angle) · x and sin(angle) · y to form the latent. With this approach, at the end of the walk we arrive back at the same noise values we started with, so the path closes on itself and the resulting animation loops seamlessly.


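A sketch of the whole walk (the prompt is again a placeholder):

```python
prompt = "Beautiful sea sunset, warm light, impressionist style"

text_inputs = pipe.tokenizer(
    prompt,
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
prompt_embeds = pipe.text_encoder(text_inputs.input_ids.to(device))[0]

# Two independent noise components x and y
walk_noise_x = torch.randn(
    (1, pipe.unet.config.in_channels, height // 8, width // 8),
    generator=generator,
    device=device,
    dtype=pipe.unet.dtype,
)
walk_noise_y = torch.randn(
    (1, pipe.unet.config.in_channels, height // 8, width // 8),
    generator=generator,
    device=device,
    dtype=pipe.unet.dtype,
)

# Walk the angle from 0 to 2π: scale x by the cosine and y by the sine of the angle,
# so the last latent coincides with the first and the animation loops
angles = torch.linspace(0, 2 * torch.pi, num_interpolation_steps)
walk_scale_x = torch.cos(angles).to(device, dtype=walk_noise_x.dtype)
walk_scale_y = torch.sin(angles).to(device, dtype=walk_noise_y.dtype)

circular_latents = (
    walk_scale_x[:, None, None, None] * walk_noise_x
    + walk_scale_y[:, None, None, None] * walk_noise_y
)

images = []
for latent in circular_latents:
    images.append(
        pipe(
            prompt_embeds=prompt_embeds,
            height=height,
            width=width,
            latents=latent[None, ...],
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=generator,
        ).images[0]
    )

display_images(images, save_path)
```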

Next Steps

Moving forward, you can explore various parameters such as guidance scale, seed, and number of interpolation steps to observe how they affect the generated images. Additionally, consider trying out different prompts and schedulers to further enhance your results. Another valuable step would be to implement linear interpolation (linspace) instead of spherical linear interpolation (slerp) and compare the results to gain deeper insights into the interpolation process.
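For instance, a drop-in linear counterpart to the slerp sketch above could be as simple as:

```python
def lerp(v0, v1, num, t0=0, t1=1):
    """Plain linear interpolation between two tensors, returning `num` steps."""
    # Interpolation weights, reshaped so they broadcast over the tensor dimensions
    ts = torch.linspace(t0, t1, num).view(-1, *([1] * v0.dim()))
    return (1 - ts) * v0.unsqueeze(0).cpu().float() + ts * v1.unsqueeze(0).cpu().float()
```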