Exploring simple optimizations for Stable Diffusion XL

Unoptimized setup

  • FP32 computation
  • Default attention processor
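
As a baseline, the two bullets above could be set up roughly as follows. This is a minimal sketch, not the notebook's original cells: the checkpoint name in `MODEL_ID` is an assumption, and `bytes_to_gib`, `time_call`, `load_unoptimized_pipeline`, and `report` are hypothetical helper names introduced for illustration.

```python
import gc
import time

# Assumption: the public SDXL base checkpoint; swap in the one you actually use.
MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"


def bytes_to_gib(num_bytes):
    """Convert a byte count to GiB, rounded to three decimals."""
    return round(num_bytes / 1024**3, 3)


def time_call(fn, repeats=3):
    """Return the mean wall-clock seconds of `fn()` over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def load_unoptimized_pipeline(model_id=MODEL_ID, device="cuda"):
    # Heavy imports are deferred so the helpers above work without a GPU stack.
    import torch
    from diffusers import StableDiffusionXLPipeline

    # FP32 weights and computation.
    pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
    pipe = pipe.to(device)
    # Fall back to the plain attention processor in the UNet.
    pipe.unet.set_default_attn_processor()
    return pipe


def report(pipe, prompt, steps=30):
    """Return (mean seconds per call, peak GPU memory in GiB) for one setting."""
    import torch

    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    seconds = time_call(lambda: pipe(prompt, num_inference_steps=steps))
    torch.cuda.synchronize()
    return seconds, bytes_to_gib(torch.cuda.max_memory_allocated())
```

Calling `report(load_unoptimized_pipeline(), "a photo of an astronaut")` on a CUDA machine would give one timing and memory figure to compare the later settings against.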

Just FP16
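
A hedged sketch of this setting: the weights are loaded in half precision while the default attention processor is kept, so that only the effect of FP16 is measured. The checkpoint name and the helper names `weight_bytes` and `load_fp16_pipeline` are assumptions made for illustration.

```python
# Assumption: the public SDXL base checkpoint.
MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_bytes(num_params, dtype_name):
    """Memory taken by the weights alone (activations are not counted)."""
    return num_params * BYTES_PER_PARAM[dtype_name]


def load_fp16_pipeline(model_id=MODEL_ID, device="cuda"):
    # Deferred imports: only needed when a pipeline is actually loaded.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    # Keep the default attention processor to isolate the FP16 change.
    pipe.unet.set_default_attn_processor()
    return pipe
```

`weight_bytes` only illustrates why the weight footprint halves when going from FP32 to FP16; measured peak memory also includes activations and will not halve exactly.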

FP16 + SDPA

From here on, "Default" refers to the "FP16 + SDPA" setting, where SDPA is PyTorch's scaled dot-product attention.
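
One way to make the SDPA choice explicit is to set the attention processor on the UNet by hand. With PyTorch 2.x, diffusers typically selects this processor on its own, so the sketch below mostly documents the intent. `use_sdpa` is a hypothetical helper, and its `processor` argument is an illustrative hook for passing in an already constructed processor.

```python
def use_sdpa(pipe, processor=None):
    """Route the UNet's attention through scaled dot-product attention."""
    if processor is None:
        # Deferred import; AttnProcessor2_0 requires PyTorch 2.0 or newer.
        from diffusers.models.attention_processor import AttnProcessor2_0

        processor = AttnProcessor2_0()
    pipe.unet.set_attn_processor(processor)
    return pipe
```

The function expects an FP16 pipeline that is already loaded, for example one created with `StableDiffusionXLPipeline.from_pretrained(..., torch_dtype=torch.float16)`.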

Default + torch.compile()
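
A minimal sketch of compiling the UNet, which is where most of the denoising time is spent. The first call after compilation includes compile time, so the timing helper runs warm-up calls before measuring. `compile_unet` and `timed_after_warmup` are hypothetical names, and the `mode` and `fullgraph` values are one reasonable choice rather than the only one.

```python
import time


def compile_unet(pipe, mode="reduce-overhead"):
    """Replace the pipeline's UNet with a torch.compile()-wrapped version."""
    import torch  # deferred so the timing helper works without torch

    pipe.unet = torch.compile(pipe.unet, mode=mode, fullgraph=True)
    return pipe


def timed_after_warmup(fn, warmup=1, repeats=3):
    """Mean seconds per call of `fn()`, excluding `warmup` initial calls."""
    for _ in range(warmup):
        fn()  # triggers compilation; not included in the timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats
```

A typical use would be `timed_after_warmup(lambda: pipe(prompt))` on a pipeline returned by `compile_unet`.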

Default + Model CPU Offloading

Here the focus is on reducing memory usage rather than on inference speed.
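
A hedged sketch of model CPU offloading: whole components (text encoders, UNet, VAE) are moved to the GPU only while they are needed. The pipeline is deliberately not moved to the GPU by hand, and the `accelerate` package needs to be installed. The checkpoint name and the helper names are assumptions.

```python
# Assumption: the public SDXL base checkpoint.
MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"


def load_fp16_on_cpu(model_id=MODEL_ID):
    # Deferred imports: only needed when a pipeline is actually loaded.
    import torch
    from diffusers import StableDiffusionXLPipeline

    # No .to("cuda") here: offloading decides when each component moves.
    return StableDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16)


def with_model_offload(pipe):
    """Enable component-level CPU offloading on an existing pipeline."""
    pipe.enable_model_cpu_offload()
    return pipe
```

Usage would look like `pipe = with_model_offload(load_fp16_on_cpu())`, followed by a normal `pipe(prompt)` call.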

Default + Sequential CPU Offloading
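
Sequential offloading works at a finer granularity than model offloading: submodules are moved to the GPU one after another, which usually lowers peak memory further at a noticeable cost in speed. A minimal sketch, with `with_sequential_offload` as a hypothetical helper that expects an FP16 pipeline that has not been moved to the GPU:

```python
def with_sequential_offload(pipe):
    """Enable submodule-level CPU offloading (requires `accelerate`)."""
    pipe.enable_sequential_cpu_offload()
    return pipe
```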

Default + VAE Slicing

VAE slicing is specifically suited to reducing memory when decoding latents into higher-resolution images, without compromising too much on inference speed.
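
A minimal sketch of enabling slicing on the pipeline's VAE. With slicing on, a batch of latents is decoded in several smaller steps instead of all at once, so the saving shows up mainly when several images are generated per call. `with_vae_slicing` and `generate_batch` are hypothetical helpers, and the batch size of 4 is an arbitrary example value.

```python
def with_vae_slicing(pipe):
    """Decode latents slice by slice to lower peak memory in the VAE."""
    pipe.vae.enable_slicing()
    return pipe


def generate_batch(pipe, prompt, batch_size=4):
    """Generate several images for one prompt in a single call."""
    return pipe(prompt, num_images_per_prompt=batch_size).images
```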

Default + VAE Slicing + Sequential CPU Offloading
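
The two memory optimizations can be combined on the same pipeline. The sketch below assumes an FP16 pipeline that has not been moved to the GPU; `with_slicing_and_sequential_offload` is a hypothetical helper name.

```python
def with_slicing_and_sequential_offload(pipe):
    """Enable sequential CPU offloading, then sliced VAE decoding."""
    pipe.enable_sequential_cpu_offload()
    pipe.vae.enable_slicing()
    return pipe
```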

Default + Precomputing text embeddings
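
One possible way to precompute the embeddings is to call the pipeline's own `encode_prompt` once and pass the results back in as keyword arguments. This is a sketch under that assumption, not necessarily how the original cells did it. `pack_embeddings` and `precompute_embeddings` are hypothetical helpers.

```python
EMBEDDING_KEYS = (
    "prompt_embeds",
    "negative_prompt_embeds",
    "pooled_prompt_embeds",
    "negative_pooled_prompt_embeds",
)


def pack_embeddings(encoded):
    """Map the 4-tuple returned by SDXL's encode_prompt to pipeline kwargs."""
    return dict(zip(EMBEDDING_KEYS, encoded))


def precompute_embeddings(pipe, prompt):
    """Run both text encoders once and return reusable embedding kwargs."""
    import torch  # deferred; only needed when real encoders are run

    with torch.no_grad():
        encoded = pipe.encode_prompt(prompt=prompt)
    return pack_embeddings(encoded)
```

The returned dictionary can be passed as `pipe(**embeds, num_inference_steps=30)` in place of a prompt string. The memory benefit comes from not keeping the text encoders on the GPU during denoising, for example by freeing them after this step.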

Default + Tiny Autoencoder

The Tiny Autoencoder is better suited to generating (almost) instant previews. How "instant" depends, of course, on the GPU; on an A10G, for example, it is achievable.
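
A hedged sketch of swapping the pipeline's VAE for a tiny autoencoder. The checkpoint name `madebyollin/taesdxl` is an assumption about which tiny autoencoder is meant, and `use_tiny_autoencoder` is a hypothetical helper; its `tiny_vae` argument allows passing in an autoencoder that is already loaded.

```python
# Assumption: a tiny autoencoder checkpoint trained for SDXL latents.
TINY_VAE_ID = "madebyollin/taesdxl"


def use_tiny_autoencoder(pipe, tiny_vae=None):
    """Replace the pipeline's VAE with a much smaller decoder."""
    if tiny_vae is None:
        # Deferred imports: only needed when the checkpoint is loaded here.
        import torch
        from diffusers import AutoencoderTiny

        tiny_vae = AutoencoderTiny.from_pretrained(TINY_VAE_ID, torch_dtype=torch.float16)
        tiny_vae = tiny_vae.to(pipe.device)
    pipe.vae = tiny_vae
    return pipe
```

The smaller decoder trades some image fidelity for speed, which is why it fits previews better than final renders.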
