Exploring simple optimizations for Stable Diffusion XL
Unoptimized setup
- FP32 computation
- Default attention processor
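A minimal sketch of this baseline, assuming the `stabilityai/stable-diffusion-xl-base-1.0` checkpoint and a CUDA device (neither is named in the text above). The helper name `load_unoptimized_pipeline` is illustrative. On PyTorch 2.x, diffusers selects scaled dot-product attention on its own, so the sketch explicitly resets the UNet and VAE to the default attention processor.

```python
CKPT_ID = "stabilityai/stable-diffusion-xl-base-1.0"  # assumed checkpoint


def load_unoptimized_pipeline(ckpt_id=CKPT_ID, device="cuda"):
    """Load SDXL in FP32 with the default (non-SDPA) attention processor."""
    # Imported lazily so the sketch can be loaded without a GPU stack installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        ckpt_id, torch_dtype=torch.float32, use_safetensors=True
    ).to(device)
    # Undo the automatic SDPA selection so this really is the unoptimized case.
    pipe.unet.set_default_attn_processor()
    pipe.vae.set_default_attn_processor()
    return pipe
```

Usage would look like `image = load_unoptimized_pipeline()(prompt, num_inference_steps=25).images[0]`, where the prompt and step count are placeholders.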
Just FP16
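One way this setting might look, again assuming the `stabilityai/stable-diffusion-xl-base-1.0` checkpoint and a CUDA device. Only the precision changes here: the weights are loaded in half precision, while attention is still reset to the default processor. The helper name `load_fp16_pipeline` is illustrative.

```python
CKPT_ID = "stabilityai/stable-diffusion-xl-base-1.0"  # assumed checkpoint


def load_fp16_pipeline(ckpt_id=CKPT_ID, device="cuda"):
    """Load SDXL in FP16 while keeping the default attention processor."""
    # Imported lazily so the sketch can be loaded without a GPU stack installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        ckpt_id,
        torch_dtype=torch.float16,  # half-precision weights and activations
        variant="fp16",             # fetch the FP16 weight files directly
        use_safetensors=True,
    ).to(device)
    # Keep attention unoptimized so only the dtype differs from the baseline.
    pipe.unet.set_default_attn_processor()
    pipe.vae.set_default_attn_processor()
    return pipe
```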
FP16 + SDPA
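A sketch of this setting, assuming PyTorch 2.x and the `stabilityai/stable-diffusion-xl-base-1.0` checkpoint. With PyTorch 2.x installed, diffusers uses `torch.nn.functional.scaled_dot_product_attention` (SDPA) by default, so the only difference from the FP16-only case is that the attention processor is left alone. The helper name `load_default_pipeline` is illustrative, and it deliberately does not move the pipeline to the GPU so the caller can choose between `.to("cuda")` and an offloading strategy.

```python
CKPT_ID = "stabilityai/stable-diffusion-xl-base-1.0"  # assumed checkpoint


def load_default_pipeline(ckpt_id=CKPT_ID):
    """Load SDXL in FP16; SDPA attention is picked automatically on PyTorch 2.x."""
    # Imported lazily so the sketch can be loaded without a GPU stack installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    # No set_default_attn_processor() call here: the SDPA processor stays active.
    return StableDiffusionXLPipeline.from_pretrained(
        ckpt_id, torch_dtype=torch.float16, variant="fp16", use_safetensors=True
    )
```

For plain GPU inference this would be used as `pipe = load_default_pipeline().to("cuda")`.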
From here on, we refer to "FP16 + SDPA" as the default setting.
Default + torch.compile()
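A possible way to apply compilation, assuming an FP16 pipeline that already lives on the GPU. The `mode="reduce-overhead"` and `fullgraph=True` settings are assumptions, not values taken from the text; the helper name `compile_unet` is illustrative. Only the UNet is compiled, since it runs once per denoising step and dominates inference time.

```python
def compile_unet(pipe, mode="reduce-overhead", fullgraph=True):
    """Wrap the pipeline's UNet with torch.compile and return the pipeline."""
    # Imported lazily so the sketch can be loaded without PyTorch installed.
    import torch

    # The returned module compiles on its first forward pass, not here.
    pipe.unet = torch.compile(pipe.unet, mode=mode, fullgraph=fullgraph)
    return pipe
```

The first call to the compiled pipeline triggers compilation and is much slower than later calls, so a warm-up run should precede any timing.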
Default + Model CPU Offloading
Here we focus on memory savings rather than inference speed.
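A hedged sketch of model CPU offloading together with a peak-memory reading, assuming an FP16 pipeline that has not been moved to the GPU and that `accelerate` is installed. The helper names and the step count are illustrative. `enable_model_cpu_offload()` keeps whole sub-models (text encoders, UNet, VAE) on the CPU and moves each one to the GPU only while it is running.

```python
def bytes_to_gib(num_bytes):
    """Convert a byte count to GiB, rounded to three decimals."""
    return round(num_bytes / 1024**3, 3)


def run_with_model_offload(pipe, prompt, steps=25):
    """Generate one image with model CPU offloading; return (image, peak GiB)."""
    # Imported lazily so the sketch can be loaded without PyTorch installed.
    import torch

    # Assumes `pipe` was NOT moved with .to("cuda"); offloading manages placement.
    pipe.enable_model_cpu_offload()
    torch.cuda.reset_peak_memory_stats()
    image = pipe(prompt, num_inference_steps=steps).images[0]
    return image, bytes_to_gib(torch.cuda.max_memory_allocated())
```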
Default + Sequential CPU Offloading
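A minimal sketch, assuming an FP16 pipeline that has not been moved to the GPU and that `accelerate` is installed; the helper name is illustrative. Sequential offloading works at a finer granularity than model offloading: weights are moved to the GPU submodule by submodule as they are needed, which typically lowers peak memory further at a noticeable cost in speed.

```python
def enable_sequential_offload(pipe):
    """Turn on sequential CPU offloading and return the same pipeline."""
    # Assumes `pipe` was NOT moved with .to("cuda") beforehand.
    pipe.enable_sequential_cpu_offload()
    return pipe
```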
Default + VAE Slicing
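This is suited to reducing memory when decoding latents into higher-res images, without compromising too much on inference speed. Slicing decodes a batch of latents one image at a time, so the savings show up mainly when several images are generated per call.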
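A sketch of VAE slicing, assuming an FP16 pipeline already on the GPU. The helper name, batch size, and step count are illustrative. With slicing enabled, the VAE decodes the latent batch in single-image slices instead of all at once, which caps the decoder's peak memory.

```python
def generate_batch_with_vae_slicing(pipe, prompt, num_images=4, steps=25):
    """Generate several images per prompt with sliced VAE decoding."""
    # Decode the latent batch one image at a time to cap decoder memory.
    pipe.vae.enable_slicing()
    output = pipe(
        prompt, num_images_per_prompt=num_images, num_inference_steps=steps
    )
    return output.images
```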
Default + VAE Slicing + Sequential CPU Offloading
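The two memory savers can be combined. A minimal sketch, assuming an FP16 pipeline that has not been moved to the GPU and that `accelerate` is installed; the helper name is illustrative.

```python
def enable_slicing_and_sequential_offload(pipe):
    """Enable sequential CPU offloading plus sliced VAE decoding."""
    # Assumes `pipe` was NOT moved with .to("cuda") beforehand.
    pipe.enable_sequential_cpu_offload()
    # Decode batched latents one image at a time.
    pipe.vae.enable_slicing()
    return pipe
```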
Default + Precomputing text embeddings
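A sketch of the idea, assuming an SDXL pipeline whose text encoders are loaded. The prompt is encoded once, and the resulting tensors are passed to the pipeline instead of the raw text. `encode_prompt` returns four tensors in a fixed order; the helper names `pack_embeddings` and `precompute_text_embeddings` are illustrative.

```python
# Keyword names the SDXL pipeline accepts, in the order encode_prompt returns them.
EMBED_KEYS = (
    "prompt_embeds",
    "negative_prompt_embeds",
    "pooled_prompt_embeds",
    "negative_pooled_prompt_embeds",
)


def pack_embeddings(outputs):
    """Name the four encode_prompt outputs so they can be passed as kwargs."""
    outputs = tuple(outputs)
    if len(outputs) != len(EMBED_KEYS):
        raise ValueError(f"expected {len(EMBED_KEYS)} tensors, got {len(outputs)}")
    return dict(zip(EMBED_KEYS, outputs))


def precompute_text_embeddings(pipe, prompt, device="cuda"):
    """Run both text encoders once and return embeddings keyed for pipe(**...)."""
    # Imported lazily so the sketch can be loaded without PyTorch installed.
    import torch

    with torch.no_grad():  # inference only, no autograd graph needed
        outputs = pipe.encode_prompt(
            prompt=prompt,
            device=device,
            num_images_per_prompt=1,
            do_classifier_free_guidance=True,
        )
    return pack_embeddings(outputs)
```

The result can be passed as `pipe(**embeds, num_inference_steps=25)`. Once the embeddings exist, the text encoders are no longer needed; one option, stated here as an assumption about the setup, is to load the denoising pipeline with `text_encoder=None, text_encoder_2=None, tokenizer=None, tokenizer_2=None` so they never occupy GPU memory.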
Default + Tiny Autoencoder
This is better suited for generating (almost) instant previews. How "instant" depends, of course, on the GPU; on an A10G, for example, it is achievable.
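A sketch of swapping in a tiny autoencoder, assuming the `madebyollin/taesdxl` checkpoint (not named in the text above) and an FP16 pipeline. The helper name is illustrative. The full VAE is replaced by a much smaller distilled one, trading some decoding quality for speed and memory.

```python
TINY_VAE_ID = "madebyollin/taesdxl"  # assumed tiny-autoencoder checkpoint


def swap_in_tiny_vae(pipe, vae_id=TINY_VAE_ID):
    """Replace the pipeline's VAE with a tiny autoencoder and return the pipeline."""
    # Imported lazily so the sketch can be loaded without a GPU stack installed.
    import torch
    from diffusers import AutoencoderTiny

    pipe.vae = AutoencoderTiny.from_pretrained(vae_id, torch_dtype=torch.float16)
    return pipe
```

Calling this before `pipe.to("cuda")` ensures the new VAE is moved to the GPU together with the rest of the pipeline.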