PaliGemma Fine-tuning

In this notebook, we will fine-tune a pretrained PaliGemma model on a small split of the VQAv2 dataset. Let's get started by installing the necessary libraries.

We will authenticate to access the model using notebook_login().

Let's load the dataset.
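
As a rough sketch of this step, the snippet below loads the first few percent of the training split and holds out a small test set. The Hub dataset id and the 10% / 0.1 values are assumptions for illustration, not values confirmed by this notebook, and `split_spec` and `load_small_split` are hypothetical helper names. The `datasets` import is done lazily inside the function, so the library is only needed when you actually load the data.

```python
# Sketch only: the dataset id and the default sizes are assumptions.
DATASET_ID = "HuggingFaceM4/VQAv2"  # assumed Hub id for VQAv2


def split_spec(percent: int, split: str = "train") -> str:
    """Build a `datasets` slice string such as 'train[:10%]'."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"{split}[:{percent}%]"


def load_small_split(percent: int = 10, test_size: float = 0.1):
    """Load the first `percent` of the training split and hold out a test set."""
    # Imported lazily; requires the `datasets` library to be installed.
    from datasets import load_dataset

    ds = load_dataset(DATASET_ID, split=split_spec(percent))
    # Returns a DatasetDict with "train" and "test" keys.
    return ds.train_test_split(test_size=test_size)
```

Calling `load_small_split()["train"]` would then give the training portion. Depending on the installed `datasets` version, some Hub datasets may need extra loading options.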

Load the processor to preprocess the dataset.

Next, we preprocess the examples. We build each prompt from a template that contains the text input, then pass the prompts to the processor together with the batch of images. We copy the preprocessed input ids to use as labels, so that the model learns to generate responses, and set the pad tokens and image tokens in those labels to -100 so that they are ignored in the loss.
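
The masking step can be illustrated without the processor. The sketch below works on plain Python lists with made-up token ids; in the notebook the same idea would be applied to the tensor of input ids returned by the processor. The prompt template and the helper names `build_prompt` and `mask_labels` are assumptions for illustration.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def build_prompt(question: str, answer: str) -> str:
    """Hypothetical template: task prefix, question, newline, then the answer."""
    return f"answer {question}\n{answer}"


def mask_labels(input_ids, pad_token_id, image_token_id):
    """Copy token ids into labels, replacing pad and image tokens with -100."""
    masked = {pad_token_id, image_token_id}
    return [
        [IGNORE_INDEX if tok in masked else tok for tok in row]
        for row in input_ids
    ]


# Toy batch with made-up ids: 0 = pad, 9 = image placeholder.
batch = [[9, 9, 5, 6, 7], [9, 9, 5, 8, 0]]
labels = mask_labels(batch, pad_token_id=0, image_token_id=9)
print(labels)  # [[-100, -100, 5, 6, 7], [-100, -100, 5, 8, -100]]
```

The input ids themselves are left unchanged; only the label copy is masked, so the model still sees the image and pad positions as input.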

Our dataset is very general and similar to many of the datasets PaliGemma was trained on. In this case, we do not need to fine-tune the image encoder or the multimodal projector; we will only fine-tune the text decoder.
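
One way to freeze the image encoder and the projector is to turn off gradients for their parameters. The helper below is a minimal sketch: `freeze_submodules` is a hypothetical name, and the attribute names `vision_tower` and `multi_modal_projector` are assumptions that may differ between library versions, so check them against your loaded model.

```python
def freeze_submodules(model, names=("vision_tower", "multi_modal_projector")):
    """Disable gradients for the named submodules.

    Works with any object whose submodules expose `.parameters()`,
    such as a `torch.nn.Module`. Returns the number of frozen tensors.
    """
    frozen = 0
    for name in names:
        for param in getattr(model, name).parameters():
            param.requires_grad = False
            frozen += 1
    return frozen
```

Parameters outside the named submodules are not touched, so the text decoder stays trainable.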

Alternatively, if you want to do LoRA or QLoRA fine-tuning, you can instead load the model with an adapter, either in full precision or quantized.
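
A possible shape for the adapter setup, using `peft` and `transformers`, is sketched below. The rank, target modules, and quantization type are illustrative assumptions rather than tuned settings, and `load_with_adapter` is a hypothetical helper. The heavy imports happen inside the function, so the settings can be inspected without the libraries installed.

```python
# Illustrative values, not tuned settings.
LORA_KWARGS = {
    "r": 8,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}
QUANT_KWARGS = {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}


def load_with_adapter(model_id: str, quantized: bool = False):
    """Load the base model, optionally in 4-bit (QLoRA), and attach a LoRA adapter."""
    # Imported lazily; requires torch, peft, transformers
    # (and bitsandbytes when quantized=True).
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import BitsAndBytesConfig, PaliGemmaForConditionalGeneration

    quant_config = None
    if quantized:
        quant_config = BitsAndBytesConfig(
            bnb_4bit_compute_dtype=torch.bfloat16, **QUANT_KWARGS
        )
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        model_id, quantization_config=quant_config
    )
    # Only the small adapter matrices are trainable after wrapping.
    return get_peft_model(model, LoraConfig(**LORA_KWARGS))
```

With this sketch, `load_with_adapter(model_id)` would give plain LoRA and `load_with_adapter(model_id, quantized=True)` would give QLoRA, where `model_id` is whichever PaliGemma checkpoint you have access to.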

We will now initialize the TrainingArguments.
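
For illustration, the arguments might look like the sketch below. All values and the output directory name are assumptions, not the notebook's confirmed settings; `effective_batch_size` and `build_training_args` are hypothetical helpers. The small calculation shows how the per-device batch size and gradient accumulation combine.

```python
TRAINING_KWARGS = {
    "output_dir": "paligemma_vqav2",  # hypothetical output folder
    "num_train_epochs": 2,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-5,
    "warmup_steps": 2,
    "logging_steps": 100,
    "save_strategy": "steps",
    "save_steps": 1000,
    "save_total_limit": 1,
    "bf16": True,
    # Keep raw dataset columns so a custom collator can read them.
    "remove_unused_columns": False,
}


def effective_batch_size(kwargs, num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_args():
    """Create the TrainingArguments object from the settings above."""
    # Imported lazily; requires the `transformers` library.
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)


print(effective_batch_size(TRAINING_KWARGS))  # 16
```

Raising `gradient_accumulation_steps` is a common way to keep the effective batch size constant when the per-device batch size has to shrink to fit in memory.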

We can now start training.

You can find the steps for running inference here.