To run this notebook, click "Runtime" and then "Run all" on a free Tesla T4 Google Colab instance!
To install Unsloth on your local device, follow our guide. This notebook is licensed LGPL-3.0.
You will learn how to prepare data, train the model, run it, and save it.
News
- Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. Blog
- You can now train embedding models 1.8-3.3x faster with 20% less VRAM. Blog
- Ultra Long-Context Reinforcement Learning is here with 7x more context windows! Blog
- 3x faster LLM training with 30% less VRAM and 500K context. 3x faster • 500K Context
- New in Reinforcement Learning: FP8 RL • Vision RL • Standby • gpt-oss RL
Visit our docs for all our model uploads and notebooks.
Installation
Unsloth
/usr/local/lib/python3.10/dist-packages/unsloth/__init__.py:67: UserWarning: CUDA is not linked properly. We shall run `ldconfig /usr/lib64-nvidia` to try to fix it. warnings.warn(
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks. Please note that authentication is recommended but still optional to access public models or datasets. warnings.warn(
==((====))==  Unsloth: Fast Mistral patching release 2024.1
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB
O^O/ \_/ \    CUDA capability = 7.5. Xformers = 0.0.22.post7. FA = False.
\        /    Pytorch version: 2.1.0+cu121. CUDA Toolkit = 12.1
 "-____-"     bfloat16 = FALSE. Platform = Linux
You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` attribute will be overwritten with the one you passed to `from_pretrained`.
Data Prep
We follow Hugging Face's Alignment Handbook recipe for Zephyr and use the UltraFeedback dataset, sampling 0.5% of it to speed things up. Use the full dataset for a full run.
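A 0.5% sample of the 61,966-example `train_prefs` split comes out to 309 rows, which is the count that shows up in the formatting progress bars. A minimal sketch of that sampling in plain Python (a reproducible index pick; the notebook's actual `datasets` call may differ):

```python
import random

# Size of UltraFeedback's train_prefs split (from the generation logs below).
TRAIN_PREFS_ROWS = 61_966
SAMPLE_FRACTION = 0.005  # 0.5% to speed things up

def sample_indices(n_rows: int, fraction: float, seed: int = 42) -> list[int]:
    """Pick a reproducible random subset of row indices."""
    rng = random.Random(seed)
    n_keep = int(n_rows * fraction)
    return sorted(rng.sample(range(n_rows), n_keep))

idx = sample_indices(TRAIN_PREFS_ROWS, SAMPLE_FRACTION)
print(len(idx))  # 309
```

With `datasets` you would pass these indices to `Dataset.select`, or simply shuffle and slice.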
Generating train_sft split: 0%| | 0/61966 [00:00<?, ? examples/s]
Generating test_sft split: 0%| | 0/1000 [00:00<?, ? examples/s]
Generating train_gen split: 0%| | 0/61966 [00:00<?, ? examples/s]
Generating test_gen split: 0%| | 0/1000 [00:00<?, ? examples/s]
Generating train_prefs split: 0%| | 0/61966 [00:00<?, ? examples/s]
Generating test_prefs split: 0%| | 0/2000 [00:00<?, ? examples/s]
Formatting comparisons with prompt template (num_proc=12): 0%| | 0/309 [00:00<?, ? examples/s]
Formatting comparisons with prompt template (num_proc=12): 0%| | 0/2000 [00:00<?, ? examples/s]
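The prompt template applied above is Zephyr's chat format, visible in the printed sample below: each turn is a `<|role|>` header, the content, and an `</s>` terminator, with the prompt ending in an open `<|assistant|>` header. A minimal re-implementation (a sketch, not the Alignment Handbook's actual helper):

```python
def apply_zephyr_template(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Format chat messages in Zephyr style: <|role|>\n{content}</s>\n per turn."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|assistant|>\n")  # leave the assistant turn open for generation
    return "".join(parts)

prompt = apply_zephyr_template([
    {"role": "system", "content": ""},
    {"role": "user", "content": "List two natural resources which was made in the factory."},
])
print(prompt)
```

This reproduces the prompt string printed in the next cell; in practice the tokenizer's own chat template should be used.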
We shall print a random item from the dataset:
('<|system|>\n'
'</s>\n'
'<|user|>\n'
'List two natural resources which was made in the factory.</s>\n'
'<|assistant|>\n')
('Natural resources are not made in factories. Natural resources are materials '
'and substances that occur naturally on Earth, such as water, minerals, '
'forests, and fossil fuels. Factories typically produce man-made materials or '
'process natural resources into finished products.</s>\n')
("I'm sorry, but it seems there might be some confusion in your question as "
'natural resources are typically sourced from the earth or sea, and not made '
'in a factory. However, factories often use natural resources to create '
'various products. Two examples of natural resources that factories may use '
'are crude oil and iron ore. Crude oil is refined to produce various '
'petroleum products, such as gasoline and plastics, while iron ore is refined '
'to create steel, which is used in the construction industry, vehicle '
'manufacturing, and more. Does this help clarify things?</s>\n')
We now add LoRA adapters so we only need to update 1 to 10% of all parameters!
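As a back-of-the-envelope check on that fraction, here is the trainable-parameter count for rank-16 LoRA adapters on all attention and MLP projections of a Mistral-7B-shaped model (dimensions assumed from Mistral-7B's config; the notebook's actual rank and target modules may differ):

```python
# Assumed Mistral-7B shapes: hidden 4096, KV projection 1024 (8 heads x dim 128),
# MLP 14336, 32 layers, ~7.24B total parameters.
HIDDEN, KV_DIM, MLP, LAYERS = 4096, 1024, 14336, 32
RANK = 16  # a LoRA adapter on a (d_in x d_out) matrix adds RANK * (d_in + d_out) params

# (d_in, d_out) for each target projection
targets = [
    (HIDDEN, HIDDEN),  # q_proj
    (HIDDEN, KV_DIM),  # k_proj (grouped-query attention)
    (HIDDEN, KV_DIM),  # v_proj
    (HIDDEN, HIDDEN),  # o_proj
    (HIDDEN, MLP),     # gate_proj
    (HIDDEN, MLP),     # up_proj
    (MLP, HIDDEN),     # down_proj
]
lora_params = LAYERS * sum(RANK * (d_in + d_out) for d_in, d_out in targets)
fraction = lora_params / 7.24e9
print(f"{lora_params:,} trainable ({fraction:.2%} of the model)")
```

At rank 16 this is roughly 42M parameters, under 1% of the model and comfortably below the 10% upper bound quoted above; larger ranks push the fraction up.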
Unsloth 2024.1 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
/usr/local/lib/python3.10/dist-packages/trl/trainer/dpo_trainer.py:294: UserWarning: When using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your TrainingArguments we have set it for you, but you should do it yourself in the future. warnings.warn(
Map: 0%| | 0/309 [00:00<?, ? examples/s]
Unsloth: `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`
Could not estimate the number of tokens of the input, floating-point operations will not be computed
TrainOutput(global_step=114, training_loss=0.397163258179238, metrics={'train_runtime': 4017.5003, 'train_samples_per_second': 0.231, 'train_steps_per_second': 0.028, 'total_flos': 0.0, 'train_loss': 0.397163258179238, 'epoch': 2.94})
And we're done! If you have any questions about Unsloth, find any bugs, want to keep up with the latest LLM news, or need help with your projects, feel free to join our Discord channel!
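The `training_loss` above is the DPO objective: the negative log-sigmoid of β times the gap between the policy's and reference model's log-ratios on the chosen vs. rejected completions. A minimal per-example sketch in plain Python (β = 0.1 assumed; Unsloth/TRL compute this batched from token log-probabilities):

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair, from sequence log-probabilities."""
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    logits = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Before training the policy equals the reference, so the loss starts at log(2) ≈ 0.693
print(dpo_loss(-40.0, -55.0, -40.0, -55.0))
```

As training shifts probability mass toward the chosen completions, the logits grow and the loss falls below log(2), consistent with the ~0.40 average reported here.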
Some other resources:
- Looking to use Unsloth locally? Read our Installation Guide for details on installing Unsloth on Windows, Docker, and AMD or Intel GPUs.
- Learn how to do Reinforcement Learning with our RL Guide and notebooks.
- Read our guides and notebooks for Text-to-speech (TTS) and vision model support.
- Explore our LLM Tutorials Directory to find dedicated guides for each model.
- Need help with inference? Read our Inference & Deployment page for details on using vLLM, llama.cpp, Ollama, etc.