To run this notebook, click "Runtime" and then "Run all" on a free Tesla T4 Google Colab instance!
To install Unsloth on your local device, follow our guide. This notebook is licensed LGPL-3.0.
You will learn how to prepare data, train the model, run it, and save it.
News
- Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. Blog
- You can now train embedding models 1.8-3.3x faster with 20% less VRAM. Blog
- Ultra Long-Context Reinforcement Learning is here with 7x more context windows! Blog
- 3x faster LLM training with 30% less VRAM and 500K context. 3x faster • 500K Context
- New in Reinforcement Learning: FP8 RL • Vision RL • Standby • gpt-oss RL
Visit our docs for all our model uploads and notebooks.
Installation
Unsloth
/usr/local/lib/python3.10/dist-packages/unsloth/__init__.py:67: UserWarning: CUDA is not linked properly. We shall run `ldconfig /usr/lib64-nvidia` to try to fix it. warnings.warn(
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks. Please note that authentication is recommended but still optional to access public models or datasets. warnings.warn(
==((====))==  Unsloth: Fast Mistral patching release 2024.1
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB
O^O/ \_/ \    CUDA capability = 7.5. Xformers = 0.0.22.post7. FA = False.
\        /    Pytorch version: 2.1.0+cu121. CUDA Toolkit = 12.1
 "-____-"     bfloat16 = FALSE. Platform = Linux
You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` attribute will be overwritten with the one you passed to `from_pretrained`.
Data Prep
We follow Hugging Face's Alignment Handbook recipe for Zephyr and use the UltraFeedback dataset, sampling 0.5% of it to speed things up. Use the full dataset for a full run.
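A 0.5% sample of the 61,966-example `train_prefs` split comes out to 309 rows, which is the count that shows up in the formatting progress bars. A minimal sketch of that sampling in plain Python (a reproducible index pick; the notebook's actual `datasets` call may differ):

```python
import random

# Size of UltraFeedback's train_prefs split (from the generation logs below).
TRAIN_PREFS_ROWS = 61_966
SAMPLE_FRACTION = 0.005  # 0.5% to speed things up

def sample_indices(n_rows: int, fraction: float, seed: int = 42) -> list[int]:
    """Pick a reproducible random subset of row indices."""
    rng = random.Random(seed)
    n_keep = int(n_rows * fraction)
    return sorted(rng.sample(range(n_rows), n_keep))

idx = sample_indices(TRAIN_PREFS_ROWS, SAMPLE_FRACTION)
print(len(idx))  # 309
```

With `datasets` you would pass these indices to `Dataset.select`, or simply shuffle and slice.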
Generating train_sft split: 0%| | 0/61966 [00:00<?, ? examples/s]
Generating test_sft split: 0%| | 0/1000 [00:00<?, ? examples/s]
Generating train_gen split: 0%| | 0/61966 [00:00<?, ? examples/s]
Generating test_gen split: 0%| | 0/1000 [00:00<?, ? examples/s]
Generating train_prefs split: 0%| | 0/61966 [00:00<?, ? examples/s]
Generating test_prefs split: 0%| | 0/2000 [00:00<?, ? examples/s]
Formatting comparisons with prompt template (num_proc=12): 0%| | 0/309 [00:00<?, ? examples/s]
Formatting comparisons with prompt template (num_proc=12): 0%| | 0/2000 [00:00<?, ? examples/s]
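The prompt template applied above is Zephyr's chat format, visible in the printed sample below: each turn is a `<|role|>` header, the content, and an `</s>` terminator, with the prompt ending in an open `<|assistant|>` header. A minimal re-implementation (a sketch, not the Alignment Handbook's actual helper):

```python
def apply_zephyr_template(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Format chat messages in Zephyr style: <|role|>\n{content}</s>\n per turn."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|assistant|>\n")  # leave the assistant turn open for generation
    return "".join(parts)

prompt = apply_zephyr_template([
    {"role": "system", "content": ""},
    {"role": "user", "content": "List two natural resources which was made in the factory."},
])
print(prompt)
```

This reproduces the prompt string printed in the next cell; in practice the tokenizer's own chat template should be used.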
We shall print a random item from the dataset:
('<|system|>\n'
'</s>\n'
'<|user|>\n'
'List two natural resources which was made in the factory.</s>\n'
'<|assistant|>\n')
('Natural resources are not made in factories. Natural resources are materials '
'and substances that occur naturally on Earth, such as water, minerals, '
'forests, and fossil fuels. Factories typically produce man-made materials or '
'process natural resources into finished products.</s>\n')
("I'm sorry, but it seems there might be some confusion in your question as "
'natural resources are typically sourced from the earth or sea, and not made '
'in a factory. However, factories often use natural resources to create '
'various products. Two examples of natural resources that factories may use '
'are crude oil and iron ore. Crude oil is refined to produce various '
'petroleum products, such as gasoline and plastics, while iron ore is refined '
'to create steel, which is used in the construction industry, vehicle '
'manufacturing, and more. Does this help clarify things?</s>\n')
We now add LoRA adapters so we only need to update 1 to 10% of all parameters!
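As a back-of-the-envelope check on that fraction, here is the trainable-parameter count for rank-16 LoRA adapters on all attention and MLP projections of a Mistral-7B-shaped model (dimensions assumed from Mistral-7B's config; the notebook's actual rank and target modules may differ):

```python
# Assumed Mistral-7B shapes: hidden 4096, KV projection 1024 (8 heads x dim 128),
# MLP 14336, 32 layers, ~7.24B total parameters.
HIDDEN, KV_DIM, MLP, LAYERS = 4096, 1024, 14336, 32
RANK = 16  # a LoRA adapter on a (d_in x d_out) matrix adds RANK * (d_in + d_out) params

# (d_in, d_out) for each target projection
targets = [
    (HIDDEN, HIDDEN),  # q_proj
    (HIDDEN, KV_DIM),  # k_proj (grouped-query attention)
    (HIDDEN, KV_DIM),  # v_proj
    (HIDDEN, HIDDEN),  # o_proj
    (HIDDEN, MLP),     # gate_proj
    (HIDDEN, MLP),     # up_proj
    (MLP, HIDDEN),     # down_proj
]
lora_params = LAYERS * sum(RANK * (d_in + d_out) for d_in, d_out in targets)
fraction = lora_params / 7.24e9
print(f"{lora_params:,} trainable ({fraction:.2%} of the model)")
```

At rank 16 this is roughly 42M parameters, under 1% of the model and comfortably below the 10% upper bound quoted above; larger ranks push the fraction up.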
Unsloth 2024.1 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
/usr/local/lib/python3.10/dist-packages/trl/trainer/dpo_trainer.py:294: UserWarning: When using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your TrainingArguments we have set it for you, but you should do it yourself in the future. warnings.warn(
Map: 0%| | 0/309 [00:00<?, ? examples/s]
Unsloth: `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`
Could not estimate the number of tokens of the input, floating-point operations will not be computed
TrainOutput(global_step=114, training_loss=0.397163258179238, metrics={'train_runtime': 4017.5003, 'train_samples_per_second': 0.231, 'train_steps_per_second': 0.028, 'total_flos': 0.0, 'train_loss': 0.397163258179238, 'epoch': 2.94})
And we're done! If you have any questions about Unsloth, find any bugs, want to keep up with the latest LLM news, or need help with your projects, feel free to join our Discord channel!
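The `training_loss` above is the DPO objective: the negative log-sigmoid of β times the gap between the policy's and reference model's log-ratios on the chosen vs. rejected completions. A minimal per-example sketch in plain Python (β = 0.1 assumed; Unsloth/TRL compute this batched from token log-probabilities):

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair, from sequence log-probabilities."""
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    logits = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Before training the policy equals the reference, so the loss starts at log(2) ≈ 0.693
print(dpo_loss(-40.0, -55.0, -40.0, -55.0))
```

As training shifts probability mass toward the chosen completions, the logits grow and the loss falls below log(2), consistent with the ~0.40 average reported here.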
Some other resources:
- Looking to use Unsloth locally? Read our Installation Guide for details on installing Unsloth on Windows, Docker, and AMD or Intel GPUs.
- Learn how to do Reinforcement Learning with our RL Guide and notebooks.
- Read our guides and notebooks for Text-to-speech (TTS) and vision model support.
- Explore our LLM Tutorials Directory to find dedicated guides for each model.
- Need help with inference? Read our Inference & Deployment page for details on using vLLM, llama.cpp, Ollama, etc.