Qwen3 (0.6B) Reasoning Conversational ExecuTorch


To run this notebook, press "Runtime" and then "Run all" on a free Tesla T4 Google Colab instance!

Join Discord if you need help + ⭐ Star us on Github

To install Unsloth on your local device, follow our guide. This notebook is licensed LGPL-3.0.

You will learn how to prepare data, train the model, run inference, and save the result.

Installation

[ ]

Unsloth

[ ]
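The install cell typically looks something like the following (a sketch; exact pins and extra packages may differ between notebook versions, and the ExecuTorch tooling is only needed for the export step at the end):

```shell
# Install Unsloth (pulls in torch, transformers, trl, etc. as dependencies)
pip install unsloth
# For the ExecuTorch export at the end you would also need the executorch
# tooling, e.g.: pip install executorch  (version requirements may vary)
```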

Data Prep

Qwen3 has both a reasoning and a non-reasoning mode, so we use two datasets:

  1. We use the Open Math Reasoning dataset, which was used to win the AIMO (AI Mathematical Olympiad - Progress Prize 2) challenge! We sample 10% of the verifiable reasoning traces that were generated with DeepSeek R1 and achieved > 95% accuracy.

  2. We also leverage Maxime Labonne's FineTome-100k dataset in ShareGPT style, which we convert to Hugging Face's standard multi-turn format.

[ ]

Let's see the structure of both datasets:

[ ]
[ ]

We now convert the reasoning dataset into conversational format:

[ ]
[ ]
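The conversion itself is mechanical: each reasoning row becomes a two-turn user/assistant conversation. As a pure-Python sketch (the column names `problem` and `generated_solution` are assumptions; adjust them to the actual dataset schema, and in a real run you would apply the function via `dataset.map(..., batched=True)`):

```python
def generate_conversation(examples):
    """Turn batched {problem, generated_solution} columns into
    Hugging Face-style multi-turn conversations."""
    conversations = []
    for problem, solution in zip(examples["problem"], examples["generated_solution"]):
        conversations.append([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ])
    return {"conversations": conversations}

# Tiny in-memory example batch (illustrative data, not from the real dataset):
batch = {
    "problem": ["What is 2 + 2?"],
    "generated_solution": ["<think>2 + 2 = 4</think> The answer is 4."],
}
converted = generate_conversation(batch)
```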

Let's see the first transformed row:

[ ]

Next, we take the non-reasoning dataset and convert it to conversational format as well.

We have to use Unsloth's standardize_sharegpt function to fix up the format of the dataset first.

[ ]
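Conceptually, what this standardization does is map ShareGPT-style turns (keyed by `from`/`value`, with speakers `human`/`gpt`) to Hugging Face-style turns (keyed by `role`/`content`, with speakers `user`/`assistant`). A minimal sketch of that mapping (this is our own illustration of the idea, not Unsloth's implementation):

```python
# ShareGPT speaker names -> Hugging Face role names
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def sharegpt_to_hf(conversation):
    """Convert ShareGPT turns ({'from', 'value'}) to HF turns ({'role', 'content'})."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversation
    ]

sample = [
    {"from": "human", "value": "Hello!"},
    {"from": "gpt", "value": "Hi, how can I help?"},
]
converted = sharegpt_to_hf(sample)
```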

Let's see the first row:

[ ]

Now let's see how long both datasets are:

[ ]

The non-reasoning dataset is much longer. Let's assume we want the model to retain some reasoning capability, but we specifically want a chat model.

Let's define a chat_percentage ratio that controls the mixture of the two datasets.

We select 75% reasoning and 25% chat-based data:

[ ]

Let's subsample the non-reasoning dataset so that reasoning ends up at 75% of the mix (that is, 100% − chat_percentage):

[ ]

Finally, combine both datasets:

[ ]
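One way to realize the 75/25 mixture is to keep every reasoning row and subsample the (much longer) chat dataset to hit the target ratio, then shuffle the combination. A pure-Python sketch with toy lists standing in for the real datasets (the function name and seed are our own choices):

```python
import random

def mix_datasets(reasoning_rows, chat_rows, chat_percentage=0.25, seed=3407):
    """Keep all reasoning rows; subsample chat rows so they make up
    chat_percentage of the combined dataset, then shuffle."""
    n_chat = int(len(reasoning_rows) * chat_percentage / (1 - chat_percentage))
    rng = random.Random(seed)
    chat_subset = rng.sample(list(chat_rows), min(n_chat, len(chat_rows)))
    combined = list(reasoning_rows) + chat_subset
    rng.shuffle(combined)
    return combined

# Toy stand-ins: 300 reasoning rows, 1000 chat rows
reasoning = [("reasoning", i) for i in range(300)]
chat = [("chat", i) for i in range(1000)]
mixed = mix_datasets(reasoning, chat, chat_percentage=0.25)
```

With 300 reasoning rows, the function keeps 100 chat rows, so chat is exactly 25% of the 400-row mix.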

Train the model

Now let's train our model. We do 60 steps to speed things up, but you can set num_train_epochs=1 for a full run and set max_steps=None.

[ ]
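For orientation, a typical Unsloth SFT setup looks roughly like the following. This is an illustrative sketch only: it assumes `model`, `tokenizer`, and `combined_dataset` were created in earlier cells, the hyperparameter values are examples rather than prescriptions, and argument names may differ across trl versions:

```python
from trl import SFTConfig, SFTTrainer

# Sketch -- `model`, `tokenizer`, `combined_dataset` come from earlier cells.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=combined_dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,        # set num_train_epochs=1 and max_steps=None for a full run
        learning_rate=2e-4,
        logging_steps=1,
        seed=3407,
    ),
)
```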

Let's train the model! To resume an interrupted training run, call trainer.train(resume_from_checkpoint=True).

[ ]

Inference

Let's run the model via Unsloth native inference! According to the Qwen3 team, the recommended settings for reasoning inference are temperature = 0.6, top_p = 0.95, top_k = 20.

For normal chat-based inference, use temperature = 0.7, top_p = 0.8, top_k = 20.

[ ]
[ ]
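The two recommended settings can be captured in a small helper so the right values are used for each mode (the helper name is ours; the values come from the Qwen3 team's guidance above):

```python
def qwen3_sampling_params(reasoning: bool) -> dict:
    """Recommended Qwen3 sampling settings for reasoning vs. normal chat."""
    if reasoning:
        return {"temperature": 0.6, "top_p": 0.95, "top_k": 20}
    return {"temperature": 0.7, "top_p": 0.8, "top_k": 20}

params = qwen3_sampling_params(reasoning=True)
# These kwargs can be passed straight through to the generation call,
# e.g. model.generate(**inputs, **params, max_new_tokens=...).
```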

Saving

The final model can be saved locally or pushed to the Hub to share with the community.

[ ]

Once saved, we can export the model checkpoint to ExecuTorch, and we're done! If you have questions, find bugs, want to keep up with the latest LLM news, or need help with projects, join our Discord channel!

Some other resources:

  1. Looking to use Unsloth locally? Read our Installation Guide for details on installing Unsloth on Windows, Docker, and AMD or Intel GPUs.
  2. Learn how to do Reinforcement Learning with our RL Guide and notebooks.
  3. Read our guides and notebooks for Text-to-speech (TTS) and vision model support.
  4. Explore our LLM Tutorials Directory to find dedicated guides for each model.
  5. Need help with Inference? Read our Inference & Deployment page for details on using vLLM, llama.cpp, Ollama etc.

Join Discord if you need help + ⭐️ Star us on Github ⭐️

This notebook and all Unsloth notebooks are licensed LGPL-3.0