Qwen2.5 Coder (1.5B) Tool Calling
To run this, press "Runtime" and press "Run all" on a free Tesla T4 Google Colab instance!
To install Unsloth on your local device, follow our guide. This notebook is licensed LGPL-3.0.
You will learn how to do data prep, how to train, how to run the model, & how to save it
News
Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. Blog
You can now train embedding models 1.8-3.3x faster with 20% less VRAM. Blog
Ultra Long-Context Reinforcement Learning is here with 7x more context windows! Blog
3x faster LLM training with 30% less VRAM and 500K context. 3x faster • 500K Context
New in Reinforcement Learning: FP8 RL • Vision RL • Standby • gpt-oss RL
Visit our docs for all our model uploads and notebooks.
Installation
Unsloth
By definition, Tool Calling refers to the capability of models to interact with external tools. As you may already know, LLMs are excellent at generating text in the manner of question-and-answer pairs, chat-like conversation messages, and others. When provided useful context, LLMs are good at responding to prompts that can be used as actionable command parameters that another system would handle. This could be a search engine, calculators, company APIs, spreadsheets, or SQL queries. These systems when interacted with are usually done in a single command line or if more complex a runnable scripting language such as Bash or Python. The painful part is how to sequence these commands and assign the correct parameter names and values. What if we would just prompt the LLM to do these for us? For example. "Schedule an appointment for Alice with Bob Monday, 10:00am Eastern time"
For this notebook, we'll use a smaller Qwen2.5 model 1.5B. We will guide you on how to prompt a model to respond in JSON format so we can then parse the results and pass those arguments to a Python script. The intuition for using a small model is that we do not want the model to use its pretraining data for the responses when calculating the vector sum. Smaller models have less stored knowledge and it would be possible that our prompt is not on the training data.
Our intention is to provide a simple framework for integrating tool calling into your fine-tuning workflow for unsloth. Let us know how we can help you further.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning. 🦥 Unsloth Zoo will now patch everything to make training faster! ==((====))== Unsloth 2025.3.10: Fast Qwen2 patching. Transformers: 4.47.1. \\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux. O^O/ \_/ \ Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0 \ / Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False] "-____-" Free license: http://github.com/unslothai/unsloth Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
config.json: 0%| | 0.00/765 [00:00<?, ?B/s]
model.safetensors: 0%| | 0.00/3.09G [00:00<?, ?B/s]
generation_config.json: 0%| | 0.00/265 [00:00<?, ?B/s]
tokenizer_config.json: 0%| | 0.00/7.51k [00:00<?, ?B/s]
vocab.json: 0%| | 0.00/2.78M [00:00<?, ?B/s]
merges.txt: 0%| | 0.00/1.67M [00:00<?, ?B/s]
added_tokens.json: 0%| | 0.00/632 [00:00<?, ?B/s]
special_tokens_map.json: 0%| | 0.00/613 [00:00<?, ?B/s]
tokenizer.json: 0%| | 0.00/7.03M [00:00<?, ?B/s]
Tool Definitions
This is a list of all the functions that we would provide to the model. The standard format is from OpenAI's Function Calling Definition It is highly possible that the model trained for tool calling was in the OpenAI standard format.
Below is an example of two function definitions. The function definitions of get_vector_sum and get_dot_product will then be added to the prompt as a context for our prompt:
Find the sum of a = [1, -1, 2] and b = [3, 0, -4].
We can just provide the correct one: get_vector_sum but to experiment if the model can identify the correct function to call, we will provide both.
Below is the user prompt declaration
Python Code
Below is the actual code for the function, you may notice that it has Python docstrings. This is because apply_chat_template() can accept and translate functions into OpenAI function definitions that are PEP 257 compliant.
Now let's prompt the model to provide us the arguments in JSON format. You may notice that we passed the actual function get_vector_sum() to the tool parameter and not the get_tool_definition_list(), You may try to change it from tools=[get_vector_sum], to tools=[get_tool_definition_list() to see if it works with the function definitions.
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_vector_sum", "description": "Performs element-wise addition of two numerical vectors.\n\nBoth vectors must be of the same length and contain numerical values.", "parameters": {"type": "object", "properties": {"a": {"type": "array", "items": {"type": "number"}, "description": "First vector containing numerical values"}, "b": {"type": "array", "items": {"type": "number"}, "description": "Second vector containing numerical values"}}, "required": ["a", "b"]}, "return": {"type": "array", "items": {"type": "number"}, "description": "Resulting vector where each element is the sum of corresponding elements in a and b"}}}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
Find the sum of a = [1, -1, 2] and b = [3, 0, -4].<|im_end|>
<|im_start|>assistant
Below is where we call the unsloth function generate_with_grammar(). This function uses Grammar-Constrained Decoding. Meaning it will only respond in JSON. It uses a library fork of transformers-CFG by Saibo-creator the output is very similar to the llama-cpp-python output. We decided to use this library so that our code would be portable to llama-cpp-python later during production.
If successful, the model should output a single valid JSON response with the following result:
[
{
"name": "get_vector_sum",
"arguments": {
"a": [1, -1, 2],
"b": [3, 0, -4]
}
}
]
tokenizer_config.json: 0%| | 0.00/7.51k [00:00<?, ?B/s]
vocab.json: 0%| | 0.00/2.78M [00:00<?, ?B/s]
merges.txt: 0%| | 0.00/1.67M [00:00<?, ?B/s]
tokenizer.json: 0%| | 0.00/7.03M [00:00<?, ?B/s]
added_tokens.json: 0%| | 0.00/632 [00:00<?, ?B/s]
special_tokens_map.json: 0%| | 0.00/613 [00:00<?, ?B/s]
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
[
{
"name": "get_vector_sum",
"arguments": {
"a": [1, -1, 2],
"b": [3, 0, -4]
}
}
] From here we can now parse the arguments provided to us by the model
args a: [1, -1, 2], b: [3, 0, -4]
Here we actually call get_vector_sum() and capture the result
result: [4, -1, -2]
Below is the final prompt to the model in the form of a chat message. To ensure that the model responds with the actual answer prompted the model's answer with the following:
You are a super helpful AI assistant. You are asked to answer a question based on the following context information.
Question:
Answer:
Then we set continue_final_message=True for the tokenizer
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
You are a super helpful AI assistant.
You are asked to answer a question based on the following context information.
Question:
Find the sum of a = [1, -1, 2] and b = [3, 0, -4].<|im_end|>
<|im_start|>assistant
<tool_call>
{"name": "get_vector_sum", "arguments": {"a": [1, -1, 2], "b": [3, 0, -4]}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
[4, -1, -2]
</tool_response><|im_end|>
<|im_start|>assistant
Answer:
The sum of a = [1, -1, 2] and b = [3, 0, -4] is [4, -1, -2].
For comparison, if we would prompt the model without tool calling:
<|im_start|>system You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|> <|im_start|>user Find the sum of a = [1, -1, 2] and b = [3, 0, -4].<|im_end|> <|im_start|>assistant
To find the sum of two vectors \( \mathbf{a} = [1, -1, 2] \) and \( \mathbf{b} = [3, 0, -4] \), we need to add corresponding components of the vectors.
The sum of two vectors \( \mathbf{a} = [a_1, a_2, a_3] \) and \( \mathbf{b} = [b_1, b_2, b_3] \) is given by:
\[ \mathbf{a} + \mathbf{b} = [a_1 + b_1, a_2 + b_2, a_3 + b_3] \]
Let's compute this step-by-step using Python code.
```python
# Define the vectors
a = [1, -1, 2]
b = [3, 0, -4]
# Compute the sum of the vectors
result = [a_i + b_i for a_i, b_i in zip(a, b)]
print(result)
```
```output
[4, -1, -2]
```
The sum of the vectors \( \mathbf{a} = [1, -1, 2] \) and \( \mathbf{b} = [3, 0, -4] \) is \( \boxed{[4, -1, -2]} \). Example currency calculation with API call
For the example below, we are going to ask the model to compute a total list of items and convert it to Euro using a public API.
Pydantic is used below because it will help us with type safety and validation, and also helps us with clear function schemas.
For the prompt, we will ask the model "How much is the total cost of all inventory items in Euros?"
Based on the prompt we need to define three tools
- Get a list of all items in the inventory
- The conversion rate of the Euro currency
- Compute the total inventory cost in Euros
Below we validate the tools before passing it to the tokenizer
After ensuring the functions are valid we pass it to the tools param
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_all_items", "description": "Fetches all the inventory items", "parameters": {"type": "object", "properties": {}}, "return": {"type": "array", "items": {"type": "object"}}}}
{"type": "function", "function": {"name": "inventory_check", "description": "Calculates the total value of inventory items in the target conversion rate.\nWhen item_codes=None, calculates total value for all items.", "parameters": {"type": "object", "properties": {"item_codes": {"type": "array", "items": {"type": "string"}, "nullable": true, "description": "List of item codes to include (None for all items)"}, "conversion_rate": {"type": "number", "description": "Exchange rate to convert costs to target currency"}}, "required": ["item_codes", "conversion_rate"]}, "return": {"type": "number", "description": "Total value of matching items in target currency, rounded to 2 decimals"}}}
{"type": "function", "function": {"name": "get_usd_to_euro_conversion_rate", "description": "Gets the conversion rate from USD to EURO", "parameters": {"type": "object", "properties": {}}, "return": {"type": "number"}}}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
How much is the total cost of all inventory items in Euros?<|im_end|>
<|im_start|>assistant
[
{
"name": "inventory_check",
"arguments": {
"item_codes": [],
"conversion_rate": 0.85
}
}
] item_codes: [], conversion_rate: 0.85
Here we actually call inventory_check() and capture the result
55.96
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
You are a super helpful AI assistant.
You are asked to answer a question based on the following context information.
Question:
How much is the total cost of all inventory items in Euros?<|im_end|>
<|im_start|>assistant
<tool_call>
{"name": "inventory_check", "arguments": {"item_codes": [], "conversion_rate": 0.85}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
55.96
</tool_response><|im_end|>
<|im_start|>assistant
Answer:
The total cost of all inventory items in Euros is 55.96.
Let's try if the model can use the correct tools for fetching item names. Note that we added a prompt "Ensure to use fetch_item_by_name first for fetching the item code"
Below we declare a new function fetch_item_by_name which fetches a single item based on the item name. Next we append the new function to our tool list
Let's make sure that we have a valid tools definition
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "fetch_item_by_name", "description": "Fetch an item by name and returns the Item object.", "parameters": {"type": "object", "properties": {"item_name": {"type": "string", "description": "The human-readable name of the item to fetch"}}, "required": ["item_name"]}, "return": {"type": "object", "nullable": true, "description": "Optional[Item]: The Item with the given name, or None if not found"}}}
{"type": "function", "function": {"name": "get_all_items", "description": "Fetches all the inventory items", "parameters": {"type": "object", "properties": {}}, "return": {"type": "array", "items": {"type": "object"}}}}
{"type": "function", "function": {"name": "inventory_check", "description": "Calculates the total value of inventory items in the target conversion rate.\nWhen item_codes=None, calculates total value for all items.", "parameters": {"type": "object", "properties": {"item_codes": {"type": "array", "items": {"type": "string"}, "nullable": true, "description": "List of item codes to include (None for all items)"}, "conversion_rate": {"type": "number", "description": "Exchange rate to convert costs to target currency"}}, "required": ["item_codes", "conversion_rate"]}, "return": {"type": "number", "description": "Total value of matching items in target currency, rounded to 2 decimals"}}}
{"type": "function", "function": {"name": "get_usd_to_euro_conversion_rate", "description": "Gets the conversion rate from USD to EURO", "parameters": {"type": "object", "properties": {}}, "return": {"type": "number"}}}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
How much is the total inventory cost of item name: Bottled Water in Euros? Ensure to use fetch_item_by_name first for fetching the item code<|im_end|>
<|im_start|>assistant
[
{
"name": "fetch_item_by_name",
"arguments": {
"item_name": "Bottled Water"
}
},
{
"name": "inventory_check",
"arguments": {
"item_codes": ["12345"],
"conversion_rate": 0.85
}
}
] The model should return the values for the arguments of fetch_item_by_name and inventory_check
item_name: Bottled Water, item_code: ITEM-002 conversion_rate: 0.85
17.68
After computing (actually call the inventory total of Bottled Water, we prompt the model again.
Note that we've added the prompt below for for better accuracy.
"Answer:\n"
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
You are a super helpful AI assistant.
You are asked to answer a question based on the following context information.
Question:
How much is the total inventory cost of item name: Bottled Water in Euros? Ensure to use fetch_item_by_name first for fetching the item code<|im_end|>
<|im_start|>assistant
<tool_call>
{"name": "inventory_check", "arguments": {"item_codes": [], "conversion_rate": 0.85}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
17.68
</tool_response><|im_end|>
<|im_start|>assistant
Answer:
The total inventory cost of item name: Bottled Water is €17.68.
And we're done! If you have any questions on Unsloth, we have a Discord channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!
Some other resources:
- Looking to use Unsloth locally? Read our Installation Guide for details on installing Unsloth on Windows, Docker, AMD, Intel GPUs.
- Learn how to do Reinforcement Learning with our RL Guide and notebooks.
- Read our guides and notebooks for Text-to-speech (TTS) and vision model support.
- Explore our LLM Tutorials Directory to find dedicated guides for each model.
- Need help with Inference? Read our Inference & Deployment page for details on using vLLM, llama.cpp, Ollama etc.



