Tool Calling 101

This tutorial shows you how to apply tool calling with Llama models, using Llama 3.3.

For continuity, we show the built-in tool calling introduced in Llama 3.1, which lets you use brave_search and wolfram_alpha.

However, please remember that 3.3 models work great with zero-shot tool calling, which we showcase in the second notebook. In fact, that is the recommended path.

Note: If you are looking for instructions for the 3.2 Featherlight models (1B and 3B), please see the respective sections on our website; this one covers 3.1 models.

We briefly introduce the 3.2 models at the end.

Note: The new vision models behave the same as 3.1 models when you talk to them without an image.

This is part 1 of 2 in the tool calling series. This notebook covers the basics of what tool calling is and how to perform it with Llama 3.3 models.

Here's what you will learn in this notebook:

  • Set up Groq to access the Llama 3.3 70B model
  • Avoid common mistakes when performing tool-calling with Llama
  • Understand Prompt templates for Tool Calling
  • Understand how the tool calls are handled under the hood
  • 3.2 Model Tool Calling Format and Behaviour

In Part 2, we will learn how to build a system that compares two papers for us.

What is Tool Calling?

This approach was popularised by the Gorilla paper, which showed that Large Language Models can be fine-tuned on API examples to teach them to call an external API.

This is really cool because we can now use an LLM as the "brain" of a system and connect it to external systems to perform actions.

In simpler words, "Llama can order your pizza for you" :)

With the Llama 3.1 release, the models excel at tool calling and support brave_search, wolfram_alpha and code_interpreter out of the box.

However, first let's take a look at a common mistake.

Install and set up Groq dependencies

  • Install the groq package to access Llama model(s)
  • Configure our client and authenticate with API key(s). Note: PLEASE UPDATE YOUR KEY BELOW
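
The setup cells are not reproduced in this text, so here is a minimal sketch of what such a setup might look like, assuming the groq Python SDK (installed with `pip install groq`). The helper names `build_client` and `chat`, the `GROQ_API_KEY` environment variable and the model id `llama-3.3-70b-versatile` are assumptions for illustration and are not taken from the notebook.

```python
import os
from typing import Any

# Assumption: this model id serves Llama 3.3 70B on Groq; check the provider's model list.
MODEL = "llama-3.3-70b-versatile"


def build_client() -> Any:
    """Create a Groq client from the GROQ_API_KEY environment variable."""
    from groq import Groq  # imported lazily so the module loads without the SDK

    return Groq(api_key=os.environ["GROQ_API_KEY"])


def chat(client: Any, user_prompt: str,
         system_prompt: str = "You are a helpful assistant.") -> str:
    """Send one system + user exchange and return the assistant's text."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content
```

With a key exported in the environment, `chat(build_client(), "What is the square root of 23131231?")` would send a single question. Passing the client in as an argument keeps the helper easy to exercise with a stub when no network is available.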

Common Mistake of Tool-Calling: Incorrect Prompt Template

While Llama 3.1 supports tool calling out of the box, a wrong prompt template can cause unexpected behaviour.

Sometimes, even superheroes need to be reminded of their powers.

Let's first try "forcing a prompt response from the model".

Note: Remember this is the WRONG template. Please scroll to the next section to see the right approach if you are in a rushed copy-pasta sprint.

This section shows that the model will not use brave_search and wolfram_alpha out of the box unless the prompt template is set correctly, even if the model is asked to do so!

Asking the model about recent news

Since the prompt template is incorrect, the model answers from memory, which stops at its knowledge cutoff.

[ ]
Assistant: As of my knowledge cutoff in December 2023, there has been no official announcement from FromSoftware, the developers of the Elden Ring series, regarding a release date for a new Elden Ring game.

However, it's worth noting that FromSoftware has mentioned that they are working on new projects, and there have been rumors and speculation about a potential Elden Ring sequel or DLC. But until an official announcement is made, we can't confirm any details about a new Elden Ring game.

If you're eager for more Elden Ring content, you can keep an eye on the official Elden Ring website, social media channels, and gaming news outlets for any updates or announcements. I'll be happy to help you stay informed if any new information becomes available!

Asking the model about a Math problem

Again, the model answers from memory instead of calling a tool.

[8]
Assistant: To find the square root of 23131231, we'll calculate it directly.

The square root of 23131231 is approximately 4807.035.

Can we solve this using a reminder prompt?

[9]
Assistant: To find the square root of 23131231, I can use a calculator or a computational tool.

Using a calculator, I get:

√23131231 ≈ 4817.42

So, the square root of 23131231 is approximately 4817.42.

Looks like we didn't get the wolfram_alpha call. Let's try one more time with a stronger prompt:

[10]
Assistant: To find the square root of 23131231, I can use a tool like Wolfram Alpha.

The square root of 23131231 is approximately 4817.316.


Wolfram Alpha calculation:

√23131231 ≈ 4817.316
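
It is worth checking these answers locally. The sketch below uses only Python's standard library and is a sanity check, not part of the original notebook. Since 4809² = 23,126,481 and 4810² = 23,136,100, the true square root lies between 4809 and 4810, so all three responses above are wrong, including the one that claims to come from Wolfram Alpha. This is exactly the kind of error a real tool call avoids.

```python
import math

n = 23131231
root = math.sqrt(n)
print(f"sqrt({n}) = {root:.3f}")

# The integer part must satisfy floor^2 <= n < (floor + 1)^2.
floor_root = math.isqrt(n)
assert floor_root**2 <= n < (floor_root + 1) ** 2

# The three answers quoted above, compared with the locally computed value.
model_answers = [4807.035, 4817.42, 4817.316]
for answer in model_answers:
    print(f"{answer}: off by {abs(answer - root):.3f}")
```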

Official Prompt Template

As you can see, the model doesn't perform tool calling as expected above. This is because we are not following the recommended prompting format.

Llama Stack is the go-to approach for using the Llama model family and building applications.

Let's first install the llama-stack Python package to have the Llama CLI available.

[13]

Now we can learn about the various prompt formats available.

When you run the cell below, you will see the available models; we can then check the details of the model-specific prompts.

[14]
usage: llama model prompt-format [-h] [-m MODEL_NAME]
llama model prompt-format: error: llama3_1 is not a valid Model. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
[17]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                    Llama 3.1 - Prompt Formats                                    ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛


                                               Tokens                                               

Here is a list of special tokens that are supported by Llama 3.1:                                   

<|begin_of_text|>: Specifies the start of the prompt                                             
<|end_of_text|>: Model will cease to generate more tokens. This token is generated only by the   
   base models.                                                                                     
<|finetune_right_pad_id|>: This token is used for padding text sequences to the same length in a 
   batch.                                                                                           
<|start_header_id|> and <|end_header_id|>: These tokens enclose the role for a particular        
   message. The possible roles are: [system, user, assistant and ipython]                           
<|eom_id|>: End of message. A message represents a possible stopping point for execution where   
   the model can inform the executor that a tool call needs to be made. This is used for multi-step 
   interactions between the model and any available tools. This token is emitted by the model when  
   the Environment: ipython instruction is used in the system prompt, or if the model calls for a   
   built-in tool.                                                                                   
<|eot_id|>: End of turn. Represents when the model has determined that it has finished           
   interacting with the user message that initiated its response. This is used in two scenarios:
     • at the end of a direct interaction between the model and the user
     • at the end of multiple interactions between the model and any available tools
   This token signals to the executor that the model has finished generating a response.
<|python_tag|>: Is a special tag used in the model's response to signify a tool call.            

There are 4 different roles that are supported by Llama 3.1                                         

system: Sets the context in which to interact with the AI model. It typically includes rules,    
   guidelines, or necessary information that helps the model respond effectively.                   
user: Represents the human interacting with the model. It includes the inputs, commands, and     
   questions to the model.                                                                          
ipython: A new role introduced in Llama 3.1. Semantically, this role means "tool". This role is  
   used to mark messages with the output of a tool call when sent back to the model from the        
   executor.                                                                                        
assistant: Represents the response generated by the AI model based on the context provided in the
   system, ipython and user prompts.                                                                
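
To make the tokens and roles above concrete, here is a small sketch of how a list of chat messages could be rendered into a single prompt string. The function name `format_prompt` is hypothetical, and the sketch only handles turns that end with <|eot_id|> (system and user messages); assistant tool calls that end with <|eom_id|> would need extra handling.

```python
from typing import Dict, List


def format_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages into the Llama 3.1 instruct prompt layout."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content']}<|eot_id|>"
    # Open an assistant header so the model writes the next turn.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


prompt = format_prompt(
    [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Answer who are you in the form of jeopardy?"},
    ]
)
print(prompt)
```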


                                        Llama 3.1 Base Model                                        

Text completion for Llama 3.1 base model uses this format.                                          

                                        Input Prompt Format                                         

                                                                                                    
 <|begin_of_text|>Color of sky is blue but sometimes can also be                                    
                                                                                                    

                                       Model Response Format                                        

                                                                                                    
  red, orange, yellow, green, purple, pink, brown, gray, black, white, and even rainbow colors. The 
 color of the sky can change due to various reasons such as time of day, weather conditions,        
 pollution, and atmospheric phenomena.                                                              
 The color of the sky is primarily blue because of a phenomenon called                              
                                                                                                    

Note start special tag                                                                              


                                      Llama 3.1 Instruct Model                                      


                                  User and assistant conversation                                   

Here is a regular multi-turn user assistant conversation and how it's formatted.

                                        Input Prompt Format                                         

                                                                                                    
 <|begin_of_text|><|start_header_id|>system<|end_header_id|>                                        
                                                                                                    
 You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>                      
                                                                                                    
 Answer who are you in the form of jeopardy?<|eot_id|><|start_header_id|>assistant<|end_header_id|> 
                                                                                                    

                                       Model Response Format                                        

                                                                                                    
 Here's my response                                                                                 
                                                                                                    
 "What is a helpful assistant?"<|eot_id|>                                                           
                                                                                                    


                                        Tool Calling Formats                                        

The three built-in tools (brave_search, wolfram_alpha, and code interpreter) can be turned on using 
the system prompt:                                                                                  

Brave Search: Tool call to perform web searches.                                                 
Wolfram Alpha: Tool call to perform complex mathematical calculations.                           
Code Interpreter: Enables the model to output python code.                                       


                                        Builtin Tool Calling                                        

Here is an example of a conversation using brave search                                             

                                        Input Prompt Format                                         

                                                                                                    
 <|begin_of_text|><|start_header_id|>system<|end_header_id|>                                        
                                                                                                    
 Environment: ipython                                                                               
 Tools: brave_search, wolfram_alpha                                                                 
 Cutting Knowledge Date: December 2023                                                              
 Today Date: 21 September 2024                                                                      
                                                                                                    
 You are a helpful assistant.                                                                       
 <|eot_id|><|start_header_id|>user<|end_header_id|>                                                 
                                                                                                    
 Search the web for the latest price of 1oz                                                         
 gold?<|eot_id|><|start_header_id|>assistant<|end_header_id|>                                       
                                                                                                    

                                       Model Response Format                                        

                                                                                                    
 <|python_tag|>brave_search.call(query="latest price of 1oz gold")<|eom_id|>                        
                                                                                                    

Just including Environment: ipython turns on code interpreter; therefore, you don't need to      
   specify code interpretation on the Tools: line. The model can generate python code which is      
   interpreted by the executor, with the result provided back to the model.                         
The message body of the assistant response starts with a special tag <|python_tag|>              
As alluded to above, in such an environment, the model can generate <|eom_id|> instead of just   
   the standard <|eot_id|> . The latter indicates the turn is finished, while the former indicates  
   continued multi-step reasoning. That is, the model is expecting a continuation message with the  
   output of the tool call.                                                                         
The model tool call response is of the form tool.call(query="...") where tool is brave_search or
   wolfram_alpha
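
On the executor side, the built-in format can be produced and consumed with a few lines of code. The sketch below is an illustration under assumptions: `builtin_system_prompt` and `parse_builtin_call` are hypothetical helper names, the date lines are left out of the system prompt, and the parser assumes the query contains no nested `")` sequence.

```python
import re
from typing import List, Optional, Tuple

# Matches e.g. <|python_tag|>brave_search.call(query="...")<|eom_id|>
# The trailing stop token is optional because some servers strip it.
BUILTIN_CALL = re.compile(
    r'^<\|python_tag\|>(brave_search|wolfram_alpha)'
    r'\.call\(query="(.*)"\)(?:<\|eom_id\|>)?$',
    re.DOTALL,
)


def builtin_system_prompt(tools: List[str]) -> str:
    """Build a system prompt that switches on the listed built-in tools."""
    return (
        "Environment: ipython\n"
        f"Tools: {', '.join(tools)}\n\n"
        "You are a helpful assistant."
    )


def parse_builtin_call(response: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, query) for a built-in tool call, else None."""
    match = BUILTIN_CALL.match(response.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


system_prompt = builtin_system_prompt(["brave_search", "wolfram_alpha"])
call = parse_builtin_call(
    '<|python_tag|>brave_search.call(query="latest price of 1oz gold")<|eom_id|>'
)
print(call)
```

A `None` result means the model answered in plain text, so the executor can show the response to the user instead of calling a tool.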


                                      Builtin Code Interpreter                                      

Here is an actual example of model responding with code                                             

                                        Input Prompt Format                                         

                                                                                                    
 <|begin_of_text|><|start_header_id|>system<|end_header_id|>                                        
                                                                                                    
 Environment: ipython<|eot_id|><|start_header_id|>user<|end_header_id|>                             
                                                                                                    
 Write code to check if number is prime, use that to see if the number 7 is                         
 prime<|eot_id|><|start_header_id|>assistant<|end_header_id|>                                       
                                                                                                    

                                       Model Response Format                                        

                                                                                                    
 <|python_tag|>def is_prime(n):                                                                     
     if n <= 1:
         return False                                                                               
     for i in range(2, int(n**0.5) + 1):                                                            
         if n % i == 0:                                                                             
             return False                                                                           
     return True                                                                                    
                                                                                                    
 print(is_prime(7))  # Output: True<|eom_id|>                                                       
                                                                                                    

Model starts with <|python_tag|> and continues writing the python code that needs to be executed
No explicit mention of code_interpreter in the system prompt. Environment: ipython implicitly enables
   it.
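
The executor is the piece that actually runs the code. Here is a minimal sketch of one, assuming a trusted environment: `run_code_response` is a hypothetical helper, and plain `exec` is used only for illustration, since real model output should run in a sandbox. The sample response reproduces the `is_prime` example, written with the colon after `if n <= 1` that Python requires.

```python
import contextlib
import io

PYTHON_TAG = "<|python_tag|>"


def run_code_response(response: str) -> str:
    """Execute a <|python_tag|> code response and return what it printed."""
    if not response.startswith(PYTHON_TAG):
        raise ValueError("not a code-interpreter response")
    code = response[len(PYTHON_TAG):]
    # Drop whichever stop token closed the message.
    for stop_token in ("<|eom_id|>", "<|eot_id|>"):
        if code.endswith(stop_token):
            code = code[: -len(stop_token)]
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # WARNING: exec runs arbitrary code; sandbox anything a model wrote.
        exec(code, {"__name__": "__model_code__"})
    return buffer.getvalue()


response = (
    "<|python_tag|>def is_prime(n):\n"
    "    if n <= 1:\n"
    "        return False\n"
    "    for i in range(2, int(n**0.5) + 1):\n"
    "        if n % i == 0:\n"
    "            return False\n"
    "    return True\n"
    "\n"
    "print(is_prime(7))  # Output: True<|eom_id|>"
)
output = run_code_response(response)
print(output)
```

The captured output is what would be sent back to the model in an ipython message.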


                                  Built-in tools full interaction                                   

Here is a full interaction with the built-in tools including the tool response and the final        
assistant response.                                                                                 

                                        Input Prompt Format                                         

                                                                                                    
 <|begin_of_text|><|start_header_id|>system<|end_header_id|>                                        
                                                                                                    
 Environment: ipython                                                                               
 Tools: brave_search, wolfram_alpha                                                                 
 <|eot_id|><|start_header_id|>user<|end_header_id|>                                                 
                                                                                                    
 What is the 100th decimal of pi?<|eot_id|><|start_header_id|>assistant<|end_header_id|>            
                                                                                                    
 <|python_tag|>wolfram_alpha.call(query="100th decimal of                                           
 pi")<|eom_id|><|start_header_id|>ipython<|end_header_id|>                                          
                                                                                                    
                                                                                                    
 {                                                                                                  
     "queryresult": {                                                                               
         "success": true,                                                                           
         "inputstring": "100th decimal of pi",                                                      
         "pods": [                                                                                  
             {                                                                                      
                 "title": "Input interpretation",                                                   
                 "subpods": [                                                                       
                     {                                                                              
                         "title": "",                                                               
                         "plaintext": "100th digit | π"                                             
                     }                                                                              
                 ]                                                                                  
             },                                                                                     
             {                                                                                      
                 "title": "Nearby digits",                                                          
                 "subpods": [                                                                       
                     {                                                                              
                         "title": "",                                                               
                         "plaintext": "...86208998628034825342117067982148086513282306647093..."    
                     }                                                                              
                 ]                                                                                  
             },                                                                                     
             {                                                                                      
                 "title": "Result",                                                                 
                 "primary": true,                                                                   
                 "subpods": [                                                                       
                     {                                                                              
                         "title": "",                                                               
                         "plaintext": "7"                                                           
                     }                                                                              
                 ]                                                                                  
             }                                                                                      
         ]                                                                                          
     }                                                                                              
 }                                                                                                  
 <|eot_id|><|start_header_id|>assistant<|end_header_id|>                                            
                                                                                                    

                                       Model Response Format                                        

                                                                                                    
 The 100th decimal of pi is 7.<|eot_id|>                                                            
                                                                                                    

Note the <|python_tag|> in the assistant response.                                               
Role is ipython for the wolfram alpha response that is passed back to the model.                 
Final message from assistant has <|eot_id|> tag.                                                 
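
The hand-off in the middle of this interaction can be sketched as follows. The helper names `primary_plaintext` and `append_tool_result` are hypothetical, and the tool output is a trimmed copy of the Wolfram Alpha response above; the sketch assumes the tool result is passed back as a JSON string in an ipython turn that ends with <|eot_id|>, as in the example.

```python
import json
from typing import Any, Dict, Optional


def primary_plaintext(tool_output: Dict[str, Any]) -> Optional[str]:
    """Return the plaintext of the pod flagged as primary, if any."""
    for pod in tool_output.get("queryresult", {}).get("pods", []):
        if pod.get("primary"):
            subpods = pod.get("subpods", [])
            return subpods[0]["plaintext"] if subpods else None
    return None


def append_tool_result(prompt: str, tool_output: Dict[str, Any]) -> str:
    """Append an ipython turn with the tool output, then reopen the assistant turn."""
    return (
        prompt
        + "<|start_header_id|>ipython<|end_header_id|>\n\n"
        + json.dumps(tool_output)
        + "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Trimmed copy of the tool response shown above.
tool_output = {
    "queryresult": {
        "success": True,
        "inputstring": "100th decimal of pi",
        "pods": [
            {
                "title": "Result",
                "primary": True,
                "subpods": [{"title": "", "plaintext": "7"}],
            }
        ],
    }
}

# Tail of the prompt so far: the assistant's tool call, closed by <|eom_id|>.
prompt = '<|python_tag|>wolfram_alpha.call(query="100th decimal of pi")<|eom_id|>'
continued = append_tool_result(prompt, tool_output)
print(primary_plaintext(tool_output))
```

Sending `continued` back to the model is what lets it produce the final answer that ends with <|eot_id|>.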


                                       Zero shot tool calling                                       


                                      JSON based tool calling                                       

Llama models can now output custom tool calls from a single message to allow easier tool calling.   
The following prompts provide an example of how custom tools can be called from the output of the   
model. It's important to note that the model itself does not execute the calls; it provides         
structured output to facilitate calling by an executor.                                             

                                        Input Prompt Format                                         

                                                                                                    
 <|begin_of_text|><|start_header_id|>system<|end_header_id|>                                        
                                                                                                    
 Environment: ipython                                                                               
                                                                                                    
 Cutting Knowledge Date: December 2023                                                              
 Today Date: 21 September 2024                                                                      
                                                                                                    
 You are a helpful assistant.                                                                       
 <|eot_id|><|start_header_id|>user<|end_header_id|>                                                 
                                                                                                    
 Answer the user's question by making use of the following functions if needed.                     
 If none of the function can be used, please say so.                                                
 Here is a list of functions in JSON format:                                                        
 {                                                                                                  
     "type": "function",                                                                            
     "function": {                                                                                  
         "name": "trending_songs",                                                                  
         "description": "Returns the trending songs on a Music site",                               
         "parameters": {                                                                            
             "type": "object",                                                                      
             "properties": [                                                                        
                 {                                                                                  
                     "n": {                                                                         
                         "type": "object",                                                          
                         "description": "The number of songs to return"                             
                     }                                                                              
                 },                                                                                 
                 {                                                                                  
                     "genre": {                                                                     
                         "type": "object",                                                          
                         "description": "The genre of the songs to return"                          
                     }                                                                              
                 }                                                                                  
             ],                                                                                     
             "required": ["n"]                                                                      
         }                                                                                          
     }                                                                                              
 }                                                                                                  
                                                                                                    
 Return function calls in JSON format.<|eot_id|><|start_header_id|>user<|end_header_id|>            
                                                                                                    
 Use tools to get latest trending songs<|eot_id|><|start_header_id|>assistant<|end_header_id|>      
                                                                                                    

                                       Model Response Format                                        

                                                                                                    
 <|python_tag|>{                                                                                    
     "type": "function",                                                                            
     "name": "trending_songs",                                                                      
     "parameters": {                                                                                
         "n": "10",                                                                                 
         "genre": "all"                                                                             
     }                                                                                              
 }<|eom_id|>                                                                                        
                                                                                                    

JSON format for providing tools needs name, description and parameters                           
Model responds with <|python_tag|> and <|eom_id|> as Environment: ipython was in the system      
   prompt                                                                                           
Instructions for tools added as a user message                                                   
Only single tool calls are supported as of now                                                   
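
Because the model only emits structured output, the executor still has to parse it and call the real function. Below is a minimal sketch of that step. The `trending_songs` implementation is a hypothetical stand-in for a real music API, and `TOOLS` and `dispatch_json_call` are illustrative names that do not come from the notebook.

```python
import json
from typing import Any, Callable, Dict, List


def trending_songs(n: int, genre: str = "all") -> List[str]:
    """Hypothetical stand-in for a real music API."""
    return [f"{genre} song {i + 1}" for i in range(n)]


# Registry mapping tool names from the prompt to local callables.
TOOLS: Dict[str, Callable[..., Any]] = {"trending_songs": trending_songs}


def dispatch_json_call(response: str) -> Any:
    """Parse a JSON tool call emitted by the model and run the matching function."""
    body = response.strip()
    if body.startswith("<|python_tag|>"):
        body = body[len("<|python_tag|>"):]
    for stop_token in ("<|eom_id|>", "<|eot_id|>"):
        if body.endswith(stop_token):
            body = body[: -len(stop_token)]
    call = json.loads(body)
    function = TOOLS[call["name"]]  # raises KeyError for an unknown tool
    params = dict(call.get("parameters", {}))
    # The sample response passes n as the string "10", so coerce it to int.
    if "n" in params:
        params["n"] = int(params["n"])
    return function(**params)


response = (
    '<|python_tag|>{"type": "function", "name": "trending_songs", '
    '"parameters": {"n": "10", "genre": "all"}}<|eom_id|>'
)
songs = dispatch_json_call(response)
print(songs)
```

Keeping the registry explicit means the model can only trigger functions you have chosen to expose.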


                               Example of a user defined tool calling                               


                                   <function> based tool calling                                    

Here is an example of how you could also write custom instructions for model to do zero shot tool   
calling. In this example, we define a custom tool calling format using the <function> tag.          

                                        Input Prompt Format                                         

                                                                                                    
 <|begin_of_text|><|start_header_id|>system<|end_header_id|>                                        
                                                                                                    
 Environment: ipython                                                                               
                                                                                                    
 Cutting Knowledge Date: December 2023                                                              
 Today Date: 21 September 2024                                                                      
                                                                                                    
 You are a helpful assistant.                                                                       
 <|eot_id|><|start_header_id|>user<|end_header_id|>                                                 
                                                                                                    
 You have access to the following functions:                                                        
                                                                                                    
 Use the function 'trending_songs' to 'Returns the trending songs on a Music site':                 
 {"name": "trending_songs", "description": "Returns the trending songs on a Music site",            
 "parameters": {"genre": {"description": "The genre of the songs to return", "param_type": "str",   
 "required": false}, "n": {"description": "The number of songs to return", "param_type": "int",     
 "required": true}}}                                                                                
                                                                                                    
 Think very carefully before calling functions.                                                     
 If you choose to call a function ONLY reply in the following format with no prefix or suffix:      
                                                                                                    
 <function=example_function_name>{"example_name": "example_value"}</function>                       
                                                                                                    
 Reminder:                                                                                          
 - If looking for real time information use relevant functions before falling back to brave_search  
 - Function calls MUST follow the specified format, start with <function= and end with </function>  
 - Required parameters MUST be specified                                                            
 - Only call one function at a time                                                                 
 - Put the entire function call reply on one line<|eot_id|><|start_header_id|>user<|end_header_id|> 
                                                                                                    
 Use tools to get latest trending songs<|eot_id|><|start_header_id|>assistant<|end_header_id|>      
                                                                                                    

                                       Model Response Format                                        

                                                                                                    
 <function=trending_songs>{"n": 10}</function><|eot_id|>                                            
                                                                                                    

  • In this case, model does NOT respond with <|python_tag|> and ends with <|eot_id|>
  • Instructions for tools added as a user message
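
Because this custom format is plain text, a regular expression is enough to extract the call. The following is a minimal sketch, assuming the model followed the instructions and emitted exactly one `<function=...>` block with a JSON body; `parse_function_tag` is a hypothetical helper name.

```python
import json
import re
from typing import Optional, Tuple

# Captures the function name and the JSON argument body between the tags.
FUNCTION_CALL_RE = re.compile(r"<function=(\w+)>(.*?)</function>", re.DOTALL)


def parse_function_tag(raw: str) -> Optional[Tuple[str, dict]]:
    """Return (function_name, arguments), or None if no function call is present."""
    match = FUNCTION_CALL_RE.search(raw)
    if match is None:
        return None
    name, args_json = match.groups()
    return name, json.loads(args_json)


print(parse_function_tag('<function=trending_songs>{"n": 10}</function><|eot_id|>'))
# prints: ('trending_songs', {'n': 10})
```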

Thank You!                                                                                          

Tool Calling: Using the correct Prompt Template

With llama-stack, we have already learned how the model is expected to behave.

If everything is set up correctly, the model should now emit the <|python_tag|> token followed by the actual function call.

This lets you detect tool calls and route them through your own function-calling logic.

Time to test the theory:

[ ]
Assistant: <|python_tag|>brave_search.call(query="Elden Ring sequel release date")
[96]
Assistant: <|python_tag|>wolfram_alpha.call(query="square root of 23131231")

Using this knowledge in practice

A common misconception about tool calling is that the model executes the tool call itself and returns the output.

This is NOT TRUE: the actual tool call is something that you have to implement. With this knowledge, let's see how we can use brave search to answer our original question.

[97]
[98]
Assistant: <|python_tag|>wolfram_alpha.call(query="square root of 23131231")
[99]
<|python_tag|>wolfram_alpha.call(query="square root of 23131231")
[102]
Function name: wolfram_alpha
Method: call
Args: "square root of 23131231"
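
One possible way to get from the raw model output to the three fields shown above is a small regex parser. This is a sketch, not the notebook's original code: `parse_builtin_call` is a hypothetical helper, and it assumes the built-in tools are always invoked as `tool_name.call(query="...")`.

```python
import re
from typing import Optional

PYTHON_TAG = "<|python_tag|>"
# Matches calls such as: wolfram_alpha.call(query="...")
BUILTIN_CALL_RE = re.compile(r'(\w+)\.(\w+)\(query="(.*)"\)', re.DOTALL)


def parse_builtin_call(raw: str) -> Optional[dict]:
    """Split a built-in tool call into function name, method and query."""
    text = raw.strip()
    if not text.startswith(PYTHON_TAG):
        return None  # the model answered directly instead of calling a tool
    match = BUILTIN_CALL_RE.search(text[len(PYTHON_TAG):])
    if match is None:
        return None
    function_name, method, query = match.groups()
    return {"function_name": function_name, "method": method, "query": query}


call = parse_builtin_call(
    '<|python_tag|>wolfram_alpha.call(query="square root of 23131231")'
)
print(call)
```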

You can implement this in different ways, but the idea is the same: the LLM produces an output that starts with <|python_tag|>, and your program uses that as the signal to invoke its tool-calling mechanism.

The program handles this logic and then passes the tool's output back to the model so that it can answer the user.
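
The round trip described here can be sketched as follows. Everything in this block is an assumption made for illustration: `fake_wolfram_alpha` is a stub rather than a real client, the user question is made up, and the `ipython` role is the one the Llama 3.1 prompt format uses for tool output (a hosted API may expect a different role name).

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def fake_wolfram_alpha(query: str) -> str:
    """Hypothetical stand-in for a real Wolfram Alpha client."""
    return f"stub result for: {query}"


# Registry mapping tool names emitted by the model to local callables.
TOOLS: Dict[str, Callable[[str], str]] = {"wolfram_alpha": fake_wolfram_alpha}


def append_tool_result(messages: List[Message], raw_call: str,
                       tool_name: str, query: str) -> List[Message]:
    """Run the requested tool and return the conversation extended with its output."""
    if tool_name not in TOOLS:
        raise KeyError(f"Model requested an unknown tool: {tool_name}")
    result = TOOLS[tool_name](query)
    return messages + [
        {"role": "assistant", "content": raw_call},  # the model's tool request
        {"role": "ipython", "content": result},      # the tool's output
    ]


history = [{"role": "user", "content": "What is the square root of 23131231?"}]
history = append_tool_result(
    history,
    raw_call='<|python_tag|>wolfram_alpha.call(query="square root of 23131231")',
    tool_name="wolfram_alpha",
    query="square root of 23131231",
)
for message in history:
    print(message["role"], "->", message["content"])
```

The extended `history` would then be sent to the model again, which can use the tool output to write the final answer.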

Code interpreter

With the correct prompt template, Llama models can output Python code (as well as code in any other language they have been trained on).

[54]
Assistant: <|python_tag|>import math

# Define the variables
monthly_investment = 400
interest_rate = 0.05
target_amount = 100000

# Calculate the number of months it would take to reach the target amount
months = 0
current_amount = 0
while current_amount < target_amount:
    current_amount += monthly_investment
    current_amount *= 1 + interest_rate / 12  # Compound interest
    months += 1

# Print the result
print(f"It would take {months} months, approximately {months / 12:.2f} years, to reach the target amount of ${target_amount:.2f}.")

Let's validate this by running the code that the model produced:

[55]
It would take 172 months, approximately 14.33 years, to reach the target amount of $100000.00.
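
One way to run model-generated code and capture what it prints is sketched below. `run_model_code` is a hypothetical helper; it uses `exec`, which runs arbitrary code with your process's permissions, so outside of a tutorial you would want to execute model output in a sandbox instead.

```python
import contextlib
import io

PYTHON_TAG = "<|python_tag|>"


def run_model_code(raw: str) -> str:
    """Execute Python emitted by the model and return whatever it printed."""
    code = raw.strip()
    if code.startswith(PYTHON_TAG):
        code = code[len(PYTHON_TAG):]
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # A fresh globals dict keeps the generated code away from our own variables.
        exec(code, {"__name__": "__model_code__"})
    return buffer.getvalue()


print(run_model_code("<|python_tag|>import math\nprint(math.isqrt(49))"))
```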

3.2 Models Custom Tool Prompt Format

Life is great because the Llama team writes great docs for us, so we can conveniently copy-pasta examples from there :)

Here are the docs that we will be using, for your reference.

Exercise for viewer: use llama-toolchain again to verify the prompt format, as we did earlier, and then start prompt engineering for the small Llamas.

[3]
[4]
[5]

Note: We are assuming the following structure for the dataset:

  • Name
  • Email
  • Age
  • Color request
[6]
Assistant: [get_user_info(user_id=7890, special='black')]

Dummy dataset to make sure our model stays happy :)

[7]
[8]
[{'name': 'Emma Davis',
  'email': 'emma@example.com',
  'age': 31,
  'special_info': 'Special request: black'}]

Handling Tool-Calling logic for the model

Hello Regex, my good old friend :)

With regex, we can write a simple handler for tool calling that returns either the model's response or the tool call result.
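
A handler along these lines might look like the sketch below. It is not the notebook's original cell: `handle_response` and `parse_kwargs` are hypothetical names, the `get_user_info` stub only echoes its arguments, and the pattern assumes the model emits a single call in the `[func_name(arg=value, ...)]` form shown earlier.

```python
import ast
import json
import re
from typing import Any, Callable, Dict

# Matches a single call in the form [func_name(arg=value, ...)]
TOOL_CALL_RE = re.compile(r"^\[(\w+)\((.*)\)\]$", re.DOTALL)


def parse_kwargs(arg_string: str) -> Dict[str, Any]:
    """Parse "a=1, b='x'" into a dict; only literal values are accepted."""
    call = ast.parse(f"f({arg_string})", mode="eval").body
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


def handle_response(response: str, tools: Dict[str, Callable[..., Any]]) -> str:
    """Return the tool result for a tool call, or the response unchanged otherwise."""
    match = TOOL_CALL_RE.match(response.strip())
    if match is None or match.group(1) not in tools:
        return response  # ordinary model answer, pass it through
    name, arg_string = match.groups()
    result = tools[name](**parse_kwargs(arg_string))
    return "Function call result: " + json.dumps(result, indent=2)


def get_user_info(user_id: int, special: str = "none") -> dict:
    """Stub tool used only for this example."""
    return {"user_id": user_id, "special_info": f"Special request: {special}"}


print(handle_response("[get_user_info(user_id=7890, special='black')]",
                      {"get_user_info": get_user_info}))
```

Parsing the arguments with `ast.literal_eval` instead of `eval` means the handler accepts only plain literals, so text produced by the model is never evaluated as arbitrary code.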

[9]
[10]
Assistant: Function call result: {
  "name": "Emma Davis",
  "email": "emma@example.com",
  "age": 31,
  "special_info": "Special request: black"
}
[56]