Build an agent with tool-calling superpowers 🦸 using smolagents
Authored by: Aymeric Roucher
This notebook demonstrates how you can use smolagents to build awesome agents!
What are agents? Agents are systems that are powered by an LLM and enable the LLM (with careful prompting and output parsing) to use specific tools to solve problems.
These tools are basically functions that the LLM couldn't perform well by itself: for instance, for a text-generation LLM like Llama-3-70B, this could be an image-generation tool, a web-search tool, a calculator...
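The core idea can be sketched in plain Python (hypothetical names, no LLM involved): the model's parsed output names a tool and its arguments, and the agent loop dispatches the call to the matching function.

```python
# Minimal sketch of tool dispatch, stdlib only (hypothetical helper
# names): the LLM emits a tool name plus arguments, and the agent
# routes the call to the corresponding Python function.

def calculator(expression: str) -> str:
    # A "tool" is just a function for something the LLM can't do
    # reliably itself. (eval is for illustration only.)
    return str(eval(expression, {"__builtins__": {}}))

def web_search(query: str) -> str:
    # Stub standing in for a real search tool.
    return f"Top result for: {query}"

TOOLS = {"calculator": calculator, "web_search": web_search}

def dispatch(tool_call: dict) -> str:
    # tool_call mimics the parsed LLM output:
    # {"name": ..., "arguments": {...}}
    return TOOLS[tool_call["name"]](**tool_call["arguments"])

print(dispatch({"name": "calculator", "arguments": {"expression": "21 * 2"}}))
```

Real frameworks add prompting, output parsing, and error handling around this loop; the dispatch step itself stays this simple.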
What is smolagents? It's a library that provides building blocks to build your own agents! Learn more about it in the documentation.
Let's see how to use it, and which use cases it can solve.
Run the line below to install required dependencies:
Let's log in in order to call the HF Inference API:
1. 🖼️ Multimodal + 🌐 Web-browsing assistant
For this use case, we want to show an agent that browses the web and is able to generate images.
To build it, we simply need to have two tools ready: image generation and web search.
- For image generation, we load a tool from the Hub that uses the HF Inference API (Serverless) to generate images using Stable Diffusion.
- For the web search, we use a built-in tool.
```python
from smolagents import Tool
from huggingface_hub import InferenceClient


class TextToImageTool(Tool):
    description = "This tool creates an image according to a prompt, which is a text description."
    name = "image_generator"
    inputs = {"prompt": {"type": "string", "description": "The image generator prompt. Don't hesitate to add details in the prompt to make the image look better, like 'high-res, photorealistic', etc."}}
    output_type = "image"
    model_sdxl = "black-forest-labs/FLUX.1-schnell"
    client = InferenceClient(model_sdxl)

    def forward(self, prompt):
        return self.client.text_to_image(prompt)
```
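The tool's `name`, `description`, and `inputs` attributes are what the agent shows the LLM so it knows what it can call. A plain-Python sketch of that rendering step (an assumed format, not smolagents' exact prompt template):

```python
# Sketch (assumed format, not smolagents' actual template): a tool's
# name, description, and input schema are rendered into a text line
# that goes into the LLM's system prompt.

tool_spec = {
    "name": "image_generator",
    "description": "This tool creates an image according to a prompt.",
    "inputs": {"prompt": {"type": "string", "description": "The image generator prompt."}},
}

def render_tool(spec: dict) -> str:
    # Turn the input schema into a signature-like string.
    args = ", ".join(f"{k}: {v['type']}" for k, v in spec["inputs"].items())
    return f"- {spec['name']}({args}): {spec['description']}"

print(render_tool(tool_spec))
```

This is why the `description` and input descriptions matter so much: they are the only interface the LLM has to the tool.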

2. 📚💬 RAG with Iterative query refinement & Source selection
Quick definition: Retrieval-Augmented Generation (RAG) is "using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base".
This method has many advantages over using a vanilla or fine-tuned LLM: to name a few, it grounds the answer in true facts and reduces confabulations, it lets you provide the LLM with domain-specific knowledge, and it gives fine-grained control over access to information from the knowledge base.
- Now let's say we want to perform RAG, but with the additional constraint that some parameters must be dynamically generated. For example, depending on the user query we could want to restrict the search to specific subsets of the knowledge base, or we could want to adjust the number of documents retrieved. The difficulty is: how to dynamically adjust these parameters based on the user query?
- A frequent failure case of RAG is when the retrieval based on the user query does not return any relevant supporting documents. Is there a way to iterate by re-calling the retriever with a modified query in case the previous results were not relevant?
🧠 Well, we can solve the points above in a simple way: we will give our agent control over the retriever's parameters!
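The refinement loop this enables can be sketched in plain Python (toy in-memory corpus, hypothetical `retrieve`/`agentic_retrieve` helpers): start with restrictive parameters, and if too few documents come back, relax them and retry.

```python
# Sketch of agentic retrieval with iterative refinement (stub
# retriever, hypothetical names): the agent first searches narrowly,
# then widens the sources and document count if results are thin.

DOCS = [
    {"source": "transformers", "text": "How to fine-tune a model"},
    {"source": "diffusers", "text": "How to fine-tune a diffusion model"},
    {"source": "blog", "text": "Release notes"},
]

def retrieve(query: str, sources: list, k: int) -> list:
    # Toy keyword match standing in for vector similarity search.
    hits = [d["text"] for d in DOCS
            if d["source"] in sources and query in d["text"]]
    return hits[:k]

def agentic_retrieve(query: str) -> list:
    sources, k = ["transformers"], 1          # first try: narrow search
    results = retrieve(query, sources, k)
    if len(results) < 2:                      # not enough evidence: relax
        sources, k = ["transformers", "diffusers", "blog"], 5
        results = retrieve(query, sources, k)
    return results

print(agentic_retrieve("fine-tune"))
```

In the real setup, the LLM itself decides when and how to relax the parameters; this loop just shows the mechanics.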
➡️ Let's show how to do this. We first load a knowledge base on which we want to perform RAG: this dataset is a compilation of the documentation pages for many Hugging Face packages, stored as markdown.
Now we prepare the knowledge base by processing the dataset and storing it into a vector database to be used by the retriever. We are going to use LangChain, since it features excellent utilities for vector databases:
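The preprocessing boils down to cutting each document into overlapping chunks before embedding them into the vector store. A pure-Python sketch of that step (LangChain's text splitters do this more carefully, e.g. splitting on markdown structure):

```python
# Pure-Python sketch of document chunking (LangChain's splitters are
# smarter about boundaries): fixed-size windows with a small overlap
# so that context isn't lost at chunk edges.

def chunk(text: str, size: int = 200, overlap: int = 20) -> list:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "# Transformers\n" + "Documentation text. " * 50
chunks = chunk(doc)
print(len(chunks), all(len(c) <= 200 for c in chunks))
```

Each chunk is then embedded (here with `thenlper/gte-small`) and stored in the vector database alongside its source metadata.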
Now that we have the database ready, let's build a RAG system that answers user queries based on it!
We want our system to select only from the most relevant sources of information, depending on the query.
Our documentation pages come from the following sources:
['datasets-server', 'datasets', 'optimum', 'gradio', 'blog', 'course', 'hub-docs', 'pytorch-image-models', 'peft', 'evaluate', 'diffusers', 'hf-endpoints-documentation', 'deep-rl-class', 'transformers']
👉 Now let's build a RetrieverTool that our agent can leverage to retrieve information from the knowledge base.
Since we need to attach the vector database as an attribute of the tool, we cannot simply use the @tool decorator: instead, we will follow the advanced setup highlighted in the advanced agents documentation.
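The pattern looks like this in plain Python (a sketch of the class-based tool shape, with a stub database instead of a real vector store; smolagents is not required to see the idea):

```python
# Pattern sketch (plain Python, stub database): because the tool needs
# the vector database as state, it is written as a class whose
# constructor stores the db, instead of a stateless @tool function.

class RetrieverTool:
    name = "retriever"
    description = "Retrieves documents from the knowledge base."

    def __init__(self, vectordb, all_sources):
        self.vectordb = vectordb          # stored as an attribute
        self.all_sources = all_sources

    def forward(self, query, source=None):
        # The agent can restrict the search to one source, or search all.
        sources = [source] if source in self.all_sources else self.all_sources
        return self.vectordb.search(query, sources)

class FakeDB:
    # Stand-in for the real vector store.
    def search(self, query, sources):
        return [f"{s}: doc about {query}" for s in sources]

tool = RetrieverTool(FakeDB(), ["transformers", "blog"])
print(tool.forward("LoRA", source="transformers"))
```

In the real tool, `forward` runs a similarity search against the vector database and the `source` argument maps to a metadata filter.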
Optional: Share your Retriever tool to Hub
To share your tool to the Hub, first copy-paste the code in the RetrieverTool definition cell to a new file named for instance retriever.py.
When the tool is loaded from a separate file, you can then push it to the Hub using the code below (make sure to log in with a write-access token).
Run the agent!
What happened here? First, the agent launched the retriever with specific sources in mind (['transformers', 'blog']).
But this retrieval did not yield enough results. No problem: the agent could iterate on its previous results, so it simply re-ran the retrieval with less restrictive search parameters. This time, the search was successful!
Note that using an LLM agent that calls a retriever as a tool and can dynamically modify the query and other retrieval parameters is a more general formulation of RAG, which also covers many RAG improvement techniques like iterative query refinement.
3. 💻 Debug Python code
Since the CodeAgent has a built-in Python code interpreter, we can use it to debug our faulty Python script!
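The kind of bug we hand the agent might look like this (an illustrative example; the notebook's actual faulty script is not shown here): the loop bound overruns the list, so the last iteration raises an IndexError.

```python
# Illustrative faulty snippet (hypothetical example): the loop bound
# overruns the list, so iteration i == 3 raises IndexError.

numbers = [0, 1, 2]
printed, error = [], None
try:
    for i in range(4):                 # bug: should be range(len(numbers))
        printed.append(numbers[i])
except IndexError as exc:
    error = str(exc)                   # "list index out of range"

print(printed, "->", error)
```

The agent runs this in its interpreter, sees the IndexError, and rewrites the loop bound.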
As you can see, the agent tried the given code, got an error, analyzed the error, corrected the code, and returned it after verifying that it works!
And the final code is the corrected code:
```python
numbers = [0, 1, 2]
for i in range(len(numbers)):
    print(numbers[i])
```
➡️ Conclusion
The use cases above should give you a glimpse into the possibilities of our Agents framework!
For more advanced usage, read the documentation.
All feedback is welcome; it will help us improve the framework! 🚀