Working With Charts Graphs And Slide Decks
Copyright 2025 Google LLC.
Gemini API: Working with Charts, Graphs, and Slide Decks
Configure your API key
To run the following cell, your API key must be stored in a Colab Secret named GOOGLE_API_KEY. If you don't already have an API key, or you're not sure how to create a Colab Secret, see Authentication for an example.
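A minimal sketch of reading the key, assuming the `userdata` helper that `google.colab` provides for Colab Secrets. The function name `load_api_key` and the environment-variable fallback for running outside Colab are illustrative additions, not part of the original notebook.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from a Colab Secret, or from the environment."""
    try:
        # Available only inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    # Assumption: outside Colab, the key is exported as an environment variable.
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key
```

The returned string can then be passed to whichever Gemini client library you use.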
Setup
You will be using images from Priyanka Vergadia's GCPSketchnote repository. These pages contain many details that should provide a good benchmark for Gemini's capabilities.
These images are licensed under the Creative Commons Attribution 4.0 International Public License.
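Once the repository is available locally, it can help to check how many images you have to work with. The sketch below assumes the repository has already been cloned into a directory named `GCPSketchnote`; the helper name `list_images` and the set of file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: these are the image formats worth collecting.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def list_images(root):
    """Return image files found anywhere under root, sorted for a stable order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For example, `len(list_images("GCPSketchnote"))` gives the number of images found in the clone.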
Interpreting a single chart
The image needs to be converted to .jpg, since the Gemini API does not currently support .gif.
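The conversion can be sketched with Pillow as follows. The function names are hypothetical, and writing the .jpg next to the source file is an assumption. Pillow is imported lazily so the path helper works without it installed.

```python
from pathlib import Path


def jpg_path_for(src):
    """Return the source path with its extension replaced by .jpg."""
    return Path(src).with_suffix(".jpg")


def save_as_jpg(image, dst):
    """Save a Pillow-style image as JPEG.

    GIFs are usually palette-based ("P" mode), which JPEG cannot store,
    so the image is converted to RGB first.
    """
    image.convert("RGB").save(dst, "JPEG")
    return dst


def gif_to_jpg(src):
    """Open a GIF with Pillow and write a .jpg next to it."""
    from PIL import Image  # requires the Pillow package

    dst = jpg_path_for(src)
    with Image.open(src) as image:
        return save_as_jpg(image, dst)
```

For an animated GIF, this saves only the first frame, which is usually what you want for a static chart.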
Now, you will define helper functions for shrinking the image and querying the model with images.
NOTE: In this example you will use the Pillow library to load images, but using Image from IPython.display, or a dictionary with mime_type and data fields, will also work.
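The helpers could look roughly like the sketch below. It makes several assumptions: the 800-pixel height limit is arbitrary, all function names are illustrative, and `model` is any object exposing `generate_content(list)` whose response has a `.text` attribute, as `GenerativeModel` does in the `google-generativeai` SDK. Other client libraries use a different call shape, so adapt `query_model` accordingly.

```python
def scaled_size(width, height, max_height=800):
    """Return (width, height) scaled down to max_height, keeping aspect ratio."""
    if height <= max_height:
        return width, height
    ratio = max_height / height
    return max(1, round(width * ratio)), max_height


def shrink_image(image, max_height=800):
    """Resize a Pillow-style image (has .size and .resize) if it is too tall."""
    new_size = scaled_size(*image.size, max_height=max_height)
    return image if new_size == image.size else image.resize(new_size)


def image_part(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build the dictionary form with mime_type and data fields."""
    return {"mime_type": mime_type, "data": data}


def query_model(model, prompt, images, max_height=800):
    """Send a prompt plus images to a model and return the response text.

    Assumption: `model` exposes generate_content(list) and the response
    has a .text attribute.
    """
    parts = [prompt]
    for item in images:
        # Dictionaries are passed through; image objects are shrunk first.
        if isinstance(item, dict):
            parts.append(item)
        else:
            parts.append(shrink_image(item, max_height))
    return model.generate_content(parts).text
```

Shrinking before sending keeps request sizes down. Whether 800 pixels preserves enough detail depends on how dense the chart is, so treat the limit as a starting point.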
Now, let's see how the LLM can handle the following query.
Extracting information from a single slide
You will use the model to extract information from a single slide, in this case a diagram describing Pub/Sub. This is not a complicated use case, but it shows how to call the model.
You need to download an example chart.
Start with something simple:
You can also use it to extract information from specific parts of the image:
Slide Decks
While most models can receive only a handful of images at once, the Gemini Flash model can receive up to 3,600 images in a single request. This means that most slide decks can be passed to the model without any splitting.
In this case you will use the LLM to create a set of questions that test knowledge of GCP products:
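Passing a whole deck amounts to sending the prompt followed by every slide in one request. The sketch below only assembles that payload. The name `build_deck_request`, the wording of `QUIZ_PROMPT`, and the up-front check against the 3,600-image limit are illustrative assumptions, not the notebook's original code.

```python
# Hypothetical prompt; adjust the wording and question count as needed.
QUIZ_PROMPT = (
    "Based on these slides, write a set of questions that test "
    "knowledge of the GCP products they describe."
)


def build_deck_request(prompt, slides, max_images=3600):
    """Return one request payload: the prompt followed by every slide."""
    slides = list(slides)
    if len(slides) > max_images:
        raise ValueError(
            f"{len(slides)} images exceeds the limit of {max_images}"
        )
    return [prompt, *slides]
```

The resulting list can be handed to the model in a single call, with each slide given as a loaded image or a dictionary with mime_type and data fields.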
Summary
The Gemini API's ability to process images such as charts, graphs, and slide decks highlights the power of multimodal LLMs. Because the model can read and interpret these visual elements, you can extract insights from them, simplify tasks, and save time.
Imagine the impact of using the Gemini API to build AI solutions that describe surroundings for people with disabilities, making technology more inclusive and accessible to all.
This is just one of the exciting possibilities. Now, it's your turn to explore Gemini further!