Copyright 2025 Google LLC.

Gemini API: Working with Charts, Graphs, and Slide Decks

Gemini models are powerful multimodal LLMs that can process both text and image inputs.

This notebook shows how the Gemini Flash model can extract data from various kinds of images, such as charts, graphs, and slide decks.

Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named GOOGLE_API_KEY. If you don't already have an API key, or you're not sure how to create a Colab Secret, see Authentication for an example.
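
In Colab the key is read through the `google.colab.userdata` module. The sketch below is a hypothetical helper, `load_api_key`, that also falls back to an environment variable with the same name, so the same code runs outside Colab. The fallback is an assumption and is not part of the original notebook.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from a Colab Secret, falling back to an environment variable.

    Hypothetical helper: the environment-variable fallback is an assumption.
    """
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata

        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, the secret is missing, or notebook access was not granted.
        pass
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} was not found in Colab Secrets or the environment")
    return key
```

You could then pass the returned key to the client. With the `google-genai` SDK, for example, this would be `genai.Client(api_key=load_api_key())`.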

Setup

You will be using images from Priyanka Vergadia's GCPSketchnote repository. These sketchnotes are packed with details, which makes them a good benchmark for Gemini's capabilities.

These images are licensed under the Creative Commons Attribution 4.0 International Public License.
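
After the repository is cloned into a local `GCPSketchnote` directory, you need a list of the image files inside it. The helper below, `list_images`, is a minimal hypothetical sketch of that step. The set of extensions it treats as images is an assumption about the repository's contents.

```python
from pathlib import Path
from typing import List

# Extensions treated as images; an assumption about what the repository contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def list_images(root: str) -> List[Path]:
    """Recursively collect image files under `root`, sorted so the order is stable."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Usage: `images = list_images("GCPSketchnote")`. Sorting the results means each re-run of the notebook sends the images in the same order.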

Interpreting a single chart

The image needs to be converted to JPEG, because the Gemini API does not currently support GIF images.
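
To decide which downloaded files need this conversion, you can check the leading "magic" bytes instead of relying on the file extension. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the sketch recognizes only a small subset of the formats the API accepts.

```python
from typing import Optional

# Leading "magic" bytes for a few common formats. JPEG, PNG and WEBP are among the
# formats the Gemini API accepts; GIF is the format this section says is unsupported.
_SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)
_SUPPORTED = {"image/jpeg", "image/png", "image/webp"}


def sniff_mime_type(data: bytes) -> Optional[str]:
    """Guess an image MIME type from the first bytes of a file; None if unrecognized."""
    # WEBP is a RIFF container: b"RIFF" + 4-byte size + b"WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


def needs_jpeg_conversion(data: bytes) -> bool:
    """True when the bytes are not one of the formats this sketch treats as supported."""
    return sniff_mime_type(data) not in _SUPPORTED
```

Pillow can convert files flagged this way, for example with `Image.open(path).convert("RGB").save(out_path, "JPEG")`. The `convert("RGB")` call is needed because GIFs are palette-based and Pillow cannot write palette images as JPEG.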

Now, you will define helper functions for shrinking the image and querying the model with images.
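
Shrinking an image comes down to computing new dimensions that keep the aspect ratio. Here is a minimal sketch of that arithmetic. The `max_side` default of 1024 pixels is an arbitrary assumption, not a value taken from the notebook.

```python
from typing import Tuple


def shrink_size(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most `max_side`.

    The aspect ratio is preserved and images that already fit are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `shrink_size(2000, 1000)` returns `(1024, 512)`. Pillow's `Image.thumbnail((max_side, max_side))` performs a similar aspect-preserving, no-upscale resize in place, although its rounding may differ by a pixel.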

NOTE: In this example you will use the Pillow library to load images, but loading them with Image from IPython.display, or passing a dictionary with mime_type and data fields, will also work.
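
The dictionary option can be built straight from a file on disk. The following is a minimal sketch. The field names `mime_type` and `data` come from the note, but `image_as_inline_data` is a hypothetical helper. The exact shape an SDK expects inside `contents` may differ between SDK versions, so treat that part as an assumption.

```python
import mimetypes
from pathlib import Path


def image_as_inline_data(path: str) -> dict:
    """Return a {"mime_type": ..., "data": ...} dict holding an image file's raw bytes."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    return {"mime_type": mime_type, "data": Path(path).read_bytes()}
```

The MIME type is guessed from the file extension. A file that was converted to JPEG should therefore also be saved with a `.jpg` extension.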

Now, let's see how the LLM can handle the following query.

Extracting information from a single slide

You will use the model to extract information from a single slide; in this case, a graph describing Pub/Sub. It is not a complicated use case, but it shows how you can call the model.

You need to download an example chart.
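
Here is a minimal standard-library sketch of the download step. The `download` helper is hypothetical, the chart's URL is not reproduced here, and skipping files that already exist is a design choice so that re-running the cell does not fetch the file again.

```python
import urllib.request
from pathlib import Path


def download(url: str, dest: str) -> Path:
    """Fetch `url` into `dest`, skipping the download if `dest` already exists."""
    path = Path(dest)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(path))
    return path
```

The skip-if-exists check plays the same role as the "destination path already exists" guard you may see when re-running a `git clone` cell.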

Start with something simple:

You can also use it to extract information from specific parts of the image:
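
One way to target part of an image is to name that region in the prompt. Another way, which is an assumption here and not taken from the notebook, is to crop the region yourself and send only the crop. The sketch below computes the pixel box of one cell in a rows × cols grid. The box uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects.

```python
from typing import Tuple


def grid_cell_box(
    width: int, height: int, rows: int, cols: int, row: int, col: int
) -> Tuple[int, int, int, int]:
    """Return the (left, upper, right, lower) box of one grid cell; row and col are 0-based."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise IndexError("cell is outside the grid")
    # Integer division keeps neighbouring cells edge-to-edge with no gaps or overlaps.
    left = col * width // cols
    right = (col + 1) * width // cols
    upper = row * height // rows
    lower = (row + 1) * height // rows
    return left, upper, right, lower
```

For example, `image.crop(grid_cell_box(*image.size, 2, 2, 0, 1))` would keep the top-right quarter of a Pillow image.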

Slide Decks

While most models can receive only a handful of images at once, the Gemini Flash model can receive up to 3,600 images in a single request. This means most slide decks can be passed to the model whole, without splitting them.
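
For decks that exceed the per-request limit, the images can be split into consecutive batches. Here is a minimal sketch that uses the 3,600-image figure quoted in this section. The overall request size may also limit how many images fit in practice, which this sketch does not check.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Per-request image limit quoted in this section.
MAX_IMAGES_PER_REQUEST = 3600


def batches(items: Sequence[T], size: int = MAX_IMAGES_PER_REQUEST) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Most decks produce a single batch, so the batching only matters for unusually large collections.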

In this case, you will use the LLM to create a set of questions that test knowledge of GCP products:
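
When many slides go into one request, labeling each image with its position lets the model say which slide a question came from. Here is a minimal sketch, assuming the slides are already loaded as image objects or inline-data dicts. The prompt wording, the `num_questions` parameter, and the helper name are hypothetical and not the notebook's.

```python
from typing import Any, List, Sequence


def build_quiz_contents(slides: Sequence[Any], num_questions: int = 5) -> List[Any]:
    """Interleave a numbered text label with each slide, then append the instruction."""
    contents: List[Any] = []
    for i, slide in enumerate(slides, start=1):
        contents.append(f"Slide {i}:")
        contents.append(slide)
    contents.append(
        f"Using the slides above, write {num_questions} quiz questions that test "
        "knowledge of the GCP products they describe. For each question, give the "
        "number of the slide it is based on."
    )
    return contents
```

With the `google-genai` SDK, the resulting list could be passed as `client.models.generate_content(model=MODEL_ID, contents=build_quiz_contents(images))`.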

Summary

The Gemini API's ability to process images such as charts, graphs, and slide decks highlights the power of multimodal LLMs. Because the model can read and interpret these visual elements, you can extract their data directly, simplify tasks, and save valuable time.

Imagine the impact of using the Gemini API to build AI solutions that describe surroundings for people with disabilities, making technology more inclusive and accessible to all.

This is just one of the exciting possibilities. Now, it's your turn to explore Gemini further!