Google Gemini Get Started LiveAPI

Get Started LiveAPI

quickstartsgemini-cookbookgemini-apigemini

alph-notebooks/gemini-cookbook / Get_started_LiveAPI.ipynb

Export

Run Notebooks

Contents

No cells yet

Add cells to see them here

Copyright 2025 Google LLC.

[ ]

Multimodal Live API - Quickstart

Preview: The Live API is in preview.

This notebook demonstrates simple usage of the Gemini Multimodal Live API. For an overview of new capabilities refer to the Gemini Live API docs.

This notebook implements a simple turn-based chat where you send messages as text, and the model replies with audio. The API is capable of much more than that. The goal here is to demonstrate with simple code.

Some features of the API are not working in Colab, to try them it is recommended to have a look at this Python script and run it locally.

If you aren't looking for code, and just want to try multimedia streaming use Live API in Google AI Studio.

The Next steps section at the end of this tutorial provides links to additional resources.

Native audio output

Info: Gemini 2.5 introduces native audio generation, which directly generates audio output, providing a more natural sounding audio, more expressive voices, more awareness of additional context, e.g., tone, and more proactive responses. You can try a native audio example in this script.

Setup

Install SDK

The new Google Gen AI SDK provides programmatic access to Gemini 2.5 (and previous models) using both the Google AI for Developers and Vertex AI APIs. With a few exceptions, code that runs on one platform will run on both.

More details about this new SDK on the documentation or in the Getting started notebook.

[1]

Note: you may need to restart the kernel to use updated packages.


[notice] A new release of pip is available: 25.1.1 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip

Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named GOOGLE_API_KEY. If you don't already have an API key, or you're not sure how to create a Colab Secret, see Authentication for an example.

[ ]

Initialize SDK client

The client will pick up your API key from the environment variable.

[26]

Select a model

The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output.

[27]

MODEL

Import

Import all the necessary modules.

[28]

Text to Text

The simplest way to use the Live API is as a text-to-text chat interface, but it can do a lot more than this.

[29]

>  Hello? Gemini are you there? 

- Hello
-  there! I am indeed here. How can I help you today?

Simple text to audio

The simplest way to playback the audio in Colab, is to write it out to a .wav file. So here is a simple wave file writer:

[30]

The next step is to tell the model to return audio by setting "response_modalities": ["AUDIO"] in the LiveConnectConfig.

When you get a response from the model, then you write out the data to a .wav file.

[31]

>  Hello? Gemini are you there? 

audio/pcm;rate=24000
................

Towards Async Tasks

The real power of the Live API is that it's real time, and interruptable. You can't get that full power in a simple sequence of steps. To really use the functionality you will move the send and recieve operations (and others) into their own async tasks.

Because of the limitations of Colab this tutorial doesn't totally implement the interactive async tasks, but it does implement the next step in that direction:

It separates the send and receive, but still runs them sequentially.
In the next tutorial you'll run these in separate async tasks.

Setup a quick logger to make debugging easier (switch to setLevel('DEBUG') to see debugging messages).

[32]

The class below implements the interaction with the Live API.

[33]

There are 3 methods worth describing here:

run - The main loop

This method:

Opens a websocket connecting to the Live API.
Calls the initial setup method.
Then enters the main loop where it alternates between send and recv until send returns False.
The next tutorial will demonstrate how to stream media and run these asynchronously.

send - Sends input text to the api

The send method collects input text from the user, wraps it in a client_content message (an instance of BidiGenerateContentClientContent), and sends it to the model.

If the user sends a q this method returns False to signal that it's time to quit.

recv - Collects audio from the API and plays it

The recv method collects audio chunks in a loop and writes them to a .wav file. It breaks out of the loop once the model sends a turn_complete method, and then plays the audio.

To keep things simple in Colab it collects all the audio before playing it. Other examples demonstrate how to play audio as soon as you start to receive it (using PyAudio), and how to interrupt the model (implement input and audio playback on separate tasks).

Run

Run it:

[34]

message > Hello
audio/pcm;rate=24000
....................
<Turn complete>

message > What's your name?
audio/pcm;rate=24000
..........
<Turn complete>

Working with resumable sessions

Session resumption allows you to return to a previous interaction with the Live API by sending the last session handle you got from the previous session.

When you set your session to be resumable, the session information keeps stored on the Live API for up to 24 hours. In this time window, you can resume the conversation and refer to previous information you have shared with the model.

Helper functions

Start by creating the helper functions for your resumable interaction with the Live API. It will include:

[37]

Now you can start interacting with the Live API (type q to finish the conversation):

[38]

{
  "session_resumption_update": {}
}
Hello there! How can I help you today?{
  "server_content": {
    "generation_complete": true
  }
}
{
  "server_content": {
    "turn_complete": true
  },
  "usage_metadata": {
    "prompt_token_count": 9,
    "response_token_count": 10,
    "total_token_count": 19,
    "prompt_tokens_details": [
      {
        "modality": "TEXT",
        "token_count": 9
      }
    ],
    "response_tokens_details": [
      {
        "modality": "TEXT",
        "token_count": 10
      }
    ]
  }
}
{
  "session_resumption_update": {
    "new_handle": "Cig2N3lqa3d3MXd4eHFoeDk3cnhmeHUydjlhdHN2cms1bDRnc3c0N2Zq",
    "resumable": true
  }
}
1:00
{
  "session_resumption_update": {}
}
The capital of Brazil is **Brasília**.{
  "server_content": {
    "generation_complete": true
  }
}
{
  "server_content": {
    "turn_complete": true
  },
  "usage_metadata": {
    "prompt_token_count": 36,
    "response_token_count": 9,
    "total_token_count": 45,
    "prompt_tokens_details": [
      {
        "modality": "TEXT",
        "token_count": 36
      }
    ],
    "response_tokens_details": [
      {
        "modality": "TEXT",
        "token_count": 9
      }
    ]
  }
}
{
  "session_resumption_update": {
    "new_handle": "Cig0ZDR1OTViNHVjOWh6aGJvMmhwdWk3NzJiZWRwYW91bnNtajgxZHN1",
    "resumable": true
  }
}

With the session resumption you have the session handle to refer to your previous sessions. In this example, the handle is saved at the last_handle variable as below:

[39]

'Cig0ZDR1OTViNHVjOWh6aGJvMmhwdWk3NzJiZWRwYW91bnNtajgxZHN1'

Now you can start a new Live API session, but this time pointing to a handle from a previous session. Also, to test you could gather information from the previous session, you will ask the model what was the second question you asked before (in this example, it was "what is the capital of Brazil?"). You can see the Live API recovering that information:

[40]

{
  "session_resumption_update": {}
}
The last question you asked was: "what is the capital of brazil?"{
  "server_content": {
    "generation_complete": true
  }
}
{
  "server_content": {
    "turn_complete": true
  },
  "usage_metadata": {
    "prompt_token_count": 63,
    "response_token_count": 15,
    "total_token_count": 78,
    "prompt_tokens_details": [
      {
        "modality": "TEXT",
        "token_count": 63
      }
    ],
    "response_tokens_details": [
      {
        "modality": "TEXT",
        "token_count": 15
      }
    ]
  }
}
{
  "session_resumption_update": {
    "new_handle": "CihyNDg4YTkxanl5cThzYmo4a29lMHRveDJlY3U1amRyNHlqeWF0bWU2",
    "resumable": true
  }
}

Next steps

This tutorial just shows basic usage of the Live API, using the Python GenAI SDK.

If you aren't looking for code, and just want to try multimedia streaming use Live API in Google AI Studio.
If you want to see how to setup streaming interruptible audio and video using the Live API see the Audio and Video input Tutorial.
If you're interested in the low level details of using the websockets directly, see the websocket version of this tutorial.
Try the Tool use in the live API tutorial for an walkthrough of Gemini-2.5's new use capabilities.
There is a Streaming audio in Colab example, but this is more of a demo, it's not optimized for readability.
Other nice Gemini 2.5 examples can also be found in the Cookbook's example directory, in particular the video understanding and the spatial understanding ones.