Run OpenAI gpt-oss 20B in a FREE Google Colab

OpenAI released gpt-oss-120b and gpt-oss-20b. Both models are licensed under Apache 2.0.

Specifically, gpt-oss-20b is designed for lower latency and local or specialized use cases (21B total parameters, 3.6B of them active).

Since the models were trained with native MXFP4 quantization, the 20B model is easy to run even in resource-constrained environments like Google Colab.

Authored by: Pedro and VB

Set up the environment

Since support for mxfp4 in transformers is bleeding edge, we need a recent version of PyTorch and CUDA to be able to install the mxfp4 triton kernels.

We also need to install transformers from source, and we uninstall torchvision and torchaudio to avoid dependency conflicts.

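The cell below sketches one plausible install sequence based on the description above; the exact commands and version pins in the original notebook may differ, so treat this as a starting point.

```python
# Upgrade to a recent PyTorch build (Colab GPU runtimes already ship with CUDA).
!pip install -q --upgrade torch

# Triton plus the Hugging Face kernels package, needed for the mxfp4 triton kernels.
!pip install -q triton kernels

# Install transformers from source.
!pip install -q git+https://github.com/huggingface/transformers

# Uninstall torchvision and torchaudio to avoid dependency conflicts
# with the upgraded torch.
!pip uninstall -q -y torchvision torchaudio
```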

Please restart your Colab runtime session after installing the packages above.

Load the model from Hugging Face in Google Colab

We load the openai/gpt-oss-20b checkpoint from the Hugging Face Hub.

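A minimal loading sketch; torch_dtype="auto" and device_map="auto" are our choices for a single-GPU Colab runtime, not requirements of the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native (MXFP4-quantized) weights
    device_map="auto",   # place the model on the available GPU
)
```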

Set up messages / chat

You can provide an optional system prompt, or pass the user input directly.

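For example (the system prompt and question below are purely illustrative):

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},  # optional
    {"role": "user", "content": "What is the weather like in Madrid?"},
]

# Render the chat template and move the tensors to the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```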

Specify Reasoning Effort

Simply pass reasoning_effort as an additional argument to apply_chat_template(). Supported values are "low", "medium" (default), or "high".

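A sketch of the same generation call with the reasoning effort set explicitly, reusing the messages from the previous cell:

```python
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",  # "low", "medium" (default), or "high"
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```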

Try out other prompts and ideas!

Check out our blog post for more ideas: https://hf.co/blog/welcome-openai-gpt-oss

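For instance, reuse the cells above with a prompt of your own (the one below is just an illustration):

```python
messages = [
    {"role": "user", "content": "Write a haiku about open-weight language models."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```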