GPT-5 New Params and Tools
We’re introducing new developer controls in the GPT-5 series that give you finer control over model responses, from shaping output length and style to enforcing strict formatting. Below is a quick overview of the latest features:
| # | Feature | Overview | Values / Usage |
|---|---|---|---|
| 1. | Verbosity Parameter | Lets you hint the model to be more or less expansive in its replies. Keep prompts stable and use the parameter instead of re-writing. | • low → terse UX, minimal prose. • medium (default) → balanced detail. • high → verbose, great for audits, teaching, or hand-offs. |
| 2. | Freeform Function Calling | Generate raw text payloads—anything from Python scripts to SQL queries—directly to your custom tool without JSON wrapping. Offers greater flexibility for external runtimes like: • Code sandboxes (Python, C++, Java, …) • SQL databases • Shell environments • Config generators | Use when structured JSON isn’t needed and raw text is more natural for the target tool. |
| 3. | Context-Free Grammar (CFG) | A set of production rules defining valid strings in a language. Each rule rewrites a non-terminal into terminals and/or other non-terminals, independent of surrounding context. Useful for constraining output to match the syntax of programming languages or custom formats in OpenAI tools. | Use as a contract to ensure the model emits only valid strings accepted by the grammar. |
| 4. | Minimal Reasoning | Runs GPT-5 with few or no reasoning tokens to minimize latency and speed time-to-first-token. Ideal for deterministic, lightweight tasks (extraction, formatting, short rewrites, simple classification) where explanations aren’t needed. If not specified, effort defaults to medium. | Set reasoning effort: "minimal". Avoid for multi-step planning or tool-heavy workflows. |
Supported Models:
- gpt-5
- gpt-5-mini
- gpt-5-nano
Supported API Endpoints:
- Responses API
- Chat Completions API
Note: We recommend using the Responses API with the GPT-5 series of models to get the best performance out of them.
Prerequisites
Let's begin by updating your OpenAI SDK to a version that supports the new params and tools for GPT-5. Make sure you've set OPENAI_API_KEY as an environment variable.
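A quick environment check, as a minimal sketch (upgrade first with `pip install --upgrade openai pandas` if needed):

```python
# Print installed versions to confirm the SDK is recent enough
# for the new GPT-5 params and tools.
import openai
import pandas

print("openai", openai.__version__, "pandas", pandas.__version__)
```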
openai 1.99.2 pandas 2.3.1
1. Verbosity Parameter
1.1 Overview
The verbosity parameter lets you hint the model to be more or less expansive in its replies.
Values: "low", "medium", "high"
- low → terse UX, minimal prose.
- medium (default) → balanced detail.
- high → verbose, great for audits, teaching, or hand-offs.
Keep prompts stable and use the param rather than re-writing.
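Here is a minimal sketch of setting the parameter, assuming the Responses API's `text.verbosity` field (the prompt is illustrative):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5-mini",
    input="Explain how HTTPS protects data in transit.",
    text={"verbosity": "low"},  # "low" | "medium" | "high"
)
print(response.output_text)
```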
The output tokens scale roughly linearly with verbosity: low (560) → medium (849) → high (1288).
1.2 Using Verbosity for Coding Use Cases
The verbosity parameter also influences the length and complexity of generated code, as well as the depth of accompanying explanations. Here's an example in which we run the same task at each verbosity level: generate a Python program that sorts an array of 1,000,000 random numbers.
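A sketch of the driver loop, assuming the usage fields exposed by the Python SDK (the prompt mirrors the outputs below):

```python
from openai import OpenAI

client = OpenAI()

prompt = "Output a Python program that sorts an array of 1,000,000 random numbers."

for verbosity in ["low", "medium", "high"]:
    response = client.responses.create(
        model="gpt-5-mini",
        input=prompt,
        text={"verbosity": verbosity},
    )
    print("-" * 32)
    print(f"Verbosity: {verbosity}")
    print("Output:")
    print(response.output_text)
    usage = response.usage
    print(f"Tokens => input: {usage.input_tokens} | output: {usage.output_tokens}")
```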
--------------------------------
Verbosity: low
Output:
```python
#!/usr/bin/env python3
import random
import time

def main():
    N = 1_000_000
    arr = [random.random() for _ in range(N)]

    t0 = time.perf_counter()
    arr.sort()
    t1 = time.perf_counter()

    print(f"Sorted {N} numbers in {t1 - t0:.4f} seconds")
    print("First 10:", arr[:10])
    print("Last 10:", arr[-10:])

if __name__ == "__main__":
    main()
```
Tokens => input: 21 | output: 575
Notice that the code output is a plain script. Now, let's run with 'medium'.
--------------------------------
Verbosity: medium
Output:
Here's a simple Python script that generates 1,000,000 random numbers, sorts them using the built-in Timsort, and reports timings and a small sample of the sorted output:
```python
#!/usr/bin/env python3
import random
import time

def main():
    N = 1_000_000
    random.seed(42)  # remove or change for different runs

    t0 = time.perf_counter()
    data = [random.random() for _ in range(N)]
    t1 = time.perf_counter()

    data.sort()
    t2 = time.perf_counter()

    # Basic verification and sample output
    is_sorted = all(data[i] <= data[i+1] for i in range(len(data)-1))

    print(f"Generated {N} random numbers in {t1 - t0:.3f} seconds")
    print(f"Sorted in {t2 - t1:.3f} seconds")
    print("Sorted check:", is_sorted)
    print("First 10 values:", data[:10])
    print("Last 10 values:", data[-10:])

if __name__ == "__main__":
    main()
```
Notes:
- This uses Python's built-in list sort (Timsort), which is efficient for general-purpose sorting.
- If you need more memory- and performance-efficient numeric operations on large arrays, consider using NumPy (numpy.random.random and numpy.sort).
Tokens => input: 21 | output: 943
Medium verbosity generated richer code with additional explanations. Let's do the same with high.
--------------------------------
Verbosity: high
Output:
Here's a single, self-contained Python program that generates 1,000,000 random numbers and sorts them. It supports two backends: the built-in Python list sort (Timsort) and NumPy (if you have NumPy installed). It measures and prints the time taken for generation, sorting, and verification.
Copy the code into a file (for example sort_random.py) and run it. By default it uses the pure Python backend; pass --backend numpy to use NumPy.
Note: Sorting a million Python floats uses a moderate amount of memory (Python floats and list overhead). NumPy will typically be faster and use less overhead but requires the numpy package.
Program:
```python
import time
import random
import argparse
import sys

def is_sorted_list(a):
    # Linear check for sortedness
    return all(a[i] <= a[i+1] for i in range(len(a)-1))

def main():
    parser = argparse.ArgumentParser(description="Generate and sort random numbers.")
    parser.add_argument("--n", type=int, default=1_000_000, help="Number of random numbers (default: 1,000,000)")
    parser.add_argument("--backend", choices=["python", "numpy"], default="python",
                        help="Sorting backend to use: 'python' (default) or 'numpy'")
    parser.add_argument("--seed", type=int, default=0, help="Random seed (default: 0)")
    parser.add_argument("--sample", type=int, default=10, help="How many sample elements to print (default: 10)")
    args = parser.parse_args()

    n = args.n
    backend = args.backend
    seed = args.seed
    sample = args.sample

    print(f"Generating {n:,} random numbers using backend: {backend!r}, seed={seed}")
    random.seed(seed)

    if backend == "python":
        # Generate list of floats in Python
        t0 = time.perf_counter()
        data = [random.random() for _ in range(n)]
        t1 = time.perf_counter()
        gen_time = t1 - t0
        print(f"Generated {n:,} numbers in {gen_time:.4f} s")
        if sample > 0:
            print("Sample before sort:", data[:sample])

        # Sort in-place
        t0 = time.perf_counter()
        data.sort()
        t1 = time.perf_counter()
        sort_time = t1 - t0
        print(f"Sorted {n:,} numbers in {sort_time:.4f} s (Python list.sort)")
        if sample > 0:
            print("Sample after sort: ", data[:sample])

        # Verify sortedness
        t0 = time.perf_counter()
        ok = is_sorted_list(data)
        t1 = time.perf_counter()
        verify_time = t1 - t0
        print(f"Verified sortedness in {verify_time:.4f} s -> {'OK' if ok else 'NOT SORTED'}")
    else:  # numpy backend
        try:
            import numpy as np
        except ImportError:
            print("NumPy is not installed. Install it with 'pip install numpy' or use the python backend.")
            sys.exit(1)

        # Use the new Generator API for reproducible generation
        rng = np.random.default_rng(seed)
        t0 = time.perf_counter()
        data = rng.random(n)  # numpy array of floats
        t1 = time.perf_counter()
        gen_time = t1 - t0
        print(f"Generated {n:,} numbers in {gen_time:.4f} s (NumPy)")
        if sample > 0:
            print("Sample before sort:", data[:sample])

        # Sort in-place using NumPy's sort
        t0 = time.perf_counter()
        data.sort()  # in-place quicksort/mergesort (NumPy chooses default)
        t1 = time.perf_counter()
        sort_time = t1 - t0
        print(f"Sorted {n:,} numbers in {sort_time:.4f} s (NumPy sort)")
        if sample > 0:
            print("Sample after sort: ", data[:sample])

        # Verify sortedness
        t0 = time.perf_counter()
        ok = np.all(np.diff(data) >= 0)
        t1 = time.perf_counter()
        verify_time = t1 - t0
        print(f"Verified sortedness in {verify_time:.4f} s -> {'OK' if ok else 'NOT SORTED'}")

    print("Done.")

if __name__ == "__main__":
    main()
```
Usage examples:
- Pure Python (default): `python sort_random.py`
- NumPy backend (if installed): `python sort_random.py --backend numpy`
- Use a different size: `python sort_random.py --n 500000`
Notes and tips:
- Pure Python uses random.random in a list comprehension, then list.sort(). Sorting a list of 1,000,000 Python floats is quite feasible but uses more memory than a NumPy array because of Python object overhead.
- NumPy's random generation and sorting are implemented in C and are typically much faster and more memory efficient for large numeric arrays.
- You can change the seed to get different random sequences, or omit seed for non-deterministic results.
- If you plan to sort data that doesn't fit in memory, consider external sorting approaches (merge sort with chunking to disk) or use specialized libraries.
Tokens => input: 21 | output: 2381
High verbosity yielded additional details and explanations.
1.3 Takeaways
The new verbosity parameter reliably scales both the length and depth of the model’s output while preserving correctness and reasoning quality - without changing the underlying prompt. In this example:
- Low verbosity produces a minimal, functional script with no extra comments or structure.
- Medium verbosity adds explanatory comments, function structure, and reproducibility controls.
- High verbosity yields a comprehensive, production-ready script with argument parsing, multiple sorting methods, timing/verification, usage notes, and best-practice tips.
2. Free‑Form Function Calling
2.1 Overview
GPT‑5 can now send raw text payloads - anything from Python scripts to SQL queries - to your custom tool via the new tool "type": "custom", without wrapping the data in JSON. This differs from classic structured function calls, giving you greater flexibility when interacting with external runtimes such as:
- code_exec with sandboxes (Python, C++, Java, …)
- SQL databases
- Shell environments
- Configuration generators
Note that the custom tool type does NOT support parallel tool calling.
2.2 Quick Start Example - Compute the Area of a Circle
The code below asks the model for a simple Python program that calculates the area of a circle, and instructs it to emit that code through a freeform tool call.
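A sketch of the request, assuming a freeform tool named `code_exec` (matching the output below); the tool description and prompt are illustrative:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="Use the code_exec tool to compute the area of a circle whose radius "
          "equals the number of 'r's in the word 'strawberry'.",
    tools=[
        {
            "type": "custom",  # freeform tool: the call payload is raw text, not JSON
            "name": "code_exec",
            "description": "Executes arbitrary Python code and returns stdout.",
        }
    ],
)
print(response.output)
```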
[ResponseReasoningItem(id='rs_6894e31b1f8081999d18325e5aeffcfe0861a2e1728d1664', summary=[], type='reasoning', content=[], encrypted_content=None, status=None), ResponseCustomToolCall(call_id='call_Gnqod2MwPvayp2JdNyA0z0Ah', input='# Counting \'r\'s in the word "strawberry" and computing circle area with that radius\nimport math\nr = "strawberry".count(\'r\')\narea = math.pi * r**2\n{"radius": r, "area": area, "area_exact": f"{r}*pi"}', name='code_exec', type='custom_tool_call', id='ctc_6894e31c66f08199abd622bb5ac3c4260861a2e1728d1664', status='completed')]
The model emits a tool call containing raw Python. You execute that code server‑side, capture the printed result, and send it back in a follow‑up responses.create call.
2.3 Mini‑Benchmark – Sorting an Array in Three Languages
To illustrate freeform tool calling, we will ask GPT‑5 to:
- Generate Python, C++, and Java code that sorts a fixed array 10 times.
- Print only the total time (in ms) taken for the 10 runs.
- Call all three functions, and then stop (see the sketch after this list).
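Here is a sketch of the tool definitions and the chaining loop, assuming tool results are fed back as `function_call_output` items keyed by `call_id`; the sandboxes themselves are not shown:

```python
from openai import OpenAI

client = OpenAI()

# One freeform (custom) tool per language runtime.
tools = [
    {"type": "custom", "name": "code_exec_python", "description": "Executes Python code."},
    {"type": "custom", "name": "code_exec_cpp", "description": "Executes C++ code."},
    {"type": "custom", "name": "code_exec_java", "description": "Executes Java code."},
]

prompt = (
    "Sort the fixed array 10 times in Python, C++, and Java, printing only the "
    "elapsed time in ms. Call each of the three tools exactly once, then stop."
)

input_list = [{"role": "user", "content": prompt}]
while True:
    response = client.responses.create(model="gpt-5", input=input_list, tools=tools)
    input_list += response.output
    tool_calls = [item for item in response.output if item.type == "custom_tool_call"]
    if not tool_calls:
        break  # all three tools have been called; the model is done
    for call in tool_calls:
        print("--- tool name ---")
        print(call.name)
        print("--- tool call argument (generated code) ---")
        print(call.input)
        # Run call.input in the matching sandbox here, then chain the result back.
        input_list.append({
            "type": "function_call_output",
            "call_id": call.call_id,
            "output": "OK",  # placeholder for the sandbox's stdout
        })
```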
--- tool name ---
code_exec_python
--- tool call argument (generated code) ---
```python
import time

arr = [448, 986, 255, 884, 632, 623, 246, 439, 936, 925, 644, 159, 777, 986, 706, 723, 534, 862, 195, 686, 846, 880, 970, 276, 613, 736, 329, 622, 870, 284, 945, 708, 267, 327, 678, 807, 687, 890, 907, 645, 364, 333, 385, 262, 730, 603, 945, 358, 923, 930, 761, 504, 870, 561, 517, 928, 994, 949, 233, 137, 670, 555, 149, 870, 997, 809, 180, 498, 914, 508, 411, 378, 394, 368, 766, 486, 757, 319, 338, 159, 585, 934, 654, 194, 542, 188, 934, 163, 889, 736, 792, 737, 667, 772, 198, 971, 459, 402, 989, 949]

start = time.perf_counter()
for _ in range(10):
    b = arr[:]  # copy
    b.sort()
elapsed_ms = int((time.perf_counter() - start) * 1000)
print(elapsed_ms, end="")
```
--- tool name ---
code_exec_cpp
--- tool call argument (generated code) ---
```cpp
#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
using namespace std;

int main() {
    vector<int> a = {448, 986, 255, 884, 632, 623, 246, 439, 936, 925, 644, 159, 777, 986, 706, 723, 534, 862, 195, 686, 846, 880, 970, 276, 613, 736, 329, 622, 870, 284, 945, 708, 267, 327, 678, 807, 687, 890, 907, 645, 364, 333, 385, 262, 730, 603, 945, 358, 923, 930, 761, 504, 870, 561, 517, 928, 994, 949, 233, 137, 670, 555, 149, 870, 997, 809, 180, 498, 914, 508, 411, 378, 394, 368, 766, 486, 757, 319, 338, 159, 585, 934, 654, 194, 542, 188, 934, 163, 889, 736, 792, 737, 667, 772, 198, 971, 459, 402, 989, 949};
    auto start = chrono::high_resolution_clock::now();
    for (int i = 0; i < 10; ++i) {
        auto b = a;
        sort(b.begin(), b.end());
    }
    auto end = chrono::high_resolution_clock::now();
    auto ms = chrono::duration_cast<chrono::milliseconds>(end - start).count();
    cout << ms;
    return 0;
}
```
--- tool name ---
code_exec_java
--- tool call argument (generated code) ---
```java
import java.util.*;

public class Main {
    public static void main(String[] args) {
        int[] a = new int[] {448, 986, 255, 884, 632, 623, 246, 439, 936, 925, 644, 159, 777, 986, 706, 723, 534, 862, 195, 686, 846, 880, 970, 276, 613, 736, 329, 622, 870, 284, 945, 708, 267, 327, 678, 807, 687, 890, 907, 645, 364, 333, 385, 262, 730, 603, 945, 358, 923, 930, 761, 504, 870, 561, 517, 928, 994, 949, 233, 137, 670, 555, 149, 870, 997, 809, 180, 498, 914, 508, 411, 378, 394, 368, 766, 486, 757, 319, 338, 159, 585, 934, 654, 194, 542, 188, 934, 163, 889, 736, 792, 737, 667, 772, 198, 971, 459, 402, 989, 949};
        long start = System.nanoTime();
        for (int i = 0; i < 10; i++) {
            int[] b = Arrays.copyOf(a, a.length);
            Arrays.sort(b);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.print(elapsedMs);
    }
}
```
The model output three code blocks in Python, C++, and Java for the same algorithm. Each function call's output was chained back into the model as input, allowing it to keep going until all three functions had been called exactly once.
2.4 Takeaways
Freeform tool calling in GPT-5 lets you send raw text payloads—such as Python scripts, SQL queries, or config files—directly to custom tools without JSON wrapping. This provides greater flexibility for interacting with external runtimes and allows the model to generate code or text in the exact format your tool expects. It’s ideal when structured JSON is unnecessary and natural text output improves usability.
3. Context‑Free Grammar (CFG)
3.1 Overview
A context‑free grammar is a collection of production rules that define which strings belong to a language. Each rule rewrites a non‑terminal symbol into a sequence of terminals (literal tokens) and/or other non‑terminals, independent of surrounding context—hence context‑free. CFGs can capture the syntax of most programming languages and, in OpenAI custom tools, serve as contracts that force the model to emit only strings that the grammar accepts.
3.2 Grammar Fundamentals
Supported Grammar Syntax
- Lark - https://lark-parser.readthedocs.io/en/stable/
- Regex - https://docs.rs/regex/latest/regex/#syntax
We use LLGuidance under the hood to constrain model sampling: https://github.com/guidance-ai/llguidance.
Unsupported Lark Features
- Lookaround in regexes (`(?=...)`, `(?!...)`, etc.)
- Lazy modifiers (`*?`, `+?`, `??`) in regexes
- Terminal priorities, templates, `%declare`, and `%import` (except `%import common`)
Terminals vs Rules & Greedy Lexing
| Concept | Take-away |
|---|---|
| Terminals (UPPER) | Matched first by the lexer – longest match wins. |
| Rules (lower) | Combine terminals; cannot influence how text is tokenised. |
| Greedy lexer | Never try to “shape” free text across multiple terminals – you’ll lose control. |
Correct vs Incorrect Pattern Design
✅ One bounded terminal handles free‑text between anchors

```
start: SENTENCE
SENTENCE: /[A-Za-z, ]*(the hero|a dragon)[A-Za-z, ]*(fought|saved)[A-Za-z, ]*(a treasure|the kingdom)[A-Za-z, ]*\./
```

❌ Don't split free‑text across multiple terminals/rules

```
start: sentence
sentence: /[A-Za-z, ]+/ subject /[A-Za-z, ]+/ verb /[A-Za-z, ]+/ object /[A-Za-z, ]+/
```
3.3 Example - SQL Dialect: MS SQL vs PostgreSQL
The following code example is a reference pattern for building multi‑dialect SQL tools with CFGs. It demonstrates:
- Two isolated grammar definitions (`mssql_grammar_definition`, `postgres_grammar_definition`) encoding TOP vs LIMIT semantics.
- How to prompt, invoke, and inspect tool calls in a single script.
- A side‑by‑side inspection of the assistant's responses.
Define the Lark grammars for the different SQL dialects.
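A minimal sketch of the two definitions, simplified to cover just the query shape used in this example (the real grammars would be more complete):

```python
# Hypothetical, simplified dialect grammars: each constrains output to one
# SELECT shape, differing only in where the row limit appears.
mssql_grammar_definition = r"""
// MS SQL Server: TOP N comes before the column list
start: "SELECT" SP "TOP" SP NUMBER SP columns SP "FROM" SP IDENT SP "WHERE" SP condition (SP "AND" SP condition)* SP "ORDER BY" SP IDENT SP "DESC" ";"

columns: IDENT ("," SP IDENT)*
condition: IDENT SP COMPARATOR SP value
value: NUMBER | STRING

COMPARATOR: ">" | "<" | "="
IDENT: /[a-z_]+/
NUMBER: /[0-9]+/
STRING: /'[^']*'/
SP: " "
"""

postgres_grammar_definition = r"""
// PostgreSQL: LIMIT N comes after ORDER BY
start: "SELECT" SP columns SP "FROM" SP IDENT SP "WHERE" SP condition (SP "AND" SP condition)* SP "ORDER BY" SP IDENT SP "DESC" SP "LIMIT" SP NUMBER ";"

columns: IDENT ("," SP IDENT)*
condition: IDENT SP COMPARATOR SP value
value: NUMBER | STRING

COMPARATOR: ">" | "<" | "="
IDENT: /[a-z_]+/
NUMBER: /[0-9]+/
STRING: /'[^']*'/
SP: " "
"""
```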
3.4 Generate specific SQL dialect
Let's define the prompt and call the model to produce the MS SQL dialect.
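A sketch of the call, assuming the grammar is attached to the custom tool via its `format` field with `"syntax": "lark"`; the tool name and prompt are illustrative:

```python
from openai import OpenAI

client = OpenAI()

prompt = (
    "Call the mssql_query tool with a query that returns the 5 most recent "
    "orders over 500 placed after 2025-01-01, newest first."
)

response = client.responses.create(
    model="gpt-5",
    input=prompt,
    tools=[
        {
            "type": "custom",
            "name": "mssql_query",
            "description": "Executes a read-only Microsoft SQL Server query. "
                           "Output must follow the attached grammar exactly.",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": mssql_grammar_definition,
            },
        }
    ],
    parallel_tool_calls=False,
)

for item in response.output:
    if item.type == "custom_tool_call":
        print("--- MS SQL Query ---")
        print(item.input)
```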
--- MS SQL Query ---
SELECT TOP 5 customer_id, order_id, order_date, total_amount FROM orders WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC;
The output SQL correctly uses the "SELECT TOP" construct.
--- PG SQL Query ---
SELECT customer_id, order_id, order_date, total_amount FROM orders WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC LIMIT 5;
The output highlights the same logical query expressed in different physical syntax. Supplying distinct grammars ensures the model can only produce valid statements for the chosen dialect.
| Dialect | Generated Query | Key Difference |
|---|---|---|
| MS SQL Server | SELECT TOP 5 customer_id, … ORDER BY order_date DESC; | Uses TOP N clause before column list. |
| PostgreSQL | SELECT customer_id, … ORDER BY order_date DESC LIMIT 5; | Uses LIMIT N after ORDER BY. |
3.5 Example - Regex CFG Syntax
The following code example demonstrates using the Regex CFG syntax to constrain a freeform tool call to a specific timestamp pattern.
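A sketch, assuming `"syntax": "regex"` and an illustrative tool name; the pattern pins the output to a `YYYY-MM-DD HH:MM` timestamp:

```python
from openai import OpenAI

client = OpenAI()

# Rust-regex-style pattern: date, space, 24-hour time.
timestamp_regex = r"\d{4}-\d{2}-\d{2} ([01]\d|2[0-3]):[0-5]\d"

response = client.responses.create(
    model="gpt-5",
    input="Call the timestamp tool with 10:00 AM on August 7th, 2025.",
    tools=[
        {
            "type": "custom",
            "name": "timestamp",
            "description": "Emits a timestamp as 'YYYY-MM-DD HH:MM' (24-hour clock).",
            "format": {
                "type": "grammar",
                "syntax": "regex",
                "definition": timestamp_regex,
            },
        }
    ],
)

for item in response.output:
    if item.type == "custom_tool_call":
        print("--- Timestamp ---")
        print(item.input)
```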
--- Timestamp ---
2025-08-07 10:00
3.6 Best Practices
Lark grammars can be tricky to perfect. While simple grammars perform most reliably, complex grammars often require iteration on the grammar definition itself, the prompt, and the tool description to ensure that the model does not go out of distribution.
- Keep terminals bounded: use `/[^.\n]{0,10}\./` rather than `/.*\./`. Limit matches both by content (negated character class) and by length (`{M,N}` quantifier).
- Prefer explicit char‑classes over `.` wildcards.
- Thread whitespace explicitly, e.g. using `SP: " "`, instead of a global `%ignore`.
- Describe your tool: tell the model exactly what the CFG accepts and instruct it to reason heavily about compliance.
Troubleshooting
- API rejects the grammar because it is too complex ➜ simplify rules and terminals, and remove `%ignore.*`.
- Unexpected tokens ➜ confirm terminals aren't overlapping; check the greedy lexer.
- The model drifts "out‑of‑distribution" (producing excessively long or repetitive outputs that are syntactically valid but semantically wrong) ➜
  - Tighten the grammar.
  - Iterate on the prompt (add few-shot examples) and the tool description (explain the grammar and instruct the model to reason about conforming to it).
  - Experiment with a higher reasoning effort (e.g., bump from medium to high).
Resources:
- Lark Docs – https://lark-parser.readthedocs.io/en/stable/
- Lark IDE – https://www.lark-parser.org/ide/
- LLGuidance Syntax – https://github.com/guidance-ai/llguidance/blob/main/docs/syntax.md
- Regex (Rust crate) – https://docs.rs/regex/latest/regex/#syntax
3.7 Takeaways
Context-Free Grammar (CFG) support in GPT-5 lets you strictly constrain model output to match predefined syntax, ensuring only valid strings are generated. This is especially useful for enforcing programming language rules or custom formats, reducing post-processing and errors. By providing a precise grammar and clear tool description, you can make the model reliably stay within your target output structure.
4. Minimal Reasoning
4.1 Overview
GPT-5 now supports a new minimal reasoning effort. With minimal reasoning, the model outputs very few or no reasoning tokens. This is designed for use cases where developers want a very fast time-to-first-user-visible token. Note: If no reasoning effort is supplied, the default value is medium.
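A minimal sketch of a quick classification call (the prompt is illustrative):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="Classify the sentiment of this review as positive, negative, or neutral: "
          "'Great battery life and a gorgeous screen!'",
    reasoning={"effort": "minimal"},  # few or no reasoning tokens
)
print(response.output_text)
```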
--------------------------------
Output: positive
4.2 Takeaways
Minimal reasoning runs GPT-5 with few or no reasoning tokens to minimize latency and speed up time-to-first-token. Use it for deterministic, lightweight tasks (extraction, formatting, short rewrites, simple classification) where explanations aren’t needed. If you don’t specify effort, it defaults to medium—set minimal explicitly when you want speed over deliberation.