Anthropic Prompt Caching

Prompt Caching

Export

Run Notebooks

idle

Contents

No cells yet

Add cells to see them here

Prompt caching through the Claude API

Prompt caching allows you to store and reuse context within your prompt. This makes it more practical to include additional information in your prompt—such as detailed instructions and example responses—which help improve every response Claude generates.

In addition, by fully leveraging prompt caching within your prompt, you can reduce latency by >2x and costs up to 90%. This can generate significant savings when building solutions that involve repetitive tasks around detailed book_content.

In this cookbook, we will demonstrate how to use prompt caching in a single turn and across a multi-turn conversation.

Setup

First, let's set up our environment with the necessary imports and initializations:

[3]

Note: you may need to restart the kernel to use updated packages.

[16]

Now let's fetch some text content to use in our examples. We'll use the text from Pride and Prejudice by Jane Austen which is around ~187,000 tokens long.

[ ]

Example 1: Single turn

Let's demonstrate prompt caching with a large document, comparing the performance and cost between cached and non-cached API calls.

Part 1: Non-cached API Call (Baseline)

First, let's make a truly non-cached API call without the cache_control parameter. This will establish our baseline performance.

We'll ask for a short output to keep response generation time low, since prompt caching only affects input processing time.

[17]

Non-cached API call time: 6.86 seconds
Input tokens: 187363
Output tokens: 8

Response:
Pride and Prejudice

Part 2: First Cached API Call (Cache Creation)

Now let's enable prompt caching by adding cache_control: {"type": "ephemeral"} to the book content.

Important: The first call with cache_control will create the cache entry. This initial call will have similar timing to the non-cached call because it still needs to process all tokens. However, it will store them in the cache for future use.

Look for the cache_creation_input_tokens field in the usage stats to see how many tokens were cached.

[18]

First cached API call time: 5.96 seconds
Input tokens: 16
Output tokens: 8
Cache creation tokens: 187347

Response:
Pride and Prejudice

Note: This first call creates the cache but doesn't benefit from it yet - timing is similar to non-cached call.

Part 3: Second Cached API Call (Cache Hit)

Now let's make another API call with the same cache_control parameter. Since the cache was created in Part 2, this call will read from the cache instead of processing all tokens again.

This is where you see the real performance benefit! Look for the cache_read_input_tokens field in the usage stats.

[19]

Second cached API call time: 3.66 seconds
Input tokens: 16
Output tokens: 8
Cache read tokens: 187347

Response:
Pride and Prejudice

======================================================================
PERFORMANCE COMPARISON
======================================================================
Non-cached call:       6.86s
First cached call:     5.96s (creates cache)
Second cached call:    3.66s (reads from cache)

Speedup from caching:  1.9x faster!
======================================================================

Summary of Example 1

This example demonstrated three distinct scenarios:

Non-cached call - Without cache_control, Claude processes all ~187k tokens normally
First cached call - With cache_control, Claude processes all tokens AND stores them in cache (similar timing to non-cached)
Second cached call - With cache_control, Claude reads from the existing cache (2-10x faster!)

The key insight: Prompt caching requires two calls to show benefits

The first call with cache_control creates the cache entry
Subsequent calls with the same cache_control read from the cache for dramatic speedups

This is especially valuable for:

Large documents or codebases that remain constant across multiple queries
System prompts with detailed instructions
Multi-turn conversations (as shown in Example 2 below)

Example 2: Multi-turn Conversation with Incremental Caching

Now, let's look at a multi-turn conversation where we add cache breakpoints as the conversation progresses.

[21]

Turn 1:
User: What is the title of this novel?
Assistant: The title of this novel is **Pride and Prejudice** by Jane Austen.
User input tokens: 3
Output tokens: 22
Input tokens (cache read): 0
Input tokens (cache write): 187360
0.0% of input prompt cached (3 tokens)
Time taken: 5.79 seconds

Turn 2:
User: Who are Mr. and Mrs. Bennet?
Assistant: Mr. and Mrs. Bennet are the parents of five daughters: Jane, Elizabeth, Mary, Kitty, and Lydia.

**Mr. Bennet** is described as an intelligent, sarcastic man with "quick parts, sarcastic humour, reserve, and caprice." He tends to be detached and ironic, often amusing himself at his wife's expense, and prefers to spend time in his library rather than deal with family matters.

**Mrs. Bennet** is described as "a woman of mean understanding, little information, and uncertain temper." She is nervous, excitable, and foolish, with her main goal in life being to get her daughters married. She lacks the intelligence and social graces of her husband and is often oblivious to her own impropriety.

Their contrasting personalities create much of the domestic tension and comedy in the novel. Mr. Bennet married Mrs. Bennet when he was "captivated by youth and beauty," but her weak understanding soon ended any real affection he had for her, leaving him to cope with his disappointment through ironic detachment.
User input tokens: 3
Output tokens: 247
Input tokens (cache read): 187360
Input tokens (cache write): 36
100.0% of input prompt cached (187363 tokens)
Time taken: 8.29 seconds

Turn 3:
User: What is Netherfield Park?
Assistant: **Netherfield Park** is a large estate in Hertfordshire that is rented by Mr. Bingley at the beginning of the novel.

It is located about three miles from Longbourn, the Bennet family's home, making it conveniently close for social visits. The estate becomes the center of much excitement and speculation when Mr. Bingley, a wealthy young bachelor, takes up residence there.

Key points about Netherfield:

- It's described as a good house with pleasant grounds
- Mr. Bingley rents it (rather than owning it), as he has not yet purchased an estate of his own
- His sisters, Caroline Bingley and Mrs. Hurst, live with him there, along with Mrs. Hurst's husband
- Mr. Darcy, Bingley's close friend, is a frequent visitor
- The famous ball where Elizabeth and Darcy have their first significant interactions takes place at Netherfield
- Jane Bennet stays there when she falls ill after riding over in the rain, which allows Elizabeth to visit and spend time at the estate

Netherfield Park is important to the plot as it brings the wealthy Mr. Bingley (and Mr. Darcy) into the neighborhood, setting the main romantic storylines in motion.
User input tokens: 3
Output tokens: 293
Input tokens (cache read): 187396
Input tokens (cache write): 258
100.0% of input prompt cached (187399 tokens)
Time taken: 10.14 seconds

Turn 4:
User: What is the main theme of this novel?
Assistant: The main themes of **Pride and Prejudice** include:

**1. Pride and Prejudice (as the title suggests)**
- The novel explores how pride and prejudice create misunderstandings and obstacles to happiness
- **Darcy's pride** in his social status initially leads him to insult Elizabeth and look down on her family
- **Elizabeth's prejudice** against Darcy (based on first impressions and Wickham's lies) blinds her to his true character
- Both must overcome these flaws to find happiness together

**2. Class and Social Status**
- The rigid class distinctions of Regency England and their impact on relationships and marriage prospects
- The tension between wealth, birth, and merit as measures of a person's worth
- Lady Catherine's objections to Elizabeth based on her "inferior" connections

**3. Marriage and Economics**
- The novel examines different motivations for marriage: love (Jane and Bingley), practicality (Charlotte and Mr. Collins), lust and recklessness (Lydia and Wickham), and the ideal combination of respect, affection, and compatibility (Elizabeth and Darcy)
- The economic pressures on women to marry well, especially with the entailment of the Bennet estate

**4. Self-Knowledge and Personal Growth**
- Elizabeth's journey from prejudice to understanding
User input tokens: 3
Output tokens: 300
Input tokens (cache read): 187654
Input tokens (cache write): 305
100.0% of input prompt cached (187657 tokens)
Time taken: 9.53 seconds

As you can see in this example, response times decreased from nearly 24 seconds to just 7-11 seconds after the initial cache setup, while maintaining the same level of quality across the answers. Most of this remaining latency is due to the time it takes to generate the response, which is not affected by prompt caching.

And since nearly 100% of input tokens were cached in subsequent turns as we kept adjusting the cache breakpoints, we were able to read the next user message nearly instantly.