Crop Tool

claude-cookbooksmultimodal

Giving Claude a Crop Tool for Better Image Analysis

When Claude analyzes images, it sees the entire image at once. For detailed tasks—like reading small text, comparing similar values in a chart, or examining fine details—this can be limiting.

The solution: Give Claude a tool that lets it "zoom in" by cropping regions of interest.

This notebook shows how to build a simple crop tool and demonstrates when it's useful.

When is a Crop Tool Useful?

  • Charts and graphs: Comparing bars/lines that are close in value, reading axis labels
  • Documents: Reading small text, examining signatures or stamps
  • Technical diagrams: Following wires/connections, reading component labels
  • Dense images: Any image where details are small relative to the whole

Setup

[1]
Note: you may need to restart the kernel to use updated packages.
[28]

Load an Example Chart

We'll use a chart from the FigureQA dataset to demonstrate.

[3]
Question: Is Cyan the minimum?
Answer: Yes.
Output

Define the Crop Tool

The crop tool uses normalized coordinates (0-1) so Claude doesn't need to know the image dimensions:

  • (0, 0) = top-left corner
  • (1, 1) = bottom-right corner
  • (0.5, 0.5) = center
[4]

Let's test the crop tool manually:

[24]
Cropped to (0.00,0.00)-(0.40,0.35): 167x140px
Output

The Agentic Loop

Now we connect everything: send the image to Claude with the crop tool available, and handle tool calls in a loop until Claude provides a final answer.

[32]

Demo: Chart Analysis

Let's ask Claude to analyze our chart. Watch how it uses the crop tool to examine specific regions.

[33]
Question: Is Cyan the minimum?

Output

Claude's analysis:

[Assistant] I'll help you answer whether Cyan is the minimum in this pie chart. Let me examine the image more closely.
[Tool] crop_image({'x1': 0.0, 'y1': 0.0, 'x2': 0.3, 'y2': 0.3})
[Tool] crop_image({'x1': 0.3, 'y1': 0.3, 'x2': 0.7, 'y2': 0.7})
OutputOutput
[Assistant] Based on my examination of the pie chart, I can see the relative sizes of each segment:

1. **Royal Blue** - appears to be the largest segment, taking up roughly 35-40% of the pie
2. **Peru** (brown/orange) - appears to be the second largest, roughly 25-30%
3. **Red** - appears to be third, roughly 20%
4. **Light Slate** (gray) - appears to be fourth, roughly 15%
5. **Cyan** - appears to be the smallest segment, roughly 5-10%

**Answer: Yes, Cyan is the minimum.** 

Cyan has the smallest slice in the pie chart, making it the minimum value among all five categories shown.

Ground truth: Yes.

Try Another Example

[45]
Question: Is Forest Green greater than Medium Orchid?

Output

Claude's analysis:

[Assistant] I'll help you answer whether Forest Green is greater than Medium Orchid. Let me first examine the image to identify these colors and their values.
[Tool] crop_image({'x1': 0, 'y1': 0, 'x2': 0.3, 'y2': 1})
[Tool] crop_image({'x1': 0, 'y1': 0.85, 'x2': 1, 'y2': 1})
OutputOutput
[Assistant] Based on the chart, I can see:

- **Forest Green** (the first bar on the left, dark green color) has a value of approximately 35
- **Medium Orchid** (the second bar, purple/orchid color) has a value of approximately 34

**Answer: Yes, Forest Green is greater than Medium Orchid.**

Forest Green has a slightly higher value (around 35) compared to Medium Orchid (around 34), though the difference is quite small.

Ground truth: Yes.

Summary

The crop tool pattern is simple but powerful:

  1. Define a tool that takes normalized bounding box coordinates
  2. Return the cropped image as base64 in the tool result
  3. Let Claude decide when and where to crop

This works because Claude can see the full image first, identify regions that need closer inspection, and iteratively zoom in.

Alternative: Using the Claude Agent SDK

The Claude Agent SDK provides a cleaner way to define tools using Python decorators and handles the agentic loop automatically.

[11]
Note: you may need to restart the kernel to use updated packages.
[12]
[13]
Question: Is Cyan the minimum?

[Assistant] I'll first read the image to understand its content, then examine specific regions if needed.
[Tool] Read({'file_path': 'chart.png'})
Output
[Assistant] Looking at this pie chart, I can clearly see the different segments and their relative sizes. Let me crop the area showing the Cyan segment to examine it more closely.
[Tool] mcp__crop__crop_image({'image_path': 'chart.png', 'x1': 0.4, 'y1': 0.6, 'x2': 0.7, 'y2': 0.9})
Output
[Assistant] Now I can clearly analyze the chart. Looking at the pie chart:

**Yes, Cyan is the minimum.**

The pie chart shows 5 categories with the following relative sizes (from largest to smallest):
1. **Royal Blue** - the largest segment (takes up roughly half the pie)
2. **Peru** (tan/brown) - second largest
3. **Red** - medium-sized segment
4. **Light Slate** (gray) - smaller segment
5. **Cyan** - the smallest segment

The Cyan segment is clearly the thinnest slice of the pie, making it the minimum value among all the categories shown.