Export
Introduction
In this notebook, we'll explore Qwen3-VL's spatial reasoning capabilities using Together AI's API. We'll cover:
- Understanding spatial relationships between objects
- Perceiving object affordances (what actions are possible)
- Integrating spatial reasoning with action planning
These capabilities enable embodied AI applications like robotics and navigation.
Install required libraries
[ ]
[1]
Using model: Qwen/Qwen3-VL-32B-Instruct API Key configured: True
[2]
1. Understand Spatial Relationships Between Objects
After identifying objects, the model can understand their relative spatial positions.
[3]
Prompt: Which object, in relation to your current position, holds the farthest placement in the image? Answer options: A. chair B. plant C. window D. tv stand. Answer: The object that holds the farthest placement in the image, relative to the viewer's perspective, is: **C. window** Explanation: Looking at the room layout, the window is located at the back wall, directly opposite the viewer’s vantage point. The chair and table are in front of the window, closer to the viewer. The plant is on top of the TV stand, which is on the left side of the image and closer to the foreground. The TV stand itself is also closer than the window. Therefore, the window is the farthest object in the scene. ✅ Correct answer: **C. window**
2. Perceive Object Affordances
The model can understand what actions are enabled by specific parts of objects or empty space.
[4]
Prompt: Locate the free space on the white table on the right in this image. Output the point coordinates in JSON format.
Answer: ```json
[
{"point_2d": [907, 607], "label": "point_1"}
]
```
[5]
Prompt: Can the speaker fit behind the guitar? Answer: To determine if the speaker can fit behind the guitar, let’s analyze the spatial arrangement and dimensions based on the image. ### Observations: - The **guitar** is a **4-string bass guitar**, standing upright on a **black guitar stand**. - The **speaker** is a **small black rectangular box**, placed on a **white stool** next to the guitar. - The speaker is currently **to the left and slightly behind** the guitar from the viewer’s perspective. - The guitar stand has a **T-shaped base** and a **vertical support** that holds the guitar’s body. The neck of the guitar extends upward and slightly backward, leaning against the wall. ### Physical Constraints: - The **guitar stand** is positioned so that the guitar is **leaning against the wall**, meaning the space **directly behind the guitar’s body** is **occupied by the stand’s vertical support and the wall**. - The **speaker** is currently on a stool, which is **to the side** of the guitar, not directly behind it. - The **width** of the guitar body (about 10–12 inches) and the **depth** of the speaker (likely 5–7 inches) suggest that the speaker **could physically fit behind the guitar** if there were enough clearance. - However, the **guitar stand’s vertical post** is directly behind the guitar’s body, which would **block** the speaker from being placed directly behind the guitar. ### Possible Solutions: - If you **move the speaker to the stool** and **rotate the stool** so that the speaker is **behind the guitar**, it might work — **but only if the stool can fit** between the guitar stand and the wall. - Alternatively, **removing the speaker from the stool and placing it directly on the floor** behind the guitar might work, but again, the **stand’s post** would likely obstruct it. ### Conclusion: **No, the speaker cannot fit directly behind the guitar** as currently arranged, because the **guitar stand’s vertical support** occupies that space. The speaker is already positioned to the side and slightly behind, which is the most feasible location given the setup. ✅ **Final Answer: No, the speaker cannot fit behind the guitar due to the obstruction from the guitar stand.**
3. Integrate Spatial Reasoning and Action Planning
The model can synthesize spatial relationships and affordances to select correct actions, reasoning like an embodied agent.
[7]
Prompt: What color arrow should the robot follow to move the apple in between the green can and the orange? Choices: A. Red. B. Blue. C. Green. D. Orange. Answer: To determine which arrow the robot should follow to move the apple **between** the green can and the orange, let’s analyze the positions: - The **green can** is on the **left**. - The **orange** is on the **right**. - The **apple** is currently **to the right of the can** and **to the left of the orange** — meaning it’s **already between** them. Wait — looking at the arrows: - **Red arrow** points **left** (toward the can). - **Blue arrow** points **down/left** (toward the center, between can and orange). - **Green arrow** points **right** (toward the cup). But if the apple is **already between** the can and the orange, then **no movement is needed** — unless the question implies the apple is **not yet** between them. Looking at the image again: The apple is **slightly to the right of the can**, and the orange is further right. So yes — the apple is **already between** the can and the orange. But perhaps the question assumes the apple is **not yet** positioned correctly? Or maybe the robot is holding the apple and needs to place it between them. The robot’s arm is holding the apple, and the arrows are pointing from the apple’s current position. - **Red arrow** → left → toward the can → would place it **too close to the can**, possibly even to the left of it. - **Blue arrow** → down and slightly left → toward the center between can and orange → **correct position**. - **Green arrow** → right → toward the cup → would place it **to the right of the orange**, not between. So, to place the apple **between** the green can and the orange, the robot should follow the **blue arrow**. ✅ **Correct answer: B. Blue.**
[8]
Prompt: Which motion can help change the coffee pod? Choices: A. A. B. B. C. C. D. D. Answer: To change the coffee pod in an espresso machine like the one shown, you typically need to access the **portafilter**, which holds the coffee pod or ground coffee. The portafilter is inserted into the group head (the part where coffee is brewed). Looking at the labeled parts: - **A** points to a metal cup (likely a milk frothing pitcher or a small container), not involved in pod changing. - **B** points to the **drip tray** at the base — this collects coffee drips, not involved in changing pods. - **C** points to a plastic bag with coffee beans — not directly involved in changing the pod. - **D** points to the **portafilter holder** or **group head** — this is where the portafilter (with the coffee pod) is inserted and removed. Therefore, to change the coffee pod, you would **remove the portafilter** from the group head — which is located at **D**. ✅ **Correct answer: D. D** This motion (removing the portafilter from the group head) allows you to replace or change the coffee pod.