Notebooks
A
Arize AI
Browser Use Arize Final

Browser Use Arize Final

agentsarize-tutorialsLLMPython

🌐 Building an Intelligent Browser Agent with Llama 4 and Arize Observability

This notebook provides a step-by-step guide to creating an AI-powered browser agent capable of navigating and interacting with websites autonomously. By combining the power of Llama 4 Scout, Playwright, Together AI, and Arize observability, this agent can perform tasks seamlessly while providing detailed tracing and visualization of its workflow.

Demo

For a detailed explanation of the code and a demo video, visit our blog post: Blog Post and Demo Video

Features
  • Visual understanding of web pages through screenshots
  • Autonomous navigation and interaction
  • Natural language instructions for web tasks
  • Persistent browser session management
  • Full observability with Arize agent node visualization
  • Hierarchical tracing of planning and execution phases
  • Detailed performance metrics and error tracking

For example, you can ask the agent to:

  • Search for a product on Amazon
  • Find the cheapest flight to Tokyo
  • Buy tickets for the next Warriors game

All while tracking the agent's decision-making process in Arize!

What's in this Notebook?

This recipe walks you through:

  • Setting up the environment and installing dependencies including Arize
  • Configuring Arize for agent observability with hierarchical node visualization
  • Automating browser interactions using Playwright with full tracing
  • Defining a structured prompt for the LLM to understand the task and execute the next action
  • Leveraging Llama 4 Scout for content comprehension
  • Creating a persistent and intelligent browser agent with comprehensive observability
  • Viewing agent workflows in Arize with parent-child relationships between components

*Please note that the agent is not perfect and may not always behave as expected.

1. Install Required Libraries

This cell installs the necessary Python packages for the script, such as together, playwright, and Arize for observability. It also ensures that Playwright is properly installed to enable automated browser interactions.

[ ]

2. Import Modules, Set Up Arize Observability

[ ]

3. Set up Environment Variables

Make sure to set the following environment variables in your .env file:

ARIZE_SPACE_ID=your_arize_space_id
ARIZE_API_KEY=your_arize_api_key

You can get your Arize credentials from the Arize platform.

Vision Query Example

This function converts an image file into a Base64-encoded string, which is required for LLM querying.

The next cell shows an example of how to use the encode_image function to convert an image file into a Base64-encoded string, which is then used in a chat completion request to the Llama 4 Scout model.

[ ]
[ ]

Helper Functions to Parse the Accessibility Tree

The agent will use the accessibility tree to understand the elements on the page and interact with them. A helper function is defined here to help simplity the accessibility tree for the agent.

[ ]

3. Define Prompts

a) Planning Prompt: Create a structured prompt for the LLM to understand the task and execute the next action.

b) Agent Execution Prompt A structured prompt is created, specifying the instructions for processing the webpage content and screenshots.

[ ]

Few Shot Examples

Performance improves drastically by adding a few shot examples.

[ ]

4. Define a task and generate a plan of actions to execute

You can define your own task or use one of the examples below

[ ]

Execute Planner Agent

The next cell queries the LLM using the planning prompt to generate a plan of actions to execute. This then becomes each of the individual subtasks for the execution agent to complete.

[ ]

5. Create the Browser environment and Run the Executor Agent

The necessary modules for web scraping are imported, and the setup for using Playwright asynchronously is initialized.

The context is provided to the LLM to help it understand its current state and generate the next required action to complete the provided task.

  • At any step, you can press enter to continue or 'q' to quit the agent loop.
[ ]

And that's it! Congratulations! 🎉🎉

You've just created a browser agent that can navigate websites, understand page content through vision, plan and execute actions based on natural language commands, and maintain context across multiple interactions.

Collaborators

Feel free to reach out with any questions or feedback!

Miguel Gonzalez on X or LinkedIn

Dimitry Khorzov on X or LinkedIn

6. View Traces in Arize

After running the agent, you can view the traces in the Arize platform:

  1. Navigate to your Arize dashboard
  2. Select your space and model
  3. View the agent traces with hierarchical visualization showing:
    • Orchestrator: The main workflow coordinator
    • Planner: The planning phase that breaks down the task
    • Executor: The execution phase managing browser interactions
    • Actions: Individual browser actions with their sub-components:
      • Context Extractor: Captures page state
      • Decision Maker: LLM reasoning about next action
      • Action Executor: Performs the browser action

The agent node visualization will show the flow between different components, making it easy to debug and understand the agent's behavior.