Multi-Modal
In this notebook, we show how to use the Anthropic multi-modal LLM class, AnthropicMultiModal, for image understanding and reasoning.
Installation
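The contents of the install cell are not shown here. As a minimal sketch, assuming the integration is published on PyPI as `llama-index-multi-modal-llms-anthropic` and that `matplotlib` is used to display images, the helper below reports which packages are still missing and builds the matching pip command. The `REQUIRED` mapping and both function names are illustrative, not part of the library.

```python
import importlib.util

# Assumed mapping of PyPI package name -> import name; adjust to your setup.
REQUIRED = {
    "llama-index-multi-modal-llms-anthropic": "llama_index.multi_modal_llms.anthropic",
    "matplotlib": "matplotlib",
}


def missing_packages(required: dict) -> list:
    """Return the PyPI names whose import name cannot be resolved."""
    missing = []
    for pip_name, import_name in required.items():
        try:
            found = importlib.util.find_spec(import_name) is not None
        except (ImportError, ValueError):
            # A missing parent package raises instead of returning None.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


def pip_command(packages: list) -> str:
    """Build the pip command that would install the given packages."""
    return "pip install " + " ".join(packages) if packages else ""
```

Running `pip_command(missing_packages(REQUIRED))` returns an empty string when everything is already importable, and otherwise the command to run in a shell or a notebook cell.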
Setup API key
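The Anthropic client reads its key from the `ANTHROPIC_API_KEY` environment variable. A minimal sketch of one way to set it without hard-coding the key in the notebook follows; the helper name `ensure_api_key` is hypothetical.

```python
import os
from getpass import getpass


def ensure_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the key from the environment, prompting once if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value so it does not end up in cell output.
        key = getpass(f"{env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} must not be empty")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` once at the top of the notebook is enough; later cells pick the key up from the environment.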
Download Sample Images
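The download log in this section shows two files being fetched from the LlamaIndex repository, with `ark_email_sample.PNG` saved locally as `ark_email_sample.png`. A sketch of the same download using only the standard library, instead of the command-line tool that produced the log, could look like this; `download_samples` is an illustrative name.

```python
import urllib.request
from pathlib import Path

BASE_URL = (
    "https://raw.githubusercontent.com/run-llama/llama_index/"
    "main/docs/examples/data/images/"
)
# Remote file name -> local file name, matching the download log.
SAMPLES = {
    "prometheus_paper_card.png": "prometheus_paper_card.png",
    "ark_email_sample.PNG": "ark_email_sample.png",
}


def download_samples(dest_dir: str = ".") -> list:
    """Fetch each sample image unless it is already present locally."""
    saved = []
    for remote_name, local_name in SAMPLES.items():
        target = Path(dest_dir) / local_name
        if not target.exists():
            urllib.request.urlretrieve(BASE_URL + remote_name, str(target))
        saved.append(target)
    return saved
```

Calling `download_samples()` places both images in the working directory and skips any file that already exists, so re-running the cell does not download again.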
--2024-03-08 11:53:40-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/images/prometheus_paper_card.png
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1002436 (979K) [image/png]
Saving to: ‘prometheus_paper_card.png’
prometheus_paper_ca 100%[===================>] 978.94K --.-KB/s in 0.005s
2024-03-08 11:53:40 (175 MB/s) - ‘prometheus_paper_card.png’ saved [1002436/1002436]

--2024-03-08 11:53:40-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/images/ark_email_sample.PNG
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56608 (55K) [image/png]
Saving to: ‘ark_email_sample.png’
ark_email_sample.pn 100%[===================>] 55.28K --.-KB/s in 0.001s
2024-03-08 11:53:40 (72.9 MB/s) - ‘ark_email_sample.png’ saved [56608/56608]
Use Anthropic to understand images from a local directory
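The cells for this step are not shown. As a sketch under stated assumptions, the flow is to load the image file as an image document, create the multi-modal LLM, and call `complete` with a prompt. The function name `describe_local_image`, the prompt wording, and the `max_tokens` default are illustrative choices rather than the notebook's own values, and the import paths reflect the LlamaIndex package layout current at the time of the download log (March 2024).

```python
from pathlib import Path


def describe_local_image(image_path: str, prompt: str, max_tokens: int = 300) -> str:
    """Send one local image plus a text prompt to the Anthropic multi-modal LLM."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")

    # Imported here so the check above works without the optional packages.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal

    # SimpleDirectoryReader turns image files into image documents.
    image_documents = SimpleDirectoryReader(input_files=[str(path)]).load_data()

    # max_tokens caps the length of the generated description.
    mm_llm = AnthropicMultiModal(max_tokens=max_tokens)
    response = mm_llm.complete(prompt=prompt, image_documents=image_documents)
    return response.text
```

For example, `describe_local_image("prometheus_paper_card.png", "Describe the image as alternative text")` returns a description like the one in this section. A small `max_tokens` cuts long answers short, which may be why that description ends mid-sentence.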
The image is a diagram titled "Prometheus: Inducing Fine-Grained Evaluation Capability In Language Models". It outlines the key components and workflow of the Prometheus system. The main sections are:
1. Contributions: Describes Prometheus as an open-source LLM evaluator that uses custom rubrics for fine-grained evaluations.
2. Feedback Collection: A dataset for fine-tuning evaluator LLMs with custom, fine-grained score rubrics. This section visually shows the process of seeding score rubrics, generating scores, generating instructions, and outputting training instances to create the Feedback Collection.
3. Results: Lists 3 key results - Prometheus matches or outperforms GPT-4 on 3 evaluation datasets, can function as a reward model to help LLMs achieve high agreement with human evaluators on ranking, and enables reference answers for LM evaluations via an ablation study and feedback distillation.
4. Insights: Notes that strong LLMs like GPT-4 show high agreement with human evaluations, but their closed-source nature and uncontrolled variations render them a less than ideal choice for many LLM application developers compared to an equally-good open-source option.
5. Technical Bits: Provides a citation to the full paper with more technical details.
The diagram uses
Use AnthropicMultiModal to reason about images from URLs
Load images from URLs
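The URL list used in the notebook is not shown, so the sketch below takes the URLs as an argument. It assumes the `load_image_urls` helper from `llama_index.core.multi_modal_llms.generic_utils`, which wraps each URL in an image document; `check_image_urls` and `describe_image_urls` are illustrative names, and the `max_tokens` default is an assumption.

```python
from urllib.parse import urlparse


def check_image_urls(urls: list) -> list:
    """Return the URLs unchanged, rejecting anything that is not http(s)."""
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
    return list(urls)


def describe_image_urls(urls: list, prompt: str, max_tokens: int = 300) -> str:
    """Ask the Anthropic multi-modal LLM about images referenced by URL."""
    # Imported lazily so check_image_urls works without the optional packages.
    from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
    from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal

    # Each URL becomes one image document.
    image_documents = load_image_urls(check_image_urls(urls))
    mm_llm = AnthropicMultiModal(max_tokens=max_tokens)
    response = mm_llm.complete(prompt=prompt, image_documents=image_documents)
    return response.text
```

Validating the URLs first keeps typos and local paths from surfacing later as a less obvious error.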
The image shows a table comparing the benchmark scores of various Claude 3 AI models (Opus, Sonnet, Haiku) against GPT-4, GPT-3.5, and two versions of Gemini (1.0 Ultra and 1.0 Pro) across different academic subjects and tests. The subjects covered include undergraduate and graduate level knowledge, grade school math, math problem-solving, multilingual math, code, reasoning over text, mixed evaluations, knowledge Q&A, and common knowledge. The scores are presented as percentages, except for the "Reasoning over text" row which shows raw scores out of a certain number of shots. Overall, the Claude 3 models show competitive performance compared to the GPT and Gemini models across most of the benchmarks. The Gemini models have a slight edge in some categories like undergraduate knowledge and math problem-solving.
Structured Output Parsing from an Image
Here, we use our multi-modal Pydantic program to generate structured output from an image.
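A sketch of how such a program might be assembled is shown here. The `TickerInfo` fields are taken from the parsed result printed at the end of this section; the outer class name `TickerList`, the prompt wording, the function name `extract_tickers`, and the `max_tokens` default are assumptions, since the original cells are not shown. It uses `MultiModalLLMCompletionProgram` with a `PydanticOutputParser`, so that the model's JSON reply is validated against the schema.

```python
from pathlib import Path
from typing import List

# Hypothetical prompt wording; the notebook's actual prompt is not shown.
PROMPT_TEMPLATE = (
    "Summarize the stock ticker information in the image "
    "and return the answer as JSON."
)


def extract_tickers(image_path: str, max_tokens: int = 300):
    """Run a multi-modal Pydantic program over one local image."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")

    # Local imports keep this module loadable without the optional packages.
    from pydantic import BaseModel
    from llama_index.core import SimpleDirectoryReader
    from llama_index.core.output_parsers import PydanticOutputParser
    from llama_index.core.program import MultiModalLLMCompletionProgram
    from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal

    class TickerInfo(BaseModel):
        """One trade row; field names follow the printed result."""
        direction: str
        ticker: str
        company: str
        shares_traded: int
        percent_of_total_etf: float

    class TickerList(BaseModel):
        """All trades for one fund (class name is an assumption)."""
        fund: str
        tickers: List[TickerInfo]

    image_documents = SimpleDirectoryReader(input_files=[str(path)]).load_data()
    program = MultiModalLLMCompletionProgram.from_defaults(
        output_parser=PydanticOutputParser(TickerList),
        image_documents=image_documents,
        prompt_template_str=PROMPT_TEMPLATE,
        multi_modal_llm=AnthropicMultiModal(max_tokens=max_tokens),
        verbose=True,  # prints the raw model output before parsing
    )
    return program()
```

Calling `extract_tickers("ark_email_sample.png")` would return a validated model instance whose `fund` and `tickers` attributes can be read directly.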
> Raw output: {
  "fund": "ARKK",
  "tickers": [
    {
      "direction": "Buy",
      "ticker": "TSLA",
      "company": "TESLA INC",
      "shares_traded": 93664,
      "percent_of_total_etf": 0.2453
    }
  ]
}
fund='ARKK' tickers=[TickerInfo(direction='Buy', ticker='TSLA', company='TESLA INC', shares_traded=93664, percent_of_total_etf=0.2453)]