Llama Nemotron VL Nano 8B
Llama Nemotron Nano VL 8B - A Simple Notebook Walkthrough!
Explore NVIDIA's 8B Vision Language Model capable of OCR and Document Understanding. You can use this notebook to try out Llama Nemotron Nano VL model - hosted on build.nvidia.com.
All you need to run this model is an NVIDIA API KEY, which you can find on the model page linked above by clicking on "Get API Key". Make sure to login or sign up!

NOTE: Don't worry about any credits to use this model, although there is a rate limit of 40 requests per minute.
In case, you are running this notebook in a local environment, let's ensure we install all the required libraries to run this notebook.
Requirement already satisfied: pyarrow in /usr/local/lib/python3.11/dist-packages (18.1.0) Requirement already satisfied: matplotlib in /usr/local/lib/python3.11/dist-packages (3.10.0) Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (2.2.2) Requirement already satisfied: openai in /usr/local/lib/python3.11/dist-packages (1.82.1) Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (2.32.3) Requirement already satisfied: Pillow in /usr/local/lib/python3.11/dist-packages (11.2.1) Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.3.2) Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (0.12.1) Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (4.58.1) Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.4.8) Requirement already satisfied: numpy>=1.23 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (2.0.2) Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (24.2) Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (3.2.3) Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas) (2025.2) Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas) (2025.2) Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.11/dist-packages (from openai) (4.9.0) Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.11/dist-packages (from openai) (1.9.0) Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.11/dist-packages (from openai) (0.28.1) Requirement already satisfied: jiter<1,>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from openai) (0.10.0) Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.11/dist-packages (from openai) (2.11.5) Requirement already satisfied: sniffio in /usr/local/lib/python3.11/dist-packages (from openai) (1.3.1) Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.11/dist-packages (from openai) (4.67.1) Requirement already satisfied: typing-extensions<5,>=4.11 in /usr/local/lib/python3.11/dist-packages (from openai) (4.13.2) Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests) (3.4.2) Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests) (2.4.0) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests) (2025.4.26) Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.11/dist-packages (from httpx<1,>=0.23.0->openai) (1.0.9) Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.11/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai) (0.16.0) Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.9.0->openai) (0.7.0) Requirement already satisfied: pydantic-core==2.33.2 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.9.0->openai) (2.33.2) Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.9.0->openai) (0.4.1) Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil>=2.7->matplotlib) (1.17.0)
NOTE: Please restart the kernel after installation to avoid import errors.
Provide NVDEV API Key: ··········
Set up OpenAI Client
First, we'll need to point our OpenAI client to build.nvidia.com API
Let's also create a function that calls Llama Nemotron Nano VL. This function takes image(s) and text prompt as input, and gives out the model response.
We will have to combine the text prompt along with the images to craft a prompt to the model. Dive deeper into the API Reference here.
Best Practice: Order of items in the messages dictionary matter. First put your images, and then your text prompt.
Invoice/Receipt Understanding
We are going to demonstrate the capability of this small VLM on an invoice. This model shines with it's OCR capabilities. Let's grab an invoice from a dataset on HuggingFace katanaml-org/invoices-donut-data-v1. We are going to pull an invoice image from the test set of this dataset.
--2025-06-06 00:09:50-- https://huggingface.co/datasets/katanaml-org/invoices-donut-data-v1/resolve/main/data/test-00000-of-00001-56af6bd5ff7eb34d.parquet Resolving huggingface.co (huggingface.co)... 3.166.152.44, 3.166.152.65, 3.166.152.110, ... Connecting to huggingface.co (huggingface.co)|3.166.152.44|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://cdn-lfs.hf.co/repos/8b/53/8b532de81ab76db9001c6481db8bba0d5b5ec4539ffe0aaebec9bcfafdadaba2/712a9a65000d6a2cf04d41bc794843ca30452bb87473951b1adafd3610eab08b?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27test-00000-of-00001-56af6bd5ff7eb34d.parquet%3B+filename%3D%22test-00000-of-00001-56af6bd5ff7eb34d.parquet%22%3B&Expires=1749172190&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0OTE3MjE5MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy84Yi81My84YjUzMmRlODFhYjc2ZGI5MDAxYzY0ODFkYjhiYmEwZDViNWVjNDUzOWZmZTBhYWViZWM5YmNmYWZkYWRhYmEyLzcxMmE5YTY1MDAwZDZhMmNmMDRkNDFiYzc5NDg0M2NhMzA0NTJiYjg3NDczOTUxYjFhZGFmZDM2MTBlYWIwOGI%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=kzODCOTkdIdrBfnxCar47ay8Ss7OSkSDyAKa29cHvikoBlP6IxLwtU6q7B05mrnzJ6FAuzXJM57Vd0kwJ979wl3CeAS8kBILd766cUnAgkQGOGWr2CwU3KJpq6jcSJsb9jbwXp0R5XqjdcC4ec7NI85WaoqD%7Ex8UCsl3CCfy%7EhjbLELAJABSuCtqMKz4lFx97WODnCHPG6vpvZTUtQkmWf2v%7EPIXMeh8y8a3jO-N8wm0Qp%7EF8f0ZjcpzCxIeC8r7q37mwlZ10WS19ToPvDknwxpeibzh5beo-GfJzu5u7c6pMXAirvRD1nOmXuJkSCd9RHG8LyvU4OFyK12BEVNDyg__&Key-Pair-Id=K3RPWS32NSSJCE [following] --2025-06-06 00:09:50-- https://cdn-lfs.hf.co/repos/8b/53/8b532de81ab76db9001c6481db8bba0d5b5ec4539ffe0aaebec9bcfafdadaba2/712a9a65000d6a2cf04d41bc794843ca30452bb87473951b1adafd3610eab08b?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27test-00000-of-00001-56af6bd5ff7eb34d.parquet%3B+filename%3D%22test-00000-of-00001-56af6bd5ff7eb34d.parquet%22%3B&Expires=1749172190&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0OTE3MjE5MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy84Yi81My84YjUzMmRlODFhYjc2ZGI5MDAxYzY0ODFkYjhiYmEwZDViNWVjNDUzOWZmZTBhYWViZWM5YmNmYWZkYWRhYmEyLzcxMmE5YTY1MDAwZDZhMmNmMDRkNDFiYzc5NDg0M2NhMzA0NTJiYjg3NDczOTUxYjFhZGFmZDM2MTBlYWIwOGI%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=kzODCOTkdIdrBfnxCar47ay8Ss7OSkSDyAKa29cHvikoBlP6IxLwtU6q7B05mrnzJ6FAuzXJM57Vd0kwJ979wl3CeAS8kBILd766cUnAgkQGOGWr2CwU3KJpq6jcSJsb9jbwXp0R5XqjdcC4ec7NI85WaoqD%7Ex8UCsl3CCfy%7EhjbLELAJABSuCtqMKz4lFx97WODnCHPG6vpvZTUtQkmWf2v%7EPIXMeh8y8a3jO-N8wm0Qp%7EF8f0ZjcpzCxIeC8r7q37mwlZ10WS19ToPvDknwxpeibzh5beo-GfJzu5u7c6pMXAirvRD1nOmXuJkSCd9RHG8LyvU4OFyK12BEVNDyg__&Key-Pair-Id=K3RPWS32NSSJCE Resolving cdn-lfs.hf.co (cdn-lfs.hf.co)... 99.84.252.15, 99.84.252.38, 99.84.252.37, ... Connecting to cdn-lfs.hf.co (cdn-lfs.hf.co)|99.84.252.15|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 10441675 (10.0M) [binary/octet-stream] Saving to: ‘test-00000-of-00001-56af6bd5ff7eb34d.parquet.1’ test-00000-of-00001 100%[===================>] 9.96M --.-KB/s in 0.1s 2025-06-06 00:09:50 (71.5 MB/s) - ‘test-00000-of-00001-56af6bd5ff7eb34d.parquet.1’ saved [10441675/10441675]
Let's select a random image from the data.
Starting with simple transcription. But before that we also need to convert our image to base64 encoded string.
Best Practice: When images have tables, ask the model to extract in LaTeX format for an accurate full page OCR.
We are now ready to prompt the model.
Invoice no: 26343874
Date of issue: 07/17/2013
Seller:
Smith Ltd 290 Jodi Gardens Charlesview, TN 10958
Tax Id: 916-97-6743 IBAN: GB78TZMX55978564099007
Client:
Herring-Floyd 8113 Hansen Cliff Apt. 826 Port Alice, ID 99663
Tax Id: 913-80-4636
ITEMS
SUMMARY
\begin{tabular}{cccccccc} **No.** & **Description** & **Qty** & **UM** & **Net price** & **Net worth** & **VAT [%]** & **Gross worth**\\ 1. & Microsoft Xbox One X 1TB 4k Ultra HD Console black controller works great 1787 & 5,00 & each & 249,00 & 1 245,00 & 10% & 1 369,50\\ 2. & Cables 500gb,Sony PlayStation 4 PS4 CUH 1215A Jet Black Console w/ Controller & 5,00 & each & 100,00 & 500,00 & 10% & 550,00\\ 3. & PS2 Slim Console System SCPH-77000 PINK Playstation 2 Tested & 4,00 & each & 252,00 & 1 008,00 & 10% & 1 108,80\\ 4. & Nintendo Game Boy Micro Console - Black (OXYSFBB) & 2,00 & each & 130,00 & 260,00 & 10% & 286,00\\ 5. & Nintendo GameCube System Console Bundle Mario Sunshine + Genuine controller LOT & 1,00 & each & 159,99 & 159,99 & 10% & 175,99\\ \end{tabular}
\begin{tabular}{ccccc} & VAT [%] & Net worth & VAT & Gross worth\\ & 10% & 3 172,99 & 317,30 & 3 490,29\\ Total & & $ 3 172,99 & $ 317,30 & $ 3 490,29\\ \end{tabular}
You can now transcribe invoices for quick extraction of content. This model is also capable at answering questions on an image.
Let's ask a question on the invoice to get a Monetary insight.
Best Practice: Be concise and Don't be vague on your ask.
No
10%
You can ask a question to do line-level Item Analysis.
5
Try to prompt for Entity Detection!
No, there are no visible logos or branding that indicate a company identity in the provided image.
Based on the image, there are no visible handwritten or stamped elements on the invoice. All text appears to be printed, with no signs of alterations or additional markings that would indicate hand-written or stamped content.
This time, let's grab an example image from the web. Again, first we need to convert the image to a base64-encoded string.
Starting with simple transcription.
Invoice vs. Receipt Receipt Example East Repair Inc. 1912 Harvest Lane New York, NY 12210 BILL TO John Smith 2 Court Square New York, NY 12210 SHIP TO John Smith 3787 Pineview Drive Cambridge, MA 12210 RECEIPT # RECEIPT DATE P.O.# US-001 11/02/2019 2312/2019 Transaction Date Receipt Total $145.00 QTY DESCRIPTION UNIT PRICE AMOUNT 1 Front and rear brake cables 100.00 100.00 2 New set of pedal arms 15.00 30.00 3 Labor 3hrs 5.00 15.00 Total 145.00 Invoice Example Bill to: John Smith 442 Swanssea Street Denver, CO 80303 United States Invoice: Invoice Date: Due Date: 03.01.2021 03.15.2021 Description Quantity Unit Price Amount Consultation 1 each 150.00 150.00 Discount 10% Subtotal without Tax Total USD -15.00 150.00 135.00 Amount Paid 0.00 Amount Due (USD) 135.00 Terms & Conditions 10% new customer discount. Please pay the invoice within 14 days via the payment link below.
Let's prompt to test the model's ability to understand different tables in an image.
The total amount paid on the receipt is $145, and on the invoice is $135.
It accurately pulled and mapped the totals to receipt and invoice.
Let's ask an extractive question.
The invoice was issued on 03.01.2021.
Llama Nemotron VL is also good at text grounding, accurately telling where text is located in the image.
The terms and conditions are written at the bottom of the invoice example.
##📌 Conclusion
In this notebook, we explored the capabilities of Llama Nemotron Nano VL 8B, a lightweight vision-laguage model developed by NVIDIA, through invoice and document understanding tasks.
Here is a quick summary of what we achieved:
- ✅ Performed OCR-based transcription with strong layout awareness and LaTeX formatting.
- 💬 Asked semantic questions about discounts, tax rates, and itemized billing.
- 🔍 Evaluated the model's ability to detect entities and text grounding abilities.
Llama Nemotron Nano VL 8B proves to be an efficient and practical tool for intelligent document processing at scale. It is especially useful for developers looking for affordable and responsive models with OCR + question/answering capabilities.