Document Metadata Extraction
Metadata Extraction from Invoice Documents
Copyright (c) Meta Platforms, Inc. and affiliates. This software may be used and distributed according to the terms of the Llama Community License Agreement.
The tutorial shows you how to build an invoice processing system that automatically extracts structured data from invoice images. Using a two-stage pipeline with Llama's multimodal and text models, you'll transform varied invoice formats into clean, validated JSON data ready for seamless import into accounting systems.
While traditional OCR tools struggle with diverse layouts and require extensive template configuration, the approach here uses Llama's vision capabilities to understand any invoice format, enrich data with external services, and flag exceptions for human review—delivering the accuracy required for financial automation.
What you will learn
- Build a two-stage processing pipeline that separates visual extraction from intelligent refinement for optimal accuracy and cost efficiency.
- Leverage Llama's multimodal capabilities to extract data from invoice images with diverse layouts and formats.
- Use Llama API's tool calling to enrich data with external services such as currency conversion APIs.
- Implement JSON structured output to ensure consistent, reliable data extraction every time.
| Component | Choice | Why |
|---|---|---|
| Architecture | Two-Stage Pipeline | Separates concerns: Stage 1 focuses on accurate transcription, Stage 2 on refinement |
| Stage 1 Model | Llama 4 Maverick | Advanced vision capabilities for accurate text extraction from complex invoice layouts |
| Stage 2 Model | Llama 4 Scout | Fast performance and tool calling for data refinement, validation, and enrichment |
| Infrastructure | Llama API | Provides serverless, production-ready access to Llama models using the llama_api_client software development kit (SDK) |
| Output Format | JSON Structured Output | Guarantees consistent schema compliance for seamless integration with accounting systems |
Note on Inference Providers: This tutorial uses Llama API for demonstration purposes. However, you can run Llama models with any preferred inference provider. Common examples include Amazon Bedrock and Together AI. The core logic of this tutorial can be adapted to any of these providers.
Scenario snapshot
- Corpus: The Scanned Receipts OCR and Information Extraction (SROIE) dataset, publicly available on Kaggle. The dataset contains real-world receipts and invoices featuring diverse layouts, fonts, and formats that realistically simulate the challenges an AP department faces. Documents range from simple receipts to complex invoices with varying quality—from pristine digital exports to faded scans.
Example Invoice from SROIE Dataset:

Sample Malaysian invoice showing typical extraction challenges: foreign currency (RM), mixed date formats (29/01/2018), vendor information, and varying text quality.
-
Users: The target users are Accounts Payable (AP) specialists and accounting clerks whose primary responsibility is to process incoming invoices for payment accurately and efficiently. They typically spend 3-5 minutes per invoice on manual data entry, leading to processing bottlenecks and human errors.
-
Typical Asks: The core task is to extract structured data from each invoice document. A typical extraction task would be:
- Extract the following from the invoice: vendor_name, invoice_date, vendor_address, total_amount, and currency_symbol
- Convert any foreign currency amounts to USD using current exchange rates
- Flag any invoices with missing required fields or validation errors for manual review
Solution: Two-stage intelligent processing pipeline
The solution is a two-stage pipeline that optimizes for both accuracy and cost by using different Llama models specialized for each stage's task:
Stage 1: Accurate transcription (vision to raw data)
The first stage acts as the system's "eyes," focusing entirely on converting visual information into structured text with maximum accuracy.
Key Strategy: The model is instructed to extract, not invent. It captures any visual ambiguities in the extraction_notes field (e.g., "Currency symbol appears to be 'RM' but could be 'SR' due to print quality"), creating a raw but faithful digital representation with documented uncertainties.
Stage 2: Intelligent refinement (data to insights)
The second stage acts as the system's "brain," applying business logic, external knowledge, and validation rules to produce clean, reliable data.
The separation of concerns enables you to use expensive multimodal models only for visual tasks while leveraging faster, cheaper text models for data refinement—reducing costs by up to 60% compared to using multimodal models throughout.
Prerequisites
Before you begin, ensure you have a Llama API key from Llama API.
Install dependencies
You will need a few libraries for this project: llama-api-client for API access and pillow for image handling.
Imports & Llama API client setup
Import the necessary modules and initialize the LlamaAPIClient using your API key as an environment variable.
Note: This tutorial uses Llama API, but you can adapt it to other inference providers such as Amazon Bedrock and Together AI.
Model configuration
The tutorial uses two specialized Llama models, optimized for different tasks: Llama 4 Maverick for multimodal input processing (text + images), while Llama 4 Scout for fast text performance with tool calling capabilities.
Load sample invoice data
The tutorial uses a small subset of the SROIE dataset (Scanned Receipts OCR and Information Extraction), containing real-world receipts and invoices with diverse layouts.
The full SROIE dataset contains thousands of invoices with varying quality—from pristine digital exports to faded scans.
✅ 10 invoices loaded
Stage 1 - Visual extraction with multimodal model
Stage 1 uses Llama 4 Maverick's multimodal capabilities for accurate text extraction from complex invoice layouts. The key is to be faithful to the source—capturing exactly what's visible, including any ambiguities.
Define the invoice schema
We'll use Pydantic models to define our expected output structure, ensuring consistent extraction across all invoices.
Implement visual extraction
Prompt strategy: Instructs the model to act as a high-fidelity transcriber, separating numeric amounts from currency symbols and documenting visual uncertainties in extraction_notes for Stage 2 resolution.
Now let's process all SROIE invoices through Stage 1:
🔍 Processing invoices through Stage 1... [1] X00016469670.jpg Extracted: OJC MARKETING SDN BHD | 193.00 Target OJC MARKETING SDN BHD | 193.00 📝 Notes: The currency symbol 'SR' is used, which typically represents Saudi Riyal. The invoice is clearly marked as a 'TAX INVOICE' and includes details such as invoice number, date, cashier, sales person, and bill to information. The product details and total amount are also clearly listed. ✅ ✓ (Company:✓ | Amount:✓) [2] X00016469671.jpg Extracted: OJC MARKETING SDN BHD | 170.00 Target OJC MARKETING SDN BHD | 170.00 📝 Notes: The currency symbol is not explicitly shown on the invoice, but the amounts are listed with two decimal places, suggesting a currency that uses this format, such as MYR (Malaysian Ringgit). The vendor is based in Malaysia, supporting this interpretation. ✅ ✓ (Company:✓ | Amount:✓) [3] X51005200931.jpg Extracted: PERNIAGAAN ZHENG HUI | 436.20 Target PERNIAGAAN ZHENG HUI | 436.20 📝 Notes: The invoice is clear and legible, with all necessary information visible. The date format is DD/MM/YYYY. ✅ ✓ (Company:✓ | Amount:✓) [4] X51005230605.jpg Extracted: PETRON BKT LANJAN SB | : 4.90 Target PETRON BKT LANJAN SB | 4.90 📝 Notes: The receipt appears to be from a Petron gas station, and it includes a purchase of food items and GST. The total amount is clearly stated as RM 4.90. The date is in the format DD/MM/YYYY. ❌ ✗ (Company:✓ | Amount:✗) [5] X51005230616.jpg Extracted: Gerbang Alaf Restaurants Sdn Bhd (formerly known as Golden Arches Restaurants Sdn Bhd) | 38.90 Target GERBANG ALAF RESTAURANTS SDN BHD | 38.90 📝 Notes: The currency symbol is assumed to be 'RM' as it is the local currency in Malaysia where the invoice is from, but it is not explicitly shown on the invoice. ❌ ✗ (Company:✗ | Amount:✓) [6] X51005230621.jpg Extracted: SIN LIANHAP SDN BHD | .$30 Target SIN LIANHAP SDN BHD | 7.30 📝 Notes: The total amount is listed as '7.30' under 'Payment', and the currency symbol is 'RM' as indicated next to the item prices. ❌ ✗ (Company:✓ | Amount:✗) [7] X51005230648.jpg Extracted: CROSS CHANNEL NETWORK SDN. BHD. | 6.35 Target CROSS CHANNEL NETWORK SDN. BHD. | 6.35 📝 Notes: The invoice number is BTG-052332. The product purchased is 'SCHNEIDER E15R 13A SWITCH SOCKET OUTLET' with a quantity of 1. The total amount includes GST at 6%. The paid amount was RM 10.00, and the change given was RM 3.65. The GST summary shows SR @ A with an amount of RM 6.00 and tax of RM 0.36. ✅ ✓ (Company:✓ | Amount:✓) [8] X51005230657.jpg Extracted: CROSS CHANNEL NETWORK SDN. BHD. | 10.00 Target CROSS CHANNEL NETWORK SDN. BHD. | 7.95 📝 Notes: The invoice is clear and legible. The date is in the format DD/MM/YYYY and includes a timestamp. The total amount is clearly stated as 'Total Amt Payable: 10.00'. The currency symbol 'RM' is used consistently throughout the invoice. ❌ ✗ (Company:✓ | Amount:✗) [9] X51005230659.jpg Extracted: SWC ENTERPRISE SDN BHD | $patchy image obscuring total amount Target SWC ENTERPRISE SDN BHD | 8.00 📝 Notes: The total amount is partially obscured by a patchy image, making it difficult to determine the exact value. The visible amount is '8.00', but it's unclear if this is the total or a subtotal. The currency symbol is not explicitly shown on the invoice. ❌ ✗ (Company:✓ | Amount:✗) [10] X51005268275.jpg Extracted: LIGHTROOM GALLERY SDN BHD | 278.80 Target LIGHTROOM GALLERY SDN BHD | 278.80 📝 Notes: The image is a clear receipt from Lightroom Gallery Sdn Bhd, dated 20/11/2017. The total amount is RM 278.80. The receipt includes details of items purchased, GST, and payment information. ✅ ✓ (Company:✓ | Amount:✓) ✅ Stage 1: 10/10 processed | Accuracy: 90.0% company, 60.0% amount
Stage 2 - Intelligent refinement with tool calling
Stage 2 uses Llama 4 Scout for fast performance and tool calling, applying business logic to resolve ambiguities and enrich data with external services. This is where the system becomes truly intelligent by resolving ambiguities and enriching data with external information.
Define tools for external services
We'll create tools that the model can use to enrich the extracted data. These tools enable the system to perform currency conversion and data enrichment.
Learn more about tool calling: For comprehensive guidance on implementing tool calling with Llama models, see the Meta Tool Calling Guide.
Tool strategy: In this tutorial we use currency conversion to demonstrate tool calling with clear purpose and structured data handling. This same pattern extends to other tools such as vendor validation, tax calculation, compliance checks, and other business logic integrations.
Implementation note: The currency conversion implementation below uses static exchange rates for tutorial simplicity. In production, you would integrate with live currency APIs such as ExchangeRate-API, Fixer.io, or your financial system's currency service.
Define enriched output schema
The enriched schema structures Stage 1's raw extraction into clean, business-ready data with currency conversion and processing transparency.
Implement intelligent refinement
Processing strategy: We use Llama 4 Scout with structured JSON output and tool calling to enrich the raw data from Stage 1, resolve currency ambiguities, standardize dates, and convert amounts to USD. The model analyzes extraction_notes to resolve documented ambiguities using business context, standardizes dates, follows strict numeric formatting rules, and converts amounts to USD.
Now let's process the successful Stage 1 results through Stage 2:
🧠 Processing invoices through Stage 2 (intelligent refinement)...
[1] X00016469670.jpg
Enriched: OJC MARKETING SDN BHD | 193.00
Target: OJC MARKETING SDN BHD | 193.00
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $46.09
[2] X00016469671.jpg
Enriched: OJC MARKETING SDN BHD | 170.00
Target: OJC MARKETING SDN BHD | 170.00
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $40.09
[3] X51005200931.jpg
Enriched: PERNIAGAAN ZHENG HUI | 436.20
Target: PERNIAGAAN ZHENG HUI | 436.20
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $97.53
[4] X51005230605.jpg
Enriched: PETRON BKT LANJAN SB | 4.90
Target: PETRON BKT LANJAN SB | 4.90
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $1.16
[5] X51005230616.jpg
Enriched: Gerbang Alaf Restaurants Sdn Bhd (formerly known as Golden Arches Restaurants Sdn Bhd) | 38.90
Target: GERBANG ALAF RESTAURANTS SDN BHD | 38.90
❌ ✗ (Company:✗ | Amount:✓)
💱 Converted: MYR → USD $,{
[6] X51005230621.jpg
Enriched: SIN LIANHAP SDN BHD | 7.30
Target: SIN LIANHAP SDN BHD | 7.30
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $1.75
[7] X51005230648.jpg
Enriched: CROSS CHANNEL NETWORK SDN. BHD. | 6.35
Target: CROSS CHANNEL NETWORK SDN. BHD. | 6.35
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $[convert_currency(amount=6.35, from_currency='MYR', to_currency='USD')]
[8] X51005230657.jpg
Enriched: CROSS CHANNEL NETWORK SDN. BHD. | 10.00
Target: CROSS CHANNEL NETWORK SDN. BHD. | 7.95
❌ ✗ (Company:✓ | Amount:✗)
💱 Converted: MYR → USD $2.40
[9] X51005230659.jpg
Enriched: SWC ENTERPRISE SDN BHD | 8.00
Target: SWC ENTERPRISE SDN BHD | 8.00
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $1.92
[10] X51005268275.jpg
Enriched: LIGHTROOM GALLERY SDN BHD | 278.80
Target: LIGHTROOM GALLERY SDN BHD | 278.80
✅ ✓ (Company:✓ | Amount:✓)
💱 Converted: MYR → USD $62.49
✅ Stage 2: 10/10 enriched | Accuracy: 90.0% company, 90.0% amount
Currency conversions: 10 invoices
Let's examine the final structured outputs from our two-stage pipeline:
📋 Final Structured Outputs from Two-Stage Pipeline:
[1] X00016469670.jpg:
{
"vendor_name": "OJC MARKETING SDN BHD",
"vendor_address": "NO 2 & 4, JALAN BAYU 4, BANDAR SERI ALAM, 81750 MASAI, JOHOR",
"invoice_date": "2019-01-15",
"original_amount": "193.00",
"original_currency": "MYR",
"converted_amount_usd": "46.09",
"exchange_rate": "0.2387",
"reasoning_notes": "The currency symbol 'SR' was initially provided, but based on the vendor address in Malaysia and the extraction notes, it seems there was a confusion. The address suggests the currency is likely MYR. The amount 193.00 was converted from MYR to USD using the exchange rate 0.2387, resulting in 46.09 USD."
}
--------------------------------------------------
[2] X00016469671.jpg:
{
"vendor_name": "OJC MARKETING SDN BHD",
"vendor_address": "NO 2 & 4, JALAN BAYU 4, BANDAR SERI ALAM, 81750 MASAI, JOHOR",
"invoice_date": "2019-02-01",
"original_amount": "170.00",
"original_currency": "MYR",
"converted_amount_usd": "40.09",
"exchange_rate": "0.2357",
"reasoning_notes": "The currency symbol was not explicitly shown, but the vendor is based in Malaysia, and the amounts have two decimal places, suggesting MYR. The exchange rate used for conversion is based on static rates."
}
--------------------------------------------------
[3] X51005200931.jpg:
{
"vendor_name": "PERNIAGAAN ZHENG HUI",
"vendor_address": "NO.59 JALAN PERMAS 9/5 BANDAR BARU PERMAS JAYA 81750 JOHOR BAHRU",
"invoice_date": "2018-02-09",
"original_amount": "436.20",
"original_currency": "MYR",
"converted_amount_usd": "97.53",
"exchange_rate": "0.2236",
"reasoning_notes": "The invoice contains a Malaysian address, suggesting the currency is MYR. The extraction notes mention that the invoice is clear and legible. The currency symbol 'RM' is commonly used in Malaysia to represent MYR. Therefore, it is reasonable to assume that the original currency is MYR. The convert_currency tool was used to convert the amount from MYR to USD."
}
--------------------------------------------------
[4] X51005230605.jpg:
{
"vendor_name": "PETRON BKT LANJAN SB",
"vendor_address": "KM 458.4 BKT LANJAN UTARA, L/RAYA UTARA SELATAN,SG BULOH 47000 SUNGAI BULOH",
"invoice_date": "2018-02-01",
"original_amount": "4.90",
"original_currency": "MYR",
"converted_amount_usd": "1.16",
"exchange_rate": "0.237",
"reasoning_notes": "The extraction_notes mention that the receipt appears to be from a Petron gas station in Malaysia, and the total amount is clearly stated as RM 4.90. The vendor_address also suggests a Malaysian location, which implies the currency is MYR. The convert_currency tool was used to convert MYR 4.90 to USD."
}
--------------------------------------------------
[5] X51005230616.jpg:
{
"vendor_name": "Gerbang Alaf Restaurants Sdn Bhd (formerly known as Golden Arches Restaurants Sdn Bhd)",
"vendor_address": "Level 6, Bangunan TH, Damansara Uptown3 No.3, Jalan SS21/39, 47400 Petaling Jaya Selangor",
"invoice_date": "2018-01-18",
"original_amount": "38.90",
"original_currency": "MYR",
"converted_amount_usd": ",{",
"exchange_rate": "",
"reasoning_notes": ""
}
--------------------------------------------------
[6] X51005230621.jpg:
{
"vendor_name": "SIN LIANHAP SDN BHD",
"vendor_address": "LOT 13, JALAN IPOH, KG BATU 30, ULU YAM LAMA 44300 BTG KALI, SELANGOR",
"invoice_date": "2018-05-02",
"original_amount": "7.30",
"original_currency": "MYR",
"converted_amount_usd": "1.75",
"exchange_rate": "0.24",
"reasoning_notes": "The vendor address is in Malaysia, suggesting MYR currency. The extraction notes mention 'RM' next to item prices, which is the currency symbol for Malaysian Ringgit. The total amount is listed as '7.30' under 'Payment'. Using convert_currency tool to convert MYR to USD."
}
--------------------------------------------------
[7] X51005230648.jpg:
{
"vendor_name": "CROSS CHANNEL NETWORK SDN. BHD.",
"vendor_address": "47, JALAN MERANTI 1, SEK. 3, BANDAR UTAMA BATANG KALI, 44300 BATANG KALI, SELANGOR",
"invoice_date": "2018-01-29",
"original_amount": "6.35",
"original_currency": "MYR",
"converted_amount_usd": "[convert_currency(amount=6.35, from_currency='MYR', to_currency='USD')]",
"exchange_rate": "[convert_currency(amount=1, from_currency='MYR', to_currency='USD')]",
"reasoning_notes": "The vendor address is in Malaysia, suggesting MYR currency. The currency symbol 'RM' is consistent with MYR. The extraction notes confirm the total amount includes GST at 6%, and the paid amount and change given are in RM, further supporting MYR as the original currency."
}
--------------------------------------------------
[8] X51005230657.jpg:
{
"vendor_name": "CROSS CHANNEL NETWORK SDN. BHD.",
"vendor_address": "47, JALAN MERANTI 1, SEK. 3, BANDAR UTAMA BATANG KALI, 44300 BATANG KALI, SELANGOR",
"invoice_date": "2017-12-31",
"original_amount": "10.00",
"original_currency": "MYR",
"converted_amount_usd": "2.40",
"exchange_rate": "0.24",
"reasoning_notes": "The vendor address suggests a Malaysian origin, and the currency symbol 'RM' is commonly used in Malaysia, which corresponds to MYR. The extraction notes confirm the currency symbol 'RM' is used consistently throughout the invoice. Therefore, the original currency is MYR. The amount '10.00' is converted to USD using the convert_currency tool."
}
--------------------------------------------------
[9] X51005230659.jpg:
{
"vendor_name": "SWC ENTERPRISE SDN BHD",
"vendor_address": "NO. 5-7, Jalan Mahagoni 7/1, Sekysen 4, Bandar Utama, 44300 Batang Kali, Selangor.",
"invoice_date": "2018-01-08",
"original_amount": "8.00",
"original_currency": "MYR",
"converted_amount_usd": "1.92",
"exchange_rate": "0.24",
"reasoning_notes": "The vendor address suggests a Malaysian origin, which implies the currency might be MYR. Given the partial obscuration of the total amount and the visible '8.00', it is reasonable to assume this is the total in MYR. The exchange rate used for conversion is based on static rates."
}
--------------------------------------------------
[10] X51005268275.jpg:
{
"vendor_name": "LIGHTROOM GALLERY SDN BHD",
"vendor_address": "No: 28, JALAN ASTANA 1C, BANDAR BUKIT RAJA, 41050 KLANG SELANGOR D.E, MALAYSIA",
"invoice_date": "2017-11-20",
"original_amount": "278.80",
"original_currency": "MYR",
"converted_amount_usd": "62.49",
"exchange_rate": "0.224",
"reasoning_notes": "The extraction_notes indicate that the receipt is from Lightroom Gallery Sdn Bhd, dated 20/11/2017, and the total amount is RM 278.80. The vendor_address suggests a Malaysian location, which implies the currency is MYR. The currency_symbol 'RM' is commonly used for Malaysian Ringgit. Therefore, the original_currency is MYR. Using the convert_currency tool, we can convert the amount to USD."
}
--------------------------------------------------
Analyzing Common Failure Patterns
Even with high accuracy, LLM-based extraction can produce errors that reveal common challenges:
-
Over-extraction: The model extracts technically correct but contextually excessive information (e.g., including a company's former name). This pattern often requires stricter output formatting rules or more sophisticated post-processing.
-
Ambiguous Layout: The model incorrectly identifies a field because the document layout contains multiple plausible candidates (e.g., extracting a "payment due" amount instead of the "invoice total"). This class of errors is often best handled by implementing confidence scores to flag ambiguous cases for human review.
To address these failure patterns, you can implement confidence scoring to automatically flag ambiguous layouts for human review—a critical step for high-value transactions. For issues like over-extraction, you can refine your prompts with more specific formatting instructions or provide few-shot examples to guide the model toward the desired output structure.
These failure patterns underscore that the goal is not to eliminate human involvement, but to augment it. A successful system reliably handles the majority of invoices, while intelligently flagging complex exceptions for AP specialists to review, allowing them to focus on high-value decisions.
Next steps and upgrade paths
You've built an invoice processing system that combines Llama's multimodal capabilities to handle real-world document complexity. The two-stage architecture provides a flexible foundation that can be adapted to various industries and scale requirements. Here's how to extend this system for specific business needs and scale requirements.
| Invoice Type | Recommended Approach | Why |
|---|---|---|
| Simple receipts (< 10 items) | Stage 1 only | Multimodal extraction suffices for straightforward layouts |
| Complex invoices (multiple currencies) | Both stages | Stage 2 enrichment adds critical currency normalization |
| High-value transactions (> $10K) | Both stages + confidence scoring | Add verification techniques for risk mitigation |
| Batch processing (> 100/day) | Adaptive routing | Use confidence thresholds to route only ambiguous cases to Stage 2 |
Expanding with production tools
While this tutorial uses currency conversion to demonstrate tool calling, production systems typically integrate high-impact business tools:
Vendor Validation: validate_vendor checks vendors against approved supplier databases, reducing fraud risk and ensuring compliance with procurement policies.
Duplicate Detection: duplicate_detection prevents double payments by comparing invoice amounts, dates, and vendor details against recent payment history.
Budget Approval: check_budget_approval verifies purchases against approved budgets and spending limits, enabling automated approval workflows for compliant transactions.
Each additional tool follows the same pattern: define the tool schema, implement the function, and let the Llama model decide when to use it based on invoice data and business rules.