Cardinal Weaviate

Tags: vector-search, data-platforms, vector-database, retrieval-augmented-generation, llm-frameworks, function-calling, weaviate-recipes, integrations, Python, generative-ai, cardinal

For this demo, we're using version 4.6.5 of the Weaviate Python client and the Cardinal API.

Author: Jianna Liu from Cardinal

Jianna's X handle: @jianna_liu

Jianna's LinkedIn: https://www.linkedin.com/in/jianna-liu-90747413b/

Cardinal's site: https://trycardinal.ai/

Cardinal ↔ Weaviate RAG Demo

We will:

  • Pull PDFs from S3 (or use URLs)
  • Send each file to Cardinal’s /rag endpoint
  • Convert Cardinal’s inch-based bounding boxes → points → normalized page percentages
  • Upsert chunks to Weaviate
  • Run aggregate, hybrid, vector, and generative queries

Requires: a Cardinal API key, a Weaviate instance (Cloud, Embedded, or Local), and AWS credentials (if using S3)

If you enjoyed this, follow our socials!

0) Install deps

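Because the demo is pinned to weaviate-client 4.6.5, and a hosted runtime such as Colab ships with many preinstalled packages, a quick post-install check can confirm which versions actually ended up installed. This is a minimal sketch that uses only the standard library's `importlib.metadata`. The `pins` mapping is an assumption: the only pin this notebook states is the client version.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each package to (installed, expected, matches) for a dict of expected pins."""
    report = {}
    for package, expected in pins.items():
        found = installed_version(package)
        report[package] = (found, expected, found == expected)
    return report


# Hypothetical pin list: only the weaviate-client version is stated in this notebook.
for pkg, (found, expected, ok) in check_pins({"weaviate-client": "4.6.5"}).items():
    print(f"{pkg}: installed={found}, expected={expected} -> {'OK' if ok else 'MISMATCH'}")
```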

1) Load environment variables
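A sketch of this step, assuming the secrets are already in the process environment. In a notebook, `load_dotenv()` from the `python-dotenv` package can populate `os.environ` from a `.env` file first. The variable names below are hypothetical; the notebook does not show which names it reads. Only the Cardinal key is treated as required, because the Weaviate and AWS values depend on which deployment and file source you choose.

```python
import os

# Hypothetical variable names; adjust them to match your own .env file or Colab secrets.
REQUIRED_VARS = ("CARDINAL_API_KEY",)
OPTIONAL_VARS = (
    "WEAVIATE_URL",           # only needed for Weaviate Cloud
    "WEAVIATE_API_KEY",       # only needed for Weaviate Cloud
    "AWS_ACCESS_KEY_ID",      # only needed when reading from S3
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def load_settings(env=None):
    """Read settings from `env` (defaults to os.environ) and fail fast on missing required keys."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Optional values come back as None when unset, so later steps can branch on them.
    settings.update({name: env.get(name) for name in OPTIONAL_VARS})
    return settings
```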

2) Connect to Weaviate

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Weaviate client version: 4.6.5
Connected to Weaviate: True
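The version and readiness lines above could come from a connection step along these lines. This is a sketch, not the notebook's own cell. The rule for choosing Cloud, Local, or Embedded is a hypothetical convenience, and so is the `WEAVIATE_HOST` setting. `weaviate.connect_to_wcs`, `connect_to_local`, `connect_to_embedded`, `weaviate.auth.AuthApiKey`, and `client.is_ready()` are v4 client calls. Later 4.x releases also offer `connect_to_weaviate_cloud` for the Cloud case.

```python
def pick_connection_mode(settings):
    """Hypothetical rule: Cloud if URL and key are set, a local server if a host is set, else Embedded."""
    if settings.get("WEAVIATE_URL") and settings.get("WEAVIATE_API_KEY"):
        return "cloud"
    if settings.get("WEAVIATE_HOST"):
        return "local"
    return "embedded"


def connect_weaviate(settings):
    """Open a v4 client for the chosen mode and print the client version and readiness."""
    # Imported inside the function so the mode logic above works without the client installed.
    import weaviate
    from weaviate.auth import AuthApiKey

    mode = pick_connection_mode(settings)
    if mode == "cloud":
        client = weaviate.connect_to_wcs(
            cluster_url=settings["WEAVIATE_URL"],
            auth_credentials=AuthApiKey(settings["WEAVIATE_API_KEY"]),
        )
    elif mode == "local":
        client = weaviate.connect_to_local(host=settings["WEAVIATE_HOST"])
    else:
        client = weaviate.connect_to_embedded()
    print("Weaviate client version:", weaviate.__version__)
    print("Connected to Weaviate:", client.is_ready())
    return client
```

The v4 client holds open connections, so call `client.close()` when the notebook is done. If the collection uses a hosted vectorizer or generative model, its provider key is typically passed through the `headers` argument of the connect helper. Which provider this demo uses is not shown.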

3) Create a Weaviate collection (CardinalDemo)

Deleted existing CardinalDemo collection
Created CardinalDemo collection
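The two lines above come from a drop-and-recreate step, which makes the notebook safe to rerun from a clean slate. Note that it also deletes any data already stored in the collection. The sketch below uses the v4 `client.collections` methods `exists`, `delete`, and `create`. The collection configuration is passed straight through as keyword arguments because the notebook's actual schema and vectorizer are not shown.

```python
def recreate_collection(client, name="CardinalDemo", **create_kwargs):
    """Drop `name` if it already exists, then create it again with the given configuration."""
    if client.collections.exists(name):
        client.collections.delete(name)  # destructive: removes all stored objects
        print(f"Deleted existing {name} collection")
    collection = client.collections.create(name, **create_kwargs)
    print(f"Created {name} collection")
    return collection
```

One possible configuration, which is not necessarily the one this demo used, passes `properties=[Property(name="text", data_type=DataType.TEXT), Property(name="page_number", data_type=DataType.INT)]` and `vectorizer_config=Configure.Vectorizer.text2vec_openai()`, all imported from `weaviate.classes.config`.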

4) Import helper functions


5) Get S3 URLs

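The next cell reports one `s3://` URI under `public-cardinal-bucket/menus/`. A listing that yields that kind of result can be sketched with boto3's `list_objects_v2` paginator, which follows continuation tokens so buckets with more than 1,000 keys are fully covered. The function name and the `.pdf` suffix filter are assumptions. The client is passed in so the logic does not depend on how credentials are configured.

```python
def list_pdf_uris(s3_client, bucket, prefix=""):
    """Return s3:// URIs for every .pdf object under `prefix`, following pagination."""
    uris = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching keys have no "Contents" entry at all.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.lower().endswith(".pdf"):
                uris.append(f"s3://{bucket}/{key}")
    return uris


# Example usage (requires boto3 and AWS credentials):
# import boto3
# uris = list_pdf_uris(boto3.client("s3"), "public-cardinal-bucket", prefix="menus/")
# len(uris), uris
```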

6) Process files and collect objects

(1, ['s3://public-cardinal-bucket/menus/Butterflake Croissant Sandwiches.pdf'])
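The processing log shows each `s3://` URI being fetched through its HTTPS form: the bucket becomes a virtual-hosted-style host name in region `us-east-2`, and the spaces in the key are percent-encoded. A minimal sketch of that rewrite follows. The function name is hypothetical, and the plain URL only works for publicly readable objects. For private buckets, boto3's `generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=...)` would be the usual substitute.

```python
from urllib.parse import quote


def s3_to_https(s3_uri, region):
    """Turn s3://bucket/key into a virtual-hosted-style HTTPS URL with a percent-encoded key."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"Not an s3:// URI: {s3_uri}")
    bucket, _, key = s3_uri[len("s3://"):].partition("/")
    # quote() keeps "/" as a path separator and encodes spaces as %20.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"
```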
Processing: https://public-cardinal-bucket.s3.us-east-2.amazonaws.com/menus/Butterflake%20Croissant%20Sandwiches.pdf
Processing files: 100%|██████████| 1/1 [00:20<00:00, 20.89s/it]
Extracted 25 objects from Butterflake Croissant Sandwiches.pdf

First object structure:
{
  "properties": {
    "text": "BUTTER\nFLAKE\nCROISSANT SANDWICHES",
    "type": "paragraph",
    "element_id": "s3://public-cardinal-bucket/menus/Butterflake Croissant Sandwiches.pdf#p1:999507151",
    "page_number": 1,
    "page_width_pts": 612.0,
    "page_height_pts": 792.0,
    "bbox_in": {
      "min_x": 0.6523,
      "min_y": 0.5002,
      "max_x": 2.8522,
      "max_y": 3.1314
    },
    "bbox_pts": {
      "x": 46.9656,
      "y": 36.014399999999995,
      "w": 158.3928,
      "h": 18...

Total objects to insert: 25
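The printed object stores Cardinal's box in inches (`bbox_in`) next to a point version (`bbox_pts`). The values are consistent with 1 inch = 72 PDF points: 0.6523 in × 72 = 46.9656 pt for `x`, and (2.8522 − 0.6523) × 72 = 158.3928 pt for `w`. The 612 × 792 pt page is US Letter (8.5 × 11 in). Below is a minimal sketch of both conversion steps. It assumes a top-left origin, which matches the `top` wording in the query output. The percentage keys mirror the `BBox: left=…, top=…` line printed later. The function names are illustrative.

```python
POINTS_PER_INCH = 72.0


def bbox_inches_to_points(bbox_in):
    """Convert a min/max box in inches into x/y/width/height in PDF points."""
    x = bbox_in["min_x"] * POINTS_PER_INCH
    y = bbox_in["min_y"] * POINTS_PER_INCH
    w = (bbox_in["max_x"] - bbox_in["min_x"]) * POINTS_PER_INCH
    h = (bbox_in["max_y"] - bbox_in["min_y"]) * POINTS_PER_INCH
    return {"x": x, "y": y, "w": w, "h": h}


def bbox_points_to_percent(bbox_pts, page_width_pts, page_height_pts):
    """Express a point-based box as percentages of the page size (top-left origin assumed)."""
    return {
        "left": 100.0 * bbox_pts["x"] / page_width_pts,
        "top": 100.0 * bbox_pts["y"] / page_height_pts,
        "width": 100.0 * bbox_pts["w"] / page_width_pts,
        "height": 100.0 * bbox_pts["h"] / page_height_pts,
    }
```

Storing percentages rather than points lets a viewer overlay a highlight on a rendered page at any zoom level without knowing the PDF's native size.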

7) Batch insert into Weaviate using the recommended batch method

Successfully inserted all 25 objects using batch method
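A sketch of the batch insert that could produce the success line above, using the v4 client's per-collection dynamic batching. Here `collection.batch.dynamic()` adapts batch size automatically, and `collection.batch.failed_objects` lists rejected objects after the context manager exits. It assumes each prepared object is a dict with a `properties` key, as in the printed structure, and that the error entries expose a `message` attribute. Any vectors are left to the collection's vectorizer.

```python
def batch_insert(collection, objects):
    """Insert prepared objects with v4 dynamic batching and report any failures."""
    with collection.batch.dynamic() as batch:
        for obj in objects:
            batch.add_object(properties=obj["properties"])
    # Failures are collected rather than raised, so check them explicitly.
    failed = collection.batch.failed_objects
    if failed:
        print(f"{len(failed)} of {len(objects)} objects failed; first error: {failed[0].message}")
    else:
        print(f"Successfully inserted all {len(objects)} objects using batch method")
    return failed
```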

8) Test query
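The test cell's printout starts with an aggregate count and two sample objects fetched without any search ranking. A minimal sketch of that part, assuming a v4 `collection` handle: `collection.aggregate.over_all(total_count=True)` and `collection.query.fetch_objects(limit=...)` are v4 client calls. The property names `filename`, `source_url`, and `bbox_pct` are hypothetical stand-ins for the fields the printout labels Filename, Source URL, and BBox. The names `text`, `type`, and `page_number` do appear in the stored object structure.

```python
def count_objects(collection):
    """Total number of objects, using the v4 aggregate API."""
    return collection.aggregate.over_all(total_count=True).total_count


def format_sample(props, index, max_chars=100):
    """Render one object's properties as a short, human-readable summary."""
    # "filename", "source_url", and "bbox_pct" are hypothetical property names.
    bbox = props.get("bbox_pct") or {}
    return "\n".join([
        f"Sample {index}:",
        f"  Text: {props.get('text', '')[:max_chars]}...",
        f"  Filename: {props.get('filename')}",
        f"  Page: {props.get('page_number')}",
        f"  Type: {props.get('type')}",
        f"  Source URL: {props.get('source_url')}",
        "  BBox: left={:.1f}%, top={:.1f}%, width={:.1f}%, height={:.1f}%".format(
            bbox.get("left", 0.0), bbox.get("top", 0.0),
            bbox.get("width", 0.0), bbox.get("height", 0.0),
        ),
    ])


def print_samples(collection, limit=2):
    """Print the total count, then a few objects fetched without a query."""
    print(f"Total documents in collection: {count_objects(collection)}")
    for i, obj in enumerate(collection.query.fetch_objects(limit=limit).objects, start=1):
        print()
        print(format_sample(obj.properties, i))
```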


=== Testing Query ===
Total documents in collection: 25

--- Fetching sample documents to verify structure ---

Sample 1:
  Text: DRINKS
Coffee $3.49
Orange Juice $3.49
Apple Juice $3.49
Milk $3.49...
  Filename: Butterflake Croissant Sandwiches.pdf
  Page: 1
  Type: paragraph
  Source URL: s3://public-cardinal-bucket/menus/Butterflake Croissant Sandwiches.pdf
  BBox: left=59.9%, top=89.3%, width=10.7%, height=5.7%

Sample 2:
  Text: # MENU...
  Filename: Butterflake Croissant Sandwiches.pdf
  Page: 1
  Type: paragraph
  Source URL: s3://public-cardinal-bucket/menus/Butterflake Croissant Sandwiches.pdf
  BBox: left=68.8%, top=42.2%, width=11.2%, height=3.3%

--- Hybrid Search Results ---
Found 3 results for 'Croissant sandwich':

--- Result 1 ---
Text: CROISSANT SANDWICHES
all served on a toasted Croissant
THE STANDARD
$9.99
Scrambled Egg | American Cheese | Frank's Redhot
Add Bacon $
Add Sausage $
SWEET HAMMY
$10.99
Scrambled Egg | Smoked Ham | Swi...
Filename: Butterflake Croissant Sandwiches.pdf
Page: 1
Score: 0.8672

--- Result 2 ---
Text: BUTTER
FLAKE
CROISSANT SANDWICHES...
Filename: Butterflake Croissant Sandwiches.pdf
Page: 1
Score: 0.7295

--- Result 3 ---
Text: BREAD: Croissant, 2.5oz DAIRY: American Cheese, Swiss,
Feta Crumbles, Milk PROTEIN: Eggs, Sausage Patty, Bacon,
Smoked Ham PRODUCE: Iceberg Lettuce, Tomato, Green
Pepper, Onion, Spinach CONDIMENTS/SPI...
Filename: Butterflake Croissant Sandwiches.pdf
Page: 1
Score: 0.6735
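The hybrid results above blend keyword (BM25) and vector relevance into a single score. A sketch of how they might be requested and printed: the query itself would be `collection.query.hybrid(query="Croissant sandwich", limit=3, return_metadata=MetadataQuery(score=True))`, with `MetadataQuery` imported from `weaviate.classes.query`. The optional `alpha` argument shifts the weighting toward vector search (closer to 1) or keyword search (closer to 0). The formatter below is an illustrative helper, and `filename` is again a hypothetical property name.

```python
def format_hybrid_results(objects, query, max_chars=200):
    """Render hybrid-search hits (objects carrying .properties and .metadata.score) as text."""
    lines = [f"Found {len(objects)} results for '{query}':"]
    for rank, obj in enumerate(objects, start=1):
        props = obj.properties
        score = obj.metadata.score  # populated only when score metadata is requested
        lines += [
            "",
            f"--- Result {rank} ---",
            f"Text: {props.get('text', '')[:max_chars]}...",
            f"Filename: {props.get('filename')}",
            f"Page: {props.get('page_number')}",
            f"Score: {score:.4f}" if score is not None else "Score: n/a",
        ]
    return "\n".join(lines)


# Example usage against a live collection:
# response = collection.query.hybrid(query="Croissant sandwich", limit=3,
#                                    return_metadata=MetadataQuery(score=True))
# print(format_hybrid_results(response.objects, "Croissant sandwich"))
```

The pure vector and generative queries from the plan follow the same pattern through `collection.query.near_text(...)` and `collection.generate.near_text(...)`. Both need a vectorizer, and the generative call also needs a generative module configured on the collection.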