Listwise Ranking Openai

Listwise Reranking using OpenAI

Listwise ranking

Listwise reranking passes the query and all of the retrieved documents to the ranker at once. A prompt containing the query and the documents is sent to the LLM, which returns a structured ordering of the results, for example [Doc B > Doc C > Doc A]. The LLM's objective is to find the document ordering that maximizes a retrieval metric (e.g. nDCG or precision).
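The prompt-and-parse step described above can be sketched in plain Python. This is a minimal illustration, not the notebook's own code: the helper names `build_listwise_prompt` and `parse_ranking`, the prompt wording, and the three sample documents are all assumptions made for the example, and the LLM reply is hard-coded rather than fetched from OpenAI.

```python
import re


def build_listwise_prompt(query, docs):
    """Build one prompt that shows the query and every candidate document."""
    lines = [
        "Rank the documents below by relevance to the query.",
        "Answer with the document labels only, most relevant first,",
        "in the form [Doc B > Doc C > Doc A].",
        "",
        f"Query: {query}",
        "",
    ]
    for label, text in docs.items():
        lines.append(f"Doc {label}: {text}")
    return "\n".join(lines)


def parse_ranking(response, valid_labels):
    """Extract the ordered labels from a reply such as '[Doc B > Doc C > Doc A]'."""
    ordered = []
    for label in re.findall(r"Doc\s+(\w+)", response):
        # Ignore labels the model invented and labels it repeated.
        if label in valid_labels and label not in ordered:
            ordered.append(label)
    # Labels the model left out are appended in their original retrieval order.
    ordered.extend(label for label in valid_labels if label not in ordered)
    return ordered


# Hypothetical sample documents, keyed by label.
docs = {
    "A": "Weaviate comes containerized so it can run everywhere.",
    "B": "Weaviate is a vector-native search database.",
    "C": "Docker Compose ties a module-rich runtime together.",
}
prompt = build_listwise_prompt("What is Weaviate?", docs)

# Stand-in for the LLM reply; a real run would send `prompt` to the model.
ranking = parse_ranking("[Doc B > Doc C > Doc A]", list(docs))
print(ranking)  # ['B', 'C', 'A']
```

Parsing defensively matters here because the model's reply is free text: it can skip a document or repeat one, so the sketch falls back to the original retrieval order for anything missing.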

Connect to Weaviate instance

[1]
{"action":"startup","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-12-27T12:19:05-03:00"}
{"action":"startup","auto_schema_enabled":true,"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"module offload-s3 is enabled","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"warning","msg":"Multiple vector spaces are present, GraphQL Explore and REST API list objects endpoint module include params has been disabled as a result.","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"open cluster service","servers":{"Embedded_at_8079":60173},"time":"2024-12-27T12:19:05-03:00"}
{"address":"192.168.28.99:60174","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"starting cloud rpc server ...","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"starting raft sub-system ...","time":"2024-12-27T12:19:05-03:00"}
{"address":"192.168.28.99:60173","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"tcp transport","tcpMaxPool":3,"tcpTimeout":10000000000,"time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"loading local db","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"local DB successfully loaded","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"schema manager loaded","n":0,"time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","metadata_only_voters":false,"msg":"construct a new raft node","name":"Embedded_at_8079","time":"2024-12-27T12:19:05-03:00"}
{"action":"raft","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","index":840,"level":"info","msg":"raft initial configuration","servers":"[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.28.99:58227}]]","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","last_snapshot_index":0,"last_store_applied_index_on_start":844,"level":"info","msg":"raft node constructed","raft_applied_index":0,"raft_last_index":844,"time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","hasState":true,"level":"info","msg":"raft init","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"attempting to join","remoteNodes":["192.168.28.99:60173"],"time":"2024-12-27T12:19:05-03:00"}
{"action":"raft","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","follower":{},"leader-address":"","leader-id":"","level":"info","msg":"raft entering follower state","time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"attempted to join and failed","remoteNode":"192.168.28.99:60173","status":8,"time":"2024-12-27T12:19:05-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"attempting to join","remoteNodes":["192.168.28.99:60173"],"time":"2024-12-27T12:19:06-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"attempted to join and failed","remoteNode":"192.168.28.99:60173","status":8,"time":"2024-12-27T12:19:06-03:00"}
{"action":"raft","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","last-leader-addr":"","last-leader-id":"","level":"warning","msg":"raft heartbeat timeout reached, starting election","time":"2024-12-27T12:19:07-03:00"}
{"action":"raft","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"raft entering candidate state","node":{},"term":188,"time":"2024-12-27T12:19:07-03:00"}
{"action":"raft","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"raft pre-vote successful, starting election","refused":0,"tally":1,"term":188,"time":"2024-12-27T12:19:07-03:00","votesNeeded":1}
{"action":"raft","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"raft election won","tally":1,"term":188,"time":"2024-12-27T12:19:07-03:00"}
{"action":"raft","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","leader":{},"level":"info","msg":"raft entering leader state","time":"2024-12-27T12:19:07-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","docker_image_tag":"localhost","level":"info","msg":"configured versions","server_version":"1.26.6","time":"2024-12-27T12:19:07-03:00"}
{"action":"grpc_startup","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"grpc server listening at [::]:50050","time":"2024-12-27T12:19:07-03:00"}
{"action":"restapi_management","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","docker_image_tag":"localhost","level":"info","msg":"Serving weaviate at http://127.0.0.1:8079","time":"2024-12-27T12:19:07-03:00"}
{"address":"192.168.28.99:60173","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"current Leader","time":"2024-12-27T12:19:07-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"attempting to join","remoteNodes":["192.168.28.99:60173"],"time":"2024-12-27T12:19:07-03:00"}
{"action":"raft","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","command":0,"level":"info","msg":"raft updating configuration","server-addr":"192.168.28.99:60173","server-id":"Embedded_at_8079","servers":"[[{Suffrage:Voter ID:Embedded_at_8079 Address:192.168.28.99:60173}]]","time":"2024-12-27T12:19:07-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"reload local db: update schema ...","time":"2024-12-27T12:19:08-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","index":"MyCollection","level":"info","msg":"reload local index","time":"2024-12-27T12:19:08-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","index":"BlogPost","level":"info","msg":"reload local index","time":"2024-12-27T12:19:08-03:00"}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","index":"FAQ_Answers","level":"info","msg":"reload local index","time":"2024-12-27T12:19:08-03:00"}
{"action":"telemetry_push","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"telemetry started","payload":"\u0026{MachineID:b5a11f64-c335-4a4a-929e-4a82377dcf5b Type:INIT Version:1.26.6 NumObjects:0 OS:darwin Arch:arm64 UsedModules:[generative-openai text2vec-openai]}","time":"2024-12-27T12:19:08-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"MyCollection","index":"mycollection","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/mycollection/SgZG5BZ5gAjb/lsm/property__id/segment-1735308659428598000","shard":"SgZG5BZ5gAjb","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"MyCollection","index":"mycollection","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/mycollection/SgZG5BZ5gAjb/lsm/property_content/segment-1735308659428051000","shard":"SgZG5BZ5gAjb","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"MyCollection","index":"mycollection","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/mycollection/SgZG5BZ5gAjb/lsm/property_content/segment-1735308681860851000","shard":"SgZG5BZ5gAjb","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"MyCollection","index":"mycollection","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/mycollection/SgZG5BZ5gAjb/lsm/property__id/segment-1735308681860851000","shard":"SgZG5BZ5gAjb","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"MyCollection","index":"mycollection","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/mycollection/SgZG5BZ5gAjb/lsm/objects/segment-1735308659426845000","shard":"SgZG5BZ5gAjb","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"MyCollection","index":"mycollection","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/mycollection/SgZG5BZ5gAjb/lsm/objects/segment-1735308681860851000","shard":"SgZG5BZ5gAjb","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"MyCollection","index":"mycollection","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/mycollection/SgZG5BZ5gAjb/lsm/property_content_searchable/segment-1735308659428314000","shard":"SgZG5BZ5gAjb","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"MyCollection","index":"mycollection","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/mycollection/SgZG5BZ5gAjb/lsm/property_content_searchable/segment-1735308681865661000","shard":"SgZG5BZ5gAjb","time":"2024-12-27T12:19:09-03:00"}
{"action":"hnsw_prefill_cache_async","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2024-12-27T12:19:09-03:00","wait_for_cache_prefill":false}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"Completed loading shard mycollection_SgZG5BZ5gAjb in 23.125709ms","time":"2024-12-27T12:19:09-03:00"}
{"action":"hnsw_vector_cache_prefill","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","count":3000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-12-27T12:19:09-03:00","took":221333}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__id/segment-1735308659271205000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_doc_id/segment-1735308659266438000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_document_id/segment-1735308659270010000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_text/segment-1735308659268867000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_ref_doc_id/segment-1735308659265131000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__id/segment-1735308682164480000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_doc_id/segment-1735308682164479000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__node_content/segment-1735308659267701000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_text/segment-1735308682169157000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_document_id/segment-1735308682164501000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_ref_doc_id/segment-1735308682169156000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__node_content/segment-1735308682169169000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__node_type/segment-1735308659263146000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__node_type/segment-1735308682169157000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/objects/segment-1735308659260627000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/objects/segment-1735308682169158000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_text_searchable/segment-1735308659269437000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_text_searchable/segment-1735308682178168000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__node_content_searchable/segment-1735308659268281000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_doc_id_searchable/segment-1735308659267046000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_document_id_searchable/segment-1735308659270615000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__node_content_searchable/segment-1735308682178159000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_document_id_searchable/segment-1735308682173161000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__node_type_searchable/segment-1735308659264255000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_doc_id_searchable/segment-1735308682173163000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property__node_type_searchable/segment-1735308682178158000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_ref_doc_id_searchable/segment-1735308659265805000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"lsm_recover_from_active_wal","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","class":"BlogPost","index":"blogpost","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/Users/dudanogueira/.local/share/weaviate/blogpost/pCCvvGK0UfZZ/lsm/property_ref_doc_id_searchable/segment-1735308682178158000","shard":"pCCvvGK0UfZZ","time":"2024-12-27T12:19:09-03:00"}
{"action":"hnsw_prefill_cache_async","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2024-12-27T12:19:09-03:00","wait_for_cache_prefill":false}
{"build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","level":"info","msg":"Completed loading shard blogpost_pCCvvGK0UfZZ in 18.786333ms","time":"2024-12-27T12:19:09-03:00"}
{"action":"hnsw_vector_cache_prefill","build_git_commit":"ab0312d5d","build_go_version":"go1.23.1","build_image_tag":"localhost","build_wv_version":"1.26.6","count":3000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-12-27T12:19:09-03:00","took":55041}
[2]
Client Version: 4.10.2 Server Version: 1.26.6

Load in FAQ json

[3]
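As a rough sketch of this step, the snippet below parses FAQ records and checks that each one has the `question`, `answer`, and `number` fields seen in the upload output further down. It assumes the data is a JSON array of objects; the inline sample string and the `load_faq` helper are hypothetical stand-ins for the notebook's actual file and loading code.

```python
import json

# Hypothetical inline sample; the notebook reads its records from a JSON file.
raw = """
[
  {"question": "Do I need to know about Docker (Compose) to use Weaviate?",
   "answer": "Weaviate uses Docker images as a means to distribute releases.",
   "number": "3"}
]
"""


def load_faq(text):
    """Parse a JSON array of FAQ records and validate the expected fields."""
    records = json.loads(text)
    required = {"question", "answer", "number"}
    for record in records:
        missing = required - record.keys()
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
    return records


faq = load_faq(raw)
print(len(faq), faq[0]["number"])
```

To read from disk instead, the same function could be called with the contents of the file, for example `load_faq(open(path).read())`, where `path` is whatever location the FAQ file lives at.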

Create Weaviate Schema

[4]
Successfully created the schema.

Upload answers to Weaviate

[5]
{'question': 'Why would I use Weaviate as my vector database?', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.', 'number': '1'}
{'question': 'What is the difference between Weaviate and for example Elasticsearch?', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.', 'number': '2'}
{'question': 'Do I need to know about Docker (Compose) to use Weaviate?', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.', 'number': '3'}
{'question': 'What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there', 'number': '4'}
{'question': "Are there any 'best practices' or guidelines to consider when designing a schema?", 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.", 'number': '5'}
{'question': 'Is it possible to create one-to-many relationships in the schema?', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.', 'number': '6'}
{'question': 'Do Weaviate classes have namespaces?', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.', 'number': '7'}
{'question': 'Are there restrictions on UUID formatting? Do I have to adhere to any standards?', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.", 'number': '8'}
{'question': 'If I do not specify a UUID during adding data objects, will Weaviate create one automatically?', 'answer': 'Yes, a UUID will be created if not specified.', 'number': '9'}
{'question': 'Can I use Weaviate to create a traditional knowledge graph?', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.', 'number': '10'}
{'question': 'Why does Weaviate have a schema and not an ontology?', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.", 'number': '11'}
{'question': 'How can I retrieve the total object count in a class?', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here', 'number': '12'}
{'question': "How do I get the cosine similarity from Weaviate's certainty?", 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1", 'number': '13'}
{'question': 'What is the best way to iterate through objects? Can I do paginated API calls?', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.', 'number': '14'}
{'question': "How does Weaviate's vector and scalar filtering work?", 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.", 'number': '15'}
{'question': 'Can I request a feature in Weaviate?', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.", 'number': '16'}

Retrieve information

Prompt 1
[6]
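The prompt cell itself is not shown in this rendering, but the INPUT blocks printed below suggest its layout: a `QUERY:` line, a fixed instruction, and one bracketed `[Answer id: ..., Answer: ...]` line per retrieved result. The following is a sketch that reproduces that layout under those assumptions; `build_listwise_prompt` is a hypothetical name, and the real notebook may assemble the prompt differently.

```python
def build_listwise_prompt(query, results):
    """Format a query and its retrieved answers as one listwise reranking prompt."""
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for item in results:
        # One bracketed line per candidate, in the order the search returned them.
        lines.append(f"[Answer id: {item['number']}, Answer: {item['answer']}]")
    return "\n".join(lines)


# Two short records taken from the uploaded FAQ data.
results = [
    {"number": "9", "answer": "Yes, a UUID will be created if not specified."},
    {"number": "13", "answer": "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"},
]
prompt = build_listwise_prompt(
    "If I do not specify a UUID during adding data objects, will Weaviate create one automatically?",
    results,
)
print(prompt)
```

Because every candidate goes into a single prompt, the model can compare the documents against each other, which is what distinguishes listwise reranking from scoring each document on its own.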
[7]
ITERATIONS 1
Weaviate search results 10


INPUT:
QUERY: Why would I use Weaviate as my vector database?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



OPENAI 1st Rank = Answer id: 2]
[Answer id: 1]
[Answer id: 11]
[Answer id: 10]
[Answer id: 5]
[Answer id: 7]
[Answer id: 3]
[Answer id: 14]
[Answer id: 8]
[Answer id: 13

GROUND TRUTH = 1
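To compare a ranking like the one above with the ground truth, the ids have to be pulled out of the model's text. The sketch below does this with a regular expression, which also tolerates the missing outer brackets in the printed output. Both helper names are hypothetical, and the regex approach is an assumption rather than the notebook's own parsing code.

```python
import re


def parse_ranking(text):
    """Return the answer ids in the order the model listed them."""
    return [int(m) for m in re.findall(r"Answer id:\s*(\d+)", text)]


def rank_of(ground_truth, ranking):
    """1-based position of the ground-truth id, or None if the model dropped it."""
    if ground_truth not in ranking:
        return None
    return ranking.index(ground_truth) + 1


# Ranking printed for iteration 1, copied as it appears in the output.
llm_output = """Answer id: 2]
[Answer id: 1]
[Answer id: 11]
[Answer id: 10]
[Answer id: 5]
[Answer id: 7]
[Answer id: 3]
[Answer id: 14]
[Answer id: 8]
[Answer id: 13"""

ranking = parse_ranking(llm_output)
print(ranking)
print("ground truth rank:", rank_of(1, ranking))
```

For iteration 1 the ground-truth answer (id 1) ends up in second place, the same position it held in the original Weaviate results.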
ITERATIONS 2
Weaviate search results 10


INPUT:
QUERY: What is the difference between Weaviate and for example Elasticsearch?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



OPENAI 1st Rank = Answer id: 2]
[Answer id: 1]
[Answer id: 3]
[Answer id: 5]
[Answer id: 15]
[Answer id: 11]
[Answer id: 10]
[Answer id: 7]
[Answer id: 8]
[Answer id: 14

GROUND TRUTH = 2
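A listwise ranker is only useful if it returns every candidate exactly once, and an LLM can drop, repeat, or invent ids. As a sketch of a sanity check, assuming the ids have already been parsed into integers (`check_permutation` is a hypothetical helper, not part of the notebook):

```python
def check_permutation(candidate_ids, ranked_ids):
    """Report ids the model dropped, invented, or repeated."""
    candidates = set(candidate_ids)
    seen = set()
    duplicated = []
    for answer_id in ranked_ids:
        if answer_id in seen:
            duplicated.append(answer_id)
        seen.add(answer_id)
    return {
        "missing": sorted(candidates - seen),
        "unexpected": sorted(seen - candidates),
        "duplicated": duplicated,
    }


# Iteration 2: ids in the order Weaviate returned them, then in the LLM's order.
candidate_ids = [2, 1, 3, 14, 11, 10, 8, 5, 7, 15]
ranked_ids = [2, 1, 3, 5, 15, 11, 10, 7, 8, 14]

report = check_permutation(candidate_ids, ranked_ids)
print(report)
```

For iteration 2 all three lists come back empty, so the model's output is a clean reordering of the ten candidates.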
ITERATIONS 3
Weaviate search results 10


INPUT:
QUERY: Do I need to know about Docker (Compose) to use Weaviate?
Please rerank these search results.
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]



OPENAI 1st Rank = Answer id: 3] 
[Answer id: 1] 
[Answer id: 4] 
[Answer id: 2]
[Answer id: 10]
[Answer id: 11] 
[Answer id: 14]
[Answer id: 8]
[Answer id: 7]
[Answer id: 16

GROUND TRUTH = 3
ITERATIONS 4
Weaviate search results 10


INPUT:
QUERY: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Please rerank these search results.
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]



OPENAI 1st Rank = Answer id: 4]
[Answer id: 3]
[Answer id: 1]
[Answer id: 2]
[Answer id: 7]
[Answer id: 10]
[Answer id: 11]
[Answer id: 14]
[Answer id: 8]
[Answer id: 9

GROUND TRUTH = 4
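Each query here has a single ground-truth answer, so nDCG reduces to a simple function of that answer's rank: with binary relevance the DCG is 1/log2(rank + 1) and the ideal DCG is 1. The sketch below applies this to iteration 4, assuming binary gain and exactly one relevant document per query; the function name is hypothetical.

```python
import math


def ndcg_single_relevant(rank):
    """nDCG for one relevant document with binary gain.

    DCG is 1 / log2(rank + 1); the ideal DCG is 1 (relevant document at rank 1),
    so the ratio is just the DCG. A missing document scores 0.
    """
    if rank is None:
        return 0.0
    return 1.0 / math.log2(rank + 1)


# Iteration 4: Weaviate returned answer 4 in second place,
# and the LLM moved it to first place.
before = ndcg_single_relevant(2)
after = ndcg_single_relevant(1)
print(f"nDCG before reranking: {before:.3f}")
print(f"nDCG after reranking:  {after:.3f}")
```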
ITERATIONS 5
Weaviate search results 10


INPUT:
QUERY: Are there any 'best practices' or guidelines to consider when designing a schema?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]



OPENAI 1st Rank = Answer id: 10]
[Answer id: 5]
[Answer id: 1]
[Answer id: 11]
[Answer id: 7]
[Answer id: 6]
[Answer id: 2]
[Answer id: 15]
[Answer id: 14]
[Answer id: 12

GROUND TRUTH = 5
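The five iterations so far can be summarized with mean reciprocal rank (MRR), a metric not used in the notebook itself but convenient when each query has one correct answer. The ranks below are read directly from the INPUT and OPENAI blocks above; the function is a hypothetical helper, and five queries are far too few to draw firm conclusions.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over queries; a rank of None (answer missing) counts as 0."""
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


# 1-based position of the ground-truth answer in iterations 1 to 5.
weaviate_ranks = [2, 1, 1, 2, 4]  # order returned by the Weaviate search
openai_ranks = [2, 1, 1, 1, 2]    # order after the OpenAI rerank

print("MRR, Weaviate order:", mean_reciprocal_rank(weaviate_ranks))
print("MRR, OpenAI rerank: ", mean_reciprocal_rank(openai_ranks))
```

On these five queries the reranked order never places the ground-truth answer lower than the original search did, and it improves the position in iterations 4 and 5.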
ITERATIONS 6
Weaviate search results 10


INPUT:
QUERY: Is it possible to create one-to-many relationships in the schema?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]



OPENAI 1st Rank = Answer id: 6
[Answer id: 10]
[Answer id: 11]
[Answer id: 7]
[Answer id: 14]
[Answer id: 1]
[Answer id: 2]
[Answer id: 5]
[Answer id: 15]
[Answer id: 9]

GROUND TRUTH = 6
ITERATIONS 7
Weaviate search results 10
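
The reranked list printed above is plain text, so the ids have to be pulled back out before they can be compared with the ground truth. A minimal sketch of one way to do that, assuming the model's reply contains entries of the form `Answer id: N`; the helper name `parse_ranked_ids` is hypothetical and not part of the notebook:

```python
import re

def parse_ranked_ids(text: str) -> list[int]:
    """Return the answer ids in the order they appear in the model's reply.

    Matching on the 'Answer id: N' pattern (an assumption about the reply
    format) keeps the parse independent of surrounding brackets.
    """
    return [int(m) for m in re.findall(r"Answer id:\s*(\d+)", text)]

# Illustrative reply, shortened to three entries.
reply = "[Answer id: 6]\n[Answer id: 10]\n[Answer id: 11]"
ranked = parse_ranked_ids(reply)
print("1st rank:", ranked[0])
print("full ranking:", ranked)
```

Matching on the id pattern rather than slicing the string means a missing or extra bracket in the reply does not change the parsed result.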


INPUT:
QUERY: Do Weaviate classes have namespaces?
Please rerank these search results.
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



OPENAI 1st Rank = Answer id: 7
[Answer id: 11]
[Answer id: 1]
[Answer id: 10]
[Answer id: 2]
[Answer id: 3]
[Answer id: 6]
[Answer id: 8]
[Answer id: 14]
[Answer id: 13]

GROUND TRUTH = 7
ITERATIONS 8
Weaviate search results 10


INPUT:
QUERY: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Please rerank these search results.
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



OPENAI 1st Rank = Answer id: 8

GROUND TRUTH = 8
ITERATIONS 9
Weaviate search results 10


INPUT:
QUERY: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?
Please rerank these search results.
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



OPENAI 1st Rank = Answer id: 8

GROUND TRUTH = 9
ITERATIONS 10
Weaviate search results 10


INPUT:
QUERY: Can I use Weaviate to create a traditional knowledge graph?
Please rerank these search results.
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



OPENAI 1st Rank = Answer id: 10
[Answer id: 11]
[Answer id: 1]
[Answer id: 2]
[Answer id: 3]
[Answer id: 6]
[Answer id: 7]
[Answer id: 8]
[Answer id: 14]
[Answer id: 15]

GROUND TRUTH = 10
ITERATIONS 11
Weaviate search results 10


INPUT:
QUERY: Why does Weaviate have a schema and not an ontology?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



OPENAI 1st Rank = Answer id: 11
[Answer id: 10]
[Answer id: 1]
[Answer id: 2]
[Answer id: 3]
[Answer id: 14]
[Answer id: 8]
[Answer id: 5]
[Answer id: 6]
[Answer id: 7]

GROUND TRUTH = 11
ITERATIONS 12
Weaviate search results 10


INPUT:
QUERY: How can I retrieve the total object count in a class?
Please rerank these search results.
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]



OPENAI 1st Rank = Answer id: 14
[Answer id: 6]
[Answer id: 7]
[Answer id: 5]
[Answer id: 12]
[Answer id: 13]
[Answer id: 4]
[Answer id: 10]
[Answer id: 11]
[Answer id: 16]

GROUND TRUTH = 12
ITERATIONS 13
Weaviate search results 10
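
In the iteration above the ground-truth answer (id 12) is not ranked first but does appear further down the reranked list. A first-rank comparison scores that as a plain miss, whereas a rank-aware metric gives partial credit. The sketch below computes the reciprocal rank for this one query; the function name `reciprocal_rank` is hypothetical, and the metric is offered as one possible choice, not the one used in the notebook:

```python
def reciprocal_rank(ranked_ids: list[int], ground_truth_id: int) -> float:
    """Return 1 / position of the ground-truth id, or 0.0 if it is absent."""
    for position, answer_id in enumerate(ranked_ids, start=1):
        if answer_id == ground_truth_id:
            return 1.0 / position
    return 0.0

# Ranking copied from the output above for the object-count query.
ranked = [14, 6, 7, 5, 12, 13, 4, 10, 11, 16]
rr = reciprocal_rank(ranked, ground_truth_id=12)
print(rr)  # ground truth sits in position 5, so this prints 0.2
```

Averaging this value over all queries gives the mean reciprocal rank (MRR).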


INPUT:
QUERY: How do I get the cosine similarity from Weaviate's certainty?
Please rerank these search results.
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]



OPENAI 1st Rank = Answer id: 13

GROUND TRUTH = 13
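
Each INPUT block in this output follows the same layout: a `QUERY:` line, the instruction "Please rerank these search results.", and one bracketed `[Answer id: N, Answer: ...]` entry per retrieved object. The code that produced these blocks is not shown here, so the following is only a minimal sketch of how such a prompt could be assembled. The function name `build_rerank_prompt` and the `(id, answer)` tuple input are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def build_rerank_prompt(query: str, candidates: Iterable[Tuple[int, str]]) -> str:
    """Assemble a listwise rerank prompt in the layout of the INPUT blocks.

    `candidates` is a hypothetical list of (answer id, answer text) pairs,
    kept in the order returned by the search.
    """
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for answer_id, answer in candidates:
        lines.append(f"[Answer id: {answer_id}, Answer: {answer}]")
    return "\n".join(lines)


prompt = build_rerank_prompt(
    "What is the best way to iterate through objects? Can I do paginated API calls?",
    [
        (14, "Yes, Weaviate supports cursor-based iteration as well as pagination through a result set."),
        (6, "Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references."),
    ],
)
print(prompt)
```

Keeping the id inside each bracketed entry lets the model answer with ids alone, which keeps the response short and easy to compare with the ground truth.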
ITERATIONS 14
Weaviate search results 10


INPUT:
QUERY: What is the best way to iterate through objects? Can I do paginated API calls?
Please rerank these search results.
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



OPENAI 1st Rank = Answer id: 14]
[Answer id: 6]
[Answer id: 12]
[Answer id: 7]
[Answer id: 10]
[Answer id: 2]
[Answer id: 11]
[Answer id: 1]
[Answer id: 5]
[Answer id: 15

GROUND TRUTH = 14
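
In the iteration above, the "1st Rank" line contains the whole ranked list with a stray `]` after the first id and an unclosed `[` before the last one. This suggests (an assumption, since the parsing code is not shown) that the model returned one bracketed id per line and only the outer characters were stripped. A more tolerant parser can pull the ids out with a regular expression. The sketch below uses a hypothetical helper, `parse_ranking`, that accepts both the `[Answer id: N]` form seen here and a bare `[N],[M]` form.

```python
import re
from typing import List

# Matches "[Answer id: 14]" as well as "[14]"; the label is optional.
_ID_PATTERN = re.compile(r"\[\s*(?:Answer id:\s*)?(\d+)\s*\]")


def parse_ranking(raw: str) -> List[int]:
    """Return answer ids in the order they appear, skipping repeated ids."""
    ranking: List[int] = []
    seen = set()
    for match in _ID_PATTERN.findall(raw):
        answer_id = int(match)
        if answer_id not in seen:
            seen.add(answer_id)
            ranking.append(answer_id)
    return ranking


labelled = "[Answer id: 14]\n[Answer id: 6]\n[Answer id: 12]"
bare = "[2],[1],[11],[10],[5],[3],[7],[14],[13],[8]"

print(parse_ranking(labelled))  # [14, 6, 12]
print(parse_ranking(bare))      # [2, 1, 11, 10, 5, 3, 7, 14, 13, 8]
print(parse_ranking(labelled)[0])  # 14
```

With the list parsed this way, the first-rank id is simply the first element, whether the model returns one id or ten.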
ITERATIONS 15
Weaviate search results 10


INPUT:
QUERY: How does Weaviate's vector and scalar filtering work?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



OPENAI 1st Rank = Answer id: 2]
[Answer id: 15]
[Answer id: 5]
[Answer id: 13]
[Answer id: 11]
[Answer id: 10]
[Answer id: 3]
[Answer id: 1]
[Answer id: 14]
[Answer id: 8

GROUND TRUTH = 15
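
Here the ground-truth answer (15) was placed second rather than first, so a strict top-1 check counts this query as a miss even though the relevant document is near the top. A rank-aware metric such as reciprocal rank gives partial credit. The sketch below is illustrative and not necessarily the metric this notebook computes. The function names are assumptions, and the ranking is copied from the output above.

```python
from typing import Sequence


def hit_at_1(ranking: Sequence[int], ground_truth: int) -> bool:
    """True only when the ground-truth id is ranked first."""
    return bool(ranking) and ranking[0] == ground_truth


def reciprocal_rank(ranking: Sequence[int], ground_truth: int) -> float:
    """1 / position of the ground-truth id, or 0.0 if it is absent."""
    for position, answer_id in enumerate(ranking, start=1):
        if answer_id == ground_truth:
            return 1.0 / position
    return 0.0


# Ranking returned for "How does Weaviate's vector and scalar filtering work?"
ranking = [2, 15, 5, 13, 11, 10, 3, 1, 14, 8]

print(hit_at_1(ranking, 15))         # False
print(reciprocal_rank(ranking, 15))  # 0.5
```

Averaging `reciprocal_rank` over all queries gives mean reciprocal rank (MRR), which can be reported alongside top-1 accuracy.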
ITERATIONS 16
Weaviate search results 10


INPUT:
QUERY: Can I request a feature in Weaviate?
Please rerank these search results.
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



OPENAI 1st Rank = Answer id: 16

GROUND TRUTH = 16
0.0
Prompt 2
[8]
[9]
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. 
a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. 
This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Why would I use Weaviate as my vector database?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



RAW OUTPUT FROM OPENAI = [2],[1],[11],[10],[5],[3],[7],[14],[13],[8]

OPENAI 1st Rank = 2

GROUND TRUTH = 1
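
The raw output above is a permutation of the ten ids that were sent in, but nothing forces an LLM to return every id exactly once. As a hedged sketch of one way to guard against that, the hypothetical helper `reconcile_ranking` below drops ids that were never candidates, removes repeats, and appends any omitted candidates in their original retrieval order.

```python
from typing import List, Sequence


def reconcile_ranking(ranked: Sequence[int], candidates: Sequence[int]) -> List[int]:
    """Return a full ordering of `candidates` that follows `ranked` where possible."""
    allowed = set(candidates)
    seen = set()
    result: List[int] = []
    # Keep the model's order, ignoring unknown or repeated ids.
    for answer_id in ranked:
        if answer_id in allowed and answer_id not in seen:
            seen.add(answer_id)
            result.append(answer_id)
    # Append anything the model left out, in retrieval order.
    for answer_id in candidates:
        if answer_id not in seen:
            seen.add(answer_id)
            result.append(answer_id)
    return result


candidates = [2, 1, 3, 11, 14, 10, 8, 5, 7, 13]  # ids in the INPUT block above
ranked = [2, 1, 11, 10, 5, 3, 7, 14, 13, 8]      # ids in the raw output above

print(reconcile_ranking(ranked, candidates) == ranked)  # True
print(reconcile_ranking([2, 99, 1, 2], [2, 1, 3]))      # [2, 1, 3]
```

Falling back to retrieval order for missing ids keeps the result list the same length as the input, so downstream metrics stay comparable across queries.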
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. 
One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. 
a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. 
The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: What is the difference between Weaviate and for example Elasticsearch?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



RAW OUTPUT FROM OPENAI = [2],[1],[15],[3],[11],[10],[7],[14],[5],[8]

OPENAI 1st Rank = 2

GROUND TRUTH = 2
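
The log above prints the model's reply as a bracketed id list and then its first entry. A minimal sketch of how such a reply could be turned into an ordered list of answer ids and checked against the ground truth is shown below. The helper name `parse_ranking` and the regular expression are assumptions for illustration, not the notebook's actual parsing code.

```python
import re


def parse_ranking(raw_output: str) -> list[int]:
    """Extract answer ids, in order, from a reply such as '[2],[1],[15]'.

    Hypothetical helper: any text outside the bracketed integers is ignored.
    """
    return [int(match) for match in re.findall(r"\[(\d+)\]", raw_output)]


# Reply copied from the log for the Elasticsearch query.
raw = "[2],[1],[15],[3],[11],[10],[7],[14],[5],[8]"
ranking = parse_ranking(raw)

# The first id is the model's top-ranked answer; None if nothing was parsed.
first_rank = ranking[0] if ranking else None
ground_truth = 2

print("OPENAI 1st Rank =", first_rank)
print("match:", first_rank == ground_truth)
```

Using a regular expression rather than `split(",")` keeps the parse tolerant of stray whitespace or extra words in the reply.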
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. 
If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('b39bb168-1f95-476b-b928-d12058bfc6a0'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '16', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Do I need to know about Docker (Compose) to use Weaviate?
Please rerank these search results.
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]



RAW OUTPUT FROM OPENAI = [3],[1],[4]

OPENAI 1st Rank = 3

GROUND TRUTH = 3
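
In this run the model returned only three ids (`[3],[1],[4]`) although ten answers were supplied. If a full ordering is needed downstream, one possible approach is to keep the model's ids first and append the remaining candidates in their original retrieval order. The function `complete_ranking` below is a hypothetical helper sketched under that assumption; the notebook itself only inspects the first rank.

```python
def complete_ranking(ranked: list[int], candidates: list[int]) -> list[int]:
    """Return a full ordering of `candidates`.

    Ids chosen by the model come first (unknown or repeated ids are dropped);
    the rest follow in their original retrieval order. Assumes `candidates`
    contains no duplicates.
    """
    allowed = set(candidates)
    seen: set[int] = set()
    ordered: list[int] = []
    for doc_id in ranked:
        if doc_id in allowed and doc_id not in seen:
            ordered.append(doc_id)
            seen.add(doc_id)
    # Append every candidate the model did not mention.
    ordered.extend(doc_id for doc_id in candidates if doc_id not in seen)
    return ordered


# Ids in the order they appear in the INPUT block for the Docker (Compose) query.
candidates = [3, 1, 2, 4, 10, 14, 11, 8, 7, 16]
model_ranking = [3, 1, 4]

print(complete_ranking(model_ranking, candidates))
```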
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. 
If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Please rerank these search results.
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]



RAW OUTPUT FROM OPENAI = [4],[3],[1]

OPENAI 1st Rank = 4

GROUND TRUTH = 4
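
The runs logged above compare only the top-ranked id with the ground truth. As an additional, hedged illustration, the same pairs can be summarised with mean reciprocal rank (MRR). This metric is not computed in the original output, and the `reciprocal_rank` helper is an assumption introduced here.

```python
def reciprocal_rank(ranking: list[int], ground_truth: int) -> float:
    """Return 1/position of the ground-truth id, or 0.0 if it is absent."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id == ground_truth:
            return 1.0 / position
    return 0.0


# (model ranking, ground truth) pairs copied from the three runs logged above.
runs = [
    ([2, 1, 15, 3, 11, 10, 7, 14, 5, 8], 2),
    ([3, 1, 4], 3),
    ([4, 3, 1], 4),
]

mrr = sum(reciprocal_rank(ranking, truth) for ranking, truth in runs) / len(runs)
print(f"MRR over {len(runs)} queries = {mrr:.2f}")
```

Because the ground-truth answer is ranked first in each of these three runs, every reciprocal rank is 1 and so is their mean.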
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. 
This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. 
Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Are there any 'best practices' or guidelines to consider when designing a schema?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]



RAW OUTPUT FROM OPENAI = [11],[10],[7],[5],[6]

OPENAI 1st Rank = 11

GROUND TRUTH = 5
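
In the run above the model returned only five ids ([11],[10],[7],[5],[6]) for ten candidates, so code that consumes this output probably needs to tolerate partial lists. The following is a minimal sketch, not the notebook's own code, of parsing the raw string and padding it with the unranked candidates in their original retrieval order. The helper names `parse_ranking` and `complete_ranking` are hypothetical.

```python
import re

def parse_ranking(raw: str) -> list[int]:
    """Extract ids from a string such as "[11],[10],[7]", dropping repeats."""
    seen, ranking = set(), []
    for match in re.findall(r"\[(\d+)\]", raw):
        doc_id = int(match)
        if doc_id not in seen:
            seen.add(doc_id)
            ranking.append(doc_id)
    return ranking

def complete_ranking(ranking: list[int], candidates: list[int]) -> list[int]:
    """Keep only known candidate ids, then append candidates the model left out."""
    known = [d for d in ranking if d in candidates]
    missing = [d for d in candidates if d not in known]
    return known + missing

# Candidate ids in the order they appear in the INPUT block above.
candidates = [11, 10, 7, 5, 1, 6, 14, 15, 2, 12]
ranked = parse_ranking("[11],[10],[7],[5],[6]")
full = complete_ranking(ranked, candidates)
print(full)
```

Appending the missing ids in retrieval order is only one possible fallback; retrying the request or rejecting the partial answer would be equally reasonable choices.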
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). 
Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. 
This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Is it possible to create one-to-many relationships in the schema?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]



RAW OUTPUT FROM OPENAI = [6],[11],[10],[1],[5],[14],[7],[2],[15],[9]

OPENAI 1st Rank = 6

GROUND TRUTH = 6
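
The INPUT blocks in this output follow a fixed layout: the query, a rerank instruction, then one bracketed line per retrieved answer. As a rough sketch of how such a prompt could be assembled from the `number` and `answer` properties seen in the search results, assuming plain dicts stand in for the Weaviate objects (the function name `build_rerank_input` is hypothetical and is not taken from the notebook):

```python
def build_rerank_input(query: str, results: list[dict]) -> str:
    """Format a query and retrieved answers in the layout shown in the INPUT blocks."""
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for props in results:
        # Each dict mirrors the `properties` of one retrieved object.
        lines.append(f"[Answer id: {props['number']}, Answer: {props['answer']}]")
    return "\n".join(lines)

# Two answers copied from the search results above, used as sample data.
results = [
    {"number": "9", "answer": "Yes, a UUID will be created if not specified."},
    {"number": "13", "answer": "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"},
]
prompt = build_rerank_input("Is it possible to create one-to-many relationships in the schema?", results)
print(prompt)
```

Keeping the retrieval order in the prompt means the model's output can be compared directly against the original ranking.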
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. 
Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Do Weaviate classes have namespaces?
Please rerank these search results.
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



RAW OUTPUT FROM OPENAI = [7],[6],[1],[3],[2],[11],[10],[14],[8],[13]

OPENAI 1st Rank = 7

GROUND TRUTH = 7
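
Each query here has a single ground-truth answer id, so the "1st Rank" comparison printed above amounts to a hit@1 check. A small sketch of scoring the three runs shown in this part of the output follows; the mean reciprocal rank is an added illustration rather than a metric the notebook reports, and the rankings are copied from the RAW OUTPUT lines.

```python
def reciprocal_rank(ranking: list[int], truth: int) -> float:
    """Return 1/position of the ground-truth id, or 0.0 if it was not ranked."""
    return 1.0 / (ranking.index(truth) + 1) if truth in ranking else 0.0

# (model ranking, ground-truth id) pairs taken from the three runs above.
runs = [
    ([11, 10, 7, 5, 6], 5),
    ([6, 11, 10, 1, 5, 14, 7, 2, 15, 9], 6),
    ([7, 6, 1, 3, 2, 11, 10, 14, 8, 13], 7),
]

hit_at_1 = sum(ranking[0] == truth for ranking, truth in runs) / len(runs)
mrr = sum(reciprocal_rank(ranking, truth) for ranking, truth in runs) / len(runs)
print(f"hit@1 = {hit_at_1:.2f}, MRR = {mrr:.2f}")  # hit@1 = 0.67, MRR = 0.75
```

In the first run the ground-truth answer (id 5) sits in fourth place, which is why the reciprocal rank gives partial credit where hit@1 gives none.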
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. 
You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. 
Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Please rerank these search results.
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



RAW OUTPUT FROM OPENAI = [8],[9]

OPENAI 1st Rank = 8

GROUND TRUTH = 8
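
The `INPUT` block in the record above follows a fixed layout: a `QUERY:` line, an instruction line, and one `[Answer id: ..., Answer: ...]` line per retrieved object. As a minimal sketch of how such a prompt could be assembled (the helper name `build_rerank_prompt` and the `(id, text)` tuple input are assumptions for illustration, not the notebook's own code, and the first answer text is shortened):

```python
def build_rerank_prompt(query, answers):
    """Format a query and (id, text) pairs in the layout shown in the log.

    `answers` is a hypothetical list of (answer_id, answer_text) tuples,
    e.g. taken from the `number` and `answer` properties of each result.
    """
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for answer_id, text in answers:
        lines.append(f"[Answer id: {answer_id}, Answer: {text}]")
    return "\n".join(lines)


prompt = build_rerank_prompt(
    "Are there restrictions on UUID formatting? Do I have to adhere to any standards?",
    [
        # Answer text shortened for the example.
        ("8", "The UUID must be presented as a string matching the Canonical Textual representation."),
        ("9", "Yes, a UUID will be created if not specified."),
    ],
)
print(prompt)
```

Keeping the answer id inside each bracketed line lets the model refer to documents by id in its reply, which is what the `RAW OUTPUT FROM OPENAI` line relies on.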
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. 
One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. 
If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?
Please rerank these search results.
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [8],[9]

OPENAI 1st Rank = 8

GROUND TRUTH = 9
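
In the record above the model returned only `[8],[9]` although ten answers were supplied, so the raw output is not a full ordering. One possible way to handle this, sketched here under the assumption that unranked documents should keep their original retrieval order, is to parse the returned ids, drop unknown or repeated ones, and append the rest. The function names are hypothetical and not part of the notebook.

```python
import re


def parse_ranking(raw):
    """Extract integer ids from a string such as '[8],[9]'."""
    return [int(m) for m in re.findall(r"\[(\d+)\]", raw)]


def complete_ranking(raw, candidate_ids):
    """Return a full ordering of candidate_ids.

    Ids named by the model come first (ignoring duplicates and ids that
    were never retrieved); the remaining candidates follow in their
    original retrieval order.
    """
    ranked = []
    for doc_id in parse_ranking(raw):
        if doc_id in candidate_ids and doc_id not in ranked:
            ranked.append(doc_id)
    ranked.extend(i for i in candidate_ids if i not in ranked)
    return ranked


# Candidate ids in the order Weaviate returned them for the second query.
candidates = [9, 8, 11, 1, 2, 14, 10, 3, 4, 7]
full = complete_ranking("[8],[9]", candidates)
print(full)  # [8, 9, 11, 1, 2, 14, 10, 3, 4, 7]
```

Whether to fall back to retrieval order, retry the request, or reject the response is a design choice; the fallback is shown only because it always yields a usable ranking.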
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. 
To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. 
Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Can I use Weaviate to create a traditional knowledge graph?
Please rerank these search results.
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [10],[11],[1],[2],[14],[3],[7],[6],[15],[8]

OPENAI 1st Rank = 10

GROUND TRUTH = 10
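
The `OPENAI 1st Rank` and `GROUND TRUTH` lines can be compared directly to score the reranker. A small sketch of a top-1 accuracy calculation over the three records printed so far follows; the `records` list is transcribed by hand from the log, and the helper `first_rank` is an assumption rather than the notebook's own function.

```python
import re


def first_rank(raw):
    """Return the first id in a raw ranking string, or None if there is none."""
    match = re.search(r"\[(\d+)\]", raw)
    return int(match.group(1)) if match else None


# (raw model output, ground-truth id) pairs copied from the log.
records = [
    ("[8],[9]", 8),
    ("[8],[9]", 9),
    ("[10],[11],[1],[2],[14],[3],[7],[6],[15],[8]", 10),
]

hits = sum(first_rank(raw) == truth for raw, truth in records)
accuracy = hits / len(records)
print(f"top-1 accuracy: {hits}/{len(records)} = {accuracy:.2f}")
# top-1 accuracy: 2/3 = 0.67
```

Top-1 accuracy only checks the head of the list; a rank-aware metric such as nDCG would also credit the second query, where the correct answer was placed second.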
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. 
It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. 
To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. 
Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Why does Weaviate have a schema and not an ontology?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
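
The INPUT block above appears to be assembled from each result's `number` and `answer` properties. A minimal sketch of that formatting step, assuming the results have already been reduced to plain dicts (the function name `build_rerank_prompt` and the abbreviated answer strings are hypothetical, not taken from the notebook):

```python
from typing import Mapping, Sequence


def build_rerank_prompt(query: str, results: Sequence[Mapping[str, str]]) -> str:
    """Format a query and its candidate answers as one listwise prompt."""
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for props in results:
        # One bracketed entry per candidate, mirroring the layout printed above.
        lines.append(f"[Answer id: {props['number']}, Answer: {props['answer']}]")
    return "\n".join(lines)


# Abbreviated, illustrative candidates (not the full FAQ answers).
candidates = [
    {"number": "11", "answer": "We use a schema because ..."},
    {"number": "1", "answer": "Our goal is three-folded ..."},
]
print(build_rerank_prompt("Why does Weaviate have a schema and not an ontology?", candidates))
```

Passing every candidate in a single prompt is what makes the approach listwise: the model sees all answers at once and can compare them against each other.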



RAW OUTPUT FROM OPENAI = [11],[10],[1],[2],[3],[14],[8],[5],[6],[7]

OPENAI 1st Rank = 11

GROUND TRUTH = 11
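
The raw output is a comma-separated string of bracketed ids, so the first-ranked id has to be parsed out before it can be compared with the ground truth. One way to do this, sketched with a regular expression (the helper name `parse_ranking` is an assumption, and the notebook may parse the string differently):

```python
import re


def parse_ranking(raw: str) -> list[str]:
    """Extract bracketed ids, in order, from a string like '[11],[10],[1]'."""
    return re.findall(r"\[(\d+)\]", raw)


raw_output = "[11],[10],[1],[2],[3],[14],[8],[5],[6],[7]"
ranking = parse_ranking(raw_output)

# Guard against an empty or malformed model response.
first_rank = ranking[0] if ranking else None
ground_truth = "11"
print("first rank:", first_rank, "| matches ground truth:", first_rank == ground_truth)
```

Keeping the ids as strings avoids a type mismatch with the `number` property, which the search results above store as a string.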
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. 
This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. 
This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. 
They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('b39bb168-1f95-476b-b928-d12058bfc6a0'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '16', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. 
The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: How can I retrieve the total object count in a class?
Please rerank these search results.
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]



RAW OUTPUT FROM OPENAI = [14],[7],[5],[6],[10],[11],[4],[13],[12],[16]

OPENAI 1st Rank = 14

GROUND TRUTH = 12
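
In this case the first-ranked answer (14) differs from the ground truth (12), which appears ninth in the model's ordering. A top-1 comparison records only a miss; a reciprocal-rank score also captures how far down the expected answer landed. A small sketch, assuming exactly one relevant answer per query (the function name is hypothetical):

```python
from typing import Sequence


def reciprocal_rank(ranking: Sequence[int], relevant_id: int) -> float:
    """Return 1/position of the relevant id, or 0.0 if it is absent."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id == relevant_id:
            return 1.0 / position
    return 0.0


# Ordering taken from the raw output above.
ranking = [14, 7, 5, 6, 10, 11, 4, 13, 12, 16]
print(reciprocal_rank(ranking, relevant_id=12))  # 12 is in 9th place, so 1/9
```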
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. 
We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. 
Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: How do I get the cosine similarity from Weaviate's certainty?
Please rerank these search results.
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]



RAW OUTPUT FROM OPENAI = [13],[2],[1],[10],[11],[14],[3],[5],[15],[8]

OPENAI 1st Rank = 13

GROUND TRUTH = 13
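
The three lines above (raw output, first rank, ground truth) can be reproduced from the raw string alone. The snippet below is a minimal sketch, not the notebook's own code: `parse_ranking` is a hypothetical helper name, and it assumes the model answers in the bracketed `[id],[id],...` form shown in this log.

```python
import re


def parse_ranking(raw: str) -> list[int]:
    """Extract answer ids, in ranked order, from a string like '[13],[2],[1]'."""
    return [int(m) for m in re.findall(r"\[(\d+)\]", raw)]


# Values copied from the log output above.
raw_output = "[13],[2],[1],[10],[11],[14],[3],[5],[15],[8]"
ground_truth = 13

ranking = parse_ranking(raw_output)
# Guard against an empty or unparseable response.
first_rank = ranking[0] if ranking else None

print("OPENAI 1st Rank =", first_rank)
print("Top-1 hit:", first_rank == ground_truth)
```

Using a regular expression rather than `split(",")` tolerates stray whitespace or surrounding text in the response, which is a reasonable precaution when the output format is only enforced by the prompt.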
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. 
One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. 
This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: What is the best way to iterate through objects? Can I do paginated API calls?
Please rerank these search results.
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [14],[1],[2],[10],[5],[11],[6],[7],[15],[12]

OPENAI 1st Rank = 14

GROUND TRUTH = 14
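
Each INPUT block in this log is the query followed by one bracketed line per retrieved object, built from the `number` and `answer` properties visible in the `QueryReturn` dumps. As a rough sketch of that formatting step (the function name `build_rerank_prompt` is hypothetical, and plain dicts stand in for each object's `properties`):

```python
def build_rerank_prompt(query: str, results: list[dict]) -> str:
    """Format a query and retrieved answers in the layout shown in the log."""
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for props in results:
        lines.append(f"[Answer id: {props['number']}, Answer: {props['answer']}]")
    return "\n".join(lines)


# Two of the retrieved answers from the log, as plain dicts.
# With a Weaviate response, these would presumably come from
# [obj.properties for obj in response.objects] (an assumption here).
results = [
    {
        "number": "6",
        "answer": "Yes, it is possible to reference to one or more objects "
        "(Class -> one or more Classes) through cross-references. Referring "
        "to lists or arrays of primitives, this will be available soon.",
    },
    {
        "number": "12",
        "answer": "Sometimes, users work with custom terminology, which often "
        "comes in the form of abbreviations or jargon. You can find more "
        "information on how to use the endpoint here",
    },
]

prompt = build_rerank_prompt(
    "What is the best way to iterate through objects? Can I do paginated API calls?",
    results,
)
print(prompt)
```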
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). 
At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. 
If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: How does Weaviate's vector and scalar filtering work?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



RAW OUTPUT FROM OPENAI = [15],[2],[1]

OPENAI 1st Rank = 15

GROUND TRUTH = 15
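
Note that the raw output for this query lists only three of the ten candidate ids. If a full ordering is needed downstream, one option is to keep the model's picks and append the unranked candidates in their original retrieval order. The sketch below illustrates that idea; `complete_ranking` is a hypothetical helper and not part of the notebook.

```python
def complete_ranking(ranked: list[int], candidates: list[int]) -> list[int]:
    """Keep valid, de-duplicated model picks, then append the remaining
    candidates in their original retrieval order."""
    allowed = set(candidates)
    seen: set[int] = set()
    result: list[int] = []
    for doc_id in ranked:
        # Drop ids the model invented and ids it repeated.
        if doc_id in allowed and doc_id not in seen:
            seen.add(doc_id)
            result.append(doc_id)
    # Anything the model left out keeps its retrieval position.
    result.extend(c for c in candidates if c not in seen)
    return result


# Candidate ids in the order they appear in the INPUT block above.
candidates = [2, 1, 14, 3, 10, 11, 8, 5, 13, 15]
# Ids parsed from the truncated raw output "[15],[2],[1]".
ranked = [15, 2, 1]

full = complete_ranking(ranked, candidates)
print(full)  # [15, 2, 1, 14, 3, 10, 11, 8, 5, 13]
```

Falling back to retrieval order for the tail is a conservative choice: it never ranks an unjudged document above one the model explicitly placed.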
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. 
We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('b39bb168-1f95-476b-b928-d12058bfc6a0'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '16', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. 
The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Can I request a feature in Weaviate?
Please rerank these search results.
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



RAW OUTPUT FROM OPENAI = [16],[1],[2],[3],[10],[11],[14],[6],[8],[13]

OPENAI 1st Rank = 16

GROUND TRUTH = 16
75.0
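The log above pairs each `RAW OUTPUT FROM OPENAI` string with a `GROUND TRUTH` id and ends the run with a single number. The scoring code itself is not shown in this output, so the following is only a minimal sketch under one assumption: that the number is the percentage of queries whose first-ranked id equals the ground truth. The helper names `parse_ranking` and `top1_accuracy` are hypothetical and do not come from the notebook.

```python
import re


def parse_ranking(raw_output: str) -> list[int]:
    """Extract answer ids, in order, from a string such as '[16],[1],[2]'."""
    return [int(match) for match in re.findall(r"\[(\d+)\]", raw_output)]


def top1_accuracy(raw_outputs: list[str], ground_truths: list[int]) -> float:
    """Percentage of queries whose first-ranked id matches the ground truth.

    Assumption: this mirrors the score printed in the log; the notebook's
    actual scoring code is not visible here.
    """
    if len(raw_outputs) != len(ground_truths):
        raise ValueError("raw_outputs and ground_truths must have equal length")
    if not raw_outputs:
        return 0.0
    hits = 0
    for raw, truth in zip(raw_outputs, ground_truths):
        ranking = parse_ranking(raw)
        # An empty or unparseable response counts as a miss.
        if ranking and ranking[0] == truth:
            hits += 1
    return 100.0 * hits / len(raw_outputs)


# The query logged above: the first-ranked id is 16, matching the ground truth.
ranking = parse_ranking("[16],[1],[2],[3],[10],[11],[14],[6],[8],[13]")
print(ranking[0])
```

Parsing with a regular expression rather than splitting on commas keeps the sketch tolerant of extra whitespace or stray text around the bracketed ids.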
Prompt 3
[10]
[11]
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. 
a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. 
This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Why would I use Weaviate as my vector database?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



RAW OUTPUT FROM OPENAI = [2],[1],[11],[10],[5],[3],[7],[14],[13],[8]

OPENAI 1st Rank = 2

GROUND TRUTH = 1
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. 
One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. 
a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. 
The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: What is the difference between Weaviate and for example Elasticsearch?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



RAW OUTPUT FROM OPENAI = [2],[1],[15],[5],[3]

OPENAI 1st Rank = 2

GROUND TRUTH = 2
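In the result above, the model returned only five of the ten candidate ids. The first rank is unaffected, but any metric computed over the full list (nDCG, for example) needs a complete ordering. One possible policy, shown here as a hedged sketch and not as something the notebook does, is to keep the model's order and append the unranked candidates in their original retrieval order. The function name `complete_ranking` is hypothetical.

```python
def complete_ranking(ranked: list[int], candidates: list[int]) -> list[int]:
    """Return a full ordering of `candidates`.

    Ids from `ranked` come first, in the model's order, skipping duplicates
    and ids that were never retrieved. Remaining candidates follow in their
    original retrieval order.
    """
    valid = set(candidates)
    seen: set[int] = set()
    result: list[int] = []
    for doc_id in ranked:
        if doc_id in valid and doc_id not in seen:
            seen.add(doc_id)
            result.append(doc_id)
    for doc_id in candidates:
        if doc_id not in seen:
            seen.add(doc_id)
            result.append(doc_id)
    return result


# Candidate order and partial ranking taken from the log entry above.
candidates = [2, 1, 3, 14, 11, 10, 8, 5, 7, 15]
partial = [2, 1, 15, 5, 3]
print(complete_ranking(partial, candidates))
# → [2, 1, 15, 5, 3, 14, 11, 10, 8, 7]
```

Filtering against the candidate set also guards against ids the model invents, which would otherwise leak into downstream metrics.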
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. 
If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('b39bb168-1f95-476b-b928-d12058bfc6a0'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '16', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Do I need to know about Docker (Compose) to use Weaviate?
Please rerank these search results.
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]
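
The prompt printed above follows a fixed layout: the query, an instruction line, then one bracketed `[Answer id: …, Answer: …]` entry per retrieved object. The code that produced it is not part of this output, so the following is only a minimal sketch of how such a prompt could be assembled. It assumes each result is a `properties` dict with `number` and `answer` keys, as in the `QueryReturn` dump; the function name `build_rerank_prompt` and the shortened sample answers are hypothetical.

```python
def build_rerank_prompt(query, results):
    """Format a query and the retrieved answers into one listwise prompt string."""
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for props in results:
        # One bracketed entry per retrieved object, in retrieval order.
        lines.append(f"[Answer id: {props['number']}, Answer: {props['answer']}]")
    return "\n".join(lines)


# Hypothetical stand-ins for the `properties` dicts of the Weaviate objects;
# the answer texts are shortened placeholders, not the stored FAQ answers.
sample_results = [
    {"number": "3", "answer": "Docker-related answer text"},
    {"number": "1", "answer": "Project-goals answer text"},
]

prompt = build_rerank_prompt(
    "Do I need to know about Docker (Compose) to use Weaviate?", sample_results
)
print(prompt)
```

Because every candidate goes into a single prompt, the model sees all answers at once, which is what makes the ranking listwise rather than pairwise or pointwise.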



RAW OUTPUT FROM OPENAI = [3],[1],[4]

OPENAI 1st Rank = 3

GROUND TRUTH = 3
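
The raw model output is a comma-separated string of bracketed ids, and the first id is compared against the ground truth. A minimal sketch of that parsing step, assuming the output always uses the `[id],[id],…` shape seen here (the helper name `parse_ranking` is hypothetical, not taken from the notebook):

```python
import re


def parse_ranking(raw):
    """Extract the ranked answer ids, in order, from a string like '[3],[1],[4]'."""
    return [int(match) for match in re.findall(r"\[(\d+)\]", raw)]


ranking = parse_ranking("[3],[1],[4]")
first_rank = ranking[0] if ranking else None  # None if the model returned no ids
ground_truth = 3

print("OPENAI 1st Rank =", first_rank)
print("Top-1 correct:", first_rank == ground_truth)
```

Using a regular expression instead of splitting on commas tolerates stray whitespace or extra text around the ids, which free-form LLM output can contain.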
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. 
If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Please rerank these search results.
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]



RAW OUTPUT FROM OPENAI = [4],[3],[1],[2],[7],[11],[10],[14],[8],[9]

OPENAI 1st Rank = 4

GROUND TRUTH = 4
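
For this query the model returned all ten ids, whereas for the first query it returned only three (`[3],[1],[4]`). If a complete ordering is needed downstream, one possible approach is to append the unranked candidates in their original retrieval order. This is a sketch of that idea, not something the notebook is shown doing; `complete_ranking` is a hypothetical helper.

```python
def complete_ranking(ranked_ids, candidate_ids):
    """Return a full ordering: model-ranked ids first, then any leftovers.

    Ids the model invented (not among the candidates) and repeated ids are
    dropped; candidates the model omitted keep their retrieval order.
    """
    seen = set()
    ordered = []
    for doc_id in ranked_ids:
        if doc_id in candidate_ids and doc_id not in seen:
            ordered.append(doc_id)
            seen.add(doc_id)
    for doc_id in candidate_ids:
        if doc_id not in seen:
            ordered.append(doc_id)
            seen.add(doc_id)
    return ordered


# Retrieval order and partial model ranking from the first query in this output.
candidates = [3, 1, 2, 4, 10, 14, 11, 8, 7, 16]
partial = [3, 1, 4]
print(complete_ranking(partial, candidates))
```

A ranking that is already complete, like the one for this query, passes through unchanged.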
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. 
This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. 
Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Are there any 'best practices' or guidelines to consider when designing a schema?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]



RAW OUTPUT FROM OPENAI = [11],[10],[7],[6]

OPENAI 1st Rank = 11

GROUND TRUTH = 5
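
Here the model's top-ranked id (11) does not match the ground truth (5), and id 5 does not appear in the returned list at all. To summarise the three queries shown so far, top-1 accuracy and mean reciprocal rank can be computed from the printed rankings. This is a minimal sketch using only the values in the output above; an id missing from the ranking is scored as a reciprocal rank of 0, which is an assumption rather than something the notebook specifies.

```python
def reciprocal_rank(ranking, truth):
    """1 / (1-based position of the ground truth), or 0.0 if it is absent."""
    return 1.0 / (ranking.index(truth) + 1) if truth in ranking else 0.0


# (model ranking, ground-truth id) pairs copied from the three queries above.
runs = [
    ([3, 1, 4], 3),
    ([4, 3, 1, 2, 7, 11, 10, 14, 8, 9], 4),
    ([11, 10, 7, 6], 5),
]

# Guard against an empty ranking before looking at the first position.
top1_accuracy = sum(bool(r) and r[0] == t for r, t in runs) / len(runs)
mrr = sum(reciprocal_rank(r, t) for r, t in runs) / len(runs)

print(f"top-1 accuracy = {top1_accuracy:.3f}")
print(f"MRR            = {mrr:.3f}")
```

With two of three queries ranked correctly at position 1 and the third missing entirely, both metrics come out to 2/3 for this small sample.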
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). 
Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. 
This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Is it possible to create one-to-many relationships in the schema?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]



RAW OUTPUT FROM OPENAI = [6],[11],[10],[5],[7],[14],[1],[2],[15],[9]

OPENAI 1st Rank = 6

GROUND TRUTH = 6
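
Each logged query ends with the model's first-ranked id and the ground-truth id, which is enough to compute simple retrieval metrics. As a rough sketch, assuming the rankings have already been parsed into lists of integers, hit@1 and mean reciprocal rank (MRR) could be aggregated as follows. The function name `evaluate` and the choice of these two metrics are assumptions for illustration; the notebook may score results differently.

```python
from typing import List, Tuple


def evaluate(runs: List[Tuple[List[int], int]]) -> Tuple[float, float]:
    """Return (hit@1, MRR) over a list of (ranking, ground_truth_id) pairs.

    A ground-truth id that is absent from the ranking contributes 0 to both metrics.
    """
    hits = 0
    reciprocal_ranks = 0.0
    for ranking, truth in runs:
        if ranking and ranking[0] == truth:
            hits += 1
        if truth in ranking:
            reciprocal_ranks += 1.0 / (ranking.index(truth) + 1)
    return hits / len(runs), reciprocal_ranks / len(runs)


# The two rankings logged immediately above, paired with their ground-truth ids.
runs = [
    ([11, 10, 7, 6], 5),
    ([6, 11, 10, 5, 7, 14, 1, 2, 15, 9], 6),
]
hit_at_1, mrr = evaluate(runs)
print(hit_at_1, mrr)
```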
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. 
Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Do Weaviate classes have namespaces?
Please rerank these search results.
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



RAW OUTPUT FROM OPENAI = [7],[1],[2],[3],[11],[10],[14],[6],[8],[13]

OPENAI 1st Rank = 7

GROUND TRUTH = 7
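
The `INPUT` blocks above all follow the same layout: the query, a fixed instruction line, then one bracketed `[Answer id: ..., Answer: ...]` entry per retrieved object. The sketch below rebuilds that layout from a list of property dictionaries with `number` and `answer` keys, as they appear in the `QueryReturn` output. The template is inferred from the printed text only, and `build_rerank_prompt` is a hypothetical name, so treat this as an approximation of what the notebook does.

```python
from typing import Dict, List


def build_rerank_prompt(query: str, candidates: List[Dict[str, str]]) -> str:
    """Format a query and its retrieved answers in the layout printed under INPUT."""
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for props in candidates:
        # 'number' and 'answer' mirror the property names shown in the QueryReturn dump.
        lines.append(f"[Answer id: {props['number']}, Answer: {props['answer']}]")
    return "\n".join(lines)


candidates = [
    {"number": "9", "answer": "Yes, a UUID will be created if not specified."},
    {"number": "13", "answer": "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"},
]
print(build_rerank_prompt("Do Weaviate classes have namespaces?", candidates))
```

Passing the stored `number` property as the id, rather than the position in the list, lets the model's answer be compared directly with the ground-truth id.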
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. 
You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. 
Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Please rerank these search results.
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



RAW OUTPUT FROM OPENAI = [8],[9]

OPENAI 1st Rank = 8

GROUND TRUTH = 8
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. 
One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. 
If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?
Please rerank these search results.
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
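The INPUT block above follows a fixed layout: the query, an instruction line, then one bracketed line per retrieved answer. A minimal sketch of how such a prompt could be assembled is shown below. The function name `build_rerank_prompt` and the `(id, answer)` tuple shape are assumptions for illustration, not the notebook's actual code.

```python
def build_rerank_prompt(query, results):
    """Format a query and (answer_id, answer) pairs like the INPUT block in this log.

    `results` is assumed to be an iterable of (answer_id, answer_text) tuples,
    already in the order returned by the search.
    """
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for answer_id, answer in results:
        lines.append(f"[Answer id: {answer_id}, Answer: {answer}]")
    return "\n".join(lines)


# Usage with the first result from the log above.
prompt = build_rerank_prompt(
    "If I do not specify a UUID during adding data objects, "
    "will Weaviate create one automatically?",
    [("9", "Yes, a UUID will be created if not specified.")],
)
print(prompt)
```

Keeping the answer id inside each bracketed line is what lets the model reply with ids only, such as `[8],[9]`.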



RAW OUTPUT FROM OPENAI = [8],[9]

OPENAI 1st Rank = 8

GROUND TRUTH = 9
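In this run the model's first rank (8) does not match the ground truth (9), although the correct answer still appears in second place. A rough sketch of how the raw output string could be parsed and scored follows. The helper names are hypothetical, and the metrics (hit at rank 1 and reciprocal rank) are one possible choice rather than the notebook's own evaluation code.

```python
import re


def parse_ranking(raw):
    """Extract answer ids, in order, from a string such as "[8],[9]"."""
    return [int(m) for m in re.findall(r"\[(\d+)\]", raw)]


def hit_at_1(ranking, truth):
    """True if the first ranked id equals the ground-truth id."""
    return bool(ranking) and ranking[0] == truth


def reciprocal_rank(ranking, truth):
    """1 / position of the ground-truth id, or 0.0 if it is absent."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id == truth:
            return 1.0 / position
    return 0.0


ranking = parse_ranking("[8],[9]")
print(ranking, hit_at_1(ranking, 9), reciprocal_rank(ranking, 9))
```

Reciprocal rank gives partial credit for a near miss like this one, whereas a strict first-rank comparison counts it as a failure.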
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. 
To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. 
Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Can I use Weaviate to create a traditional knowledge graph?
Please rerank these search results.
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [10],[11],[1],[2],[14],[3],[8],[6],[15],[7]

OPENAI 1st Rank = 10

GROUND TRUTH = 10
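Here the model returned a full ordering of all ten candidates, while the earlier `[8],[9]` outputs listed only two. If a complete ranking is needed for metrics such as nDCG, one option is to append the unmentioned candidates in their original retrieval order. The sketch below assumes that policy; `complete_ranking` is a hypothetical helper, not part of the notebook.

```python
def complete_ranking(partial, retrieved):
    """Extend a partial ranking to cover every retrieved id.

    Ids the model listed come first (in the model's order); any remaining
    retrieved ids follow in their original retrieval order. Ids that were
    never retrieved are dropped, as are duplicates.
    """
    allowed = set(retrieved)
    seen = set()
    ordered = []
    for doc_id in list(partial) + list(retrieved):
        if doc_id in allowed and doc_id not in seen:
            seen.add(doc_id)
            ordered.append(doc_id)
    return ordered


# Retrieval order taken from the UUID query earlier in this log.
retrieved = [9, 8, 11, 1, 2, 14, 10, 3, 4, 7]
print(complete_ranking([8, 9], retrieved))
```

Falling back to retrieval order is a neutral default: documents the model did not mention keep whatever relative order the search gave them.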
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. 
It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. 
To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. 
Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Why does Weaviate have a schema and not an ontology?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [11],[10],[1],[2]

OPENAI 1st Rank = 11

GROUND TRUTH = 11
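The raw output above contains only four ids, although the prompt listed more answers than that. A minimal sketch of how such a string could be turned into a complete ordering is shown below. The helper name `parse_ranking` and the candidate list in the usage line are hypothetical and are not part of the notebook; the fallback of appending omitted ids in their retrieval order is one possible design choice, not necessarily what the notebook does.

```python
import re


def parse_ranking(raw: str, candidate_ids: list[int]) -> list[int]:
    """Turn a string such as '[11],[10],[1],[2]' into a full ordering.

    Hypothetical helper: ids the model repeats or invents are dropped,
    and ids the model omits are appended in their original retrieval order.
    """
    allowed = set(candidate_ids)
    seen: set[int] = set()
    ranked: list[int] = []
    for match in re.findall(r"\[(\d+)\]", raw):
        doc_id = int(match)
        if doc_id in allowed and doc_id not in seen:
            seen.add(doc_id)
            ranked.append(doc_id)
    # Keep every candidate in the final list, even if the model skipped it.
    ranked.extend(i for i in candidate_ids if i not in seen)
    return ranked


# Illustrative candidate ids only; the real list comes from the search results.
full_ranking = parse_ranking("[11],[10],[1],[2]", [11, 1, 10, 2, 3])
print(full_ranking)  # [11, 10, 1, 2, 3]
```

Keeping the omitted ids at the tail means rank-based metrics can still be computed for every query, at the cost of treating "not mentioned" as "ranked last".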
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. 
This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. 
This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. 
They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('b39bb168-1f95-476b-b928-d12058bfc6a0'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '16', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. 
The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: How can I retrieve the total object count in a class?
Please rerank these search results.
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]
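The input printed above follows a fixed layout: the query, an instruction line, then one bracketed entry per search result. A minimal sketch of how that text could be assembled from the `number` and `answer` properties seen in the search results follows. The function name `build_rerank_prompt` is hypothetical, and plain dictionaries stand in for the Weaviate objects so the example runs without a database.

```python
def build_rerank_prompt(query: str, results: list[dict]) -> str:
    """Format a query and its search results in the layout printed above.

    `results` is assumed to be a list of property dictionaries with the
    keys 'number' and 'answer', in retrieval order.
    """
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for props in results:
        lines.append(f"[Answer id: {props['number']}, Answer: {props['answer']}]")
    return "\n".join(lines)


# Two results copied from the output above, as plain dictionaries.
example_results = [
    {"number": "13", "answer": "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"},
    {"number": "12", "answer": "Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here"},
]
prompt = build_rerank_prompt("How can I retrieve the total object count in a class?", example_results)
print(prompt)
```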



RAW OUTPUT FROM OPENAI = [14],[7],[5],[6],[10],[11],[4],[16],[13],[12]

OPENAI 1st Rank = 14

GROUND TRUTH = 12
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. 
We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. 
Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: How do I get the cosine similarity from Weaviate's certainty?
Please rerank these search results.
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]



RAW OUTPUT FROM OPENAI = [13],[1],[2],[14],[3],[10],[11],[5],[15],[8]

OPENAI 1st Rank = 13

GROUND TRUTH = 13
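The three most recent results above can be scored against their ground truth ids. The sketch below computes a top-1 hit rate and the mean reciprocal rank (MRR) from the printed rankings. The function `reciprocal_rank` and the `runs` list are hypothetical names introduced for illustration; the ids are copied from the raw outputs and ground truth lines above.

```python
def reciprocal_rank(ranking: list[int], ground_truth: int) -> float:
    """Return 1/position of the ground truth id, or 0.0 if it is absent."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id == ground_truth:
            return 1.0 / position
    return 0.0


# (ranking returned by the model, ground truth id) for the three queries above.
runs = [
    ([11, 10, 1, 2], 11),
    ([14, 7, 5, 6, 10, 11, 4, 16, 13, 12], 12),
    ([13, 1, 2, 14, 3, 10, 11, 5, 15, 8], 13),
]

# Top-1 hit rate: fraction of queries whose first-ranked id is the ground truth.
hit_rate = sum(ranking[0] == truth for ranking, truth in runs) / len(runs)
# MRR: average of the reciprocal ranks across queries.
mrr = sum(reciprocal_rank(ranking, truth) for ranking, truth in runs) / len(runs)

print(f"hit@1 = {hit_rate:.2f}, MRR = {mrr:.2f}")
```

In the second query the ground truth id 12 is ranked last of ten, so it contributes a reciprocal rank of 0.1 while the other two queries contribute 1.0 each.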
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. 
One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. 
This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: What is the best way to iterate through objects? Can I do paginated API calls?
Please rerank these search results.
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [14],[1],[10],[2],[5],[11],[6],[7],[15],[12]

OPENAI 1st Rank = 14

GROUND TRUTH = 14
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). 
At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. 
If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: How does Weaviate's vector and scalar filtering work?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



RAW OUTPUT FROM OPENAI = [2],[15],[1],[5],[11],[10],[14],[3],[8],[13]

OPENAI 1st Rank = 2

GROUND TRUTH = 15
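
In the record above, the reranker's first pick (2) differs from the ground truth (15), but the correct answer is still ranked second. A top-1 check counts this as a plain miss; a reciprocal-rank score gives partial credit. The following is a minimal sketch, not the notebook's own code: `parse_ranking` and `reciprocal_rank` are hypothetical helper names, and it assumes the raw output always uses the `[id],[id],...` shape shown in the log.

```python
import re


def parse_ranking(raw: str) -> list[int]:
    """Extract answer ids, in order, from a string such as '[2],[15],[1]'."""
    return [int(m) for m in re.findall(r"\[(\d+)\]", raw)]


def reciprocal_rank(ranking: list[int], truth: int) -> float:
    """Return 1/position of the ground-truth id, or 0.0 if it is absent."""
    for position, answer_id in enumerate(ranking, start=1):
        if answer_id == truth:
            return 1.0 / position
    return 0.0


# Raw output and ground truth copied from the log record above.
raw_output = "[2],[15],[1],[5],[11],[10],[14],[3],[8],[13]"
ranking = parse_ranking(raw_output)

print(ranking[0])                    # 2
print(reciprocal_rank(ranking, 15))  # 0.5
```

Averaging this score over all queries gives the mean reciprocal rank, which separates near misses like this one from cases where the correct answer falls to the bottom of the list.
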
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. 
We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('b39bb168-1f95-476b-b928-d12058bfc6a0'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '16', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. 
The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Can I request a feature in Weaviate?
Please rerank these search results.
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



RAW OUTPUT FROM OPENAI = [16],[1],[2],[10],[3],[14],[11],[6],[8],[13]

OPENAI 1st Rank = 16

GROUND TRUTH = 16
68.75
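
The bare number above is presumably a top-1 accuracy in percent over the full query set, although the cell that computes it is not shown here. A minimal sketch of such a computation follows, using only the three records visible in this part of the log; the function name `top1_accuracy` and the `(raw_output, ground_truth)` record layout are assumptions made for illustration.

```python
import re


def top1_accuracy(records: list[tuple[str, int]]) -> float:
    """Percentage of records whose first ranked id equals the ground truth."""
    hits = 0
    for raw_output, truth in records:
        # The first bracketed integer is taken as the reranker's top pick.
        first = re.search(r"\[(\d+)\]", raw_output)
        if first is not None and int(first.group(1)) == truth:
            hits += 1
    return 100.0 * hits / len(records) if records else 0.0


# (raw output, ground truth) pairs copied from the log above.
records = [
    ("[14],[1],[10],[2],[5],[11],[6],[7],[15],[12]", 14),
    ("[2],[15],[1],[5],[11],[10],[14],[3],[8],[13]", 15),
    ("[16],[1],[2],[10],[3],[14],[11],[6],[8],[13]", 16),
]

print(round(top1_accuracy(records), 2))  # 66.67
```

Two of these three records match, so this subset scores 66.67; the 68.75 in the log would come from running the same kind of check over every query in the notebook.
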
Prompt 4
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. 
a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. 
This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Why would I use Weaviate as my vector database?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



RAW OUTPUT FROM OPENAI = [2],[1],[10],[7],[11],[3],[14],[5],[13],[8]

OPENAI 1st Rank = 2

GROUND TRUTH = 1
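
In this run the ground-truth answer (id 1) is not ranked first but does appear second in the raw output. Top-1 accuracy hides that distinction, while a rank-aware measure such as reciprocal rank captures it. The sketch below is a minimal illustration, assuming the `[id],[id],...` response format shown in the log. The function name `reciprocal_rank` is hypothetical and not part of the notebook.

```python
import re


def reciprocal_rank(raw_output, ground_truth):
    """Return 1/rank of ground_truth in a '[2],[1],...' style ranking.

    Assumption: ids are integers wrapped in square brackets, as in the
    logged model output. Returns 0.0 when the id is missing.
    """
    ranked = [int(token) for token in re.findall(r"\[(\d+)\]", raw_output)]
    if ground_truth not in ranked:
        return 0.0
    # list.index is 0-based, ranks are 1-based.
    return 1.0 / (ranked.index(ground_truth) + 1)


raw = "[2],[1],[10],[7],[11],[3],[14],[5],[13],[8]"
print(reciprocal_rank(raw, ground_truth=1))
```

Averaging this value over all queries would give the mean reciprocal rank (MRR) of the reranker.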
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. 
One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. 
a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. 
The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: What is the difference between Weaviate and for example Elasticsearch?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



RAW OUTPUT FROM OPENAI = [2],[1],[10],[11],[15],[7],[5],[3],[14],[8]

OPENAI 1st Rank = 2

GROUND TRUTH = 2
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. 
If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('b39bb168-1f95-476b-b928-d12058bfc6a0'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '16', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Do I need to know about Docker (Compose) to use Weaviate?
Please rerank these search results.
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]



RAW OUTPUT FROM OPENAI = [3],[1],[4]

OPENAI 1st Rank = 3

GROUND TRUTH = 3
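
The raw output for this query lists only three of the ten candidate ids, so any code that consumes the ranking should not assume a complete permutation. The sketch below shows one defensive way to parse such a response, assuming the `[id],[id],...` format seen in the log. The function name `parse_ranking` and the fallback policy (appending unranked candidates in their original retrieval order) are illustrative assumptions, not the notebook's own implementation.

```python
import re


def parse_ranking(raw_output, candidate_ids):
    """Turn a '[3],[1],[4]' style response into a full ranking."""
    ranked = []
    for token in re.findall(r"\[(\d+)\]", raw_output):
        doc_id = int(token)
        # Skip ids that were never retrieved and ids the model repeated.
        if doc_id in candidate_ids and doc_id not in ranked:
            ranked.append(doc_id)
    # Assumed fallback: unranked candidates keep their retrieval order.
    ranked.extend(i for i in candidate_ids if i not in ranked)
    return ranked


# Candidate ids in the order they appear in the prompt for this query.
candidates = [3, 1, 2, 4, 10, 14, 11, 8, 7, 16]
print(parse_ranking("[3],[1],[4]", candidates))
```

Keeping the retrieval order for the unranked tail means a truncated response degrades to the original search ranking rather than dropping documents.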
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. 
If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?
Please rerank these search results.
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
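
The INPUT block above follows a fixed layout: the query, an instruction line, then one bracketed line per search result. The sketch below shows one way such a prompt could be assembled. The function name `build_rerank_prompt` is hypothetical, and it assumes each result is a plain dict with `number` and `answer` keys, mirroring the `properties` shown in the search results.

```python
def build_rerank_prompt(query: str, results: list[dict]) -> str:
    """Format a query and its retrieved answers as a listwise reranking prompt.

    Hypothetical helper: `results` is assumed to be a list of dicts with
    'number' and 'answer' keys, kept in the order returned by the search.
    """
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for result in results:
        lines.append(f"[Answer id: {result['number']}, Answer: {result['answer']}]")
    return "\n".join(lines)


# Two results copied from the output above, shortened to keep the example small.
results = [
    {"number": "7", "answer": "Yes. Each class itself acts like namespaces."},
    {"number": "9", "answer": "Yes, a UUID will be created if not specified."},
]

prompt = build_rerank_prompt(
    "What happens when the Weaviate Docker container restarts?", results
)
print(prompt)
```

Keeping the answer id inside each line is what lets the model return only ids, which keeps the response short and easy to parse.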



RAW OUTPUT FROM OPENAI = [4],[3],[1],[14],[2],[11],[10],[7],[8],[9]

OPENAI 1st Rank = 4

GROUND TRUTH = 4
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. 
This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. 
Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Are there any 'best practices' or guidelines to consider when designing a schema?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]



RAW OUTPUT FROM OPENAI = [11],[10],[7],[6],[5],[1],[14],[15],[2],[12]

OPENAI 1st Rank = 11

GROUND TRUTH = 5
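
In this example the model's top answer (11) does not match the ground truth (5), which instead appears fifth in the returned ordering. Comparing only the first rank hides how close the miss was, so a rank-aware score can be useful. The sketch below computes hit@1 and reciprocal rank for this one query; it is an illustration and not the notebook's own evaluation code, and `reciprocal_rank` is a hypothetical helper.

```python
def reciprocal_rank(ranking: list[int], ground_truth: int) -> float:
    """Return 1 / position of the ground-truth id, or 0.0 if it is absent."""
    for position, answer_id in enumerate(ranking, start=1):
        if answer_id == ground_truth:
            return 1.0 / position
    return 0.0


# Ranking and ground truth copied from the output above.
ranking = [11, 10, 7, 6, 5, 1, 14, 15, 2, 12]
ground_truth = 5

hit_at_1 = ranking[0] == ground_truth
rr = reciprocal_rank(ranking, ground_truth)

print(hit_at_1)  # False
print(rr)        # 0.2
```

Averaging the reciprocal rank over all queries gives mean reciprocal rank (MRR), one of several retrieval metrics that could be tracked alongside first-rank accuracy.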
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). 
Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. 
This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Is it possible to create one-to-many relationships in the schema?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]



RAW OUTPUT FROM OPENAI = [6],[10],[11],[7]

OPENAI 1st Rank = 6

GROUND TRUTH = 6
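
Note that the raw output above lists only four ids, although the candidate list sent to the model also contained ids such as 1, 2, 5, 9 and 15. The following is a minimal sketch of one way to handle such a partial ranking: keep the ids the model returned, then append the omitted candidates in their original retrieval order. The helper names `parse_ranking` and `complete_ranking`, and the candidate list in the usage lines, are hypothetical and not taken from the notebook's code.

```python
import re


def parse_ranking(raw: str) -> list[int]:
    """Extract ids from a string such as '[6],[10],[11],[7]', dropping repeats."""
    ranked: list[int] = []
    for match in re.findall(r"\[(\d+)\]", raw):
        doc_id = int(match)
        if doc_id not in ranked:
            ranked.append(doc_id)
    return ranked


def complete_ranking(raw: str, candidate_ids: list[int]) -> list[int]:
    """Return a full ordering: model-ranked ids first, omitted candidates after."""
    # Ignore ids the model may have invented that were never retrieved.
    ranked = [i for i in parse_ranking(raw) if i in candidate_ids]
    # Candidates the model skipped keep their original retrieval order.
    leftovers = [i for i in candidate_ids if i not in ranked]
    return ranked + leftovers


# Illustrative candidate list (assumption): a subset of the ids in the prompt.
candidates = [7, 1, 5, 15, 9, 2, 6, 10, 11]
full_order = complete_ranking("[6],[10],[11],[7]", candidates)
print(full_order)
```

Appending the leftovers rather than discarding them keeps every retrieved document available to downstream steps, at the cost of trusting the original retrieval order for the tail.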
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. 
Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. 
Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Do Weaviate classes have namespaces?
Please rerank these search results.
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
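
The INPUT block above follows a fixed pattern: a `QUERY:` line, an instruction line, and one `[Answer id: ..., Answer: ...]` line per retrieved object. As a rough sketch, a prompt of this shape could be assembled as below. The function name `build_rerank_prompt` is hypothetical, and the input is assumed to be plain `(number, answer)` pairs, for example taken from each object's `properties['number']` and `properties['answer']` as they appear in the `QueryReturn` dump.

```python
def build_rerank_prompt(query: str, results: list[tuple[str, str]]) -> str:
    """Format a query and (id, answer) pairs in the layout shown in the log."""
    lines = [f"QUERY: {query}", "Please rerank these search results."]
    for number, answer in results:
        # One bracketed line per candidate so the model can refer to it by id.
        lines.append(f"[Answer id: {number}, Answer: {answer}]")
    return "\n".join(lines)


# Two of the candidates from the log, used here only as sample input.
sample_results = [
    ("7", "Yes. Each class itself acts like namespaces."),
    ("13", "To obtain the cosine similarity from weaviate's certainty, "
           "you can do cosine_sim = 2*certainty - 1"),
]
prompt = build_rerank_prompt("Do Weaviate classes have namespaces?", sample_results)
print(prompt)
```

Labelling each candidate with its own id, rather than its position in the list, lets the returned ranking be compared directly with the ground-truth id.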



RAW OUTPUT FROM OPENAI = [7],[6],[1],[2],[3],[10],[11],[14],[8],[13]

OPENAI 1st Rank = 7

GROUND TRUTH = 7
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. 
You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. 
Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Are there restrictions on UUID formatting? Do I have to adhere to any standards?
Please rerank these search results.
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



RAW OUTPUT FROM OPENAI = [8],[9],[15],[12],[6],[1],[7],[10],[11],[5]

OPENAI 1st Rank = 8

GROUND TRUTH = 8
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('bdd07683-ca36-4142-a854-ca21e112cf28'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '9', 'answer': 'Yes, a UUID will be created if not specified.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. 
One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. 
If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: If I do not specify a UUID during adding data objects, will Weaviate create one automatically?
Please rerank these search results.
[Answer id: 9, Answer: Yes, a UUID will be created if not specified.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [8],[9],[4],[2],[7],[14],[3],[1],[10],[11]

OPENAI 1st Rank = 8

GROUND TRUTH = 9
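
The printed values above can be turned into a per-query score. The following is a minimal sketch, not the notebook's own code: it assumes the raw model output is a comma-separated string of bracketed ids, exactly as printed, and the helper names `parse_ranking` and `reciprocal_rank` are hypothetical. It parses the ranking and computes the reciprocal rank of the ground-truth id.

```python
import re


def parse_ranking(raw: str) -> list[int]:
    """Extract bracketed integer ids, in order, from a string such as '[8],[9],[4]'."""
    return [int(match) for match in re.findall(r"\[(\d+)\]", raw)]


def reciprocal_rank(ranking: list[int], ground_truth: int) -> float:
    """Return 1 / (1-based position of the ground-truth id), or 0.0 if it is absent."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id == ground_truth:
            return 1.0 / position
    return 0.0


# Values copied from the output printed above.
raw_output = "[8],[9],[4],[2],[7],[14],[3],[1],[10],[11]"
ranking = parse_ranking(raw_output)

print("1st rank:", ranking[0])
print("reciprocal rank:", reciprocal_rank(ranking, ground_truth=9))
```

For this query the ground-truth answer (id 9) sits in second place, so the top-1 check fails while the reciprocal rank is 0.5.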
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. 
To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. 
Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Can I use Weaviate to create a traditional knowledge graph?
Please rerank these search results.
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [10],[11],[1],[2],[6],[14],[7],[15],[3],[8]

OPENAI 1st Rank = 10

GROUND TRUTH = 10
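
Because the ranking comes from free-form model output, it can be worth checking that it is a permutation of the candidate ids that were sent in the prompt. The sketch below is an illustration under that assumption; `validate_ranking` is a hypothetical helper, not part of the notebook, Weaviate, or the OpenAI client.

```python
def validate_ranking(candidate_ids: list[int], ranked_ids: list[int]):
    """Compare a model ranking with the candidate ids it was asked to rerank.

    Returns three sorted lists: ids the model dropped, ids it invented,
    and ids it returned more than once.
    """
    candidates = set(candidate_ids)
    seen = set()
    duplicated = set()
    for doc_id in ranked_ids:
        if doc_id in seen:
            duplicated.add(doc_id)
        seen.add(doc_id)
    missing = sorted(candidates - seen)
    unexpected = sorted(seen - candidates)
    return missing, unexpected, sorted(duplicated)


# Candidate ids in the order they appear in the INPUT above,
# and the ranking from the RAW OUTPUT line.
candidate_ids = [10, 1, 11, 2, 14, 3, 8, 6, 15, 7]
ranked_ids = [10, 11, 1, 2, 6, 14, 7, 15, 3, 8]

missing, unexpected, duplicated = validate_ranking(candidate_ids, ranked_ids)
print("missing:", missing, "unexpected:", unexpected, "duplicated:", duplicated)
```

Here all three lists come back empty, meaning the model returned each candidate exactly once.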
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. 
It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. 
To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. 
Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Why does Weaviate have a schema and not an ontology?
Please rerank these search results.
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [11],[10],[1],[2],[14],[3],[7],[6],[5],[8]

OPENAI 1st Rank = 11

GROUND TRUTH = 11
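
To summarize several queries at once, the per-query results can be aggregated into top-1 accuracy and mean reciprocal rank (MRR). This is a rough sketch that uses only the three rankings printed in this part of the output; the function name `summarize` and the tuple layout are assumptions made for illustration.

```python
def summarize(results: list[tuple[list[int], int]]) -> tuple[float, float]:
    """Return (top-1 accuracy, mean reciprocal rank) over (ranking, ground_truth) pairs."""
    hits = 0
    rr_total = 0.0
    for ranking, ground_truth in results:
        if ranking and ranking[0] == ground_truth:
            hits += 1
        if ground_truth in ranking:
            # list.index is 0-based, so add 1 to get the rank position.
            rr_total += 1.0 / (ranking.index(ground_truth) + 1)
    return hits / len(results), rr_total / len(results)


# (ranking, ground truth) pairs copied from the printed output.
results = [
    ([8, 9, 4, 2, 7, 14, 3, 1, 10, 11], 9),
    ([10, 11, 1, 2, 6, 14, 7, 15, 3, 8], 10),
    ([11, 10, 1, 2, 14, 3, 7, 6, 5, 8], 11),
]

top1_accuracy, mrr = summarize(results)
print(f"top-1 accuracy: {top1_accuracy:.3f}, MRR: {mrr:.3f}")
```

On these three queries, two of the three top-ranked ids match the ground truth, and the remaining ground-truth answer is ranked second.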
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. 
This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. 
This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('c753f12d-3e10-4ed5-95ca-c9b18e8a9b51'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '4', 'answer': 'There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. 
They can be completely removed or replaced, next time they start up with a volume, all your data will be there'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('b39bb168-1f95-476b-b928-d12058bfc6a0'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '16', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. 
The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: How can I retrieve the total object count in a class?
Please rerank these search results.
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 4, Answer: There are three levels: You have no volume configured (the default in our Docker Compose files), if the container restarts (e.g. due to a crash, or because of docker stop/start) your data is kept. You have no volume configured (the default in our Docker Compose files), if the container is removed (e.g. from docker compose down or docker rm) your data is gone. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]



RAW OUTPUT FROM OPENAI = [14],[6],[7]

OPENAI 1st Rank = 14

GROUND TRUTH = 12
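
The raw model output above is a comma-separated run of bracketed ids, and it does not always cover every candidate (here only three of the ten ids come back). A minimal sketch of turning that string into a usable ranking follows. The helper names `parse_ranking` and `complete_ranking` are hypothetical and not part of the notebook, and appending omitted candidates in their original retrieval order is an assumption about how one might handle partial output.

```python
import re


def parse_ranking(raw: str) -> list[int]:
    """Extract answer ids, in order, from a string such as '[14],[6],[7]'."""
    return [int(m) for m in re.findall(r"\[(\d+)\]", raw)]


def complete_ranking(ranked: list[int], candidates: list[int]) -> list[int]:
    """Keep the model's order, drop unknown or repeated ids, then append
    any candidates the model left out in their original retrieval order."""
    seen: set[int] = set()
    ordered: list[int] = []
    for doc_id in ranked:
        if doc_id in candidates and doc_id not in seen:
            ordered.append(doc_id)
            seen.add(doc_id)
    ordered.extend(c for c in candidates if c not in seen)
    return ordered


# Values copied from the run shown above.
raw = "[14],[6],[7]"
candidates = [6, 14, 7, 5, 12, 13, 4, 10, 11, 16]

ranked = parse_ranking(raw)
full = complete_ranking(ranked, candidates)
print(ranked)  # [14, 6, 7]
print(full)    # [14, 6, 7, 5, 12, 13, 4, 10, 11, 16]
```

Filtering against `candidates` also guards against ids that the model might invent, since only ids present in the retrieved set survive.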
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. 
We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. 
Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: How do I get the cosine similarity from Weaviate's certainty?
Please rerank these search results.
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]



RAW OUTPUT FROM OPENAI = [13],[2],[1],[5],[11],[10],[3],[14],[15],[8]

OPENAI 1st Rank = 13

GROUND TRUTH = 13
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. 
One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. 
This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('33e43661-b54b-4eaf-9d00-76dc55b8ec5b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '12', 'answer': 'Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. 
But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f7f26081-e472-4d41-9379-e5a5f5bb5433'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '7', 'answer': 'Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.'}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: What is the best way to iterate through objects? Can I do paginated API calls?
Please rerank these search results.
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 12, Answer: Sometimes, users work with custom terminology, which often comes in the form of abbreviations or jargon. You can find more information on how to use the endpoint here]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]



RAW OUTPUT FROM OPENAI = [14],[6],[2]

OPENAI 1st Rank = 14

GROUND TRUTH = 14
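
Comparing `OPENAI 1st Rank` with `GROUND TRUTH` query by query can be summarized as a score. The sketch below computes hit@1 and mean reciprocal rank over only the three queries printed in this part of the output, so the numbers are illustrative and not the notebook's overall result. Treating a ground-truth id that is missing from the returned list as reciprocal rank 0 is an assumption, and `reciprocal_rank` is a hypothetical helper.

```python
def reciprocal_rank(ranked: list[int], truth: int) -> float:
    """Return 1/position of the ground-truth id, or 0.0 if it is absent."""
    return 1.0 / (ranked.index(truth) + 1) if truth in ranked else 0.0


# (parsed model ranking, ground-truth id) for the three queries shown above.
runs = [
    ([14, 6, 7], 12),
    ([13, 2, 1, 5, 11, 10, 3, 14, 15, 8], 13),
    ([14, 6, 2], 14),
]

# hit@1: fraction of queries whose top-ranked id equals the ground truth.
hit_at_1 = sum(1 for ranked, truth in runs if ranked and ranked[0] == truth) / len(runs)

# MRR: average reciprocal rank of the ground-truth id.
mrr = sum(reciprocal_rank(ranked, truth) for ranked, truth in runs) / len(runs)

print(f"hit@1 = {hit_at_1:.3f}")  # hit@1 = 0.667
print(f"MRR   = {mrr:.3f}")       # MRR   = 0.667
```

With exactly one relevant answer per query, reciprocal rank is a simple alternative to nDCG; both reward placing the ground-truth answer near the top.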
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. 
This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. 
After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('1578a3df-2653-4ad0-a839-e87af7be872b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '5', 'answer': "As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). 
At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('4da88886-c097-4dbf-9815-31b77c6f5b8a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '15', 'answer': "It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. 
If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached."}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: How does Weaviate's vector and scalar filtering work?
Please rerank these search results.
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
[Answer id: 15, Answer: It's a 2-step process: 1. The inverted index (which is built at import time) queries to produce an allowed list of the specified document ids. Then the ANN index is queried with this allow list (the list being one of the reasons for our custom implementation). 2. If we encounter a document id which would be a close match, but isn't on the allow list the id is treated as a candidate (i.e. we add it to our list of links to evaluate), but is never added to the result set. Since we only add allowed IDs to the set, we don't exit early, i.e. before the top k elements are reached.]



RAW OUTPUT FROM OPENAI = [2],[15],[5],[13],[11],[1],[10],[14],[3],[8]

OPENAI 1st Rank = 2

GROUND TRUTH = 15
Weaviate search results QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('f2791143-c2ae-43d2-9917-1e57c0a25942'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '1', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('0ae59a26-a5d8-487d-803c-61e18d875a8d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '2', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. 
This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('d0ef8536-2cea-4983-95e0-e8e63f90f69b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '3', 'answer': 'Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('6a89189c-cd22-4027-88e4-fbbd6588823d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '14', 'answer': 'Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('27308607-c83f-405e-8d59-1445004cf382'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '10', 'answer': 'Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. 
We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('9af93587-2b65-413a-927e-0467b6ebd676'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '11', 'answer': "We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('35f3e15e-670b-4c17-b658-8d7d40b47255'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '8', 'answer': "The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('b39bb168-1f95-476b-b928-d12058bfc6a0'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '16', 'answer': "Sure (also, feel free to issue a pull request 😉) you can add those requests here. 
The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇."}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('e087409a-4ff8-4fdb-989c-457c2e74edfb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '6', 'answer': 'Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.'}, references=None, vector={}, collection='FAQ_Answers'), Object(uuid=_WeaviateUUIDInt('7a32c7bd-7171-45f3-9d9d-549735e8b37b'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'number': '13', 'answer': "To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1"}, references=None, vector={}, collection='FAQ_Answers')])


INPUT: 
QUERY: Can I request a feature in Weaviate?
Please rerank these search results.
[Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
[Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
[Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
[Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
[Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
[Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
[Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
[Answer id: 16, Answer: Sure (also, feel free to issue a pull request 😉) you can add those requests here. The only thing you need is a GitHub account, and while you're there, make sure to give us a star 😇.]
[Answer id: 6, Answer: Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available soon.]
[Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]



RAW OUTPUT FROM OPENAI = [16],[1],[2],[3],[14],[10],[11],[8],[6],[13]

OPENAI 1st Rank = 16

GROUND TRUTH = 16
68.75
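The log above prints, for each query, a raw ranking string such as `[2],[15],[5],...`, the first-ranked id, and the ground-truth id, followed by a final score. The code that produced that score is not shown here, so the following is only a minimal sketch of how such a top-1 score could be computed from the printed values. The helper names `parse_ranking`, `reciprocal_rank`, and `top1_accuracy` are hypothetical and not part of the notebook or of any library.

```python
import re
from typing import List, Sequence


def parse_ranking(raw: str) -> List[int]:
    """Extract answer ids, in ranked order, from a string like '[2],[15],[5]'."""
    return [int(m) for m in re.findall(r"\[(\d+)\]", raw)]


def reciprocal_rank(ranking: Sequence[int], truth: int) -> float:
    """Return 1/position of the ground-truth id, or 0.0 if it is absent."""
    for position, answer_id in enumerate(ranking, start=1):
        if answer_id == truth:
            return 1.0 / position
    return 0.0


def top1_accuracy(raw_outputs: Sequence[str], ground_truths: Sequence[int]) -> float:
    """Percentage of queries whose first-ranked id equals the ground truth."""
    if not ground_truths or len(raw_outputs) != len(ground_truths):
        raise ValueError("need one ground-truth id per raw output")
    hits = sum(
        1
        for raw, truth in zip(raw_outputs, ground_truths)
        if parse_ranking(raw)[:1] == [truth]
    )
    return 100.0 * hits / len(ground_truths)


# The two rankings printed in the log above (assumed format: '[id],[id],...').
raw_outputs = [
    "[2],[15],[5],[13],[11],[1],[10],[14],[3],[8]",
    "[16],[1],[2],[3],[14],[10],[11],[8],[6],[13]",
]
ground_truths = [15, 16]

print(top1_accuracy(raw_outputs, ground_truths))
print([reciprocal_rank(parse_ranking(r), t) for r, t in zip(raw_outputs, ground_truths)])
```

On these two queries alone, the first is a miss (ground truth 15 is ranked second) and the second is a hit. The 68.75 printed above presumably covers the full query set; if it is a top-1 percentage, it would match, for example, 11 hits out of 16 queries, though the log does not confirm this.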