Extractors and Retrievers in Property Graph
In this notebook, we will explore how to define extractors and retrievers for the PropertyGraphIndex.
A property graph is a collection of labeled nodes (labels such as entity categories or text labels), each carrying properties (metadata), linked by relationships into structured paths (triplets).
In LlamaIndex, the PropertyGraphIndex plays a crucial role in:
• Constructing a graph
• Querying a graph
Building and Using PropertyGraph
Property graph construction involves executing a series of knowledge graph extractors on each chunk, and attaching entities and relations as metadata to each node.
You can use as many extractors as needed, and all will be applied.
If no extractors are provided, the defaults are SimpleLLMPathExtractor and ImplicitPathExtractor.
SimpleLLMPathExtractor
Use an LLM to extract short statements and parse single-hop paths in the format (entity1, relation, entity2).
If desired, you can also customize both the prompt and the function used for parsing the paths.
Here’s a straightforward (though simplistic) example:
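As a minimal sketch of a custom parsing function, the block below turns raw LLM output into (entity1, relation, entity2) tuples using only the standard library. The name parse_triplets, the one-triplet-per-line comma-separated format, and the sample output are assumptions made for illustration, not the library's defaults.

```python
from typing import List, Tuple


def parse_triplets(response_str: str) -> List[Tuple[str, str, str]]:
    """Parse raw LLM output into (entity1, relation, entity2) tuples.

    Assumes one comma-separated triplet per line (an illustrative format).
    """
    triplets = []
    for line in response_str.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        # Keep only well-formed lines with exactly three non-empty fields.
        if len(parts) == 3 and all(parts):
            triplets.append((parts[0], parts[1], parts[2]))
    return triplets


# Hypothetical LLM output, including one line that cannot be parsed.
sample_output = "Alice,works at,Acme\nAcme,located in,Paris\nmalformed line"
print(parse_triplets(sample_output))
```

A function of this shape is the kind of callable that would be handed to SimpleLLMPathExtractor together with a custom prompt. Check the argument names against the installed LlamaIndex version before relying on them.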
ImplicitPathExtractor
Extract paths using the node.relationships attribute on each LlamaIndex node object.
This extraction process does not require an LLM or embedding model, as it simply parses properties that already exist on the node objects.
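To make the idea concrete, here is a toy sketch of deriving paths from relationships that already exist on nodes. ToyNode is a hypothetical stand-in for a LlamaIndex node, and the relationship names are illustrative. The real ImplicitPathExtractor works on the library's own node objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyNode:
    """Hypothetical stand-in for a LlamaIndex node: an id plus relationships."""

    node_id: str
    # Maps a relationship name (e.g. "NEXT") to the id of the related node.
    relationships: Dict[str, str] = field(default_factory=dict)


def implicit_paths(nodes: List[ToyNode]) -> List[Tuple[str, str, str]]:
    """Turn each existing relationship into a (source, relation, target) path."""
    paths = []
    for node in nodes:
        for relation, target_id in node.relationships.items():
            paths.append((node.node_id, relation, target_id))
    return paths


chunks = [
    ToyNode("chunk-1", {"SOURCE": "doc-1", "NEXT": "chunk-2"}),
    ToyNode("chunk-2", {"SOURCE": "doc-1", "PREVIOUS": "chunk-1"}),
]
for path in implicit_paths(chunks):
    print(path)
```

No model call appears anywhere in the sketch, which mirrors the point above: the paths are read directly from properties the nodes already have.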
SchemaLLMPathExtractor
Extract paths by adhering to a strict schema that specifies allowed entities, relationships, and the connections between them.
Using Pydantic, structured outputs from LLMs, and validation against the schema, we can dynamically define a schema and verify each extracted path (triplet).
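The validation step can be sketched without Pydantic or an LLM. In the block below, VALIDATION_SCHEMA, the entity types, and validate_paths are hypothetical names chosen for illustration. The sketch only shows how a strict schema filters out paths whose entity types or relations are not allowed.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: the relations each entity type is allowed to start.
VALIDATION_SCHEMA: Dict[str, List[str]] = {
    "PERSON": ["WORKS_AT", "KNOWS"],
    "ORGANIZATION": ["LOCATED_IN"],
    "PLACE": [],
}

# (subject type, subject, relation, object type, object)
TypedPath = Tuple[str, str, str, str, str]


def validate_paths(
    paths: List[TypedPath], schema: Dict[str, List[str]]
) -> List[Tuple[str, str, str]]:
    """Keep only the paths that the schema allows, as plain triplets."""
    valid = []
    for subj_type, subj, relation, obj_type, obj in paths:
        # Both endpoints must use an entity type known to the schema.
        if subj_type not in schema or obj_type not in schema:
            continue
        # The relation must be allowed for the subject's entity type.
        if relation not in schema[subj_type]:
            continue
        valid.append((subj, relation, obj))
    return valid


extracted = [
    ("PERSON", "Alice", "WORKS_AT", "ORGANIZATION", "Acme"),
    ("ORGANIZATION", "Acme", "KNOWS", "PERSON", "Alice"),  # relation not allowed
    ("ANIMAL", "Rex", "KNOWS", "PERSON", "Alice"),  # unknown entity type
]
print(validate_paths(extracted, VALIDATION_SCHEMA))
```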
Retrieval and Querying
Labeled property graphs offer various querying methods to retrieve nodes and paths. In LlamaIndex, several node retrieval techniques can be combined in a single query.
If no sub-retrievers are specified, the default retrievers used are the LLMSynonymRetriever and VectorContextRetriever (if embeddings are enabled).
Currently, the following retrievers are included:
• LLMSynonymRetriever: Retrieves nodes based on keywords and synonyms generated by an LLM.
• VectorContextRetriever: Retrieves nodes based on embedded graph nodes.
• TextToCypherRetriever: Directs the LLM to generate Cypher queries based on the schema of the property graph.
• CypherTemplateRetriever: Utilizes a Cypher template with parameters inferred by the LLM.
• CustomPGRetriever: Easily subclassed to implement custom retrieval logic.
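Combining several retrievers amounts to running each one on the same query and merging the results. The sketch below is a toy model under stated assumptions: each retriever is a plain callable, and keyword_retriever and vector_retriever are hypothetical stubs that return fixed paths. It does not reproduce how LlamaIndex merges results internally.

```python
from typing import Callable, Iterable, List, Tuple

Path = Tuple[str, str, str]
Retriever = Callable[[str], Iterable[Path]]


def combined_retrieve(query: str, retrievers: List[Retriever]) -> List[Path]:
    """Run every retriever on the query and merge results without duplicates."""
    seen = set()
    merged = []
    for retrieve in retrievers:
        for path in retrieve(query):
            if path not in seen:
                seen.add(path)
                merged.append(path)
    return merged


def keyword_retriever(query: str) -> List[Path]:
    # Stub standing in for a keyword/synonym based retriever.
    return [("Alice", "WORKS_AT", "Acme")]


def vector_retriever(query: str) -> List[Path]:
    # Stub standing in for a vector similarity based retriever.
    return [("Alice", "WORKS_AT", "Acme"), ("Acme", "LOCATED_IN", "Paris")]


results = combined_retrieve(
    "Where does Alice work?", [keyword_retriever, vector_retriever]
)
print(results)
```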
LLMSynonymRetriever
This retriever takes the input query and attempts to generate relevant keywords and synonyms. These are used to retrieve nodes and, consequently, the paths connected to those nodes.
Explicitly declaring this retriever in your configuration allows for the customization of several options.
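The keyword-and-synonym flow can be sketched with the standard library. This is a toy model, not the LlamaIndex implementation: the `^` separator, parse_synonyms, paths_for_keywords, and the sample graph are assumptions made for illustration.

```python
from typing import List, Tuple

Path = Tuple[str, str, str]


def parse_synonyms(llm_output: str, max_keywords: int = 10) -> List[str]:
    """Split LLM output into unique, lower-cased keywords (assumes '^' separators)."""
    keywords: List[str] = []
    for raw in llm_output.strip().split("^"):
        keyword = raw.strip().lower()
        if keyword and keyword not in keywords:
            keywords.append(keyword)
    return keywords[:max_keywords]


def paths_for_keywords(keywords: List[str], paths: List[Path]) -> List[Path]:
    """Return every path whose subject or object matches one of the keywords."""
    wanted = set(keywords)
    return [p for p in paths if p[0].lower() in wanted or p[2].lower() in wanted]


GRAPH: List[Path] = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Bob", "KNOWS", "Alice"),
    ("Acme", "LOCATED_IN", "Paris"),
]

keywords = parse_synonyms("Acme^ACME Corp^ acme ^")
print(keywords)
print(paths_for_keywords(keywords, GRAPH))
```

The parsing function and the keyword limit are typical examples of the options one would want to customize on such a retriever.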
VectorContextRetriever
This retriever identifies nodes based on their vector similarity, subsequently fetching the paths connected to those nodes.
If your graph store supports vectors natively, the graph store alone is sufficient for storage. Otherwise, you need to pair the graph store with a separate vector store. By default, the in-memory SimpleVectorStore is used.
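Retrieval by vector similarity followed by path lookup can be illustrated with a small, self-contained sketch. The two-dimensional embeddings, the node names, and the helper functions are all made up for the example. A real setup would use an embedding model and a vector-capable store.

```python
import math
from typing import Dict, List, Tuple

Path = Tuple[str, str, str]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_nodes(
    query_vec: List[float], node_vecs: Dict[str, List[float]], k: int = 2
) -> List[str]:
    """Rank node names by similarity to the query vector and keep the best k."""
    ranked = sorted(
        node_vecs,
        key=lambda name: cosine_similarity(query_vec, node_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


def connected_paths(node_names: List[str], paths: List[Path]) -> List[Path]:
    """Fetch the paths that touch any of the retrieved nodes."""
    wanted = set(node_names)
    return [p for p in paths if p[0] in wanted or p[2] in wanted]


# Toy embeddings and graph (illustrative values only).
EMBEDDINGS = {"Alice": [1.0, 0.0], "Acme": [0.8, 0.6], "Paris": [0.0, 1.0]}
GRAPH: List[Path] = [("Alice", "WORKS_AT", "Acme"), ("Acme", "LOCATED_IN", "Paris")]

nearest = top_k_nodes([1.0, 0.1], EMBEDDINGS, k=1)
print(nearest, connected_paths(nearest, GRAPH))
```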
TextToCypherRetriever
This retriever uses the graph store schema, your query, and a text-to-Cypher prompt template to generate and execute a Cypher query.
Note: Since the SimplePropertyGraphStore is not a full-fledged graph database, it does not support Cypher queries.
To inspect the schema, you can use the method: index.property_graph_store.get_schema_str().
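The three inputs (schema, query, prompt template) and the generate-then-execute flow can be sketched as follows. Everything here is a stand-in: the template wording, the sample schema string, and the fake_llm and fake_run_cypher stubs are assumptions for illustration, so the block runs without an LLM or a graph database.

```python
from typing import Callable, List

# Illustrative prompt template, not the library default.
TEXT_TO_CYPHER_TEMPLATE = (
    "Schema:\n{schema}\n\n"
    "Question: {question}\n"
    "Write a Cypher query that answers the question.\n"
)


def text_to_cypher_retrieve(
    question: str,
    schema: str,
    llm: Callable[[str], str],
    run_cypher: Callable[[str], List[dict]],
) -> List[dict]:
    """Format the prompt, let the LLM write Cypher, then execute that query."""
    prompt = TEXT_TO_CYPHER_TEMPLATE.format(schema=schema, question=question)
    cypher = llm(prompt).strip()
    return run_cypher(cypher)


def fake_llm(prompt: str) -> str:
    # Stub: a real LLM would generate this query from the prompt.
    return "MATCH (p:Person)-[:WORKS_AT]->(o) RETURN p.name, o.name\n"


def fake_run_cypher(cypher: str) -> List[dict]:
    # Stub: a real graph database would execute the query.
    return [{"query": cypher, "p.name": "Alice", "o.name": "Acme"}]


rows = text_to_cypher_retrieve(
    "Where does Alice work?",
    "Node properties: Person {name: STRING}",
    fake_llm,
    fake_run_cypher,
)
print(rows)
```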
CypherTemplateRetriever
This is a more constrained version of the TextToCypherRetriever. Instead of allowing the LLM free rein to generate any Cypher statement, we can provide a Cypher template and have the LLM fill in the blanks.
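A rough sketch of the fill-in-the-blanks idea follows: the Cypher text is fixed, and the LLM only supplies values for its `$` parameters. The template, the JSON shape of the LLM output, and the helper names are hypothetical. The check simply rejects output whose keys do not match the template's parameters.

```python
import json
import re
from typing import Any, Dict, Set, Tuple

# Hypothetical template: the only blank the LLM may fill is $names.
CYPHER_TEMPLATE = (
    "MATCH (c:Chunk)-[:MENTIONS]->(o) "
    "WHERE o.name IN $names "
    "RETURN c.text, o.name"
)


def template_parameters(template: str) -> Set[str]:
    """Collect the $parameter names that appear in a Cypher template."""
    return set(re.findall(r"\$(\w+)", template))


def fill_blanks(template: str, llm_json: str) -> Tuple[str, Dict[str, Any]]:
    """Pair the fixed template with parameters parsed from the LLM's JSON output."""
    params = json.loads(llm_json)
    expected = template_parameters(template)
    if set(params) != expected:
        raise ValueError(
            f"expected parameters {sorted(expected)}, got {sorted(params)}"
        )
    # The query text never changes; only the parameter values come from the LLM.
    return template, params


query, params = fill_blanks(CYPHER_TEMPLATE, '{"names": ["Acme", "Paris"]}')
print(query)
print(params)
```

Because the query text is never generated by the model, the set of statements that can reach the database is limited to what the template author wrote.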