Search Operations

Perform semantic search and Cypher queries against your warehouse data.

Semantic Search

Perform semantic search across warehouse data:

warehouse = client.get_warehouse("warehouse-id")

response = warehouse.retrieve_with_semantic_search(
    query="financial reports from Q4",
    n_objects=10,  # Optional, defaults to 10
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    enrich_with_chunks=True,  # Optional, defaults to False
    included_tags=[{"parent": "job-category", "children": "data-engineer"}],  # Optional
    excluded_tags=[{"id": "fullstack-engineer", "id_tree": "job-roles"}],  # Optional
    rerank=True,  # Optional, defaults to False
)

results = response.get("retrieval_results", [])
for result in results:
    print(f"Score: {result.get('score', 'N/A')}")
    print(f"Content: {result.get('content', '')[:100]}...")

Parameters:

query (str): Natural-language search query
n_objects (int, optional): Number of results to return. Default: 10
indexes (list[str], optional): Indexes to search. Default: ["chunks"]
reformulate_query (bool, optional): Whether to perform HyDE (Hypothetical Document Embeddings) query reformulation before retrieving objects. Default: False
enrich_with_chunks (bool, optional): Whether to enrich the retrieved objects with their original chunks. Default: False
included_tags (list[dict], optional): Tags to include in the search. Default: []
excluded_tags (list[dict], optional): Tags to exclude from the search. Default: []
rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False
included_objects (list[dict], optional): Objects to include in the retrieval, as a list of {"name": entity_name}. Default: []
excluded_objects (list[dict], optional): Objects to exclude from the retrieval, as a list of {"name": entity_name}. Default: []
selected_documents (list[dict], optional): Documents to search within, as a list of {"id": document_uuid}. Default: []

Tag format (included or excluded): a tag can be identified in one of two ways:

by its parent and its id, for example: {"parent": "job-category", "children": "data-engineer"}
by its id and its hierarchy (tree), for example: {"id": "fullstack-engineer", "id_tree": "job-roles"}

Returns: The retrieved objects. The example above reads them from the response's "retrieval_results" key.
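The tag shapes above, and the included_objects, excluded_objects and selected_documents formats from the parameter list, are all plain lists of dicts. A minimal sketch of helpers that build and sanity-check these values before a call; the helper names (parent_tag, tree_tag, check_tag, object_filter, document_filter) are hypothetical and not part of the client, and the UUID check assumes document ids are standard UUID strings, as the parameter description suggests.

```python
import uuid
from typing import Iterable


def parent_tag(parent: str, child: str) -> dict:
    """Tag identified by its parent and its own id (first documented shape)."""
    return {"parent": parent, "children": child}


def tree_tag(tag_id: str, id_tree: str) -> dict:
    """Tag identified by its id and its hierarchy (second documented shape)."""
    return {"id": tag_id, "id_tree": id_tree}


def check_tag(tag: dict) -> dict:
    """Return the tag unchanged, or raise ValueError if it matches neither shape."""
    if set(tag) not in ({"parent", "children"}, {"id", "id_tree"}):
        raise ValueError(f"unrecognized tag format: {tag!r}")
    return tag


def object_filter(names: Iterable[str]) -> list:
    """Build an included_objects / excluded_objects value: [{"name": ...}, ...]."""
    return [{"name": name} for name in names]


def document_filter(ids: Iterable[str]) -> list:
    """Build a selected_documents value; uuid.UUID raises ValueError on bad ids."""
    return [{"id": str(uuid.UUID(doc_id))} for doc_id in ids]


included_tags = [check_tag(parent_tag("job-category", "data-engineer"))]
excluded_tags = [check_tag(tree_tag("fullstack-engineer", "job-roles"))]
```

The resulting lists can be passed unchanged as included_tags and excluded_tags in the search calls on this page.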

Full-Text Search

Perform a full-text search against the knowledge graph's full-text indexes:

warehouse = client.get_warehouse("warehouse-id")

response = warehouse.retrieve_with_full_text_search(
    query="financial reports from Q4",
    n_objects=10,  # Optional, defaults to 10
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    enrich_with_chunks=True,  # Optional, defaults to False
    included_tags=[{"parent": "job-category", "children": "data-engineer"}],  # Optional
    excluded_tags=[{"id": "fullstack-engineer", "id_tree": "job-roles"}],  # Optional
    rerank=True,  # Optional, defaults to False
)

results = response.get("retrieval_results", [])
for result in results:
    print(f"Score: {result.get('score', 'N/A')}")
    print(f"Content: {result.get('content', '')[:100]}...")

Parameters:

query (str): Natural-language search query
n_objects (int, optional): Number of results to return. Default: 10
indexes (list[str], optional): Indexes to search. Default: ["chunks"]
reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
enrich_with_chunks (bool, optional): Whether to enrich the retrieved objects with their original chunks. Default: False
included_tags (list[dict], optional): Tags to include in the search. Default: []
excluded_tags (list[dict], optional): Tags to exclude from the search. Default: []
rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False
included_objects (list[dict], optional): Objects to include in the retrieval, as a list of {"name": entity_name}. Default: []
excluded_objects (list[dict], optional): Objects to exclude from the retrieval, as a list of {"name": entity_name}. Default: []
selected_documents (list[dict], optional): Documents to search within, as a list of {"id": document_uuid}. Default: []

Returns: The retrieved objects. The example above reads them from the response's "retrieval_results" key.
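The Hybrid Search section below describes the full-text score as a BM25 score. To illustrate how BM25 ranks documents for a keyword query, here is a minimal Okapi BM25 sketch. The tokenization (lowercase, split on whitespace, no stemming) and the parameters k1=1.2 and b=0.75 are common defaults chosen for illustration, not settings confirmed for the warehouse's full-text indexes.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (docs must be non-empty)."""
    # Assumption: lowercase + whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in set(query.lower().split()):  # each distinct query term once
            if term not in tf:
                continue
            # Smoothed IDF in the form used by Lucene; always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical documents; "reports" does not match "report" without stemming.
docs = ["q4 financial report", "hiring plan for data engineers", "financial summary q3"]
scores = bm25_scores("financial reports from Q4", docs)
```

Here the first document matches both "q4" and "financial" and scores highest, the third matches only "financial", and the second scores 0.0.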

Hybrid Search

Run a semantic search and a full-text search, then combine the two result lists. Scores are combined with Reciprocal Rank Fusion (RRF). Each result reports three scores: Score (the semantic similarity score), Full-text score (the BM25 score), and RRF score (the final score used to select the top N results).

warehouse = client.get_warehouse("warehouse-id")

response = warehouse.retrieve_with_hybrid_search(
    query="financial reports from Q4",
    n_objects=10,  # Optional, defaults to 10
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    enrich_with_chunks=True,  # Optional, defaults to False
    included_tags=[{"parent": "job-category", "children": "data-engineer"}],  # Optional
    excluded_tags=[{"id": "fullstack-engineer", "id_tree": "job-roles"}],  # Optional
    rerank=True,  # Optional, defaults to False
)

for result in response:
    print(f"Score: {result.get('score', 'N/A')}")
    print(f"Content: {result.get('content', '')[:100]}...")

Parameters:

query (str): Natural-language search query
n_objects (int, optional): Number of results to return. Default: 10
indexes (list[str], optional): Indexes to search. Default: ["chunks"]
reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
enrich_with_chunks (bool, optional): Whether to enrich the retrieved objects with their original chunks. Default: False
included_tags (list[dict], optional): Tags to include in the search. Default: []
excluded_tags (list[dict], optional): Tags to exclude from the search. Default: []
rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False
included_objects (list[dict], optional): Objects to include in the retrieval, as a list of {"name": entity_name}. Default: []
excluded_objects (list[dict], optional): Objects to exclude from the retrieval, as a list of {"name": entity_name}. Default: []
selected_documents (list[dict], optional): Documents to search within, as a list of {"id": document_uuid}. Default: []
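Reciprocal Rank Fusion, mentioned at the start of this section, combines ranks rather than raw scores, so the similarity and BM25 scores never need to be on the same scale. A minimal sketch of the standard RRF formula, score(d) = sum over result lists of 1 / (k + rank of d). The constant k=60 matches the rrf_k default listed for generate_answer_with_hybrid_search; whether retrieve_with_hybrid_search uses the same constant internally is an assumption, and the document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; an id absent from a list gets nothing from that list.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic_ranking = ["doc-a", "doc-b", "doc-c"]   # ordered by similarity score
full_text_ranking = ["doc-c", "doc-a", "doc-d"]  # ordered by BM25 score
n_objects = 3
top_n = reciprocal_rank_fusion([semantic_ranking, full_text_ranking])[:n_objects]
```

doc-a and doc-c appear in both lists and outrank doc-b and doc-d, which appear in only one; taking the first n_objects entries corresponds to selecting the top N by RRF score.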

Cypher Queries

Generate a Cypher query from a user instruction

instruction = "All the documents that contain chunks about financial data"

response = warehouse.generate_cypher_query(instruction)
cypher_query = response["generated_cypher_query"]
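The generated query is model output, so you may want to review it before executing it. A rough sketch of a pre-flight check, using a hypothetical helper looks_read_only, that rejects queries containing Cypher write clauses. It is keyword matching, not a Cypher parser: it can flag these words inside string literals and does not detect data-modifying procedures invoked with CALL.

```python
import re

# Cypher clauses that write to the graph, matched as whole words, case-insensitively.
_WRITE_CLAUSE = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def looks_read_only(cypher_query: str) -> bool:
    """Return True if no write-clause keyword appears anywhere in the query."""
    return _WRITE_CLAUSE.search(cypher_query) is None
```

For example, pass cypher_query to warehouse.retrieve_with_cypher only when looks_read_only(cypher_query) is True.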

Execute Neo4j Cypher queries against the warehouse graph database:

cypher_query = """
MATCH (d:Document)-[:CONTAINS]->(c:Chunk)
WHERE d.title CONTAINS 'financial'
RETURN d.title, c.content, d.upload_date
ORDER BY d.upload_date DESC
LIMIT 10
"""

response = warehouse.retrieve_with_cypher(cypher_query)
results = response["retrieval_results"]

for result in results:
    print(f"Document: {result['d.title']}")

Parameters:

cypher_query (str): Valid Cypher query string

Returns: A dict with the results under the "retrieval_results" key
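In the example above, each row in retrieval_results is read with its RETURN expression as the key (result['d.title']). Assuming rows keep that shape, a small sketch that groups chunk contents under their document title; the helper chunks_by_document and the sample rows are hypothetical.

```python
from collections import defaultdict


def chunks_by_document(rows: list[dict]) -> dict[str, list[str]]:
    """Map each d.title to the c.content values returned for it, in row order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        grouped[row["d.title"]].append(row["c.content"])
    return dict(grouped)


# Hypothetical rows in the shape the example reads: one key per RETURN expression.
rows = [
    {"d.title": "Q4 financial report", "c.content": "Revenue grew.", "d.upload_date": "2024-01-15"},
    {"d.title": "Q4 financial report", "c.content": "Costs fell.", "d.upload_date": "2024-01-15"},
    {"d.title": "Q3 financial summary", "c.content": "Flat quarter.", "d.upload_date": "2023-10-12"},
]
grouped = chunks_by_document(rows)
```

With real output, the call would be chunks_by_document(response["retrieval_results"]).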

Execute a templated Cypher query against the warehouse graph database:

query_template = """
MATCH (n:Object)-[r]-(m:Chunk)
WHERE n.location = $job_location
WITH n, r, m
LIMIT 100
WITH collect(DISTINCT n) + collect(DISTINCT m) AS nodes,
    collect(DISTINCT r) AS relationships
RETURN nodes, relationships
LIMIT 10
"""
template_variables={"job_location": "San Francisco"}

results = warehouse.retrieve_with_templated_query(
    query_template=query_template,
    template_variables=template_variables,
    enrich_with_chunks=True,
)

for result in results:
    print(f"Result: {result}")

Parameters:

query_template (str): Valid Cypher query string. It can include variables written as $variable_name
template_variables (dict): A dictionary of {variable_name: variable_value}
enrich_with_chunks (bool): If True, objects are enriched with the content of the chunks they derive from

Returns: A list of dicts, each representing objects or chunks.
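A template that references a variable missing from template_variables will not run as intended. A minimal pre-flight check, using a hypothetical helper missing_template_variables, that lists the $variables a template uses but does not receive; it handles only the simple $name form shown above, and the n.seniority property in the sample template is hypothetical.

```python
import re

# Parameters in the simple $name form; string literals and backtick-quoted
# parameter names are not handled.
_PARAMETER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_template_variables(query_template: str, template_variables: dict) -> set:
    """Return the $variables referenced in the template that have no supplied value."""
    referenced = set(_PARAMETER.findall(query_template))
    return referenced - set(template_variables)


example_template = """
MATCH (n:Object)-[r]-(m:Chunk)
WHERE n.location = $job_location AND n.seniority = $seniority
RETURN n, r, m
"""
missing = missing_template_variables(example_template, {"job_location": "San Francisco"})
```

Here missing is {"seniority"}, so the call would be worth fixing before it is sent.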

Generating Answers

Generate answers to questions using different search strategies to retrieve relevant context.

Using Semantic Search

Generate an answer using semantic search to retrieve relevant context:

warehouse = client.get_warehouse("warehouse-id")

answer = warehouse.generate_answer_with_semantic_search(
    question="What were our Q4 financial results?",
    query=None,  # Optional, defaults to question if not provided
    n_objects=10,  # Optional, defaults to 10
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    enrich_with_chunks=False,  # Optional, defaults to False
    included_tags=[{"parent": "department", "children": "finance"}],  # Optional
    excluded_tags=[],  # Optional
    rerank=True,  # Optional, defaults to False
)

print(f"Answer: {answer}")

Parameters:

question (str): The question to answer
query (str, optional): Custom search query for context retrieval. Defaults to question if not provided
n_objects (int, optional): Number of context objects to retrieve. Default: 10
indexes (list[str], optional): Specific indexes to search. Default: ["chunks"]
reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
enrich_with_chunks (bool, optional): Whether to enrich retrieved objects with original chunks. Default: False
included_tags (list[dict], optional): Tags to include in the search. Default: []
excluded_tags (list[dict], optional): Tags to exclude from the search. Default: []
rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False

Returns: Generated answer string or None if generation fails

Using Full-Text Search

Generate an answer using full-text search to retrieve relevant context:

warehouse = client.get_warehouse("warehouse-id")

answer = warehouse.generate_answer_with_full_text_search(
    question="What were our Q4 financial results?",
    query=None,  # Optional, defaults to question if not provided
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    n_objects=10,  # Optional, defaults to 10
    enrich_with_chunks=False,  # Optional, defaults to False
    included_tags=[{"parent": "department", "children": "finance"}],  # Optional
    excluded_tags=[],  # Optional
    rerank=True,  # Optional, defaults to False
)

print(f"Answer: {answer}")

Parameters:

question (str): The question to answer
query (str, optional): Custom search query for context retrieval. Defaults to question if not provided
indexes (list[str], optional): Specific indexes to search. Default: ["chunks"]
reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
n_objects (int, optional): Number of context objects to retrieve. Default: 10
enrich_with_chunks (bool, optional): Whether to enrich retrieved objects with original chunks. Default: False
included_tags (list[dict], optional): Tags to include in the search. Default: []
excluded_tags (list[dict], optional): Tags to exclude from the search. Default: []
rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False

Returns: Generated answer string or None if generation fails

Using Hybrid Search

Generate an answer using hybrid search (combining semantic and full-text search) to retrieve relevant context:

warehouse = client.get_warehouse("warehouse-id")

answer = warehouse.generate_answer_with_hybrid_search(
    question="What were our Q4 financial results?",
    query=None,  # Optional, defaults to question if not provided
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    n_objects=10,  # Optional, defaults to 10
    enrich_with_chunks=False,  # Optional, defaults to False
    rrf_k=60,  # Optional, RRF constant for score combination, defaults to 60
    included_tags=[{"parent": "department", "children": "finance"}],  # Optional
    excluded_tags=[],  # Optional
    rerank=True,  # Optional, defaults to False
)

print(f"Answer: {answer}")

Parameters:

question (str): The question to answer
query (str, optional): Custom search query for context retrieval. Defaults to question if not provided
indexes (list[str], optional): Specific indexes to search. Default: ["chunks"]
reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
n_objects (int, optional): Number of context objects to retrieve. Default: 10
enrich_with_chunks (bool, optional): Whether to enrich retrieved objects with original chunks. Default: False
rrf_k (int, optional): Constant used in the Reciprocal Rank Fusion algorithm. Default: 60
included_tags (list[dict], optional): Tags to include in the search. Default: []
excluded_tags (list[dict], optional): Tags to exclude from the search. Default: []
rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False

Returns: Generated answer string or None if generation fails
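All three answer methods return None when generation fails. A minimal sketch, using a hypothetical helper first_answer, that tries several of them in order and returns the first answer produced. It passes the same keyword arguments to every method, so only use parameters they all share (for example n_objects, not rrf_k).

```python
from typing import Callable, Optional


def first_answer(
    question: str,
    strategies: list[Callable[..., Optional[str]]],
    **kwargs,
) -> Optional[str]:
    """Call each answer-generation function in turn; return the first non-None answer."""
    for generate in strategies:
        answer = generate(question=question, **kwargs)
        if answer is not None:
            return answer
    return None  # every strategy failed


# Usage with the client methods documented above:
# answer = first_answer(
#     "What were our Q4 financial results?",
#     [warehouse.generate_answer_with_hybrid_search,
#      warehouse.generate_answer_with_semantic_search],
#     n_objects=10,
# )
```

Ordering the list by preference (for example hybrid first) keeps the fallback explicit rather than hiding it inside one call.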
