Search Operations

Perform semantic search and Cypher queries against your warehouse data.

Perform semantic search across warehouse data:
warehouse = client.get_warehouse("warehouse-id")

response = warehouse.retrieve_with_semantic_search(
    query="financial reports from Q4",
    n_objects=10,
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    enrich_with_chunks=True,  # Optional, defaults to False
    included_tags=[{"parent": "job-category", "children": "data-engineer"}],  # Optional, defaults to []
    excluded_tags=[{"id": "fullstack-engineer", "id_tree": "job-roles"}],  # Optional, defaults to []
    rerank=True,  # Optional, defaults to False
)

results = response.get("retrieval_results", [])
for result in results:
    print(f"Score: {result.get('score', 'N/A')}")
    print(f"Content: {result.get('content', '')[:100]}...")
Parameters:
  • query (str): Natural language search query
  • n_objects (int, optional): Number of results. Default: 10
  • indexes (list[str], optional): Specific indexes. Default: ["chunks"]
  • reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
  • enrich_with_chunks (bool, optional): Whether to enrich the retrieved objects with their original chunks. Default: False
  • included_tags (list[dict], optional): Tags to include in the search. Default: []
  • excluded_tags (list[dict], optional): Tags to exclude in the search. Default: []
  • rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False
  • included_objects (list[dict], optional): Objects to include in the retrieval. A list of [{"name": entity_name}]. Default: []
  • excluded_objects (list[dict], optional): Objects to exclude in the retrieval. A list of [{"name": entity_name}]. Default: []
  • selected_documents (list[dict], optional): List of {"id": document_uuid} to select from. Default: []
Details on tags (included or excluded)

There are two ways to identify a tag:
  • by the tag id and its parent, for example: {"parent": "job-category", "children": "data-engineer"}
  • by the tag id and its hierarchy, for example: {"id": "fullstack-engineer", "id_tree": "job-roles"}
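The two tag formats above can be built with small helpers so that filters stay consistent across calls. This is a minimal sketch: tag_by_parent and tag_by_tree are hypothetical helper names, not part of the SDK, and only the dictionary keys come from the examples above.

```python
def tag_by_parent(parent: str, child: str) -> dict:
    """Identify a tag by its parent and its own id (stored under "children")."""
    return {"parent": parent, "children": child}


def tag_by_tree(tag_id: str, id_tree: str) -> dict:
    """Identify a tag by its id and the hierarchy (tree) it belongs to."""
    return {"id": tag_id, "id_tree": id_tree}


# These lists can be passed as included_tags / excluded_tags.
included_tags = [tag_by_parent("job-category", "data-engineer")]
excluded_tags = [tag_by_tree("fullstack-engineer", "job-roles")]

print(included_tags)
print(excluded_tags)
```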
Returns: A list of retrieved objects.

Perform a full-text search on the knowledge graph full-text indexes:
warehouse = client.get_warehouse("warehouse-id")

response = warehouse.retrieve_with_full_text_search(
    query="financial reports from Q4",
    n_objects=10,
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    enrich_with_chunks=True,  # Optional, defaults to False
    included_tags=[{"parent": "job-category", "children": "data-engineer"}],  # Optional, defaults to []
    excluded_tags=[{"id": "fullstack-engineer", "id_tree": "job-roles"}],  # Optional, defaults to []
    rerank=True,  # Optional, defaults to False
)

results = response.get("retrieval_results", [])
for result in results:
    print(f"Score: {result.get('score', 'N/A')}")
    print(f"Content: {result.get('content', '')[:100]}...")
Parameters:
  • query (str): Natural language search query
  • n_objects (int, optional): Number of results. Default: 10
  • indexes (list[str], optional): Specific indexes. Default: ["chunks"]
  • reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
  • enrich_with_chunks (bool, optional): Whether to enrich the retrieved objects with their original chunks. Default: False
  • included_tags (list[dict], optional): Tags to include in the search. Default: []
  • excluded_tags (list[dict], optional): Tags to exclude in the search. Default: []
  • rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False
  • included_objects (list[dict], optional): Objects to include in the retrieval. A list of [{"name": entity_name}]. Default: []
  • excluded_objects (list[dict], optional): Objects to exclude in the retrieval. A list of [{"name": entity_name}]. Default: []
  • selected_documents (list[dict], optional): List of {"id": document_uuid} to select from. Default: []
Returns: A list of retrieved objects.

Perform a semantic search and a full-text search, then combine the two result sets. The scores are combined using Reciprocal Rank Fusion (RRF). Returns the Score (semantic similarity score), the Full-text score (BM25 score), and the RRF score (the final score used to select the top N).
warehouse = client.get_warehouse("warehouse-id")

response = warehouse.retrieve_with_hybrid_search(
    query="financial reports from Q4",
    n_objects=10,
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    enrich_with_chunks=True,  # Optional, defaults to False
    included_tags=[{"parent": "job-category", "children": "data-engineer"}],  # Optional, defaults to []
    excluded_tags=[{"id": "fullstack-engineer", "id_tree": "job-roles"}],  # Optional, defaults to []
    rerank=True,  # Optional, defaults to False
)

for result in response:
    print(f"Score: {result.get('score', 'N/A')}")
    print(f"Content: {result.get('content', '')[:100]}...")
Parameters:
  • query (str): Natural language search query
  • n_objects (int, optional): Number of results. Default: 10
  • indexes (list[str], optional): Specific indexes. Default: ["chunks"]
  • reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
  • enrich_with_chunks (bool, optional): Whether to enrich the retrieved objects with their original chunks. Default: False
  • included_tags (list[dict], optional): Tags to include in the search. Default: []
  • excluded_tags (list[dict], optional): Tags to exclude in the search. Default: []
  • rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False
  • included_objects (list[dict], optional): Objects to include in the retrieval. A list of [{"name": entity_name}]. Default: []
  • excluded_objects (list[dict], optional): Objects to exclude in the retrieval. A list of [{"name": entity_name}]. Default: []
  • selected_documents (list[dict], optional): List of {"id": document_uuid} to select from. Default: []
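The hybrid search description above says scores are combined with Reciprocal Rank Fusion. The sketch below shows the standard RRF formula, in which each result contributes 1 / (k + rank) for every ranking it appears in. It illustrates the general technique only and is not the service's actual implementation; the function name and the sample ids are hypothetical, and k=60 is chosen to match the rrf_k default documented for generate_answer_with_hybrid_search.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list sorted by RRF score."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical result ids, best match first in each list.
semantic_ranking = ["chunk-a", "chunk-b", "chunk-c"]
full_text_ranking = ["chunk-b", "chunk-d", "chunk-a"]

fused = reciprocal_rank_fusion([semantic_ranking, full_text_ranking])
for item_id, score in fused:
    print(f"{item_id}: {score:.5f}")
```

Here chunk-b comes out first because it sits near the top of both lists, while chunk-c and chunk-d each appear in only one list and score lower.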

Cypher Queries

Generate a Cypher query from a user instruction:

instruction = "All the documents that contain chunks about financial data"

response = warehouse.generate_cypher_query(instruction)
cypher_query = response["generated_cypher_query"]

Execute Neo4j Cypher queries against the warehouse graph database:

cypher_query = """
MATCH (d:Document)-[:CONTAINS]->(c:Chunk)
WHERE d.title CONTAINS 'financial'
RETURN d.title, c.content, d.upload_date
ORDER BY d.upload_date DESC
LIMIT 10
"""

response = warehouse.retrieve_with_cypher(cypher_query)
results = response["retrieval_results"]

for result in results:
    print(f"Document: {result['d.title']}")
Parameters:
  • cypher_query (str): Valid Cypher query string
Returns: Dict with results under the "retrieval_results" key
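Each row in retrieval_results appears to be keyed by the expressions in the RETURN clause (the example above reads result['d.title']). Under that assumption, rows can be grouped per document as sketched below. The sample rows are made-up placeholders standing in for a real response, and group_chunks_by_title is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder rows shaped like the example query's RETURN clause (not real data).
sample_rows = [
    {"d.title": "financial-summary", "c.content": "chunk 1 text", "d.upload_date": "2024-02-01"},
    {"d.title": "financial-summary", "c.content": "chunk 2 text", "d.upload_date": "2024-02-01"},
    {"d.title": "financial-forecast", "c.content": "chunk 3 text", "d.upload_date": "2024-01-15"},
]


def group_chunks_by_title(rows):
    """Collect chunk contents per document title, preserving row order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["d.title"]].append(row["c.content"])
    return dict(grouped)


for title, chunks in group_chunks_by_title(sample_rows).items():
    print(f"{title}: {len(chunks)} chunk(s)")
```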

Execute a templated Cypher query against the warehouse graph database:

query_template = """
MATCH (n:Object)-[r]-(m:Chunk)
WHERE n.location = $job_location
LIMIT 100
WITH collect(DISTINCT n) + collect(DISTINCT m) AS nodes,
    collect(DISTINCT r) AS relationships
RETURN nodes, relationships
LIMIT 10
"""
template_variables = {"job_location": "San Francisco"}

results = warehouse.retrieve_with_templated_query(
    query_template=query_template,  # the template defined above
    template_variables=template_variables,
    enrich_with_chunks=True,
)

for result in results:
    print(f"Result: {result}")
Parameters:
  • query_template (str): Valid Cypher query string. The query string can include variables ($something)
  • template_variables (dict): A dictionary of {variable_name: variable_value}
  • enrich_with_chunks (bool): If True, objects are enriched with the content of the chunk they derive from
Returns: A list of dicts, each representing objects or chunks.
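Because template variables are referenced as $name in the query, a quick client-side check can catch a missing or misspelled variable before the query is sent. This is a minimal sketch: missing_template_variables is a hypothetical helper, not an SDK function, the regular expression assumes simple identifier-style variable names, and the n.level property and $seniority variable are invented for the illustration.

```python
import re


def missing_template_variables(query_template, template_variables):
    """Return the $variables used in the template that have no value supplied."""
    used = set(re.findall(r"\$([A-Za-z_][A-Za-z0-9_]*)", query_template))
    return sorted(used - set(template_variables))


# Hypothetical template with two variables; only one value is supplied below.
query_template = """
MATCH (n:Object)-[r]-(m:Chunk)
WHERE n.location = $job_location AND n.level = $seniority
RETURN n, r, m
LIMIT 10
"""

missing = missing_template_variables(query_template, {"job_location": "San Francisco"})
print(missing)  # ['seniority']
```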

Generating answers

Generate answers to questions using different search strategies to retrieve relevant context.

Generate an answer using semantic search to retrieve relevant context:
warehouse = client.get_warehouse("warehouse-id")

answer = warehouse.generate_answer_with_semantic_search(
    question="What were our Q4 financial results?",
    query=None,  # Optional, defaults to question if not provided
    n_objects=10,  # Optional, defaults to 10
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    enrich_with_chunks=False,  # Optional, defaults to False
    included_tags=[{"parent": "department", "children": "finance"}],  # Optional
    excluded_tags=[],  # Optional
    rerank=True,  # Optional, defaults to False
)

print(f"Answer: {answer}")
Parameters:
  • question (str): The question to answer
  • query (str, optional): Custom search query for context retrieval. Defaults to question if not provided
  • n_objects (int, optional): Number of context objects to retrieve. Default: 10
  • indexes (list[str], optional): Specific indexes to search. Default: ["chunks"]
  • reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
  • enrich_with_chunks (bool, optional): Whether to enrich retrieved objects with original chunks. Default: False
  • included_tags (list[dict], optional): Tags to include in the search. Default: []
  • excluded_tags (list[dict], optional): Tags to exclude from the search. Default: []
  • rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False
Returns: Generated answer string, or None if generation fails.

Generate an answer using full-text search to retrieve relevant context:
warehouse = client.get_warehouse("warehouse-id")

answer = warehouse.generate_answer_with_full_text_search(
    question="What were our Q4 financial results?",
    query=None,  # Optional, defaults to question if not provided
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    n_objects=10,  # Optional, defaults to 10
    enrich_with_chunks=False,  # Optional, defaults to False
    included_tags=[{"parent": "department", "children": "finance"}],  # Optional
    excluded_tags=[],  # Optional
    rerank=True,  # Optional, defaults to False
)

print(f"Answer: {answer}")
Parameters:
  • question (str): The question to answer
  • query (str, optional): Custom search query for context retrieval. Defaults to question if not provided
  • indexes (list[str], optional): Specific indexes to search. Default: ["chunks"]
  • reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
  • n_objects (int, optional): Number of context objects to retrieve. Default: 10
  • enrich_with_chunks (bool, optional): Whether to enrich retrieved objects with original chunks. Default: False
  • included_tags (list[dict], optional): Tags to include in the search. Default: []
  • excluded_tags (list[dict], optional): Tags to exclude from the search. Default: []
  • rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False
Returns: Generated answer string, or None if generation fails.

Generate an answer using hybrid search (combining semantic and full-text search) to retrieve relevant context:
warehouse = client.get_warehouse("warehouse-id")

answer = warehouse.generate_answer_with_hybrid_search(
    question="What were our Q4 financial results?",
    query=None,  # Optional, defaults to question if not provided
    indexes=["chunks"],  # Optional, defaults to ["chunks"]
    reformulate_query=True,  # Optional, defaults to False
    n_objects=10,  # Optional, defaults to 10
    enrich_with_chunks=False,  # Optional, defaults to False
    rrf_k=60,  # Optional, RRF constant for score combination, defaults to 60
    included_tags=[{"parent": "department", "children": "finance"}],  # Optional
    excluded_tags=[],  # Optional
    rerank=True,  # Optional, defaults to False
)

print(f"Answer: {answer}")
Parameters:
  • question (str): The question to answer
  • query (str, optional): Custom search query for context retrieval. Defaults to question if not provided
  • indexes (list[str], optional): Specific indexes to search. Default: ["chunks"]
  • reformulate_query (bool, optional): Whether to perform HyDE query reformulation before retrieving objects. Default: False
  • n_objects (int, optional): Number of context objects to retrieve. Default: 10
  • enrich_with_chunks (bool, optional): Whether to enrich retrieved objects with original chunks. Default: False
  • rrf_k (int, optional): Constant used in the Reciprocal Rank Fusion algorithm. Default: 60
  • included_tags (list[dict], optional): Tags to include in the search. Default: []
  • excluded_tags (list[dict], optional): Tags to exclude from the search. Default: []
  • rerank (bool, optional): Whether to perform reranking on retrieved objects. Default: False
Returns: Generated answer string, or None if generation fails.