The platform supports three retrieval modes: semantic (vector) search, full-text (keyword) search, and hybrid search, which combines the two.
Sync Data to Retrieval Engine
Before performing retrieval, sync your data to the retrieval engine (Neo4j graph database):
# Sync all data to retrieval
sync_task = warehouse.run_sync_to_retrieval()
result = sync_task.wait_for_completion()
print(f"Sync status: {result['status']}")
# Or sync specific tables
chunks_table = warehouse.get_table("chunks")
sync_task = chunks_table.push_to_retrieval()
sync_task.wait_for_completion()
Semantic Search (Vector Search)
Retrieve relevant content using embedding-based similarity:
# Semantic search on chunks
results = warehouse.retrieve_with_semantic_search(
    query="What are the payment terms?",
    n_objects=10,             # Number of results to return
    indexes=["chunks"],       # Indexes to search (here, chunks)
    reformulate_query=False,  # Set to True to have an LLM reformulate the query
    rerank=True               # Rerank the results with the reranker
)
# Access results
for result in results:
    print(f"Score: {result['score']}")
    print(f"Content: {result['content'][:200]}...")
    print(f"Document: {result.get('name', 'N/A')}")
    print("---")
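Several chunks from the same document often appear in one result list. The sketch below keeps only the best-scoring result per document. It assumes results are dictionaries with the score, content, and name keys used above, and that a higher score means a better match; the helper name and the sample data are hypothetical and exist only to make the example runnable without a warehouse.

```python
def best_result_per_document(results):
    """Keep the highest-scoring result for each document name."""
    best = {}
    for result in results:
        # Results without a name are grouped under "N/A", as in the loop above.
        name = result.get("name", "N/A")
        if name not in best or result["score"] > best[name]["score"]:
            best[name] = result
    # Return the survivors ordered from best to worst score.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


# Invented sample data, shaped like the results printed above.
sample_results = [
    {"score": 0.91, "content": "Payment is due within 30 days.", "name": "contract.pdf"},
    {"score": 0.84, "content": "Late payments incur a fee.", "name": "contract.pdf"},
    {"score": 0.77, "content": "Invoice total: 1,200 EUR.", "name": "invoice.pdf"},
]

top = best_result_per_document(sample_results)
for result in top:
    print(f"{result['name']}: {result['score']}")
```

In practice, the list returned by retrieve_with_semantic_search would be passed in place of sample_results.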
Full-Text Search
Retrieve content using keyword-based search:
# Full-text search
results = warehouse.retrieve_with_full_text_search(
    query="invoice payment terms",
    indexes=["chunks"],
    n_objects=10
)
Hybrid Search
Combine semantic and full-text search. The two rankings are merged with Reciprocal Rank Fusion, controlled by rrf_k:
# Hybrid search (recommended)
results = warehouse.retrieve_with_hybrid_search(
    query="payment terms and conditions",
    indexes=["chunks"],
    n_objects=10,
    rrf_k=60,  # Reciprocal Rank Fusion parameter
    rerank=True
)
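To give a feel for what rrf_k controls, here is a minimal sketch of standard Reciprocal Rank Fusion, where each item scores the sum of 1 / (k + rank) over the lists it appears in. It illustrates the general technique only; the platform's internal fusion may differ in its details, and the function name and chunk ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list of (id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1; a larger k flattens the gap between top and lower ranks.
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a deterministic order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical rankings from a semantic and a full-text search.
semantic_ranking = ["chunk-a", "chunk-b", "chunk-c"]
full_text_ranking = ["chunk-b", "chunk-d", "chunk-a"]

fused = reciprocal_rank_fusion([semantic_ranking, full_text_ranking], k=60)
for chunk_id, score in fused:
    print(f"{chunk_id}: {score:.5f}")
```

Items that rank well in both lists (chunk-b and chunk-a here) rise above items found by only one search, which is why hybrid search tends to be robust to the weaknesses of either method alone.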
Search with Filters
Apply filters to narrow down results:
# Filter by tags
results = warehouse.retrieve_with_semantic_search(
    query="financial data",
    n_objects=10,
    indexes=["chunks"],
    included_tags={"document_type": "invoice"},  # Only invoices
    excluded_tags={"status": "draft"}            # Exclude drafts
)
# Filter by documents
results = warehouse.retrieve_with_semantic_search(
    query="contract terms",
    n_objects=10,
    indexes=["chunks"],
    selected_documents=[{"id": "doc-id-1"}, {"id": "doc-id-2"}]
)
# Filter by objects
results = warehouse.retrieve_with_semantic_search(
    query="high value items",
    n_objects=10,
    indexes=["objects"],
    included_objects=[{"table_name": "invoices"}]
)
Generate Answers with Retrieval
Get AI-generated answers based on retrieved context:
# Generate answer with semantic search
answer = warehouse.generate_answer_with_semantic_search(
    question="What are the payment terms in the contract?",
    query=None,  # If None, the question is used as the query
    n_objects=10,
    indexes=["chunks"]
)
print(f"Answer: {answer}")
# Generate answer with hybrid search (recommended)
answer = warehouse.generate_answer_with_hybrid_search(
    question="What are the payment terms?",
    query="payment terms conditions",
    n_objects=10,
    indexes=["chunks"],
    rrf_k=60
)
Cypher Queries
For advanced use cases, query the knowledge graph directly:
# Execute a Cypher query
cypher_query = """
MATCH (c:Chunk)-[:BELONGS_TO]->(d:Document)
WHERE d.name CONTAINS 'invoice'
RETURN c.content, d.name
LIMIT 10
"""
results = warehouse.retrieve_with_cypher(cypher_query)
# Use templated queries
template = """
MATCH (c:Chunk)-[:BELONGS_TO]->(d:Document)
WHERE d.name CONTAINS $doc_name
RETURN c.content
LIMIT $limit
"""
results = warehouse.retrieve_with_templated_query(
    query_template=template,
    template_variables={"doc_name": "invoice", "limit": 10}
)
Filtering with Attributes
When searching the objects index, you can use attribute filters to restrict the search to objects that have specific attribute values:
- Only objects that have the given attribute values are retrieved.
- Attribute filters are only available for objects that have been pushed to retrieval with push_to_retrieval.
- Attribute values are case-sensitive.
- The operator can be AND or OR.
# Imagine we have the following object classes:
#
# from pydantic import BaseModel
#
# class Benefits(BaseModel):
#     healthcare: str
#
# class JobDescription(BaseModel):
#     work_type: str
#     role_type: str
#     location: str
#     benefits: Benefits
included_attributes = {
    "values": [
        {"work_type": "remote"},
        {"role_type": "individual contributor"}
    ],
    "operator": "AND"
}
excluded_attributes = {
    "values": [
        {"location": "Not Specified"},
        {"healthcare": "Not Specified"}
    ],
    "operator": "OR"
}
results = warehouse.retrieve_with_semantic_search(
    query="Job descriptions for data engineers",
    n_objects=5,
    indexes=["objects"],
    included_attributes=included_attributes,
    excluded_attributes=excluded_attributes
)
# Load the results into a DataFrame for inspection
import pandas as pd
retrieved_objects = pd.DataFrame(results)
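The AND and OR operators can be easier to reason about with a local model of the filtering. The sketch below is one plausible reading of the filter semantics, not the platform's implementation: it assumes that AND requires every listed condition to match, that OR requires at least one, and that objects are flat dictionaries. How the platform addresses nested attributes such as benefits.healthcare is not covered here. The function names and sample objects are hypothetical.

```python
def matches(obj, attribute_filter):
    """Return True if obj satisfies the filter under its AND/OR operator."""
    operator = attribute_filter["operator"]
    if operator not in ("AND", "OR"):
        raise ValueError(f"Unsupported operator: {operator}")
    # Exact, case-sensitive comparison of each attribute/value pair.
    checks = [
        all(obj.get(key) == value for key, value in condition.items())
        for condition in attribute_filter["values"]
    ]
    return all(checks) if operator == "AND" else any(checks)


def apply_filters(objects, included=None, excluded=None):
    """Keep objects that match `included` and do not match `excluded`."""
    kept = []
    for obj in objects:
        if included and not matches(obj, included):
            continue
        if excluded and matches(obj, excluded):
            continue
        kept.append(obj)
    return kept


included = {
    "values": [{"work_type": "remote"}, {"role_type": "individual contributor"}],
    "operator": "AND",
}
excluded = {
    "values": [{"location": "Not Specified"}, {"healthcare": "Not Specified"}],
    "operator": "OR",
}

# Invented sample objects, flattened for the purposes of this sketch.
jobs = [
    {"title": "A", "work_type": "remote", "role_type": "individual contributor",
     "location": "Paris", "healthcare": "Full"},
    {"title": "B", "work_type": "remote", "role_type": "individual contributor",
     "location": "Not Specified", "healthcare": "Full"},
    {"title": "C", "work_type": "Remote", "role_type": "individual contributor",
     "location": "Lyon", "healthcare": "Full"},
    {"title": "D", "work_type": "remote", "role_type": "manager",
     "location": "Nice", "healthcare": "Full"},
]

kept = apply_filters(jobs, included=included, excluded=excluded)
print([job["title"] for job in kept])
```

Under these assumptions, job B is dropped by the OR exclusion (its location is "Not Specified"), job C fails the inclusion filter because "Remote" differs in case from "remote", and job D fails the AND inclusion because its role_type does not match.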