Tasks represent asynchronous operations such as parsing, extraction, and syncing. A task object lets you monitor a long-running operation and wait for it to finish.
List Tasks
```python
# List all tasks in the warehouse
tasks = warehouse.list_tasks()
for task in tasks:
    print(f"Task ID: {task.task_id}")
    print(f"Name: {task.name}")
    print(f"Status: {task.status}")
```
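When a warehouse has accumulated many tasks, it can help to tally them by status before drilling into individual ones. The sketch below relies only on the `task_id` and `status` attributes shown above. The helper names `count_by_status` and `task_ids_with_status` are hypothetical, and the exact status strings depend on what your warehouse reports.

```python
from collections import Counter
from typing import Iterable, List


def count_by_status(tasks: Iterable) -> Counter:
    """Tally tasks by their `status` attribute (e.g. from warehouse.list_tasks())."""
    return Counter(task.status for task in tasks)


def task_ids_with_status(tasks: Iterable, status: str) -> List[str]:
    """Return the IDs of tasks whose status equals `status`."""
    return [task.task_id for task in tasks if task.status == status]


# Usage (assumes `warehouse` is an existing warehouse object):
# tasks = list(warehouse.list_tasks())  # materialize once, iterate twice
# print(count_by_status(tasks))
# print(task_ids_with_status(tasks, "completed"))
```

Converting the result of `list_tasks()` to a list first avoids surprises if it turns out to be a one-shot iterator.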
Get a Specific Task
```python
# Get task by ID
task = warehouse.get_task(task_id="task-id-123")
print(f"Task status: {task.status}")
```
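`get_task` also makes it possible to poll a task yourself, for example to enforce a deadline instead of blocking indefinitely. This is a minimal sketch under assumptions: that re-fetching a task returns its current `status`, and that `"completed"`, `"failed"` and `"cancelled"` are the terminal status values. Those strings, the helper name `poll_task`, and the default interval and timeout are all assumptions rather than documented API.

```python
import time
from typing import Callable, Iterable

# Assumed terminal status values; check what your warehouse actually reports.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_task(
    fetch: Callable[[], object],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    terminal: Iterable[str] = TERMINAL_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
):
    """Re-fetch a task until its status is terminal or the timeout expires."""
    terminal = set(terminal)
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch()  # e.g. lambda: warehouse.get_task(task_id="task-id-123")
        if task.status in terminal:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {task.status!r} after {timeout_s}s")
        sleep(interval_s)
```

Usage: `task = poll_task(lambda: warehouse.get_task(task_id="task-id-123"), timeout_s=300)`. Passing a fetch callable rather than a task object keeps the helper independent of whether a task object refreshes itself.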
Wait for Task Completion
```python
# `task` is a task object, e.g. one returned by warehouse.get_task(...)
# Wait for the task to finish (blocks until done)
result = task.wait_for_completion()
print(f"Final status: {result['status']}")

# Check whether it succeeded
if result['status'] == 'completed':
    print("Task completed successfully!")
else:
    print(f"Task failed or was cancelled: {result.get('error', 'Unknown error')}")
```
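In scripts and pipelines it is often simpler to turn a non-successful result into an exception than to branch on it at every call site. This sketch relies only on the `status` and optional `error` keys used above; the helper name `require_completed` and the choice of `RuntimeError` are illustrative.

```python
from typing import Any, Mapping


def require_completed(result: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return `result` unchanged if the task completed, otherwise raise."""
    status = result.get("status")
    if status != "completed":
        error = result.get("error", "Unknown error")
        raise RuntimeError(f"Task ended with status {status!r}: {error}")
    return result


# Usage:
# result = require_completed(task.wait_for_completion())
```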
Run Parsing Task
Parse or re-parse documents in the warehouse:
```python
# Parse all documents
task = warehouse.run_parsing_task(mode="recreate-all")
result = task.wait_for_completion()

# Parse specific documents only
task = warehouse.run_parsing_task(
    mode="recreate-all",
    document_ids=["doc-id-1", "doc-id-2"]
)
```
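When re-parsing a long list of documents, you may prefer several smaller tasks so that one failure does not hide the progress of the rest. A minimal sketch, assuming that separate `run_parsing_task` calls with disjoint `document_ids` are independent of each other; the batch size and the helper names `batched` and `parse_in_batches` are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def parse_in_batches(warehouse, document_ids: Sequence[str], batch_size: int = 50) -> List[Dict[str, Any]]:
    """Run one parsing task per batch, waiting for each before starting the next."""
    results = []
    for batch in batched(document_ids, batch_size):
        task = warehouse.run_parsing_task(mode="recreate-all", document_ids=batch)
        results.append(task.wait_for_completion())
    return results
```

Batches run sequentially here on purpose: it keeps load predictable and makes it easy to see which batch a failed result belongs to.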
Run Sync to Retrieval (Sync to Graph)
Synchronize data to the retrieval engine (Neo4j graph database):
```python
# Sync all tables to retrieval
sync_task = warehouse.run_sync_to_retrieval()
result = sync_task.wait_for_completion()
print(f"Sync status: {result['status']}")

# Provide a custom sync configuration
custom_sync_config = {...}  # Placeholder: replace with your custom sync steps
sync_task = warehouse.run_sync_to_retrieval(push_to_retrieval_json=custom_sync_config)

# Alternative: sync individual tables
chunks_table = warehouse.get_table("chunks")
sync_task = chunks_table.push_to_retrieval()
sync_task.wait_for_completion()

# Sync all tables that have deployed versions
warehouse.sync_to_retrieval()  # Blocking call; syncs all eligible tables
```
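To sync only a handful of tables, the per-table `push_to_retrieval` call can be wrapped in a loop. A sketch, assuming each table's `wait_for_completion` returns the same result dictionary as in the earlier examples; the helper name `push_tables_to_retrieval` and any table names other than `chunks` are hypothetical.

```python
from typing import Any, Dict, Iterable


def push_tables_to_retrieval(warehouse, table_names: Iterable[str]) -> Dict[str, Any]:
    """Push each named table to retrieval one at a time and collect the results."""
    results = {}
    for name in table_names:
        table = warehouse.get_table(name)   # as in warehouse.get_table("chunks")
        task = table.push_to_retrieval()    # starts an asynchronous sync task
        results[name] = task.wait_for_completion()
    return results


# Usage:
# results = push_tables_to_retrieval(warehouse, ["chunks"])
```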
Run Document Tags Extraction
Extract document-level tags using an LLM:
```python
# Run document tags extraction on all documents
task = warehouse.run_document_tags_extraction(
    table_id="doc-tags-table-id",
    mode="recreate-all",  # or "incremental"
    llm_model="gpt-4o-mini"
)
result = task.wait_for_completion()

# Run on specific documents
task = warehouse.run_document_tags_extraction(
    table_id="doc-tags-table-id",
    mode="recreate-all",
    llm_model="gpt-4o-mini",
    document_ids=["doc-id-1", "doc-id-2"],
    should_push_to_graph=True,  # Automatically sync to retrieval after extraction
    metadata_structure={"custom_field": "value"}  # Optional metadata
)
result = task.wait_for_completion()
print(f"Extraction status: {result['status']}")
```
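The tasks above can be chained. The sketch below re-parses a set of documents and, only if parsing completed, runs tag extraction on the same documents with `should_push_to_graph=True`. It assumes tag extraction should run on freshly parsed documents; the helper name `parse_and_tag` and its stop-on-failure policy are illustrative choices, not a documented workflow.

```python
from typing import Any, Dict, List, Sequence


def parse_and_tag(
    warehouse,
    document_ids: Sequence[str],
    tags_table_id: str,
    llm_model: str = "gpt-4o-mini",
) -> List[Dict[str, Any]]:
    """Re-parse documents, then extract their tags and push them to the graph."""
    ids = list(document_ids)
    results = []

    # Step 1: re-parse the documents; stop early if parsing did not complete.
    parse_task = warehouse.run_parsing_task(mode="recreate-all", document_ids=ids)
    parse_result = parse_task.wait_for_completion()
    results.append(parse_result)
    if parse_result.get("status") != "completed":
        return results

    # Step 2: extract document-level tags and sync them to retrieval afterwards.
    tags_task = warehouse.run_document_tags_extraction(
        table_id=tags_table_id,
        mode="recreate-all",
        llm_model=llm_model,
        document_ids=ids,
        should_push_to_graph=True,
    )
    results.append(tags_task.wait_for_completion())
    return results
```

Returning every intermediate result, rather than only the last one, lets the caller tell whether a failure happened during parsing or during extraction.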