Document Management

Manage the documents in your warehouse: upload them in batches, list them, and delete them in bulk.

Upload Documents

Upload multiple documents to the warehouse:
# Assumes `client` is an already-initialized SDK client
warehouse = client.get_warehouse("warehouse-id")

file_paths = [
    "path/to/document1.pdf",
    "path/to/document2.txt",
    "path/to/document3.docx"
]

responses = warehouse.upload_documents(file_paths)
print(f"Uploaded {len(responses)} batches")
Parameters:
  • file_paths (List[str]): List of file paths to upload
  • skip_parsing (bool, optional): Skip document parsing. Default: False
  • batch_size (int, optional): Files per batch. Default: 10
Returns: List of batch upload responses, one per batch
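
To estimate how many batch responses to expect, the following minimal sketch assumes that files are split into consecutive groups of at most batch_size items. That splitting rule is inferred from the parameter description, not confirmed by the SDK, and the helper names chunk_paths and expected_batches are hypothetical.

```python
import math
from typing import List


def chunk_paths(file_paths: List[str], batch_size: int = 10) -> List[List[str]]:
    """Split paths into consecutive groups of at most batch_size items.

    Assumption: this mirrors how batches are formed; the SDK may differ.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        file_paths[i:i + batch_size]
        for i in range(0, len(file_paths), batch_size)
    ]


def expected_batches(file_count: int, batch_size: int = 10) -> int:
    """Number of batches needed for file_count files under the same assumption."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(file_count / batch_size)
```

Under this assumption, 23 files with the default batch size would produce three batches of 10, 10, and 3 files.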

Options

# Upload without parsing
warehouse.upload_documents(file_paths, skip_parsing=True)

# Custom batch size
warehouse.upload_documents(file_paths, batch_size=5)
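
This page does not say how upload_documents reacts to a path that does not exist, so it may be worth checking paths locally first. The sketch below uses only the Python standard library; split_existing is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Iterable, List, Tuple


def split_existing(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (existing_files, missing_paths), preserving input order.

    Hypothetical pre-flight check; directories count as missing because
    only regular files can be uploaded.
    """
    existing: List[str] = []
    missing: List[str] = []
    for path in paths:
        if os.path.isfile(path):
            existing.append(path)
        else:
            missing.append(path)
    return existing, missing
```

The first list can then be passed to warehouse.upload_documents, and the second reported to the user.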

List Documents

Get all documents in the warehouse:
documents = warehouse.list_documents()

for doc in documents:
    print(f"Document: {doc['name']} (ID: {doc['id']})")
Returns: List of document dictionaries

Document Properties

  • id - Unique document identifier
  • name - Original filename
  • size - File size in bytes
  • upload_date - When the document was uploaded
  • status - Processing status
  • content_type - MIME type
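
As an illustration of working with these properties, the sketch below summarizes a list of document dictionaries by status and total size. The helper name summarize is hypothetical, and the status strings in the sample data are invented for the example; this page does not list the actual status values.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(documents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count documents per status and add up their sizes in bytes."""
    by_status = Counter(doc.get("status", "unknown") for doc in documents)
    total_bytes = sum(doc.get("size", 0) for doc in documents)
    return {
        "count": len(documents),
        "total_bytes": total_bytes,
        "by_status": dict(by_status),
    }


# Sample data shaped like the properties above; status values are assumptions.
sample_docs = [
    {"id": "d1", "name": "a.pdf", "size": 1200, "status": "processed"},
    {"id": "d2", "name": "b.txt", "size": 300, "status": "pending"},
    {"id": "d3", "name": "c.docx", "size": 500, "status": "processed"},
]
summary = summarize(sample_docs)
```

In practice the input would be the list returned by warehouse.list_documents().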

Delete Documents

Delete multiple documents:
# Collect the IDs of the first five documents
documents = warehouse.list_documents()
doc_ids_to_remove = [doc['id'] for doc in documents[:5]]

# Remove the documents
result = warehouse.delete_documents(doc_ids_to_remove)
Parameters:
  • document_ids (List[str]): IDs of the documents to delete
Returns: Result of the delete operation
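
Deleting by list position is rarely what you want in practice. A minimal sketch of selecting IDs by filename pattern is shown below, using the standard-library fnmatch module; ids_matching is a hypothetical helper and assumes each document dictionary carries the id and name properties described earlier.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def ids_matching(documents: List[Dict[str, Any]], pattern: str) -> List[str]:
    """Return the IDs of documents whose name matches a shell-style pattern.

    Matching is case-sensitive on every platform (fnmatchcase).
    """
    return [doc["id"] for doc in documents if fnmatchcase(doc["name"], pattern)]


docs = [
    {"id": "a", "name": "draft-1.pdf"},
    {"id": "b", "name": "final.pdf"},
    {"id": "c", "name": "draft-2.txt"},
]
draft_ids = ids_matching(docs, "draft-*")
```

The resulting list can be reviewed and then passed to warehouse.delete_documents.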

Complete Workflow Example

import os

# Get warehouse (assumes `client` is an already-initialized SDK client)
warehouse = client.get_warehouse("warehouse-id")

# 1. Prepare files
documents_dir = "path/to/documents"
file_paths = [
    os.path.join(documents_dir, f)
    for f in os.listdir(documents_dir)
    # Match extensions case-insensitively so "REPORT.PDF" is included
    if f.lower().endswith(('.pdf', '.txt', '.docx'))
]

# 2. Upload documents
upload_responses = warehouse.upload_documents(file_paths, batch_size=5)
print(f"Uploaded in {len(upload_responses)} batches")

# 3. List uploaded documents
all_docs = warehouse.list_documents()
print(f"Warehouse contains {len(all_docs)} documents")

Supported File Types

  • PDF - .pdf
  • Text - .txt, .md
  • Microsoft Office - .docx, .xlsx, .pptx
  • Images - .jpg, .png, .tiff (with OCR)
  • Other formats - Depends on configuration
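
The list above can be turned into a local filter applied before upload. This is a sketch only: the extension set is copied from the list, additional formats may be enabled by configuration, and the helper names is_supported and partition_by_support are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Baseline extensions taken from the list above; configuration may allow more.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".txt", ".md", ".docx", ".xlsx", ".pptx", ".jpg", ".png", ".tiff",
}


def is_supported(path: str) -> bool:
    """True if the file extension is in the baseline set (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_by_support(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split paths into (supported, unsupported), preserving input order."""
    supported: List[str] = []
    unsupported: List[str] = []
    for path in paths:
        (supported if is_supported(path) else unsupported).append(path)
    return supported, unsupported
```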

Integration with Tables

After uploading, documents are processed into chunks, which are stored in the chunks table:
# Upload documents
warehouse.upload_documents(file_paths)

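# Processing is not instantaneous; query the chunks table once it has finished
chunks_table = warehouse.get_table("chunks")
chunks_data = chunks_table.get_data(max_results=100)
# max_results caps the query, so this is a count of retrieved rows, not a total
print(f"Retrieved {len(chunks_data)} chunks (capped at 100)")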
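
This page does not describe how to tell when processing has finished. One generic option is to poll with a timeout, as in the sketch below. wait_until is a hypothetical standard-library helper, not an SDK function, and the SDK may provide its own mechanism.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
) -> bool:
    """Call predicate repeatedly until it returns True or timeout elapses.

    Returns True if the predicate succeeded, False on timeout. The
    predicate is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Using only the calls shown on this page, a predicate could be lambda: len(chunks_table.get_data(max_results=1)) > 0, although a non-empty table only shows that some chunks exist, not that every document has been processed.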