Document Management
Manage documents within your warehouse, including batch uploads and bulk operations.
Upload Documents
Upload multiple documents to the warehouse:
warehouse = client.get_warehouse("warehouse-id")
file_paths = [
    "path/to/document1.pdf",
    "path/to/document2.txt",
    "path/to/document3.docx",
]
responses = warehouse.upload_documents(file_paths)
print(f"Uploaded {len(responses)} batches")
Parameters:
file_paths (List[str]): List of file paths to upload
skip_parsing (bool, optional): Skip document parsing. Default: False
batch_size (int, optional): Files per batch. Default: 10
Returns: List of batch upload responses
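To illustrate how batch_size relates to the number of responses returned, the sketch below splits a list of paths into consecutive groups. It assumes batches are formed by simple consecutive slicing, which this page does not confirm; split_into_batches is a hypothetical helper and not part of the client library.

```python
from typing import List


def split_into_batches(file_paths: List[str], batch_size: int = 10) -> List[List[str]]:
    """Split file_paths into consecutive groups of at most batch_size items.

    Hypothetical helper: mirrors the documented default of 10 files per batch,
    but the real client may group files differently.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        file_paths[i:i + batch_size]
        for i in range(0, len(file_paths), batch_size)
    ]


# 23 illustrative paths with the default batch size
paths = [f"path/to/document{i}.pdf" for i in range(23)]
batches = split_into_batches(paths)
print([len(batch) for batch in batches])  # [10, 10, 3]
```

Under that assumption, uploading 23 files with the default settings would yield three batch responses.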
Options
# Upload without parsing
warehouse.upload_documents(file_paths, skip_parsing=True)
# Custom batch size
warehouse.upload_documents(file_paths, batch_size=5)
List Documents
Get all documents in the warehouse:
documents = warehouse.list_documents()
for doc in documents:
    print(f"Document: {doc['name']} (ID: {doc['id']})")
Returns: List of document dictionaries
Document Properties
id - Unique document identifier
name - Original filename
size - File size in bytes
upload_date - When the document was uploaded
status - Processing status
content_type - MIME type
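As a rough sketch of how these properties can be used, the example below summarizes a list of document dictionaries by count, total size, and status. The sample records are hypothetical: the status strings ("processed", "pending") and the date format are illustrative assumptions, not values confirmed by this page.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical records shaped like the dictionaries described above.
sample_documents: List[Dict[str, Any]] = [
    {"id": "doc-1", "name": "report.pdf", "size": 204800,
     "upload_date": "2024-01-15", "status": "processed",
     "content_type": "application/pdf"},
    {"id": "doc-2", "name": "notes.txt", "size": 1024,
     "upload_date": "2024-01-16", "status": "processed",
     "content_type": "text/plain"},
    {"id": "doc-3", "name": "slides.pptx", "size": 51200,
     "upload_date": "2024-01-17", "status": "pending",
     "content_type": "application/vnd.openxmlformats-officedocument.presentationml.presentation"},
]


def summarize_documents(documents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the document count, total size in bytes, and a per-status tally."""
    return {
        "count": len(documents),
        "total_bytes": sum(doc["size"] for doc in documents),
        "by_status": dict(Counter(doc["status"] for doc in documents)),
    }


summary = summarize_documents(sample_documents)
print(summary)
```

The same function could be applied to the list returned by warehouse.list_documents(), provided each dictionary carries the size and status keys listed above.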
Delete Documents
Delete multiple documents:
# Collect the IDs of the first five documents
documents = warehouse.list_documents()
doc_ids_to_remove = [doc['id'] for doc in documents[:5]]
# Remove the documents
result = warehouse.delete_documents(doc_ids_to_remove)
Parameters:
document_ids (List[str]): Document IDs to remove
Returns: Removal operation result
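Deleting the first five documents is rarely what is wanted in practice. The sketch below shows one way to pick IDs by a condition before deleting. select_document_ids is a hypothetical helper, and the sample records are illustrative.

```python
from typing import Any, Callable, Dict, List


def select_document_ids(
    documents: List[Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
) -> List[str]:
    """Return the IDs of all documents for which predicate(doc) is true."""
    return [doc["id"] for doc in documents if predicate(doc)]


# Hypothetical records shaped like the output of list_documents()
docs = [
    {"id": "doc-1", "name": "report.pdf"},
    {"id": "doc-2", "name": "notes.TXT"},
    {"id": "doc-3", "name": "draft.txt"},
]

# Select every plain-text file, ignoring the case of the extension
txt_ids = select_document_ids(docs, lambda d: d["name"].lower().endswith(".txt"))
print(txt_ids)  # ['doc-2', 'doc-3']
```

The resulting list can then be passed to warehouse.delete_documents(txt_ids). Printing the selection first gives a chance to review it before anything is removed.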
Complete Workflow Example
import os
# Get warehouse
warehouse = client.get_warehouse("warehouse-id")
# 1. Prepare files
documents_dir = "path/to/documents"
file_paths = [
    os.path.join(documents_dir, f)
    for f in os.listdir(documents_dir)
    if f.endswith(('.pdf', '.txt', '.docx'))
]
# 2. Upload documents
upload_responses = warehouse.upload_documents(file_paths, batch_size=5)
print(f"Uploaded in {len(upload_responses)} batches")
# 3. List uploaded documents
all_docs = warehouse.list_documents()
print(f"Warehouse contains {len(all_docs)} documents")
Supported File Types
- PDF - .pdf
- Text - .txt, .md
- Microsoft Office - .docx, .xlsx, .pptx
- Images - .jpg, .png, .tiff (with OCR)
- Other formats - Depends on configuration
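Before uploading, it can help to separate files with a listed extension from the rest. The sketch below does this with a case-insensitive match. SUPPORTED_EXTENSIONS simply copies the list above, and partition_by_support is a hypothetical helper. Whether the service itself treats extensions case-insensitively is an assumption, and formats enabled through configuration are not covered.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions copied from the list above; configuration-dependent formats are omitted.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".txt", ".md", ".docx", ".xlsx", ".pptx", ".jpg", ".png", ".tiff",
}


def partition_by_support(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split paths into (supported, unsupported) based on the file extension."""
    supported: List[str] = []
    unsupported: List[str] = []
    for path in paths:
        # Compare the lowercased suffix so "report.PDF" matches ".pdf"
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


candidates = ["a/report.PDF", "a/notes.md", "a/archive.zip", "a/scan.tiff", "a/README"]
ok, skipped = partition_by_support(candidates)
print(ok)       # ['a/report.PDF', 'a/notes.md', 'a/scan.tiff']
print(skipped)  # ['a/archive.zip', 'a/README']
```

Only the first list would be passed to warehouse.upload_documents. The second can be logged for review.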
Integration with Tables
After uploading, documents are processed into the chunks table:
# Upload documents
warehouse.upload_documents(file_paths)
# Once processing has finished, read up to 100 rows from the chunks table
chunks_table = warehouse.get_table("chunks")
chunks_data = chunks_table.get_data(max_results=100)
print(f"Fetched {len(chunks_data)} chunks (capped at 100)")
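The snippet above does not itself wait for processing to finish. One possible approach is to poll until a condition holds or a timeout expires. The sketch below shows a generic polling loop. wait_until is a hypothetical helper, and the stand-in condition only simulates processing completing on the third check.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
) -> bool:
    """Call condition() repeatedly until it returns True or timeout elapses.

    Returns True if the condition was met, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real check: reports completion on the third call
attempts = {"count": 0}


def fake_processing_done() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


finished = wait_until(fake_processing_done, timeout=1.0, interval=0.01)
print(finished, attempts["count"])  # True 3
```

In a real script, the condition might check whether chunks_table.get_data(max_results=1) returns any rows. That is an assumption about how processing completion shows up, not documented behavior.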