Document Management

Manage the documents in your warehouse, including batch uploads and bulk operations.

Upload Documents

Upload multiple documents to the warehouse:

# `client` is assumed to be an already initialized client instance
warehouse = client.get_warehouse("warehouse-id")

file_paths = [
    "path/to/document1.pdf",
    "path/to/document2.txt",
    "path/to/document3.docx"
]

responses = warehouse.upload_documents(file_paths)
print(f"Uploaded {len(responses)} batches")

Parameters:

file_paths (List[str]): List of file paths to upload
skip_parsing (bool, optional): Skip document parsing. Default: False
batch_size (int, optional): Files per batch. Default: 10

Returns: List of batch upload responses, one per batch
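To estimate how many batch responses to expect for a given batch_size, the following sketch may help. It assumes the client splits file_paths into consecutive groups in order, which this page does not confirm; split_into_batches and expected_batch_count are hypothetical helpers, not part of the client.

```python
from math import ceil
from typing import List


def split_into_batches(file_paths: List[str], batch_size: int = 10) -> List[List[str]]:
    """Split paths into consecutive groups of at most batch_size items.

    Assumption: this mirrors how upload_documents groups files; the real
    client may batch differently.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        file_paths[start:start + batch_size]
        for start in range(0, len(file_paths), batch_size)
    ]


def expected_batch_count(n_files: int, batch_size: int = 10) -> int:
    """Number of batches needed for n_files under the same assumption."""
    return ceil(n_files / batch_size)


# 23 made-up paths with the default batch size of 10
paths = [f"path/to/document{i}.pdf" for i in range(1, 24)]
batches = split_into_batches(paths, batch_size=10)
print([len(batch) for batch in batches])  # [10, 10, 3]
```

Under that assumption, len(responses) from upload_documents would equal expected_batch_count(len(file_paths), batch_size).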

Options

# Upload without parsing
warehouse.upload_documents(file_paths, skip_parsing=True)

# Custom batch size
warehouse.upload_documents(file_paths, batch_size=5)

List Documents

Get all documents in the warehouse:

documents = warehouse.list_documents()

for doc in documents:
    print(f"Document: {doc['name']} (ID: {doc['id']})")

Returns: List of document dictionaries

Document Properties

id - Unique document identifier
name - Original filename
size - File size in bytes
upload_date - When the document was uploaded
status - Processing status
content_type - MIME type
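As an illustration of working with these properties, here is a minimal sketch that summarizes a list of document dictionaries. The sample records, the upload_date format, and the status values ("processed", "pending") are invented for the example; the actual values returned by list_documents may differ. summarize_documents is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_documents(documents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the document count, total size in bytes, and a count per status."""
    return {
        "count": len(documents),
        "total_bytes": sum(doc["size"] for doc in documents),
        "by_status": dict(Counter(doc["status"] for doc in documents)),
    }


# Made-up records shaped like the properties listed above
sample_documents = [
    {"id": "doc-1", "name": "document1.pdf", "size": 2048,
     "upload_date": "2024-01-01T00:00:00Z", "status": "processed",
     "content_type": "application/pdf"},
    {"id": "doc-2", "name": "document2.txt", "size": 512,
     "upload_date": "2024-01-01T00:00:00Z", "status": "processed",
     "content_type": "text/plain"},
    {"id": "doc-3", "name": "document3.docx", "size": 4096,
     "upload_date": "2024-01-02T00:00:00Z", "status": "pending",
     "content_type": "application/octet-stream"},
]

summary = summarize_documents(sample_documents)
print(summary["count"], summary["total_bytes"], summary["by_status"])
```

With a real warehouse, the same helper could be called as summarize_documents(warehouse.list_documents()), provided every dictionary carries the size and status keys.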

Delete Documents

Delete multiple documents:

# Collect the IDs of the first five documents
documents = warehouse.list_documents()
doc_ids_to_remove = [doc['id'] for doc in documents[:5]]

# Delete the documents
result = warehouse.delete_documents(doc_ids_to_remove)

Parameters:

document_ids (List[str]): IDs of the documents to delete

Returns: Result of the delete operation
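Taking the first five documents is rarely what you want in practice. A sketch of choosing IDs by a property instead, here the filename extension, is shown below. select_ids_by_extension is a hypothetical helper and the sample records are invented; only the id and name keys are taken from this page.

```python
from typing import Any, Dict, List


def select_ids_by_extension(documents: List[Dict[str, Any]], extension: str) -> List[str]:
    """Return the IDs of documents whose name ends with extension (case-insensitive)."""
    suffix = extension.lower()
    return [doc["id"] for doc in documents if doc["name"].lower().endswith(suffix)]


# Made-up records; a real list would come from warehouse.list_documents()
sample_documents = [
    {"id": "doc-1", "name": "report.pdf"},
    {"id": "doc-2", "name": "notes.txt"},
    {"id": "doc-3", "name": "SCAN.PDF"},
]

pdf_ids = select_ids_by_extension(sample_documents, ".pdf")
print(pdf_ids)  # ['doc-1', 'doc-3']
```

The resulting list can then be passed to warehouse.delete_documents(pdf_ids). Printing or logging the selected IDs before deleting is a cheap safeguard.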

Complete Workflow Example

import os

# Get warehouse
warehouse = client.get_warehouse("warehouse-id")

# 1. Prepare files
documents_dir = "path/to/documents"
file_paths = [
    os.path.join(documents_dir, f)
    for f in os.listdir(documents_dir)
    # Compare in lowercase so that names like REPORT.PDF are included
    if f.lower().endswith(('.pdf', '.txt', '.docx'))
]

# 2. Upload documents
upload_responses = warehouse.upload_documents(file_paths, batch_size=5)
print(f"Uploaded in {len(upload_responses)} batches")

# 3. List uploaded documents
all_docs = warehouse.list_documents()
print(f"Warehouse contains {len(all_docs)} documents")

Supported File Types

PDF - .pdf
Text - .txt, .md
Microsoft Office - .docx, .xlsx, .pptx
Images - .jpg, .png, .tiff (with OCR)
Other formats - Depends on configuration
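A pre-upload check against the extensions listed above can catch unsupported files early. The sketch below is illustrative only: SUPPORTED_EXTENSIONS is copied from this list and split_by_support is a hypothetical helper. Because other formats depend on configuration, a given warehouse may accept more than this set.

```python
import os
from typing import List, Tuple

# Extensions taken from the list above; the set is an assumption, since
# additional formats may be enabled by configuration.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".txt", ".md", ".docx", ".xlsx", ".pptx", ".jpg", ".png", ".tiff",
}


def split_by_support(file_paths: List[str]) -> Tuple[List[str], List[str]]:
    """Partition paths into (supported, unsupported) by lowercase extension."""
    supported: List[str] = []
    unsupported: List[str] = []
    for path in file_paths:
        extension = os.path.splitext(path)[1].lower()
        if extension in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


supported, unsupported = split_by_support(
    ["docs/a.pdf", "docs/b.TXT", "docs/c.exe", "docs/README"]
)
print(supported)    # ['docs/a.pdf', 'docs/b.TXT']
print(unsupported)  # ['docs/c.exe', 'docs/README']
```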

Integration with Tables

Once uploaded documents have been processed, their chunks are available in the chunks table:

# Upload documents
warehouse.upload_documents(file_paths)

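# Processing takes time; once it has finished, inspect the chunks.
# get_data returns at most max_results rows, so this count is capped at 100.
chunks_table = warehouse.get_table("chunks")
chunks_data = chunks_table.get_data(max_results=100)
print(f"Retrieved {len(chunks_data)} chunks")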
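This page does not describe how to detect that processing has finished. One generic option is to poll until a condition holds or a timeout expires, as in the sketch below. wait_until is a hypothetical helper rather than a client method, and the demo condition is a stand-in that becomes true on its third call.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 60.0,
               interval: float = 2.0) -> bool:
    """Call condition repeatedly until it returns True or timeout seconds pass.

    Returns True if the condition was met, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in condition for the demo: reports "ready" on the third check
calls = {"n": 0}


def ready_on_third_call() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


result = wait_until(ready_on_third_call, timeout=1.0, interval=0.01)
print(result, calls["n"])  # True 3
```

With a warehouse, the condition might be lambda: len(chunks_table.get_data(max_results=1)) > 0, assuming that a non-empty chunks table indicates processing has at least started producing output. If the status document property reports completion, checking it would be a more precise signal.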
