Tables are the fundamental data containers within your Clarifeye warehouse. They organize and structure your knowledge, making it searchable and retrievable for AI applications.

What is a Table?

A table is the atomic unit of storage in your Clarifeye warehouse: it is where your data is stored and managed.

Table Types

Tables are classified into the following types:
Built-in Tables:
  • Blocks Table - Contains processed document sections and content blocks
  • Chunks Table - Stores document chunks optimized for retrieval and embedding
  • Parsed Documents Table - Manages parsed document metadata and processing status
Object and Tag Tables:
  • Object Tables - Store structured data extracted from your documents (e.g., “people”, “companies”, “events”)
  • Tag Tables - Store tags extracted from your documents (e.g., “financial”, “legal”, “marketing”)
  • Document Tags Table - Stores tags that describe your documents (e.g., “year”, “type of document”)
You can explore the different tables through the UI on the tables page.

Create a table

There are two ways to create a table:
  • As a standalone table
  • As a table linked to an extractor

Create a standalone table

A table can be created through the UI by clicking the “Create Table” button on the tables page. Choose the type of table you want to create; if you select an object or tag table, you need to provide additional information, such as the Pydantic model of the objects or, for a tag table, the tagging tree. You can also create tables programmatically with the Python client. For example, first define the Pydantic model of the objects to extract:
from typing import List
from pydantic import BaseModel, Field

class JobDescription(BaseModel):
    """
    Captures the essential details of a job description.
    """
    job_title: str = Field(
        ...,
        description="The title of the job as it appears in the text."
    )
    company_name: str = Field(
        ...,
        description="The name of the company offering the job as it appears in the text."
    )
    required_skills: List[str] = Field(
        ...,
        description="A list of skills required for the job as mentioned in the text."
    )
    responsibilities: List[str] = Field(
        ...,
        description="A list of the responsibilities or tasks that the candidate will be doing"
    )
    benefits: List[str] = Field(
        ...,
        description="A list of the benefits offered for the job as described in the text."
    )
Now let’s create the corresponding object table in the warehouse:
# Assumes `warehouse` is a warehouse object already obtained from the Clarifeye Python client
job_descriptions_table = warehouse.create_objects_table(
    table_name="job_descriptions",
    object_class=JobDescription,
)
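Before creating the table, you may want to check locally that the model accepts the records you expect. The sketch below uses plain pydantic only, not the Clarifeye client: it re-declares a trimmed copy of `JobDescription` (three of its five fields, to keep it short) plus a hypothetical `is_valid` helper, and the sample values are made up for illustration.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class JobDescription(BaseModel):
    """Trimmed copy of the model above (three of its five fields)."""
    job_title: str = Field(..., description="The title of the job as it appears in the text.")
    company_name: str = Field(..., description="The name of the company offering the job as it appears in the text.")
    required_skills: List[str] = Field(..., description="A list of skills required for the job as mentioned in the text.")


def is_valid(record: dict) -> bool:
    """Hypothetical helper: True if `record` satisfies the JobDescription model."""
    try:
        JobDescription(**record)
        return True
    except ValidationError:
        return False


# Made-up sample records, for illustration only.
good = {"job_title": "Data Engineer", "company_name": "Example Corp", "required_skills": ["Python", "SQL"]}
missing_company = {"job_title": "Data Engineer", "required_skills": ["Python"]}

print(is_valid(good))             # True
print(is_valid(missing_company))  # False: company_name is required
```

Because every field is declared with `...`, pydantic treats all of them as required, which is why the second record is rejected.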

Create a table linked to an extractor

When you create a visual object or tag extractor, an underlying table is automatically created to store the extraction results. By default, this table is named “extracted_objects” or “extracted_tags”, depending on the extractor type, followed by the extractor's ID.
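To locate the table that belongs to a given extractor, you can match table names against this naming rule. The exact separator between the prefix and the extractor ID is not specified here, so this minimal sketch avoids assuming one: it keeps names that start with the default prefix, end with the extractor ID, and have only non-alphanumeric characters (or nothing) in between. The `"object"`/`"tag"` keys and the `find_extractor_tables` function are hypothetical, and how you obtain the list of table names is left out.

```python
from typing import Iterable, List

# Documented default prefixes; the "object"/"tag" keys are labels chosen for this sketch.
DEFAULT_PREFIXES = {"object": "extracted_objects", "tag": "extracted_tags"}


def find_extractor_tables(table_names: Iterable[str], extractor_type: str, extractor_id: str) -> List[str]:
    """Return the names that look like the default table of the given extractor.

    A name matches if it starts with the documented prefix, ends with the
    extractor ID, and only non-alphanumeric separator characters (or nothing)
    sit between the two. No particular separator is assumed.
    """
    prefix = DEFAULT_PREFIXES[extractor_type]
    suffix = str(extractor_id)
    matches = []
    for name in table_names:
        if not (name.startswith(prefix) and name.endswith(suffix)):
            continue
        if len(name) < len(prefix) + len(suffix):
            continue  # prefix and ID would overlap, so this is not a match
        middle = name[len(prefix):len(name) - len(suffix)]
        # Reject e.g. "extracted_objects_142" when looking for ID "42".
        if not any(ch.isalnum() for ch in middle):
            matches.append(name)
    return matches


# Hypothetical table names, for illustration only.
names = ["job_descriptions", "extracted_objects_42", "extracted_objects_142", "extracted_tags_42"]
print(find_extractor_tables(names, "object", "42"))  # ['extracted_objects_42']
```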

View the generated data model

When new object or tag tables are created, they are automatically indexed in the warehouse, and the relations between them are inferred automatically. You can visualize these tables and their relations in the data-model tab of the UI.

What are Table Versions?

Each table has a versioning system that lets you track changes and manage deployments. Versions let you experiment with new data structures or content without affecting your production setup, and compare different versions of your data. For example, if you are refining a structured extraction to improve the quality of the data your agents use, you can create a new version of the table with the new data structure, test it with your agents, and keep the version that performs best. You can view a table's versions on the table page, or list them with the API:
table = warehouse.get_table("my-table")

# List all versions of a table
versions = table.list_versions()

print(len(versions))
You can also create a new version of a table with the API:
table.create_version()
You can explore all the data operations in the data operations section.
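The "create a version, test it with your agents, keep the best one" workflow can be sketched as a small selection loop. This is a minimal sketch under explicit assumptions: `evaluate` is a hypothetical callable you supply (for example, one that runs your agent test set against a version and returns a score), and version objects, such as the items returned by `table.list_versions()`, are treated as opaque values. Nothing here calls the Clarifeye client.

```python
from typing import Callable, Sequence, Tuple, TypeVar

V = TypeVar("V")


def pick_best_version(versions: Sequence[V], evaluate: Callable[[V], float]) -> Tuple[V, float]:
    """Score every version with `evaluate` and return the best one with its score."""
    if not versions:
        raise ValueError("no versions to compare")
    scored = [(version, evaluate(version)) for version in versions]
    # max() returns the first maximal item, so earlier versions win ties.
    return max(scored, key=lambda pair: pair[1])


# Illustration with stand-in version labels and made-up scores.
fake_scores = {"v1": 0.71, "v2": 0.84, "v3": 0.79}
best, score = pick_best_version(list(fake_scores), lambda v: fake_scores[v])
print(best, score)  # v2 0.84
```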

What is a Deployed Version?

Clarifeye provides a single endpoint to retrieve your data. To support this, a graph is built that stores the data from your different tables. To control which data is available for retrieval, you deploy one, and only one, version of each table; this version is called the deployed version. Deployed versions apply only to tables of type “object” and “tag”. You can check which version is deployed through the deployed property of each version, or in the UI.
When you change the deployed version, you need to resync the table to the retriever.
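Since each object or tag table should have exactly one deployed version, a quick consistency check is to filter a table's versions on the `deployed` property. This sketch assumes only that each version object exposes that boolean `deployed` attribute, as described above; `FakeVersion` is a hypothetical stand-in so the example runs without a warehouse connection, and `get_deployed_version` is likewise a name chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class FakeVersion:
    """Hypothetical stand-in for a table version; only `deployed` mirrors the docs."""
    name: str
    deployed: bool


def get_deployed_version(versions: Iterable[Any]) -> Any:
    """Return the single deployed version, or raise if there is not exactly one."""
    deployed = [v for v in versions if getattr(v, "deployed", False)]
    if len(deployed) != 1:
        raise RuntimeError(f"expected exactly one deployed version, found {len(deployed)}")
    return deployed[0]


versions = [FakeVersion("v1", False), FakeVersion("v2", True), FakeVersion("v3", False)]
print(get_deployed_version(versions).name)  # v2
```

With the Python client, the call would presumably be `get_deployed_version(table.list_versions())`, provided the returned version objects expose `deployed` as described.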

Retrieval on tables

Once the deployed version is synced to the retriever, you can access the table's data through the retrieval endpoint, using either a Cypher query or natural language.