Overview
Retrieval operations allow you to search, query, and access the processed data stored in your Clarifeye warehouses. Clarifeye provides multiple retrieval methods:
- Semantic search for finding relevant objects or chunks
- Full-text search for finding relevant objects or chunks
- Hybrid search, combining semantic and full-text searches, for finding relevant objects or chunks
- Query-based retrieval
- Direct retrieval from the tables
Retrieval Methods
Semantic Search
You can use semantic search to find chunks or extracted objects that are semantically close to the query.
Semantic search is only available for objects and chunks that have been pushed to retrieval (see push_to_retrieval).
The similarity score is available in the score field of the retrieved objects.
The results will contain a reference URL directing users to the Clarifeye viewer, where they can explore the source of the retrieved objects.
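For intuition about what the score value measures: the hybrid search section describes the semantic score as a cosine similarity. The sketch below computes cosine similarity between small hand-written vectors in plain Python. The vectors and the cosine_similarity helper are illustrative assumptions, not part of the Clarifeye SDK; the embedding model Clarifeye uses is not stated here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors: no similarity
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (assumption).
query_vec = [1.0, 0.0, 1.0]
close_vec = [2.0, 0.0, 2.0]  # same direction as the query
far_vec = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query_vec, close_vec))
print(cosine_similarity(query_vec, far_vec))
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0.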
Full Text Search
You can also use full text search to retrieve data from the warehouse.
Full text search is only available for objects and chunks that have been pushed to retrieval (see push_to_retrieval).
The full text score is available in the score field of the retrieved objects. The results will contain a reference URL directing users to the Clarifeye viewer, where they can explore the source of the retrieved objects.
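The hybrid search section names BM25 as the full text scoring function. As a rough illustration of how such a score behaves, here is a minimal Okapi BM25 sketch in plain Python. The whitespace tokenizer, the k1 and b values, and the bm25_scores helper are assumptions chosen for the example; Clarifeye's actual tokenization and parameters are not documented here.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization: long documents need more matches to score the same.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "senior engineer for distributed systems",
    "marketing manager for consumer brands",
    "distributed systems researcher working on distributed storage",
]
scores = bm25_scores("distributed systems", docs)
print(scores)
```

Documents that share no term with the query score 0, which is why full text search complements semantic search rather than replacing it.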
Hybrid Search
You can also use hybrid search to retrieve data from the warehouse. It combines semantic search and full text search.
Like the two searches it combines, hybrid search is only available for objects and chunks that have been pushed to retrieval (see push_to_retrieval).
Each result carries three scores:
- score: semantic similarity score (cosine similarity)
- fulltext_score: full text score (BM25)
- rrf_score: combined score (Reciprocal Rank Fusion, RRF), used to sort the final results
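To make the combined score concrete, the sketch below shows the standard Reciprocal Rank Fusion formula, where each result earns 1 / (k + rank) from every ranking it appears in. The constant k=60 is the value commonly used in the literature and is an assumption here; the document does not state which constant or weighting Clarifeye applies. The chunk ids and the reciprocal_rank_fusion helper are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: each id earns 1 / (k + rank) per list."""
    fused = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            fused[item_id] = fused.get(item_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical result ids, best match first in each list.
semantic_ranking = ["chunk_a", "chunk_b", "chunk_c"]
fulltext_ranking = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([semantic_ranking, fulltext_ranking])
for item_id, rrf_score in fused:
    print(item_id, round(rrf_score, 5))
```

Because RRF works on ranks rather than raw values, it can merge a cosine similarity and a BM25 score even though the two live on different scales.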
Filtering with tags
You can use the tags filter to restrict the search to a specific tag. In that case:
- only the chunks that have been tagged with the given tag will be retrieved.
- only the objects that belong to a chunk that has been tagged with the given tag will be retrieved.
You can also use the excluded_tags parameter to exclude chunks or objects that have been tagged with the given tag.
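The filtering rules above can be sketched on in-memory data. This is only an illustration of the semantics, not the Clarifeye API: the dictionary layout, the filter_chunks and objects_of helpers, and the tag argument name are assumptions (only excluded_tags is named in this page), and the real filtering happens on the server.

```python
def filter_chunks(chunks, tag=None, excluded_tags=None):
    """Keep chunks carrying `tag` (if given) and none of `excluded_tags`."""
    excluded = set(excluded_tags or [])
    kept = []
    for chunk in chunks:
        chunk_tags = set(chunk["tags"])
        if tag is not None and tag not in chunk_tags:
            continue  # tag filter: the chunk must carry the requested tag
        if chunk_tags & excluded:
            continue  # excluded_tags: drop chunks carrying any excluded tag
        kept.append(chunk)
    return kept

def objects_of(chunks):
    """Objects are reachable only through the chunks that survive filtering."""
    return [obj for chunk in chunks for obj in chunk["objects"]]

# Hypothetical warehouse content.
chunks = [
    {"id": "c1", "tags": ["contracts"], "objects": ["clause_1"]},
    {"id": "c2", "tags": ["contracts", "draft"], "objects": ["clause_2"]},
    {"id": "c3", "tags": ["invoices"], "objects": ["invoice_1"]},
]

tagged = filter_chunks(chunks, tag="contracts")
final_only = filter_chunks(chunks, tag="contracts", excluded_tags=["draft"])
print([c["id"] for c in tagged], objects_of(final_only))
```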
Generate the answer
You can generate answers based on the data returned by the different retrieval methods. The first option is to use one of the functions generate_answer_with_semantic_search, generate_answer_with_full_text_search, or generate_answer_with_hybrid_search.
Additional options
For each of the provided retrieval methods, you can also use the following advanced options:
enrich_with_chunks
Whether to include the source chunk of each retrieved object. When enabled, the chunk is available under the source_chunk_content key of the results. Enable this by passing enrich_with_chunks=True.
rerank
You can improve the relevance and quality of retrieved results by passing rerank=True to any of the three search methods. Reranking uses Cohere’s rerank-v3.5 model.
When enabled, each result will contain a relevance_score ranging from 0 (low relevance) to 1 (high relevance).
Reranking typically adds the most value when the query:
- is long and complex
- includes ambiguous or polysemous terms (e.g. “AI jobs involving data labeling”)
- has multiple intents or facets (e.g. “Remote React jobs with equity at seed-stage companies”)
- needs matching with implicit semantics (e.g. “What jobs suit someone with NLP research experience?”)
- contains negative or comparative conditions (e.g. “Jobs that are not junior-level”)
reformulate_query
Clarifeye leverages HyDE (Hypothetical Document Embeddings), a technique that generates hypothetical answers to your query and uses them, instead of the raw query, to retrieve objects. When enabled, your query is first sent to OpenAI’s gpt-4o-mini endpoint to generate the hypothetical answers.
You can enable this by passing reformulate_query=True.
Typical queries that benefit the most from using HyDE are:
- open-domain questions (e.g. “Which jobs involve working with distributed systems?”)
- long-tail or niche queries (e.g. “Roles where you need experience with WebAssembly”)
- underspecified queries (e.g. “Jobs where I can work on core systems”)
- queries containing jargon (e.g. “Positions that follow ISO 27001 compliance”)
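The HyDE flow described above can be sketched end to end with stand-ins for the two models. Everything in this block is an assumption made for illustration: the toy bag-of-words embed function replaces a real embedding model, fake_generate_answer replaces the gpt-4o-mini call, and hyde_search is a hypothetical helper, not a Clarifeye function.

```python
import math

VOCAB = ["webassembly", "rust", "browser", "compiler", "sales", "marketing"]

def embed(text):
    """Toy embedding: term counts over a fixed vocabulary (stand-in for a real model)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fake_generate_answer(query):
    """Stand-in for the LLM call that writes a hypothetical answer."""
    return "a role building a rust compiler that targets webassembly in the browser"

def hyde_search(query, documents, generate_answer, top_k=1):
    # 1. Ask the generator for a hypothetical answer to the query.
    hypothetical = generate_answer(query)
    # 2. Embed the hypothetical answer instead of the raw query.
    hypo_vec = embed(hypothetical)
    # 3. Rank documents by similarity to the hypothetical answer.
    ranked = sorted(documents, key=lambda doc: cosine(hypo_vec, embed(doc)), reverse=True)
    return ranked[:top_k]

documents = [
    "marketing lead for enterprise sales",
    "compiler engineer working on rust tooling",
]
query = "Roles where you need experience with WebAssembly"
results = hyde_search(query, documents, fake_generate_answer)
print(results)
```

In this toy setup the raw query shares no vocabulary term with either document, so embedding it directly cannot separate them. The hypothetical answer introduces related terms, which is the effect HyDE relies on for niche or underspecified queries.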
Structured Queries
You can also use Cypher queries to retrieve data from the warehouse.
Cypher queries are only available for objects and chunks that have been pushed to retrieval (see push_to_retrieval).