Retrieve Chunks | Documentation

This page shows you how to run a semantic search in a Catalog using a text prompt.

This API performs semantic search by embedding the user query with the preset Indexing Embed pipeline that is used by the Process Files operation. The query embedding is then compared with the embeddings of the chunks in the specified Catalog, and the most contextually similar chunks are returned.

#Retrieve Chunks via API

The Retrieve Chunks API allows you to perform a semantic search within a Catalog by providing a text prompt. This operation returns the most contextually similar chunks based on the provided text.

cURL

export INSTILL_API_TOKEN=********
curl -X POST 'HOST_URL/v1alpha/namespaces/NAMESPACE_ID/catalogs/CATALOG_ID/chunks/retrieve' \
--header "Authorization: Bearer $INSTILL_API_TOKEN" \
--header "Content-Type: application/json" \
--data-raw '{
  "textPrompt": "example text to search",
  "topK": 5
}'

Replace NAMESPACE_ID with the Catalog owner's ID (namespace) and CATALOG_ID with the identifier of the Catalog you are searching.
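
The same request can be sketched in Python using only the standard library. This is a minimal illustration that mirrors the cURL call above, not an official client: the function names `build_retrieve_request` and `retrieve_chunks` are hypothetical, and the host, namespace, and Catalog values in the usage comment are placeholders.

```python
import json
import urllib.request


def build_retrieve_request(host_url, namespace_id, catalog_id, api_token,
                           text_prompt, top_k=5):
    """Build (but do not send) a POST request for the Retrieve Chunks endpoint."""
    url = (f"{host_url.rstrip('/')}/v1alpha/namespaces/{namespace_id}"
           f"/catalogs/{catalog_id}/chunks/retrieve")
    # Same JSON body as the cURL example: textPrompt plus topK.
    body = json.dumps({"textPrompt": text_prompt, "topK": top_k}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def retrieve_chunks(request, timeout=30):
    """Send a prepared request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a reachable host and a valid token; values are placeholders):
#   import os
#   req = build_retrieve_request("http://localhost:8080", "NAMESPACE_ID",
#                                "CATALOG_ID", os.environ["INSTILL_API_TOKEN"],
#                                "example text to search", top_k=5)
#   result = retrieve_chunks(req)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network call is made.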

#Body Parameters

textPrompt (string, required): The text prompt to search for in the Catalog.
topK (integer, optional): Specifies the number of similar chunks to return. Defaults to 5.

INFO

If you are using Instill Core as a managed service, set HOST_URL to https://private.instill-ai.com. If you are self-hosting Instill Core, use http://localhost:8080.

#Example Response

A successful response will return a list of similar chunks found in the Catalog:

{
  "similarChunks": [
    {
      "chunkUid": "ba30f524-889c-4dc7-82a2-33a8f7be2d47",
      "similarityScore": 0.95,
      "textContent": "Instill Core is a full-stack AI solution to accelerate AI development...",
      "sourceFile": "core-intro.txt"
    },
    {
      "chunkUid": "757ab6d9-e5b4-482e-8017-5582b578e57a",
      "similarityScore": 0.90,
      "textContent": "Transform unstructured data into a knowledge base with a unified format...",
      "sourceFile": "catalog-intro.pdf"
    }
  ]
}

#Output Description

similarChunks (array of objects): An array where each object represents a similar chunk found in the Catalog.
- chunkUid (string): The unique identifier of the chunk.
- similarityScore (number): The similarity score between the input text prompt and the chunk content. Scores range from 0 to 1, with higher scores indicating greater relevance.
- textContent (string): The content of the similar chunk.
- sourceFile (string): The name of the source file from which the chunk was extracted.
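
As an illustration of how these fields might be consumed, the sketch below filters chunks by a minimum similarity score and groups their text by source file. The helper names `filter_chunks` and `group_by_source` and the 0.92 threshold are hypothetical choices for this example; the data is the example response from this page.

```python
from collections import defaultdict


def filter_chunks(response, min_score=0.0):
    """Return chunks with similarityScore >= min_score, highest score first."""
    chunks = response.get("similarChunks", [])
    kept = [c for c in chunks if c["similarityScore"] >= min_score]
    return sorted(kept, key=lambda c: c["similarityScore"], reverse=True)


def group_by_source(chunks):
    """Map each sourceFile name to the list of chunk texts taken from it."""
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[chunk["sourceFile"]].append(chunk["textContent"])
    return dict(grouped)


# The example response shown above, as a Python dict.
example_response = {
    "similarChunks": [
        {
            "chunkUid": "ba30f524-889c-4dc7-82a2-33a8f7be2d47",
            "similarityScore": 0.95,
            "textContent": "Instill Core is a full-stack AI solution to accelerate AI development...",
            "sourceFile": "core-intro.txt",
        },
        {
            "chunkUid": "757ab6d9-e5b4-482e-8017-5582b578e57a",
            "similarityScore": 0.90,
            "textContent": "Transform unstructured data into a knowledge base with a unified format...",
            "sourceFile": "catalog-intro.pdf",
        },
    ]
}

# Keep only chunks scoring at least 0.92 (an arbitrary example threshold).
strong_matches = filter_chunks(example_response, min_score=0.92)
sources = group_by_source(example_response["similarChunks"])
```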

Notes:

- Ensure that the Authorization header contains a valid API token with the Bearer prefix.
- Adjust the topK parameter based on how many context chunks you want to retrieve for your search. If omitted, it defaults to 5.
- The API performs semantic search using embeddings, so the results are based on contextual similarity rather than exact keyword matches.

#Error Responses

401 Unauthorized: Returned when the client credentials are not valid. Ensure your API token is correct and has the necessary permissions.
default: An unexpected error response. The response will include an rpcStatus object with details about the error.
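
A client might handle these two cases along the following lines. This is a sketch under stated assumptions: the helper names `describe_http_error` and `send` are hypothetical, and the code assumes the error body is a JSON object with a `message` field, which this page does not specify. If the body has a different shape, the raw text is reported instead.

```python
import json
import urllib.error
import urllib.request


def describe_http_error(status_code, body_text):
    """Turn a status code and raw error body into a short readable message."""
    if status_code == 401:
        return ("401 Unauthorized: check that the API token is valid "
                "and sent with the Bearer prefix.")
    try:
        payload = json.loads(body_text)
    except (json.JSONDecodeError, TypeError):
        payload = None
    # Assumption: the error object carries a "message" field; not confirmed here.
    detail = payload.get("message") if isinstance(payload, dict) else None
    return f"Unexpected error (HTTP {status_code}): {detail or body_text}"


def send(request, timeout=30):
    """Send a prepared urllib request; raise RuntimeError on HTTP error statuses."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(describe_http_error(err.code, body)) from err
```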