Instill Core automatically loads a `.env` file that contains key/value pairs defining required environment variables. You can customize the file based on your configuration.
In addition, Instill Core uses the Koanf library for configuration. Koanf supports loading configuration from multiple sources and making it available to the service. To override the default configuration, set the corresponding environment variables, which are passed through the Docker Compose file. All configuration environment variables for each service are prefixed with `CFG_`.
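As an illustration (the key below is hypothetical; check each service's default configuration file for the exact keys it supports), nested configuration keys map onto underscore-separated `CFG_` variables, so a `server.debug` option could be overridden by adding a line like this to the `.env` file:

```
CFG_SERVER_DEBUG=true
```

After editing `.env`, recreate the services (e.g. with `docker compose up -d`) so the override takes effect.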
#Configure Instill Core Services
Read the default configuration files for a full overview of all supported configuration options of each service:
- `api-gateway` (`localhost:8080`): a service to handle API requests and responses
- `pipeline-backend` (`localhost:8081`): a service to build and manage unstructured data pipelines
- `artifact-backend` (`localhost:8082`): a service for managing all stateful resources
- `model-backend` (`localhost:8083`): a service to import and serve ML models
- `mgmt-backend` (`localhost:8084`): a service to handle user management, token management, and metrics
- `console` (`localhost:3000`): a web-based UI app to provide a unified, clean, and intuitive user experience of Instill Core
#Configure the Console
To access the Instill Core Console, set the host by overriding the corresponding environment variables. By default the host is set to `localhost`.
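As an illustration, the host can be overridden in the `.env` file before starting the stack. The variable name below follows a common Instill Core `.env` convention but may differ between versions, so verify it against your own `.env` file (the IP address is a placeholder):

```
INSTILL_CORE_HOST=192.168.1.10
```

Restart the services after changing the value so the Console picks up the new host.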
#Configure the observability stack
Observability is critical for a distributed microservice architecture. Through OpenTelemetry, we can generate, collect, and export metrics, logs, and traces to help analyze the performance and behavior of Instill Core services.
The observability stack is disabled by default. You can enable it by setting `OBSERVE_ENABLED=true` in the `.env` file.
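Concretely, the relevant line in the `.env` file reads:

```
OBSERVE_ENABLED=true
```

Recreate the services afterwards (e.g. with `docker compose up -d`) so the telemetry containers are started.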
The following telemetry tools are currently supported:
- Jaeger (`localhost:16686`): OpenTelemetry allows us to export spans to Jaeger. Use Jaeger when you want to debug the complete flow of a request through the Instill Core services.
- InfluxDB (`localhost:8086`, username: `admin`, password: `password`): detailed metrics are sent to InfluxDB for monitoring and are imported into the Grafana dashboard.
- Grafana (`localhost:3001`, username: `admin`, password: `admin`): the Grafana dashboard visualizes the metrics to monitor the performance and anomalies of Instill Core services.
- Prometheus (`localhost:9090`): Instill Core exports metrics to Prometheus.
- Ray Dashboard (`localhost:8265`): the Ray Dashboard provides model serving observability in Instill Core.
#Configure the Embedding Feature
To enable the embedding feature in Artifact, you must set the `CFG_COMPONENT_SECRETS_OPENAI_APIKEY` environment variable. This key is necessary for the Process Files API, which uses embedding models to encode text data. For now, the OpenAI API is the only supported embedding option, but in the future we plan to offer additional options, including local embedding solutions.
- Open the `.env.component` file: locate and open the `.env.component` file in your project directory.
- Add the OpenAI secret key: insert a line setting `CFG_COMPONENT_SECRETS_OPENAI_APIKEY` to your actual OpenAI secret key, replacing the `sk-XXX` placeholder.
- Restart Instill Core: after setting the environment variable, restart the instill-core service to apply the changes.
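Putting the steps together, the completed entry in `.env.component` (with the placeholder key still in place) reads:

```
CFG_COMPONENT_SECRETS_OPENAI_APIKEY=sk-XXX
```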
#Anonymized Usage Collection
To help us better understand how Instill Core is used and how it can be improved, Instill Core collects and reports anonymized usage statistics.
#What Data is Collected
We value your privacy, so we collect only anonymous data and have selected a set of details to share from your Instill Core instance that give us insight into how to improve Instill Core without being invasive.
When a new Instill Core instance is running, the usage clients in the `pipeline-backend`, `model-backend`, and `mgmt-backend` services each request a new session.
Our usage server returns a token used for future reporting.
For each session, we collect `Session` data, including some basic information about the service and the system it is running on:
- name of the service to collect data from, e.g., `SERVICE_PIPELINE` for `pipeline-backend`
- edition of the service to identify the deployment, e.g., `local-ce` for a local community edition deployment
- version of the service, e.g., `0.5.0-alpha`
- architecture of the system the service is running on, e.g., `amd64`
- operating system the service is running on, e.g., `Linux`
- uptime in seconds to identify the rough life span of the service
Each session is assigned a random UUID for tracking and identification.
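As an illustration only (the field names below are assumed for readability, not taken from the actual protobuf definitions), a `Session` payload carrying the details listed above might look like:

```json
{
  "name": "SERVICE_PIPELINE",
  "edition": "local-ce",
  "version": "0.5.0-alpha",
  "arch": "amd64",
  "os": "Linux",
  "uptime_seconds": 86400,
  "session_uuid": "f4f2f0a2-7c1b-4f6e-9b1a-2d3c4e5f6a7b"
}
```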
Then, each session will collect and send its own `SessionReport` data every 10 minutes:
- `MgmtUsageData` reports data for the `mgmt-backend` session:
  - UUID of the onboarded User
  - a list of user metadata
- `PipelineUsageData` reports data for the `pipeline-backend` session of the onboarded User:
  - UUID of the onboarded User
  - a list of pipeline trigger metadata
- `ModelUsageData` reports data for the `model-backend` session of the onboarded User:
  - UUID of the onboarded User
  - a list of model trigger metadata
You can check the full usage data structs in the protobufs. This data does not allow us to track Personal Data, but it enables us to measure session counts and usage statistics.
#Implementation
The anonymous usage report client library is in `usage-client`.
To limit risk exposure, we keep the usage server implementation private for now.
In summary, the Session data and SessionReport sent from each session get updated in the usage server.
Additionally, the frontend Console sends event data to Amplitude.
#Opting out
Instill Core usage collection helps the entire community, and we'd appreciate it if you leave it on.
However, if you want to opt out, you can disable it by overriding the `.env` file in Instill Core.
This will disable Instill Core usage collection for the entire project.
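As an illustration, the opt-out is a single flag in the `.env` file. The variable name below follows Instill Core's `.env` conventions but may differ between versions, so check your own `.env` file for the exact key:

```
USAGE_ENABLED=false
```

Recreate the services after changing the flag so the usage clients stop reporting.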
#Acknowledgements
Our anonymized usage collection was inspired by KrakenD's "How we built our telemetry service", and we would like to acknowledge that their design helped us bootstrap our usage collection project.