The service enriches entries in a vector store, such as Pinecone, with classification metadata by invoking an external classification service and persisting the results back to the vector store. It supports both scheduled enrichment and on-demand enrichment through an API.
This service operates as part of the PlainID Edge enrichment flow and prepares vector metadata for downstream authorization and governance use cases.
Configuration
Configuration is loaded from a YAML file and merged with default values. All configuration values support environment variable substitution.
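The merge with default values can be pictured with the following minimal sketch. It is not the service's implementation: the recursive, key-by-key merge strategy and the names DEFAULTS and merge_with_defaults are assumptions made for illustration, while the default values themselves are the ones listed in the parameter tables of this document.

```python
from copy import deepcopy

# Defaults taken from the parameter tables in this document.
DEFAULTS = {
    "http": {"port": 8080},
    "management": {"port": 8081, "prefix": "/health"},
    "jwt": {"enabled": False},
}


def merge_with_defaults(defaults: dict, loaded: dict) -> dict:
    """Return a new dict where values from `loaded` override `defaults`.

    Assumption: nested mappings are merged key by key; any other value
    (including lists such as `databases`) replaces the default wholesale.
    """
    merged = deepcopy(defaults)
    for key, value in loaded.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_with_defaults(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parsed file content: only management.port is overridden.
file_config = {"management": {"port": 9090}, "databases": []}
effective = merge_with_defaults(DEFAULTS, file_config)
print(effective["management"])  # {'port': 9090, 'prefix': '/health'}
```

In this sketch, a key that is missing from the file keeps its default, so a configuration file only needs to list the values that differ.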
Configuration File and Loading
- Default path: config/config.yaml, or a path provided to the application at startup.
- Environment variable substitution:
  - Use ${VAR} for required variables.
  - Use ${VAR:default} for optional variables with a default value, for example ${LOG_LEVEL:info}.
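The two placeholder forms can be illustrated with a short sketch. This is an approximation written for this document, not the service's own loader: the function name substitute is hypothetical, and raising an error for an unset required variable is an assumption about how a missing ${VAR} is handled.

```python
import os
import re

# Matches ${VAR} or ${VAR:default}; the default may be empty, as in ${JWKS_URL:}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def substitute(text: str, env=None) -> str:
    """Replace placeholders in `text` using `env` (defaults to os.environ)."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:  # ${VAR:default} form; default may be ""
            return default
        # Assumption: an unset required variable is treated as an error.
        raise KeyError(f"required environment variable {name} is not set")

    return _PLACEHOLDER.sub(replace, text)


env = {"PINECONE_API_KEY": "example-key"}
print(substitute("level: ${LOG_LEVEL:info}", env))     # level: info
print(substitute("apiKey: ${PINECONE_API_KEY}", env))  # apiKey: example-key
```

Note that ${JWKS_URL:} resolves to an empty string when JWKS_URL is not set, because the default after the colon is empty.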
The following top-level keys are used by the application: server, log, http, management, jwt, and databases.
The framework consumes server and log. All other sections are consumed by the enrichment agent application.
Parameters
| Section | Description |
|---|---|
| server.name | Optional. Application name. |
| log.level | Optional. Log level, for example info or debug. |
| http | HTTP server and API configuration. |
| management | Health and metrics server configuration. |
| jwt | JWT validation configuration for the enrichment API. |
| databases | List of vector databases and enrichment targets. |
http Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| port | integer | No | 8080 | API server port. |
| useMux | boolean | No | Application default | Enables use of the micro-infra HTTP multiplexer. |
| openApiSpecPath | string | No | Application default | Path to the OpenAPI specification used by the service. |
| enableXSSValidator | boolean | No | Application default | Enables XSS validation middleware. |
| xssWhitelistType | string | No | Application default | Defines the whitelist mode used by the XSS validator. |
| enableExternalMonitor | boolean | No | Application default | Enables the external monitoring endpoint. |
| externalMonitorPath | string | No | Application default | Path for the external monitoring endpoint. |
All parameters, except for port, follow the micro-infra HttpConfig. Default values may also be applied by the application at runtime.
Management Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| port | integer | No | 8081 | Port used for management endpoints, such as health and metrics. |
| prefix | string | No | /health | Path prefix for readiness, liveness, and metrics endpoints (for example /health/readiness or /health/metrics). |
JWT Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | No | false | Enables or disables JWT validation for the enrichment API. |
| jwksUrl | string | Required when enabled=true | — | URL of the JWKS endpoint used to validate JWT tokens. |
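The dependency between enabled and jwksUrl can be expressed as a small check. The sketch below is illustrative only: the function name validate_jwt_config and the wording of the messages are hypothetical, and it assumes the jwt section has already been parsed into a dictionary.

```python
from typing import List


def validate_jwt_config(jwt: dict) -> List[str]:
    """Return a list of problems found in the `jwt` section (empty if valid)."""
    problems = []
    enabled = jwt.get("enabled", False)  # documented default: false
    if not isinstance(enabled, bool):
        problems.append("jwt.enabled must be a boolean")
    elif enabled and not jwt.get("jwksUrl"):
        # An empty string counts as missing, e.g. ${JWKS_URL:} with JWKS_URL unset.
        problems.append("jwt.jwksUrl is required when jwt.enabled is true")
    return problems


print(validate_jwt_config({"enabled": True, "jwksUrl": ""}))
print(validate_jwt_config({"enabled": False}))
```

A placeholder with an empty default, such as ${JWKS_URL:}, yields an empty jwksUrl when the variable is unset, which this check reports only when JWT validation is enabled.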
Databases Parameters
Each entry in the databases array defines a single enrichment target.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | — | Unique database identifier used by the scheduler and API. |
| type | string | Yes | — | Vendor type, for example PINECONE. |
| periodicStart | string | No | — | Cron expression for scheduled enrichment. An empty value disables scheduling. |
| classificationService | object | Yes | — | Classification service used to assign categories to vectors. |
| metadataKey | string | No | category | Metadata key where the category is written in the vector store. |
| vendor | object | Yes | — | Vendor-specific connection and filtering configuration. |
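The required keys and the uniqueness of id from the table above can be checked before the service is started. The following sketch is a hypothetical pre-flight check, not part of the product: the function names, the message wording, and the sample entries are assumptions, and it only inspects the presence of keys, not their contents.

```python
from typing import List

# Keys marked "Yes" in the Required column of the table above.
REQUIRED_KEYS = ("id", "type", "classificationService", "vendor")


def check_databases(databases: List[dict]) -> List[str]:
    """Report missing required keys and duplicate ids (empty list if none)."""
    problems = []
    seen_ids = set()
    for index, entry in enumerate(databases):
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"databases[{index}]: missing required key '{key}'")
        db_id = entry.get("id")
        if db_id is not None:
            if db_id in seen_ids:
                problems.append(f"databases[{index}]: duplicate id '{db_id}'")
            seen_ids.add(db_id)
    return problems


def scheduled_ids(databases: List[dict]) -> List[str]:
    """Ids of entries whose periodicStart is set and non-empty."""
    return [db["id"] for db in databases if db.get("periodicStart")]


# Hypothetical parsed entries; URLs and keys are placeholders.
databases = [
    {"id": "pineconeProd", "type": "PINECONE", "periodicStart": "0 0 2 * * ?",
     "classificationService": {"serviceUrl": "https://classifier.example.com/classify"},
     "vendor": {"pinecone": {"apiKey": "prod-key"}}},
    {"id": "pineconeStaging", "type": "PINECONE", "periodicStart": "",
     "classificationService": {"serviceUrl": "http://localhost:8000/classify"},
     "vendor": {"pinecone": {"apiKey": "staging-key"}}},
]
print(check_databases(databases))  # []
print(scheduled_ids(databases))    # ['pineconeProd']
```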
Classification Service Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| serviceUrl | string | Yes | — | Base URL of the HTTP classification service. |
Pinecone Vendor Configuration
pinecone Object
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| apiKey | string | Yes | — | Pinecone API key used to access the vector database. |
collections Object
Controls which namespaces are processed. Matching is applied to the string indexName_namespaceName (for example my-index_users).
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| mode | string | No | — | Namespace selection mode. Supported values: include or exclude. |
| patterns | array of strings | No | — | Regular expression patterns used to match namespaces. |
With include, only namespaces matching at least one pattern are processed.
With exclude, all namespaces are processed except those matching at least one pattern.
Filter Behavior
If no patterns are defined:
- include processes all namespaces.
- exclude processes no namespaces.
Patterns are compiled and evaluated as regular expressions.
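The selection rules above can be sketched as a single predicate. This is an illustration under stated assumptions rather than the agent's actual code: the function name should_process is hypothetical, and the sketch assumes unanchored matching (a pattern may match anywhere in indexName_namespaceName), which this document does not specify. The empty-pattern branch follows the behavior listed under Filter Behavior.

```python
import re
from typing import Iterable


def should_process(index_name: str, namespace: str, mode: str,
                   patterns: Iterable[str] = ()) -> bool:
    """Decide whether a namespace is processed for a given mode and patterns."""
    if mode not in ("include", "exclude"):
        raise ValueError(f"unsupported mode: {mode!r}")
    compiled = [re.compile(p) for p in patterns]
    if not compiled:
        # No patterns defined: include processes everything, exclude nothing,
        # as listed under Filter Behavior.
        return mode == "include"
    key = f"{index_name}_{namespace}"  # e.g. my-index_users
    # Assumption: unanchored search; use ^ and $ in a pattern to anchor it.
    matched = any(rx.search(key) for rx in compiled)
    return matched if mode == "include" else not matched


patterns = ["users", "books_.*"]
print(should_process("my-index", "users", "exclude", patterns))   # False
print(should_process("my-index", "orders", "exclude", patterns))  # True
print(should_process("index1", "a", "include", ["index1_.*"]))    # True
```

If matching turns out to be anchored in the running service, a pattern such as users would need to be written as .*_users to match my-index_users.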
Configuration Examples
Minimal Pinecone Configuration
Below is an example configuration with a single database, no schedule, and no namespace filtering:
server:
  name: enrichment-agent
log:
  level: info
jwt:
  jwksUrl: ${JWKS_URL:}
  enabled: false
databases:
  - id: pineconeDb
    type: PINECONE
    classificationService:
      serviceUrl: http://localhost:8000/classify
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_API_KEY}
Full Pinecone Configuration
Below is an example with scheduled enrichment, namespace filtering, JWT validation, and environment variable substitution.
Key elements include:
- periodicStart: "0 * 20 * * ?" uses a six-field Cron format with seconds. As written, it fires at second 0 of every minute from 20:00 through 20:59 each day; use "0 0 20 * * ?" to run enrichment once a day at 20:00.
- collections.mode and collections.patterns control namespace inclusion or exclusion.
- Environment variables such as CLASSIFICATION_SERVICE_URL, PINECONE_API_KEY, JWKS_URL, JWT_VALIDATION_ENABLED, and LOG_LEVEL.
server:
  name: enrichment-agent
log:
  level: ${LOG_LEVEL:info}
jwt:
  jwksUrl: ${JWKS_URL:}
  enabled: ${JWT_VALIDATION_ENABLED:true}
databases:
  - id: pineconeDb
    type: PINECONE
    periodicStart: "0 * 20 * * ?"
    classificationService:
      serviceUrl: ${CLASSIFICATION_SERVICE_URL}
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_API_KEY}
      collections:
        mode: exclude
        patterns:
          - users
          - books_.*
Multiple Pinecone Databases
Multiple enrichment targets can be defined with different identifiers, API keys, classification services, filters, or metadata keys.
Example:
databases:
  - id: pineconeProd
    type: PINECONE
    periodicStart: "0 0 2 * * ?"
    classificationService:
      serviceUrl: https://classifier.example.com/classify
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_PROD_API_KEY}
      collections:
        mode: include
        patterns:
          - "index1_.*"
  - id: pineconeStaging
    type: PINECONE
    classificationService:
      serviceUrl: http://localhost:8000/classify
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_STAGING_API_KEY}
Cron Scheduling (periodicStart)
The periodicStart field uses a six-field Cron format, including seconds:
second minute hour day-of-month month day-of-week
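Example:
"0 * 20 * * ?" fires at second 0 of every minute from 20:00 through 20:59 each day, because the minute field is a wildcard. To run once a day at 20:00:00, use "0 0 20 * * ?".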
Leave this value empty to disable scheduled enrichment for a specific database.
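To see how the six fields line up with a timestamp, the sketch below splits an expression and checks it against a given moment. It is deliberately simplified and is not the scheduler used by the service: the function names are hypothetical, each field may only be a literal integer, * or ?, and day-of-week values are not handled because their numbering differs between Cron implementations.

```python
from datetime import datetime

FIELD_NAMES = ("second", "minute", "hour", "day-of-month", "month", "day-of-week")


def parse_cron(expression: str) -> dict:
    """Split a six-field expression into a field-name -> token mapping."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def matches(expression: str, moment: datetime) -> bool:
    """Simplified matcher: each field is a literal integer, '*' or '?'."""
    fields = parse_cron(expression)
    if fields["day-of-week"] not in ("*", "?"):
        raise NotImplementedError("day-of-week values are not handled in this sketch")
    actual = {
        "second": moment.second,
        "minute": moment.minute,
        "hour": moment.hour,
        "day-of-month": moment.day,
        "month": moment.month,
    }
    return all(
        fields[name] in ("*", "?") or int(fields[name]) == value
        for name, value in actual.items()
    )


quarter_past = datetime(2024, 1, 1, 20, 15, 0)
print(matches("0 * 20 * * ?", quarter_past))  # True: wildcard minute
print(matches("0 0 20 * * ?", quarter_past))  # False: minute must be 0
```

The difference between the two expressions is only the minute field: a wildcard there triggers a run every minute of the selected hour, while a literal 0 limits it to the top of the hour.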
Common Environment Variables
The following environment variables are commonly used.
| Variable | Description |
|---|---|
| LOG_LEVEL | Log level, for example info or debug. |
| JWKS_URL | JWKS URL for JWT validation. Required when JWT is enabled. |
| JWT_VALIDATION_ENABLED | Enables or disables JWT validation. |
| CLASSIFICATION_SERVICE_URL | Base URL of the classification HTTP service. |
| PINECONE_API_KEY | Pinecone API key. |
Environment variables can be defined in the runtime environment, for example through a Helm env entry or a secret, and are referenced in the configuration file using the ${VAR} or ${VAR:default} syntax.