The service enriches entries in a vector store, such as Pinecone, with classification metadata by invoking an external classification service and persisting the results back to the vector store. It supports both scheduled enrichment and on-demand enrichment through an API.
This service operates as part of the PlainID Edge enrichment flow and prepares vector metadata for downstream authorization and governance use cases.
Configuration
Configuration is loaded from a YAML file and merged with default values. All configuration values support Environment Variable substitution.
Configuration File and Loading
- Default path: config/config.yaml, or a path provided to the application at startup.
- Environment Variables substitution:
  - Use ${VAR} for required variables.
  - Use ${VAR:default} for optional variables with a default value, for example ${LOG_LEVEL:info}.
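The two placeholder forms can be illustrated with a minimal Python sketch. This is not the service's implementation: the function name `substitute_env`, the variable-name pattern, and the choice to raise an error for a missing required variable are all assumptions made for illustration.

```python
import os
import re

# Matches ${VAR} and ${VAR:default}; the allowed name characters are an assumption.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def substitute_env(text, env=None):
    """Resolve ${VAR} and ${VAR:default} placeholders in a string."""
    env = os.environ if env is None else env

    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            # ${VAR:default}, including the empty default in ${VAR:}
            return default
        # ${VAR} has no default, so it is required; raising is an assumption.
        raise KeyError(f"required environment variable {name} is not set")

    return _PLACEHOLDER.sub(resolve, text)
```

For example, `substitute_env("level: ${LOG_LEVEL:info}", {})` falls back to the default and yields `level: info`, while the same call with `{"LOG_LEVEL": "debug"}` yields `level: debug`.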
The following top-level keys are used by the application: server, log, http, management, jwt, and databases.
The framework consumes server and log. All other sections are consumed by the enrichment agent application.
Parameters
| Section | Description |
|---|---|
| server.name | Optional. Application name. |
| log.level | Optional. Log level, for example info or debug. |
| http | HTTP server and API configuration. |
| management | Health and metrics server configuration. |
| jwt | JWT validation configuration for the enrichment API. |
| databases | List of vector databases and enrichment targets. |
http Parameters
- port (integer, default 8080). API server port.
Other fields such as useMux, openApiSpecPath, enableXSSValidator, xssWhitelistType, enableExternalMonitor, and externalMonitorPath follow the micro-infra HttpConfig. The application may apply default values in code.
Management Parameters
- port (integer, default 8081). Port for health and metrics endpoints.
- prefix (string, default /health). Path prefix for readiness, liveness, and metrics endpoints, for example /health/readiness or /health/metrics.
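The way file values combine with the documented http and management defaults can be pictured with a short Python sketch. The recursive merge strategy and the names `DEFAULTS` and `merge_with_defaults` are assumptions; the actual merge logic of the application is not described here.

```python
import copy

# Defaults taken from the parameter descriptions; other keys are omitted.
DEFAULTS = {
    "http": {"port": 8080},
    "management": {"port": 8081, "prefix": "/health"},
}


def merge_with_defaults(loaded, defaults=DEFAULTS):
    """Return defaults overlaid with loaded values, recursing into mappings."""
    merged = copy.deepcopy(defaults)
    for key, value in loaded.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key (assumed behavior).
            merged[key] = merge_with_defaults(value, merged[key])
        else:
            # Scalars, lists, and new keys from the file replace the default.
            merged[key] = copy.deepcopy(value)
    return merged
```

Under these assumptions, a file that sets only `management.port` keeps the default `management.prefix` of `/health` and the default `http.port` of 8080.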
JWT Parameters
- jwksUrl (string). URL of the JWKS endpoint. Required when enabled is set to true.
- enabled (boolean). Enables or disables JWT validation for the enrichment API.
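The conditional requirement on jwksUrl can be expressed as a small pre-deployment check. The following Python sketch is illustrative only: `validate_jwt_config` is a hypothetical helper, and treating a missing `enabled` key as false is an assumption.

```python
def validate_jwt_config(jwt):
    """Return a list of problems found in a parsed jwt section."""
    problems = []
    # Treating an omitted "enabled" key as false is an assumption.
    enabled = jwt.get("enabled", False)
    jwks_url = (jwt.get("jwksUrl") or "").strip()
    if enabled and not jwks_url:
        problems.append("jwt.jwksUrl is required when jwt.enabled is true")
    return problems
```

A configuration such as `jwksUrl: ${JWKS_URL:}` resolves to an empty string when JWKS_URL is unset, so a check like this would flag it whenever validation is enabled.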
Databases Parameters
Each entry in the databases array defines a single enrichment target.
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique database identifier used by the scheduler and API. |
| type | string | Yes | Vendor type, for example PINECONE. |
| periodicStart | string | No | Cron expression for scheduled enrichment. An empty value disables scheduling. |
| classificationService | object | Yes | Classification service used to assign categories to vectors. |
| metadataKey | string | No | Metadata key where the category is written in the vector store. Default is category. |
| vendor | object | Yes | Vendor-specific connection and filtering configuration. |
Classification Service Parameters
- serviceUrl (string, required). Base URL of the HTTP classification service.
Pinecone Vendor Configuration
- pinecone (object, required):
  - apiKey (string, required). Pinecone API key.
- collections (object, optional). Controls which namespaces are processed. Matching is applied to the string indexName_namespaceName, for example my-index_users.
  - mode (string). One of include or exclude.
  - patterns (array of strings). Regular expression patterns.
With include, only namespaces matching at least one pattern are processed. With exclude, all namespaces except those matching are processed.
Filter Behavior
If no patterns are defined:
- include processes all namespaces.
- exclude processes no namespaces.
Patterns are compiled and evaluated as regular expressions.
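The filtering rules above can be sketched in Python as follows. This is a minimal illustration, not the service's code: the function name `should_process` is hypothetical, and the use of an unanchored search (`re.search`) is an assumption, since the text does not say whether patterns must match the whole indexName_namespaceName string.

```python
import re


def should_process(index_name, namespace, mode, patterns):
    """Decide whether a namespace is processed under a collections filter."""
    if not patterns:
        # Documented behavior when no patterns are defined:
        # include processes all namespaces, exclude processes none.
        return mode == "include"
    target = f"{index_name}_{namespace}"
    # Unanchored matching is an assumption; anchoring is not documented.
    matched = any(re.search(pattern, target) for pattern in patterns)
    return matched if mode == "include" else not matched
```

With mode exclude and patterns `users` and `books_.*`, a call such as `should_process("my-index", "users", "exclude", ["users", "books_.*"])` returns `False`, while a namespace that matches neither pattern is processed.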
Configuration Examples
Minimal Pinecone Configuration
Below is an example configuration with a single database, no schedule, and no namespace filtering:
server:
  name: enrichment-agent
log:
  level: info
jwt:
  jwksUrl: ${JWKS_URL:}
  enabled: false
databases:
  - id: pineconeDb
    type: PINECONE
    classificationService:
      serviceUrl: http://localhost:8000/classify
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_API_KEY}
Full Pinecone Configuration
Below is an example with scheduled enrichment, namespace filtering, JWT enabled, and Environment Variables substitution.
Key elements include:
- periodicStart: "0 * 20 * * ?" uses a six-field cron format with seconds. As written, it fires at second 0 of every minute from 20:00 through 20:59 each day; to run once a day at 20:00, use "0 0 20 * * ?".
- collections.mode and collections.patterns control namespace inclusion or exclusion.
- Environment Variables such as CLASSIFICATION_SERVICE_URL, PINECONE_API_KEY, JWKS_URL, JWT_VALIDATION_ENABLED, and LOG_LEVEL.
server:
  name: enrichment-agent
log:
  level: ${LOG_LEVEL:info}
jwt:
  jwksUrl: ${JWKS_URL:}
  enabled: ${JWT_VALIDATION_ENABLED:true}
databases:
  - id: pineconeDb
    type: PINECONE
    periodicStart: "0 * 20 * * ?"
    classificationService:
      serviceUrl: ${CLASSIFICATION_SERVICE_URL}
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_API_KEY}
      collections:
        mode: exclude
        patterns:
          - users
          - books_.*
Multiple Pinecone Databases
Multiple enrichment targets can be defined with different identifiers, API keys, classification services, filters, or metadata keys.
Example:
databases:
  - id: pineconeProd
    type: PINECONE
    periodicStart: "0 0 2 * * ?"
    classificationService:
      serviceUrl: https://classifier.example.com/classify
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_PROD_API_KEY}
      collections:
        mode: include
        patterns:
          - "index1_.*"
  - id: pineconeStaging
    type: PINECONE
    classificationService:
      serviceUrl: http://localhost:8000/classify
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_STAGING_API_KEY}
Cron Scheduling (periodicStart)
The periodicStart field uses a six-field cron format, including seconds:
second minute hour day-of-month month day-of-week
Example:
"0 0 20 * * ?" runs every day at 20:00:00. By contrast, "0 * 20 * * ?" runs at second 0 of every minute from 20:00 through 20:59.
Leave this value empty to disable scheduled enrichment for a specific database.
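Because the six-field layout differs from the common five-field cron format, it can help to label each field before deploying a schedule. The Python sketch below only splits and names the fields; it does not validate field values or compute run times, and `describe_cron` is a hypothetical helper, not part of the service.

```python
FIELD_NAMES = ("second", "minute", "hour", "day-of-month", "month", "day-of-week")


def describe_cron(expression):
    """Map each of the six cron fields to its name; reject other lengths."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))
```

For example, `describe_cron("0 0 2 * * ?")` maps second to `0`, minute to `0`, and hour to `2`, while a five-field expression such as `"0 2 * * *"` raises a `ValueError`.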
Common Environment Variables
The following Environment Variables are commonly used.
| Variable | Description |
|---|---|
| LOG_LEVEL | Log level, for example info or debug. |
| JWKS_URL | JWKS URL for JWT validation. Required when JWT is enabled. |
| JWT_VALIDATION_ENABLED | Enables or disables JWT validation. |
| CLASSIFICATION_SERVICE_URL | Base URL of the classification HTTP service. |
| PINECONE_API_KEY | Pinecone API key. |
Environment Variables can be defined in the runtime environment, for example via Helm env or secret, or referenced directly in the configuration file using the ${VAR:default} syntax.