Enrichment Agent

Early Access Capability

The Enrichment Agent enriches entries in a vector store, such as Pinecone, with classification metadata. It invokes an external classification service and persists the results back to the vector store. It supports both scheduled enrichment and on-demand enrichment through an API.

This service operates as part of the PlainID Edge enrichment flow and prepares vector metadata for downstream authorization and governance use cases.


Configuration

Configuration is loaded from a YAML file and merged with default values. All configuration values support Environment Variable substitution.

Configuration File and Loading

  • Default path: config/config.yaml, or a path provided to the application at startup.

  • Environment Variable substitution:

    • Use ${VAR} for required variables.
    • Use ${VAR:default} for optional variables with a default value, for example ${LOG_LEVEL:info}.
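
The two substitution forms can be illustrated with a short fragment built only from keys documented on this page. It is a sketch, not a complete configuration; how the application reacts when a required variable is unset is not described here.

```yaml
log:
  level: ${LOG_LEVEL:info}                       # optional: falls back to "info" when LOG_LEVEL is unset
databases:
  - id: pineconeDb
    type: PINECONE
    classificationService:
      serviceUrl: ${CLASSIFICATION_SERVICE_URL}  # required form: no default value
    vendor:
      pinecone:
        apiKey: ${PINECONE_API_KEY}              # required form: no default value
```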

The following top-level keys are used by the application: server, log, http, management, jwt, and databases.

The framework consumes server and log. All other sections are consumed by the enrichment agent application.


Parameters

| Section | Description |
|---|---|
| server.name | Optional. Application name. |
| log.level | Optional. Log level, for example info or debug. |
| http | HTTP server and API configuration. |
| management | Health and metrics server configuration. |
| jwt | JWT validation configuration for the enrichment API. |
| databases | List of vector databases and enrichment targets. |

http Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| port | integer | No | 8080 | API server port. |
| useMux | boolean | No | Application default | Enables use of the micro-infra HTTP multiplexer. |
| openApiSpecPath | string | No | Application default | Path to the OpenAPI specification used by the service. |
| enableXSSValidator | boolean | No | Application default | Enables XSS validation middleware. |
| xssWhitelistType | string | No | Application default | Defines the whitelist mode used by the XSS validator. |
| enableExternalMonitor | boolean | No | Application default | Enables the external monitoring endpoint. |
| externalMonitorPath | string | No | Application default | Path for the external monitoring endpoint. |

All parameters, except for port, follow the micro-infra HttpConfig. Default values may also be applied by the application at runtime.


Management Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| port | integer | No | 8081 | Port used for management endpoints, such as health and metrics. |
| prefix | string | No | /health | Path prefix for readiness, liveness, and metrics endpoints (for example /health/readiness or /health/metrics). |
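
Taken together, the http and management sections might look like the following sketch. The two port values and the prefix are the documented defaults. The XSS and monitor settings use illustrative values, and /monitor is a hypothetical path, since the application defaults are not listed on this page.

```yaml
http:
  port: 8080                     # documented default API port
  enableXSSValidator: true       # illustrative value; application default not stated
  enableExternalMonitor: true    # illustrative value
  externalMonitorPath: /monitor  # hypothetical path, chosen only for this example
management:
  port: 8081                     # documented default management port
  prefix: /health                # readiness at /health/readiness, metrics at /health/metrics
```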

JWT Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | No | false | Enables or disables JWT validation for the enrichment API. |
| jwksUrl | string | Required when enabled=true | - | URL of the JWKS endpoint used to validate JWT tokens. |
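
As a minimal sketch, enabling JWT validation means setting both fields. The JWKS URL below is a hypothetical placeholder; substitute the endpoint published by your identity provider.

```yaml
jwt:
  enabled: true
  jwksUrl: https://idp.example.com/.well-known/jwks.json  # hypothetical JWKS endpoint
```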

Databases Parameters

Each entry in the databases array defines a single enrichment target.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Unique database identifier used by the scheduler and API. |
| type | string | Yes | - | Vendor type, for example PINECONE. |
| periodicStart | string | No | - | Cron expression for scheduled enrichment. An empty value disables scheduling. |
| classificationService | object | Yes | - | Classification service used to assign categories to vectors. |
| metadataKey | string | No | category | Metadata key where the category is written in the vector store. |
| vendor | object | Yes | - | Vendor-specific connection and filtering configuration. |

Classification Service Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| serviceUrl | string | Yes | - | Base URL of the HTTP classification service. |

Pinecone Vendor Configuration

pinecone Object

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| apiKey | string | Yes | - | Pinecone API key used to access the vector database. |

collections Object

Controls which namespaces are processed. Matching is applied to the string indexName_namespaceName (for example my-index_users).

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| mode | string | No | - | Namespace selection mode. Supported values: include or exclude. |
| patterns | array of strings | No | - | Regular expression patterns used to match namespaces. |

With include, only namespaces matching at least one pattern are processed.
With exclude, all namespaces are processed except those matching at least one pattern.

Filter Behavior

If no patterns are defined:

  • include processes all namespaces.
  • exclude processes no namespaces.

Patterns are compiled and evaluated as regular expressions.
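
As a sketch of an include filter, the fragment below anchors each pattern with ^ and $ so that it covers the whole indexName_namespaceName string. This page does not state whether the agent uses partial or full matching, and anchoring gives the same result either way. The index name my-index and the namespace names are hypothetical.

```yaml
vendor:
  pinecone:
    apiKey: ${PINECONE_API_KEY}
  collections:
    mode: include
    patterns:
      - "^my-index_users$"       # only the namespace "users" in index "my-index"
      - "^my-index_orders-.*$"   # any namespace starting with "orders-" in index "my-index"
```

With mode set to exclude and the same patterns, the listed namespaces would be skipped and all others processed.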


Configuration Examples

Minimal Pinecone Configuration

Below is an example configuration with a single database, no schedule, and no namespace filtering:

server:
  name: enrichment-agent
log:
  level: info
jwt:
  jwksUrl: ${JWKS_URL:}
  enabled: false
databases:
  - id: pineconeDb
    type: PINECONE
    classificationService:
      serviceUrl: http://localhost:8000/classify
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_API_KEY}

Full Pinecone Configuration

Below is an example with scheduled enrichment, namespace filtering, JWT validation enabled, and Environment Variable substitution.

Key elements include:

  • periodicStart: "0 * 20 * * ?" - Uses a six-field Cron format with seconds. As written, this expression fires at second 0 of every minute between 20:00 and 20:59 each day; to run once a day at 20:00:00, use "0 0 20 * * ?".
  • collections.mode and collections.patterns control namespace inclusion or exclusion.
  • Environment Variables such as CLASSIFICATION_SERVICE_URL, PINECONE_API_KEY, JWKS_URL, JWT_VALIDATION_ENABLED, and LOG_LEVEL supply deployment-specific values.

server:
  name: enrichment-agent
log:
  level: ${LOG_LEVEL:info}
jwt:
  jwksUrl: ${JWKS_URL:}
  enabled: ${JWT_VALIDATION_ENABLED:true}
databases:
  - id: pineconeDb
    type: PINECONE
    periodicStart: "0 * 20 * * ?"
    classificationService:
      serviceUrl: ${CLASSIFICATION_SERVICE_URL}
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_API_KEY}
      collections:
        mode: exclude
        patterns:
          - users
          - books_.*

Multiple Pinecone Databases

Multiple enrichment targets can be defined with different identifiers, API keys, classification services, filters, or metadata keys.

Example:

databases:
  - id: pineconeProd
    type: PINECONE
    periodicStart: "0 0 2 * * ?"
    classificationService:
      serviceUrl: https://classifier.example.com/classify
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_PROD_API_KEY}
      collections:
        mode: include
        patterns:
          - "index1_.*"
  - id: pineconeStaging
    type: PINECONE
    classificationService:
      serviceUrl: http://localhost:8000/classify
    metadataKey: category
    vendor:
      pinecone:
        apiKey: ${PINECONE_STAGING_API_KEY}

Cron Scheduling (periodicStart)

The periodicStart field uses a six-field Cron format, including seconds:

second minute hour day-of-month month day-of-week

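Example:

"0 0 20 * * ?" runs every day at 20:00:00. By contrast, "0 * 20 * * ?" leaves the minute field unrestricted and fires every minute from 20:00 through 20:59.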

Leave this value empty to disable scheduled enrichment for a specific database.
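
A few more schedules, sketched using only the field values that already appear on this page (numbers, * and ?). The database identifiers are hypothetical, and the other required fields of each entry (type, classificationService, vendor) are omitted for brevity.

```yaml
databases:
  - id: nightlyDb                  # hypothetical identifier
    periodicStart: "0 0 2 * * ?"   # once a day at 02:00:00
  - id: morningDb                  # hypothetical identifier
    periodicStart: "0 30 6 * * ?"  # once a day at 06:30:00
  - id: hourlyDb                   # hypothetical identifier
    periodicStart: "0 0 * * * ?"   # at the start of every hour
  - id: onDemandDb                 # hypothetical identifier
    periodicStart: ""              # empty value: scheduled enrichment disabled
```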


Common Environment Variables

The following Environment Variables are commonly used.

| Variable | Description |
|---|---|
| LOG_LEVEL | Log level, for example info or debug. |
| JWKS_URL | JWKS URL for JWT validation. Required when JWT is enabled. |
| JWT_VALIDATION_ENABLED | Enables or disables JWT validation. |
| CLASSIFICATION_SERVICE_URL | Base URL of the classification HTTP service. |
| PINECONE_API_KEY | Pinecone API key. |

Define Environment Variables in the Runtime environment, for example via Helm env or secret, and reference them in the configuration file using the ${VAR} or ${VAR:default} syntax.


© 2025 PlainID LTD. All rights reserved.