Discovery Agent


The service is responsible for discovery processes, including connecting to vector databases, identifying relevant metadata filters for authorization, and discovering MCP services, tools, and other connectors. It also ensures secure communication with the PlainID Cloud platform to deliver discovered Assets.

Connectors

Pinecone

The Pinecone connector discovers all indexes and namespaces of a Pinecone instance, together with the metadata keys of the documents they contain, subject to configurable filters. Vectors are processed in batches to minimize the memory footprint, which makes the connector suitable for large datasets with millions of vectors.

The connector provides:

  • Automatic discovery of all indexes and namespaces
  • Metadata key extraction and type inference (string/number/bool/array_string)
  • Regex-based filtering of namespaces and metadata keys
  • Batch processing: vectors are read in batches of 1000 to keep memory usage low
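To make the type inference and batching concrete, here is a minimal Python sketch. It is not the connector's code: the value-to-type mapping, the hypothetical function names (infer_type, batches, discover_metadata_keys), and the choice to record every type observed for a key are assumptions for illustration. Only the four type names and the batch size of 1000 come from the list above.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")
BATCH_SIZE = 1000  # batch size stated in the connector description


def infer_type(value: object) -> str | None:
    """Map one metadata value to a documented type name (assumed mapping)."""
    if isinstance(value, bool):  # test bool first: in Python, bool is a subclass of int
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return "array_string"
    return None  # other shapes are ignored in this sketch


def batches(items: Iterable[T], size: int = BATCH_SIZE) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items, pulling lazily from `items`."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def discover_metadata_keys(metadata_stream: Iterable[dict[str, object]]) -> dict[str, set[str]]:
    """Collect every metadata key seen in the stream with the set of types observed for it."""
    keys: dict[str, set[str]] = {}
    for batch in batches(metadata_stream):
        for metadata in batch:
            for key, value in metadata.items():
                inferred = infer_type(value)
                if inferred is not None:
                    keys.setdefault(key, set()).add(inferred)
        # Only the current batch is held as a list; earlier batches can be garbage-collected.
    return keys
```

Keeping a set of types per key, rather than a single type, is one way to surface keys whose values are inconsistent across documents.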

Configuration

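Values can use environment variable substitution of the form ${VAR:default}. For example, ${LOG_LEVEL:info} takes the value of the LOG_LEVEL environment variable and falls back to info. Some settings (e.g. http, management) are defined by the micro-infra framework and may have additional defaults.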
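The substitution rule can be sketched in Python as follows. This sketch assumes the placeholder grammar is ${NAME} or ${NAME:default}, that a set variable always wins over the inline default, and that a placeholder with no default and no matching variable is left unchanged; the actual micro-infra behaviour may differ. The function name substitute_env is hypothetical.

```python
from __future__ import annotations

import os
import re
from collections.abc import Mapping

# ${NAME} or ${NAME:default}; grammar assumed from the ${LOG_LEVEL:info} example
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def substitute_env(value: str, env: Mapping[str, str] | None = None) -> str:
    """Replace placeholders with environment values, falling back to the inline default."""
    source = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in source:
            return source[name]  # a set variable wins over the default
        if default is not None:
            return default
        return match.group(0)  # assumption: unresolved placeholders are left as-is

    return _PLACEHOLDER.sub(replace, value)
```

Passing an explicit mapping instead of reading os.environ keeps the function easy to test.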

High-level Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| log.level | string | no | info | Log level (e.g. debug, info, warn, error). Can be set via the LOG_LEVEL environment variable. |
| plainIdUrl | string | yes* | | Base URL of the PlainID API (used to obtain a JWT). Required for sources that send data to the orchestrator. |
| plainIdDiscoveryUrl | string | yes* | | Base URL of the orchestrator (pinecone-orch-connector or mcp-orch-connector). Required when using Pinecone or PlainIdMcpGateway. |
| discoverySources | array | yes | | List of discovery source entries. |
| http | object | no | (micro-infra) | HTTP server settings. Code defaults: port = 8080, jwt.jwtIgnoreVerification = true. Other fields come from plainttp. |
| management | object | no | (micro-infra) | Monitoring and health settings. Schema from monitor.MonitorManagerConfig. |

Discovery Source Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| popId | string | yes | Unique identifier for the source (Point of Presence). |
| environmentId | string (UUID) | yes | PlainID environment UUID; required for sending to the orchestrator. |
| type | string | yes | Source type: Pinecone or PlainIdMcpGateway. |
| periodicStart | string (cron) | no | Cron expression (6 fields, with seconds). If empty, no periodic discovery is scheduled. |
| vendor | object | yes | Type-specific configuration (see below). |
| plainIdCredentials | object | yes | clientId and clientSecret, used to obtain a JWT. |
| metadataKeys | object | no | Metadata key filter: mode (include/exclude), patterns (regex). Used by Pinecone. |
| availabilityThreshold | float | no | Availability threshold (0–1). Default: 0.1. Pinecone only. |
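The constraints in this table can be checked before deployment. The following Python sketch is a hypothetical pre-flight check, not the agent's own validator. It treats periodicStart as six whitespace-separated fields in the order second, minute, hour, day of month, month, day of week (so "0 0 1 * * ?" runs daily at 01:00:00), checks that environmentId parses as a UUID, and applies the 0–1 range to availabilityThreshold with the documented default of 0.1. The field names come from the table; parse_periodic_start and validate_source are illustrative names.

```python
from __future__ import annotations

import uuid

SOURCE_TYPES = {"Pinecone", "PlainIdMcpGateway"}
REQUIRED = ("popId", "environmentId", "type", "vendor", "plainIdCredentials")
# Six-field cron with seconds first, as in Quartz-style expressions
CRON_FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def parse_periodic_start(expr: str) -> dict[str, str] | None:
    """Split periodicStart into named fields; an empty value means no periodic discovery."""
    if not expr.strip():
        return None
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"periodicStart needs 6 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


def validate_source(src: dict) -> list[str]:
    """Return the problems found in one discoverySources entry (empty list if none)."""
    errors = [f"missing required field {key}" for key in REQUIRED if not src.get(key)]
    if src.get("type") and src["type"] not in SOURCE_TYPES:
        errors.append(f"unknown type {src['type']!r}")
    if src.get("environmentId"):
        try:
            uuid.UUID(str(src["environmentId"]))
        except ValueError:
            errors.append("environmentId is not a UUID")
    try:
        parse_periodic_start(src.get("periodicStart") or "")
    except ValueError as exc:
        errors.append(str(exc))
    threshold = src.get("availabilityThreshold", 0.1)  # documented default
    if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
        errors.append("availabilityThreshold must be between 0 and 1")
    return errors
```

Collecting all problems into a list, rather than stopping at the first one, lets a single run report every mistake in an entry.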

Pinecone Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| vendor.pinecone.apiKey | string | yes | | Pinecone API key. |
| vendor.pinecone.sampleLimit | uint | no | 0 (no limit) | Maximum number of vectors to analyze per namespace. 0 = no limit. |
| vendor.collections | object | no | | Namespace filter: mode (include/exclude), patterns (regex). |
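The interaction between sampleLimit and the 1000-vector batch size can be sketched as a batch plan for one namespace. This Python sketch assumes that sampleLimit simply caps how many vectors are read per namespace and that the last batch may be partial; plan_batches is a hypothetical helper.

```python
BATCH_SIZE = 1000  # batch size stated for the Pinecone connector


def plan_batches(namespace_size: int, sample_limit: int = 0, batch_size: int = BATCH_SIZE) -> list[int]:
    """Return the sizes of the batches used to scan one namespace (sample_limit 0 = no limit)."""
    to_scan = namespace_size if sample_limit == 0 else min(namespace_size, sample_limit)
    full, rest = divmod(to_scan, batch_size)
    return [batch_size] * full + ([rest] if rest else [])
```

Under these assumptions, with sampleLimit: 50000 a namespace of 2,345,678 vectors is scanned in 50 full batches, while a namespace of 300 vectors is read in a single batch of 300.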

FilterConfig (collections / metadataKeys) Parameters

| Parameter | Type | Description |
|---|---|---|
| mode | string | include or exclude. |
| patterns | []string | List of regex patterns. |
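A filter keeps or drops a name (a namespace for collections, a metadata key for metadataKeys) depending on whether any pattern matches it. The Python sketch below assumes each pattern must match the whole name (full match); if the agent uses unanchored search instead, a pattern such as users would also match superusers. make_filter is a hypothetical name.

```python
from __future__ import annotations

import re
from collections.abc import Callable, Iterable


def make_filter(mode: str, patterns: Iterable[str]) -> Callable[[str], bool]:
    """Build a predicate that says whether a namespace or metadata key is kept."""
    if mode not in ("include", "exclude"):
        raise ValueError(f"mode must be 'include' or 'exclude', got {mode!r}")
    compiled = [re.compile(p) for p in patterns]

    def keep(name: str) -> bool:
        # Assumption: a pattern must match the entire name, not just a substring.
        matched = any(rx.fullmatch(name) for rx in compiled)
        # include keeps only matches; exclude keeps everything that does not match
        return matched if mode == "include" else not matched

    return keep
```

With the exclude filter from the Pinecone example below (users, books_.*), this sketch skips the namespaces users and books_2023 and keeps orders and superusers.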

PlainID MCP Gateway Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| vendor.plainIdMcpGateway.url | string | yes | URL of the pid-mcp service (e.g. http://pid-mcp:5235). |

plainIdCredentials Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| clientId | string | yes (for orchestrator sources) | PlainID API client ID. |
| clientSecret | string | yes (for orchestrator sources) | PlainID API client secret. |

Configuration examples

Example: Pinecone only

```yaml
log:
  level: info

plainIdUrl: "https://api.dev8.plainid.cloud"
plainIdDiscoveryUrl: "https://api.app.dev8.plainid.cloud"

discoverySources:
  - popId: POP456
    environmentId: "550e8400-e29b-41d4-a716-446655440000"
    type: Pinecone
    periodicStart: "0 0 1 * * ?"
    vendor:
      pinecone:
        apiKey: "your-pinecone-api-key"
        sampleLimit: 50000   # optional; 0 = no limit
      collections:
        mode: exclude
        patterns:
          - "users"
          - "books_.*"
    plainIdCredentials:
      clientId: "your-client-id"
      clientSecret: "your-client-secret"
    metadataKeys:
      mode: exclude
      patterns:
        - "createdAt"
        - "timestamp.*"
    availabilityThreshold: 0.1
```

Example: PlainID MCP Gateway only

```yaml
log:
  level: info

plainIdUrl: "https://api.dev8.plainid.cloud"
plainIdDiscoveryUrl: "https://api.app.dev8.plainid.cloud"

discoverySources:
  - popId: POP789
    environmentId: "550e8400-e29b-41d4-a716-446655440000"
    type: PlainIdMcpGateway
    periodicStart: "0 0 1 * * ?"
    vendor:
      plainIdMcpGateway:
        url: "http://pid-mcp:5235"
    plainIdCredentials:
      clientId: "your-client-id"
      clientSecret: "your-client-secret"
```

Example: Pinecone and PlainID MCP Gateway in one configuration

A single process can run both source types. Because plainIdDiscoveryUrl is a single top-level setting, all sources in one configuration share the same orchestrator base URL. If you need different base URLs per source, run separate instances with different configurations.

```yaml
log:
  level: info

plainIdUrl: "https://api.dev8.plainid.cloud"
plainIdDiscoveryUrl: "https://api.app.dev8.plainid.cloud"

discoverySources:
  - popId: Pinecone-1
    environmentId: "550e8400-e29b-41d4-a716-446655440000"
    type: Pinecone
    periodicStart: "0 0 1 * * ?"
    vendor:
      pinecone:
        apiKey: "your-pinecone-api-key"
      collections:
        mode: include
        patterns: ["prod_.*"]
    plainIdCredentials:
      clientId: "your-client-id"
      clientSecret: "your-client-secret"
    metadataKeys:
      mode: exclude
      patterns:
        - "createdAt"
    availabilityThreshold: 0.1

  - popId: MCP-1
    environmentId: "550e8400-e29b-41d4-a716-446655440000"
    type: PlainIdMcpGateway
    periodicStart: "0 0 2 * * ?"
    vendor:
      plainIdMcpGateway:
        url: "http://pid-mcp:5235"
    plainIdCredentials:
      clientId: "your-client-id"
      clientSecret: "your-client-secret"
```