The RAG Enricher and Classifier is a core component of PlainID Edge, designed to enable fine-grained authorization for Retrieval-Augmented Generation (RAG) pipelines without exposing or transferring customer data.
It transforms existing vector database metadata and embeddings into Policy-ready authorization Building Blocks that can be consumed by PlainID Cloud to control which data an agent may retrieve at Runtime.
The RAG Enricher operates entirely within the customer environment and works exclusively on metadata, structure, and vector embeddings. It does not access, reconstruct, or transmit raw documents or original content.
Design Principles and Security Model
The RAG Enricher is built on strict data-handling guarantees:
- Raw customer data is never accessed or transmitted.
- Documents and original content are not sent to PlainID Cloud.
- Only metadata and derived authorization Attributes are published.
- Vector embeddings are processed locally and are never reconstructed into readable content.
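The "metadata and derived Attributes only" guarantee can be pictured as an allowlist applied before anything leaves the customer environment. The snippet below is a minimal sketch of that idea, not PlainID's implementation: the record layout, the key names in `ALLOWED_KEYS`, and the function `to_publishable` are all hypothetical.

```python
# Hypothetical allowlist: only these top-level keys may be published.
# Anything else (raw text, embeddings, and so on) is dropped by default.
ALLOWED_KEYS = {"id", "metadata", "attributes"}


def to_publishable(record: dict) -> dict:
    """Return a copy of the record containing only allowlisted keys."""
    return {key: value for key, value in record.items() if key in ALLOWED_KEYS}


record = {
    "id": "doc-1",
    "text": "raw document body",          # must never be published
    "embedding": [0.12, 0.98],            # processed locally only
    "metadata": {"department": "finance"},
    "attributes": {"sensitivity": "high"},
}
payload = to_publishable(record)
```

An allowlist is used here rather than a blocklist so that a field added to the record later stays local unless it is explicitly approved.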
Core Functionalities
Metadata-First Authorization Modeling
The primary and preferred flow relies on existing metadata and filters already present in the customer vector database.
PlainID Edge connects to supported vector databases and:
- Retrieves available metadata fields and filters.
- Normalizes them into a consistent authorization schema.
- Publishes them to PlainID Cloud as ready-to-use authorization Building Blocks.
This approach enables immediate Policy authoring without requiring additional enrichment.
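As an illustration of the normalization step, the sketch below maps vendor-specific metadata field names and types onto one consistent shape. It assumes the vector database schema is available as a simple field-to-type mapping. `BuildingBlock`, `TYPE_ALIASES`, and `to_building_blocks` are hypothetical names chosen for this example, not part of a PlainID API.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from vendor-specific type names to a common vocabulary.
TYPE_ALIASES = {
    "str": "string", "text": "string", "keyword": "string",
    "int": "number", "float": "number", "double": "number",
    "bool": "boolean",
}


@dataclass(frozen=True)
class BuildingBlock:
    name: str          # normalized attribute name
    data_type: str     # normalized type
    source_field: str  # original field name in the vector database


def normalize_name(field: str) -> str:
    """Lower-case the name and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", field).strip("_").lower()


def to_building_blocks(schema: dict[str, str]) -> list[BuildingBlock]:
    """Convert a {field: type} schema into normalized building blocks."""
    blocks = []
    for field, raw_type in sorted(schema.items()):
        # Assumption: unknown types fall back to "string".
        data_type = TYPE_ALIASES.get(raw_type.lower(), "string")
        blocks.append(BuildingBlock(normalize_name(field), data_type, field))
    return blocks


blocks = to_building_blocks({"Department": "keyword", "clearance-level": "int"})
```

Keeping `source_field` on each block preserves the link back to the original filter, so a Policy written against the normalized name can still be translated into a query on the underlying database.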
Embedding-Based Enrichment
When existing metadata cannot express the required authorization logic, PlainID Edge supports an optional enrichment flow based solely on vector embeddings.
In this flow:
- Vector embeddings are retrieved without accessing raw documents.
- Embeddings are analyzed locally using PlainID ML-based enrichment.
- Documents are classified, grouped, and enriched with inferred authorization Attributes based on their embeddings alone.
At no point is the original content reconstructed, inspected, or transmitted.
The inferred Attributes are converted into authorization Building Blocks and published to PlainID Cloud.
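The source does not describe the ML technique PlainID uses, so the following is only a stand-in to show how a label can be inferred from vectors without reading any content. It uses nearest-centroid classification with cosine similarity. The labelled reference centroids, the `threshold` value, and the function names are assumptions made for this sketch.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(
    embeddings: dict[str, list[float]],
    centroids: dict[str, list[float]],
    threshold: float = 0.8,
) -> dict[str, str]:
    """Assign each document ID the label of its most similar centroid.

    Documents whose best similarity is below the threshold are left
    "unclassified" rather than given a low-confidence label.
    """
    if not centroids:
        raise ValueError("at least one centroid is required")
    labels = {}
    for doc_id, vector in embeddings.items():
        best_label, best_score = max(
            ((label, cosine(vector, c)) for label, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        labels[doc_id] = best_label if best_score >= threshold else "unclassified"
    return labels


centroids = {"finance": [1.0, 0.0], "hr": [0.0, 1.0]}
embeddings = {"doc-1": [0.9, 0.1], "doc-2": [0.1, 0.95], "doc-3": [0.7, 0.7]}
labels = classify(embeddings, centroids)
```

In this toy input, `doc-3` sits halfway between the two centroids and falls below the threshold, so it stays unclassified. Only document IDs, vectors, and labels appear anywhere in the code, which mirrors the constraint that the original content is never involved.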
Two Complementary Authorization Flows
Flow 1: Metadata Discovery
Use case: Existing metadata is sufficient for access control.
- Connect to the customer vector database.
- Retrieve existing metadata fields and filters.
- Normalize and map them to PlainID authorization objects.
- Publish them as authorization Building Blocks in PlainID Cloud.
This flow is lightweight and does not require enrichment.
Flow 2: Vector-Based Enrichment
Use case: Metadata alone cannot express the authorization requirements.
- Retrieve vector embeddings only.
- Apply PlainID ML-based classification and enrichment.
- Generate inferred authorization Attributes.
- Publish enriched authorization Building Blocks to PlainID Cloud.
This flow increases authorization expressiveness without increasing data exposure.
Continuous Synchronization
The RAG Enricher maintains an ongoing connection to the vector database and:
- Detects schema or metadata changes.
- Updates authorization Building Blocks accordingly.
- Keeps PlainID Cloud aligned with the current data structure.
This ensures that authorization Policies reflect the evolving state of the RAG system.
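Change detection of this kind can be reduced to comparing two schema snapshots. The sketch below assumes each snapshot is a field-to-type mapping taken at different times. `diff_schema` is a hypothetical helper, and how PlainID Edge actually obtains snapshots or propagates updates is not specified in this document.

```python
def diff_schema(
    previous: dict[str, str], current: dict[str, str]
) -> dict[str, list[str]]:
    """Report fields that were added, removed, or changed type."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        field
        for field in previous.keys() & current.keys()
        if previous[field] != current[field]
    )
    return {"added": added, "removed": removed, "changed": changed}


previous = {"department": "string", "region": "string"}
current = {"department": "string", "region": "number", "project": "string"}
changes = diff_schema(previous, current)
```

A synchronization loop could run such a comparison on a schedule and update only the Building Blocks named in the result, leaving unchanged ones untouched.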
Deployment Model
The RAG Enricher and Classifier is deployed as part of PlainID Edge within the customer environment.
Key characteristics include:
- Runs alongside the customer vector databases.
- Transfers no raw data outbound.
- Uses secure, minimal connectivity to PlainID Cloud.
- Supports plug-and-play integration.