Input & Output Guardrails


Guardrails define the safety boundaries of an AI interaction. They ensure that only approved inputs reach an Agent and that only authorized outputs are returned to the user.

In the Policy Builder, guardrails are first-class components. They are applied directly to the interaction flow and evaluated in real time at Runtime, making them an integral part of every Policy.

The PlainID Authorization Platform supports two types of guardrails:

  • Input Guardrails control what users are allowed to ask.
  • Output Guardrails control what information is allowed to be returned.

Together, they create a complete safety envelope around your AI Application.


Input Guardrails

Input Guardrails determine which prompts are allowed to enter the system and be forwarded to an Agent.

If a submitted prompt does not comply with the defined Input Guardrails, the request is blocked immediately and is not executed.

Input Guardrails answer a key question:

What kind of questions is the user allowed to ask?

You can define guardrails at a high level (for example, HR-related questions) or with fine-grained intent control, such as compensation, employee evaluations, or recruitment topics.
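Conceptually, an Input Guardrail classifies the intent of an incoming prompt and only forwards it to the Agent if that intent is allowed. The sketch below is purely illustrative: the keyword-based classifier, the intent names, and the function names are hypothetical stand-ins, not the PlainID API (a real system would use a trained intent classifier and the platform's Runtime evaluation).

```python
# Hypothetical sketch of input-guardrail evaluation.
# All names and the keyword classifier are illustrative, not PlainID's API.
ALLOWED_INTENTS = {"compensation", "employee_evaluations", "recruitment"}

def classify_intent(prompt: str) -> str:
    """Toy keyword-based intent classifier (a real system would use an ML model)."""
    keywords = {
        "salary": "compensation",
        "bonus": "compensation",
        "review": "employee_evaluations",
        "hiring": "recruitment",
        "medical": "health_records",
    }
    for word, intent in keywords.items():
        if word in prompt.lower():
            return intent
    return "unknown"

def check_input_guardrail(prompt: str) -> bool:
    """Return True if the prompt may be forwarded to the Agent."""
    return classify_intent(prompt) in ALLOWED_INTENTS

print(check_input_guardrail("What is my bonus this year?"))  # True: compensation is allowed
print(check_input_guardrail("Show me medical records"))      # False: blocked before execution
```

The key property shown here is that a non-compliant prompt is rejected before it ever reaches the Agent, matching the blocking behavior described above.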

To define Input Guardrails

  1. In the Policies section of the Environment sidebar, click the relevant Authorization Workspace.

  2. Select a Policy or create one.

  3. In the Policy Canvas, click the plus (+) icon in the Input Guardrails component.

  4. In the side panel, select the relevant entries from the Groups or Categories tabs.

    • Groups represent broader aggregations of intent categories. Selecting a group automatically includes all associated categories within that domain. For example, selecting the HR group includes categories such as compensation, employee assessments, recruitment, and benefits.
    • Categories provide granular control and allow you to assemble a custom set of allowed intents. Use this option to define precise permissions tailored to specific roles or use cases.
  5. Confirm your selection. The selected entries appear in the Input Guardrails widget on the canvas.

  6. To remove an entry, clear the corresponding checkbox in the side panel or remove it directly from the canvas.
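The Groups versus Categories distinction in step 4 can be thought of as a simple expansion: selecting a group implicitly selects all of its member categories, while explicit categories are added individually. The sketch below assumes a hypothetical group taxonomy; the group names and members are illustrative, not the platform's built-in catalog.

```python
# Hypothetical sketch of group-to-category expansion.
# Group names and members are illustrative examples only.
GROUPS = {
    "HR": {"compensation", "employee_assessments", "recruitment", "benefits"},
    "Finance": {"budgets", "invoices"},
}

def expand_selection(groups: list[str], categories: list[str]) -> set[str]:
    """Selecting a group includes all of its member categories;
    explicit categories are added individually."""
    allowed = set(categories)
    for group in groups:
        allowed |= GROUPS.get(group, set())
    return allowed

# Selecting the HR group plus one explicit Finance category:
print(sorted(expand_selection(["HR"], ["budgets"])))
# → ['benefits', 'budgets', 'compensation', 'employee_assessments', 'recruitment']
```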


Output Guardrails

Output Guardrails prevent unauthorized or sensitive information from being exposed to users. They apply masking and filtering rules before a response is returned, even if the Agent has internal access to the data.

Output Guardrails answer a key question:

What data must never be exposed to this user?

To define Output Guardrails

  1. In the Policies section of the Environment sidebar, click the relevant Authorization Workspace.

  2. Select a Policy or create one.

  3. In the Policy Canvas, click the plus (+) icon in the Output Guardrails component.

  4. In the side panel, select the relevant entries from the Groups or Categories tabs.

    • Groups represent collections of sensitive data types. Selecting a group automatically includes all related data categories. Use groups when users should not have access to any data within a defined sensitivity domain, such as:
      • PII (Personally Identifiable Information)
      • PHI (Protected Health Information)
      • PCI (Payment Card Information)
      • MNPI (Material Non-Public Information)

    • Categories allow fine-grained control over specific data elements, such as:
      • SSN
      • IBAN
      • Credit card numbers
      • Employee IDs
  5. For each selected entry, choose the masking type. Masking is selected by default.

  6. To remove an entry, clear the corresponding checkbox in the side panel or remove it directly from the canvas.
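The masking applied by Output Guardrails can be sketched as a filter that scans the Agent's response for blocked data categories and redacts any matches before the response is returned. The regex patterns and masking style below are illustrative assumptions, not PlainID's actual detection rules.

```python
import re

# Hypothetical sketch of output masking.
# The patterns and mask string are illustrative, not PlainID's detection rules.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def mask_output(text: str, blocked_categories: set[str]) -> str:
    """Redact every occurrence of a blocked data category in the response."""
    for category, pattern in PATTERNS.items():
        if category in blocked_categories:
            text = pattern.sub("****", text)
    return text

response = "Employee SSN is 123-45-6789, card 4111 1111 1111 1111."
print(mask_output(response, {"SSN", "credit_card"}))
# → "Employee SSN is ****, card ****."
```

Note that masking runs on the outbound response regardless of what the Agent could access internally, which is what keeps sensitive values from reaching the user.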

By combining Input Guardrails and Output Guardrails, you define both what can be asked and what can be returned. This ensures your AI Applications remain useful, compliant, and secure by design.


Next, proceed to Controls, where you define access to tools, data sources, and organizational Assets.