Databricks Column-Level Security

Prev Next

Column-Level Security (CLS) allows organizations to control access to specific columns within a table or view, ensuring that sensitive data is only visible to authorized users.

In Databricks, Column-Level Security is enforced using SQL User-Defined Functions (UDFs).

Each column can be associated with a single masking function, which dynamically determines at query time whether to return the original value or a redacted version, based on the user's identity or context.

This decision is driven by conditional logic and identity-aware functions such as is_member(group), is_account_group_member(group), or other Databricks identity functions.

Dynamic Data Masking

Databricks supports the following Masking Policy:

  • Column Masks
    Column masks are rules directly attached to specific columns in a table or view, enforcing a masking function at query time to protect sensitive data based on user identity or context.
    • Best practice: Targeted column protection when the logic is simple and specific to a particular use case.

Creating a Masking Policy

Note

Ensure that your Databricks app in Orchestration is switched to Manage mode to start building your Masking Policy in the Policy catalog section. See Managing POPs for more information on how to change the POP Mode.

Masking Policies in PlainID can be created using the following methods:

  • Wizard
  • Code
  • Native

For more information, refer to Creating Policies.

To Create a Masking Policy with the Policy Wizard:

  1. Input the Policy Details. For more information, refer to the Creating Policies article

  2. Input the Databricks POP Details specific to the vendor Policy:

    • Under the Vendor Policy Kind dropdown, select Masking Policy. (Required)
    • Enter a Vendor Policy Name, a unique name (in lower-case) for the Column masks function that will appear in the Databricks vendor. (Required)
      • Note: The same name can be used across multiple PlainID Policies to define multiple logic statements within a single Masking Policy.
    • Under the Vendor Policy ID, note that the ID is automatically generated. It is the ID used in Policy management operations like deploy, update, and override.
    • Define the Vendor Policy Order, which relates to the execution order for the logical case statement within the same Vendor Policy ID. (Required).
      • The Default value is 1. Use this field when enforcing multiple logic conditions for different Identity Groups within the same Policy.
    • Enter a Comment, add any additional information or clarifications about your Policy. (Optional)
    • Under Catalog, select the Databricks catalog where the Policy will be deployed.
    • Under Schema, select the schema within the catalog where the Policy resides.
  3. Click Continue.

Who Section

In the Who section, you can assign Column-level access by creating a new Dynamic Group or selecting an existing one. A single Policy can evaluate multiple Dynamic Groups, giving you flexibility in defining access control logic.

Define a new Dynamic Group using the available vendor identity functions to grant access based on user identity parameters.

Supported Databricks identity functions include is_account_group_member(group), is_member(group), session_user(), and current_user().

You can use multiple Identity functions per Dynamic Group. If you want to apply your Policy logic to all users, you can still connect the default "All Users" Dynamic Group.

image.png

For additional information on how to create a Dynamic group, see Managing Dynamic Groups.

What Section

In the What section, you can define access through your Asset Types.
For more details on Asset Types, refer to Managing Asset Types.

To define access, you are required to define the following Asset Types:

  1. Click Add Asset Type. Ensure that your Asset Type meets the requirements listed below.
  • From the dropdown, select an Asset Type that defines your masking policy logic.
    • Only one Asset Type can be selected in a single masking policy.
  • Select existing Rulesets for your policy logic or create a new one. A Policy can include multiple Rulesets, note all combined with logical OR.
  • Click Save.
  1. To add Databricks Columns (Asset Type), Click Add Asset Type.
  • From the dropdown, select Databricks Columns Asset type. For more information, see Use Databricks Columns Asset type listed below.
  • Select the Masking Action, which defines how data will be masked. For more information, please refer to Define Masking Instructions below.
  • Select the Assets to mask.
    • A single Masking Policy can be applied to only one Asset column.
    • A Ruleset is not required for these Asset Types.
    • To apply the same masking logic across multiple columns, create multiple masking policies that share the same Vendor Policy ID and Signature Asset Type.
  • Click Save.

Rulesets and Actions are currently not supported for Databricks Columns

Asset Types for Column-Level Requirements

Ensure that your Asset Type/s meet the following requirements:

Table Mapping

To set Table Mapping:

  1. Ensure that "Is used for data filtering?" is enabled.
  2. Select Set Table Mapping to link your logical Asset Type with one or more physical Tables or Views in Databricks, where the Masking policy should apply.
    • The mapping must include the fully qualified path (in lower-case) to the table or view, following the format: catalog_name.schema_name.table_name.

image.png

For more information about Table Mapping, refer to Dynamic Data Mapping.

Attribute Mapping

To set Attribute Mapping:

  1. Choose an existing Attribute or create a new one to use in your Masking logic.
  2. Define the data type (for example, String, Numeric).
  3. Enable Available for Policies = Yes. These attributes act as placeholders for filtering conditions.
  4. Click Set Data Column Mapping to map the logical Attributes to one or more physical Databricks column/s (in lower-case).

image.png

image.png

To connect an Asset Type with the Databricks Application:

  1. Once the Asset Type is created, open the Databricks application.
  2. Navigate to the Asset Types tab.
  3. Click Edit.
  4. Select the relevant Asset Types you want to associate with the Databricks application.

image.png

Defining a Ruleset for Masking Policies

To define a Ruleset:
In the Asset type within the Ruleset tab, define the Masking logic using the Asset Attributes you created (for example, employment_type = 'Full-Time').

You can use Databricks tables as an external Identity Source for contextual or dynamic rules (for example, Department = user.department). See our article Managing Identity Source for more information.

Use PlainID's flexible logic to combine operators (=, !=, >, IN, etc.) with complex AND/OR relationships.

image.png

Databricks Column Asset Types

The Databricks Column Asset Types are automatically generated when the Policy Orchestration Point (POP) is created. It represents a logical component that holds the available Databricks columns for masking.

  • Databricks Columns Assets

All those columns appear in the Assets section, and are either generated automatically from the masking policy in the discovery process or created by the user. For more information on how to create new Assets, please refer to managing columns for masking.

In order to use the columns' assets within a masking policy, the Signature Asset Type's table Mapping needs to be predefined correctly. See Requirements above.

  • Databricks Column Actions

The Actions tab is used to define how the data should be masked. For more information, refer to Define Masking Instructions.

image.png

Defining Masking Instructions

In the Databricks Columns or Tags Asset Type, the Actions tab is used to define how the data should be masked. The masking instructions are applied to the selected Assets in your Policy.

You can either use a default masking instruction (e.g., Default for STRING, Default for INT) or create a custom masking instruction by specifying:

  • Input an Action ID: The display name of the action.
  • Input a Value: The actual masking logic (e.g., 'MASKED').
  • Choose a Type: The returned Masked Value's data type. This must match the RETURNS type declared in the masking policy (e.g., STRING, INT).

When Section

In the When section, you define the Conditions that provide contextual identity data for your Policy, which determine when the Policy applies.

To define a Condition:

  1. Select an existing Condition or create a new one.
  • A Policy can include multiple Conditions, combined with logical AND, meaning all Conditions must be met for the Policy to grant access. Ensure that you select Conditions from Identity Attributes linked to the POP's Identity Source.
  1. Click Save.

Conditions for Masking Policy Requirements

  • Only Conditions based on Identity Attributes are supported for use in Masking Policies.
  • Ensure that an Identity Source table is defined within your Policy Orchestration Point (POP).
  • Use an Identity Attribute that is mapped to one of the additional Identity Sources associated with the selected POP. For more details, refer to our Managing Identity Source article.
  • Within a single Condition, you can only use Attributes from one Source.
    image.png

Deploying the Policy

Once complete, navigate to the Orchestration Workspace and deploy the Policy to Databricks. Databricks enforces the access decision based on the Row-level filtering you've defined in the PlainID platform.
image.png

Databricks Column Masks- SQL Function Structure

This table explains the components of a Databricks Column Masks Function in SQL and compares them to their equivalents in PlainID, helping you translate and build your policies easily within the PlainID platform.

Policy Syntax Table

Databricks terminology Databricks Syntax Description PlainID terminology
Policy Declaration CREATE OR REPLACE FUNCTION <Function_name> Defines the Function name. The function can be used for Row-level filtering or for column masking. Used to declare or replace an existing Function. Vendor Policy Name
Policy Signature AS (Argument1 STRING, Argument2 STRING...) RETURNS STRING Declares the Policy's input arguments and return type. Arguments are used in the Policy masking logic.
The return type of the masking policy must match the data type of its first argument.
The Column Masks Function is set on the first argument.
Policy Signature mapped to Asset Type. Each argument that is used within the policy logic is created as an Asset Attribute. The STRING return type defines the Vendor Policy Kind as Masking Policy.
Policy Logic WHEN... Incorporates Databricks identity functions: is_account_group_member(group), is_member(group), session_user(), current_user().
This determines who should receive access in a given context.
Dynamic group
CASE... Use SQL expressions such as CASE statements to define the masking logic. Policy logic uses the Policy arguments mapped into Rulesets, which can be dynamic (based on external identity data) or static (use fixed values).
EXISTS (SELECT... FROM... WHERE...) Use SQL subqueries with EXISTS SELECT for dynamic, context-based filtering. Policy logic uses an external table mapped into conditions, where expressions typically compare attributes to static values. Note that a correlation must be defined using one of the Databricks identity functions is_account_group_member(group), is_member(group), session_user(), current_user().
Policy Output THEN 'MASK' ELSE Argument1 Defines the instructions used for masking data. Based on the policy logic, the output can be the original column value, a partially masked version, or a fully masked. The masking instructions are applied to the first argument. Action in the Databricks Columns Asset type.
Policy Application Apply to a column: ALTER TABLE 'catalog.schema.table' ALTER COLUMN 'column1' SET MASK 'function_name' USING COLUMNS (column2); Applies the Columns Mask Function to a specific column in Databricks Unity Catalog. The same Policy can be reused across multiple columns and tables for consistent masking behavior. The Column-based masking is defined in the Asset Type Data Settings. The first column is the masked value, mapped as an Asset in the Columns Asset Type.

Column Mask Example

The following example demonstrates a masking policy that returns a 'MASKED' value for the salary column unless the user is in the 'HR' group, has a security level of 'A+', and the employee is a full-time worker.

CREATE OR REPLACE FUNCTION mask_salary(
  Argument1 STRING,
  Argument2 STRING
)
RETURNS STRING
RETURN CASE
  WHEN is_member('HR')
    AND employment_type = 'Full-Time'
    AND EXISTS (
      SELECT 1
      FROM user_security_levels
      WHERE username = current_user()
        AND security_level != 'A+'
    )
  THEN 'MASK'
  ELSE Argument1
END;

ALTER TABLE `product_demo`.`schema1`.`hr_employees`
ALTER COLUMN `salary`
SET MASK `product_demo`.`schema1`.`mask_salary`
USING COLUMNS (`employment_type`);

image.png

image.png

The salaries for employees 1001, 1002, and 1005 will be visible to Alice because she is in the HR group with A+ security level, and these employees are full-time.