Installation and Configuration


Overview

Implementing effective access control for Google BigQuery requires precise configuration of Asset Types. A General Access Policy defines how resources are protected based on Attributes such as projects, datasets, tables, and columns. This document outlines the steps to configure Asset Types for row-level and column-level access control and install the BigQuery Interceptor to enforce these policies.

By following these instructions, you can establish a robust access control framework for your BigQuery assets, ensuring security and compliance with organizational policies.

How it Works

The Authorizer enforcement flow integrates dynamic Policy evaluation to modify SQL queries before execution. This streamlined process provides granular control over row-level and column-level access, safeguarding sensitive data while aligning with the organization's data governance framework. The process operates as follows:

  • User Access and Authentication: A user accesses the application, which redirects them to authenticate via the Identity Provider (IdP).
  • Query Submission: The application sends an SQL request to the PlainID Google BigQuery Authorizer. This component functions as a Query Modifier and evaluates the request using dynamic Policy modeling.
  • Policy Evaluation and Query Modification: The Authorizer enforces access control by applying the defined Policies. It modifies the original query based on the user's permissions and returns the updated query to the application.
  • Query Execution: The Application submits the modified query to Google BigQuery for execution, ensuring compliance with organizational access control policies.

Flow Diagram

[Figure: GBQ Flow Diagram]

  1. The user accesses the app.
  2. The user is redirected to complete the authentication process on the Identity Provider (IdP).
  3. The application sends the SQL request to the PlainID Google BigQuery Authorizer. The Authorizer is a Query Modifier – a PDP component designated for Google BigQuery that modifies queries based on the dynamically calculated access decision. The modified query is sent back to the application.
  4. The modified request is sent to Google BigQuery by the Application (specific path based on each Application’s unique architecture).

Authorizer Request and Parameters

The POST /enforcer/bigquery/{projectId}.{datasetName} endpoint allows users to submit SQL queries for enforcement based on Policy Resolution rules. This API processes incoming queries and applies access control measures according to the defined Policies. Users can configure various parameters to customize the query execution, including options for debugging and formatting.

The request body must include the SQL query along with the necessary Runtime Resolution credentials to ensure accurate Policy enforcement.

Endpoint
POST /enforcer/bigquery/{projectId}.{datasetName}?pretty=false&dry=false

Headers

Header | Description | Required | Values
--- | --- | --- | ---
Content-Type | Must be application/json | Yes | application/json
Accept | Default is application/json; text/plain is also supported | No | application/json, text/plain

Parameters

Parameter | Type | Required | Default | Values | Description
--- | --- | --- | --- | --- | ---
projectId | string | No | - | Any valid project ID | Project identifier (path parameter)
datasetName | string | No | - | Any valid dataset name | Dataset name (path parameter)
pretty | boolean | No | false | true, false | Enables formatted (pretty-printed) output
dry | boolean | No | false | true, false | Dry run: returns the incoming SQL unchanged, without applying Policy Resolution rules
debug | boolean | No | false | true, false | Adds debug info such as requestId, enforcerProperties, and cacheProperties

Body

Field | Type | Description
--- | --- | ---
sql | string | SQL query
resolution | object | Runtime Resolution credentials
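
For example, a request might look like the following (a hypothetical sketch: the host, project, dataset, and SQL are placeholders, and the resolution object, shown empty here, must be populated with your Runtime Resolution credentials):

    curl -X POST "http://localhost:8080/enforcer/bigquery/project1.ds1?pretty=true" \
      -H "Content-Type: application/json" \
      -H "Accept: text/plain" \
      -d '{"sql": "SELECT account_id, balance FROM ds1.account", "resolution": {}}'

The response is the modified (and, with pretty=true, formatted) SQL that the application then submits to BigQuery.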

Policy Modeling

Asset Type Configuration for Google BigQuery

Row-Level Security (RLS) and Column-Level Security (CLS) are key access control mechanisms in Google BigQuery that help enforce data governance by restricting access based on user roles and attributes. For more information about Asset Type Configurations, refer to our documentation on Data Access Policies.

Google BigQuery Special Modeling Instructions

Google BigQuery includes special modeling instructions that help enforce fine-grained access control and ensure data governance by aligning policies with BigQuery’s project, dataset, and table structure. These instructions provide guidelines for configuring RLS and CLS to manage access based on user roles and attributes.

Row-Level Security (RLS)

RLS controls access to specific rows within a BigQuery table based on predefined conditions. Policies are applied using Asset Types that align with BigQuery’s project, dataset, and table hierarchy. The naming convention must follow project_ID.dataset.table_name, ensuring consistency with BigQuery’s structure.

  • Global Policies: Use an asterisk (*) to define Policies that apply across multiple datasets or tables (e.g., project_ID.*.*).
  • Asset Attributes: The Attribute names used in Policies must correspond exactly to the column names in BigQuery and are case-sensitive.

Example Use Case: A sales manager can only access rows related to their assigned region within the sales_data table, while another manager has access to a different region.
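
As a hypothetical illustration (the column names are placeholders, and the exact rewrite depends on the resolved Policy), the Authorizer might transform a query for a manager assigned to the EMEA region as follows:

    -- Original query submitted by the application
    SELECT order_id, amount, region
    FROM `project_ID.dataset.sales_data`;

    -- Modified query returned by the Authorizer
    SELECT order_id, amount, region
    FROM `project_ID.dataset.sales_data`
    WHERE region = 'EMEA';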

Column-Level Security (CLS)

CLS restricts access to specific columns within a table to control exposure of sensitive data. Policies are defined using mandatory Asset Attributes that map directly to BigQuery components, ensuring precise control over data visibility.

  • Mandatory Attributes:

    Parameter | Description
    --- | ---
    column | BigQuery Column Name
    projectid | BigQuery Project ID
    dataset | BigQuery Dataset Name
    table | BigQuery Table Name

  • Ensure these attributes align with BigQuery’s project, dataset, and table hierarchy.

Example Use Case: A policy can be applied to restrict access to the balance column in the account table within the ds1 dataset under project1, ensuring that only authorized users can view financial data.
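
As a hypothetical illustration of the effect (account_id is a placeholder column, and the actual mask value depends on the column type and the configured column-enforcer-masks):

    -- Original query submitted by the application
    SELECT account_id, balance FROM `project1.ds1.account`;

    -- Modified query for a user not authorized to view balance, assuming
    -- a numerical mask of -1 (app.enforcer.column-enforcer-masks[numerical]=-1)
    SELECT account_id, -1 AS balance FROM `project1.ds1.account`;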

By implementing RLS and CLS, organizations can enforce fine-grained access control, ensuring compliance with security policies while maintaining flexibility in data management.

Authorizer Setup and Configuration

Prerequisites

To configure the PlainID Google BigQuery Authorizer, you first need to create a Service Account in Google Cloud Platform (a sample gcloud sequence follows this list):

  • Create a GCP Service Account in GCP IAM.
  • Grant the following Roles to the created Service Account:
    • BigQuery Connection User
    • BigQuery Data Viewer
    • BigQuery User
  • Create a new Key for the Service Account.
  • Download the Key in JSON format and save it to a file, e.g., creds.json.
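
The following is a minimal sketch of these steps using the gcloud CLI, assuming a project named my-project and a service account named plainid-bq-authorizer (both placeholders):

    # Create the Service Account (names are placeholders)
    gcloud iam service-accounts create plainid-bq-authorizer --project=my-project

    # Grant the three required BigQuery roles
    for role in roles/bigquery.connectionUser roles/bigquery.dataViewer roles/bigquery.user; do
      gcloud projects add-iam-policy-binding my-project \
        --member="serviceAccount:plainid-bq-authorizer@my-project.iam.gserviceaccount.com" \
        --role="$role"
    done

    # Create a Key and download it in JSON format
    gcloud iam service-accounts keys create creds.json \
      --iam-account=plainid-bq-authorizer@my-project.iam.gserviceaccount.com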

Installation

This section outlines the installation process for the BigQuery Interceptor, which integrates with the PlainID platform to enforce Policy-based access control.
To install the BigQuery Interceptor:

  1. Pull the BigQuery Interceptor container image from the PlainID Container Registry using the docker pull command:
    docker pull <platform registry location>:<version>
    
    For example:
    docker pull gcr.io/plainid-presales/bigquery:1.0
    
  2. Create a secret from the credentials JSON:
    kubectl create secret generic google-creds --from-file=<json location>
    
    Example:
    kubectl create secret generic google-creds --from-file=query-enforcer/creds.json
    
  3. Update the bigq.yaml file as needed (see Sample Configuration below).
  4. Apply the configuration:
    kubectl apply -f bigq.yaml
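
After the resources are applied, you can verify the rollout (the Deployment name bigq and the app=bigq label come from the sample configuration below):

    kubectl rollout status deployment/bigq
    kubectl get pods -l app=bigq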
    

Configuration

This section lists the key properties for customizing the BigQuery integration, including connection settings.

Sample Configuration: bigq.yaml

This sample bigq.yaml contains the following configurations:

  • Configmap
  • Service
  • Deployment
    • Secret (Optional)
apiVersion: v1
# ConfigMap
data:
  application.properties: |
    server.port=8080
    #server.tomcat.uri-encoding=UTF-8
    #server.error.whitelabel.enabled=false

    logging.config=classpath:logback-spring.xml
    # The PlainID logging level options include trace, debug, info, error, and OFF.
    logging.level.org.springframework=OFF
    logging.level.org.springframework.core=OFF
    logging.level.org.springframework.beans=OFF
    logging.level.org.springframework.context=OFF
    logging.level.org.springframework.transaction=OFF
    logging.level.org.springframework.web=OFF
    logging.level.org.springframework.test=OFF
    logging.level.org.springframework.data=OFF
    logging.level.com.google.zetasql=info
    logging.level.com.plainid=error

    spring.main.banner-mode=OFF
    spring.http.encoding.charset=UTF-8
    spring.http.encoding.enabled=true
    spring.http.encoding.force=true
    spring.mvc.throw-exception-if-no-handler-found=true
    spring.resources.add-mappings=false

    spring.redis.host=redis.example.com
    spring.redis.port=6379
    spring.redis.password=SXSIvHJo1OzaWnU2YHZ8p3us0z
    spring.data.redis.ssl.enabled=true

    management.security.enabled=false
    management.endpoints.enabled-by-default=false
    management.endpoints.web.exposure.include=info, health
    management.endpoint.info.enabled=true
    management.endpoint.health.enabled=true
    management.endpoint.health.show-components=always
    management.endpoint.health.show-details=always
    management.endpoint.health.probes.enabled=true
    management.endpoint.health.group.liveness.include=livenessState
    management.endpoint.health.group.liveness.show-details=always
    management.endpoint.health.group.readiness.include=readinessState
    management.endpoint.health.group.readiness.show-details=always
    management.health.postgres.enabled=true
    management.health.readinessstate.enabled=true
    management.info.git.mode=full

    spring.cloud.gcp.core.enabled=false
    spring.cloud.gcp.bigquery.enabled=false

    app.request-context.prefix=context
    # update the following parameter to your PDP URL
    app.policy-resolution.api-url=https://URL-TO-PDP-RUNTIME/api/runtime/resolution/v3
    app.cache.prefix=query-enforcer.v2
    app.cache.ttl-policy-resolution=3000
    app.cache.ttl-db-schema-initial-delay=3000
    app.cache.ttl-db-schema=600000
    app.cache.ttl-catalog=300000
    # update to the full path of the GCP Service Account Credentials file:
    app.enforcer.bigquery-credential-files[0]=file:creds.json
    app.enforcer.resolution-response-empty=deny
    app.enforcer.resolution-access-empty=deny
    app.enforcer.resolution-privileges-empty=allow
    app.enforcer.column-enforcer-masks[string]=***
    app.enforcer.resolution-masking-as-column=deny
    app.enforcer.resolution-attribute-projectId=projectid
    app.enforcer.resolution-attribute-dataset=dataset
    app.enforcer.resolution-attribute-table=table
    app.enforcer.resolution-attribute-column=column
    app.enforcer.bigquery-information-schema=allow
    app.enforcer.ignore-enforcer-fails=deny
    # app.enforcer.ignore-tables[0]=plainid-142815.*.*
    app.enforcer.ignore-forbidden-columns=deny
    # SSL configuration (Uncomment to enable SSL)
    # The format used for the keystore. It could be set to JKS in case it is a JKS file
    # server.ssl.key-store-type=PKCS12
    # The path to the keystore containing the certificate
    # server.ssl.key-store=/app/keystore.jks
    # The password used to generate the certificate
    # server.ssl.key-store-password=111
    # The alias mapped to the certificate
    # server.ssl.key-alias=111
    # security.require-ssl=false
kind: ConfigMap
metadata:
  name: bigq-config

---
# Service
apiVersion: v1
kind: Service
metadata:
  name: bigq
spec:
  selector:
    app: bigq
  ports:
  - name: "bigq"
    protocol: "TCP"
    port: 8080
    targetPort: 8080
---
# Deployment and Optional Secret
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bigq
spec:
  selector:
    matchLabels:
      app: bigq
  strategy:
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: bigq
    spec:
      containers:
        # The following 3 lines are optional - depending on the used redis - internal or managed
       - image: redis
         name: redis
         resources: {}
        # Use the Image and Tag provided by PlainID
       - image: plainid/authz-bigquery:5.2321.0
         imagePullPolicy: Always
         name: bigq
         resources: {}
         volumeMounts:
         - name: bigq
           mountPath: /app/application.properties
           subPath: application.properties
         - name: creds
           mountPath: /app/creds.json
           # The name of the credentials JSON file - matching the name of the local file `creds.json`
           subPath: creds.json
      volumes:
      - name: bigq
        configMap:
          name: bigq-config
      - name: creds
        secret:
          secretName: google-creds

Configuration Properties

Property | Description | Values
--- | --- | ---
app.policy-resolution.api-url | The Runtime Resolution API URL | Example: https://localhost:8010/api/runtime/resolution/v3
app.policy-resolution.connection-timeout | Connection timeout in milliseconds | Default: 10000
app.policy-resolution.socket-timeout | Socket timeout in milliseconds | Default: 60000
app.policy-resolution.automatic-retries | Enables automatic retries (up to 4) on socket timeouts; set to false to disable | Default: true
app.bigquery.total-timeout | Upper bound on how long the logic keeps retrying the remote call before giving up completely. The higher the total timeout, the more retries can be attempted | Default: 0 (milliseconds)
app.bigquery.initial-retry-delay | Delay before the first retry. Subsequent retries use this value adjusted according to the retry-delay-multiplier | Default: 0 (milliseconds)
app.bigquery.max-retry-delay | Cap on the retry delay, so that the retry-delay-multiplier cannot increase the delay beyond this amount | Default: 0 (milliseconds)
app.bigquery.retry-delay-multiplier | Controls the change in retry delay: the retry delay of the previous call is multiplied by this value to calculate the delay for the next call | Default: 1.0
app.bigquery.max-attempts | Maximum number of attempts. If this value is greater than 0 and the number of attempts reaches it, retrying stops even if the total retry time is still below the total timeout | Default: 0
app.cache.prefix | Redis cache prefix | Default: query-enforcer.v2
app.cache.ttl-policy-resolution | Cache TTL in milliseconds for the Policy Resolution response | Default: 300000 (5m); Minimum: 2000 (2s)
app.cache.ttl-db-schema | Cache TTL in milliseconds for BigQuery requests | Default: 10800000 (3h); Minimum: 60000 (1m)
app.cache.ttl-catalog | Cache TTL in milliseconds for building the internal DB schema for the dataset from the enforcer request | Default: 600000 (10m); Minimum: 10000 (10s)
app.enforcer.bigquery-credential-files | List of GCP credential filenames. If empty or not defined, Workload Identity is used for authentication. The first credentials file is used as the default when the project a user is querying has no designated file | Example: app.enforcer.bigquery-credential-files[0]=project-12345.json
app.enforcer.resolution-response-empty | How to act when there is no response (an empty response) from the Policy Decision Point. allow – enable access without any limitations; deny – block all access | Default: deny
app.enforcer.resolution-access-empty | How to act when the response includes a row-level policy only (no column-level response). allow – all columns are returned to the user; deny – all columns are masked | Default: deny
app.enforcer.resolution-privileges-empty | How to act when the response includes a column-level policy only (no row-level response). allow – no row-level limitation; deny – no rows are returned to the user | Default: deny
app.enforcer.resolution-masking-as-column | How to act when the Policy Decision Point response contains no access but does contain a masking instruction. allow – mask all columns in the table based on the masking instructions; deny – block all access. Note: relevant only when app.enforcer.resolution-response-empty is set to deny. V2.4 and above | Default: deny
app.enforcer.resolution-attribute-projectId | Name of the "projectId" attribute in the policy Asset Template | Default: projectid
app.enforcer.resolution-attribute-dataset | Name of the "dataset" attribute in the policy Asset Template | Default: dataset
app.enforcer.resolution-attribute-table | Name of the "table" attribute in the policy Asset Template | Default: table
app.enforcer.resolution-attribute-column | Name of the "column" attribute in the policy Asset Template | Default: column
app.enforcer.resolution-masking-resource-type-name | Name of the Asset Template used to manage masking instructions. V2.4 and above | Default: Masking_Instructions
app.enforcer.resolution-attribute-maskAs | Name of the attribute used to manage the masking instruction within the Asset Template. V2.4 and above | Default: maskas
app.enforcer.bigquery-information-schema | Whether to support queries that use information_schema tables. Possible values: allow, deny. After changing this value, the DB schema cache must be updated | Default: allow
app.enforcer.ignore-enforcer-fails | What to do when a SQL query that cannot be parsed is sent to the enforcer. Possible values: allow, deny | Default: deny
app.enforcer.ignore-tables | List of tables, by full name "project.dataset.table", to exclude from enforcement. "*" (star) can be used instead of a name for any part | Example: app.enforcer.ignore-tables[0]=*.*.members
app.enforcer.ignore-forbidden-columns | Behavior when a forbidden column is used within the WHERE clause of a SQL query. Possible values: allow, deny | Default: deny
app.enforcer.bigquery-prefetch-db-schema | Filters the fetched metadata based on the tables mentioned in the SQL. If true, a RESOURCE_EXHAUSTED error may be returned when BigQuery contains a very large list of datasets/tables/columns. If false, the metadata is fetched at runtime. V2.2 and above | Default: true
app.enforcer.column-enforcer-masks[columnType] | Defines a default masking instruction per column type, used when no masking instruction is returned from the Policy Decision Point. V2.4 and above | Examples: app.enforcer.column-enforcer-masks[string]=***; [numerical]=-1; [bool]=false; [timestamp]=0; [date]=2007-12-03; [time]=10:15:30
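
As an illustrative sketch (the values below are arbitrary assumptions, not recommendations), tightening the BigQuery retry behavior in application.properties might look like this:

    # Hypothetical retry tuning; values are illustrative only
    app.bigquery.total-timeout=60000
    app.bigquery.initial-retry-delay=500
    app.bigquery.retry-delay-multiplier=2.0
    app.bigquery.max-retry-delay=8000
    app.bigquery.max-attempts=5

With these settings, retry delays grow from 500ms to 1s, 2s, 4s, and 8s (capped by max-retry-delay), and retrying stops after 5 attempts or 60 seconds in total, whichever comes first.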

Authentication to PDP in Google BigQuery (GBQ) Integration

This section details the methods for authenticating with the Policy Decision Point (PDP), including configuration options for scopes using client secrets or JWTs, as well as advanced use cases for passing additional headers.

Authentication to the PDP is managed at the Scope level. Follow these guidelines based on your Scope configuration:

Scopes Using Client Secrets

  • For Scopes configured to work with a Secret, the client calls the GBQ Authorizer using a Client Secret.

Scopes Using JWTs

  • For Scopes configured with JWTs, the client sends the JWT as the Authorization header when calling the GBQ Authorizer.
  • Additionally, configure the GBQ Authorizer to allow the Authorization header to pass through by adding Authorization to the HEADERS_PASS_THROUGH_WHITE_LIST setting.

Including Additional Headers

  • If your application requires passing additional headers to the PDP, include them in the HEADERS_PASS_THROUGH_WHITE_LIST setting, regardless of the Scope's authentication method.
  • For example, to pass an external request ID such as X-Request-ID, include it in the headers pass-through configuration. This setup supports advanced use cases such as injecting custom headers.
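
For example, a deployment might expose this setting as an environment variable on the Authorizer container (a hypothetical sketch; the exact format of the value, including the comma-separated list, should be confirmed against your deployment template):

    env:
    - name: HEADERS_PASS_THROUGH_WHITE_LIST
      value: "Authorization,X-Request-ID"

With this in place, the JWT in the Authorization header and the X-Request-ID header sent to the GBQ Authorizer are forwarded to the PDP.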