Installing and Configuring the Access File Authorizer

    Authorizer Overview

    The Access File Authorizer supports Authorization enforcement in an offline manner, enabling bulk Authorization calculation and management of authorization decision states. It processes data and generates access files in specific templated structures, which can be consumed by offline systems, legacy technologies that cannot integrate directly, and other use cases such as administration-time access provisioning and access reporting/auditing. It is particularly suited for systems that lack integration points for Runtime calculation or enforcement, but require dynamic, Policy-based Authorization managed centrally in PlainID.

    General Operation Flow

    The Authorizer process flow follows these main steps:

    • Load Subject Data from a predefined source and store it in an operational database deployed for the Authorizer.

      • This data represents the population for which authorization is calculated.
      • A common use case involves a full or partial set of organization identities, where authorization is calculated and output to an access file.
      • Note: Only subject IDs and attribute checksums are stored to avoid duplicating your data source.
    • Process PDP Requests for all subjects in the population and store the results in the operational database.

      • PDP requests are generated iteratively and processed in parallel.
      • Access decisions or error responses are stored for each subject as an access state.
    • Generate an Access File based on the subject's access state saved in the database and the file template.

    This flow outlines how the Authorizer operates. Additional nuances for this flow, detailed below, include:

    • Running the full subject population versus running only updated subjects.
    • Using Identities or Assets as subjects with the relevant PDP flows.
    • Executing multiple flows within a single job.
    • Choosing between manual and scheduled job execution.


    Initiation

    1. An access file is generated, initiated either by an administrator or automatically through a scheduling mechanism.

    Phase 1
    2. The Authorizer reads a list of subjects from the client’s data source.

    Phase 2
    3. The Authorizer writes the retrieved subjects to an operational database.
    4. The Authorizer queries the PDP to calculate an Authorization Decision for each subject.
    5. The PDP's access decisions are stored in the operational database for each subject.

    Phase 3
    6. The Authorizer generates an access file for all subjects and saves it in the storage volume.

    Access Usage
    7. Applications retrieve and use the access file to enforce user access.
    8. User actions in the Application are approved or denied based on the Authorizations provided in the access file.

    Participating Components

    • Access File Authorizer Service - The Authorizer Service managing the complete process, including data fetching and persistence, access processing, and output file generation.
    • Operational Database - Stores Subject population, Authorization Decisions and processing status.
    • PDP - Calculates Authorization Decisions based on policies and data.
    • PIP - Integrates with external data sources to provide user population and other required data.
    • Redis - Stores and manages job scheduling data and execution using Redis queuing.
    • Scheduler - Oversees the execution of configured job flows, utilizing cron-based scheduling, concurrent job handling, a locking mechanism, and status tracking and history.

    Deployment and Configuration

    Prerequisites

    Users need the following prerequisites before installing the Access File Authorizer:

    • PlainID Private Docker Hub access
    • An operational database: PostgreSQL with read and write permissions.
    • Access to the relevant Subjects Data source database.
    • Allocated POD resources: Minimum 2 CPU and 4GB of memory.
    • Write permission on the POD's "/app/data" folder.
    • Access to storage for writing/copying output files.
    • PAA Load Requirements: Ability to accommodate extensive load by configuring the PDP and PIP replicas as required.

    How to Install and Upgrade

    The Access File Authorizer is deployed as a K8s StatefulSet. It should run alongside a PAA deployment as it integrates with the PDP and PIP.

    To install or upgrade:

    1. Obtain an Authorizer pack from PlainID Global Services that includes the Helm Chart and other config files.
    2. Use the values-custom.yaml file to define the Helm Chart specifications as per your Environment's deployment requirements.
      • Configure the pull secret for accessing Docker images from PlainID Docker Hub.
      • Configure the Authorizer using a Config Map and Environment Variables (under the extraEnv: section):
        • Database Connection: Specify details for the database used in data processing (e.g., PostgreSQL).
        • Data Source Configuration: Provide connection details for the relevant data source.
        • PDP Configuration: Adjust settings according to your deployment requirements.
        • Jobs & Flows Configuration: Define parameters for processing jobs and flows.
      • Refer to the example configuration for more information.
    3. Install the Helm Chart using the custom values-custom.yaml file.
      • It is recommended to deploy the Access File POD within the same namespace as the PAA.
    helm install <release-name> authz-access-file --values values-custom.yaml
    
    4. Post-deployment validation (see the example commands after these steps).
      • Check that the Access File service is up and running and review the logs for any significant errors.
        • The error ERROR: rabbit component is null. skipping fetching schemas from tenant-mgmt can be safely ignored.
      • During the initial deployment, the service startup should also create tables in the operational DB you configured. You can validate that these tables were created successfully in the DB.

    Note: We recommend defining a test job in the configuration with a test flow, targeting a limited subject population, and running this job as a post-deployment validation.

    5. Upgrade the Authorizer version by adjusting the image tag number and relevant configuration. Ensure that you also upgrade the deployment using Helm upgrade:
    helm upgrade -i <release-name> authz-access-file --values values-custom.yaml
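
    The following commands are a minimal post-deployment validation sketch. The namespace, release name, pod name, and database details are placeholders that depend on your environment and on the resource names your Helm release actually creates; adjust them accordingly.
    # Check that the Access File Authorizer pod is running and review its logs
    kubectl -n <paa-namespace> get pods | grep <release-name>
    kubectl -n <paa-namespace> logs <release-name>-authz-access-file-0 --tail=200

    # Verify that the startup migrations created the Authorizer tables in the operational DB
    psql -h <db-host> -U <db-user> -d <db-name> -c '\dt public.*'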
    

    Configuration

    Key Glossary

    • Jobs - The main working unit of the Authorizer. The Authorizer triggers, executes, and manages operations at the job level. A job can include multiple flows, resulting in different outputs from the same run.
    • Flows - The basic working unit that handles the complete process of loading the subject population, managing access decisions, and generating output files. Each Flow references specific configurations to set up the data pulled from a data source, define the PDP configuration to use, and convert authorization decisions into the output file.
    • Data Sources - A set of definitions used to connect to a database and pull subject population data into the Flow process.
    • PDPs - A set of definitions that manage the execution of authorization requests during Flow processing, including the configuration of concurrency and timeouts.
    • Locations - A set of definitions specifying where to generate output files during execution.
    • Converters - A set of definitions that specify how to generate output files using Go Templates.

    Structure

    1. The Authorizer configuration structure is a hierarchical YAML with sets of keys and a referencing structure:
      • Jobs are defined by referencing the included Flows and additional job metadata.
      • Flows are defined by setting the following:
        • Mode - defines the flow's processing mode, Normal (deltas) or Full.
        • Source - references Data Sources and additional metadata.
        • PDP - references the PDP and additional specific Authorization request flags and definitions.
        • Converter - references a converter and location, defining how the flow's output is generated.
    2. The full YAML config structure is defined in the values-custom.yaml under
    plainIDConfig:
      config.yaml:
    
    3. Use Environment Variables in the config.yaml to keep it readable and organized (strongly recommended). All configuration values support Environment Variable substitution using the following format: ${ENV_VAR_NAME:default_value}. The Environment Variable is then defined under the extraEnv: section in the values-custom.yaml.
    4. In addition, values-custom.yaml includes additional service configurations such as Image, Resources, Ingress, etc. (see the skeleton below).
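
    The following is a minimal skeleton of values-custom.yaml sketched from the structure described above. The environment variable names and values are illustrative, and the name/value list form shown for extraEnv is an assumption based on common Helm Chart conventions; follow the format used by your Authorizer pack and see the full examples in the Examples section.
    plainIDConfig:
      config.yaml:
        redis:
          host: ${REDIS_HOST:localhost}
        db:
          enabled: true
          host: ${DB_HOST:localhost}
        datasources:
          ds1:
            host: ${DS1_DB_HOST:localhost}
        pdps:
          pdp1:
            type: runtime
        locations:
          l1:
            path: ${LOCATION1_PATH}
        converters:
          t1:
            type: goTemplate
        flows:
          flow1:
            source:
              datasource: ds1
            pdp:
              id: pdp1
            convert:
              converters:
                c1:
                  id: t1
        jobs:
          jb1:
            flows:
              - flow1
    extraEnv:
      - name: REDIS_HOST
        value: "my-plainid-paa-redis-master"
      - name: DB_HOST
        value: "my-plainid-paa-postgresql"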

    Parameters

    Name | Required | Description | Value Examples
    ---- | -------- | ----------- | --------------
    management.port | Yes | Management API port | 9090
    http.port | Yes | HTTP server port | 8080
    http.enableXSSValidator | No | Enable XSS validation | true, false
    http.jwt.jwtIgnoreVerification | No | Ignore JWT verification | true, false
    log.level | No | Logging level | trace, debug, info, warn, warning, error, fatal, panic
    log.format | No | Logging format | text, json
    redis.host | Yes | Redis host | "localhost", "my-plainid-paa-redis-master"
    redis.port | Yes | Redis port | 6379
    redis.password | Yes | Redis password | "secret123"
    redis.db | No | Redis database number | 0
    db.enabled | Yes | Enable database integration | true
    db.username | Yes | Database username | "postgres"
    db.password | Yes | Database password | "secret123"
    db.host | Yes | Database host | "localhost", "my-plainid-paa-postgresql"
    db.port | Yes | Database port | 5432
    db.database | Yes | Database name | "authz_db"
    db.driver | Yes | Database driver type | "postgres"
    db.migrationSchemas | No | Schemas to apply migrations | ["public"]
    db.migrationType | No | Type of migrations to run | "goose"
    db.migrationOnStart | No | Run migrations on startup | true, false
    db.schema | No | Default database schema | "public"
    db.ssl | No | Enable SSL for database | true, false
    db.maxIdleConns | No | Max idle connections | 10
    db.maxOpenConns | No | Max open connections | 10
    db.connMaxLifetime | No | Connection max lifetime in seconds | 3600
    datasources.<name>.username | Yes | Data source username | "pip"
    datasources.<name>.password | Yes | Data source password | "pa33word"
    datasources.<name>.host | Yes | Data source host | "localhost"
    datasources.<name>.port | Yes | Data source port | 30303
    datasources.<name>.database | Yes | Data source database name | "vdb"
    datasources.<name>.sslmode | No | Enable SSL for data source | true, false
    datasources.<name>.maxIdleConns | No | Data source max idle connections | 10
    datasources.<name>.maxOpenConns | No | Data source max open connections | 10
    datasources.<name>.connMaxLifetime | No | Data source connection max lifetime in seconds | 3600
    datasources.<name>.connectTimeout | No | Data source connection timeout in seconds | 10
    pdps.<name>.type | Yes | PDP type | "runtime"
    pdps.<name>.runtimeParameters.url | Yes | PDP runtime URL | "http://localhost:30040"
    pdps.<name>.runtimeParameters.timeout | No | PDP request timeout | "30s"
    pdps.<name>.runtimeParameters.maxConcurrentConnections | No | Max concurrent PDP connections | 5
    pdps.<name>.runtimeParameters.ignoreSSL | No | Ignore SSL verification for PDP | true, false
    locations.<name>.path | Yes | Output location path | "/path/to/output"
    converters.<name>.type | Yes | Converter type | "goTemplate"
    converters.<name>.templateProperties.content | Yes | Template content | "{\n \"users\": [\n {{- range $i, $data := . }}\n {{- if $i }},{{ end }}\n {\n \"id\": \"{{ $data.identity.uid }}\",\n \"name\": \"{{ $data.identity.name }}\"\n }\n {{- end }}\n ]\n}"
    flows.<name>.mode | No | Flow execution mode | "Normal", "Full"
    flows.<name>.source.datasource | Yes | Flow data source reference | "ds1"
    flows.<name>.source.schema | Yes | Flow source schema | "public"
    flows.<name>.source.table | Yes | Flow source table | "users"
    flows.<name>.source.uid | Yes | Flow population source unique identifier column name | "id"
    flows.<name>.pdp.id | Yes | Flow PDP reference | "pdp1"
    flows.<name>.pdp.runtimeParameters.type | Yes | Flow PDP runtime type | "userList"
    flows.<name>.pdp.runtimeParameters.userListParameters.resourceType | Yes* | Resource type for user list | "groups-new"
    flows.<name>.pdp.runtimeParameters.userAccessTokenParameters.entityTypeId | Yes* | Entity type ID | "entity-type-1"
    flows.<name>.pdp.runtimeParameters.path | Yes | PDP API path | "/api/runtime/userlist/v3"
    flows.<name>.pdp.runtimeParameters.clientId | Yes | PDP client ID | "client123"
    flows.<name>.pdp.runtimeParameters.clientSecret | Yes | PDP client secret | "secret123"
    flows.<name>.pdp.runtimeParameters.includeAsset | No | Include Asset in PDP response | true, false
    flows.<name>.pdp.runtimeParameters.includeIdentity | No | Include Identity in PDP response | true, false
    flows.<name>.pdp.runtimeParameters.includeAccessPolicy | No | Include Access Policy in PDP response | true, false
    flows.<name>.convert.batchSize | No | Processing batch size | 10
    flows.<name>.convert.converters.<name>.id | Yes | Converter reference | "t1"
    flows.<name>.convert.converters.<name>.output.transient | No | Mark output as temporary | true, false
    flows.<name>.convert.converters.<name>.output.location.id | Yes | Output location reference | "l1"
    flows.<name>.convert.converters.<name>.output.location.filenameTemplate | Yes | Output filename template | "users-{{nowWithFormat \"20060102\"}}.json", "access-{{.Context.BatchID}}.csv"
    jobs.<name>.flows | Yes | List of flows to execute. Note: The flows are executed in the order they are listed in the job's flows array. | ["flow1", "flow2"]
    jobs.<name>.maxWorkers | Yes | Max concurrent workers | 100
    jobs.<name>.mode | No | Job execution mode | "Normal", "Full"
    jobs.<name>.schedule | No | Cron schedule expression | "0 * * * *"
    jobs.<name>.timeout | No | Job execution timeout. If not defined, the default is 30m. We recommend always specifying it. | "24h"
    jobs.<name>.converters.<name>.id | Yes* | Job converter reference | "t1"
    jobs.<name>.converters.<name>.output.location.id | Yes* | Job output location reference | "l1"
    jobs.<name>.converters.<name>.output.location.filenameTemplate | Yes* | Job output filename template | "output-{{nowWithFormat "20060102"}}.json"

    * Required if parent configuration is used

    Examples

    • Basic Configuration
    management:
      port: ${MANAGEMENT_PORT:8081}
    
    http:
      port: ${APP_PORT:8080}
      enableXSSValidator: true
      jwt:
        jwtIgnoreVerification: ${JWT_IGNORE_VERIFICATION:true}
    
    redis:
      host: ${REDIS_HOST:localhost}
      port: ${REDIS_PORT:6379}
      password: ${REDIS_PASS}
      db: ${REDIS_DB:0}
    
    db:
      enabled: true
      username: ${DB_USER:offline}
      password: ${DB_PASS:offline}
      host: ${DB_HOST:localhost}
      port: ${DB_PORT:5432}
      database: ${DB_DATABASE:offline}
      driver: postgres
      migrationSchemas:
        - ${DB_SCHEMA:public}
      migrationType: goose
      migrationOnStart: ${DB_MIGRATION_ON_START:true}
      schema: ${DB_SCHEMA:public}
      ssl: ${DB_SSL:false}
      maxIdleConns: ${DB_MAX_IDLE_CONNECTIONS:1}
      maxOpenConns: ${DB_MAX_OPEN_CONNECTIONS:10}
      connMaxLifetime: ${DB_CONNECTIONS_MAX_LIFE_TIME_SECONDS:3600}
    
    datasources:
      ds1:
        username: ${DS1_DB_USER:pip}
        password: ${DS1_DB_PASS:pa33word}
        host: ${DS1_DB_HOST:localhost}
        port: ${DS1_DB_PORT:30303}
        database: ${DS1_DB_DATABASE:vdb}
        sslmode: ${DS1_DB_SSL_MODE:false}
        connectTimeout: ${DS1_CONNECT_TIMEOUT:10}
    
    pdps:
      pdp1:
        type: ${PDP1_TYPE:runtime}
        runtimeParameters:
          url: ${PDP1_URL:http://localhost:30040}
          maxConcurrentConnections: ${PDP1_MAX_CONCURRENT_CONNECTIONS:5}
          timeout: ${PDP1_TIMEOUT:30s}
          ignoreSSL: ${PDP1_IGNORE_SSL:false}
    locations:
      l1:
        path: ${LOCATION1_PATH}
    converters:
      t1:
        type: goTemplate
        templateProperties:
          content: ${CONVERTER1_CONTENT}
      t2:
        type: goTemplate
        templateProperties:
          content: ${JOB1_CONVERTER1_CONTENT}
    flows:
      flow1:
        mode: Full
        source:
          datasource: ds1
          schema: ${FLOW1_SOURCE_SCHEMA}
          table: ${FLOW1_SOURCE_TABLE}
          uid: ${FLOW1_SOURCE_UID}
        pdp:
          id: pdp1
          runtimeParameters:
            type: ${FLOW1_PDP_RUNTIME_PARAMETERS_TYPE:userList}
            userListParameters:
              resourceType: ${FLOW1_PDP_RUNTIME_PARAMETERS_USER_LIST_PARAMETERS_RESOURCE_TYPE}
            userAccessTokenParameters:
              entityTypeId: ${FLOW1_PDP_RUNTIME_PARAMETERS_USER_ACCESS_TOKEN_PARAMETERS_ENTITY_TYPE_ID}
            path: ${FLOW1_PDP_RUNTIME_PARAMETERS_PATH}
            clientId: ${FLOW1_PDP_RUNTIME_PARAMETERS_CLIENT_ID}
            clientSecret: ${FLOW1_PDP_RUNTIME_PARAMETERS_CLIENT_SECRET}
            includeAsset: true
            includeIdentity: true
            includeAccessPolicy: false
        convert:
          batchSize: ${FLOW1_CONVERT_BATCH_SIZE:10}
          converters:
            c1:
              id: t1
              output:
                transient: true
                location:
                  id: l1
                  filenameTemplate: ${FLOW1_CONVERTER1_FILENAME_TEMPLATE:output-jb1-flow1-{{nowWithFormat "20060102-150405"}}.json}
    jobs:
      jb1:
        flows:
          - flow1
        maxWorkers: ${JOB1_MAX_WORKERS:100}
    
    • Advanced Configuration with Multiple Flows and Converters
    management:
      port: ${MANAGEMENT_PORT:8081}
    
    http:
      port: ${APP_PORT:8080}
      enableXSSValidator: true
      jwt:
        jwtIgnoreVerification: ${JWT_IGNORE_VERIFICATION:true}
    
    log:
      level: info
      format: text
    
    redis:
      host: ${REDIS_HOST:localhost}
      port: ${REDIS_PORT:6379}
      password: ${REDIS_PASS}
      db: ${REDIS_DB:0}
    
    db:
      enabled: true
      username: ${DB_USER:offline}
      password: ${DB_PASS:offline}
      host: ${DB_HOST:localhost}
      port: ${DB_PORT:5432}
      database: ${DB_DATABASE:offline}
      driver: postgres
      migrationSchemas:
        - ${DB_SCHEMA:public}
      migrationType: goose
      migrationOnStart: ${DB_MIGRATION_ON_START:true}
      schema: ${DB_SCHEMA:public}
      ssl: ${DB_SSL:false}
      maxIdleConns: ${DB_MAX_IDLE_CONNECTIONS:1}
      maxOpenConns: ${DB_MAX_OPEN_CONNECTIONS:10}
      connMaxLifetime: ${DB_CONNECTIONS_MAX_LIFE_TIME_SECONDS:3600}
    
    datasources:
      ds1:
        username: ${DS1_DB_USER:pip}
        password: ${DS1_DB_PASS:pa33word}
        host: ${DS1_DB_HOST:localhost}
        port: ${DS1_DB_PORT:30303}
        database: ${DS1_DB_DATABASE:vdb}
        sslmode: ${DS1_DB_SSL_MODE:false}
        connectTimeout: ${DS1_CONNECT_TIMEOUT:10}
    
    pdps:
      pdp1:
        type: ${PDP1_TYPE:runtime}
        runtimeParameters:
          url: ${PDP1_URL:http://localhost:30040}
          maxConcurrentConnections: ${PDP1_MAX_CONCURRENT_CONNECTIONS:5}
          timeout: ${PDP1_TIMEOUT:30s}
          ignoreSSL: ${PDP1_IGNORE_SSL:false}
    locations:
      l1:
        path: ${LOCATION1_PATH}
    converters:
      t1:
        type: goTemplate
        templateProperties:
          content: ${CONVERTER1_CONTENT}
      t2:
        type: goTemplate
        templateProperties:
          content: ${JOB1_CONVERTER1_CONTENT}
    flows:
      flow1:
        mode: Full
        source:
          datasource: ds1
          schema: ${FLOW1_SOURCE_SCHEMA}
          table: ${FLOW1_SOURCE_TABLE}
          uid: ${FLOW1_SOURCE_UID}
        pdp:
          id: pdp1
          runtimeParameters:
            type: ${FLOW1_PDP_RUNTIME_PARAMETERS_TYPE:userList}
            userListParameters:
              resourceType: ${FLOW1_PDP_RUNTIME_PARAMETERS_USER_LIST_PARAMETERS_RESOURCE_TYPE}
            userAccessTokenParameters:
              entityTypeId: ${FLOW1_PDP_RUNTIME_PARAMETERS_USER_ACCESS_TOKEN_PARAMETERS_ENTITY_TYPE_ID}
            path: ${FLOW1_PDP_RUNTIME_PARAMETERS_PATH}
            clientId: ${FLOW1_PDP_RUNTIME_PARAMETERS_CLIENT_ID}
            clientSecret: ${FLOW1_PDP_RUNTIME_PARAMETERS_CLIENT_SECRET}
            includeAsset: true
            includeIdentity: true
            includeAccessPolicy: false
        convert:
          batchSize: ${FLOW1_CONVERT_BATCH_SIZE:10}
          converters:
            c1:
              id: t1
              output:
                transient: true
                location:
                  id: l1
                  filenameTemplate: ${FLOW1_CONVERTER1_FILENAME_TEMPLATE:output-jb1-flow1-{{nowWithFormat "20060102-150405"}}.json}
    jobs:
      jb1:
        flows:
          - flow1
        maxWorkers: ${JOB1_MAX_WORKERS:100}
        schedule: "00 18 * * *"
        converters:
          jc1:
            id: t2
            output:
              location:
                id: l1
                filenameTemplate: ${JOB1_CONVERTER1_FILENAME_TEMPLATE:output-aggregate1-jb1-{{nowWithFormat "20060102-150405"}}.json}
    

    Subject Data Sources & PDP Flows

    The Access File Authorizer processes a subject population, calculates access for each subject, and generates an access file representing Authorizations for the entire population. To support this process, the Authorizer loads a predefined subject population from a customer's data source and evaluates each subject with a predefined PDP calculation flow, based on Templates and Policies modeled in the PlainID Platform.

    Data Sources

    The subjects processed by the Authorizer are either Identities or Assets (resources) for which Authorization decisions need to be calculated. Data sources are defined in the Authorizer configuration in two parts:

    • Data Source Connectivity - Defined under the datasources set in the configuration and contains the connection details such as host, port, user and password.

    • Data Reference - Defined under the flow: source and contains a reference to the connectivity setup as well as the schema, table and uid setup, referencing the source of data to be fetched for subjects.

    The Data Source configuration can be a direct connection to a customer DB, or it can leverage the PIP service using the Postgres transport. If using the PIP, the connectivity is set to the pip-operator using the Postgres port, and a View name is referenced as the source table. This allows using different types of Data Sources (not only databases), creating virtual views that define population subsets, and performing any required data normalization based on PlainID PIP capabilities (see the sketch below).

    The uid should be a unique identifier of the subject, such as a UserID for Identities or an AssetID for resources.

    Note: The subjects data source must be pre-ordered by the uid field to avoid data inconsistencies and problems with multi-value attributes. The ordering can also be achieved by using PIP Views.
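
    The sketch below shows how the two parts of the data source definition are wired into a flow. The host, port, view name, and uid column are illustrative assumptions; when connecting through the PIP, use the pip-operator host and the Postgres port exposed by your deployment.
    datasources:
      ds1:
        username: ${DS1_DB_USER:pip}
        password: ${DS1_DB_PASS}
        host: ${DS1_DB_HOST:pip-operator}   # direct customer DB host, or the pip-operator service
        port: ${DS1_DB_PORT:30303}          # Postgres port of the DB or the PIP transport
        database: ${DS1_DB_DATABASE:vdb}
    flows:
      flow1:
        source:
          datasource: ds1          # reference to the connectivity definition above
          schema: public
          table: subjects_view     # a table, or a PIP View defining the population subset
          uid: user_id             # unique subject identifier (e.g., UserID or AssetID)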

    PDP Flows

    As the Authorizer can process both types of subjects, whether Identities or Assets, it can use two different PDP calculation flows: User Access Token or User List. We recommend defining your use case modeling and testing your PDP calculation before setting it up for use by the Authorizer. You can consult with the PlainID Global Services team on the proper modeling.

    For each Flow, you can reference a predefined PDP and specify additional parameters for the required PDP calculation. These parameters include:

    • clientId and clientSecret to identify the Policy Scope used
    • Type of PDP flow, either userList or userAccessToken
    • PDP endpoint to use, based on the flow type, specified with the path parameter
    • Identity/Asset Types to determine the Templates used for calculation, with resourceType and entityTypeId parameters
    • PDP request flags to enrich the PDP response for file processing, including includeAsset, includeIdentity, and includeAccessPolicy

    Example PDP (Runtime) parameters defined for a Flow:
    runtimeParameters:
      type: ${FLOW1_PDP_RUNTIME_PARAMETERS_TYPE:userList}
      userListParameters:
        resourceType: <RESOURCE_TYPE>
      userAccessTokenParameters:
        entityTypeId: <ENTITY_TYPE_ID>
      path: <PDP URI PATH>
      clientId: ${FLOW1_PDP_RUNTIME_PARAMETERS_CLIENT_ID}
      clientSecret: ${FLOW1_PDP_RUNTIME_PARAMETERS_CLIENT_SECRET}
      includeAsset: true
      includeIdentity: true
      includeAccessPolicy: false
    

    Templates

    The Authorizer uses Go Templates as a standard way of defining the file structure and injecting data elements into it when generating the access file, based on the PDP authorization decisions fetched and stored for the subject population.

    We recommend simulating PDP requests (e.g., using Postman) and working with authentic sample responses when building and defining the Go template. For guidance on creating Go Templates, consider utilizing resources like repeatit.io or gotemplate.io. Additionally, you can consult the PlainID Professional Services team for further assistance.

    In addition, the Authorizer exposes a template validation endpoint, which assists in validating your Templates. See more info under Authorizer API Endpoints.
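
    For reference, this is the converter content example from the Parameters table above, shown unescaped as a readable Go Template. The identity.uid and identity.name fields depend on the includeIdentity request flag and on your actual PDP response structure:
    {
      "users": [
      {{- range $i, $data := . }}
      {{- if $i }},{{ end }}
        {
          "id": "{{ $data.identity.uid }}",
          "name": "{{ $data.identity.name }}"
        }
      {{- end }}
      ]
    }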

    Supported Template Functions

    Templates defined with Go Templates can include custom functions executed as part of the file generation. The Authorizer currently supports these functions:

    • Pad Right - Pads a line in your output file with spaces up to a given length (see the minimal sketch after this list).
      • Syntax - padRight <String> <Length>
      • Examples -
        • { {{ padRight "UserData": [ 123 -}} - In this example, the output line is "UserData": [ padded with spaces up to a line length of 123.
        • {{- range $index, $data := . }} {{ if gt $index 0 }}{{ padRight ",{" 123 }}{{ else }}{{ padRight "{" 123 }}{{ end }}
        • {{ padRight "UserID": [ 123 -}}
    • File Input - Allows you to inject a file into the template. See a detailed explanation under Flow Results Aggregator on how this function can be used.
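
    As a minimal sketch of the padRight syntax above (assuming it pads the given string with trailing spaces up to the stated total length), the following emits a fixed-width header line of 40 characters followed by a line break:
    {{ padRight "UserID,UserName" 40 }}{{ "\r\n" }}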

    Additional Template Hints

    You can use these templating hints:

    • Use {{ "\r\n" }} in the template to mark a new line.
    • Use If statements for conditional placement of data in the template. Example:
      • {{ if gt $index 0 }}
    • Use - to trim whitespace before or after the template expression, depending on its placement.
    • Use ranges to iterate over data lists. Example:
      • {{- range $entityIndex, $entity := (index $data.response 0).access }} -
      • The range iterates over the elements in .access, which assigns the:
        • index to $entityIndex.
        • value of the current element to $entity.
      • index $data.response 0 retrieves the first element in the $data.response array.
      • .access retrieves the access field from the first element of $data.response.
    • Use this snippet to add array data elements with a comma after, excluding a comma from the last item {{- range $index, $data := . }}{{ if gt $index 0 }},{{ "\r\n" }}{{ end }}.
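
    Combining these hints, a sketch that emits one CSV line per subject could look like the following; the identity.uid and identity.name fields are taken from the converter example above and should be adjusted to your PDP response:
    uid,name{{ "\r\n" }}
    {{- range $i, $data := . }}{{ if gt $i 0 }}{{ "\r\n" }}{{ end }}{{ $data.identity.uid }},{{ $data.identity.name }}{{- end }}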

    Filename Template

    The file name of the output file is also configurable and can be defined using Templates, including a timestamp as part of the name, for example: output-jb3-flow3-{{nowWithFormat "20060102-150405"}}.json. The nowWithFormat template function receives a timestamp format (following Go's reference time layout) and applies it to the current time when creating the file name.

    Flow Results Aggregator

    The Flow Results Aggregator enables complex file generation by combining results from multiple flows. This is particularly useful for intermediate processing steps that require creating file structures, which can later be incorporated into a more complex template.

    At the job level, you can define an additional template converter that takes one or more Flow output files as inputs and integrates them into the defined template along with other template structures and elements. This is done by using the File Input template function to reference other Flow files. The syntax is:

    {{ fileInput "<flowName>" "<converterId>" }}

    For example, this template definition aggregates outputs from multiple flows into a final job output file in JSON format:

    {
      "ResourceTypes": [
        {
          {{ fileInput "flow2" "c2" }},
          {{ fileInput "flow3" "c3" }},
          {{ fileInput "flow4" "c4" }}
        }
      ],
      "AdditionalAccess": {{ fileInput "flow1" "c1" }}
      // where flow1 and c1 are references to the flowId and its defined converter.
    }
    

    At the Flow level, you can configure whether the generated Flow file is temporary by setting output: transient: true:

    • If set to true, the Flow file is saved temporarily until it is used by the aggregation template during job completion. It is automatically cleaned up after successful aggregation.
    • If set to false, the Flow file is saved as an output file at the designated location, and will be injected into the aggregation template.

    Example Configuration with multiple flows and file aggregation used as the job converter:

    converters:
      t1: ...
      t2: ...
      t3: ...
      t4: ...
      aggregator_template:
        type: goTemplate
        templateProperties:
          content: |
            {
    		  "ResourcesTypes":[
    			{
    			  {{ fileInput "flow2" "c2" }},
    			  {{ fileInput "flow3" "c3" }},
    			  {{ fileInput "flow4" "c4" }}
    			}
    		  ],
    		  "AdditionalAccess": {{ fileInput "flow1" "c1" }}
    		}
    flows:
      # Main flow with permanent output
      flow1:
        source:
          datasource: ds1
          schema: ${FLOW1_SCHEMA:public}
          table: ${FLOW1_TABLE:users}
        convert:
          converters:
            c1:
              id: t1
              output:
                location:
                  id: l1
                  filenameTemplate: ${FLOW1_OUTPUT:users-{{nowWithFormat "20060102"}}.json}
    
      # Flows with temporary outputs
      flow2:
        source:
          datasource: ds1
          schema: ${FLOW2_SCHEMA:public}
          table: ${FLOW2_TABLE:locations}
        convert:
          converters:
            c2:
              id: t2
              output:
                transient: true  # Mark as temporary
                location:
                  id: l1
                  filenameTemplate: ${FLOW2_OUTPUT:temp-loc-{{nowWithFormat "20060102"}}.json}
    
      flow3:
        source:
          datasource: ds1
          schema: ${FLOW3_SCHEMA:public}
          table: ${FLOW3_TABLE:departments}
        convert:
          converters:
            c3:
              id: t3
              output:
                transient: true  # Mark as temporary
                location:
                  id: l1
                  filenameTemplate: ${FLOW3_OUTPUT:temp-dept-{{nowWithFormat "20060102"}}.json}
    
      flow4:
        source:
          datasource: ds1
          schema: ${FLOW4_SCHEMA:public}
          table: ${FLOW4_TABLE:offices}
        convert:
          converters:
            c4:
              id: t4
              output:
                transient: true  # Mark as temporary
                location:
                  id: l1
                  filenameTemplate: ${FLOW4_OUTPUT:temp-office-{{nowWithFormat "20060102"}}.json}
    
    jobs:
      complex_job:
        timeout: ${JOB1_TIMEOUT:24h}
        flows:
          - flow1
          - flow2
          - flow3
          - flow4
        converters:
          final:
            id: aggregator_template
            output:
              location:
                id: l1
                filenameTemplate: ${FINAL_OUTPUT:final-{{nowWithFormat "20060102"}}.json}
    

    In this example:

    • flow1 generates a permanent output with user data
    • flow2, flow3, and flow4 generate temporary files (marked with transient: true)
    • The job's final converter uses the fileInput function to:
      • Combine related data from flows 2-4
      • Include additional data from flow1
      • Generate a single aggregated output file
    • All temporary files (from flow2, flow3, and flow4) are automatically cleaned up after successful processing
    • The final result of the job is the flow1 output file and the aggregated file

    Using the Authorizer

    After configuring and deploying the Authorizer, you can use it to run jobs and consume the output access files. This section details the different modes of execution, the Authorizer APIs, Scheduler details, and more.

    Authorizer API Endpoints

    The following API endpoints are exposed by the Authorizer to execute jobs and manage processes:

    Job Execution API

    The jobs can be triggered via the Jobs API, or based on a cron schedule in the configuration. For more information, see the Scheduled Job Execution section.

    PUT http://{{authz-host}}:8080/1.0/jobs/{{job_name}}

    Returns:

    • HTTP 200 Success - Job was successfully triggered
    • HTTP 409 Conflict - A job with the same name is already running. The request is rejected with an error message, which prevents multiple instances of the same job from running simultaneously.

    Example:

    # Start a job
    curl -X PUT http://localhost:8080/1.0/jobs/job1 \
      -H 'Content-Type: application/json' \
      -d '{"running": true}'
    

    Job Force Execution API

    The API endpoint supports an optional force query parameter, which allows you to execute a job even if it is already running. This terminates the existing job run and starts a new one.

    This is useful if you need to:

    • Restart a stuck job
    • Override current execution with different parameters
    • Start an urgent run regardless of current state
    Important Note: Use force with caution, as it can interrupt ongoing processing.

    PUT http://{{authz-host}}:8080/1.0/jobs/{{job_name}}?force=true

    Example:

    # Force start a job (even if one is already running)
    curl -X PUT 'http://localhost:8080/1.0/jobs/job1?force=true' \
      -H 'Content-Type: application/json' \
      -d '{"running": true}'
    

    Job Execution History API

    "This API endpoint retrieves the execution history of jobs, providing detailed status information for management and tracking purposes. It also supports filtering by run ID." GET http://{{authz-host}}:8080/1.0/jobs/{{job_name}}/history

    Examples:

    # Get all history for a job
    curl http://localhost:8080/1.0/jobs/job1/history
    
    # Filter by specific run ID
    curl http://localhost:8080/1.0/jobs/job1/history?filter[run_id]=ba47244d-75dc-4ad6-8bd0-c91f395ff907
    

    Response example:

    {
      "data": [
        {
          "job_id": "jb1",
          "run_id": "ba47244d-75dc-4ad6-8bd0-c91f395ff907",
          "status": "SUCCESS",
          "mode": "Normal",
          "success_flows": ["flow1"],
          "success_row_count": 18,
          "start_time": "2024-12-24T13:22:03.988281+02:00",
          "end_time": "2024-12-24T13:22:47.070309+02:00"
        }
      ]
    }
    

    Notes:

    • Possible statuses - SUCCESS, RUNNING, and FAILED
    • Timestamps are in UTC.
    • The history data is stored in and fetched from the operational database. You may need to maintain this database table over time to avoid storing a long history and a large amount of data.

    Template Validation API

    The Authorizer generates output files based on Authorization Decisions stored in the operational database and Go Templates that define the file structure and data conversion. To simplify the process of defining Go Templates, the Authorizer exposes a template validation endpoint, allowing users to test template definitions with a sample PDP authorization decision.

    This endpoint supports:

    • Pre-deployment template testing
    • Inline Templates and configured converter reference support
    • Template syntax and execution validation

    POST http://{{authz-host}}:8080/1.0/templates/validation

    The endpoint receives a payload with the following parameters:

    • template - An inline escaped Go template string
    • converterId (alternative to template) - A reference to a configured converter
    • inputs - A JSON array with a sample PDP response simulating the Authorization decision that is processed during flow execution based on PDP definitions.

    The endpoint returns template validation errors or a sample of the generated file based on the provided inputs when validation succeeds.

    Example with inline template:

    curl -X POST http://localhost:8080/1.0/templates/validation \
      -H 'Content-Type: application/json' \
      -d '{
        "template": "{\"groups\":[{{- range $i, $data := . }}{{- if $i }},{{end}}{\"groupName\":\"{{ $data.asset.path }}\"}{{- end}}]}",
        "inputs": [
          {
            "asset": {
              "path": "GroupName12",
              "attributes": {
                "Platforms": ["Platform"]
              }
            }
          }
        ]
      }'
    

    Example with converter reference:

    curl -X POST http://localhost:8080/1.0/templates/validation \
      -H 'Content-Type: application/json' \
      -d '{
        "converterId": "t1",
        "inputs": [
          {
            "asset": {
              "path": "GroupName12",
              "attributes": {
                "Platforms": ["Platform"]
              }
            }
          }
        ]
      }'
    

    Modes of Execution

    Each Flow can be configured to run in one of two modes: Full or Normal.

    Normal Mode (default)

    • Performs delta detection by calculating and comparing hashes of all subject attributes fetched from the data source, efficiently processing only subjects that have changed since the last run.
    • Ideal for regular incremental updates.

    Full Mode

    • Executes a complete run, processing all subjects fetched from the data source, regardless of changes in the subject's attributes or their previous state.
    • Recommended when:
      • The population size is manageable within the required time constraints
      • System-wide policy changes necessitate a recalculation of all access
      • Full synchronization of access rights is required

    Mode Configuration Hierarchy

    The execution mode can be specified at different levels, with each level overriding the previous one:

    1. Flow Level (Base configuration)
      flows:
        flow1:
          mode: Normal
      
    2. Job Level (Overrides Flow mode)
      jobs:
        job1:
          mode: Full
          flows:
            - flow1
      
    3. API Level (Overrides both job and flow modes)
      curl -X PUT http://localhost:8080/1.0/jobs/job1 \
        -H 'Content-Type: application/json' \
        -d '{"running": true, "mode": "Full"}' // Or "Normal"
      

    This hierarchical configuration provides granular control over execution modes, allowing you to:

    • Set default modes at the Flow level
    • Override multiple Flow modes at the job level
    • Use different modes via APIs without modifying configuration

    Example configurations:

    flows:
      flow1:
        mode: Normal # Default to incremental updates
      flow2:
        mode: Full # Always do full processing
    
    jobs:
      daily_job:
        mode: Normal # Override both flows to Normal mode
        flows:
          - flow1
          - flow2
      weekly_sync:
        mode: Full # Override both flows to Full mode
        flows:
          - flow1
          - flow2
    

    Scheduled Job Execution

    The Authorizer uses Redis for reliable job scheduling and execution. The schedule parameter is used to specify the cron schedule for the job.

    You can find more information on cron scheduling expressions on websites such as https://crontab.guru/ or https://crontab.cronhub.io/

    Here are some job scheduling configuration examples:

    jobs:
      # Daily job at 6 PM
      daily_job:
        flows:
          - flow1
        maxWorkers: 100
        schedule: "00 18 * * *"
    
      # Weekly job on Monday at 2 AM
      weekly_sync:
        flows:
          - flow1
          - flow2
        maxWorkers: 50
        schedule: "00 02 * * 1"
    
      # Every 15 minutes
      frequent_check:
        flows:
          - flow3
        maxWorkers: 20
        schedule: "*/15 * * * *"
    
      # Multiple times per day
      multi_daily:
        flows:
          - flow1
        maxWorkers: 30
        schedule: "00 */4 * * *" # Every 4 hours
    

    Ongoing Maintenance, Monitoring & Logging

    Maintenance

    • Operational Database - The predefined operational Postgres DB requires ongoing customer maintenance, including backups, cleanups, reindexing, etc. Consult with the PlainID Professional Services team for guidelines and assistance.
    • Policy Changes - Customers should be aware of any Policy changes that require access recalculation. Necessary changes should be made, and a full recalculation and generation of access files triggered as needed, at their discretion (see the example below).
    • Data Source Changes - Customers should be aware of any system or data source changes/updates that may require adjustments to configuration or policies, as well as a full recalculation of access files.
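
    For example, after a system-wide Policy change, a full recalculation can be triggered through the Jobs API using the mode override and the optional force parameter described above (host, port, and job name are placeholders):
    # Trigger a full recalculation of job1, overriding any configured mode
    curl -X PUT 'http://localhost:8080/1.0/jobs/job1?force=true' \
      -H 'Content-Type: application/json' \
      -d '{"running": true, "mode": "Full"}'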

    Monitoring

    • Job Monitoring – Customers should monitor job execution status using the Job Execution History API, the operational DB status table, and access decision errors to track major events and errors during execution.
    • PDP Audits and Logs – Since the PDP interacts extensively with this Authorizer, monitoring PDP audits and logs can help identify unusual system behaviors that may require administrative attention.

    Logging

    • Customers should collect and monitor Authorizer logs to observe major events and to assist with troubleshooting.
      • Important log entries:
    Log Entry Example | Description
    ----------------- | -----------
    received job: {"id":"jb1","trigger":"cron","mode":""} | Scheduled job received
    Start Request: PUT /1.0/jobs/jb1 | Job started
    finished to process job: jb1 | Job completed
    starting to populate subjects from flows for job: jb1 | Fetching data from the data source
    starting to calculate access for job: jb1 | PDP processing access
    finished converting flow: jobID: jb1, flowID: flow1 | Flow conversion completed
