Attribute Metadata Details

Complete reference for attribute metadata, covering discovered source schema fields and configurable extended metadata that controls SQL transformation, security policies, and code generation.

Overview

Attributes contain extensive metadata that controls how they're transformed into SQL. This metadata is stored in two main tables:

  1. core.datasource_attribute - Core attribute information

  2. core.datasource_attribute_polymorphic_version - Version-specific metadata

DataPancake discovers the source schema attributes and generates metadata based on those attributes, then creates additional extended metadata to allow you to provide additional configuration information for flattening and normalization, security policy integration, schema transformation, and data enrichment.


Source Schema vs. Extended Metadata

Source Schema Metadata:

  • Automatically discovered during scanning

  • Represents what DataPancake found in your data

  • Includes paths, data types, sample values, and structure information

  • Generally read-only (except for record status)

Extended Metadata:

  • User-configurable fields for transformation

  • Controls how attributes become SQL columns

  • Includes data type mappings, transformations, security policies, and semantic layer configurations

  • Fully editable to customize pipeline behavior


Metadata Organization

The Pipeline Designer organizes attribute metadata into logical sections:

  • Source Schema Metadata - Discovered attributes, paths, types, and structure information

  • Destination Schema Metadata - Snowflake data type configuration and key identification

  • Transformation & Security Metadata - Column naming, transformations, semantic layer, and security policies

  • Arrays & Nested Structures Metadata - Array-specific configuration and relationships

  • Foreign Key Metadata - Foreign key relationships for array tables


Core Attribute Fields

These fields are stored in core.datasource_attribute and apply to all polymorphic versions of an attribute:

Attribute Path (ATTRIBUTE_PATH)

  • The full path to the attribute in the source data

  • Example: customer.contact.email

  • Used to reference the attribute in source queries

Attribute Path Embedded (ATTRIBUTE_PATH_EMBEDDED)

  • Path for attributes found within embedded/stringified JSON

  • Tracks nested JSON structures within string fields

Attribute Name (ATTRIBUTE_NAME)

  • The leaf name of the attribute

  • Example: For path customer.contact.email, the name is email

Attribute Level (ATTRIBUTE_LEVEL)

  • The nesting depth of the attribute

  • Root level attributes have level 0

  • Each nested level increments the count

Attribute Order (ATTRIBUTE_ORDER)

  • Ordering information for attributes at the same level

  • Used for consistent presentation

Parent Object (PARENT_OBJECT)

  • The parent object path containing this attribute

  • Example: For customer.contact.email, parent object is customer.contact

Parent Array (PARENT_ARRAY)

  • The parent array path if this attribute is within an array

  • Example: For orders[0].order_id, parent array is orders

  • Used for foreign key relationship configuration

Parent Array Embedded (PARENT_ARRAY_EMBEDDED)

  • Parent array path for attributes within embedded JSON arrays

Record Status (RECORD_STATUS)

  • 'active' - Attribute is active and can be used

  • 'inactive' - Attribute is deactivated

Attribute Type (ATTRIBUTE_TYPE)

  • 'Discovered' - Found during scanning

  • 'Schema' - Created from schema sample

  • 'Virtual' - User-created custom attribute

Attribute Schema Component Type (ATTRIBUTE_SCHEMA_COMPONENT_TYPE)

  • Classifies the schema component type

  • Used for internal organization

Is Primary Key (IS_PRIMARY_KEY)

  • Boolean indicating if this attribute is part of the primary key

  • Used for metadata table generation and relationship tracking

Date Active (VERSION_STATUS_DATE)

  • Timestamp when the attribute was first discovered or created

  • Used for tracking attribute lifecycle


Polymorphic Version Fields

These fields are stored in core.datasource_attribute_polymorphic_version and are specific to each polymorphic version:

Type Information:

  • SOURCE_DATA_TYPE - The inferred data type from source data

  • POLYMORPHIC_ATTRIBUTE_NAME - Unique name for this version

  • VERSION_STATUS - Active or inactive status

  • IS_ARRAY - Boolean indicating if this version is an array type

  • ARRAY_TYPE - Object, primitive, or polymorphic array type

  • ARRAY_PRIMITIVE_TYPE - Data type of primitive array elements

  • HAS_EMBEDDED_CONTENT - Boolean for embedded/stringified JSON

  • SAMPLE_VALUE - Sample value from source data

Snowflake Data Type Configuration:

  • DATA_PLATFORM_DATA_TYPE - Snowflake data type

  • DATA_PLATFORM_DATA_TYPE_PRECISION - Precision for numeric types

  • DATA_PLATFORM_DATA_TYPE_SCALE - Scale for numeric types

  • DATA_PLATFORM_DATA_TYPE_DATETIMEFORMAT - DateTime format strings

  • USE_DATETIME_FORMAT - Boolean for using datetime formats

Transformation & Naming:

  • ALIAS_NAME - Custom column name alias

  • CODE_GENERATED_COLUMN_NAME - Final column name in generated SQL

  • NULL_VALUE_EXPRESSION - SQL expression for null handling

  • TRANSFORMATION_TYPE - No Transformation or SQL Expression

  • TRANSFORMATION_EXPRESSION - Custom SQL transformation

  • TRANSFORMATION_EXPRESSION_COMMENT - Documentation for transformation

Semantic Layer:

  • INCLUDE_IN_SEMANTIC_LAYER - Boolean for including in views

  • SEMANTIC_LAYER_EXPRESSION - Additional transformation for views

  • SEMANTIC_LAYER_EXPRESSION_COMMENT - Documentation for view transformation

  • SEMANTIC_LAYER_ALIAS_NAME - Custom alias for views

Security:

  • INCLUDE_IN_SECURITY_ROW_LEVEL_POLICY - Boolean for row-level security

  • MASKING_POLICY_NAME - Name of masking policy

  • MASKING_POLICY_PARAMETERS - Parameters for masking policy

Schema Consolidation:

  • SCHEMA_INSERT_REGULAR_EXPRESSION_SEARCH - Regex pattern for consolidation

  • SCHEMA_INSERT_SQL_EXPRESSION - SQL expression for consolidation

Code Generation:

  • INCLUDE_IN_CODE_GEN - Boolean for including in code generation

  • PARENT_INCLUDE_IN_CODE_GEN - Boolean for parent inclusion

Additional Metadata:

  • IS_ENUM - Boolean for enum values

  • HAS_ALL_UNIQUE_VALUES - Boolean for unique values

  • HAS_ALL_NULL_VALUES - Boolean for null values

  • ADD_TO_CLUSTER_BY - Boolean for CLUSTER BY clause


Metadata Workflow

  1. Discovery - Source schema metadata is automatically discovered during scanning

  2. Configuration - Extended metadata is configured in the Pipeline Designer

  3. Code Generation - Metadata drives SQL code generation for Dynamic Tables and Views

  4. Materialization - Generated SQL creates materialized tables with configured transformations

  5. Semantic Layer - Additional transformations applied in views for analytics


Quick Reference

Editable in Source Schema Tab:

  • Attribute Record Status

Editable in Destination Schema Tab:

  • Snowflake Data Type

  • Precision and Scale

  • DateTime Format

  • Use DateTime Format

  • Primary Key

  • Contains Enum Values

  • Unique Values

Editable in Transformation/Security Tab:

  • Materialized Alias

  • Null Value Expression

  • Transformation Type and Expression

  • Schema Consolidation fields

  • Include in View

  • View Transformation Expression

  • View Alias

  • Row Access Policy

  • Masking Policy

Editable in Arrays Section:

  • Include in Code Gen

  • Materialized Table Alias Name

  • View Alias Name

  • Relationship Name, Description, Type, Join Type

  • View Row Access Policy

Editable in Foreign Keys Section:

  • Parent Attribute Name

  • Foreign Key Column Alias

  • Add to Row Access Policy

Last updated

Was this helpful?