Source Schema

Mostly read-only metadata that represents what DataPancake found in your source data during scanning: attribute paths, nesting levels, data types, structure information, and sample values.

Overview

Source Schema metadata records what DataPancake discovered during scanning and provides structure, type, and content information for each attribute. It is read-only except for RECORD_STATUS.


Source Schema Fields

Path Information

Attribute Path (ATTRIBUTE_PATH)

  • The full path to the attribute in the source data

  • Example: customer.contact.email

  • Used to reference the attribute in source queries

  • Read-only - Set during discovery

Attribute Path Embedded (ATTRIBUTE_PATH_EMBEDDED)

  • Path for attributes found within embedded/stringified JSON

  • Tracks nested JSON structures within string fields

  • Example: For JSON stored as a string, this tracks the path within that JSON

  • Read-only - Set during discovery

Attribute Name (ATTRIBUTE_NAME)

  • The leaf name of the attribute

  • Example: For path customer.contact.email, the name is email

  • Read-only - Extracted from attribute path
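For illustration, the leaf name can be derived from a dot-separated path by taking its last segment. This Python sketch is not DataPancake's implementation. The helper `attribute_name` is hypothetical, and it assumes that `.` separates levels and that array elements appear as `name[n]` (as in the `orders[0].order_id` example under Parent Array).

```python
import re


def attribute_name(attribute_path: str) -> str:
    """Return the leaf name of an attribute path (hypothetical helper).

    Assumes '.' separates nesting levels and that array elements are
    written as 'name[n]'; a trailing index is stripped from the leaf.
    """
    leaf = attribute_path.split(".")[-1]
    return re.sub(r"\[\d+\]$", "", leaf)


print(attribute_name("customer.contact.email"))  # email
print(attribute_name("orders[0].order_id"))      # order_id
```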


Structure Information

Attribute Level (ATTRIBUTE_LEVEL)

  • The nesting depth of the attribute

  • Root-level attributes have level 0

  • Each additional level of nesting adds 1

  • Example: customer = level 0, customer.contact = level 1, customer.contact.email = level 2

  • Read-only - Calculated during discovery
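Under the same assumption of plain dot-separated object paths, the level is the number of separators in the path. This hypothetical helper reproduces the example above; how DataPancake counts array segments toward the level is not covered here.

```python
def attribute_level(attribute_path: str) -> int:
    """Nesting depth of a dot-separated path: 0 at the root (hypothetical helper)."""
    return attribute_path.count(".")


for path in ("customer", "customer.contact", "customer.contact.email"):
    print(path, attribute_level(path))
# customer 0
# customer.contact 1
# customer.contact.email 2
```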

Attribute Order (ATTRIBUTE_ORDER)

  • Ordering information for attributes at the same level

  • Used for consistent presentation in the UI

  • Read-only - Set during discovery

Parent Object (PARENT_OBJECT)

  • The parent object path containing this attribute

  • Example: For customer.contact.email, parent object is customer.contact

  • Empty for root-level attributes

  • Read-only - Set during discovery
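A hypothetical helper for the parent object, again assuming dot-separated paths: everything before the last `.`, or an empty string for a root-level attribute.

```python
def parent_object(attribute_path: str) -> str:
    """Path of the containing object, or '' for a root-level attribute.

    Hypothetical helper assuming '.'-separated paths.
    """
    parent, _sep, _leaf = attribute_path.rpartition(".")
    return parent


print(parent_object("customer.contact.email"))  # customer.contact
print(repr(parent_object("customer")))          # ''
```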

Parent Array (PARENT_ARRAY)

  • The parent array path if this attribute is within an array

  • Example: For orders[0].order_id, parent array is orders

  • Used for foreign key relationship configuration

  • Empty for non-array attributes

  • Read-only - Set during discovery
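One way to find the nearest enclosing array is sketched below, assuming array elements are written as `name[n]` as in the example above. It is illustrative only: the helper is hypothetical, and reporting a nested array path with its indices removed (for example `customer.orders.items`) is an assumption about notation, not documented behavior.

```python
import re

# Matches an array index such as '[0]' (assumed path notation).
INDEX = re.compile(r"\[\d+\]")


def parent_array(attribute_path: str) -> str:
    """Nearest enclosing array path, or '' if the attribute is not in an array."""
    segments = attribute_path.split(".")
    # Walk the ancestors (every segment except the leaf) from innermost outward.
    for i in range(len(segments) - 2, -1, -1):
        if INDEX.search(segments[i]):
            return INDEX.sub("", ".".join(segments[: i + 1]))
    return ""


print(parent_array("orders[0].order_id"))            # orders
print(repr(parent_array("customer.contact.email")))  # ''
```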

Parent Array Embedded (PARENT_ARRAY_EMBEDDED)

  • Parent array path for attributes within embedded JSON arrays

  • Tracks arrays within stringified JSON

  • Read-only - Set during discovery


Type Information

Source Data Type (SOURCE_DATA_TYPE)

  • Inferred data type from source data

  • Values: 'str', 'int', 'float', 'bool', 'object', 'array', 'null'

  • Read-only - Inferred during scanning
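To make the type labels concrete, this Python sketch maps values parsed with the standard `json` module to the labels listed above. It illustrates the mapping only and is not DataPancake's inference code. Note that `bool` must be tested before `int`, because Python treats `bool` as a subclass of `int`.

```python
import json
from typing import Any


def infer_source_type(value: Any) -> str:
    """Map a parsed JSON value to a SOURCE_DATA_TYPE label (illustrative)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must precede the int check
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


record = json.loads(
    '{"email": "a@example.com", "age": 42, "price": 9.5, "vip": true,'
    ' "tags": [], "contact": {}, "fax": null}'
)
print({key: infer_source_type(value) for key, value in record.items()})
# {'email': 'str', 'age': 'int', 'price': 'float', 'vip': 'bool',
#  'tags': 'array', 'contact': 'object', 'fax': 'null'}
```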

Polymorphic Attribute Name (POLYMORPHIC_ATTRIBUTE_NAME)

  • Unique name for this polymorphic version of the attribute (an attribute found with more than one data type gets one version per type)

  • Format: {attribute_name}_{type} or {attribute_name}_array_{array_type}

  • Examples: email_str, price_float, orders_array_object

  • Read-only - Generated based on source data type
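A minimal sketch of the documented naming format. The helper is hypothetical, and how a mixed 'primitive,object' array type is rendered in the name is not specified here, so the sketch simply inserts whatever `array_type` string it is given.

```python
from typing import Optional


def polymorphic_attribute_name(
    attribute_name: str, source_type: str, array_type: Optional[str] = None
) -> str:
    """Build {attribute_name}_{type} or {attribute_name}_array_{array_type}."""
    if source_type == "array":
        if array_type is None:
            raise ValueError("array attributes need an array_type")
        return f"{attribute_name}_array_{array_type}"
    return f"{attribute_name}_{source_type}"


print(polymorphic_attribute_name("email", "str"))               # email_str
print(polymorphic_attribute_name("price", "float"))             # price_float
print(polymorphic_attribute_name("orders", "array", "object"))  # orders_array_object
```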

Array Type (ARRAY_TYPE)

  • For array attributes, the type of array elements

  • Values: 'object', 'primitive', 'primitive,object'

  • Only applicable when SOURCE_DATA_TYPE = 'array'

  • Read-only - Determined during discovery

Array Primitive Type (ARRAY_PRIMITIVE_TYPE)

  • For primitive arrays, the data type of array elements

  • Values: 'str', 'int', 'float', 'bool'

  • Only applicable when ARRAY_TYPE = 'primitive' or ARRAY_TYPE = 'primitive,object'

  • Read-only - Determined during discovery
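The following sketch classifies a sample array into ARRAY_TYPE and ARRAY_PRIMITIVE_TYPE values using the categories above. It is hypothetical and simplified: it ignores nulls and nested arrays, reports only the first primitive type it sees, and returns `None` for an empty array. None of these choices is documented DataPancake behavior.

```python
from typing import Optional, Tuple


def classify_array(elements: list) -> Tuple[Optional[str], Optional[str]]:
    """Return (ARRAY_TYPE, ARRAY_PRIMITIVE_TYPE) for a sample array (hypothetical).

    dict elements count as objects; str/int/float/bool elements count as
    primitives. Nulls and nested lists are ignored.
    """
    has_object = False
    primitive: Optional[str] = None
    for item in elements:
        if isinstance(item, dict):
            has_object = True
        elif isinstance(item, (str, int, float)) and primitive is None:
            # bool is a subclass of int, so True/False land here too;
            # type(...).__name__ yields 'str', 'int', 'float' or 'bool'.
            primitive = type(item).__name__
    if has_object and primitive is not None:
        return "primitive,object", primitive
    if has_object:
        return "object", None
    if primitive is not None:
        return "primitive", primitive
    return None, None  # empty (or all-null) array: nothing to classify


print(classify_array([{"id": 1}, {"id": 2}]))  # ('object', None)
print(classify_array(["a", "b"]))              # ('primitive', 'str')
print(classify_array([3, {"id": 1}]))          # ('primitive,object', 'int')
```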


Content Information

Sample Value (SAMPLE_VALUE)

  • A sample value from the source data

  • For string types, the placeholder "string value" is stored by default instead of the actual data, for privacy and security

  • Helps users understand the data content

  • Read-only - Captured during scanning
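A sketch of the documented string default only. The helper is hypothetical, and keeping non-string values unchanged is an assumption.

```python
from typing import Any

STRING_PLACEHOLDER = "string value"


def sample_value(value: Any) -> Any:
    """SAMPLE_VALUE to record for a scanned value (hypothetical helper).

    Strings are replaced with the placeholder instead of being stored;
    other types are kept unchanged here, which is an assumption.
    """
    return STRING_PLACEHOLDER if isinstance(value, str) else value


print(sample_value("jane@example.com"))  # string value
print(sample_value(42))                  # 42
```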

Has Embedded Content (HAS_EMBEDDED_CONTENT)

  • Boolean indicating whether this attribute contains embedded (stringified) JSON

  • When TRUE, DataPancake recursively parses the JSON string to discover the nested attributes it contains

  • Read-only - Detected during scanning
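The sketch below illustrates the idea behind HAS_EMBEDDED_CONTENT: a string value that parses as a JSON object or array is flagged, and its contents are walked recursively to discover nested attributes. It is an assumption-laden illustration, not DataPancake's parser. The helpers are hypothetical, arrays are not descended into, and the dotted notation used for paths inside embedded JSON may differ from what ATTRIBUTE_PATH_EMBEDDED actually stores.

```python
import json
from typing import Any, List, Optional, Tuple


def parse_embedded(value: Any) -> Optional[Any]:
    """Parsed JSON if value is a string holding a JSON object or array, else None."""
    if not isinstance(value, str):
        return None
    text = value.strip()
    # Only objects/arrays count; strings like "42" or "true" stay plain strings.
    if not text.startswith(("{", "[")):
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def discover(value: Any, prefix: str = "") -> List[Tuple[str, bool]]:
    """Return (path, has_embedded_content) for every attribute under an object.

    Stringified JSON is parsed and walked recursively, so JSON nested
    inside embedded JSON is found too. Arrays are not descended into.
    """
    found: List[Tuple[str, bool]] = []
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            embedded = parse_embedded(child)
            found.append((path, embedded is not None))
            found.extend(discover(child if embedded is None else embedded, path))
    return found


record = {"id": 7, "payload": '{"status": "ok", "meta": {"retries": 2}}'}
for path, has_embedded in discover(record):
    print(path, has_embedded)
# id False
# payload True
# payload.status False
# payload.meta False
# payload.meta.retries False
```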


Attribute Classification

Attribute Type (ATTRIBUTE_TYPE)

  • The origin of the attribute

  • Values:

    • 'Discovered' - Found during scanning

    • 'Schema' - Created from a schema sample

    • 'Virtual' - User-created custom attribute

  • Read-only - Set based on how the attribute was created

Attribute Schema Component Type (ATTRIBUTE_SCHEMA_COMPONENT_TYPE)

  • Classifies the schema component type

  • Used for internal organization and categorization

  • Read-only - Set during discovery

Version Status Date (VERSION_STATUS_DATE)

  • Timestamp when the polymorphic version was created or last activated

  • Used for tracking schema evolution

  • Read-only - Set when the version is created or activated


Status Control

Attribute Record Status (RECORD_STATUS)

  • Controls whether the attribute is active or inactive

  • Values: 'active', 'inactive'

  • Editable - The only editable field in the Source Schema tab

  • Active attributes are included in code generation when INCLUDE_IN_CODE_GEN = TRUE

  • Inactive attributes are excluded from code generation
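The selection rule above amounts to a simple filter: an attribute is picked up only if it is active and flagged for code generation. In this sketch, `SourceAttribute` and `code_gen_attributes` are hypothetical in-memory stand-ins, not a DataPancake API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceAttribute:
    """Hypothetical stand-in for one Source Schema row."""
    polymorphic_attribute_name: str
    record_status: str         # 'active' or 'inactive'
    include_in_code_gen: bool  # INCLUDE_IN_CODE_GEN


def code_gen_attributes(attrs: List[SourceAttribute]) -> List[str]:
    """Names that code generation would include: active AND flagged for code gen."""
    return [
        a.polymorphic_attribute_name
        for a in attrs
        if a.record_status == "active" and a.include_in_code_gen
    ]


attrs = [
    SourceAttribute("email_str", "active", True),
    SourceAttribute("fax_str", "inactive", True),   # deactivated: excluded
    SourceAttribute("notes_str", "active", False),  # not flagged: excluded
]
print(code_gen_attributes(attrs))  # ['email_str']
```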


Virtual Attributes

User-created custom attributes (created via the UI or sp_upsert_virtual_datasource_attribute).

Required fields:

  • Attribute Name (no spaces)

  • Source Data Type

  • Snowflake Data Type

  • Transformation Expression (SQL)

Optional fields:

  • Parent Array (for array-level virtual attributes)

  • W_QUESTION_CATEGORY (for Cortex Analyst semantic models)

  • Description

Characteristics:

  • Single polymorphic version (no polymorphism)

  • Automatically set to INCLUDE_IN_CODE_GEN = TRUE

  • ATTRIBUTE_TYPE = 'Virtual'

  • Can reference other attributes using {attribute_name} placeholder in expressions
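One way to picture placeholder expansion: each {attribute_name} in the transformation expression is replaced with the SQL that produces the referenced attribute. This is a hypothetical sketch, not DataPancake's documented mechanism; the helper, the mapping, and the Snowflake-style `raw:price::FLOAT` expressions are illustrative placeholders.

```python
import re
from typing import Mapping

# Matches {attribute_name}; names are assumed to be identifier-like (no spaces).
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_expression(expression: str, attribute_sql: Mapping[str, str]) -> str:
    """Replace each {name} with the SQL for that attribute (hypothetical helper).

    A name missing from attribute_sql raises KeyError instead of being left
    in place, so typos surface early.
    """
    return PLACEHOLDER.sub(lambda m: attribute_sql[m.group(1)], expression)


sql = expand_expression(
    "{price} * {quantity}",
    {"price": "raw:price::FLOAT", "quantity": "raw:quantity::INT"},
)
print(sql)  # raw:price::FLOAT * raw:quantity::INT
```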


Common Scenarios

Understanding nested structure:

  • Use ATTRIBUTE_LEVEL (0 = root, increments per level) and PARENT_OBJECT to understand nesting

Identifying embedded JSON:

  • Check HAS_EMBEDDED_CONTENT = TRUE for stringified JSON

  • ATTRIBUTE_PATH_EMBEDDED shows nested structure within string fields

Working with arrays:

  • ARRAY_TYPE indicates 'object', 'primitive', or 'primitive,object'

  • ARRAY_PRIMITIVE_TYPE (if applicable) shows element type

  • PARENT_ARRAY shows containing array for nested attributes

Deactivating attributes:

  • Set RECORD_STATUS = 'inactive' to exclude from code generation without deleting

  • Useful for temporarily excluding an attribute, preserving its history, or testing
