Attribute Types
Overview of the three attribute types—Discovered (from scans), Schema (from samples), and Virtual (user-created)—and when to use each.
DataPancake categorizes attributes into three types based on their origin:
Discovered Attributes
Definition: Attributes automatically discovered during the scanning process.
Characteristics:
Created when ATTRIBUTE_CREATE_TYPE is set to 'Discover' in the scan configuration
Represent actual fields found in your source data
Cannot be deleted (only marked inactive)
Form the foundation of your schema discovery
All 7 polymorphic versions are proactively created when first discovered
Use Cases:
Standard schema discovery workflows
When you want to discover all attributes from actual data
Creation Process:
Created during scan process
Attribute record created in datasource_attribute table
All 7 polymorphic versions proactively created in datasource_attribute_polymorphic_version table
Only the version(s) matching the discovered data type are activated (status = 'active')
Non-matching versions are created but remain inactive (status = 'inactive')
Initial status dates set for activated versions
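The creation steps above can be modeled in a few lines. This is a minimal Python sketch, not DataPancake's implementation. The seven names in `POLYMORPHIC_TYPES` are placeholders (the documentation states only that seven versions exist), the dictionaries stand in for rows of the datasource_attribute_polymorphic_version table, and leaving the status date empty for inactive versions is an assumption.

```python
from datetime import date

# Hypothetical placeholder names: the documentation says there are seven
# polymorphic versions but does not name them here.
POLYMORPHIC_TYPES = [f"type_{i}" for i in range(1, 8)]


def create_polymorphic_versions(attribute_name, discovered_types, today=None):
    """Return one version row per polymorphic type for a new attribute.

    Every version is created up front; only those matching a discovered
    data type are activated and given an initial status date.
    """
    today = today or date.today()
    rows = []
    for type_name in POLYMORPHIC_TYPES:
        is_match = type_name in discovered_types
        rows.append({
            "attribute": attribute_name,
            "type": type_name,
            "status": "active" if is_match else "inactive",
            # Assumption: inactive versions get no status date yet.
            "status_date": today if is_match else None,
        })
    return rows


versions = create_polymorphic_versions("customer.email", {"type_3"})
active = [v["type"] for v in versions if v["status"] == "active"]
print(len(versions), active)  # 7 ['type_3']
```

Creating all versions up front means a later scan that finds a different data type only needs to flip an existing version to active rather than add a new row.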
Schema Attributes
Definition: Attributes created from a schema sample rather than full data scanning.
Characteristics:
Created when ATTRIBUTE_CREATE_TYPE is set to 'Schema' in the scan configuration
Based on a schema sample provided during data source creation
Useful for rapid prototyping without scanning full datasets
Can be updated when full scans are performed
All 7 polymorphic versions are proactively created when the attribute is first created
Use Cases:
Quick onboarding with schema samples
Prototyping pipelines before full data is available
Testing configurations with sample schemas
Creation Process:
Similar to Discovered attributes, but based on schema sample rather than full data scan
All 7 polymorphic versions are created upfront
Versions matching the schema sample are activated
Virtual Attributes
Definition: User-created custom attributes that don't exist in the source data.
Characteristics:
Created manually by users through the UI or API
Defined with transformation expressions
Used for derived fields, calculated metrics, and semantic model components
Can be used for filters, metrics, facts, and dimensions in Cortex Analyst semantic models
Only one polymorphic version created (no polymorphism for virtual attributes)
Automatically set to INCLUDE_IN_CODE_GEN = TRUE
Use Cases:
Derived fields - Calculate values from existing attributes (e.g., full_name = first_name || ' ' || last_name)
Semantic model metrics - Create aggregations for Cortex Analyst (e.g., total_revenue = SUM(order_total))
Semantic model filters - Define filterable dimensions
Business logic - Add computed columns not in source data
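To make the two example expressions concrete, the sketch below evaluates them against a few sample rows. It uses Python's built-in sqlite3 module purely as a stand-in SQL engine; the `orders` table and its rows are invented for illustration and are not DataPancake or Snowflake objects.

```python
import sqlite3

# In-memory SQLite is only a stand-in SQL engine; the table, columns, and
# rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (first_name TEXT, last_name TEXT, order_total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Ada", "Lovelace", 120.0), ("Ada", "Lovelace", 30.5), ("Alan", "Turing", 75.0)],
)

# Derived field: a row-level expression over existing attributes.
full_names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT first_name || ' ' || last_name AS full_name "
        "FROM orders ORDER BY full_name"
    )
]

# Metric: an aggregate expression over an existing attribute.
total_revenue = conn.execute("SELECT SUM(order_total) FROM orders").fetchone()[0]

print(full_names)     # ['Ada Lovelace', 'Alan Turing']
print(total_revenue)  # 225.5
```

The derived field produces one value per row, while the metric collapses many rows into one value, which is why the two are listed as separate use cases.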
Creating Virtual Attributes:
Must specify: attribute name, source data type, Snowflake data type
Can optionally specify: parent array, transformation expression, W_QUESTION_CATEGORY (for semantic models)
Created via UI or API (sp_upsert_virtual_datasource_attribute)
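The required and optional inputs listed above can be checked before a request is submitted. The following is a hedged Python sketch: `VirtualAttributeRequest` and its field names are hypothetical and do not reflect the actual parameter names or signature of sp_upsert_virtual_datasource_attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualAttributeRequest:
    """Hypothetical container for the inputs listed above.

    Field names are illustrative; they are not the parameter names of
    sp_upsert_virtual_datasource_attribute.
    """
    # Required inputs
    attribute_name: str
    source_data_type: str
    snowflake_data_type: str
    # Optional inputs
    parent_array: Optional[str] = None
    transformation_expression: Optional[str] = None
    w_question_category: Optional[str] = None

    def missing_required(self):
        """Return the names of required fields that are empty or blank."""
        required = ("attribute_name", "source_data_type", "snowflake_data_type")
        return [name for name in required if not getattr(self, name).strip()]


request = VirtualAttributeRequest(
    attribute_name="full_name",
    source_data_type="str",  # placeholder value, not a documented type name
    snowflake_data_type="VARCHAR",
    transformation_expression="first_name || ' ' || last_name",
)
print(request.missing_required())  # []
```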
Creation Process:
Created via UI or API (sp_upsert_virtual_datasource_attribute)
User specifies name, types, and transformation expression
Single polymorphic version created (no polymorphism for virtual attributes)
Automatically set to INCLUDE_IN_CODE_GEN = TRUE
Comparison
| Feature | Discovered | Schema | Virtual |
| --- | --- | --- | --- |
| Source | Full data scan | Schema sample | User-created |
| Polymorphic Versions | All 7 created | All 7 created | Single version |
| Can Delete | No (mark inactive) | No (mark inactive) | Yes |
| Use Case | Production discovery | Rapid prototyping | Custom logic |
| Update Method | Re-scan | Re-scan or update | UI/API update |
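The "Can Delete" row is the one most likely to matter in automation, so here is a small Python sketch of it as a lookup. The function name and return strings are illustrative assumptions, not part of any DataPancake API.

```python
def removal_action(attribute_type):
    """Return how an attribute of the given type is retired.

    Mirrors the comparison above: Discovered and Schema attributes can only
    be marked inactive, while Virtual attributes can be deleted.
    """
    actions = {
        "Discovered": "mark inactive",
        "Schema": "mark inactive",
        "Virtual": "delete",
    }
    if attribute_type not in actions:
        raise ValueError(f"Unknown attribute type: {attribute_type!r}")
    return actions[attribute_type]


print(removal_action("Virtual"))     # delete
print(removal_action("Discovered"))  # mark inactive
```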