Attributes
Overview
Attributes are the fundamental building blocks of DataPancake's schema discovery and pipeline generation process. When DataPancake performs a deep, recursive scan of your semi-structured data source, it discovers every attribute that has ever occurred at any level of nesting depth. This comprehensive discovery includes:
All nested structures - Attributes at any depth, regardless of how deeply nested
Embedded/stringified JSON - JSON blobs stored as escaped strings within string fields
Polymorphic data - Attributes where the same path contains different data types across different records
Attributes serve as the metadata foundation for generating relational Dynamic Tables and Views from your semi-structured data. Each attribute represents a potential column in your flattened, normalized output.
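The recursive discovery described above can be pictured with a small Python sketch. This is not DataPancake's implementation: the `discover` function, the dotted path format, and the `[]` suffix for array elements are hypothetical names chosen for illustration. It assumes that any string which parses to a JSON object or array counts as embedded JSON.

```python
import json
from collections import defaultdict


def discover(value, path="", found=None):
    """Recursively record every attribute path and the Python types seen at it."""
    if found is None:
        found = defaultdict(set)
    if isinstance(value, str):
        # Assumption: a string that parses to a JSON object or array is
        # embedded JSON and is scanned like any other nested structure.
        try:
            parsed = json.loads(value)
        except ValueError:
            parsed = None
        if isinstance(parsed, (dict, list)):
            value = parsed
    if path:
        found[path].add(type(value).__name__)
    if isinstance(value, dict):
        for key, child in value.items():
            discover(child, f"{path}.{key}" if path else key, found)
    elif isinstance(value, list):
        for item in value:
            # Hypothetical convention: "[]" marks the elements of an array.
            discover(item, f"{path}[]", found)
    return found


records = [
    {"customer": {"address": "1 Main St"}},
    {"customer": {"address": {"city": "Austin"}}},
    {"payload": "{\"id\": 7}"},
]

attributes = defaultdict(set)
for record in records:
    discover(record, found=attributes)

for attr_path in sorted(attributes):
    print(attr_path, sorted(attributes[attr_path]))
```

In this sample, `customer.address` is recorded with two types (a string in one record, an object in another), which is the polymorphic case, and `payload.id` is found inside the stringified JSON.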
Mental Model
Think of attributes as the "DNA" of your semi-structured data. Just as DNA contains all the genetic information needed to build an organism, attributes contain all the structural information needed to transform your JSON/XML into relational tables.
Key Concepts:
One Attribute, Multiple Versions - A single attribute path (like customer.address) can have multiple polymorphic versions if it appears as different data types (string, object, array) in different records
Proactive Polymorphic Accounting - DataPancake proactively creates all 7 possible polymorphic versions for every attribute upfront, not reactively when polymorphism is detected
Discovery vs. Configuration - Attributes are discovered automatically, but you configure how they're transformed into SQL
Lifecycle Management - Attributes can be active or inactive, and versions are activated/deactivated as schema evolves
Critical Understanding: When DataPancake first discovers an attribute, it doesn't just create a version for the data type it found. Instead, it immediately creates all 7 possible polymorphic versions and activates only the version(s) matching the discovered data type. If a different data type later appears at the same path, its version already exists and only needs to be activated, so schema evolution does not require restructuring the attribute.
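A minimal sketch of this proactive model, assuming a simple in-memory representation: the `Attribute` class and its `observe` and `active_versions` methods are hypothetical and only illustrate the "create all seven, activate what is seen" idea, not how DataPancake stores versions.

```python
from dataclasses import dataclass, field

# The seven polymorphic variations listed in this documentation.
POLYMORPHIC_TYPES = (
    "str", "int", "float", "bool",
    "object", "array_primitive", "array_object",
)


@dataclass
class Attribute:
    path: str
    # All seven versions exist from the moment the attribute is created;
    # each entry maps a type name to an "active" flag, initially False.
    versions: dict = field(
        default_factory=lambda: {t: False for t in POLYMORPHIC_TYPES}
    )

    def observe(self, type_name: str) -> None:
        """Activate the existing version for a type seen in the data."""
        if type_name not in self.versions:
            raise ValueError(f"unknown polymorphic type: {type_name}")
        self.versions[type_name] = True

    def active_versions(self) -> list:
        return [t for t, active in self.versions.items() if active]


attr = Attribute("customer.address")
attr.observe("str")        # first scan finds a string
first_scan = attr.active_versions()

attr.observe("object")     # a later scan finds an object at the same path
later_scan = attr.active_versions()

print(len(attr.versions), first_scan, later_scan)
```

Observing a new type changes only a flag on a version that already exists; no new version has to be created when polymorphism appears.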
What You'll Learn
This Attributes section covers:
Attribute Discovery - How attributes are discovered during scanning
Attribute Types - Discovered, Schema, and Virtual attributes
Polymorphic Versions - How DataPancake handles the 7 polymorphic variations
Attribute Metadata - All metadata fields that control transformation
Arrays & Nested Structures - How arrays are handled
Attribute Lifecycle - Creation, versioning, and schema evolution
Attributes in Code Generation - How attributes become SQL columns
Best Practices - Configuration recommendations
Integration & API - Programmatic access
Quick Reference
Attribute Types:
Discovered - Found during scanning
Schema - Created from schema samples
Virtual - User-created custom attributes
The 7 Polymorphic Variations:
str (String)
int (Integer)
float (Decimal)
bool (Boolean)
object
array_primitive
array_object
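As a rough illustration of how a parsed JSON value might map onto these seven names, here is a hypothetical `classify` helper in Python. The rules for mixed arrays, empty arrays, and nulls are assumptions made for this sketch; this page does not say how DataPancake treats those cases.

```python
def classify(value):
    """Map a parsed JSON value to one of the seven variation names."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Assumption: an array containing any object is array_object;
        # every other array, including an empty one, is array_primitive.
        if any(isinstance(item, dict) for item in value):
            return "array_object"
        return "array_primitive"
    # Assumption: null (None) has no variation of its own in this sketch.
    return None


samples = [True, 3, 2.5, "x", {"a": 1}, [1, "a"], [{"a": 1}]]
labels = [classify(sample) for sample in samples]
print(labels)
```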
Key Tables:
core.datasource_attribute - Core attribute information
core.datasource_attribute_polymorphic_version - Version-specific metadata