Attributes

Overview

Attributes are the fundamental building blocks of DataPancake's schema discovery and pipeline generation process. When DataPancake performs a deep, recursive scan of your semi-structured data source, it discovers every attribute that has ever occurred in the data, at any nesting depth. This discovery includes:

  • Nested structures - Attributes at every level of nesting, however deep

  • Embedded/stringified JSON - JSON blobs stored as escaped strings within string fields

  • Polymorphic data - Attributes where the same path contains different data types across different records
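The discovery pass described above can be pictured as a recursive walk over each record. The sketch below is a minimal illustration under stated assumptions, not DataPancake's scanner. `discover_attributes` and its helpers are hypothetical names. Paths are written with dots, and array elements use a made-up `[]` marker. Any string that starts with `{` or `[` and parses as JSON is treated as embedded JSON. The sketch records Python type names rather than the 7 variation names.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def discover_attributes(records: Iterable[dict]) -> Dict[str, Set[str]]:
    """Return every attribute path seen in `records` and the JSON types found there."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        _descend(record, "", found)
    return dict(found)


def _record(value: Any, path: str, found: Dict[str, Set[str]]) -> None:
    # Note the type seen at this path; several types at one path = polymorphism.
    found[path].add(type(value).__name__)
    _descend(value, path, found)


def _descend(value: Any, path: str, found: Dict[str, Set[str]]) -> None:
    if isinstance(value, dict):
        for key, child in value.items():
            _record(child, f"{path}.{key}" if path else key, found)
    elif isinstance(value, list):
        # All elements of an array share one path; "[]" is a hypothetical marker.
        for item in value:
            _record(item, f"{path}[]", found)
    elif isinstance(value, str) and value.lstrip()[:1] in ("{", "["):
        # Possibly stringified JSON: parse it and keep descending under the same path.
        try:
            _descend(json.loads(value), path, found)
        except json.JSONDecodeError:
            pass  # an ordinary string that merely starts with a bracket


records = [
    {"id": 1, "customer": {"address": "12 Main St"}},
    {"id": 2, "customer": {"address": {"city": "Oslo"}}},
    {"id": 3, "payload": '{"status": "ok"}', "tags": ["a", "b"]},
]
for path, types in sorted(discover_attributes(records).items()):
    print(path, sorted(types))
```

With these sample records, `customer.address` is reported with both `str` and `dict`, which is the polymorphic case. `payload.status` is found inside the stringified JSON held in `payload`.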

Attributes serve as the metadata foundation for generating relational Dynamic Tables and Views from your semi-structured data. Each attribute represents a potential column in your flattened, normalized output.
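To make "each attribute represents a potential column" concrete, here is a hypothetical sketch that turns (path, data type) pairs into column expressions for a flat SELECT. It assumes Snowflake-style semi-structured syntax, with the raw data in a VARIANT column named `raw`. The `SQL_TYPE` mapping, the alias naming scheme, and the function names are illustrative assumptions, not the code DataPancake generates.

```python
from typing import Iterable, Tuple

# Hypothetical mapping from variation name to a Snowflake SQL type (assumption).
SQL_TYPE = {
    "str": "VARCHAR",
    "int": "NUMBER",
    "float": "FLOAT",
    "bool": "BOOLEAN",
    "object": "OBJECT",
    "array_primitive": "ARRAY",
    "array_object": "ARRAY",
}


def column_expression(source: str, path: str, variation: str) -> str:
    """One attribute version -> one column: extract the path, cast it, alias it."""
    alias = f"{path.replace('.', '_')}_{variation}"  # hypothetical naming scheme
    return f"{source}:{path}::{SQL_TYPE[variation]} AS {alias}"


def select_statement(source: str, table: str, columns: Iterable[Tuple[str, str]]) -> str:
    """Build a flat SELECT with one column per (path, variation) pair."""
    cols = ",\n    ".join(column_expression(source, p, v) for p, v in columns)
    return f"SELECT\n    {cols}\nFROM {table}"


print(select_statement("raw", "orders_raw", [("id", "int"), ("customer.address", "str")]))
```

The sketch ignores key quoting, array flattening, and guards for values whose type does not match the column. How DataPancake itself turns attributes into SQL columns is the subject of Attributes in Code Generation.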


Mental Model

Think of attributes as the "DNA" of your semi-structured data. Just as DNA contains all the genetic information needed to build an organism, attributes contain all the structural information needed to transform your JSON/XML into relational tables.

Key Concepts:

  1. One Attribute, Multiple Versions - A single attribute path (like customer.address) can have multiple polymorphic versions if it appears as different data types (string, object, array) in different records

  2. Proactive Polymorphic Accounting - DataPancake creates all 7 possible polymorphic versions for every attribute up front, rather than waiting until polymorphism is detected

  3. Discovery vs. Configuration - Attributes are discovered automatically, but you configure how they're transformed into SQL

  4. Lifecycle Management - Attributes can be active or inactive, and versions are activated or deactivated as the schema evolves

Critical Understanding: When DataPancake first discovers an attribute, it does not create a version only for the data type it found. It immediately creates all 7 possible polymorphic versions and activates only the version(s) matching the discovered data type. If a different data type later appears at the same path, the matching version already exists and only needs to be activated. Schema evolution therefore never requires restructuring the attribute's metadata after the fact.
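The create-all-then-activate behavior can be modeled in a few lines. This is a minimal sketch, not DataPancake's data model. The `Attribute` and `PolymorphicVersion` classes, their fields, and the method names are hypothetical, and the rules for when a version is deactivated are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The 7 polymorphic variations, in the order the documentation lists them.
VARIATIONS = ("str", "int", "float", "bool", "object", "array_primitive", "array_object")


@dataclass
class PolymorphicVersion:
    variation: str
    active: bool = False


@dataclass
class Attribute:
    path: str
    versions: Dict[str, PolymorphicVersion] = field(default_factory=dict)
    active: bool = True  # attribute-level active/inactive flag

    @classmethod
    def discover(cls, path: str, observed: str) -> "Attribute":
        # Create all 7 versions up front, then activate only the observed one.
        attr = cls(path, {v: PolymorphicVersion(v) for v in VARIATIONS})
        attr.observe(observed)
        return attr

    def observe(self, variation: str) -> None:
        # A new data type at this path: the version already exists, so just activate it.
        self.versions[variation].active = True

    def deactivate(self, variation: str) -> None:
        # When and why a version is retired is not modeled in this sketch.
        self.versions[variation].active = False

    def active_versions(self) -> List[str]:
        return [name for name, version in self.versions.items() if version.active]


address = Attribute.discover("customer.address", "str")
address.observe("object")  # a later record stores an object at the same path
print(len(address.versions), address.active_versions())
```

The design point is that `observe` never allocates anything: a type change at an existing path only flips a flag on a version that was created at discovery time.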


What You'll Learn

This Attributes section covers:

  • Attribute Discovery - How attributes are discovered during scanning

  • Attribute Types - Discovered, Schema, and Virtual attributes

  • Polymorphic Versions - How DataPancake handles the 7 polymorphic variations

  • Attribute Metadata - All metadata fields that control transformation

  • Arrays & Nested Structures - How arrays and nested structures are handled

  • Attribute Lifecycle - Creation, versioning, and schema evolution

  • Attributes in Code Generation - How attributes become SQL columns

  • Best Practices - Configuration recommendations

  • Integration & API - Programmatic access


Quick Reference

Attribute Types:

  • Discovered - Found during scanning

  • Schema - Created from schema samples

  • Virtual - User-created custom attributes

The 7 Polymorphic Variations:

  1. str (String)

  2. int (Integer)

  3. float (Decimal)

  4. bool (Boolean)

  5. object

  6. array_primitive

  7. array_object
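One way to map a parsed JSON value onto these 7 variations is sketched below. The function name `classify` is hypothetical. Two of its rules are assumptions rather than documented DataPancake behavior. First, JSON `null` maps to no variation. Second, an array counts as `array_object` only when it is non-empty and every element is an object, so empty and mixed arrays fall into `array_primitive`.

```python
import json
from typing import Any, Optional


def classify(value: Any) -> Optional[str]:
    """Map a parsed JSON value to one of the 7 polymorphic variation names."""
    if value is None:
        return None  # assumption: JSON null selects no version
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Assumption: array_object only if non-empty and every element is an object.
        if value and all(isinstance(item, dict) for item in value):
            return "array_object"
        return "array_primitive"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


sample = json.loads('[1, 1.5, "a", true, {}, [1, 2], [{"k": 1}]]')
print([classify(v) for v in sample])
```

The `bool` check must come before `int` because Python's `bool` is a subclass of `int`. Otherwise `true` would be reported as an integer.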

Key Tables:

  • core.datasource_attribute - Core attribute information

  • core.datasource_attribute_polymorphic_version - Version-specific metadata
