Polymorphic Versions

How DataPancake proactively creates all 7 polymorphic versions for every attribute upfront, then activates only the versions that match discovered data types.

What Are Polymorphic Versions?

Polymorphic versions represent the different data types that can appear at the same attribute path. When DataPancake first discovers an attribute, it creates all 7 possible polymorphic versions for it, regardless of which type was actually found in the data.

The 7 Polymorphic Variations

DataPancake recognizes that any attribute path can potentially appear in 7 different ways across different records:

str (String) — Text values
int (Integer) — Whole numbers
float (Decimal) — Floating-point numbers
bool (Boolean) — True/false values
object — Nested objects/structures
array_primitive — Arrays containing primitive values (strings, numbers, booleans)
array_object — Arrays containing objects/structures
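
To make the seven variations concrete, here is a minimal Python sketch that maps a parsed JSON value to one of the labels above. The function name classify is hypothetical and not part of DataPancake. The handling of empty or mixed arrays and of null values is an assumption, since this page does not describe it.

```python
from typing import Any


def classify(value: Any) -> str:
    """Map a parsed JSON value to one of the seven variation labels."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Assumption: an array counts as array_object if any element is an
        # object; everything else (including an empty array) is treated as
        # array_primitive.
        if any(isinstance(item, dict) for item in value):
            return "array_object"
        return "array_primitive"
    # Assumption: null (None) is not one of the seven variations.
    raise ValueError(f"no polymorphic variation for {value!r}")
```

For example, classify({"usage": "Commercial"}) returns "object" and classify(["a", "b"]) returns "array_primitive".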

Key Insight: DataPancake doesn't wait to discover polymorphic variations. It creates all 7 versions for every attribute the first time the attribute is encountered.

Proactive Version Creation

How it works:

When an attribute is first discovered

DataPancake immediately creates all 7 polymorphic version records for that attribute.
Each version is initially created with status = 'inactive'.
Only the version(s) matching the data type(s) actually discovered are activated (status = 'active').
Inactive versions remain in the system, ready to be activated if that data type appears in future scans.
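
The steps above can be sketched in a few lines of Python. This is an illustration of the create-all-then-activate pattern, not DataPancake's implementation: the Attribute class, the discover function, and the in-memory dictionary are all hypothetical stand-ins for whatever records the product actually stores.

```python
from dataclasses import dataclass, field
from typing import Dict

# The seven variation labels listed earlier on this page.
VARIATIONS = (
    "str", "int", "float", "bool",
    "object", "array_primitive", "array_object",
)


@dataclass
class Attribute:
    path: str
    # Maps each variation label to its status ('active' or 'inactive').
    versions: Dict[str, str] = field(default_factory=dict)


def discover(path: str, found_type: str) -> Attribute:
    """Create all seven versions as inactive, then activate the one found."""
    if found_type not in VARIATIONS:
        raise ValueError(f"unknown variation: {found_type}")
    attribute = Attribute(path, {v: "inactive" for v in VARIATIONS})
    attribute.versions[found_type] = "active"
    return attribute
```

Calling discover("status", "str") yields an attribute with one active version and six inactive ones.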

Rationale

Future-proofing: If an attribute becomes polymorphic later (e.g., status changes from string to number), the version already exists and can be activated.
Consistent structure: All attributes have the same potential polymorphic structure, making the system predictable.
Schema drift handling: When schema changes occur, new polymorphic versions can be activated without structural changes.
Zero technical debt: Every possible variation already has a version record, so none has to be retrofitted later.

How Polymorphic Versions Are Named

Each polymorphic version gets a unique name based on its data type:

Primitive types: {attribute_name}_{type}, where {type} is string, int, float, or bool
- Examples: status_string, price_int, rating_float, is_active_bool
Object type: {attribute_name}_object
- Example: address_object
Array types: {attribute_name}_array_{array_type}
- Examples: tags_array_primitive, orders_array_object
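
The naming pattern can be expressed as a small helper. This is a sketch only: the names SUFFIXES and all_version_names are hypothetical. The suffixes are taken from the examples on this page, which spell the string variation as string even though the variation list abbreviates it as str.

```python
from typing import List

# Suffixes as they appear in this page's naming examples.
SUFFIXES = (
    "string", "int", "float", "bool",
    "object", "array_primitive", "array_object",
)


def all_version_names(attribute_name: str) -> List[str]:
    """Build the seven version names for one attribute."""
    return [f"{attribute_name}_{suffix}" for suffix in SUFFIXES]
```

all_version_names("tags") includes tags_array_primitive and tags_array_object, matching the array examples above.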

Example: Polymorphic Version Creation

Suppose DataPancake first discovers the attribute path property_type as a string. What happens:

Initial discovery

DataPancake creates the attribute record for property_type.
Proactively creates all 7 polymorphic versions:
- property_type_string → activated (matches discovered type)
- property_type_int → inactive
- property_type_float → inactive
- property_type_bool → inactive
- property_type_object → inactive
- property_type_array_primitive → inactive
- property_type_array_object → inactive

Later: data becomes polymorphic

If a new record contains property_type as an object:

{"property_type": {"usage": "Commercial", "sq_ft": 20000}}

DataPancake detects the new data type during scanning.
Activates the existing property_type_object version (it was already created).
Updates the version status from 'inactive' to 'active' and sets the version status date.

Result: both versions are now active.

property_type_string (active)
property_type_object (active)
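
The same walkthrough as a runnable Python sketch. The record layout, the activate helper, and the two timestamps are illustrative assumptions. In particular, this page does not say whether inactive versions carry a status date, so the sketch leaves it empty until activation.

```python
from datetime import datetime, timezone

SUFFIXES = (
    "string", "int", "float", "bool",
    "object", "array_primitive", "array_object",
)


def new_versions(attribute: str) -> dict:
    """Create all seven versions for an attribute, all inactive."""
    return {
        f"{attribute}_{suffix}": {"status": "inactive", "status_date": None}
        for suffix in SUFFIXES
    }


def activate(versions: dict, name: str, when: datetime) -> None:
    """Flip an existing version to active and stamp the status date."""
    version = versions[name]
    if version["status"] != "active":  # leave already-active versions alone
        version["status"] = "active"
        version["status_date"] = when


# Initial discovery: property_type seen as a string (arbitrary timestamp).
versions = new_versions("property_type")
activate(versions, "property_type_string",
         datetime(2024, 1, 1, tzinfo=timezone.utc))

# Later scan: the same path now holds an object.
record = {"property_type": {"usage": "Commercial", "sq_ft": 20000}}
if isinstance(record["property_type"], dict):
    activate(versions, "property_type_object",
             datetime(2024, 2, 1, tzinfo=timezone.utc))

active = sorted(n for n, v in versions.items() if v["status"] == "active")
print(active)  # ['property_type_object', 'property_type_string']
```

No version record is created during the second scan; the object version only changes status.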

Version Status

Each polymorphic version has a status:

Active — This data type has been found in the data and is included in code generation
Inactive — This data type hasn't been found yet, but the version exists and is ready to be activated

Status management:

Versions are activated when their corresponding data type is discovered.
Versions can become inactive if that data type disappears from the data (schema evolution).
Only active versions are included in SQL code generation.
Inactive versions remain in the system for potential future activation.
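
One way to picture these rules is a reconcile step followed by a filter. The sketch below is an assumption about how such logic could look, not a description of DataPancake internals. The function names are hypothetical, and whether deactivation happens automatically on every scan is not stated on this page.

```python
from typing import Dict, List, Set


def reconcile(versions: Dict[str, str], seen_types: Set[str]) -> Dict[str, str]:
    """Return updated statuses: active if the type was seen, else inactive.

    No version is ever removed; it only changes status.
    """
    return {
        type_name: ("active" if type_name in seen_types else "inactive")
        for type_name in versions
    }


def versions_for_codegen(versions: Dict[str, str]) -> List[str]:
    """Only active versions are passed on to SQL code generation."""
    return [name for name, status in versions.items() if status == "active"]
```

For instance, if a scan that previously found str and object now finds only object, reconcile marks str inactive and versions_for_codegen returns just object.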

Why Polymorphic Versions Exist

Semi-structured data is flexible, and the same path can contain:

Different primitive types — status might be a string in some records and a number in others
Objects vs. primitives — address might be a string in some records and an object in others
Arrays vs. primitives — tags might be a string in some records and an array in others
Different array types — An array might contain objects in some cases and primitives in others

By proactively creating all possible versions, DataPancake ensures:

No surprises: The system is always ready for any polymorphic variation
Smooth schema evolution: New variations can be activated without structural changes
Consistent metadata: All attributes follow the same polymorphic structure

Managing Polymorphic Versions

Reviewing Versions

Check all versions

Check all polymorphic versions for an attribute.
Understand why multiple versions exist.

Decide inclusion

Decide which versions to include in code generation.

Ongoing review

Monitor version statuses and version status dates for schema changes.
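
As an illustration of this kind of monitoring, the sketch below lists versions whose status date is newer than the last review. It assumes version metadata has already been loaded into a dictionary of (status, status date) pairs; the changed_since function and that layout are hypothetical and do not reflect a DataPancake API.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical layout: version name -> (status, status date).
Version = Tuple[str, datetime]


def changed_since(versions: Dict[str, Version],
                  last_review: datetime) -> List[str]:
    """Return names of versions whose status changed after the last review."""
    return sorted(
        name
        for name, (_status, status_date) in versions.items()
        if status_date > last_review
    )
```

A non-empty result is a hint that the schema drifted since the last review and that the affected attribute deserves a closer look.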

Handling Polymorphism

Use transformation expressions to handle type variations.
Consider data quality implications.
Document polymorphic behavior.

Version Management Best Practices

Keep active only the versions you need.
Monitor version status dates for schema changes.
Review inactive versions periodically.
Understand that inactive versions are ready for future activation.


