Polymorphic Versions

How DataPancake creates all 7 polymorphic versions for every attribute upfront, then activates only the versions that match discovered data types.

What Are Polymorphic Versions?

A polymorphic version represents one data type variation of an attribute path. When DataPancake first discovers an attribute, it creates all 7 possible polymorphic versions for it, regardless of which type was actually found in the data.


The 7 Polymorphic Variations

DataPancake recognizes that any attribute path can potentially appear in 7 different ways across different records:

  • str (String) — Text values

  • int (Integer) — Whole numbers

  • float (Decimal) — Floating-point numbers

  • bool (Boolean) — True/false values

  • object — Nested objects/structures

  • array_primitive — Arrays containing primitive values (strings, numbers, booleans)

  • array_object — Arrays containing objects/structures
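To make the seven variations concrete, the sketch below maps a parsed JSON value to one of the labels above. It is an illustration in Python, not DataPancake's implementation: the classify helper is hypothetical, and its treatment of mixed arrays, empty arrays, and null values is an assumption that the text does not cover.

```python
from typing import Any

# The seven variation labels listed above.
VARIATIONS = ("str", "int", "float", "bool", "object", "array_primitive", "array_object")


def classify(value: Any) -> str:
    """Map a parsed JSON value to one of the seven labels (hypothetical helper)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Assumption: an array counts as array_object if any element is an object.
        # Empty arrays therefore fall through to array_primitive.
        if any(isinstance(item, dict) for item in value):
            return "array_object"
        return "array_primitive"
    # Assumption: null and other values are out of scope for this sketch.
    raise ValueError(f"unsupported value: {value!r}")


# The same path can carry different types across records.
records = [{"status": "open"}, {"status": 2}, {"status": "closed"}]
seen = sorted({classify(record["status"]) for record in records})
print(seen)  # ['int', 'str']
```

Here the path status is polymorphic because two different labels were observed for it.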


Key Insight: DataPancake doesn't wait to discover polymorphic variations—it proactively creates all 7 versions for every attribute when first encountered.


Proactive Version Creation

How it works:

1. When an attribute is first discovered

  • DataPancake immediately creates all 7 polymorphic version records for that attribute.

  • Each version is initially created with status = 'inactive'.

  • Only the version(s) matching the data type actually discovered are activated (status = 'active').

  • Inactive versions remain in the system, ready to be activated if that data type appears in future scans.

2. Rationale

  • Future-proofing: If an attribute becomes polymorphic later (e.g., status changes from string to number), the version already exists and can be activated.

  • Consistent structure: All attributes have the same potential polymorphic structure, making the system predictable.

  • Schema drift handling: When schema changes occur, new polymorphic versions can be activated without structural changes.

  • Zero technical debt: No version records need to be added retroactively when a new data type appears.
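The creation and activation steps described above can be sketched as follows. This is a minimal model under stated assumptions, not DataPancake's actual schema or API: the Version record, its field names, and the create_versions and activate helpers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional

VARIATIONS = ("str", "int", "float", "bool", "object", "array_primitive", "array_object")


@dataclass
class Version:
    """One polymorphic version record (hypothetical shape)."""
    attribute: str
    variation: str
    status: str = "inactive"            # every version starts inactive
    status_date: Optional[datetime] = None


def activate(versions: Dict[str, Version], variation: str) -> None:
    """Flip an existing version to active and stamp its status date."""
    version = versions[variation]
    if version.status != "active":
        version.status = "active"
        version.status_date = datetime.now(timezone.utc)


def create_versions(attribute: str, discovered: str) -> Dict[str, Version]:
    """Create all seven versions up front, then activate the discovered one."""
    versions = {v: Version(attribute, v) for v in VARIATIONS}
    activate(versions, discovered)
    return versions


# First scan: status is found as a string.
versions = create_versions("status", "str")

# Later scan: status also appears as a number. No new record is created;
# the existing version is simply activated.
activate(versions, "int")

for version in versions.values():
    print(version.variation, version.status)
```

The point of the design is visible in the second call: a newly observed type changes a status field rather than the set of records.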


How Polymorphic Versions Are Named

Each polymorphic version gets a unique name based on its data type:

  • Primitive types: {attribute_name}_{type}

    • Examples: status_string, price_int, rating_float, is_active_bool

  • Object type: {attribute_name}_object

    • Example: address_object

  • Array types: {attribute_name}_array_{array_type}

    • Examples: tags_array_primitive, orders_array_object
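The three naming patterns can be expressed as one small function. This is a sketch under stated assumptions: version_name is a hypothetical helper, and the string suffix is spelled string to match the examples above rather than the str abbreviation used in the list of variations.

```python
# Assumption: suffixes follow the examples above (status_string, price_int, ...).
PRIMITIVE_TYPES = ("string", "int", "float", "bool")
ARRAY_TYPES = ("primitive", "object")
ARRAY_PREFIX = "array_"


def version_name(attribute_name: str, data_type: str) -> str:
    """Build a polymorphic version name from an attribute name and a type label."""
    if data_type in PRIMITIVE_TYPES:
        # Primitive types: {attribute_name}_{type}
        return f"{attribute_name}_{data_type}"
    if data_type == "object":
        # Object type: {attribute_name}_object
        return f"{attribute_name}_object"
    if data_type.startswith(ARRAY_PREFIX):
        array_type = data_type[len(ARRAY_PREFIX):]
        if array_type in ARRAY_TYPES:
            # Array types: {attribute_name}_array_{array_type}
            return f"{attribute_name}_array_{array_type}"
    raise ValueError(f"unknown data type: {data_type!r}")


print(version_name("status", "string"))         # status_string
print(version_name("address", "object"))        # address_object
print(version_name("tags", "array_primitive"))  # tags_array_primitive
```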


Example: Polymorphic Version Creation

When DataPancake first discovers the attribute path property_type as a string, the following happens:

1. Initial discovery

  • DataPancake creates the attribute record for property_type.

  • Proactively creates all 7 polymorphic versions:

    • property_type_string → active (matches discovered type)

    • property_type_int → inactive

    • property_type_float → inactive

    • property_type_bool → inactive

    • property_type_object → inactive

    • property_type_array_primitive → inactive

    • property_type_array_object → inactive

2. Later: data becomes polymorphic

If a new record contains property_type as an object:

  • DataPancake detects the new data type during scanning.

  • Activates the existing property_type_object version (it was already created).

  • Updates the version status from 'inactive' to 'active' and sets the version status date.

Result: Both versions are now active:

  • property_type_string (active)

  • property_type_object (active)


Version Status

Each polymorphic version has a status:

  • Active — This data type has been found in the data and is included in code generation

  • Inactive — This data type hasn't been found yet, but the version exists and is ready to be activated

Status management:

  • Versions are activated when their corresponding data type is discovered.

  • Versions can become inactive if that data type disappears from the data (schema evolution).

  • Only active versions are included in SQL code generation.

  • Inactive versions remain in the system for potential future activation.
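The status rules above can be sketched as two small functions: one that reconciles statuses against the types seen in a scan, and one that selects the versions eligible for code generation. Both helpers are hypothetical. In particular, the text does not say how DataPancake decides that a data type has disappeared, so treating "absent from the latest scan" as the trigger is an assumption. Status dates are not modeled here.

```python
from typing import Dict, List, Set

VARIATIONS = ("str", "int", "float", "bool", "object", "array_primitive", "array_object")


def reconcile(statuses: Dict[str, str], observed: Set[str]) -> Dict[str, str]:
    """Return new statuses: active if the type was observed, otherwise inactive.

    Assumption: a type missing from the latest scan counts as having disappeared.
    """
    return {v: ("active" if v in observed else "inactive") for v in statuses}


def versions_for_codegen(statuses: Dict[str, str]) -> List[str]:
    """Only active versions are passed on to SQL code generation."""
    return [v for v, status in statuses.items() if status == "active"]


# All seven versions exist from the start, all inactive.
statuses = {v: "inactive" for v in VARIATIONS}

# Scan 1 finds strings and objects.
statuses = reconcile(statuses, {"str", "object"})
first = versions_for_codegen(statuses)
print(first)  # ['str', 'object']

# Scan 2 finds only objects, so the string version becomes inactive again.
statuses = reconcile(statuses, {"object"})
second = versions_for_codegen(statuses)
print(second)  # ['object']
```

Note that the dictionary always holds seven entries; only the status values change between scans.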


Why Polymorphic Versions Exist

Semi-structured data is flexible, and the same path can contain:

  • Different primitive types — status might be a string in some records and a number in others

  • Objects vs. primitives — address might be a string in some records and an object in others

  • Arrays vs. primitives — tags might be a string in some records and an array in others

  • Different array types — An array might contain objects in some cases and primitives in others

By proactively creating all possible versions, DataPancake ensures:

  • No surprises: The system is always ready for any polymorphic variation

  • Smooth schema evolution: New variations can be activated without structural changes

  • Consistent metadata: All attributes follow the same polymorphic structure


Managing Polymorphic Versions

Reviewing Versions

1. Check all versions

  • Review every polymorphic version of an attribute.

  • Understand why multiple versions are active.

2. Decide inclusion

  • Decide which versions to include in code generation.

3. Ongoing review

  • Monitor version statuses and version status dates for schema changes.

Handling Polymorphism

  • Use transformation expressions to handle type variations.

  • Consider data quality implications.

  • Document polymorphic behavior.
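As an illustration of what handling a type variation means, the sketch below normalizes a tags value that arrives as a string in some records and as an array in others. It is written in Python only to show the idea. The syntax of DataPancake's transformation expressions is not shown in this document, and normalize_tags is a hypothetical helper.

```python
from typing import Any, List


def normalize_tags(value: Any) -> List[str]:
    """Coerce a polymorphic tags value into a single shape: a list of strings."""
    if value is None:
        # Assumption: a missing value becomes an empty list.
        return []
    if isinstance(value, str):
        # A lone string is treated as a one-element array.
        return [value]
    if isinstance(value, list):
        return [str(item) for item in value]
    raise TypeError(f"unexpected tags value: {value!r}")


print(normalize_tags("urgent"))           # ['urgent']
print(normalize_tags(["urgent", "new"]))  # ['urgent', 'new']
```

Whatever form the real expression takes, the data quality question is the same: decide what each variation should become, and document that choice.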

Version Management Best Practices

  • Keep active only the versions you need.

  • Monitor version status dates for schema changes.

  • Review inactive versions periodically.

  • Understand that inactive versions are ready for future activation.
