# Polymorphic Versions

## What Are Polymorphic Versions?

Polymorphic versions represent the different data types that can appear at the same attribute path. When DataPancake first discovers an attribute, it creates **all 7 polymorphic versions** for it, regardless of which type was discovered.

***

## The 7 Polymorphic Versions

DataPancake creates these 7 versions for every attribute:

* `str` — String values
* `int` — Integer values
* `float` — Floating-point numbers
* `bool` — Boolean values
* `object` — Nested objects
* `array_primitive` — Arrays of primitives (strings, numbers, booleans)
* `array_object` — Arrays of objects
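To make the seven categories concrete, here is a minimal sketch of how a parsed JSON value could be mapped to one of them. The function name `classify_value` and the rule for mixed arrays are assumptions made for illustration; DataPancake's actual detection logic is not described here.

```python
from typing import Any, Optional


def classify_value(value: Any) -> Optional[str]:
    """Return the version type for a parsed JSON value, or None for null."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Assumption: an array holding at least one object counts as array_object;
        # everything else (including an empty array) counts as array_primitive.
        if any(isinstance(item, dict) for item in value):
            return "array_object"
        return "array_primitive"
    return None  # JSON null carries no type information


detected = classify_value({"usage": "Commercial", "sq_ft": 20000})
```

In this sketch, `detected` is `"object"`, which corresponds to the `object` version in the list above.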

***

## Proactive Version Creation

**When an attribute is first discovered:**

1. All 7 polymorphic version records are created in `core.datasource_attribute_polymorphic_version`
2. Each version starts with `VERSION_STATUS = 'inactive'`
3. The versions that match the discovered data type are then set to `VERSION_STATUS = 'active'`
4. Inactive versions remain ready for activation in future scans

**Why:** Schema evolution requires no new records. When a new data type appears for an attribute, its version already exists and only needs to be activated.
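The creation steps above can be sketched in a few lines of Python. This is an illustration only: `VersionRecord` and `create_versions` are hypothetical names, and the in-memory list stands in for rows of `core.datasource_attribute_polymorphic_version`, whose full column layout is not shown in this document.

```python
from dataclasses import dataclass

# The seven version types listed in this document.
VERSION_TYPES = (
    "str", "int", "float", "bool",
    "object", "array_primitive", "array_object",
)


@dataclass
class VersionRecord:
    # Hypothetical stand-in for one polymorphic version row.
    attribute_name: str
    version_type: str
    version_status: str = "inactive"  # step 2: every version starts inactive


def create_versions(attribute_name: str, discovered_types: set) -> list:
    # Step 1: create one record per version type.
    versions = [VersionRecord(attribute_name, t) for t in VERSION_TYPES]
    # Step 3: activate only the versions whose type was actually discovered.
    for version in versions:
        if version.version_type in discovered_types:
            version.version_status = "active"
    # Step 4: the remaining records stay inactive until a later scan finds them.
    return versions


versions = create_versions("property_type", {"str"})
active_types = [v.version_type for v in versions if v.version_status == "active"]
```

Here `versions` holds seven records and `active_types` contains only `"str"`.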

***

## Naming Convention

Each version's name is stored in `POLYMORPHIC_ATTRIBUTE_NAME` and follows these patterns:

* Primitive types: `{attribute_name}_{type}` (e.g., `status_str`, `price_int`, `rating_float`, `is_active_bool`)
* Object: `{attribute_name}_object` (e.g., `address_object`)
* Arrays: `{attribute_name}_array_{array_type}` (e.g., `tags_array_primitive`, `orders_array_object`)
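Because the array type keys already begin with `array_`, all three patterns reduce to appending the type key to the attribute name. A minimal sketch, where the helper name `polymorphic_attribute_name` is hypothetical:

```python
VERSION_TYPES = (
    "str", "int", "float", "bool",
    "object", "array_primitive", "array_object",
)


def polymorphic_attribute_name(attribute_name: str, version_type: str) -> str:
    """Build the version name from an attribute name and one of the 7 type keys."""
    if version_type not in VERSION_TYPES:
        raise ValueError(f"unknown version type: {version_type!r}")
    return f"{attribute_name}_{version_type}"


names = [polymorphic_attribute_name("tags", t) for t in VERSION_TYPES]
```

For the attribute `tags`, `names` runs from `tags_str` through `tags_array_object`.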

***

## Example: Polymorphic Version Creation

When DataPancake first discovers the attribute path `property_type` as a string, the following happens:

{% stepper %}
{% step %}

#### Initial discovery

* DataPancake creates the attribute record for `property_type`.
* Proactively creates all 7 polymorphic versions:
  * `property_type_str` → **activated** (matches discovered type)
  * `property_type_int` → inactive
  * `property_type_float` → inactive
  * `property_type_bool` → inactive
  * `property_type_object` → inactive
  * `property_type_array_primitive` → inactive
  * `property_type_array_object` → inactive
{% endstep %}

{% step %}

#### Later: data becomes polymorphic

If a new record contains `property_type` as an object:

```json
{"property_type": {"usage": "Commercial", "sq_ft": 20000}}
```

* DataPancake detects the new data type during scanning.
* Activates the existing `property_type_object` version (it was already created).
* Updates the version status from `'inactive'` to `'active'` and sets the version status date.

Result: Both versions are now active:

* `property_type_str` (active)
* `property_type_object` (active)
{% endstep %}
{% endstepper %}
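The second step of this example can be mimicked with a small Python sketch. The dictionary layout and the `activate` helper are assumptions for illustration; only the column names `VERSION_STATUS` and `VERSION_STATUS_DATE` come from this document.

```python
import json
from datetime import datetime, timezone

# State after the initial discovery: only the string version is active.
versions = {
    "property_type_str": {"VERSION_STATUS": "active", "VERSION_STATUS_DATE": None},
    "property_type_object": {"VERSION_STATUS": "inactive", "VERSION_STATUS_DATE": None},
}


def activate(versions: dict, name: str, when: datetime) -> None:
    """Flip an existing inactive version to active and stamp the status date."""
    version = versions[name]  # the record already exists, so nothing is created
    if version["VERSION_STATUS"] == "inactive":
        version["VERSION_STATUS"] = "active"
        version["VERSION_STATUS_DATE"] = when


# A later record carries property_type as an object instead of a string.
record = json.loads('{"property_type": {"usage": "Commercial", "sq_ft": 20000}}')
if isinstance(record["property_type"], dict):
    activate(versions, "property_type_object", datetime.now(timezone.utc))

active = sorted(n for n, v in versions.items() if v["VERSION_STATUS"] == "active")
```

After this runs, `active` lists both `property_type_object` and `property_type_str`, and the string version is left unchanged.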

***

## Version Status

**`VERSION_STATUS` values:**

* `'active'` — Data type found in data; included in code generation
* `'inactive'` — Version exists but not yet found; ready for activation

**Status transitions:**

* A version is activated when its data type is discovered in a scan
* A version may become inactive again if its data type disappears from the data (schema evolution)
* Only `active` versions are included in SQL code generation
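These transitions can be summarized as a small pure function. This is a sketch under stated assumptions: the name `next_status` and the `deactivate_missing` flag are hypothetical, and the flag exists only because the document says a version *may* become inactive without specifying when.

```python
def next_status(current: str, seen_in_scan: bool, deactivate_missing: bool = False) -> str:
    """Return the VERSION_STATUS a version would have after a scan."""
    if seen_in_scan:
        # The data type was found, so the version is (or stays) active.
        return "active"
    if current == "active" and deactivate_missing:
        # Assumption: deactivation of a vanished type is opt-in.
        return "inactive"
    # Otherwise the status is left as it was.
    return current


after_first_sighting = next_status("inactive", seen_in_scan=True)
```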

***

## Why Proactive Creation?

Semi-structured data often has polymorphic paths (same path, different types across records). By creating all 7 versions upfront:

* Schema evolution only requires activating an existing version; no new records are needed
* Every attribute has the same metadata structure
* Scans never have to create version records reactively when a new type appears

***

## Managing Versions

**Review versions:**

* Check all 7 versions for each attribute in the UI
* Monitor `VERSION_STATUS_DATE` to track schema evolution
* Use `INCLUDE_IN_CODE_GEN` to control which active versions generate columns

**Handling polymorphism:**

* Use transformation expressions to handle type variations
* Set `INCLUDE_IN_CODE_GEN = FALSE` for versions you don't need
* Monitor newly activated versions after scans
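As a rough illustration of how `VERSION_STATUS` and `INCLUDE_IN_CODE_GEN` combine, the sketch below selects the versions that would produce columns. The sample rows and the function `columns_to_generate` are hypothetical; only the three column names come from this document.

```python
# Hypothetical version rows for one attribute.
versions = [
    {"POLYMORPHIC_ATTRIBUTE_NAME": "property_type_str",
     "VERSION_STATUS": "active", "INCLUDE_IN_CODE_GEN": True},
    {"POLYMORPHIC_ATTRIBUTE_NAME": "property_type_object",
     "VERSION_STATUS": "active", "INCLUDE_IN_CODE_GEN": False},
    {"POLYMORPHIC_ATTRIBUTE_NAME": "property_type_int",
     "VERSION_STATUS": "inactive", "INCLUDE_IN_CODE_GEN": True},
]


def columns_to_generate(versions: list) -> list:
    """Keep versions that are active and not excluded from code generation."""
    return [
        v["POLYMORPHIC_ATTRIBUTE_NAME"]
        for v in versions
        if v["VERSION_STATUS"] == "active" and v["INCLUDE_IN_CODE_GEN"]
    ]


columns = columns_to_generate(versions)
```

Only `property_type_str` survives both filters: the object version is active but excluded, and the integer version was never activated.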
