Attribute Discovery Process

How DataPancake discovers attributes during scanning, including polymorphic detection and recursive parsing of stringified JSON.

Discovery Process

  1. Recursively traverse records - Every nested level is visited

  2. Track attribute paths - Complete paths from root to leaf (e.g., customer.contact.email)

  3. Identify data types - Source data type inferred for each path occurrence

  4. Create all 7 polymorphic versions - Proactively created when an attribute is first discovered

  5. Activate matching versions - Only versions matching the discovered types are set to VERSION_STATUS = 'active'

  6. Create attribute records - Records created in core.datasource_attribute and core.datasource_attribute_polymorphic_version
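
The first five steps above can be sketched in Python. This is only an illustration under stated assumptions, not DataPancake's implementation: the seven type labels, the 'inactive' status value, and the function names are hypothetical, since this page says seven versions exist but does not enumerate them.

```python
from collections import defaultdict

# Hypothetical labels: the page states that 7 versions exist but does not name them.
POLYMORPHIC_TYPES = ("str", "int", "float", "bool", "object", "array", "null")


def infer_type(value):
    """Map a value (as produced by json.loads) to one of the labels above."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int (bool subclasses int)
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    raise TypeError(f"unsupported value: {value!r}")


def walk(value, path, seen):
    """Steps 1-3: visit every nested level and record the type seen at each path."""
    if path:
        seen[path].add(infer_type(value))
    if isinstance(value, dict):
        for key, child in value.items():
            walk(child, f"{path}.{key}" if path else key, seen)
    elif isinstance(value, list):
        for item in value:
            walk(item, f"{path}[]", seen)


def discover(records):
    """Steps 4-5: create every version per path and activate only the observed types."""
    seen = defaultdict(set)
    for record in records:
        walk(record, "", seen)
    return {
        # 'inactive' is an assumed name for the non-active status.
        path: {t: ("active" if t in types else "inactive") for t in POLYMORPHIC_TYPES}
        for path, types in seen.items()
    }
```

Calling discover on two records whose customer.address is a string in one and an object in the other yields seven versions for that path, with only the str and object versions marked active.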


What Gets Discovered

  • All attribute paths - Complete paths from root to leaf (e.g., customer.contact.email)

  • Nested objects - Every level of object nesting

  • Nested arrays - Both object arrays (ARRAY_TYPE = 'object') and primitive arrays (ARRAY_TYPE = 'primitive')

  • Embedded JSON - JSON stored as strings (HAS_EMBEDDED_CONTENT = TRUE); recursively parsed

  • Polymorphic variations - All data type variations for the same path (handled via polymorphic versions)
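
As a rough sketch of two of the checks listed above, the Python below shows one way to detect JSON stored in a string and to classify an array as object or primitive. The function names are hypothetical, and the handling of empty or mixed arrays (returning None) is an assumption; this page does not say how DataPancake treats those cases.

```python
import json


def parse_embedded(value):
    """Return the parsed container if a string holds a JSON object or array, else None."""
    if not isinstance(value, str):
        return None
    text = value.strip()
    if not text or text[0] not in "{[":
        return None  # bare scalars such as "42" are treated as ordinary strings here
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def array_type(items):
    """Classify a list as 'object' or 'primitive'; None when empty or mixed (assumption)."""
    if not items:
        return None
    if all(isinstance(item, dict) for item in items):
        return "object"
    if not any(isinstance(item, (dict, list)) for item in items):
        return "primitive"
    return None
```

A non-None result from parse_embedded corresponds to HAS_EMBEDDED_CONTENT = TRUE, and the parsed value can then be traversed like any other nested object.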


Example: Polymorphic Discovery

Record 1: customer.address holds a string value.

Record 2: customer.address holds an object.

Discovery Results:

  • customer.address:

    • Record 1: str → address_str activated

    • Record 2: object → address_object activated (existing version)

  • customer.metadata:

    • HAS_EMBEDDED_CONTENT = TRUE

    • Recursively parsed to discover: metadata.source, metadata.tags[]
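
The results above can be reproduced with a small Python sketch. The two records below are hypothetical stand-ins, since the original payloads are not shown here; the scan function and the type labels are likewise assumptions for illustration only.

```python
import json
from collections import defaultdict

# Hypothetical records: address is a string in one and an object in the other,
# and metadata is a JSON document stored as a string.
RECORD_1 = {
    "customer": {
        "address": "12 Example Street",
        "metadata": '{"source": "web", "tags": ["new", "vip"]}',
    }
}
RECORD_2 = {"customer": {"address": {"street": "12 Example Street", "city": "Springfield"}}}

TYPE_NAMES = {str: "str", dict: "object", list: "array"}


def scan(value, path, types, embedded):
    """Record the type at each path, descending into stringified JSON as well."""
    if path:
        types[path].add(TYPE_NAMES.get(type(value), "other"))
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return  # an ordinary string
        if not isinstance(parsed, (dict, list)):
            return  # a quoted scalar, not embedded structure
        embedded.add(path)  # corresponds to HAS_EMBEDDED_CONTENT = TRUE
        value = parsed
    if isinstance(value, dict):
        for key, child in value.items():
            scan(child, f"{path}.{key}" if path else key, types, embedded)
    elif isinstance(value, list):
        for item in value:
            scan(item, f"{path}[]", types, embedded)


types, embedded = defaultdict(set), set()
for record in (RECORD_1, RECORD_2):
    scan(record, "", types, embedded)

print(sorted(types["customer.address"]))  # ['object', 'str']
print(sorted(embedded))                   # ['customer.metadata']
```

In this sketch the paths found inside the embedded string are reported with their full prefix, for example customer.metadata.source and customer.metadata.tags[].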
