Attribute Discovery Process

How DataPancake discovers attributes during scanning, including polymorphic detection and recursive parsing of stringified JSON.

How Attributes Are Discovered

1. Recursively traverse records

DataPancake traverses every record in your data source, visiting every nested level.

2. Track attribute paths

Every path encountered (from root to leaf) is tracked, including deeply nested paths.

3. Identify data types

For each path occurrence, DataPancake identifies the data type observed (string, number, object, array, etc.).

4. Proactively create polymorphic versions

When an attribute is first discovered, DataPancake proactively creates all 7 polymorphic versions for that attribute.

5. Activate matching versions

Only the version(s) matching the actual data types found are activated; the other versions remain available for future variations.

6. Create attribute records with metadata

An attribute record is created that includes metadata about the discovery (paths, types observed, timestamps, etc.).
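The traversal, path tracking, and type identification steps can be pictured with a short Python sketch. This is not DataPancake's implementation. The function names (type_name, walk, discover), the [] notation for array items, and the metadata fields (types, occurrences, first_seen) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def type_name(value):
    """Map a parsed JSON value to a coarse type label (labels are assumed)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def walk(value, path=""):
    """Yield (path, type) for every node below the root, depth first."""
    if path:
        yield path, type_name(value)
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for child in value:
            # Hypothetical notation: all items of an array share one path.
            yield from walk(child, f"{path}[]")


def discover(records):
    """Collect one entry per path, with the types seen across all records."""
    attributes = defaultdict(
        lambda: {"types": set(), "occurrences": 0, "first_seen": None}
    )
    for record in records:
        for path, kind in walk(record):
            entry = attributes[path]
            entry["types"].add(kind)
            entry["occurrences"] += 1
            if entry["first_seen"] is None:
                entry["first_seen"] = datetime.now(timezone.utc)
    return dict(attributes)


# Two made-up records; "id" is a string in one and a number in the other.
records = [
    {"id": "A1", "tags": ["x", "y"], "owner": {"name": "Ann"}},
    {"id": 7, "owner": {"name": "Bob"}},
]
found = discover(records)
for path in sorted(found):
    print(path, sorted(found[path]["types"]))
```

In this sketch a path that appears with more than one type, such as id above, simply accumulates several labels in its types set, which is the raw signal a polymorphic-detection step would work from.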


What Gets Discovered

  • All attribute paths — complete paths from root to leaf (e.g., customer.contact.email)

  • Nested objects — every level of object nesting

  • Nested arrays — both object arrays and primitive arrays

  • Embedded JSON — JSON stored as strings (stringified/escaped JSON)

  • Polymorphic variations — all data type variations for the same path


Example Discovery

Given multiple records with polymorphic data and stringified JSON:

Record 1:

{
  "customer_id": "C001",
  "customer": {
    "name": "John Doe",
    "address": "123 Main St, Anytown, ST 12345",
    "contact": {
      "email": "[email protected]",
      "phone": "555-1234"
    },
    "metadata": "{\"source\":\"web\",\"campaign\":\"summer2024\",\"tags\":[\"vip\",\"premium\"]}"
  },
  "orders": [
    {"order_id": "O001", "total": 100.50},
    {"order_id": "O002", "total": 250.75}
  ]
}

Record 2:

What DataPancake Discovers:

Standard Attributes:

  • customer_id (string)

  • customer (object)

  • customer.name (string)

  • customer.contact (object)

  • customer.contact.email (string)

  • customer.contact.phone (string)

  • orders (array)

  • orders[0].order_id (string)

  • orders[0].total (number)

Polymorphic Address:

customer.address appears as:

  • Record 1: string → address_string activated

  • Record 2: object → address_object activated (existing version, no new record needed)
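The create-upfront, activate-on-match behavior can be sketched as follows. This is a minimal illustration under stated assumptions: the Attribute class and its methods are hypothetical, and the list of seven type labels is a placeholder, since this page gives the count of 7 but names only some of the types.

```python
from dataclasses import dataclass, field

# Placeholder list: this page states there are 7 versions and mentions
# string, number, object and array; the remaining labels are assumptions.
ASSUMED_TYPES = ("string", "number", "object", "array", "boolean", "date", "null")


@dataclass
class Attribute:
    path: str
    # One version per type, all created up front and all inactive.
    active: dict = field(
        default_factory=lambda: {t: False for t in ASSUMED_TYPES}
    )

    def observe(self, observed_type):
        """Activate the already-existing version for a type seen in the data."""
        if observed_type not in self.active:
            raise ValueError(f"unknown type: {observed_type}")
        self.active[observed_type] = True

    def active_versions(self):
        """Names of activated versions, following the address_string pattern."""
        leaf = self.path.rsplit(".", 1)[-1]
        return [f"{leaf}_{t}" for t, on in self.active.items() if on]


address = Attribute("customer.address")
address.observe("string")  # type seen in Record 1
address.observe("object")  # type seen in Record 2
print(address.active_versions())  # ['address_string', 'address_object']
print(len(address.active))        # 7
```

Observing a new type only flips a flag on a version that already exists, so the number of versions stays at 7 no matter how many type variations later records introduce.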

Stringified JSON:

customer.metadata contains stringified JSON. DataPancake:

  • Detects embedded content (HAS_EMBEDDED_CONTENT = TRUE)

  • Recursively parses to discover: metadata.source, metadata.campaign, metadata.tags[]
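One way to picture the detection and recursive parsing is the sketch below. It is an assumption about the general technique, not DataPancake's actual logic: parse_embedded and embedded_paths are hypothetical helpers, and the has_embedded_content variable only mirrors the HAS_EMBEDDED_CONTENT flag named above.

```python
import json


def parse_embedded(value):
    """Return parsed JSON if a string holds an object or array, else None."""
    if not isinstance(value, str):
        return None
    text = value.strip()
    if not text or text[0] not in "{[":
        return None  # cheap pre-check before attempting a full parse
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, (dict, list)) else None


def embedded_paths(value, path):
    """Yield paths found below `value`, re-parsing any nested JSON strings."""
    inner = parse_embedded(value)
    if inner is not None:
        value = inner
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            yield child_path
            yield from embedded_paths(child, child_path)
    elif isinstance(value, list):
        item_path = f"{path}[]"
        yield item_path
        for child in value:
            yield from embedded_paths(child, item_path)


# The customer.metadata value from Record 1, after normal JSON decoding.
metadata = '{"source":"web","campaign":"summer2024","tags":["vip","premium"]}'

has_embedded_content = parse_embedded(metadata) is not None
paths = list(embedded_paths(metadata, "metadata"))
print(has_embedded_content)
print(paths)
```

Because embedded_paths calls parse_embedded on every value it visits, a JSON string nested inside another JSON string would be unpacked the same way. Note that this sketch also reports the array path itself (metadata.tags) in addition to its item path.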

Result: Both address_string and address_object are active. All 7 versions were created upfront, so polymorphism is handled by activating existing versions rather than creating new ones.
