# Source Schema

## Overview

Source Schema metadata represents what DataPancake discovered during scanning. It is primarily read-only (the one exception is `RECORD_STATUS`) and describes each attribute's structure, types, and content.

***

## Source Schema Fields

### Path Information

**Attribute Path (`ATTRIBUTE_PATH`)**

* The full path to the attribute in the source data
* Example: `customer.contact.email`
* Used to reference the attribute in source queries
* **Read-only** - Set during discovery

**Attribute Path Embedded (`ATTRIBUTE_PATH_EMBEDDED`)**

* Path for attributes found within embedded/stringified JSON
* Tracks nested JSON structures within string fields
* Example: For JSON stored as a string, this tracks the path within that JSON
* **Read-only** - Set during discovery

**Attribute Name (`ATTRIBUTE_NAME`)**

* The leaf name of the attribute
* Example: For path `customer.contact.email`, the name is `email`
* **Read-only** - Extracted from attribute path
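
As an illustration of how the name relates to the path, here is a minimal Python sketch. It assumes paths are dot-separated and that key names contain no dots; the function name is hypothetical and is not part of DataPancake.

```python
def attribute_name(attribute_path: str) -> str:
    """Return the leaf segment of a dot-separated attribute path."""
    # rpartition splits on the last dot; a root-level path has no dot,
    # so the whole path comes back as the name.
    return attribute_path.rpartition(".")[2]


print(attribute_name("customer.contact.email"))  # email
```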

***

### Structure Information

**Attribute Level (`ATTRIBUTE_LEVEL`)**

* The nesting depth of the attribute
* Root level attributes have level 0
* Each nested level increments the count
* Example: `customer` = level 0, `customer.contact` = level 1, `customer.contact.email` = level 2
* **Read-only** - Calculated during discovery
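
The level rule above can be sketched in Python by counting path separators. This assumes dot-separated paths without array indexes; `attribute_level` is a hypothetical helper, not a DataPancake function.

```python
def attribute_level(attribute_path: str) -> int:
    """Nesting depth of a dot-separated path; root-level attributes are 0."""
    # Each dot marks one additional level of nesting.
    return attribute_path.count(".")


for path in ("customer", "customer.contact", "customer.contact.email"):
    print(path, attribute_level(path))
```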

**Attribute Order (`ATTRIBUTE_ORDER`)**

* Ordering information for attributes at the same level
* Used for consistent presentation in the UI
* **Read-only** - Set during discovery

**Parent Object (`PARENT_OBJECT`)**

* The parent object path containing this attribute
* Example: For `customer.contact.email`, parent object is `customer.contact`
* Empty for root-level attributes
* **Read-only** - Set during discovery
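
A small Python sketch of the same relationship, assuming dot-separated paths (the helper name is hypothetical):

```python
def parent_object(attribute_path: str) -> str:
    """Return the path of the containing object, or '' for root-level attributes."""
    # Everything before the last dot is the parent; with no dot,
    # rpartition yields an empty string for the head.
    return attribute_path.rpartition(".")[0]


print(parent_object("customer.contact.email"))  # customer.contact
```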

**Parent Array (`PARENT_ARRAY`)**

* The parent array path if this attribute is within an array
* Example: For `orders[0].order_id`, parent array is `orders`
* Used for foreign key relationship configuration
* Empty for non-array attributes
* **Read-only** - Set during discovery
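
The example above can be reproduced with a short Python sketch. It assumes array elements appear in a path as numeric indexes such as `[0]`. How DataPancake represents nested arrays is not documented here, so the handling of more than one index is an assumption, and `parent_array` is a hypothetical helper.

```python
import re

# Matches a numeric array index such as "[0]".
_INDEX = re.compile(r"\[\d+\]")


def parent_array(attribute_path: str) -> str:
    """Return the path of the innermost enclosing array, or '' if there is none."""
    matches = list(_INDEX.finditer(attribute_path))
    if not matches:
        return ""
    # Keep the text before the last index, then strip any earlier indexes
    # (assumption: outer indexes are not part of the stored array path).
    prefix = attribute_path[: matches[-1].start()]
    return _INDEX.sub("", prefix)


print(parent_array("orders[0].order_id"))  # orders
```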

**Parent Array Embedded (`PARENT_ARRAY_EMBEDDED`)**

* Parent array path for attributes within embedded JSON arrays
* Tracks arrays within stringified JSON
* **Read-only** - Set during discovery

***

### Type Information

**Source Data Type (`SOURCE_DATA_TYPE`)**

* Inferred data type from source data
* Values: `'str'`, `'int'`, `'float'`, `'bool'`, `'object'`, `'array'`, `'null'`
* **Read-only** - Inferred during scanning
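
To make the value set concrete, the following Python sketch maps parsed JSON values to the type names listed above. It is an illustration only: DataPancake's actual inference rules are not described on this page, and `infer_source_data_type` is a hypothetical name.

```python
from typing import Any


def infer_source_data_type(value: Any) -> str:
    """Map a parsed JSON value to one of the documented type names."""
    if value is None:
        return "null"
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    raise TypeError(f"not a JSON value: {type(value).__name__}")
```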

**Polymorphic Attribute Name (`POLYMORPHIC_ATTRIBUTE_NAME`)**

* Unique name for this polymorphic version
* Format: `{attribute_name}_{type}` or `{attribute_name}_array_{array_type}`
* Examples: `email_str`, `price_float`, `orders_array_object`
* **Read-only** - Generated based on source data type
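
The naming format can be sketched as a small Python function. The function and parameter names are hypothetical. How a mixed `'primitive,object'` array type is rendered in the name is not documented here, so the sketch only covers the documented examples.

```python
from typing import Optional


def polymorphic_attribute_name(
    attribute_name: str,
    source_data_type: str,
    array_type: Optional[str] = None,
) -> str:
    """Build a name following the documented {name}_{type} patterns."""
    if source_data_type == "array":
        if array_type is None:
            raise ValueError("array_type is required for array attributes")
        return f"{attribute_name}_array_{array_type}"
    return f"{attribute_name}_{source_data_type}"


print(polymorphic_attribute_name("orders", "array", "object"))  # orders_array_object
```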

**Array Type (`ARRAY_TYPE`)**

* For array attributes, the type of array elements
* Values: `'object'`, `'primitive'`, `'primitive,object'`
* Only applicable when `SOURCE_DATA_TYPE = 'array'`
* **Read-only** - Determined during discovery

**Array Primitive Type (`ARRAY_PRIMITIVE_TYPE`)**

* For primitive arrays, the data type of array elements
* Values: `'str'`, `'int'`, `'float'`, `'bool'`
* Only applicable when `ARRAY_TYPE = 'primitive'` or `ARRAY_TYPE = 'primitive,object'`
* **Read-only** - Determined during discovery
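
The two array fields can be illustrated with a Python sketch that classifies the elements of a parsed JSON array. This is an assumption about how such a classification could work, not DataPancake's implementation. In particular, returning no primitive type when elements mix several primitive types is a choice made for this sketch, and `classify_array` is a hypothetical name.

```python
from typing import Any, List, Optional, Tuple


def _primitive_name(element: Any) -> Optional[str]:
    """Return the documented primitive type name, or None for non-primitives."""
    if isinstance(element, bool):  # checked first: bool is a subclass of int
        return "bool"
    if isinstance(element, int):
        return "int"
    if isinstance(element, float):
        return "float"
    if isinstance(element, str):
        return "str"
    return None


def classify_array(elements: List[Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (array_type, array_primitive_type) for a list of elements."""
    has_object = any(isinstance(e, dict) for e in elements)
    primitive_names = {n for n in map(_primitive_name, elements) if n is not None}

    if has_object and primitive_names:
        array_type = "primitive,object"
    elif has_object:
        array_type = "object"
    elif primitive_names:
        array_type = "primitive"
    else:
        array_type = None  # empty array or only nulls/nested arrays

    # Sketch-only rule: report a primitive type only when it is unambiguous.
    primitive_type = next(iter(primitive_names)) if len(primitive_names) == 1 else None
    return array_type, primitive_type
```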

***

### Content Information

**Sample Value (`SAMPLE_VALUE`)**

* A sample value from the source data
* For string types, defaults to `"string value"` for privacy/security
* Helps users understand the data content
* **Read-only** - Captured during scanning

**Has Embedded Content (`HAS_EMBEDDED_CONTENT`)**

* Boolean indicating if this attribute contains embedded/stringified JSON
* When `TRUE`, DataPancake recursively parses the JSON string to discover nested attributes
* **Read-only** - Detected during scanning
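
One plausible way to detect stringified JSON is shown in the Python sketch below. It treats a string as embedded content only if it parses to a JSON object or array. This is an assumption for illustration; the detection rule DataPancake actually uses is not described on this page.

```python
import json
from typing import Any


def has_embedded_content(value: Any) -> bool:
    """Return True if value is a string holding a JSON object or array."""
    if not isinstance(value, str):
        return False
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return False
    # Strings like "42" or "true" are valid JSON but carry no nested
    # structure, so they are not treated as embedded content here.
    return isinstance(parsed, (dict, list))


print(has_embedded_content('{"city": "Austin"}'))  # True
```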

***

### Attribute Classification

**Attribute Type (`ATTRIBUTE_TYPE`)**

* The origin of the attribute
* Values:
  * `'Discovered'` - Found during scanning
  * `'Schema'` - Created from schema sample
  * `'Virtual'` - User-created custom attribute
* **Read-only** - Set based on how attribute was created

**Attribute Schema Component Type (`ATTRIBUTE_SCHEMA_COMPONENT_TYPE`)**

* Classifies the schema component type
* Used for internal organization and categorization
* **Read-only** - Set during discovery

**Version Status Date (`VERSION_STATUS_DATE`)**

* Timestamp of when the polymorphic version was created or last activated
* Used for tracking schema evolution
* **Read-only** - Set when version is created/activated

***

### Status Control

**Attribute Record Status (`RECORD_STATUS`)**

* Controls whether the attribute is active or inactive
* Values: `'active'`, `'inactive'`
* **Editable** - The only editable field in the Source Schema tab
* Active attributes are included in code generation (if `INCLUDE_IN_CODE_GEN = TRUE`)
* Inactive attributes are excluded from code generation
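
The inclusion rule combines two conditions, which the following Python sketch spells out. It assumes attribute metadata is available as a dictionary keyed by column name, which is a representation chosen for this example only.

```python
def included_in_code_generation(attribute: dict) -> bool:
    """An attribute is generated only if it is active and flagged for code gen."""
    return (
        attribute.get("RECORD_STATUS") == "active"
        and attribute.get("INCLUDE_IN_CODE_GEN") is True
    )


rows = [
    {"ATTRIBUTE_NAME": "email", "RECORD_STATUS": "active", "INCLUDE_IN_CODE_GEN": True},
    {"ATTRIBUTE_NAME": "legacy_id", "RECORD_STATUS": "inactive", "INCLUDE_IN_CODE_GEN": True},
]
print([r["ATTRIBUTE_NAME"] for r in rows if included_in_code_generation(r)])  # ['email']
```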

***

## Virtual Attributes

Virtual attributes are user-created custom attributes, created via the UI or `sp_upsert_virtual_datasource_attribute`.

**Required fields:**

* Attribute Name (no spaces)
* Source Data Type
* Snowflake Data Type
* Transformation Expression (SQL)

**Optional fields:**

* Parent Array (for array-level virtual attributes)
* `W_QUESTION_CATEGORY` (for Cortex Analyst semantic models)
* Description

**Characteristics:**

* Single polymorphic version (no polymorphism)
* Automatically set to `INCLUDE_IN_CODE_GEN = TRUE`
* `ATTRIBUTE_TYPE = 'Virtual'`
* Can reference other attributes using the `{attribute_name}` placeholder in expressions
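
Before creating a virtual attribute, it can help to check the required fields locally. The Python sketch below does that for a definition held in a dictionary. The key names are hypothetical and do not reflect the parameters of `sp_upsert_virtual_datasource_attribute`, whose signature is not covered on this page.

```python
from typing import List

# Hypothetical keys mirroring the required fields listed above.
REQUIRED_FIELDS = (
    "attribute_name",
    "source_data_type",
    "snowflake_data_type",
    "transformation_expression",
)


def validate_virtual_attribute(definition: dict) -> List[str]:
    """Return a list of problems; an empty list means the definition looks complete."""
    errors = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not definition.get(field)
    ]
    name = definition.get("attribute_name") or ""
    if any(ch.isspace() for ch in name):
        errors.append("attribute_name must not contain spaces")
    return errors
```

An empty result only means the required fields are present and the name has no whitespace; it does not check that the SQL expression itself is valid.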

***

## Common Scenarios

**Understanding nested structure:**

* Use `ATTRIBUTE_LEVEL` (0 = root, increments per level) and `PARENT_OBJECT` to understand nesting

**Identifying embedded JSON:**

* Check `HAS_EMBEDDED_CONTENT = TRUE` for stringified JSON
* `ATTRIBUTE_PATH_EMBEDDED` shows nested structure within string fields

**Working with arrays:**

* `ARRAY_TYPE` indicates `'object'`, `'primitive'`, or `'primitive,object'`
* `ARRAY_PRIMITIVE_TYPE` (if applicable) shows element type
* `PARENT_ARRAY` shows containing array for nested attributes

**Deactivating attributes:**

* Set `RECORD_STATUS = 'inactive'` to exclude from code generation without deleting
* Useful for temporarily excluding an attribute, preserving its history, or testing

