# What is DataPancake?

{% embed url="<https://vimeo.com/1095116976/1af89f88a3?share=copy>" %}

DataPancake is a complete solution for semi-structured data. You can now Pancake your data™: flatten, normalize, enrich, and secure complex JSON/XML, with a Cortex AI Data Dictionary Builder and a Cortex Analyst Semantic Model Generator, all securely inside Snowflake.

***

### DataPancake's Core Features

<details>

<summary>🔍 Schema Discovery</summary>

* Recursively scans 100% of your semi-structured data (JSON, XML, Avro, Parquet, and ORC) to discover all attributes including nested arrays, objects, and every polymorphic version of each attribute.
* Detects all 7 polymorphic data type variations (4 primitives, 2 types of arrays, and objects).
* Identifies escaped JSON within string fields.
* Scanning and discovery benefit from Snowflake's vertical scaling.
* Infers Snowflake destination data types including correct datetime formats for accurate type conversion.

</details>
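
To make the recursive scan described under Schema Discovery more concrete, here is a minimal Python sketch of the general technique: walk every document, record each attribute path, and collect every type observed at that path. It is not DataPancake's implementation. The function names (`classify`, `discover`), the path notation (`$`, `[]`, `<json>`), and the type labels are all assumptions made for illustration.

```python
import json
from collections import defaultdict


def classify(value):
    """Return an illustrative type label for a parsed JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        # A string that itself parses to an object or array is treated as escaped JSON.
        try:
            inner = json.loads(value)
        except ValueError:
            return "string"
        return "escaped_json" if isinstance(inner, (dict, list)) else "string"
    if isinstance(value, dict):
        return "object"
    # Remaining case is a list; an empty list falls into the primitive bucket.
    if any(isinstance(item, dict) for item in value):
        return "array_of_objects"
    return "array_of_primitives"


def discover(documents):
    """Map each attribute path to the set of type labels seen across all documents."""
    schema = defaultdict(set)

    def walk(value, path):
        label = classify(value)
        schema[path].add(label)
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}")
        elif isinstance(value, list):
            for item in value:
                walk(item, f"{path}[]")
        elif label == "escaped_json":
            # Descend into the embedded document under a marked path.
            walk(json.loads(value), f"{path}<json>")

    for doc in documents:
        walk(doc, "$")
    return dict(schema)
```

For example, scanning two documents in which `id` is a number in one and a string in the other yields a single `$.id` entry holding both labels, which is the kind of polymorphism the discovery step is meant to surface.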

<details>

<summary>🛠️ Pipeline Designer</summary>

* Enables users to customize how each pipeline SQL DDL will be generated.
* Configure foreign key relationships for nested arrays.
* Apply column-level transformation logic during the materialization process.
* Create virtual attributes for derived fields or semantic model metrics and filters.
* Configure integration with row access and column masking policies.
* Configure the semantic layer of views, including additional column-level transformations.

</details>

<details>

<summary>✨ SQL Code Generation</summary>

* Generates the SQL DDL code needed to create relational dynamic tables and policy-infused views in Snowflake, based on your configured attribute metadata.
* Generates Snowflake Dynamic Table SQL DDL that uses DataPancake ITDCs (Immutable Typed Derived Columns) to create technical-debt-free pipelines.
* Reflects configured transformations, foreign keys, and virtual attributes, allowing for post-normalized table joins.
* Generates views that select data from the normalized dynamic tables and incorporate row-access and column-masking security policies and additional column-level transformations.
* Generates streams, tasks, and tables to track dynamic table metadata, including insert and last-updated datetimes.

</details>
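
As a rough illustration of metadata-driven code generation, the Python sketch below renders a Snowflake `CREATE DYNAMIC TABLE` statement from a list of attribute definitions. It does not reproduce DataPancake's generated output or its ITDC pattern. The function name `dynamic_table_ddl`, its parameters, and the tuple layout of the metadata are hypothetical, and the sketch does no identifier quoting or validation.

```python
def dynamic_table_ddl(table, source, column, attributes, warehouse, target_lag="1 hour"):
    """Render a CREATE DYNAMIC TABLE statement from attribute metadata.

    attributes: list of (json_path, column_name, snowflake_type) tuples
    (a hypothetical metadata shape chosen for this example).
    """
    # One typed projection per attribute: <variant column>:<path>::<type> AS <name>
    select_list = ",\n".join(
        f"    {column}:{path}::{sf_type} AS {name}"
        for path, name, sf_type in attributes
    )
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {table}\n"
        f"  TARGET_LAG = '{target_lag}'\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"AS\n"
        f"SELECT\n{select_list}\nFROM {source};"
    )


# Hypothetical table, column, and warehouse names.
ddl = dynamic_table_ddl(
    table="orders_flat",
    source="raw_orders",
    column="payload",
    attributes=[
        ("id", "order_id", "NUMBER"),
        ("customer.name", "customer_name", "VARCHAR"),
    ],
    warehouse="transform_wh",
)
print(ddl)
```

Keeping the attribute metadata separate from the rendering step is what lets the same configuration be regenerated into new DDL whenever the metadata changes.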

<details>

<summary>🚨 Schema Drift Monitoring</summary>

* Continuously monitors and alerts you when your semi-structured data source schema changes.
* Detects schema drift in semi-structured data sources like JSON and XML.
* Flags changes in data types, structure, and new attributes.
* Alerts users to configure newly discovered attributes and regenerate pipeline code.
* Optionally generates updated pipeline SQL DDL upon schema change detection.

</details>
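
Conceptually, schema drift detection compares a previously recorded schema against a fresh scan. The Python sketch below shows one simple way to do that, assuming each snapshot is a dictionary mapping an attribute path to the set of type labels observed for it. The snapshot shape, the function name `detect_drift`, and the report keys are assumptions for illustration, not DataPancake's internal format.

```python
def detect_drift(baseline, current):
    """Compare two {path: set_of_types} snapshots and report the differences."""
    baseline_paths = set(baseline)
    current_paths = set(current)

    # Attributes that appear only in the new scan, or only in the old one.
    new_attributes = sorted(current_paths - baseline_paths)
    missing_attributes = sorted(baseline_paths - current_paths)

    # Attributes present in both scans whose observed types differ.
    type_changes = {
        path: {
            "added": sorted(current[path] - baseline[path]),
            "removed": sorted(baseline[path] - current[path]),
        }
        for path in sorted(baseline_paths & current_paths)
        if current[path] != baseline[path]
    }

    return {
        "new_attributes": new_attributes,
        "missing_attributes": missing_attributes,
        "type_changes": type_changes,
    }


# Hypothetical snapshots: "id" gains a string variant, "email" is new, "name" disappears.
baseline = {"$.id": {"number"}, "$.name": {"string"}}
current = {"$.id": {"number", "string"}, "$.email": {"string"}}
report = detect_drift(baseline, current)
```

A non-empty report is the signal that newly discovered attributes need configuring and that pipeline code may need to be regenerated.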

<details>

<summary>📚 Data Dictionary Builder</summary>

* Creates a comprehensive data dictionary with definitions, synonyms, and sample values for every attribute, and integrates with the Semantic Model Generator for Cortex Analyst.
* Uses your preferred LLM to generate definitions, synonyms, and sample values.
* Extends DataPancake’s system prompt with your own custom context for greater clarity and improved responses.
* Generates descriptions for the data source, nested arrays, and attributes.

</details>

<details>

<summary>🧠 Cortex Analyst Semantic Model Generator</summary>

* Generates Cortex Analyst–ready YAML files that define your complete semantic model.
* Automatically includes relationship metadata based on selected columns.
* Integrated with the Pipeline Designer and Data Dictionary Builder.
* Configures custom metrics, facts, and filters through virtual attributes, then adds verified queries and custom instructions.

</details>
