# What is DataPancake?

{% embed url="<https://vimeo.com/1095116976/1af89f88a3?share=copy>" %}

DataPancake is a complete solution for semi-structured data. You can now Pancake your data™: flatten, normalize, enrich, and secure complex JSON/XML, with a Cortex AI Data Dictionary Builder and a Cortex Analyst Semantic Model Generator, all securely inside Snowflake.

***

### DataPancake's Core Features

<details>

<summary><span data-gb-custom-inline data-tag="emoji" data-code="1f50d">🔍</span> Schema Discovery</summary>

* Recursively scans 100% of your semi-structured data (JSON, XML, Avro, Parquet, and ORC) to discover all attributes including nested arrays, objects, and every polymorphic version of each attribute.
* Detects all 7 polymorphic data type variations (4 primitives, 2 types of arrays, and objects).
* Identifies escaped JSON within string fields.
* Scanning and discovery benefit from Snowflake's vertical scaling.
* Infers Snowflake destination data types including correct datetime formats for accurate type conversion.

</details>
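
The recursive discovery idea described above can be sketched in a few lines of Python. This is an illustrative sketch only, not DataPancake's implementation: the function names, the path notation (`$.a[]`, `(escaped)`), and the way the seven variations are split (here string, number, boolean, and null as primitives, plus arrays of primitives, arrays of objects, and objects) are all assumptions made for the example.

```python
import json
from collections import defaultdict


def type_name(value):
    """Map a parsed JSON value to one of seven (assumed) variation labels."""
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An empty array is counted as an array of primitives here.
        has_objects = any(isinstance(item, dict) for item in value)
        return "array_of_objects" if has_objects else "array_of_primitives"
    raise TypeError(f"unsupported value: {value!r}")


def parse_embedded_json(text):
    """Return the parsed value if a string holds an escaped JSON object or array."""
    stripped = text.strip()
    if not stripped or stripped[0] not in "{[":
        return None
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        return None


def discover(value, path, schema):
    """Record every type variation seen at each attribute path."""
    kind = type_name(value)
    schema[path].add(kind)
    if kind == "string":
        embedded = parse_embedded_json(value)
        if embedded is not None:
            # Hypothetical convention: escaped JSON gets its own path suffix.
            discover(embedded, f"{path}(escaped)", schema)
    elif kind == "object":
        for key, child in value.items():
            discover(child, f"{path}.{key}", schema)
    elif kind.startswith("array"):
        for item in value:
            discover(item, f"{path}[]", schema)
    return schema


records = [
    {"id": 1, "tags": ["a", "b"], "payload": "{\"a\": 1}"},
    {"id": "1", "tags": [{"name": "a"}], "payload": "plain"},
]
schema = defaultdict(set)
for record in records:
    discover(record, "$", schema)
```

After both records are scanned, `schema["$.id"]` holds both `number` and `string`, which is the kind of polymorphism a full scan is meant to surface and that sampling only a few records could miss.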

<details>

<summary>🛠️ Pipeline Designer</summary>

* Enables users to customize how each pipeline SQL DDL will be generated.
* Configure foreign key relationships for nested arrays.
* Apply column-level transformation logic during the materialization process.
* Create virtual attributes for derived fields or semantic model metrics and filters.
* Configure integration with row access and column masking policies.
* Configure the semantic layer of views, including additional column-level transformations.

</details>

<details>

<summary>✨ SQL Code Generation</summary>

* Generates the SQL DDL needed to create relational dynamic tables and policy-infused views in Snowflake, based on your configured attribute metadata.
* Generates Snowflake Dynamic Table SQL DDL that uses DataPancake ITDCs (Immutable Typed Derived Columns) to create technical-debt-free pipelines.
* Reflects configured transformations, foreign keys, and virtual attributes, allowing joins across the normalized tables.
* Generates views that select from the normalized dynamic tables and incorporate row access and column masking policies, along with additional column-level transformations.
* Generates streams, tasks, and tables that track dynamic table metadata, including insert and last-updated datetimes.

</details>
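
To make the idea of metadata-driven code generation concrete, here is a minimal Python sketch that renders a Snowflake dynamic table statement from a list of attribute definitions. It does not reproduce DataPancake's actual output or its ITDC technique: the `Attribute` class, the `render_dynamic_table` function, and every table, column, and warehouse name are hypothetical. Identifiers are interpolated without quoting or validation, so treat it as an illustration rather than production code.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    path: str       # path inside the VARIANT column, e.g. "customer.id"
    column: str     # destination column name
    sql_type: str   # destination Snowflake data type


def render_dynamic_table(name, source_table, variant_column, attributes,
                         target_lag="1 hour", warehouse="MY_WH"):
    """Build a CREATE DYNAMIC TABLE statement from attribute metadata."""
    select_list = ",\n".join(
        f"    {variant_column}:{attr.path}::{attr.sql_type} AS {attr.column}"
        for attr in attributes
    )
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {name}\n"
        f"  TARGET_LAG = '{target_lag}'\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {source_table};"
    )


# Hypothetical metadata for two attributes of an orders document.
ddl = render_dynamic_table(
    name="ORDERS_DT",
    source_table="RAW.ORDERS",
    variant_column="SRC",
    attributes=[
        Attribute("customer.id", "CUSTOMER_ID", "NUMBER"),
        Attribute("order_date", "ORDER_DATE", "TIMESTAMP_NTZ"),
    ],
)
print(ddl)
```

Because the statement is derived entirely from the metadata, changing a data type or adding an attribute means regenerating the DDL rather than hand-editing SQL.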

<details>

<summary>🚨 Schema Drift Monitoring</summary>

* Continuously monitors and alerts you when your semi-structured data source schema changes.
* Detects schema drift in semi-structured data sources like JSON and XML.
* Flags changes in data types, structure, and new attributes.
* Alerts users to configure newly discovered attributes and regenerate pipeline code.
* Optionally generates updated pipeline SQL DDL upon schema change detection.

</details>
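
Conceptually, drift detection compares the schema found in a new scan against a previously stored one. The Python sketch below assumes each schema is a dictionary mapping an attribute path to the set of data types seen at that path; that representation and the `detect_drift` function are hypothetical and are not DataPancake's API.

```python
def detect_drift(baseline, current):
    """Compare two schema snapshots (path -> set of type names)."""
    # Paths present in the new scan but absent from the baseline.
    new_attributes = sorted(set(current) - set(baseline))
    # Paths present in both scans that gained at least one new data type.
    type_changes = {
        path: sorted(current[path] - baseline[path])
        for path in sorted(set(baseline) & set(current))
        if current[path] - baseline[path]
    }
    return {"new_attributes": new_attributes, "type_changes": type_changes}


baseline = {"$.id": {"number"}, "$.name": {"string"}}
current = {
    "$.id": {"number", "string"},   # id now also arrives as a string
    "$.name": {"string"},
    "$.email": {"string"},          # attribute not seen before
}
drift = detect_drift(baseline, current)
```

Here `drift` reports `$.email` as a new attribute and `string` as a new type for `$.id`, the two kinds of change that would prompt a user to configure the attribute and regenerate pipeline code.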

<details>

<summary>📚 Data Dictionary Builder</summary>

* Creates a comprehensive data dictionary with definitions, synonyms, and sample values for every attribute, and integrates with the Semantic Model Generator for Cortex Analyst.
* Uses your preferred LLM to generate definitions, synonyms, and sample values.
* Extends DataPancake’s system prompt with your own custom context for greater clarity and improved responses.
* Generates descriptions for the data source, nested arrays, and attributes.

</details>

<details>

<summary>🧠 Cortex Analyst Semantic Model Generator</summary>

* Generates Cortex Analyst–ready YAML files that define your complete semantic model.
* Automatically includes relationship metadata based on selected columns.
* Integrated with the Pipeline Designer and Data Dictionary Builder.
* Configures custom metrics, facts, and filters through virtual attributes, then adds verified queries and custom instructions.

</details>

