What is DataPancake?
DataPancake is a complete solution for semi-structured data. Pancake your data™: flatten, normalize, enrich, and secure complex JSON and XML, with a Cortex AI Data Dictionary Builder and a Cortex Analyst Semantic Model Generator, all running securely inside Snowflake.
DataPancake's Core Features
🔍 Schema Discovery
Recursively scans 100% of your semi-structured data (JSON, XML, Avro, Parquet, and ORC) to discover all attributes including nested arrays, objects, and every polymorphic version of each attribute.
Detects all 7 polymorphic data type variations (4 primitives, 2 types of arrays, and objects).
Identifies escaped JSON within string fields.
Scanning and discovery benefit from Snowflake's vertical scaling.
Infers Snowflake destination data types including correct datetime formats for accurate type conversion.
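The recursive discovery described above can be illustrated with a small sketch. It is not DataPancake's implementation: the function names (`type_label`, `looks_like_escaped_json`, `discover`, `scan`), the `$.path[]` notation, and the type labels are all assumptions made for illustration, and it works on JSON already parsed into Python values rather than on data in Snowflake.

```python
import json
from collections import defaultdict


def type_label(value):
    """Return a coarse type label for a parsed JSON value (labels are illustrative)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays holding at least one object are labeled separately from
        # arrays of scalar values; an empty array falls into the latter.
        has_object = any(isinstance(item, dict) for item in value)
        return "array_of_objects" if has_object else "array_of_primitives"
    raise TypeError(f"unsupported value: {value!r}")


def looks_like_escaped_json(text):
    """True if a string value itself parses as a JSON object or array."""
    stripped = text.strip()
    if not stripped or stripped[0] not in "{[":
        return False
    try:
        json.loads(stripped)
    except ValueError:
        return False
    return True


def discover(value, path, schema):
    """Record every type seen at `path`, then recurse into children."""
    schema[path].add(type_label(value))
    if isinstance(value, dict):
        for key, child in value.items():
            discover(child, f"{path}.{key}", schema)
    elif isinstance(value, list):
        for item in value:
            discover(item, f"{path}[]", schema)
    elif isinstance(value, str) and looks_like_escaped_json(value):
        schema[path].add("escaped_json")


def scan(records):
    """Scan every record and return a mapping of path -> set of type labels."""
    schema = defaultdict(set)
    for record in records:
        discover(record, "$", schema)
    return schema
```

Calling `scan` on a list of parsed records returns one entry per attribute path. A path whose set holds more than one label is polymorphic in the sense used above, for example an `id` that is a number in one record and a string in another.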
🛠️ Pipeline Designer
Lets you customize how the SQL DDL for each pipeline is generated.
Configure foreign key relationships for nested arrays.
Apply column-level transformation logic during the materialization process.
Create virtual attributes for derived fields or semantic model metrics and filters.
Configure integration with row access and column masking policies.
Configure the semantic layer of views, including additional column-level transformations.
✨ SQL Code Generation
Generates the SQL DDL code needed to create relational dynamic tables and policy-infused views in Snowflake, based on your configured attribute metadata.
Generates Snowflake Dynamic Table SQL DDL that uses DataPancake ITDCs (Immutable Typed Derived Columns) to create pipelines free of technical debt.
Reflects configured transformations, foreign keys, and virtual attributes, so the normalized tables can be joined afterward.
Generates views that select from the normalized dynamic tables and incorporate row access and column masking policies along with additional column-level transformations.
Generates streams, tasks, and tables that track dynamic table metadata, including insert and last-updated datetimes.
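To make the idea of metadata-driven code generation concrete, here is a minimal sketch that turns a list of attribute definitions into a Snowflake `CREATE DYNAMIC TABLE` statement. It does not reproduce DataPancake's generated code or its ITDCs. The `Attribute` class, the `dynamic_table_ddl` function, and the table, column, and warehouse names are hypothetical, and the sketch assumes the source data sits in a single VARIANT column and that all names are already valid, unquoted identifiers.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical attribute metadata: where the value lives and how to type it."""
    path: str     # path inside the VARIANT column, e.g. "customer.id"
    column: str   # output column name
    sf_type: str  # Snowflake data type to cast to, e.g. "NUMBER"


def dynamic_table_ddl(table, source, variant_col, attrs, warehouse, target_lag="1 hour"):
    """Build a CREATE DYNAMIC TABLE statement that casts each attribute to its type."""
    if not attrs:
        raise ValueError("at least one attribute is required")
    # One "variant:path::TYPE AS column" expression per configured attribute.
    select_lines = [
        f"    {variant_col}:{a.path}::{a.sf_type} AS {a.column}" for a in attrs
    ]
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {table}\n"
        f"  TARGET_LAG = '{target_lag}'\n"
        f"  WAREHOUSE = {warehouse}\n"
        "AS\n"
        "SELECT\n"
        + ",\n".join(select_lines)
        + f"\nFROM {source};"
    )


# Example with made-up names.
ddl = dynamic_table_ddl(
    table="orders_flat",
    source="raw_orders",
    variant_col="raw",
    attrs=[
        Attribute("customer.id", "customer_id", "NUMBER"),
        Attribute("created_at", "created_at", "TIMESTAMP_NTZ"),
    ],
    warehouse="transform_wh",
)
print(ddl)
```

Keeping the attribute metadata separate from the SQL text is what makes regeneration cheap: when a type or path changes, only the metadata is edited and the statement is rebuilt.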
🚨 Schema Drift Monitoring
Continuously monitors and alerts you when your semi-structured data source schema changes.
Detects schema drift in semi-structured data sources like JSON and XML.
Flags changes in data types, structure, and new attributes.
Alerts users to configure newly discovered attributes and regenerate pipeline code.
Optionally generates updated pipeline SQL DDL upon schema change detection.
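Drift detection of this kind amounts to comparing two schema snapshots. The sketch below shows one way to do that, assuming each snapshot is a mapping from attribute path to the set of type labels observed at that path. The function names and the report layout are illustrative and are not DataPancake's API.

```python
def diff_schemas(previous, current):
    """Compare two snapshots (path -> set of type labels) and report the differences."""
    prev_paths, curr_paths = set(previous), set(current)
    type_changes = {}
    for path in sorted(prev_paths & curr_paths):
        if previous[path] != current[path]:
            type_changes[path] = {
                "added": sorted(current[path] - previous[path]),
                "removed": sorted(previous[path] - current[path]),
            }
    return {
        "new_attributes": sorted(curr_paths - prev_paths),
        "removed_attributes": sorted(prev_paths - curr_paths),
        "type_changes": type_changes,
    }


def has_drift(report):
    """True if any category in the report is non-empty."""
    return any(report.values())
```

A non-empty `new_attributes` list corresponds to the case where newly discovered attributes need to be configured before the pipeline code is regenerated.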
📚 Data Dictionary Builder
Creates a comprehensive data dictionary with definitions, synonyms, and sample values for every attribute, and integrates with the Semantic Model Generator for Cortex Analyst.
Uses your preferred LLM to generate definitions, synonyms, and sample values.
Extends DataPancake’s system prompt with your own custom context for greater clarity and improved responses.
Generates descriptions for the data source, nested arrays, and attributes.
🧠 Cortex Analyst Semantic Model Generator
Generates Cortex Analyst–ready YAML files that define your complete semantic model.
Automatically includes relationship metadata based on selected columns.
Integrated with the Pipeline Designer and Data Dictionary Builder.
Configures custom metrics, facts, and filters through virtual attributes, then adds verified queries and custom instructions.