What is DataPancake?

DataPancake is a complete solution for semi-structured data. You can now Pancake your data™: flatten, normalize, enrich, and secure complex JSON/XML with a Cortex AI Data Dictionary Builder and a Cortex Analyst Semantic Model Generator, all securely inside Snowflake.


DataPancake's Core Features

🔍 Schema Discovery
  • Recursively scans 100% of your semi-structured data (JSON, XML, Avro, Parquet, and ORC) to discover every attribute, including nested arrays, objects, and each polymorphic version of an attribute.

  • Detects all 7 polymorphic data type variations (4 primitives, 2 types of arrays, and objects).

  • Identifies escaped JSON within string fields.

  • Scanning and discovery benefit from Snowflake's vertical scaling.

  • Infers Snowflake destination data types including correct datetime formats for accurate type conversion.
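
The discovery steps above can be pictured with a short Python sketch. It is only an illustration of the general technique (a recursive walk that records every type seen at each path), not DataPancake's implementation, which runs inside Snowflake. The seven type labels, the `$`-style path notation, and the function names `type_label` and `discover` are assumptions made for this example.

```python
import json
from collections import defaultdict


def type_label(value):
    """Classify a parsed JSON value into one of seven illustrative labels.

    Assumption: the label set (4 primitives, 2 array kinds, object) is a guess
    at a reasonable breakdown, not DataPancake's documented list.
    """
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An empty array is counted as an array of primitives here.
        has_object = any(isinstance(item, dict) for item in value)
        return "array_of_objects" if has_object else "array_of_primitives"
    raise TypeError(f"unsupported value: {value!r}")


def discover(docs):
    """Walk every document and collect the set of types seen at each path."""
    types_by_path = defaultdict(set)
    escaped_json_paths = set()

    def walk(value, path):
        types_by_path[path].add(type_label(value))
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}")
        elif isinstance(value, list):
            for item in value:
                walk(item, path + "[]")
        elif isinstance(value, str) and value.lstrip()[:1] in ("{", "["):
            # A string that parses to an object or array is escaped JSON.
            try:
                inner = json.loads(value)
            except ValueError:
                return
            if isinstance(inner, (dict, list)):
                escaped_json_paths.add(path)

    for doc in docs:
        walk(doc, "$")
    return dict(types_by_path), escaped_json_paths


docs = [
    {"id": 1, "tags": ["a", "b"], "payload": "{\"k\": 1}"},
    {"id": "1", "tags": [{"name": "a"}], "payload": "plain text"},
]
schema, escaped = discover(docs)
for path in sorted(schema):
    print(path, sorted(schema[path]))
print("escaped JSON at:", sorted(escaped))
```

An attribute is polymorphic when its path maps to more than one label. In the sample documents, `$.id` appears as both a number and a string.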

🛠️ Pipeline Designer
  • Lets you customize how each pipeline's SQL DDL is generated.

  • Configure foreign key relationships for nested arrays.

  • Apply column-level transformation logic during the materialization process.

  • Create virtual attributes for derived fields or semantic model metrics and filters.

  • Configure row access and column masking policy integration.

  • Configure a semantic layer of views, including additional column-level transformations.

✨ SQL Code Generation
  • Generates the SQL DDL needed to create relational dynamic tables and policy-infused views in Snowflake, based on your configured attribute metadata.

  • Generates Snowflake Dynamic Table SQL DDL that uses DataPancake ITDCs (Immutable Typed Derived Columns) to create technical-debt-free pipelines.

  • Reflects configured transformations, foreign keys, and virtual attributes, allowing joins across the normalized tables.

  • Generates views that select from the normalized dynamic tables and incorporate row access and column masking security policies, plus additional column-level transformations.

  • Generates streams, tasks, and tables that track dynamic table metadata, including insert and last-updated datetimes.
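
As a rough picture of metadata-driven code generation, the Python sketch below turns a list of attribute definitions into a `CREATE OR REPLACE DYNAMIC TABLE` statement. It is not DataPancake's generator and does not reproduce its ITDC pattern. The `Attribute` class, the `dynamic_table_ddl` function, and the table, column, and warehouse names are all hypothetical. The sketch also does no identifier quoting or validation.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical attribute metadata for one destination column."""
    path: str      # path inside the VARIANT column, e.g. "customer.id"
    column: str    # destination column name
    sql_type: str  # Snowflake destination type, e.g. "NUMBER"


def dynamic_table_ddl(table, source, variant_col, attributes,
                      target_lag="1 hour", warehouse="MY_WH"):
    """Build dynamic table DDL that casts each VARIANT path to a typed column."""
    select_list = ",\n".join(
        f"    {variant_col}:{attr.path}::{attr.sql_type} AS {attr.column}"
        for attr in attributes
    )
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {table}\n"
        f"  TARGET_LAG = '{target_lag}'\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"AS\n"
        f"  SELECT\n{select_list}\n"
        f"  FROM {source};"
    )


# Hypothetical metadata: two attributes extracted from a VARIANT column SRC.
attributes = [
    Attribute("customer.id", "CUSTOMER_ID", "NUMBER"),
    Attribute("placed_at", "PLACED_AT", "TIMESTAMP_NTZ"),
]
print(dynamic_table_ddl("ORDERS_FLAT", "RAW.ORDERS", "SRC", attributes))
```

Keeping the column definitions in metadata rather than in hand-written SQL means the statement can be regenerated whenever an attribute's type or name is reconfigured.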

🚨 Schema Drift Monitoring
  • Continuously monitors and alerts you when your semi-structured data source schema changes.

  • Detects schema drift in semi-structured data sources like JSON and XML.

  • Flags changes in data types, structure, and new attributes.

  • Alerts users to configure newly discovered attributes and regenerate pipeline code.

  • Optionally generates updated pipeline SQL DDL upon schema change detection.
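
Conceptually, drift detection compares a previously discovered schema with a fresh scan. The Python sketch below shows one simple way to do that comparison. It assumes each schema is a mapping from attribute path to the set of type labels seen at that path. The `detect_drift` function and its report keys are hypothetical, not DataPancake's API.

```python
def detect_drift(baseline, current):
    """Compare two schema snapshots (path -> set of type labels).

    Returns new attributes, removed attributes, and attributes whose
    observed type sets differ between the snapshots.
    """
    baseline_paths = set(baseline)
    current_paths = set(current)

    type_changes = {}
    for path in sorted(baseline_paths & current_paths):
        if baseline[path] != current[path]:
            type_changes[path] = {
                "before": sorted(baseline[path]),
                "after": sorted(current[path]),
            }

    return {
        "new_attributes": sorted(current_paths - baseline_paths),
        "removed_attributes": sorted(baseline_paths - current_paths),
        "type_changes": type_changes,
    }


# Hypothetical snapshots from two scans of the same data source.
baseline = {"$.id": {"number"}, "$.name": {"string"}, "$.legacy": {"string"}}
current = {"$.id": {"number", "string"}, "$.name": {"string"}, "$.email": {"string"}}

report = detect_drift(baseline, current)
print(report)
```

A non-empty `new_attributes` or `type_changes` entry is the kind of signal that would prompt configuring the new attributes and regenerating the pipeline code.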

📚 Data Dictionary Builder
  • Creates a comprehensive data dictionary with definitions, synonyms, and sample values for every attribute, and integrates with the Semantic Model Generator for Cortex Analyst.

  • Uses your preferred LLM to generate definitions, synonyms, and sample values.

  • Extends DataPancake’s system prompt with your own custom context for greater clarity and improved responses.

  • Generates descriptions for the data source, nested arrays, and attributes.

🧠 Cortex Analyst Semantic Model Generator
  • Generates Cortex Analyst–ready YAML files that define your complete semantic model.

  • Automatically includes relationship metadata based on selected columns.

  • Integrated with the Pipeline Designer and Data Dictionary Builder.

  • Configures custom metrics, facts, and filters through virtual attributes, then adds verified queries and custom instructions.
