Release Notes

DataPancake™ V1.46 Release Notes

Value-Based Pricing Model for Billing

New features have been added to the DataPancake offering. The billing model for DataPancake has been updated to create monthly billable events based on the features used and the number of attributes contained in a particular data source.

Features offered are:

  • Schema Discovery (formerly known as Schema Summary) - Free

  • Attribute Metadata Management (formerly known as Schema Analysis) - Paid

  • SQL Code Generation (formerly known as Extract, Relate, and Flatten) - Paid

  • Data Dictionary Builder - Paid

  • Security Policy Integration - Paid

  • Cortex Analyst Semantic Model Code Generation - Paid

Feature                        | Price per Attribute
Attribute Metadata Management  | $0.50
SQL Code Generation            | $0.90
Data Dictionary Builder        | $0.30
Security Policy Integration    | $0.75
Semantic Model Code Generation | $0.50

Data Source Overview

New columns have been added to the data source overview grid to indicate which services have been enabled for each data source.

Scans in Process will now show a more accurate estimate of the amount of time it will take to complete a scan based on the baseline scan settings. The baseline scan settings can be reset in the Data Source screen. The next scan completed will recompute the average number of records processed based on the number of threads used during the scan. Each subsequent scan will use the baseline scan settings to estimate the time needed to complete the scan.

Navigation

A new left-hand navigation menu has been added to better organize the available features in the application.

Quick Start Script Builders

Three new script builders have been created:

  • Individual Data Sources

  • Multiple Data Sources

  • Schema Drift Alert

Data Source

DataPancake Services

Users will now have the option to enable specific services for each data source. Services replace the option to select a specific Product Tier.

Column Data Type

Users will now have the option to select a source column data type. Currently the only option is Variant. In a future release users will have the option to select String.

Sample Schema Data

Users can supply a single document to represent the schema for a specific data source. This sample document can be in either JSON or XML format. For example, users can create a sample XML document from an existing XSD file or a sample JSON document from an existing Avro schema. Once the sample has been supplied, users can configure a scan to create attribute metadata from the sample instead of through data discovery.

Output Object Settings

Users can choose the type of output object. Current options include Dynamic Table or Table.

Dynamic Table Metadata

Users can now configure a data source to generate the code necessary to track Dynamic Table metadata, including the insert and last-updated datetimes. This allows users to see when a Dynamic Table record was inserted or modified. The generated code will create a new table to store the metadata, a merge statement to initialize the table, a stream for the root-level Dynamic Table to track changes, and a task to keep the table updated on the user's configured schedule.
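
The pattern of the generated objects is sketched below. This is an illustration only; the object names (orders_dt, orders_dt_meta, my_wh), the key column, and the schedule are placeholders, not the exact code DataPancake produces.

```sql
-- Metadata table tracking insert and last-update datetimes for the root Dynamic Table.
CREATE TABLE IF NOT EXISTS orders_dt_meta (
    record_id       VARCHAR,
    inserted_at     TIMESTAMP_NTZ,
    last_updated_at TIMESTAMP_NTZ
);

-- Merge statement to initialize the metadata table from the current Dynamic Table contents.
MERGE INTO orders_dt_meta m
USING (SELECT record_id FROM orders_dt) s
    ON m.record_id = s.record_id
WHEN NOT MATCHED THEN
    INSERT (record_id, inserted_at, last_updated_at)
    VALUES (s.record_id, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP());

-- Stream on the root-level Dynamic Table to track changes.
CREATE STREAM IF NOT EXISTS orders_dt_stream ON DYNAMIC TABLE orders_dt;

-- Task that applies the stream changes on the user's configured schedule.
CREATE TASK IF NOT EXISTS orders_dt_meta_task
    WAREHOUSE = my_wh
    SCHEDULE  = '60 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_DT_STREAM')
AS
MERGE INTO orders_dt_meta m
USING orders_dt_stream s
    ON m.record_id = s.record_id
WHEN MATCHED THEN
    UPDATE SET m.last_updated_at = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
    INSERT (record_id, inserted_at, last_updated_at)
    VALUES (s.record_id, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP());
```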

Baseline Scan Settings

Users can see the average number of records processed for a specific thread count, calculated from prior scans. These numbers can be recalculated by resetting the values to 0; the next scan will update the calculations.

Required Fields

When creating or updating a data source, the user can now see which fields are required if a specific piece of information has not been supplied.

Scan Configuration

Users can now configure the type of attribute creation a scan will use. The Discover attribute creation type will generate attribute metadata based on a discovery of the data located in the database object configured in the data source. The Schema attribute creation type will generate attribute metadata based on the single sample document configured for the data source. Multiple scan configurations can be created for a data source using both attribute creation types to allow a user to compare a schema against the data in a data source.

Scan Data Source

No changes

Data Source Attributes

Users can now configure whether an attribute is unique, is a primary key, or contains only enum values.

Users can now modify the status of an attribute from active to inactive. Inactive attributes are not included in the code generation process. Users can now also see the difference between attributes discovered in the data source vs attributes created from the data source’s sample document.

The Data Governance tab has been renamed to “Semantic Layer - Security Policy”.

Arrays

Users can now configure relationship information for each nested array including:

  • Relationship Name

  • Relationship Description

  • Relationship Type

  • Join Type

This relationship information will be used as part of the semantic model code generation process.

Data Source SQL Code Generation

If a data source has been configured to create Dynamic Table metadata, code will be generated to create the database objects needed to maintain metadata tracking the insert and last-update datetimes of each row in the root-level Dynamic Table.

Data Dictionary Builder

The data dictionary/glossary/synonym builder uses Cortex AI to assist users in building a data dictionary for each data source. Descriptions can be created for the data source, descriptions and synonyms can be created for each nested array, and descriptions, synonyms, and sample values can be created for each attribute. Users can choose which LLM they want to use to generate responses.

Semantic Model Code Generation

Users can create semantic model YAML files for use with Cortex Analyst. Information generated using the data dictionary builder will be included in the generation of all attribute descriptions and synonyms.

Users can choose which attributes to include and can configure additional properties such as:

  • W Question Category (Dimension, Time Dimension, Facts, Metric, Filter)

  • Sample Value (multiple comma separated values)

  • Enum Values (designates that the attribute contains only enum values)

  • Semantic Model Description (in addition to the glossary definition)

  • Semantic Model Expression

  • Cortex Search Service Name

  • Cortex Search Database

  • Cortex Search Schema

  • Cortex Search Column

Additional sections generated include:

  • Relationships

  • Verified Queries

  • Custom Instructions

Worksheet Commands

No Changes

DataPancake V1.38 Release Notes

Manage Data Source

Default Refresh_Mode to Incremental

Defaulted the Refresh_Mode to Incremental for all new data sources with a product tier of Extract, Relate, and Flatten.

Support for New Data Formats

Added support for new data formats including Avro, Parquet, ORC, and XML.

Case Sensitivity Checkbox

Added new checkbox to configure whether generated dynamic tables and views enable case sensitivity for column names.

Kafka Deduplication Expression

For data sources that are flattening data from a Kafka topic, the unique identifier has been replaced with a deduplication expression, allowing users to configure the SQL used to deduplicate Kafka messages.

Manage Scan Configuration

No Changes

Scan Data Source

Datetime Formats

Bug Fix: Updated datetime formats for time zone offsets. Previously, if a string included a plus sign followed by the offset value, or the time was followed by a “Z” representing UTC, the format string used in the generated code would be incorrectly identified. This issue has been resolved.
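
As a hedged illustration (the timestamps and format strings below are examples, not taken from the release), both shapes now resolve to working conversions:

```sql
-- Offset written with a plus sign: an explicit format using TZH:TZM parses it correctly.
SELECT TRY_TO_TIMESTAMP_TZ('2024-03-01T10:15:30+05:00', 'YYYY-MM-DD"T"HH24:MI:SSTZH:TZM');

-- Trailing "Z" for UTC, shown here with Snowflake's AUTO inference.
SELECT TRY_TO_TIMESTAMP_TZ('2024-03-01T10:15:30Z');
```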

Support for XML in Variant Columns

Pancake now supports the scanning and discovery of XML data stored in a Variant Snowflake column.

Data Source Attributes

Foreign Keys in Row Access Policy

In the foreign keys configuration the user now has the option to include the foreign key in the Row Access Policy.

Virtual Attributes with Custom SQL

Virtual Attributes: The ability to create a virtual attribute with a custom SQL expression is now available. SQL expressions can also refer to other flattened columns and can be added as foreign keys to nested arrays.

Data Source SQL Generation

Dynamic Tables for XML in Variant Columns

DataPancake now supports the generation of dynamic tables to extract, relate, and flatten XML data stored in a Variant column.

Kafka Metadata Flattening

If the ‘Include Stream Message Metadata’ checkbox is checked, Kafka metadata will be flattened and included in the root dynamic table as well as all nested dynamic tables for each array. This data will be available in all views generated as part of the data governance semantic layer.

Case Sensitivity for Column Names

If the ‘Use Case Sensitivity’ checkbox is checked, each column generated for dynamic tables and views will be enclosed in double quotes. This allows for special characters in column names.
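
For illustration (the column names are made up), the generated SQL would emit quoted identifiers such as:

```sql
SELECT
    src:orderId::VARCHAR       AS "orderId",          -- mixed case preserved
    src:"line-item.count"::INT AS "line-item.count"   -- special characters allowed
FROM raw_orders;
```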

Bug Fix: Exclude Unchecked Arrays

Bug Fix: If the ‘Include in Code Gen’ checkbox is unchecked for any array the corresponding view will no longer be generated.

Custom Deduplication SQL for Kafka

For data sources that are flattening a Kafka topic, the deduplication SQL will now use the new data source property called ‘Deduplication SQL Expression’. This allows the user to tailor the SQL to their specific needs, including the choice between the RANK() or ROW_NUMBER() functions.
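
A minimal sketch of such an expression, assuming the standard Kafka connector columns RECORD_CONTENT and RECORD_METADATA and a made-up primary key path order_id:

```sql
-- Keep only the most recent message per key; RANK() could be swapped in for ROW_NUMBER().
SELECT *
FROM kafka_landing_table
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY record_content:order_id
    ORDER BY record_metadata:CreateTime::NUMBER DESC
) = 1;
```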

Root Table Select Prefix

The data source value configured for the ‘Root Dynamic Table Select Prefix’ will now be available in the root level view in the Data Governance Semantic Layer. All columns configured in this value will need to be aliased with an ‘as <alias_name>’.
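
For example (all names here are hypothetical), a prefix value of t.load_timestamp as load_timestamp, t.source_file as source_file would surface in the generated root-level statement roughly as:

```sql
SELECT
    t.load_timestamp  AS load_timestamp,   -- from the Root Dynamic Table Select Prefix
    t.source_file     AS source_file,      -- from the Root Dynamic Table Select Prefix
    t.src:id::VARCHAR AS id                -- flattened attribute columns follow
FROM raw_table t;
```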

Foreign Keys in Semantic Layer Views

Foreign keys configured for arrays will now be added to the Row Access Policy definitions in the Data Governance Semantic Layer views if the user has configured the foreign key to be included in the Row Access Policy.

Virtual Attributes in Code Generation

Virtual attributes are now included in the code generation of dynamic tables and views. Virtual attributes are defined after the discovered attributes so that virtual attribute SQL expressions can refer to flattened columns created from discovered attributes.
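
As a rough sketch (names are illustrative), a virtual attribute's expression can lean on the flattened columns defined before it, since Snowflake allows a select-list expression to reference an alias defined earlier in the same list:

```sql
SELECT
    src:first_name::VARCHAR        AS first_name,   -- discovered attribute
    src:last_name::VARCHAR         AS last_name,    -- discovered attribute
    first_name || ' ' || last_name AS full_name     -- virtual attribute with a custom SQL expression
FROM raw_table;
```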

Other Notes

Rebranding

Pancake has been rebranded as DataPancake to better articulate its purpose.

Pancake V1.32 Release Notes

Data Source Overview

Filter default update to Active

Defaults the search to active but the user can still filter for statuses of inactive, deleted, or all. Fixed an issue that prevented the user from filtering on the record status and one of the other filters.

Manage Data Source

Dynamic table deployment database and schema name

Allows for dynamic tables to be deployed to a different database and schema than where the raw data is stored.

Dynamic Table row level policy name

Allows for the configuration of a row level policy for the root dynamic table. This will be used in the view code generation process which is coming in the next release.

Root Table Select Prefix

Allows for the selection of other columns in the table which contains the variant data source column. This can be a comma separated list of columns.

Schema Consolidation (Search/Replace)

Allows for the configuration of schema consolidation when the number of attributes in the source data is too large (5K+). A search-and-replace regex value allows the discovered schema to be reduced. This feature is used in conjunction with the new attribute metadata, Consolidation Search and Consolidation Insert SQL.

Schema Filter

Allows for the configuration of schema filtering. This feature will not prevent schema from being scanned and discovered but it will prevent discovered schema from being included in the code generation process. The filter is created through a regex value.

Manage Scan Configuration

No Updates

Scan Data Source

Configurable schema consolidation

The scan process will consolidate the discovered schema based on the data source consolidation configuration. This consolidation process will reduce the number of attributes based on the search and replace regex values configured by the user.

Configurable schema filtering

The scan process will designate the attribute’s record status as ‘inactive’ for any attributes that meet one or more of the filter criteria. All inactive attributes will be excluded from the code generation process.

Enhanced Datetime inference

Additional ISO formats are now recognized successfully.

Data Source Attributes

Attribute level masking policy

Allows for the configuration of a masking policy for each attribute. This data will be used in the view code generation process. The user will also have the ability to configure the masking policy parameters if the masking policy is conditional.

Attribute row level policy inclusion

Allows for the configuration of the row level policy. The user can determine which columns should be included in the row level policy configured with the data source or the array.

Attribute Path Insert Search/Replace

Allows for the configuration of a search and insert value that works together with data source consolidation. The search value will find a value in the path, and a SQL expression will be inserted immediately after it to create a concatenated value used in conjunction with the GET_PATH function. This feature enables the generated dynamic table SQL statement to access the actual JSON path that was previously consolidated.
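
A hedged sketch of the idea, assuming paths such as products.<dynamic key>.price were consolidated during the scan; the search value ('products.'), the insert expression (k.value::STRING || '.'), and the FLATTEN over OBJECT_KEYS are all hypothetical here:

```sql
-- Rebuild the real JSON path at query time and read it with GET_PATH
-- (assumes GET_PATH accepts the computed path string, as the feature description implies).
SELECT
    GET_PATH(src, 'products.' || k.value::STRING || '.price') AS product_price
FROM raw_table,
     LATERAL FLATTEN(input => OBJECT_KEYS(src:products)) k;
```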

Arrays Grid (with include in code gen and row level policy name)

Allows for the configuration of array level settings including a row level policy name, a dynamic table alias name, and a check box to determine if the array is included in the code generation process.

Attribute cluster inclusion

Allows for the configuration of the dynamic table cluster clause. If the column is configured to be included in the cluster it will be added to the dynamic table cluster statement for the dynamic table the column belongs to.
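
For illustration (object names and parameters are placeholders), a column flagged for cluster inclusion lands in the CLUSTER BY clause of the generated Dynamic Table:

```sql
CREATE OR REPLACE DYNAMIC TABLE orders_dt
    TARGET_LAG = '1 hour'
    WAREHOUSE  = my_wh
    CLUSTER BY (order_date)
AS
SELECT
    src:order_id::VARCHAR AS order_id,
    src:order_date::DATE  AS order_date
FROM raw_orders;
```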

Attribute Record Status

If attributes have been filtered they will be configured with an attribute status of ‘inactive’. All other attributes will have a record status of ‘active’. The attribute filter on the Data Attributes screen now supports the ability to filter on this status.

Dynamic Table SQL Generation

Primitive Array - Multiple Data Types

The code generation process now supports primitive arrays that contain multiple data types. The primitive attribute can also now support transformations.

Auto alias creation for datetime attributes

If the polymorphic version of an attribute is determined to be a date/time/datetime value, an alias will be created automatically to append the Snowflake data type in lieu of “_str”.

Schema Consolidation Search/Insert

The code generation process will use the attribute’s consolidation search and insert values to access the json path that was consolidated during the scan process based on the consolidations created for the data source.

Dynamic table create name

Array dynamic tables will use the table alias name, if one is provided.

Dynamic table names will use the deployment database and schema if one was provided.

If a dynamic table name includes special characters the entire name will be enclosed in double quotes.

Root table selected columns prefix

The code generation process will include the select prefix configured with the data source as part of the root level dynamic table select statement.

Filtered attributes are not included in the code generation process

The code generation process will exclude any attributes with an attribute_record_status equal to ‘inactive’.

Generate Array dynamic tables based on Include in Code Gen

The code generation process will exclude dynamic tables for arrays that have been configured to be excluded.

Worksheet Commands

No Updates

Pancake V1.30 Release Notes

Data Source Overview

Filter default update to Active

Defaults the search to active but the user can still filter for statuses of inactive, deleted, or all. Fixed an issue that prevented the user from filtering on the record status and one of the other filters.

Manage Data Source

Dynamic table deployment database and schema name

Allows for dynamic tables to be deployed to a different database and schema than where the raw data is stored.

Dynamic Table row level policy name

Allows for the configuration of a row level policy for the root dynamic table. This will be used in the view code generation process which is coming in the next release.

Root Table Select Prefix

Allows for the selection of other columns in the table which contains the variant data source column. This can be a comma separated list of columns.

Schema Consolidation (Search/Replace)

Allows for the configuration of schema consolidation when the number of attributes in the source data is too large (5K+). A search-and-replace regex value allows the discovered schema to be reduced. This feature is used in conjunction with the new attribute metadata, Consolidation Search and Consolidation Insert SQL.

Schema Filter

Allows for the configuration of schema filtering. This feature will not prevent schema from being scanned and discovered but it will prevent discovered schema from being included in the code generation process. The filter is created through a regex value.

Manage Scan Configuration

No Updates

Scan Data Source

Configurable schema consolidation

Configurable schema filtering

Bug Fix - Update to Datetime inference

Bug Fix - Issue with multiple procedure calls

Data Source Attributes

Attribute level masking policy

Allows for the configuration of a masking policy for each attribute. This data will be used in the view code generation process. The user will also have the ability to configure the masking policy parameters if the masking policy is conditional.

Attribute row level policy inclusion

Allows for the configuration of the row level policy. The user can determine which columns should be included in the row level policy configured with the data source or the array.

Attribute Path Insert Search/Replace

Allows for the configuration of a search and insert value that works together with data source consolidation. The search value will find a value in the path, and a SQL expression will be inserted immediately after it to create a concatenated value used in conjunction with the GET_PATH function. This feature enables the generated dynamic table SQL statement to access the actual JSON path that was previously consolidated.

Arrays Grid (with include in code gen and row level policy name)

Allows for the configuration of array level settings including a row level policy name, a dynamic table alias name, and a check box to determine if the array is included in the code generation process.

Attribute cluster inclusion

Allows for the configuration of the dynamic table cluster clause. If the column is configured to be included in the cluster it will be added to the dynamic table cluster statement for the dynamic table the column belongs to.

Attribute Record Status

If attributes have been filtered they will be configured with an attribute status of ‘inactive’. All other attributes will have a record status of ‘active’. The attribute filter on the Data Attributes screen now supports the ability to filter on this status.

Dynamic Table SQL Generation

Primitive Array - Multiple Data Types

The code generation process now supports primitive arrays that contain multiple data types. The primitive attribute can also now support transformations.

Auto alias creation for datetime attributes

If the polymorphic version of an attribute is determined to be a date/time/datetime value, an alias will be created automatically to append the Snowflake data type in lieu of “_str”.

Schema Consolidation Search/Insert

The code generation process will use the attribute’s consolidation search and insert values to access the json path that was consolidated during the scan process based on the consolidations created for the data source.

Dynamic table create name

Array dynamic tables will now use the table alias name, if one was provided, and will also use the deployment database and schema if one was provided.

Root table selected columns prefix

The code generation process will include the select prefix configured with the data source as part of the root level dynamic table select statement.

Filtered attributes are not included in the code generation process

The code generation process will exclude any attributes with an attribute_record_status equal to ‘inactive’.

Generate Array dynamic tables based on Include in Code Gen

Creation of cluster by clause

Create views for each dynamic table with column level masking policies

Creation of Row policy clause for root dynamic table

Worksheet Commands

No Updates

Pancake V1.29 Release Notes

Data Source Overview

No Updates

Manage Data Source

No Updates

Manage Scan Configuration

Configuration Settings Calculator

Allows a user to gather diagnostic information about the data source to retrieve data used in the Calculator’s variables. Based on the warehouse size chosen, the calculator will produce values for the Number of Procedure Calls, Record Count Per Procedure Call, and Record Count Per Thread Worker Process that can be used for the required configuration settings. The calculator helps the user avoid processing too much information at one time and either running out of memory or exceeding Snowflake’s maximum timeout of 60 minutes per procedure call.

Record Count Per Thread Worker Process - Position Moved

The “Record Count Per Thread Worker Process” has been moved below the “Record Count Per Procedure Call” textbox.

Bug Fix - Checkbox Value Reset

Fixed a bug where sometimes the checkbox values for Enable Schedule and Auto Code Generate were not getting reset after the creation of a new scan configuration.

Scan Data Source

Default Value for Precision and Scale

During the discovery of an attribute’s float polymorphic version, the metadata will use a default precision of 38 and a default scale of 10. The user can still modify these values post scan.

Datetime Inference for ISO date update

ISO datetime values with time zones and seconds, where the offset is represented with a “+”, will now be recognized and set to use the TIMESTAMP_TZ Snowflake data type.

Enhanced error handling for large datasets

Out of memory errors can occur when using a warehouse size that is too small for the amount of data being processed. The new error handling will report these types of messages and make it easier to determine when to use a larger warehouse size.

Data Source Attributes

New attribute metadata fields to determine whether a custom datetime format should be used

Allows for the configuration of whether a custom datetime format should be used as part of the SQL code generation process. If the user disables the datetime format, the standard TRY_TO_? function will be used in the Dynamic Table column definition without specifying a datetime format, resulting in the use of AUTO inference. More information about AUTO datetime inference is available in the Snowflake documentation: https://docs.snowflake.com/en/sql-reference/data-types-datetime#date-and-time-formats

Datetime Format now supports multiple formats

Allows for a user to configure the datetime format of an attribute by providing a comma separated list of datetime formats when an attribute in a data source may have values that use multiple datetime formats.

Rearrangement of grid columns

Columns have been rearranged in the editable attributes grid to make it easier to access the columns that can be updated.

Updated Attribute Filter

The filter now supports filtering on the source data type or the Snowflake data type.

Dynamic Table SQL Generation

Numeric Value Flattening Code for Invalid Characters

If an attribute has embedded or stringified JSON that has a polymorphic version using an int or float data type, the generated SQL will now convert the variant value to VARCHAR and then use a TRY_TO_? function to determine if the value can be converted to the appropriate data type. Some values are recognized as a Snowflake INTEGER or DECIMAL by the TYPEOF function but cannot be converted to the respective Snowflake data type.
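
A hedged sketch of the generated pattern (the path and the precision/scale are illustrative):

```sql
-- Cast the variant value to VARCHAR first, then attempt the numeric conversion;
-- values that TYPEOF reports as numeric but that cannot be converted yield NULL.
SELECT
    TRY_TO_NUMBER(TO_VARCHAR(GET_PATH(src, 'payload.amount')), 38, 10) AS amount
FROM raw_table;
```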

Bug Fix - Embedded JSON with “null” values

The generated code will now attempt to parse the embedded or stringified JSON using the TRY_PARSE_JSON function. If the embedded JSON contains the value “null”, the resulting value for that column will be NULL and will not cause a Dynamic Table deployment execution error.
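
Roughly, the generated code now follows this shape (the column and path names are examples):

```sql
-- A literal "null" string parses to NULL instead of failing the Dynamic Table deployment.
SELECT
    GET_PATH(TRY_PARSE_JSON(src:details::VARCHAR), 'shipping.carrier')::VARCHAR AS shipping_carrier
FROM raw_table;
```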

Multiple datetime formats for a single column

The code generation for a string using a datetime inference containing multiple formats will use the COALESCE function combined with the TRY_TO_? function, using the first format that produces a non-null value. If all datetime formats result in a NULL value, the value returned for that specific row will be NULL.
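
A minimal sketch of the pattern for a column configured with two formats (the formats and names are examples):

```sql
SELECT
    COALESCE(
        TRY_TO_TIMESTAMP_NTZ(src:created_at::VARCHAR, 'YYYY-MM-DD"T"HH24:MI:SS'),
        TRY_TO_TIMESTAMP_NTZ(src:created_at::VARCHAR, 'MM/DD/YYYY HH24:MI:SS')
    ) AS created_at_timestamp
FROM raw_table;
```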

User configurable use of a custom datetime format

If the user disables the use of the datetime format, the standard TRY_TO_? function will be used without specifying a datetime format, resulting in the use of the AUTO datetime inference.

Worksheet Commands

No Updates

Pancake V1.27 Release Notes

Dynamic Table SQL Generation

New Attribute Polymorphic Version Field - Null Value Expression

Allows for the configuration of a custom SQL expression to be used in place of the NULL keyword if the value in a row is null or does not match the expected polymorphic column's data type.

New Attribute Polymorphic Version Field - Polymorphic Column Name Alias

Allows for the configuration of an alias for a polymorphic column name which will be used in the construction of the Dynamic Table Create Statement instead of the default polymorphic attribute name.

New Attribute Polymorphic Version Fields - Transformation Type and Transformation Expression

These new fields allow the user to override the default generated code for a specific attribute polymorphic version with a custom SQL expression. Users can use "{attribute_name}" as a textual placeholder in the expression for the json attribute data. This placeholder will be replaced in the code gen process with the appropriate reference to the attribute's path even if the attribute represents embedded or stringified JSON.
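
For example, a hypothetical Transformation Expression of UPPER(TRIM({attribute_name}::VARCHAR)) might be expanded by code generation into something like this (the path is made up):

```sql
SELECT
    UPPER(TRIM(src:customer:name::VARCHAR)) AS customer_name
FROM raw_table;
```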

New Data Source Field - Deduplicate Messages

This new field in the Source Stream Settings of the Data Source screen allows users to determine if the generated Dynamic Table SQL code should add the necessary SQL to deduplicate messages streamed into Snowflake from Kafka and only include the most recent version of a record based on the primary key and the created datetime of the message. This deduplication process requires the user to provide the primary key(s) of the message's JSON schema. If this checkbox is not checked all records will be included when the Dynamic Table is deployed.

New Data Source Field - Nested Dynamic Table Optional Parameters

This new field allows users to configure optional parameters, such as Refresh_Mode, to be included in the Create Dynamic Table statement for all dynamic tables created for nested arrays. The available optional parameters are documented at https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table#optional-parameters. This field is separate from the renamed Root Dynamic Table Optional Parameters field, which stores optional parameters to be included in the generation of the root-level dynamic table.

Scanning and Discovery

New Scan Configuration Field - Record Count Per Thread Process

This new field allows for the configuration of the number of records passed to each thread worker process. The size of the warehouse determines the amount of memory available to the scanning process. Each thread can process a batch of records in parallel with other threads. The projected memory utilization for the scanning process can be calculated by multiplying the average size of the JSON value by the number in the "Record Count Per Thread Process" field and by the number of threads being used.
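
For example, with an average JSON value size of 50 KB, 1,000 records per thread, and 8 threads, the projected memory utilization would be roughly 50 KB × 1,000 × 8 ≈ 400 MB.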

Scanning Process - Memory Usage Enhancement

After processing the data for each thread, the memory used to store the data is released. This significantly improves performance and memory usage for larger datasets.

Scanning Process - Error Handling Enhancement

If the scanning process is interrupted due to an “Out of memory” error, an exception will be recorded and the status of the scan will be set to "Failed".

Scanning Process - Datetime Format Discovery for ISO Datetimes values with nanoseconds

In the previous version of Pancake, an ISO datetime with nanoseconds was not correctly identified as a datetime value. This issue has been resolved and these types of datetime strings will now be recognized with the correct Snowflake data type and datetime format.

Data Source Set Up

Data Source Screen Enhancement - Dropdowns for Database, Schema, and Objects

Replaced the textboxes for Database, Schema, and Database Object with dropdowns that show which objects Pancake has access to. Only the objects for which the user has granted the appropriate privileges to Pancake will be visible.
