Release Notes

DataPancake V1.38 Release Notes

Manage Data Source

Default Refresh_Mode to Incremental

Defaulted the Refresh_Mode to Incremental for all new data sources with a product tier of Extract, Relate, and Flatten.

Support for New Data Formats

Added support for new data formats including Avro, Parquet, ORC, and XML.

Case Sensitivity Checkbox

Added new checkbox to configure whether generated dynamic tables and views enable case sensitivity for column names.

Kafka Deduplication Expression

For data sources that are flattening data from a Kafka topic, the unique identifier has been replaced with a deduplication expression allowing users to configure the sql used to deduplicate kafka messages.

Manage Scan Configuration

No Changes

Scan Data Source

Datetime Formats

Bug Fix: Updated datetime formats for time zone offsets. Previously if a string included a plus sign followed by the offset value, or the time followed by a “Z” representing UTC, the format string used in the generated code would be incorrectly identified. This issue has been resolved.

Support for XML in Variant Columns

Pancake now supports the scanning and discovery of XML data stored in a Variant Snowflake column.

Data Source Attributes

Foreign Keys in Row Access Policy

In the foreign keys configuration the user now has the option to include the foreign key in the Row Access Policy.

Virtual Attributes with Custom SQL

Virtual Attributes: The ability to create a virtual attribute with a custom SQL expression is now available. SQL expressions can also refer to other flattened columns and can be added as foreign keys to nested arrays.

Data Source SQL Generation

Dynamic Tables for XML in Variant Columns

Pancake now supports the generation of dynamic tables to extract, related, and flatten XML data stored in a Variant column.

Kafka Metadata Flattening

If the ‘Include Stream Message Metadata’ checkbox is checked then Kafka metadata will be flattened and included in the root dynamic table as well all nested dynamic tables for each array. This data will be available in all views generated as part of the data governance semantic layer.

Case Sensitivity for Column Names

If the ‘Use Case Sensitivity’ checkbox is checked then each column generated for dynamic tables and views will be enclosed with a double quote. This will allow for any special characters used as part of the column name.

Bug Fix: Exclude Unchecked Arrays

Bug Fix: If the ‘Include in Code Gen’ checkbox is unchecked for any array the corresponding view will no longer be generated.

Custom Deduplication SQL for Kafka

For data sources that are flattening a Kafa Topic the deduplication sql will now use the new data source property called ‘Deduplication SQL Expression’. This will allow the user to configure the SQL to their specific needs including the choice between using the Rank() or Row_Number() functions.

Root Table Select Prefix

The data source value configured for the ‘Root Dynamic Table Select Prefix’ will now be available in the root level view in the Data Governance Semantic Layer. All columns configured in this value will need to be aliased with an ‘as <alias_name>’.

Foreign Keys in Semantic Layer Views

Foreign keys configured for arrays will now be added to the Row Access Policy definitions in the Data Governance Semantic Layer views if the user configured the foreign key should be added to the Row Access Policy.

Virtual Attributes in Code Generation

Virtual attributes are now included in the code generation of dynamic tables and views. Virtual attributes are defined after the discovered attributes so that virtual attribute SQL expressions can refer to flattened columns created from discovered attributes.

Other Notes

Rebranding

Pancake has been rebranded as DataPancake to better articulate its purpose.

Pancake V1.32 Release Notes

Data Source Overview

Filter default update to Active

Defaults the search to active but the user can still filter for statuses of inactive, deleted, or all. Fixed an issue that prevented the user from filtering on the record status and one of the other filters.

Manage Data Source

Dynamic table deployment database and schema name

Allows for dynamic tables to be deployed to a different database and schema than where the raw data is stored.

Dynamic Table row level policy name

Allows for the configuration of a row level policy for the root dynamic table. This will be used in the view code generation process which is coming in the next release.

Root Table Select Prefix

Allows for the selection of other columns in the table which contains the variant data source column. This can be a comma separated list of columns.

Schema Consolidation (Search/Replace)

Allows for the configuration of schema consolidation when the number of attributes in the source data are too numerous (5K+). A search and replace regex value allows the discovered schema to be reduced. This feature is used in conjunction with the new attribute metadata, Consolidation Search and Consolidation Insert SQL.

Schema Filter

Allows for the configuration of schema filtering. This feature will not prevent schema from being scanned and discovered but it will prevent discovered schema from being included in the code generation process. The filter is created through a regex value.

Manage Scan Configuration

No Updates

Scan Data Source

Configurable schema consolidation

The scan process will consolidate the discovered schema based on the data source consolidation configuration. This consolidation process will reduce the number of attributes based on the search and replace regex values configured by the user.

Configurable schema filtering

The scan process will designate the attribute’s record status as ‘inactive’ for any attributes that meet one or more of the filter criteria. All inactive attributes will be excluded from the code generation process.

Enhanced Datetime inference

Additional ISO formats are now being recognized successfully

Data Source Attributes

Attribute level masking policy

Allows for the configuration a masking policy for each attribute. This data will be used in the view code generation process. The user will also have the ability to configure the masking policy parameters if the masking policy is conditional.

Attribute row level policy inclusion

Allows for the configuration of the row level policy. The user can determine which columns should be included in the row level policy configured with the data source or the array.

Attribute Path Insert Search/Replace

Allows for the configuration of a search and insert value that works together with data source consolidation. The search value will find a value in the path and then insert a SQL expression immediately after the search value to create a concatenated value used in conjunction with the GET_PATH function. This feature enables the generated dynamic table SQL statement to access the actual path of the json path that was previously consolidated.

Arrays Grid (with include in code gen and row level policy name)

Allows for the configuration of array level settings including a row level policy name, a dynamic table alias name, and a check box to determine if the array is included in the code generation process.

Attribute cluster inclusion

Allows for the configuration of the dynamic table cluster clause. If the column is configured to be included in the cluster it will be added to the dynamic table cluster statement for the dynamic table the column belongs to.

Attribute Record Status

If attributes have been filtered they will be configured with an attribute status of ‘inactive’. All other attributes will have a record status of ‘active’. The attribute filter on the Data Attributes screen now supports the ability to filter on this status.

Dynamic Table SQL Generation

Primitive Array - Multiple Data Types

The code generation process now supports primitive arrays that contain multiple data types. The primitive attribute can also now support transformations.

Auto alias creation for datetime attributes

If the polymorphic version of an attribute is determined to be a date/time/datetime value an alias will be created automatically to append the Snowflake data type in lieu of “_str”

Schema Consolidation Search/Insert

The code generation process will use the attribute’s consolidation search and insert values to access the json path that was consolidated during the scan process based on the consolidations created for the data source.

Dynamic table create name

Array dynamic tables will use the table alias name, if one is provided

Dynamic table names will be use the deployment database and schema if one was provided.

If a dynamic table name includes special characters the entire name will be enclosed in double quotes.

Root table selected columns prefix

The code generation process will include the select prefix configured with the data source as part of the root level dynamic table select statement.

Filtered attributes are not included in the code generation process

The code generation process will exclude any attributes with an attribute_record_status equal to ‘inactive’.

Generate Array dynamic tables based on Include in Code Gen

The code generation process will exclude dynamic tables for arrays that have been configured to be excluded.

Worksheet Commands

No Updates

Pancake V1.30 Release Notes

Data Source Overview

Filter default update to Active

Defaults the search to active but the user can still filter for statuses of inactive, deleted, or all. Fixed an issue that prevented the user from filtering on the record status and one of the other filters.

Manage Data Source

Dynamic table deployment database and schema name

Allows for dynamic tables to be deployed to a different database and schema than where the raw data is stored.

Dynamic Table row level policy name

Allows for the configuration of a row level policy for the root dynamic table. This will be used in the view code generation process which is coming in the next release.

Root Table Select Prefix

Allows for the selection of other columns in the table which contains the variant data source column. This can be a comma separated list of columns.

Schema Consolidation (Search/Replace)

Allows for the configuration of schema consolidation when the number of attributes in the source data are too numerous (5K+). A search and replace regex value allows the discovered schema to be reduced. This feature is used in conjunction with the new attribute metadata, Consolidation Search and Consolidation Insert SQL.

Schema Filter

Allows for the configuration of schema filtering. This feature will not prevent schema from being scanned and discovered but it will prevent discovered schema from being included in the code generation process. The filter is created through a regex value.

Manage Scan Configuration

No Updates

Scan Data Source

Configurable schema consolidation

Configurable schema filtering

Bug Fix - Update to Datetime inference

Bug Fix - Issue with multiple procedure calls

Data Source Attributes

Attribute level masking policy

Allows for the configuration a masking policy for each attribute. This data will be used in the view code generation process. The user will also have the ability to configure the masking policy parameters if the masking policy is conditional.

Attribute row level policy inclusion

Allows for the configuration of the row level policy. The user can determine which columns should be included in the row level policy configured with the data source or the array.

Attribute Path Insert Search/Replace

Allows for the configuration of a search and insert value that works together with data source consolidation. The search value will find a value in the path and then insert a SQL expression immediately after the search value to create a concatenated value used in conjunction with the GET_PATH function. This feature enables the generated dynamic table SQL statement to access the actual path of the json path that was previously consolidated.

Arrays Grid (with include in code gen and row level policy name)

Allows for the configuration of array level settings including a row level policy name, a dynamic table alias name, and a check box to determine if the array is included in the code generation process.

Attribute cluster inclusion

Allows for the configuration of the dynamic table cluster clause. If the column is configured to be included in the cluster it will be added to the dynamic table cluster statement for the dynamic table the column belongs to.

Attribute Record Status

If attributes have been filtered they will be configured with an attribute status of ‘inactive’. All other attributes will have a record status of ‘active’. The attribute filter on the Data Attributes screen now supports the ability to filter on this status.

Dynamic Table SQL Generation

Primitive Array - Multiple Data Types

The code generation process now supports primitive arrays that contain multiple data types. The primitive attribute can also now support transformations.

Auto alias creation for datetime attributes

If the polymorphic version of a an attribute is determined to be a date/time/datetime value an alias will be created automatically to append the Snowflake data type in lieu of “_str”

Schema Consolidation Search/Insert

The code generation process will use the attribute’s consolidation search and insert values to access the json path that was consolidated during the scan process based on the consolidations created for the data source.

Dynamic table create name

Array dynamic tables will now use the table alias name, if one was provided, and will also use the deployment database and schema if one was provided.

Root table selected columns prefix

The code generation process will include the select prefix configured with the data source as part of the root level dynamic table select statement.

Filtered attributes are not included in the code generation process

The code generation process will exclude any attributes with an attribute_record_status equal to ‘inactive’.

Generate Array dynamic tables based on Include in Code Gen

Creation of cluster by clause

Create views for each dynamic table with column level masking policies

Creation of Row policy clause for root dynamic table

Worksheet Commands

No Updates

Pancake V1.29 Release Notes

Data Source Overview

No Updates

Manage Data Source

No Updates

Manage Scan Configuration

Configuration Settings Calculator

Allows a user to gather diagnostic information about the data source to retrieve data used in the Calculator’s variables. Based on the warehouse size chosen, the calculator will produce values for the Number of Procedure Calls, Record Count Per Procedure Call and Record Count Per Thread Worker Process that can be used for the required configuration settings. The calculator will help the user to avoid trying to process too much information at one time and either running out of memory or exceeding the maximum timeout of 60 minutes per procedure call as set by Snowflake.

Record Count Per Thread Worker Process - Position Moved

The “Record Count Per Thread Worker Process” has been moved below the “Record Count Per Procedure Call” textbox.

Bug Fix - Checkbox Value Reset

Fixed a bug where sometimes the checkbox values for Enable Schedule and Auto Code Generate were not getting reset after the creation of a new scan configuration.

Scan Data Source

Default Value for Precision and Scale

During the discovery of an attribute’s float polymorphic version the metadata will use a default value for the precision of 38 and a default value of 10 for the scale. The user can still modify this value post scan.

Datetime Inference for ISO date update

ISO datetime values with time zones and seconds represented with a “+” will now be recognized and set to use the TIMESTAMP_TZ Snowflake Data Type

Enhanced error handling for large datasets

Out of memory errors can occur when using a warehouse size that is too small for the amount of data being processed. The new error handling will report these types of messages and make it easier to determine when to use a larger warehouse size.

Data Source Attributes

New attribute metadata fields to determine whether a custom datetime format should be used

Allows for the configuration of whether a custom datetime format should be used as part of the sql code gen process. If the user disables the datetime format then the standard TRY_TO_? Function will be used in the Dynamic Table column definition without specifying a datetime format resulting in the use of the AUTO inference. Find more information related to the AUTO datetime inference in the Snowflake documentation. https://docs.snowflake.com/en/sql-reference/data-types-datetime#date-and-time-formats

Datetime Format now supports multiple formats

Allows for a user to configure the datetime format of an attribute by providing a comma separated list of datetime formats when an attribute in a data source may have values that use multiple datetime formats.

Rearrangement of grid columns

Columns have been rearranged in the editable attributes grid to make it easier to access the columns that can be updated.

Updated Attribute Filter

The filter now supports filtering the source data type or the Snowflake data type.

Dynamic Table SQL Generation

Numeric Value Flattening Code for Invalid Characters

If an attribute has embedded or stringified JSON that has a polymorphic version using an int or float data type, the sql generated will now perform a conversion from the variant value to VARCHAR and then will then use a TRY_TO_? Function to determine if the value can be converted to the appropriate data type. Some values are recognized as a Snowflake INTEGER or DECIMAL using the TYPEOF function but cannot be converted to the respective Snowflake data type.

Bug Fix - Embedded JSON with “null” values

The generated code will now attempt to parse the embedded or stringified JSON using the TRY_PARSE_JSON. If the embedded JSON contains the value of “null” the resulting value for that column will be NULL and not cause a Dynamic Table deployment execution error

Multiple datetime formats for a single column

The code generation for a string using a datetime inference containing multiple formats will use the COALESCE function combined with the TRY_TO_? Function and will use the first one that produces a non null value. If all datetime formats result in a NULL value then the value returned for that specific row will be NULL.

User configurable use of a custom datetime format

If the user disabled the use of the datetime format the standard TRY_TO_? Function will be used without specifying a datetime format resulting in the use of the AUTO datetime inference.

Worksheet Commands

No Updates

Pancake V1.27 Release Notes

Dynamic Table SQL Generation

New Attribute Polymorphic Version Field - Null Value Expression

Allows for the configuration of a custom SQL expression to be used in place of the NULL keyword if the value in a row is null or does not match the expected polymorphic column's data type.

New Attribute Polymorphic Version Field - Polymorphic Column Name Alias

Allows for the configuration of an alias for a polymorphic column name which will be used in the construction of the Dynamic Table Create Statement instead of the default polymorphic attribute name.

New Attribute Polymorphic Version Fields - Transformation Type and Transformation Expression

These new fields allow the user to override the default generated code for a specific attribute polymorphic version with a custom SQL expression. Users can use "{attribute_name}" as a textual placeholder in the expression for the json attribute data. This placeholder will be replaced in the code gen process with the appropriate reference to the attribute's path even if the attribute represents embedded or stringified JSON.

New Data Source Field - Deduplicate Messages

This new field in the Source Stream Settings of the Data Source screen allows users to determine if the generated Dynamic Table SQL code should add the necessary SQL to deduplicate messages streamed into Snowflake from Kafka and only include the most recent version of a record based on the primary key and the created datetime of the message. This deduplication process requires the user to provide the primary key(s) of the message's JSON schema. If this checkbox is not checked all records will be included when the Dynamic Table is deployed.

New Data Source Field - Nested Dynamic Table Optional Parameters

This new field allows users to configure optional parameters such as Refresh_Mode to be included in the Create Dynamic Table statement for all dynamic tables created for nested arrays. Optional parameters can be found here. https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table#optional-parameters This field is separate from the renamed Root Dynamic Table Optional Parameters field that stores optional parameters to be included in the generation of the root level dynamic table.

Scanning and Discovery

New Scan Configuration Field - Record Count Per Thread Process

This new field allows for the configuration of the number of records passed to each thread worker process. The size of the warehouse determines the amount of memory available to the scanning process. Each thread can process a batch of records in parallel with other threads. The projected memory utilization for the scanning process can be calculated by multiplying the average size of the JSON value times the the number in the "Record Count Per Thread Process" field times the number of threads being used.

Scanning Process - Memory Usage Enhancement

After processing the data for each thread, the memory used for storing the data will be released. This will have a significant performance and memory impact for larger datasets.

Scanning Process - Error Handling Enhancement

If the scanning process is interrupted to do an “Out of memory” error, an exception will be recorded and the status of the scan will be set to "Failed".

Scanning Process - Datetime Format Discovery for ISO Datetimes values with nanoseconds

In the previous version of Pancake, an ISO datetime with nanoseconds was not correctly identified as a datetime value. This issue has been resolved and these types of datetime strings will now be recognized with the correct Snowflake data type and datetime format.

Data Source Set Up

Data Source Screen Enhancement - Dropdowns for Database, Schema, and Objects

Replaced the textboxes for Database, Schema, and Database Object with dropdowns that show which objects Pancake has access to. Only the objects that a user has granted the appropriate privileges to Pancake will be visible.

Last updated