Release Notes
Last updated
Structured Data Source Support
Data sources now support both Semi-Structured and Structured types. Structured sources can leverage secure semantic layers using the Data Dictionary and Semantic Model Generator.
Virtual Attribute Enhancements
Virtual attributes can now include a description.
Attributes categorized as metric or filter (via the "W Question Category") are now auto-included in semantic model generation.
Attributes can now be edited directly.
Semantic Model Generator
New filter added for "W Question Category" to refine model output.
Pipeline Designer Updates
New comment fields for both raw and semantic layer transformation expressions.
Comments are now visible in the transformation view.
Snowflake data types now reflect primitive array types such as ARRAY<VARCHAR>.
Warehouse Configuration Enhancements
New Record Status field with default set to Active.
Default baseline scan settings set to 10,000 rows/min with 8 threads (adjustable in the Data Source screen).
New Public Procedure
add_datasource_with_scan_and_features_paid: Allows users to programmatically create a data source with scans, deployment DB/schema, and additional configurations.
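As a rough sketch, a worksheet call might look like the following; the application name, schema, and argument list here are hypothetical and should be taken from the procedure's actual signature:

    -- Hypothetical application, schema, and arguments; only the procedure name
    -- add_datasource_with_scan_and_features_paid comes from these release notes.
    CALL datapancake_app.core.add_datasource_with_scan_and_features_paid(
        'MY_DB.RAW.CUSTOMER_EVENTS',  -- source table containing the semi-structured column
        'SRC',                        -- source column name
        'MY_DB.FLATTENED',            -- deployment database and schema for generated objects
        TRUE                          -- create a scan configuration as part of setup
    );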
Left Navigation Updates
Attribute Metadata → Pipeline Designer
Data Dictionary → Data Dictionary Designer
Semantic Model → Semantic Model Designer
Navigation Bar Fix
Resolved visibility issues on smaller screens for:
Pipeline Designer
Semantic Model Designer
Data Source
Data Dictionary Performance
Improved performance when saving changes.
Add Virtual Attribute
The feature now supports inline descriptions and integration with the Semantic Model Designer.
Bug Fixes
Resolved issue where Verified Date was not updating correctly.
Prevented duplicate foreign keys when alias is used across multiple arrays.
Table and column names in YAML are now generated in uppercase.
New View: vw_datasource_attribute_genai
Provides visibility into all semantic model metadata.
New Metadata Views
Security
Data Dictionary
Transformation
Access Control Enhancements
Added application-level RBAC using new access control tables, procedures, and views.
New role: app_admin can call the add_user(username, can_initiate_scans BOOLEAN) procedure to manage scan permissions.
First security feature now allows admins to restrict scan initiation per user.
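A minimal illustration of the new procedure; the application and schema qualifiers are assumptions, while the argument list matches the signature above:

    -- Qualifiers are hypothetical; the signature add_user(username, can_initiate_scans BOOLEAN)
    -- is taken from these release notes.
    CALL datapancake_app.admin.add_user('JSMITH', TRUE);   -- JSMITH may initiate scans
    CALL datapancake_app.admin.add_user('ACHEN', FALSE);   -- ACHEN may use the app but not start scans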
Manifest Updates
Logging now limited to error-level only (reduced trace-level logging).
Manifest now requests additional privileges, including access to current session user.
Storage Optimization
Added the TRANSIENT keyword to dynamically generated SQL for dynamic tables to reduce storage cost by disabling time travel.
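For illustration, a generated create statement now takes roughly this shape (object names, lag, and warehouse are hypothetical):

    -- The TRANSIENT keyword is the new behavior; everything else is a hypothetical example.
    CREATE OR REPLACE TRANSIENT DYNAMIC TABLE my_db.flattened.orders_root
        TARGET_LAG = '60 minutes'
        WAREHOUSE  = my_wh
    AS
    SELECT
        src:order_id::VARCHAR AS order_id,
        src:status::VARCHAR   AS status
    FROM my_db.raw.orders_raw;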
Data Type Defaults
Boolean types now default to Dimension in "W Question Category".
Expanded Format Support
Added support for scanning string columns (not just VARIANT).
JSON discovery now supported in Iceberg string columns with dynamic table generation.
Dynamic Table Alias Behavior
Nested attributes now replace the full path instead of appending to the end, improving naming clarity.
Decimal Type Handling
Fixed a bug where 0 values in DECIMAL columns were incorrectly materialized as NULL. The generated code now checks for 0 and casts it explicitly as DECIMAL.
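A hedged sketch of the kind of column expression this change implies; the column name, precision, and scale are hypothetical:

    -- Illustrative only: explicitly handle 0 so it is not materialized as NULL.
    CASE
        WHEN src:amount = 0 THEN CAST(0 AS DECIMAL(38, 10))
        ELSE TRY_TO_DECIMAL(TO_VARCHAR(src:amount), 38, 10)
    END AS amount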
Semantic Model Designer
Resolved error when creating a verified query without a verified date.
Scan Execution Handling
When a scan fails due to a missing USAGE privilege on a replaced warehouse, a user-friendly message is now returned along with the required GRANT USAGE SQL statement.
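The returned statement will look broadly like the following; the warehouse and application names here are placeholders:

    -- Placeholder names; run as a role that can grant privileges on the warehouse.
    GRANT USAGE ON WAREHOUSE my_replacement_wh TO APPLICATION datapancake_app;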
New features have been added to the DataPancake offering. The billing model for DataPancake has been updated to create billable events monthly based on the feature used and the number of attributes contained in a particular data source.
Schema Discovery (formerly known as Schema Summary) - Free
Attribute Metadata Management (formerly known as Schema Analysis) - Paid
SQL Code Generation (formerly known as Extract, Relate, and Flatten) - Paid
Data Dictionary Builder - Paid
Security Policy Integration - Paid
Cortex Analyst Semantic Model Code Generation - Paid
Price per attribute, by feature:
Attribute Metadata Management: $0.50
SQL Code Generation: $0.90
Data Dictionary Builder: $0.30
Security Policy Integration: $0.75
Semantic Model Code Generation: $0.50
New columns have been added to the data source overview grid to indicate which services have been enabled for each data source.
Scans in Process will now show a more accurate estimate of the amount of time it will take to complete a scan based on the baseline scan settings. The baseline scan settings can be reset in the Data Source screen. The next scan completed will recompute the average number of records processed based on the number of threads used during the scan. Each subsequent scan will use the baseline scan settings to estimate the time needed to complete the scan.
A new left-hand navigation menu has been added to better organize the available features in the application.
Individual Data Sources
Multiple Data Sources
Schema Drift Alert
Users will now have the option to enable specific services for each data source. Services replace the option to select a specific Product Tier.
Users will now have the option to select a source column data type. Currently the only option is Variant. In a future release users will have the option to select String.
Users can supply a single document to represent the schema for a specific data source. This sample document can be in either a JSON or XML format. For example, users can create a sample XML document from an existing XSD file or create a sample JSON document from an existing Avro Schema. Once the sample has been configured users can configure a scan to create attribute metadata from the provided sample instead of through data discovery.
Users can choose the type of output object. Current options include Dynamic Table or Table.
Users can now configure a data source to generate the code necessary to track Dynamic Table metadata including the insert and last updated datetimes. This will allow users to see when a Dynamic Table record was inserted or modified. The generated code will create a new table to store the metadata, a merge statement to initialize the table, a stream for the root level Dynamic Table to track changes, and a task to keep the table updated on a user-configured schedule.
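As a rough sketch of the objects described above (all names, columns, and the schedule are hypothetical; the actual generated code may differ):

    -- Hypothetical illustration: metadata table, stream on the root dynamic table,
    -- and a task that merges stream changes into the metadata table.
    CREATE TABLE IF NOT EXISTS my_db.flattened.orders_root_metadata (
        record_key      VARCHAR,
        inserted_at     TIMESTAMP_NTZ,
        last_updated_at TIMESTAMP_NTZ
    );

    CREATE STREAM IF NOT EXISTS my_db.flattened.orders_root_stream
        ON DYNAMIC TABLE my_db.flattened.orders_root;

    CREATE TASK IF NOT EXISTS my_db.flattened.orders_root_metadata_task
        WAREHOUSE = my_wh
        SCHEDULE  = '60 MINUTE'
    AS
    MERGE INTO my_db.flattened.orders_root_metadata m
    USING (SELECT DISTINCT order_id AS record_key FROM my_db.flattened.orders_root_stream) s
        ON m.record_key = s.record_key
    WHEN MATCHED THEN UPDATE SET last_updated_at = CURRENT_TIMESTAMP()
    WHEN NOT MATCHED THEN INSERT (record_key, inserted_at, last_updated_at)
        VALUES (s.record_key, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP());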
Users can see the average number of records processed for a specific thread count, calculated from the scan. These numbers can be recalculated by resetting the values to 0; the next scan will update these calculations.
When creating or updating a data source the user can see what fields are required if a specific piece of information has not been supplied.
Users can now configure the type of attribute creation a scan will use. The Discover attribute creation type will generate attribute metadata based on a discovery of the data located in the database object configured in the data source. The Schema attribute creation type will generate attribute metadata based on the single sample document configured for the data source. Multiple scan configurations can be created for a data source using both attribute creation types to allow a user to compare a schema against the data in a data source.
No changes
Users can now configure whether an attribute is unique, is a primary key, or whether the attribute contains enum values only.
Users can now modify the status of an attribute from active to inactive. Inactive attributes are not included in the code generation process. Users can now also see the difference between attributes discovered in the data source vs attributes created from the data source’s sample document.
The Data Governance tab has been renamed to “Semantic Layer - Security Policy”
Users can now configure relationship information for each nested array including:
Relationship Name
Relationship Description
Relationship Type
Join Type
This relationship information will be used as part of the semantic model code generation process.
If a user has configured a data source to create Dynamic Table metadata, code will be generated to create the database objects needed to maintain metadata tracking the insert and last update datetimes of each row in the root level Dynamic Table.
The data dictionary/glossary/synonym builder uses Cortex AI to assist users in building a data dictionary for each data source. Descriptions can be created for the data source. Descriptions and synonyms can be created for each nested array and descriptions, synonyms, and sample values can be created for each attribute. Users can choose which LLM model they want to use to generate responses.
Users can create semantic model yaml files for use with Cortex Analyst. Information generated using the data dictionary builder will be included in the generation of all attribute descriptions and synonyms.
Users can choose which attributes to include and can configure additional properties such as:
W Question Category (Dimension, Time Dimension, Facts, Metric, Filter)
Sample Value (multiple comma separated values)
Enum Values (designates that the attribute contains only enum values)
Semantic Model Description (in addition to the glossary definition)
Semantic Model Expression
Cortex Search Service Name
Cortex Search Database
Cortex Search Schema
Cortex Search Column
Additional sections generated include:
Relationships
Verified Queries
Custom Instructions
No Changes
Defaulted the Refresh_Mode to Incremental for all new data sources with a product tier of Extract, Relate, and Flatten.
Added support for new data formats including Avro, Parquet, ORC, and XML.
Added new checkbox to configure whether generated dynamic tables and views enable case sensitivity for column names.
For data sources that are flattening data from a Kafka topic, the unique identifier has been replaced with a deduplication expression, allowing users to configure the SQL used to deduplicate Kafka messages.
No Changes
Bug Fix: Updated datetime formats for time zone offsets. Previously if a string included a plus sign followed by the offset value, or the time followed by a “Z” representing UTC, the format string used in the generated code would be incorrectly identified. This issue has been resolved.
Pancake now supports the scanning and discovery of XML data stored in a Variant Snowflake column.
In the foreign keys configuration the user now has the option to include the foreign key in the Row Access Policy.
Virtual Attributes: The ability to create a virtual attribute with a custom SQL expression is now available. SQL expressions can also refer to other flattened columns and can be added as foreign keys to nested arrays.
DataPancake now supports the generation of dynamic tables to extract, relate, and flatten XML data stored in a Variant column.
If the ‘Include Stream Message Metadata’ checkbox is checked then Kafka metadata will be flattened and included in the root dynamic table as well as in all nested dynamic tables for each array. This data will be available in all views generated as part of the data governance semantic layer.
If the ‘Use Case Sensitivity’ checkbox is checked then each column generated for dynamic tables and views will be enclosed in double quotes. This allows for any special characters used as part of the column name.
Bug Fix: If the ‘Include in Code Gen’ checkbox is unchecked for any array the corresponding view will no longer be generated.
For data sources that are flattening a Kafka topic, the deduplication SQL will now use the new data source property called ‘Deduplication SQL Expression’. This will allow the user to configure the SQL to their specific needs, including the choice between using the Rank() or Row_Number() functions.
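As an illustration, a Deduplication SQL Expression based on Row_Number() might look like the following (the partition and ordering columns are hypothetical):

    -- Hypothetical expression: keep only the most recent message per key.
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY record_metadata:key
        ORDER BY record_metadata:CreateTime DESC
    ) = 1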
The data source value configured for the ‘Root Dynamic Table Select Prefix’ will now be available in the root level view in the Data Governance Semantic Layer. All columns configured in this value will need to be aliased with an ‘as <alias_name>’.
Foreign keys configured for arrays will now be added to the Row Access Policy definitions in the Data Governance Semantic Layer views if the user has configured the foreign key to be included in the Row Access Policy.
Virtual attributes are now included in the code generation of dynamic tables and views. Virtual attributes are defined after the discovered attributes so that virtual attribute SQL expressions can refer to flattened columns created from discovered attributes.
Pancake has been rebranded as DataPancake to better articulate its purpose.
Filter default update to Active
Defaults the search to active but the user can still filter for statuses of inactive, deleted, or all. Fixed an issue that prevented the user from filtering on the record status in combination with one of the other filters.
Dynamic table deployment database and schema name
Allows for dynamic tables to be deployed to a different database and schema than where the raw data is stored.
Dynamic Table row level policy name
Allows for the configuration of a row level policy for the root dynamic table. This will be used in the view code generation process which is coming in the next release.
Root Table Select Prefix
Allows for the selection of other columns in the table which contains the variant data source column. This can be a comma separated list of columns.
Schema Consolidation (Search/Replace)
Allows for the configuration of schema consolidation when the number of attributes in the source data are too numerous (5K+). A search and replace regex value allows the discovered schema to be reduced. This feature is used in conjunction with the new attribute metadata, Consolidation Search and Consolidation Insert SQL.
Schema Filter
Allows for the configuration of schema filtering. This feature will not prevent schema from being scanned and discovered but it will prevent discovered schema from being included in the code generation process. The filter is created through a regex value.
No Updates
Configurable schema consolidation
The scan process will consolidate the discovered schema based on the data source consolidation configuration. This consolidation process will reduce the number of attributes based on the search and replace regex values configured by the user.
Configurable schema filtering
The scan process will designate the attribute’s record status as ‘inactive’ for any attributes that meet one or more of the filter criteria. All inactive attributes will be excluded from the code generation process.
Bug Fix - Update to Datetime inference
Additional ISO formats are now being recognized successfully.
Bug Fix - Issue with multiple procedure calls
Attribute level masking policy
Allows for the configuration of a masking policy for each attribute. This data will be used in the view code generation process. The user will also have the ability to configure the masking policy parameters if the masking policy is conditional.
Attribute row level policy inclusion
Allows for the configuration of the row level policy. The user can determine which columns should be included in the row level policy configured with the data source or the array.
Attribute Path Insert Search/Replace
Allows for the configuration of a search and insert value that works together with data source consolidation. The search value will find a value in the path and then insert a SQL expression immediately after the search value to create a concatenated value used in conjunction with the GET_PATH function. This feature enables the generated dynamic table SQL statement to access the actual path of the json path that was previously consolidated.
Arrays Grid (with include in code gen and row level policy name)
Allows for the configuration of array level settings including a row level policy name, a dynamic table alias name, and a check box to determine if the array is included in the code generation process.
Attribute cluster inclusion
Allows for the configuration of the dynamic table cluster clause. If the column is configured to be included in the cluster it will be added to the dynamic table cluster statement for the dynamic table the column belongs to.
Attribute Record Status
If attributes have been filtered they will be configured with an attribute status of ‘inactive’. All other attributes will have a record status of ‘active’. The attribute filter on the Data Attributes screen now supports the ability to filter on this status.
Primitive Array - Multiple Data Types
The code generation process now supports primitive arrays that contain multiple data types. The primitive attribute can also now support transformations.
Auto alias creation for datetime attributes
If the polymorphic version of an attribute is determined to be a date/time/datetime value, an alias will be created automatically to append the Snowflake data type in lieu of “_str”.
Schema Consolidation Search/Insert
The code generation process will use the attribute’s consolidation search and insert values to access the json path that was consolidated during the scan process based on the consolidations created for the data source.
Dynamic table create name
Array dynamic tables will now use the table alias name, if one was provided, and will also use the deployment database and schema if one was provided. If a dynamic table name includes special characters the entire name will be enclosed in double quotes.
Root table selected columns prefix
The code generation process will include the select prefix configured with the data source as part of the root level dynamic table select statement.
Filtered attributes are not included in the code generation process
The code generation process will exclude any attributes with an attribute_record_status equal to ‘inactive’.
Generate Array dynamic tables based on Include in Code Gen
The code generation process will exclude dynamic tables for arrays that have been configured to be excluded.
Creation of cluster by clause
Create views for each dynamic table with column level masking policies
Creation of Row policy clause for root dynamic table
No Updates
No Updates
No Updates
Configuration Settings Calculator
Allows a user to gather diagnostic information about the data source to retrieve data used in the Calculator’s variables. Based on the warehouse size chosen, the calculator will produce values for the Number of Procedure Calls, Record Count Per Procedure Call and Record Count Per Thread Worker Process that can be used for the required configuration settings. The calculator will help the user to avoid trying to process too much information at one time and either running out of memory or exceeding the maximum timeout of 60 minutes per procedure call as set by Snowflake.
Record Count Per Thread Worker Process - Position Moved
The “Record Count Per Thread Worker Process” has been moved below the “Record Count Per Procedure Call” textbox.
Bug Fix - Checkbox Value Reset
Fixed a bug where sometimes the checkbox values for Enable Schedule and Auto Code Generate were not getting reset after the creation of a new scan configuration.
Default Value for Precision and Scale
During the discovery of an attribute’s float polymorphic version, the metadata will use a default precision of 38 and a default scale of 10. The user can still modify these values after the scan.
Datetime Inference for ISO date update
ISO datetime values with time zones and seconds represented with a “+” will now be recognized and set to use the TIMESTAMP_TZ Snowflake Data Type
Enhanced error handling for large datasets
Out of memory errors can occur when using a warehouse size that is too small for the amount of data being processed. The new error handling will report these types of messages and make it easier to determine when to use a larger warehouse size.
New attribute metadata fields to determine whether a custom datetime format should be used
Allows for the configuration of whether a custom datetime format should be used as part of the SQL code gen process. If the user disables the datetime format then the standard TRY_TO_? function will be used in the Dynamic Table column definition without specifying a datetime format, resulting in the use of the AUTO inference. Find more information related to the AUTO datetime inference in the Snowflake documentation.
Datetime Format now supports multiple formats
Allows for a user to configure the datetime format of an attribute by providing a comma separated list of datetime formats when an attribute in a data source may have values that use multiple datetime formats.
Rearrangement of grid columns
Columns have been rearranged in the editable attributes grid to make it easier to access the columns that can be updated.
Updated Attribute Filter
The filter now supports filtering on the source data type or the Snowflake data type.
Numeric Value Flattening Code for Invalid Characters
If an attribute has embedded or stringified JSON that has a polymorphic version using an int or float data type, the SQL generated will now perform a conversion from the variant value to VARCHAR and will then use a TRY_TO_? function to determine if the value can be converted to the appropriate data type. Some values are recognized as a Snowflake INTEGER or DECIMAL using the TYPEOF function but cannot be converted to the respective Snowflake data type.
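As a rough illustration of this pattern (attribute paths and target types are hypothetical):

    -- Illustrative only: convert the VARIANT value to VARCHAR first, then attempt the cast.
    SELECT
        TRY_TO_NUMBER(TO_VARCHAR(src:payload:quantity))            AS quantity,
        TRY_TO_DECIMAL(TO_VARCHAR(src:payload:unit_price), 38, 2)  AS unit_price
    FROM my_db.raw.orders_raw;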
Bug Fix - Embedded JSON with “null” values
The generated code will now attempt to parse the embedded or stringified JSON using the TRY_PARSE_JSON function. If the embedded JSON contains the value “null”, the resulting value for that column will be NULL and will not cause a Dynamic Table deployment execution error.
Multiple datetime formats for a single column
The code generation for a string using a datetime inference containing multiple formats will use the COALESCE function combined with the TRY_TO_? function and will use the first format that produces a non-null value. If all datetime formats result in a NULL value, the value returned for that specific row will be NULL.
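A minimal sketch of this pattern with two hypothetical formats and a hypothetical column:

    -- Illustrative only: the first format that parses successfully wins; otherwise NULL.
    COALESCE(
        TRY_TO_TIMESTAMP(src:created_at::VARCHAR, 'YYYY-MM-DD HH24:MI:SS'),
        TRY_TO_TIMESTAMP(src:created_at::VARCHAR, 'MM/DD/YYYY HH24:MI:SS')
    ) AS created_at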
User configurable use of a custom datetime format
If the user disables the use of the datetime format, the standard TRY_TO_? function will be used without specifying a datetime format, resulting in the use of the AUTO datetime inference.
No Updates
Dynamic Table SQL Generation
New Attribute Polymorphic Version Field - Null Value Expression
Allows for the configuration of a custom SQL expression to be used in place of the NULL keyword if the value in a row is null or does not match the expected polymorphic column's data type.
New Attribute Polymorphic Version Field - Polymorphic Column Name Alias
Allows for the configuration of an alias for a polymorphic column name which will be used in the construction of the Dynamic Table Create Statement instead of the default polymorphic attribute name.
New Attribute Polymorphic Version Fields - Transformation Type and Transformation Expression
These new fields allow the user to override the default generated code for a specific attribute polymorphic version with a custom SQL expression. Users can use "{attribute_name}" as a textual placeholder in the expression for the json attribute data. This placeholder will be replaced in the code gen process with the appropriate reference to the attribute's path even if the attribute represents embedded or stringified JSON.
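For illustration, a custom Transformation Expression using the placeholder might look like this (the transformation itself is hypothetical):

    -- "{attribute_name}" is replaced during code generation with the attribute's path.
    UPPER(TRIM({attribute_name}::VARCHAR))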
New Data Source Field - Deduplicate Messages
This new field in the Source Stream Settings of the Data Source screen allows users to determine if the generated Dynamic Table SQL code should add the necessary SQL to deduplicate messages streamed into Snowflake from Kafka and only include the most recent version of a record based on the primary key and the created datetime of the message. This deduplication process requires the user to provide the primary key(s) of the message's JSON schema. If this checkbox is not checked all records will be included when the Dynamic Table is deployed.
New Data Source Field - Nested Dynamic Table Optional Parameters
This new field allows users to configure optional parameters such as Refresh_Mode to be included in the Create Dynamic Table statement for all dynamic tables created for nested arrays. Optional parameters are described in the Snowflake documentation. This field is separate from the renamed Root Dynamic Table Optional Parameters field that stores optional parameters to be included in the generation of the root level dynamic table.
Scanning and Discovery
New Scan Configuration Field - Record Count Per Thread Process
This new field allows for the configuration of the number of records passed to each thread worker process. The size of the warehouse determines the amount of memory available to the scanning process. Each thread can process a batch of records in parallel with other threads. The projected memory utilization for the scanning process can be calculated by multiplying the average size of the JSON value by the number in the "Record Count Per Thread Process" field and by the number of threads being used.
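For example, with an average JSON value of 2 KB, 10,000 records per thread, and 8 threads (illustrative figures only), the projected memory utilization would be roughly 2 KB × 10,000 × 8 ≈ 160 MB.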
Scanning Process - Memory Usage Enhancement
After processing the data for each thread, the memory used for storing the data will be released. This significantly reduces memory usage and improves performance for larger datasets.
Scanning Process - Error Handling Enhancement
If the scanning process is interrupted due to an “Out of memory” error, an exception will be recorded and the status of the scan will be set to "Failed".
Scanning Process - Datetime Format Discovery for ISO Datetimes values with nanoseconds
In the previous version of Pancake, an ISO datetime with nanoseconds was not correctly identified as a datetime value. This issue has been resolved and these types of datetime strings will now be recognized with the correct Snowflake data type and datetime format.
Data Source Set Up
Data Source Screen Enhancement - Dropdowns for Database, Schema, and Objects
Replaced the textboxes for Database, Schema, and Database Object with dropdowns that show which objects Pancake has access to. Only the objects that a user has granted the appropriate privileges to Pancake will be visible.