Data Source Overview
Last updated
The Data Source Overview page in Pancake gives you a high-level overview of all the data sources you have added to the app, along with summary information for any data sources you have scanned. The core features of this page are the Data Sources table and its filter.
If you have just installed Pancake and are launching it for the first time, you will see a small Readme icon in the upper right corner. The Readme contains several Quick Start scripts that configure the app for use. These cover the global permissions and warehouse provisioning that Pancake needs in order to function.
The Data Sources table contains some basic information including:
Name - the name the user gave the data source when creating it. It will likely reflect some combination of the database, schema, environment, and column, but it is not required to reference any of those elements explicitly.
Type - the Snowflake data object type such as Table, External Table, or View.
Object Name - the name of the Snowflake object which contains the JSON data.
The Data Source Overview page also contains key pieces of information about each data source added to Pancake. This level of detail is called a Schema Summary and is free for all users of Pancake. You can add an unlimited number of data sources, scan them, and get this level of detail about your data for no charge.
This includes:
Total Attributes - the total number of attributes discovered during scanning.
Polymorphic - the number of polymorphic attributes in the data source.
Arrays - total number of arrays, inclusive of polymorphic attributes.
Objects - total number of objects in the data source.
Max Level - the maximum nested depth of the arrays.
Complexity Score - a measure of how difficult a given data source would be to work with manually. This is a rough measure designed to help users quickly understand the quality of their data and triage their work.
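To make these counts concrete, here is a minimal Python sketch that derives similar figures from a handful of parsed JSON records. It is not Pancake's scanner. The function names, the path notation, the rule that an attribute is polymorphic when it appears with more than one non-null type, and the way nesting depth is counted are all assumptions made for illustration.

```python
from collections import defaultdict


def kind_of(value):
    """Name the JSON type of a parsed value."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if value is None:
        return "null"
    return type(value).__name__  # bool, int, float, or str


def summarize(records):
    """Collect the types seen at each attribute path and derive summary counts."""
    types_by_path = defaultdict(set)  # attribute path -> set of observed types
    max_level = 0

    def walk(value, path, level):
        nonlocal max_level
        if isinstance(value, dict):
            for key, child in value.items():
                child_path = f"{path}.{key}" if path else key
                types_by_path[child_path].add(kind_of(child))
                walk(child, child_path, level)
        elif isinstance(value, list):
            # Each enclosing array adds one level of nesting (assumed definition).
            max_level = max(max_level, level + 1)
            for child in value:
                walk(child, path + "[]", level + 1)

    for record in records:
        walk(record, "", 0)

    seen = list(types_by_path.values())
    return {
        "total_attributes": len(seen),
        # Assumption: polymorphic means more than one non-null type at a path.
        "polymorphic": sum(1 for t in seen if len(t - {"null"}) > 1),
        "arrays": sum(1 for t in seen if "array" in t),
        "objects": sum(1 for t in seen if "object" in t),
        "max_level": max_level,
    }


# Hypothetical sample records, not taken from any real data source.
records = [
    {"id": 1, "tags": ["a", "b"], "customer": {"name": "Ann"},
     "orders": [{"items": [{"sku": "x"}]}]},
    {"id": "2", "tags": "solo", "customer": {"name": "Bo"}, "orders": []},
]
summary = summarize(records)
```

In this sample, `id` and `tags` each appear with two different types, so both count as polymorphic, and `tags` still counts as an array because it is one in at least one record.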
Other important information on this page includes:
Connection Status - this indicates whether a data source has been connected to Pancake successfully. A data source that cannot connect should fail at the point a user attempts to add it.
Last Scan - the date of the most recent scan of this data source. Note that this is the most recent scan using any Scan Configuration.
Column Name - this is the actual name of the VARIANT column in which the JSON data is stored.
Tags - this holds any tags a user has added to a data source.
Product Tier - the current product tier for a given data source, which can be Schema Summary, Schema Analysis, or Dynamic Table Generation.
Status - indicates the current status of the data source, which can be Active, Inactive, or Deleted.
Users can filter data sources added to Pancake by name, object name, user-defined tags, or the status of the data source.
Data sources can have a status of Active, Inactive, or Deleted. Pancake defaults the data source filter to "Active," so you must switch the filter to "All" if you want to see the full list on the Data Source Overview screen.
To help you quickly sort scanned data sources by how difficult they will be to work with, Pancake includes a "Complexity Score" measure on the Data Source Overview screen. The calculation is a simple equation that multiplies each element of the JSON data source that could affect the difficulty of schema discovery, extraction, and flattening by a weight.
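The filter behavior described above can be pictured with a short Python sketch. This is only an illustration of a status filter that defaults to "Active"; the `DataSource` class, the `filter_sources` function, and the case-insensitive substring matching are hypothetical and do not describe how Pancake implements the filter.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    # Hypothetical record mirroring a few columns of the Data Sources table.
    name: str
    object_name: str
    status: str = "Active"
    tags: tuple = ()


def filter_sources(sources, status="Active", name=None, object_name=None, tag=None):
    """Return the sources matching every supplied criterion.

    status defaults to "Active"; pass "All" to skip the status check.
    """
    result = []
    for source in sources:
        if status != "All" and source.status != status:
            continue
        if name and name.lower() not in source.name.lower():
            continue
        if object_name and object_name.lower() not in source.object_name.lower():
            continue
        if tag and tag not in source.tags:
            continue
        result.append(source)
    return result


# Hypothetical data sources for the example.
sources = [
    DataSource("orders_prod", "ORDERS_RAW", "Active", ("finance",)),
    DataSource("events_dev", "EVENTS_RAW", "Inactive", ("web",)),
    DataSource("legacy_feed", "FEED_OLD", "Deleted"),
]

active_only = filter_sources(sources)              # default view
everything = filter_sources(sources, status="All")  # full list
```

With the default arguments only `orders_prod` is returned; the inactive and deleted sources appear once the status is set to "All".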
In the future, these weights will be editable by users so your organization can weight each element appropriately given your capabilities and capacity.
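As a rough illustration of a weighted score of this kind, the Python sketch below multiplies each summary count by a weight and adds the results. The weight values, the choice to sum the products, and the function name are assumptions made for the example; Pancake's actual formula and weights are not given here.

```python
# Hypothetical weights: these values are invented for illustration only.
HYPOTHETICAL_WEIGHTS = {
    "total_attributes": 1.0,
    "polymorphic": 5.0,
    "arrays": 3.0,
    "objects": 2.0,
    "max_level": 4.0,
}


def complexity_score(summary, weights=HYPOTHETICAL_WEIGHTS):
    """Multiply each summary count by its weight and sum the products."""
    return sum(weight * summary.get(name, 0) for name, weight in weights.items())


# Hypothetical Schema Summary counts for one scanned data source.
example_summary = {
    "total_attributes": 7,
    "polymorphic": 2,
    "arrays": 3,
    "objects": 1,
    "max_level": 2,
}
score = complexity_score(example_summary)  # 7*1 + 2*5 + 3*3 + 1*2 + 2*4 = 36.0
```

Passing a different `weights` dictionary changes how heavily each element counts, which is the kind of adjustment user-editable weights would allow.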
The Scans in Process table displays a full list of active scans, including those which have just been initiated or are still finalizing. Users can find the name of the data source as well as the scan configuration being used. Other details from the scan configuration are also found in this table.
If you accidentally initiate a scan with an incorrect scan configuration, or a scan has stalled without being caught by Pancake's error/exception handling, you can select the active scan from a dropdown in the Cancel Scan section of the page and click a button to end the process.