🛠️Internal Data Quick Start

The internal setup guide is intended to walk you through creating Pancake optimized warehouses and adding datasources to Pancake that are connected to your own internal data including tables, views, materialized views, and external tables. Code blocks can also be found in the Readme inside the Pancake app and are referred to here as Script #3 and Script #4 to align with the app.

  • Script #3 is used to set up global privileges and warehouses.

  • Script #4 is to used to grant select privileges to database objects and create Pancake data sources.

Script #3 - Global Privileges and Warehouse Set Up

Script 3 is used for setting global privileges for Pancake and setting up the warehouses the app will use for scanning. If you follow the instructions contained in the comments through to the end you should be able to run the entire worksheet at once.

Pay close attention to which roles are performing which action to ensure each has sufficient privileges to execute each task.

Script #3 - Global Privileges and Warehouse Set Up
/* ********** SCRIPT #3 - Your Data

Global Privileges and Warehouse Set Up Script

Script Steps

#1 - Grant the Pancake Application Role to other roles in your organization
#2 - Grant the EXECUTE TASK privilege to Pancake so the app can execute tasks in your account. Tasks are used to initiate data scans.
#3 - Create the Snowflake Virtual Warehouses for Pancake and grant the USAGE privilege for those warehouses to Pancake so the app can assign a warehouse to a task and then start the task.
#4 - Create a temporary table used to store the Pancake Procedure call results used to add warehouses to Pancake
#5 - Add the warehouse instance details to Pancake.


Script Placeholders:

(For convenience, you can use the built in find/replace feature in Snowflake for the following placeholders)

<application_install_role> (x2)
<accountadmin_or_equivalent_role> (x2)
<warehouse_create_role> (x1)
<other_roles_that_need_access_to_pancake> (x1 optional)

**After the placeholder values have been updated you can run the entire script at once. The results of each step will be returned after the last statement in the script is executed.

*/


--#1 - Execute the following SQL statement to grant your ACCOUNTADMIN role or any other role that can grant the EXECUTE TASK privilege on your account to PANCAKE so that role can complete the next step. If you are installing Pancake with the ACCOUNTADMIN role then grant access to other roles that will need access to Pancake such as the role used to create warehouses.
USE ROLE <application_install_role>;
GRANT APPLICATION ROLE PANCAKE.app_public TO ROLE <accountadmin_or_equivalent_role>;
GRANT APPLICATION ROLE PANCAKE.app_public TO ROLE <warehouse_create_role>;

--optional
--GRANT APPLICATION ROLE PANCAKE.app_public TO ROLE <other_roles_that_need_access_to_pancake>;

--#2 - Execute this SQL to grant the EXECUTE TASK PRIVILEGE to PANCAKE. Tasks are used by PANCAKE to scan data.
USE ROLE <accountadmin_or_equivalent_role>;
GRANT EXECUTE TASK ON ACCOUNT TO APPLICATION PANCAKE;
GRANT CREATE WAREHOUSE ON ACCOUNT TO APPLICATION PANCAKE;

--#3 - To perform the next set of actions please use the role that has access to manage and create Snowflake virtual warehouses.
USE ROLE <warehouse_create_role>;

--Execute the following SQL to create Snowflake virtual warehouses that will be used by Pancake to scan semi-structured data.
--An X-Small virtual warehouse will be used to create data sources and complete quick scans with a limit 1000 or less per data source.
CREATE OR REPLACE WAREHOUSE PANCAKE_X_SMALL_01 WITH
  WAREHOUSE_SIZE = 'X-Small';

--A medium sized Snowflake virtual warehouse with snowpark optimization is required to scan data because of its 
--increased memory capacity. The max concurrency level is set to 1 to maximize compute and memory availability.
CREATE OR REPLACE WAREHOUSE PANCAKE_MEDIUM_OPTIMIZED_01 WITH
  WAREHOUSE_SIZE = 'Medium'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
  MAX_CONCURRENCY_LEVEL = 1;

--Execute the following SQL to grant access to the newly created warehouse from Step #5 to the PANCAKE application.
GRANT USAGE on WAREHOUSE PANCAKE_X_SMALL_01 to APPLICATION PANCAKE;
GRANT USAGE on WAREHOUSE PANCAKE_MEDIUM_OPTIMIZED_01 to APPLICATION PANCAKE;


--#4 - Create the tempoary table to store the Pancake procedure call results

--To perform the next set of actions please use the role that was used to install the application or another role that has been granted access to the Pancake.
USE ROLE <application_install_role>;

--Execute the following SQL statement to create a temporary table used to store results that are created by Pancake Procedures which are executed in later step #5.
CREATE or replace TEMPORARY TABLE PANCAKE.CORE.SETUP_RESULTS (
    object_type varchar,
    object_name varchar,
    call_result varchar
);

--#5 - Create the following code block for each warehouse you have granted usage rights to use with PANCAKE.
--Add the newly created virtual warehouse in Step #5a and #6a to PANCAKE
DECLARE
    warehouse_name VARCHAR DEFAULT 'PANCAKE_X_SMALL_01';
    warehouse_size VARCHAR DEFAULT 'X-Small';
    call_result VARCHAR;
BEGIN
    CALL PANCAKE.CORE.ADD_WAREHOUSE(:warehouse_name, :warehouse_size) into :call_result;
    insert into PANCAKE.CORE.SETUP_RESULTS values ('Warehouse',:warehouse_name,:call_result);
    return call_result;
END;

--Add the newly created virtual warehouse in Step #5b and #6b to PANCAKE
DECLARE
    warehouse_name VARCHAR DEFAULT 'PANCAKE_MEDIUM_OPTIMIZED_01';
    warehouse_size VARCHAR DEFAULT 'Medium';
    call_result VARCHAR;
BEGIN
    CALL PANCAKE.CORE.ADD_WAREHOUSE(:warehouse_name, :warehouse_size) into :call_result;
    insert into PANCAKE.CORE.SETUP_RESULTS values ('Warehouse',:warehouse_name,:call_result);
    return call_result;
END;

--Show the results from all PANCAKE actions performed.
select * from PANCAKE.CORE.SETUP_RESULTS;

Script #4 - Grant Select Privileges and Adding Data Sources

Now you're ready to identify VARIANT columns in your Snowflake environment, which often contain RAW JSON and are what Pancake is designed to scan and analyze. Specifically, it is designed to filter for database objects that have columns with a VARIANT data type. Those objects could be Tables, External Tables, Views, or Materialized Views.

Don't worry about checking yourself if a given column contains valid JSON, Pancake will perform a check and only add data sources which the app can successfully connect to.

The script will return high level information about those columns, as well as the GRANT_STATEMENT_CODE_BLOCK and CREATE_DATASOURCE_CODE_BLOCK columns. These statements will be used as part of Script #4 (Data Source Creation), so we recommend running this script in its own worksheet for later reference. You can also download the result as a .csv file so you can copy and paste multiple code blocks at once later on in the quick start process:

You must run this script for each database that may have database objects you wish to scan.

You will need to use a role which has sufficient privileges for the databases you wish to check and specifiy a database.

The GRANT_STATEMENT_CODE_BLOCK and CREATE_DATASOURCE_CODE_BLOCK statements returned should look something like this:

-- GRANT STATEMENT
GRANT USAGE ON DATABASE TDAA_FDA_DATA TO APPLICATION PANCAKE;
INSERT INTO PANCAKE.CORE.SETUP_RESULTS SELECT 'Grant Select to Database', 'TDAA_FDA_DATA', * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
--OR
GRANT IMPORTED PRIVILEGES ON DATABASE TDAA_FDA_DATA TO APPLICATION PANCAKE;
INSERT INTO PANCAKE.CORE.SETUP_RESULTS SELECT 'Grant Select to Database', 'TDAA_FDA_DATA', * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

-- CREATE DATA SOURCE STATEMENT
DECLARE 
	datasource_name VARCHAR DEFAULT 'TDAA_FDA_DATA.MEDICAL_DEVICE.UDI_JSON'; 
	object_type VARCHAR DEFAULT 'Table'; 
	database_name VARCHAR DEFAULT 'TDAA_FDA_DATA'; 
	schema_name VARCHAR DEFAULT 'MEDICAL_DEVICE'; 
	object_name VARCHAR DEFAULT 'UDI_JSON'; 
	column_name VARCHAR DEFAULT 'JSON_DATA'; 
	datasource_tag VARCHAR DEFAULT ''; 
	warehouse_name VARCHAR DEFAULT 'PANCAKE_X_SMALL_01'; 
	record_limit DEFAULT 150; 
	initiate_scan DEFAULT TRUE; 
	call_result VARCHAR;
BEGIN 
	CALL PANCAKE.core.add_datasource_with_scan( 
:datasource_name,:object_type,:database_name,:schema_name,:object_name,:column_name, 
:warehouse_name, :record_limit, :initiate_scan, :datasource_tag) into
:call_result;

insert into PANCAKE.CORE.SETUP_RESULTS values (:object_type,:object_name,:call_result);

END;

You will copy and paste the complete versions of these statements futher down in the script

The next part of the script is used to grant SELECT privileges to database object and create Pancake data sources. As with Script #4, if you follow the instructions through to the end you should be able to run the entire worksheet at once.

By default data sources are added with a "quick scan" Scan Configuration which scans 150 records. If you wish to add data sources and initiate a scan with more records or a complete scan, you can edit the record_limit DEFAULT 150 line before running the statement.

Script #4 - Grant Select Privileges and Adding Data Sources
/* ********** SCRIPT #4 - Internal Data - Data Source, Scan Configuration, and Initial Scan

Grant Select Privileges to Database Objects and Create Pancake Data Source - Set Up Script


Script Steps
#1 - Execute the provided select query to find variant columns in a database. The results of the select statement will include column information and also include code blocks that can be copy/pasted into this script to quickly onboard data sources in to Pancake.
#2 - Create the tempoary table to store the results of Grant SQL statements and Pancake procedure calls
#3 - Grant the Select privilege or Import Privileges to the database objects you wish to scan.
#4 - Call procedure to add data sources to Pancake with a quick start scan configuration, which will then also initiate the scan.


Script Placeholders:

(For convenience, you can use the built in find/replace feature in Snowflake for the following placeholders)
 
<database_access_role> (x4)
<application_install_role> (x3)

<database_name> (x5 per database)
<schema_name> (x1 per schema)

**After updating the placeholder values you can 
- run the query in step #1 which will return the code blocks to grant privileges and add data sources to Pancake. 
- copy and paste the values from the code blocks retrieved in Step #1 into the appropriate location in steps #3 and #4 in this worksheet 
- Execute the sql in steps #2 to the end of the script . 

The results of each step will be returned after the last statement in the script is executed. 

**If you are not connecting to a database in your instance and are only connecting to a shared database then comment out the two sql statements under #3A.


#1 - Variant Column Query
Copy and run the following query for each database that may have database objects you wish to scan. This query is designed to filter for database objects that have columns with a VARIANT data type. Those objects could be Tables, External Tables, Views, or Materialized Views. You can download the results of this query into Google Sheets or Microsoft Excel to access code blocks generated for each object and column combination into this worksheet. 

****In order to onboard data sources into PANCAKE more quickly, you can use the pre-generated code blocks to grant access to database objects to Pancake and for executing Pancake procedures to add data sources and initiate scans.

To perform the next set of actions please use the role that has access to all of the database objects in this database.
*/

USE ROLE <database_access_role>;

select columns.table_catalog as db_name, columns.table_schema as schema_name, columns.table_name, column_name, tables.table_type as object_type, tables.row_count as object_row_count, tables.bytes as object_total_bytes,
concat(
case when databases.type = 'IMPORTED DATABASE' then
concat('GRANT IMPORTED PRIVILEGES ON DATABASE ', case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end, columns.table_catalog, case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end, ' TO APPLICATION PANCAKE;')
else 
concat('GRANT REFERENCES, SELECT ON  ', case when tables.table_type = 'BASE TABLE' then 'TABLE' ELSE tables.table_type end ,' ', columns.table_catalog, '.', columns.table_schema, '.', case when upper(columns.table_name) not like columns.table_name or rlike(columns.table_name, '.[0-9].*') then '"' else '' end, columns.table_name, case when upper(columns.table_name) not like columns.table_name or rlike(columns.table_name, '.[0-9].*') then '"' else '' end, ' TO APPLICATION PANCAKE;') end,
'\r\nINSERT INTO PANCAKE.CORE.SETUP_RESULTS SELECT ''Grant Select to Database'', ''', case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end, columns.table_catalog, case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end ,'''' , ', * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));')
as grant_statement_code_block, 
concat('DECLARE \r\n\t',
'datasource_name VARCHAR DEFAULT ', '''',  columns.table_catalog, '.', columns.table_schema, '.', columns.table_name, '''; \r\n\t',
'object_type VARCHAR DEFAULT ', '''', case when tables.table_type = 'BASE TABLE' then 'Table' when tables.table_type ='VIEW' then 'View' when tables.table_type = 'MATERIALIZED TABLE' then 'Materialized Table' when tables.table_type = 'EXTERNAL TABLE' then 'External Table' end, '''; \r\n\t',
'database_name VARCHAR DEFAULT ', '''', case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end, columns.table_catalog, case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end, '''; \r\n\t','schema_name VARCHAR DEFAULT ', '''', case when upper(columns.table_schema) not like columns.table_schema or rlike(columns.table_schema, '.[0-9].*') then '"' else '' end, columns.table_schema, case when upper(columns.table_schema) not like columns.table_schema or rlike(columns.table_schema, '.[0-9].*') then '"' else '' end, '''; \r\n\t',
'object_name VARCHAR DEFAULT ', '''', case when upper(columns.table_name) not like columns.table_name or rlike(columns.table_name, '.[0-9].*') then '"' else '' end, columns.table_name, case when upper(columns.table_name) not like columns.table_name or rlike(columns.table_name, '.[0-9].*') then '"' else '' end, '''; \r\n\t','column_name VARCHAR DEFAULT ', '''', case when upper(column_name) not like column_name or rlike(column_name, '.[0-9].*') then '"' else '' end,  column_name, case when upper(column_name) not like column_name or rlike(column_name, '.[0-9].*') then '"' else '' end, '''; \r\n\t',
'datasource_tag VARCHAR DEFAULT ', '''', '''; \r\n\t',
'warehouse_name VARCHAR DEFAULT ', '''', 'PANCAKE_X_SMALL_01', '''; \r\n\t','record_limit DEFAULT 150; \r\n\t', 'initiate_scan DEFAULT TRUE; \r\n\t', 'call_result VARCHAR;\r\n',
'BEGIN \r\n\t','CALL PANCAKE.core.add_datasource_with_scan( \r\n',':datasource_name,:object_type,:database_name,:schema_name,:object_name,:column_name, \r\n',':warehouse_name, :record_limit, :initiate_scan, :datasource_tag) into\r\n',':call_result;\r\n',
'\r\nINSERT INTO PANCAKE.CORE.SETUP_RESULTS values (:object_type,:object_name,:call_result);\r\nreturn call_result;\r\n','\r\nEND;') as create_datasource_code_block
from <database_name>.information_schema.columns inner join <database_name>.information_schema.tables on columns.table_catalog = tables.table_catalog and 
columns.table_schema = tables.table_schema and columns.table_name = tables.table_name inner join <database_name>.information_schema.databases on tables.table_catalog = databases.database_name
where data_type = 'VARIANT' and is_dynamic = 'NO';

--#2 - Create the tempoary table to store the results of Grant SQL statemetns and Pancake procedure calls

--To perform the next set of actions please use the role that was used to install the application or another role that has been granted access to the Pancake.
USE ROLE <application_install_role>;

--Execute the following SQL statement to create a temporary table used to store results that are created by Grant statements and Pancake Procedures which are executed in later steps (#9 and #15).
CREATE or replace TEMPORARY TABLE PANCAKE.CORE.SETUP_RESULTS (
    object_type varchar,
    object_name varchar,
    call_result varchar
);

--If you plan to use multiple roles to provide Pancake access to various databases then add a new GRANT INSERT and optional INSERT INTO statement(s) here for each role you will use to grant the select privilege. If you plan to use only one role to grant the select privilege to various database objects then you will only need these two sql statements.
GRANT INSERT,SELECT ON TABLE PANCAKE.CORE.SETUP_RESULTS TO ROLE <database_access_role>;
INSERT INTO PANCAKE.CORE.SETUP_RESULTS SELECT 'Results Table Insert Grant', '<database_access_role>', * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

--#3 - Grant the Select privilege to the database objects that Pancake will be allowed to scan.
--To perform the next set of actions please use the role that has access to all of the database objects in this database you wish to scan.
USE ROLE <database_access_role>;

/*
The following SQL statements are needed for each database, schema and database object you wish to grant access to Pancake to. You can use the pre-generated SQL statements found in the results of the Variant Column Query in Step #1 in the grant_statement_code_block. You can optionally download the results and open the csv file in Excel or upload to Google Sheets. Copy the code blocks for the database objects you want to scan and paste over the placeholder.
*/

--#3A - Execute the following SQL for each internal/external database that has database objects you wish to scan and discover. 
--If your database was imported or from a share these next two statements are not required and be commented out.
GRANT USAGE ON DATABASE <database_name> TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION PANCAKE;

--paste the SQL grant statements that were created in the Variant Column Query in Step #1 located in the "grant_statement_code_block" column below this line.



/*
#4 - Add data sources to Pancake.
The following section defines each of the components of the SQL Anonymous Code Block used to add database sources to Pancake.

DATA SOURCE NAME - a user defined name for the data source
OBJECT TYPE (can be a Table, View, Materialized View, or External Table)
DATABASE NAME
SCHEMA NAME
OBJECT NAME (the name of the table, view, materialized view, or external table)
OBJECT ATTRIBUTE NAME - the name of the VARIANT column
OBJECT UNIQUE ATTRIBUTE NAME - the name of the attribute at the root level that is unique across the entire dataset
WAREHOUSE NAME - the name of the warehouse created in the first step of this process
RECORD LIMIT - The record limit of the number of rows to scan (quick scan)
INITIATE SCAN - Whether or not to initiate the scan after it has been added to pancake. This quick scan will determine the records per minute metric to be used in creating a scan configuration to complete the full scan.

copy and paste the SQL code block from the Variant Column Query in Step #1 stored in the "create_datasource_code_block" column. To use the Variant Column Query in Step #1, you can optionally download the results and open the csv file in Excel or upload to Google Sheets. Copy the code blocks for the database objects you want to scan and paste into an editor. Remove any double quotes. Then copy and paste the code blocks over the placeholder text below. 

After pasting the code blocks you can modify the datasource_tag which is optional.

To perform the next set of actions please use the role that was used to install the application or another role that has been granted access to the Pancake.
*/

USE ROLE <application_install_role>;

--paste the SQL create Pancake data source statements that were created in the Variant Column Query in Step #1 located in the create_datasource_code_block column below this line



--Show the results from all PANCAKE actions performed.
select * from PANCAKE.CORE.SETUP_RESULTS;



/*

Pancake Views

Tier 1 - Schema Summary

select * from PANCAKE.CORE.VW_DATASOURCES;
select * from PANCAKE.CORE.VW_DATASOURCE_SCAN_CONFIGURATIONS;
select * from PANCAKE.CORE.VW_DATASOURCE_SCANS_IN_PROCESS;
select * from PANCAKE.CORE.VW_DATASOURCE_SCANS;
select * from PANCAKE.CORE.VW_DATASOURCE_SCAN_ERRORS;
select * from PANCAKE.CORE.VW_DATASOURCE_PRODUCT_TIER_AUDIT_LOG;
select * from PANCAKE.CORE.VW_DATASOURCE_BILLING_EVENTS;

Tier 2 - Schema Analysis

select * from PANCAKE.CORE.VW_DATASOURCE_ALL_ATTRIBUTES;
select * from PANCAKE.CORE.VW_DATASOURCE_VERSION_ALL_ATTRIBUTES;


*/

Start Scanning

Congratulations! If you open up the Pancake application and navigate to the Data Source Overview page you should see the Data Source table beginning to fill with sources and high level summary information about your data.

For more information about adding data sources after you've completed this set up, either from a worksheet or in the app itself, refer to the Data Sources documentation.

Last updated