Warehouses

In Pancake, warehouses refer to the specific virtual warehouses in your Snowflake account that you make available to the application as compute resources. As part of the application setup, Pancake is granted privileges which allow you to maintain control over the creation and management of your virtual warehouses via the application. This object is what allows Pancake to use the warehouses you make available to the app for scanning.

Our recommendation is that you set up a batch of warehouses used only by Pancake to handle concurrent scans and to cover a range of document sizes and complexity. We advise that you follow a naming convention of pancake_<warehouse size>_<number> (e.g. pancake_medium_snowpark_optimized_3) for ease of selection and use within the app or in a worksheet.

Currently, a given Scan Configuration must be assigned to a specific warehouse and each warehouse can only support one scan at a time. Make a note of this if you have a large number of scans processing across many warehouses to avoid over-reliance on a single resource.

Warehouses & Processing

Pancake makes efficient use of Snowflake's warehouse resources through a novel approach to job handling and how it handles vertical scaling across threads. However, there are very real memory constraints and the speed of a scan will depend on the complexity of a data source in addition to the number of documents in the data source.

In testing, Pancake is able to acheive scan speeds of approximately 1,000,000 records per minute on a medium Snowpark-Optimized warehouse, with External Tables running slightly slower. There are diminishing returns as you scale up in warehouse size, so it is unlikely you will get significantly better performance from a 2XL than from a Medium Snowpark-Optimized. For this reason we recommend provisioning multiple medium warehouses if you have a number of large data sources you wish to scan regulalry.

For most customers and use cases, we recommend using a single medium warehouse with the default Scan Configuration using a record limit of 0 and allowing Pancake to leverage the maximum number of threads in a single procedure call.

Snowflake has an organizational-level limit on the runtimes of Snowpark procedures in native apps which will kill a running task after 3600 seconds. If you have a data source that takes longer than one hour to complete scanning, the scan will fail. For very large data sources you must break the scan into multiple procedure calls from the Advanced Settings section of the Scan Configuration screen.

Adding Warehouses to Pancake

During Setup

Adding warehouses are the required first step when installing Pancake and configuring it for use, and must be done immediately following the granting of global permissions. This is represented in the setup script, which can be found in the app by clicking the Readme button in the upport right corner of the interface.

Here is the excerped code from that script:

Warehouse Setup Script Code
USE ROLE <application_install_role>;
GRANT APPLICATION ROLE PANCAKE.app_public TO ROLE <accountadmin_or_equivalent_role>;
GRANT APPLICATION ROLE PANCAKE.app_public TO ROLE <warehouse_create_role>;

USE ROLE <accountadmin_or_equivalent_role>;
GRANT EXECUTE TASK ON ACCOUNT TO APPLICATION PANCAKE;
GRANT CREATE WAREHOUSE ON ACCOUNT TO APPLICATION PANCAKE;

USE ROLE <warehouse_create_role>;

CREATE OR REPLACE WAREHOUSE PANCAKE_X_SMALL_01 WITH
  WAREHOUSE_SIZE = 'X-Small';
CREATE OR REPLACE WAREHOUSE PANCAKE_MEDIUM_OPTIMIZED_01 WITH
  WAREHOUSE_SIZE = 'Medium'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
  MAX_CONCURRENCY_LEVEL = 1;
  
GRANT USAGE on WAREHOUSE PANCAKE_X_SMALL_01 to APPLICATION PANCAKE;
GRANT USAGE on WAREHOUSE PANCAKE_MEDIUM_OPTIMIZED_01 to APPLICATION PANCAKE;

USE ROLE <application_install_role>;
DECLARE
    warehouse_name VARCHAR DEFAULT 'PANCAKE_X_SMALL_01';
    warehouse_size VARCHAR DEFAULT 'X-Small';
    call_result VARCHAR;
BEGIN
    CALL PANCAKE.CORE.ADD_WAREHOUSE(:warehouse_name, :warehouse_size) into :call_result;
    insert into PANCAKE.CORE.SETUP_RESULTS values ('Warehouse',:warehouse_name,:call_result);
    return call_result;
END;
DECLARE
    warehouse_name VARCHAR DEFAULT 'PANCAKE_MEDIUM_OPTIMIZED_01';
    warehouse_size VARCHAR DEFAULT 'Medium';
    call_result VARCHAR;
BEGIN
    CALL PANCAKE.CORE.ADD_WAREHOUSE(:warehouse_name, :warehouse_size) into :call_result;
    insert into PANCAKE.CORE.SETUP_RESULTS values ('Warehouse',:warehouse_name,:call_result);
    return call_result;
END;

select * from PANCAKE.CORE.SETUP_RESULTS;

You can add additional warehouses during setup by adding additional statements where you define the name, size, and type (for performance and memory-related reasons, we advise leaving MAX_CONCURRENCY_LEVEL set to 1 for all Snowpark Optimized warehouses).

CREATE OR REPLACE WAREHOUSE PANCAKE_LARGE_OPTIMIZED_01 WITH
  WAREHOUSE_SIZE = 'Large'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
  MAX_CONCURRENCY_LEVEL = 1;
  
CREATE OR REPLACE WAREHOUSE PANCAKE_MEDIUM_OPTIMIZED_02 WITH
  WAREHOUSE_SIZE = 'Medium'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
  MAX_CONCURRENCY_LEVEL = 1;
  
CREATE OR REPLACE WAREHOUSE PANCAKE_SMALL_01 WITH
  WAREHOUSE_SIZE = 'Small'
  
CREATE OR REPLACE WAREHOUSE PANCAKE_2XL_OPTIMIZED_01 WITH
  WAREHOUSE_SIZE = '2X-Large'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
  MAX_CONCURRENCY_LEVEL = 1;

You may add as many warehouses as you wish during setup.

After Setup

Following the initial setup, you may find that your organization's needs are high enough to require adding additional warehouses. While you will not need to grant the global permissions again, you will need to specifically GRANT USAGE on any newly created warehouses to Pancake.

USE ROLE <warehouse_create_role>;

CREATE OR REPLACE WAREHOUSE PANCAKE_LARGE_OPTIMIZED_03 WITH
  WAREHOUSE_SIZE = 'Large'
  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
  MAX_CONCURRENCY_LEVEL = 1;
  
GRANT USAGE on WAREHOUSE PANCAKE_LARGE_OPTIMIZED_03 to APPLICATION PANCAKE;

USE ROLE <application_install_role>;

DECLARE
    warehouse_name VARCHAR DEFAULT 'PANCAKE_LARGE_OPTIMIZED_03';
    warehouse_size VARCHAR DEFAULT 'LARGE';
    call_result VARCHAR;
BEGIN
    CALL PANCAKE.CORE.ADD_WAREHOUSE(:warehouse_name, :warehouse_size) into :call_result;
    insert into PANCAKE.CORE.SETUP_RESULTS values ('Warehouse',:warehouse_name,:call_result);
    return call_result;
END;

You will need to run this set of statements in this order every time you wish to add a new warehouse.

Adding a Warehouse from the Pancake app

If you would prefer to add warehouses to Pancake using the application interface, you can create the warehouses directly from Snowsight (Snowflake's UI) on the Warehouses page under the Admin menu. However, you must still grand usage of that warehouse to Pancake through a worksheet.

GRANT USAGE on WAREHOUSE PANCAKE_LARGE_OPTIMIZED_03 to APPLICATION PANCAKE;

After granting Pancake usage on the warehouse, you can navigate to the Manage Warehouses page in the app. Enter the name of the exact name of the warehouse and the size, then click the "Save" button.

Last updated