Warehouses
In Pancake, warehouses refer to the specific virtual warehouses in your Snowflake account that you make available to the application as compute resources. As part of the application setup, Pancake is granted privileges which allow you to maintain control over the creation and management of your virtual warehouses via the application. This object is what allows Pancake to use the warehouses you make available to the app for scanning.
Our recommendation is that you set up a batch of warehouses used only by Pancake to handle concurrent scans and to cover a range of document sizes and complexity. We advise that you follow a naming convention of pancake_<warehouse size>_<number>
(e.g. pancake_medium_snowpark_optimized_3
) for ease of selection and use within the app or in a worksheet.
Currently, a given Scan Configuration must be assigned to a specific warehouse and each warehouse can only support one scan at a time. Make a note of this if you have a large number of scans processing across many warehouses to avoid over-reliance on a single resource.
Warehouses & Processing
Pancake makes efficient use of Snowflake's warehouse resources through a novel approach to job handling and how it handles vertical scaling across threads. However, there are very real memory constraints and the speed of a scan will depend on the complexity of a data source in addition to the number of documents in the data source.
In testing, Pancake is able to acheive scan speeds of approximately 1,000,000 records per minute on a medium Snowpark-Optimized warehouse, with External Tables running slightly slower. There are diminishing returns as you scale up in warehouse size, so it is unlikely you will get significantly better performance from a 2XL than from a Medium Snowpark-Optimized. For this reason we recommend provisioning multiple medium warehouses if you have a number of large data sources you wish to scan regulalry.
For most customers and use cases, we recommend using a single medium warehouse with the default Scan Configuration using a record limit of 0 and allowing Pancake to leverage the maximum number of threads in a single procedure call.
Snowflake has an organizational-level limit on the runtimes of Snowpark procedures in native apps which will kill a running task after 3600 seconds. If you have a data source that takes longer than one hour to complete scanning, the scan will fail. For very large data sources you must break the scan into multiple procedure calls from the Advanced Settings section of the Scan Configuration screen.
Adding Warehouses to Pancake
During Setup
Adding warehouses are the required first step when installing Pancake and configuring it for use, and must be done immediately following the granting of global permissions. This is represented in the setup script, which can be found in the app by clicking the Readme button in the upport right corner of the interface.
Here is the excerped code from that script:
You can add additional warehouses during setup by adding additional statements where you define the name, size, and type (for performance and memory-related reasons, we advise leaving MAX_CONCURRENCY_LEVEL
set to 1
for all Snowpark Optimized warehouses).
You may add as many warehouses as you wish during setup.
After Setup
Following the initial setup, you may find that your organization's needs are high enough to require adding additional warehouses. While you will not need to grant the global permissions again, you will need to specifically GRANT USAGE
on any newly created warehouses to Pancake.
You will need to run this set of statements in this order every time you wish to add a new warehouse.
Adding a Warehouse from the Pancake app
If you would prefer to add warehouses to Pancake using the application interface, you can create the warehouses directly from Snowsight (Snowflake's UI) on the Warehouses page under the Admin menu. However, you must still grand usage of that warehouse to Pancake through a worksheet.
After granting Pancake usage on the warehouse, you can navigate to the Manage Warehouses page in the app. Enter the name of the exact name of the warehouse and the size, then click the "Save" button.
Last updated