Scan Processing

Pancake uses a custom, proprietary approach to utilizing compute resources in Snowflake. This approach delivers high performance, scanning tens of millions of documents in minutes, but it makes configuring scans for large data sources more complex.

In Pancake, a Scan Configuration is a stored set of parameters for scanning a specific data source. Users may configure a quick scan when onboarding a data source, a full scan at regular intervals if the schema could change in any document in the data source, or a scan with a WHERE clause that covers only changes since the most recent scan. Scan Configurations are flexible and designed to give users granular control over how the app uses compute resources.
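To make the three scan styles concrete, here is a minimal Python sketch of what a stored scan configuration might hold. The `ScanConfiguration` class, its fields, the `to_sql` helper, and the table, column, and timestamp values (`RAW.EVENTS`, `PAYLOAD`, `LOADED_AT`, `2024-01-01`) are all hypothetical; this is not Pancake's actual schema or API, only an illustration of how a quick scan, a scheduled full scan, and an incremental scan with a WHERE clause differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanConfiguration:
    """HYPOTHETICAL model of a stored scan configuration (not Pancake's schema)."""
    name: str
    source_table: str
    json_column: str
    sample_limit: Optional[int] = None   # cap on documents, e.g. for a quick scan
    where_clause: Optional[str] = None   # filter, e.g. for an incremental scan
    schedule_cron: Optional[str] = None  # recurrence, e.g. for a periodic full scan

    def to_sql(self) -> str:
        """Build the SELECT statement whose rows this configuration would scan."""
        sql = f"SELECT {self.json_column} FROM {self.source_table}"
        if self.where_clause:
            sql += f" WHERE {self.where_clause}"
        if self.sample_limit is not None:
            sql += f" LIMIT {self.sample_limit}"
        return sql


# Quick scan for onboarding: look at a bounded sample of documents.
quick = ScanConfiguration("onboarding_quick", "RAW.EVENTS", "PAYLOAD", sample_limit=10_000)

# Full scan on a schedule (weekly, Sunday 02:00) to catch schema changes anywhere.
full = ScanConfiguration("weekly_full", "RAW.EVENTS", "PAYLOAD", schedule_cron="0 2 * * 0")

# Incremental scan: only rows loaded after the previous scan's (hypothetical) timestamp.
incremental = ScanConfiguration(
    "since_last_scan", "RAW.EVENTS", "PAYLOAD",
    where_clause="LOADED_AT > '2024-01-01 00:00:00'",
)

for cfg in (quick, full, incremental):
    print(cfg.name, "->", cfg.to_sql())
```

In this sketch the incremental timestamp is a literal; a real setup would presumably derive it from the most recent scan, and user-supplied WHERE text would need validation rather than direct string interpolation into SQL.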

Pancake automatically sets the number of threads for a Scan Configuration to the maximum available for the selected warehouse size, so each scan scales vertically to make full use of the warehouse's compute resources.
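The thread rule can be sketched as a lookup from warehouse size to thread count. The credits-per-hour figures are Snowflake's published rates for standard warehouses, where each size step doubles the compute resources of the previous one. `THREADS_PER_CREDIT` and `max_threads` are hypothetical: the exact mapping Pancake uses is not documented here, so the value 8 is only an assumption chosen to show how the thread count doubles with each size step.

```python
# Credits per hour for Snowflake standard warehouses; each size step doubles
# the compute resources (and credits) of the previous one.
CREDITS_PER_HOUR = {
    "X-SMALL": 1,
    "SMALL": 2,
    "MEDIUM": 4,
    "LARGE": 8,
    "X-LARGE": 16,
    "2X-LARGE": 32,
    "3X-LARGE": 64,
    "4X-LARGE": 128,
    "5X-LARGE": 256,
    "6X-LARGE": 512,
}

# HYPOTHETICAL: threads available per credit of warehouse capacity.
# This value is an illustrative assumption, not Pancake's documented figure.
THREADS_PER_CREDIT = 8


def max_threads(warehouse_size: str) -> int:
    """Return the hypothetical maximum thread count for a warehouse size.

    Sizes are matched case-insensitively against Snowflake's display names
    (e.g. "X-Small", "Medium", "2X-Large").
    """
    key = warehouse_size.strip().upper()
    if key not in CREDITS_PER_HOUR:
        raise ValueError(f"unknown warehouse size: {warehouse_size!r}")
    return CREDITS_PER_HOUR[key] * THREADS_PER_CREDIT


# A scan configuration would take the maximum for its warehouse automatically.
for size in ("X-Small", "Medium", "2X-Large"):
    print(size, max_threads(size))  # X-Small 8, Medium 32, 2X-Large 256
```

Under this assumption, moving a scan from a Medium to a Large warehouse doubles its thread count, which is the vertical-scaling behavior the paragraph describes.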
