
Scan Processing

Last updated 12 months ago

Pancake uses a custom, proprietary approach to utilizing compute resources in Snowflake. This approach delivers strong performance, scanning tens of millions of documents in minutes, but it makes configuring scans for large data sources more complex.

In Pancake, a Scan Configuration is a stored set of parameters for scanning a specific data source. Users may wish to configure quick scans when onboarding data sources, full scans at regular intervals if schema changes could occur anywhere in the documents, or incremental scans with a where clause that covers only changes since the most recent scan. Scan configurations are flexible and designed to give users granular control over how compute resources are utilized within the app.
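To illustrate the idea, the scan patterns above could be modeled roughly as follows. This is a minimal sketch, not Pancake's actual data model: the class, field names, scan-type labels, and the `SCAN_HISTORY` table in the where clause are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScanConfiguration:
    """Illustrative model of a stored scan configuration (names are hypothetical)."""
    data_source: str
    scan_type: str                       # e.g. "quick", "full", or "incremental"
    where_clause: Optional[str] = None   # restricts the scan to a subset of documents

# Full scan at a regular interval: every document is examined,
# catching schema changes anywhere in the data source.
full = ScanConfiguration("orders_raw", "full")

# Incremental scan: a where clause limits the scan to documents
# loaded since the most recent scan (SCAN_HISTORY is a hypothetical table).
incremental = ScanConfiguration(
    "orders_raw",
    "incremental",
    where_clause="LOAD_TS > (SELECT MAX(LAST_SCAN_TS) FROM SCAN_HISTORY)",
)
```

The trade-off between the two patterns is cost versus coverage: a full scan guarantees no schema change is missed, while an incremental scan keeps compute usage proportional to new data.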

Pancake automatically sets the number of threads for a Scan Configuration to the maximum available for the selected warehouse size, so the compute resource scales vertically with optimal utilization.