Scan Processing

Last updated 11 months ago

Pancake uses a custom, proprietary approach to allocating compute resources in Snowflake. This approach performs well, scanning tens of millions of documents in minutes, but it makes configuring scans for large data sources more complex.

In Pancake, a Scan Configuration is a stored set of parameters for scanning a specific data source. Users may wish to configure quick scans when onboarding data sources, full scans at regular intervals if there is a chance of schema changes across the whole document, or a scan with a WHERE clause that only picks up changes since the most recent scan. Scan configurations are flexible and designed to give users granular control over how compute resources are used within the app.
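As a rough illustration of the three scan styles described above (quick, full, and incremental), the sketch below models a scan configuration as a small Python data class and derives the query that would select the documents to scan. Everything here is an assumption made for illustration: the `ScanConfiguration` class, its field names, the `build_scan_query` helper, and the table and column names are hypothetical and do not reflect Pancake's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanConfiguration:
    """Hypothetical model of a stored scan configuration (not Pancake's schema)."""
    name: str
    source_table: str                    # table holding the semi-structured documents
    document_column: str                 # column containing each document
    where_clause: Optional[str] = None   # filter used for incremental scans
    record_limit: Optional[int] = None   # cap used for quick onboarding scans


def build_scan_query(config: ScanConfiguration) -> str:
    """Build the SELECT statement that picks the documents a scan would read."""
    query = f"SELECT {config.document_column} FROM {config.source_table}"
    if config.where_clause:
        query += f" WHERE {config.where_clause}"
    if config.record_limit is not None:
        query += f" LIMIT {config.record_limit}"
    return query


# Quick scan: sample a limited number of documents when onboarding a source.
quick = ScanConfiguration("quick_scan", "RAW.LISTINGS", "DOC", record_limit=1000)

# Full scan: read every document, e.g. on a regular schedule.
full = ScanConfiguration("full_scan", "RAW.LISTINGS", "DOC")

# Incremental scan: read only documents loaded since the previous scan.
incremental = ScanConfiguration(
    "incremental_scan", "RAW.LISTINGS", "DOC",
    where_clause="LOADED_AT > '2024-01-01'",
)

for cfg in (quick, full, incremental):
    print(f"{cfg.name}: {build_scan_query(cfg)}")
```

Keeping each variant as its own named configuration mirrors the idea of a stored parameter set: the same data source can be scanned cheaply during onboarding and exhaustively later without redefining the source itself.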

Pancake automatically sets the number of threads for a Scan Configuration to the maximum available for the given warehouse size, which scales the compute resource vertically as far as that size allows.
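The idea of picking a thread count from the warehouse size can be sketched as a simple lookup. This is only a minimal sketch: the `threads_for_warehouse` function is hypothetical, and the numbers in `EXAMPLE_MAX_THREADS` are placeholders chosen for illustration, not Pancake's actual per-size limits, which this page does not state.

```python
from typing import Mapping

# Placeholder values for illustration only; these are NOT Pancake's real limits.
EXAMPLE_MAX_THREADS: Mapping[str, int] = {
    "XSMALL": 8,
    "SMALL": 16,
    "MEDIUM": 32,
}


def threads_for_warehouse(
    size: str, max_threads: Mapping[str, int] = EXAMPLE_MAX_THREADS
) -> int:
    """Return the maximum thread count configured for a warehouse size."""
    # Normalize spellings such as "X-Small" or " small " to the lookup key form.
    key = size.strip().upper().replace("-", "")
    if key not in max_threads:
        raise ValueError(f"No thread limit configured for warehouse size {size!r}")
    return max_threads[key]


print(threads_for_warehouse("X-Small"))
print(threads_for_warehouse("medium"))
```

Passing the mapping in as a parameter keeps the placeholder numbers out of the logic, so real limits could be substituted without changing the function.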