Scan Configurations
Introduction to scan configurations: learn how to control the way DataPancake scans your data.
Overview
Scan configurations control how DataPancake scans data sources. Each configuration defines compute resources, scanning strategy, scheduling, and attribute discovery method.
Configuration Sections
Basic Configuration Settings - CONFIGURATION_NAME, ATTRIBUTE_CREATE_TYPE, RECORD_STATUS, CODE_GENERATE_ON_VERSION_CHANGE
Scheduling Settings - MONITOR_ENABLED, MONITOR_CRON_SCHEDULE, MONITOR_CRON_TIMEZONE
Data Source & Warehouse Settings - SCAN_RECORD_LIMIT, SCAN_WHERE_CLAUSE, warehouse selection
Vertical Scale Settings - Thread count, PROCEDURE_INSTANCE_COUNT, PROCEDURE_INSTANCE_ROW_COUNT, SCAN_ORDER_BY, THREAD_PROCESS_RECORD_COUNT (semi-structured only)
Source Stream Settings - SOURCE_STREAM_LAST_SCANNED_TIMESTAMP for incremental scanning
Common Patterns & Best Practices - Configuration patterns and optimization
Quick Reference
Essential:
CONFIGURATION_NAME - Unique identifier
ATTRIBUTE_CREATE_TYPE - 'Discover' (production) or 'Schema' (prototyping)
Virtual Warehouse - Compute resource
SCAN_RECORD_LIMIT - Number of records (0 = unlimited)
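As a rough illustration of how the essential settings fit together, the sketch below models them as a small Python dataclass with a sanity check. The class name, field names, and the `problems` helper are hypothetical and are not part of DataPancake; only the setting names, the two ATTRIBUTE_CREATE_TYPE values, and the meaning of 0 for SCAN_RECORD_LIMIT come from the list above.

```python
from dataclasses import dataclass

# The two values named in the quick reference above.
ALLOWED_CREATE_TYPES = {"Discover", "Schema"}


@dataclass
class EssentialScanSettings:
    """Illustrative container only; this is not a DataPancake API."""

    configuration_name: str      # CONFIGURATION_NAME: unique identifier
    attribute_create_type: str   # ATTRIBUTE_CREATE_TYPE: 'Discover' or 'Schema'
    virtual_warehouse: str       # compute resource (the name is a placeholder)
    scan_record_limit: int = 0   # SCAN_RECORD_LIMIT: 0 = unlimited

    def problems(self) -> list[str]:
        """Return readable problems; an empty list means the values look sane."""
        found = []
        if not self.configuration_name.strip():
            found.append("CONFIGURATION_NAME must not be empty")
        if self.attribute_create_type not in ALLOWED_CREATE_TYPES:
            found.append("ATTRIBUTE_CREATE_TYPE must be 'Discover' or 'Schema'")
        if self.scan_record_limit < 0:
            found.append("SCAN_RECORD_LIMIT must be 0 (unlimited) or a positive count")
        return found

    def is_unlimited(self) -> bool:
        """True when the scan is not capped at a record count."""
        return self.scan_record_limit == 0
```

For example, `EssentialScanSettings("orders_scan", "Discover", "MY_WH").problems()` returns an empty list, while a blank name or an unknown create type is reported before anything is submitted. The warehouse name `MY_WH` is a placeholder.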
Advanced:
Number of Threads - Parallel processing (defaults to warehouse max, semi-structured only)
PROCEDURE_INSTANCE_COUNT - Split scans across multiple calls (60-minute timeout)
MONITOR_CRON_SCHEDULE - Automated scanning with cron expressions
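To make the idea of splitting a scan across multiple calls concrete, here is a minimal sketch of the arithmetic. It assumes each call handles one fixed-size slice of rows; that is an assumption for illustration, since this page only states that scans can be split across calls and that a 60-minute timeout applies. The function `plan_instances` and its parameters are hypothetical and do not describe how DataPancake schedules work internally.

```python
def plan_instances(total_rows: int, instance_row_count: int) -> list[tuple[int, int]]:
    """Return one (offset, row_count) pair per call.

    Assumption: every call scans at most `instance_row_count` rows, so the
    number of calls is the ceiling of total_rows / instance_row_count.
    """
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    if instance_row_count <= 0:
        raise ValueError("instance_row_count must be positive")

    # Integer ceiling division avoids floating-point rounding on large counts.
    call_count = (total_rows + instance_row_count - 1) // instance_row_count

    slices = []
    for i in range(call_count):
        offset = i * instance_row_count
        # The last slice may be shorter than the others.
        rows = min(instance_row_count, total_rows - offset)
        slices.append((offset, rows))
    return slices
```

Under this assumption, 2,500,000 rows with a slice size of 1,000,000 would need three calls, the last covering the remaining 500,000 rows. Smaller slices mean more calls, each with less work to finish inside the timeout.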
See Scan Processing for scanning details. See Warehouses for warehouse selection.