Common Patterns & Best Practices
Pre-configured patterns and optimization recommendations for common scan scenarios.
Common Configuration Patterns
Quick Onboarding Scan:
ATTRIBUTE_CREATE_TYPE: 'Schema' (if a schema sample is available) or 'Discover'
SCAN_RECORD_LIMIT: 10,000 (or a small sample)
PROCEDURE_INSTANCE_COUNT: 1
Number of Threads: Default (maximum)
CODE_GENERATE_ON_VERSION_CHANGE: FALSE (review first)
MONITOR_ENABLED: FALSE
Use case: Initial data source setup, testing, quick schema validation
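As a minimal sketch, the Quick Onboarding settings can be captured as a Python dictionary so they are easy to review before a first scan. The function name quick_onboarding_config is hypothetical. The dictionary only mirrors the setting names listed above and does not show how settings are actually submitted. The thread count is omitted because it stays at its default.

```python
def quick_onboarding_config(has_schema_sample: bool, sample_size: int = 10_000) -> dict:
    """Hypothetical helper: build the Quick Onboarding Scan settings as a dict."""
    return {
        # Prefer the schema sample when one exists; otherwise discover attributes.
        "ATTRIBUTE_CREATE_TYPE": "Schema" if has_schema_sample else "Discover",
        # A small record limit keeps the first scan short.
        "SCAN_RECORD_LIMIT": sample_size,
        # One procedure call is enough for a sample-sized scan.
        "PROCEDURE_INSTANCE_COUNT": 1,
        # Leave code generation off until the discovered attributes are reviewed.
        "CODE_GENERATE_ON_VERSION_CHANGE": False,
        # No scheduled monitoring while the source is still being evaluated.
        "MONITOR_ENABLED": False,
    }
```

Passing has_schema_sample=False yields 'Discover', which matches the fallback described in the pattern.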
Full Production Scan:
ATTRIBUTE_CREATE_TYPE: 'Discover'
SCAN_RECORD_LIMIT: 0 (unlimited)
PROCEDURE_INSTANCE_COUNT: 1 (or more if needed)
Number of Threads: Default (maximum)
CODE_GENERATE_ON_VERSION_CHANGE: TRUE
MONITOR_ENABLED: TRUE (daily or weekly, as needed)
Use case: Production data sources, complete attribute discovery, automated pipeline updates
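Moving a source from onboarding to production mostly means changing a few settings. The sketch below assumes settings are held in a plain dictionary and applies the Full Production values as overrides. The names promote_to_production and PRODUCTION_OVERRIDES are hypothetical. The monitoring cadence (daily or weekly) is configured elsewhere and is not modeled.

```python
# Values that differ between the Quick Onboarding and Full Production patterns.
PRODUCTION_OVERRIDES = {
    "ATTRIBUTE_CREATE_TYPE": "Discover",
    "SCAN_RECORD_LIMIT": 0,  # 0 = unlimited, so every record is scanned
    "CODE_GENERATE_ON_VERSION_CHANGE": True,
    "MONITOR_ENABLED": True,
}


def promote_to_production(config: dict) -> dict:
    """Return a new dict with production values applied; the input is not modified."""
    # In a dict merge the right-hand keys win, so the overrides replace onboarding values.
    return {**config, **PRODUCTION_OVERRIDES}


onboarding = {
    "ATTRIBUTE_CREATE_TYPE": "Schema",
    "SCAN_RECORD_LIMIT": 10_000,
    "PROCEDURE_INSTANCE_COUNT": 1,
    "CODE_GENERATE_ON_VERSION_CHANGE": False,
    "MONITOR_ENABLED": False,
}
production = promote_to_production(onboarding)
```

The function builds a new dictionary instead of mutating the onboarding one. This keeps the quick-scan configuration available, in line with keeping separate configurations per use case.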
Incremental Scan:
ATTRIBUTE_CREATE_TYPE: 'Discover'
SCAN_RECORD_LIMIT: 0 (unlimited)
SCAN_WHERE_CLAUSE: created_date > '<last_scanned_timestamp>'
SOURCE_STREAM_LAST_SCANNED_TIMESTAMP: Auto-updated after each scan
PROCEDURE_INSTANCE_COUNT: 1
Number of Threads: Default (maximum)
CODE_GENERATE_ON_VERSION_CHANGE: TRUE
MONITOR_ENABLED: TRUE (hourly or daily)
Use case: Regularly updated data sources, delta scanning workflows, schema monitoring
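The filter in SCAN_WHERE_CLAUSE compares a timestamp column against the last scan time. This page does not say whether the product fills in <last_scanned_timestamp> on its own. The sketch below assumes you build the literal yourself from the stored value. The helper name is hypothetical, created_date is the example column from above, and the format assumes a source that accepts 'YYYY-MM-DD HH:MM:SS' literals.

```python
from datetime import datetime
from typing import Optional


def incremental_where_clause(
    last_scanned: Optional[datetime], column: str = "created_date"
) -> Optional[str]:
    """Hypothetical helper: build a SCAN_WHERE_CLAUSE for records newer than the last scan.

    Returns None when no timestamp is stored, meaning no filter (a full scan).
    """
    if last_scanned is None:
        return None
    # Render the timestamp as a quoted literal the source can compare against.
    literal = last_scanned.strftime("%Y-%m-%d %H:%M:%S")
    return f"{column} > '{literal}'"
```

The function returns None when the stored timestamp is empty. This mirrors the practice of clearing the timestamp to force a full re-scan.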
Large Dataset Scan:
ATTRIBUTE_CREATE_TYPE: 'Discover'
SCAN_RECORD_LIMIT: 0 (unlimited)
PROCEDURE_INSTANCE_COUNT: 4 (or as needed)
PROCEDURE_INSTANCE_ROW_COUNT: 1,000,000 (or calculated)
SCAN_ORDER_BY: unique_id (or another appropriate attribute)
Number of Threads: Default (maximum)
CODE_GENERATE_ON_VERSION_CHANGE: TRUE
MONITOR_ENABLED: TRUE (as needed)
Use case: Data sources whose scans take longer than 60 minutes, very large datasets (millions of records or more), complex nested structures
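When PROCEDURE_INSTANCE_COUNT is greater than 1, each call presumably handles a slice of the rows ordered by SCAN_ORDER_BY. If the slices are consecutive and equal-sized, PROCEDURE_INSTANCE_ROW_COUNT can be calculated as the ceiling of total rows divided by the call count, so no rows are left over. This partitioning model is an assumption, not a description of the product's internals, and the function names are hypothetical.

```python
def rows_per_instance(total_rows: int, instance_count: int) -> int:
    """Smallest per-call row count that lets instance_count calls cover every row."""
    if total_rows < 0 or instance_count < 1:
        raise ValueError("need total_rows >= 0 and instance_count >= 1")
    # Integer ceiling division; avoids float rounding on very large row counts.
    return -(-total_rows // instance_count)


def instance_ranges(total_rows: int, instance_count: int) -> list[tuple[int, int]]:
    """Half-open [start, end) row offsets for each call, assuming SCAN_ORDER_BY order."""
    size = rows_per_instance(total_rows, instance_count)
    ranges = []
    for i in range(instance_count):
        # Clamp both ends so trailing calls get an empty range instead of overrunning.
        start = min(i * size, total_rows)
        end = min(start + size, total_rows)
        ranges.append((start, end))
    return ranges
```

For 4,000,000 rows across 4 calls this gives the 1,000,000 shown in the pattern. One extra row raises the count to 1,000,001 rather than leaving that row unscanned.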
Memory-Constrained Scan:
ATTRIBUTE_CREATE_TYPE: 'Discover'
SCAN_RECORD_LIMIT: 0 (unlimited) or limited
PROCEDURE_INSTANCE_COUNT: 1 or more
Number of Threads: Reduced (e.g., 2-4 instead of the maximum)
THREAD_PROCESS_RECORD_COUNT: Reduced
CODE_GENERATE_ON_VERSION_CHANGE: TRUE
MONITOR_ENABLED: FALSE or TRUE
Use case: Memory errors during scanning, very complex nested structures, large arrays and polymorphic variations, smaller warehouse sizes
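One way to apply the reduced-thread guidance is to retry with progressively fewer threads after a memory error. This sketch rests on several assumptions. run_scan stands in for however a scan is actually launched. ScanMemoryError is a hypothetical placeholder for the error the scan reports. Halving down to 2 threads is one reasonable schedule, not a product rule.

```python
class ScanMemoryError(Exception):
    """Hypothetical placeholder for the memory error a scan reports."""


def reduced_thread_counts(max_threads: int, floor: int = 2):
    """Yield thread counts to try: the maximum first, then halve down to floor."""
    threads = max_threads
    while True:
        yield threads
        if threads <= floor:
            return
        threads = max(floor, threads // 2)


def scan_with_backoff(run_scan, max_threads: int, floor: int = 2):
    """Call run_scan(threads), retrying with fewer threads after a memory error."""
    last_error = None
    for threads in reduced_thread_counts(max_threads, floor):
        try:
            return run_scan(threads)
        except ScanMemoryError as err:
            last_error = err  # remember the failure and try a smaller count
    # Every thread count failed; surface the last error.
    raise last_error
```

If the smallest thread count still fails, the other lever this pattern lists is reducing THREAD_PROCESS_RECORD_COUNT.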
Best Practices
Configuration management:
Use descriptive, consistent names (e.g., customer_events_daily_full)
Create separate configurations for different use cases (quick scan, full scan, incremental scan)
Keep only active configurations visible; use Inactive status for configurations you may reuse
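A naming convention is easier to keep consistent when it is checked automatically. The sketch below assumes one possible convention: lowercase snake_case ending in the scan type (quick, full, or incremental), which the example customer_events_daily_full follows. The pattern and helper are illustrative, not a product requirement.

```python
import re

# Assumed convention: lowercase words joined by single underscores (at least two words).
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)+")
# Assumed suffixes matching separate quick / full / incremental configurations.
SCAN_TYPES = ("quick", "full", "incremental")


def is_valid_config_name(name: str) -> bool:
    """True if name is snake_case and its last word is a known scan type."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    return name.rsplit("_", 1)[-1] in SCAN_TYPES
```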
Performance optimization:
Use medium warehouses for most use cases; reserve larger warehouses for complex data sources
Use default (maximum) threads unless memory constrained
Start with a single procedure call (PROCEDURE_INSTANCE_COUNT = 1); use multiple calls only if a scan exceeds 60 minutes
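The 60-minute guideline can give a rough starting value for PROCEDURE_INSTANCE_COUNT. This sketch makes two assumptions: you have an estimated single-call duration (for example, from a previous scan), and the work splits roughly evenly across calls. The helper name is hypothetical.

```python
import math

SCAN_TIME_LIMIT_MINUTES = 60  # threshold from the guidance above


def suggested_instance_count(estimated_minutes: float,
                             limit_minutes: float = SCAN_TIME_LIMIT_MINUTES) -> int:
    """Keep a single call unless one call would run past the limit."""
    if estimated_minutes <= limit_minutes:
        return 1
    # Enough calls that each call's share of the estimate fits under the limit.
    return math.ceil(estimated_minutes / limit_minutes)
```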
Scheduling strategy:
Schedule during low-usage periods
Avoid overlapping scans on the same warehouse
Use SCAN_WHERE_CLAUSE with timestamp filters for incremental scanning
Leverage SOURCE_STREAM_LAST_SCANNED_TIMESTAMP; clear the timestamp for full re-scans
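Before adding a schedule, it can help to check that no two scans use the same warehouse at the same time. Here is a minimal sketch. It assumes each scheduled scan is described by a name, warehouse, start time, and expected duration. This is a hypothetical structure, not the product's scheduling model.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class ScheduledScan(NamedTuple):
    name: str
    warehouse: str
    start: datetime
    duration: timedelta  # expected run time, e.g. from recent scans


def overlapping_scans(scans: list[ScheduledScan]) -> list[tuple[str, str]]:
    """Return name pairs of scans that share a warehouse and overlap in time."""
    clashes = []
    for i, a in enumerate(scans):
        for b in scans[i + 1:]:
            if a.warehouse != b.warehouse:
                continue
            # Two half-open windows overlap when each starts before the other ends.
            if a.start < b.start + b.duration and b.start < a.start + a.duration:
                clashes.append((a.name, b.name))
    return clashes
```

The windows are half-open, so a scan that starts exactly when another ends on the same warehouse is not reported as a clash.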
Error prevention:
Monitor scan duration; use multiple procedure calls if a scan exceeds 60 minutes
Start with default settings; reduce the thread count or THREAD_PROCESS_RECORD_COUNT if memory errors occur
Validate SCAN_WHERE_CLAUSE and test SCAN_ORDER_BY attributes before use
Ensure PROCEDURE_INSTANCE_ROW_COUNT is accurate when using multiple procedure calls
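The row-count and ordering checks above can be automated before a multi-call scan. This sketch assumes each call processes up to PROCEDURE_INSTANCE_ROW_COUNT consecutive rows in SCAN_ORDER_BY order. Under that assumption, a hypothetical pre-flight check can confirm two things: the calls cover every row, and the ordering attribute is unique in a sample, since duplicate values could make slice boundaries ambiguous. The function name and messages are illustrative.

```python
def check_multi_call_settings(total_rows: int, instance_count: int,
                              row_count: int, order_by_sample: list) -> list[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    # Too small: the calls together would stop before the last row.
    if instance_count * row_count < total_rows:
        problems.append("PROCEDURE_INSTANCE_ROW_COUNT too small: some rows would not be scanned")
    # Too large: earlier calls already cover everything, so the last call is idle.
    if instance_count > 1 and (instance_count - 1) * row_count >= total_rows:
        problems.append("PROCEDURE_INSTANCE_ROW_COUNT too large: the last call would get no rows")
    # Duplicate ordering values can make slice boundaries ambiguous between calls.
    if len(set(order_by_sample)) != len(order_by_sample):
        problems.append("SCAN_ORDER_BY values repeat in the sample; choose a unique attribute")
    return problems
```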