Vertical Scale Settings
Control thread count, procedure calls, and memory management for optimal scan performance.
Vertical scale settings control how DataPancake utilizes compute resources within a single warehouse. These settings are only available for semi-structured data sources.
Number of Threads
Number of parallel threads used during scanning. Defaults to maximum threads available for chosen warehouse size.
Default behavior:
Automatically set to warehouse maximum
Small: 1-2 threads, Medium: 4-8 threads, Large: 8-16 threads, 2X-Large: 16+ threads
Manual override:
Can be reduced for memory-constrained scenarios
Lower thread count = less memory usage, slower processing
Higher thread count = more memory usage, faster processing (up to warehouse max)
Note: Use default (maximum) for most scenarios. Reduce only if experiencing memory errors.
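If you do override the default, the tradeoff can be sketched numerically. The helper below and every number in it are illustrative assumptions, not DataPancake internals: it simply caps the thread count at whatever fits a given memory budget, since memory use scales linearly with threads.

```python
def suggest_thread_count(warehouse_max_threads: int,
                         memory_budget_bytes: int,
                         records_per_thread: int,
                         avg_record_bytes: int) -> int:
    """Pick the largest thread count that fits the memory budget.

    Each thread holds roughly records_per_thread records in memory,
    so total usage grows linearly with the thread count.
    """
    per_thread_bytes = records_per_thread * avg_record_bytes
    threads_that_fit = memory_budget_bytes // per_thread_bytes
    return max(1, min(warehouse_max_threads, threads_that_fit))

# Illustrative: Medium warehouse (8 threads), 1 GiB budget,
# 1,000 records per thread, ~50 KiB per nested JSON record.
print(suggest_thread_count(8, 1 << 30, 1000, 50 * 1024))  # → 8

# Tight budget (64 MiB): falls back to a single thread.
print(suggest_thread_count(8, 64 * 1024 * 1024, 1000, 50 * 1024))  # → 1
```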
Number of Procedure Calls (PROCEDURE_INSTANCE_COUNT)
Total number of procedure calls required to process the entire dataset. Use multiple calls when a single call cannot complete within 60 minutes.
When to use multiple calls:
Data source takes longer than 60 minutes to scan
Very large datasets (millions of records)
Complex nested structures requiring more processing time
Configuration:
Set to 1 for a single procedure call (default)
Set to 2 or more to split the scan across multiple calls
Each call must complete within the 60-minute timeout
Calls execute sequentially via Snowflake Tasks
Important constraints:
WHERE clause not available - Cannot use SCAN_WHERE_CLAUSE when PROCEDURE_INSTANCE_COUNT > 1
ORDER BY required - Must specify SCAN_ORDER_BY when PROCEDURE_INSTANCE_COUNT > 1
Record count required - Must set PROCEDURE_INSTANCE_ROW_COUNT when PROCEDURE_INSTANCE_COUNT > 1
Note: Start with single call (1) and monitor scan duration. Use multiple calls only if scans exceed 60 minutes.
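The "monitor scan duration, then split" advice can be turned into simple arithmetic. This is a hypothetical sizing helper, not a DataPancake function; the 60-minute limit comes from the timeout above, while the 0.8 headroom factor is an illustrative safety margin for skewed partitions and data growth.

```python
import math

TIMEOUT_MINUTES = 60  # per-call limit noted above


def procedure_instance_count(estimated_scan_minutes: float,
                             headroom: float = 0.8) -> int:
    """Split the scan so each call stays safely under the timeout.

    headroom=0.8 targets ~48 minutes per call, leaving slack so
    no individual call runs up against the 60-minute limit.
    """
    per_call_budget = TIMEOUT_MINUTES * headroom
    return max(1, math.ceil(estimated_scan_minutes / per_call_budget))

print(procedure_instance_count(45))   # → 1 (fits in a single call)
print(procedure_instance_count(150))  # → 4 (split across calls)
```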
Record Count Per Procedure Call (PROCEDURE_INSTANCE_ROW_COUNT)
Number of records processed per procedure call. Required when PROCEDURE_INSTANCE_COUNT > 1.
Calculation:
Total records ÷ Number of procedure calls = Records per call
Must be large enough to process all rows
Cannot be so large that some calls have zero rows
Example:
Dataset: 2,000,000 records, 2 procedure calls
Valid: 1,000,000 records per call (2,000,000 ÷ 2 = 1,000,000)
Invalid: 500,000 records per call (would only process 1,000,000 total)
Invalid: 2,500,000 records per call (exceeds total, would leave empty calls)
Note: Calculate Total records ÷ Number of calls, then add a small buffer (5-10%) to account for data growth.
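The calculation and both validity constraints above can be sketched as code. These are illustrative helpers (not part of DataPancake) that reproduce the 2,000,000-record worked example:

```python
import math


def rows_per_call(total_records: int, calls: int, buffer: float = 0.05) -> int:
    """PROCEDURE_INSTANCE_ROW_COUNT: total ÷ calls, plus a growth buffer."""
    base = math.ceil(total_records / calls)
    return math.ceil(base * (1 + buffer))


def is_valid(total_records: int, calls: int, rpc: int) -> bool:
    """Large enough to cover all rows; small enough that no call is empty."""
    covers_all_rows = rpc * calls >= total_records
    no_empty_calls = rpc * (calls - 1) < total_records
    return covers_all_rows and no_empty_calls

# Worked example: 2,000,000 records across 2 calls.
print(rows_per_call(2_000_000, 2))              # → 1,050,000 (5% buffer)
print(is_valid(2_000_000, 2, 1_000_000))        # → True
print(is_valid(2_000_000, 2, 500_000))          # → False (misses rows)
print(is_valid(2_000_000, 2, 2_500_000))        # → False (empty calls)
```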
Order By (SCAN_ORDER_BY)
One or more attributes used to order the data source.
When required:
PROCEDURE_INSTANCE_COUNT > 1 - Ensures consistent record partitioning across calls and prevents duplicate or missed records
When optional:
PROCEDURE_INSTANCE_COUNT = 1 and SCAN_RECORD_LIMIT > 0 - Can improve performance for limited scans
When not required:
PROCEDURE_INSTANCE_COUNT = 1 and SCAN_RECORD_LIMIT = 0 - Full table scan in a single call
Format examples:
Single attribute: json_data:"_id" or customer_id
Multiple attributes: customer_id, json_data:"_id"
Note: Use unique or nearly-unique attributes. Prefer indexed columns for performance.
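Why the ORDER BY matters for multi-call scans can be shown with a small simulation (pure Python, no DataPancake API): when the data is sorted on a deterministic key, fixed row ranges assign every record to exactly one call, with no duplicates or gaps.

```python
def partition(records, order_key, calls, rows_per_call):
    """Simulate splitting an ordered scan into fixed per-call row ranges.

    With a deterministic sort key, each record lands in exactly one
    call's range, so nothing is scanned twice or skipped.
    """
    ordered = sorted(records, key=order_key)
    return [ordered[i * rows_per_call:(i + 1) * rows_per_call]
            for i in range(calls)]

records = [{"customer_id": i} for i in range(10)]
chunks = partition(records, lambda r: r["customer_id"],
                   calls=2, rows_per_call=5)
seen = [r["customer_id"] for chunk in chunks for r in chunk]
print(sorted(seen) == list(range(10)))  # → True: every record exactly once
```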
Record Count Per Thread Worker Process (THREAD_PROCESS_RECORD_COUNT)
Number of records passed to each thread worker process during a scan procedure call. Controls how much data is loaded into memory at one time.
Memory calculation:
Roughly Number of Threads × THREAD_PROCESS_RECORD_COUNT records are held in memory at any one time.
Default behavior:
Automatically set based on warehouse size and data complexity
Typically defaults to 1,000 records per thread
Can be adjusted if memory errors occur
Optimization:
Reduce if experiencing memory errors
Increase if memory allows and you want faster processing
Balance between memory usage and processing efficiency
Note: Start with default value. Reduce if memory errors occur during scanning.
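When reducing this value to resolve memory errors, the same rough memory model applies in reverse: divide your budget across the threads. This is an illustrative sketch with assumed numbers, not a DataPancake formula:

```python
def thread_process_record_count(memory_budget_bytes: int,
                                threads: int,
                                avg_record_bytes: int) -> int:
    """Records per thread worker that fit within the memory budget.

    Roughly threads × count × average record size bytes are resident
    at once, so the budget is divided evenly across the threads.
    """
    return max(1, memory_budget_bytes // (threads * avg_record_bytes))

# Illustrative: 2 GiB budget, 8 threads, ~50 KiB per record.
print(thread_process_record_count(2 << 30, 8, 50 * 1024))  # → 5242
```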