Vertical Scale Settings

Control thread count, procedure calls, and memory management for optimal scan performance.

Vertical scale settings control how DataPancake uses compute resources within a single warehouse. These settings are available only for semi-structured data sources.


Number of Threads

Number of parallel threads used during scanning. Defaults to the maximum number of threads available for the chosen warehouse size.

Default behavior:

  • Automatically set to warehouse maximum

  • Small: 1-2 threads, Medium: 4-8 threads, Large: 8-16 threads, 2X-Large: 16+ threads

Manual override:

  • Can be reduced for memory-constrained scenarios

  • Lower thread count = less memory usage, slower processing

  • Higher thread count = more memory usage, faster processing (up to warehouse max)

Note: Use default (maximum) for most scenarios. Reduce only if experiencing memory errors.


Number of Procedure Calls (PROCEDURE_INSTANCE_COUNT)

Total number of procedure calls required to process the entire dataset. Use multiple calls when a single call cannot complete within the 60-minute timeout.

When to use multiple calls:

  • Data source takes longer than 60 minutes to scan

  • Very large datasets (millions of records)

  • Complex nested structures requiring more processing time

Configuration:

  • Set to 1 for single procedure call (default)

  • Set to 2 or more to split scan across multiple calls

  • Each call must complete within the 60-minute timeout

  • Calls execute sequentially via Snowflake Tasks

Important constraints:

  • WHERE clause not available - Cannot use SCAN_WHERE_CLAUSE when PROCEDURE_INSTANCE_COUNT > 1

  • ORDER BY required - Must specify SCAN_ORDER_BY when PROCEDURE_INSTANCE_COUNT > 1

  • Record count required - Must set PROCEDURE_INSTANCE_ROW_COUNT when PROCEDURE_INSTANCE_COUNT > 1

Note: Start with single call (1) and monitor scan duration. Use multiple calls only if scans exceed 60 minutes.
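The constraints above can be sketched as a small validation step. This is an illustrative helper, not part of DataPancake; the parameter names simply mirror the settings described on this page.

```python
# Illustrative sketch (not a DataPancake API): check the documented
# rules that apply when PROCEDURE_INSTANCE_COUNT > 1.

def validate_scale_settings(procedure_instance_count,
                            procedure_instance_row_count=None,
                            scan_order_by=None,
                            scan_where_clause=None):
    """Raise ValueError if the settings violate the documented constraints."""
    if procedure_instance_count < 1:
        raise ValueError("PROCEDURE_INSTANCE_COUNT must be at least 1")
    if procedure_instance_count > 1:
        # WHERE clause not available with multiple calls
        if scan_where_clause:
            raise ValueError("SCAN_WHERE_CLAUSE cannot be used when "
                             "PROCEDURE_INSTANCE_COUNT > 1")
        # ORDER BY required for consistent partitioning across calls
        if not scan_order_by:
            raise ValueError("SCAN_ORDER_BY is required when "
                             "PROCEDURE_INSTANCE_COUNT > 1")
        # Record count per call required to split the dataset
        if not procedure_instance_row_count:
            raise ValueError("PROCEDURE_INSTANCE_ROW_COUNT is required when "
                             "PROCEDURE_INSTANCE_COUNT > 1")
    return True
```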


Record Count Per Procedure Call (PROCEDURE_INSTANCE_ROW_COUNT)

Number of records processed per procedure call. Required when PROCEDURE_INSTANCE_COUNT > 1.

Calculation:

  • Total records ÷ Number of procedure calls = Records per call

  • Must be large enough to process all rows

  • Cannot be so large that some calls have zero rows

Example:

  • Dataset: 2,000,000 records, 2 procedure calls

  • Valid: 1,000,000 records per call (2,000,000 ÷ 2 = 1,000,000)

  • Invalid: 500,000 records per call (would only process 1,000,000 total)

  • Invalid: 2,500,000 records per call (exceeds total, would leave empty calls)

Note: Divide total records by the number of calls, then add a small buffer (5-10%) to account for data growth.
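The calculation above can be written out as a short sketch. This is not a DataPancake function, just the arithmetic from this section with the suggested growth buffer applied.

```python
import math

# Illustrative sketch: derive PROCEDURE_INSTANCE_ROW_COUNT from the
# total record count and the number of procedure calls, adding the
# 5-10% buffer suggested above for data growth.

def records_per_call(total_records, procedure_calls, buffer=0.05):
    # Round up so every row is covered across the calls
    base = math.ceil(total_records / procedure_calls)
    # Add headroom so the setting stays valid as the dataset grows
    return math.ceil(base * (1 + buffer))

# 2,000,000 records split across 2 calls with a 5% buffer:
print(records_per_call(2_000_000, 2))  # 1050000
```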


Order By (SCAN_ORDER_BY)

One or more attributes used to order the data source.

When required:

  • PROCEDURE_INSTANCE_COUNT > 1 - Ensures consistent record partitioning across calls and prevents duplicate or missed records

When optional:

  • PROCEDURE_INSTANCE_COUNT = 1 and SCAN_RECORD_LIMIT > 0 - Can improve performance for limited scans

When not required:

  • PROCEDURE_INSTANCE_COUNT = 1 and SCAN_RECORD_LIMIT = 0 - Full table scan in single call

Format examples:

  • Single attribute: json_data:"_id" or customer_id

  • Multiple attributes: customer_id, json_data:"_id"

Note: Use unique or nearly-unique attributes. Prefer indexed columns for performance.


Record Count Per Thread Worker Process (THREAD_PROCESS_RECORD_COUNT)

Number of records passed to each thread worker process during a scan procedure call. Controls how much data is loaded into memory at one time.

Memory calculation:

  • Records held in memory at one time ≈ Number of Threads × THREAD_PROCESS_RECORD_COUNT

Default behavior:

  • Automatically set based on warehouse size and data complexity

  • Typically defaults to 1,000 records per thread

  • Can be adjusted if memory errors occur

Optimization:

  • Reduce if experiencing memory errors

  • Increase if memory allows and you want faster processing

  • Balance between memory usage and processing efficiency

Note: Start with default value. Reduce if memory errors occur during scanning.
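Since each thread holds THREAD_PROCESS_RECORD_COUNT records at a time, the memory footprint can be roughed out as below. This is an illustrative estimate, not a DataPancake API, and avg_record_bytes is a hypothetical input you would estimate from your own data.

```python
# Illustrative sketch: rough memory estimate for a scan, based on
# threads x records-per-thread x average record size.
# avg_record_bytes is a hypothetical value estimated from your data.

def estimated_scan_memory_mb(threads, thread_process_record_count,
                             avg_record_bytes):
    # Records resident in memory at one time across all threads
    records_in_memory = threads * thread_process_record_count
    return records_in_memory * avg_record_bytes / (1024 * 1024)

# 8 threads x 1,000 records/thread x 4 KB records ≈ 31 MB resident:
print(round(estimated_scan_memory_mb(8, 1_000, 4_096)))  # 31
```

Halving either the thread count or the per-thread record count halves this estimate, which is why the sections above suggest reducing them when memory errors occur.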
