Data Source & Warehouse Settings

Configure record limits, WHERE clauses, and warehouse selection for scan operations.

Data Source Settings

Record Limit (SCAN_RECORD_LIMIT)

  • Limits the number of records scanned. Set to 0 for no limit (full scan).

  • Use cases: quick scans (e.g., 10,000 for rapid discovery), full scans (0), testing (small limits), and sampling

  • Monitor scan duration to ensure each scan stays under 60 minutes
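A minimal sketch of how the record limit can be read, assuming the scan behaves like a SQL query with an optional LIMIT. The helper names (`limit_clause`, `scan_query`) and the table name `RAW.ORDERS` are hypothetical. This is not DataPancake's implementation; it only models the documented rule that 0 means unlimited.

```python
from typing import Optional


def limit_clause(scan_record_limit: int) -> Optional[str]:
    """Map a SCAN_RECORD_LIMIT value to an optional SQL LIMIT clause (sketch)."""
    # Assumption: negative values are treated as invalid.
    if scan_record_limit < 0:
        raise ValueError("SCAN_RECORD_LIMIT must be 0 (unlimited) or positive")
    if scan_record_limit == 0:
        return None  # 0 means unlimited: full scan, no LIMIT
    return f"LIMIT {scan_record_limit}"


def scan_query(table: str, scan_record_limit: int) -> str:
    """Build an illustrative scan query for a hypothetical source table."""
    clause = limit_clause(scan_record_limit)
    base = f"SELECT * FROM {table}"
    return base if clause is None else f"{base} {clause}"


print(scan_query("RAW.ORDERS", 10_000))  # quick scan: SELECT * FROM RAW.ORDERS LIMIT 10000
print(scan_query("RAW.ORDERS", 0))       # full scan: SELECT * FROM RAW.ORDERS
```

Note that a plain LIMIT returns whichever rows the engine reads first, so it suits quick discovery better than statistically random sampling.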


Where Clause (SCAN_WHERE_CLAUSE)

  • SQL WHERE clause that filters which records are scanned

  • Constraint: available only when PROCEDURE_INSTANCE_COUNT = 1 (cannot be used with multiple procedure calls)

  • Use cases: incremental scanning (timestamp filters), partitioned scanning, and general data filtering

Examples:

WHERE created_date >= CURRENT_DATE - 7
WHERE partition_key = '2024-01'
WHERE status = 'active'

Note: Combine with SOURCE_STREAM_LAST_SCANNED_TIMESTAMP for delta scans. Test WHERE clauses before enabling scheduled scans.
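The single-instance constraint and the delta-scan note can be combined in one small sketch. It assumes the last-scanned timestamp is available as a naive Python `datetime` in the same time zone as the filtered column. How DataPancake actually exposes SOURCE_STREAM_LAST_SCANNED_TIMESTAMP is not described here, and the function name and the `updated_at` column are hypothetical.

```python
from datetime import datetime
from typing import Optional


def delta_where_clause(
    last_scanned: Optional[datetime],
    procedure_instance_count: int,
    column: str = "updated_at",  # hypothetical timestamp column
) -> Optional[str]:
    """Build a WHERE clause selecting rows changed since the last scan (sketch)."""
    if last_scanned is None:
        return None  # no previous scan recorded: fall back to a full scan
    # Documented constraint: a WHERE clause requires PROCEDURE_INSTANCE_COUNT = 1.
    if procedure_instance_count != 1:
        raise ValueError("SCAN_WHERE_CLAUSE requires PROCEDURE_INSTANCE_COUNT = 1")
    # ISO-8601 text such as '2024-01-15 08:30:00', compared against the timestamp column.
    ts = last_scanned.isoformat(sep=" ")
    return f"WHERE {column} > '{ts}'"


# Usage (hypothetical values):
print(delta_where_clause(datetime(2024, 1, 15, 8, 30), procedure_instance_count=1))
# WHERE updated_at > '2024-01-15 08:30:00'
```

Using `>` avoids re-scanning rows stamped exactly at the previous cutoff; switch to `>=` if rows sharing that timestamp may arrive after the scan runs.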


Warehouse Settings

Virtual Warehouse

  • Snowflake virtual warehouse assigned to this scan configuration

  • Each configuration must be assigned to exactly one warehouse

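  • Only warehouses registered in DataPancake are available for selection

  • The warehouse must be connected and active

  • Each warehouse can run only one scan at a time

  • Warehouse size affects scan performance: larger warehouses provide more compute but consume more Snowflake credits per hour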
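One way to picture the one-scan-per-warehouse rule is a small lock table keyed by warehouse name. This is a hypothetical sketch: the class, its methods, and the warehouse and configuration names are illustrative, not a DataPancake API, and it simply refuses a second scan on a busy warehouse rather than queuing it.

```python
import threading
from typing import Dict


class WarehouseScanTracker:
    """Hypothetical tracker enforcing at most one running scan per warehouse."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running: Dict[str, str] = {}  # warehouse name -> scan configuration name

    def try_start(self, warehouse: str, config: str) -> bool:
        """Claim the warehouse for a scan; return False if it is already busy."""
        with self._lock:
            if warehouse in self._running:
                return False
            self._running[warehouse] = config
            return True

    def finish(self, warehouse: str) -> None:
        """Release the warehouse when its scan completes."""
        with self._lock:
            self._running.pop(warehouse, None)


tracker = WarehouseScanTracker()
print(tracker.try_start("PANCAKE_WH", "orders_scan"))     # True
print(tracker.try_start("PANCAKE_WH", "customers_scan"))  # False: warehouse busy
```

Configurations that share a warehouse therefore run one after another, so spreading large scans across separate warehouses is what allows them to run in parallel.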

Warehouse status indicators:

  • Connected - Available and ready

  • Not Connected - Connection issue (cannot save configuration)

  • Inactive - Disabled (warning shown)

  • Deleted - Marked for deletion (cannot save configuration)
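The status rules above map onto a simple save check. This sketch is hypothetical: the enum, the function name, and the warning text are illustrative. Only Not Connected and Deleted are listed as blocking a save, so Inactive is modeled as a warning that still allows the save.

```python
from enum import Enum
from typing import Optional


class WarehouseStatus(Enum):
    CONNECTED = "Connected"
    NOT_CONNECTED = "Not Connected"
    INACTIVE = "Inactive"
    DELETED = "Deleted"


def check_save(status: WarehouseStatus) -> Optional[str]:
    """Raise if saving is blocked; otherwise return a warning message or None."""
    if status in (WarehouseStatus.NOT_CONNECTED, WarehouseStatus.DELETED):
        raise ValueError(f"Cannot save configuration: warehouse is {status.value}")
    if status is WarehouseStatus.INACTIVE:
        return "Warning: warehouse is inactive"  # illustrative text; save proceeds
    return None  # Connected: available and ready


print(check_save(WarehouseStatus.INACTIVE))  # Warning: warehouse is inactive
```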
