How to Manage a Scan Configuration

Create or modify a scan configuration for a data source.

Configure Required Scan Configuration Settings

1. Navigate to the Scan Configurations Page

2. Select a Data Source

3. Enter the Configuration Name

4. Select the Virtual Warehouse

5. Ensure the Warehouse is Connected

6. (Optional) Configure Additional Settings

Base Configuration Settings

Scan Schedule Settings

Data Source Settings

Vertical Scale Settings

Source Stream Settings

7. Save the Scan Configuration


(Optional) Base Configuration Settings

Set the Attribute Create Type

The default Discover create type uses scanned data to create attribute metadata.

The Schema create type uses the Data Source Object Schema Sample data to create the attribute metadata.

Update the Scan Configuration Status

Enable Auto Code Generate

When enabled, DataPancake automatically generates dynamic table SQL code whenever the data source's schema or polymorphic state changes.


(Optional) Scan Schedule Settings

1. Enable a Scan Schedule

2. Enter a Cron Schedule

A cron schedule is required if you enable scheduling. See the examples below for valid cron schedules:

Hourly at 30 minutes after the hour

Daily at 3:00 am

Weekly at 3:00 am every Monday

Monthly at 3:00 am on the 1st of each month

3. Select a Cron Time Zone

A cron time zone is required in addition to the cron schedule if you enable scheduling.
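
The four example schedules above can be written as cron expressions. The sketch below assumes the standard five-field cron layout (minute, hour, day of month, month, day of week); confirm the exact syntax DataPancake accepts before relying on it. The helper is_simple_cron is a hypothetical illustration, not part of DataPancake, and it only checks plain numbers and the * wildcard.

```python
# Standard five-field cron expressions for the schedules described above.
# Assumed layout: minute hour day-of-month month day-of-week.
CRON_EXAMPLES = {
    "Hourly at 30 minutes after the hour": "30 * * * *",
    "Daily at 3:00 am": "0 3 * * *",
    "Weekly at 3:00 am every Monday": "0 3 * * 1",
    "Monthly at 3:00 am on the 1st of each month": "0 3 1 * *",
}

# Allowed numeric range for each field, in order (day of week: 0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def is_simple_cron(expression: str) -> bool:
    """Return True if the expression has five fields, each '*' or an in-range integer.

    Hypothetical helper: it deliberately rejects lists, ranges, steps, and names
    (for example '1-5', '*/15', 'MON'), which real cron parsers usually accept.
    """
    fields = expression.split()
    if len(fields) != 5:
        return False
    for field, (low, high) in zip(fields, FIELD_RANGES):
        if field == "*":
            continue
        if not (field.isascii() and field.isdigit()):
            return False
        if not low <= int(field) <= high:
            return False
    return True


for label, expression in CRON_EXAMPLES.items():
    print(f"{expression:<10} {label}")
```

The selected cron time zone determines what "3:00 am" means, so the same expression runs at different absolute times in different time zones.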


(Optional) Data Source Settings

Enter a Record Limit

Enter a Where Clause

The where clause can only be used with a single procedure call.


(Optional) Vertical Scale Settings

Modify the Number of Threads

The number of threads defaults to the maximum number of threads available to the selected virtual warehouse.

Modify the Number of Procedure Calls

This is the number of scan procedure calls required to process the entire dataset.

Use multiple procedure calls when a single call cannot complete in under sixty minutes, which is the default timeout for the Snowpark Python Sandbox.

The where clause parameter is not available if the number of procedure calls is greater than 1.

Modify the Record Count Per Procedure Call

The record count per procedure call is required when the number of procedure calls is greater than 1.

The record count must be large enough that the procedure calls together process every row in the data source, but not so large that some calls are left with no rows to process, given the criteria you have provided and the number of calls entered.

Example: 2,000,000 rows can be divided into 2 calls with 1,000,000 records per call. They cannot be divided into 2 calls with 500,000 records per call (1,000,000 rows would go unprocessed) or into 3 calls with 2,500,000 records per call (the first call would process every row, leaving the other calls with nothing to process).
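
The sizing rule above can be checked with a little arithmetic before saving the configuration. The following is a minimal sketch that assumes each procedure call processes one consecutive, non-overlapping batch of rows; the function names covers_all_rows and validate_settings are hypothetical and are not part of DataPancake.

```python
from typing import List, Optional


def covers_all_rows(total_rows: int, calls: int, records_per_call: int) -> bool:
    """Return True if the calls together cover every row and no call is empty.

    Assumption: call k handles the k-th consecutive batch of records_per_call rows.
    """
    if total_rows < 1 or calls < 1 or records_per_call < 1:
        return False
    # All calls together must reach at least the last row.
    covers_everything = calls * records_per_call >= total_rows
    # The last call starts after (calls - 1) full batches and needs at least one row.
    last_call_has_rows = (calls - 1) * records_per_call < total_rows
    return covers_everything and last_call_has_rows


def validate_settings(
    calls: int,
    records_per_call: Optional[int] = None,
    where_clause: Optional[str] = None,
) -> List[str]:
    """Return the rule violations described in this section (empty list if none)."""
    problems = []
    if calls > 1 and where_clause:
        problems.append("A where clause cannot be used with more than 1 procedure call.")
    if calls > 1 and records_per_call is None:
        problems.append("A record count per procedure call is required for more than 1 call.")
    return problems


# The three cases from the example above.
print(covers_all_rows(2_000_000, 2, 1_000_000))
print(covers_all_rows(2_000_000, 2, 500_000))
print(covers_all_rows(2_000_000, 3, 2_500_000))
```

Under the stated assumption, only the first of the three cases passes the check; the second leaves rows unprocessed and the third leaves calls with no rows.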


(Optional) Source Stream Settings

Modify the Last Scanned Timestamp

This is the last timestamp scanned. The value is used as part of the where clause when scanning data from this data source. To scan the entire data source, remove the timestamp if one exists.
