How to Create Multiple DataPancake Data Sources (Script Builder)

Build a script to create and initiate scans for multiple data sources in DataPancake.

This script builder is not available for the following scenarios:

  • Data sources with a pre-configured schema

  • Data sources that contain streaming data, such as Kafka

See "How to Create a DataPancake Data Source (UI)" if needed.

1. Navigate to the "Data Source" script builder

2. Click the "Multiple Data Sources" tab

3. (Optional) Change the app database name

The default is DATAPANCAKE

4. Select the warehouse to be used by DataPancake to scan data

5. Select the Semi-structured Data Format Type

6. Enter a record limit for the initial scan

If you want to scan all records, set the limit to 0

7. Enter the role used to install DataPancake

8. Enter the role that has the SELECT privilege on the database object to be scanned

9. Enter the Snowflake database name that contains the data

If the name is case sensitive, enclose it in double quotes, for example "EXAMPLE_NAME"

10. Enter the name of the schema which contains the database objects to be scanned

If the name is case sensitive, enclose it in double quotes, for example "EXAMPLE_NAME"

11. Click the "Generate Script - Step 1" button

12. Copy the script

13. Open a new Snowflake worksheet and paste the script

14. Review the script and modify it as needed

Filtering

Add additional filters to the where clause to limit the number of rows returned.

You can also control which tables are included in step 2 by selecting the code blocks in the individual rows for which you want to create data sources.

Examples of additional columns that can be added to the where clause include columns.table_name and columns.column_name.
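As a sketch, filters on those columns can be appended to the generated where clause like this (the `columns` alias, the table name `ORDERS`, and the column name `PAYLOAD` are illustrative assumptions, not the exact generated script):

```sql
-- Hypothetical fragment of the generated step 1 select statement.
-- The generated where clause conditions are elided; only the
-- appended filters are shown.
-- ... generated where clause conditions ...
  AND columns.table_name = 'ORDERS'      -- limit the result to one table
  AND columns.column_name = 'PAYLOAD'    -- limit it to one VARIANT column
```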


Format Type

If the source tables to be scanned store a mix of semi-structured data format types (such as JSON and XML), modify the where clause to return only the tables whose data matches the format type you specified.

If the where clause is not used to further limit the result, all VARIANT columns will be returned. In that case, select only the code blocks for tables that match the specified data format type.
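For example, if your JSON and XML tables follow a naming convention, a filter like the one below keeps only the JSON tables when JSON is the specified format type (the naming convention and LIKE pattern are assumptions for illustration):

```sql
-- Hypothetical filter appended to the generated where clause,
-- assuming JSON tables are suffixed with _JSON by convention.
  AND columns.table_name LIKE '%\\_JSON' ESCAPE '\\'
```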

Warehouse

When executing a script for multiple scans, take into consideration the size of the source table and the number of rows being scanned.

If the option to initiate the scan has been selected, the specified warehouse will call the DataPancake procedure (which creates data sources and starts scans) synchronously for every table you choose to add to the step 2 script.

Optionally, change the warehouse name specified in the select statement so that sets of procedure calls in the step 2 script execute on different warehouses, allowing the scan work to be distributed in parallel.
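The pattern looks roughly like this (the procedure name, argument list, and warehouse names below are assumptions for illustration; the generated step 2 script defines the actual calls):

```sql
-- Hypothetical step 2 code blocks: one procedure call per table,
-- each edited to pass a different scan warehouse so large tables
-- can run on their own warehouse in parallel.
CALL DATAPANCAKE.APP.CREATE_DATA_SOURCE('MY_DB.MY_SCHEMA.ORDERS', 'PAYLOAD', 'SCAN_WH_1');
CALL DATAPANCAKE.APP.CREATE_DATA_SOURCE('MY_DB.MY_SCHEMA.EVENTS', 'PAYLOAD', 'SCAN_WH_2');
```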

15. Execute the script

Use the blue dropdown arrow at the top right and click "Run All"

16. Ensure successful execution by examining the result set

17. Scroll to the GRANT_STATEMENT_CODE_BLOCK column of the result set

18. Copy the grant statements for the data sources

19. In DataPancake, navigate to step 2 of the script builder

20. Paste the SQL grant statements

21. Copy the create statements

22. In DataPancake, paste the SQL procedure statements

23. Click the "Generate Script - Step 2" button

24. Copy the script

25. In a Snowflake worksheet, paste the script

26. Review and execute the script

27. Ensure successful execution by examining the call result column in the result set

Any errors that occur will be detailed in the call result column.

For example, if you try to scan XML data sources as JSON, you will see a detailed error message such as "The data source connection failed... Invalid JSON Object"

Fix and rerun any failed statements as needed.

In an example result set for multiple data sources, notice how the XML data sources fail when scanned as JSON.
