DataPancake™ Docs
  • DataPancake™ Documentation
  • Overview
    • What is DataPancake™?
    • Understanding the Challenges of Semi-Structured Data
  • Getting Started
    • How to Guides (Script Builder)
      • Admin Guides
        • How to Purchase & Install DataPancake
        • How to Manage Access to DataPancake
        • How to Grant DataPancake Account & Cortex AI Privileges
        • How to Add Warehouses to DataPancake
      • Data Source Guides
        • How to Create a Single DataPancake Data Source (Script Builder)
        • How to Create Multiple DataPancake Data Sources (Script Builder)
      • Alert Guides
        • How to Create Schema Drift Alerts (Script Builder)
    • Application Overview
      • Data Source Overview
      • Manage Data Source
      • Manage Scan Configuration
      • Scan Data Source
      • Data Source Attributes
      • Dynamic Table SQL
      • Manage Warehouse
  • Topics
    • Warehouses
    • Data Sources
    • Scan Configurations
    • Attributes
    • Scan Processing
    • Views
    • Attribute Consolidation
    • Pricing
  • DEVELOPMENT
    • Release Notes
Powered by GitBook
On this page
  • 1. Navigate to the "Data Source" script builder
  • 2. Click the "Multiple Data Sources" tab
  • 3. (Optional) Change the app database name
  • 4. Select the warehouse to be used by DataPancake to scan data
  • 5. Select the Semi-structured Data Format Type
  • 6. Enter a record limit for the initial scan
  • 7. Enter the role used to install DataPancake
  • 8. Enter the role which has the select privilege for the database object to be scanned
  • 9. Enter the Snowflake database name that contains the data
  • 10. Enter the name of the schema which contains the database objects to be scanned
  • 11. Click the "Generate Script - Step 1" button
  • 12. Copy the script
  • 13. Open a new Snowflake worksheet and paste the script
  • 14. Review the script & add modifications as needed
  • 15. Execute the script
  • 16. Ensure successful execution by examining the result set
  • 17. Scroll to the GRANT_STATEMENT_CODE_BLOCK column of the result set
  • 18. Copy the grant statements for the data sources
  • 19. In DataPancake, navigate to step 2 of the script builder
  • 20. Paste the SQL grant statements
  • 21. Copy the create statements
  • 22. In DataPancake, paste the SQL procedure statements
  • 23. Click the "Generate Script - Step 2" button
  • 24. Copy the script
  • 25. In a Snowflake worksheet, paste the script
  • 26. Review & Execute the script
  • 27. Ensure successful execution by examining the call result column in the result set
  1. Getting Started
  2. How to Guides (Script Builder)
  3. Data Source Guides

How to Create Multiple DataPancake Data Sources (Script Builder)

Build a script to create and initiate scans for multiple data sources in DataPancake.

Last updated 6 hours ago

Please ensure the have been completed before creating your first data source.

This script builder is not available for the following scenarios:

  • Data sources with a pre-configured schema

  • Data sources that contain stream data such as Kafka

See "How to Create a DataPancake Data Source (UI)" if needed.

1. Navigate to the "Data Source" script builder

2. Click the "Multiple Data Sources" tab

3. (Optional) Change the app database name

The default is DATAPANCAKE

4. Select the warehouse to be used by DataPancake to scan data

5. Select the Semi-structured Data Format Type

6. Enter a record limit for the initial scan

If you want to scan all records, set the limit to 0

7. Enter the role used to install DataPancake

8. Enter the role which has the select privilege for the database object to be scanned

9. Enter the Snowflake database name that contains the data

If the name is case sensitive, put the name in double quotes such as "EXAMPLE_NAME"

10. Enter the name of the schema which contains the database objects to be scanned

If the name is case sensitive, put the name in double quotes such as "EXAMPLE_NAME"

11. Click the "Generate Script - Step 1" button

12. Copy the script

13. Open a new Snowflake worksheet and paste the script

14. Review the script & add modifications as needed

Filtering

Add additional filters to the where clause to limit the amount of rows returned.

Additionally, choose which table code blocks are included in step 2 by selecting code blocks in individual rows to create multiple data sources for.

Examples of additional columns that can be added to the where clause are: columns.table_name, columns.column_name


Format Type

If the source tables to be scanned store a mix of different semi-structured data format types (such as JSON and XML), then the where clause will need to be modified to return only the tables storing data that match the data format type that was specified.

If the where clause is not used to further limit the result then all VARIANT columns will be returned. In this case, select the code blocks for tables that match the data format type specified.

If a code block is copied and used in the step 2 script for a table that stores JSON data, and XML has been specified for the data format type, the step 2 script will produce an error indicating an invalid data format type and the data source will not be created for that table.

Warehouse

When executing a script for multiple scans, take into consideration the size of the source table and the number of rows being scanned.

If the option to initiate the scan has been selected, then the specified warehouse will call the DataPancake procedure (which creates data sources and starts scans) synchronously for all tables that you choose to add to the step 2 script.

Optionally, change the name of the warehouse specified in the select statement to change the scan warehouse so that sets of procedure calls are executed on different warehouses in the step 2 script enabling the work to be distributed in parallel.

15. Execute the script

Use the blue dropdown arrow at the top right and click "Run All"

16. Ensure successful execution by examining the result set

17. Scroll to the GRANT_STATEMENT_CODE_BLOCK column of the result set

18. Copy the grant statements for the data sources

19. In DataPancake, navigate to step 2 of the script builder

20. Paste the SQL grant statements

21. Copy the create statements

22. In DataPancake, paste the SQL procedure statements

23. Click the "Generate Script - Step 2" button

24. Copy the script

25. In a Snowflake worksheet, paste the script

26. Review & Execute the script

27. Ensure successful execution by examining the call result column in the result set

Any errors that occur will be detailed in the call result column.

For example, if you tried to scan XML data sources as JSON, you will see a detailed error message mentioning The data source connection failed... Invalid JSON Object

Fix and rerun any failed statements as needed.

An example result set for multiple data sources. Notice how the XML data sources failed if we tried to scan them as JSON.
Admin Guides