Adding Data Sources
Methods and requirements for adding data sources to DataPancake.
Overview
Data sources can be added to DataPancake through two methods: the application interface or a SQL worksheet. Both methods require that the appropriate Snowflake privileges be granted first.
Prerequisites
Required Privileges
Before adding a data source, you must grant DataPancake the necessary privileges to access the source object.
For Standard Databases:
GRANT USAGE ON DATABASE <database_name> TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION PANCAKE;
GRANT REFERENCES, SELECT ON <object_type> <database_name>.<schema_name>.<object_name> TO APPLICATION PANCAKE;
For Shared Databases:
GRANT USAGE ON DATABASE <database_name> TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION PANCAKE;
GRANT IMPORTED PRIVILEGES ON DATABASE <database_name> TO APPLICATION PANCAKE;
Object Types:
TABLE - for tables
VIEW - for views
MATERIALIZED VIEW - for materialized views
EXTERNAL TABLE - for external tables
DYNAMIC TABLE - for dynamic tables
ICEBERG TABLE - for Iceberg tables
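For example, the grants for a hypothetical standard-database table MY_DB.RAW.ORDERS would look like this (the database, schema, and table names are illustrative only):
-- Grants for a standard database table; substitute your own object names.
GRANT USAGE ON DATABASE MY_DB TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA MY_DB.RAW TO APPLICATION PANCAKE;
GRANT REFERENCES, SELECT ON TABLE MY_DB.RAW.ORDERS TO APPLICATION PANCAKE;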
Privilege Verification
The system validates privileges when:
Creating a new data source
Saving data source settings
Attempting to scan the data source
If privileges are missing, you'll receive an error message with the exact SQL statements needed to grant access.
Adding from Application Interface
The application interface provides a guided experience for adding data sources with validation and error handling.
Validation
The application validates:
Required Fields - All required fields must be provided
Unique Name - Data source name must be unique
Connection - System attempts to connect to source object
Privileges - Verifies required privileges are granted
Error Handling
Connection Failures:
System displays specific error message
Provides SQL statements to grant required privileges
Allows correction and retry
Validation Errors:
Missing required fields are highlighted
Error messages explain what's needed
Sidebar shows list of missing fields
Duplicate Names:
Error message indicates duplicate name
Must choose a different name
Adding via SQL Worksheet
SQL worksheet addition enables programmatic data source creation using stored procedures. This method is ideal for bulk operations, automation, and integration with other systems.
Stored Procedure: core.add_datasource_with_scan
The primary stored procedure for adding data sources with optional initial scan.
Signature:
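The following sketch of the signature is inferred from the parameters documented below; treat the parameter order, types, and defaults as assumptions and confirm the exact signature with SHOW PROCEDURES in your account:
-- Inferred signature (order, types, and defaults are assumptions)
core.add_datasource_with_scan(
    DATASOURCE_NAME VARCHAR,
    DATASOURCE_TYPE VARCHAR,
    DATASOURCE_FORMAT VARCHAR,
    ATTRIBUTE_CREATE_TYPE VARCHAR,
    OBJECT_TYPE VARCHAR,
    OBJECT_DATABASE_NAME VARCHAR,
    OBJECT_SCHEMA_NAME VARCHAR,
    OBJECT_NAME VARCHAR,
    OBJECT_COLUMN_NAME VARCHAR,
    OBJECT_COLUMN_DATA_TYPE VARCHAR,
    DATASOURCE_SCHEMA_SAMPLE VARCHAR,
    SOURCE_DATA_PLATFORM VARCHAR,
    INITIATE_SCAN BOOLEAN,
    SCAN_WAREHOUSE_NAME VARCHAR,
    SCAN_RECORD_LIMIT NUMBER,
    BASE_SCAN_WAREHOUSE_THREAD_COUNT NUMBER,
    BASE_SCAN_RECORDS_PER_MINUTE NUMBER,
    DATASOURCE_TAG VARCHAR
)
RETURNS VARCHAR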
Parameters
Required Parameters:
DATASOURCE_NAME - Unique name for the data source
DATASOURCE_TYPE - 'Semi-Structured' or 'Structured'
DATASOURCE_FORMAT - Format type: 'JSON', 'Avro', 'Parquet', 'ORC', or 'XML' (Semi-Structured only)
ATTRIBUTE_CREATE_TYPE - 'Discover' (scan data) or 'Schema' (use schema sample)
OBJECT_TYPE - 'Table', 'View', 'External Table', 'Dynamic Table', 'Materialized View', or 'Iceberg Table'
OBJECT_DATABASE_NAME - Database name containing the source object
OBJECT_SCHEMA_NAME - Schema name containing the source object
OBJECT_NAME - Name of the source object
OBJECT_COLUMN_NAME - Column name (required for Semi-Structured)
OBJECT_COLUMN_DATA_TYPE - 'Variant' or 'String' (required for Semi-Structured)
Optional Parameters:
DATASOURCE_SCHEMA_SAMPLE - JSON/XML schema sample (for the Schema attribute creation type)
SOURCE_DATA_PLATFORM - 'Kafka' or 'DynamoDB' (for streaming platforms)
INITIATE_SCAN - TRUE to perform an initial scan, FALSE to skip
SCAN_WAREHOUSE_NAME - Warehouse name for the initial scan
SCAN_RECORD_LIMIT - Number of records to scan (0 = unlimited; default 150 for a quick scan)
BASE_SCAN_WAREHOUSE_THREAD_COUNT - Thread count for the baseline (default 8)
BASE_SCAN_RECORDS_PER_MINUTE - Records per minute for the baseline (default 10000)
DATASOURCE_TAG - Optional tags for organization
Example: Basic Semi-Structured Data Source
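A minimal sketch using Snowflake's named-argument syntax; the source object MY_DB.RAW.ORDERS and its VARIANT column PAYLOAD are hypothetical:
CALL core.add_datasource_with_scan(
    DATASOURCE_NAME => 'orders_json',
    DATASOURCE_TYPE => 'Semi-Structured',
    DATASOURCE_FORMAT => 'JSON',
    ATTRIBUTE_CREATE_TYPE => 'Discover',
    OBJECT_TYPE => 'Table',
    OBJECT_DATABASE_NAME => 'MY_DB',
    OBJECT_SCHEMA_NAME => 'RAW',
    OBJECT_NAME => 'ORDERS',
    OBJECT_COLUMN_NAME => 'PAYLOAD',
    OBJECT_COLUMN_DATA_TYPE => 'Variant',
    INITIATE_SCAN => TRUE,
    SCAN_WAREHOUSE_NAME => 'PANCAKE_X_SMALL_01',
    SCAN_RECORD_LIMIT => 150  -- quick scan default
);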
Example: Full Scan on Creation
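Assuming the same hypothetical source, a full scan on creation sets SCAN_RECORD_LIMIT to 0 and uses a larger warehouse:
CALL core.add_datasource_with_scan(
    DATASOURCE_NAME => 'orders_json_full',
    DATASOURCE_TYPE => 'Semi-Structured',
    DATASOURCE_FORMAT => 'JSON',
    ATTRIBUTE_CREATE_TYPE => 'Discover',
    OBJECT_TYPE => 'Table',
    OBJECT_DATABASE_NAME => 'MY_DB',
    OBJECT_SCHEMA_NAME => 'RAW',
    OBJECT_NAME => 'ORDERS',
    OBJECT_COLUMN_NAME => 'PAYLOAD',
    OBJECT_COLUMN_DATA_TYPE => 'Variant',
    INITIATE_SCAN => TRUE,
    SCAN_WAREHOUSE_NAME => 'PANCAKE_MEDIUM_OPTIMIZED_01',
    SCAN_RECORD_LIMIT => 0  -- 0 = unlimited (full scan)
);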
Example: Using Schema Sample
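A sketch that supplies a JSON schema sample instead of scanning data; the sample document is illustrative:
CALL core.add_datasource_with_scan(
    DATASOURCE_NAME => 'orders_from_schema',
    DATASOURCE_TYPE => 'Semi-Structured',
    DATASOURCE_FORMAT => 'JSON',
    ATTRIBUTE_CREATE_TYPE => 'Schema',  -- build attributes from the sample, not a scan
    OBJECT_TYPE => 'Table',
    OBJECT_DATABASE_NAME => 'MY_DB',
    OBJECT_SCHEMA_NAME => 'RAW',
    OBJECT_NAME => 'ORDERS',
    OBJECT_COLUMN_NAME => 'PAYLOAD',
    OBJECT_COLUMN_DATA_TYPE => 'Variant',
    DATASOURCE_SCHEMA_SAMPLE => '{"order_id": 1, "customer": {"id": 42, "name": "Ada"}, "items": [{"sku": "A1", "qty": 2}]}',
    INITIATE_SCAN => FALSE
);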
Example: Structured Data Source
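A sketch for a structured source. The format and column parameters apply only to semi-structured sources, so they are omitted here; whether they can be omitted entirely or must be passed as NULL is an assumption:
CALL core.add_datasource_with_scan(
    DATASOURCE_NAME => 'customers_structured',
    DATASOURCE_TYPE => 'Structured',
    ATTRIBUTE_CREATE_TYPE => 'Discover',
    OBJECT_TYPE => 'Table',
    OBJECT_DATABASE_NAME => 'MY_DB',
    OBJECT_SCHEMA_NAME => 'CORE',
    OBJECT_NAME => 'CUSTOMERS',
    INITIATE_SCAN => TRUE,
    SCAN_WAREHOUSE_NAME => 'PANCAKE_X_SMALL_01',
    SCAN_RECORD_LIMIT => 150
);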
Return Values
The stored procedure returns a string indicating success or failure:
Success: Returns success message
Error: Returns error message with details
Connection Failed: Returns connection error with privilege grant statements
Duplicate Name: Returns error indicating duplicate data source name
Bulk Data Source Creation
You can use SQL scripts to find and create data sources in bulk:
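One hypothetical approach, sketched here under assumed object names: query INFORMATION_SCHEMA for every VARIANT column in a schema, generate one CALL statement per column, then review and run the generated statements:
-- Hypothetical bulk script: generate one CALL per VARIANT column in MY_DB.RAW.
SELECT 'CALL core.add_datasource_with_scan(' ||
       'DATASOURCE_NAME => ''' || LOWER(TABLE_NAME || '_' || COLUMN_NAME) || ''', ' ||
       'DATASOURCE_TYPE => ''Semi-Structured'', ' ||
       'DATASOURCE_FORMAT => ''JSON'', ' ||
       'ATTRIBUTE_CREATE_TYPE => ''Discover'', ' ||
       'OBJECT_TYPE => ''Table'', ' ||
       'OBJECT_DATABASE_NAME => ''MY_DB'', ' ||
       'OBJECT_SCHEMA_NAME => ''RAW'', ' ||
       'OBJECT_NAME => ''' || TABLE_NAME || ''', ' ||
       'OBJECT_COLUMN_NAME => ''' || COLUMN_NAME || ''', ' ||
       'OBJECT_COLUMN_DATA_TYPE => ''Variant'', ' ||
       'INITIATE_SCAN => FALSE);' AS call_stmt
FROM MY_DB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'RAW'
  AND DATA_TYPE = 'VARIANT';
-- Review the generated statements, then execute them from a worksheet.
Skipping the initial scan (INITIATE_SCAN => FALSE) keeps bulk creation fast and cheap; scans can be scheduled afterward, as described under "When to Skip Initial Scan" below.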
Initial Quick Scan
When adding a data source, you can optionally perform an initial "quick scan" to discover the schema immediately.
Quick Scan Configuration
Default Settings:
Record Limit: 150 records
Warehouse: PANCAKE_X_SMALL_01 (or specified warehouse)
Attribute Create Type: Discover (scans actual data)
Quick Scan Benefits
Immediate Schema Discovery — See schema structure right away
Fast Onboarding — Quick validation of data source setup
Cost Effective — Small record limit keeps costs low
Baseline Setup — Establishes initial attribute metadata
When to Skip Initial Scan
Skip the initial scan when:
You have a schema sample and want to use Schema attribute creation type
You want to configure materialization settings first
You're creating data sources in bulk and will scan later
You want to review configuration before scanning
Full Scan During Creation
To perform a full scan when creating a data source:
Set SCAN_RECORD_LIMIT to 0 (unlimited) or a number greater than the total record count
Use a Medium or larger warehouse (e.g., PANCAKE_MEDIUM_OPTIMIZED_01)
Ensure the warehouse has sufficient resources for the data size
Note: Full scans during creation may take significant time and cost. Consider using quick scan first, then performing full scans via scan configurations.
Troubleshooting
Summary
Methods:
Application Interface (guided experience)
SQL Worksheet (programmatic/bulk)
Requirements:
Appropriate Snowflake privileges
Valid source object
Unique data source name
Initial Scan:
Optional quick scan (150 records default)
Can skip and scan later
Can perform full scan on creation
Note: Start with a quick scan (default 150 records), then configure and perform full scans via scan configurations.
For information on data source types, see Data Source Types. For basic configuration, see Basic Configuration Settings.