Adding Data Sources

Methods and requirements for adding data sources to DataPancake.

Overview

Data sources can be added to DataPancake through two methods: the application interface or a SQL worksheet. Both methods require the appropriate Snowflake privileges to be granted first.


Prerequisites

Required Privileges

Before adding a data source, you must grant DataPancake the necessary privileges to access the source object.

For Standard Databases:

GRANT USAGE ON DATABASE <database_name> TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION PANCAKE;
GRANT REFERENCES, SELECT ON <object_type> <database_name>.<schema_name>.<object_name> TO APPLICATION PANCAKE;

For Shared Databases:

GRANT USAGE ON DATABASE <database_name> TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION PANCAKE;
GRANT IMPORTED PRIVILEGES ON DATABASE <database_name> TO APPLICATION PANCAKE;

Object Types:

  • TABLE for tables

  • VIEW for views

  • MATERIALIZED VIEW for materialized views

  • EXTERNAL TABLE for external tables

  • DYNAMIC TABLE for dynamic tables

  • ICEBERG TABLE for Iceberg tables
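Putting the statements above together, a complete grant set for a standard table looks like the following (the database, schema, and table names are hypothetical; substitute your own):

```sql
-- Allow DataPancake to see the database and schema
GRANT USAGE ON DATABASE RAW_DB TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA RAW_DB.PUBLIC TO APPLICATION PANCAKE;

-- Allow DataPancake to read and reference the source table
GRANT REFERENCES, SELECT ON TABLE RAW_DB.PUBLIC.ORDERS_RAW TO APPLICATION PANCAKE;
```

For one of the other object types, replace the TABLE keyword in the final statement with the matching keyword from the list above (for example, DYNAMIC TABLE or MATERIALIZED VIEW).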

Privilege Verification

The system validates privileges when:

  • Creating a new data source

  • Saving data source settings

  • Attempting to scan the data source

If privileges are missing, you'll receive an error message with the exact SQL statements needed to grant access.


Adding from Application Interface

The application interface provides a guided experience for adding data sources with validation and error handling.

1

Navigate to Data Sources

  • Open the DataPancake application

  • Navigate to "Data Source Management" → "Data Sources"

2

Select "New" Action

  • Use the sidebar action selector

  • Choose "New" to create a new data source

3

Configure Basic Settings

  • Enter Data Source Name (required, unique)

  • Select Data Source Type: Semi-Structured or Structured

  • Set Status: Active, Inactive, or Deleted

4

Configure Source Object (Semi-Structured)

  • Select Object Type (Table, View, External Table, etc.)

  • Select Format Type (JSON, Avro, Parquet, ORC, XML)

  • Select Column Data Type (VARIANT or String)

  • Select Database

  • Select Schema

  • Select Object Name

  • Enter Column Name (VARIANT/String column name)

5

Configure Source Object (Structured)

  • Select Object Type

  • Select Database

  • Select Schema

  • Select Object Name

6

Optional Configuration

  • Add Data Source Tags

  • Configure Product Tiers/Features

  • Add Schema Sample (for Semi-Structured)

  • Configure Materialization Settings (if SQL Code Generation enabled)

7

Save Data Source

  • Click "Save Data Source" button

  • System validates connection and required fields

  • Success message confirms data source creation

Validation

The application validates:

  • Required Fields - All required fields must be provided

  • Unique Name - Data source name must be unique

  • Connection - System attempts to connect to source object

  • Privileges - Verifies required privileges are granted

Error Handling

Connection Failures:

  • System displays specific error message

  • Provides SQL statements to grant required privileges

  • Allows correction and retry

Validation Errors:

  • Missing required fields are highlighted

  • Error messages explain what's needed

  • Sidebar shows list of missing fields

Duplicate Names:

  • Error message indicates duplicate name

  • Must choose a different name


Adding via SQL Worksheet

SQL worksheet addition enables programmatic data source creation using stored procedures. This method is ideal for bulk operations, automation, and integration with other systems.

Stored Procedure: core.add_datasource_with_scan

The primary stored procedure for adding data sources, with an optional initial scan.

Signature:
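The exact signature is not reproduced here; as a sketch, a call with named arguments takes the parameters listed below (the placeholder values follow the same angle-bracket convention as the GRANT statements above, and the exact parameter order and defaults may differ):

```sql
CALL core.add_datasource_with_scan(
  DATASOURCE_NAME        => '<datasource_name>',
  DATASOURCE_TYPE        => '<Semi-Structured | Structured>',
  DATASOURCE_FORMAT      => '<JSON | Avro | Parquet | ORC | XML>',  -- Semi-Structured only
  ATTRIBUTE_CREATE_TYPE  => '<Discover | Schema>',
  OBJECT_TYPE            => '<Table | View | External Table | Dynamic Table | Materialized View | Iceberg Table>',
  OBJECT_DATABASE_NAME   => '<database_name>',
  OBJECT_SCHEMA_NAME     => '<schema_name>',
  OBJECT_NAME            => '<object_name>',
  OBJECT_COLUMN_NAME     => '<column_name>',          -- required for Semi-Structured
  OBJECT_COLUMN_DATA_TYPE => '<Variant | String>',    -- required for Semi-Structured
  -- Optional parameters:
  DATASOURCE_SCHEMA_SAMPLE => '<schema_sample>',
  SOURCE_DATA_PLATFORM     => '<Kafka | DynamoDB>',
  INITIATE_SCAN            => TRUE,
  SCAN_WAREHOUSE_NAME      => '<warehouse_name>',
  SCAN_RECORD_LIMIT        => 150,
  BASE_SCAN_WAREHOUSE_THREAD_COUNT => 8,
  BASE_SCAN_RECORDS_PER_MINUTE     => 10000,
  DATASOURCE_TAG           => '<tag>'
);
```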

Parameters

Required Parameters:

  • DATASOURCE_NAME - Unique name for the data source

  • DATASOURCE_TYPE - 'Semi-Structured' or 'Structured'

  • DATASOURCE_FORMAT - Format type: 'JSON', 'Avro', 'Parquet', 'ORC', or 'XML' (Semi-Structured only)

  • ATTRIBUTE_CREATE_TYPE - 'Discover' (scan data) or 'Schema' (use schema sample)

  • OBJECT_TYPE - 'Table', 'View', 'External Table', 'Dynamic Table', 'Materialized View', or 'Iceberg Table'

  • OBJECT_DATABASE_NAME - Database name containing the source object

  • OBJECT_SCHEMA_NAME - Schema name containing the source object

  • OBJECT_NAME - Name of the source object

  • OBJECT_COLUMN_NAME - Column name (required for Semi-Structured)

  • OBJECT_COLUMN_DATA_TYPE - 'Variant' or 'String' (required for Semi-Structured)

Optional Parameters:

  • DATASOURCE_SCHEMA_SAMPLE - JSON/XML schema sample (for Schema attribute creation type)

  • SOURCE_DATA_PLATFORM - 'Kafka' or 'DynamoDB' (for streaming platforms)

  • INITIATE_SCAN - TRUE to perform initial scan, FALSE to skip

  • SCAN_WAREHOUSE_NAME - Warehouse name for initial scan

  • SCAN_RECORD_LIMIT - Number of records to scan (0 = unlimited, default 150 for quick scan)

  • BASE_SCAN_WAREHOUSE_THREAD_COUNT - Thread count for baseline (default 8)

  • BASE_SCAN_RECORDS_PER_MINUTE - Records per minute for baseline (default 10000)

  • DATASOURCE_TAG - Optional tags for organization

Example: Basic Semi-Structured Data Source
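A minimal call for a semi-structured JSON source with the default quick scan might look like this (all object and warehouse names are hypothetical):

```sql
CALL core.add_datasource_with_scan(
  DATASOURCE_NAME         => 'ORDERS_JSON',
  DATASOURCE_TYPE         => 'Semi-Structured',
  DATASOURCE_FORMAT       => 'JSON',
  ATTRIBUTE_CREATE_TYPE   => 'Discover',
  OBJECT_TYPE             => 'Table',
  OBJECT_DATABASE_NAME    => 'RAW_DB',
  OBJECT_SCHEMA_NAME      => 'PUBLIC',
  OBJECT_NAME             => 'ORDERS_RAW',
  OBJECT_COLUMN_NAME      => 'RECORD_CONTENT',
  OBJECT_COLUMN_DATA_TYPE => 'Variant',
  INITIATE_SCAN           => TRUE,
  SCAN_WAREHOUSE_NAME     => 'PANCAKE_X_SMALL_01',
  SCAN_RECORD_LIMIT       => 150  -- quick scan default
);
```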

Example: Full Scan on Creation
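To scan every record at creation time, set SCAN_RECORD_LIMIT to 0 and use a larger warehouse, as recommended in "Full Scan During Creation" below (names are hypothetical):

```sql
CALL core.add_datasource_with_scan(
  DATASOURCE_NAME         => 'ORDERS_JSON_FULL',
  DATASOURCE_TYPE         => 'Semi-Structured',
  DATASOURCE_FORMAT       => 'JSON',
  ATTRIBUTE_CREATE_TYPE   => 'Discover',
  OBJECT_TYPE             => 'Table',
  OBJECT_DATABASE_NAME    => 'RAW_DB',
  OBJECT_SCHEMA_NAME      => 'PUBLIC',
  OBJECT_NAME             => 'ORDERS_RAW',
  OBJECT_COLUMN_NAME      => 'RECORD_CONTENT',
  OBJECT_COLUMN_DATA_TYPE => 'Variant',
  INITIATE_SCAN           => TRUE,
  SCAN_WAREHOUSE_NAME     => 'PANCAKE_MEDIUM_OPTIMIZED_01',
  SCAN_RECORD_LIMIT       => 0  -- 0 = unlimited (full scan)
);
```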

Example: Using Schema Sample
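When a schema sample is available, pass ATTRIBUTE_CREATE_TYPE => 'Schema' with the sample and skip the initial scan (the JSON sample and object names here are hypothetical):

```sql
CALL core.add_datasource_with_scan(
  DATASOURCE_NAME          => 'ORDERS_JSON_SAMPLE',
  DATASOURCE_TYPE          => 'Semi-Structured',
  DATASOURCE_FORMAT        => 'JSON',
  ATTRIBUTE_CREATE_TYPE    => 'Schema',  -- use the schema sample instead of scanning data
  OBJECT_TYPE              => 'Table',
  OBJECT_DATABASE_NAME     => 'RAW_DB',
  OBJECT_SCHEMA_NAME       => 'PUBLIC',
  OBJECT_NAME              => 'ORDERS_RAW',
  OBJECT_COLUMN_NAME       => 'RECORD_CONTENT',
  OBJECT_COLUMN_DATA_TYPE  => 'Variant',
  DATASOURCE_SCHEMA_SAMPLE => '{"order_id": 1001, "customer": {"id": 7, "name": "Ada"}, "total": 25.50}',
  INITIATE_SCAN            => FALSE
);
```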

Example: Structured Data Source
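For a structured source, the format and column parameters are not needed (they are required for Semi-Structured only); a sketch with hypothetical names:

```sql
CALL core.add_datasource_with_scan(
  DATASOURCE_NAME       => 'CUSTOMER_DIM',
  DATASOURCE_TYPE       => 'Structured',
  ATTRIBUTE_CREATE_TYPE => 'Discover',
  OBJECT_TYPE           => 'View',
  OBJECT_DATABASE_NAME  => 'ANALYTICS_DB',
  OBJECT_SCHEMA_NAME    => 'MARTS',
  OBJECT_NAME           => 'CUSTOMER_DIM_V',
  INITIATE_SCAN         => TRUE,
  SCAN_WAREHOUSE_NAME   => 'PANCAKE_X_SMALL_01',
  SCAN_RECORD_LIMIT     => 150
);
```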

Return Values

The stored procedure returns a string indicating success or failure:

  • Success: Returns success message

  • Error: Returns error message with details

  • Connection Failed: Returns connection error with privilege grant statements

  • Duplicate Name: Returns error indicating duplicate data source name

Bulk Data Source Creation

You can use SQL scripts to find and create data sources in bulk:
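One approach, sketched below with a hypothetical database name, is to query INFORMATION_SCHEMA for candidate VARIANT columns and generate one CALL statement per column; review the generated SQL before running it:

```sql
-- Emit a CALL statement for every VARIANT column in RAW_DB.
-- Scanning is deferred (INITIATE_SCAN => FALSE) so sources can be reviewed first.
SELECT 'CALL core.add_datasource_with_scan('
    || 'DATASOURCE_NAME => '''  || c.table_name || '_' || c.column_name || ''', '
    || 'DATASOURCE_TYPE => ''Semi-Structured'', '
    || 'DATASOURCE_FORMAT => ''JSON'', '
    || 'ATTRIBUTE_CREATE_TYPE => ''Discover'', '
    || 'OBJECT_TYPE => ''Table'', '
    || 'OBJECT_DATABASE_NAME => ''' || c.table_catalog || ''', '
    || 'OBJECT_SCHEMA_NAME => '''   || c.table_schema  || ''', '
    || 'OBJECT_NAME => '''          || c.table_name    || ''', '
    || 'OBJECT_COLUMN_NAME => '''   || c.column_name   || ''', '
    || 'OBJECT_COLUMN_DATA_TYPE => ''Variant'', '
    || 'INITIATE_SCAN => FALSE);' AS call_stmt
FROM RAW_DB.INFORMATION_SCHEMA.COLUMNS c
WHERE c.data_type = 'VARIANT';
```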


Initial Quick Scan

When adding a data source, you can optionally perform an initial "quick scan" to discover the schema immediately.

Quick Scan Configuration

Default Settings:

  • Record Limit: 150 records

  • Warehouse: PANCAKE_X_SMALL_01 (or specified warehouse)

  • Attribute Create Type: Discover (scans actual data)

Quick Scan Benefits

  • Immediate Schema Discovery — See schema structure right away

  • Fast Onboarding — Quick validation of data source setup

  • Cost Effective — Small record limit keeps costs low

  • Baseline Setup — Establishes initial attribute metadata

When to Skip Initial Scan

Skip the initial scan when:

  • You have a schema sample and want to use Schema attribute creation type

  • You want to configure materialization settings first

  • You're creating data sources in bulk and will scan later

  • You want to review configuration before scanning

Full Scan During Creation

To perform a full scan when creating a data source:

  • Set SCAN_RECORD_LIMIT to 0 (unlimited) or a number greater than total records

  • Use a Medium or larger warehouse (e.g., PANCAKE_MEDIUM_OPTIMIZED_01)

  • Ensure warehouse has sufficient resources for the data size

Note: A full scan during creation may take significant time and incur significant compute cost. Consider starting with a quick scan, then performing full scans via scan configurations.



Troubleshooting

Connection Failed

  • Verify privileges are granted correctly

  • Check object exists and is accessible

  • For shared databases, use GRANT IMPORTED PRIVILEGES

  • Verify database and schema names are correct

Duplicate Name

  • Choose a different, unique name

  • Check existing data sources for similar names

  • Use more specific naming with database/schema context

Invalid Format

  • Verify format type matches actual data format

  • Check column data type (VARIANT vs String)

  • For String columns, ensure the parsing expression is correct

Scan Failures

  • Verify warehouse exists and is accessible

  • Check warehouse size is appropriate for data volume

  • Review scan error messages for specific issues

  • Start with smaller record limits


Summary

Methods:

  • Application Interface (guided experience)

  • SQL Worksheet (programmatic/bulk)

Requirements:

  • Appropriate Snowflake privileges

  • Valid source object

  • Unique data source name

Initial Scan:

  • Optional quick scan (150 records default)

  • Can skip and scan later

  • Can perform full scan on creation

Note: Start with quick scan (default 150 records), then configure and perform full scans via scan configurations.

For information on data source types, see Data Source Types. For basic configuration, see Basic Configuration Settings.
