# Adding Data Sources

Methods and requirements for adding data sources to DataPancake.

## Overview

Data sources can be added to DataPancake through two methods: the application interface or a SQL file. Both methods require appropriate Snowflake privileges to be granted first.

***

## Prerequisites

### Required Privileges

Before adding a data source, you must grant DataPancake the necessary privileges to access the source object.

For Standard Databases:

```sql
GRANT USAGE ON DATABASE <database_name> TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION PANCAKE;
GRANT REFERENCES, SELECT ON <object_type> <database_name>.<schema_name>.<object_name> TO APPLICATION PANCAKE;
```

For Shared Databases:

```sql
GRANT USAGE ON DATABASE <database_name> TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION PANCAKE;
GRANT IMPORTED PRIVILEGES ON DATABASE <database_name> TO APPLICATION PANCAKE;
```

Object Types:

* `TABLE` for tables
* `VIEW` for views
* `MATERIALIZED VIEW` for materialized views
* `EXTERNAL TABLE` for external tables
* `DYNAMIC TABLE` for dynamic tables
* `ICEBERG TABLE` for Iceberg tables

### Privilege Verification

The system validates privileges when:

* Creating a new data source
* Saving data source settings
* Attempting to scan the data source

If privileges are missing, you'll receive an error message with the exact SQL statements needed to grant access.
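
You can also inspect what has already been granted before the system validates it. One way in Snowflake (assuming the application is installed under the name `PANCAKE`, as in the grant statements above) is:

```sql
-- List all privileges currently granted to the application
SHOW GRANTS TO APPLICATION PANCAKE;
```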

***

## Adding from Application Interface

The application interface provides a guided experience for adding data sources with validation and error handling.

{% stepper %}
{% step %}

#### Navigate to Data Sources

* Open the DataPancake application
* Navigate to "Data Source Management" → "Data Sources"
  {% endstep %}

{% step %}

#### Select "New" Action

* Use the sidebar action selector
* Choose "New" to create a new data source
  {% endstep %}

{% step %}

#### Configure Basic Settings

* Enter Data Source Name (required, unique)
* Select Data Source Type: Semi-Structured or Structured
* Set Status: Active, Inactive, or Deleted
  {% endstep %}

{% step %}

#### Configure Source Object (Semi-Structured)

* Select Object Type (Table, View, External Table, etc.)
* Select Format Type (JSON, Avro, Parquet, ORC, XML)
* Select Column Data Type (VARIANT or String)
* Select Database
* Select Schema
* Select Object Name
* Enter Column Name (VARIANT/String column name)
  {% endstep %}

{% step %}

#### Configure Source Object (Structured)

* Select Object Type
* Select Database
* Select Schema
* Select Object Name
  {% endstep %}

{% step %}

#### Optional Configuration

* Add Data Source Tags
* Configure Product Tiers/Features
* Add Schema Sample (for Semi-Structured)
* Configure Materialization Settings (if SQL Code Generation enabled)
  {% endstep %}

{% step %}

#### Save Data Source

* Click "Save Data Source" button
* System validates connection and required fields
* Success message confirms data source creation
  {% endstep %}
  {% endstepper %}

### Validation

The application validates:

* **Required Fields** - All required fields must be provided
* **Unique Name** - Data source name must be unique
* **Connection** - System attempts to connect to source object
* **Privileges** - Verifies required privileges are granted

### Error Handling

Connection Failures:

* System displays specific error message
* Provides SQL statements to grant required privileges
* Allows correction and retry

Validation Errors:

* Missing required fields are highlighted
* Error messages explain what's needed
* Sidebar shows list of missing fields

Duplicate Names:

* Error message indicates duplicate name
* Must choose a different name

***

## Adding via SQL File

Adding data sources via a SQL file enables programmatic creation using stored procedures. This method is well suited to bulk operations, automation, and integration with other systems.

### Stored Procedure: `core.add_datasource_with_scan`

The primary stored procedure for adding data sources, with an optional initial scan.

Signature:

```sql
CALL core.add_datasource_with_scan(
    DATASOURCE_NAME string,
    DATASOURCE_TYPE string,
    DATASOURCE_FORMAT string,
    ATTRIBUTE_CREATE_TYPE string,
    DATASOURCE_SCHEMA_SAMPLE string,
    SOURCE_DATA_PLATFORM string,
    OBJECT_TYPE string,
    OBJECT_DATABASE_NAME string,
    OBJECT_SCHEMA_NAME string,
    OBJECT_NAME string,
    OBJECT_COLUMN_DATA_TYPE string,
    OBJECT_COLUMN_NAME string,
    INITIATE_SCAN boolean,
    SCAN_WAREHOUSE_NAME string,
    SCAN_RECORD_LIMIT int,
    BASE_SCAN_WAREHOUSE_THREAD_COUNT int,
    BASE_SCAN_RECORDS_PER_MINUTE int,
    DATASOURCE_TAG string
)
```

### Parameters

Required Parameters:

* `DATASOURCE_NAME` - Unique name for the data source
* `DATASOURCE_TYPE` - `'Semi-Structured'` or `'Structured'`
* `DATASOURCE_FORMAT` - Format type: `'JSON'`, `'Avro'`, `'Parquet'`, `'ORC'`, or `'XML'` (Semi-Structured only)
* `ATTRIBUTE_CREATE_TYPE` - `'Discover'` (scan data) or `'Schema'` (use schema sample)
* `OBJECT_TYPE` - `'Table'`, `'View'`, `'External Table'`, `'Dynamic Table'`, `'Materialized View'`, or `'Iceberg Table'`
* `OBJECT_DATABASE_NAME` - Database name containing the source object
* `OBJECT_SCHEMA_NAME` - Schema name containing the source object
* `OBJECT_NAME` - Name of the source object
* `OBJECT_COLUMN_DATA_TYPE` - `'Variant'` or `'String'` (required for Semi-Structured)
* `OBJECT_COLUMN_NAME` - Column name (required for Semi-Structured)

Optional Parameters:

* `DATASOURCE_SCHEMA_SAMPLE` - JSON/XML schema sample (for Schema attribute creation type)
* `SOURCE_DATA_PLATFORM` - `'Kafka'` or `'DynamoDB'` (for streaming platforms)
* `INITIATE_SCAN` - `TRUE` to perform initial scan, `FALSE` to skip
* `SCAN_WAREHOUSE_NAME` - Warehouse name for initial scan
* `SCAN_RECORD_LIMIT` - Number of records to scan (0 = unlimited, default 150 for quick scan)
* `BASE_SCAN_WAREHOUSE_THREAD_COUNT` - Thread count for baseline (default 8)
* `BASE_SCAN_RECORDS_PER_MINUTE` - Records per minute for baseline (default 10000)
* `DATASOURCE_TAG` - Optional tags for organization

### Example: Basic Semi-Structured Data Source

```sql
DECLARE 
    datasource_name VARCHAR DEFAULT 'PRODUCTION.MEDICAL_DEVICE.UDI_JSON'; 
    object_type VARCHAR DEFAULT 'Table'; 
    database_name VARCHAR DEFAULT 'PRODUCTION'; 
    schema_name VARCHAR DEFAULT 'MEDICAL_DEVICE'; 
    object_name VARCHAR DEFAULT 'UDI_JSON'; 
    column_name VARCHAR DEFAULT 'JSON_DATA'; 
    datasource_tag VARCHAR DEFAULT ''; 
    warehouse_name VARCHAR DEFAULT 'PANCAKE_X_SMALL_01'; 
    record_limit INT DEFAULT 150; 
    initiate_scan BOOLEAN DEFAULT TRUE; 
    call_result VARCHAR;
BEGIN 
    CALL PANCAKE.core.add_datasource_with_scan( 
        :datasource_name,
        'Semi-Structured',
        'JSON',
        'Discover',
        '',
        '',
        :object_type,
        :database_name,
        :schema_name,
        :object_name,
        'Variant',
        :column_name,
        :initiate_scan,
        :warehouse_name,
        :record_limit,
        8,
        10000,
        :datasource_tag
    ) INTO :call_result;
    
    RETURN :call_result;
END;
```

### Example: Full Scan on Creation

```sql
DECLARE 
    datasource_name VARCHAR DEFAULT 'PRODUCTION.MEDICAL_DEVICE.UDI_JSON'; 
    warehouse_name VARCHAR DEFAULT 'PANCAKE_MEDIUM_OPTIMIZED_01'; 
    record_limit INT DEFAULT 0; -- 0 = unlimited
    call_result VARCHAR;
BEGIN 
    CALL PANCAKE.core.add_datasource_with_scan( 
        :datasource_name,
        'Semi-Structured',
        'JSON',
        'Discover',
        '',
        '',
        'Table',
        'PRODUCTION',
        'MEDICAL_DEVICE',
        'UDI_JSON',
        'Variant',
        'JSON_DATA',
        TRUE,
        :warehouse_name,
        :record_limit,
        8,
        10000,
        ''
    ) INTO :call_result;
    
    RETURN :call_result;
END;
```

### Example: Using Schema Sample

```sql
DECLARE 
    schema_sample VARCHAR DEFAULT '{"id": "string", "name": "string", "items": [{"item_id": "string"}]}';
    call_result VARCHAR;
BEGIN 
    CALL PANCAKE.core.add_datasource_with_scan( 
        'PRODUCTION.MEDICAL_DEVICE.UDI_JSON',
        'Semi-Structured',
        'JSON',
        'Schema', -- Use schema sample
        :schema_sample,
        '',
        'Table',
        'PRODUCTION',
        'MEDICAL_DEVICE',
        'UDI_JSON',
        'Variant',
        'JSON_DATA',
        FALSE, -- Don't initiate scan
        '',
        0,
        8,
        10000,
        ''
    ) INTO :call_result;
    
    RETURN :call_result;
END;
```

### Example: Structured Data Source

```sql
DECLARE 
    call_result VARCHAR;
BEGIN 
    CALL PANCAKE.core.add_datasource_with_scan( 
        'PRODUCTION.CUSTOMER_DATA.ORDERS_TABLE',
        'Structured',
        '', -- No format for structured
        'Discover',
        '',
        '',
        'Table',
        'PRODUCTION',
        'CUSTOMER_DATA',
        'ORDERS_TABLE',
        '', -- No column data type for structured
        '', -- No column name for structured
        TRUE,
        'PANCAKE_X_SMALL_01',
        150,
        8,
        10000,
        ''
    ) INTO :call_result;
    
    RETURN :call_result;
END;
```

### Return Values

The stored procedure returns a string indicating success or failure:

* Success: Returns success message
* Error: Returns error message with details
* Connection Failed: Returns connection error with privilege grant statements
* Duplicate Name: Returns error indicating duplicate data source name
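
Because the result is a plain string, callers that need to branch on the outcome must inspect the text. A minimal sketch follows; the exact wording of the messages is not specified here, so the `'error'` check is an assumption you should adjust to the actual message text:

```sql
DECLARE
    call_result VARCHAR;
BEGIN
    CALL PANCAKE.core.add_datasource_with_scan(
        'PRODUCTION.MEDICAL_DEVICE.UDI_JSON', 'Semi-Structured', 'JSON', 'Discover',
        '', '', 'Table', 'PRODUCTION', 'MEDICAL_DEVICE', 'UDI_JSON',
        'Variant', 'JSON_DATA', FALSE, '', 0, 8, 10000, ''
    ) INTO :call_result;

    -- Assumption: failure messages contain the word 'error'
    IF (CONTAINS(LOWER(:call_result), 'error')) THEN
        RETURN 'Add failed: ' || :call_result;
    END IF;
    RETURN :call_result;
END;
```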

### Bulk Data Source Creation

You can use SQL scripts to find and create data sources in bulk:

```sql
-- Example: Find all tables with VARIANT columns and generate add statements
SELECT 
    'CALL PANCAKE.core.add_datasource_with_scan(' ||
    '''' || table_catalog || '.' || table_schema || '.' || table_name || ''',' ||
    '''Semi-Structured'',' ||
    '''JSON'',' ||
    '''Discover'',' ||
    '''',' ||
    '''',' ||
    '''Table'',' ||
    '''' || table_catalog || ''',' ||
    '''' || table_schema || ''',' ||
    '''' || table_name || ''',' ||
    '''Variant'',' ||
    '''' || column_name || ''',' ||
    'TRUE,' ||
    '''PANCAKE_X_SMALL_01'',' ||
    '150,' ||
    '8,' ||
    '10000,' ||
    ''''');' AS add_statement
FROM information_schema.columns  -- note: covers only the current database
WHERE data_type = 'VARIANT'
AND table_schema NOT LIKE 'PANCAKE%';
```
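
The query above only generates the `CALL` statements; to run them in a single pass, you can loop over the generated text with a Snowflake Scripting cursor and `EXECUTE IMMEDIATE`. A sketch, reusing the generation query and the same warehouse and default assumptions as above:

```sql
DECLARE
    stmt VARCHAR;
    c1 CURSOR FOR
        SELECT 'CALL PANCAKE.core.add_datasource_with_scan(''' ||
               table_catalog || '.' || table_schema || '.' || table_name ||
               ''',''Semi-Structured'',''JSON'',''Discover'','''','''',''Table'',''' ||
               table_catalog || ''',''' || table_schema || ''',''' || table_name ||
               ''',''Variant'',''' || column_name ||
               ''',TRUE,''PANCAKE_X_SMALL_01'',150,8,10000,'''')' AS add_statement
        FROM information_schema.columns  -- covers only the current database
        WHERE data_type = 'VARIANT'
        AND table_schema NOT LIKE 'PANCAKE%';
BEGIN
    FOR rec IN c1 DO
        stmt := rec.add_statement;
        EXECUTE IMMEDIATE :stmt;  -- one CALL per discovered VARIANT column
    END FOR;
    RETURN 'Bulk add complete';
END;
```

Each iteration performs a quick scan (150 records), so the loop's cost grows linearly with the number of VARIANT columns found; set the scan flag to `FALSE` in the generated text if you prefer to scan later.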

***

## Initial Quick Scan

When adding a data source, you can optionally perform an initial "quick scan" to discover the schema immediately.

### Quick Scan Configuration

Default Settings:

* Record Limit: 150 records
* Warehouse: `PANCAKE_X_SMALL_01` (or a specified warehouse)
* Attribute Create Type: Discover (scans actual data)

### Quick Scan Benefits

* Immediate Schema Discovery — See schema structure right away
* Fast Onboarding — Quick validation of data source setup
* Cost Effective — Small record limit keeps costs low
* Baseline Setup — Establishes initial attribute metadata

### When to Skip Initial Scan

Skip the initial scan when:

* You have a schema sample and want to use Schema attribute creation type
* You want to configure materialization settings first
* You're creating data sources in bulk and will scan later
* You want to review configuration before scanning

### Full Scan During Creation

To perform a full scan when creating a data source:

* Set `SCAN_RECORD_LIMIT` to `0` (unlimited) or a number greater than total records
* Use a Medium or larger warehouse (e.g., `PANCAKE_MEDIUM_OPTIMIZED_01`)
* Ensure warehouse has sufficient resources for the data size

Note: Full scans during creation may take significant time and cost. Consider using quick scan first, then performing full scans via scan configurations.

***

## Troubleshooting

<details>

<summary>Connection Failed</summary>

* Verify privileges are granted correctly
* Check object exists and is accessible
* For shared databases, use `GRANT IMPORTED PRIVILEGES`
* Verify database and schema names are correct

</details>

<details>

<summary>Duplicate Name</summary>

* Choose a different, unique name
* Check existing data sources for similar names
* Use more specific naming with database/schema context

</details>

<details>

<summary>Invalid Format</summary>

* Verify format type matches actual data format
* Check column data type (VARIANT vs String)
* For String columns, ensure parsing expression is correct

</details>

<details>

<summary>Scan Failures</summary>

* Verify warehouse exists and is accessible
* Check warehouse size is appropriate for data volume
* Review scan error messages for specific issues
* Start with smaller record limits

</details>

***

## Summary

Methods:

* Application Interface (guided experience)
* SQL file (programmatic/bulk)

Requirements:

* Appropriate Snowflake privileges
* Valid source object
* Unique data source name

Initial Scan:

* Optional quick scan (150 records default)
* Can skip and scan later
* Can perform full scan on creation

**Note:** Start with quick scan (default 150 records), then configure and perform full scans via scan configurations.

For information on data source types, see [Data Source Types](https://docs.datapancake.com/core-concepts/data-sources/data-source-types). For basic configuration, see [Basic Configuration Settings](https://docs.datapancake.com/core-concepts/data-sources/basic-configuration-settings).
