Source Object Settings
Configuration for the Snowflake database object that serves as the data source.
Overview
Source Object Settings define the connection to the actual Snowflake object containing your data. These settings specify the database, schema, object, and column (for semi-structured data) that DataPancake will scan and process. Additional settings configure platform-specific options for streaming data sources.
Object Type Selection
The type of Snowflake object that contains your data. DataPancake uses this to construct proper SQL queries and grant statements.
Supported Object Types
For Semi-Structured Data Sources:
Dynamic Table - VARIANT columns only
External Table - JSON format only
Iceberg Table - VARIANT or STRING columns, JSON format only (STRING type supports JSON only, not XML)
Materialized View - VARIANT columns
Table - VARIANT columns
View - VARIANT columns
For Structured Data Sources:
Materialized View
Table
View
Database, Schema, and Object Selection
The three-part identifier for locating your source object in Snowflake.
Database Selection
Dropdown populated from databases accessible to your user (requires USAGE privilege)
Case-sensitive names are automatically quoted
Selection required before schema selection
Schema Selection
Dropdown populated after database selection
Shows schemas in selected database (requires USAGE privilege)
Refreshes when database changes
Case-sensitive names are automatically quoted
Object Name Selection
Dropdown populated after schema selection
Shows objects of selected type in selected schema
Only objects matching selected Object Type are shown
Refreshes when schema or object type changes
Requires SELECT privilege on object (for semi-structured) or appropriate privileges
Case-sensitive names are automatically quoted
Column Configuration (Semi-Structured Only)
For semi-structured data sources, you must specify the column containing the VARIANT or String data.
Column Name
Requirements:
Column must exist in the source object
Column must be of type VARIANT or String (VARCHAR/TEXT)
Required field for semi-structured data sources
Input:
Text field for column name
Case-sensitive if column name is quoted in Snowflake
Can include parsing expressions for String columns
Examples:
Simple column:
JSON_DATAQuoted column:
"JsonData"Parsing expression:
parse_json(RECORD_CONTENT:"fullDocument")
Column Data Type
VARIANT (Recommended)
Native Snowflake semi-structured type
Automatically parses JSON, Avro, Parquet, ORC, XML
Best performance
No parsing expressions needed
String
Text column containing JSON/XML strings
Requires parsing functions in Column Name field
Useful for streaming platforms
For Iceberg tables: STRING type supports JSON only (not XML)
Parsing Expressions (String columns only):
Enter parsing expressions in the Column Name field when using String column type. Example for Kafka/MongoDB:
Format Type (Semi-Structured Only)
The data format stored in the VARIANT or String column. Format type must match actual data format—incorrect format selection may cause scan failures.
Supported Formats
JSON - Standard JSON format
Avro - Apache Avro binary format
Parquet - Columnar storage format
ORC - Optimized Row Columnar format
XML - XML document format
Format Restrictions:
External Tables: JSON format only
Iceberg Tables: JSON format only (STRING type supports JSON only, not XML)
Schema Sample (Semi-Structured Only)
Optional JSON or XML document representing the complete schema of your data source.
Purpose:
Use "Schema" attribute create type in scan configurations (skips data scanning)
Faster initial setup for prototyping and testing
Can be updated when full scans are performed
Requirements:
Valid JSON or XML document
Should represent complete schema structure with all attributes and nested structures
Can be a single representative record
Usage: When Attribute Create Type is set to "Schema" in scan configurations, the system uses the schema sample instead of scanning data.
Platform/Application Settings (Semi-Structured Only)
Configuration for data sources from streaming platforms or document databases.
Source Platform/Application
Supported Platforms:
Kafka — Apache Kafka streaming platform
DynamoDB — Amazon DynamoDB document database
Other — Custom platforms can be configured
Purpose:
Identifies data source platform for specialized handling
Enables platform-specific features (e.g., stream message metadata inclusion)
Selection:
Dropdown of available platforms
Optional field (can be left blank)
Only relevant for streaming/document database sources
Data Deduplication
Toggle: Data Deduplication (By Primary Key)
When Enabled:
Creates two dynamic tables:
First table: Flattens root attributes
Second table: Filters to most recent record per primary key
Requirements:
SQL Code Generation feature must be enabled
Deduplication SQL Expression must be provided
Use Cases:
Streaming data sources with duplicate records
Need to maintain only latest record per key
Deduplication SQL Expression
Format:
QUALIFY clause with ROW_NUMBER() window function
PARTITION BY primary key
ORDER BY sort field (typically timestamp or offset)
Example — Kafka (with Stream Metadata):
Example — Timestamp-based:
Requirements:
Required when Data Deduplication is enabled
Must use valid Snowflake SQL syntax
Should reference attributes available in flattened data
Source Object Validation
The system validates source object connection when creating new data sources, saving settings, or before initiating scans.
Validation Checks:
Object exists and is accessible
Required privileges are granted
Column exists (for semi-structured)
Object type matches selection
Error Handling:
Connection Failed: System displays specific error message and provides SQL statements to grant privileges
Object Not Found: Error indicates object doesn't exist—verify database, schema, and object names
Column Not Found: Error indicates column doesn't exist—verify column name spelling and data type
Summary
Required Settings:
Object Type
Database
Schema
Object Name
Column Name (semi-structured)
Format Type (semi-structured)
Column Data Type (semi-structured)
Optional Settings:
Schema Sample
Source Platform/Application
Data Deduplication
Deduplication SQL Expression
For information on data source types, see Data Source Types. For materialization settings, see Materialization Settings.
Last updated
Was this helpful?