# Source Object Settings

Configuration for the Snowflake database object that serves as the data source.

## Overview

Source Object Settings define the connection to the actual Snowflake object containing your data. These settings specify the database, schema, object, and column (for semi-structured data) that DataPancake will scan and process. Additional settings configure platform-specific options for streaming data sources.

***

## Object Type Selection

The type of Snowflake object that contains your data. DataPancake uses this to construct proper SQL queries and grant statements.

### Supported Object Types

**For Semi-Structured Data Sources:**

* **Dynamic Table** - VARIANT columns only
* **External Table** - JSON format only
* **Iceberg Table** - VARIANT or STRING columns, JSON format only
* **Materialized View** - VARIANT columns
* **Table** - VARIANT columns
* **View** - VARIANT columns

**For Structured Data Sources:**

* **Materialized View**
* **Table**
* **View**

***

## Database, Schema, and Object Selection

The three-part identifier for locating your source object in Snowflake.

### Database Selection

* Dropdown populated from databases accessible to your user (requires USAGE privilege)
* Case-sensitive names are automatically quoted
* Selection required before schema selection

### Schema Selection

* Dropdown populated after database selection
* Shows schemas in selected database (requires USAGE privilege)
* Refreshes when database changes
* Case-sensitive names are automatically quoted

### Object Name Selection

* Dropdown populated after schema selection
* Shows objects of selected type in selected schema
* Only objects matching selected Object Type are shown
* Refreshes when schema or object type changes
* Requires SELECT privilege on the object
* Case-sensitive names are automatically quoted
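The privilege requirements above can be granted with standard Snowflake GRANT statements. A minimal sketch, assuming a hypothetical role `DATAPANCAKE_ROLE` and a source table `MY_DB.RAW.EVENTS` (all names are placeholders):

```sql
-- Role, database, schema, and table names are illustrative placeholders
GRANT USAGE ON DATABASE MY_DB TO ROLE DATAPANCAKE_ROLE;
GRANT USAGE ON SCHEMA MY_DB.RAW TO ROLE DATAPANCAKE_ROLE;
GRANT SELECT ON TABLE MY_DB.RAW.EVENTS TO ROLE DATAPANCAKE_ROLE;
```

If the object is a view or materialized view, grant SELECT on that object type instead of TABLE.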

***

## Column Configuration (Semi-Structured Only)

For semi-structured data sources, you must specify the column containing the VARIANT or String data.

### Column Name

Requirements:

* Column must exist in the source object
* Column must be of type VARIANT or String (VARCHAR/TEXT)
* Required field for semi-structured data sources

Input:

* Text field for column name
* Case-sensitive if column name is quoted in Snowflake
* Can include parsing expressions for String columns

Examples:

* Simple column: `JSON_DATA`
* Quoted column: `"JsonData"`
* Parsing expression: `parse_json(RECORD_CONTENT:"fullDocument")`

### Column Data Type

* **VARIANT** (Recommended)
  * Native Snowflake semi-structured type
  * Automatically parses JSON, Avro, Parquet, ORC, XML
  * Best performance
  * No parsing expressions needed
* **String**
  * Text column containing JSON/XML strings
  * Requires parsing functions in Column Name field
  * Useful for streaming platforms
  * For Iceberg tables: STRING type supports JSON only (not XML)

**Parsing Expressions (String columns only):**

Enter parsing expressions in the Column Name field when using String column type. Example for Kafka/MongoDB:

```sql
parse_json(RECORD_CONTENT:"fullDocument")
```
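For XML payloads stored in a String column (supported on non-Iceberg object types), the analogous expression would use Snowflake's `PARSE_XML` function. Snowflake also provides `TRY_PARSE_JSON`, which returns NULL instead of raising an error on malformed rows; the column name `RECORD_CONTENT` below is illustrative:

```sql
-- XML stored as text (non-Iceberg sources)
parse_xml(RECORD_CONTENT)

-- Tolerant JSON parsing: malformed rows yield NULL rather than a scan error
try_parse_json(RECORD_CONTENT)
```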

***

## Format Type (Semi-Structured Only)

The data format stored in the VARIANT or String column. Format type must match actual data format—incorrect format selection may cause scan failures.

### Supported Formats

* **JSON** - Standard JSON format
* **Avro** - Apache Avro binary format
* **Parquet** - Columnar storage format
* **ORC** - Optimized Row Columnar format
* **XML** - XML document format

**Format Restrictions:**

* External Tables: JSON format only
* Iceberg Tables: JSON format only, for both VARIANT and STRING columns

***

## Schema Sample (Semi-Structured Only)

Optional JSON or XML document representing the complete schema of your data source.

**Purpose:**

* Enables the "Schema" Attribute Create Type in scan configurations (skips data scanning)
* Faster initial setup for prototyping and testing
* Can be updated when full scans are performed

**Requirements:**

* Valid JSON or XML document
* Should represent complete schema structure with all attributes and nested structures
* Can be a single representative record

**Usage:** When Attribute Create Type is set to "Schema" in scan configurations, the system uses the schema sample instead of scanning data.
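A schema sample can be a single representative record, as long as it exercises every attribute and nesting level present in the data. A hypothetical example for a customer-event source:

```json
{
  "customer_id": "C-1001",
  "event_timestamp": "2024-01-15T09:30:00Z",
  "order": {
    "order_id": 555,
    "items": [
      { "sku": "ABC-123", "quantity": 2, "price": 19.99 }
    ]
  },
  "tags": ["new", "priority"]
}
```

Attributes omitted from the sample will not be discovered until a full data scan is performed.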

***

## Platform/Application Settings (Semi-Structured Only)

Configuration for data sources from streaming platforms or document databases.

### Source Platform/Application

**Supported Platforms:**

* Kafka — Apache Kafka streaming platform
* DynamoDB — Amazon DynamoDB document database
* Other — Custom platforms can be configured

**Purpose:**

* Identifies data source platform for specialized handling
* Enables platform-specific features (e.g., stream message metadata inclusion)

**Selection:**

* Dropdown of available platforms
* Optional field (can be left blank)
* Only relevant for streaming/document database sources

### Data Deduplication

**Toggle:** Data Deduplication (By Primary Key)

**When Enabled:**

* Creates two dynamic tables:
  1. First table: Flattens root attributes
  2. Second table: Filters to most recent record per primary key

**Requirements:**

* SQL Code Generation feature must be enabled
* Deduplication SQL Expression must be provided

**Use Cases:**

* Streaming data sources with duplicate records
* Need to maintain only latest record per key
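The two generated dynamic tables can be sketched roughly as follows. This is an illustration only, not DataPancake's actual generated SQL; the table names, source column (`JSON_DATA`), warehouse, and lag values are all hypothetical:

```sql
-- Sketch only: names, TARGET_LAG, and warehouse are placeholders
-- Table 1: flatten root attributes from the source column
CREATE OR REPLACE DYNAMIC TABLE EVENTS_FLATTENED
  TARGET_LAG = '5 minutes' WAREHOUSE = MY_WH AS
SELECT
  JSON_DATA:"customer_id"::STRING             AS customer_id,
  JSON_DATA:"event_timestamp"::TIMESTAMP_NTZ  AS event_timestamp,
  JSON_DATA                                   AS raw_record
FROM MY_DB.RAW.EVENTS;

-- Table 2: keep only the most recent record per primary key
CREATE OR REPLACE DYNAMIC TABLE EVENTS_DEDUPED
  TARGET_LAG = '5 minutes' WAREHOUSE = MY_WH AS
SELECT *
FROM EVENTS_FLATTENED
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_timestamp DESC) = 1;
```

The QUALIFY clause in the second table is where the Deduplication SQL Expression (described below) is applied.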

### Deduplication SQL Expression

**Format:**

* QUALIFY clause with ROW\_NUMBER() window function
* PARTITION BY primary key
* ORDER BY sort field (typically timestamp or offset)

**Example — Kafka (with Stream Metadata):**

```sql
QUALIFY ROW_NUMBER() OVER (PARTITION BY message_key ORDER BY message_offset DESC) = 1
```

**Example — Timestamp-based:**

```sql
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_timestamp DESC) = 1
```

**Requirements:**

* Required when Data Deduplication is enabled
* Must use valid Snowflake SQL syntax
* Should reference attributes available in flattened data

***

## Source Object Validation

The system validates source object connection when creating new data sources, saving settings, or before initiating scans.

**Validation Checks:**

* Object exists and is accessible
* Required privileges are granted
* Column exists (for semi-structured)
* Object type matches selection

**Error Handling:**

* Connection Failed: System displays specific error message and provides SQL statements to grant privileges
* Object Not Found: Error indicates object doesn't exist—verify database, schema, and object names
* Column Not Found: Error indicates column doesn't exist—verify column name spelling and data type
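When troubleshooting these errors, standard Snowflake metadata commands can confirm that the object and column exist as named (the database, schema, and table names below are placeholders):

```sql
-- Confirm the object exists under the expected database and schema
SHOW TABLES LIKE 'EVENTS' IN SCHEMA MY_DB.RAW;

-- Confirm the column name and data type (case matters for quoted identifiers)
DESCRIBE TABLE MY_DB.RAW.EVENTS;
```

Note that an unquoted identifier is stored uppercase in Snowflake, so a column created as `"JsonData"` will not match a reference to `JSONDATA`.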

***

## Summary

**Required Settings:**

* Object Type
* Database
* Schema
* Object Name
* Column Name (semi-structured)
* Format Type (semi-structured)
* Column Data Type (semi-structured)

**Optional Settings:**

* Schema Sample
* Source Platform/Application
* Data Deduplication
* Deduplication SQL Expression

For information on data source types, see [Data Source Types](https://docs.datapancake.com/core-concepts/data-sources/data-source-types). For materialization settings, see [Materialization Settings](https://docs.datapancake.com/core-concepts/data-sources/materialization-settings).
