Source Object Settings

Configuration for the Snowflake database object that serves as the data source.

Overview

Source Object Settings define the connection to the actual Snowflake object containing your data. These settings specify the database, schema, object, and column (for semi-structured data) that DataPancake will scan and process. Additional settings configure platform-specific options for streaming data sources.


Object Type Selection

The type of Snowflake object that contains your data. DataPancake uses this to construct proper SQL queries and grant statements.

Supported Object Types

For Semi-Structured Data Sources:

  • Dynamic Table - VARIANT columns only

  • External Table - JSON format only

  • Iceberg Table - VARIANT or STRING columns, JSON format only (STRING type supports JSON only, not XML)

  • Materialized View - VARIANT columns

  • Table - VARIANT columns

  • View - VARIANT columns

For Structured Data Sources:

  • Materialized View

  • Table

  • View


Database, Schema, and Object Selection

The three-part identifier for locating your source object in Snowflake.

Database Selection

  • Dropdown populated from databases accessible to your user (requires USAGE privilege)

  • Case-sensitive names are automatically quoted

  • Selection required before schema selection

Schema Selection

  • Dropdown populated after database selection

  • Shows schemas in selected database (requires USAGE privilege)

  • Refreshes when database changes

  • Case-sensitive names are automatically quoted

Object Name Selection

  • Dropdown populated after schema selection

  • Shows objects of selected type in selected schema

  • Only objects matching selected Object Type are shown

  • Refreshes when schema or object type changes

  • Requires SELECT privilege on object (for semi-structured) or appropriate privileges

  • Case-sensitive names are automatically quoted


Column Configuration (Semi-Structured Only)

For semi-structured data sources, you must specify the column containing the VARIANT or String data.

Column Name

Requirements:

  • Column must exist in the source object

  • Column must be of type VARIANT or String (VARCHAR/TEXT)

  • Required field for semi-structured data sources

Input:

  • Text field for column name

  • Case-sensitive if column name is quoted in Snowflake

  • Can include parsing expressions for String columns

Examples:

  • Simple column: JSON_DATA

  • Quoted column: "JsonData"

  • Parsing expression: parse_json(RECORD_CONTENT:"fullDocument")

Column Data Type

  • VARIANT (Recommended)

    • Native Snowflake semi-structured type

    • Automatically parses JSON, Avro, Parquet, ORC, XML

    • Best performance

    • No parsing expressions needed

  • String

    • Text column containing JSON/XML strings

    • Requires parsing functions in Column Name field

    • Useful for streaming platforms

    • For Iceberg tables: STRING type supports JSON only (not XML)

Parsing Expressions (String columns only):

Enter parsing expressions in the Column Name field when using String column type. Example for Kafka/MongoDB:


Format Type (Semi-Structured Only)

The data format stored in the VARIANT or String column. Format type must match actual data format—incorrect format selection may cause scan failures.

Supported Formats

  • JSON - Standard JSON format

  • Avro - Apache Avro binary format

  • Parquet - Columnar storage format

  • ORC - Optimized Row Columnar format

  • XML - XML document format

Format Restrictions:

  • External Tables: JSON format only

  • Iceberg Tables: JSON format only (STRING type supports JSON only, not XML)


Schema Sample (Semi-Structured Only)

Optional JSON or XML document representing the complete schema of your data source.

Purpose:

  • Use "Schema" attribute create type in scan configurations (skips data scanning)

  • Faster initial setup for prototyping and testing

  • Can be updated when full scans are performed

Requirements:

  • Valid JSON or XML document

  • Should represent complete schema structure with all attributes and nested structures

  • Can be a single representative record

Usage: When Attribute Create Type is set to "Schema" in scan configurations, the system uses the schema sample instead of scanning data.


Platform/Application Settings (Semi-Structured Only)

Configuration for data sources from streaming platforms or document databases.

Source Platform/Application

Supported Platforms:

  • Kafka — Apache Kafka streaming platform

  • DynamoDB — Amazon DynamoDB document database

  • Other — Custom platforms can be configured

Purpose:

  • Identifies data source platform for specialized handling

  • Enables platform-specific features (e.g., stream message metadata inclusion)

Selection:

  • Dropdown of available platforms

  • Optional field (can be left blank)

  • Only relevant for streaming/document database sources

Data Deduplication

Toggle: Data Deduplication (By Primary Key)

When Enabled:

  • Creates two dynamic tables:

    1. First table: Flattens root attributes

    2. Second table: Filters to most recent record per primary key

Requirements:

  • SQL Code Generation feature must be enabled

  • Deduplication SQL Expression must be provided

Use Cases:

  • Streaming data sources with duplicate records

  • Need to maintain only latest record per key

Deduplication SQL Expression

Format:

  • QUALIFY clause with ROW_NUMBER() window function

  • PARTITION BY primary key

  • ORDER BY sort field (typically timestamp or offset)

Example — Kafka (with Stream Metadata):

Example — Timestamp-based:

Requirements:

  • Required when Data Deduplication is enabled

  • Must use valid Snowflake SQL syntax

  • Should reference attributes available in flattened data


Source Object Validation

The system validates source object connection when creating new data sources, saving settings, or before initiating scans.

Validation Checks:

  • Object exists and is accessible

  • Required privileges are granted

  • Column exists (for semi-structured)

  • Object type matches selection

Error Handling:

  • Connection Failed: System displays specific error message and provides SQL statements to grant privileges

  • Object Not Found: Error indicates object doesn't exist—verify database, schema, and object names

  • Column Not Found: Error indicates column doesn't exist—verify column name spelling and data type


Summary

Required Settings:

  • Object Type

  • Database

  • Schema

  • Object Name

  • Column Name (semi-structured)

  • Format Type (semi-structured)

  • Column Data Type (semi-structured)

Optional Settings:

  • Schema Sample

  • Source Platform/Application

  • Data Deduplication

  • Deduplication SQL Expression

For information on data source types, see Data Source Types. For materialization settings, see Materialization Settings.

Last updated

Was this helpful?