Data Sources

This page introduces data sources and serves as the navigation hub for all data source documentation.

Overview

Data sources connect to Snowflake database objects and define what data to scan, how to process it, and which materialized objects to generate. A data source is the entry point for all DataPancake operations, from schema discovery through code generation and materialization.

Key Concepts:

  1. Source Object Connection - Links to a specific Snowflake database object and column

  2. Data Type Classification - Semi-Structured (VARIANT/String columns) or Structured (relational)

  3. Product Tier Management - Enable specific features per data source

  4. Materialization Configuration - Define output objects (dynamic tables, views, etc.)

  5. Schema Transformation - Apply consolidations and filters during processing

  6. Baseline Performance - Track scan performance metrics for estimation


Quick Reference

Essential Settings:

  • Data Source Name - Unique identifier (uniqueness is checked case-insensitively)

  • Data Source Type - Semi-Structured or Structured

  • Source Object - Database, schema, object, and column reference

  • Status - Active, Inactive, or Deleted

For Semi-Structured Data Sources:

  • Format Type - JSON, Avro, Parquet, ORC, or XML

  • Column Data Type - VARIANT or String

  • Column Name - Name of the VARIANT/String column (can include parsing expressions for String type)
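For a String column, the column name can embed a parsing expression so the scanner receives semi-structured data rather than raw text. A minimal sketch using Snowflake's built-in PARSE_JSON function (the database, table, and column names are illustrative):

```sql
-- VARIANT column: reference the column directly
SELECT raw_payload
FROM my_db.my_schema.events;

-- String column: wrap the column in a parsing expression so the
-- JSON text is interpreted as semi-structured data during the scan
SELECT PARSE_JSON(json_text)
FROM my_db.my_schema.events_raw;
```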

Product Tiers:

  • Schema Discovery - Always enabled (free tier)

  • Pipeline Designer - Foundation for all paid features (enabled automatically when any other paid feature is enabled)

  • SQL Code Generation - Enables materialization code generation (semi-structured only)

  • Additional Features - Data Dictionary Builder, Security Policy Integration, Semantic Model Generator

Materialization (Semi-Structured Only):

  • Output Object Type - Dynamic Table or Table

  • Root Table Name - Prefix for all generated objects (required when SQL Code Generation enabled)

  • Deployment Location - Database and schema for output objects (defaults to source location if not specified)

  • Dynamic Table Settings - Warehouse, target lag, optional parameters (required for Dynamic Table type)
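The settings above map onto standard Snowflake dynamic table DDL: the root table name prefixes the object name, the deployment location supplies the database and schema, and the warehouse and target lag become refresh parameters. A sketch of what a deployed object might look like (object and column names are illustrative, not DataPancake's actual generated output):

```sql
CREATE OR REPLACE DYNAMIC TABLE analytics_db.marts.root_orders
  TARGET_LAG = '15 minutes'  -- maximum staleness before a refresh
  WAREHOUSE  = transform_wh  -- warehouse that runs the refresh
AS
SELECT
  src.raw_payload:order_id::NUMBER AS order_id,  -- VARIANT path traversal
  src.raw_payload:status::STRING   AS status
FROM source_db.raw.orders AS src;
```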


Data Source Lifecycle

  1. Creation - Add data source via application interface or SQL worksheet (core.add_datasource_with_scan)

  2. Configuration - Set product tiers, materialization settings, and transformations

  3. Initial Scan - Perform quick scan (default 150 records) to discover schema (optional)

  4. Full Scan - Execute comprehensive scan using scan configurations

  5. Code Generation - Generate SQL code for materialized objects (requires SQL Code Generation feature)

  6. Deployment - Execute generated SQL to create output objects

  7. Maintenance - Update settings, add transformations, monitor performance
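The creation step can also be scripted from a SQL worksheet via the core.add_datasource_with_scan procedure named above. Its parameters are not documented on this page, so the argument shown below is a hypothetical placeholder; consult the procedure's actual signature before use:

```sql
-- Hypothetical invocation; the argument list is a placeholder,
-- not the procedure's documented signature.
CALL core.add_datasource_with_scan('orders_events' /* , ... */);
```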

For detailed information on scanning data sources, see the Scan Configurations documentation. For information on generated attributes, see the Attribute Metadata documentation.
