Data Sources
An introduction to data sources and a navigation hub for all data source documentation.
Overview
Data sources connect to Snowflake database objects and define what data to scan, how to process it, and what materialized objects to generate. Data sources are the entry point for all DataPancake operations—from schema discovery through code generation and materialization.
Key Concepts:
Source Object Connection - Links to a specific Snowflake database object and column
Data Type Classification - Semi-Structured (VARIANT/String columns) or Structured (relational)
Product Tier Management - Enable specific features per data source
Materialization Configuration - Define output objects (dynamic tables, views, etc.)
Schema Transformation - Apply consolidations and filters during processing
Baseline Performance - Track scan performance metrics for estimation
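For orientation, a typical semi-structured source object is simply a table with a VARIANT column holding raw documents. The placeholder DDL below is illustrative only (none of the names come from DataPancake); it shows the kind of column a semi-structured data source would point at:

-- Illustrative landing table; the PAYLOAD column is what a semi-structured
-- data source would reference.
CREATE TABLE MY_DB.RAW.ORDERS_JSON (
    INGESTED_AT TIMESTAMP_NTZ,
    PAYLOAD     VARIANT          -- raw semi-structured documents (e.g. JSON) to scan
);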
What You'll Learn
This Data Sources section covers:
Data Source Types - Semi-structured and structured data sources, format types, and column data types
Adding Data Sources - Adding via application interface or SQL worksheet, privileges, and connection validation
Basic Configuration Settings - Name, type, status, tags, and connection status
Product Tiers & Features - Feature tiers from Schema Discovery to Semantic Model Generator
Source Object Settings - Object selection, column configuration, schema samples, and platform settings
Materialization Settings - Output object types, naming, deployment location, and configuration
Dynamic Table Settings - Warehouse assignment, target lag, refresh modes, and metadata tables
Secure View Settings - View types, semantic layer, and row access policy integration
Schema Transformations - Consolidation rules, transformation types, and data type transformations
Schema Filters - Filter configuration, regular expressions, and attribute exclusion
Baseline Scan Settings - Performance calibration and scan estimation
Quick Reference
Essential Settings:
Data Source Name - Unique identifier (case-insensitive uniqueness check)
Data Source Type - Semi-Structured or Structured
Source Object - Database, schema, object, and column reference
Status - Active, Inactive, or Deleted
For Semi-Structured Data Sources:
Format Type - JSON, Avro, Parquet, ORC, or XML
Column Data Type - VARIANT or String
Column Name - Name of the VARIANT/String column (can include parsing expressions for String type)
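Where the column data type is String, the parsing expression typically converts the raw text to semi-structured form, for example with Snowflake's TRY_PARSE_JSON. The snippet below is a minimal illustration with placeholder table and column names; confirm the exact form DataPancake expects in Source Object Settings:

-- Raw JSON stored as text; the parsing expression exposes it as VARIANT
SELECT TRY_PARSE_JSON(RAW_PAYLOAD) AS PAYLOAD
FROM MY_DB.RAW.EVENTS_TEXT;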
Product Tiers:
Schema Discovery - Always enabled (free tier)
Pipeline Designer - Foundation for all paid features (auto-enabled with others)
SQL Code Generation - Enables materialization code generation (semi-structured only)
Additional Features - Data Dictionary Builder, Security Policy Integration, Semantic Model Generator
Materialization (Semi-Structured Only):
Output Object Type - Dynamic Table or Table
Root Table Name - Prefix for all generated objects (required when SQL Code Generation is enabled)
Deployment Location - Database and schema for output objects (defaults to source location if not specified)
Dynamic Table Settings - Warehouse, target lag, optional parameters (required for Dynamic Table type)
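To show where the dynamic table settings land, the sketch below uses standard Snowflake dynamic table DDL with placeholder names. It is not the code DataPancake generates, only an assumed general shape: warehouse and target lag map to the Dynamic Table Settings above.

-- Illustrative only; object, column, and warehouse names are placeholders
CREATE OR REPLACE DYNAMIC TABLE MY_DB.ANALYTICS.ORDERS_ROOT
  TARGET_LAG = '15 minutes'      -- target lag setting
  WAREHOUSE  = TRANSFORM_WH      -- warehouse assignment
AS
SELECT
  SRC.PAYLOAD:order_id::NUMBER AS ORDER_ID,
  SRC.PAYLOAD:status::STRING   AS STATUS
FROM MY_DB.RAW.ORDERS_JSON AS SRC;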
Data Source Lifecycle
Creation - Add the data source via the application interface or a SQL worksheet (core.add_datasource_with_scan); see the call sketch after this list
Configuration - Set product tiers, materialization settings, and transformations
Initial Scan - Perform a quick scan (default 150 records) to discover the schema (optional)
Full Scan - Execute comprehensive scan using scan configurations
Code Generation - Generate SQL code for materialized objects (requires SQL Code Generation feature)
Deployment - Execute generated SQL to create output objects
Maintenance - Update settings, add transformations, monitor performance
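As a rough sketch of the Creation step from a SQL worksheet, the call below invokes core.add_datasource_with_scan. The procedure name comes from this page, but the parameter names, order, and values shown are assumptions for illustration only; see Adding Data Sources for the actual signature.

-- Hypothetical arguments; the real parameter list may differ
CALL core.add_datasource_with_scan(
    'orders_raw',              -- data source name (unique, case-insensitive)
    'MY_DB.RAW.ORDERS_JSON',   -- source object
    'PAYLOAD',                 -- VARIANT column to scan
    'JSON'                     -- format type
);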
For detailed information on scanning data sources, see the Scan Configurations documentation. For information on generated attributes, see the Attribute Metadata documentation.