# Data Sources

### Overview

Data sources connect to Snowflake database objects and define what data to scan, how to process it, and what materialized objects to generate. Data sources are the entry point for all DataPancake operations, from schema discovery through code generation and materialization.

**Key Concepts:**

1. **Source Object Connection** - Links to a specific Snowflake database object and column
2. **Data Type Classification** - Semi-Structured (VARIANT/String columns) or Structured (relational)
3. **Product Tier Management** - Enable specific features per data source
4. **Materialization Configuration** - Define output objects (dynamic tables, views, etc.)
5. **Schema Transformation** - Apply consolidations and filters during processing
6. **Baseline Performance** - Track scan performance metrics for estimation

***

### What You'll Learn

This Data Sources section covers:

* [**Data Source Types**](https://docs.datapancake.com/core-concepts/data-sources/data-source-types) - Semi-structured and structured data sources, format types, and column data types
* [**Adding Data Sources**](https://docs.datapancake.com/core-concepts/data-sources/adding-data-sources) - Adding via application interface or SQL file, privileges, and connection validation
* [**Basic Configuration Settings**](https://docs.datapancake.com/core-concepts/data-sources/basic-configuration-settings) - Name, type, status, tags, and connection status
* [**Product Tiers & Features**](https://docs.datapancake.com/core-concepts/data-sources/product-tiers-and-features) - Feature tiers from Schema Discovery to Semantic Model Generator
* [**Source Object Settings**](https://docs.datapancake.com/core-concepts/data-sources/source-object-settings) - Object selection, column configuration, schema samples, and platform settings
* [**Materialization Settings**](https://docs.datapancake.com/core-concepts/data-sources/materialization-settings) - Output object types, naming, deployment location, and configuration
* [**Dynamic Table Settings**](https://docs.datapancake.com/core-concepts/data-sources/dynamic-table-settings) - Warehouse assignment, target lag, refresh modes, and metadata tables
* [**Secure View Settings**](https://docs.datapancake.com/core-concepts/data-sources/secure-view-settings) - View types, semantic layer, and row access policy integration
* [**Schema Transformations**](https://docs.datapancake.com/core-concepts/data-sources/schema-transformations) - Consolidation rules, transformation types, and data type transformations
* [**Schema Filters**](https://docs.datapancake.com/core-concepts/data-sources/schema-filters) - Filter configuration, regular expressions, and attribute exclusion
* [**Baseline Scan Settings**](https://docs.datapancake.com/core-concepts/data-sources/baseline-scan-settings) - Performance calibration and scan estimation

***

### Quick Reference

**Essential Settings:**

* **Data Source Name** - Unique identifier (case-insensitive uniqueness check)
* **Data Source Type** - Semi-Structured or Structured
* **Source Object** - Database, schema, object, and column reference
* **Status** - Active, Inactive, or Deleted
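
The case-insensitive uniqueness check on the data source name can be pictured with a minimal Python sketch. The function name `is_name_available` and the sample names are hypothetical and are not part of DataPancake; the sketch only illustrates that names differing by letter case alone are treated as duplicates.

```python
def is_name_available(candidate: str, existing_names: list[str]) -> bool:
    """Return True if no existing name equals `candidate` when case is ignored.

    Hypothetical helper for illustration; the application's actual check
    may differ (for example, in how it treats surrounding whitespace).
    """
    taken = {name.casefold() for name in existing_names}
    return candidate.casefold() not in taken


existing = ["orders_json", "Customer_Events"]
print(is_name_available("ORDERS_JSON", existing))  # differs only by case
print(is_name_available("shipments", existing))    # no clash
```

Under this reading, `ORDERS_JSON` would be rejected because `orders_json` already exists, while `shipments` would be accepted.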

**For Semi-Structured Data Sources:**

* **Format Type** - JSON, Avro, Parquet, ORC, or XML
* **Column Data Type** - VARIANT or String
* **Column Name** - Name of the VARIANT/String column (can include parsing expressions for String type)

**Product Tiers:**

* **Schema Discovery** - Always enabled (free tier)
* **Pipeline Designer** - Foundation for all paid features (enabled automatically when any other paid feature is enabled)
* **SQL Code Generation** - Enables materialization code generation (semi-structured only)
* **Additional Features** - Data Dictionary Builder, Security Policy Integration, Semantic Model Generator
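
The tier rules listed above can be sketched in Python. This is an illustrative model only: the function `resolve_enabled_tiers` is hypothetical, and raising an error for SQL Code Generation on a structured data source is an assumption about how the restriction might be enforced, not documented behavior.

```python
# Paid features named in this section; all of them build on Pipeline Designer.
PAID_FEATURES = {
    "SQL Code Generation",
    "Data Dictionary Builder",
    "Security Policy Integration",
    "Semantic Model Generator",
}


def resolve_enabled_tiers(requested: set[str], data_source_type: str) -> set[str]:
    """Return the full set of tiers in effect for a requested selection.

    Hypothetical helper: models the rules in the list above, not the
    application's real implementation.
    """
    # Assumption: an unsupported combination is rejected outright.
    if "SQL Code Generation" in requested and data_source_type != "Semi-Structured":
        raise ValueError("SQL Code Generation applies to semi-structured sources only")

    enabled = set(requested)
    enabled.add("Schema Discovery")        # always enabled (free tier)
    if enabled & PAID_FEATURES:
        enabled.add("Pipeline Designer")   # auto-enabled with any paid feature
    return enabled


print(sorted(resolve_enabled_tiers({"Data Dictionary Builder"}, "Structured")))
```

Requesting only Data Dictionary Builder therefore yields three active tiers in this model: Schema Discovery, Pipeline Designer, and Data Dictionary Builder.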

**Materialization (Semi-Structured Only):**

* **Output Object Type** - Dynamic Table or Table
* **Root Table Name** - Prefix for all generated objects (required when SQL Code Generation is enabled)
* **Deployment Location** - Database and schema for output objects (defaults to source location if not specified)
* **Dynamic Table Settings** - Warehouse, target lag, optional parameters (required for Dynamic Table type)
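
The dependencies between these materialization settings can be sketched as a small validation routine. Everything here is hypothetical: the `MaterializationSettings` class, its field names, and `check_materialization` are illustrative and do not reflect DataPancake's internal model. The sketch also assumes that the deployment database and schema each fall back to the source location independently, which the list above does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaterializationSettings:
    """Hypothetical container; field names are illustrative only."""
    output_object_type: str                  # "Dynamic Table" or "Table"
    root_table_name: Optional[str] = None
    deployment_database: Optional[str] = None
    deployment_schema: Optional[str] = None
    warehouse: Optional[str] = None          # Dynamic Table only
    target_lag: Optional[str] = None         # Dynamic Table only


def check_materialization(
    settings: MaterializationSettings,
    sql_code_generation_enabled: bool,
    source_database: str,
    source_schema: str,
) -> tuple[list[str], tuple[str, str]]:
    """Return (problems, resolved deployment location)."""
    problems: list[str] = []
    if settings.output_object_type not in ("Dynamic Table", "Table"):
        problems.append("Output Object Type must be Dynamic Table or Table")
    if sql_code_generation_enabled and not settings.root_table_name:
        problems.append("Root Table Name is required when SQL Code Generation is enabled")
    if settings.output_object_type == "Dynamic Table":
        if not settings.warehouse:
            problems.append("Dynamic Table output requires a warehouse")
        if not settings.target_lag:
            problems.append("Dynamic Table output requires a target lag")

    # Assumption: each part of the location defaults to the source independently.
    location = (
        settings.deployment_database or source_database,
        settings.deployment_schema or source_schema,
    )
    return problems, location


draft = MaterializationSettings(output_object_type="Dynamic Table")
problems, location = check_materialization(draft, True, "RAW", "EVENTS")
print(problems)
print(location)
```

For the incomplete `draft` above, the routine reports the missing root table name, warehouse, and target lag, and resolves the deployment location to the source database and schema.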

***

### Data Source Lifecycle

1. **Creation** - Add the data source via the application interface or a SQL file (`core.add_datasource_with_scan`)
2. **Configuration** - Set product tiers, materialization settings, and transformations
3. **Initial Scan** - Optionally run a quick scan (150 records by default) to discover the schema
4. **Full Scan** - Execute comprehensive scan using scan configurations
5. **Code Generation** - Generate SQL code for materialized objects (requires SQL Code Generation feature)
6. **Deployment** - Execute generated SQL to create output objects
7. **Maintenance** - Update settings, add transformations, monitor performance

For detailed information on scanning data sources, see the [**Scan Configurations**](https://docs.datapancake.com/core-concepts/scan-configurations) documentation. For information on generated attributes, see the [**Attribute Metadata**](https://docs.datapancake.com/core-concepts/attribute-metadata) documentation.
