# Common Patterns & Best Practices

## Common Configuration Patterns

**Quick Onboarding Scan:**

* `ATTRIBUTE_CREATE_TYPE`: `'Schema'` (if schema sample available) or `'Discover'`
* `SCAN_RECORD_LIMIT`: `10,000` (or small sample)
* `PROCEDURE_INSTANCE_COUNT`: `1`
* Number of Threads: Default (maximum)
* `CODE_GENERATE_ON_VERSION_CHANGE`: `FALSE` (review first)
* `MONITOR_ENABLED`: `FALSE`
* Use case: Initial data source setup, testing, quick schema validation

**Full Production Scan:**

* `ATTRIBUTE_CREATE_TYPE`: `'Discover'`
* `SCAN_RECORD_LIMIT`: `0` (unlimited)
* `PROCEDURE_INSTANCE_COUNT`: `1` (or multiple if needed)
* Number of Threads: Default (maximum)
* `CODE_GENERATE_ON_VERSION_CHANGE`: `TRUE`
* `MONITOR_ENABLED`: `TRUE` (daily/weekly as needed)
* Use case: Production data sources, complete attribute discovery, automated pipeline updates

**Incremental Scan:**

* `ATTRIBUTE_CREATE_TYPE`: `'Discover'`
* `SCAN_RECORD_LIMIT`: `0` (unlimited)
* `SCAN_WHERE_CLAUSE`: `created_date > '<last_scanned_timestamp>'`
* `SOURCE_STREAM_LAST_SCANNED_TIMESTAMP`: Auto-updated after scans
* `PROCEDURE_INSTANCE_COUNT`: `1`
* Number of Threads: Default (maximum)
* `CODE_GENERATE_ON_VERSION_CHANGE`: `TRUE`
* `MONITOR_ENABLED`: `TRUE` (hourly/daily)
* Use case: Regularly updated data sources, delta scanning workflows, schema monitoring

**Large Dataset Scan:**

* `ATTRIBUTE_CREATE_TYPE`: `'Discover'`
* `SCAN_RECORD_LIMIT`: `0` (unlimited)
* `PROCEDURE_INSTANCE_COUNT`: `4` (or as needed)
* `PROCEDURE_INSTANCE_ROW_COUNT`: `1,000,000` (or calculated)
* `SCAN_ORDER_BY`: `unique_id` (or appropriate attribute)
* Number of Threads: Default (maximum)
* `CODE_GENERATE_ON_VERSION_CHANGE`: `TRUE`
* `MONITOR_ENABLED`: `TRUE` (as needed)
* Use case: Data sources exceeding 60-minute scan time, very large datasets (millions+ records), complex nested structures

**Memory-Constrained Scan:**

* `ATTRIBUTE_CREATE_TYPE`: `'Discover'`
* `SCAN_RECORD_LIMIT`: `0` (unlimited) or limited
* `PROCEDURE_INSTANCE_COUNT`: `1` or multiple
* Number of Threads: Reduced (e.g., 2-4 instead of max)
* `THREAD_PROCESS_RECORD_COUNT`: Reduced
* `CODE_GENERATE_ON_VERSION_CHANGE`: `TRUE`
* `MONITOR_ENABLED`: `FALSE` or `TRUE`
* Use case: Memory errors during scanning, very complex nested structures, large arrays and polymorphic variations, smaller warehouse sizes

***

## Best Practices

**Configuration management:**

* Use descriptive, consistent names (e.g., `customer_events_daily_full`)
* Create separate configurations for different use cases (quick scan, full scan, incremental scan)
* Keep only active configurations visible; use Inactive status for configurations you may reuse

**Performance optimization:**

* Use medium warehouses for most use cases; reserve larger warehouses for complex data sources
* Use default (maximum) threads unless memory constrained
* Start with single procedure call (`PROCEDURE_INSTANCE_COUNT = 1`); use multiple calls only if exceeding 60 minutes

**Scheduling strategy:**

* Schedule during low-usage periods
* Avoid overlapping scans on same warehouse
* Use `SCAN_WHERE_CLAUSE` with timestamp filters for incremental scanning
* Leverage `SOURCE_STREAM_LAST_SCANNED_TIMESTAMP`; clear timestamp for full re-scans

**Error prevention:**

* Monitor scan duration; use multiple procedure calls if needed
* Start with default settings; reduce thread count or `THREAD_PROCESS_RECORD_COUNT` if memory errors occur
* Validate `SCAN_WHERE_CLAUSE` and test `SCAN_ORDER_BY` attributes before use
* Ensure `PROCEDURE_INSTANCE_ROW_COUNT` is accurate when using multiple procedure calls


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.datapancake.com/core-concepts/scan-configurations/common-patterns-and-best-practices.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
