Datasources - Bulk Update

Select and export the datasource metadata to a .csv file, edit in a spreadsheet like Google Sheets or Excel, then perform a bulk update in DataPancake.

Ensure you have completed Initialization Guide

1. Open a new Snowflake worksheet

2. Run a select statement for DATAPANCAKE.CORE.VW_DATASOURCES

SELECT
-- VIEW ONLY COLUMNS
DATASOURCE_ID,
DATASOURCE_TYPE,
DATA_FORMAT,
DATASOURCE_OBJECT_TYPE,
DATASOURCE_OBJECT_ATTRIBUTE_DATA_TYPE,
DATASOURCE_DATABASE,
DATASOURCE_SCHEMA,
DATASOURCE_OBJECT_NAME,
DATASOURCE_OBJECT_ATTRIBUTE_NAME,
LAST_SCAN,
ATTRIBUTE_COUNT,
ATTRIBUTE_MAX_LEVEL,
OBJECT_COUNT,
ARRAY_COUNT,
POLYMORPHIC_ATTRIBUTE_COUNT,
EMBEDDED_CONTENT_ATTRIBUTE_COUNT,
COMPLEXITY_SCORE,
DATASOURCE_STATUS,
CONNECTION_STATUS,
VIRTUAL_ATTRIBUTE_COUNT,
SCHEMA_ATTRIBUTE_COUNT,
DISCOVERED_ATTRIBUTE_COUNT,
-- PRODUCT TIER FLAGS
PRODUCT_TIER_ATTRIBUTE_METADATA_MANAGEMENT,
PRODUCT_TIER_SQL_CODE_GENERATION,
PRODUCT_TIER_DATA_DICTIONARY_BUILDER,
PRODUCT_TIER_SEMANTIC_MODEL_CODE_GENERATION,
PRODUCT_TIER_SECURITY_POLICY_INTEGRATION,
-- EDITABLE COLUMNS FOR BULK UPDATE
DATASOURCE_NAME,
DATASOURCE_TAG,
-- OUTPUT CONFIGURATION FIELDS
OUTPUT_OBJECT_TYPE,
OUTPUT_OBJECT_PREFIX_NAME,
OUTPUT_OBJECT_COMPUTE_RESOURCE_NAME,
OUTPUT_OBJECT_INTERVAL_QUANTITY,
OUTPUT_OBJECT_INTERVAL_UNIT_TYPE,
OUTPUT_OBJECT_OPTIONAL_PARAMETERS,
OUTPUT_OBJECT_OPTIONAL_PARAMETERS_ARRAY,
OUTPUT_OBJECT_DEDUPLICATE,
OUTPUT_OBJECT_SECURITY_ROW_LEVEL_POLICY_NAME_SEMANTIC_LAYER,
OUTPUT_OBJECT_DATABASE_NAME,
OUTPUT_OBJECT_SCHEMA_NAME,
OUTPUT_OBJECT_ROOT_SELECT_PREFIX,
OUTPUT_OBJECT_CASE_SENSITIVE,
-- SCAN CONFIGURATION FIELDS
BASE_SCAN_COMPUTE_RESOURCE_THREAD_COUNT,
BASE_SCAN_RECORDS_PER_MINUTE,
-- METADATA TABLE CONFIGURATION FIELDS
OUTPUT_OBJECT_CREATE_METADATA_TABLE,
OUTPUT_OBJECT_METADATA_TABLE_ALLOW_UPDATE,
OUTPUT_OBJECT_METADATA_TABLE_ALLOW_DELETE,
OUTPUT_OBJECT_METADATA_TABLE_COMPUTE_RESOURCE_NAME,
OUTPUT_OBJECT_METADATA_TABLE_CRON_SCHEDULE,
OUTPUT_OBJECT_METADATA_TABLE_CRON_TIMEZONE,
OUTPUT_OBJECT_METADATA_TABLE_DATABASE_NAME,
OUTPUT_OBJECT_METADATA_TABLE_SCHEMA_NAME,
-- DATA DICTIONARY
DATASOURCE_DESCRIPTION_PROMPT,
DATASOURCE_DESCRIPTION,
-- DERIVED COLUMNS FOR BULK UPDATE
FALSE AS GENERATE_CODE,
FALSE AS GENERATE_DATASOURCE_DESCRIPTION,
'mixtral-8x7b' AS MODEL_NAME,
250 AS WORD_LIMIT,
FROM DATAPANCAKE.CORE.VW_DATASOURCES
ORDER BY DATASOURCE_NAME;

You can choose which editable fields you wish to include in your select statement.

3. Verify the query successfully returns a list of attribute metadata

4. Download the results as a .csv file

5. Open the .csv file in either Excel or Google Sheets

6. Edit any editable columns as needed

7. Save (Excel) or download (Google Sheets) the file

8. Load the .csv into your stage

9. Run the bulk update stored procedure

When the stored procedure completes, the attribute metadata file will be moved to the archive folder.

call <database>.<schema>.sp_update_datapancake_datasource_metadata_core(
    '@<stage_name>/<staged_filename>.csv',
    '@<stage_name>/<archive_folder>/',
    '<snowflake_username>'
);

10. Verify the integration completed successfully by running the following query

select * from datapancake.core.vw_integration_process
where user_name = 'BWOLFF'
order by integration_process_date desc;

Last updated

Was this helpful?