JSON Tutorial
This detailed tutorial will walk you through creating and configuring a JSON DataPancake pipeline using an example data set.
Deploy Data Objects & Load Data
Create the table and load the example data set, which can be downloaded here:
Additionally, copy and deploy the UDFs and security policies.
Note: The Enterprise version of DataPancake is required to use the security policies used in this tutorial.
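If you prefer to stage and load the example file yourself, a minimal sketch of the load step in Snowflake SQL might look like the following. The database, schema, table, and stage names are illustrative placeholders, not the names used by the tutorial download.

```sql
-- Landing table with a single VARIANT column to hold the raw JSON records
CREATE OR REPLACE TABLE tutorial_db.raw.json_events (raw_json VARIANT);

-- Internal stage for the downloaded example file
CREATE OR REPLACE STAGE tutorial_db.raw.json_stage
    FILE_FORMAT = (TYPE = JSON);

-- After uploading the downloaded file to the stage with PUT, load it
COPY INTO tutorial_db.raw.json_events
FROM @tutorial_db.raw.json_stage
FILE_FORMAT = (TYPE = JSON);
```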
Create the Data Source
Use the Script Builder to generate the initial data source.
Make sure to check "run scan"; this discovers the raw schema and attributes.
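As a rough illustration of what the scan discovers, you can enumerate the distinct paths and value types in the raw JSON yourself with a recursive FLATTEN. The table and column names below are the placeholders from the load sketch above, not DataPancake's own queries.

```sql
-- List every JSON path and the value types observed at that path
SELECT
    f.path          AS attribute_path,
    TYPEOF(f.value) AS value_type,
    COUNT(*)        AS occurrences
FROM tutorial_db.raw.json_events,
     LATERAL FLATTEN(input => raw_json, RECURSIVE => TRUE) f
GROUP BY f.path, TYPEOF(f.value)
ORDER BY attribute_path;
```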
Finalize the Schema & Rescan
Re-scan the data source with "reset attributes" set to true.
Note: After manual metadata edits, "reset attributes" is no longer available.
Define Virtual Attributes
Create the new virtual attributes.
This is where you can add surrogate primary keys, create calculated fields, and add ordering attributes for dedupe logic.
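For context, the kinds of expressions a virtual attribute typically wraps look like this in plain Snowflake SQL. The field names (order_id, line_items, updated_at) are hypothetical examples, not fields from the tutorial data set.

```sql
SELECT
    -- Surrogate primary key derived from the record contents
    MD5(raw_json:order_id::STRING)          AS order_sk,
    -- Calculated field
    ARRAY_SIZE(raw_json:line_items)         AS line_item_count,
    -- Ordering attribute used by the dedupe logic
    raw_json:updated_at::TIMESTAMP_NTZ      AS updated_at
FROM tutorial_db.raw.json_events
-- Keep only the latest version of each record
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY raw_json:order_id::STRING
    ORDER BY raw_json:updated_at::TIMESTAMP_NTZ DESC
) = 1;
```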
Configure Column Materialization
Add column materialization rules in the dynamic table layer and the secure view layer.
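Conceptually, the generated pipeline materializes selected attributes as typed columns in a dynamic table and then exposes them through a secure view. A hand-written sketch of those two layers is shown below; the object names, warehouse, and target lag are assumptions for illustration only.

```sql
-- Dynamic table layer: materialize selected attributes as typed columns
CREATE OR REPLACE DYNAMIC TABLE tutorial_db.curated.orders_dt
    TARGET_LAG = '1 hour'
    WAREHOUSE = tutorial_wh
AS
SELECT
    raw_json:order_id::STRING        AS order_id,
    raw_json:customer.region::STRING AS customer_region,
    raw_json:total::NUMBER(12,2)     AS order_total
FROM tutorial_db.raw.json_events;

-- Secure view layer: the surface consumers query, and where policies attach
CREATE OR REPLACE SECURE VIEW tutorial_db.curated.orders_v AS
SELECT order_id, customer_region, order_total
FROM tutorial_db.curated.orders_dt;
```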
Apply Column-Level Schema Transformations
Apply column-level schema transformations.
You can also normalize data types and formats, and perform schema consolidation at the field level.
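A couple of representative transformations, written by hand against hypothetical fields, might look like the following; DataPancake applies the equivalent logic through its configuration rather than hand-edited SQL.

```sql
SELECT
    -- Normalize mixed date formats to a single TIMESTAMP_NTZ
    COALESCE(
        TRY_TO_TIMESTAMP_NTZ(raw_json:created_at::STRING),
        TRY_TO_TIMESTAMP_NTZ(raw_json:created_at::STRING, 'MM/DD/YYYY HH24:MI:SS')
    )                                                     AS created_at,
    -- Consolidate two source spellings of the same field into one column
    COALESCE(raw_json:zip_code, raw_json:zipcode)::STRING AS zip_code
FROM tutorial_db.raw.json_events;
```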
Merge Polymorphic Versions
Merge string, float, and integer variants into unified attributes.
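When the scan finds the same attribute arriving as a string in some records and as a number in others, the merged attribute resolves to a single type. A hand-written equivalent, using a hypothetical amount field, is sketched below.

```sql
SELECT
    -- Works whether amount arrived as "19.99", 19.99, or 20
    TRY_TO_DECIMAL(raw_json:amount::STRING, 12, 2) AS amount
FROM tutorial_db.raw.json_events;
```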
Add Aliases
Add user-friendly alias names for columns and arrays where needed.
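In the generated SQL these aliases simply surface as column aliases; a minimal hand-written analogue, with hypothetical source field names:

```sql
SELECT
    raw_json:cust_nm::STRING AS customer_name,  -- friendlier than cust_nm
    raw_json:ord_dt::DATE    AS order_date
FROM tutorial_db.raw.json_events;
```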
Add Security Policies
Configure security policies such as row-level access rules and attribute-level security tags.
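Underneath, these map to standard Snowflake governance objects. A minimal sketch of a row access policy and a tag with an attached masking policy is shown below; the role names, mapping table, and tag are illustrative assumptions, and these features require Snowflake Enterprise Edition or higher.

```sql
-- Row-level access: only expose rows for regions mapped to the current role
CREATE OR REPLACE ROW ACCESS POLICY tutorial_db.curated.region_policy
AS (customer_region STRING) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'PIPELINE_ADMIN'
    OR EXISTS (
        SELECT 1
        FROM tutorial_db.curated.role_region_map m
        WHERE m.role_name = CURRENT_ROLE()
          AND m.region    = customer_region
    );

ALTER VIEW tutorial_db.curated.orders_v
    ADD ROW ACCESS POLICY tutorial_db.curated.region_policy ON (customer_region);

-- Attribute-level security: tag sensitive columns and mask them by tag
CREATE OR REPLACE TAG tutorial_db.curated.pii_tag;

CREATE OR REPLACE MASKING POLICY tutorial_db.curated.mask_string AS
    (val STRING) RETURNS STRING ->
    CASE WHEN CURRENT_ROLE() = 'PIPELINE_ADMIN' THEN val ELSE '***MASKED***' END;

ALTER TAG tutorial_db.curated.pii_tag SET MASKING POLICY tutorial_db.curated.mask_string;

ALTER VIEW tutorial_db.curated.orders_v
    ALTER COLUMN order_id SET TAG tutorial_db.curated.pii_tag = 'identifier';
```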
Configure Foreign Keys
Add relationships between flattened entities by configuring foreign keys.
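In Snowflake these relationships are declared as informational constraints (recorded in metadata but not enforced), which downstream tools can read. A hypothetical example between flattened parent and child tables:

```sql
-- Declare the parent key first, then the child relationship
ALTER TABLE tutorial_db.curated.orders
    ADD CONSTRAINT pk_orders PRIMARY KEY (order_id);

ALTER TABLE tutorial_db.curated.order_items
    ADD CONSTRAINT fk_order_items_order
    FOREIGN KEY (order_id) REFERENCES tutorial_db.curated.orders (order_id);
```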
Generate Code
Generate the dynamic SQL statements using your latest configurations.
Deploy and Validate
Review and deploy the generated code.
Make sure to perform data quality checks and validate security enforcement.
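A couple of simple checks you might run after deployment, written against the placeholder objects used in the earlier sketches:

```sql
-- Data quality: the curated layer should account for every raw record
SELECT
    (SELECT COUNT(*) FROM tutorial_db.raw.json_events)   AS raw_rows,
    (SELECT COUNT(*) FROM tutorial_db.curated.orders_dt) AS curated_rows;

-- Security enforcement: switch to a restricted role and confirm that
-- row filtering and masking behave as configured
USE ROLE analyst_restricted;
SELECT * FROM tutorial_db.curated.orders_v LIMIT 10;
```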
Pipeline Maintenance
Repeat steps 3-13 as many times as needed.