FAQ

Frequently Asked Questions

FAQ

1. How do we ensure data quality when loading files?

If you use the DataPancake XML Chunker to load data, it reports a hash value you can check against a hash of the raw file.

  • A provided stored procedure can scan dynamic table columns for fields that are 100% null, which indicates potential issues.

  • DataPancake supports custom null expressions to make quality checks easier.

  • For deeper checks, you can:

    • Count XML elements in the raw file and compare them to rows in the final materialized tables.

    • Validate transformation expressions (e.g., formulas or UDFs) to avoid incorrect results.


2. Can we add metadata columns (e.g., filename, run date) to dynamic tables?

Yes. You can add virtual attributes at the root or within any nested array. These can be populated using any valid SQL expression, including values from other source columns.


3. How does the automated XML file ingestion process work?

We provide SQL code to automate the pipeline:

  • A task checks for new files in ADLS.

  • When a new file arrives, the XML Chunker UDTF brings the data into the raw staging table using a variant column.


4. Can we clean up dynamic tables with aliases, PKs, and FKs?

Yes. You can alias:

  • Raw dynamic table names

  • Dynamic column names

  • Primary/foreign key column names

  • View names and view column names

This allows you to simplify naming and hide unnecessary array data when creating views.

Last updated

Was this helpful?