Large XML File Data Loader

Load large XML files with a DataPancake chunking UDTF.

1. Copy your XML file(s) to a stage in Snowflake

2. Open a new Snowflake worksheet

3. Execute the following statement to create a new table

Replace the database, schema, and table name placeholders.

CREATE OR REPLACE TABLE <database_name>.<schema_name>.<table_name> (
  file_path   STRING           NOT NULL,                        -- path in stage where XML was read
  file_hash     VARCHAR(64)      NOT NULL,                      -- SHA-256 hex digest of the full file
  chunk_index   INTEGER          NOT NULL,                      -- zero-based index of this chunk
  chunk_count   INTEGER          NOT NULL,                      -- total number of chunks for this file
  chunk_xml     VARIANT          NOT NULL,                      -- parsed XML fragment
  processed_at  TIMESTAMP_NTZ    DEFAULT CURRENT_TIMESTAMP()    -- load time
)
COMMENT = 'Stores XML chunks emitted by DataPancake UDTF: FN_CHUNK_XML_FROM_STAGED_FILE_UDTF';

4. Execute the following statement to process the XML file

Replace the database, schema, table, stage, filename, header element name, chunk element name, and max_chunk_size_in_mb placeholders.

INSERT INTO <database_name>.<schema_name>.<table_name>
(file_path, file_hash, chunk_index, chunk_count, chunk_xml)
SELECT c.file_path, c.file_hash, c.chunk_index, c.chunk_count, parse_xml(c.chunk_xml)
FROM TABLE(
    datapancake.core.FN_CHUNK_XML_FROM_STAGED_FILE_UDTF(
    BUILD_SCOPED_FILE_URL('@<stage_name>', '<xml_file_name>'),
    '@<stage_name>/<xml_file_name>',
    '<header_element_name>',
    '<chunk_element_name>',
    cast(max_chunk_size_in_mb as double)
  )
) as c;

5. Verify the statements executed successfully

Last updated