Large XML File Data Loader
Load large XML files with a DataPancake chunking UDTF.
1. Copy your XML file(s) to a stage in Snowflake
2. Open a new Snowflake worksheet
3. Execute the following statement to create a new table
Replace the database, schema, and table name placeholders.
CREATE OR REPLACE TABLE <database_name>.<schema_name>.<table_name> (
file_path STRING NOT NULL, -- path in stage where XML was read
file_hash VARCHAR(64) NOT NULL, -- SHA-256 hex digest of the full file
chunk_index INTEGER NOT NULL, -- zero-based index of this chunk
chunk_count INTEGER NOT NULL, -- total number of chunks for this file
chunk_xml VARIANT NOT NULL, -- parsed XML fragment
processed_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP() -- load time
)
COMMENT = 'Stores XML chunks emitted by DataPancake UDTF: FN_CHUNK_XML_FROM_STAGED_FILE_UDTF';
4. Execute the following statement to process the XML file
Replace the database, schema, table, stage, filename, header element name, chunk element name, and max_chunk_size_in_mb placeholders.
INSERT INTO <database_name>.<schema_name>.<table_name>
(file_path, file_hash, chunk_index, chunk_count, chunk_xml)
SELECT c.file_path, c.file_hash, c.chunk_index, c.chunk_count, parse_xml(c.chunk_xml)
FROM TABLE(
datapancake.core.FN_CHUNK_XML_FROM_STAGED_FILE_UDTF(
BUILD_SCOPED_FILE_URL('@<stage_name>', '<xml_file_name>'),
'@<stage_name>/<xml_file_name>',
'<header_element_name>',
'<chunk_element_name>',
cast(max_chunk_size_in_mb as double)
)
) as c;
5. Verify the statements executed successfully
Last updated