Large JSON File Data Loader

Load large JSON files with a DataPancake chunking UDTF.

1. Upload your JSON file to a stage in Snowflake

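If the file is not staged yet, a minimal sketch using the SnowSQL client follows. The stage name json_stage and the local file path are hypothetical placeholders, not part of DataPancake, and PUT must be run from a client such as SnowSQL rather than from a worksheet:

-- Create a named internal stage (hypothetical name)
CREATE STAGE IF NOT EXISTS json_stage;

-- Upload the local file; AUTO_COMPRESS = FALSE keeps the original file name
-- and format (an assumption -- adjust if your stage setup differs)
PUT file:///tmp/orders.json @json_stage AUTO_COMPRESS = FALSE;
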
2. Open a new Snowflake worksheet

3. Execute the following statement to create a new table

Replace the database, schema, and table name placeholders.

CREATE OR REPLACE TABLE <database_name>.<schema_name>.<table_name>(
  file_path     STRING           NOT NULL,                      -- path in stage where the JSON was read
  file_hash     VARCHAR(64)      NOT NULL,                      -- SHA-256 hex digest of the full file
  chunk_index   INTEGER          NOT NULL,                      -- zero-based index of this chunk
  chunk_count   INTEGER          NOT NULL,                      -- total number of chunks for this file
  chunk_json    VARIANT          NOT NULL,                      -- parsed JSON fragment
  primary_key   STRING,                                         -- primary key value extracted for this chunk
  processed_at  TIMESTAMP_NTZ    DEFAULT CURRENT_TIMESTAMP()    -- load time
)
COMMENT = 'Stores JSON chunks emitted by DataPancake UDTF: FN_CHUNK_JSON_FROM_STAGED_FILE_UDTF';
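
To confirm the table was created with the expected columns, you can describe it:

DESC TABLE <database_name>.<schema_name>.<table_name>;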

4. Execute the following statement to process the JSON file

Replace the database, schema, table, stage, filename, start path, chunk path, primary key path (or expression), and max chunk size placeholders.

INSERT INTO <database_name>.<schema_name>.<table_name>
(file_path, file_hash, chunk_index, chunk_count, chunk_json, primary_key)
SELECT c.file_path, c.file_hash, CAST(c.chunk_index AS INTEGER), CAST(c.chunk_count AS INTEGER), PARSE_JSON(c.chunk_json), c.primary_key
FROM TABLE(
    fn_chunk_json_from_staged_file_udtf(
    BUILD_SCOPED_FILE_URL('@<stage_name>', '<file_name>'),
    '@<stage_name>/<file_name>',
    '<start_path_or_empty_string_for_root>',
    '<chunk_path>',
    '<primary_key_path_or_expression>',
    CAST(<max_chunk_mb> AS FLOAT)
  )
) AS c;
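
For illustration only, here is the same statement with hypothetical values filled in, assuming a staged file orders.json whose root object contains an orders array keyed by order_id, chunked at about 16 MB; every name and value below is an assumption:

INSERT INTO analytics.raw.json_chunks
(file_path, file_hash, chunk_index, chunk_count, chunk_json, primary_key)
SELECT c.file_path, c.file_hash, CAST(c.chunk_index AS INTEGER), CAST(c.chunk_count AS INTEGER), PARSE_JSON(c.chunk_json), c.primary_key
FROM TABLE(
    fn_chunk_json_from_staged_file_udtf(
    BUILD_SCOPED_FILE_URL('@json_stage', 'orders.json'),
    '@json_stage/orders.json',
    '',                -- start at the root of the document
    'orders',          -- hypothetical path to the array to chunk
    'order_id',        -- hypothetical primary key path
    CAST(16 AS FLOAT)  -- max chunk size in MB
  )
) AS c;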

5. Verify the statements executed successfully
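
One quick check, sketched against the table created in step 3: compare the number of rows loaded per file with the chunk_count the UDTF reported; the two counts should match for every file.

SELECT file_path,
       file_hash,
       ANY_VALUE(chunk_count) AS expected_chunks,
       COUNT(*)               AS loaded_chunks
FROM <database_name>.<schema_name>.<table_name>
GROUP BY file_path, file_hash;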
