How to Create a Data Source for Data Streamed from Kafka

Create a data source for data streamed from Kafka in DataPancake.

1. Grant DataPancake access to the database, schema, and table

Open a new Snowflake worksheet and run the required GRANT statements for the database, schema, and table that contain the Kafka data. For example:

GRANT USAGE ON DATABASE DB_NAME TO APPLICATION DATAPANCAKE;
GRANT USAGE ON SCHEMA DB_NAME.SCHEMA_NAME TO APPLICATION DATAPANCAKE;
GRANT REFERENCES, SELECT ON TABLE DB_NAME.SCHEMA_NAME.TABLE_NAME TO APPLICATION DATAPANCAKE;
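
To confirm the grants took effect, you can list the privileges the application currently holds (this assumes the application name is DATAPANCAKE, as in the statements above):

```sql
-- List every privilege currently granted to the DataPancake application
SHOW GRANTS TO APPLICATION DATAPANCAKE;
```

The output should include the USAGE, REFERENCES, and SELECT grants on the objects you just specified.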

2. In DataPancake, navigate to the Data Sources page

3. Enter a name for the new data source

4. Select the data source type

5. (Optional) Enter data source tags

Tags are used only for filtering and searching. For example: dev, prod, api, csv

6. Select "Kafka" for the source stream platform

7. (Optional) Deduplicate Messages

If checked, two dynamic tables will be produced for the root attributes in the semi-structured data source. The first dynamic table flattens the root attributes; the second filters the flattened rows using a window function (configured in the next step) to keep only the most recent message for each primary key.

8. Enter the Deduplication SQL Expression

Required only if "Deduplicate Messages" is checked in the previous step.

The SQL expression used to deduplicate the source stream messages based on a primary key and sort order, so that only the most recent message per key is kept. This value is required when the SQL Code Generation feature is selected and the Deduplicate Messages option is enabled.
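
As a sketch, assuming the Kafka payload lands in a RECORD_CONTENT variant column keyed by an id attribute, and the table carries the Kafka connector's RECORD_METADATA:CreateTime timestamp (substitute your own key and ordering columns), a typical deduplication expression uses ROW_NUMBER:

```sql
-- Keep only the newest message per primary key.
-- RECORD_CONTENT:id and RECORD_METADATA:CreateTime are placeholder paths;
-- replace them with your own key and sort-order columns.
ROW_NUMBER() OVER (
    PARTITION BY RECORD_CONTENT:id
    ORDER BY RECORD_METADATA:CreateTime DESC
) = 1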

9. Select the Object Type

10. Select the Column Data Type

11. Select the Format Type

12. Select the Database

13. Select the Schema

14. Select the Object Name

15. Enter the Column Name

This column name can be a SQL expression that refers to a specific path within the semi-structured data source.
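
For instance, with data loaded by the Snowflake Kafka connector, the message payload typically lives in a RECORD_CONTENT variant column, and the column name can point at a nested path (the payload path shown here is illustrative):

```sql
-- Use the whole payload column...
RECORD_CONTENT

-- ...or a specific nested path within it
RECORD_CONTENT:payload.order
```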

16. Click 'Enable Feature Selection'

17. Select "SQL Code Generation"

18. Select the Output Object Type

19. Configure additional settings as needed

How to configure output object & dynamic table settings
How to configure the secure semantic view layer
How to configure schema consolidation (WIP)
How to configure schema filters

20. Save the data source

The save button is near the bottom of the page.

21. Verify the save completed successfully.
