
How to Create a Data Source for Data Streamed from Kafka

Create a data source for data streamed from Kafka in DataPancake.


1. Grant DataPancake access to the database, schema, and table

Open a new Snowflake worksheet and run the necessary GRANT statements for the database, schema, and table that contain the Kafka data. For example:

GRANT USAGE ON DATABASE DB_NAME TO APPLICATION DATAPANCAKE;
GRANT USAGE ON SCHEMA DB_NAME.SCHEMA_NAME TO APPLICATION DATAPANCAKE;
GRANT REFERENCES, SELECT ON TABLE DB_NAME.SCHEMA_NAME.TABLE_NAME TO APPLICATION DATAPANCAKE;
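
To confirm the grants took effect, the privileges held by the application can be listed (a quick sanity check, not a required step):

```sql
-- List every privilege currently granted to the DataPancake application
SHOW GRANTS TO APPLICATION DATAPANCAKE;
```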

2. In DataPancake, navigate to the Data Sources page

3. Enter a name for the new data source

4. Select the data source type

5. (Optional) Enter data source tags

Tags are used only for filtering and searching. For example: dev, prod, api, csv

6. Select "Kafka" for the source stream platform

7. (Optional) Deduplicate Messages

If checked, two dynamic tables will be produced for the root attributes in the semi-structured data source. The first dynamic table flattens the root attributes. The second filters the flattened rows using a window function (configured in the next step) to keep only the most recent message for each primary key.
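
The two-table pattern can be sketched as follows. This is an illustration only, not the exact DDL DataPancake generates; the table names, warehouse name, and JSON paths (ORDERS_FLATTENED, ORDERS_DEDUPED, MY_WH, RECORD_CONTENT:...) are hypothetical:

```sql
-- First dynamic table: flatten the root attributes of each Kafka message
CREATE OR REPLACE DYNAMIC TABLE ORDERS_FLATTENED
  TARGET_LAG = '1 minute' WAREHOUSE = MY_WH AS
SELECT
  RECORD_CONTENT:order_id::NUMBER          AS ORDER_ID,
  RECORD_CONTENT:status::VARCHAR           AS STATUS,
  RECORD_CONTENT:updated_at::TIMESTAMP_NTZ AS UPDATED_AT
FROM DB_NAME.SCHEMA_NAME.TABLE_NAME;

-- Second dynamic table: keep only the newest message per primary key,
-- using the deduplication window expression configured in the next step
CREATE OR REPLACE DYNAMIC TABLE ORDERS_DEDUPED
  TARGET_LAG = '1 minute' WAREHOUSE = MY_WH AS
SELECT *
FROM ORDERS_FLATTENED
QUALIFY ROW_NUMBER() OVER (PARTITION BY ORDER_ID ORDER BY UPDATED_AT DESC) = 1;
```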

8. Enter the Deduplication SQL Expression

Required only if "Deduplicate Messages" is toggled on

The SQL expression used to deduplicate the source stream messages based on a primary key and sort order, producing the most recent message. This value is required if the SQL Code Generation feature is selected and the Deduplicate Messages option is enabled.
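
A typical expression partitions by the primary key and orders by an event timestamp, so that row number 1 is the newest message. The JSON paths here (order_id, updated_at) are hypothetical placeholders for your own attributes:

```sql
ROW_NUMBER() OVER (PARTITION BY RECORD_CONTENT:order_id ORDER BY RECORD_CONTENT:updated_at DESC)
```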

9. Select the Object Type

10. Select the Column Data Type

12. Select the Format Type

13. Select the Database

14. Select the Schema

15. Select the Object Name

16. Enter the Column Name

This column name can be a SQL expression that refers to a specific path in the semi-structured data source. See the example below.
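
For instance, to scan only the nested payload of each Kafka message rather than the whole record, an expression like the following could be entered (the column name RECORD_CONTENT and the path payload are hypothetical and depend on your connector and message shape):

```sql
RECORD_CONTENT:payload
```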

17. Click 'Enable Feature Selection'

18. Select "SQL Code Generation"

19. Select the Output Object Type

20. Configure additional settings as needed (🚧 WIP)

  • How to configure dynamic table settings (include metadata)
  • How to configure the semantic layer
  • How to configure schema consolidation
  • How to configure schema filters

21. Save the data source

The save button is near the bottom of the page.

22. Verify the save completed successfully