
How to Create a Data Source for Data Streamed from Kafka

Create a data source for data streamed from Kafka in DataPancake.


1. Grant DataPancake access to the database, schema, and table

Open a new Snowflake worksheet and run the necessary GRANT statements for the database, schema, and table that contain the Kafka data. For example:

GRANT USAGE ON DATABASE DB_NAME TO APPLICATION DATAPANCAKE;
GRANT USAGE ON SCHEMA DB_NAME.SCHEMA_NAME TO APPLICATION DATAPANCAKE;
GRANT REFERENCES, SELECT ON TABLE DB_NAME.SCHEMA_NAME.TABLE_NAME TO APPLICATION DATAPANCAKE;
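
To confirm the grants took effect, the privileges held by the application can be listed (a quick sanity check, not a required step):

```sql
-- List every privilege currently granted to the DataPancake application
SHOW GRANTS TO APPLICATION DATAPANCAKE;
```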

2. In DataPancake, navigate to the Data Sources page

3. Enter a name for the new data source

4. Select the data source type

5. (Optional) Enter data source tags

Tags are used only for filtering and searching. For example: dev, prod, api, csv

6. Select "Kafka" for the source stream platform

7. (Optional) Deduplicate Messages

If checked, two dynamic tables will be produced for the root attributes in the semi-structured data source. The first dynamic table flattens the root attributes. The second filters the flattened rows using a window function (configured in the next step) to keep only the most recent message for each primary key.
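
The two-table pattern can be sketched as follows. This is an illustration only, not the exact DDL DataPancake generates; the table names, warehouse name, and JSON paths (ORDERS_FLATTENED, ORDERS_DEDUPED, MY_WH, RECORD_CONTENT:...) are hypothetical:

```sql
-- First dynamic table: flatten the root attributes of each Kafka message
CREATE OR REPLACE DYNAMIC TABLE ORDERS_FLATTENED
  TARGET_LAG = '1 minute' WAREHOUSE = MY_WH AS
SELECT
  RECORD_CONTENT:order_id::NUMBER          AS ORDER_ID,
  RECORD_CONTENT:status::VARCHAR           AS STATUS,
  RECORD_CONTENT:updated_at::TIMESTAMP_NTZ AS UPDATED_AT
FROM DB_NAME.SCHEMA_NAME.TABLE_NAME;

-- Second dynamic table: keep only the newest message per primary key,
-- using the deduplication window expression configured in the next step
CREATE OR REPLACE DYNAMIC TABLE ORDERS_DEDUPED
  TARGET_LAG = '1 minute' WAREHOUSE = MY_WH AS
SELECT *
FROM ORDERS_FLATTENED
QUALIFY ROW_NUMBER() OVER (PARTITION BY ORDER_ID ORDER BY UPDATED_AT DESC) = 1;
```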

8. Enter the Deduplication SQL Expression

Required only if "Deduplicate Messages" is toggled on

The SQL expression used to deduplicate the source stream messages based on a primary key and sort order, producing the most recent message. This value is required if the SQL Code Generation feature is selected and the Deduplicate Messages option is enabled.
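
A typical expression partitions by the primary key and orders by an event timestamp, so that row number 1 is the newest message. The JSON paths here (order_id, updated_at) are hypothetical placeholders for your own attributes:

```sql
ROW_NUMBER() OVER (PARTITION BY RECORD_CONTENT:order_id ORDER BY RECORD_CONTENT:updated_at DESC)
```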

9. Select the Object Type

10. Select the Column Data Type

12. Select the Format Type

13. Select the Database

14. Select the Schema

15. Select the Object Name

16. Enter the Column Name

This column name can be a SQL expression that refers to a specific path in the semi-structured data source. See the example below.
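
For instance, to scan only the nested payload of each Kafka message rather than the whole record, an expression like the following could be entered (the column name RECORD_CONTENT and the path payload are hypothetical and depend on your connector and message shape):

```sql
RECORD_CONTENT:payload
```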

17. Click 'Enable Feature Selection'

18. Select "SQL Code Generation"

19. Select the Output Object Type

20. Configure additional settings as needed (🚧 WIP)

  • How to configure dynamic table settings (include metadata)
  • How to configure the semantic layer
  • How to configure schema consolidation
  • How to configure schema filters

21. Save the data source

The save button is near the bottom of the page.

22. Verify the save completed successfully