The detection of changes (often referred to as "delta detection") is essential for effectively managing updates in Snowflake. This method involves using Amazon S3 or Google Cloud Storage as an interim staging area, where CSV files are updated by the source system. These CSV files serve as the foundation for delta detection and for making subsequent updates in Snowflake tables

EazyDI - CDC EazyDI.png

Implementation Steps

Step 1: EazyDI Pipeline Configuration

  1. Source Connector:

image-20240618-074900.png

image-20240618-074949.png

  1. S3 Connector as Target:

image-20240618-075101.png

  1. Continue with the field mapping and save the pipeline

image-20240618-075158.png

Step 2: Staging for Delta Detection in Snowflake

  1. CSV Files in S3:

Step 3: Snowflake Staging Table (Optional)

If required, you can load data from the S3 staging area into Snowflake tables first for staging using Snowflake's COPY INTO command. This step is optional and can be skipped if you prefer to directly integrate and process data from external stages in Snowflake.

  1. Create Snowflake Tables:

  2. Load Data from S3 to Snowflake

COPY INTO snowflake_table
FROM 's3://your-bucket/path/'
FILE_FORMAT = (TYPE = 'CSV');

Step 4: Delta Detection and Updates

  1. SQL for Delta Detection:

  2. Merging using a snowflake staging table

MERGE INTO target_table AS t
USING (
    SELECT * FROM snowflake_staging_table
) AS s
ON t.primary_key = s.primary_key
WHEN MATCHED THEN
    UPDATE SET t.column1 = s.column1, t.column2 = s.column2
WHEN NOT MATCHED THEN
    INSERT (primary_key, column1, column2)
    VALUES (s.primary_key, s.column1, s.column2);
  1. If you want to directly access the S3 files from Snowflake external stage, follow these steps:

-- Step 1: Define an external stage pointing to S3
CREATE OR REPLACE STAGE s3_stage
  URL = 's3://your-bucket/path/'
  CREDENTIALS = (
    AWS_KEY_ID = 'your_access_key_id',
    AWS_SECRET_KEY = 'your_secret_access_key'
  );

-- Step 2: Perform MERGE operation using the external stage
MERGE INTO target_table AS t
USING (
    SELECT * FROM @s3_stage/csv_file 
) AS s
ON t.primary_key = s.primary_key
WHEN MATCHED THEN
    UPDATE SET t.column1 = s.column1, t.column2 = s.column2
WHEN NOT MATCHED THEN
    INSERT (primary_key, column1, column2)
    VALUES (s.primary_key, s.column1, s.column2);

Considerations

Implementing delta detection using S3 as a Snowflake stage and merging statements in Snowflake provides an efficient method for managing data updates. This approach ensures data integrity and accuracy while leveraging Snowflake's capabilities for handling large datasets.

By using EazyDI to extract data from source systems and storing updated CSV files in an S3 staging area, coupled with implementing delta detection in Snowflake, you can effectively handle data updates and maintain precision in your analytical workflows. This method offers a scalable and reliable solution for integrating diverse data sources into Snowflake, facilitating enhanced analysis and reporting capabilities.