Read/Write Strategies for Pipelines in EazyDI
EazyDI pipelines provide flexibility in how data is read from the source and written to the target. Depending on the data processing requirements, users can select specific modes for reading and saving data. Below are the details of the available modes:
Read Modes (Source)
When configuring a pipeline to read data from the source, there are two read modes available:
ALL Mode:
This mode reads all records from the source every time the pipeline is executed.
Use this mode when you want to process the entire dataset from the source regardless of whether any changes have occurred since the last run.
Suitable for scenarios where all records need to be reprocessed regularly or where changes in the source dataset are not easily trackable.
Key-Based Increment Mode (KeyBaseIncrement):
This mode reads only the records that are new or modified since the last execution, as determined by a unique key or timestamp field.
It helps to optimize performance by only processing incremental data, reducing the time and resources needed to process large datasets.
Suitable for scenarios where only the records added or updated since the last execution need to be processed.
Users will need to define a key or timestamp column that identifies which records are new or modified.
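The difference between the two read modes can be sketched in a few lines of Python. This is only an illustration of the idea, not EazyDI's implementation or API: the in-memory `SOURCE` list, the `updated_at` column, and the functions `read_all` and `read_key_based_increment` are hypothetical names chosen for the example. It also assumes that the pipeline remembers the highest key value seen in the previous run.

```python
from datetime import datetime

# Hypothetical source table. `updated_at` stands in for the key or
# timestamp column that the user designates for KeyBaseIncrement.
SOURCE = [
    {"id": 1, "name": "Alice", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Bob", "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "name": "Carol", "updated_at": datetime(2024, 1, 9)},
]


def read_all(source):
    """ALL mode: return every record on every run."""
    return list(source)


def read_key_based_increment(source, key, last_value):
    """KeyBaseIncrement mode (sketch): return only the records whose key is
    greater than the value remembered from the previous run, together with
    the new value to remember for the next run."""
    if last_value is None:
        # First run: nothing has been remembered yet, so read everything.
        rows = list(source)
    else:
        rows = [row for row in source if row[key] > last_value]
    # If no new rows were found, keep the previous value unchanged.
    new_value = max((row[key] for row in rows), default=last_value)
    return rows, new_value


# First run reads all three records and remembers the latest timestamp.
first_rows, mark = read_key_based_increment(SOURCE, "updated_at", None)

# A second run against an unchanged source reads nothing.
later_rows, mark = read_key_based_increment(SOURCE, "updated_at", mark)
```

In this sketch, `read_all` does the same amount of work on every run, while the incremental read shrinks to the rows whose key moved past the remembered value. That is the trade-off the two modes describe.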
Save Modes (Target)
When writing data to the target, there are three save modes available:
Append Mode:
This mode appends the new records to the existing data in the target.
Use this mode when you want to add new data to the target without altering the existing records.
Suitable for use cases where historical data is retained and new records need to be appended regularly.
Overwrite Mode:
This mode overwrites all existing data in the target with the new data being processed.
Use this mode when you need to completely replace the target data with the latest dataset from the source.
Suitable for scenarios where data in the target should always reflect the most up-to-date version of the source data without retaining any older records.
Incremental Mode:
This mode updates the target with only the new or modified records from the source.
It updates existing records in the target or inserts new ones without removing any unmodified records.
Suitable for use cases where the target data needs to stay synchronized with changes in the source, such as updating customer records or inventory lists, while keeping previous data intact.
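The effect of the three save modes on a target can be sketched as follows. This is a minimal illustration under stated assumptions, not EazyDI's implementation: the function names `save_append`, `save_overwrite`, and `save_incremental` are hypothetical, as are the `id` key column and the list-of-dicts representation of the target. The sketch also assumes that Append mode does not deduplicate records.

```python
def save_append(target, incoming):
    """Append mode (sketch): keep existing rows and add the incoming ones."""
    return target + incoming


def save_overwrite(target, incoming):
    """Overwrite mode (sketch): discard existing rows, keep only the incoming ones."""
    return list(incoming)


def save_incremental(target, incoming, key):
    """Incremental mode (sketch): update rows whose key already exists,
    insert rows with new keys, and leave all other rows untouched."""
    merged = {row[key]: row for row in target}
    for row in incoming:
        merged[row[key]] = row  # replaces a matching row or adds a new one
    return list(merged.values())


# Hypothetical inventory data: id 2 has changed and id 3 is new.
target = [{"id": 1, "qty": 10}, {"id": 2, "qty": 5}]
incoming = [{"id": 2, "qty": 7}, {"id": 3, "qty": 1}]

appended = save_append(target, incoming)            # 4 rows, id 2 appears twice
overwritten = save_overwrite(target, incoming)      # 2 rows, id 1 is gone
merged = save_incremental(target, incoming, "id")   # 3 rows, id 2 updated
```

With the same input, Append grows the target, Overwrite replaces it, and Incremental leaves the unmodified record (id 1) in place while updating id 2 and inserting id 3.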
Summary of Pipeline Modes
Mode | Description | Use Case |
---|---|---|
ALL Mode (Read) | Reads all records from the source. | Reprocessing the entire dataset every time the pipeline is executed. |
Key-Based Increment (Read) | Reads only new or modified records based on a key or timestamp. | Processing incremental changes in the dataset since the last run. |
Append Mode (Save) | Appends new records to the existing data in the target. | Adding new records without affecting existing data in the target. |
Overwrite Mode (Save) | Overwrites all existing data in the target with the new dataset. | Replacing the target data completely with fresh data from the source. |
Incremental Mode (Save) | Updates the target with only new or modified records without deleting older data. | Keeping the target synchronized with the source by processing only changes since the last execution. |
Choosing the read and save modes that match the use case keeps pipeline runs efficient and the target data accurate.
Note: For detailed examples of when to apply each read and save mode, refer to Scenarios for Using Read and Save Modes in Data Pipelines.