Scenarios for Using Read and Save Modes in Data Pipelines
1. ALL Read Mode + Append Save Mode
Scenario:
You are working with a pipeline that processes daily sales data from a retail database, and you want to keep adding the sales data from each day to a central repository that tracks all sales history.
ALL Read Mode: Each day, the pipeline reads the entire sales dataset from the source, including sales from previous days, even though only the new day’s data is relevant.
Append Save Mode: When writing to the target, every record that was read is added to the existing records in the target, and no previous data is removed.
Use Case:
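Historical Data Tracking: You want to accumulate and track all sales over time. Even though the source provides the entire dataset, append mode ensures that nothing in the target is overwritten and all data is retained. Because each run appends the full read, records already written on earlier runs are written again unless the source holds only new data or duplicates are removed downstream.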
Data Collection: Scenarios where the system lacks proper mechanisms for incremental reading (e.g., no timestamp or unique key), but you still need to add new data to the existing records in the target.
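The behavior described above can be sketched in a few lines of Python. This is a minimal illustration that uses in-memory lists in place of a real source and target; the names `read_all` and `save_append` are hypothetical and do not correspond to any particular tool's API.

```python
from typing import Dict, List

Record = Dict[str, object]


def read_all(source: List[Record]) -> List[Record]:
    """ALL read mode (illustrative): return every record in the source."""
    return [dict(r) for r in source]


def save_append(target: List[Record], batch: List[Record]) -> None:
    """Append save mode (illustrative): add the batch, remove nothing."""
    target.extend(batch)


source: List[Record] = [{"day": "2024-01-01", "amount": 100}]
target: List[Record] = []

# Day 1 run: the source holds one sale.
save_append(target, read_all(source))

# Day 2 run: the source now also holds the second day's sale,
# and the full read returns both days.
source.append({"day": "2024-01-02", "amount": 250})
save_append(target, read_all(source))

print(len(target))  # 3: the day 1 sale was appended on both runs
```

Nothing in the target is ever removed, but the day 1 record now appears twice. If the target should hold each sale once, a deduplication step (or a source that exposes only new rows) would be needed in addition to append mode.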
2. ALL Read Mode + Overwrite Save Mode
Scenario:
You manage a product catalog, and every day, your pipeline reads the entire product database from your system, which includes daily price updates, stock availability, and other product information. Your target is a website that needs to display the latest product information.
ALL Read Mode: The pipeline reads the complete product list from the source every time, regardless of whether there are any changes in the product data.
Overwrite Save Mode: The pipeline then replaces all existing data in the target with the newly read dataset, ensuring that the target (e.g., website or app) always reflects the latest product information.
Use Case:
Up-to-Date Data: You need the target to always reflect the most current version of the source data, as with a product catalog, where all existing entries in the target (including outdated ones) are removed and replaced with the new dataset.
Static, Complete Data Sets: Scenarios where the source data must be refreshed in its entirety, and the previous data in the target becomes irrelevant after the new data is written.
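As a rough sketch of this combination, the following Python example replaces the whole target with each full read. The lists stand in for the product database and the website's data store, and `read_all` and `save_overwrite` are hypothetical names chosen for illustration only.

```python
from typing import Dict, List

Product = Dict[str, object]


def read_all(source: List[Product]) -> List[Product]:
    """ALL read mode (illustrative): return a copy of every source record."""
    return [dict(p) for p in source]


def save_overwrite(target: List[Product], batch: List[Product]) -> None:
    """Overwrite save mode (illustrative): discard the target's contents
    and store only the batch."""
    target[:] = batch


# Target as left by yesterday's run.
target: List[Product] = [
    {"sku": "A1", "price": 9.99, "in_stock": True},
    {"sku": "B2", "price": 4.50, "in_stock": True},  # since removed upstream
]

# Today's source: A1 has a new price, B2 is gone, C3 is new.
source: List[Product] = [
    {"sku": "A1", "price": 8.99, "in_stock": True},
    {"sku": "C3", "price": 12.00, "in_stock": False},
]

save_overwrite(target, read_all(source))
print([p["sku"] for p in target])  # ['A1', 'C3']
```

After the run, the target matches the source exactly: the price change is reflected, the new product is present, and the product that no longer exists in the source has disappeared from the target.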
3. ALL Read Mode + Incremental Save Mode
Scenario:
You have a customer database that contains all customer information, including both current and historical data. The pipeline reads the entire dataset daily to capture changes (new customers and updates to existing customers) and writes only these changes to the target, which is a CRM system.
ALL Read Mode: The pipeline reads the entire customer dataset from the source, including all historical records.
Incremental Save Mode: The pipeline updates the target with only the new or modified customer records. Unmodified records in the target are left intact.
Use Case:
Synchronizing Changes: Ideal for scenarios where data changes frequently but only a portion of the dataset is updated each day, such as a customer or employee database where new entries or updates (e.g., contact info) need to be synchronized, while retaining older information.
Change Tracking: The source system doesn’t track changes on its own, so you need to read the entire dataset but only write the necessary updates to the target.
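One way to picture this combination is a comparison of the full read against the target by key, writing only records that are new or different. The sketch below assumes each customer has a unique ID to compare on; the dictionaries stand in for the source database and the CRM, and `save_incremental` is a hypothetical name, not a real tool's API.

```python
from typing import Dict

Customer = Dict[str, str]


def save_incremental(
    target: Dict[int, Customer], batch: Dict[int, Customer]
) -> int:
    """Incremental save mode (illustrative): write only records that are
    new or differ from the target. Returns the number of records written."""
    written = 0
    for customer_id, record in batch.items():
        if target.get(customer_id) != record:
            target[customer_id] = dict(record)
            written += 1
    return written


# Target (CRM) as left by the previous run.
target: Dict[int, Customer] = {
    1: {"name": "Ana", "email": "ana@example.com"},
    2: {"name": "Ben", "email": "ben@example.com"},
}

# Full read from the source: customer 1 is unchanged,
# customer 2 has a new email, and customer 3 is new.
source: Dict[int, Customer] = {
    1: {"name": "Ana", "email": "ana@example.com"},
    2: {"name": "Ben", "email": "ben@new.example.com"},
    3: {"name": "Cy", "email": "cy@example.com"},
}

written = save_incremental(target, source)
print(written)      # 2
print(len(target))  # 3
```

The whole source is read, but only two records are written. In this sketch, a record that exists in the target but not in the source is left intact, which matches the description of unmodified records above; handling deletions would need separate logic.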
Summary of Scenarios:
Combination | Scenario | Use Case |
---|---|---|
ALL Read Mode + Append Save Mode | Reading all sales data daily and appending to a historical sales repository. | Accumulating historical data without losing any previous records. |
ALL Read Mode + Overwrite Save Mode | Reading the entire product catalog and replacing outdated information on a website with the latest data. | Ensuring the target always reflects the most up-to-date version of the source data, replacing all old records. |
ALL Read Mode + Incremental Save Mode | Reading the full customer database daily but only updating or adding new customer records in a CRM system. | Synchronizing changes with the target, retaining unmodified records while updating new or modified ones. |
By pairing ALL read mode with the appropriate save mode, you can cover a wide range of data processing needs: tracking historical data with Append, keeping a target fully up to date with Overwrite, and synchronizing only new or modified records with Incremental.