
Overview
Table difference is SSIS Data Flow Component designed to find differences between two tables and in particular simplify the management of slowly changing dimensions. The main advantage over Microsoft standard SCD component is the speed and ease of use.
Demonstration
Benchmark
The component was tested with 5.5 million records and 25 comparison columns on dual core AMD machine with 4GB RAM and standard SATA disk. The comparison process was complete in 7 minutes. The complexity of the algorithm is linear, so you can expect 10 million records processing to complete in approximately 15 minutes with the same hardware.
Introduction
One question arising very often in the Data Warehouse programming is "What are the differences between two tables?" You receive several million customers and should decide what changes need to be done. The standard SSIS toolkit provides SCD component for handling this process, but it is of no practical use when you have to handle more than a couple of thousand records. The Table Difference component doesn't have this limitation.
The component has two inputs: one for "old" and one for "new data". The old and new rows are compared and the result is sent to different outputs:
- Unchanged rows (same in both old and new)
- Deleted rows (appear in old but not in new)
- New rows (appear in new but not in old)
- Updated rows (appear in both old and new, but there is a change)
The diagram below illustrates the component functionality better than thousand words:

Occasionally you may have a requirement to perform different updates, for different columns. The component includes option to tag each input column with identifier. Then for each unique identifier, a separate update output is created and maintained. In this way you can tag with "10" the historical attributes and with "20" the other columns and decide what kind of operation to carry on with the different updates.
The component inputs MUST be sorted and have a collection of fields (keys) that let the component decide when two rows from the inputs represent the same row. This is easily accomplished in SQL with a simple “order by” and a convenient index. Usually slowly changing dimensions maintain an index by the business key, so the sorting requirement is easily accomplished and do not represent a problem.
The Input
The component has two inputs:
- Old Data Flow - attached to source returning existing rows.
- New Data Flow - attached to source returning new rows.
Each input has a buffer where all incoming rows are directed and maintained. A separate thread starts as soon as data from both buffers is available and checks for differences between them. The rows are sent to the appropriate output and then removed from the buffers. The component does not need to maintain all the data in memory. The data is removed as soon as it can be compared with corresponding row. For some obscure reason SSIS allocates more processing time to one of the input data flows, compared to the other. This results in additional memory consumption by the component. For this reason, if you have to compare more than half a million records, we recommend you use Flow Synchronization component in your data flow.
The Output
The component has three standard outputs for New, Deleted and Unchanged rows. For each unique user-defined UpdateID identifier, a separate Updated rows output is created. The outputs are named "UpdateID", followed by the identifier.
You can easily change the name of Update output to something more interesting like "Updated historical attribute" and we encourage you to do so. The collection of output fields is computed by the component via the intersection of the two inputs. If one column appears in only one input, it is not managed. If it appears in both, then it will be processed.
Input Setup
The component analyzes the old and new columns. If column name and type match, it is added to the list of available columns to manage. If the inputs are sorted, their sort columns will be marked as key fields, using the same order in which they appear in the sort. All other non-key columns are assigned a standard Update ID 10 and are managed as comparable columns.

- Check
-
Select input column processing type. This property has the options listed in the following table.
Value Description Key Field Column will be used to detect if old and new row match. The inputs must be sorted by this column. Compare Column will be compared to detect difference. Use New Column will be copied without checking from New Data Flow input to output. Use Old Column will be copied without checking from Old Data Flow input to output
- KeyOrder
-
Specify "Key Field" column order. This is the order in the query's "Order By" clause. The component does not have the correct order, so it is up to you to provide this information.
- UpdateID
-
Specify column update identifier. For each unique identifier, a separate update output is created. The lowest update identifier has the highest priority. For example let say we have AccountNumber column with update identifier 10 and AddressLine1 column with update identifier 20. If AccountNumber is different, the row will be directed to update output 10, even if AddressLine1 also contains difference.
Outputs Setup
Specify outputs name and description. If you are not interested for example in Unchanged output, then you can deselect it and avoid warnings for unused columns.

Output columns
Specify output columns options.

- Select output
-
Select output whose columns you want to modify.
- Sel
-
Select or deselect output column. Deselecting unused column removes warnings and increases the performance of component.
- Output Column Name
-
Specify custom output column name.
Miscellaneous
Specify string comparison options.

- Culture Identifier
-
Specify culture used to perform comparison. If not specified, component will use the culture of the running task.
- Ignore Case
-
Specify if you want to use case-insensitive comparison.
Warnings
This tab will contain all unused input columns. By checking the warnings you can see if the component is working with all needed columns.
CozyRoc

