NEW: Check our new hosted service COZYROC Cloud ...
Sort Plus
Overview
The Sort Plus Component is an SSIS Data Flow asynchronous transformation component, which is capable of sorting very large amounts of data. It uses a sophisticated proprietary algorithm that employs smart chunking and caching of data on disk. It enables an optimal utilization of your server's resources, by doing all processing in parallel, using a configurable number of threads.
With smaller amounts of data (e.g. well below 1GB), its speed is similar to the speed of the built-in Sort component. However, unlike the built-in Sort component which usually hangs when trying to process big volumes of data (e.g. more than 2 GB), our Sort Plus component completes its job and does so in a reasonable time.
Unlike the built-in Sort transformation, Sort Plus supports multiple inputs. The first input is used as a basis for selecting output columns and sorting keys. The others must have the same structure (i.e. columns with the same names and data types). For optimal throughput, the data from all inputs is processed in parallel.
Additionally, unlike the built-in component, large text columns (i.e. DT_TEXT and DT_NTEXT) can be part of the sorting key. Please note, that in such scenarios, you should expect decreased performance compared to using a fixed size string column (i.e. DT_STR or DT_WSTR).
Aside, from being able to process much larger amounts of data, the functionality of COZYROC’s Sort Plus component is very similar to the built-in one.
Parameters
General
Setting this option to true would output only one row for all rows that have equal sorting keys. As the processing is parallellized, this row wouldn't necessarily be the first encountered one. The default is false.
Use this parameter to specify the maximum amount of memory (in Megabytes) that the component is allowed to use . The default is 64MB. Please note, that as the size of the input rows can vary depending on the data, there might be some small fluctuations in the actual memory usage, so this parameter will be used as a guidance and not a strict limit.
Use this parameter to specify the maximum number of concurrently running sorting sub-tasks. Each chunk of data (i.e. pipeline buffer) gets processed in separate thread that performs a calculation-heavy sub-task. Leaving the default value -1 would instruct Sort Plus to use the number of logical threads (i.e. the number of cores multiplied by the number of hyperthreads per core available). Setting the parameter value to a number higher than the number of logical threads is not recommended.
Dealing with the data in chunks and offloading already sorted ones to disk is a key functionality of Sort Plus. Use this parameter to specify a folder where you want temporary (cache) files to be stored. If left empty, the system’s default temporary directory will be used.
Columns
The columns / sorting key setup page is quite similar to the one used by the built-in Sort component.
Each column can be selected as part of the sorting key and/or (de)selected to be included in the output. The latter is controlled with the Pass Through checkbox. Of course, all columns that are part of the sorting key must be sent to the output. The default is to pass through all of the columns.
The sorting key configuration table allows you to: change the sorting key position (Sort Order) for each column (Input Column), set a different name for it (Output Alias) and specify the direction of sorting (Sort Type). Additionally, if any of the selected sort columns are of string/text type, the Comparison Flags may be useful.
Outputs
The component has a single output which, by default, would contain all the columns. Any columns deselected in the Pass Through configuration, will be excluded in the output. The names of the output columns can be modified using Output Alias.
What's New
- New: Introduced component.
Ready to give it a try?
COZYROC SSIS+ Components Suite is free for testing in your development environment.