Component

SharePoint Destination

Overview

SharePoint Destination Component is SSIS Data Flow Component for loading data into Microsoft SharePoint list. The configuration options are provided in the standard data flow destination component dialog.

Setup

Use the options below to setup the component.

Options

SHAREPOINT

Select an existing SharePoint connection manager.

Related Topics: SharePoint Connection Manager

Action

Select destination list action. This property has the options listed in the following table.

Action Description
Create Create new item in the destination list.
Update Update existing item in the destination list. You have to specify the appropriate list item identifier for the update to work.
Delete Delete existing item from the destination list. You have to specify the appropriate list item identifier for the delete to work.

Note: After action change, you have to use Refresh command in the 'Column Mappings' tab to reload destination list metadata.

DestinationList

Specify the destination SharePoint list where the data is loaded.

ErrorRowDisposition (SharePoint Destination Input)

Specify how to handle rows with errors.

How to modify destination list dynamically at runtime?

The SharePoint destination list is available in the data flow properties list. Follow these steps to setup expression:

  • Right-click on data flow canvas and select Properties menu.
  • Scroll-down and find property named like [SharePoint Destination].[DestinationList]. This is the property containing the destination list.
  • Scroll-down and find Expressions property. Setup expression to modify the statement dynamically.

SharePoint Source

Overview

SharePoint Source Component is SSIS Data Flow Component for consuming data from SharePoint Service. The configuration options are provided in the standard data flow source component dialog.

Setup

Use the options below to setup the component.

Options

SHAREPOINT

Select an existing SharePoint connection manager.

Related Topics: SharePoint Connection Manager

BatchSize

Specify the number of rows to be retrieved as a batch. The maximum is 5000.

InputList

Select SharePoint list.

InputQuery

Specify CAML query statement. This property is optional.

InputView

Select SharePoint list view to use for input. This property is optional

IncludeFolders

Specify to include the folders in the result list. The default is false.

IsRecursive

Specify to include sub-folders items in the result list. The default is false.

How to modify source dynamically at runtime?

The source list or query are available in the data flow properties list. Follow these steps to setup expression:

  • Right-click on data flow canvas and select Properties menu.
  • Scroll-down and find property named like [SharePoint Source].[InputList]. This is the property containing the input SharePoint list.
  • Scroll-down and find property named like [SharePoint Source].[InputQuery]. This is the property containing the CAML query.
  • Scroll-down and find Expressions property. Setup expression to modify the above properties dynamically.

Excel Destination

Overview

Excel Destination Component is SSIS Data Flow Component for inserting data into Microsoft Excel worksheets.

The component dialog contains the following elements:

  • General tab - specify destination Excel worksheet and options.
  • Columns tab - select columns to insert.
  • Error Output tab - specify error handling options.

Setup

Use the options below to setup the component.

General

Connection

Select CozyRoc Excel connection manager.

Related Topics: Excel Connection Manager

Worksheet

Select destination Excel worksheet.

Start Column

Specify destination start column.

Start Row

Specify destination start row.

First row has column names

Specify first row contains column names.

Clear

Specify to clear worksheet before inserting data.

Save At End

Specify to save workbook at the end.

Columns

Use the Columns page to select output columns.

Input Column

View input (source) columns in the order in which the task will write them. You can change this order by first clearing the selected columns in the table discussed above, and then selecting input columns from the list in a different order.

Output Alias

Provide a unique name for each output column. The default is the name of the selected input (source) column.

Error Output

Use the Error Output page to select error handling options.

Input or Output

View the name of the data source.

Error

Specify what should happen when an error occurs: ignore the failure, redirect the row, or fail the component.

Related Topics: Handling Errors in Data

Truncation

Specify what should happen when a truncation occurs: ignore the failure, redirect the row, or fail the component.

Description

View the description of the error.

How to modify destination dynamically at runtime?

The destination worksheet and start columns and rows are available in the data flow properties list. Follow these steps to setup expression:

  • Right-click on data flow canvas and select Properties menu.
  • Scroll-down and find property named like [Excel Destination Plus].[Worksheet]. This is the property containing the destination worksheet.
  • Scroll-down and find property named like [Excel Destination Plus].[StartColumn] or [Excel Destination Plus].[StartRow]. These are the properties containing the destination column and row.
  • Scroll-down and find Expressions property. Setup expression to modify the worksheet or region dynamically.

Excel Source

Overview

Excel Source Component is SSIS Data Flow Component for consuming data from Excel worksheets.

The component dialog contains the following elements:

  • General tab - specify source Excel worksheet and region.
  • Columns tab - select columns to load.
  • Error Output tab - specify error handling options.

Setup

Use the options below to setup the component.

General

Connection

Select CozyRoc Excel connection manager.

Related Topics: Excel Connection Manager

Worksheet

Select source Excel worksheet.

First row has column names

Specify first row contains column names.

Column

Specify source region column information.

  • Start - start column. The column is 1-based.
  • Width - number of columns.
Row

Specify source region row information.

  • Start - start row. The row is 1-based.
  • Height - number of rows. If set to -1, the component will read all used rows from worksheet.
Output

Specify output type. This property has the options listed in the following table.

Value Description
Formatted Value Output formatted cell value.
Raw Value Output raw cell value.
Formula Output cell formula.
Reset

Click Reset button to reinitialize component metadata.

Columns

Use the Columns page to map an output column to external (source) column.

Available External Columns

View the list of available external columns in the data source. You cannot use this table to add or delete columns.

External Column

View external (source) columns in the order in which the task will read them. You can change this order by first clearing the selected columns in the table discussed above, and then selecting external columns from the list in a different order.

Output Column

Provide a unique name for each output column. The default is the name of the selected external (source) column.

Error Output

Use the Error Output page to select columns error handling options.

Input or Output

View the name of the data source.

Column

View selected source column.

Error

Specify what should happen when an error occurs: ignore the failure, redirect the row, or fail the component.

Related Topics: Handling Errors in Data

Truncation

Specify what should happen when a truncation occurs: ignore the failure, redirect the row, or fail the component.

Description

View the description of the error.

How to modify source dynamically at runtime?

The source worksheet and region are available in the data flow properties list. Follow these steps to setup expression:

  • Right-click on data flow canvas and select Properties menu.
  • Scroll-down and find property named like [Excel Source Plus].[Worksheet]. This is the property containing the worksheet.
  • Scroll-down and find property named like [Excel Source Plus].[Region]. This is the property containing the source region. The property is a string combination containing values for start column, start row, width and height. Each value is separated with comma (,).
  • Scroll-down and find Expressions property. Setup expression to modify the worksheet or region dynamically.

Dynamics CRM Source

Overview

Dynamics CRM Source Component is SSIS Data Flow Component for consuming data from Dynamics CRM Service. The configuration options are provided in the standard data flow source component dialog.

Setup

Use the options below to setup the component.

Options

CRM

Select an existing Dynamics CRM connection manager.

Related Topics: Dynamics CRM Connection Manager

BatchSize

Specify the number of rows to be retrieved as a batch. The maximum is 500.

InputType

Specify input type. This property has the options listed in the following table.

Value Description
Entity Input type is Dynamics CRM entity. Selecting this value displays the dynamic option Entity.
FetchXML Input type is FetchXML query. Selecting this value display the dynamic option FetchXML.

Note: After input change, you have to use Refresh command in the 'Column Mappings' tab to reload source entity metadata.

Entity

Select Dynamics CRM data entity.

FetchXML

Specify FetchXML statement. Select to design your query in the standard Advanced Find dialog.
Note: Advanced Find is not available in CRM 3.0

How to modify source dynamically at runtime?

The source entity or query are available in the data flow properties list. Follow these steps to setup expression:

  • Right-click on data flow canvas and select Properties menu.
  • Scroll-down and find property named like [Dynamics CRM Source].[FetchXML]. This is the property containing the query.
  • Scroll-down and find Expressions property. Setup expression to modify the query dynamically.

Flow Synchronization

Overview

Flow Synchronization is SSIS Data Flow Component, which makes two or more data flows run at the same speed. This is accomplished by stopping one data flow if the others run too slow. It is convenient companion to Table Difference component, resolving problems with memory consumption. However it can be used on its own, in case you want data flows to run at the same speed. The component uses semaphores to handle synchronization, resulting in very efficient CPUs utilization.

Setup

Use the options below to setup Flow Synchronization component for processing.

Options

Name

Specify synchronization name.

Name from variable

Synchronization name is specified in package variable.

Max delta

Maximum number of rows of difference between data flows.

Table Difference

Overview

Table difference is SSIS Data Flow Component designed to find differences between two tables and in particular simplify the management of slowly changing dimensions. The main advantage over Microsoft standard SCD component is the speed and ease of use.

Demonstration

Benchmark

The component was tested with 5.5 million records and 25 comparison columns on dual core AMD machine with 4GB RAM and standard SATA disk. The comparison process was complete in 7 minutes. The complexity of the algorithm is linear, so you can expect 10 million records processing to complete in approximately 15 minutes with the same hardware.

Introduction

One question arising very often in the Data Warehouse programming is "What are the differences between two tables?" You receive several million customers and should decide what changes need to be done. The standard SSIS toolkit provides SCD component for handling this process, but it is of no practical use when you have to handle more than a couple of thousand records. The Table Difference component doesn't have this limitation.

The component has two inputs: one for "old" and one for "new data". The old and new rows are compared and the result is sent to different outputs:

  • Unchanged rows (same in both old and new)
  • Deleted rows (appear in old but not in new)
  • New rows (appear in new but not in old)
  • Updated rows (appear in both old and new, but there is a change)

The diagram below illustrates the component functionality better than thousand words:

Occasionally you may have a requirement to perform different updates, for different columns. The component includes option to tag each input column with identifier. Then for each unique identifier, a separate update output is created and maintained. In this way you can tag with "10" the historical attributes and with "20" the other columns and decide what kind of operation to carry on with the different updates.

The component inputs MUST be sorted and have a collection of fields (keys) that let the component decide when two rows from the inputs represent the same row. This is easily accomplished in SQL with a simple “order by” and a convenient index. Usually slowly changing dimensions maintain an index by the business key, so the sorting requirement is easily accomplished and do not represent a problem.

The Input

The component has two inputs:

  • Old Data Flow - attached to source returning existing rows.
  • New Data Flow - attached to source returning new rows.

Each input has a buffer where all incoming rows are directed and maintained. A separate thread starts as soon as data from both buffers is available and checks for differences between them. The rows are sent to the appropriate output and then removed from the buffers. The component does not need to maintain all the data in memory. The data is removed as soon as it can be compared with corresponding row. For some obscure reason SSIS allocates more processing time to one of the input data flows, compared to the other. This results in additional memory consumption by the component. For this reason, if you have to compare more than half a million records, we recommend you use Flow Synchronization component in your data flow.

The Output

The component has three standard outputs for New, Deleted and Unchanged rows. For each unique user-defined UpdateID identifier, a separate Updated rows output is created. The outputs are named "UpdateID", followed by the identifier.

You can easily change the name of Update output to something more interesting like "Updated historical attribute" and we encourage you to do so. The collection of output fields is computed by the component via the intersection of the two inputs. If one column appears in only one input, it is not managed. If it appears in both, then it will be processed.

Input Setup

The component analyzes the old and new columns. If column name and type match, it is added to the list of available columns to manage. If the inputs are sorted, their sort columns will be marked as key fields, using the same order in which they appear in the sort. All other non-key columns are assigned a standard Update ID 10 and are managed as comparable columns.


Check

Select input column processing type. This property has the options listed in the following table.

Value Description
Key Field Column will be used to detect if old and new row match. The inputs must be sorted by this column.
Compare Column will be compared to detect difference.
Use New Column will be copied without checking from New Data Flow input to output.
Use Old Column will be copied without checking from Old Data Flow input to output
KeyOrder

Specify "Key Field" column order. This is the order in the query's "Order By" clause. The component does not have the correct order, so it is up to you to provide this information.

UpdateID

Specify column update identifier. For each unique identifier, a separate update output is created. The lowest update identifier has the highest priority. For example let say we have AccountNumber column with update identifier 10 and AddressLine1 column with update identifier 20. If AccountNumber is different, the row will be directed to update output 10, even if AddressLine1 also contains difference.

Outputs Setup

Specify outputs name and description. If you are not interested for example in Unchanged output, then you can deselect it and avoid warnings for unused columns.

Output columns

Specify output columns options.


Select output

Select output whose columns you want to modify.

Sel

Select or deselect output column. Deselecting unused column removes warnings and increases the performance of component.

Output Column Name

Specify custom output column name.

Miscellaneous

Specify string comparison options.


Culture Identifier

Specify culture used to perform comparison. If not specified, component will use the culture of the running task.

Ignore Case

Specify if you want to use case-insensitive comparison.

Warnings

This tab will contain all unused input columns. By checking the warnings you can see if the component is working with all needed columns.

Lookup

Overview

Lookup Plus Component is SSIS Data Flow Component providing enhanced functionality compared to the standard SSIS Lookup component. To simplify your data flows, the component allows multiple lookup definitions in the same dialog and definition of values when the input cannot be matched or when it is undefined. In addition the Lookup Plus component supports variety of connection types as a source, making it quite versatile.

Setup

Use the options below to setup Lookup Plus component for processing.

Options

Name

Specify component name.

Description

Specify component description.

Lookup definition

The lookup definitions are setup in the lookup definition grid.


Input Column

Select input column for lookup key.

Reference

Setup lookup reference.

Related Topics: Reference Setup

Result Column.

Specify lookup result output column name.

On NULL

Specify output value when input is undefined.

No Match

Specify output value when input doesn't produce match.

Refresh

Use the refresh command to update the lookup definitions with the latest metadata.

Reference Setup

The reference setup dialog is used to create lookup source. Three source types are available: table or view, SQL query and package variable.


ConnectionType

Select connection type. This property has the options listed in the following table.

Value Description
OLEDB Use OLEDB Connection Manager for Connection option.
ADO Use ADO Connection Manager for Connection option.
ADO.NET Use ADO.NET Connection Manager for Connection option.
Connection

Select an existing connection manager.

New

Click New to create a connection manager.

Related Topics: OLE DB Connection Manager, ADO Connection Manager, ADO.NET Connection Manager

Table / View radio button

Select an existing table or view as lookup source.

SQL Query radio button

Specify SQL statement as lookup source.

From Variable radio button

Select an existing package variable as lookup source.

Note: When variable is of type String, the component will treat the variable value as SQL statement. Key Column and Value Column have to be provided. When variable is of type Object, the component will expect variable object to support IDictionary interface. You can use this feature to cache and reuse one and the same lookup in multiple places. Key Column and Value Column are not needed.

Key Column

Select lookup source key column.

Value Column

Select lookup source value column.

Match multiple columns with Lookup Plus

You can match multiple columns using Lookup Plus component. Check the steps below for one possible solution:

  1. Create a new data flow. This data flow will build lookup reference dictionary and store it in package variable.
  2. Insert data flow source component.
  3. Insert Checksum Transform script after the component from step 2 to create unique CRC32 value for the match input columns.
  4. Insert Dictionary Destination script after the component from step 3. For key select the input column containing the CRC32 value. For value specify the appropriate reference column. Store the result dictionary object in package variable.
  5. Create another data flow, which will do the lookup and matching.
  6. Insert data flow source with the data that has to be matched.
  7. Insert Checksum Transform script after the component from step 6 to create unique CRC32 value for the required input columns.
  8. Insert Lookup Plus component. As key specify the input column containing the CRC32 value. In the reference dialog select "From Variable" option. Set to the variable you have used in step 4.

Samples

  • For a sample how to use a custom-crafted range dictionary object from variable, check this package. The range dictionary object is constructed with this script.

Salesforce Destination

Overview

Salesforce Destination Component is SSIS Data Flow Component for loading data into Salesforce object. The configuration options are provided in the standard data flow destination component dialog.

Setup

Use the options below to setup the component.

Options

SFORCE

Select an existing Salesforce connection manager.

Related Topics: Salesforce Connection Manager

Action (1.4 SR-2)

Select destination object action. This property has the options listed in the following table.

Action Description
Create Create new record in the destination object.
Update Update existing record in the destination object. You have to specify the appropriate object record identifier for the update to work.
Delete Delete existing record from the destination object. You have to specify the appropriate object record identifier for the delete to work.
Upsert (1.4 SR-3) Update and insert record in the destination object. Selecting this value display the dynamic option ExternalId.

Note: After action change, you have to use Refresh command in the 'Column Mappings' tab to reload destination object metadata.

BatchSize

Specify the number of rows to be sent as a batch. The maximum is 200.

DestinationObject

Specify the destination Salesforce object where the data is loaded.

ExternalId (1.4 SR-3)

Specify field for external identifiers used in the upsert action.

ErrorRowDisposition (Salesforce Destination Input)

Specify how to handle rows with errors.

How to modify destination object dynamically at runtime?

The destination object is available in the data flow properties list. Follow these steps to setup expression:

  • Right-click on data flow canvas and select Properties menu.
  • Scroll-down and find property named like [Salesforce Destination].[DestinationObject]. This is the property containing the destination table.
  • Scroll-down and find Expressions property. Setup expression to modify the statement dynamically.

Salesforce Source

Overview

Salesforce Source Component is SSIS Data Flow Component for consuming data from Salesforce Service. The configuration options are provided in the standard data flow source component dialog.

Setup

Use the options below to setup the component.

Options

SFORCE

Select an existing Salesforce connection manager.

Related Topics: Salesforce Connection Manager

BatchSize

Specify the number of rows to be retrieved as a batch. The maximum is 2000.

InputType

Specify input type. This property has the options listed in the following table.

Value Description
Object Input type is Salesforce object. Selecting this value displays the dynamic options Object.
Statement Input type is SOQL statement. Selecting this value display the dynamic options Statement.

Note: After input change, you have to use Refresh command in the 'Column Mappings' tab to reload source object metadata.

Object

Select SalesForce data object.

Statement

Specify SOQL statement.

IncludeDeleted (1.4 SR-1)

Specify to include all records in result, including the deleted. This property has the options listed in the following table.

Value Description
True The deleted records are included in the result.
False The deleted records are not included in the result.

How to modify source dynamically at runtime?

The source object or statement are available in the data flow properties list. Follow these steps to setup expression:

  • Right-click on data flow canvas and select Properties menu.
  • Scroll-down and find property named like [Salesforce Source].[Statement]. This is the property containing the statement.
  • Scroll-down and find Expressions property. Setup expression to modify the statement dynamically.
Syndicate content

PASS Summit '10

November 8 - 11, 2010 Washington State Convention & Trade Center in Seattle, WA

CozyRoc will be exhibiting at the Premier Conference for SQL SERVER Professionals. Visit CozyRoc at booth #425 and say HI. Register with code CR3D and get $100 off.

CozyRoc SSIS+ has been nominated for the 2010 SQL Server Magazine Community Choice Awards, category "Best Business Intelligence & Reporting Product". Vote Now to help us win this prestigious award.