Azure, SSIS, SSIS+, ADF, ADFv2

Recently, Microsoft announced the SSIS support in their Azure Data Factory Version 2 (ADFv2 IR) Integration Runtimes. This made it possible to deploy a SSIS package on a Azure-hosted SQL Database, provision an ADFv2 IR and enjoy the benefits of a true integration-in-the-cloud experience: hassle-free setup, easy scalability, flexible pay-as-you-go pricing and more...

Unfortunately, custom SSIS components and tasks were not supported initially... Now they are!

Overview of the setup

Currently, in private preview, and soon-to-be-public, is a version of ADFv2 IRs capable of executing third party SSIS components. The setup is quite straightforward: one puts all installation files for the custom components pack (e.g. .msi files), along with a command file with the fixed name main.cmd in an Azure Storage Blob container, which is then referenced in the IR creation process with a Shared Access Signature URI. During each IR node's instantiation, this blob container's content is copied inside node, and the main.cmd file executed. As simple, as that.

Luckily for you, COZYROC made it even easier. We've wrapped all the necessary operations in simple commandlets and bundled them in a PowerShell module, called SSIS-Plus-Azure.

Provisioning the Integration Runtime

A few simple steps are required for provisioning the ADFv2 Integration Runtime with the full COZYROC SSIS+ components pack.

Step 1: If not done already, create a SQL Database on Azure for the catalog

The ADFv2 Integration Runtime expects an existing SQL Database to serve as a package catalog, as described in this ADFv2 IR creation tutorial from Microsoft. Provide the credentials in the configuration file, as illustrated in Step 4.

Step 2: Create an Azure Storage Blob container for installation files

Using either the free Microsoft Azure Storage Explorer tool, or the Azure web portal. Then, put the name of this container in the configuration file, as described in Step 4.

Step 3: Install/Update the Azure PowerShell

We're currently relying on a private version of the shell modules, but it'll become public soon.

Step 4: Encapsulate your configuration in a JSON file

Include all parameters needed for performing the whole process: creating a data factory, verifying the connection to the DB (from step 1) and creating the IR. The file looks like this:

The last parameter is the name you have used to obtain your COZYROC license (via the COZYROC license app). All other properties are related to the actual IR creation. These are loaded like this:

Step 5: Publish required files in the Azure Storage Blob container

This can be done manually with the proper Microsoft tools, or using the PSH commandlets we provide, like this:

The latter commandlet uploads the latest SSIS+ Installer. The other file which is needed is main.cmd which gets executed as part of the IR provisioning. Because it has both common code for installing COZYROC's SSIS+ and custom lines dependent on the username, we've provided an easy way to fill in the Licensee information automatically. This line...

... replaces the {{Licensee}} entries in the provided template, based on the fact that our JSON configuration has the Licensee property filled in, and directly uploads the result to the Azure Storage Blob. Keep in mind that an intermediate main.cmd is created.

msiexec /i SSISPlus-x64.msi /quiet /l %CUSTOM_SETUP_SCRIPT_LOG_DIR%\install.log

"C:\Program Files (x86)\CozyRoc\SSIS\License.2017.exe" /azure /v /u {{Licensee}} /d /l "%CUSTOM_SETUP_SCRIPT_LOG_DIR%\license.log"


Step 6: Provision the Integration Runtime

This can be achieved with a single command:

Ready! The Data Factory and the Integration runtime are created. If one already has ADFv2 created and just wants to provision a new IR, this is easily done - just omit the -CreateDF switch.

And, creating an IR with the same parameters, but a different name is also very easy:

Step 7: Start/Stop the Integration Runtime

Once the IR is provisioned the only step left is to start it. Keep in mind that it takes time - between 20 and 30 minutes.

Stopping is just as easy, but much quicker:

Caveats and troubleshooting

There is no guarantee that this easy provisioning of ADFv2 IR will not have problems and, unfortunately, the error messages are not extremely helpful. The good news is that both the standard output and the error streams are logged during main.cmd execution, and can be inspected later. They can be found in the provided Blob container, in the folder named main.cmd.log, which has a corresponding sub-folder for each of the nodes. Each of these per-node folders will have four files: stderr.log, stdout.log, install.log and license.log. However, if some of the steps in the installation script fail, then the IR will not be started and no log files are provided. There will be no trace of the attempted IR node instantiation.

Package deployment and execution

After an ADFv2 Integration Runtime instance is started, deploying and running a package is very straightforward. Actually, it is no different from running an SSIS package from a catalog in an on-premise SQL Server. One must provide the Azure SQL Database's credentials for deploying the package, and later, using SQL Server Management Studio, for example, connect to this SQL Database instance and trigger the package execution.

The ADFv2 Integration Runtime can access other web services (e.g. common REST APIs), but, of course, getting access to some on-premise data needs additional VNet setup, like the example given here for Point-to-Site VNet configuration.