Parquet Source

Overview

The Parquet Source component is an SSIS Data Flow component that retrieves data from an Apache Parquet file and supports multiple outputs via the composite records pattern.

Supports reading Apache Parquet files.
Component metadata is automatically retrieved from the provided Parquet file.
Supports the following Parquet sources: File and Variable.
Supports composite outputs. Besides the root Parquet Source Output, which contains the top-level fields, a corresponding composite output is populated for each nested array.
Supports an error output for redirecting problematic records (for example, when processing a field value fails).

Quick Start

Consume data from a Parquet file

In this section we will show you how to set up a Parquet Source component.

In the SSIS Toolbox, locate COZYROC's Parquet Source component and drag it onto the Data Flow canvas.

Double-click on the component on the canvas.
Once the component editor opens, select the file connection from the File Connection menu. If the file is in the correct format, its data schema is displayed in the Schema text editor. Note that the schema can also be entered manually in the editor, and the corresponding metadata will be generated even if no file is currently available.

When you click the Columns tab, the component prepares the outputs and external columns by analyzing the contents of the Schema text editor. Note that the Parquet Source can have multiple outputs (see the article about composite records), and the tab shows the columns of each output. The data in these outputs can be processed by downstream transformation and destination components (e.g. multiple OLE DB Destinations can store the data in a SQL Server database).
Click OK to close the component editor.

Congratulations! You have successfully configured the Parquet Source component.
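
If the schema does not appear after selecting the connection, it can help to confirm outside SSIS that the file really is Parquet. The sketch below is a quick sanity check based on the Parquet file layout, in which a regular file starts and ends with the 4-byte magic `PAR1` and stores the footer length as a 4-byte little-endian integer just before the trailing magic. The helper name `looks_like_parquet` is hypothetical, and the check only inspects the framing bytes, not the schema or the data pages.

```python
import struct

MAGIC = b"PAR1"


def looks_like_parquet(data: bytes) -> bool:
    """Return True if the bytes carry the Parquet magic number at both ends."""
    # Smallest possible frame: leading magic (4) + footer length (4) + trailing magic (4).
    if len(data) < 12:
        return False
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        return False
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    # The footer must fit between the leading magic and the length field.
    return footer_len <= len(data) - 12


# Synthetic byte strings for demonstration only; these are not readable Parquet files.
framed = MAGIC + b"\x00" * 16 + struct.pack("<I", 16) + MAGIC
print(looks_like_parquet(framed))
print(looks_like_parquet(b"not parquet"))
```

Because the function takes raw bytes, the same check applies to both source types: for a file, pass the bytes read from disk, and for a variable, pass the bytes it holds. For large files, reading only the first 4 and last 8 bytes would be enough.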

Parameters

Configuration

Use the parameters below to configure the component.

Source
2.2 SR-2

Indicates the source of Parquet data. The following options are available:

Value	Description
File	Select an existing File Connection Manager or create a new one.
Variable	The Parquet data is available in a variable. Select a variable or create a new one.

Variable
2.2 SR-2

A variable that contains Parquet data.

Schema

A JSON string representing the schema of the Parquet file.
A new schema format was introduced in release 2.3. For more information, see Components Metadata Schema.

Knowledge Base

Where can I find the documentation for the Parquet Source?

What's New

2.3

New: Improved flat data processing speed.

2.2 SR-2

New: Support for stream input.

2.1 SR-1

Fixed: Incorrect lower-case headers (Thank you, Romain).
Fixed: Failed with error "System.IO.EndOfStreamException: Unable to read beyond the end of the stream." (Thank you, Naveen).
New: Considerable performance improvements.
Fixed: Data mismatch when reading from certain files (Thank you, Jessica).

2.1

New: Introduced component.
